Added a comment

http://joeyh.name/ 2013-12-02 17:58:55 +00:00 committed by admin
parent bc786b6f06
commit b6fe18b87d

@@ -0,0 +1,28 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.64"
subject="comment 1"
date="2013-12-02T17:58:55Z"
content="""
See [[todo/incremental_fsck]] for background.

Options:

* using per-object files
* using SQLite or another relational database
* using git as the database
* using a key/value store

Per-object files have the problem that they clutter up the .git/annex/objects/ tree quite a lot with info only used by fsck. We have a similar clutter problem with direct mode mapping files, and for various reasons (including [[direct mode mappings scale badly with thousands of identical files|bugs/\"Adding_4923_files\"_is_really_slow]]) I have been hoping to eventually move that data to better storage, perhaps a database. Switching fsck to use a database might be a good first step toward using a database for the more important direct mode mapping storage. If the fsck database goes wrong, the worst that happens is some extra incremental fscking.
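
To illustrate why a damaged fsck database is low-risk, here is a minimal sketch in Python (chosen only as a neutral illustration language, not git-annex's Haskell). It assumes the state is nothing more than a map from key to the time of its last check; `needs_fsck`, `incremental_fsck`, and that state model are hypothetical, not git-annex's actual implementation.

```python
import time

# Assumed model: state maps a key to the Unix time of its last check.
def needs_fsck(state, key, started_at):
    # A key with no record, or one last checked before this incremental
    # pass began, is simply checked again.
    last_checked = state.get(key)
    return last_checked is None or last_checked < started_at

def incremental_fsck(keys, state, started_at, check, now=time.time):
    # Run check() only on keys lacking an up-to-date record.
    # Returns the list of keys that were examined in this call.
    examined = []
    for key in keys:
        if needs_fsck(state, key, started_at):
            if check(key):
                state[key] = now()
            examined.append(key)
    return examined
```

Under this model, throwing the state away (passing an empty map) only makes every key get examined again; no object data depends on it.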

Using git as the database is possible: just store the info in a separate git ref, like the git-annex branch, but one that is not synced. It will use more disk space over time, though. Probably not the best choice.

There are plenty of SQLite interfaces for Haskell. They all have the problem that the C sqlite library has to be installed, which would make `cabal install git-annex` harder. It is currently possible to do a `cabal install git-annex` with flags that avoid needing to install any C libraries. This is useful for my sanity, since otherwise people want hand-holding on installing libraries on OSX and stuff.

Perhaps there's a pure Haskell key/value store that would be a better choice than SQLite. I do not anticipate git-annex needing complex relational database storage; looking at everything it needs to store so far, key/value is enough. (The more complicated stuff is stored in git anyway.)

* <http://hackage.haskell.org/package/io-storage> is actually memory-only, so not suitable
* <http://hackage.haskell.org/package/acid-state> is, I think, the gold standard in its area. Strongly typed. Has some Template Haskell stuff but it's optional. Packaged in Debian already, but only on TH-capable architectures, so that would need to be fixed. Seems to have the disadvantage that it's not really a key/value store, so it works by loading the *entire* data into RAM. This is a problem since git-annex wants to run in constant memory.
* <http://hackage.haskell.org/package/keyvaluehash> has some caveats about not being able to remove old keys
* <http://hackage.haskell.org/package/HongoDB> looks possible, though it has had only one release, in 2011, and is undocumented
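
As a rough illustration of the key/value shape discussed above, the following sketch uses Python's standard `sqlite3` module as a stand-in for whichever store is eventually picked. The table name, column names, and helper functions are all assumptions made for the example. It shows that one two-column table covers the fsck case, and that results can be streamed from a cursor rather than loaded all at once.

```python
import sqlite3

def open_store(path):
    # One two-column table is the whole schema: no joins, no relations.
    db = sqlite3.connect(path)
    db.execute(
        'CREATE TABLE IF NOT EXISTS fsck ('
        'annex_key TEXT PRIMARY KEY, checked_at INTEGER NOT NULL)'
    )
    return db

def put(db, key, checked_at):
    # The connection context manager commits on success, rolls back on error.
    with db:
        db.execute(
            'INSERT OR REPLACE INTO fsck (annex_key, checked_at) VALUES (?, ?)',
            (key, checked_at),
        )

def get(db, key):
    row = db.execute(
        'SELECT checked_at FROM fsck WHERE annex_key = ?', (key,)
    ).fetchone()
    return row[0] if row else None

def keys_checked_before(db, cutoff):
    # Iterating the cursor yields rows one at a time, so the caller
    # never holds the full table in a Python list.
    cursor = db.execute(
        'SELECT annex_key FROM fsck WHERE checked_at < ? ORDER BY annex_key',
        (cutoff,),
    )
    for (key,) in cursor:
        yield key
```

Only `put` and `get` are strictly needed for key/value use; `keys_checked_before` is included to contrast cursor-style streaming with the load-everything behaviour noted for acid-state.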
"""]]