devblog
This commit is contained in:
parent
0ad35db26b
commit
f400e0ebec
1 changed files with 56 additions and 0 deletions
56
doc/devblog/day_253__sqlite_for_incremental_fsck.mdwn
Normal file
56
doc/devblog/day_253__sqlite_for_incremental_fsck.mdwn
Normal file
|
@ -0,0 +1,56 @@
|
||||||
|
Yesterday I did a little more investigation of key/value stores.
|
||||||
|
I'd love a pure haskell key/value store that didn't buffer everything in
|
||||||
|
memory, and that allowed concurrent readers, and was ACID, and production
|
||||||
|
quality. But so far, I have not found anything that meets all those
|
||||||
|
criteria. It seems that sqlite is the best choice for now.
|
||||||
|
|
||||||
|
Started working on the `database` branch today. The plan is to use
|
||||||
|
sqlite for incremental fsck first, and if that works well, do the rest
|
||||||
|
of what's planned in [[design/caching_database]].
|
||||||
|
|
||||||
|
At least for now, I'm going to use a dedicated database file for each
|
||||||
|
different thing. (This may not be as space-efficient due to lacking
|
||||||
|
normalization, but it keeps things simple.)
|
||||||
|
|
||||||
|
So, .git/annex/fsck.db will be used by incremental fsck, and it has
|
||||||
|
a super simple Persistent database schema:
|
||||||
|
|
||||||
|
[[!format haskell """
|
||||||
|
Fscked
|
||||||
|
key SKey
|
||||||
|
UniqueKey key
|
||||||
|
"""]]
|
||||||
|
|
||||||
|
It was pretty easy to implement this and make incremental fsck use it. The
|
||||||
|
hard part is making it both fast and robust.
|
||||||
|
|
||||||
|
At first, I was doing everything inside a single `runSqlite` action.
|
||||||
|
Including creating the table. But, it turns out that runs as a single
|
||||||
|
transaction, and if it was interrupted, this left the database in a
|
||||||
|
state where it exists, but has no tables. Hard to recover from.
|
||||||
|
|
||||||
|
So, I separated out creating the database, made that be done in a separate
|
||||||
|
transation and fully atomically. Now `fsck --incremental` could be crtl-c'd
|
||||||
|
and resumed with `fsck --more`, but it would lose the transaction and so
|
||||||
|
not remember anything had been checked.
|
||||||
|
|
||||||
|
To fix that, I tried making a separate transation per file fscked. That
|
||||||
|
worked, and it resumes nicely where it left off, but all those transactions
|
||||||
|
made it much slower.
|
||||||
|
|
||||||
|
To fix the speed, I made it commit just one transaction per minute. This
|
||||||
|
seems like an ok balance. Having fsck re-do one minute's work when restarting
|
||||||
|
an interrupted incremental fsck is perfectly reasonable, and now the speed,
|
||||||
|
using the sqlite database, is nearly as fast as the old sticky bit hack was.
|
||||||
|
(Specifically, 6m7s old vs 6m27s new, fscking 37000 files from cold cache
|
||||||
|
in --fast mode.)
|
||||||
|
|
||||||
|
There is still a problem with multiple concurrent `fsck --more`
|
||||||
|
failing. Probably a concurrent writer problem? And, some porting will be
|
||||||
|
required to get sqlite and persistent working on Windows and Android.
|
||||||
|
So the branch isn't ready to merge yet, but it seems promising.
|
||||||
|
|
||||||
|
In retrospect, while incremental fsck has the simplest database schema, it
|
||||||
|
might be one of the harder things listed in [[design/caching_database]],
|
||||||
|
just because it involves so many writes to the database. The other use
|
||||||
|
cases are more read heavy.
|
Loading…
Add table
Add a link
Reference in a new issue