The one exception is in Utility.Daemon. As long as a process only
daemonizes once, which seems reasonable, and as long as it avoids calling
checkDaemon once it's already running as a daemon, the fcntl locking
gotchas won't be a problem there.
Annex.LockFile has it's own separate lock pool layer, which has been
renamed to LockCache. This is a persistent cache of locks that persist
until closed.
This is not quite done; lockContent stil needs to be converted.
Also, moved the database to a subdir, as there are multiple files.
This seems to work well with concurrent fscks, although they still do
redundant work due to the commit granularity. Occasionally two writes will
conflict, and one is then deferred and happens later.
Except, with 3 concurrent fscks, I got failures:
git-annex: user error (SQLite3 returned ErrorBusy while attempting to perform prepare "SELECT \"fscked\".\"key\"\nFROM \"fscked\"\nWHERE \"fscked\".\"key\" = ?\n": database is locked)
Argh!!!
Still not robust enough. I have 3 fscks running concurrently, and am
seeing:
("commit deferred",user error (SQLite3 returned ErrorBusy while attempting
to perform step.))
and
git-annex: user error (SQLite3 returned ErrorBusy while attempting to perform prepare "SELECT \"fscked\".\"key\"\nFROM \"fscked\"\nWHERE \"fscked\".\"key\" = ?\n": database is locked)
Sqlite doesn't support multiple concurrent writers
at all. One of them will fail to write. It's not even possible to have two
processes building up separate transactions at the same time. Before using
sqlite, incremental fsck could work perfectly well with multiple fsck
processes running concurrently. I'd like to keep that working.
My partial solution, so far, is to make git-annex buffer writes, and every
so often send them all to sqlite at once, in a transaction. So most of the
time, nothing is writing to the database. (And if it gets unlucky and
a write fails due to a collision with another writer, it can just wait and
retry the write later.) This lets multiple processes write to the database
successfully.
But, for the purposes of concurrent, incremental fsck, it's not ideal.
Each process doesn't immediately learn of files that another process has
checked. So they'll tend to do redundant work.
Only way I can see to improve this is to use some other mechanism for
short-term IPC between the fsck processes. Not yet done.
----
Also, make addDb check if an item is in the database already, and not try
to re-add it. That fixes an intermittent crash with
"SQLite3 returned ErrorConstraint while attempting to perform step."
I am not 100% sure why; it only started happening when I moved write
buffering into the queue. It seemed to generally happen on the same file
each time, so could just be due to multiple files having the same key.
However, I doubt my sound repo has many duplicate keys, and I suspect
something else is going on.
----
Updated benchmark, with the 1000 item queue: 6m33.808s
Turns out sqlite does not like having its database deleted out from
underneath it. It might suffice to empty the table, but I would rather
start each fsck over with a new database, so I added a lock file, and
running incremental fscks use a shared lock.
This leaves one concurrency bug left; running two concurrent fsck --more
will lead to: "SQLite3 returned ErrorBusy while attempting to perform step."
and one or both will fail. This is a concurrent writers problem.
Database.Handle can now be given a CommitPolicy, making it easy to specify
transaction granularity.
Benchmarking the old git-annex incremental fsck that flips sticky bits
to the new that uses sqlite, running in a repo with 37000 annexed files,
both from cold cache:
old: 6m6.906s
new: 6m26.913s
This commit was sponsored by TasLUG.
Did not keep backwards compat for sticky bit records. An incremental fsck
that is already in progress will start over on upgrade to this version.
This is not yet ready for merging. The autobuilders need to have sqlite
installed.
Also, interrupting a fsck --incremental does not commit the database.
So, resuming with fsck --more restarts from beginning.
Memory: Constant during a fsck of tens of thousands of files.
(But, it does seem to buffer whole transation in memory, so
may really scale with number of files.)
CPU: ?