todo item based on behavior yoh showed me

Joey Hess 2019-03-28 14:04:20 -04:00
parent 2c735f1747
commit b09c6e3016

@@ -0,0 +1,31 @@
`git-annex info uuid` was observed to be slow on a slow NFS mount, because
it opens lots of .git/annex/journal files that do not exist. So does
`git annex find --in remote`.

Normally the journal is empty, but each query of a file from the git-annex
branch still tries to open the corresponding journal file.

It seems that this could be improved by making such query commands
either commit the journal to the branch once at startup, or check if the
journal is empty, and once there's known to be nothing in the journal,
avoid opening files from there.

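
The second variant (check once, then skip the journal) could look roughly like this. It is only an illustrative Python sketch, not git-annex's Haskell code: `JournalReader`, `read_branch` and the file names are hypothetical, and the branch lookup is stubbed out with a callable.

```python
import os
import tempfile


class JournalReader:
    """Hypothetical reader that consults a journal directory before the branch."""

    def __init__(self, journal_dir, read_branch):
        self.journal_dir = journal_dir
        self.read_branch = read_branch  # assumed fallback: callable(name) -> str
        self.journal_opens = 0          # counts attempted journal opens
        # One directory scan at startup replaces many failing open() calls.
        with os.scandir(journal_dir) as entries:
            self.journal_empty = next(entries, None) is None

    def get(self, name):
        if not self.journal_empty:
            self.journal_opens += 1
            try:
                with open(os.path.join(self.journal_dir, name)) as f:
                    return f.read()
            except FileNotFoundError:
                pass  # nothing journalled for this file; use the branch
        return self.read_branch(name)


def demo():
    # An empty temporary directory stands in for .git/annex/journal.
    with tempfile.TemporaryDirectory() as journal:
        reader = JournalReader(journal, lambda name: "branch:" + name)
        values = [reader.get("key%d.log" % i) for i in range(3)]
        return values, reader.journal_opens
```

With an empty journal, `demo()` answers all three queries from the stubbed branch and never attempts to open a journal file, which is the saving that matters on a slow filesystem.
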
But: Concurrency. Another process may be writing changes to the git-annex
branch, via the journal, and so this would be a behavior change. Mostly
that seems acceptable, because there's no defined ordering of events in
such a situation, and this change only makes it so that the writes
effectively always come after the reads.

But: Batch jobs. When a git-annex command is run in a batch mode,
its caller can currently safely expect that running some other command
that modifies the git-annex branch, and then asking the batch
mode command to query it, will yield a result that takes the earlier write
into account.

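
The batch mode problem can be shown with a small simulation. Again this is a hedged Python sketch with made-up names (`read_log`, `batch_session`, `key.log`), not git-annex code; `None` stands for "fell through to the branch, which does not have the write yet".

```python
import os
import tempfile


def read_log(journal_dir, name, skip_journal):
    """Return journalled content for name, or None if skipped or absent."""
    if skip_journal:
        return None
    try:
        with open(os.path.join(journal_dir, name)) as f:
            return f.read()
    except FileNotFoundError:
        return None


def batch_session():
    with tempfile.TemporaryDirectory() as journal:
        # The batch process starts and finds the journal empty.
        journal_was_empty = not os.listdir(journal)

        # Some other command then journals a change.
        with open(os.path.join(journal, "key.log"), "w") as f:
            f.write("new location")

        # The caller now asks the batch process about the same file.
        current = read_log(journal, "key.log", skip_journal=False)
        optimised = read_log(journal, "key.log", skip_journal=journal_was_empty)
        return current, optimised
```

Here the current behavior sees `"new location"`, while the reader that remembered the journal being empty at startup misses it.
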
So, it seems this optimisation would be limited to commands that
are not in batch mode and do strictly read-only queries. That seems a bit
hard to plumb through to the git-annex branch reading code.

An easier alternative would be an option that bypasses reading the journal.
But maybe there's some other, better way to avoid this slow case?
--[[Joey]]