diff --git a/doc/todo/inneficiency_on_slow_filesystems_opening_nonexistant_journal_files.mdwn b/doc/todo/inneficiency_on_slow_filesystems_opening_nonexistant_journal_files.mdwn
new file mode 100644
index 0000000000..d87c374c89
--- /dev/null
+++ b/doc/todo/inneficiency_on_slow_filesystems_opening_nonexistant_journal_files.mdwn
@@ -0,0 +1,31 @@
+`git-annex info uuid` was observed to be slow on a slow NFS, because
+it is opening lots of .git/annex/journal files that do not exist. So does
+`git annex find --in remote`.
+
+Normally the journal is empty, but each query of a file from the git-annex
+branch still tries to open the corresponding journal file.
+
+It seems that this could be improved by making such query commands
+either commit the journal to the branch once at startup, or check if the
+journal is empty, and once there's known to be nothing in the journal,
+avoid opening files from there.
+
+But: Concurrency. Another process may be writing changes to the git-annex
+branch, via the journal, and so this would be a behavior change. Mostly
+that seems acceptable, because there's no defined ordering of events in
+such a situation, and this change only makes it so that the writes
+effectively always come after the reads.
+
+But: Batch jobs. When a git-annex command is run in a batch mode,
+its caller can currently safely expect that running some other command,
+that modifies the git-annex branch, followed by asking the batch
+mode command to query it will yield a result that takes the earlier write
+into account.
+
+So, this optimisation seems like it would be limited to commands that
+are not in batch mode and do strictly read-only queries. Which seems a bit
+hard to plumb through to the git-annex branch reading code.
+
+An easier alternative would be an option that bypasses reading the journal.
+But maybe there's some other, better way to avoid this slow case?
+--[[Joey]]