git-annex/doc/todo/optimise_journal_access.mdwn

2019-12-18 20:11:14 +00:00
Often a command will need to read a number of files from the git-annex
branch, and it uses getJournalFile for each to check for any journalled
change that has not reached the branch. But typically the journal is empty,
and in that case a lot of time is spent trying to open journal files
that do not exist.

Profiling, e.g., `git annex find --in web` shows that things called by
getJournalFile use around 5% of runtime.

What if, once at startup, it checked whether the journal was entirely empty?
If so, it can remember that, and avoid reading journal files.
Perhaps paired with staging the journal if it's not empty.

When a process writes to the journal, it will need to update its state
to remember it's no longer empty.
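
The idea above could be sketched roughly as follows. This is only an illustration, not git-annex's actual code: `JournalState`, `initJournalState`, `readJournalled` and `writeJournalled` are hypothetical names, and the journal is assumed to be a flat directory of files.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import System.Directory (createDirectoryIfMissing, doesFileExist, listDirectory)
import System.FilePath ((</>))

-- Hypothetical per-process state; not git-annex's real types.
data JournalState = JournalState
  { journalDir :: FilePath
  , journalEmpty :: IORef Bool -- True while this process believes the journal is empty
  }

-- Check once, at startup, whether the journal directory has any entries.
initJournalState :: FilePath -> IO JournalState
initJournalState dir = do
  createDirectoryIfMissing True dir
  entries <- listDirectory dir
  ref <- newIORef (null entries)
  return (JournalState dir ref)

-- Look up a journalled file, skipping the filesystem entirely
-- when the journal was found to be empty.
readJournalled :: JournalState -> FilePath -> IO (Maybe String)
readJournalled st name = do
  isEmpty <- readIORef (journalEmpty st)
  if isEmpty
    then return Nothing
    else do
      let path = journalDir st </> name
      exists <- doesFileExist path
      if exists then Just <$> readFile path else return Nothing

-- Write to the journal and remember that it is no longer empty.
writeJournalled :: JournalState -> FilePath -> String -> IO ()
writeJournalled st name content = do
  writeFile (journalDir st </> name) content
  writeIORef (journalEmpty st) False
```

Note that in this sketch the remembered state is only updated by the process's own writes; a file journalled by some other process after startup is not noticed until this process writes something itself.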

This could lead to behavior changes in some cases where one command is
writing changes and another command used to read them from the journal and
may no longer do so. But any such change only affects behavior that
already involved a race; the reader could just as well have been ahead of
the writer, in which case it would already have behaved as it does after
the change.

> Hmm, not so fast. If the user has two --batch processes, one that makes
> changes and the other that queries, they will expect the querying process
> to see the changes after they were made. There's no race, the user can
> control which process runs by feeding batch inputs to them.
>
> So, --batch and the assistant, as well as batch-like things that don't
> use --batch will need to disable this optimisation it seems. --[[Joey]]
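
One way to picture the conclusion of that comment is as a policy decided by how the process is run. A minimal sketch, assuming a hypothetical `RunMode` type and `mayCacheEmptyJournal` function that are not part of git-annex; batch-like commands that don't use --batch would need to be classified as `Batch` too:

```haskell
-- Hypothetical classification of how a process is being run.
data RunMode = OneShot | Batch | Assistant
  deriving (Eq, Show)

-- Whether the "journal was empty at startup" shortcut may be used.
-- Long-running processes must keep checking the journal, since the user
-- can expect them to see changes made by another process in the meantime.
mayCacheEmptyJournal :: RunMode -> Bool
mayCacheEmptyJournal OneShot = True
mayCacheEmptyJournal Batch = False
mayCacheEmptyJournal Assistant = False
```
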
[[!tag confirmed]]
>> [[done]] speedup was around 5% --[[Joey]]