2019-12-18 20:11:14 +00:00
Often a command needs to read a number of files from the git-annex
branch, and it uses getJournalFile for each to check for any journalled
change that has not reached the branch. But typically the journal is empty,
and in that case a lot of time is spent trying to open journal files
that do not exist.

Profiling, e.g., `git annex find --in web` shows that things called by
getJournalFile use around 5% of runtime.

What if, once at startup, it checked whether the journal was entirely empty?
If so, it can remember that, and avoid reading journal files.
Perhaps paired with staging the journal if it's not empty.

2020-04-09 17:54:43 +00:00
When a process writes to the journal, it will need to update its state
to remember that the journal is no longer empty.

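A minimal sketch of the idea, in Python rather than git-annex's own Haskell. The `Journal` class, its method names, and the `lookups` counter are all hypothetical and only illustrate the "check once at startup, remember, and reset on write" pattern; they are not git-annex's actual code.

```python
import os
import tempfile


class Journal:
    """Hypothetical stand-in for a journal directory; not git-annex's API."""

    def __init__(self, path):
        self.path = path
        # Checked once at startup: is the journal entirely empty?
        self.known_empty = not os.listdir(path)
        self.lookups = 0  # counts actual attempts to open a journal file

    def get_journal_file(self, name):
        """Return journalled content for name, or None if there is none."""
        if self.known_empty:
            # The journal was empty at startup and this process has not
            # written to it since, so skip the filesystem entirely.
            return None
        self.lookups += 1
        try:
            with open(os.path.join(self.path, name)) as f:
                return f.read()
        except FileNotFoundError:
            return None

    def set_journal_file(self, name, content):
        """Write a journal file and forget the cached 'empty' state."""
        with open(os.path.join(self.path, name), "w") as f:
            f.write(content)
        self.known_empty = False


def demo():
    with tempfile.TemporaryDirectory() as d:
        j = Journal(d)
        before = j.get_journal_file("a.log")   # skipped: journal known empty
        skipped = j.lookups
        j.set_journal_file("a.log", "change")  # state updated on write
        after = j.get_journal_file("a.log")
        return before, skipped, after, j.lookups
```

In this sketch the first read never touches the filesystem, while the read after the process's own write does, so a process always sees what it journalled itself.
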
2019-12-18 20:11:14 +00:00
This could lead to behavior changes in some cases where one command is
writing changes and another command used to read them from the journal and
may no longer do so. But any such change affects only behavior that
already involved a race; the reader could just as well have been ahead of the
writer, in which case it would already have behaved as it does after the change.

2020-04-09 17:54:43 +00:00
> Hmm, not so fast. If the user has two --batch processes, one that makes
> changes and the other that queries, they will expect the querying process
> to see the changes after they were made. There's no race; the user can
> control which process runs by feeding batch inputs to them.
>
> So --batch and the assistant, as well as batch-like things that don't
> use --batch, will need to disable this optimisation, it seems. --[[Joey]]

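The two-process concern can be illustrated with a small Python sketch. Everything here is hypothetical: `JournalReader`, its `cache_empty` flag, and `write_journal` are made up for illustration, and the two "processes" are simulated by two objects sharing one directory. It only shows why a long-running querying process would need the optimisation disabled.

```python
import os
import tempfile


class JournalReader:
    """Hypothetical reader; cache_empty toggles the optimisation."""

    def __init__(self, path, cache_empty=True):
        self.path = path
        # With the optimisation on, emptiness is checked only once, here.
        self.known_empty = cache_empty and not os.listdir(path)

    def read(self, name):
        if self.known_empty:
            return None  # never looks at the journal again
        try:
            with open(os.path.join(self.path, name)) as f:
                return f.read()
        except FileNotFoundError:
            return None


def write_journal(path, name, content):
    """Stand-in for a separate process that makes changes."""
    with open(os.path.join(path, name), "w") as f:
        f.write(content)


def demo():
    with tempfile.TemporaryDirectory() as d:
        optimised = JournalReader(d, cache_empty=True)    # long-running query process
        batch_safe = JournalReader(d, cache_empty=False)  # optimisation disabled
        write_journal(d, "a.log", "change")               # the other process writes
        return optimised.read("a.log"), batch_safe.read("a.log")
```

Here the reader that cached "empty" at startup misses the later write, while the reader with the optimisation disabled sees it, which is the behavior a user feeding inputs to two batch processes would expect.
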
2020-01-30 19:22:05 +00:00
[[!tag confirmed]]

2020-04-09 17:54:43 +00:00
>> [[done]] speedup was around 5% --[[Joey]]