git-annex/doc/todo/optimise_journal_access.mdwn

Often a command will need to read a number of files from the git-annex
branch, and it uses getJournalFile for each to check for any journalled
change that has not reached the branch. But typically, the journal is empty
and in such a case, that's a lot of time spent trying to open journal files
that DNE.

Profiling eg, `git annex find --in web` shows things called by getJournalFile
use around 5% of runtime.

What if, once at startup, it checked if the journal was entirely empty.
If so, it can remember that, and avoid reading journal files.
Perhaps paired with staging the journal if it's not empty.

When a process writes to the journal, it will need to update its state
to remember it's no longer empty.

This could lead to behavior changes in some cases where one command is
writing changes and another command used to read them from the journal and
may no longer do so. But any such behavior change is of a behavior that
used to involve a race; the reader could just as well be ahead of the
writer and it would have already behaved as it would after the change.

> Hmm, not so fast. If the user has two --batch processes, one that makes
> changes and the other that queries, they will expect the querying process
> to see the changes after they were made. There's no race, the user can
> control which process runs by feeding batch inputs to them.
> 
> So, --batch and the assistant, as well as batch-like things that don't
> use --batch will need to disable this optimisation it seems. --[[Joey]]

[[!tag confirmed]]

>> [[done]] speedup was around 5% --[[Joey]]
todo 2019-12-18 20:11:14 +00:00			`Often a command will need to read a number of files from the git-annex`
			`branch, and it uses getJournalFile for each to check for any journalled`
			`change that has not reached the branch. But typically, the journal is empty`
			`and in such a case, that's a lot of time spent trying to open journal files`
			`that DNE.`

			Profiling eg, `git annex find --in web` shows things called by getJournalFile
			`use around 5% of runtime.`

			`What if, once at startup, it checked if the journal was entirely empty.`
			`If so, it can remember that, and avoid reading journal files.`
			`Perhaps paired with staging the journal if it's not empty.`

Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominiates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other damon modes bypass the optimisation, which would not help them anyway. 2020-04-09 17:54:43 +00:00			`When a process writes to the journal, it will need to update its state`
			`to remember it's no longer empty.`

todo 2019-12-18 20:11:14 +00:00			`This could lead to behavior changes in some cases where one command is`
			`writing changes and another command used to read them from the journal and`
			`may no longer do so. But any such behavior change is of a behavior that`
			`used to involve a race; the reader could just as well be ahead of the`
			`writer and it would have already behaved as it would after the change.`

Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominiates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other damon modes bypass the optimisation, which would not help them anyway. 2020-04-09 17:54:43 +00:00			`> Hmm, not so fast. If the user has two --batch processes, one that makes`
			`> changes and the other that queries, they will expect the querying process`
			`> to see the changes after they were made. There's no race, the user can`
			`> control which process runs by feeding batch inputs to them.`
			`>`
			`> So, --batch and the assistant, as well as batch-like things that don't`
			`> use --batch will need to disable this optimisation it seems. --[[Joey]]`
tagged the past 2 years of open todos and followed up to a few of them also moved some that were really bug reports to bugs/ and closed a couple 2020-01-30 19:22:05 +00:00
			`[[!tag confirmed]]`
Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominiates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other damon modes bypass the optimisation, which would not help them anyway. 2020-04-09 17:54:43 +00:00
			`>> [[done]] speedup was around 5% --[[Joey]]`