I doubt that warning has ever been right, but I'm sure it is not right
now.
For there to be a risk of git gc deleting objects that are in the annex
index, journal files would have to be staged into it, and deleted, but
the index not committed to the git-annex branch. And AFAICS, there is no
code path where that actually happens. I considered adding one recently,
but didn't.
The way it actually works is, as long as the user has annex.alwayscommit=false
the data lives in the journal, where it's safe from git gc. Then when
git-annex is run w/o that config, the journal is staged into the index,
which is immediately committed to the branch. There's no window where
git gc could delete the objects, because git gc only deletes objects
after some time (2 weeks by default).
Now, if git-annex gets suspended at just the wrong time, or interrupted,
then yes, it's possible. But doesn't matter whether that config was ever
set or not. And many uses of git-annex also recover from that situation
by committing to the git-annex branch.
A couple of these were probably actual bugs in edge cases. Most of the
changes I'm fine with. The fact that aeson's object returns sometihng
that we know will be an Object, but the type checker does not know is
kind of annoying.
The journal read optimisation in aeca7c220 later got fixed in eedd73b84
to stage and commit any files that were left in the journal by a
previous git-annex run. That's necessary for the optimisation to work
correctly. But it also meant that alwayscommit=false started committing
the previous git-annex processes journalled changes, which defeated the
purpose of the config setting entirely.
So, disable the optimisation when alwayscommit=false, leaving the
files in the journal and not committing them. See my comments on the bug
report for why this seemed the best approach.
Also fixes a problem when annex.merge-annex-branches=false and there
are changes in the journal. That config indirectly prevents committing
the journal. (Which seems a bit odd given its name, but it always has..)
So, when there were changes in the journal, perhaps left there due to
alwayscommit=false being set before, the optimisation would prevent
git-annex from reading the journal files, and it would operate with out
of date information.
This was very susprising to me that it was not caught by -Wall, so I
enabled -Wincomplete-uni-patterns to catch such things. It found a
second one just lines above, but no others anywhere.
This change does impact git-annex config
eg "git annex config --set annex.addunlocked on"
will store "on" and new git-annex will understand that value, while
old git-annex will error:
git-annex: bad annex.addunlocked configuration in git annex config:
Parse failure: near "on"
That seems acceptable.
Not special remote configs that are only documented as =true or =false
however. Having git-annex support other values for those would break
backwards compatability when used with old versions of git-annex. And
older versions ignore invalid special remote configs.. That would not
be a good combination.
Eg"core.bare" is the same as "core.bare = true".
Note that git treats "core.bare =" the same as "core.bare = false", so the
code had to become more complicated in order to treat the absense of a
value differently than an empty value. Ugh.
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.
So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
aeca7c2207 was predicated on the
assumption that updateTo would stage any journal files, but in one case
it did not actually do so. The test suite happened to expose the bug.
Around 3% total speedup.
Profiling git annex find --not --in web, it's now bytestring end-to-end,
and there is only a little added overhead in eg accessing the Annex
state MVar (3%). The rest of the runtime is spent reading symlinks, and
in attoparsec.
This feels like the end of the optimisation road, without a major change
like caching information for faster queries.