Speeds up eg git-annex sync --content by up to 50%. When it does not need
to transfer or drop anything, it now noops a lot more quickly.
I didn't see anything else in sync --content noop loop that could really
be sped up. It has to cat git objects to keys, stat object files, etc.
Sponsored-by: unqueued on Patreon
Speed up importing trees from special remotes somewhat by avoiding
redundant writes to sqlite database.
Before, import would write to both the git-annex branch and also to the
sqlite database. But then the next time it was run, needsUpdateFromLog
would see the branch had changed, so run updateFromLog, which would make
the same writes to the sqlite database a second time.
Now import writes only to the git-annex branch. The next time it's run,
needsUpdateFromLog sees that the branch has changed and so calls
updateFromLog, which updates the sqlite database.
Why defer the write to the sqlite database like this? It seems that it
could write to the database as it goes, and at the end call
recordAnnexBranchTree to indicate that the information in the git-annex
branch has all been written to the cidsdb. That would avoid the second
import doing extra work.
But, there could be other processes running at the same time, and one of
them may update the git-annex branch, eg merging a remote git-annex branch
into it. Any cids logs on that merged git-annex branch would not be
reflected in the cidsdb yet. If the import then called
recordAnnexBranchTree, the cidsdb would never get updated with that merged
information.
I don't think there's a good way to prevent, or to detect that situation.
So, it can't call recordAnnexBranchTree at the end. So it might as well
wait until the next run and do updateFromLog then. It could instead do
updateFromLog at the end, but it's going to check needsUpdateFromLog
at the beginning anyway.
Note that the database writes were queued, so there is already a cidmap
that is used to remember changes that the current process has made.
So, omitting database writes can't change the behavior of the current
process.
Also note that thirdpartypopulatedimport uses recordcidkeyindb, which
reflects what it already did. That code path does not use the cidmap,
but does not need to query it either. It might be possible to make that
code path also only update the git-annex branch and not the db, but I
haven't checked.
Sponsored-by: Noam Kremen on Patreon
The obvious way to fix this would be to adapt lines to split on null.
However, it's actually nontrivial to rewrite lines. In particular it has a
weird implementation to avoid a space leak. See:
https://gitlab.haskell.org/ghc/ghc/-/issues/4334
Also, while that is a small amount of code, it's covered by a rather
complex copyright and I'd have to include that copyright in git-annex.
So, I opted to filter out the trailing empty string instead.
Sponsored-by: Dartmouth College's Datalad project
POSIX character classes allowed in globs was a surprise, but just
happened to fall out of the implementation in a way that seems
to behave correctly.
mdwn2man has to be tweaked to render the example properly.
The line I modified is the one that strips ikiwiki wikilinks out of the
man page.
Sponsored-by: Graham Spencer on Patreon