This was a bit disappointing, I was hoping for a 2x speedup. But, I think
the metadata lookup is wasting a lot of time and also needs to be made to
stream.
The changes to catObjectStreamLsTree were benchmarked to not also speed
up --all around 3% more. Seems I managed to make it polymorphic after all.
planned to use for an optimisation
most things using stagedDetails were not expecting to get dup files in a
conflicted merge and deal with them, so converted them to use
inRepoDetails.
Turns out the %(rest) trick was not needed. Instead, just maintain a
list of files we've asked for, and each cat-file response is for the
next file in the list.
This actually benchmarks 25% faster than before! Very surprising, but it
must be due to needing to shove less data through the pipe, and parse
less.
This assumes that no location log files will have a newline or carriage
return in their name. catObjectStream skips any such files due to
cat-file not supporting them.
Keys have been prevented from containing newlines since 2011,
commit 480495beb4. If some old repo
had a key with a newline in it, --all will just skip processing that key.
Other things, like .git/annex/unused files certianly assume no newlines in
keys too, and AFAICR, such keys never actually worked.
Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys
generated before that point could perhaps contain a CR. (URL probably not,
http probably doesn't support an URL with a raw CR in it.) So, added
a warning in fsck about such keys. Although, fsck --all will naturally
skip them, so won't be able to warn about them. Not entirely
satisfactory, but I'll bet there are not really any such keys in
existence.
Thanks to Lukey for finding this optimisation.
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.
git-annex sync --no-content does not yet use this to do imports
Looked into this, and dropKey from web actually removes the url,
so git-annex won't try to get content from it.
So, if lockContent were implemented for web, and the web was left as the
only thing containing an object, another repo could at the same time
drop from web and remove its url, leaving no way to get the object.
Add to that, of course, the web is typically set untrusted, and so
implementing lockContent would not then be useful.
Similar reasoning applies to the bittorrent special remote, as well
as the fact that it does not even implement checkKey.
Audited for openFile and openFd, and this fixes all the ones I found
where an async exception could prevent the file getting closed.
Except for the lock pool, which is a whole other can of worms.
Masking ensures that EndStderrHandler gets written, so the helper
threads shut down.
However, nothing currently guarantees that calls to closeP2PSshConnection
are async exception safe, so made a note about it.
At this point, I've audited all calls to async, and made them all async
exception safe, except for ones in the assistant, and a few in leaf
commands (remotedaemon, enable-tor, multicast, p2p) which don't need to
be.
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.
Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.
Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.
annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
Todo item is done at last.
Might later want to think about testing some other types of remotes that
can be tested locally. The git remote itself is probably already well
enough tested by the test suite that testremote is not needed. Could
test things like bup, or rsync to a local directory. Or even external,
although that would require embedding an external special remote program
into the test suite..