It was moved to avoid a race, but that's now avoided in another way.
I prefer having it here, because this way if it somehow fails and
deletes the locpath that is going to be used, at least it will get
re-created.
Use a different tmp directory so the cache cleanup won't delete the
locpath directory while it's being populated.
This does change the hash used for the locpath directory, but it already
changed in f0ec725234.
This fixes a race between two runshells from two different
bundles. One could have run the cache cleanup code, seen the
LOCPATH the other one was in the process of populating, which didn't
have a base or a buildid file written yet, and so the cache cleanup code
would delete it out from under the other process.
Also, doing it fully atomically simplifies handling the races between two
runshell processes from the same bundle. Now that only needs to be
dealt with at the mv that puts it in place.
Note that, if the same bundle has 2 runshells run first thing, they will
both generate locales, which is unnecessary work, but that should be a
very unusual circumstance and after the LOCPATH is set up, it won't
happen again anyway.
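
A minimal sketch of the populate-then-mv approach described above, not the
actual runshell code. The function name populate_locpath, its arguments and
the placeholder file contents are hypothetical. It assumes GNU mv (for -T)
and that the staging directory is on the same filesystem as the destination,
so the final mv is a single rename.

```shell
# Hypothetical helper: populate a locale cache directory atomically.
#   $1 = final LOCPATH directory
#   $2 = staging area that the cache cleanup does not scan
populate_locpath () {
    dest="$1"
    tmpbase="$2"
    mkdir -p "$(dirname "$dest")" "$tmpbase" || return 1
    # Build everything in a private staging directory first.
    stage=$(mktemp -d "$tmpbase/locpath.XXXXXX") || return 1
    # Stand-ins for the real locale generation and marker files.
    echo "placeholder locale data" > "$stage/locale"
    echo "/path/to/bundle" > "$stage/base"
    echo "example-buildid" > "$stage/buildid"
    # Put it in place with one rename. With -T (GNU), mv fails rather
    # than nesting inside dest when another process got there first.
    if ! mv -T "$stage" "$dest" 2>/dev/null; then
        # Lost the race; the other process's copy is already complete.
        rm -rf "$stage"
    fi
    [ -d "$dest" ]
}
```

If two processes call this for the same destination, both do the generation
work, but only one rename wins and the loser just discards its staging
directory, so a partially populated directory is never visible at the final
path.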
This should fix doc/bugs/standalone_runshell_can_race_and_fail_to_remove___96____126____47__.cache__47__git-annex__47__locales__47____96___dirs
where 2 runshells were running and the second one tried to clean up
LOCPATH while the first one was still populating it.
By moving the cleanup to after LOCPATH is populated, we guarantee
it's populated, so don't need to worry about such a race with another
process populating our same LOCPATH.
This avoids the possibility that the bundle could be updated in place,
leading to LOCPATH existing but containing locales for the old version,
which needed to be tested for with code that was not race-free.
LOCPATH/buildid is still written and checked when cleaning up stale caches.
That is not actually necessary, except old versions of the standalone
bundle expect to see it, and this prevents them cleaning up the locale
cache of a new version. And still checking it prevents the new version
cleaning up the locale cache of the old version while the old version is
still in use.
Added explicit tests before creating LOCPATH and the base and buildid files.
The buildid file no longer needs to be updated every time, because it's
stable for the given LOCPATH directory.
And the base file actually did not need to be updated every time,
because the LOCPATH is derived from base, so if the bundle is moved
elsewhere, a different LOCPATH will be used.
Transitioning to this will mean that two git-annex builds that otherwise
have the same buildid -- the same git-annex md5sum -- will use different
LOCPATH values, but that's handled fine by the cache cleanup code, so at
most it will mean one extra generation of the locale files.
This seems to be the best way to deal with the race; if the first and
second runshell are running very close together, the first will generate
the locale directory, and a second test -d would still leave a race
window.
Only turning it off when the criterion library is not installed.
Not enabled for osx or i386ancient yet since that will need some
investigation to update their respective stack.yaml files.
Broke a while ago during optimisation work, and not noticed since the flag
is disabled by default.
This commit was sponsored by Brock Spratlen on Patreon.
isKnownImportLocation does a database lookup and there's an index
to make that lookup fast, so it's probably faster than talking to git
check-ignore. Checking the matcher is faster still.
While before the gitignore check was added it did not need to always
check isknown, now it does, because it's that or the more expensive
notignored. But at least we can skip notignored when a file is known,
which will often be the common case: Importing from a remote that's been
exported to, and/or imported from before, only new files will not be
known, so only those will need to check notignored.
At first, I had this:
(matches <&&> (isknown <||> notignored)) <||> isknown
Notice that this checks isknown every time, whether it matches or not.
So, it's no slower to instead do this:
isknown <||> (matches <&&> notignored)
That has the benefit that, when it's known, it doesn't need to run
matches, which while faster than isknown, is still going to use some CPU.
And it perhaps more clearly expresses the condition: Any known file is
wanted, otherwise it's down to what matches and is not ignored.
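
The effect of that ordering can be tried out with a small sketch. It is
written in shell for illustration rather than in the Haskell of the real
code, and the three predicates are hypothetical stand-ins that only count
how often they are called.

```shell
known_calls=0
match_calls=0
ignore_calls=0

# Stand-in for the database lookup: only known.txt is "known".
isknown () { known_calls=$((known_calls + 1)); [ "$1" = "known.txt" ]; }

# Stand-in for the matcher: only .txt files match.
matches () {
    match_calls=$((match_calls + 1))
    case "$1" in
        *.txt) return 0 ;;
        *) return 1 ;;
    esac
}

# Stand-in for the check-ignore query: ignored.txt is ignored.
notignored () { ignore_calls=$((ignore_calls + 1)); [ "$1" != "ignored.txt" ]; }

# isknown <||> (matches <&&> notignored)
wanted () { isknown "$1" || { matches "$1" && notignored "$1"; }; }
```

Calling wanted known.txt succeeds after running only isknown. A new file
such as new.txt runs all three checks, and ignored.txt is rejected by the
last one.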
This commit was sponsored by Jack Hill on Patreon.
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.
This commit was sponsored by Brett Eisenberg on Patreon.
Which lets progress be displayed when doing concurrent downloads.
Among other things, like --json-progress etc.
The youtube-dl output is no longer displayed, except for any errors.
This commit was sponsored by Denis Dzyubenko on Patreon.