Only done in checkPresentChunks, although retrieveChunks could also do
it. Does not seem necessary though, because git-annex never retrives
content without first checking if it's present AFAICR. And really this will
only be needed when using fsck. Puttting it here, rather than in fsck
avoids breaking an abstraction boundary, and is nice and inexpensive.
When a special remote has chunking enabled, but no chunk sizes are
recorded (or the recorded ones are not found), speculatively try chunks
using the configured chunk size.
This makes eg, git-annex fsck --from remote be able to fix up the
location log of a file that the git-annex branch does not indicate is
stored on the remote.
Note that fsck does *not* fix up the chunk log to indicate the chunk
size. So, changing the chunk config of the remote after that will still
prevent accessing the chunks stored on it. Maybe fsck should, but I
wanted to start with this and see if it's needed.
In cases where numcopies checks prevented the resumed move from dropping
the object from the source repository, it now relies on a log of recent
moves to replicate the behavior of the interrupted command.
Performance: Probably noticable impact, since it has to add to the log,
check the log, and remove from the log. Seems worth it to avoid this
annoying edge case. The log functions are pretty well optimised to avoid
unncessary work.
An performance improvement to make later would be to avoid cleanup doing
anything if it's not written to the log file, and has confirmed that the
log file does not contain the log line.
This commit was sponsored by Jake Vosloo on Patreon.
The problem was this line:
cleanup = and <$> sequence (map snd v)
That caused all of v to be held onto until the end, when the cleanup action
was run.
I could not seem to find a bang pattern that avoided the leak, so I
resorted to a IORef, rather clunky, but not a performance problem because
it will only be written once per git ls-files, so typically just 1 time.
This commit was sponsored by Mark Reidenbach on Patreon.
inet_addr was removed, but all this needs is localhost, so hardcoding it
should work fine.
It may be that this windows ifdef is no longer needed. It was added in 2013
with a note that getAddrInfo didn't work on windows, but it seems likely
such a problem would have been fixed since.
This avoids the possibility that the bundle could be updated in place,
leading to LOCPATH existing but containing locales for the old version,
which needed to be tested for with code that was not race-free.
LOCPATH/buildid is still written and checked when cleaning up stale caches.
That is not actually necessary, except old versions of the standalone
bundle expect to see it, and this prevents them cleaning up the locale
cache of a new version. And still checking it prevents the new version
cleaning up the locale cache of the old version while the old version is
still in use.
Added explicit tests before creating LOCPATH and the base and buildid files.
The buildid file no longer needs to be updated every time, because it's
stable for the given LOCPATH directory.
And the base file actually did not need to be updated every time,
because the LOCPATH is derived from base, so if the bundle is moved
elsewhere, a different LOCPATH will be used.
Transitioning to this will mean that two git-annex builds that otherwise
have the same buildid -- the same git-annex md5sum -- will use different
LOCPATH values, but that's handled fine by the cache cleanup code, so at
most it will mean one extra generation of the locale files.
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.
This commit was sponsored by Brett Eisenberg on Patreon.
Ensure that checkCanAdd is used everywhere a file is added to git,
so git add is run with -f, presumably avoiding the work it would usually
do to check ignores.
This avoids import with --no-content and with --content potentially
generating two different trees, leading to a merge conflict when run in
two different clones of a repo. And it's necessary groundwork to make
git-annex sync --no-content import from special remotes that support
importKey.
Only the directory special remote currently supports importKey, and it
generates the same key as git-annex usually does, so there is no
behavior change for it.
Future special remotes will need to take care when adding importKey,
if it generates different keys. Added some warnings about that to
comments.
This commit was sponsored by Noam Kremen on Patreon.
Import small files into git, the same as is done when importing with content.
Which means, for small files, --no-content does download them.
If the largefiles expression needs the file content available
(due to mimetype or mimeencoding being used), the import will fail.
This commit was sponsored by Jake Vosloo on Patreon.
Sped up seeking for files to operate on, when using options like --copies
or --in, by around 20%.
Benchmark showed an increase for --copies from 155 seconds to 121
seconds, and --in remote will be similar to that.
For --in here, the speedup was less, 5-10% or so.
(both warm cache)
This commit was sponsored by Jack Hill on Patreon.