Directory special remotes with importtree=yes now avoid unncessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.
A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.
I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, can parse it back to an InodeCache and operate on it. The
overhead of storing a 0 in the content identifier log seems worth it.
There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.
This commit was sponsored by Brett Eisenberg on Patreon.
Including the non-standard URI form that git-remote-gcrypt uses for rsync.
Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port
number. But git could have a remote with that, it would try to run
git-remote-ook to handle it. So, git-annex has to allow for such things,
rather than crashing.
This commit was sponsored by Luke Shumaker on Patreon.
It was just slapping on a path separator to the front of the path to
make it absolute, but on windows, a path like "//foo/bar" actually
has a network "drive" of "//foo" and so that broke the test case.
Since "a:foo" is a somehow relative path on windows
(who knows how), drop any drive from the input. But dropDrive also drops
any leading path separator, making the input path relative. So now
it should be safe to slapp on a leading path separator.
This was not a good test, it broke the requirement that
relPathDirToFileAbs take absolute paths. And it failed when the two
input paths were eg, the same but differently normalized.
Replaced with some tests of the real basics of that function.
It got broken in several ways by the streaming seeking optimisations
around version 8.20201007.
Moved time limit checking out of the matcher, which was a hack in the
first place. So everywhere that uses Limit.getMatcher needs to check
time limit. Well, almost everywhere. Command.Info uses it, but it does
not make sense to time limit getting info. And Command.MultiCast uses it
just to build up a list of files that then get passed to a command, so
it would never have hit the timeout in a useful way.
This implementation is a little more expensive when at time limit than
necessary, since it continues seeking only to discard everything after the
time limit. I did try making it close the file handles to force a faster
shutdown, but that didn't work and hung. Could certianly be improved
somehow, but seeking is probably not the expensive bit when a time limit
is hit, so this seems acceptable for now.
For reasons explained in the bug report.
Implemented using a persistent migration, which works fine. It may add a
little startup overhead when a remote is enabled that uses this, but
probably un-noticable.
On the next major version, it would be fine to delete this database,
and regenerate it from the git-annex branch information. Then this
change could be reverted.
Did nothing about adding back the data that got dropped from the db
due to the bug. Only the borg special remote was probably affected,
and it's not been released yet. rm -rf .git/annex/cidsdb does work.
And vice-versa, but it's better to use '/' for portability.
Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
I do think this was a reversion, but I have not tracked back to what
version. While involving the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
MatchingKey is not the thing to use when matching on actual worktreee
files.
Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
Avoid spurious "verification of content failed" message when downloading
content from a ssh or tor remote fails due to the remote no longer having a
copy of the content.
The P2P protocol already handled this case by sending DATA 0, followed by
VALID. But VALID was not really right, because the data is not the
requested data. So, send DATA 0, followed by INVALID. Old versions of
git-annex handle INVALID the same as VALID in this case. Now new versions
avoid displaying an incorrect message.
It would be better for the P2P protocol to have a different way to indicate
this, like perhaps sending INVALID without DATA. But that would be a
breaking change and need a new protocol verison. Since INVALID already is
part of the protocol and already needs to be handled, using it for this
special case too seems ok, and avoids the complication of another protocol
version.
This commit was sponsored by Jochen Bartl on Patreon.
This is an edge case, which happened to be triggered by the P2P protocol
seeing DATA 0. When reading 0 bytes, getting an empty string does
not mean the handle has reached EOF.
I verified there was in fact a bug, where get of an empty file followed
by another file would get the empty file and then fail
with "handle is closed". This fixes it.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.
Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.
The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any recource that did leak would not be a problem.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Fix a bug that could make resuming a download from the web fail when the
entire content of the file is actually already present locally.
What a mess that Request can throw exceptions or not, depending on how
it's configured. Makes it very hard if you need to handle some specific
http status codes in a function like this! Implementing everything two
ways did not seem appealing, if possible at all, so I decided to
override the Request if it did come configured to throw exception on
non-2xx http status. Other exceptions, like from http-client-restricted,
or due to a redirect to a non-http url, still get thrown.
This commit was sponsored by Luke Shumaker on Patreon.
Prevent windows assistant from trying (and failing) to upgrade itself,
which has never been supported on windows.
The new windows build is made with UPGRADE_LOCATION set, which enabled this
code path that had never run on windows before, and doesn't work. I don't
want to try to support self-upgrade on windows, or generally on other OS's
than the ones where its working, so added a check for that. This way the
build can keep setting UPGRADE_LOCATION and if some later git-annex does
learn how to upgrade itself on some OS, it won't need changing the build
setup.
Warn when adding a annex symlink or pointer file that uses a key that is
not known to the repository, to prevent confusion if the user has copied it
from some other repository.
This commit was sponsored by Jake Vosloo on Patreon.
All properties changed to use them, except for
prop_encode_c_decode_c_roundtrip, which already filtered to ascii
for other reasons.
A few modules had to be split out, because Setup does not build-depend
on QuickCheck.
This fixes a bug where a file that was not preferred content could be
transferred to a remote. This happened when the file got deleted after
the sync started running.
The only time checkMatcher is run without a Key is in calls to
checkFileMatcher, which are only done by add, addurl, import, and
smudge --clean. Those won't be affected by this kind of race. Anything
else that might be precaching and have a similar race as sync will also
be fixed, but I don't know if it actually affected anything other than
sync.
As well as fixing a bug, this also probably makes sync and --auto faster
by avoiding the redundant key lookup.
This commit was sponsored by Graham Spencer on Patreon.
Because it's a special character on Windows ("c:").
Use same technique already used for '/' and '\'.
I didn't record how I generated their encoded forms before, so am sure
there was a better way, but the way I did it now is to look at
ghci> encodeFilePath "∕"
"\226\136\149"
And then the difference from that to "\56546\56456\56469"
is adding 56320 to each, to get up to the escaped code plane.
See comment for why I think handling ':' is ok, but that other illegal
windows filenames won't. Note that, this should be enough to make the
test suite always work. Other windows illegal filenames will fail at
checkout time when it tries to put the illegal filename on the
filesystem.
This fixes the bug.
Note, it's only done when GIT_DIR is set. When it's not set,
Git.Construct already handled it. This is why it was only noticed with this
git submodule command.
This commit was sponsored by Brett Eisenberg on Patreon.
Only done in checkPresentChunks, although retrieveChunks could also do
it. Does not seem necessary though, because git-annex never retrives
content without first checking if it's present AFAICR. And really this will
only be needed when using fsck. Puttting it here, rather than in fsck
avoids breaking an abstraction boundary, and is nice and inexpensive.
When a special remote has chunking enabled, but no chunk sizes are
recorded (or the recorded ones are not found), speculatively try chunks
using the configured chunk size.
This makes eg, git-annex fsck --from remote be able to fix up the
location log of a file that the git-annex branch does not indicate is
stored on the remote.
Note that fsck does *not* fix up the chunk log to indicate the chunk
size. So, changing the chunk config of the remote after that will still
prevent accessing the chunks stored on it. Maybe fsck should, but I
wanted to start with this and see if it's needed.
inet_addr was removed, but all this needs is localhost, so hardcoding it
should work fine.
It may be that this windows ifdef is no longer needed. It was added in 2013
with a note that getAddrInfo didn't work on windows, but it seems likely
such a problem would have been fixed since.
This avoids the possibility that the bundle could be updated in place,
leading to LOCPATH existing but containing locales for the old version,
which needed to be tested for with code that was not race-free.
LOCPATH/buildid is still written and checked when cleaning up stale caches.
That is not actually necessary, except old versions of the standalone
bundle expect to see it, and this prevents them cleaning up the locale
cache of a new version. And still checking it prevents the new version
cleaning up the locale cache of the old version while the old version is
still in use.
Added explicit tests before creating LOCPATH and the base and buildid files.
The buildid file no longer needs to be updated every time, because it's
stable for the given LOCPATH directory.
And the base file actually did not need to be updated every time,
because the LOCPATH is derived from base, so if the bundle is moved
elsewhere, a different LOCPATH will be used.
Transitioning to this will mean that two git-annex builds that otherwise
have the same buildid -- the same git-annex md5sum -- will use different
LOCPATH values, but that's handled fine by the cache cleanup code, so at
most it will mean one extra generation of the locale files.