Fix an oddity in matching options and preferred content expressions such as
"foo (bar or baz)", which was incorrectly handled as if it were "(foo or
bar) and baz" rather than the intended "foo and (bar or baz)".
Seemed like a change to consume should be able to handle this case
better, but I was having trouble writing it that way, so instead added
a separate pass that inserts the implicit ands explicitly. Also added
several test cases to make sure versions with and without explicit ands
generate the same result.
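A minimal sketch of such a pass, assuming a simplified, hypothetical token type (`Token`, `insertImplicitAnds`, `endsOperand` and `startsOperand` are illustrative names, not git-annex's): an `And` is inserted wherever something that ends an operand is directly followed by something that starts one. The real expressions also have "not" and many more operand kinds, which this ignores.

```haskell
-- Hypothetical, simplified token type for a matching expression.
data Token = Operand String | And | Or | Open | Close
  deriving (Show, Eq)

-- Insert an explicit And wherever a token that ends an operand is
-- immediately followed by a token that starts an operand.
insertImplicitAnds :: [Token] -> [Token]
insertImplicitAnds (a : rest@(b : _))
  | endsOperand a && startsOperand b = a : And : insertImplicitAnds rest
  | otherwise = a : insertImplicitAnds rest
insertImplicitAnds ts = ts

endsOperand, startsOperand :: Token -> Bool
endsOperand (Operand _) = True
endsOperand Close = True
endsOperand _ = False
startsOperand (Operand _) = True
startsOperand Open = True
startsOperand _ = False
```

With this pass, "foo (bar or baz)" tokenizes to the same list as "foo and (bar or baz)", and a list that already has its ands is left unchanged, which is the property the test cases check.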
Missed this when implementing it because of the default case catching
the new constructor. So, removed that default case to make sure
future types of adjusted branches don't make the same mistake.
Complicated by git-annex addurl --fast, which adds a file whose content
is not present, so it needs to stay unlocked when on such a branch.
This commit was sponsored by Brock Spratlen on Patreon.
Fixed that, and made parserLsTree accept a space as well as a tab.
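git ls-tree documents its output as "<mode> SP <type> SP <object> TAB <file>". A small sketch of a parser that accepts either a tab or a space before the filename follows; `parseLsTreeLine` is a hypothetical name, and this is not git-annex's parserLsTree.

```haskell
-- Parse one line of `git ls-tree` output into (mode, type, object, file),
-- accepting a space as well as a tab before the filename.
parseLsTreeLine :: String -> Maybe (String, String, String, FilePath)
parseLsTreeLine l = do
  (mode, r1) <- field (== ' ') l
  (typ, r2) <- field (== ' ') r1
  (sha, file) <- field (\c -> c == '\t' || c == ' ') r2
  return (mode, typ, sha, file)
  where
    -- Split at the first separator, dropping that one separator character.
    -- The filename keeps any later spaces or tabs it contains.
    field isSep s = case break isSep s of
      (f, _ : rest) | not (null f) -> Just (f, rest)
      _ -> Nothing
```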
Fixes a reversion that made import of a tree from a special remote result in
a merge that deleted files that were not preferred content of that special
remote.
Avoids the smudge --clean filter failing because URL keys do not support
genKey. Instead the modified content will be added using the default
backend.
This commit was sponsored by Jochen Bartl on Patreon.
Don't treat the cid of the temp file that the content has just been
written to as acceptable if another file has that same
content. There's no reason to, and on FAT, due to mtime resolution,
the test suite hit just such a case.
This fixes a reversion from 73df633a62
which removed inode from the ContentIdentifier.
Seems that dropDrive on windows only drops eg c:/ but not a leading /,
while on linux it does drop a leading / (which is what it considers
to be equivalent to a drive letter). I had been relying on it to drop
both. So need to drop leading directory separators too.
Also, if the quickcheck generated input is eg "c:c:c:c:foo",
dropDrive will only drop the first one, leaving a path that's
still not relative. So instead of using dropDrive, just remove the
colons from the path.
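A minimal sketch of that approach, under the assumption that the goal is only to get a relative path out of arbitrary generated input (`forceRelative` is a hypothetical name): remove every ':' first, then any leading path separators.

```haskell
import System.FilePath (isPathSeparator)

-- Turn an arbitrary (e.g. QuickCheck-generated) path into a relative one
-- without relying on dropDrive: remove all colons, then strip any leading
-- path separators ('/' everywhere, and also '\\' on Windows).
forceRelative :: FilePath -> FilePath
forceRelative = dropWhile isPathSeparator . filter (/= ':')
```

Removing colons before stripping separators means input like "c:/foo" also ends up relative ("c/foo"), and repeated drive prefixes such as "c:c:c:c:foo" no longer survive.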
This is probably a reversion, but not sure what caused it. By the time
Annex.Init runs fixupUnusualReposAfterInit, another git-annex process has
at least sometimes already done the necessary fixups. (Eg, one run
indirectly by a git command.) But since the Repo is cached, it doesn't
realize and does them again. So, avoid crashing when git config --unset
fails.
This commit was sponsored by Jack Hill on Patreon.
Directory special remotes with importtree=yes now avoid unnecessary overhead
when inodes of files have changed, as happens whenever a FAT filesystem
gets remounted.
A few unusual edge cases of modifications won't be detected and
imported. I think they're unusual enough not to be a concern. It would
be possible to add a config setting that controls whether to compare
inodes too, but does not seem worth bothering the user about currently.
I chose to continue to use the InodeCache serialization, just with the
inode zeroed. This way, if I later change my mind or make it
configurable, it can be parsed back to an InodeCache and operated on. The
overhead of storing a 0 in the content identifier log seems worth it.
There is a one-time cost to this change; all directory special remotes
with importtree=yes will re-hash all files once, and will update the
content identifier logs with zeroed inodes.
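A sketch of the idea, using a hypothetical, simplified stand-in for InodeCache (the real type and its serialization format differ): the inode field is kept but zeroed, so comparisons only depend on size and mtime, while the stored form can still be parsed back into the same type.

```haskell
import Text.Read (readMaybe)

-- Hypothetical simplified stand-in for git-annex's InodeCache.
data InodeCache = InodeCache
  { inode :: Integer
  , size :: Integer
  , mtime :: Integer
  } deriving (Show, Eq)

-- Same shape, inode set to 0, so a remount that changes inodes
-- does not make the file look modified.
zeroInode :: InodeCache -> InodeCache
zeroInode c = c { inode = 0 }

serialize :: InodeCache -> String
serialize (InodeCache i s m) = unwords (map show [i, s, m])

-- The zeroed inode is still stored, so this parses back to an InodeCache.
deserialize :: String -> Maybe InodeCache
deserialize str = case mapM readMaybe (words str) of
  Just [i, s, m] -> Just (InodeCache i s m)
  _ -> Nothing

-- Compare only size and mtime.
sameContentIdentifier :: InodeCache -> InodeCache -> Bool
sameContentIdentifier a b = zeroInode a == zeroInode b
```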
This commit was sponsored by Brett Eisenberg on Patreon.
Including the non-standard URI form that git-remote-gcrypt uses for rsync.
Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port
number. But git could have a remote with that URL, and would run
git-remote-ook to handle it. So, git-annex has to allow for such things,
rather than crashing.
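As an illustration of handling such URLs totally instead of crashing, here is a sketch that only extracts the scheme and names the remote helper git would look for, without requiring the rest to be a valid URI. `remoteHelperFor` is hypothetical, and it ignores git's other forms such as "<transport>::<address>".

```haskell
import Data.Char (isAlphaNum)

-- Return the remote helper name for a "<scheme>://..." URL, without
-- validating the authority part (so "ook://foo:bar" is accepted even
-- though "bar" is not a port). Returns Nothing rather than erroring.
remoteHelperFor :: String -> Maybe String
remoteHelperFor url = case span isSchemeChar url of
  (scheme@(_ : _), ':' : '/' : '/' : _) -> Just ("git-remote-" ++ scheme)
  _ -> Nothing
  where
    isSchemeChar c = isAlphaNum c || c `elem` "+-."
```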
This commit was sponsored by Luke Shumaker on Patreon.
It was just slapping on a path separator to the front of the path to
make it absolute, but on windows, a path like "//foo/bar" actually
has a network "drive" of "//foo" and so that broke the test case.
Since "a:foo" is somehow a relative path on windows
(who knows how), drop any drive from the input. But dropDrive also drops
any leading path separator, making the input path relative. So now
it should be safe to slap on a leading path separator.
This was not a good test, it broke the requirement that
relPathDirToFileAbs take absolute paths. And it failed when the two
input paths were eg, the same but differently normalized.
Replaced with some tests of the real basics of that function.
It got broken in several ways by the streaming seeking optimisations
around version 8.20201007.
Moved time limit checking out of the matcher, which was a hack in the
first place. So everywhere that uses Limit.getMatcher needs to check
time limit. Well, almost everywhere. Command.Info uses it, but it does
not make sense to time limit getting info. And Command.MultiCast uses it
just to build up a list of files that then get passed to a command, so
it would never have hit the timeout in a useful way.
This implementation is a little more expensive when at time limit than
necessary, since it continues seeking only to discard everything after the
time limit. I did try making it close the file handles to force a faster
shutdown, but that didn't work and hung. Could certainly be improved
somehow, but seeking is probably not the expensive bit when a time limit
is hit, so this seems acceptable for now.
For reasons explained in the bug report.
Implemented using a persistent migration, which works fine. It may add a
little startup overhead when a remote is enabled that uses this, but
probably unnoticeable.
On the next major version, it would be fine to delete this database,
and regenerate it from the git-annex branch information. Then this
change could be reverted.
Did nothing about adding back the data that got dropped from the db
due to the bug. Only the borg special remote was probably affected,
and it's not been released yet. rm -rf .git/annex/cidsdb does work.
And vice-versa, but it's better to use '/' for portability.
Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
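A sketch of why normalizing matters, assuming a deliberately simplistic glob matcher that only understands a trailing "*" (`normalizeSlashes` and `matchesSimpleGlob` are hypothetical, not git-annex's matcher): "archive\\foo" fails to match "archive/*" until backslashes are turned into '/'.

```haskell
import Data.List (isPrefixOf)

-- Convert Windows-style separators to '/' before matching.
normalizeSlashes :: FilePath -> FilePath
normalizeSlashes = map (\c -> if c == '\\' then '/' else c)

-- Simplistic glob: a trailing "*" matches any suffix; otherwise exact match.
matchesSimpleGlob :: String -> FilePath -> Bool
matchesSimpleGlob glob f = case reverse glob of
  '*' : rest -> reverse rest `isPrefixOf` f
  _ -> glob == f
```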
I do think this was a reversion, but I have not tracked back to what
version. Although it involves the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
MatchingKey is not the thing to use when matching on actual worktree
files.
Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
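To make the distinction concrete: when run from a subdirectory "sub", a file "foo" should be matched by include= as "sub/foo", not "foo". A sketch using System.FilePath.Posix, assuming absolute top and cwd paths (`pathFromTop` is a hypothetical name):

```haskell
import System.FilePath.Posix (makeRelative, (</>))

-- Path of a file relative to the top of the repository, given the
-- repository top, the current directory, and a path relative to the
-- current directory. Both top and cwd are assumed to be absolute.
pathFromTop :: FilePath -> FilePath -> FilePath -> FilePath
pathFromTop top cwd relfile = makeRelative top (cwd </> relfile)
```

The regression amounted to matching `relfile` ("foo") where `pathFromTop "/repo" "/repo/sub" "foo"` ("sub/foo") was expected.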
Avoid spurious "verification of content failed" message when downloading
content from a ssh or tor remote fails due to the remote no longer having a
copy of the content.
The P2P protocol already handled this case by sending DATA 0, followed by
VALID. But VALID was not really right, because the data is not the
requested data. So, send DATA 0, followed by INVALID. Old versions of
git-annex handle INVALID the same as VALID in this case. Now new versions
avoid displaying an incorrect message.
It would be better for the P2P protocol to have a different way to indicate
this, like perhaps sending INVALID without DATA. But that would be a
breaking change and need a new protocol version. Since INVALID already is
part of the protocol and already needs to be handled, using it for this
special case too seems ok, and avoids the complication of another protocol
version.
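A heavily simplified, hypothetical model of the exchange (payload bytes omitted, all names illustrative, not git-annex's protocol types): the sender answers with DATA 0 then INVALID when it no longer has the content, and a newer receiver recognizes that pair instead of reporting a verification failure. Everything else that isn't DATA followed by VALID is lumped together as a failure here.

```haskell
-- Simplified model of the relevant P2P messages.
data Message = DATA Integer | VALID | INVALID
  deriving (Show, Eq)

data Outcome = Received | ContentGone | VerificationFailed
  deriving (Show, Eq)

-- Sender: Just n if it still has the content (n bytes); payload omitted.
respond :: Maybe Integer -> [Message]
respond (Just n) = [DATA n, VALID]
respond Nothing = [DATA 0, INVALID]

-- Receiver (newer versions): DATA 0 followed by INVALID means the remote
-- no longer has the content, so no "verification of content failed".
interpret :: [Message] -> Outcome
interpret [DATA 0, INVALID] = ContentGone
interpret [DATA _, VALID] = Received
interpret _ = VerificationFailed
```

A genuinely empty file still arrives as DATA 0 followed by VALID, so it is unaffected.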
This commit was sponsored by Jochen Bartl on Patreon.
This is an edge case, which happened to be triggered by the P2P protocol
seeing DATA 0. When reading 0 bytes, getting an empty string does
not mean the handle has reached EOF.
I verified there was in fact a bug, where get of an empty file followed
by another file would get the empty file and then fail
with "handle is closed". This fixes it.
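The edge case in a few lines: an empty result only signals EOF when more than zero bytes were requested. `ReadResult` and `interpretRead` are hypothetical names used to isolate the check.

```haskell
import qualified Data.ByteString as B

data ReadResult = Got B.ByteString | Eof
  deriving (Show, Eq)

-- Interpret the result of asking a handle for `requested` bytes.
interpretRead :: Int -> B.ByteString -> ReadResult
interpretRead requested got
  | requested == 0 = Got B.empty -- asked for nothing, got nothing: not EOF
  | B.null got = Eof             -- asked for bytes, got none: EOF
  | otherwise = Got got
```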
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.
Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.
The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any resource that did leak would not be a problem.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Fix a bug that could make resuming a download from the web fail when the
entire content of the file is actually already present locally.
What a mess that Request can throw exceptions or not, depending on how
it's configured. Makes it very hard if you need to handle some specific
http status codes in a function like this! Implementing everything two
ways did not seem appealing, if possible at all, so I decided to
override the Request if it did come configured to throw exception on
non-2xx http status. Other exceptions, like from http-client-restricted,
or due to a redirect to a non-http url, still get thrown.
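Once the Request is made not to throw on non-2xx statuses (in http-client, as far as I know, that is the Request's `checkResponse` field), the status can be handled directly. A pure sketch of that decision follows, under the assumption that a server answers a resume request for an already complete file with 416 Range Not Satisfiable; `ResumeOutcome` and `classifyResume` are hypothetical.

```haskell
data ResumeOutcome
  = AppendToPartial    -- 206: server honoured the Range request
  | RestartFromScratch -- 200: server ignored Range and sent the whole file
  | AlreadyComplete    -- 416, and the local file already has every byte
  | Failed Int
  deriving (Show, Eq)

-- Decide what to do with a response to a resumed download.
-- haveAll: whether the local partial file already has the expected size
-- (an assumption; 416 can also mean the local file is somehow too large).
classifyResume :: Int -> Bool -> ResumeOutcome
classifyResume status haveAll
  | status == 206 = AppendToPartial
  | status == 200 = RestartFromScratch
  | status == 416 && haveAll = AlreadyComplete
  | otherwise = Failed status
```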
This commit was sponsored by Luke Shumaker on Patreon.
Prevent windows assistant from trying (and failing) to upgrade itself,
which has never been supported on windows.
The new windows build is made with UPGRADE_LOCATION set, which enabled this
code path that had never run on windows before, and doesn't work. I don't
want to try to support self-upgrade on windows, or generally on OSes
other than the ones where it's working, so added a check for that. This way the
build can keep setting UPGRADE_LOCATION and if some later git-annex does
learn how to upgrade itself on some OS, it won't need changing the build
setup.
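A sketch of such a check as a runtime test on `System.Info.os` (which GHC reports as "mingw32" on Windows). The list of supported OSes here is an assumption for illustration, and the real check may well be done differently, e.g. at compile time.

```haskell
import System.Info (os)

-- Hypothetical list of OSes where self-upgrade is known to work.
selfUpgradeSupportedOn :: String -> Bool
selfUpgradeSupportedOn o = o `elem` ["linux", "darwin"]

-- Even with UPGRADE_LOCATION set at build time, only attempt an upgrade
-- when running on one of those OSes.
selfUpgradeSupported :: Bool
selfUpgradeSupported = selfUpgradeSupportedOn os
```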
Warn when adding an annex symlink or pointer file that uses a key that is
not known to the repository, to prevent confusion if the user has copied it
from some other repository.
This commit was sponsored by Jake Vosloo on Patreon.
All properties changed to use them, except for
prop_encode_c_decode_c_roundtrip, which already filtered to ascii
for other reasons.
A few modules had to be split out, because Setup does not build-depend
on QuickCheck.
This fixes a bug where a file that was not preferred content could be
transferred to a remote. This happened when the file got deleted after
the sync started running.
The only time checkMatcher is run without a Key is in calls to
checkFileMatcher, which are only done by add, addurl, import, and
smudge --clean. Those won't be affected by this kind of race. Anything
else that might be precaching and have a similar race as sync will also
be fixed, but I don't know if it actually affected anything other than
sync.
As well as fixing a bug, this also probably makes sync and --auto faster
by avoiding the redundant key lookup.
This commit was sponsored by Graham Spencer on Patreon.
Because it's a special character on Windows ("c:").
Use same technique already used for '/' and '\'.
I didn't record how I generated their encoded forms before, so I'm sure
there was a better way, but the way I did it now is to look at
ghci> encodeFilePath "∕"
"\226\136\149"
And then the difference from that to "\56546\56456\56469"
is adding 56320 to each, to get up to the escaped code plane.
See the comment for why I think handling ':' is ok, but not other illegal
windows filenames. Note that this should be enough to make the
test suite always work. Other windows illegal filenames will fail at
checkout time when it tries to put the illegal filename on the
filesystem.
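The arithmetic above can be reproduced without ghci: UTF-8 encode the character, then add 56320 (0xDC00) to each byte, which is the code range GHC's roundtrip filename encoding uses for bytes it cannot decode. A self-contained sketch (the helper names are hypothetical; the small UTF-8 encoder is written out only to avoid extra dependencies):

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- UTF-8 encode a single code point (1 to 4 bytes).
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80 = [fromIntegral n]
  | n < 0x800 = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    lead s = fromIntegral (n `shiftR` s)
    cont s = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

-- Map a byte into the 0xDC00 escape range (meaningful for bytes >= 0x80).
escapeByte :: Word8 -> Char
escapeByte b = chr (0xDC00 + fromIntegral b)

-- Escaped form of a non-ASCII character, e.g. U+2215 DIVISION SLASH.
escapeChar :: Char -> String
escapeChar = map escapeByte . utf8Bytes
```

For U+2215 this gives the bytes [226,136,149] and the string "\56546\56456\56469" shown above.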
This fixes the bug.
Note, it's only done when GIT_DIR is set. When it's not set,
Git.Construct already handled it. This is why it was only noticed with this
git submodule command.
This commit was sponsored by Brett Eisenberg on Patreon.
Only done in checkPresentChunks, although retrieveChunks could also do
it. Does not seem necessary though, because git-annex never retrieves
content without first checking if it's present AFAICR. And really this will
only be needed when using fsck. Putting it here, rather than in fsck,
avoids breaking an abstraction boundary, and is nice and inexpensive.
When a special remote has chunking enabled, but no chunk sizes are
recorded (or the recorded ones are not found), speculatively try chunks
using the configured chunk size.
This makes eg, git-annex fsck --from remote be able to fix up the
location log of a file that the git-annex branch does not indicate is
stored on the remote.
Note that fsck does *not* fix up the chunk log to indicate the chunk
size. So, changing the chunk config of the remote after that will still
prevent accessing the chunks stored on it. Maybe fsck should, but I
wanted to start with this and see if it's needed.
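A sketch of the speculative chunk list, assuming the key records the file size (keys without a size could not be chunked this way) and numbering chunks from 1. `speculativeChunks` is hypothetical and returns (chunk number, byte offset, length) rather than real chunk keys.

```haskell
-- Chunks a file of the given size would have been split into with the
-- configured chunk size: (chunk number, byte offset, length).
speculativeChunks :: Integer -> Integer -> [(Integer, Integer, Integer)]
speculativeChunks chunksize filesize
  | chunksize <= 0 = []
  | otherwise =
      [ (n, off, min chunksize (filesize - off))
      | (n, off) <- zip [1 ..] [0, chunksize .. filesize - 1]
      ]
```

For example, a 250 byte file with a 100 byte chunk size gives chunks of 100, 100 and 50 bytes; checking presence of those is what lets fsck find content that the chunk log does not mention.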
inet_addr was removed, but all this needs is localhost, so hardcoding it
should work fine.
It may be that this windows ifdef is no longer needed. It was added in 2013
with a note that getAddrInfo didn't work on windows, but it seems likely
such a problem would have been fixed since.
This avoids the possibility that the bundle could be updated in place,
leading to LOCPATH existing but containing locales for the old version,
which needed to be tested for with code that was not race-free.
LOCPATH/buildid is still written and checked when cleaning up stale caches.
That is not actually necessary, except that old versions of the standalone
bundle expect to see it, and this prevents them cleaning up the locale
cache of a new version. And still checking it prevents the new version
cleaning up the locale cache of the old version while the old version is
still in use.
Added explicit tests before creating LOCPATH and the base and buildid files.
The buildid file no longer needs to be updated every time, because it's
stable for the given LOCPATH directory.
And the base file actually did not need to be updated every time,
because the LOCPATH is derived from base, so if the bundle is moved
elsewhere, a different LOCPATH will be used.
Transitioning to this will mean that two git-annex builds that otherwise
have the same buildid -- the same git-annex md5sum -- will use different
LOCPATH values, but that's handled fine by the cache cleanup code, so at
most it will mean one extra generation of the locale files.
Works better with automatic merge conflict resolution than git's usual
default of "conflict".
This is not done when automatic merge conflict resolution is disabled.
This commit was sponsored by Mark Reidenbach on Patreon.
This case was handled by cleanConflictCruft, but only when the annexed
file's object was present. When not present, it left the annexed file
with the original name, not checked into git, while adding the variant
file. So, add an explicit deletion of the deleted file in this case.
My specific case where this happened actually involves
merge.directoryRenames=conflict. After a merge involving that,
the file appeared as "added by them", because that
caused the file that they added to be moved into a directory we renamed.
That case is the same as them adding a modified version of the file,
while we deleted it. (Except for the history of the file, since it's a
new file, but this doesn't look at history.)
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Fixed several cases where files were created without file mode bits that
the umask would usually set. This included exports to the directory special
remote, torrent files used by the bittorrent special remote, hooks written
by git-annex init, and some log files in .git/annex/
Audited all calls, looking for ones that didn't want the umask bits to be
set. All such turned out to already set the specific restrictive file mode
they wanted.
Also audited for other calls to openTempFile, and all are ok,
except for viaTmp which will need further work.
Remote.Directory fixed to set umask mode when writing to an export,
although it has another one using viaTmp that's not fixed.
Will make exports that are published via an http server running as
another user work, for example.
Remote.BitTorrent fixed to set umask mode when downloading the torrent
file. Normally this does not matter as that file does not hang around
after the download, but if a bittorrent download were started by one user,
got interrupted and then another user ran it, this will let them access
the torrent file created by the first user.
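The arithmetic behind "the mode bits the umask would usually set": a new file normally starts from 0o666 with the umask's bits cleared. As far as I know, `System.IO.openTempFile` instead creates files readable and writable only by the owner, while `openTempFileWithDefaultPermissions` starts from 0o666 and lets the umask apply. A sketch of the computation (`applyUmask` is a hypothetical name):

```haskell
import Data.Bits (complement, (.&.))

-- Mode a new file gets: the requested mode with the umask's bits cleared.
applyUmask :: Int -> Int -> Int
applyUmask umask requested = requested .&. complement umask
```

With the common umask 0o022, 0o666 becomes 0o644, so another user (such as an http server) can read the file; with 0o077 it becomes 0o600.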
They normally shut down when the GNUPGHOME directory is deleted, but on
NFS they keep the directory from being deleted. And also, this avoids
a number of them piling up while the test suite is running.
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.
Renamed runsGitAnnexChildProcess to make clearer where it should be
used.
Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
addurl: Fix reversion in 7.20190322 that made --file not be honored when
youtube-dl was used to download media.
8758f9c561 was on the right track, but missed that | otherwise prevented
the code it added from being used.
Also, refactored out a common function.
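The general shape of that bug, with hypothetical names unrelated to the actual addurl code: any guard written after `| otherwise` is unreachable, so the user's --file value is silently ignored. Moving the check before the catch-all fixes it.

```haskell
-- Buggy shape: the pattern guard after `otherwise` can never be reached.
chooseNameBuggy :: Maybe FilePath -> Bool -> FilePath
chooseNameBuggy userfile usedYoutubeDl
  | not usedYoutubeDl = maybe "name-from-url" id userfile
  | otherwise = "name-from-youtube-dl"
  | Just f <- userfile = f

-- Fixed: honour the user-specified file first.
chooseName :: Maybe FilePath -> Bool -> FilePath
chooseName userfile usedYoutubeDl
  | Just f <- userfile = f
  | usedYoutubeDl = "name-from-youtube-dl"
  | otherwise = "name-from-url"
```

GHC's overlapping-patterns warning can flag the redundant guard in the buggy version.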
This commit was sponsored by Graham Spencer on Patreon.
Since there's a race here, and since Kyle saw an exception leak out,
which I have not been able to reproduce. See my comment for what
I think might be going on.
Note that I used tryNonAsync, because it seems a later tryNonAsync
caught the exception. I don't actually understand how it did: as I
understand exception classification, it depends on the data type, not the way it
was thrown. One possibility is that the async exception may have been wrapped
in some other, non-async exception, and Show displayed it the same way.
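To illustrate classification by data type, here is a standalone approximation of a tryNonAsync built on Control.Exception. It is not git-annex's own definition. An exception counts as async if it can be viewed as `SomeAsyncException`, which base arranges through the Exception instance of types like `AsyncException` (e.g. `ThreadKilled`). So how it was thrown does not matter, but wrapping it inside some other exception type would hide it.

```haskell
import Control.Exception

-- Like 'try', but lets asynchronous exceptions propagate.
tryNonAsync' :: IO a -> IO (Either SomeException a)
tryNonAsync' action = do
  r <- try action
  case r of
    Right v -> return (Right v)
    Left e -> case (fromException e :: Maybe SomeAsyncException) of
      Just _ -> throwIO e -- async by type: rethrow unchanged
      Nothing -> return (Left e)
```

Even when `ThreadKilled` is thrown synchronously with `throwIO`, this rethrows it, since only its type is examined.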