Don't want to try to use these remotes as key/value remotes, which will
surely fail. It only recently became possible for importtree to be set
w/o exporttree, so before this code was ok.
(cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)
In cd1676d604, it stopped using that to avoid surprising behavior
when the location log and remote content were out of sync.
But, it seems that may have changed some behavior users relied on as
well, and also Remote.hasKeyCheap should be faster than checking then
location log.
So, try Remote.hasKeyCheap first, and only if it does not have the key,
fall back to checking the location log. If the location log still thinks
it's present, go ahead and try to get it, so the user will see a failure
rather than silently skipping a file what whereis says is on the remote.
This does make slightly slower the case where the remote does not have
the key, and location log and Remote.hasKeyCheap agree, since it now
checks both. But only 1 stat slower.
And vice-versa, but it's better to use '/' for portability.
Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
I do think this was a reversion, but I have not tracked back to what
version. While involving the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
git -c was already propagated via environment, but need this for
consistency.
Also, notice it does not use gitAnnexChildProcess to run the
transferrer. So nothing is done about avoid it taking the
pid lock. It's possible that the caller is already doing something that
took the pid lock, and if so, the transferrer will certianly fail,
since it needs to take the pid lock too. This may prevent combining
annex.stalldetection with annex.pidlock, but I have not verified it's
really a problem. If it was, it seems git-annex would have to take
the pid lock when starting a transferrer, and hold it until shutdown,
or would need to take pid lock when starting to use a transferrer,
and hold it until done with a transfer and then drop it. The latter
would require starting the transferrer with pid locking disabled for the
child process, so assumes that the transferrer does not do anyting that
needs locking when not running a transfer.
MatchingKey is not the thing to use when matching on actual worktreee
files.
Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
* Guard against running in a repo where annex.uuid is set but
annex.version is set, or vice-versa.
* Avoid autoinit when a repo does not have annex.version or annex.uuid
set, but has a git-annex objects directory, suggesting it was used
by git-annex before.
I need to think about this some more, not clear if it's a todo
item specific to stalldetection at all. Remotes with this behavior
also show no progress when run with -J. And some other remotes don't
update any progress meters at all, eg adb is that way and so are hook
remotes and of course external remotes don't have to send progress info.
Done on unix, could not implement it on windows quite.
The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.
Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
All callers adjusted to update it themselves.
In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.
This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
This is to avoid breakage when upgrading or downgrading git-annex with a
process running that uses the interface. It's better to keep the
compatability code for a few years than worry about such breakage.
This commit was sponsored by Brett Eisenberg on Patreon.
New config annex.stalldetection, remote.name.annex-stalldetection, which
can be used to deal with remotes that stall during transfers, or are
sometimes too slow to want to use.
This commit was sponsored by Luke Shumaker on Patreon.
That seems to be the last thing needed for message serialization.
Although it's only used in the assistant currently, so hard to tell if I
forgot something.
At this point, it should be possible to start using transferkeys
when performing transfers, which will allow killing a transferkeys
process if a transfer times out or stalls. But that's for another day.
This commit was sponsored by Ethan Aubin.
Necessarily threw out the old protocol, so if an old git-annex assistant
is running, and starts a transferkeys from the new git-annex, it would
fail. But, that seems unlikely; the assistant starts up transferkeys
processes and then keeps them running. Still, may need to test that
scenario.
The new protocol is simple read/show and looks like this:
TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo"))
TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9}))
TransferOutput (OutputMessage "(checksum...) ")
TransferResult True
Granted, this is not optimally fast, but it seems good enough, and is
probably nearly as fast as the old protocol anyhow.
emitSerializedOutput for ProgressMeter is not yet implemented. It needs
to somehow start or update a progress meter. There may need to be a new
message that allocates a progress meter, and then have ProgressMeter
update it.
This commit was sponsored by Ethan Aubin
Avoid spurious "verification of content failed" message when downloading
content from a ssh or tor remote fails due to the remote no longer having a
copy of the content.
The P2P protocol already handled this case by sending DATA 0, followed by
VALID. But VALID was not really right, because the data is not the
requested data. So, send DATA 0, followed by INVALID. Old versions of
git-annex handle INVALID the same as VALID in this case. Now new versions
avoid displaying an incorrect message.
It would be better for the P2P protocol to have a different way to indicate
this, like perhaps sending INVALID without DATA. But that would be a
breaking change and need a new protocol verison. Since INVALID already is
part of the protocol and already needs to be handled, using it for this
special case too seems ok, and avoids the complication of another protocol
version.
This commit was sponsored by Jochen Bartl on Patreon.
This is an edge case, which happened to be triggered by the P2P protocol
seeing DATA 0. When reading 0 bytes, getting an empty string does
not mean the handle has reached EOF.
I verified there was in fact a bug, where get of an empty file followed
by another file would get the empty file and then fail
with "handle is closed". This fixes it.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.
Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.
The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any recource that did leak would not be a problem.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Fix a bug that could make resuming a download from the web fail when the
entire content of the file is actually already present locally.
What a mess that Request can throw exceptions or not, depending on how
it's configured. Makes it very hard if you need to handle some specific
http status codes in a function like this! Implementing everything two
ways did not seem appealing, if possible at all, so I decided to
override the Request if it did come configured to throw exception on
non-2xx http status. Other exceptions, like from http-client-restricted,
or due to a redirect to a non-http url, still get thrown.
This commit was sponsored by Luke Shumaker on Patreon.
Prevent windows assistant from trying (and failing) to upgrade itself,
which has never been supported on windows.
The new windows build is made with UPGRADE_LOCATION set, which enabled this
code path that had never run on windows before, and doesn't work. I don't
want to try to support self-upgrade on windows, or generally on other OS's
than the ones where its working, so added a check for that. This way the
build can keep setting UPGRADE_LOCATION and if some later git-annex does
learn how to upgrade itself on some OS, it won't need changing the build
setup.
Added annex.adjustedbranchrefresh git config to update adjusted branches
set up by git-annex adjust --unlock-present/--hide-missing.
Note, in a few cases, I was not able to make the adjusted branch
be updated in calls to moveAnnex, because information about what
file corresponds to a key is not available. They are:
* If two files point to one file, then eg, `git annex get foo` will
update the branch to unlock foo, but will not unlock bar, because it
does not know about it. Might be fixable by making `git annex get
bar` do something besides skipping bar?
* git-annex-shell recvkey likewise (so sends over ssh from old versions
of git-annex)
* git-annex setkey
* git-annex transferkey if the user does not use --file
* git-annex multicast sends keys with no associated file info
Doing a single full refresh at the end, after any incremental refresh,
will deal with those edge cases.
An --unlock-present branch reverses back to a branch where
all files that get modified or renamed become locked, even if they were
originally unlocked. This is the same that reversing a --unlock branch
works, and the new name makes that commonality more clear.
Like --hide-missing the branch does not get updated when content
availability changes.
Seems to basically work, but sync does not update it yet.
Also, when a file is present and so unlocked, git mv followed by
git-annex sync results in the basis branch being updated to contain the
file with the new name, unlocked. This seems different than what
happens in an adjusted unlocked branch, where the commit propigates back
locked. Probably the reverse adjustment code needs to be improved to
handle this case.
Note that, the way the SeekInput parser is written to support batch mode,
it's actually possible to do git-annex examinekey
"SHA1--foo foo.tar.gz" --migrate-to-backend=SHA1E
While that might be kind of useful to support multiple migrations not using
batch mode, I have not documented it. It would be better to take pairs of
key and file in that case.
Warn when adding a annex symlink or pointer file that uses a key that is
not known to the repository, to prevent confusion if the user has copied it
from some other repository.
This commit was sponsored by Jake Vosloo on Patreon.
All properties changed to use them, except for
prop_encode_c_decode_c_roundtrip, which already filtered to ascii
for other reasons.
A few modules had to be split out, because Setup does not build-depend
on QuickCheck.
This fixes a bug where a file that was not preferred content could be
transferred to a remote. This happened when the file got deleted after
the sync started running.
The only time checkMatcher is run without a Key is in calls to
checkFileMatcher, which are only done by add, addurl, import, and
smudge --clean. Those won't be affected by this kind of race. Anything
else that might be precaching and have a similar race as sync will also
be fixed, but I don't know if it actually affected anything other than
sync.
As well as fixing a bug, this also probably makes sync and --auto faster
by avoiding the redundant key lookup.
This commit was sponsored by Graham Spencer on Patreon.
Because it's a special character on Windows ("c:").
Use same technique already used for '/' and '\'.
I didn't record how I generated their encoded forms before, so am sure
there was a better way, but the way I did it now is to look at
ghci> encodeFilePath "∕"
"\226\136\149"
And then the difference from that to "\56546\56456\56469"
is adding 56320 to each, to get up to the escaped code plane.
See comment for why I think handling ':' is ok, but that other illegal
windows filenames won't. Note that, this should be enough to make the
test suite always work. Other windows illegal filenames will fail at
checkout time when it tries to put the illegal filename on the
filesystem.
This is a cleaner build than on Jenkins because the whole environment setup
is handled by the CI config, at least up to the point of "get a random bag
of Windows bytes".
Also, the Jenkins autobuilder has been intermittently failing for a long
time, not due to any problem with git-annex but just a failure to clean up
directories.
Also, this build runs the test suite, and it is (mostly) passing. Test
suite always failed in the jenkins environment.
Also, this build includes libmagic.
Here is the build workflow used by github actions:
https://github.com/datalad/datalad-extensions/blob/master/.github/workflows/build-git-annex-windows.yaml
The libmagic build has its own workflow:
https://github.com/datalad/file-windows/blob/master/.github/workflows/build.yml
(Also cleaned up some windows build cruft I don't use anymore.)
There is no build-version file to link to. I've opened a todo requesting
one: https://github.com/datalad/datalad-extensions/issues/55
Presumably most users will get it from their linux distribution, or
whatever. Don't want to need to keep updating links if they're going to
rot like this.
This fixes the bug.
Note, it's only done when GIT_DIR is set. When it's not set,
Git.Construct already handled it. This is why it was only noticed with this
git submodule command.
This commit was sponsored by Brett Eisenberg on Patreon.
Only done in checkPresentChunks, although retrieveChunks could also do
it. Does not seem necessary though, because git-annex never retrives
content without first checking if it's present AFAICR. And really this will
only be needed when using fsck. Puttting it here, rather than in fsck
avoids breaking an abstraction boundary, and is nice and inexpensive.
When a special remote has chunking enabled, but no chunk sizes are
recorded (or the recorded ones are not found), speculatively try chunks
using the configured chunk size.
This makes eg, git-annex fsck --from remote be able to fix up the
location log of a file that the git-annex branch does not indicate is
stored on the remote.
Note that fsck does *not* fix up the chunk log to indicate the chunk
size. So, changing the chunk config of the remote after that will still
prevent accessing the chunks stored on it. Maybe fsck should, but I
wanted to start with this and see if it's needed.
In cases where numcopies checks prevented the resumed move from dropping
the object from the source repository, it now relies on a log of recent
moves to replicate the behavior of the interrupted command.
Performance: Probably noticable impact, since it has to add to the log,
check the log, and remove from the log. Seems worth it to avoid this
annoying edge case. The log functions are pretty well optimised to avoid
unncessary work.
An performance improvement to make later would be to avoid cleanup doing
anything if it's not written to the log file, and has confirmed that the
log file does not contain the log line.
This commit was sponsored by Jake Vosloo on Patreon.
The problem was this line:
cleanup = and <$> sequence (map snd v)
That caused all of v to be held onto until the end, when the cleanup action
was run.
I could not seem to find a bang pattern that avoided the leak, so I
resorted to a IORef, rather clunky, but not a performance problem because
it will only be written once per git ls-files, so typically just 1 time.
This commit was sponsored by Mark Reidenbach on Patreon.
inet_addr was removed, but all this needs is localhost, so hardcoding it
should work fine.
It may be that this windows ifdef is no longer needed. It was added in 2013
with a note that getAddrInfo didn't work on windows, but it seems likely
such a problem would have been fixed since.
This avoids the possibility that the bundle could be updated in place,
leading to LOCPATH existing but containing locales for the old version,
which needed to be tested for with code that was not race-free.
LOCPATH/buildid is still written and checked when cleaning up stale caches.
That is not actually necessary, except old versions of the standalone
bundle expect to see it, and this prevents them cleaning up the locale
cache of a new version. And still checking it prevents the new version
cleaning up the locale cache of the old version while the old version is
still in use.
Added explicit tests before creating LOCPATH and the base and buildid files.
The buildid file no longer needs to be updated every time, because it's
stable for the given LOCPATH directory.
And the base file actually did not need to be updated every time,
because the LOCPATH is derived from base, so if the bundle is moved
elsewhere, a different LOCPATH will be used.
Transitioning to this will mean that two git-annex builds that otherwise
have the same buildid -- the same git-annex md5sum -- will use different
LOCPATH values, but that's handled fine by the cache cleanup code, so at
most it will mean one extra generation of the locale files.
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.
This commit was sponsored by Brett Eisenberg on Patreon.
Ensure that checkCanAdd is used everywhere a file is added to git,
so git add is run with -f, presumably avoiding the work it would usually
do to check ignores.
This avoids import with --no-content and with --content potentially
generating two different trees, leading to a merge conflict when run in
two different clones of a repo. And it's necessary groundwork to make
git-annex sync --no-content import from special remotes that support
importKey.
Only the directory special remote currently supports importKey, and it
generates the same key as git-annex usually does, so there is no
behavior change for it.
Future special remotes will need to take care when adding importKey,
if it generates different keys. Added some warnings about that to
comments.
This commit was sponsored by Noam Kremen on Patreon.
Import small files into git, the same as is done when importing with content.
Which means, for small files, --no-content does download them.
If the largefiles expression needs the file content available
(due to mimetype or mimeencoding being used), the import will fail.
This commit was sponsored by Jake Vosloo on Patreon.