Including the non-standard URI form that git-remote-gcrypt uses for rsync.
Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port
number. But git could have a remote with that, it would try to run
git-remote-ook to handle it. So, git-annex has to allow for such things,
rather than crashing.
This commit was sponsored by Luke Shumaker on Patreon.
It was just slapping on a path separator to the front of the path to
make it absolute, but on windows, a path like "//foo/bar" actually
has a network "drive" of "//foo" and so that broke the test case.
Since "a:foo" is a somehow relative path on windows
(who knows how), drop any drive from the input. But dropDrive also drops
any leading path separator, making the input path relative. So now
it should be safe to slapp on a leading path separator.
This was not a good test, it broke the requirement that
relPathDirToFileAbs take absolute paths. And it failed when the two
input paths were eg, the same but differently normalized.
Replaced with some tests of the real basics of that function.
Since that can lead to data loss, which should never be enabled by an
option other than --force.
I suppose that using --trust was in some situation, safer than --force,
because it doesn't entirely disable checking for data loss, but only
disables checking involving data that is on the specified repository.
But it seems better to be able to say that data loss only happens with
--force.
This commit was sponsored by Graham Spencer on Patreon.
Since unconsidered use of trusted repositories can lead to data loss.
Trusted has always been this way, but it used to be acceptable for
git-annex to be set up so that data could be lost without using --force,
and most or all other ways that can happen have already been eliminated.
This commit was sponsored by Mark Reidenbach on Patreon.
This is conceptually very simple, just making a 1 that was hard coded be
exposed as a config option. The hard part was plumbing all that, and
dealing with complexities like reading it from git attributes at the
same time that numcopies is read.
Behavior change: When numcopies is set to 0, git-annex used to drop
content without requiring any copies. Now to get that (highly unsafe)
behavior, mincopies also needs to be set to 0. It seemed better to
remove that edge case, than complicate mincopies by ignoring it when
numcopies is 0.
This commit was sponsored by Denis Dzyubenko on Patreon.
It got broken in several ways by the streaming seeking optimisations
around version 8.20201007.
Moved time limit checking out of the matcher, which was a hack in the
first place. So everywhere that uses Limit.getMatcher needs to check
time limit. Well, almost everywhere. Command.Info uses it, but it does
not make sense to time limit getting info. And Command.MultiCast uses it
just to build up a list of files that then get passed to a command, so
it would never have hit the timeout in a useful way.
This implementation is a little more expensive when at time limit than
necessary, since it continues seeking only to discard everything after the
time limit. I did try making it close the file handles to force a faster
shutdown, but that didn't work and hung. Could certianly be improved
somehow, but seeking is probably not the expensive bit when a time limit
is hit, so this seems acceptable for now.
* add: Significantly speed up adding lots of non-large files to git,
by disabling the annex smudge filter when running git add.
* add --force-small: Run git add rather than updating the index itself,
so any other smudge filters than the annex one that may be enabled will
be used.
Split out two todos for things that were mentioned as still open items
in there. Most of the others were already dealt with. I didn't open a
new todo for the import from readonly S3 bucket because I guess if
someone needs that, they can ask for it.
Especially from borg, where the content identifier logs
all end up being the same identical file!
But also, for other imports, the location tracking logs can,
in some cases, be identical files.
Bonus optimisation: Avoid looking up (and parsing when set)
GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to.
Although the lookup does happen at startup even when no
log will be written now.
For reasons explained in the bug report.
Implemented using a persistent migration, which works fine. It may add a
little startup overhead when a remote is enabled that uses this, but
probably un-noticable.
On the next major version, it would be fine to delete this database,
and regenerate it from the git-annex branch information. Then this
change could be reverted.
Did nothing about adding back the data that got dropped from the db
due to the bug. Only the borg special remote was probably affected,
and it's not been released yet. rm -rf .git/annex/cidsdb does work.
Note that, after changing it with enableremote, syncing won't rescan
known archives in the borg repo using the changed config. Probably not a
problem?
Also used File in some places where filenames that could theoretically
start with - are passed to borg, to avoid it confusing them with
options.
Don't want to try to use these remotes as key/value remotes, which will
surely fail. It only recently became possible for importtree to be set
w/o exporttree, so before this code was ok.
(cherry picked from commit 97599cb0f7f4115aa5a3e81a91ee3d1d6c52dc84)
In cd1676d604, it stopped using that to avoid surprising behavior
when the location log and remote content were out of sync.
But, it seems that may have changed some behavior users relied on as
well, and also Remote.hasKeyCheap should be faster than checking then
location log.
So, try Remote.hasKeyCheap first, and only if it does not have the key,
fall back to checking the location log. If the location log still thinks
it's present, go ahead and try to get it, so the user will see a failure
rather than silently skipping a file what whereis says is on the remote.
This does make slightly slower the case where the remote does not have
the key, and location log and Remote.hasKeyCheap agree, since it now
checks both. But only 1 stat slower.
And vice-versa, but it's better to use '/' for portability.
Notably, standardPreferredContent contains "archive/*" and that might not
match if the filename ends up coming in with the slashes the other way
around.
I do think this was a reversion, but I have not tracked back to what
version. While involving the remote config, it's not the same class of
problems that I kept having to chase down for a while after the remote
config parser reworking.
git -c was already propagated via environment, but need this for
consistency.
Also, notice it does not use gitAnnexChildProcess to run the
transferrer. So nothing is done about avoid it taking the
pid lock. It's possible that the caller is already doing something that
took the pid lock, and if so, the transferrer will certianly fail,
since it needs to take the pid lock too. This may prevent combining
annex.stalldetection with annex.pidlock, but I have not verified it's
really a problem. If it was, it seems git-annex would have to take
the pid lock when starting a transferrer, and hold it until shutdown,
or would need to take pid lock when starting to use a transferrer,
and hold it until done with a transfer and then drop it. The latter
would require starting the transferrer with pid locking disabled for the
child process, so assumes that the transferrer does not do anyting that
needs locking when not running a transfer.
MatchingKey is not the thing to use when matching on actual worktreee
files.
Fix reversion in 8.20201116 that made include= and exclude= in
preferred/required content expressions match a path relative to the current
directory, rather than the path from the top of the repository.
* Guard against running in a repo where annex.uuid is set but
annex.version is set, or vice-versa.
* Avoid autoinit when a repo does not have annex.version or annex.uuid
set, but has a git-annex objects directory, suggesting it was used
by git-annex before.
I need to think about this some more, not clear if it's a todo
item specific to stalldetection at all. Remotes with this behavior
also show no progress when run with -J. And some other remotes don't
update any progress meters at all, eg adb is that way and so are hook
remotes and of course external remotes don't have to send progress info.
Done on unix, could not implement it on windows quite.
The signal library gets part of the way needed for windows.
But I had to open https://github.com/pmlodawski/signal/issues/1 because
it lacks raiseSignal.
Also, I don't know what the equivilant of getProcessGroupIDOf is on
windows. And System.Process does not provide a way to send any signal to
a process group except for SIGINT.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
All callers adjusted to update it themselves.
In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.
This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
This is to avoid breakage when upgrading or downgrading git-annex with a
process running that uses the interface. It's better to keep the
compatability code for a few years than worry about such breakage.
This commit was sponsored by Brett Eisenberg on Patreon.
New config annex.stalldetection, remote.name.annex-stalldetection, which
can be used to deal with remotes that stall during transfers, or are
sometimes too slow to want to use.
This commit was sponsored by Luke Shumaker on Patreon.
That seems to be the last thing needed for message serialization.
Although it's only used in the assistant currently, so hard to tell if I
forgot something.
At this point, it should be possible to start using transferkeys
when performing transfers, which will allow killing a transferkeys
process if a transfer times out or stalls. But that's for another day.
This commit was sponsored by Ethan Aubin.
Necessarily threw out the old protocol, so if an old git-annex assistant
is running, and starts a transferkeys from the new git-annex, it would
fail. But, that seems unlikely; the assistant starts up transferkeys
processes and then keeps them running. Still, may need to test that
scenario.
The new protocol is simple read/show and looks like this:
TransferRequest Download (Right "origin") (Key {keyName = "f8f8766a836fb6120abf4d5328ce8761404e437529e997aaa0363bdd4fecd7bb", keyVariety = SHA2Key (HashSize 256) (HasExt True), keySize = Just 30, keyMtime = Nothing, keyChunkSize = Nothing, keyChunkNum = Nothing}) (AssociatedFile (Just "foo"))
TransferOutput (ProgressMeter (Just 30) (MeterState {meterBytesProcessed = BytesProcessed 0, meterTimeStamp = 1.6070268727892535e9}) (MeterState {meterBytesProcessed = BytesProcessed 30, meterTimeStamp = 1.6070268728043e9}))
TransferOutput (OutputMessage "(checksum...) ")
TransferResult True
Granted, this is not optimally fast, but it seems good enough, and is
probably nearly as fast as the old protocol anyhow.
emitSerializedOutput for ProgressMeter is not yet implemented. It needs
to somehow start or update a progress meter. There may need to be a new
message that allocates a progress meter, and then have ProgressMeter
update it.
This commit was sponsored by Ethan Aubin
Avoid spurious "verification of content failed" message when downloading
content from a ssh or tor remote fails due to the remote no longer having a
copy of the content.
The P2P protocol already handled this case by sending DATA 0, followed by
VALID. But VALID was not really right, because the data is not the
requested data. So, send DATA 0, followed by INVALID. Old versions of
git-annex handle INVALID the same as VALID in this case. Now new versions
avoid displaying an incorrect message.
It would be better for the P2P protocol to have a different way to indicate
this, like perhaps sending INVALID without DATA. But that would be a
breaking change and need a new protocol verison. Since INVALID already is
part of the protocol and already needs to be handled, using it for this
special case too seems ok, and avoids the complication of another protocol
version.
This commit was sponsored by Jochen Bartl on Patreon.
This is an edge case, which happened to be triggered by the P2P protocol
seeing DATA 0. When reading 0 bytes, getting an empty string does
not mean the handle has reached EOF.
I verified there was in fact a bug, where get of an empty file followed
by another file would get the empty file and then fail
with "handle is closed". This fixes it.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Reversion introduced in version 8.20201007, one release after the 1st
release with the extension.
Surprisingly, hClose can hang if another thread is reading from the
handle. This is because it uses takeMVar.
The use of cancel here does mean that, if receiveMessageAddonProcess
or Remote.External.AsyncExtension.receiveloop allocated some resource in
a non-async-exception safe way, they might not get a chance to clean it up.
They do not appear to, and anyway, this only happens when git-annex is
shutting down, so any recource that did leak would not be a problem.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.