* Added annex.resolvemerge configuration, which can be set to false to
disable the usual automatic merge conflict resolution done by git-annex
sync and the assistant.
* sync: Added --no-resolvemerge option.
Note that disabling merge conflict resolution is probably not a good idea
in a direct mode repo or adjusted branch. Since updates to both are done
outside the usual work tree, if it fails the tree is not left in a
conflicted state, and it would be hard to manually resolve the conflict.
Still, made annex.resolvemerge be supported in those cases for consistency.
This commit was sponsored by Riku Voipio.
When setting metadata of a file that did not exist, no error message was
displayed, unlike getting metadata and most other git-annex commands. Fixed
this oversight.
Note that, if the file exists but is not annexed, there's no error.
This is the same behavior as other git-annex commands.
This commit was supported by the NSF-funded DataLad project.
* move --to=here moves from all reachable remotes to the local repository.
The output of move --from remote is changed slightly, when the remote and
local both have the content. It used to say:
move foo ok
Now:
move foo (from theremote...) ok
That was done so that, when move --to=here is used and the content is
locally present and also in several remotes, it's clear which remotes the
content gets dropped from.
Note that move --to=here will report an error if a non-reachable remote
contains the file, even if the local repository also contains the file. I
think that's reasonable; the user may be intending to move all other copies
of the file from remotes.
OTOH, if a copy of the file is believed to be present in some repository
that is not a configured remote, move --to=here does not report an error.
So a little bit inconsistent, but erroring in this case feels wrong.
copy --to=here came along for free, but it's basically the same behavior as
git-annex get, and probably with not as good messages in edge cases
(especially on failure), so I've not documented it.
This commit was sponsored by Anthony DeRobertis on Patreon.
See my comment. This only avoids the problem for -J; two git-annex
processes started at the same time could still both try to write to
.git/config and one fail. That would be very unlikely though, and it
doesn't really seem worth adding an additional layer of locking around
.git/config.
This commit was supported by the NSF-funded DataLad project.
orElse is great, but was not the right thing to use here because
waitTakeLock could retry for other reasons than the lock being held,
which made tryTakeLock fail when it shouldn't.
Instead, move the code to tryTakeLock and implement waitTakeLock using
tryTakeLock and retry.
(Also, in runTransfer, when checkSaneLock fails, dropLock to avoid leaking a
lock handle.)
This commit was supported by the NSF-funded DataLad project.
When built with concurrent-output 1.9, ssh password prompts will no longer
interfere with the -J display.
To avoid flicker, only done when ssh actually does need to prompt;
ssh is first run in batch mode and if that succeeds the connection is up
and no need to clear regions.
This commit was supported by the NSF-funded DataLad project.
Might want to remove this when it gets fixed, in case adjusted branches are
used in a repo with a great many refs, which would become unnecessarily
slow.
This commit was supported by the NSF-funded DataLad project.
Removed dependency on MissingH, instead depending on the split
library.
After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.
Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.
This commit was sponsored by Shane-o on Patreon.
When ssh connection caching is enabled (and when GIT_ANNEX_USE_GIT_SSH is
not set), only one ssh password prompt will be made per host, and only one
ssh password prompt will be made at a time.
This also fixes a race in prepSocket's stale ssh connection stopping
when run with -J. It was possible for one thread to start a cached ssh
connection, and another thread to immediately stop it, resulting in excess
connections being made.
This commit was supported by the NSF-funded DataLad project.
It takes a single key-value backend, rather than the unncessary and confusing list.
The old option still works if set.
Simplified some old old code too.
This commit was sponsored by Thomas Hochstein on Patreon.
fsck already special-cased dead keys to make --all not report errors with
them, and it makes sense to also expand that to whereis. I think it makes
sense for dead keys to be skipped by all uses of --all, so mistakes can be
completely forgotten about and not come back to haunt us.
The speed impact of testing if the key is dead is negligible for fsck and
whereis, since they use the location log anyway and it gets cached.
This does slow down a few commands that support --all, in particular
metadata --all runs around 2x as slow. I don't think metadata
--all is often used though. It might slow down copy/move/mirror
--all and get --all.
log --all is not affected (does not use the normal --all machinery).
Dead keys will still be processed by --incomplete, --branch,
--failed, and --key. Although it would be unlikely for a dead key to
ave in incomplete or failed transfer. It seems to make perfect sense for
--branch to process keys on the branch, even if dead.
(fsck's special-casing of dead keys was left in, so if one of these options
causes a dead key to be fscked, there will be a nice message.)
This commit was supported by the NSF-funded DataLad project.
Unlike git add -u, git annex add -u does not update the index for files
removed from the working tree. But then, "git add ." stages removals,
and "git annex add ." does not, so that's an existing divergence.
Seems that --update --batch would need to run git ls-files once per line of
batch input, which would surely be too slow, so just throw an error for
that.
This commit was supported by the NSF-funded DataLad project.
This was never supported before. And it doesn't re-encrypt the
gcrypt repo to the new gcrypt-participants, but it does at least now not
crash, and set gcrypt-participants.
This commit was sponsored by andrea rota.
They were silently ignored, a reversion introduced in 6.20160527.
I don't like this regular git remote special case in enableremote, but I
can't see a way to get rid of it. So, check if the existing remote is
a Remote.Git
This commit was sponsored by Trenton Cronholm on Patreon.
This is necessary because as feared, the extra -n parameter that git-annex
passes breaks uses of these environment variables that expect exactly the
parameters that git passes.
For example, see https://github.com/datalad/datalad/issues/1456
It would of course be possible to pre-close stdin before running ssh so not
needing the -n, and I think that would not even break ssh's password
caching. But it would probably involve a lot of work, possibly would need
to deal with some layering violations, and would be error-prone. The really
clean fix would be to make all the ssh stuff return a CreateProcess, which
could have the handle closed when appropriate, but that would be a large
reworing of the code base.
This commit was supported by the NSF-funded DataLad project.
The former can be useful to make remotes that don't get fully synced with
local changes, which comes up in a lot of situations.
The latter was mostly added for symmetry, but could be useful (though less
likely to be).
Implementing `remote.<name>.annex-pull` was a bit tricky, as there's no one
place where git-annex pulls/fetches from remotes. I audited all
instances of "fetch" and "pull". A few cases were left not checking this
config:
* Git.Repair can try to pull missing refs from a remote, and if the local
repo is corrupted, that seems a reasonable thing to do even though
the config would normally prevent it.
* Assistant.WebApp.Gpg and Remote.Gcrypt and Remote.Git do fetches
as part of the setup process of a remote. The config would probably not
be set then, and having the setup fail seems worse than honoring it if it
is already set.
I have not prevented all the code that does a "merge" from merging branches
from remotes with remote.<name>.annex-pull=false. That could perhaps
be done, but it would need a way to map from branch name to remote name,
and the way refspecs work makes that hard to get really correct. So if the
user fetches manually, the git-annex branch will get merged, for example.
Anther way of looking at/justifying this is that the setting is called
"annex-pull", not "annex-merge".
This commit was supported by the NSF-funded DataLad project.
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.
So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.
Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.
(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)
This commit was sponsored by Jeff Goeke-Smith on Patreon.
Fix bug when used with a recently cloned repository, where
"merging" messages were included in the output of configlist (and perhaps
other commands) and caused a "Failed to get annex.uuid configuration"
error.
This does not seem to have been a reversion.
I saw this with configlist, but it seems possible for other commands to be
effected, and it might not always happen only after a fresh clone. Eg, if a
foo/git-annex branch is pushed to the remote, the next git-annex-shell will
auto-merge it and display the message.
Decided to run all git-annex-shell commands with noMessages,
even ones that don't currently use stdout for structured communication.
Better to keep open the possibility for using stdout in the future.
This commit was supported by the NSF-funded DataLad project
The bug was that withFile closes the handle afterwards, but the content
of the file was not read due to laziness. Using readFile avoids it.
This commit was sponsored by Nick Daly on Patreon.
findShellCommand needs a full path to a file in order to check it for a
shebang on Windows. It was being run with only the base name of the external
special remote program, which would only work when it was in the current
directory.
This is why users in
https://github.com/DanielDent/git-annex-remote-rclone/pull/10 and elsewhere
were complaining that the previous improvements to git-annex didn't make
git-remote-rclone work on Windows.
Also, reworked checkearlytermination, which while it worked, seemed
to rely on a race condition. And, improved its error messages.
This commit was sponsored by Shane-o on Patreon.
It was distributing jobs to remotes that were not being used by any other
job. But, suppose that there are only 2 remotes, and -J10. In such a case,
the first 2 downloads would be distributed amoung the 2 remotes, but
the other 8 would all go to remote #1. Improved by keeping a counter
of how many jobs are assigned to a remote, and prefer remotes with fewer
jobs.
Note use of Data.Map.Strict to avoid blowing up space. I kept the
bang-patterns as-is, although probably not needed with Data.Map.Strict.
This commit was sponsored by Jack Hill on Patreon.
The slowdown is not going to be large in typical small-ish repos.
And it does not seem to matter if the assistant reacts a little bit slower
in situations involving the expensive scan, since:
a) Those situations typically involve getting back in sync after something
has changed on a remote, often after a disconnect of some duration.
So taking a few seconds more is not noticable.
b) If the scan finds things that it needs to do, it will start
blocking anyway after 10 transfers are queued (due to use of
queueTransferWhenSmall). So, only the speed of finding the first 10
transfers will be impacted by this change.
This commit was sponsored by Jochen Bartl on Patreon.
It was relying on segmentPaths to work correctly, so when it didn't,
sometimes the file that did not exist got matched up with a non-null
list of results. Fixed by always checking if each parameter exists.
There are two reason segmentPaths might not work correctly.
For one, it assumes that when the original list of paths
has more than 100 paths, it's not worth paying the CPU cost to
preserve input orders.
And then, it fails when a directory such as "." or ".." or
/path/to/repo is in the input list, and the list of found paths
does not start with that same thing. It should probably not be using
dirContains, but something else.
But, it's not clear how to handle this fully. Consider
when [".", "subdir"] has been expanded by git ls-files to
["subdir/1", "subdir/2"]
-- Both of the inputs contained those results, so there's
no one right answer for segmentPaths. All these would be equally valid:
[["subdir/1", "subdir/2"], []]
[[], ["subdir/1", "subdir/2"]]
[["subdir/1"], [""subdir/2"]]
So I've not tried to improve segmentPaths.
* init: When annex.securehashesonly has been set with git-annex config,
copy that value to the annex.securehashesonly git config.
* config --set: As well as setting value in git-annex branch,
set local gitconfig. This is needed especially for
annex.securehashesonly, which is read only from local gitconfig and not
the git-annex branch.
doc/todo/sha1_collision_embedding_in_git-annex_keys.mdwn has the
rationalle for doing it this way. There's no perfect solution; this
seems to be the least-bad one.
This commit was supported by the NSF-funded DataLad project.
Added --securehash option to match files using a secure hash function, and
corresponding securehash preferred content expression.
This commit was sponsored by Ethan Aubin.