This negotiation is not supported by versions of git-annex older
than 6.20180312. Well, maybe really 6.20180227 or so, but using that in
the changelog simplifies things since it was the version for the other
changes as well.
See commit c81768d425 for the back story.
As well as allowing for future protocol improvements, this will result
in negoatiating protocol version 1, which is an improvement over default
version 0.
In fact, it looks like no supported version of git-annex will use
protocol version 0, since version 1 was introduced in 6.20180227.
Still, removing the code for version 0 seems unncessary.
See commit 31e1adc005.
Sponsored-by: Brett Eisenberg on Patreon.
* Removed support for accessing git remotes that use versions of
git-annex older than 6.20180312.
* git-annex-shell: Removed several commands that were only needed to
support git-annex versions older than 6.20180312.
(lockcontent, recvkey, sendkey, transferinfo, commit)
The P2P protocol was added in that version, and used ever since, so
this code was only needed for interop with older versions.
"git-annex-shell commit" is used by newer git-annex versions, though
unnecessarily so, because the p2pstdio command makes a single commit at
shutdown. Luckily, it was run with stderr and stdout sent to /dev/null,
and non-zero exit status or other exceptions are caught and ignored. So,
that was able to be removed from git-annex-shell too.
git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by
gcrypt special remotes accessed over ssh, so those had to be kept.
It would probably be possible to convert that to using the P2P protocol,
but it would be another multi-year transition.
Some git-annex-shell fields were able to be removed. I hoped to remove
all of them, and the very concept of them, but unfortunately autoinit
is used by git-annex sync, and gcrypt uses remoteuuid.
The main win here is really in Remote.Git, removing piles of hairy fallback
code.
Sponsored-by: Luke Shumaker
Commit 4bf7940d6b introduced this
problem, but was otherwise doing a good thing. Problem being
that fileRef "/foo" used to return ":./foo", which was actually wrong,
but as long as there was no foo in the local repository, catKey
could operate on it without crashing. After that fix though, fileRef
would return eg "../../foo", resulting in fileRef returning
":./../../foo", which will make git cat-file crash since that's
not a valid path in the repo.
Fix is simply to make fileRef detect paths outside the repo and return
Nothing. Then catKey can be skipped. This needed several bugfixes to
dirContains as well, in previous commits.
In Command.Smudge, this led to needing to check for Nothing. That case
should actually never happen, because the fileoutsiderepo check will
detect it earlier.
Sponsored-by: Brock Spratlen on Patreon
I first saw this getting with -J2 over ssh, but later saw it also
without the -J2. It was resuming, and the calulated unboundDelay was
many minutes. The first update of the meter jumped to some large value,
because of the resuming, and so it thought the BW was super fast.
Avoid by waiting until the second meter update.
Might be a good idea to also guard for the delay being many seconds
and avoid waiting. But how many? If BW is legitimately super fast, and a
remote happens to read more than a 32kb or so chunk at a time, it could
in theory download megabytes or gigabytes of data before the first meter
update. It would actually be appropriate then to delay for a long time,
if the desired BW was low. Could make up some numbers that are sane now,
but tech may improve.
(BTW, pleased to see bwlimit does work with -J. I had worried that
it might not, if the meter update happened in a different thread than
the downloading, but it's done in the same thread.)
Sponsored-by: Brett Eisenberg on Patreon
New method is much better. Avoids unrestrained transfer at the beginning
(except for the first block. Keeps right at or a few kb/s below the
configured limit, with very little varation in the actual reported bandwidth.
Removed the /s part of the config as it's not needed.
Ready to merge.
Sponsored-by: Luke Shumaker on Patreon
* When downloading urls fail, explain which urls failed for which
reasons.
* web: Avoid displaying a warning when downloading one url failed
but another url later succeeded.
Some other uses of downloadUrl use urls that are effectively internal use,
and should not all be displayed to the user on failure. Eg, Remote.Git
tries different urls where content could be located depending on how the
remote repo is set up. Exposing those urls to the user would lead to wild
goose chases. So had to parameterize it to control whether it displays urls
or not.
A side effect of this change is that when there are some youtube urls
and some regular urls, it will try regular urls first, even if the
youtube urls are listed first. This seems like an improvement if
anything, but in any case there's no defined order of urls that it's
supposed to use.
Sponsored-by: Dartmouth College's Datalad project
New --batch-keys option added to these commands: get, drop, move, copy, whereis
git-annex-matching-options had to be reworded since some of its options
can be used to match on keys, not only files.
Sponsored-by: Luke Shumaker on Patreon
And that should be all the special remotes supporting it on linux now,
except for in the odd edge case here and there.
Sponsored-by: Dartmouth College's DANDI project
Except when configuration makes curl be used. It did not seem worth
trying to tail the file when curl is downloading.
But when an interrupted download is resumed, it does not read the whole
existing file to hash it. Same reason discussed in
commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long
time with no progress being displayed. And also there's an open http
request, which needs to be consumed; taking a long time to hash the file
might cause it to time out.
Also in passing implemented it for git and external special remotes when
downloading from the web. Several others like S3 are within striking
distance now as well.
Sponsored-by: Dartmouth College's DANDI project
Added fileRetriever', which will let the remaining special remotes
eventually also support incremental verify.
Sponsored-by: Dartmouth College's DANDI project
Simply feed each chunk in turn to the incremental verifier.
When resuming an interrupted retrieve, it does not do incremental
verification. That would need to read the file, up to the resume point,
and feed it to the incremental verifier. That seems easy to get wrong.
Also it would mean extra work done before the transfer can start. Which
would complicate displaying progress, and would perhaps not appear to the
user as if it was resuming from where it left off. Instead, in that
situation, return UnVerified, and let the verification be done in a
separate pass.
Granted, Annex.CopyFile does manage all that, but it's not complicated
by dealing with chunks too.
Sponsored-by: Dartmouth College's DANDI project
I'm now reasonably sure I've identified both cases where this can
happen. v8 upgrades and certian filesystems eg NFS. Both are handled as
well as can be, though it may involve some extra checksumming work.
Dropping an object with drop --unused or dropunused will mark it as
dead, preventing fsck --all from complaining about it after it's been
dropped from all repositories.
If another repository still has a copy, it won't be treated as dead
until it's also dropped from there.
The drop has to use --unused, can't be --key or something else, because
this indicates that the user has recently ran git-annex unused. If it
checked the unused log on every drop, bad things would happen when the
unused log was out of date, eg a file used to be unused but then got
re-added. Marking such a file as dead could be confusing. When the user
uses --unused/dropunused, they must consider the unused information to be
up-to-date.
The particular workflow this enables is:
git annex add foo
git annex unannex foo
git annex unused
git annex drop --unused / dropunused
git annex fsck --all # no warnings
The docs for git-annex unannex say to use git-annex unused and dropunused,
so the user should be pointed in this direction when they want to undo an
accidental add.
Sponsored-by: Brock Spratlen on Patreon
Freeze first sets the file perms, and then runs
freezecontent-command. Thaw runs thawcontent-command before
restoring file permissions. This is in case the freeze command
prevents changing file perms, as eg setting a file immutable does.
Also, changing file perms tends to mess up previously set ACLs.
git-annex init's probe for crippled filesystem uses them, so if file perms
don't work, but freezecontent-command manages to prevent write to a file,
it won't treat the filesystem as crippled.
When the the filesystem has been probed as crippled, the hooks are not
used, because there seems to be no point then; git-annex won't be relying
on locking annex objects down. Also, this avoids them being run when the
file perms have not been changed, in case they somehow rely on
git-annex's setting of the file perms in order to work.
Sponsored-by: Dartmouth College's Datalad project
Remove closed bugs and todos that were last edited or commented before 2020.
Except for ones tagged projects/* since projects like datalad want to keep
around records of old deleted bugs longer.
Command line used:
for f in $(grep -l '|done\]\]' -- ./*.mdwn); do if ! grep -q "projects/" "$f"; then d="$(echo "$f" | sed 's/.mdwn$//')"; if [ -z "$(git log --since=01-01-2020 --pretty=oneline -- "$f")" -a -z "$(git log --since=01-01-2020 --pretty=oneline -- "$d")" ]; then git rm -- "./$f" ; git rm -rf "./$d"; fi; fi; done
for f in $(grep -l '\[\[done\]\]' -- ./*.mdwn); do if ! grep -q "projects/" "$f"; then d="$(echo "$f" | sed 's/.mdwn$//')"; if [ -z "$(git log --since=01-01-2020 --pretty=oneline -- "$f")" -a -z "$(git log --since=01-01-2020 --pretty=oneline -- "$d")" ]; then git rm -- "./$f" ; git rm -rf "./$d"; fi; fi; done
Eg, before with a .gitattributes like:
*.2 annex.numcopies=2
*.1 annex.numcopies=1
And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2
would succeed, leaving just 1 copy, despite foo.2 needing 2 copies.
It dropped foo.1 first and then skipped foo.2 since its content was gone.
Now that the keys database includes locked files, this longstanding wart
can be fixed.
Sponsored-by: Noam Kremen on Patreon
Most of this is just refactoring. But, handleDropsFrom
did not verify that associated files from the keys db were still
accurate, and has now been fixed to.
A minor improvement to this would be to avoid calling catKeyFile
twice on the same file, when getting the numcopies and mincopies value,
in the common case where the same file has the highest value for both.
But, it avoids checking every associated file, so it will scale well to
lots of dups already.
Sponsored-by: Kevin Mueller on Patreon
When the log has an activity that is not known, eg added by a future
version of git-annex, it used to be treated as no activity at all,
which would make git-annex expire think it should expire the repository,
despite it having some kind of recent activity.
Hopefully there will be no reason to add a new activity until enough
time has passed that this commit is in use everywhere.
Sponsored-by: Jake Vosloo on Patreon
This will mostly just avoid a DB lookup, so things get marginally
faster. But in cases where there are many files using the same key, it
can be a more significant speedup.
Added overhead is one MVar lookup per call, which should be small
enough, since this happens after transferring or ingesting a file,
which is always a lot more work than that. It would be nice, though,
to move getGitConfig to AnnexRead, which there is an open todo about.
Clear visible progress bar first.
Removed showSideActionAfter because it can't be used in reconcileStaged
(import loop). Instead, it counts the number of files it
processes and displays it after it's seen a sufficient to know it's
taking a while.
Sponsored-by: Dartmouth College's Datalad project
This makes git checkout and git merge hooks do the work to catch up with
changes that they made to the tree. Rather than doing it at some later
point when the user is not thinking about that past operation.
Sponsored-by: Dartmouth College's Datalad project
reconcileStaged was doing a redundant scan to scannAnnexedFiles.
It would probably make sense to move the body of scannAnnexedFiles
into reconcileStaged, the separation does not really serve any purpose.
Sponsored-by: Dartmouth College's Datalad project
Commit 428c91606b made it need to do more
work in situations like switching between very different branches.
Compare with seekFilteredKeys which has a similar optimisation. Might be
possible to factor out the common part from these?
Sponsored-by: Dartmouth College's Datalad project