Added to git-annex_proxies todo because this is something OpenNeuro
would need in order to use the git-annex proxy.
Sponsored-by: Dartmouth College's OpenNeuro project
Rather than requiring the last listed bundle in the manifest include all
refs that are in the remote, build up refs from each bundle listed in
the manifest.
This fixes a bug where pushing first a new branch foo from one clone,
and then pushing a new branch bar from another clone, caused the second
push to lose branch foo. Now the second push will add a new bundle, but
the foo ref in the bundle from the first push will still be used.
Pushing a deletion of a ref now has to delete all bundles and push a new
bundle with only the remaining refs in it.
In a "list for-push", it now has to unbundle all bundles, in order for a
deletion repush to have available all objects. (And a non-deletion push
can also rely on refs/namespaces/mine/ being up-to-date.)
It would have been possible to fix the bug by only making it do that
unbundling in "list for-push", without changing what's stored in the
bundles. But I think I prefer to populate the bundles this way. For one
thing, deleting a pushed ref now really deletes all data relating to it,
rather than leaving it present in old bundles. For another, it's easier
to explain since there is no special case for the last bundle. And, it
will often result in smaller bundles.
Note that further efficiency gains are possible with respect to what
objects are included in an incremental bundle. Two XXX comments
document how to reduce excess objects. It didn't seem worth implementing
those optimisations in this proof of concept code.
Sponsored-by: Brock Spratlen on Patreon
In a situation where there are two repos that are diverged and each pushes
in turn to git-remote-annex, the first to push updates it. Then the second
push fails because it is not a fast-forward. The problem is, before git
push fails with "non-fast-forward", it actually calls git-remote-annex
with push.
So, to the user it appears as if the push failed, but it actually reached
the remote, and overwrote the other push!
The only solution to this seems to be for git-remote-annex push to notice
when a non-force-push would overwrite a ref stored in the remote, and
refuse to push that ref, returning an error to git. This seems strange,
why would git make remote helpers implement that when it later checks the
same thing itself?
With this fix, it's still possible for a race to overwrite a change to
the MANIFEST and lose work that was pushed from the other repo. But that
needs two pushes to be running at the same time. From the user's
perspective, that situation is the same as if one repo pushed new work,
then the other repo did a git push --force, overwriting the first repo's
push. In the first repo, another push will then fail as a non
fast-forward, and the user can recover as usual. But, a MANIFEST
overwrite will leave bundle files in the remote that are not listed in
the MANIFEST. It seems likely that git-annex will eventually be able to
detect that after the fact and clean it up. Eg, it can learn all bundles
that are stored in the remote using the location log, and compare them
to the MANIFEST to find bundles that got lost.
The race can also appear to the user as if they pushed a ref, but then
it got deleted from the remote. This happens when two two pushes are
pushing different ref names. This might be harder for the user to
notice; git fetch does not indicate that a remote ref got deleted.
They would have to use git fetch --prune to notice the deletion.
Once the user does notice, they can re-push their ref to recover.
Sponsored-by: Jack Hill on Patreon
This is a shell script, so not final code, and it does not use git-annex
at all, but it shows how to push to git bundles, listed in a MANIFEST,
the same as the git-remote-annex program will eventually do.
While developing this, I realized that the design needed to be changed
slightly regarding where refs are stored. Since a push can delete a ref
from a remote, storing each newly pushed ref in a bundle won't work,
because deleting a ref would then entail deleting all old bundles and
re-uploading from scratch. So instead, only the refs in the last bundle
listed in the MANIFEST are the active refs. Any refs in prior bundles
are just old refs that were stored previously (a reflog as it were).
That means that, in a situation where two different people are pushing
to the same special remote from different repos, whoever pushes last
wins. Any refs pushed by the other person earlier will be ignored. This
may not be desirable, and git-annex might be able use the git-annex
branch to detect such situations and rescue the refs that got lost. Even
without such a recovery process though, the refs that the other person
thought they pushed will be preserved in their refs/namespaces/mine, so
a pull followed by a push will generally resolve the situation.
Note that the use of refs/namespaces/mine in the bundle is not really
desirable, and it might be worth making a local clone of the repo in
order to set up the refs that will be put in the bundle. Which seems to
be the only way to avoid needing that. But it does need to maintain
the refs/namespaces/mine/ in the git repo in order to remember what refs
have been pushed to the remote before, in order to include them in the
next bundle pushed. A name that includes the remote uuid will be needed
in the final implementation.
Anyway, this shell script seems to fully work, including incremental
pushing, force pushing, and pushes that delete refs.
Sponsored-by: Brett Eisenberg on Patreon
Added rclone special remote, which can be used without needing to install
the git-annex-remote-rclone program. This needs a new version of rclone,
which supports "rclone gitannex".
This is implemented as a variant of an external special remote, that
runs "rclone gitannex" instead of the usual git-annex-remote- command.
Parameterized Remote.External to support that.
Sponsored-by: Luke T. Shumaker on Patreon
Test suite passes this time. When committing the adjusted branch, use
the old method to make a message that old git-annex can consume. Also
made the code accept the new message, so that eventually
commitTreeExactMessage can be removed.
Sponsored-by: Kevin Mueller on Patreon
This reverts commit cee12f6a2f.
This commit broke git-annex init run in a repo that was cloned from a
repo with an adjusted branch checked out.
The problem is that findAdjustingCommit was not able to identify the
commit that created the adjusted branch. It seems that there is an extra
"\n" at the end of the commit message that it does not expect.
Since backwards compatability needs to be maintained, cannot just make
findAdjustingCommit accept it with the "\n". Will have to instead
have one commitTree variant that uses the old method, and use it for
adjusted branch committing.
sync, assist, import: Allow -m option to be specified multiple times, to
provide additional paragraphs for the commit message.
The option parser didn't allow multiple -m before, so there is no risk of
behavior change breaking something that was for some reason using multiple
-m already.
Pass through to git commands, so that the method used to assemble the
paragrahs is whatever git does. Which might conceivably change in the
future.
Note that git commit-tree has supported -m since git 1.7.7. commitTree
was probably not using it since it predates that version. Since the
configure script prevents building git-annex with git older than 2.1,
there is no risk that it's not supported now.
Sponsored-by: Nicholas Golder-Manning on Patreon
While redundant concurrent transfers were already prevented in most
cases, it failed to prevent the case where two different repositories were
sending the same content to the same repository. By removing the uuid
from the transfer lock file for Download transfers, one repository
sending content will block the other one from also sending the same
content.
In order to interoperate with old git-annex, the old lock file is still
locked, as well as locking the new one. That added a lot of extra code
and work, and the plan is to eventually stop locking the old lock file,
at some point in time when an old git-annex process is unlikely to be
running at the same time.
Note that in the case of 2 repositories both doing eg
`git-annex copy foo --to origin`
the output is not that great:
copy b (to origin...)
transfer already in progress, or unable to take transfer lock
git-annex: transfer already in progress, or unable to take transfer lock
97% 966.81 MiB 534 GiB/s 0sp2pstdio: 1 failed
Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe))
Transfer failed
Perhaps that output could be cleaned up? Anyway, it's a lot better than letting
the redundant transfer happen and then failing with an obscure error about
a temp file, which is what it did before. And it seems users don't often
try to do this, since nobody ever reported this bug to me before.
(The "97%" there is actually how far along the *other* transfer is.)
Sponsored-by: Joshua Antonishen on Patreon
What this can currently be used for is only to change an url from being
used by a special remote to being used by the web remote.
This could have been a --move-from option to registerurl. But, that would
have complicated its option and --batch processing, and also would have
complicated unregisterurl, which is implemented on top of
Command.Registerurl. So, a separate command was actually less complicated
to implement.
The generic description of the command is because I want to make this
command a catch-all for other url updating kind of things, if there are
ever any more. Also because it was hard to come up with a good name for the
specific action. I considered `git-annex moveurl`, but that seems to
indicate data is perhaps actually being moved, and seems to sit at the same
level as addurl and rmurl, and this command is at the plumbing
level of registerurl and unregisterurl.
Sponsored-by: Dartmouth College's DANDI project
This needs the content to be present in order to hash it. But it's not
possible for a module used by Backend.URL to call inAnnex because that
would entail a dependency loop. So instead, rely on the fact that
Command.Migrate calls inAnnex before performing a migration.
But, Command.ExamineKey calls fastMigrate and the key may or may not
exist, and it's not wanting to actually perform a migration in any case.
To handle that, had to add an additional value to fastMigrate to
indicate whether the content is inAnnex.
Factored generateEquivilantKey out of Remote.Web.
Note that migrateFromURLToVURL hardcodes use of the SHA256E backend.
It would have been difficult not to, given all the dependency loop
issues. But --backend and annex.backend are used to tell git-annex
migrate to use VURL in any case, so there's no config knob that
the user could expect to configure that.
Sponsored-by: Brock Spratlen on Patreon
VURL is now fully working, though needs more testing.
Still need to implement verifyKeyContentIncrementally but it works
without it.
Sponsored-by: Luke T. Shumaker on Patreon
Considerable difficulty to work around an import cycle. Had to move the
list of backends (except for VURL) to Backend.Variety to VURL could use
it.
Sponsored-by: Kevin Mueller on Patreon
When downloading a VURL from the web, make sure that the equivilant key
log is populated.
Unfortunately, this does not hash the content while it's being
downloaded from the web. There is not an interface in Backend currently
for incrementally hash generation, only for incremental verification of an
existing hash. So this might add a noticiable delay, and it has to show
a "(checksum...") message. This could stand to be improved.
But, that separate hashing step only has to happen on the first download
of new content from the web. Once the hash is known, the VURL key can have
its hash verified incrementally while downloading except when the
content in the web has changed. (Doesn't happen yet because
verifyKeyContentIncrementally is not implemented yet for VURL keys.)
Note that the equivilant key log file is formatted as a presence log.
This adds a tiny bit of overhead (eg "1 ") per line over just listing the
urls. The reason I chose to use that format is it seems possible that
there will need to be a way to remove an equivilant key at some point in
the future. I don't know why that would be necessary, but it seemed wise
to allow for the possibility.
Downloads of VURL keys from other special remotes that claim urls,
like bittorrent for example, does not popilate the equivilant key log.
So for now, no checksum verification will be done for those.
Sponsored-by: Nicholas Golder-Manning on Patreon
Not yet implemented is recording hashes on download from web and
verifying hashes.
addurl --verifiable option added with -V short option because I
expect a lot of people will want to use this.
It seems likely that --verifiable will become the default eventually,
and possibly rather soon. While old git-annex versions don't support
VURL, that doesn't prevent using them with keys that use VURL. Of
course, they won't verify the content on transfer, and fsck will warn
that it doesn't know about VURL. So there's not much problem with
starting to use VURL even when interoperating with old versions.
Sponsored-by: Joshua Antonishen on Patreon
Except when a commit is made in a view, which changes metadata.
Make the assistant commit the git-annex branch after git commit of working
tree changes.
This allows using the annex.commitmessage-command in the assistant to
generate a commit message for the git-annex branch that relies on state
gathered during the commit of the working tree. Eg, it might reuse the
commit message.
Note that, when not using the assistant, a git-annex add still commits
the git-annex branch, so such a annex.commitmessage-command set up would
not work then. But if someone is using the assistant and wants
programmatic control over commit messages, this is useful. Someone not
using the assistant can get the same result by using annex.alwayscommit=false
during the git-annex add, and git-annex merge after they git commit.
pre-commit was never really intended to commit the git-annex branch
(except after recording changed metadata), but the assistant did sort of
rely on it. It does later commit the git-annex branch before pushing to
remotes, but I didn't want to risk building up lots of uncommitted changes
to it if that didn't happen frequently.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
Was doing a Git.Branch.commit for historical reasons to do with direct
mode, which no longer apply.
Note that the preCommitAnnexHook is no longer called in commitStaged
because git-annex installs a pre-commit hook that runs the pre-commit-annex
hook. And git commit will run the pre-commit hook.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
--raw-except=web allows using yt-dlp but not any other special remotes.
Currently this option can only be used once, trying to use it repeatedly
will make option parsing fail. Perhaps it ought to support being used more
than once, but it seemed like an unlikely use case to need that.
Note that getParsed is called repeatedly when the option is used with
several urls. While implementing DeferredParseClass would avoid that
innefficiency, it didn't seem worth the added boilerplate since
getParsed only calls byNameWithUUID which does minimal work.
Sponsored-by: Dartmouth College's DANDI project