git-remote-annex will be a program that allows push/pull/clone of a git
repository to many types of git-annex special remote.

This is a redesign and reimplementation of git-remote-datalad-annex.
It will be a safer implementation, will support incremental pushes, and
will be available to users who don't use datalad.
--[[Joey]]

---

This is implemented and working. Remaining todo list for it:

* Test incremental push edge cases involving checkprereq.

* Cloning from an annex:: url with importtree=yes doesn't work
  (with or without exporttree=yes). This is because the ContentIdentifier
  db is not populated. It should be possible to work around this.

* It would be nice if git-annex could generate an annex:: url
  for a special remote and show it to the user, eg when
  they have set the shorthand "annex::" url, so they know the full url.
  `git-annex info $remote` could also display it.
  Currently, the user has to remember how the special remote was
  configured and replicate it all in the url.

  There are some difficulties in doing this, including that
  RemoteConfig can have hidden fields that should be omitted.

* initremote/enableremote could have an option that sets the url of a
  special remote to an annex:: url. This would make it easier to use
  git-remote-annex, since the user would not need to set up the url
  themselves. (Also it would then avoid setting `skipFetchAll = true`.)

* datalad-annex supports cloning from the web special remote,
  using a url that contains the result of pushing to eg, a directory
  special remote.
  `datalad-annex::https://example.com?type=web&url={noquery}`
  Supporting something like this would be good.

* Improve behavior in push races. A race can overwrite a change
  to the MANIFEST and lose work that was pushed from the other repo.
  From the user's perspective, that situation is the same as if one repo
  pushed new work, then the other repo did a `git push --force`, overwriting
  the first repo's push. In the first repo, another push will then fail as
  a non-fast-forward, and the user can recover as usual. This is probably
  okish.

  But a MANIFEST overwrite will leave bundle files in the remote that
  are not listed in the MANIFEST. It seems likely that git-annex could
  detect that after the fact and clean it up. Eg, if it caches
  the last MANIFEST it uploaded, next time it downloads the MANIFEST
  it can check if there are bundle files in the old one that are not
  in the new one. If so, it can drop those bundle files from the remote.
  (May be unsafe, see the section on bundle deletion problems below.)

* A push race can also appear to the user as if they pushed a ref, but then
  it got deleted from the remote. This happens when two pushes are
  pushing different ref names. This might be harder for the user to
  notice; `git fetch` does not indicate that a remote ref got deleted.
  They would have to use `git fetch --prune` to notice the deletion.
  Once the user does notice, they can re-push their ref to recover.
  Can this be improved?

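The cleanup idea in the push race item above (compare the cached MANIFEST
with the one just downloaded) could look roughly like this. It is only a
sketch: it assumes the MANIFEST is a plain list of bundle keys, one per line,
and `parse_manifest` and `orphaned_bundles` are made-up names, not git-annex
functions. Whether dropping the result is safe is a separate question, see
the section on bundle deletion problems.

```python
def parse_manifest(text):
    """Split a manifest into bundle keys.

    Assumption: one bundle key per line, blank lines ignored.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def orphaned_bundles(cached_manifest, current_manifest):
    """Bundles listed in the manifest we last uploaded that the manifest
    now on the remote no longer lists, in their original order."""
    current = set(current_manifest)
    return [b for b in cached_manifest if b not in current]


# Hypothetical bundle names, for illustration only.
cached = parse_manifest("bundle-1\nbundle-2\n")
current = parse_manifest("bundle-1\nbundle-3\n")
candidates = orphaned_bundles(cached, current)
```

Here `candidates` holds only `bundle-2`, the bundle that the overwritten
MANIFEST listed and the current one does not.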
## bundle deletion problems

Deleting bundles results in some problems involving races,
detailed below, that result in the manifest file listing a bundle that has
been deleted. That breaks cloning, and is data loss, and so *must*
be solved before release.

* A race between an incremental push and a full push can result in
  a bundle that the incremental push is based on being deleted by the full
  push, and then the incremental push's manifest file being written later.
  That will prevent cloning or some pulls from working.

  A fix: Make each full push (and emptying push) also write to a fallback
  manifest file that is only written by full pushes (and emptying pushes),
  not incremental pushes. When fetching the main manifest file, always
  check that all bundles mentioned in it are still in the remote. If any
  are missing, fetch and use the fallback manifest file instead.

* A race between two full pushes can also result in the manifest file listing
  a bundle that has been deleted:

  Start with a full push of bundle A.

  Then there are 2 racing full pushes X and Y, of bundles A and B
  respectively. The operations happen in this order:

  1. Y: write bundle B
  1. Y: read manifest (listing A)
  1. Y: write B to manifest
  1. X: write bundle A
  1. Y: delete bundle A
  1. X: read manifest (listing B)
  1. X: write A to manifest
  1. X: delete bundle B

  This results in a manifest that lists A, but that bundle was deleted.

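The interleaving of the two full pushes can be replayed against a toy model
of the remote to confirm the outcome. This is just an illustration: the
remote is modelled as a set of bundle names plus a manifest list, and the
operation names (`write_bundle`, `read_manifest`, and so on) are invented
for this sketch.

```python
def replay(steps):
    """Apply the steps in order to a toy remote and return its final state."""
    bundles = {"A"}      # state after the initial full push of bundle A
    manifest = ["A"]
    seen = {}            # what each pusher saw when it read the manifest
    for who, op, arg in steps:
        if op == "write_bundle":
            bundles.add(arg)
        elif op == "read_manifest":
            seen[who] = list(manifest)
        elif op == "write_manifest":
            manifest = [arg]          # a full push replaces the manifest
        elif op == "delete_bundle":
            bundles.discard(arg)
    return manifest, bundles, seen


RACE = [
    ("Y", "write_bundle", "B"),
    ("Y", "read_manifest", None),
    ("Y", "write_manifest", "B"),
    ("X", "write_bundle", "A"),
    ("Y", "delete_bundle", "A"),
    ("X", "read_manifest", None),
    ("X", "write_manifest", "A"),
    ("X", "delete_bundle", "B"),
]

manifest, bundles, seen = replay(RACE)
# Bundles the manifest lists that are no longer in the remote.
dangling = [b for b in manifest if b not in bundles]
print(manifest, bundles, dangling)   # prints: ['A'] set() ['A']
```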
The problems above *could* be solved by not deleting bundles, but that is
unsatisfactory.

Old bundles could be deleted after some period of time. But a process can
be suspended at any point and resumed later, so the race windows can be
arbitrarily wide.

What if only emptying pushes delete bundles? If a manifest file refers to a
bundle that has been deleted, that can be treated the same as if the
manifest file was empty, because we know that, for that bundle to have been
deleted, there must have been an emptying push. So this would work.

It is kind of a cop-out, because it requires the user to do an emptying
push from time to time. But by doing that, the user will expect that
someone who pulls at that point gets an empty repository.

Note that a race between an emptying push and a ref push will result in the
emptying push winning, so the ref push is lost. This is the same behavior
as can happen in a push race not involving deletion though, and any
improvements that are made to the UI around that will also help with this.

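The two recovery ideas in this section (the fallback manifest, and letting
only emptying pushes delete bundles) both start from the same check: does the
manifest list a bundle that is no longer in the remote? A rough sketch of
both rules follows. All function names are hypothetical, and passing the
remote's bundles in as a set is a simplification; a real implementation would
presumably have to check each bundle's presence in the special remote.

```python
def missing_bundles(manifest, remote_bundles):
    """Bundles the manifest lists that are not present in the remote."""
    return [b for b in manifest if b not in remote_bundles]


def resolve_with_fallback(main_manifest, fallback_manifest, remote_bundles):
    """Fallback rule: if the main manifest lists a missing bundle, use the
    manifest that only full and emptying pushes write."""
    if missing_bundles(main_manifest, remote_bundles):
        return fallback_manifest
    return main_manifest


def resolve_emptying_only(manifest, remote_bundles):
    """Emptying-push rule: if only emptying pushes delete bundles, a
    manifest that lists a missing bundle is treated as empty."""
    if missing_bundles(manifest, remote_bundles):
        return []
    return manifest


# Hypothetical state: an incremental push wrote a manifest based on bundle
# "full-1" after a racing full push replaced it with "full-2".
remote = {"full-2", "incr-1"}
main = ["full-1", "incr-1"]
fallback = ["full-2"]

chosen = resolve_with_fallback(main, fallback, remote)
treated_as = resolve_emptying_only(main, remote)
```

With this state, `chosen` is the fallback manifest and `treated_as` is the
empty list, matching the two behaviors described above.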