git-remote-annex will be a program that allows push/pull/clone of a git
repository to many types of git-annex special remotes.

This is a redesign and reimplementation of git-remote-datalad-annex.
It will be a safer implementation, will support incremental pushes, and
will be available to users who don't use datalad.

--[[Joey]]

---

This is implemented and working. Remaining todo list for it:

* A race between an incremental push and a full push can result in
  a bundle that the incremental push is based on being deleted by the full
  push, and the incremental push's manifest file then being written afterwards.
  That prevents cloning, and some pulls, from working.

  There is no way to prevent this race, but the problem can be avoided:
  make each full push also write to a fallback manifest file that is only
  written by full pushes, not incremental pushes. When fetching the main
  manifest file, always check that all bundles mentioned in it are still in
  the remote. If any are missing, fetch and use the fallback manifest file
  instead.

  (The only other solution I can think of is to never delete old bundles,
  except after an amount of time long enough to credibly avoid
  the race. But since a process could be suspended at any point and resumed
  later, the race window could be arbitrarily wide.)

* Test incremental push edge cases involving checkprereq.

* Cloning from an annex:: url with importtree=yes doesn't work
  (with or without exporttree=yes). This is because the ContentIdentifier
  db is not populated. It should be possible to work around this.

* See XXX in uploadManifest about recovering from a situation
  where the remote is left with a deleted manifest when a push
  is interrupted partway through. This should be recoverable
  by caching the manifest locally and, when the remote has no
  manifest, either re-uploading it or prompting the user to merge and re-push.

* It would be nice if git-annex could generate an annex:: url
  for a special remote and show it to the user, eg when
  they have set the shorthand "annex::" url, so they know the full url.
  `git-annex info $remote` could also display it.
  Currently, the user has to remember how the special remote was
  configured and replicate it all in the url.

  There are some difficulties with doing this, including that
  RemoteConfig can have hidden fields that should be omitted.

* initremote/enableremote could have an option that configures the url of a
  special remote to be an annex:: url. This would make it easier to use
  git-remote-annex, since the user would not need to set up the url
  themselves. (It could then also avoid setting `skipFetchAll = true`.)

* datalad-annex supports cloning from the web special remote,
  using a url that points to the result of pushing to, eg, a directory
  special remote:

  `datalad-annex::https://example.com?type=web&url={noquery}`

  Supporting something like this would be good.

* Improve behavior in push races. A race can overwrite a change
  to the MANIFEST and lose work that was pushed from the other repo.
  From the user's perspective, that situation is the same as if one repo
  pushed new work, then the other repo did a `git push --force`, overwriting
  the first repo's push. In the first repo, another push will then fail as
  a non-fast-forward, and the user can recover as usual. This is probably
  okish.

  But a MANIFEST overwrite will leave bundle files in the remote that
  are not listed in the MANIFEST. It seems likely that git-annex could
  detect that after the fact and clean them up. Eg, if it caches
  the last MANIFEST it uploaded, then the next time it downloads the MANIFEST
  it can check whether any bundle files listed in the cached one are missing
  from the new one. If so, it can drop those bundle files from the remote.

* A push race can also appear to the user as if they pushed a ref, but then
  it got deleted from the remote. This happens when the two racing pushes
  push different refs. This might be harder for the user to
  notice; `git fetch` does not indicate that a remote ref got deleted.
  They would have to use `git fetch --prune` to notice the deletion.
  Once the user does notice, they can re-push their ref to recover.
  Can this be improved?
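Two of the manifest ideas in the list above can be sketched as pure Haskell functions. The first uses a fallback manifest when the main one lists missing bundles. The second finds the bundles orphaned by a MANIFEST overwrite. This is a minimal sketch under stated assumptions, not git-annex's code. `BundleKey`, `Manifest`, `chooseManifest`, and `orphanedBundles` are hypothetical names, and a manifest is reduced to its list of bundle keys. Checking whether the remote still has a bundle is modelled as a pure predicate rather than as an IO action against the special remote.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical stand-in for the key of a git bundle stored in the remote.
newtype BundleKey = BundleKey String
  deriving (Eq, Ord, Show)

-- Hypothetical, simplified manifest: only the bundle keys it lists.
newtype Manifest = Manifest { manifestBundles :: [BundleKey] }
  deriving (Eq, Show)

-- Decide which manifest a fetch or clone should use. The main manifest is
-- written by every push; the fallback manifest is assumed to be written only
-- by full pushes. If the main manifest lists a bundle that is no longer in
-- the remote, a full push probably raced an incremental push, so fall back.
chooseManifest
  :: (BundleKey -> Bool)  -- does the remote still contain this bundle?
  -> Manifest             -- main manifest
  -> Maybe Manifest       -- fallback manifest, if it could be downloaded
  -> Either String Manifest
chooseManifest present mainManifest fallback
  | complete mainManifest = Right mainManifest
  | otherwise =
      case fallback of
        Just fb
          | complete fb -> Right fb
          | otherwise -> Left "fallback manifest also lists missing bundles"
        Nothing -> Left "manifest lists missing bundles and there is no fallback"
  where
    complete = all present . manifestBundles

-- After a push race overwrote the manifest: the bundles listed in the
-- manifest this repository last uploaded (cached locally) that the manifest
-- now in the remote no longer lists. These are candidates to drop.
orphanedBundles :: Manifest -> Manifest -> Set BundleKey
orphanedBundles lastUploaded current =
  Set.fromList (manifestBundles lastUploaded)
    `Set.difference` Set.fromList (manifestBundles current)
```

The sketch takes the presence check as a function argument, which keeps it easy to test (for example in GHCi). Inside git-annex that check would query the special remote, so `all` would become a monadic equivalent.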
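For the item about generating an annex:: url, the step of leaving out hidden fields can be sketched as follows. This sketch assumes, without checking git-annex's url parser, that the url is the special remote's UUID followed by its configuration as a query string. The percent-encoding rule is also an assumption. `Config`, `annexUrl`, and `percentEncode` are hypothetical. Deciding which RemoteConfig fields count as hidden is left to a caller-supplied predicate, since that is exactly the open difficulty the item describes.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord)
import Data.List (intercalate)
import qualified Data.Map as Map
import Text.Printf (printf)

-- Hypothetical, simplified stand-in for RemoteConfig: field -> value.
type Config = Map.Map String String

-- Keep RFC 3986 unreserved characters and '/', which is legal in a query;
-- percent-encode everything else. Assumes ASCII input: a non-ASCII Char is
-- not UTF-8 encoded here.
percentEncode :: String -> String
percentEncode = concatMap enc
  where
    enc c
      | isAsciiLower c || isAsciiUpper c || isDigit c || c `elem` "-._~/" = [c]
      | otherwise = printf "%%%02X" (ord c)

-- Build an annex:: url for a special remote, leaving out every field that
-- the isHidden predicate marks as hidden. The url layout is an assumption.
annexUrl :: (String -> Bool) -> String -> Config -> String
annexUrl isHidden uuid cfg
  | null params = "annex::" ++ uuid
  | otherwise = "annex::" ++ uuid ++ "?" ++ intercalate "&" params
  where
    params =
      [ percentEncode k ++ "=" ++ percentEncode v
      | (k, v) <- Map.toList cfg
      , not (isHidden k)
      ]
```

Because `Map.toList` returns fields in key order, the generated url is deterministic. That would matter if git-annex displayed it in `git-annex info` output.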