git-annex/doc/todo/git-remote-annex.mdwn
Joey Hess 3a38520aac
avoid interrupted push leaving remote without a manifest
Added a backup manifest key, which is used if the main manifest key is
not present. When uploading a new Manifest, it makes sure that it never
drops one key except when the other key is present.

It's entirely possible for the two manifest keys to get out of sync, due
to races. The main one wins when it's present, it is possible for the
main one being dropped to expose the backup one, which has a different
push recorded.
2024-05-20 15:41:09 -04:00


git-remote-annex will be a program that allows push/pull/clone of a git
repository to many types of git-annex special remote.
This is a redesign and reimplementation of git-remote-datalad-annex.
It will be a safer implementation, will support incremental pushes, and
will be available to users who don't use datalad.
--[[Joey]]
---
This is implemented and working. Remaining todo list for it:
* Test incremental push edge cases involving checkprereq.
* A race between an incremental push and a full push can result in
a bundle that the incremental push is based on being deleted by the full
push, and then the incremental push's manifest file being written later,
which will prevent cloning or some pulls from working.
There is no way to prevent this race, but the problem can be avoided:
Make each full push also write to a fallback manifest file that is only
written by full pushes, not incremental pushes. When fetching the main
manifest file, always check that all bundles mentioned in it are still in
the remote. If any are missing, fetch and use the fallback manifest file
instead.
(The only other solution I can think of is to never delete old bundles,
except after some amount of time long enough that it credibly avoids
the race. But since a process could be suspended at any point and resumed
later, the race window could be arbitrarily wide.)
* A race between two full pushes can also result in the manifest file listing
a bundle that has been deleted:
Start with a full push that results in manifest file M.
Then make a full push of something else. This overwrites the
manifest file, and then deletes the bundle listed in M.
At the same time, make another full push of M. This uploads the bundle
listed in M (just before the other push deletes it), and then writes
manifest file M.
Will the fallback manifest file help with this case?
* Cloning from an annex:: url with importtree=yes doesn't work
(with or without exporttree=yes). This is because the ContentIdentifier
db is not populated. It should be possible to work around this.
* It would be nice if git-annex could generate an annex:: url
for a special remote and show it to the user, eg when
they have set the shorthand "annex::" url, so they know the full url.
`git-annex info $remote` could also display it.
Currently, the user has to remember how the special remote was
configured and replicate it all in the url.
There are some difficulties in doing this, including that
RemoteConfig can have hidden fields that should be omitted.
* initremote/enableremote could have an option that sets the url of a
special remote to an annex:: url. This would make it easier to use
git-remote-annex, since the user would not need to set up the url
themselves. (Also it would then avoid setting `skipFetchAll = true`.)
* datalad-annex supports cloning from the web special remote,
using a url that contains the result of pushing to, eg, a directory
special remote.
`datalad-annex::https://example.com?type=web&url={noquery}`
Supporting something like this would be good.
* Improve behavior in push races. A race can overwrite a change
to the MANIFEST and lose work that was pushed from the other repo.
From the user's perspective, that situation is the same as if one repo
pushed new work, then the other repo did a git push --force, overwriting
the first repo's push. In the first repo, another push will then fail as
a non fast-forward, and the user can recover as usual. This is probably
okish.
But.. a MANIFEST overwrite will leave bundle files in the remote that
are not listed in the MANIFEST. It seems likely that git-annex could
detect that after the fact and clean it up. Eg, if it caches
the last MANIFEST it uploaded, next time it downloads the MANIFEST
it can check if there are bundle files in the old one that are not
in the new one. If so, it can drop those bundle files from the remote.
* A push race can also appear to the user as if they pushed a ref, but then
it got deleted from the remote. This happens when two pushes are
pushing different ref names. This might be harder for the user to
notice; git fetch does not indicate that a remote ref got deleted.
They would have to use git fetch --prune to notice the deletion.
Once the user does notice, they can re-push their ref to recover.
Can this be improved?
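
As a rough illustration of two ideas from the list above (falling back to the fallback manifest when the main manifest lists a missing bundle, and finding bundle files left behind by a MANIFEST overwrite), here is a minimal Python sketch. It is not how git-remote-annex is implemented. Manifests are modelled as plain lists of bundle key names, and `choose_manifest`, `stale_bundles` and `bundle_present` are hypothetical names made up for this example.

```python
from typing import Callable, List, Optional


def choose_manifest(
    main: Optional[List[str]],
    fallback: Optional[List[str]],
    bundle_present: Callable[[str], bool],
) -> Optional[List[str]]:
    """Pick which manifest to use when fetching.

    `main` and `fallback` are lists of bundle names, or None when that
    manifest is absent from the remote. `bundle_present` is a hypothetical
    callback that asks the remote whether a bundle still exists.
    """
    # Use the main manifest only if every bundle it mentions is still there.
    if main is not None and all(bundle_present(b) for b in main):
        return main
    # Otherwise use the fallback manifest, which only full pushes write.
    # (This sketch does not verify the fallback's bundles.)
    return fallback


def stale_bundles(last_uploaded: List[str], downloaded: List[str]) -> List[str]:
    """Bundles listed in the cached, last-uploaded manifest that the
    newly downloaded manifest no longer lists. These are candidates
    for being dropped from the remote."""
    current = set(downloaded)
    return [b for b in last_uploaded if b not in current]


if __name__ == "__main__":
    # Assumed example state: the remote holds bundles b1 and b2 only.
    on_remote = {"b1", "b2"}
    manifest = choose_manifest(["b1", "b3"], ["b1", "b2"], on_remote.__contains__)
    print("manifest to use:", manifest)
    print("droppable:", stale_bundles(["a", "b"], ["b", "c"]))
```

Note that the check in `choose_manifest` is itself racy in the same way as the pushes are: a bundle can be deleted between the presence check and the fetch, so a real implementation would still need to handle a failed bundle download.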