git-remote-annex will be a program that allows push/pull/clone of a git
repository to many types of git-annex special remote.

This is a redesign and reimplementation of git-remote-datalad-annex.
It will be a safer implementation, will support incremental pushes, and
will be available to users who don't use datalad.
--[[Joey]]

---

This is implemented and working. Remaining todo list for it:

* Test incremental push edge cases involving checkprereq.

* A race between an incremental push and a full push can result in
  a bundle that the incremental push is based on being deleted by the full
  push, and then the incremental push's manifest file being written later.
  That will prevent cloning or some pulls from working.

  There is no way to prevent this race, but the problem can be avoided:
  Make each full push also write to a fallback manifest file that is only
  written by full pushes, not incremental pushes. When fetching the main
  manifest file, always check that all bundles mentioned in it are still in
  the remote. If any are missing, fetch and use the fallback manifest file
  instead.

  (The only other solution I can think of is to never delete old bundles,
  except after some amount of time long enough that it credibly avoids
  the race. But since a process could be suspended at any point and resumed
  later, the race window could be arbitrarily wide.)

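A minimal sketch of the fallback manifest check described above, in Python rather than git-annex's own code. It assumes the remote can be modeled as a dict from key name to content, and that a manifest is a list of bundle key names. The names `choose_manifest`, `manifest` and `manifest.fallback` are made up for illustration.

```python
def choose_manifest(remote, main_key="manifest", fallback_key="manifest.fallback"):
    """Return the list of bundle keys that a fetch should use.

    `remote` is a dict standing in for the special remote (an assumption
    of this sketch): manifest keys map to lists of bundle key names, and
    a bundle is present if its key is in the dict.
    """
    main = remote.get(main_key, [])
    # Use the main manifest only if every bundle it lists still exists.
    if all(bundle in remote for bundle in main):
        return main
    # Otherwise use the manifest that only full pushes write.
    return remote.get(fallback_key, [])
```

Here the fallback is only consulted when a listed bundle is missing, so the common case costs one presence check per bundle.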
* A race between two full pushes can also result in the manifest file listing
  a bundle that has been deleted:

  Start with a full push of bundle A.

  Then there are 2 racing full pushes X and Y, of bundles A and B
  respectively, with this series of operations:

  1. Y: write bundle B
  1. Y: read manifest (listing A)
  1. Y: write B to manifest
  1. X: write bundle A
  1. Y: delete bundle A
  1. X: read manifest (listing B)
  1. X: write A to manifest
  1. X: delete bundle B

  The result is a manifest that lists A, although that bundle was deleted.

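The interleaving above can be replayed against an in-memory stand-in for the remote to confirm the end state. This is only a sketch: the set of bundles and the manifest list are assumptions about how to model the remote, and each full push is assumed to delete the bundles it saw in the manifest it read, other than its own.

```python
def replay_full_push_race():
    """Replay the 8 steps in order and return (manifest, bundles)."""
    bundles = {"A"}      # state after the initial full push of bundle A
    manifest = ["A"]

    bundles.add("B")                  # 1. Y: write bundle B
    seen_by_y = list(manifest)        # 2. Y: read manifest (listing A)
    manifest = ["B"]                  # 3. Y: write B to manifest
    bundles.add("A")                  # 4. X: write bundle A
    for old in seen_by_y:             # 5. Y: delete bundle A
        if old != "B":
            bundles.discard(old)
    seen_by_x = list(manifest)        # 6. X: read manifest (listing B)
    manifest = ["A"]                  # 7. X: write A to manifest
    for old in seen_by_x:             # 8. X: delete bundle B
        if old != "A":
            bundles.discard(old)
    return manifest, bundles


manifest, bundles = replay_full_push_race()
missing = [b for b in manifest if b not in bundles]
print(manifest, bundles, missing)
```

In this model, `missing` ends up containing A: the manifest lists it, but no bundle is left in the remote.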
* Cloning from an annex:: url with importtree=yes doesn't work
  (with or without exporttree=yes). This is because the ContentIdentifier
  db is not populated. It should be possible to work around this.

* It would be nice if git-annex could generate an annex:: url
  for a special remote and show it to the user, eg, when
  they have set the shorthand "annex::" url, so they know the full url.
  `git-annex info $remote` could also display it.
  Currently, the user has to remember how the special remote was
  configured and replicate it all in the url.

  There are some difficulties with doing this, including that
  RemoteConfig can have hidden fields that should be omitted.

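As a rough illustration of generating such a url while leaving out hidden fields, here is a Python sketch. It assumes the full url has the form `annex::` followed by the remote's UUID and the configuration as url-encoded query parameters; that form, the function name `annex_url`, and the hidden field name are all assumptions of this sketch, not git-annex's actual behavior or field list.

```python
from urllib.parse import quote

# Placeholder only; not git-annex's actual list of hidden fields.
HIDDEN_FIELDS = {"some-hidden-field"}


def annex_url(uuid, config, hidden=HIDDEN_FIELDS):
    """Build an annex:: url from a remote's config, omitting hidden fields.

    `config` is a dict of configuration names to values, standing in
    for RemoteConfig.
    """
    params = "&".join(
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in config.items()
        if name not in hidden
    )
    return f"annex::{uuid}?{params}"
```

The hard part noted above remains: deciding which fields belong in the hidden set is not something this sketch can answer.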
* initremote/enableremote could have an option that sets the url of a
  special remote to an annex:: url. This would make it easier to use
  git-remote-annex, since the user would not need to set up the url
  themselves. (Also it would then avoid setting `skipFetchAll = true`.)

* datalad-annex supports cloning from the web special remote,
  using a url that contains the result of pushing to eg, a directory
  special remote.
  `datalad-annex::https://example.com?type=web&url={noquery}`
  Supporting something like this would be good.

* Improve behavior in push races. A race can overwrite a change
  to the MANIFEST and lose work that was pushed from the other repo.
  From the user's perspective, that situation is the same as if one repo
  pushed new work, then the other repo did a `git push --force`, overwriting
  the first repo's push. In the first repo, another push will then fail as
  a non-fast-forward, and the user can recover as usual. This is probably
  okish.

  But a MANIFEST overwrite will leave bundle files in the remote that
  are not listed in the MANIFEST. It seems likely that git-annex could
  detect that after the fact and clean it up. Eg, if it caches
  the last MANIFEST it uploaded, next time it downloads the MANIFEST
  it can check if there are bundle files in the old one that are not
  in the new one. If so, it can drop those bundle files from the remote.

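The cached MANIFEST comparison amounts to a set difference. A small Python sketch, assuming a manifest can be treated as a list of bundle key names; the function name is hypothetical:

```python
def stale_bundles(cached_manifest, downloaded_manifest):
    """List bundles from the last uploaded MANIFEST that the
    downloaded MANIFEST no longer mentions, keeping their order."""
    current = set(downloaded_manifest)
    return [bundle for bundle in cached_manifest if bundle not in current]


# The cached MANIFEST listed A and B; another repo's push overwrote
# it with one that lists A and C, so B is a candidate for dropping.
print(stale_bundles(["A", "B"], ["A", "C"]))
```

This only finds the candidates. Whether dropping them is safe while another push may still be in progress is a separate question.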
* A push race can also appear to the user as if they pushed a ref, but then
  it got deleted from the remote. This happens when two pushes are
  pushing different ref names. This might be harder for the user to
  notice; `git fetch` does not indicate that a remote ref got deleted.
  They would have to use `git fetch --prune` to notice the deletion.
  Once the user does notice, they can re-push their ref to recover.
  Can this be improved?