git-annex/doc/internals/git-remote-annex.mdwn
Joey Hess 8b56d6b283
fix conflicting push situation
In a situation where two repos have diverged and each pushes in turn to
git-remote-annex, the first push updates the remote. Then the second push
fails because it is not a fast-forward. The problem is that, before git
push fails with "non-fast-forward", it has already called git-remote-annex
with the push command.

So, to the user it appears as if the push failed, but it actually reached
the remote, and overwrote the other push!

The only solution to this seems to be for git-remote-annex push to notice
when a non-force push would overwrite a ref stored in the remote, and
refuse to push that ref, returning an error to git. This seems strange:
why would git make remote helpers implement that check when it later
checks the same thing itself?
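
A minimal sketch of that check follows. It is not git-annex's code: `check_push`, `is_ancestor`, and the `local_refs` mapping (source ref to already-resolved commit) are hypothetical names. Each spec has the `[+]<src>:<dst>` form that git sends to a remote helper's `push` command, and the replies use the `ok <dst>` / `error <dst> <why>` forms from the remote-helper protocol. The ancestry test uses `git merge-base --is-ancestor`.

```python
import subprocess
from typing import Callable, Dict, List, Tuple


def is_ancestor(old: str, new: str) -> bool:
    """True if commit `old` is an ancestor of `new` in the local repository."""
    # Exit status 0 means "is an ancestor". Any other status (1 = not an
    # ancestor, or an error such as `old` being unknown locally) counts as "no".
    return subprocess.run(["git", "merge-base", "--is-ancestor", old, new]).returncode == 0


def check_push(specs: List[str],
               remote_refs: Dict[str, str],
               local_refs: Dict[str, str],
               ancestor: Callable[[str, str], bool] = is_ancestor,
               ) -> Tuple[List[str], Dict[str, str]]:
    """Decide each push spec; return helper replies and the updated remote refs."""
    updated = dict(remote_refs)
    replies = []
    for spec in specs:
        force = spec.startswith("+")  # "+" marks a forced update
        src, dst = (spec[1:] if force else spec).split(":", 1)
        if not src:
            # An empty <src> asks to delete the remote ref.
            updated.pop(dst, None)
            replies.append(f"ok {dst}")
            continue
        new = local_refs[src]
        old = updated.get(dst)
        if old is not None and old != new and not force and not ancestor(old, new):
            # Refuse instead of silently overwriting a ref another repo pushed.
            replies.append(f"error {dst} non-fast-forward")
            continue
        updated[dst] = new
        replies.append(f"ok {dst}")
    return replies, updated
```

The ancestry check is passed in as a parameter, so the decision logic can be exercised without a real repository.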

With this fix, it's still possible for a race to overwrite a change to
the MANIFEST and lose work that was pushed from the other repo. But that
needs two pushes to be running at the same time. From the user's
perspective, that situation is the same as if one repo pushed new work,
then the other repo did a git push --force, overwriting the first repo's
push. In the first repo, another push will then fail as a
non-fast-forward, and the user can recover as usual. But a MANIFEST
overwrite will leave bundle files in the remote that are not listed in
the MANIFEST. It seems likely that git-annex will eventually be able to
detect that after the fact and clean it up. E.g., it can use the location
log to learn which bundles are stored in the remote, and compare them
to the MANIFEST to find bundles that got lost.
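
The comparison itself is a set difference, sketched below. The helper `find_lost_bundles` is hypothetical, and the location-log lookup is out of scope here. The keys the remote is known to store are taken as input.

```python
from typing import Iterable, List, Set


def find_lost_bundles(bundles_in_remote: Iterable[str], manifest: List[str]) -> Set[str]:
    """Bundle keys the remote stores but the MANIFEST no longer lists."""
    listed = set(manifest)
    # Only GITBUNDLE keys matter; the remote also stores GITMANIFEST objects.
    return {key for key in bundles_in_remote
            if key.startswith("GITBUNDLE--") and key not in listed}
```

One caution applies to any cleanup built on this. A push uploads its bundle before it updates the manifest. So a bundle belonging to a push that is still in progress also shows up as "lost" until that push finishes.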

The race can also appear to the user as if they pushed a ref, but then
it got deleted from the remote. This happens when the two pushes are
pushing different ref names. This might be harder for the user to
notice, since git fetch does not indicate that a remote ref got deleted.
They would have to use git fetch --prune to notice the deletion.
Once the user does notice, they can re-push their ref to recover.

Sponsored-by: Jack Hill on Patreon
2024-04-26 15:03:04 -04:00

This adds two new object types to git-annex, GITMANIFEST and GITBUNDLE.

GITMANIFEST--$UUID is the manifest for a git repository stored in the
git-annex repository with that UUID.

GITBUNDLE--sha256 is a git bundle, named by the SHA-256 hash of its content.
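
The two key forms can be built as in the sketch below. The function names are hypothetical. The sketch also assumes that the `sha256` part of a GITBUNDLE key is the lowercase hex digest of the bundle file's bytes.

```python
import hashlib


def manifest_key(uuid: str) -> str:
    """Key of the manifest for the repository stored in the remote with this UUID."""
    return f"GITMANIFEST--{uuid}"


def bundle_key(data: bytes) -> str:
    """Key of a git bundle, assuming it is named by the hex SHA-256 of its bytes."""
    return f"GITBUNDLE--{hashlib.sha256(data).hexdigest()}"


def bundle_key_for_file(path: str) -> str:
    """Same as bundle_key, but reads the bundle file in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return f"GITBUNDLE--{h.hexdigest()}"
```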
# format of the manifest file
An ordered list of bundle keys, one per line.
The last bundle in the list provides all refs that are currently stored in
the repository. The bundles before it in the list can incrementally provide
objects, but not refs.
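
A sketch of reading and writing this format follows. The function names are hypothetical. Skipping blank lines and stripping surrounding whitespace are assumptions, since the format above does not say how to treat them.

```python
from typing import List, Optional, Tuple


def parse_manifest(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (bundle keys in manifest order, key of the bundle that provides the refs)."""
    # Assumption: blank lines and surrounding whitespace are ignored.
    keys = [line.strip() for line in text.splitlines() if line.strip()]
    # Only the last bundle provides refs; earlier ones only provide objects.
    return keys, (keys[-1] if keys else None)


def render_manifest(keys: List[str]) -> str:
    """Serialize bundle keys back to the one-key-per-line format."""
    return "".join(key + "\n" for key in keys)
```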
# fetching
1. download GITMANIFEST for the uuid of the special remote
2. download each listed GITBUNDLE object that we don't have
3. `git bundle unpack` each bundle in order
4. `git fetch` from the last bundle listed in the manifest
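
The fetch steps can be planned as plain data, as in the sketch below. `plan_fetch`, the `bundle_path` callback (which maps a key to a local file), and the optional `refspecs` are hypothetical. Downloading and running the commands are left to the caller.

```python
from typing import Callable, List, Sequence, Set, Tuple


def plan_fetch(manifest: List[str], have: Set[str],
               bundle_path: Callable[[str], str],
               refspecs: Sequence[str] = ()) -> Tuple[List[str], List[List[str]]]:
    """Return (bundle keys to download, git commands to run), following the fetch steps."""
    # Step 2: only download bundles not already present locally.
    downloads = [key for key in manifest if key not in have]
    # Step 3: unpack every bundle in manifest order; later incremental
    # bundles can build on objects provided by earlier ones.
    commands = [["git", "bundle", "unpack", bundle_path(key)] for key in manifest]
    # Step 4: fetch from the last bundle, the only one that provides refs.
    if manifest:
        commands.append(["git", "fetch", bundle_path(manifest[-1]), *refspecs])
    return downloads, commands
```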
# pushing (incrementally)
1. create a git bundle containing all refs that will be stored in the
   repository, and the objects added since the previously pushed refs
2. hash to calculate GITBUNDLE key
3. upload GITBUNDLE object
4. download current manifest
5. append GITBUNDLE key to manifest, and upload the updated manifest
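
Steps 1 and 5 are sketched below, under some assumptions. The function names are hypothetical, and `previously_pushed` is taken to hold the commit ids of the refs from the last push. `git bundle create` accepts rev-list arguments, so a `^<rev>` argument leaves out everything already reachable from an earlier push.

```python
from typing import List, Sequence


def incremental_bundle_command(bundle_file: str, refs: Sequence[str],
                               previously_pushed: Sequence[str]) -> List[str]:
    """Step 1: bundle the refs, excluding objects reachable from earlier pushes."""
    # "^<rev>" is a rev-list exclusion, which is what makes the bundle incremental.
    return ["git", "bundle", "create", bundle_file,
            *refs, *("^" + rev for rev in previously_pushed)]


def append_to_manifest(manifest: List[str], new_key: str) -> List[str]:
    """Step 5: the new bundle goes last, since the last bundle provides the refs."""
    # Earlier entries keep their order: later bundles may need their objects.
    return manifest + [new_key]
```

Appending without reordering matters here. Moving an existing entry to the end could place an incremental bundle before the bundle that supplies its prerequisite objects.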
# pushing (replacing incrementals with single bundle)
1. create a git bundle containing all refs stored in the repository, and
   all objects
2. hash to calculate GITBUNDLE key
3. upload GITBUNDLE object
4. download old manifest
5. upload new manifest listing only the single new GITBUNDLE
6. delete all other GITBUNDLEs that were listed in the old manifest
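
The manifest bookkeeping for this kind of push is sketched below. `replace_manifest` is a hypothetical name. The new manifest is uploaded before the old bundles are deleted. Barring a concurrent push, that order means the remote's manifest never lists a bundle that is already gone.

```python
from typing import List, Tuple


def replace_manifest(old_manifest: List[str], new_key: str) -> Tuple[List[str], List[str]]:
    """Return (new manifest, bundle keys to delete) for a single-bundle push."""
    # The single new bundle provides all refs and all objects.
    new_manifest = [new_key]
    # Delete every other listed bundle (each once), but never the new one,
    # which matters if the same bundle was already listed.
    to_delete = list(dict.fromkeys(k for k in old_manifest if k != new_key))
    return new_manifest, to_delete
```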
# multiple GITMANIFEST files
Usually there will only be one per special remote, but it's possible for
multiple special remotes to point to the same object storage, and if so
multiple GITMANIFEST objects can be stored.

It follows that the UUID of the special remote has to be included in the
annex:// URI, so that git-remote-annex knows which GITMANIFEST to use when
cloning from it.
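
The sketch below shows how the UUID in the URI selects a manifest. The layout is an assumption: this document does not specify the annex:// URI syntax, so the sketch simply takes the UUID to be the URI's authority part, as in `annex://<uuid>`. `manifest_key_for_uri` is a hypothetical name.

```python
from urllib.parse import urlsplit


def manifest_key_for_uri(uri: str) -> str:
    """Pick the GITMANIFEST key named by an annex:// URI (hypothetical layout)."""
    parts = urlsplit(uri)
    # Assumption: the special remote's UUID is the authority, annex://<uuid>.
    if parts.scheme != "annex" or not parts.netloc:
        raise ValueError(f"not an annex:// URI with a UUID: {uri!r}")
    return "GITMANIFEST--" + parts.netloc
```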