preparing to merge git-remote-annex
Update its todo with remaining items. Add changelog entry. Simplified internals document to no longer be notes to myself, but target users who want to understand how the data is stored and might want to extract these repos manually. Sponsored-by: Kevin Mueller on Patreon
parent 4d0543932e
commit dfb09ad1ad
5 changed files with 116 additions and 54 deletions
|
@ -1,5 +1,8 @@
|
|||
git-annex (10.20240431) UNRELEASED; urgency=medium
|
||||
|
||||
* git-remote-annex: New program which allows pushing a git repo to a
|
||||
git-annex special remote, and cloning from a special remote.
|
||||
(Based on Michael Hanke's git-remote-datalad-annex.)
|
||||
* Typo fixes.
|
||||
Thanks, Yaroslav Halchenko
|
||||
|
||||
|
|
|
@ -49,5 +49,5 @@ problem:
|
|||
[[fairly simple shell script using standard tools|tips/Decrypting_files_in_special_remotes_without_git-annex]]
|
||||
(gpg and openssl) can decrypt files stored on such
|
||||
a remote, as long as you have access to the encryption keys (which
|
||||
are stored in the git-annex branch of the repository, sometimes
|
||||
encrypted with your gpg key).
|
||||
for some types of encryption are stored in the git-annex branch of
|
||||
the repository, sometimes encrypted with your gpg key).
|
||||
|
|
|
@ -2,6 +2,8 @@ In the world of git, we're not scared about internal implementation
|
|||
details, and sometimes we like to dive in and tweak things by hand. Here's
|
||||
some documentation to that end.
|
||||
|
||||
[[!toc ]]
|
||||
|
||||
## The .git/ directory
|
||||
|
||||
### `.git/annex/objects/aa/bb/*/*`
|
||||
|
@ -364,3 +366,8 @@ of actual annexed files.
|
|||
|
||||
These trees are recorded in history of the git-annex branch, but the
|
||||
head of the git-annex branch will never contain them.
|
||||
|
||||
## Other internals documentation
|
||||
|
||||
* [[git-remote-annex]] documents how git repositories are stored
|
||||
on special remotes when using git with "annex::" urls.
|
||||
|
|
|
@ -1,3 +1,6 @@
|
|||
The [[git-remote-annex|/git-remote-annex]] command allows pushing a git
|
||||
repository to a special remote, and later cloning from it.
|
||||
|
||||
This adds two new key types to git-annex, GITMANIFEST and a GITBUNDLE.
|
||||
|
||||
GITMANIFEST--$UUID is the manifest for a git repository stored in the
|
||||
|
@ -11,44 +14,26 @@ An ordered list of bundle keys, one per line.
|
|||
|
||||
(Lines end with unix `"\n"`, not `"\r\n"`.)
|
||||
|
||||
# fetching
|
||||
|
||||
1. download GITMANIFEST for the uuid of the special remote
|
||||
2. download each listed GITBUNDLE key that we don't have
|
||||
3. `git fetch` from each new bundle in order
|
||||
(note that later bundles can update refs from the versions in previous
|
||||
bundles)
|
||||
|
||||
# pushing (incrementally)
|
||||
|
||||
This is how pushes are usually done.
|
||||
|
||||
1. create git bundle of all refs that are being pushed and have changed,
|
||||
and objects since the previously pushed refs
|
||||
2. hash to calculate GITBUNDLE key
|
||||
3. upload GITBUNDLE key
|
||||
4. download current manifest
|
||||
5. append GITBUNDLE key to manifest
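The manifest update in steps 4 and 5 can be sketched as below. This is only an illustration, assuming the manifest has already been downloaded as text; `append_bundle` and the key strings used with it are hypothetical names, not part of git-annex.

```python
def append_bundle(manifest_text, bundle_key):
    """Return the manifest text with bundle_key appended as the last line.

    The manifest is an ordered list of bundle keys, one per line,
    with unix "\n" line endings.
    """
    # Split on "\n" only, since "\r\n" is not a valid manifest line ending.
    keys = [line for line in manifest_text.split("\n") if line]
    keys.append(bundle_key)
    return "".join(key + "\n" for key in keys)
```

Order matters here: the new key goes last so that a later fetch applies its refs after those of the older bundles.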
|
||||
|
||||
# pushing (full)
|
||||
|
||||
Note that this can be used to replace incrementals with a single bundle for
|
||||
performance. It is also the only way to handle a push that deletes a
|
||||
previously pushed ref.
|
||||
|
||||
1. create git bundle containing all refs stored in the repository, and all
|
||||
objects
|
||||
2. hash to calculate GITBUNDLE key name
|
||||
3. upload GITBUNDLE key
|
||||
4. download old manifest
|
||||
5. upload new manifest listing only the single new GITBUNDLE
|
||||
6. delete all other GITBUNDLEs that were listed in the old manifest
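As a rough sketch of the bookkeeping in a full push, the following computes the new manifest and the list of old bundles to delete. It assumes the old manifest is available as text; `full_push_plan` is a hypothetical helper, and the uploads and deletions themselves are left out.

```python
def full_push_plan(old_manifest_text, new_bundle_key):
    """Return (new_manifest_text, keys_to_delete) for a full push.

    The new manifest lists only the single new bundle. Every other
    key from the old manifest is returned for deletion.
    """
    old_keys = [line for line in old_manifest_text.split("\n") if line]
    new_manifest_text = new_bundle_key + "\n"
    # Never schedule the bundle that was just uploaded for deletion.
    keys_to_delete = [key for key in old_keys if key != new_bundle_key]
    return new_manifest_text, keys_to_delete
```

Deleting the old bundles only after the new manifest is uploaded, as in the steps above, keeps the remote cloneable at each point in between.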
|
||||
|
||||
# multiple GITMANIFEST files
|
||||
|
||||
Usually there will only be one per special remote, but it's possible for
|
||||
multiple special remotes to point to the same object storage, and if so
|
||||
multiple GITMANIFEST objects can be stored.
|
||||
|
||||
It follows that the UUID of the special remote has to be included in the
|
||||
annex:// uri, to know which GITMANIFEST to use when cloning from it.
|
||||
This is why the UUID of the special remote is included in the GITMANIFEST
|
||||
key, and in the annex:: uri.
|
||||
|
||||
# manually cloning from these files
|
||||
|
||||
If you are unable to use git-annex and need to clone a git repository
|
||||
stored in such a special remote, this procedure will work:
|
||||
|
||||
* Find and download the GITMANIFEST
|
||||
* Download each listed GITBUNDLE
|
||||
* `git fetch` from each new bundle in order.
|
||||
(Note that later bundles can update refs from the versions in previous
|
||||
bundles.)
|
||||
|
||||
When the special remote is encrypted, the GITMANIFEST and GITBUNDLE will
|
||||
also be encrypted. To decrypt those manually, see this
|
||||
[[fairly simple shell script using standard tools|tips/Decrypting_files_in_special_remotes_without_git-annex]].
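The manual procedure could be scripted roughly as follows. This is a minimal sketch, not a tested recovery tool. It assumes that the manifest and every listed bundle have already been downloaded (and decrypted if necessary), that each bundle file is named after its key, and that `git init` has been run in the target directory. The function names and the default refspec are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def parse_manifest(text):
    """Return the bundle keys listed in a manifest, in order."""
    return [line for line in text.split("\n") if line]


def fetch_commands(keys, bundle_dir,
                   refspec="+refs/heads/*:refs/remotes/origin/*"):
    """Build one `git fetch` command per bundle, in manifest order.

    The leading "+" in the assumed refspec allows a later bundle to
    overwrite a ref that an earlier bundle set.
    """
    return [["git", "fetch", str(Path(bundle_dir) / key), refspec]
            for key in keys]


def fetch_all(manifest_text, bundle_dir, repo_dir):
    """Run the fetches in order inside an existing git repository."""
    for cmd in fetch_commands(parse_manifest(manifest_text), bundle_dir):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

After the last fetch, the refs land under `refs/remotes/origin/` with this refspec, so a branch still needs to be checked out from there by hand.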
|
||||
|
|
|
@ -4,21 +4,88 @@ repository to any git-annex special remote.
|
|||
This is a redesign and reimplementation of git-remote-datalad-annex.
|
||||
It will be a safer implementation, will support incremental pushes, and
|
||||
will be available to users who don't use datalad.
|
||||
|
||||
Work is in the `git-remote-annex` branch. Currently we have a design for
|
||||
the core data files and operations.
|
||||
<http://source.git-annex.branchable.com/?p=source.git;a=blob;f=doc/internals/git-remote-annex.mdwn;hb=git-remote-annex>
|
||||
|
||||
Also, that branch has a proof of concept implementation in a shell script.
|
||||
Though it doesn't yet use special remotes at all, it is able to do
|
||||
incremental pushes to git bundles with a manifest.
|
||||
|
||||
I still need to do some design work around using the git-annex branch to
|
||||
detect concurrent push situations where changes to the manifest get lost,
|
||||
and re-add those changes to it later.
|
||||
|
||||
Also, it's not clear what will happen when two people make conflicting pushes
|
||||
to a ref. The goal would be to replicate git push to a regular git remote,
|
||||
but that may not be entirely possible. This will need to be investigated
|
||||
further.
|
||||
--[[Joey]]
|
||||
|
||||
---
|
||||
|
||||
This is implemented and working. Remaining todo list for it:
|
||||
|
||||
* Need to test all types of pushes, barely tested at all.
|
||||
|
||||
* Support exporttree=yes remotes.
|
||||
|
||||
* Support max-bundles config
|
||||
|
||||
* Need to mention git-remote-annex in special remotes page, and perhaps
|
||||
write a tip for it. Also link to it from git-annex man page.
|
||||
|
||||
* initremote could optionally configure the url to a special remote
|
||||
to an annex:: url. This would make it easier to use git-remote-annex,
|
||||
since the user would not need to set up the url themselves.
|
||||
(Also it would then avoid setting `skipFetchAll = true`)
|
||||
|
||||
* Prevent using with remotes that are encrypted using a cipher
|
||||
stored in the repo. Chicken and egg problem cloning from
|
||||
such a remote. Maybe allow advanced users to force it?
|
||||
|
||||
* When the remote has no manifest, a pull from it should fail,
|
||||
while a push should succeed. Otherwise, it can be confusing
|
||||
to clone from a wrong url, since it fails to download
|
||||
a manifest and so appears as if the remote is empty.
|
||||
|
||||
* See XXX in uploadManifest about recovering from a situation
|
||||
where the remote is left with a deleted manifest when a push
|
||||
is interrupted part way through. This should be recoverable
|
||||
by caching the manifest locally and re-uploading it when
|
||||
the remote has no manifest.
|
||||
|
||||
* datalad-annex supports cloning from the web special remote,
|
||||
using an url that contains the result of pushing to eg, a directory
|
||||
special remote.
|
||||
`datalad-annex::https://example.com?type=web&url={noquery}`
|
||||
Supporting something like this would be good.
|
||||
|
||||
* It would be nice if git-annex could generate an annex:: url
|
||||
for a special remote and show it to the user, eg when
|
||||
they have set the shorthand "annex::" url, so they know the full url.
|
||||
`git-annex info $remote` could also display it.
|
||||
Currently, the user has to remember how the special remote was
|
||||
configured and replicate it all in the url.
|
||||
|
||||
There are some difficulties to doing this, including that
|
||||
RemoteConfig can have hidden fields that should be omitted,
|
||||
and that some, like type=directory, remove some configs
|
||||
(eg directory=) in their setup action.
|
||||
|
||||
* Improve behavior in push races. A race can overwrite a change
|
||||
to the MANIFEST and lose work that was pushed from the other repo.
|
||||
From the user's perspective, that situation is the same as if one repo
|
||||
pushed new work, then the other repo did a git push --force, overwriting
|
||||
the first repo's push. In the first repo, another push will then fail as
|
||||
a non fast-forward, and the user can recover as usual. This is probably
|
||||
okish.
|
||||
|
||||
But.. a MANIFEST overwrite will leave bundle files in the remote that
|
||||
are not listed in the MANIFEST. It seems likely that git-annex could
|
||||
detect that after the fact and clean it up. Eg, if it caches
|
||||
the last MANIFEST it uploaded, next time it downloads the MANIFEST
|
||||
it can check if there are bundle files in the old one that are not
|
||||
in the new one. If so, it can drop those bundle files from the remote.
|
||||
|
||||
* A push race can also appear to the user as if they pushed a ref, but then
|
||||
it got deleted from the remote. This happens when two pushes are
|
||||
pushing different ref names. This might be harder for the user to
|
||||
notice; git fetch does not indicate that a remote ref got deleted.
|
||||
They would have to use git fetch --prune to notice the deletion.
|
||||
Once the user does notice, they can re-push their ref to recover.
|
||||
Can this be improved?
|
||||
|
||||
* The race condition described in
|
||||
[[!commit 797f27ab0517e0021363791ff269300f2ba095a5]]
|
||||
where, before git-annex init has been run in a repo,
|
||||
running git-remote-annex at the same time as git-annex init can lose
|
||||
changes that the latter command (and ones after it) write to the
|
||||
git-annex branch.
|
||||
|
||||
This should be fixable by making git-remote-annex not write to the
|
||||
git-annex branch, but to eg, a temporary journal directory.
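The bundle cleanup idea from the push race item above (compare a cached copy of the last uploaded MANIFEST with the one just downloaded) comes down to a list difference. A minimal sketch, assuming both manifests are already parsed into lists of keys; `orphaned_bundles` is a hypothetical name, not an existing git-annex function.

```python
def orphaned_bundles(cached_manifest_keys, current_manifest_keys):
    """Return keys from the cached manifest that the current one lacks.

    These are candidates for dropping from the remote, since no
    manifest lists them any more.
    """
    current = set(current_manifest_keys)
    return [key for key in cached_manifest_keys if key not in current]
```

This only sees bundles that the local repository itself once listed, so it would not catch bundles orphaned by a race between two other repositories.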
|
||||
|
|