Commit graph

35 commits

Author SHA1 Message Date
Joey Hess
7d61a99da3
todo 2024-05-24 13:57:33 -04:00
Joey Hess
b792b128a0
verified checkprereq
The case documented in its comment worked in a test push and clone.
2024-05-24 13:06:29 -04:00
Joey Hess
1a3c60cc8e
git-remote-annex: avoid bundle object leakage in push race or interrupted push
Locally record the manifest before uploading it or any bundles,
and read it on the next push. Any bundles from the push that are
not included in the currently being pushed manifest will get added
to the outManifest, and so eventually get deleted.

This deals with an interrupted push that is not resumed and instead
something else is pushed. And it deals with a push race that overwrites
the manifest.

Of course, this can't help if one of those situations is followed by
the local repo being deleted. But that's equivilant to doing a git-annex
copy of a new annexed file to a special remote and then deleting the
special repo w/o pushing. In either case the special remote ends up with
a object in it that git-annex doesn't know about.
2024-05-24 12:47:32 -04:00
Joey Hess
10a60183e1
guard pushEmpty 2024-05-21 12:12:44 -04:00
Joey Hess
14c79373c4
update 2024-05-21 12:05:44 -04:00
Joey Hess
b3d7ae51f0
fix edge case where git-annex branch does not have config for enabled special remote
One way this could happen is cloning an empty special remote.
A later fetch would then fail.
2024-05-21 11:27:49 -04:00
Joey Hess
3e7324bbcb
only delete bundles on pushEmpty
This avoids some apparently otherwise unsolveable problems involving
races that resulted in the manifest listing bundles that were deleted.

Removed the annex-max-git-bundles config because it can't actually
result in deleting old bundles. It would still be possible to have a
config that controls how often to do a full push, which would avoid
needing to download too many bundles on clone, as well as needing to
checkpresent too many bundles in verifyManifest. But it would need a
different name and description.
2024-05-21 11:13:27 -04:00
Joey Hess
f544946b09
update 2024-05-21 10:20:30 -04:00
Joey Hess
b042dfeb0e
emptying pushes only delete 2024-05-21 09:52:35 -04:00
Joey Hess
5d40759470
formalize problem description 2024-05-21 09:35:46 -04:00
Joey Hess
3a38520aac
avoid interrupted push leaving remote without a manifest
Added a backup manifest key, which is used if the main manifest key is
not present. When uploading a new Manifest, it makes sure that it never
drops one key except when the other key is present.

It's entirely possible for the two manifest keys to get out of sync, due
to races. The main one wins when it's present, it is possible for the
main one being dropped to expose the backup one, which has a different
push recorded.
2024-05-20 15:41:09 -04:00
Joey Hess
594ca2fd3a
update 2024-05-20 14:52:06 -04:00
Joey Hess
34a6db4f15
improve recovery from interrupted push
On push, first try to drop all outManifest keys listed in the current
manifest file, which resumes from an interrupted push that didn't
get a chance to delete those keys.

The new manifest gets its outManifest populated with the keys that were
in the old manifest, plus any of the keys that were unable to be
dropped.

Note that it would be possible for uploadManifest to skip dropping old
keys at all. The old keys would get dropped on the next push. But it
seems better to delete stuff immediately rather than waiting. And the
extra work is limited to push and typically is small.

A remote where dropKey always fails will result in an outManifest that
grows longer and longer. It would be possible to check if the remote
has appendonly = True and avoid populating the outManifest. Of course,
an appendonly remote will grow with every git push anyway. And currently
only Remote.GitLFS sets that, which can't be used as a git-remote-annex
remote anyway.
2024-05-20 13:49:45 -04:00
Joey Hess
ce60211881
add incremental vs full push race to todo
with plan to deal with it
2024-05-16 09:37:28 -04:00
Joey Hess
b1b6e35d4c
reorg todo 2024-05-15 17:41:55 -04:00
Joey Hess
adcebbae47
clean up git-remote-annex git-annex branch handling
Implemented alternateJournal, which git-remote-annex
uses to avoid any writes to the git-annex branch while setting up
a special remote from an annex:: url.

That prevents the remote.log from being overwritten with the special
remote configuration from the url, which might not be 100% the same as
the existing special remote configuration.

And it prevents an overwrite deleting of other stuff that was
already in the remote.log.

Also, when the branch was created by git-remote-annex, only delete it
at the end if nothing else has been written to it by another command.
This fixes the race condition described in
797f27ab05, where git-remote-annex
set up the branch and git-annex init and other commands were
run at the same time and their writes to the branch were lost.
2024-05-15 17:33:38 -04:00
Joey Hess
d24d8870c5
todo 2024-05-15 14:33:13 -04:00
Joey Hess
2dfffa0621
bugfix
When pushing branch foo, we don't want to delete other tracking
branches. In particular, a full push needs all the tracking branches.
2024-05-14 16:17:27 -04:00
Joey Hess
169e673ad4
result of some testing 2024-05-14 16:01:24 -04:00
Joey Hess
0722c504c5
update docs for git-remote-annex 2024-05-14 15:31:16 -04:00
Joey Hess
23c4125ed4
mention other commands shipped with git-annex in SEE ALSO in man page 2024-05-14 15:23:45 -04:00
Joey Hess
24af51e66d
git-annex unused --from remote skips its git-remote-annex keys
This turns out to only be necessary is edge cases. Most of the
time, git-annex unused --from remote doesn't see git-remote-annex keys
at all, because it does not record a location log for them.

On the other hand, git-annex unused does find them, since it does not
rely on the location log. And that's good because they're a local cache
that the user should be able to drop.

If, however, the user ran git-annex unused and then git-annex move
--unused --to remote, the keys would have a location log for that
remote. Then git-annex unused --from remote would see them, and would
consider them unused. Even when they are present on the special remote
they belong to. And that risks losing data if they drop the keys from
the special remote, but didn't expect it would delete git branches they
had pushed to it.

So, make git-annex unused --from skip git-remote-annex keys whose uuid
is the same as the remote.
2024-05-14 15:17:40 -04:00
Joey Hess
0bf72ef103
max-git-bundles config for git-remote-annex 2024-05-14 14:23:40 -04:00
Joey Hess
8ad768fdba
todo 2024-05-14 13:58:35 -04:00
Joey Hess
6f1039900d
prevent using git-remote-annex with unsuitable special remote configs
I hope to support importtree=yes eventually, but it does not currently
work.

Added remote.<name>.allow-encrypted-gitrepo that needs to be set to
allow using it with encrypted git repos.

Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
2024-05-14 13:52:20 -04:00
Joey Hess
8bf6dab615
update 2024-05-13 14:42:25 -04:00
Joey Hess
ddf05c271b
fix cloning from an annex:: remote with exporttree=yes
Updating the remote list needs the config to be written to the git-annex
branch, which was not done for good reasons. While it would be possible
to instead use Remote.List.remoteGen without writing to the branch, I
already have a plan to discard git-annex branch writes made by
git-remote-annex, so the simplest fix is to write the config to the
branch.

Sponsored-by: k0ld on Patreon
2024-05-13 14:35:17 -04:00
Joey Hess
552b000ef1
update 2024-05-13 14:30:18 -04:00
Joey Hess
34eae54ff9
git-remote-annex support exporttree=yes remotes
Put the annex objects in .git/annex/objects/ inside the export remote.
This way, when importing from the remote, they will be filtered out.

Note that, when importtree=yes, content identifiers are used, and this
means that pushing to a remote updates the git-annex branch. Urk.
Will need to try to prevent that later, but I already had a todo about
that for other reasons.

Untested!

Sponsored-By: Brock Spratlen on Patreon
2024-05-13 11:48:00 -04:00
Joey Hess
3f848564ac
refuse to fetch from a remote that has no manifest
Otherwise, it can be confusing to clone from a wrong url, since it fails
to download a manifest and so appears as if the remote exists but is empty.

Sponsored-by: Jack Hill on Patreon
2024-05-13 09:47:21 -04:00
Joey Hess
97b309b56e
extend manifest with keys to be deleted
This will eventually be used to recover from an interrupted fullPush
and drop the old bundle keys it was unable to delete.

Sponsored-by: Luke T. Shumaker on Patreon
2024-05-13 09:09:33 -04:00
Joey Hess
dfb09ad1ad
preparing to merge git-remote-annex
Update its todo with remaining items.

Add changelog entry.

Simplified internals document to no longer be notes to myself, but
target users who want to understand how the data is stored
and might want to extract these repos manually.

Sponsored-by: Kevin Mueller on Patreon
2024-05-10 15:06:15 -04:00
Joey Hess
d895df1010
update 2024-04-25 17:01:17 -04:00
Joey Hess
967f887b95
update 2024-04-08 09:16:27 -04:00
Joey Hess
5060185a7b
project we started at Distribits 2024-04-06 19:52:07 -04:00