68 commits

Author SHA1 Message Date
Joey Hess
3e7324bbcb
only delete bundles on pushEmpty
This avoids some apparently otherwise unsolvable problems involving
races that resulted in the manifest listing bundles that were deleted.

Removed the annex-max-git-bundles config because it can't actually
result in deleting old bundles. It would still be possible to have a
config that controls how often to do a full push, which would avoid
needing to download too many bundles on clone, as well as needing to
checkpresent too many bundles in verifyManifest. But it would need a
different name and description.
2024-05-21 11:13:27 -04:00
Joey Hess
3a38520aac
avoid interrupted push leaving remote without a manifest
Added a backup manifest key, which is used if the main manifest key is
not present. When uploading a new Manifest, it makes sure that it never
drops one key except when the other key is present.

It's entirely possible for the two manifest keys to get out of sync, due
to races. The main one wins when it's present, but dropping the main one
can expose the backup one, which has a different push recorded.
2024-05-20 15:41:09 -04:00
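
The ordering rule in commit 3a38520aac (never drop one manifest key unless the other is present) can be illustrated with a toy model. This is a minimal sketch and not git-annex's implementation: the `Store` class, the `Interrupted` exception, and the key names `manifest` and `manifest.bak` are all hypothetical, and the store is an in-memory dict that can be told to fail after a number of operations.

```python
MAIN, BACKUP = "manifest", "manifest.bak"  # hypothetical key names


class Interrupted(Exception):
    """Simulates a push being interrupted part way through."""


class Store:
    """Toy key/value remote that fails after `fail_after` operations."""

    def __init__(self, data=None, fail_after=None):
        self.data = dict(data or {})
        self.fail_after = fail_after
        self.ops = 0

    def _tick(self):
        if self.fail_after is not None and self.ops >= self.fail_after:
            raise Interrupted()
        self.ops += 1

    def present(self, key):
        return key in self.data

    def drop(self, key):
        self._tick()
        self.data.pop(key, None)

    def put(self, key, value):
        self._tick()
        self.data[key] = value


def read_manifest(store):
    """The main key wins when present; otherwise fall back to the backup."""
    if store.present(MAIN):
        return store.data[MAIN]
    return store.data.get(BACKUP)


def upload_manifest(store, new):
    """Replace both keys, dropping each one only while the other is present."""
    if store.present(MAIN):
        store.drop(BACKUP)       # main is present, so this is safe
        store.put(BACKUP, new)
        store.drop(MAIN)         # backup is present again, so this is safe
        store.put(MAIN, new)
    else:
        store.put(MAIN, new)     # nothing is dropped before main exists
        store.drop(BACKUP)
        store.put(BACKUP, new)


def outcomes(initial):
    """Interrupt the upload at every possible step and collect what a reader sees."""
    seen = set()
    for n in range(5):
        store = Store(initial, fail_after=n)
        try:
            upload_manifest(store, "new")
        except Interrupted:
            pass
        seen.add(read_manifest(store))
    return seen
```

In this model, `outcomes({MAIN: "old", BACKUP: "old"})` only ever contains the old or the new manifest, never a missing one, wherever the interruption lands.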
Joey Hess
34eae54ff9
git-remote-annex support exporttree=yes remotes
Put the annex objects in .git/annex/objects/ inside the export remote.
This way, when importing from the remote, they will be filtered out.

Note that, when importtree=yes, content identifiers are used, and this
means that pushing to a remote updates the git-annex branch. Urk.
Will need to try to prevent that later, but I already had a todo about
that for other reasons.

Untested!

Sponsored-By: Brock Spratlen on Patreon
2024-05-13 11:48:00 -04:00
Joey Hess
97b309b56e
extend manifest with keys to be deleted
This will eventually be used to recover from an interrupted fullPush
and drop the old bundle keys it was unable to delete.

Sponsored-by: Luke T. Shumaker on Patreon
2024-05-13 09:09:33 -04:00
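
One way to picture the idea in commit 97b309b56e is a manifest that carries a second list of keys still awaiting deletion. The sketch below is an assumption-laden illustration: the `Manifest` class, its field names, and the `full_push` and `cleanup` helpers are hypothetical and do not reflect the on-disk manifest format.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    bundles: list                                    # bundle keys currently in use
    to_delete: list = field(default_factory=list)    # old keys not yet deleted


def cleanup(store, manifest):
    """Drop every key listed for deletion, then clear the list."""
    for key in manifest.to_delete:
        if key not in manifest.bundles:
            store.pop(key, None)
    return Manifest(list(manifest.bundles), [])


def full_push(store, old, new_key, new_content, interrupt=False):
    """Store one new bundle that replaces all old ones.

    The old keys are recorded in the new manifest before any of them is
    dropped, so an interrupted push leaves a record of what to delete.
    """
    store[new_key] = new_content
    manifest = Manifest([new_key], old.to_delete + old.bundles)
    if interrupt:
        return manifest          # simulated crash before any deletion
    return cleanup(store, manifest)


store = {"b1": b"one", "b2": b"two"}
interrupted = full_push(store, Manifest(["b1", "b2"]), "b3", b"three", interrupt=True)
recovered = cleanup(store, interrupted)   # a later run finishes the job
```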
Joey Hess
dfb09ad1ad
preparing to merge git-remote-annex
Update its todo with remaining items.

Add changelog entry.

Simplified internals document to no longer be notes to myself, but
target users who want to understand how the data is stored
and might want to extract these repos manually.

Sponsored-by: Kevin Mueller on Patreon
2024-05-10 15:06:15 -04:00
Joey Hess
c7731cdbd9
add Backend.GitRemoteAnnex
Making GITBUNDLE be in the backend list allows those keys to be
hashed to verify, both when git-remote-annex downloads them, and by other
transfers and by git-annex fsck.

GITMANIFEST is not in the backend list, because those keys will never be
stored in .git/annex/objects and can't be verified in any case.

This does mean that git-annex version will include GITBUNDLE in the list
of backends.

Also documented these in backends.mdwn

Sponsored-by: Kevin Mueller on Patreon
2024-05-07 13:54:08 -04:00
Joey Hess
483887591d
working toward git-remote-annex using a special remote
Not quite there yet.

Also, changed the format of GITBUNDLE keys to use only one '-'
after the UUID. A sha256 does not contain that character, so can just
split at the last one.

Amusingly, the sha256 will probably not actually be verified. A git
bundle contains its own checksums that git uses to verify it. And if
someone wanted to replace the content of a GITBUNDLE object, they
could just edit the manifest to use a new one whose sha256 does verify.

Sponsored-by: Nicholas Golder-Manning
2024-05-06 16:28:04 -04:00
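
The "split at the last one" remark in commit 483887591d can be shown in a few lines. This sketch only handles a `<uuid>-<sha256>` string; any key prefix or other fields of a real GITBUNDLE key are left out, and the UUID used is made up.

```python
import hashlib


def split_uuid_and_sha256(name):
    """Split '<uuid>-<sha256>' at the last '-'.

    The UUID itself contains '-' characters, but a hex sha256 never does,
    so splitting once from the right is unambiguous.
    """
    uuid, sha256 = name.rsplit("-", 1)
    return uuid, sha256


example_uuid = "f11773f0-11e1-45b2-9805-06db16768efe"   # made-up UUID
example_sha = hashlib.sha256(b"bundle bytes").hexdigest()
parsed = split_uuid_and_sha256(f"{example_uuid}-{example_sha}")
```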
Joey Hess
0be9f7a2c6
add UUID to GITBUNDLE
The UUID is included in the GITMANIFEST in order to allow a single
key/value store to be used to store several special remotes, without any
namespacing. In that situation though, if the same ref is pushed to two
special remotes, it will result in git bundles with the same content.

Which is ok, until a re-push happens to one of the special remotes.
At that point, the old git bundle will be deleted. That will prevent
fetching it from the other special remote, where the re-push has not
happened.

Adding the UUID avoids this problem.
2024-05-06 12:51:44 -04:00
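
The collision described in commit 0be9f7a2c6 can be reproduced with a toy shared store. This is a sketch under assumptions: `bundle_key` and `simulate` are hypothetical helpers, the store is a plain dict, and the key layout is simplified to an optional UUID followed by a sha256.

```python
import hashlib


def bundle_key(content, uuid=None):
    """Name a bundle by its sha256, optionally qualified by a remote UUID."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{uuid}-{digest}" if uuid is not None else digest


def simulate(use_uuid):
    """Two special remotes share one store; remote A re-pushes.

    Returns whether remote B can still find the bundle its manifest lists.
    """
    store = {}
    content = b"bundle holding the same ref"
    key_a = bundle_key(content, "uuid-A" if use_uuid else None)
    key_b = bundle_key(content, "uuid-B" if use_uuid else None)
    store[key_a] = content
    store[key_b] = content
    # Re-push to remote A: its old bundle is deleted from the shared store.
    del store[key_a]
    return key_b in store


without_uuid = simulate(use_uuid=False)
with_uuid = simulate(use_uuid=True)
```

Without the UUID both remotes name the bundle identically, so deleting it for one remote removes it for the other; with the UUID the two keys are distinct.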
Joey Hess
fc37243ffe
convert git-remote-annex to not include old pushed refs in new bundle
Rather than requiring the last listed bundle in the manifest include all
refs that are in the remote, build up refs from each bundle listed in
the manifest.

This fixes a bug where pushing first a new branch foo from one clone,
and then pushing a new branch bar from another clone, caused the second
push to lose branch foo. Now the second push will add a new bundle, but
the foo ref in the bundle from the first push will still be used.

Pushing a deletion of a ref now has to delete all bundles and push a new
bundle with only the remaining refs in it.

In a "list for-push", it now has to unbundle all bundles, in order for a
deletion repush to have available all objects. (And a non-deletion push
can also rely on refs/namespaces/mine/ being up-to-date.)

It would have been possible to fix the bug by only making it do that
unbundling in "list for-push", without changing what's stored in the
bundles. But I think I prefer to populate the bundles this way. For one
thing, deleting a pushed ref now really deletes all data relating to it,
rather than leaving it present in old bundles. For another, it's easier
to explain since there is no special case for the last bundle. And, it
will often result in smaller bundles.

Note that further efficiency gains are possible with respect to what
objects are included in an incremental bundle. Two XXX comments
document how to reduce excess objects. It didn't seem worth implementing
those optimisations in this proof of concept code.

Sponsored-by: Brock Spratlen on Patreon
2024-04-30 14:30:09 -04:00
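
The change in commit fc37243ffe, building up refs from every bundle listed in the manifest instead of trusting only the last one, can be sketched as a fold over the manifest. The data layout here is hypothetical (bundle names mapped to dicts of refs), and letting a later bundle override an earlier one for the same ref is an assumption of this sketch.

```python
def refs_from_manifest(manifest, bundle_refs):
    """Combine the refs of all bundles, in manifest order."""
    refs = {}
    for bundle in manifest:
        # Assumption: a later bundle wins when it carries the same ref.
        refs.update(bundle_refs[bundle])
    return refs


bundle_refs = {
    "bundle1": {"refs/heads/foo": "1111111"},   # pushed from one clone
    "bundle2": {"refs/heads/bar": "2222222"},   # pushed from another clone
}
manifest = ["bundle1", "bundle2"]

combined = refs_from_manifest(manifest, bundle_refs)
last_only = bundle_refs[manifest[-1]]   # the earlier design, which lost foo
```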
Joey Hess
8b56d6b283
fix conflicting push situation
In a situation where there are two repos that are diverged and each pushes
in turn to git-remote-annex, the first to push updates it. Then the second
push fails because it is not a fast-forward. The problem is, before git
push fails with "non-fast-forward", it actually calls git-remote-annex
with push.

So, to the user it appears as if the push failed, but it actually reached
the remote, and overwrote the other push!

The only solution to this seems to be for git-remote-annex push to notice
when a non-force-push would overwrite a ref stored in the remote, and
refuse to push that ref, returning an error to git. This seems strange:
why would git make remote helpers implement that when it later checks the
same thing itself?

With this fix, it's still possible for a race to overwrite a change to
the MANIFEST and lose work that was pushed from the other repo. But that
needs two pushes to be running at the same time. From the user's
perspective, that situation is the same as if one repo pushed new work,
then the other repo did a git push --force, overwriting the first repo's
push. In the first repo, another push will then fail as a non
fast-forward, and the user can recover as usual. But, a MANIFEST
overwrite will leave bundle files in the remote that are not listed in
the MANIFEST. It seems likely that git-annex will eventually be able to
detect that after the fact and clean it up. Eg, it can learn all bundles
that are stored in the remote using the location log, and compare them
to the MANIFEST to find bundles that got lost.

The race can also appear to the user as if they pushed a ref, but then
it got deleted from the remote. This happens when two pushes are
pushing different ref names. This might be harder for the user to
notice; git fetch does not indicate that a remote ref got deleted.
They would have to use git fetch --prune to notice the deletion.
Once the user does notice, they can re-push their ref to recover.

Sponsored-by: Jack Hill on Patreon
2024-04-26 15:03:04 -04:00
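
The check described in commit 8b56d6b283, refusing a non-force push that would overwrite a ref stored in the remote, amounts to an ancestry test. The following is a minimal sketch over a toy commit graph; `is_ancestor` and `check_push` are hypothetical names, and a real helper would ask git about ancestry instead of walking a dict.

```python
def is_ancestor(parents, old, new):
    """True if commit `old` is reachable from commit `new` via parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def check_push(parents, remote_refs, ref, new, force=False):
    """Return None to accept the push, or an error string to hand back to git."""
    old = remote_refs.get(ref)
    if force or old is None or is_ancestor(parents, old, new):
        return None
    return "non-fast-forward"


# A is the common base; B and C diverge from it; D builds on B.
parents = {"B": ["A"], "C": ["A"], "D": ["B"]}
remote_refs = {"refs/heads/master": "B"}   # the first repo already pushed B

diverged = check_push(parents, remote_refs, "refs/heads/master", "C")
forced = check_push(parents, remote_refs, "refs/heads/master", "C", force=True)
fast_forward = check_push(parents, remote_refs, "refs/heads/master", "D")
```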
Joey Hess
6ff4300bd1
proof of concept for push to git bundles with MANIFEST
This is a shell script, so not final code, and it does not use git-annex
at all, but it shows how to push to git bundles, listed in a MANIFEST,
the same as the git-remote-annex program will eventually do.

While developing this, I realized that the design needed to be changed
slightly regarding where refs are stored. Since a push can delete a ref
from a remote, storing each newly pushed ref in a bundle won't work,
because deleting a ref would then entail deleting all old bundles and
re-uploading from scratch. So instead, only the refs in the last bundle
listed in the MANIFEST are the active refs. Any refs in prior bundles
are just old refs that were stored previously (a reflog as it were).

That means that, in a situation where two different people are pushing
to the same special remote from different repos, whoever pushes last
wins. Any refs pushed by the other person earlier will be ignored. This
may not be desirable, and git-annex might be able to use the git-annex
branch to detect such situations and rescue the refs that got lost. Even
without such a recovery process though, the refs that the other person
thought they pushed will be preserved in their refs/namespaces/mine, so
a pull followed by a push will generally resolve the situation.

Note that the use of refs/namespaces/mine in the bundle is not really
desirable, and it might be worth making a local clone of the repo in
order to set up the refs that will be put in the bundle. Which seems to
be the only way to avoid needing that. But it does need to maintain
the refs/namespaces/mine/ in the git repo in order to remember what refs
have been pushed to the remote before, in order to include them in the
next bundle pushed. A name that includes the remote uuid will be needed
in the final implementation.

Anyway, this shell script seems to fully work, including incremental
pushing, force pushing, and pushes that delete refs.

Sponsored-by: Brett Eisenberg on Patreon
2024-04-25 16:55:19 -04:00
Joey Hess
f900c56ca3
parameterize manifest on UUID
and expand slightly
2024-04-06 08:34:01 -04:00
Joey Hess
9b116870a6
added docs for git-remote-annex special remote contents
Designed with the help of Timothy Sanders and Michael Hanke
at Distribits 2024
2024-04-06 05:28:49 -04:00
Joey Hess
9d60385001
convert renameFile to moveFile to support cross-device moves
Improve handling of some .git/annex/ subdirectories being on other
filesystems, in the bittorrent special remote, and youtube-dl integration,
and git-annex addurl.

The only one of these that I've confirmed to be a problem is in the
bittorrent special remote when .git/annex/tmp and .git/annex/othertmp are
on different filesystems.

As well as auditing for renameFile, also audited for createLink; all of
those are ok, as are the other remaining renameFile calls. Also audited all
code paths that use .git/annex/othertmp, and did not find any other
cross-device problems. So, removing mention of othertmp needing to be on
the same device.

Sponsored-by: Dartmouth College's Datalad project
2022-12-20 15:17:50 -04:00
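
The difference between a plain rename and a move that survives crossing filesystems (commit 9d60385001) can be sketched as a rename with a copy fallback. This is an illustration, not git-annex's moveFile: the `move_file` helper, its `rename` parameter (there so the cross-device case can be simulated), and the `.tmp` suffix are all assumptions of this sketch.

```python
import errno
import os
import shutil


def move_file(src, dst, rename=os.rename):
    """Rename src to dst, falling back to copy and delete across devices."""
    try:
        rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise
        # Cross-device move: copy next to the destination first, so the
        # final step is a same-filesystem rename, then remove the source.
        tmp = dst + ".tmp"   # hypothetical temporary name
        shutil.copy2(src, tmp)
        os.replace(tmp, dst)
        os.unlink(src)
```

Python's standard library offers `shutil.move`, which also falls back to copying when a rename fails; the explicit version above just makes the fallback visible.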
yarikoptic
572aa58a50 Added a comment: why othertmp to be on the same file system? 2022-12-13 14:15:28 +00:00
nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9
a42e76d89f Added a comment 2022-05-20 18:09:45 +00:00
Joey Hess
7f28f41c37
response 2022-05-20 13:17:44 -04:00
nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9
514f50e5be Added a comment: How to disable lockdown in bare repos? 2022-05-15 21:30:50 +00:00
Joey Hess
67245ae00f
fully specify the pointer file format
This format is designed to detect accidental appends, while having some
room for future expansion.

Detect when an unlocked file whose content is not present has gotten some
other content appended to it, and avoid treating it as a pointer file, so
that appended content will not be checked into git, but will be annexed
like any other file.

Dropped the max size of a pointer file down to 32kb. It was around 80 kb,
but without any good reason, and certainly there are no valid pointer files
anywhere that are larger than 8kb, because what a pointer file with
additional data even looks like has only just been specified.

I assume 32kb will be good enough for anyone. ;-) Really though, it needs
to be some smallish number, because that much of a file in git gets read
into memory when eg, catting pointer files. And since we have no use cases
for the extra lines of a pointer file yet, except possibly to add
some human-visible explanation that it is a git-annex pointer file, 32k
seems as reasonable an arbitrary number as anything. Increasing it would be
possible, eg to 64k, as long as users of such jumbo pointer files didn't
mind upgrading all their git-annex installations to one that supports the
new larger size.

Sponsored-by: Dartmouth College's Datalad project
2022-02-23 14:20:31 -04:00
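
A size-capped pointer check that rejects accidentally appended content, as commit 67245ae00f describes, might look like the sketch below. Only the 32kb cap comes from the commit message. The `/annex/objects/` first-line marker and the rule that every further line must contain `/annex/` are assumptions made for illustration, and the key in the example is made up; the pointer file documentation is the authority on the real format.

```python
MAX_POINTER_SIZE = 32 * 1024            # the 32kb cap from the commit message
POINTER_PREFIX = b"/annex/objects/"     # assumed marker, not stated in the commit message


def looks_like_pointer(data):
    """Decide whether file content should be treated as a pointer file."""
    if len(data) > MAX_POINTER_SIZE:
        return False
    lines = data.split(b"\n")
    first, rest = lines[0], lines[1:]
    if not first.startswith(POINTER_PREFIX) or len(first) == len(POINTER_PREFIX):
        return False
    # Assumed rule: any further non-empty line must contain b"/annex/",
    # so arbitrary appended data makes the file an ordinary file again.
    return all(b"/annex/" in line for line in rest if line)


pointer = POINTER_PREFIX + b"SHA256E-s3--abc\n"   # made-up key
plain = looks_like_pointer(pointer)
appended = looks_like_pointer(pointer + b"some accidental append\n")
oversized = looks_like_pointer(POINTER_PREFIX + b"x" * MAX_POINTER_SIZE)
```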
Joey Hess
4b1b9d7a83
Added annex.freezecontent-command and annex.thawcontent-command configs
Freeze first sets the file perms, and then runs
freezecontent-command. Thaw runs thawcontent-command before
restoring file permissions. This is in case the freeze command
prevents changing file perms, as eg setting a file immutable does.
Also, changing file perms tends to mess up previously set ACLs.

git-annex init's probe for crippled filesystem uses them, so if file perms
don't work, but freezecontent-command manages to prevent write to a file,
it won't treat the filesystem as crippled.

When the filesystem has been probed as crippled, the hooks are not
used, because there seems to be no point then; git-annex won't be relying
on locking annex objects down. Also, this avoids them being run when the
file perms have not been changed, in case they somehow rely on
git-annex's setting of the file perms in order to work.

Sponsored-by: Dartmouth College's Datalad project
2021-06-21 14:40:52 -04:00
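
The ordering in commit 4b1b9d7a83 (freeze sets permissions and then runs the hook; thaw runs the hook and then restores permissions) is sketched below. The hooks are modelled as Python callables instead of configured shell commands, and the `freeze`, `thaw`, and `owner_writable` helpers are hypothetical names for this illustration.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def freeze(path, hook=None):
    """Remove write permission first, then run the freeze hook."""
    os.chmod(path, os.stat(path).st_mode & ~WRITE_BITS)
    if hook is not None:
        hook(path)


def thaw(path, hook=None):
    """Run the thaw hook first, then restore the owner's write permission."""
    if hook is not None:
        hook(path)
    os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)


def owner_writable(path):
    return bool(os.stat(path).st_mode & stat.S_IWUSR)


events = []
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "object")
    with open(path, "w") as f:
        f.write("content")
    # Each hook records whether the file was writable at the time it ran.
    freeze(path, hook=lambda p: events.append(("freeze-hook", owner_writable(p))))
    thaw(path, hook=lambda p: events.append(("thaw-hook", owner_writable(p))))
    writable_after_thaw = owner_writable(path)
```

Both hooks see the file without write permission, which is the point of the ordering: a hook that, for example, makes a file immutable would otherwise block the permission change on either side.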
Joey Hess
52e72f878e
expand 2020-07-03 14:42:04 -04:00
Joey Hess
a568302460
response 2020-02-20 16:26:52 -04:00
Joey Hess
30423f2b2d
response 2020-02-20 16:21:34 -04:00
https://christian.amsuess.com/chrysn
e07fbf936a Added a comment: Key character set 2019-12-10 10:27:58 +00:00
atrent
6d66e6a377 Added a comment: migrating... 2019-11-30 22:30:06 +00:00
Ilya_Shlyakhter
f8f3bd8eb4 Added a comment: hardlinking identical files in annex may break invariants 2019-11-30 21:36:38 +00:00
Ilya_Shlyakhter
78c2f2a973 Added a comment 2019-11-30 21:11:53 +00:00
atrent
955042a0bf Added a comment: no collisions 2019-11-30 20:37:00 +00:00
Ilya_Shlyakhter
e9ff2381bd Added a comment: same contents with different keys 2019-11-30 16:51:58 +00:00
atrent
d9b0481779 Added a comment: duplicate objects? 2019-11-30 14:04:17 +00:00
Ilya_Shlyakhter
072bc0dcee Added a comment: representing unlocked state of files 2019-09-19 18:02:31 +00:00
Ilya_Shlyakhter
d9fcc9c6cc added a hyperlink from key internals page to git-annex-examinekey 2019-04-24 17:18:33 +00:00
driusan@4d47e7deeb2f5d3846792d049ed06f96a0c3ca98
2599296a31 be more explicit about new hash format 2019-04-01 19:52:00 +00:00
Joey Hess
05de519d2c
update re field ordering 2019-01-11 16:51:54 -04:00
Ilya_Shlyakhter
62bbc15a8a Added a comment 2018-09-19 16:07:45 +00:00
Joey Hess
8cbe9b7dd3
fix typo 2018-07-19 13:11:09 -04:00
Joey Hess
a944549db9
response 2018-02-22 12:59:44 -04:00
Joey Hess
06454be3a7
remove duplicate comment 2018-02-22 12:55:13 -04:00
arseny-n@6aba76e573dcdf2fd9e033fb3132944c8466125a
1be68dd2f4 removed 2018-01-17 13:06:45 +00:00
arseny-n@6aba76e573dcdf2fd9e033fb3132944c8466125a
cd35b4a83d Added a comment: .git/annex/misctmp very large 2018-01-17 13:05:05 +00:00
arseny-n@6aba76e573dcdf2fd9e033fb3132944c8466125a
0a72d634fa Added a comment: .git/annex/misctmp very large 2018-01-17 13:04:52 +00:00
arseny-n@6aba76e573dcdf2fd9e033fb3132944c8466125a
7a4c12d692 Added a comment: .git/annex/misctmp very large 2018-01-17 13:04:40 +00:00
Edward Betts
0750913136
correct spelling mistakes 2017-02-12 17:30:23 -04:00
Joey Hess
fa4ac76d79 fix filename 2015-06-09 17:51:59 -04:00
Joey Hess
d185442382 remove .swp file 2015-06-09 17:51:10 -04:00
Joey Hess
0e4596de69 response 2015-06-09 16:33:35 -04:00
https://id.koumbit.net/anarcat
7002c1514a Added a comment: .git/annex/tmp third-party use? 2015-06-09 20:21:40 +00:00
giomasce
b17566d8bc Added a comment: Python implementation 2015-03-22 22:38:54 +00:00
Joey Hess
149d4bda61 comment 2015-02-17 17:53:08 -04:00
https://id.koumbit.net/anarcat
2865e9ff8b Added a comment: why md5sum? 2015-02-13 15:59:46 +00:00