Commit graph

44531 commits

Author SHA1 Message Date
Joey Hess
f4ba6e0c1e
add annex: url parser
Changed the format of the url to use annex: rather than annex::

The reason is that in the future, might want to support an url that
includes an uriAuthority part, eg:

annex://foo@example.com:42/358ff77e-0bc3-11ef-bc49-872e6695c0e3?type=directory&encryption=none&directory=/mnt/foo/

To parse that foo@example.com:42 as an uriAuthority it needs to start with
annex: rather than annex::

That would also need something to be done with uriAuthority, and also
the uriPath (the UUID) is prefixed with "/" in that example. So the
parser won't handle that example currently. But this leaves the
possibility for expansion.
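
As a rough illustration of the url format described above, here is a minimal Python sketch. It is not the parser from this commit; the function name `parse_annex_url` is made up for the example, and it assumes the url has the shape `annex:<uuid>?key=value&...`. Like the parser described above, it rejects a url that carries an authority part.

```python
import uuid
from urllib.parse import parse_qsl, urlsplit


def parse_annex_url(url):
    """Split annex:<uuid>?key=value&... into (uuid, params).

    Hypothetical helper for illustration; not git-annex's own parser.
    """
    parts = urlsplit(url)
    if parts.scheme != "annex":
        raise ValueError("not an annex: url")
    if parts.netloc:
        # Generic URL syntax recognises foo@example.com:42 as an
        # authority, but this sketch does nothing with it yet.
        raise ValueError("authority part is not supported")
    # uuid.UUID raises ValueError on a malformed UUID, which also
    # rejects the old annex::<uuid> form (the path would start with ":").
    remote_uuid = str(uuid.UUID(parts.path))
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return remote_uuid, params
```

Calling it on `annex:358ff77e-0bc3-11ef-bc49-872e6695c0e3?type=directory&encryption=none` yields the UUID and a dictionary of the special remote parameters.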

Sponsored-by: Joshua Antonishen on Patreon
2024-05-06 14:50:41 -04:00
Joey Hess
4b94fc371e
implement gitremote-helpers protocol parsing
Sponsored-by: Leon Schuermann on Patreon
2024-05-06 14:07:27 -04:00
Joey Hess
f17fa48b7c
ignore git-remote-annex 2024-05-06 13:13:39 -04:00
Joey Hess
306ea42447
improve git-remote-annex docs
renamed the git config to something shorter too
2024-05-06 13:06:22 -04:00
Joey Hess
a01d64a4ad
add git-remote-annex stub and build machinery
Renamed git-remote-annex.sh, keeping it around for now for reference.

Sponsored-by: Graham Spencer on Patreon
2024-05-06 13:05:58 -04:00
Joey Hess
0be9f7a2c6
add UUID to GITBUNDLE
The UUID is included in the GITMANIFEST in order to allow a single
key/value store to be used to store several special remotes, without any
namespacing. In that situation though, if the same ref is pushed to two
special remotes, it will result in git bundles with the same content.

Which is ok, until a re-push happens to one of the special remotes.
At that point, the old git bundle will be deleted. That will prevent
fetching it from the other special remote, where the re-push has not
happened.

Adding the UUID avoids this problem.
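
The problem can be modelled in a few lines of Python. This is only a sketch: the `GITBUNDLE--` key layout, the helper names, and the use of a dictionary as the shared key/value store are all assumptions made for illustration.

```python
import hashlib


def bundle_key(bundle_bytes, remote_uuid=None):
    """Name a bundle by content hash, optionally scoped by remote UUID.

    The key layout is illustrative, not the real git-annex format.
    """
    digest = hashlib.sha256(bundle_bytes).hexdigest()
    if remote_uuid is None:
        return f"GITBUNDLE--{digest}"
    return f"GITBUNDLE--{remote_uuid}-{digest}"


def repush(store, old_key, new_key, new_bundle):
    """A re-push uploads the new bundle and deletes the one it replaces."""
    store[new_key] = new_bundle
    store.pop(old_key, None)


def other_remote_survives(with_uuid):
    """Push the same ref to remotes A and B, re-push to A, then check B."""
    store = {}  # one key/value store shared by both special remotes
    uuid_a = "uuid-a" if with_uuid else None
    uuid_b = "uuid-b" if with_uuid else None
    bundle = b"bundle with refs/heads/master"
    key_a = bundle_key(bundle, uuid_a)
    key_b = bundle_key(bundle, uuid_b)
    store[key_a] = bundle
    store[key_b] = bundle
    newer = b"bundle with newer refs/heads/master"
    repush(store, key_a, bundle_key(newer, uuid_a), newer)
    # Can remote B still fetch the bundle its manifest lists?
    return key_b in store
```

Without the UUID both remotes share one key, so the re-push to A removes the bundle B still needs; with the UUID the keys differ and B is unaffected.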
2024-05-06 12:51:44 -04:00
Joey Hess
a8cef2bf85
added man page for git-remote-annex
And document remote.<name>.git-remote-annex-max-bundles which will
configure it.

datalad-annex uses a similar url format, but with some enhancements.
See https://github.com/datalad/datalad-next/blob/main/datalad_next/gitremotes/datalad_annex.py

I added the UUID to the URL, because it is needed in order to pick out which
manifest file to use. The design allows for a single key/value store to have
several special remotes all stored in it, and so the manifest includes
the UUID in its name.

While datalad-annex allows datalad-annex::<url>?, and allows referencing
pieces of the url in the parameters, needing the UUID prevents
git-remote-annex from supporting that syntax. And anyway, it is a
complication and I want to keep things simple for now.

Sponsored-by: unqueued on Patreon
2024-05-06 12:48:04 -04:00
Joey Hess
90b389369f
fix name of gitremote-helpers
The git man page has that name.
2024-05-06 12:07:05 -04:00
Joey Hess
7a9633312e
got git clone from git-remote-annex prototype working
eg git clone annex://`pwd` when the MANIFEST file is in the pwd.

This is easy in the prototype, just use $GIT_DIR, but in git-annex, it
will need to automatically git-annex init, and set up the special
remote, in order to be able to download the manifest and bundle keys
from it.

Sponsored-by: k0ld on Patreon
2024-04-30 14:40:49 -04:00
Joey Hess
fc37243ffe
convert git-remote-annex to not include old pushed refs in new bundle
Rather than requiring the last listed bundle in the manifest include all
refs that are in the remote, build up refs from each bundle listed in
the manifest.

This fixes a bug where pushing first a new branch foo from one clone,
and then pushing a new branch bar from another clone, caused the second
push to lose branch foo. Now the second push will add a new bundle, but
the foo ref in the bundle from the first push will still be used.

Pushing a deletion of a ref now has to delete all bundles and push a new
bundle with only the remaining refs in it.
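
The scheme described above can be sketched in Python as follows. This is a simplified model, not the prototype's shell code: each bundle is represented as a plain dictionary from ref name to commit id (real bundles also carry objects), and the function names are invented for the example.

```python
def refs_from_manifest(manifest, bundles):
    """Build up refs from every bundle in manifest order; later bundles win."""
    refs = {}
    for key in manifest:
        refs.update(bundles[key])
    return refs


def push_refs(manifest, bundles, key, new_refs):
    """An ordinary push adds one bundle holding only the newly pushed refs."""
    bundles[key] = dict(new_refs)
    return manifest + [key]


def push_deletion(manifest, bundles, key, deleted_ref):
    """Deleting a ref drops all bundles and pushes one with the remaining refs."""
    remaining = refs_from_manifest(manifest, bundles)
    remaining.pop(deleted_ref, None)
    for old_key in manifest:
        del bundles[old_key]
    bundles[key] = remaining
    return [key]
```

With this model, pushing branch foo from one clone and then branch bar from another leaves both refs visible, because neither bundle is treated as the only source of refs.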

In a "list for-push", it now has to unbundle all bundles, in order for a
deletion repush to have available all objects. (And a non-deletion push
can also rely on refs/namespaces/mine/ being up-to-date.)

It would have been possible to fix the bug by only making it do that
unbundling in "list for-push", without changing what's stored in the
bundles. But I think I prefer to populate the bundles this way. For one
thing, deleting a pushed ref now really deletes all data relating to it,
rather than leaving it present in old bundles. For another, it's easier
to explain since there is no special case for the last bundle. And, it
will often result in smaller bundles.

Note that further efficiency gains are possible with respect to what
objects are included in an incremental bundle. Two XXX comments
document how to reduce excess objects. It didn't seem worth implementing
those optimisations in this proof of concept code.

Sponsored-by: Brock Spratlen on Patreon
2024-04-30 14:30:09 -04:00
Joey Hess
e5cfaf003c
found a bug 2024-04-26 17:11:30 -04:00
Joey Hess
8b56d6b283
fix conflicting push situation
In a situation where there are two repos that are diverged and each pushes
in turn to git-remote-annex, the first to push updates it. Then the second
push fails because it is not a fast-forward. The problem is, before git
push fails with "non-fast-forward", it actually calls git-remote-annex
with push.

So, to the user it appears as if the push failed, but it actually reached
the remote, and overwrote the other push!

The only solution to this seems to be for git-remote-annex push to notice
when a non-force-push would overwrite a ref stored in the remote, and
refuse to push that ref, returning an error to git. This seems strange,
why would git make remote helpers implement that when it later checks the
same thing itself?
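
A minimal Python sketch of that check, assuming a toy commit graph given as a dictionary from commit id to parent ids. The function names are hypothetical, and the sketch only decides whether to accept the push; it does not update the remote. The `ok`/`error` reply lines follow the push status format from gitremote-helpers.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parents."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


def check_push(parents, remote_refs, ref, new_sha, force=False):
    """Refuse a non-force push that would overwrite a ref on the remote."""
    old_sha = remote_refs.get(ref)
    if force or old_sha is None or is_ancestor(parents, old_sha, new_sha):
        return f"ok {ref}"
    return f"error {ref} non-fast-forward"
```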

With this fix, it's still possible for a race to overwrite a change to
the MANIFEST and lose work that was pushed from the other repo. But that
needs two pushes to be running at the same time. From the user's
perspective, that situation is the same as if one repo pushed new work,
then the other repo did a git push --force, overwriting the first repo's
push. In the first repo, another push will then fail as a non
fast-forward, and the user can recover as usual. But, a MANIFEST
overwrite will leave bundle files in the remote that are not listed in
the MANIFEST. It seems likely that git-annex will eventually be able to
detect that after the fact and clean it up. Eg, it can learn all bundles
that are stored in the remote using the location log, and compare them
to the MANIFEST to find bundles that got lost.
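
The cleanup idea amounts to a set difference. A small Python sketch, assuming the list of bundle keys stored in the remote has already been obtained somehow (for example from the location log); `lost_bundles` is a made-up name:

```python
def lost_bundles(stored_keys, manifest_keys):
    """Bundles present in the remote but no longer listed in the MANIFEST."""
    return sorted(set(stored_keys) - set(manifest_keys))
```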

The race can also appear to the user as if they pushed a ref, but then
it got deleted from the remote. This happens when two pushes are
pushing different ref names. This might be harder for the user to
notice; git fetch does not indicate that a remote ref got deleted.
They would have to use git fetch --prune to notice the deletion.
Once the user does notice, they can re-push their ref to recover.

Sponsored-by: Jack Hill on Patreon
2024-04-26 15:03:04 -04:00
Joey Hess
99491f572f
TOPDIR 2024-04-26 13:27:16 -04:00
Joey Hess
6ff4300bd1
proof of concept for push to git bundles with MANIFEST
This is a shell script, so not final code, and it does not use git-annex
at all, but it shows how to push to git bundles, listed in a MANIFEST,
the same as the git-remote-annex program will eventually do.

While developing this, I realized that the design needed to be changed
slightly regarding where refs are stored. Since a push can delete a ref
from a remote, storing each newly pushed ref in a bundle won't work,
because deleting a ref would then entail deleting all old bundles and
re-uploading from scratch. So instead, only the refs in the last bundle
listed in the MANIFEST are the active refs. Any refs in prior bundles
are just old refs that were stored previously (a reflog as it were).

That means that, in a situation where two different people are pushing
to the same special remote from different repos, whoever pushes last
wins. Any refs pushed by the other person earlier will be ignored. This
may not be desirable, and git-annex might be able to use the git-annex
branch to detect such situations and rescue the refs that got lost. Even
without such a recovery process though, the refs that the other person
thought they pushed will be preserved in their refs/namespaces/mine, so
a pull followed by a push will generally resolve the situation.

Note that the use of refs/namespaces/mine in the bundle is not really
desirable, and it might be worth making a local clone of the repo in
order to set up the refs that will be put in the bundle. Which seems to
be the only way to avoid needing that. But it does need to maintain
the refs/namespaces/mine/ in the git repo in order to remember what refs
have been pushed to the remote before, in order to include them in the
next bundle pushed. A name that includes the remote uuid will be needed
in the final implementation.

Anyway, this shell script seems to fully work, including incremental
pushing, force pushing, and pushes that delete refs.

Sponsored-by: Brett Eisenberg on Patreon
2024-04-25 16:55:19 -04:00
Joey Hess
f900c56ca3
parameterize manifest on UUID
and expand slightly
2024-04-06 08:34:01 -04:00
Joey Hess
9b116870a6
added docs for git-remote-annex special remote contents
Designed with the help of Timothy Sanders and Michael Hanke
at Distribits 2024
2024-04-06 05:28:49 -04:00
Joey Hess
974455ea33
Merge branch 'master' of ssh://git-annex.branchable.com 2024-04-02 17:36:42 -04:00
Joey Hess
e216bd5f10
mention reversion 2024-04-02 17:33:20 -04:00
Joey Hess
a8dd85ea5a
Revert "multiple -m"
This reverts commit cee12f6a2f.

This commit broke git-annex init run in a repo that was cloned from a
repo with an adjusted branch checked out.

The problem is that findAdjustingCommit was not able to identify the
commit that created the adjusted branch. It seems that there is an extra
"\n" at the end of the commit message that it does not expect.

Since backwards compatibility needs to be maintained, cannot just make
findAdjustingCommit accept it with the "\n". Will have to instead
have one commitTree variant that uses the old method, and use it for
adjusted branch committing.
2024-04-02 17:29:07 -04:00
psxvoid
633a1b01a9 Added a comment: support for bulk write/read/test remote 2024-04-02 06:41:25 +00:00
oadams
8d858fdce2 Added a comment 2024-04-02 03:56:50 +00:00
d@403a635aa8eaa8bfa8613acb6a375d9e06ed7001
2b0df3a76d Added a comment 2024-03-27 22:11:47 +00:00
Joey Hess
96bbe9fafc
fixes 2024-03-27 16:00:18 -04:00
Joey Hess
7462d7c2d1
Merge branch 'master' of ssh://git-annex.branchable.com 2024-03-27 15:58:45 -04:00
Joey Hess
cee12f6a2f
multiple -m
sync, assist, import: Allow -m option to be specified multiple times, to
provide additional paragraphs for the commit message.

The option parser didn't allow multiple -m before, so there is no risk of
behavior change breaking something that was for some reason using multiple
-m already.

Pass through to git commands, so that the method used to assemble the
paragraphs is whatever git does. Which might conceivably change in the
future.
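
To illustrate the pass-through, here is a Python sketch that builds a `git commit-tree` command line with one `-m` per paragraph, leaving the joining of paragraphs to git. The helper name is hypothetical and this is not how git-annex's commitTree is written.

```python
def commit_tree_args(tree, parents, paragraphs):
    """Build a git commit-tree invocation with one -m per paragraph."""
    args = ["git", "commit-tree", tree]
    for parent in parents:
        args += ["-p", parent]
    for paragraph in paragraphs:
        # Each -m is handed to git unchanged; git decides how to
        # assemble them into the final commit message.
        args += ["-m", paragraph]
    return args
```

The resulting list could be handed to `subprocess.run` in a repository where the tree and parent ids exist.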

Note that git commit-tree has supported -m since git 1.7.7. commitTree
was probably not using it since it predates that version. Since the
configure script prevents building git-annex with git older than 2.1,
there is no risk that it's not supported now.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-03-27 15:58:27 -04:00
Joey Hess
377e9fff18
fix typo 2024-03-27 12:45:40 -04:00
d@403a635aa8eaa8bfa8613acb6a375d9e06ed7001
b886aefe69 Added a comment 2024-03-27 13:26:34 +00:00
Joey Hess
e32a5166a0
Merge branch 'master' of ssh://git-annex.branchable.com 2024-03-26 18:19:29 -04:00
Joey Hess
8d35ea976c
todo 2024-03-26 18:19:23 -04:00
yarikoptic
6b837d17c2 Added a comment 2024-03-26 19:13:15 +00:00
Joey Hess
e23721f579
fix build warning
A recent change made plumbing the backend through fsck unnecessary.

Left fsck checking backend and skipping operating on key when it could
not find one. Not checking the backend would be a behavior change.
For example the command git-annex fsck --key FOO--bar does nothing
since FOO is not a known backend. If this were removed it would
instead go on and fsck it and warn that no copies exist of the key.
That behavior change seems like it would be fine, but I also have no
reason to make it.
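
The behavior that was kept can be sketched in Python. This assumes a key has the shape `BACKEND[-field...]--name` and uses a small illustrative subset of backend names; the function names are invented for the example.

```python
# Small illustrative subset; git-annex knows many more backends.
KNOWN_BACKENDS = {"SHA256E", "SHA256", "WORM", "URL"}


def key_backend(key):
    """Return the backend name at the front of a key."""
    fields, separator, name = key.partition("--")
    if not separator or not name:
        raise ValueError("not a key: " + key)
    return fields.split("-", 1)[0]


def should_fsck(key):
    """Skip keys whose backend is unknown, as the commit message describes."""
    return key_backend(key) in KNOWN_BACKENDS
```

Under these assumptions `should_fsck("FOO--bar")` is false, so nothing is done for that key.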
2024-03-26 14:13:59 -04:00
Joey Hess
f601e06b90
avoid build warning on windows 2024-03-26 14:07:41 -04:00
Joey Hess
81608c3c37
windows build fix 2024-03-26 13:51:51 -04:00
Joey Hess
fc04e6fa58
comment 2024-03-26 13:48:57 -04:00
Joey Hess
962da7bcf9
update for new rclone gitannex command 2024-03-26 13:48:43 -04:00
Joey Hess
a3a09f20e9
Merge branch 'master' of ssh://git-annex.branchable.com 2024-03-26 13:17:09 -04:00
Joey Hess
a69871491f
avoid build warning on windows
Since append was only exported by Annex.Common on unix, excluding it
from import caused a build warning on windows.
2024-03-26 13:16:33 -04:00
Joey Hess
07baa7ffcf
fix build on windows
deletestale renamed from cleanstale
2024-03-26 13:12:58 -04:00
Joey Hess
418e97e847
fix build warnings on windows 2024-03-26 13:11:53 -04:00
Joey Hess
7c5007279c
Windows: Fix escaping output to terminal when using old versions of MinTTY 2024-03-26 13:09:21 -04:00
Joey Hess
db95de6f2b
clean up windows build warnings about unused imports 2024-03-26 13:06:52 -04:00
d@403a635aa8eaa8bfa8613acb6a375d9e06ed7001
4a6d17c8f9 Added a comment 2024-03-26 14:07:19 +00:00
Joey Hess
9f0295495d
Merge branch 'master' of ssh://git-annex.branchable.com 2024-03-25 14:53:02 -04:00
Joey Hess
331f9dd764
link to commit 2024-03-25 14:51:36 -04:00
Joey Hess
f04d9574d6
fix transfer lock file for Download to not include uuid
While redundant concurrent transfers were already prevented in most
cases, it failed to prevent the case where two different repositories were
sending the same content to the same repository. By removing the uuid
from the transfer lock file for Download transfers, one repository
sending content will block the other one from also sending the same
content.

In order to interoperate with old git-annex, the old lock file is still
locked, as well as locking the new one. That added a lot of extra code
and work, and the plan is to eventually stop locking the old lock file,
at some point in time when an old git-annex process is unlikely to be
running at the same time.
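
The effect of dropping the uuid from the lock file name can be shown with a small Python sketch. The directory layout and file names here are invented for illustration and do not match git-annex's real paths; only the idea (a shared new lock plus a per-uuid old lock) comes from the text above.

```python
import posixpath


def download_lock_files(lock_dir, key, sender_uuid):
    """Return [new lock, old lock] for a Download transfer (made-up layout)."""
    new_lock = posixpath.join(lock_dir, "download", "lck." + key)
    old_lock = posixpath.join(lock_dir, "download", sender_uuid, "lck." + key)
    return [new_lock, old_lock]


def would_block(locks_a, locks_b):
    """Two transfers block each other if they need any lock file in common."""
    return bool(set(locks_a) & set(locks_b))
```

Two senders with different uuids share the new lock file for the same key, so one blocks the other; with only the old per-uuid lock files they would not.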

Note that in the case of 2 repositories both doing eg
`git-annex copy foo --to origin`
the output is not that great:

copy b (to origin...)
  transfer already in progress, or unable to take transfer lock
git-annex: transfer already in progress, or unable to take transfer lock
97%   966.81 MiB      534 GiB/s 0sp2pstdio: 1 failed

  Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe))

  Transfer failed

Perhaps that output could be cleaned up? Anyway, it's a lot better than letting
the redundant transfer happen and then failing with an obscure error about
a temp file, which is what it did before. And it seems users don't often
try to do this, since nobody ever reported this bug to me before.
(The "97%" there is actually how far along the *other* transfer is.)

Sponsored-by: Joshua Antonishen on Patreon
2024-03-25 14:47:46 -04:00
Joey Hess
62129f0b24
fix windows transfer lock check
If the lock file was not able to be exclusively locked, don't indicate
locking failed. I'm pretty sure this was a typo. It goes all the way
back to 891c85cd88 where locking was first
introduced on windows, and there's no indication of why it would make
sense to return True here.

Sponsored-by: Leon Schuermann on Patreon
2024-03-25 14:11:25 -04:00
nobodyinperson
f937658edf Add news of git-annex merch from hellotux.com 2024-03-25 06:27:48 +00:00
nobodyinperson
7c7b9f5da9 Add link to hellotux.com git-annex shirts 2024-03-25 06:18:36 +00:00
Joey Hess
1ace305159
bug 2024-03-24 15:05:49 -04:00
Joey Hess
3d55a7ac02
comment 2024-03-22 11:07:00 -04:00