Commit graph

44979 commits

Author SHA1 Message Date
Joey Hess
ce60211881
add incremental vs full push race to todo
with plan to deal with it
2024-05-16 09:37:28 -04:00
Joey Hess
434a88c368
Merge branch 'git-remote-annex' 2024-05-15 17:57:50 -04:00
Joey Hess
768cdee461
testremote: Really fsck downloaded objects
8844372c23 exposed a bug in testremote: it
was passing the serialized key, not the object file, to be checksummed.
2024-05-15 17:57:27 -04:00
Joey Hess
468de43d66
Merge branch 'master' into git-remote-annex 2024-05-15 17:49:12 -04:00
Joey Hess
b1b6e35d4c
reorg todo 2024-05-15 17:41:55 -04:00
Joey Hess
adcebbae47
clean up git-remote-annex git-annex branch handling
Implemented alternateJournal, which git-remote-annex
uses to avoid any writes to the git-annex branch while setting up
a special remote from an annex:: url.

That prevents the remote.log from being overwritten with the special
remote configuration from the url, which might not be 100% the same as
the existing special remote configuration.

And it prevents an overwrite from deleting other stuff that was
already in the remote.log.

Also, when the branch was created by git-remote-annex, only delete it
at the end if nothing else has been written to it by another command.
This fixes the race condition described in
797f27ab05, where git-remote-annex
set up the branch and git-annex init and other commands were
run at the same time and their writes to the branch were lost.
2024-05-15 17:33:38 -04:00
Joey Hess
d24d8870c5
todo 2024-05-15 14:33:13 -04:00
Joey Hess
2dfffa0621
bugfix
When pushing branch foo, we don't want to delete other tracking
branches. In particular, a full push needs all the tracking branches.
2024-05-14 16:17:27 -04:00
Joey Hess
169e673ad4
result of some testing 2024-05-14 16:01:24 -04:00
Joey Hess
7dd2a67c41
fix names of new git configs 2024-05-14 15:33:47 -04:00
Joey Hess
0722c504c5
update docs for git-remote-annex 2024-05-14 15:31:16 -04:00
Joey Hess
23c4125ed4
mention other commands shipped with git-annex in SEE ALSO in man page 2024-05-14 15:23:45 -04:00
Joey Hess
24af51e66d
git-annex unused --from remote skips its git-remote-annex keys
This turns out to only be necessary in edge cases. Most of the
time, git-annex unused --from remote doesn't see git-remote-annex keys
at all, because it does not record a location log for them.

On the other hand, git-annex unused does find them, since it does not
rely on the location log. And that's good because they're a local cache
that the user should be able to drop.

If, however, the user ran git-annex unused and then git-annex move
--unused --to remote, the keys would have a location log for that
remote. Then git-annex unused --from remote would see them, and would
consider them unused, even when they are present on the special remote
they belong to. And that risks losing data if the user drops the keys
from the special remote without expecting that to delete git branches
they had pushed to it.

So, make git-annex unused --from skip git-remote-annex keys whose uuid
is the same as the remote.
2024-05-14 15:17:40 -04:00
Joey Hess
0bf72ef103
max-git-bundles config for git-remote-annex 2024-05-14 14:23:40 -04:00
Joey Hess
8ad768fdba
todo 2024-05-14 13:58:35 -04:00
Joey Hess
6f1039900d
prevent using git-remote-annex with unsuitable special remote configs
I hope to support importtree=yes eventually, but it does not currently
work.

Added remote.<name>.allow-encrypted-gitrepo that needs to be set to
allow using it with encrypted git repos.

Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
2024-05-14 13:52:20 -04:00
Joey Hess
e154c6da92
bug report (copied from email) 2024-05-13 17:11:34 -04:00
Joey Hess
8bf6dab615
update 2024-05-13 14:42:25 -04:00
Joey Hess
ddf05c271b
fix cloning from an annex:: remote with exporttree=yes
Updating the remote list needs the config to be written to the git-annex
branch, which was not done for good reasons. While it would be possible
to instead use Remote.List.remoteGen without writing to the branch, I
already have a plan to discard git-annex branch writes made by
git-remote-annex, so the simplest fix is to write the config to the
branch.

Sponsored-by: k0ld on Patreon
2024-05-13 14:35:17 -04:00
Joey Hess
552b000ef1
update 2024-05-13 14:30:18 -04:00
Joey Hess
13a6a20716
fix --is-ancestor option 2024-05-13 13:52:58 -04:00
Joey Hess
34eae54ff9
git-remote-annex support exporttree=yes remotes
Put the annex objects in .git/annex/objects/ inside the export remote.
This way, when importing from the remote, they will be filtered out.

Note that, when importtree=yes, content identifiers are used, and this
means that pushing to a remote updates the git-annex branch. Urk.
Will need to try to prevent that later, but I already had a todo about
that for other reasons.

Untested!

Sponsored-By: Brock Spratlen on Patreon
2024-05-13 11:48:00 -04:00
Joey Hess
3f848564ac
refuse to fetch from a remote that has no manifest
Otherwise, it can be confusing to clone from a wrong url, since it fails
to download a manifest and so appears as if the remote exists but is empty.

Sponsored-by: Jack Hill on Patreon
2024-05-13 09:47:21 -04:00
Joey Hess
424afe46d7
fix incremental push to preserve existing bundle keys in manifest
Also broke Manifest out to its own type with a smart constructor.

Sponsored-by: mycroft on Patreon
2024-05-13 09:47:05 -04:00
Joey Hess
97b309b56e
extend manifest with keys to be deleted
This will eventually be used to recover from an interrupted fullPush
and drop the old bundle keys it was unable to delete.

Sponsored-by: Luke T. Shumaker on Patreon
2024-05-13 09:09:33 -04:00
Joey Hess
0281f7f23e
Avoid the --fast option preventing checksumming in some cases it was not supposed to
fsck --fast was intended to disable checksumming, but checksumming is done
after transfers too. Due to the check being in the non-incremental path,
it would only affect non-incremental checksumming during a transfer,
and I'm not 100% sure that it was a problem.

Also, when using an external backend that does checksumming, fsck --fast
didn't disable it and now does.
2024-05-12 21:36:48 -04:00
Joey Hess
8844372c23
remove doesPathExist check in checkKeyCheckSum
Before commit c565340adc,
it was statting the file in order to get its size, which was needed to
use an external hasher. In that commit since it no longer needed the
stat, I made it check doesPathExist since the old code implicitly
checked if the file existed. But that is not necessary; this should only
ever be called on files that exist.
2024-05-12 21:31:58 -04:00
Joey Hess
05684bdd6c
fsck: Fix recent reversion that made it say it was checksumming files whose content is not present
Did not track down the commit that caused the problem, but git-annex
version 10.20240431 didn't behave that way.
2024-05-12 21:23:27 -04:00
Joey Hess
1f62bc861a
delete shell prototype 2024-05-10 15:09:56 -04:00
Joey Hess
dfb09ad1ad
preparing to merge git-remote-annex
Update its todo with remaining items.

Add changelog entry.

Simplified internals document to no longer be notes to myself, but
target users who want to understand how the data is stored
and might want to extract these repos manually.

Sponsored-by: Kevin Mueller on Patreon
2024-05-10 15:06:15 -04:00
Joey Hess
4d0543932e
pushEmpty: upload empty manifest 2024-05-10 14:40:38 -04:00
Joey Hess
ff5193c6ad
Merge branch 'master' into git-remote-annex 2024-05-10 14:20:36 -04:00
Joey Hess
1250bb26a0
reject annex:: url that omits a uuid
Such as annex::?type=foo&...

I accidentally left out the uuid when creating one,
and the result is that it appears to clone an empty repository.
So let's guard against that mistake.
2024-05-10 13:59:35 -04:00
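
The check described in the commit above can be illustrated with a minimal sketch. It is not git-annex's actual parser (which is written in Haskell); the function name `parse_annex_url` is hypothetical, and the `annex::<uuid>?key=value&...` layout is assumed from the examples quoted in these commit messages.

```python
from urllib.parse import parse_qsl


def parse_annex_url(url):
    """Split an annex:: url into (uuid, config), rejecting a missing uuid.

    Hypothetical helper for illustration only; the url layout is an
    assumption based on the examples in the commit messages.
    """
    prefix = "annex::"
    if not url.startswith(prefix):
        raise ValueError("not an annex:: url")
    rest = url[len(prefix):]
    # Everything before the first '?' is taken to be the uuid.
    uuid, _, query = rest.partition("?")
    if not uuid:
        # Guard against urls such as annex::?type=foo&...
        raise ValueError("annex:: url is missing a uuid")
    config = dict(parse_qsl(query, keep_blank_values=True))
    return uuid, config
```

With this sketch, a url like `annex::?type=foo` raises an error instead of being treated as an empty repository.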
Joey Hess
ef5e9aa082
git-remote-annex working
A few bugfixes. Have not tested extensively, but a push followed by a
clone worked.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-05-10 13:55:46 -04:00
Joey Hess
3039331529
git-remote-annex: incremental pushing
Untested

Sponsored-by: Joshua Antonishen on Patreon
2024-05-10 13:32:37 -04:00
Joey Hess
f2d17cf154
git-remote-annex: mostly implemented pushing
Full pushing will probably work, but is untested.
Incremental pushing is not implemented yet.

While a fairly straightforward port of the shell prototype, the details
of exactly how to get the objects to the remote were tricky. And the
prototype did not consider how to deal with partial failures and
interruptions.

I've taken considerable care to make sure it always leaves things in a
consistent state when interrupted or when it loses access to a remote in
the middle of a push.

Sponsored-by: Leon Schuermann on Patreon
2024-05-09 16:18:10 -04:00
Joey Hess
797f27ab05
handle cloning from a special remote that does not contain a git-annex branch
It did not seem possible to avoid creating a git-annex branch while
git-remote-annex is running. Special remotes can even store their own
state in it. So instead, if it didn't exist before git-remote-annex
created it, it deletes it at the end.

This does possibly allow a race condition, where git-annex init and
perhaps other git-annex commands that write to the git-annex branch
are run at the same time as a git-remote-annex process is being
run by git fetch/push with a full annex:: url. Those writes would be
lost. If the repository has already been initialized before
git-remote-annex, that race won't happen. So it's pretty unlikely.

Sponsored-by: Graham Spencer on Patreon
2024-05-08 18:37:43 -04:00
Joey Hess
59fc2005ec
git clone support for git-remote-annex
Also support using annex:: urls that specify the whole special remote
config.

Both of these cases need a special remote to be initialized enough to
use it, which means writing to .git/config but not to the git-annex
branch. When cloning, the remote is left set up in .git/config,
so further use of it, by git-annex or git-remote-annex will work. When
using git with an annex:: url, a temporary remote is written to
.git/config, but then removed at the end.

While that's a little bit ugly, the fact is that the Remote interface
expects that it's ok to set git configs of the remote that is being
initialized. And it's nowhere near as ugly as the alternative of making
a temporary git repository and initializing the special remote in there.

Cloning from a repository that does not contain a git-annex branch and
then later running git-annex init is currently broken, although I've
gotten most of the way there to supporting it.
See cleanupInitialization FIXME.

Special shout out to git clone for running gitremote-helpers with
GIT_DIR set, but not in the git repository and with GIT_WORK_TREE not
set. Resulting in needing the fixupRepo hack.

Sponsored-by: unqueued on Patreon
2024-05-08 17:07:33 -04:00
Joey Hess
df5011ec43
git-remote-annex: fix hang on fetch
Sponsored-by: k0ld on Patreon
2024-05-07 15:34:55 -04:00
Joey Hess
cdcf2fe3a2
git-remote-annex can fetch from an existing special remote
Tested using a manually populated directory special remote.

Pushing is still to be done. So is fetching from special remotes
configured via the annex:: url.

Sponsored-by: Brock Spratlen on Patreon
2024-05-07 15:13:41 -04:00
Joey Hess
a89e8f6bad
skip remotes with an annex:: url
These remotes are not regular git remotes, they are special remotes that
git uses git-remote-annex to access.

Sponsored-by: Jack Hill on Patreon
2024-05-07 15:02:20 -04:00
Joey Hess
947cf1c345
back to annex:: for git-remote-annex url
Oh, turns out git needs two colons to use a gitremote-helper. Ok.
2024-05-07 14:37:29 -04:00
Joey Hess
e1447dc2e2
add git bundle interface
Sponsored-by: mycroft on Patreon
2024-05-07 14:22:41 -04:00
Joey Hess
8d58a23548
add git for-each-ref binding
Sponsored-by: Luke T. Shumaker on Patreon
2024-05-07 14:22:04 -04:00
Joey Hess
c7731cdbd9
add Backend.GitRemoteAnnex
Making GITBUNDLE be in the backend list allows those keys to be
hashed to verify, both when git-remote-annex downloads them, and by other
transfers and by git-annex fsck.

GITMANIFEST is not in the backend list, because those keys will never be
stored in .git/annex/objects and can't be verified in any case.

This does mean that git-annex version will include GITBUNDLE in the list
of backends.

Also documented these in backends.mdwn

Sponsored-by: Kevin Mueller on Patreon
2024-05-07 13:54:08 -04:00
Joey Hess
483887591d
working toward git-remote-annex using a special remote
Not quite there yet.

Also, changed the format of GITBUNDLE keys to use only one '-'
after the UUID. A sha256 does not contain that character, so can just
split at the last one.

Amusingly, the sha256 will probably not actually be verified. A git
bundle contains its own checksums that git uses to verify it. And if
someone wanted to replace the content of a GITBUNDLE object, they
could just edit the manifest to use a new one whose sha256 does verify.

Sponsored-by: Nicholas Golder-Manning
2024-05-06 16:28:04 -04:00
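
The "split at the last one" idea from the commit above can be sketched as follows. This is only an illustration, not the git-annex implementation: the function name is hypothetical, and the input is assumed to be just the `<UUID>-<sha256>` part of a key name. Since a UUID itself contains '-' while a hex sha256 does not, splitting at the last '-' separates the two unambiguously.

```python
from typing import Tuple


def split_uuid_and_sha256(name: str) -> Tuple[str, str]:
    """Split '<UUID>-<sha256>' at the last '-'.

    Hypothetical helper; the exact GITBUNDLE key layout is an assumption.
    """
    # rpartition splits at the final '-', so the dashes inside the UUID
    # stay in the first component.
    uuid, sep, sha256 = name.rpartition("-")
    if not sep or not uuid or not sha256:
        raise ValueError("expected '<UUID>-<sha256>'")
    return uuid, sha256
```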
Joey Hess
f4ba6e0c1e
add annex: url parser
Changed the format of the url to use annex: rather than annex::

The reason is that in the future, might want to support an url that
includes an uriAuthority part, eg:

annex://foo@example.com:42/358ff77e-0bc3-11ef-bc49-872e6695c0e3?type=directory&encryption=none&directory=/mnt/foo/

To parse that foo@example.com:42 as an uriAuthority it needs to start with
annex: rather than annex::

That would also need something to be done with uriAuthority, and also
the uriPath (the UUID) is prefixed with "/" in that example. So the
current parser won't handle that example. But this leaves the
possibility for expansion.

Sponsored-by: Joshua Antonishen on Patreon
2024-05-06 14:50:41 -04:00
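
To see why the single-colon form leaves room for an authority part, here is a rough sketch using Python's standard `urllib.parse` rather than the Haskell URI type the commit message refers to. The function name and the returned dictionary layout are hypothetical. As the message notes, the UUID arrives as a path with a leading "/", which has to be stripped.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_annex_authority_url(url):
    """Parse an annex://user@host:port/<uuid>?... url (illustrative only)."""
    parts = urlsplit(url)
    if parts.scheme != "annex":
        raise ValueError("not an annex: url")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        # The path carries the UUID, prefixed with "/".
        "uuid": parts.path.lstrip("/"),
        "config": dict(parse_qsl(parts.query)),
    }
```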
Joey Hess
4b94fc371e
implement gitremote-helpers protocol parsing
Sponsored-by: Leon Schuermann on Patreon
2024-05-06 14:07:27 -04:00
Joey Hess
f17fa48b7c
ignore git-remote-annex 2024-05-06 13:13:39 -04:00
Joey Hess
306ea42447
improve git-remote-annex docs
renamed the git config to something shorter too
2024-05-06 13:06:22 -04:00