Commit graph

4543 commits

Author SHA1 Message Date
Joey Hess
2670508b97
also broke git-remote-annex 2024-05-24 13:35:45 -04:00
Joey Hess
b792b128a0
verified checkprereq
The case documented in its comment worked in a test push and clone.
2024-05-24 13:06:29 -04:00
Joey Hess
1a3c60cc8e
git-remote-annex: avoid bundle object leakage in push race or interrupted push
Locally record the manifest before uploading it or any bundles,
and read it on the next push. Any bundles from the push that are
not included in the currently being pushed manifest will get added
to the outManifest, and so eventually get deleted.

This deals with an interrupted push that is not resumed and instead
something else is pushed. And it deals with a push race that overwrites
the manifest.

Of course, this can't help if one of those situations is followed by
the local repo being deleted. But that's equivalent to doing a git-annex
copy of a new annexed file to a special remote and then deleting the
special repo w/o pushing. In either case the special remote ends up with
an object in it that git-annex doesn't know about.
2024-05-24 12:47:32 -04:00
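The bookkeeping described in commit 1a3c60cc8e above can be sketched roughly as follows. This is only an illustration in Python, not git-annex's actual Haskell code; the function names and the use of plain lists of bundle keys are assumptions made for the example.

```python
def leaked_bundles(recorded_locally, being_pushed):
    """Bundle keys from the locally recorded manifest that the manifest
    now being pushed does not include (hypothetical helper)."""
    current = set(being_pushed)
    return [key for key in recorded_locally if key not in current]


def extend_out_manifest(out_manifest, recorded_locally, being_pushed):
    """Add leaked bundle keys to the outManifest so that a later push
    can delete them. Keys already listed are not added twice."""
    extra = [
        key
        for key in leaked_bundles(recorded_locally, being_pushed)
        if key not in out_manifest
    ]
    return list(out_manifest) + extra
```

For example, if an interrupted push recorded bundles b1, b2 and b3 locally, and the next push only lists b1 and b4, then b2 and b3 are the keys that end up in the outManifest.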
Joey Hess
264c51b4f4
comment 2024-05-22 06:06:18 -04:00
Joey Hess
4131e31f5c
PATH_MAX 2024-05-22 04:26:36 -04:00
Joey Hess
5fb307f1c5
comment 2024-05-21 17:47:55 -04:00
Joey Hess
938e714a11
bleh 2024-05-21 17:32:49 -04:00
Joey Hess
10a60183e1
guard pushEmpty 2024-05-21 12:12:44 -04:00
Joey Hess
14c79373c4
update 2024-05-21 12:05:44 -04:00
Joey Hess
b3d7ae51f0
fix edge case where git-annex branch does not have config for enabled special remote
One way this could happen is cloning an empty special remote.
A later fetch would then fail.
2024-05-21 11:27:49 -04:00
Joey Hess
3e7324bbcb
only delete bundles on pushEmpty
This avoids some apparently otherwise unsolvable problems involving
races that resulted in the manifest listing bundles that were deleted.

Removed the annex-max-git-bundles config because it can't actually
result in deleting old bundles. It would still be possible to have a
config that controls how often to do a full push, which would avoid
needing to download too many bundles on clone, as well as needing to
checkpresent too many bundles in verifyManifest. But it would need a
different name and description.
2024-05-21 11:13:27 -04:00
Joey Hess
f544946b09
update 2024-05-21 10:20:30 -04:00
Joey Hess
b042dfeb0e
emptying pushes only delete 2024-05-21 09:52:35 -04:00
Joey Hess
5d40759470
formalize problem description 2024-05-21 09:35:46 -04:00
Joey Hess
3a38520aac
avoid interrupted push leaving remote without a manifest
Added a backup manifest key, which is used if the main manifest key is
not present. When uploading a new Manifest, it makes sure that it never
drops one key except when the other key is present.

It's entirely possible for the two manifest keys to get out of sync, due
to races. The main one wins when it's present, but dropping the main
one can expose the backup one, which has a different push recorded.
2024-05-20 15:41:09 -04:00
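The replacement order described in commit 3a38520aac above can be sketched as follows. This is a minimal Python illustration, not the real implementation: the in-memory FakeRemote class, the key names, and the assumption that a key has to be dropped before it can be stored again are all made up for the example.

```python
MAIN_KEY = "manifest"           # hypothetical key names
BACKUP_KEY = "manifest.backup"


class FakeRemote:
    """In-memory stand-in for a special remote (illustration only)."""

    def __init__(self):
        self.store = {}

    def present(self, key):
        return key in self.store

    def put(self, key, content):
        self.store[key] = content

    def drop(self, key):
        self.store.pop(key, None)

    def get(self, key):
        return self.store[key]


def replace_key(remote, key, other, content):
    """Store content under key, dropping the old copy only while the
    other manifest key is present on the remote."""
    if remote.present(key):
        if not remote.present(other):
            raise RuntimeError("refusing to drop the only manifest key")
        remote.drop(key)
    remote.put(key, content)


def upload_manifest(remote, content):
    if remote.present(MAIN_KEY):
        # Main is present, so the backup can be replaced first.
        replace_key(remote, BACKUP_KEY, MAIN_KEY, content)
        replace_key(remote, MAIN_KEY, BACKUP_KEY, content)
    else:
        # Main is missing (first push, or an interrupted one): write it
        # before touching whatever backup may exist.
        replace_key(remote, MAIN_KEY, BACKUP_KEY, content)
        replace_key(remote, BACKUP_KEY, MAIN_KEY, content)


def download_manifest(remote):
    """The main key wins when present, otherwise fall back to the backup."""
    for key in (MAIN_KEY, BACKUP_KEY):
        if remote.present(key):
            return remote.get(key)
    return None
```

With this ordering, an upload interrupted at any step leaves at least one of the two keys on the remote, so a later fetch finds either the old or the new manifest.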
Joey Hess
594ca2fd3a
update 2024-05-20 14:52:06 -04:00
Joey Hess
34a6db4f15
improve recovery from interrupted push
On push, first try to drop all outManifest keys listed in the current
manifest file, which resumes from an interrupted push that didn't
get a chance to delete those keys.

The new manifest gets its outManifest populated with the keys that were
in the old manifest, plus any of the keys that were unable to be
dropped.

Note that it would be possible for uploadManifest to skip dropping old
keys at all. The old keys would get dropped on the next push. But it
seems better to delete stuff immediately rather than waiting. And the
extra work is limited to push and typically is small.

A remote where dropKey always fails will result in an outManifest that
grows longer and longer. It would be possible to check if the remote
has appendonly = True and avoid populating the outManifest. Of course,
an appendonly remote will grow with every git push anyway. And currently
only Remote.GitLFS sets that, which can't be used as a git-remote-annex
remote anyway.
2024-05-20 13:49:45 -04:00
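How the outManifest gets populated in commit 34a6db4f15 above can be sketched like this. It is an illustrative Python sketch only: the Manifest dataclass, its field names, and the try_drop callback are assumptions, as is the choice to keep old keys that the new push still lists out of the outManifest.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    in_manifest: list                                  # bundle keys in use
    out_manifest: list = field(default_factory=list)   # keys awaiting deletion


def next_manifest(old, new_bundles, try_drop):
    """Build the manifest for a push.

    try_drop(key) is a hypothetical callback that returns True when the
    key was dropped from the remote.
    """
    # Resume an interrupted push: retry the deletions it did not finish.
    undropped = [key for key in old.out_manifest if not try_drop(key)]
    # Old bundle keys the new push no longer lists are scheduled for deletion.
    replaced = [key for key in old.in_manifest if key not in new_bundles]
    return Manifest(in_manifest=list(new_bundles),
                    out_manifest=replaced + undropped)
```

On a remote where dropping always fails, try_drop never returns True, so the outManifest grows with every push, which matches the caveat in the commit message.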
Joey Hess
ce60211881
add incremental vs full push race to todo
with plan to deal with it
2024-05-16 09:37:28 -04:00
Joey Hess
b1b6e35d4c
reorg todo 2024-05-15 17:41:55 -04:00
Joey Hess
adcebbae47
clean up git-remote-annex git-annex branch handling
Implemented alternateJournal, which git-remote-annex
uses to avoid any writes to the git-annex branch while setting up
a special remote from an annex:: url.

That prevents the remote.log from being overwritten with the special
remote configuration from the url, which might not be 100% the same as
the existing special remote configuration.

And it prevents the overwrite from deleting other stuff that was
already in the remote.log.

Also, when the branch was created by git-remote-annex, only delete it
at the end if nothing else has been written to it by another command.
This fixes the race condition described in
797f27ab05, where git-remote-annex
set up the branch and git-annex init and other commands were
run at the same time and their writes to the branch were lost.
2024-05-15 17:33:38 -04:00
Joey Hess
d24d8870c5
todo 2024-05-15 14:33:13 -04:00
Joey Hess
2dfffa0621
bugfix
When pushing branch foo, we don't want to delete other tracking
branches. In particular, a full push needs all the tracking branches.
2024-05-14 16:17:27 -04:00
Joey Hess
169e673ad4
result of some testing 2024-05-14 16:01:24 -04:00
Joey Hess
0722c504c5
update docs for git-remote-annex 2024-05-14 15:31:16 -04:00
Joey Hess
23c4125ed4
mention other commands shipped with git-annex in SEE ALSO in man page 2024-05-14 15:23:45 -04:00
Joey Hess
24af51e66d
git-annex unused --from remote skips its git-remote-annex keys
This turns out to only be necessary in edge cases. Most of the
time, git-annex unused --from remote doesn't see git-remote-annex keys
at all, because it does not record a location log for them.

On the other hand, git-annex unused does find them, since it does not
rely on the location log. And that's good because they're a local cache
that the user should be able to drop.

If, however, the user ran git-annex unused and then git-annex move
--unused --to remote, the keys would have a location log for that
remote. Then git-annex unused --from remote would see them, and would
consider them unused. Even when they are present on the special remote
they belong to. And that risks losing data if they drop the keys from
the special remote, but didn't expect it would delete git branches they
had pushed to it.

So, make git-annex unused --from skip git-remote-annex keys whose uuid
is the same as the remote.
2024-05-14 15:17:40 -04:00
Joey Hess
0bf72ef103
max-git-bundles config for git-remote-annex 2024-05-14 14:23:40 -04:00
Joey Hess
8ad768fdba
todo 2024-05-14 13:58:35 -04:00
Joey Hess
6f1039900d
prevent using git-remote-annex with unsuitable special remote configs
I hope to support importtree=yes eventually, but it does not currently
work.

Added remote.<name>.allow-encrypted-gitrepo that needs to be set to
allow using it with encrypted git repos.

Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
2024-05-14 13:52:20 -04:00
Joey Hess
8bf6dab615
update 2024-05-13 14:42:25 -04:00
Joey Hess
ddf05c271b
fix cloning from an annex:: remote with exporttree=yes
Updating the remote list needs the config to be written to the git-annex
branch, which was not done for good reasons. While it would be possible
to instead use Remote.List.remoteGen without writing to the branch, I
already have a plan to discard git-annex branch writes made by
git-remote-annex, so the simplest fix is to write the config to the
branch.

Sponsored-by: k0ld on Patreon
2024-05-13 14:35:17 -04:00
Joey Hess
552b000ef1
update 2024-05-13 14:30:18 -04:00
Joey Hess
34eae54ff9
git-remote-annex support exporttree=yes remotes
Put the annex objects in .git/annex/objects/ inside the export remote.
This way, when importing from the remote, they will be filtered out.

Note that, when importtree=yes, content identifiers are used, and this
means that pushing to a remote updates the git-annex branch. Urk.
Will need to try to prevent that later, but I already had a todo about
that for other reasons.

Untested!

Sponsored-By: Brock Spratlen on Patreon
2024-05-13 11:48:00 -04:00
Joey Hess
3f848564ac
refuse to fetch from a remote that has no manifest
Otherwise, it can be confusing to clone from a wrong url, since it fails
to download a manifest and so appears as if the remote exists but is empty.

Sponsored-by: Jack Hill on Patreon
2024-05-13 09:47:21 -04:00
Joey Hess
97b309b56e
extend manifest with keys to be deleted
This will eventually be used to recover from an interrupted fullPush
and drop the old bundle keys it was unable to delete.

Sponsored-by: Luke T. Shumaker on Patreon
2024-05-13 09:09:33 -04:00
Joey Hess
dfb09ad1ad
preparing to merge git-remote-annex
Update its todo with remaining items.

Add changelog entry.

Simplified internals document to no longer be notes to myself, but
target users who want to understand how the data is stored
and might want to extract these repos manually.

Sponsored-by: Kevin Mueller on Patreon
2024-05-10 15:06:15 -04:00
Yaroslav Halchenko
9c2ab31549
Fix compatable typo (yet to add to codespell)
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "git-sedi compatable compatible",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
2024-05-01 15:46:25 -04:00
Joey Hess
cbaf2172ab
started on a design for P2P protocol over HTTP
Added to git-annex_proxies todo because this is something OpenNeuro
would need in order to use the git-annex proxy.

Sponsored-by: Dartmouth College's OpenNeuro project
2024-05-01 15:26:51 -04:00
Joey Hess
d28adebd6b
number list 2024-05-01 12:19:12 -04:00
Joey Hess
0d0c891ff9
add headers for tocs 2024-05-01 12:18:14 -04:00
Joey Hess
4cd2c980d2
toc 2024-05-01 12:14:59 -04:00
Joey Hess
e7333aa505
fix link 2024-05-01 11:08:57 -04:00
Joey Hess
a612fe7299
add todo linking to two design docs and some related todos
Tagging with projects/openneuro as Christopher Markiewicz has oked
them funding at least the initial design work on this.
2024-05-01 11:04:20 -04:00
Joey Hess
5b36e6b4fb
comments 2024-04-30 16:08:46 -04:00
Joey Hess
f3cca8a9f8
applied patch 2024-04-30 15:17:38 -04:00
Joey Hess
1f37d0b00d
promote comment to todo 2024-04-30 15:13:59 -04:00
Joey Hess
84611e7ee6
todo 2024-04-26 04:03:10 -04:00
Joey Hess
e3c5f0079d
Merge branch 'master' of ssh://git-annex.branchable.com 2024-04-25 17:01:32 -04:00
Joey Hess
d895df1010
update 2024-04-25 17:01:17 -04:00
ErrGe
cafa9af811 2024-04-22 15:37:04 +00:00
ErrGe
67d92c3aee 2024-04-22 15:36:26 +00:00
ErrGe
649909cc94 2024-04-22 15:35:18 +00:00
Joey Hess
c410b2bb73
annex.maxextensions configuration
Controls how many filename extensions to preserve.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-04-18 14:23:38 -04:00
Joey Hess
b700c48b15
comment 2024-04-18 13:50:19 -04:00
Joey Hess
a1fd72b91a
update to focus on why this is still open 2024-04-18 12:40:53 -04:00
ErrGe
e44513cfe7 Added a comment: hook idea implementation is cool, but usage is not so simple for the enduser 2024-04-18 01:17:02 +00:00
Joey Hess
d55e3f5fe2
Merge branch 'master' of ssh://git-annex.branchable.com 2024-04-17 15:27:09 -04:00
Joey Hess
d372553540
rclone special remote
Added rclone special remote, which can be used without needing to install
the git-annex-remote-rclone program. This needs a new version of rclone,
which supports "rclone gitannex".

This is implemented as a variant of an external special remote, that
runs "rclone gitannex" instead of the usual git-annex-remote- command.
Parameterized Remote.External to support that.

Sponsored-by: Luke T. Shumaker on Patreon
2024-04-17 15:20:37 -04:00
Joey Hess
5c542c0382
update 2024-04-17 13:11:17 -04:00
yarikoptic
6bcb2f7f02 original possible todo on extension 2024-04-17 13:30:23 +00:00
mih
66222e9354 Added a comment: Need for more than HEAD/URL? 2024-04-15 05:00:58 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
eaece30f90 Added a comment: prior art 2024-04-13 20:30:56 +00:00
mih
08be93a5cb Thoughts on support special remotes that compute keys instead of downloading 2024-04-11 12:26:40 +00:00
Joey Hess
4bb5b7c519
comment 2024-04-10 13:00:16 -04:00
Joey Hess
38b1e8a36e
todo 2024-04-10 12:46:27 -04:00
Joey Hess
00593523c6
Merge branch 'master' of ssh://git-annex.branchable.com 2024-04-09 12:59:01 -04:00
Joey Hess
2c73845d90
multiple -m second try
Test suite passes this time. When committing the adjusted branch, use
the old method to make a message that old git-annex can consume. Also
made the code accept the new message, so that eventually
commitTreeExactMessage can be removed.

Sponsored-by: Kevin Mueller on Patreon
2024-04-09 12:56:47 -04:00
nobodyinperson
e3a58a2710 Added a comment: or use numcopies for safety 2024-04-09 10:48:10 +00:00
nobodyinperson
f020a804a2 Added a comment: when someone names files like keys, they probably want trouble 🙃 2024-04-09 10:43:16 +00:00
Joey Hess
69546f73ca
comment 2024-04-08 16:57:53 -04:00
lukasz.opiola@8b366725db99c2a5e0e638d1a5d57d457d0bdad4
5cb2186cd5 2024-04-08 13:56:32 +00:00
lukasz.opiola@8b366725db99c2a5e0e638d1a5d57d457d0bdad4
58180abd6b 2024-04-08 13:55:19 +00:00
lukasz.opiola@8b366725db99c2a5e0e638d1a5d57d457d0bdad4
a1616be8d6 2024-04-08 13:53:48 +00:00
Joey Hess
cefba1c4dc
Merge branch 'master' of ssh://git-annex.branchable.com 2024-04-08 09:16:34 -04:00
Joey Hess
967f887b95
update 2024-04-08 09:16:27 -04:00
grawity@dec5f8ddda45c421809e4687d9950f9ed2a03e46
3dfe165a05 2024-04-08 09:52:49 +00:00
Joey Hess
6401a1eed5
Merge branch 'master' of ssh://git-annex.branchable.com 2024-04-06 19:52:26 -04:00
Joey Hess
5060185a7b
project we started at Distribits 2024-04-06 19:52:07 -04:00
nobodyinperson
bae7076e43 Added a comment: more use cases for configurable default preferred content 2024-04-06 15:12:50 +00:00
Joey Hess
e216bd5f10
mention reversion 2024-04-02 17:33:20 -04:00
Joey Hess
a8dd85ea5a
Revert "multiple -m"
This reverts commit cee12f6a2f.

This commit broke git-annex init run in a repo that was cloned from a
repo with an adjusted branch checked out.

The problem is that findAdjustingCommit was not able to identify the
commit that created the adjusted branch. It seems that there is an extra
"\n" at the end of the commit message that it does not expect.

Since backwards compatibility needs to be maintained, cannot just make
findAdjustingCommit accept it with the "\n". Will have to instead
have one commitTree variant that uses the old method, and use it for
adjusted branch committing.
2024-04-02 17:29:07 -04:00
Joey Hess
cee12f6a2f
multiple -m
sync, assist, import: Allow -m option to be specified multiple times, to
provide additional paragraphs for the commit message.

The option parser didn't allow multiple -m before, so there is no risk of
behavior change breaking something that was for some reason using multiple
-m already.

Pass through to git commands, so that the method used to assemble the
paragraphs is whatever git does. Which might conceivably change in the
future.

Note that git commit-tree has supported -m since git 1.7.7. commitTree
was probably not using it since it predates that version. Since the
configure script prevents building git-annex with git older than 2.1,
there is no risk that it's not supported now.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-03-27 15:58:27 -04:00
Joey Hess
e32a5166a0
Merge branch 'master' of ssh://git-annex.branchable.com 2024-03-26 18:19:29 -04:00
Joey Hess
8d35ea976c
todo 2024-03-26 18:19:23 -04:00
yarikoptic
6b837d17c2 Added a comment 2024-03-26 19:13:15 +00:00
Joey Hess
962da7bcf9
update for new rclone gitannex command 2024-03-26 13:48:43 -04:00
Joey Hess
331f9dd764
link to commit 2024-03-25 14:51:36 -04:00
Joey Hess
f04d9574d6
fix transfer lock file for Download to not include uuid
While redundant concurrent transfers were already prevented in most
cases, it failed to prevent the case where two different repositories were
sending the same content to the same repository. By removing the uuid
from the transfer lock file for Download transfers, one repository
sending content will block the other one from also sending the same
content.

In order to interoperate with old git-annex, the old lock file is still
locked, as well as locking the new one. That added a lot of extra code
and work, and the plan is to eventually stop locking the old lock file,
at some point in time when an old git-annex process is unlikely to be
running at the same time.

Note that in the case of 2 repositories both doing eg
`git-annex copy foo --to origin`
the output is not that great:

copy b (to origin...)
  transfer already in progress, or unable to take transfer lock
git-annex: transfer already in progress, or unable to take transfer lock
97%   966.81 MiB      534 GiB/s 0sp2pstdio: 1 failed

  Lost connection (fd:14: hPutBuf: resource vanished (Broken pipe))

  Transfer failed

Perhaps that output could be cleaned up? Anyway, it's a lot better than letting
the redundant transfer happen and then failing with an obscure error about
a temp file, which is what it did before. And it seems users don't often
try to do this, since nobody ever reported this bug to me before.
(The "97%" there is actually how far along the *other* transfer is.)

Sponsored-by: Joshua Antonishen on Patreon
2024-03-25 14:47:46 -04:00
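The lock file change in commit f04d9574d6 above can be illustrated with a small sketch. The directory layout and file names below are hypothetical, chosen only to show the idea; they are not taken from git-annex's source.

```python
from pathlib import PurePosixPath

TRANSFER_DIR = PurePosixPath("transfer")  # hypothetical base directory


def legacy_lock_path(direction, uuid, key):
    """Old-style lock file, which includes the uuid of the other repository."""
    return TRANSFER_DIR / direction / uuid / f"lck.{key}"


def lock_paths(direction, uuid, key):
    """Lock files to take for a transfer.

    Downloads take a lock without the uuid, plus the legacy lock for
    interoperation with old versions. Uploads keep the legacy lock only.
    """
    legacy = legacy_lock_path(direction, uuid, key)
    if direction == "download":
        shared = TRANSFER_DIR / direction / f"lck.{key}"
        return [shared, legacy]
    return [legacy]
```

Because the first Download lock path does not depend on the uuid, two different repositories sending the same key contend for the same lock, and the second one is blocked.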
Joey Hess
7044232696
todo 2024-03-13 11:04:06 -04:00
Joey Hess
eb2cd944d9
update 2024-03-08 14:32:29 -04:00
Joey Hess
ad966e5e7b
update 2024-03-08 13:43:31 -04:00
Joey Hess
1bf02029f9
small problem 2024-03-05 13:45:31 -04:00
Joey Hess
3874b7364f
add todo for tracking free space in repos via git-annex branch
For balanced preferred content perhaps, or just for git-annex info
display.

Sponsored-by: unqueued on Patreon
2024-03-05 13:16:42 -04:00
Joey Hess
a6a7b8320a
Merge branch 'master' of ssh://git-annex.branchable.com 2024-03-01 16:53:13 -04:00
Joey Hess
e7652b0997
implement URL to VURL migration
This needs the content to be present in order to hash it. But it's not
possible for a module used by Backend.URL to call inAnnex because that
would entail a dependency loop. So instead, rely on the fact that
Command.Migrate calls inAnnex before performing a migration.

But Command.ExamineKey calls fastMigrate when the key may or may not
exist, and it does not want to actually perform a migration in any case.
To handle that, had to add an additional value to fastMigrate to
indicate whether the content is inAnnex.

Factored generateEquivilantKey out of Remote.Web.

Note that migrateFromURLToVURL hardcodes use of the SHA256E backend.
It would have been difficult not to, given all the dependency loop
issues. But --backend and annex.backend are used to tell git-annex
migrate to use VURL in any case, so there's no config knob that
the user could expect to configure that.

Sponsored-by: Brock Spratlen on Patreon
2024-03-01 16:42:02 -04:00
Joey Hess
cb50cdcc58
todo 2024-03-01 15:14:45 -04:00
Joey Hess
def94fbff6
update 2024-03-01 13:48:51 -04:00
Joey Hess
1b0de3021a
avoid double checksum when downloading VURL from web for 1st time
Sponsored-by: Jack Hill on Patreon
2024-03-01 13:44:40 -04:00
Joey Hess
4046f17ca0
incremental verification for VURL
Sponsored-by: Brett Eisenberg on Patreon
2024-03-01 13:33:29 -04:00
yarikoptic
283e071bcb has potential in DANDI project 2024-02-29 23:31:05 +00:00