Commit graph

44152 commits

Author SHA1 Message Date
Joey Hess
41edf73789
listImportableContents filtering to wanted files
This could in theory allow importing subsets of files with less memory
use. Rather than building up a big import list and then filtering it to
a smaller list of wanted files, support optionally filtering wanted
files first.

So far, the directory special remote implements it and will probably use
less memory. (Since dirContentsRecursiveSkipping does lazy streaming.)

Implementation in Remote.S3 is incomplete and fails to compile. Bit of a
mess with ResourceT needing to use Annex.

Also, in Remote.S3, filtering is not done for old versions.
And mkImportableContentsUnversioned is doing now redundant work
to filterwanted.
2023-12-20 15:55:09 -04:00
Joey Hess
d7ca716759
response 2023-12-20 13:12:56 -04:00
Joey Hess
da5726e790
Merge branch 'master' of ssh://git-annex.branchable.com 2023-12-20 12:52:46 -04:00
jkniiv
6c0259018a close bug as notabug due to user error 2023-12-19 23:12:21 +00:00
jkniiv
191dde2857 Added a comment: my report was actually a User Failure on my part 2023-12-19 23:02:16 +00:00
unqueued
09acfef0b6 Added a comment 2023-12-19 18:50:08 +00:00
Joey Hess
c64db46b7f
refactor 2023-12-18 21:35:00 -04:00
lemondata
19edfc69a7 2023-12-19 00:17:00 +00:00
Joey Hess
9a67ed0f10
importtree: support preferred content expressions needing keys
When importing from a special remote, support preferred content expressions
that use terms that match on keys (eg "present", "copies=1"). Such terms
are ignored when importing, since the key is not known yet.

When "standard" or "groupwanted" is used, the terms in those
expressions also get pruned accordingly.

This does allow setting preferred content to "not (copies=1)" to make a
special remote into a "source" type of repository. Importing from it will
import all files. Then exporting to it will drop all files from it.

In the case of setting preferred content to "present", it's pruned on
import, so everything gets imported from it. Then on export, it's applied,
and everything in it is left on it, and no new content is exported to it.

Since the old behavior on these preferred content expressions was for
importtree to error out, there's no backwards compatability to worry about.
Except that sync/pull/etc will now import where before it errored out.
2023-12-18 16:27:59 -04:00
Joey Hess
0e161a7404
comment 2023-12-18 13:56:08 -04:00
Joey Hess
93e0810ad5
comment 2023-12-18 13:49:53 -04:00
Joey Hess
f79685f05e
fix a typo 2023-12-18 13:40:23 -04:00
Joey Hess
0a9c5724c1
Merge branch 'master' of ssh://git-annex.branchable.com 2023-12-18 12:37:57 -04:00
Atemu
4dbbc45b4d Added a comment 2023-12-18 12:08:08 +00:00
nobodyinperson
526545bf48 Added a comment: numcopies is no the target 2023-12-17 19:04:41 +00:00
Atemu
43262855e2 2023-12-17 16:28:19 +00:00
oadams
7c108335eb Added a comment 2023-12-16 21:37:22 +00:00
Joey Hess
e777ad9e87
Merge branch 'master' of ssh://git-annex.branchable.com 2023-12-16 17:07:11 -04:00
aaa
b09c85bf1a Added a comment: Key permissions 2023-12-12 22:29:24 +00:00
jkniiv
986b9caa80 Added a comment 2023-12-12 19:25:17 +00:00
Joey Hess
9f17383c00
Merge branch 'master' of ssh://git-annex.branchable.com 2023-12-12 10:49:46 -04:00
imlew
488ffce640 Added a comment 2023-12-12 13:36:21 +00:00
imlew
261bd2af55 Added a comment 2023-12-12 13:32:17 +00:00
nobodyinperson
7aebfd6068 Added a comment 2023-12-12 12:44:38 +00:00
imlew
bee99dc1a4 Added a comment 2023-12-12 11:56:06 +00:00
Joey Hess
eb59da9dd2
Lower precision of timestamps in git-annex branch
This can reduce the size of the branch by up to 8%. My test was
running git-annex add 1000 times on one file each.
Lots of different high-resolution timestamps were recorded before
and eliminating those, after packing, the git repo was 8% smaller.

Due to the use of vector clocks, high resolution timestamps are
not necessary to make clear which information is most recent when
eg, a value is changed repeatedly in the same second. In such a
case, the vector clock will be advanced to the next second after
the last modification. For example, running
git-annex numcopies 1; git-annex numcopies 2
The first will record the current second, while the next records
the second after that even if it runs in the same second.

As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that
was written earlier in a second over information written later in the
same second. Usually git-annex does not write conflicting information,
but there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store the
same object, and end up storing different remote state for some reason,
this can result in one that ran a tiny bit later winning. Such a situation
seems unlikely to be user visible. And a small amount of clock skew could
already result in such things.

The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run
in 2 clones of a repository on the same machine at very
close to the same time. Then the user will know which they ran last,
and git-annex won't.

If that did become a problem, this could be dialed back to eg log
milliseconds with still some space saving.
2023-12-11 15:04:06 -04:00
Joey Hess
86dbe9a825
migrate: support adding size back to URL keys
migrate: Support adding size to URL keys that were added with --relaxed, by
running eg: git-annex migrate --backend=URL foo

Since url keys cannot be generated, that used to fail. Make it notice that
the backend is not changed, and just get the size of the content.

Sponsored-by: Brock Spratlen on Patreon
2023-12-08 16:22:14 -04:00
Joey Hess
cb9bb2027c
update for distributed migration 2023-12-08 14:39:38 -04:00
Joey Hess
60d00fdd33
Merge branch 'master' of ssh://git-annex.branchable.com 2023-12-08 14:26:26 -04:00
Joey Hess
362a2808a5
split out todo for special remotes and close the main todo 2023-12-08 14:26:08 -04:00
Joey Hess
76e11e4458
Merge branch 'master' into distributedmigration 2023-12-08 14:18:23 -04:00
Joey Hess
257f01729c
distributed migration for pull and sync --content
pull, sync: When operating on content, automatically hard link objects
that have been migrated.

Added annex.syncmigrations config that can be set to false to prevent
pull and sync from migrating object content.

I think that true is a good default for this config, because it avoids
users having to re-download migrated content or learning about migration.
But, some users will surely not like it, whether because it does take some
time (especially for the first git-annex branch scan when there is a long
history), or because they want to deal with it manually, or because their
filesystem doesn't support hard links and they don't want it to copy
objects.

Sponsored-by: k0ld on Patreon
2023-12-08 14:18:18 -04:00
Joey Hess
4ed71b34de
migrate --apply
And avoid migrate --update/--aply migrating when the new key was already
present in the repository, and got dropped. Luckily, the location log
allows distinguishing from the new key never having been present!

That is mostly useful for --apply because otherwise dropped files would
keep coming back until the old objects were reaped as unused. But it
seemed to make sense to also do it for --update. for consistency in edge
cases if nothing else. One case where --update can use it is when one
branch got migrated earlier, and we dropped the file, and now another
branch has migrated the same file.

Sponsored-by: Jack Hill on Patreon
2023-12-08 13:23:46 -04:00
Joey Hess
51b974d9f0
skip distributed migration to insecure key when annex.securehashesonly is set
This only avoids extra work and a warning messsage. It seems likely that
in such a situation, the user does not want migrations to insecure
hashes, and so best to ignore them as much as possible. If
the user merges a branch that switches annexed files to an insecure
hash, they will notice that the file contents are unavailable,
and git-annex get will tell them the problem then. So it does not seem
useful to have migrate --update also complain about it.
2023-12-08 12:41:50 -04:00
jkniiv
9189c41dde report on the 'Production' build flag not producing a binary that passes the test suite 2023-12-08 16:38:12 +00:00
Joey Hess
b65379a107
fix missing space in warning message 2023-12-08 12:36:33 -04:00
Joey Hess
30c2728d65
always verify content in distributed migration
doc/todo/distributed_migration.mdwn discusses security of distributed
migration, and this was identified as necessary to do.
2023-12-07 20:05:42 -04:00
Joey Hess
62ce56c4ea
display filenames in migrate --update
Have to go to a lot of bother to find them, but I think it's worth it
for usability.

Sponsored-by: Luke T. Shumaker on Patreon
2023-12-07 18:00:09 -04:00
Joey Hess
abea01d9e0
migrate --update fully working
Could use some more testing.

When the old key is not present, Command.ReKey.linkKey' will return
False, so this handles that case ok.

But, I do wonder if distributed migration may need to deal with the old
key getting copied into the repository later. In that situation,
re-running migrate --update won't link it to the new key. It may be that
some users will need that. They can delete .git/annex/migrate.log and
run it again, but that is not a good user interface. Maybe either have
a way to re-run all distributed migrations, or record migrations
in a database and scan the db to find migrations to do in a future run?

Sponsored-by: Kevin Mueller on Patreon
2023-12-07 17:27:51 -04:00
Joey Hess
7c7c9912c1
migrate --update gets keys
The git log is outputting the diff, but this only looks at the new
files. When we have a new file, we can get the old filename by just
replacing "new" with "old". And then use branchFileRef to refer to it
allows catting the old key.

While this does have to skip past the old files in the diff, it's still
faster than calling git diff separately.

Sponsored-by: Nicholas Golder-Manning on Patreon
2023-12-07 17:25:56 -04:00
kolam
bae20e09cb Added a comment 2023-12-07 20:16:29 +00:00
kolam
5aee7ef234 removed 2023-12-07 20:13:13 +00:00
kolam
319e3b6252 Added a comment 2023-12-07 20:12:31 +00:00
kolam
acd942f6d8 Added a comment 2023-12-07 20:08:52 +00:00
Joey Hess
f1ce15036f
started migrate --update
This is most of the way there, but not quite working.

The layout of migrate.tree/ needs to be changed to follow this approach.
git log will list all the files in tree order, so the new layout needs
to alternate old and new keys. Can that be done? git may not document
tree order, or may not preserve it here.

Alternatively, change to using git log --format=raw and extract
the tree header from that, then use
git diff --raw $tree:migrate.tree/old $tree:migrate.tree/new
That will be a little more expensive, but only when there are lots of
migrations.

Sponsored-by: Joshua Antonishen on Patreon
2023-12-07 15:50:52 -04:00
kolam
625deffec4 2023-12-07 18:30:40 +00:00
kolam
2aa13feb3d 2023-12-07 14:08:54 +00:00
https://esgf-node.llnl.gov/esgf-idp/openid/mvhulten
60e9bb005e Added a comment 2023-12-07 12:30:25 +00:00
nobodyinperson
df62843f64 Added a comment: Similar to initremote type=git 2023-12-07 08:31:09 +00:00
kolam
84d0bd9969 2023-12-06 23:46:49 +00:00