Commit graph

34698 commits

Author SHA1 Message Date
Joey Hess
2f20b939b7
LiveUpdate db updates working
I've tested the behavior of the thread that waits for the LiveUpdate to
be finished, and it does get signaled and exit cleanly when the
LiveUpdate is GCed instead.

Made finishedLiveUpdate wait for the thread to finish updating the
database.

There is a case where GC doesn't happen in time and the database is left
with a live update recorded in it. This should not be a problem as such
stale data can also happen when interrupted and will need to be detected
when loading the database.

Balanced preferred content expressions now call startLiveUpdate.
2024-08-24 11:49:58 -04:00
Joey Hess
84d1bb746b
LiveUpdate for clusters 2024-08-24 10:20:12 -04:00
Joey Hess
18cd8bf43a
punt on LiveUpdate plumbing through assistant for now 2024-08-24 09:37:24 -04:00
yarikoptic
efdee386c0 initial report on desire to handle pathspecs 2024-08-24 01:35:31 +00:00
yarikoptic
c3877f648c initial idea on another ability for get 2024-08-24 01:23:04 +00:00
Joey Hess
c3d40b9ec3
plumb in LiveUpdate (WIP)
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.

So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.

There may be some calls to NoLiveUpdate in places where a live update
should be done. All will need to be double checked.

Not currently in a compilable state.
2024-08-23 16:35:12 -04:00
Joey Hess
4885073377
add live size changes to RepoSize database
Not yet used.
2024-08-23 12:51:00 -04:00
Joey Hess
dad1fb150f
update 2024-08-23 11:45:36 -04:00
Joey Hess
d0ab1550ec
possible design to address reposizes concurrency issues 2024-08-23 11:19:38 -04:00
gauss@055c9051f507c97fa5612f46c74ce636f5ecde10
d71ca87bc9 Added a comment: No root privileges server - annex-shell replaced by git-annex-shell 2024-08-23 01:51:49 +00:00
Joey Hess
8ade3fc5d6
improve docs 2024-08-22 08:09:10 -04:00
Joey Hess
abdd49d8c1
update 2024-08-22 07:53:56 -04:00
Joey Hess
173500872f
update 2024-08-22 07:17:04 -04:00
Joey Hess
70e2fca257
Added the annex.fullybalancedthreshhold git config. 2024-08-22 07:15:55 -04:00
Joey Hess
3fe67744b1
display new empty repos in maxsize table
A new repo that has no location log info yet, but has an entry in
uuid.log has 0 size, so make RepoSize aware of that.

Note that a new repo that does not yet appear in uuid.log will still not
be displayed.

When a remote is added but not synced with yet, it has no uuid.log
entry. If git-annex maxsize is used to configure that remote, it needs
to appear in the maxsize table, and the change to Command.MaxSize takes
care of that.
2024-08-22 07:03:22 -04:00
Spencer
acaa8e9cd5 Added a comment: Precise Workflow 2024-08-22 00:18:28 +00:00
Joey Hess
76ece2a699
make --rebalance of balanced use fullysizebalanced when useful
When the specified number of copies is > 1, and some repositories are
too full, it can be better to move content from them to other less full
repositories, in order to make space for new content.

annex.fullybalancedthreshhold is documented, but not implemented yet

This is not tested very well yet, and is known to sometimes take several
runs to stabilize.
2024-08-21 17:59:08 -04:00
Joey Hess
9e87061de2
Support "sizebalanced=" and "fullysizebalanced=" too
Might want to make --rebalance turn balanced=group:N where N > 1
to fullysizebalanced=group:N. Have not yet determined if that will
improve situations enough to be worth the extra work.
2024-08-21 15:01:54 -04:00
Joey Hess
4e1dcc0372
bug 2024-08-21 12:18:31 -04:00
Joey Hess
476d223bce
implement fullbalanced=group:N
Rebalancing this when it gets into a suboptimal situation will need
further work.
2024-08-20 13:51:02 -04:00
Matthew
4a9e637d36 Added a comment: Help with .nfsXXXX files 2024-08-19 21:20:59 +00:00
matrss
9cfdae4c3b Added a comment 2024-08-19 10:25:13 +00:00
Joey Hess
68a99a8f48
size based rebalancing design 2024-08-18 16:25:12 -04:00
Joey Hess
99514f9d18
maxsize overview display and --json support 2024-08-18 12:08:13 -04:00
xentac
74b953cded Added a comment 2024-08-18 03:17:12 +00:00
Joey Hess
f985c58d8e
consistently don't show sizes of empty repositories
This used to be the case, and when matching options are used, that code
path still omits them, so also omit them in the getRepoSize code path.
2024-08-17 15:09:16 -04:00
Joey Hess
b62b58b50b
git-annex info speed up using getRepoSizes 2024-08-17 14:54:31 -04:00
Joey Hess
d09a005f2b
update RepoSize database from git-annex branch incrementally
The use of catObjectStream is optimally fast. Although it might be
possible to combine this with git-annex branch merge to avoid some
redundant work.

Benchmarking, a git-annex branch that had 100000 files changed
took less than 1.88 seconds to run through this.
2024-08-17 13:35:00 -04:00
Spencer
40b49e2ddd Added a comment: Remote Helper? 2024-08-17 05:33:01 +00:00
matrss
bcf876e3a0 2024-08-16 15:52:32 +00:00
matrss
f057010086 Added a comment 2024-08-16 15:45:45 +00:00
Joey Hess
61d95627f3
fix Annex.repoSize sharing between threads 2024-08-16 10:56:51 -04:00
Joey Hess
e361b9ea3c
todo 2024-08-15 16:15:48 -04:00
Joey Hess
63ccf6ffa7
todo 2024-08-15 13:50:50 -04:00
Joey Hess
4a0c7e2b2c
update 2024-08-15 13:41:47 -04:00
Joey Hess
a2da9c526b
RepoSize concurrency fix
When loading the journalled repo sizes, make sure that the current
process is prevented from making changes to the journal in another
thread.
2024-08-15 13:37:41 -04:00
Joey Hess
06064f897c
update Annex.reposizes when changing location logs
The live update is only needed when Annex.reposizes has already been
populated.
2024-08-15 13:27:14 -04:00
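The rule in commit 06064f897c (only adjust the in-memory sizes when they have already been populated) can be illustrated with a small sketch. This is not git-annex's Haskell code: `RepoSizeCache`, `populate`, and `location_changed` are hypothetical names chosen for illustration, and the assumption is that a location log change adds or subtracts one key's size for one repository.

```python
class RepoSizeCache:
    """Hypothetical stand-in for the in-memory Annex.reposizes value."""

    def __init__(self):
        # None means "not populated yet"; sizes are then computed on demand later.
        self.sizes = None

    def populate(self, sizes):
        # Assumed to be called once, with sizes loaded from the database/journal.
        self.sizes = dict(sizes)

    def location_changed(self, uuid, key_size, now_present):
        # Nothing to keep up to date if the sizes were never loaded.
        if self.sizes is None:
            return
        delta = key_size if now_present else -key_size
        self.sizes[uuid] = self.sizes.get(uuid, 0) + delta
```

Skipping the update while the cache is unpopulated is safe in this sketch because a later `populate` would read the already-recorded change from persistent state.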
Joey Hess
c376b1bd7e
show message when doing possibly expensive from scratch reposize calculation 2024-08-15 12:42:36 -04:00
Joey Hess
c200523bac
implement getRepoSizes
At this point the RepoSize database is getting populated, and it
all seems to be working correctly. Incremental updates still need to be
done to make it performant.
2024-08-15 12:31:56 -04:00
Joey Hess
eac4e9391b
finalize RepoSize database
Including locking on creation, handling of permissions errors, and
setting repo sizes.

I'm confident that locking is not needed while using this database,
since writes happen in a single transaction. When there are two writers
that are recording sizes based on different git-annex branch commits,
one will overwrite what the other one recorded. Which is fine, it's only
necessary that the database stays consistent with the content of a
git-annex branch commit.
2024-08-15 12:29:34 -04:00
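The reasoning in commit eac4e9391b (a single write transaction keeps the recorded sizes consistent with one git-annex branch commit, even if two writers race) can be sketched with SQLite. The table names `repo_sizes` and `branch_commit` and the helper functions are assumptions for illustration, not the actual RepoSize schema.

```python
import sqlite3


def open_db():
    # In-memory database with a hypothetical two-table schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repo_sizes (uuid TEXT PRIMARY KEY, size INTEGER)")
    conn.execute("CREATE TABLE branch_commit (sha TEXT)")
    return conn


def record_sizes(conn, sizes, branch_sha):
    # One transaction: the sizes and the branch commit they were computed
    # from are replaced together, or not at all (rollback on error).
    with conn:
        conn.execute("DELETE FROM repo_sizes")
        conn.executemany(
            "INSERT INTO repo_sizes (uuid, size) VALUES (?, ?)",
            list(sizes.items()),
        )
        conn.execute("DELETE FROM branch_commit")
        conn.execute("INSERT INTO branch_commit (sha) VALUES (?)", (branch_sha,))


def read_sizes(conn):
    sizes = dict(conn.execute("SELECT uuid, size FROM repo_sizes"))
    row = conn.execute("SELECT sha FROM branch_commit").fetchone()
    return sizes, (row[0] if row else None)
```

If a second writer records sizes for a different branch commit, its transaction simply replaces the first writer's rows, so a reader always sees sizes that match the recorded commit.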
Atemu
e8997d8899 Added a comment 2024-08-15 15:40:20 +00:00
Joey Hess
3e6eb2a58d
implement journalledRepoSizes
Plan is to run this when populating Annex.reposizes on demand.
So Annex.reposizes will be up-to-date with the journal, including
crucially journal entries for private repositories. But also
anything that has been written to the journal by another process,
especially if the process was run with annex.alwayscommit=false.

From there, Annex.reposizes can be kept up to date with changes made
by the running process.
2024-08-14 13:53:24 -04:00
pedro-lopes-de-azevedo
c75ecc5350 Added a comment: parameter --from not accepted 2024-08-14 14:27:54 +00:00
bvaa
11eb2ae6ec Added a comment 2024-08-14 07:18:26 +00:00
Joey Hess
90a79a6c1e
plan 2024-08-13 15:13:30 -04:00
Joey Hess
a979d8da41
update 2024-08-13 14:14:47 -04:00
Joey Hess
10d8b3cc63
fixed --rebalance stability on drop
Was checking the wrong uuid, oops
2024-08-13 13:32:11 -04:00
Joey Hess
745bc5c547
take maxsize into account for balanced preferred content
This is very inefficient, it will need to be optimised not to
calculate the sizes of repos every time.

Also, fixed a bug in balancedPicker that caused it to pick a too high
index when some repos were excluded due to being full.
2024-08-13 11:00:20 -04:00
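The index bug fixed in commit 745bc5c547 can be illustrated as follows. This is a sketch under assumptions, not the real balancedPicker: `pick_repo` and its parameters are hypothetical, a repository counts as full when storing the key would exceed its configured maxsize, and the index is taken modulo the number of repositories that remain after full ones are excluded.

```python
def pick_repo(uuids, sizes, maxsizes, key_size, hash_value):
    """Pick a repository for a key, skipping repositories that are full.

    uuids:      iterable of repository UUID strings in the group
    sizes:      dict uuid -> current size in bytes (missing means 0)
    maxsizes:   dict uuid -> configured maxsize (missing means unlimited)
    hash_value: non-negative integer derived from the key (assumed given)
    """
    candidates = [
        u for u in sorted(uuids)
        if u not in maxsizes or sizes.get(u, 0) + key_size <= maxsizes[u]
    ]
    if not candidates:
        return None
    # Index into the filtered list, not the original one, so the index can
    # never be too high when some repositories were excluded.
    return candidates[hash_value % len(candidates)]
```

For example, with repositories `a`, `b`, `c` where `b` is full, a hash value of 2 selects `a` here, whereas indexing modulo the original group size of 3 would point past the end of the two remaining candidates.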
Spencer
05a62e4e5f Added a comment: Workaround: --force-small 2024-08-13 07:05:57 +00:00
Spencer
3d252da06c Added a comment: Exact Moment Things Go Wrong 2024-08-13 06:22:11 +00:00
Spencer
ab5f920d77 .md linting 2024-08-13 04:46:53 +00:00
Spencer
8a91a8c208 2024-08-13 04:46:10 +00:00
Spencer
c4296fbd45 Added a comment: Still a Problem (on Mac?) 2024-08-13 04:21:33 +00:00
ewen
491cf67ce2 Added a comment: Most servers upgraded to TLS v1.2 EMS / TLS v1.3 2024-08-13 00:01:05 +00:00
Joey Hess
b201792391
update 2024-08-12 18:57:03 -04:00
Joey Hess
1e799e7842
update 2024-08-12 11:56:52 -04:00
Joey Hess
71043fe9f7
update 2024-08-12 10:01:48 -04:00
Joey Hess
bcd2b9a5c4
idea 2024-08-12 09:43:14 -04:00
Joey Hess
1265d7e5df
implement maxsize log and command
* maxsize: New command to tell git-annex how large the expected maximum
  size of a repository is.
* vicfg: Include maxsize configuration.
2024-08-11 15:41:26 -04:00
Joey Hess
3019b21c40
more formal documentation of balancing 2024-08-11 13:29:06 -04:00
Joey Hess
bd5affa362
use hmac in balanced preferred content
This deals with the possible security problem that someone could make an
unusually low UUID and generate keys that are all constructed to hash to
a number that, mod the number of repositories in the group, == 0.
So balanced preferred content would always put those keys in the
repository with the low UUID as long as the group contains the
number of repositories that the attacker anticipated.
Presumably the attacker then holds the data for ransom? Dunno.

Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs
combined together as the "secret", and the key as the "message". Now any
change in the set of UUIDs in a group will invalidate the attacker's
constructed keys from hashing to anything in particular.

Given that there are plenty of other things someone can do if they can
write to the repository -- including modifying preferred content so only
their repository wants files, and numcopies so other repositories drop
them -- this seems like safeguard enough.

Note that, in balancedPicker, combineduuids is memoized.
2024-08-10 16:32:54 -04:00
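The HMAC scheme described in commit bd5affa362 can be sketched roughly as below. Only the overall idea comes from the commit message (HMAC-SHA256 with the combined UUIDs as the secret and the key as the message); the function names, the sorting and concatenation of UUIDs, and the mapping of the digest to an index with a modulo are assumptions made for illustration.

```python
import hashlib
import hmac


def combined_secret(uuids):
    # Assumption: the UUIDs are sorted and concatenated to form the secret.
    return "".join(sorted(uuids)).encode("utf-8")


def balanced_pick(uuids, key):
    """Return the UUID of the repository that should hold the given key."""
    ordered = sorted(uuids)
    digest = hmac.new(
        combined_secret(ordered), key.encode("utf-8"), hashlib.sha256
    ).digest()
    # Interpret the digest as a big integer and reduce it to a list index.
    index = int.from_bytes(digest, "big") % len(ordered)
    return ordered[index]
```

Because the secret depends on every UUID in the group, adding or removing a repository changes the digest for every key, so keys constructed in advance no longer hash to a predictable repository.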
Joey Hess
bde58e6c71
todo 2024-08-09 16:57:10 -04:00
Joey Hess
412f6057e4
todo 2024-08-09 16:47:28 -04:00
xentac
fb186ab0a8 Added a comment 2024-08-09 19:31:12 +00:00
xentac
55a5cb7904 2024-08-09 19:22:19 +00:00
Joey Hess
f1cb5cb908
wrote git-annex maxsize man page 2024-08-09 14:57:11 -04:00
Joey Hess
5a6afff3d6
left off number option 2024-08-09 14:22:05 -04:00
Joey Hess
3ce2e95a5f
balanced preferred content and --rebalance
This all works fine. But it doesn't check repository sizes yet, and
without repository size checking, once a repository gets full, there
will be no other repository that will want its files.

Use of sha2 seems unnecessary, probably adler32 or md5 or crc would have
been enough. Possibly just summing up the bytes of the key mod the number
of repositories would have sufficed. But sha2 is there, and probably
hardware accelerated. I doubt very much there is any security benefit
to using it though. If someone wants to construct a key that will be
balanced onto a given repository, sha2 is certainly not going to stop
them.
2024-08-09 14:16:09 -04:00
Joey Hess
152c87140b
update 2024-08-08 16:06:02 -04:00
Joey Hess
0959bfe5d3
update for exporttree=yes 2024-08-08 15:51:36 -04:00
Joey Hess
727b6a0b6d
update 2024-08-08 15:34:36 -04:00
Joey Hess
2616056cde
Merge branch 'exportreeplus' 2024-08-08 15:31:57 -04:00
Joey Hess
3b758aaad6
add news item for git-annex 10.20240808 2024-08-08 15:27:11 -04:00
Joey Hess
3ea835c7e8
proxied exporttree=yes versionedexport=yes remotes are not untrusted
This removes versionedExport, which was only used by the S3 special
remote. Instead, versionedexport=yes is a common way for remotes to
indicate that they are versioned.
2024-08-08 15:24:19 -04:00
Joey Hess
5c36177e58
proxied exporttree=yes remotes are untrustworthy
This is not perfect because it does not handle versioned special
remotes, which should not be untrustworthy, but now are when proxied.

The implementation turned out to be easy, because the exporttree field
is a default field, so is available in RemoteConfig even for git
remotes.
2024-08-08 14:43:53 -04:00
Joey Hess
b23c7f769e
update 2024-08-08 14:25:18 -04:00
Joey Hess
9663888c77
update 2024-08-08 14:05:05 -04:00
Joey Hess
a2eb3b450a
post-receive: use the exporttree=yes remote as a source
This handles cases where a single key is used by multiple files in the
exported tree. When using `git-annex push`, the key's content gets
stored in the annexobjects location, and then when the branch is pushed,
it gets renamed from the annexobjects location to the first exported
file. For subsequent exported files, a copy of the content needs to be
made. This causes it to download the key from the remote in order to
upload another copy to it.

This is not needed when using `git push` followed by `git-annex copy --to`
the proxied remote, because the received key is stored at all export
locations then.

Also, fixed handling of the synced branch push, it was exporting master
when synced/master was pushed.

Note that currently, the first push to the remote does not see that it
is able to get a key from it in order to upload it back. It displays
"(not available)". The second push is able to. Since git-annex push
pushes first the synced branch and then the branch, this does end up
with a full export being made, but it is not quite right.
2024-08-08 13:49:53 -04:00
Joey Hess
7294d23d78
export: Added --from option
This is similar to git-annex copy --from --to, in that it downloads a
local copy, locks it for removal, uploads it, and drops it. Removal of
the temporary local copy is done without verifying numcopies for the
same reason as that command.

I do wonder, looking at this, if there's a race where the local copy
gets used as a copy to allow some other drop in the narrow window after
it is downloaded and before it gets locked for removal. That would need
some other repository to have an out of date location log that says the
repository contains a copy of the key, in order for it to try to use it
as a copy. If there is such a race, git-annex copy/move would also be
vulnerable to it. It would be better to lock it for removal before
starting to download it! That is possible in v10 repositories, which do
use a separate content lock file.

Note that, when the exported tree contains several files that use the
same key, it will be downloaded repeatedly, once per time needed to
upload it. It would be possible to avoid that extra work, but it would
complicate this since the local copy would need to be preserved, locked
for removal, until the end. Also, that would mean that interrupting the
export would leave possibly a lot of temporarily downloaded keys in the
local repository, while currently it can only leave one.
2024-08-08 12:08:55 -04:00
Joey Hess
01edd186e9
update proxied exporttree=yes remote on receive of sync branch
Since git-annex sync sends the sync branch first, and only displays the
output of the push to the sync branch, this makes git-annex
post-receive's output when updating the exported tree be visible when
syncing.

This also makes syncing with a non-bare repository still update the
exported tree, even when the checked out branch is not able to be
updated. The sync branch gets sent regardless.
2024-08-07 13:11:06 -04:00
Joey Hess
55adbb6694
avoid trying to export tree to proxied exporttree=yes remotes
This avoids a lot of ugly messages when syncing with such a remote.
The export tree happens on the proxy side.
2024-08-07 13:00:19 -04:00
Joey Hess
6d96734128
updateproxy, updatecluster check annexobjects=yes
updateproxy, updatecluster: Prevent using an exporttree=yes special remote
that does not have annexobjects=yes, since it will not work.
2024-08-07 12:27:24 -04:00
Joey Hess
8864a9e353
update 2024-08-07 11:49:53 -04:00
Joey Hess
1e0f13ad7f
comment 2024-08-07 11:39:29 -04:00
Joey Hess
b8f8c38e88
Merge branch 'master' into exportreeplus 2024-08-07 11:28:21 -04:00
Joey Hess
509b23fa00
catch ClientError from withClientM
When getting from a P2P HTTP remote, prompt for credentials when required,
instead of failing.

This feels like it might be a bug in servant-client. withClientM's type
suggests it would not throw a ClientError. But it does in this case.
2024-08-07 11:24:34 -04:00
Joey Hess
43e1f590c9
comment 2024-08-07 10:47:47 -04:00
Joey Hess
1038567881
proxy stores received keys to known export locations
This handles the workflow where the branch is first pushed to the proxy,
and then files in the exported tree are later copied to the proxied remote.

Turns out that the way the export log is structured, nothing needs
to be done to finalize the export once the last key is sent to it. Which
is great because that would have been a lot of complication. On
receiving the push, Command.Export runs and calls recordExportBeginning,
does as much as it can to update the export with the files currently
on it, and then calls recordExportUnderway. At that point, the
export.log records the export as "complete", but it's not really. And
that's fine. The same happens when using `git-annex export` when some
files are not available to send. Other repositories that have
access to the special remote can already retrieve files from it. As
the missing files get copied to the exported remote, all that needs
to be done is record each in the export db.

At this point, proxying to exporttree=yes annexobjects=yes special remotes
is fully working. Except for in the case where multiple files in the
tree use the same key, and the files are sent to the proxied remote
before pushing the tree.

It seems that even special remotes without annexobjects=yes will work if
used with the workflow where the git-annex branch is pushed before
copying files. But not with the `git-annex push` workflow.
2024-08-07 09:47:34 -04:00
matrss
3ccbcc5662 2024-08-07 12:12:29 +00:00
git-annex@82b5fddc759dffdf749b19add6f0be2a0c78b62c
d3cc84db3b 2024-08-07 12:05:53 +00:00
git-annex@82b5fddc759dffdf749b19add6f0be2a0c78b62c
e8f60e7daa 2024-08-07 12:04:42 +00:00
Joey Hess
ba1cb517c0
update 2024-08-06 14:46:56 -04:00
Joey Hess
c53f61e93f
Merge branch 'master' into exportreeplus 2024-08-06 14:46:33 -04:00
Joey Hess
f01d872059
fixed 2024-08-06 14:42:46 -04:00
Joey Hess
3289b1ad02
proxying to exporttree=yes annexobjects=yes basically working
It works when using git-annex sync/push/assist, or when manually sending
all content to the proxied remote before pushing to the proxy remote.
But when the push comes before the content is sent, sending content does
not update the exported tree.
2024-08-06 14:21:23 -04:00
Joey Hess
be5c86c248
refine 2024-08-06 12:15:18 -04:00
Joey Hess
4750ffbd3b
finalized design for proxying to exporttree=yes annexobjects=yes special remotes 2024-08-06 11:45:45 -04:00
Joey Hess
84d27cf34f
update 2024-08-06 11:13:51 -04:00
matrss
6d1592f857 2024-08-06 12:44:18 +00:00
Spencer
66ff2bc833 Added a comment: D: Correct 2024-08-05 22:17:55 +00:00