Commit graph

35128 commits

Author SHA1 Message Date
Matthew
4a9e637d36 Added a comment: Help with .nfsXXXX files 2024-08-19 21:20:59 +00:00
matrss
9cfdae4c3b Added a comment 2024-08-19 10:25:13 +00:00
Joey Hess
68a99a8f48
size based rebalancing design 2024-08-18 16:25:12 -04:00
Joey Hess
99514f9d18
maxsize overview display and --json support 2024-08-18 12:08:13 -04:00
xentac
74b953cded Added a comment 2024-08-18 03:17:12 +00:00
Joey Hess
f985c58d8e
consistently don't show sizes of empty repositories
This used to be the case, and when matching options are used, that code
path still omits them, so also omit them in the getRepoSize code path.
2024-08-17 15:09:16 -04:00
Joey Hess
b62b58b50b
git-annex info speed up using getRepoSizes 2024-08-17 14:54:31 -04:00
Joey Hess
d09a005f2b
update RepoSize database from git-annex branch incrementally
The use of catObjectStream is optimally fast. Although it might be
possible to combine this with git-annex branch merge to avoid some
redundant work.

Benchmarking, a git-annex branch that had 100000 files changed
took less than 1.88 seconds to run through this.
2024-08-17 13:35:00 -04:00
Spencer
40b49e2ddd Added a comment: Remote Helper? 2024-08-17 05:33:01 +00:00
matrss
bcf876e3a0 2024-08-16 15:52:32 +00:00
matrss
f057010086 Added a comment 2024-08-16 15:45:45 +00:00
Joey Hess
61d95627f3
fix Annex.repoSize sharing between threads 2024-08-16 10:56:51 -04:00
Joey Hess
e361b9ea3c
todo 2024-08-15 16:15:48 -04:00
Joey Hess
63ccf6ffa7
todo 2024-08-15 13:50:50 -04:00
Joey Hess
4a0c7e2b2c
update 2024-08-15 13:41:47 -04:00
Joey Hess
a2da9c526b
RepoSize concurrency fix
When loading the journalled repo sizes, make sure that the current
process is prevented from making changes to the journal in another
thread.
2024-08-15 13:37:41 -04:00
Joey Hess
06064f897c
update Annex.reposizes when changing location logs
The live update is only needed when Annex.reposizes has already been
populated.
2024-08-15 13:27:14 -04:00
Joey Hess
c376b1bd7e
show message when doing possibly expensive from scratch reposize calculation 2024-08-15 12:42:36 -04:00
Joey Hess
c200523bac
implement getRepoSizes
At this point the RepoSize database is getting populated, and it
all seems to be working correctly. Incremental updates still need to be
done to make it performant.
2024-08-15 12:31:56 -04:00
Joey Hess
eac4e9391b
finalize RepoSize database
Including locking on creation, handling of permissions errors, and
setting repo sizes.

I'm confident that locking is not needed while using this database.
Since writes happen in a single transaction. When there are two writers
that are recording sizes based on different git-annex branch commits,
one will overwrite what the other one recorded. Which is fine, it's only
necessary that the database stays consistent with the content of a
git-annex branch commit.
2024-08-15 12:29:34 -04:00
Atemu
e8997d8899 Added a comment 2024-08-15 15:40:20 +00:00
Joey Hess
3e6eb2a58d
implement journalledRepoSizes
Plan is to run this when populating Annex.reposizes on demand.
So Annex.reposizes will be up-to-date with the journal, including
crucially journal entries for private repositories. But also
anything that has been written to the journal by another process,
especially if the process was ran with annex.alwayscommit=false.

From there, Annex.reposizes can be kept up to date with changes made
by the running process.
2024-08-14 13:53:24 -04:00
pedro-lopes-de-azevedo
c75ecc5350 Added a comment: parameter --from not accepted 2024-08-14 14:27:54 +00:00
bvaa
11eb2ae6ec Added a comment 2024-08-14 07:18:26 +00:00
Joey Hess
90a79a6c1e
plan 2024-08-13 15:13:30 -04:00
Joey Hess
a979d8da41
update 2024-08-13 14:14:47 -04:00
Joey Hess
10d8b3cc63
fixed --rebalance stability on drop
Was checking the wrong uuid, oops
2024-08-13 13:32:11 -04:00
Joey Hess
745bc5c547
take maxsize into account for balanced preferred content
This is very innefficient, it will need to be optimised not to
calculate the sizes of repos every time.

Also, fixed a bug in balancedPicker that caused it to pick a too high
index when some repos were excluded due to being full.
2024-08-13 11:00:20 -04:00
Spencer
05a62e4e5f Added a comment: Workaround: --force-small 2024-08-13 07:05:57 +00:00
Spencer
3d252da06c Added a comment: Exact Moment Things Go Wrong 2024-08-13 06:22:11 +00:00
Spencer
ab5f920d77 .md linting 2024-08-13 04:46:53 +00:00
Spencer
8a91a8c208 2024-08-13 04:46:10 +00:00
Spencer
c4296fbd45 Added a comment: Still a Problem (on Mac?) 2024-08-13 04:21:33 +00:00
ewen
491cf67ce2 Added a comment: Most servers upgraded to TLS v1.2 EMS / TLS v1.3 2024-08-13 00:01:05 +00:00
Joey Hess
b201792391
update 2024-08-12 18:57:03 -04:00
Joey Hess
1e799e7842
update 2024-08-12 11:56:52 -04:00
Joey Hess
71043fe9f7
update 2024-08-12 10:01:48 -04:00
Joey Hess
bcd2b9a5c4
idea 2024-08-12 09:43:14 -04:00
Joey Hess
1265d7e5df
implement maxsize log and command
* maxsize: New command to tell git-annex how large the expected maximum
  size of a repository is.
* vicfg: Include maxsize configuration.
2024-08-11 15:41:26 -04:00
Joey Hess
3019b21c40
more formal documentation of balancing 2024-08-11 13:29:06 -04:00
Joey Hess
bd5affa362
use hmac in balanced preferred content
This deals with the possible security problem that someone could make an
unusually low UUID and generate keys that are all constructed to hash to
a number that, mod the number of repositories in the group, == 0.
So balanced preferred content would always put those keys in the
repository with the low UUID as long as the group contains the
number of repositories that the attacker anticipated.
Presumably the attacker than holds the data for ransom? Dunno.

Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs
combined together as the "secret", and the key as the "message". Now any
change in the set of UUIDs in a group will invalidate the attacker's
constructed keys from hashing to anything in particular.

Given that there are plenty of other things someone can do if they can
write to the repository -- including modifying preferred content so only
their repository wants files, and numcopies so other repositories drom
them -- this seems like safeguard enough.

Note that, in balancedPicker, combineduuids is memoized.
2024-08-10 16:32:54 -04:00
Joey Hess
bde58e6c71
todo 2024-08-09 16:57:10 -04:00
Joey Hess
412f6057e4
todo 2024-08-09 16:47:28 -04:00
xentac
fb186ab0a8 Added a comment 2024-08-09 19:31:12 +00:00
xentac
55a5cb7904 2024-08-09 19:22:19 +00:00
Joey Hess
f1cb5cb908
wrote git-annex maxsize man page 2024-08-09 14:57:11 -04:00
Joey Hess
5a6afff3d6
left off number option 2024-08-09 14:22:05 -04:00
Joey Hess
3ce2e95a5f
balanced preferred content and --rebalance
This all works fine. But it doesn't check repository sizes yet, and
without repository size checking, once a repository gets full, there
will be no other repository that will want its files.

Use of sha2 seems unncessary, probably alder2 or md5 or crc would have
been enough. Possibly just summing up the bytes of the key mod the number
of repositories would have sufficed. But sha2 is there, and probably
hardware accellerated. I doubt very much there is any security benefit
to using it though. If someone wants to construct a key that will be
balanced onto a given repository, sha2 is certianly not going to stop
them.
2024-08-09 14:16:09 -04:00
Joey Hess
152c87140b
update 2024-08-08 16:06:02 -04:00
Joey Hess
0959bfe5d3
update for exporttree=yes 2024-08-08 15:51:36 -04:00