git-annex

Author	SHA1	Message	Date
Joey Hess	06064f897c	update Annex.reposizes when changing location logs The live update is only needed when Annex.reposizes has already been populated.	2024-08-15 13:27:14 -04:00
Joey Hess	c376b1bd7e	show message when doing possibly expensive from scratch reposize calculation	2024-08-15 12:42:36 -04:00
Joey Hess	c200523bac	implement getRepoSizes At this point the RepoSize database is getting populated, and it all seems to be working correctly. Incremental updates still need to be done to make it performant.	2024-08-15 12:31:56 -04:00
Joey Hess	bba23e7cc9	do not need a db queue This database is read once and written at most once per run.	2024-08-15 12:31:27 -04:00
Joey Hess	eac4e9391b	finalize RepoSize database Including locking on creation, handling of permissions errors, and setting repo sizes. I'm confident that locking is not needed while using this database, since writes happen in a single transaction. When there are two writers that are recording sizes based on different git-annex branch commits, one will overwrite what the other one recorded. Which is fine, it's only necessary that the database stays consistent with the content of a git-annex branch commit.	2024-08-15 12:29:34 -04:00
Joey Hess	63a3cedc45	slightly improve hairy types	2024-08-14 16:04:18 -04:00
Joey Hess	3e6eb2a58d	implement journalledRepoSizes Plan is to run this when populating Annex.reposizes on demand. So Annex.reposizes will be up-to-date with the journal, including crucially journal entries for private repositories. But also anything that has been written to the journal by another process, especially if the process was run with annex.alwayscommit=false. From there, Annex.reposizes can be kept up to date with changes made by the running process.	2024-08-14 13:53:24 -04:00
Joey Hess	8ac2685b33	calcBranchRepoSizes without journal files This will be used to prime the RepoSizes database, which will always contain values that correspond to information in the git-annex branch, so without anything from journal files. Factored out overJournalFileContents which will later be used to update Annex.reposizes to include information from journal files. This will be particularly important to support private UUIDs which only ever get to journal files and not to the branch.	2024-08-14 03:19:30 -04:00
Joey Hess	90a79a6c1e	plan	2024-08-13 15:13:30 -04:00
Joey Hess	343c87db45	improve haddocks	2024-08-13 15:05:49 -04:00
Joey Hess	f612ebb934	avoid changing git-annex info behavior `5afbea25e7` changed it to ignore journal files that did not correspond to a key in the git-annex branch. However, when there is a private journal, that can happen. Neither behavior is fully correct, so keep the old incorrect behavior rather than introducing a new differently incorrect behavior. I plan to eventually make git-annex info use Annex.reposizes instead of calculating it itself, and once Annex.reposizes handles this all correctly, this will be a moot problem.	2024-08-13 14:17:20 -04:00
Joey Hess	a979d8da41	update	2024-08-13 14:14:47 -04:00
Joey Hess	08f55948e9	take all repository locations into account for balancing This fully fixes --rebalance stability, and also deals with an issue where a file is present in each balanced repository and it didn't want to remove it from any.	2024-08-13 13:46:47 -04:00
Joey Hess	10d8b3cc63	fixed --rebalance stability on drop Was checking the wrong uuid, oops	2024-08-13 13:32:11 -04:00
Joey Hess	5afbea25e7	avoid counting size of keys that are in the journal twice In calcRepoSizes and also git-annex info, when a key was in the journal, it was passed to the callback twice, so the calculated size was wrong.	2024-08-13 13:23:39 -04:00
Joey Hess	467d80101a	improve handling of unmerged git-annex branches in readonly repo git-annex info was displaying a message that didn't make sense in context. In calcRepoSizes, it seems better to return the information from the git-annex branch, rather than giving up. Especially since balanced preferred content uses it, and we can't just give up evaluating a preferred content expression if git-annex is to be usable in such a readonly repo. Commit `6d7ecd9e5d` nobly wanted git-annex to behave the same with such unmerged branches as it does when it can merge them. But for the purposes of preferred content, it seems to me there's a sense that such an unmerged branch is the same as a remote we have not pulled from. The balanced preferred content will either way operate under outdated information, and so make not the best choices.	2024-08-13 13:13:12 -04:00
Joey Hess	5c35b3d579	fix typo	2024-08-13 11:47:37 -04:00
Joey Hess	745bc5c547	take maxsize into account for balanced preferred content This is very inefficient, it will need to be optimised not to calculate the sizes of repos every time. Also, fixed a bug in balancedPicker that caused it to pick a too high index when some repos were excluded due to being full.	2024-08-13 11:00:20 -04:00
Joey Hess	b201792391	update	2024-08-12 18:57:03 -04:00
Joey Hess	0c3771beb1	add	2024-08-12 18:50:58 -04:00
Joey Hess	1e799e7842	update	2024-08-12 11:56:52 -04:00
Joey Hess	99a126bebb	added reposize database The idea is that upon a merge of the git-annex branch, or a commit to the git-annex branch, the reposize database will be updated. So it should always accurately reflect the location log sizes, but it will often be behind the actual current sizes. Annex.reposizes will start with the value from the database, and get updated with each transfer, so it will reflect a process's best understanding of the current sizes. When there are multiple processes all transferring to the same repo, Annex.reposizes will not reflect transfers made by the other processes since the current process started. So when using balanced preferred content, it may make suboptimal choices, including trying to transfer content to the repo when another process has already filled it up. But this is the same as if there are multiple processes running on different machines, so is acceptable. The reposize will eventually get an accurate value reflecting changes made by other processes or in other repos.	2024-08-12 11:19:58 -04:00
Joey Hess	71043fe9f7	update	2024-08-12 10:01:48 -04:00
Joey Hess	bcd2b9a5c4	idea	2024-08-12 09:43:14 -04:00
Joey Hess	1265d7e5df	implement maxsize log and command * maxsize: New command to tell git-annex how large the expected maximum size of a repository is. * vicfg: Include maxsize configuration.	2024-08-11 15:41:26 -04:00
Joey Hess	d33ab4bbe4	add preciseSize	2024-08-11 15:40:21 -04:00
Joey Hess	1224f1c183	improve usage	2024-08-11 14:37:18 -04:00
Joey Hess	3019b21c40	more formal documentation of balancing	2024-08-11 13:29:06 -04:00
Joey Hess	bd5affa362	use hmac in balanced preferred content This deals with the possible security problem that someone could make an unusually low UUID and generate keys that are all constructed to hash to a number that, mod the number of repositories in the group, == 0. So balanced preferred content would always put those keys in the repository with the low UUID as long as the group contains the number of repositories that the attacker anticipated. Presumably the attacker then holds the data for ransom? Dunno. Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs combined together as the "secret", and the key as the "message". Now any change in the set of UUIDs in a group will invalidate the attacker's constructed keys from hashing to anything in particular. Given that there are plenty of other things someone can do if they can write to the repository -- including modifying preferred content so only their repository wants files, and numcopies so other repositories drop them -- this seems like safeguard enough. Note that, in balancedPicker, combineduuids is memoized.	2024-08-10 16:32:54 -04:00
Joey Hess	bde58e6c71	todo	2024-08-09 16:57:10 -04:00
Joey Hess	412f6057e4	todo	2024-08-09 16:47:28 -04:00
Joey Hess	f1cb5cb908	wrote git-annex maxsize man page	2024-08-09 14:57:11 -04:00
Joey Hess	5a6afff3d6	left off number option	2024-08-09 14:22:05 -04:00
Joey Hess	3ce2e95a5f	balanced preferred content and --rebalance This all works fine. But it doesn't check repository sizes yet, and without repository size checking, once a repository gets full, there will be no other repository that will want its files. Use of sha2 seems unnecessary, probably adler32 or md5 or crc would have been enough. Possibly just summing up the bytes of the key mod the number of repositories would have sufficed. But sha2 is there, and probably hardware accelerated. I doubt very much there is any security benefit to using it though. If someone wants to construct a key that will be balanced onto a given repository, sha2 is certainly not going to stop them.	2024-08-09 14:16:09 -04:00
Joey Hess	152c87140b	update	2024-08-08 16:06:02 -04:00
Joey Hess	bda23daa6c	update	2024-08-08 15:54:22 -04:00
Joey Hess	fd03b31633	update	2024-08-08 15:53:36 -04:00
Joey Hess	7e48e712b2	update	2024-08-08 15:52:52 -04:00
Joey Hess	0959bfe5d3	update for exporttree=yes	2024-08-08 15:51:36 -04:00
Joey Hess	727b6a0b6d	update	2024-08-08 15:34:36 -04:00
Joey Hess	2616056cde	Merge branch 'exportreeplus'	2024-08-08 15:31:57 -04:00
Joey Hess	3b758aaad6	add news item for git-annex 10.20240808	2024-08-08 15:27:11 -04:00
Joey Hess	c15c32b5f8	releasing package git-annex version 10.20240808	2024-08-08 15:27:04 -04:00
Joey Hess	349b1e443b	proxied importtree=yes remotes are untrustworthy Even without exporttree=yes.	2024-08-08 15:26:02 -04:00
Joey Hess	3ea835c7e8	proxied exporttree=yes versionedexport=yes remotes are not untrusted This removes versionedExport, which was only used by the S3 special remote. Instead, versionedexport=yes is a common way for remotes to indicate that they are versioned.	2024-08-08 15:24:19 -04:00
Joey Hess	5c36177e58	proxied exporttree=yes remotes are untrustworthy This is not perfect because it does not handle versioned special remotes, which should not be untrustworthy, but now are when proxied. The implementation turned out to be easy, because the exporttree field is a default field, so is available in RemoteConfig even for git remotes.	2024-08-08 14:43:53 -04:00
Joey Hess	b23c7f769e	update	2024-08-08 14:25:18 -04:00
Joey Hess	9663888c77	update	2024-08-08 14:05:05 -04:00
Joey Hess	c84d1a9462	update export db after rename from annexobjects location This allows git-annex post-receive, on the first push to the remote to see that it is able to get a key from it in order to upload it back. Also avoided actively checking if the source remote contains a key. The location log is good enough. If the location log is wrong, the export of that file will fail with an informative message.	2024-08-08 14:03:02 -04:00
Joey Hess	a2eb3b450a	post-receive: use the exporttree=yes remote as a source This handles cases where a single key is used by multiple files in the exported tree. When using `git-annex push`, the key's content gets stored in the annexobjects location, and then when the branch is pushed, it gets renamed from the annexobjects location to the first exported file. For subsequent exported files, a copy of the content needs to be made. This causes it to download the key from the remote in order to upload another copy to it. This is not needed when using `git push` followed by `git-annex copy --to` the proxied remote, because the received key is stored at all export locations then. Also, fixed handling of the synced branch push, it was exporting master when synced/master was pushed. Note that currently, the first push to the remote does not see that it is able to get a key from it in order to upload it back. It displays "(not available)". The second push is able to. Since git-annex push pushes first the synced branch and then the branch, this does end up with a full export being made, but it is not quite right.	2024-08-08 13:49:53 -04:00

45424 commits
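
Commit bd5affa362 above describes picking a repository for a key by taking an HMAC (sha256) with the group's combined UUIDs as the "secret" and the key as the "message", then reducing the result mod the number of repositories. The following is a minimal Python sketch of that idea, not git-annex's actual Haskell implementation. The function name `pick_repository`, the sorting of UUIDs, the plain concatenation used to combine them, the UTF-8 encoding, and the big-endian conversion of the digest are all assumptions made for illustration.

```python
import hashlib
import hmac


def pick_repository(uuids, key):
    """Pick one UUID from a group for the given key (hypothetical helper)."""
    # Assumption: sort so that the result does not depend on listing order.
    candidates = sorted(uuids)
    if not candidates:
        raise ValueError("group has no repositories")
    # Assumption: the "secret" is the sorted UUIDs concatenated together.
    secret = "".join(candidates).encode("utf-8")
    # HMAC-SHA256 with the combined UUIDs as secret and the key as message.
    digest = hmac.new(secret, key.encode("utf-8"), hashlib.sha256).digest()
    # Interpret the digest as an integer and reduce it mod the group size.
    index = int.from_bytes(digest, "big") % len(candidates)
    return candidates[index]
```

Because the secret is derived from the whole set of UUIDs, adding or removing a repository changes the HMAC for every key, which is the property the commit message relies on to invalidate keys constructed in advance.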