git-annex

Author	SHA1	Message	Date
Joey Hess	d7813876a0	fixed the build Manually tested getLiveRepoSizes and it is working correctly.	2024-08-27 09:41:35 -04:00
Joey Hess	521e0a7062	fix a deadlock When finishedLiveUpdate was run on a different key than expected, it blocked forever waiting for an indication the database had been updated. Since the journal is locked when finishedLiveUpdate runs, this could also have caused other git-annex commands to hang.	2024-08-27 00:13:54 -04:00
Joey Hess	21608716bd	started work on getLiveRepoSizes Doesn't quite compile	2024-08-26 14:50:09 -04:00
Joey Hess	db89e39df6	partially fix concurrency issue in updating the rollingtotal It's possible for two processes or threads to both be doing the same operation at the same time. Eg, both dropping the same key. If one finishes and updates the rollingtotal, then the other one needs to be prevented from later updating the rollingtotal as well. And they could finish at the same time, or with some time in between. Addressed this by making updateRepoSize be called with the journal locked, and only once it's been determined that there is an actual location change to record in the log. updateRepoSize waits for the database to be updated. When there is a redundant operation, updateRepoSize won't be called, and the redundant LiveUpdate will be removed from the database on garbage collection. But: There will be a window where the redundant LiveUpdate is still visible in the db, and processes can see it, combine it with the rollingtotal, and arrive at the wrong size. This is a small window, but it still ought to be addressed. Unsure if it would always be safe to remove the redundant LiveUpdate? Consider the case where two drops and a get are all running concurrently somehow, and the order they finish is [drop, get, drop]. The second drop seems redundant to the first, but it would not be safe to remove it. While this seems unlikely, it's hard to rule out that a get and drop at different stages can both be running at the same time.	2024-08-26 09:43:32 -04:00
Joey Hess	03c7f99957	todo	2024-08-25 10:48:42 -04:00
Joey Hess	18f8d61f55	rolling total of size changes in RepoSize database When a live size change completes successfully, the same transaction that removes it from the database updates the rolling total for its repository. The idea is that when RepoSizes is read, SizeChanges will be as well, and cached locally. Any time a change is made, the local cache will be updated. So by comparing the local cache with the current SizeChanges, it can learn about size changes that were made by other processes. Then read the LiveSizeChanges, and add that in to get a live picture of the current sizes. Also added a SizeChangeId. This allows 2 different threads, or processes, to both record a live size change for the same repo and key, and update their own information without stepping on one-another's toes.	2024-08-25 10:34:47 -04:00
Joey Hess	9188825a4d	use FileSize It's just an alias, so this doesn't change the db schema, but it makes explicit that it's not stored as an int64	2024-08-25 08:22:40 -04:00
Joey Hess	2b037d36a1	update	2024-08-24 15:06:00 -04:00
Joey Hess	6660984442	update	2024-08-24 13:15:39 -04:00
Joey Hess	d60a33fd13	improve live update starting In an expression like "balanced=foo and exclude=bar", avoid it starting a live update when the overall expression doesn't match.	2024-08-24 13:07:05 -04:00
Joey Hess	16f945459c	todo	2024-08-24 11:58:17 -04:00
Joey Hess	2f20b939b7	LiveUpdate db updates working I've tested the behavior of the thread that waits for the LiveUpdate to be finished, and it does get signaled and exit cleanly when the LiveUpdate is GCed instead. Made finishedLiveUpdate wait for the thread to finish updating the database. There is a case where GC doesn't happen in time and the database is left with a live update recorded in it. This should not be a problem as such stale data can also happen when interrupted and will need to be detected when loading the database. Balanced preferred content expressions now call startLiveUpdate.	2024-08-24 11:49:58 -04:00
Joey Hess	84d1bb746b	LiveUpdate for clusters	2024-08-24 10:20:12 -04:00
Joey Hess	18cd8bf43a	punt on LiveUpdate plumbing through assistant for now	2024-08-24 09:37:24 -04:00
Joey Hess	1d51f18dd0	remove FIXME Using NoLiveUpdate here is appropriate, because this is running the server side of the P2P protocol. No preferred content checking is done there.	2024-08-24 09:34:22 -04:00
Joey Hess	3f8675f339	more LiveUpdate plumbing	2024-08-24 09:28:41 -04:00
Joey Hess	eb841ab004	plumb in LiveUpdate to copy/get/move/mirror copy and get do check preferred content, so need to prepareLiveUpdate. move and mirror do not, but copy is implemented using move, so move also needed to have a LiveUpdate plumbed through.	2024-08-24 09:20:58 -04:00
Joey Hess	418fbf3f2f	NoLiveExport for export and import While these do check preferred content, it would not make sense to use balanced preferred content with them.	2024-08-24 09:19:12 -04:00
Joey Hess	c3d40b9ec3	plumb in LiveUpdate (WIP) Each command that first checks preferred content (and/or required content) and then does something that can change the sizes of repositories needs to call prepareLiveUpdate, and plumb it through the preferred content check and the location log update. So far, only Command.Drop is done. Many other commands that don't need to do this have been updated to keep working. There may be some calls to NoLiveUpdate in places where that should be done. All will need to be double checked. Not currently in a compilable state.	2024-08-23 16:35:12 -04:00
Joey Hess	4885073377	add live size changes to RepoSize database Not yet used.	2024-08-23 12:51:00 -04:00
Joey Hess	dad1fb150f	update	2024-08-23 11:45:36 -04:00
Joey Hess	d0ab1550ec	possible design to address reposizes concurrency issues	2024-08-23 11:19:38 -04:00
Joey Hess	8ade3fc5d6	improve docs	2024-08-22 08:09:10 -04:00
Joey Hess	abdd49d8c1	update	2024-08-22 07:53:56 -04:00
Joey Hess	173500872f	update	2024-08-22 07:17:04 -04:00
Joey Hess	70e2fca257	Added the annex.fullybalancedthreshhold git config.	2024-08-22 07:15:55 -04:00
Joey Hess	3fe67744b1	display new empty repos in maxsize table A new repo that has no location log info yet, but has an entry in uuid.log has 0 size, so make RepoSize aware of that. Note that a new repo that does not yet appear in uuid.log will still not be displayed. When a remote is added but not synced with yet, it has no uuid.log entry. If git-annex maxsize is used to configure that remote, it needs to appear in the maxsize table, and the change to Command.MaxSize takes care of that.	2024-08-22 07:03:22 -04:00
Joey Hess	a643699b7b	display ">100%" when past maxsize This is to avoid a value like 1000% causing the table to not align.	2024-08-21 20:52:54 -04:00
Joey Hess	76ece2a699	make --rebalance of balanced use fullysizebalanced when useful When the specified number of copies is > 1, and some repositories are too full, it can be better to move content from them to other less full repositories, in order to make space for new content. annex.fullybalancedthreshhold is documented, but not implemented yet. This is not tested very well yet, and is known to sometimes take several runs to stabilize.	2024-08-21 17:59:08 -04:00
Joey Hess	9e87061de2	Support "sizebalanced=" and "fullysizebalanced=" too Might want to make --rebalance turn balanced=group:N where N > 1 to fullysizebalanced=group:N. Have not yet determined if that will improve situations enough to be worth the extra work.	2024-08-21 15:01:54 -04:00
Joey Hess	4e1dcc0372	bug	2024-08-21 12:18:31 -04:00
Joey Hess	2ec4602e36	fix column width	2024-08-21 12:18:16 -04:00
Joey Hess	de7ac1bb70	fix	2024-08-20 13:52:46 -04:00
Joey Hess	476d223bce	implement fullybalanced=group:N Rebalancing this when it gets into a suboptimal situation will need further work.	2024-08-20 13:51:02 -04:00
Joey Hess	d4b2f8201d	add %full field to table	2024-08-19 11:41:48 -04:00
Joey Hess	68a99a8f48	size based rebalancing design	2024-08-18 16:25:12 -04:00
Joey Hess	99514f9d18	maxsize overview display and --json support	2024-08-18 12:08:13 -04:00
Joey Hess	016edcf437	adjust countdown number for RepoSize update message Benchmarking a git-annex branch with half a million files changed, it takes about 1 minute to update the RepoSizes. So this will display the message after a few seconds.	2024-08-17 15:59:07 -04:00
Joey Hess	f985c58d8e	consistently don't show sizes of empty repositories This used to be the case, and when matching options are used, that code path still omits them, so also omit them in the getRepoSize code path.	2024-08-17 15:09:16 -04:00
Joey Hess	b62b58b50b	git-annex info speed up using getRepoSizes	2024-08-17 14:54:31 -04:00
Joey Hess	d09a005f2b	update RepoSize database from git-annex branch incrementally The use of catObjectStream is optimally fast. Although it might be possible to combine this with git-annex branch merge to avoid some redundant work. Benchmarking, a git-annex branch that had 100000 files changed took less than 1.88 seconds to run through this.	2024-08-17 13:35:00 -04:00
Joey Hess	8239824d92	consistently omit clusters when calculating RepoSizes updateRepoSize is only called on the UUID of a repository, not any cluster it might be a node of. But overLocationLogs and overLocationLogsJournal were including cluster UUIDs. So it was inconsistent. Currently I don't see any reason to calculate RepoSize for a cluster. It's not even clear what it should mean, the total size of all nodes, or the amount of information stored in the cluster in total?	2024-08-17 11:24:14 -04:00
Joey Hess	61d95627f3	fix Annex.repoSize sharing between threads	2024-08-16 10:56:51 -04:00
Joey Hess	e361b9ea3c	todo	2024-08-15 16:15:48 -04:00
Joey Hess	63ccf6ffa7	todo	2024-08-15 13:50:50 -04:00
Joey Hess	4a0c7e2b2c	update	2024-08-15 13:41:47 -04:00
Joey Hess	a2da9c526b	RepoSize concurrency fix When loading the journalled repo sizes, make sure that the current process is prevented from making changes to the journal in another thread.	2024-08-15 13:37:41 -04:00
Joey Hess	06064f897c	update Annex.reposizes when changing location logs The live update is only needed when Annex.reposizes has already been populated.	2024-08-15 13:27:14 -04:00
Joey Hess	c376b1bd7e	show message when doing possibly expensive from scratch reposize calculation	2024-08-15 12:42:36 -04:00
Joey Hess	c200523bac	implement getRepoSizes At this point the RepoSize database is getting populated, and it all seems to be working correctly. Incremental updates still need to be done to make it performant.	2024-08-15 12:31:56 -04:00
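
The rolling-total scheme described in commit 18f8d61f55 above can be illustrated with a small sketch. This is not git-annex's code (git-annex is written in Haskell); it is a minimal Python model, and the function name, argument names, and data layout are all assumptions made for illustration. It takes the sizes as last read, adds whatever the rolling totals have moved by since they were cached (changes completed by other processes), then adds the in-flight live changes, which are keyed by repository, key, and a change id so that two workers touching the same key do not overwrite each other.

```python
from collections import defaultdict


def live_repo_sizes(base_sizes, cached_totals, current_totals, live_changes):
    """Hypothetical model of combining stored sizes with size changes.

    base_sizes:     repo -> size in bytes, as read earlier
    cached_totals:  repo -> rolling total cached when base_sizes was read
    current_totals: repo -> rolling total as stored now
    live_changes:   (repo, key, change_id) -> signed delta still in flight
    """
    sizes = defaultdict(int, base_sizes)

    # Changes completed since the cache was taken show up as the
    # difference between the current and the cached rolling totals.
    for repo, total in current_totals.items():
        sizes[repo] += total - cached_totals.get(repo, 0)

    # In-flight changes are added on top; the change id keeps two
    # concurrent operations on the same repo and key distinct.
    for (repo, _key, _change_id), delta in live_changes.items():
        sizes[repo] += delta

    return dict(sizes)


example = live_repo_sizes(
    base_sizes={"A": 1000, "B": 500},
    cached_totals={"A": 0, "B": 0},
    current_totals={"A": -200, "B": 300},
    live_changes={("A", "k1", 1): 50, ("A", "k1", 2): 50, ("B", "k2", 3): -100},
)
```

In this example, repository A ends up at 900 and B at 700. The model deliberately ignores the window discussed in commit db89e39df6, where a redundant live change can still be visible after the rolling total has already been updated.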


45471 commits