git-annex

Author	SHA1	Message	Date
Joey Hess	5934e7d402	Merge branch 'master' of ssh://git-annex.branchable.com	2023-06-08 16:54:05 -04:00
Joey Hess	6821ba8dab	sync: use log to track adjusted branch needs updating Speeds up sync in an adjusted branch by avoiding re-adjusting the branch unnecessarily, particularly when it is adjusted with --hide-missing or --unlock-present. When there are a lot of files, that was the majority of the time of a --no-content sync. Uses a log file, which is updated when content presence changes. This adds a little bit of overhead to every file get/drop when on such an adjusted branch. The overhead is minimal for get of any size of file, but might be noticeable for drop in some cases. It seems like a reasonable trade-off. It would be possible to update the log file only at the end, but then it would not happen if the command is interrupted. When not in an adjusted branch, there should be no additional overhead. (getCurrentBranch is an MVar read, and it avoids the MVar read of getGitConfig.) Note that this does not deal with situations such as: git checkout master, git-annex get, git checkout adjusted branch, git-annex sync. The sync won't know that the adjusted branch needs to be updated. Dealing with that would add overhead to operation in non-adjusted branches, which I don't like. Also, there are other situations like having two adjusted branches that both need to be updated like this, and switching between them and sync not updating. This does mean a behavior change to sync, since it did previously deal with those situations. But, the documentation did not say that it did. The man pages only talk about sync updating the adjusted branch after it transfers content. I did consider making sync keep track of content it transferred (and dropped) and only update the adjusted branch then, not to catch up to other changes made previously. That would perform better. But it seemed rather hard to implement, and also it would have problems with races with a concurrent get/drop, which this implementation avoids. And it seemed pretty likely someone had gotten used to get/drop followed by sync updating the branch. It seems much less likely someone is switching branches, doing get/drop, and then switching back and expecting sync to update the branch. Re-running git-annex adjust still does a full re-adjusting of the branch, for anyone who needs that. Sponsored-by: Leon Schuermann on Patreon	2023-06-08 14:35:41 -04:00
Joey Hess	637f19bebb	fix adjusted branch update breakage Introduced recently in commit `64fc34b3da`. adjustBranch changes the sha that is recorded for the current branch (eg the adjusted branch). So, have to get the original sha before calling it. Sponsored-by: Jack Hill on Patreon	2023-06-08 13:33:58 -04:00
yarikoptic	96a6946a14	stalling report	2023-06-08 15:46:29 +00:00
Joey Hess	7888702955	update	2023-06-07 11:32:53 -04:00
Joey Hess	3e3d225ca0	Merge branch 'master' of ssh://git-annex.branchable.com	2023-06-07 11:16:39 -04:00
Joey Hess	64fc34b3da	narrow window where HEAD is detached Updating an adjusted branch can take a while when there are a lot of files. HEAD was detached at the start, so if eg git-annex sync was interrupted at the wrong point, there was a possibly wide window where it would leave the repo with HEAD detached. There's still a window, just much narrower. I don't know if it's possible to close the window entirely. While git can clearly update the currently checked out branch in eg git merge, it doesn't seem to provide another way to do it. Sponsored-by: Graham Spencer on Patreon	2023-06-07 11:10:54 -04:00
nobodyinperson	2fe032c4ee	Added a comment	2023-06-07 04:49:04 +00:00
Joey Hess	5bc37c2de2	comment	2023-06-06 15:17:09 -04:00
Joey Hess	d63af3f52e	comment	2023-06-06 14:45:48 -04:00
Joey Hess	3c15e0f7a0	cache negative lookups of global numcopies and mincopies Speeds up eg git-annex sync --content by up to 50%. When it does not need to transfer or drop anything, it now noops a lot more quickly. I didn't see anything else in sync --content noop loop that could really be sped up. It has to cat git objects to keys, stat object files, etc. Sponsored-by: unqueued on Patreon	2023-06-06 14:43:25 -04:00
Joey Hess	4437e187e6	update	2023-06-06 13:04:47 -04:00
Joey Hess	3efcb58b6a	comment	2023-06-06 13:02:15 -04:00
Joey Hess	4c88f68061	Merge branch 'master' of ssh://git-annex.branchable.com	2023-06-06 12:48:47 -04:00
nobodyinperson	aa61ac4273	Added a comment	2023-06-06 12:54:36 +00:00
nobodyinperson	cf7249d00c		2023-06-06 12:49:11 +00:00
Mowgli	6c60e1d715	Added a comment	2023-06-05 20:35:19 +00:00
Mowgli	c0b2eb3914	Added a comment: comment igendwas	2023-06-05 20:33:42 +00:00
jgoerzen	432e7cd9f3	Added a comment	2023-06-05 19:32:29 +00:00
Joey Hess	cfad0def18	wrap	2023-06-05 15:15:20 -04:00
Joey Hess	1f0f774ab7	close this release blocker	2023-06-05 15:10:52 -04:00
Joey Hess	4c9326dab5	reject	2023-06-05 15:00:39 -04:00
Joey Hess	07db8e234a	comment and wontfix	2023-06-05 14:40:25 -04:00
Joey Hess	528882a6df	comment	2023-06-05 14:08:12 -04:00
Joey Hess	190a538c0b	Merge branch 'master' of ssh://git-annex.branchable.com	2023-06-05 11:46:19 -04:00
Joey Hess	c6c6e3f5d6	update	2023-06-05 11:45:18 -04:00
jgoerzen	2c2a84caac	Added a comment	2023-06-02 21:44:54 +00:00
Joey Hess	fe1b2dfb4b	speed up very first tree import by 25% Reading from the cidsdb is responsible for about 25% of the runtime of an import. Since the cidmap is used to store the same information in ram, the cidsdb is not written to during an import any longer. And so, if it started off empty (and updateFromLog wasn't needed), those reads can just be skipped. This is kind of a cheesy optimisation, since after any import from any special remote, the database will no longer be empty, so it's a single use optimisation. But it's probably not uncommon to start by importing a lot of files, and it can save a lot of time then. Sponsored-by: Brock Spratlen on Patreon	2023-06-02 13:30:30 -04:00
Joey Hess	b43fb4923f	comment	2023-06-02 13:11:24 -04:00
Joey Hess	b8750bcb17	Merge branch 'master' of ssh://git-annex.branchable.com	2023-06-02 12:14:03 -04:00
Joey Hess	b40b368857	comment	2023-06-02 12:13:50 -04:00
jgoerzen	5dcbf7d41e	Added a comment	2023-06-02 03:25:27 +00:00
Joey Hess	f6dd34ca81	sync content with import remotes This didn't used to be needed because importKeys would import all content and so doing another pass was redundant. But since `40017089f2` it uses importChanges, so only new files are imported. If a file that was already imported before was dropped, that would prevent sync --content from getting its content again. Sponsored-by: Jack Hill on Patreon	2023-06-01 18:52:19 -04:00
Joey Hess	92e4ed3cc0	retitle	2023-06-01 18:44:11 -04:00
Joey Hess	7178db5e06	Merge branch 'master' of ssh://git-annex.branchable.com	2023-06-01 18:43:29 -04:00
Joey Hess	2e92cef13f	comment	2023-06-01 18:43:17 -04:00
jgoerzen	53eeca40ae	Added a comment	2023-06-01 21:26:23 +00:00
Joey Hess	f1fe13c79c	devblog	2023-06-01 15:07:03 -04:00
Joey Hess	594110a6af	comment	2023-06-01 14:21:55 -04:00
Joey Hess	40017089f2	use importChanges optimisation Large speed up to importing trees from special remotes that contain a lot of files, by only processing changed files. Benchmarks: Importing from a special remote that has 10000 files, that have all been imported before, and 1 new file sped up from 26.06 to 2.59 seconds. An import with no change and 10000 unchanged files sped up from 24.3 to 1.99 seconds. Going up to 20000 files, an import with no changes sped up from 125.95 to 3.84 seconds. Sponsored-by: k0ld on Patreon	2023-06-01 13:47:00 -04:00
Joey Hess	029b08f54b	Merge branch 'master' of ssh://git-annex.branchable.com	2023-05-31 16:34:03 -04:00
Joey Hess	c6acf574c7	implement importChanges optimisation (not used yet) For simplicity, I've not tried to make it handle History yet, so when there is a history, a full import will still be done. Probably the right way to handle history is to first diff from the current tree to the last imported tree. Then, diff from the current tree to each of the historical trees, and recurse through the history diffing from child tree to parent tree. I don't think that will need a record of the previously imported historical trees, and so Logs.Import doesn't store them. Although I did leave room for future expansion in that log just in case. Next step will be to change importTree to importChanges and modify recordImportTree et al to handle it, by using adjustTree. Sponsored-by: Brett Eisenberg on Patreon	2023-05-31 16:01:34 -04:00
Joey Hess	7298123520	build git trees using ContentIdentifier to speed up import This gets the trees built, but it does not use them. Next step will be to remember the tree for next time an import is done, and diff between old and new trees to find the files that have changed. Added --missing to the mktree parameters. That only disables a check, so it's ok to do everywhere mktree is used. It probably also speeds up mktree to disable the check. Note that git fsck does not complain about the resulting tree objects that point to shas that are not in the repository. Even with --strict. A quick benchmark, importing 10000 files, this slowed it down from 2:04.06 to 2:04.28. So it will more than pay for itself. Sponsored-by: Luke Shumaker on Patreon	2023-05-31 12:46:54 -04:00
Joey Hess	51319f8558	update	2023-05-30 17:19:23 -04:00
Joey Hess	f6aa097a39	avoid import writing to cidsdb initially Speed up importing trees from special remotes somewhat by avoiding redundant writes to sqlite database. Before, import would write to both the git-annex branch and also to the sqlite database. But then the next time it was run, needsUpdateFromLog would see the branch had changed, so run updateFromLog, which would make the same writes to the sqlite database a second time. Now import writes only to the git-annex branch. The next time it's run, needsUpdateFromLog sees that the branch has changed and so calls updateFromLog, which updates the sqlite database. Why defer the write to the sqlite database like this? It seems that it could write to the database as it goes, and at the end call recordAnnexBranchTree to indicate that the information in the git-annex branch has all been written to the cidsdb. That would avoid the second import doing extra work. But, there could be other processes running at the same time, and one of them may update the git-annex branch, eg merging a remote git-annex branch into it. Any cids logs on that merged git-annex branch would not be reflected in the cidsdb yet. If the import then called recordAnnexBranchTree, the cidsdb would never get updated with that merged information. I don't think there's a good way to prevent, or to detect that situation. So, it can't call recordAnnexBranchTree at the end. So it might as well wait until the next run and do updateFromLog then. It could instead do updateFromLog at the end, but it's going to check needsUpdateFromLog at the beginning anyway. Note that the database writes were queued, so there is already a cidmap that is used to remember changes that the current process has made. So, omitting database writes can't change the behavior of the current process. Also note that thirdpartypopulatedimport uses recordcidkeyindb, which reflects what it already did. That code path does not use the cidmap, but does not need to query it either. It might be possible to make that code path also only update the git-annex branch and not the db, but I haven't checked. Sponsored-by: Noam Kremen on Patreon	2023-05-30 17:05:28 -04:00
jgoerzen	f47e7abd57	Added a comment	2023-05-30 20:58:21 +00:00
Joey Hess	c1e415887a	improve test descriptions	2023-05-30 16:11:29 -04:00
Joey Hess	5070087a63	repair: Fix handling of git ref names on Windows Sponsored-by: Kevin Mueller on Patreon	2023-05-30 16:09:13 -04:00
Joey Hess	9ca81ed02a	update	2023-05-30 15:49:52 -04:00
Joey Hess	aaeae746f0	comment and a neat idea	2023-05-30 15:42:34 -04:00
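
The importChanges commits above (7298123520, c6acf574c7, 40017089f2) describe remembering a tree built from content identifiers and diffing it against the tree from the next import, so that only changed files are processed. A minimal Python sketch of that idea follows. It is not git-annex's Haskell implementation: the names diff_trees and TreeDiff, and the representation of a tree as a dict from path to content identifier, are all assumptions made for illustration.

```python
from typing import Dict, NamedTuple, Set


class TreeDiff(NamedTuple):
    """Paths that differ between two imports (hypothetical type)."""
    added: Set[str]
    modified: Set[str]
    removed: Set[str]


def diff_trees(old: Dict[str, str], new: Dict[str, str]) -> TreeDiff:
    """Compare two path -> content identifier mappings.

    Assumption: each tree is modelled as a flat dict; real git trees
    are nested and are compared by git itself.
    """
    old_paths = set(old)
    new_paths = set(new)
    added = new_paths - old_paths
    removed = old_paths - new_paths
    # A path present in both trees changed only if its identifier differs.
    modified = {p for p in old_paths & new_paths if old[p] != new[p]}
    return TreeDiff(added, modified, removed)


# Usage: only the paths in the diff need any further processing.
previous = {"a.txt": "cid1", "b.txt": "cid2", "c.txt": "cid3"}
current = {"a.txt": "cid1", "b.txt": "cid9", "d.txt": "cid4"}
changes = diff_trees(previous, current)
```

In this sketch an import with no changes yields three empty sets, so unchanged files need no per-file work after the comparison itself. That is the effect the benchmarks in 40017089f2 measure.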


43967 commits