git-annex

Author	SHA1	Message	Date
Joey Hess	6d7ecd9e5d	merge git-annex branch in memory in read-only repository Improved support for using git-annex in a read-only repository, git-annex branch information from remotes that cannot be merged into the git-annex branch will now not crash it, but will be merged in memory. To avoid this making git-annex behave one way in a read-only repository, and another way when it can write, it's important that Annex.Branch.get return the same thing (modulo log file compaction) in both cases. This manages that mostly. There are some exceptions: - When there is a transition in one of the remote git-annex branches that has not yet been applied to the local or other git-annex branches. Transitions are not handled. - `git-annex log` runs git log on the git-annex branch, and so it will not be able to show information coming from the other, not yet merged branches. - Annex.Branch.files only looks at files in the git-annex branch and not unmerged branches. This affects git-annex info output. - Annex.Branch.hs.overBranchFileContents ditto. Affects --all and also importfeed (but importfeed cannot work in a read-only repo anyway). - CmdLine.Seek.seekFilteredKeys when precaching location logs. Note use of Annex.Branch.fullname - Database.ContentIdentifier.needsUpdateFromLog and updateFromLog These warts make this not suitable to be merged yet. This readonly code path is more expensive, since it has to query several branches. The value does get cached, but still large queries will be slower in a read-only repository when there are unmerged git-annex branches. When annex.merge-annex-branches=false, updateTo skips doing anything, and so the read-only repository code does not get triggered. So a user who is bothered by the extra work can set that. Other writes to the repository can still result in permissions errors. This includes the initial creation of the git-annex branch, and of course any writes to the git-annex branch. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 13:21:15 -04:00
Joey Hess	ced91b3fbd	Avoid excess commits to the git-annex branch when stall detection is enabled When git-annex transferrer started up, and the journal contained something, it would commit it to the git-annex branch. This caused excess commits to the branch, in cases where normally several changes would be journalled and committed together. That generated some excess git objects and was also just noisy on stdout. Since transferrer uses enableInteractiveBranchAccess, it does not need to commit journalled changes, since the optimisation that avoids checking the journal when reading from the branch is disabled for processes that call that. This commit was sponsored by Svenne Krap on Patreon.	2021-04-02 11:57:18 -04:00
Joey Hess	c75f7e1d98	improve comment	2021-04-02 10:35:15 -04:00
Joey Hess	9483b10469	cache one more log file for metadata My worry was that a preferred content expression that matches on metadata would have removed the location log from cache, causing an expensive re-read when a Seek action later checked the location log. Especially when the --all optimisation in the previous commit pre-cached the location log. This also means that the --all optimisation could cache the metadata log too, if it wanted too, but not currently done. The cache is a list, with the most recently accessed file first. That optimises it for the common case of reading the same file twice, eg a get, examine, followed by set reads it twice. And sync --content reads the location log 3 times in a row commonly. But, as a list, it should not be made to be too long. I thought about expanding it to 5 items, but that seemed unlikely to be a win commonly enough to outweigh the extra time spent checking the cache. Clearly there could be some further benchmarking and tuning here.	2020-07-07 14:18:55 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	43a9808292	disable journal read optimisation when alwayscommit=false The journal read optimisation in `aeca7c220` later got fixed in `eedd73b84` to stage and commit any files that were left in the journal by a previous git-annex run. That's necessary for the optimisation to work correctly. But it also meant that alwayscommit=false started committing the previous git-annex processes journalled changes, which defeated the purpose of the config setting entirely. So, disable the optimisation when alwayscommit=false, leaving the files in the journal and not committing them. See my comments on the bug report for why this seemed the best approach. Also fixes a problem when annex.merge-annex-branches=false and there are changes in the journal. That config indirectly prevents committing the journal. (Which seems a bit odd given its name, but it always has..) So, when there were changes in the journal, perhaps left there due to alwayscommit=false being set before, the optimisation would prevent git-annex from reading the journal files, and it would operate with out of date information.	2020-04-15 13:24:33 -04:00
Joey Hess	aeca7c2207	Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominiates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other damon modes bypass the optimisation, which would not help them anyway.	2020-04-09 13:54:43 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	0880c8319e	simplify and make more atomic	2015-04-10 15:16:17 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	3417c55189	remove git-annex branch read cache This cache prevented noticing changes made by another process. The case I just ran into involved the assistant dropping a file, which cached its presence info. Then the same file was downloaded again, but the assistant didn't know its presence info had changed. I don't see a way to keep this cache. Will instead rely on the OS level file cache, for files in the journal. May need to add more higher-level caching of info that it's ok to have a potentially stale copy of, although much of git-annex already does so.	2012-10-19 14:25:15 -04:00
Joey Hess	a3d97e0c85	tweak	2012-01-14 14:31:16 -04:00
Joey Hess	5e2b4e16ba	avoid multiple unnecessary stats of the index file Up to one per file processed.	2012-01-14 12:07:36 -04:00
Joey Hess	98dfc0c9b0	split out Annex/BranchState.hs	2011-12-12 17:38:46 -04:00

15 commits