git-annex

Author	SHA1	Message	Date
Joey Hess	889603336a	fix reversion in skipping deleted files And add a test case for that. This certianly loses some of the 2x performance improvement in file seeking that seekFilteredKeys led to, because now it has to stat the worktree files again. Without benchmarking, I expect there will still be a sizable improvement, and also the git-annex branch precaching that seekFilteredKeys can do will still be a win of its approach. Also worth noting that lookupKey, when the file DNE, check if it's in an adjusted branch with hidden files, and if so, finds the key for the file anyway. That was intended to make git-annex sync --content be able to process those files, but a side effect was that, when a file was deleted but the deletion not yet staged, git-annex commands used to still list it. That was actually a bug. This commit fixes that bug too. (git-annex sync --content on such a branch does not use seekFilteredKeys so was not affected by the reversion or by this behavior change) This commit was sponsored by Jake Vosloo on Patreon.	2020-07-19 21:25:01 -04:00
Joey Hess	5dbb2924bb	remove unused function	2020-07-19 20:16:24 -04:00
Joey Hess	535cdc8d48	importfeed: Made checking known urls step around 10% faster. This was a bit disappointing, I was hoping for a 2x speedup. But, I think the metadata lookup is wasting a lot of time and also needs to be made to stream. The changes to catObjectStreamLsTree were benchmarked to not also speed up --all around 3% more. Seems I managed to make it polymorphic after all.	2020-07-14 12:47:51 -04:00
Joey Hess	75aab72d23	mostly done with location log precaching Some nice wins.	2020-07-13 17:04:02 -04:00
Joey Hess	a290792a4f	speed up seeking pointer files This solves the same problem as commit `b4d0f6dfc2` but in a better way, that should make processing pointer files maximally fast. If there is a mixture of pointer files and symlinks, the first symlinks until the pointer file are handled maximally fast, while the ones after that go via the slightly slower path.	2020-07-13 14:25:07 -04:00
Joey Hess	918b1faa3d	avoid hanging on exception	2020-07-13 12:36:15 -04:00
Joey Hess	88a7fb5cbb	convert all applicable commands to new 2x faster annexed file seeking This removes all calls to inAnnex, except for some involving --batch. It may be that the batch code could get a similar speedup, but I don't know if people habitually pass a huge number of files through --batch that git-annex does not need to do anything to process, so I skipped it for now. A few calls to ifAnnexed remain, and might be worth doing more to convert. In particular, Command.Sync has one that would probably speed it up by a good amount. (also removed some dead code from Command.Lock)	2020-07-10 15:45:38 -04:00
Joey Hess	b4d0f6dfc2	slower but sequential filtering of large files from pointer files There should still be a speedup seeking over pointer files, just not as large as the one seeking over symlinks.	2020-07-10 15:21:58 -04:00
Joey Hess	0f6b1ee048	check pointer file size This is all good, except for one small problem... When a pointer file has to be fed into the metadata cat-file, it's possible for a non-pointer file that comes after it to get fed into the main cat-file first, so the two files will be processed in a different order than the user specified. So, while this is the fast way, I guess I'll have to change it to be slower, but sequential..	2020-07-10 15:11:14 -04:00
Joey Hess	7a42a47902	renaming	2020-07-10 14:17:35 -04:00
Joey Hess	4c9ad1de46	optimisation: stream keys through git cat-file --buffer This is only implemented for git-annex get so far. It makes git-annex get nearly twice as fast in a repo with 10k files, all of them present! But, see the TODO for some caveats.	2020-07-10 13:54:52 -04:00
Joey Hess	d08c178f97	avoid catObjectStream skipping over unavailable shas Not needed as it's used for --all, but will be needed later.	2020-07-08 13:57:17 -04:00
Joey Hess	d010ab04be	sped up the --all option by 2x to 16x by using git cat-file --buffer This assumes that no location log files will have a newline or carriage return in their name. catObjectStream skips any such files due to cat-file not supporting them. Keys have been prevented from containing newlines since 2011, commit `480495beb4`. If some old repo had a key with a newline in it, --all will just skip processing that key. Other things, like .git/annex/unused files certianly assume no newlines in keys too, and AFAICR, such keys never actually worked. Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys generated before that point could perhaps contain a CR. (URL probably not, http probably doesn't support an URL with a raw CR in it.) So, added a warning in fsck about such keys. Although, fsck --all will naturally skip them, so won't be able to warn about them. Not entirely satisfactory, but I'll bet there are not really any such keys in existence. Thanks to Lukey for finding this optimisation.	2020-07-07 13:54:04 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	f912f8e5fd	refix bug in a better way Always run Git.Config.store, so when the git config gets reloaded, the override gets re-added to it, and changeGitRepo then calls extractGitConfig on it and sees the annex.* settings from the override. Remove any prior occurance of -c v and add it to the end. This way, -c foo=1 -c foo=2 -c foo=1 will pass -c foo=1 to git, rather than -c foo=2 Note that, if git had some multiline config that got built up by multiple -c's, this would not work still. But it never worked because before the bug got fixed in the first place, the -c value was repeated many times, so the multivalue thing would have been wrong. I don't think -c can be used with multiline configs anyway, though git-config does talk about them?	2020-07-02 13:32:33 -04:00
Joey Hess	ec0f8a6e74	Fix reversion that broke passing git configs with -c Reverting commit `c8fec6ab0`	2020-07-02 12:42:13 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	864ba4ecaa	disable buggy concurrency in Command.Export Fix a crash or potentially not all files being exported when sync -J --content is used with an export remote. Crash as described in fixed bug report. waitForAllRunningCommandActions inserted in several points where all the commandActions started before need to have finished before moving on to the next stage of the export. A race across those points could have maybe resulted in not all files being exported, or a wrong tree being export. For example, changeExport starting up an action like a rename of A to B. Then, with that action still running, fillExport uploading a new A, before the rename occurred. That race seems unlikely to have happened. There are some other ones that this also fixes.	2020-05-26 13:54:08 -04:00
Joey Hess	2a8fdfc7d8	Display a warning message when asked to operate on a file inside a directory that's a symbolic link to elsewhere This relicates git's behavior. It adds a few stat calls for the command line parameters, so there is some minor slowdown, but even with thousands of parameters it will not be very noticable, and git does the same statting in similar circumstances. Note that this does not prevent eg "git annex add symlink"; the symlink will be added to git as usual. And "git annex find symlink" will silently list nothing as well. It's only "symlink/foo" or "subdir/symlink/foo" that triggers the warning.	2020-05-11 15:03:35 -04:00
Joey Hess	cee6b344b4	cat-file resource pool Avoid running a large number of git cat-file child processes when run with a large -J value. This implementation takes care to avoid adding any overhead to git-annex when run without -J. When run with -J, there is a small bit of added overhead, to manipulate the resource pool. That optimisation added a fair bit of complexity.	2020-04-20 15:19:31 -04:00
Joey Hess	fada5c120c	remove unused import	2020-04-17 15:19:49 -04:00
Joey Hess	fe9cf1256e	move remoteList into dupState This does mean that RemoteDaemon.Transport.Tor's call runs it, otherwise no change, but this is groundwork for doing more such expensive actions in dupState.	2020-04-17 14:36:45 -04:00
Joey Hess	957a87b437	fix absolute filenames fed into --batch and git-annex info	2020-04-15 16:04:05 -04:00
Joey Hess	ca9c6c5f60	Fix a potential failure to parse git config Git has an obnoxious special case in git config, a line "foo" is the same as "foo = true". That means there is no way to examine the output of git config and tell if it was run with --null or not, since a "foo" in the first line could be such a boolean, or could be followed by its value on the next line if --null were used. So, rather than trying to do such a detection, track the style of config at all the points where it's generated.	2020-04-13 13:05:41 -04:00
Joey Hess	aeca7c2207	Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominiates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other damon modes bypass the optimisation, which would not help them anyway.	2020-04-09 13:54:43 -04:00
Joey Hess	c8fec6ab03	Fix a minor bug that caused options provided with -c to be passed multiple times to git.	2020-03-16 13:06:44 -04:00
Joey Hess	f6d629e483	changelog and minor style	2020-02-28 12:57:55 -04:00
Peter Simons	73cf523a4b	Fix build with ghc-8.8.x. The 'fail' method has been moved to the 'MonadFail' class. I made the changes so that the code still compiles with previous versions of 'base' that don't have the new MonadFail class exported by Prelude yet.	2020-02-28 12:54:20 -04:00
Joey Hess	c78b9b55b6	rename changeGitConfig to overrideGitConfig and avoid unncessary calls It's important that it be clear that it overrides a config, such that reloading the git config won't change it, and in particular, setConfig won't change it. Most of the calls to changeGitConfig were actually after setConfig, which was redundant and unncessary. So removed those. The only remaining one, besides --debug, is in the handling of repository-global config values. That one's ok, because the way mergeGitConfig is implemented, it does not override any value that is set in git config. If a value with a repo-global setting was passed to setConfig, it would set it in the git config, reload the git config, re-apply mergeGitConfig, and use the newly set value, which is the right thing.	2020-02-27 01:11:53 -04:00
Joey Hess	029c883713	Merge branch 'master' into v8	2020-02-19 14:32:11 -04:00
Joey Hess	aa949bbb7d	initremote --describe-other-params Does not yet include descriptions from external special remote programs.	2020-01-20 16:05:51 -04:00
Joey Hess	3cd3757236	annex.dotfiles The git add behavior changes could be avoided if it turns out to be really annoying, but then it would need to behave the old way when annex.dotfiles=false and the new way when annex.dotfiles=true. I'd rather not have the config option result in such divergent behavior as `git annex add .` skipping a dotfile (old) vs adding to annex (new). Note that the assistant always adds dotfiles to the annex. This is surprising, but not new behavior. Might be worth making it also honor annex.dotfiles, but I wonder if perhaps some user somewhere uses it and keeps large files in a directory that happens to begin with a dot. Since dotfiles and dotdirs are a unix culture thing, and the assistant users may not be part of that culture, it seems best to keep its current behavior for now.	2019-12-26 16:33:39 -04:00
Joey Hess	7d9dff5b05	Merge branch 'master' into bs and update changelog	2019-12-18 15:13:30 -04:00
Joey Hess	7fd5376334	inprogress: Support --key	2019-12-18 14:14:16 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	a0168cd9a2	use RawFilePath getSymbolicLinkStatus for speed	2019-12-06 15:42:54 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	3c7fd09ec8	get many more commands building again about half are building now	2019-12-05 11:40:10 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	6a97ff6b3a	wip RawFilePath Goal is to make git-annex faster by using ByteString for all the worktree traversal. For now, this is focusing on Command.Find, in order to benchmark how much it helps. (All other commands are temporarily disabled) Currently in a very bad unbuildable in-between state.	2019-11-25 16:18:19 -04:00
Joey Hess	667d38a8f1	Fix a crash (STM deadlock) when -J is used with multiple files that point to the same key See the comment for a trace of the deadlock. Added a new StartStage. New worker threads begin in the StartStage. Once a thread is ready to do work, it moves away from the StartStage, and no thread will ever transition back to it. A thread that blocks waiting on another thread that is processing the same key will block while in the StartStage. That other thread will never switch back to the StartStage, and so the deadlock is avoided.	2019-11-14 13:51:09 -04:00
Joey Hess	61b384d2b7	add --sameas option, not yet used	2019-10-01 12:36:25 -04:00
Joey Hess	2b55a2b882	remotedaemon: Don't list --stop in help since it's not supported. Also, move out of plumbing section. When using tor, the remotedaemon is part of the user's workflow, as it runs the tor hidden service.	2019-09-30 14:40:46 -04:00
Joey Hess	b13a350556	added --unlocked and --locked	2019-09-19 12:33:13 -04:00
Joey Hess	fda1bdd679	Added --mimetype and --mimeencoding file matching options. Already had these for largefiles matching, but I forgot to add them as command-line options.	2019-09-19 12:09:59 -04:00
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	9a5ddda511	remove many old version ifdefs Drop support for building with ghc older than 8.4.4, and with older versions of serveral haskell libraries than will be included in Debian 10. The only remaining version ifdefs in the entire code base are now a couple for aws! This commit should only be merged after the Debian 10 release. And perhaps it will need to wait longer than that; it would make backporting new versions of git-annex to Debian 9 (stretch) which has been actively happening as recently as this year. This commit was sponsored by Ilya Shlyakhter.	2019-07-05 15:09:37 -04:00

1 2 3 4 5 ...

300 commits