git-annex

Author	SHA1	Message	Date
Joey Hess	506ffea5e6	stop symlink check once the top of the working tree is reached Avoid complaining that a file with "is beyond a symbolic link" when the filepath is absolute and the symlink in question is not actually inside the git repository. This assumes that inodes remain stable while the command is running. I think they always will, the filesystems where they are unstable change them across mounts. (If inodes were not stable, it would just complain about symlinks in the path that are not inside the working tree.) (On windows, I don't want to assume anything about inodes, they could be random numbers for all I know. But if they were, this would still be ok, as long as windows doesn't have symlinks that are detected by isSymbolicLink. Which seems a fair bet.)	2020-08-06 20:14:30 -04:00
Joey Hess	5d380c6c5c	when workTreeItems finds a problem with a parameter, don't go on to process it Part of workTreeItems is trying detect a case where git porcelain refuses to process a file, and where git ls-files silently outputs nothing. But, it's hard to perfectly replicate git's behavior, and besides, git's behavior could change. So it could be that we warn, but then git ls-files does not skip over it, and so git-annex also processes it after warning about it. So, if we think we have a problem with a parameter, display the warning, and skip processing it at all. Implementing this was complicated by needing to handle the case where all command-line parameters get filtered out this way. Which is different than the case where there are none, because we don't want to operate on all files in this new case..	2020-08-06 13:47:45 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	00865cdae8	Fix a bug in find --branch in the previous version inAnnex check was lost for that code path. To avoid more such mistakes, made withKeyOptions check it when the AnnexedFileSeeker specifies.	2020-07-24 12:05:28 -04:00
Joey Hess	1be92381ec	unify batch mode with non-batch by using AnnexedFileSeeker	2020-07-22 14:23:28 -04:00
Joey Hess	889603336a	fix reversion in skipping deleted files And add a test case for that. This certianly loses some of the 2x performance improvement in file seeking that seekFilteredKeys led to, because now it has to stat the worktree files again. Without benchmarking, I expect there will still be a sizable improvement, and also the git-annex branch precaching that seekFilteredKeys can do will still be a win of its approach. Also worth noting that lookupKey, when the file DNE, check if it's in an adjusted branch with hidden files, and if so, finds the key for the file anyway. That was intended to make git-annex sync --content be able to process those files, but a side effect was that, when a file was deleted but the deletion not yet staged, git-annex commands used to still list it. That was actually a bug. This commit fixes that bug too. (git-annex sync --content on such a branch does not use seekFilteredKeys so was not affected by the reversion or by this behavior change) This commit was sponsored by Jake Vosloo on Patreon.	2020-07-19 21:25:01 -04:00
Joey Hess	5dbb2924bb	remove unused function	2020-07-19 20:16:24 -04:00
Joey Hess	535cdc8d48	importfeed: Made checking known urls step around 10% faster. This was a bit disappointing, I was hoping for a 2x speedup. But, I think the metadata lookup is wasting a lot of time and also needs to be made to stream. The changes to catObjectStreamLsTree were benchmarked to not also speed up --all around 3% more. Seems I managed to make it polymorphic after all.	2020-07-14 12:47:51 -04:00
Joey Hess	75aab72d23	mostly done with location log precaching Some nice wins.	2020-07-13 17:04:02 -04:00
Joey Hess	a290792a4f	speed up seeking pointer files This solves the same problem as commit `b4d0f6dfc2` but in a better way, that should make processing pointer files maximally fast. If there is a mixture of pointer files and symlinks, the first symlinks until the pointer file are handled maximally fast, while the ones after that go via the slightly slower path.	2020-07-13 14:25:07 -04:00
Joey Hess	918b1faa3d	avoid hanging on exception	2020-07-13 12:36:15 -04:00
Joey Hess	88a7fb5cbb	convert all applicable commands to new 2x faster annexed file seeking This removes all calls to inAnnex, except for some involving --batch. It may be that the batch code could get a similar speedup, but I don't know if people habitually pass a huge number of files through --batch that git-annex does not need to do anything to process, so I skipped it for now. A few calls to ifAnnexed remain, and might be worth doing more to convert. In particular, Command.Sync has one that would probably speed it up by a good amount. (also removed some dead code from Command.Lock)	2020-07-10 15:45:38 -04:00
Joey Hess	b4d0f6dfc2	slower but sequential filtering of large files from pointer files There should still be a speedup seeking over pointer files, just not as large as the one seeking over symlinks.	2020-07-10 15:21:58 -04:00
Joey Hess	0f6b1ee048	check pointer file size This is all good, except for one small problem... When a pointer file has to be fed into the metadata cat-file, it's possible for a non-pointer file that comes after it to get fed into the main cat-file first, so the two files will be processed in a different order than the user specified. So, while this is the fast way, I guess I'll have to change it to be slower, but sequential..	2020-07-10 15:11:14 -04:00
Joey Hess	7a42a47902	renaming	2020-07-10 14:17:35 -04:00
Joey Hess	4c9ad1de46	optimisation: stream keys through git cat-file --buffer This is only implemented for git-annex get so far. It makes git-annex get nearly twice as fast in a repo with 10k files, all of them present! But, see the TODO for some caveats.	2020-07-10 13:54:52 -04:00
Joey Hess	d08c178f97	avoid catObjectStream skipping over unavailable shas Not needed as it's used for --all, but will be needed later.	2020-07-08 13:57:17 -04:00
Joey Hess	d010ab04be	sped up the --all option by 2x to 16x by using git cat-file --buffer This assumes that no location log files will have a newline or carriage return in their name. catObjectStream skips any such files due to cat-file not supporting them. Keys have been prevented from containing newlines since 2011, commit `480495beb4`. If some old repo had a key with a newline in it, --all will just skip processing that key. Other things, like .git/annex/unused files certianly assume no newlines in keys too, and AFAICR, such keys never actually worked. Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys generated before that point could perhaps contain a CR. (URL probably not, http probably doesn't support an URL with a raw CR in it.) So, added a warning in fsck about such keys. Although, fsck --all will naturally skip them, so won't be able to warn about them. Not entirely satisfactory, but I'll bet there are not really any such keys in existence. Thanks to Lukey for finding this optimisation.	2020-07-07 13:54:04 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	f912f8e5fd	refix bug in a better way Always run Git.Config.store, so when the git config gets reloaded, the override gets re-added to it, and changeGitRepo then calls extractGitConfig on it and sees the annex.* settings from the override. Remove any prior occurance of -c v and add it to the end. This way, -c foo=1 -c foo=2 -c foo=1 will pass -c foo=1 to git, rather than -c foo=2 Note that, if git had some multiline config that got built up by multiple -c's, this would not work still. But it never worked because before the bug got fixed in the first place, the -c value was repeated many times, so the multivalue thing would have been wrong. I don't think -c can be used with multiline configs anyway, though git-config does talk about them?	2020-07-02 13:32:33 -04:00
Joey Hess	ec0f8a6e74	Fix reversion that broke passing git configs with -c Reverting commit `c8fec6ab0`	2020-07-02 12:42:13 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	864ba4ecaa	disable buggy concurrency in Command.Export Fix a crash or potentially not all files being exported when sync -J --content is used with an export remote. Crash as described in fixed bug report. waitForAllRunningCommandActions inserted in several points where all the commandActions started before need to have finished before moving on to the next stage of the export. A race across those points could have maybe resulted in not all files being exported, or a wrong tree being export. For example, changeExport starting up an action like a rename of A to B. Then, with that action still running, fillExport uploading a new A, before the rename occurred. That race seems unlikely to have happened. There are some other ones that this also fixes.	2020-05-26 13:54:08 -04:00
Joey Hess	2a8fdfc7d8	Display a warning message when asked to operate on a file inside a directory that's a symbolic link to elsewhere This relicates git's behavior. It adds a few stat calls for the command line parameters, so there is some minor slowdown, but even with thousands of parameters it will not be very noticable, and git does the same statting in similar circumstances. Note that this does not prevent eg "git annex add symlink"; the symlink will be added to git as usual. And "git annex find symlink" will silently list nothing as well. It's only "symlink/foo" or "subdir/symlink/foo" that triggers the warning.	2020-05-11 15:03:35 -04:00
Joey Hess	cee6b344b4	cat-file resource pool Avoid running a large number of git cat-file child processes when run with a large -J value. This implementation takes care to avoid adding any overhead to git-annex when run without -J. When run with -J, there is a small bit of added overhead, to manipulate the resource pool. That optimisation added a fair bit of complexity.	2020-04-20 15:19:31 -04:00
Joey Hess	fada5c120c	remove unused import	2020-04-17 15:19:49 -04:00
Joey Hess	fe9cf1256e	move remoteList into dupState This does mean that RemoteDaemon.Transport.Tor's call runs it, otherwise no change, but this is groundwork for doing more such expensive actions in dupState.	2020-04-17 14:36:45 -04:00
Joey Hess	957a87b437	fix absolute filenames fed into --batch and git-annex info	2020-04-15 16:04:05 -04:00
Joey Hess	ca9c6c5f60	Fix a potential failure to parse git config Git has an obnoxious special case in git config, a line "foo" is the same as "foo = true". That means there is no way to examine the output of git config and tell if it was run with --null or not, since a "foo" in the first line could be such a boolean, or could be followed by its value on the next line if --null were used. So, rather than trying to do such a detection, track the style of config at all the points where it's generated.	2020-04-13 13:05:41 -04:00
Joey Hess	aeca7c2207	Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominiates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other damon modes bypass the optimisation, which would not help them anyway.	2020-04-09 13:54:43 -04:00
Joey Hess	c8fec6ab03	Fix a minor bug that caused options provided with -c to be passed multiple times to git.	2020-03-16 13:06:44 -04:00
Joey Hess	f6d629e483	changelog and minor style	2020-02-28 12:57:55 -04:00
Peter Simons	73cf523a4b	Fix build with ghc-8.8.x. The 'fail' method has been moved to the 'MonadFail' class. I made the changes so that the code still compiles with previous versions of 'base' that don't have the new MonadFail class exported by Prelude yet.	2020-02-28 12:54:20 -04:00
Joey Hess	c78b9b55b6	rename changeGitConfig to overrideGitConfig and avoid unncessary calls It's important that it be clear that it overrides a config, such that reloading the git config won't change it, and in particular, setConfig won't change it. Most of the calls to changeGitConfig were actually after setConfig, which was redundant and unncessary. So removed those. The only remaining one, besides --debug, is in the handling of repository-global config values. That one's ok, because the way mergeGitConfig is implemented, it does not override any value that is set in git config. If a value with a repo-global setting was passed to setConfig, it would set it in the git config, reload the git config, re-apply mergeGitConfig, and use the newly set value, which is the right thing.	2020-02-27 01:11:53 -04:00
Joey Hess	029c883713	Merge branch 'master' into v8	2020-02-19 14:32:11 -04:00
Joey Hess	aa949bbb7d	initremote --describe-other-params Does not yet include descriptions from external special remote programs.	2020-01-20 16:05:51 -04:00
Joey Hess	3cd3757236	annex.dotfiles The git add behavior changes could be avoided if it turns out to be really annoying, but then it would need to behave the old way when annex.dotfiles=false and the new way when annex.dotfiles=true. I'd rather not have the config option result in such divergent behavior as `git annex add .` skipping a dotfile (old) vs adding to annex (new). Note that the assistant always adds dotfiles to the annex. This is surprising, but not new behavior. Might be worth making it also honor annex.dotfiles, but I wonder if perhaps some user somewhere uses it and keeps large files in a directory that happens to begin with a dot. Since dotfiles and dotdirs are a unix culture thing, and the assistant users may not be part of that culture, it seems best to keep its current behavior for now.	2019-12-26 16:33:39 -04:00
Joey Hess	7d9dff5b05	Merge branch 'master' into bs and update changelog	2019-12-18 15:13:30 -04:00
Joey Hess	7fd5376334	inprogress: Support --key	2019-12-18 14:14:16 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	a0168cd9a2	use RawFilePath getSymbolicLinkStatus for speed	2019-12-06 15:42:54 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	3c7fd09ec8	get many more commands building again about half are building now	2019-12-05 11:40:10 -04:00
Joey Hess	b88f89c1ef	get the most commonly used commands building again A quick benchmark of whereis shows not much speed improvement, maybe a few percent. Profiling it found a hotspot, adds to todo.	2019-12-04 13:45:18 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	067aabdd48	wip RawFilePath 2x git-annex find speedup Finally builds (oh the agoncy of making it build), but still very unmergable, only Command.Find is included and lots of stuff is badly hacked to make it compile. Benchmarking vs master, this git-annex find is significantly faster! Specifically: num files old new speedup 48500 4.77 3.73 28% 12500 1.36 1.02 66% 20 0.075 0.074 0% (so startup time is unchanged) That's without really finishing the optimization. Things still to do: * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions. * Use versions of IO actions like getFileStatus that take a RawFilePath. * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy. * Use ByteString for parsing git config to speed up startup. It's likely several of those will speed up git-annex find further. And other commands will certianly benefit even more.	2019-11-26 16:01:58 -04:00
Joey Hess	6a97ff6b3a	wip RawFilePath Goal is to make git-annex faster by using ByteString for all the worktree traversal. For now, this is focusing on Command.Find, in order to benchmark how much it helps. (All other commands are temporarily disabled) Currently in a very bad unbuildable in-between state.	2019-11-25 16:18:19 -04:00
Joey Hess	667d38a8f1	Fix a crash (STM deadlock) when -J is used with multiple files that point to the same key See the comment for a trace of the deadlock. Added a new StartStage. New worker threads begin in the StartStage. Once a thread is ready to do work, it moves away from the StartStage, and no thread will ever transition back to it. A thread that blocks waiting on another thread that is processing the same key will block while in the StartStage. That other thread will never switch back to the StartStage, and so the deadlock is avoided.	2019-11-14 13:51:09 -04:00
Joey Hess	61b384d2b7	add --sameas option, not yet used	2019-10-01 12:36:25 -04:00
Joey Hess	2b55a2b882	remotedaemon: Don't list --stop in help since it's not supported. Also, move out of plumbing section. When using tor, the remotedaemon is part of the user's workflow, as it runs the tor hidden service.	2019-09-30 14:40:46 -04:00
Joey Hess	b13a350556	added --unlocked and --locked	2019-09-19 12:33:13 -04:00
Joey Hess	fda1bdd679	Added --mimetype and --mimeencoding file matching options. Already had these for largefiles matching, but I forgot to add them as command-line options.	2019-09-19 12:09:59 -04:00
Joey Hess	3f0eef4baa	v7 for all repositories * Default to v7 for new repositories. * Automatically upgrade v5 repositories to v7.	2019-08-30 14:09:14 -04:00
Joey Hess	9a5ddda511	remove many old version ifdefs Drop support for building with ghc older than 8.4.4, and with older versions of serveral haskell libraries than will be included in Debian 10. The only remaining version ifdefs in the entire code base are now a couple for aws! This commit should only be merged after the Debian 10 release. And perhaps it will need to wait longer than that; it would make backporting new versions of git-annex to Debian 9 (stretch) which has been actively happening as recently as this year. This commit was sponsored by Ilya Shlyakhter.	2019-07-05 15:09:37 -04:00
Joey Hess	ad6b8c5f77	fix STM deadlock in finishCommandActions Happened every time, because it was taking the pool TMVar while threads were still running, and then the thread would try to switch state.	2019-06-19 18:34:26 -04:00
Joey Hess	37d505dd6b	avoid STM deadlock When all worker threads are running and enteringStage is called, it waits for an idle slot. If all off the other threads then call it in turn, a deadlock occurrs. This is the same problem I didn't actually fix in `5a9842d7ed`. Fixed by doing two separate STM transactions, the first replaces its active thread with an idle thread, and the second waits for another idle thread. That guarantees there will eventually be an idle thread to find. The changes to WorkerPool were necessary because it can't add an idle thread containing the Annex state and go on to run an action using that same state, so I had to remove the Annex state from IdleWorker.	2019-06-19 18:15:25 -04:00
Joey Hess	a0d3a699e2	fix concurrency Broken by recent commits, because before dupState is called, the Annex state needs to have concurrent output enabled, and the thread pool populated.	2019-06-19 16:12:39 -04:00
Joey Hess	9671248fff	speed up enteringStage in non-concurrent mode Avoid a STM transaction. Also got rid of UnallocatedWorkerPool.	2019-06-19 15:47:54 -04:00
Joey Hess	53882ab4a7	make WorkerStage an open type Rather than limiting it to PerformStage and CleanupStage, this opens it up so any number of stages can be added as needed by commands. Each concurrent command has a set of stages that it uses, and only transitions between those can block waiting for a free slot in the worker pool. Calling enteringStage for some other stage does not block, and has very little overhead. Note that while before the Annex state was duplicated on the first call to commandAction, this now happens earlier, in startConcurrency. That means that seek stage actions should that use startConcurrency and then modify Annex state won't modify the state of worker threads they then start. I audited all of them, and only Command.Seek did so; prepMerge changes the working directory and so has to come before startConcurrency. Also, the remote list is built before duplicating the state, which means that it gets built earlier now than it used to. This would only have an effect of making commands that end up not needing to perform any actions unncessary build the remote list (only when they're run with concurrency enable), but that's a minor overhead compared to commands seeking through the work tree and determining they don't need to do anything.	2019-06-19 13:05:03 -04:00
Joey Hess	5a9842d7ed	avoid STM deadlock onredundant call to changeStageTo I couldn't find a way to avoid the deadlock w/o rewriting it to clearly not have one. I'm not quite sure what was the actual cause of the deadlock. This makes me unsure how I now know it clearly doesn't have a deadlock. But, it was easy to reproduce before (just call it twice in a row) and doesn't happen now.	2019-06-17 14:51:30 -04:00
Joey Hess	ecbd456312	fix restoring worker pool bug The bug might have led to a STM deadlock, if this case could ever actually fire.	2019-06-17 12:52:57 -04:00
Joey Hess	70bc30acb1	get rid of implicitMessages state Oh joyous day, this is probably git-annex's oldest implementation wart, source of much unncessary bother. Now that we have a StartMessage, showEndResult' can look at it to know if it needs to display an end message or not. This is also going to be faster, because it avoids an uncessary state lookup for each file processed.	2019-06-12 14:01:41 -04:00
Joey Hess	8e5ea28c26	finish CommandStart transition The hoped for optimisation of CommandStart with -J did not materialize. In fact, not runnign CommandStart in parallel is slower than -J3. So, CommandStart are still run in parallel. (The actual bad performance I've been seeing with -J in my big repo has to do with building the remoteList.) But, this is still progress toward making -J faster, because it gets rid of the onlyActionOn roadblock in the way of making CommandCleanup jobs run separate from CommandPerform jobs. Added OnlyActionOn constructor for ActionItem which fixes the onlyActionOn breakage in the last commit. Made CustomOutput include an ActionItem, so even things using it can specify OnlyActionOn. In Command.Move and Command.Sync, there were CommandStarts that used includeCommandAction, so output messages, which is no longer allowed. Fixed by using startingCustomOutput, but that's still not quite right, since it prevents message display for the includeCommandAction run inside it too.	2019-06-12 13:24:01 -04:00
Joey Hess	436f107715	make CommandStart return a StartMessage The goal is to be able to run CommandStart in the main thread when -J is used, rather than unncessarily passing it off to a worker thread, which incurs overhead that is signficant when the CommandStart is going to quickly decide to stop. To do that, the message it displays needs to be displayed in the worker thread, after the CommandStart has run. Also, the change will mean that CommandStart will no longer necessarily run with the same Annex state as CommandPerform. While its docs already said it should avoid modifying Annex state, I audited all the CommandStart code as part of the conversion. (Note that CommandSeek already sometimes runs with a different Annex state, and that has not been a source of any problems, so I am not too worried that this change will lead to breakage going forward.) The only modification of Annex state I found was it calling allowMessages in some Commands that default to noMessages. Dealt with that by adding a startCustomOutput and a startingUsualMessages. This lets a command start with noMessages and then select the output it wants for each CommandStart. One bit of breakage: onlyActionOn has been removed from commands that used it. The plan is that, since a StartMessage contains an ActionItem, when a Key can be extracted from that, the parallel job runner can run onlyActionOn' automatically. Then commands won't need to worry about this detail. Future work. Otherwise, this was a fairly straightforward process of making each CommandStart compile again. Hopefully other behavior changes were mostly avoided. In a few cases, a command had a CommandStart that called a CommandPerform that then called showStart multiple times. I have collapsed those down to a single start action. The main command to perhaps suffer from it is Command.Direct, which used to show a start for each file, and no longer does. Another minor behavior change is that some commands used showStart before, but had an associated file and a Key available, so were changed to ShowStart with an ActionItemAssociatedFile. That will not change the normal output or behavior, but --json output will now include the key. This should not break it for anyone using a real json parser.	2019-06-06 17:13:54 -04:00
Joey Hess	258a7c5cd1	add Key to all ActionItem constructors	2019-06-06 12:53:24 -04:00
Joey Hess	4932972487	fix STM deadlock `659640e224` was buggy, it had a STM deadlock because two actions both wanted to takeTMVar the WorkerPool and so blocked one-another. Fixed by completely reworking how the pool is maintained. Maintenace threads now wait for the Async actions and update the WorkerPool. This means twice as many threads as before, but green threads so will only use a few extra bytes ram per thread.	2019-06-05 20:07:35 -04:00
Joey Hess	659640e224	separate queue for cleanup actions When running multiple concurrent actions, the cleanup phase is run in a separate queue than the main action queue. This can make some commands faster, because less time is spent on bookkeeping in between each file transfer. But as far as I can see, nothing will be sped up much by this yet, because all the existing cleanup actions are very light-weight. This is just groundwork for deferring checksum verification to cleanup time. This change does mean that if the user expects -J2 will mean that they see no more than 2 jobs running at a time, they may be surprised to see 4 in some cases (if the cleanup actions are slow enough to notice). It might also make sense to enable background cleanup without the -J, for at least one cleanup action. Indeed, that's the behavior that -J1 has now. At some point in the future, it make make sense to make the behavior with no -J the same as -J1. The only reason it's not currently is that git-annex can build w/o concurrent-output, and also any bugs in concurrent-output (such as perhaps misbehaving on non-VT100 compatible terminals) are avoided by default by only using it when -J is used.	2019-06-05 17:54:35 -04:00
Joey Hess	c04b2af3e1	improved WorkerPool abstraction No behavior changes.	2019-06-05 14:26:48 -04:00
Joey Hess	30286bf067	improve layout	2019-06-05 12:20:20 -04:00
Joey Hess	aa7710982b	avoid list lookup by parseToken Minor optimisation to parsing of a preferred content expression.	2019-05-14 13:11:29 -04:00
Joey Hess	fa070df373	fix usage	2019-05-10 14:52:52 -04:00
Joey Hess	82186ca58f	annex.jobs=cpus etc Added the ability to run one job per CPU (core), by setting annex.jobs=cpus, or using option --jobs=cpus or -Jcpus. Built with future expansion in mind, including not defaulting matching on Concurrency so more constructors can later be added, and using "cpu" instead of "0".	2019-05-10 13:27:08 -04:00
Joey Hess	c0c38e986d	added renameremote command	2019-04-15 13:49:03 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	8fdea8f444	WIP Added graftTree but it's buggy. Should use graftTree in Annex.Branch.graftTreeish; it will be faster than the current implementation there. Started Annex.Import, but untested and it doesn't yet handle tree grafting.	2019-02-21 17:32:59 -04:00
Joey Hess	b080699a95	fromkey --json * fromkey: Added --json. * fromkey --batch output changed to support using it with --json. The old output was not parseable for any useful information, so this is not expected to break anything.	2019-02-05 14:03:29 -04:00
Joey Hess	303e828b7c	rest of the deserializeKey renameing	2019-01-14 13:17:47 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	5ba14b5095	build cleanrly when benchmark flag is not enabled	2019-01-05 08:09:28 -04:00
Joey Hess	11d6e2e260	new improved benchmark command that can benchmark anything git-annex does	2019-01-04 13:46:36 -04:00
Joey Hess	904be4e6be	add --branch option to git-annex find and mildly deprecate findref in favor of it No deprecation warning at run time, just one on the man page. One thing findref remains able to do that find cannot is to run in a bare repo. Find was made to refuse to run in a bare repo because it seemed confusing for it to not list any files ever in that situation. It would be better for find --branch to work in a bare repo but not without --branch but I don't currently have a way to do that. Probably a better solution would be to make git-annex in a bare repo default to --branch master or something like that instead of --all. This commit was sponsored by Denis Dzyubenko on Patreon.	2018-12-09 14:10:37 -04:00
Joey Hess	029ae8d4db	support findred and --branch with file matching options * findref: Support file matching options: --include, --exclude, --want-get, --want-drop, --largerthan, --smallerthan, --accessedwithin * Commands supporting --branch now apply file matching options --include, --exclude, --want-get, --want-drop to filenames from the branch. Previously, combining --branch with those would fail to match anything. * add, import, findref: Support --time-limit. This commit was sponsored by Jake Vosloo on Patreon.	2018-12-09 13:38:35 -04:00
Joey Hess	e612633999	rethrow ExitStatus exceptions Several git-annex commands want to exit right away, but that's an exception, which is caught due to `39fbaa0682`. So, re-throw it.	2018-11-19 13:18:08 -04:00
Joey Hess	83109affd1	remove leftovers from removed TestSuite build flag Test suite is always built, so this can be simplified.	2018-11-19 12:39:16 -04:00
Joey Hess	39fbaa0682	catch all (non-async) exceptions when running a commandAction When a command is operating on multiple files and there's an error with one, try harder to continue to the rest. (As was already done for many types of errors including IO errors.) This handles cases like lockContentForRemoval throwing an exception when the content is already locked. Just because a drop of one file fails, does not mean it shouldn't go on to try to drop other files. I looked over uses of `giveup` in Command/*; there are too many to check them all extensively, but none stood out as being problems that should let one commandAction stop running other commandActions. Worst case, something bad will happen and rather than stopping right away with an error, git-annex will display multiple errors as it fails over and over on each file. I don't think I ever really intended `error`/`giveup` to stop other commandActions; this was a relic of old confusion over haskell exception handling. Test suite passes. This commit was sponsored by Ethan Aubin.	2018-11-15 15:59:43 -04:00
Joey Hess	872af2b2f1	avoid using concurrent-output at all when --quiet or --json Of course, it wasn't used much in those modes, because normal output is avoided. But it was still initialized and used in a few places, including a call to hideRegionsWhile.	2018-11-15 14:26:40 -04:00
Joey Hess	4a788fbb3b	sync --content now supports --hide-missing adjusted branches This relies on git ls-files --with-tree, which I'm using in a way that its man page does not document. Hm. I emailed the git list to try to get the docs improved, but at least the git test suite does test the same kind of use case I'm using here. Performance impact when not in an adjusted branch is limited to some additional MVar accesses, and a single git call to determine the name of the current branch. So very minimal. When in an adjusted branch, the performance impact is in Annex.WorkTree.lookupFile, which starts doing an equal amount of work for files that didn't exist as it already did for files that were unlocked. This commit was sponsored by Jochen Bartl on Patreon.	2018-10-19 17:51:25 -04:00
Joey Hess	38d691a10f	removed the old Android app Running git-annex linux builds in termux seems to work well enough that the only reason to keep the Android app would be to support Android 4-5, which the old Android app supported, and which I don't know if the termux method works on (although I see no reason why it would not). According to [1], Android 4-5 remains on around 29% of devices, down from 51% one year ago. [1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/ This is a rather large commit, but mostly very straightfoward removal of android ifdefs and patches and associated cruft. Also, removed support for building with very old ghc < 8.0.1, and with yesod < 1.4.3, and without concurrent-output, which were only being used by the cross build. Some documentation specific to the Android app (screenshots etc) needs to be updated still. This commit was sponsored by Brett Eisenberg on Patreon.	2018-10-13 01:41:11 -04:00
Joey Hess	6ba3dea566	annex.jobs Added annex.jobs setting, which is like using the -J option. Of course, -J overrides annex.jobs. This commit was sponsored by Trenton Cronholm on Patreon.	2018-10-04 12:47:27 -04:00
Joey Hess	53526136e8	move commandAction out of CmdLine.Seek This is groundwork for nested seek loops, eg seeking over all files and then performing commandActions on a list of remotes, which can be done concurrently. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-10-01 14:12:06 -04:00
Joey Hess	6134431254	clean P2P protocol shutdown on EOF try 2 Same goal as `b18fb1e343` but without breaking backwards compatability. Just return IO exceptions when running the P2P protocol, so that git-annex-shell can detect eof and avoid the ugly message. This commit was sponsored by Ethan Aubin.	2018-09-25 16:49:59 -04:00
Joey Hess	1d1054faa6	added -z Added -z option to git-annex commands that use --batch, useful for supporting filenames containing newlines. It only controls input to --batch, the output will still be line delimited unless --json or etc is used to get some other output. While git often makes -z affect both input and output, I don't like trying them together, and making it affect output would have been a significant complication, and also git-annex output is generally not intended to be machine parsed, unless using --json or a format option. Commands that take pairs like "file key" still separate them with a space in --batch mode. All such commands take care to support filenames with spaces when parsing that, so there was no need to change it, and it would have needed significant changes to the batch machinery to separate tose with a null. To make fromkey and registerurl support -z, I had to give them a --batch option. The implicit batch mode they enter when not provided with input parameters does not support -z as that would have complicated option parsing. Seemed better to move these toward using the same --batch as everything else, though the implicit batch mode can still be used. This commit was sponsored by Ole-Morten Duesund on Patreon.	2018-09-20 16:11:47 -04:00
Joey Hess	50217f62a1	avoid duplicate add action for v6 unlocked modified file The new second pass sees the file as type changed because the first pass's changes have typically not reached git yet. So, have to explicitly check for unmodified files in the second pass. Note that, if the file has been touched but not really modified, the first pass will handle it, and so the second pass does nothing. This commit was sponsored by Jochen Bartl on Patreon.	2018-09-12 15:20:34 -04:00
Joey Hess	2743224658	change v6 git-annex add of staged unmodified unlocked file v6: When a file is unlocked but has not been modified, and the unlocking is only staged, git-annex add did not lock it. Now it will, for consistency with how modified files are handled and with v5. Note the removal of the sameInodeCache check. Otherwise it would see that the unmodified file is unmodified and stop there. That check seems to have been copied from the direct mode branch. But, direct mode had a specific reason to check for unmodified content, that does not apply to v6. The second pass means there is potential for a race, eg the unlocked file could be modified in between the first and second passes. No problem with that, since both passes do the same thing. This commit was sponsored by Jake Vosloo on Patreon.	2018-09-12 14:00:05 -04:00
Joey Hess	12460fcea6	make --batch honor matching options When --batch is used with matching options like --in, --metadata, etc, only operate on the provided files when they match those options. Otherwise, a blank line is output in the batch protocol. Affected commands: find, add, whereis, drop, copy, move, get In the case of find, the documentation for --batch already said it honored the matching options. The docs for the rest didn't, but it makes sense to have them honor them. While this is a behavior change, why specify the matching options with --batch if you didn't want them to apply? Note that the batch output for all of the affected commands could already output a blank line in other cases, so batch users should already be prepared to deal with it. git-annex metadata didn't seem worth making support the matching options, since all it does is output metadata or set metadata, the use cases for using it in combination with the martching options seem small. Made it refuse to run when they're combined, leaving open the possibility for later support if a use case develops. This commit was sponsored by Brett Eisenberg on Patreon.	2018-08-08 12:07:06 -04:00
Joey Hess	6e6c9cc6d3	Added --accessedwithin matching option. Useful for dropping old objects from cache repositories. But also, quite a genrally useful thing to have.. Rather than imitiating find's -atime and other options, all of which are pretty horrible to use, I made this match files accessed within a time period, using the same duration format used by git-annex schedule and --limit-time In passing, changed the --limit-time option parser to parse the duration, instead of having it later throw an error. This commit was supported by the NSF-funded DataLad project.	2018-08-01 15:34:03 -04:00
Joey Hess	85f9360d9b	GIT_ANNEX_SHELL_APPENDONLY Makes it allow writes, but not deletion of annexed content. Note that securing pushes to the git repository is left up to the user. This commit was sponsored by Jack Hill on Patreon.	2018-05-25 13:17:56 -04:00
Joey Hess	bea0ad220a	avoid --all buffering list of all keys In Annex.Branch.branch, the (++) was killing laziness. Rewrote so it streams lazily. filterM also kills laziness, so made loggedKeys use a Unchecked type, and check if the key is dead in the seek loop. Note that loggedKeysFor still buffers, so git-annex info <remote> and git-annex unused --from remote still use more memory than necessary. Also removed some unused functions from Annex.Journal.	2018-04-26 16:00:20 -04:00
Joey Hess	bfa26661d1	import: Avoid buffering all filenames to be imported in memory. Test case is 24 directories each containing files named 1..10000. The concat and filterM destroyed what laziness there is in dirContentsRecursive, making it buffer all the filenames. Memory use was around 300 mb (possibly growing slightly as it progressed). After this fix, memory use drops to a constant 59 mb. Note that dirContentsRecursive still buffers the entire content of a directory (not subdirectories) so this is still not optimal.	2018-04-26 12:06:12 -04:00

1 2 3 4 5 ...

355 commits