git-annex

Author	SHA1	Message	Date
Joey Hess	3a66cd715f	avoid making absolute git remote path relative When a git remote is configured with an absolute path, use that path, rather than making it relative. If it's configured with a relative path, use that. Git.Construct.fromPath changed to preserve the path as-is, rather than making it absolute. And Annex.new changed to not convert the path to relative. Instead, Git.CurrentRepo.get generates a relative path. A few things that used fromAbsPath unncessarily were changed in passing to use fromPath instead. I'm seeing fromAbsPath as a security check, while before it was being used in some cases when the path was known absolute already. It may be that fromAbsPath is not really needed, but only git-annex-shell uses it now, and I'm not 100% sure that there's not some input that would cause a relative path to be used, opening a security hole, without the security check. So left it as-is. Test suite passes and strace shows the configured remote url is used unchanged in the path into it. I can't be 100% sure there's not some code somewhere that takes an absolute path to the repo and converts it to relative and uses it, but it seems pretty unlikely that the code paths used for a git remote would call such code. One place I know of is gitAnnexLink, but I'm pretty sure that git remotes never deal with annex symlinks. If that did get called, it generates a path relative to cwd, which would have been wrong before this change as well, when operating on a remote.	2021-02-08 13:18:01 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	a3a19518d8	fix --time-limit It got broken in several ways by the streaming seeking optimisations around version 8.20201007. Moved time limit checking out of the matcher, which was a hack in the first place. So everywhere that uses Limit.getMatcher needs to check time limit. Well, almost everywhere. Command.Info uses it, but it does not make sense to time limit getting info. And Command.MultiCast uses it just to build up a list of files that then get passed to a command, so it would never have hit the timeout in a useful way. This implementation is a little more expensive when at time limit than necessary, since it continues seeking only to discard everything after the time limit. I did try making it close the file handles to force a faster shutdown, but that didn't work and hung. Could certianly be improved somehow, but seeking is probably not the expensive bit when a time limit is hit, so this seems acceptable for now.	2021-01-04 15:57:11 -04:00
Joey Hess	6280af2901	generate more compact git-annex branch for imports Especially from borg, where the content identifier logs all end up being the same identical file! But also, for other imports, the location tracking logs can, in some cases, be identical files. Bonus optimisation: Avoid looking up (and parsing when set) GIT_ANNEX_VECTOR_CLOCK env var every time a log is written to. Although the lookup does happen at startup even when no log will be written now.	2020-12-23 15:25:16 -04:00
Joey Hess	00526a6739	pass along -c options to child git-annex processes	2020-12-15 10:49:29 -04:00
Joey Hess	d3f78da0ed	propagate signals to the transferrer process group Done on unix, could not implement it on windows quite. The signal library gets part of the way needed for windows. But I had to open https://github.com/pmlodawski/signal/issues/1 because it lacks raiseSignal. Also, I don't know what the equivilant of getProcessGroupIDOf is on windows. And System.Process does not provide a way to send any signal to a process group except for SIGINT. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-11 15:32:00 -04:00
Joey Hess	47016fc656	move TransferrerPool from Assistant state to Annex state This commit was sponsored by Graham Spencer on Patreon.	2020-12-07 13:21:35 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	885974be99	add newtypes for QuickCheck to avoid LANG=C issues All properties changed to use them, except for prop_encode_c_decode_c_roundtrip, which already filtered to ascii for other reasons. A few modules had to be split out, because Setup does not build-depend on QuickCheck.	2020-11-09 20:21:18 -04:00
Joey Hess	f45ad178cb	more RawFilePath conversion At 318/645 after 4k lines of changes This commit was sponsored by Jake Vosloo on Patreon.	2020-10-29 12:03:50 -04:00
Joey Hess	8d66f7ba0f	more RawFilePath conversion Added a RawFilePath createDirectory and kept making stuff build. Up to 296/645 This commit was sponsored by Mark Reidenbach on Patreon.	2020-10-28 17:25:59 -04:00
Joey Hess	5cfcf1f05f	cache remote.log Unlikely to speed up any of the existing uses much, but I want to use it in a message that might be displayed many times.	2020-09-22 13:52:26 -04:00
Joey Hess	77c42782d0	differentiate between concurrency enabled at command line and by git config The latter should not affect --batch mode.	2020-09-16 11:47:12 -04:00
Joey Hess	f912f8e5fd	refix bug in a better way Always run Git.Config.store, so when the git config gets reloaded, the override gets re-added to it, and changeGitRepo then calls extractGitConfig on it and sees the annex.* settings from the override. Remove any prior occurance of -c v and add it to the end. This way, -c foo=1 -c foo=2 -c foo=1 will pass -c foo=1 to git, rather than -c foo=2 Note that, if git had some multiline config that got built up by multiple -c's, this would not work still. But it never worked because before the bug got fixed in the first place, the -c value was repeated many times, so the multivalue thing would have been wrong. I don't think -c can be used with multiline configs anyway, though git-config does talk about them?	2020-07-02 13:32:33 -04:00
Joey Hess	d5451afc8f	fix deadlock Fix a deadlock that could occur after git-annex got an unlocked file, causing the command to hang indefinitely. Known to happen on vfat filesystems, possibly others. Note that a deadlock is still theoretically possible, if anything smudge --clean does causes it to run the git queue for some other reason. Apparently that doesn't happen, but will need to keep an eye on it.	2020-06-18 12:56:29 -04:00
Joey Hess	04352ed9c5	check-ignore resource pool Much like check-attr before.	2020-04-21 11:25:28 -04:00
Joey Hess	45fb7af21c	check-attr resource pool Limited to min of -JN or number of CPU cores, because it will often be CPU bound, once it's read the gitignore file for a directory. In some situations it's more disk bound, but in any case it's unlikely to be the main bottleneck that -J is used to avoid. Eg, when dropping, this is used for numcopies checks, but the main bottleneck will be accessing the remotes to verify presence. So the user might decide to -J32 that, but having 32 check-attr processes would just waste however many filehandles they open, and probably worsen their performance due to CPU contention. Note that, I first tried just letting up to the -JN be started. However, even when it's no bottleneck at all, that still results in all of them being started. Why? Well, all the worker threads start up nearly simulantaneously, so there's a thundering herd..	2020-04-21 11:05:57 -04:00
Joey Hess	cee6b344b4	cat-file resource pool Avoid running a large number of git cat-file child processes when run with a large -J value. This implementation takes care to avoid adding any overhead to git-annex when run without -J. When run with -J, there is a small bit of added overhead, to manipulate the resource pool. That optimisation added a fair bit of complexity.	2020-04-20 15:19:31 -04:00
Joey Hess	529f488ec4	fix a thundering herd problem Avoid repeatedly opening keys db when accessing a local git remote and -J is used. What was happening was that Remote.Git.onLocal created a new annex state as each thread started up. The way the MVar was used did not prevent that. And that, in turn, led to repeated opening of the keys db, as well as probably other extra work or resource use. Also managed to get rid of Annex.remoteannexstate, and it turned out there was an unncessary Maybe in the keysdbhandle, since the handle starts out closed.	2020-04-17 17:09:29 -04:00
Joey Hess	2caf579718	cache annex index filename for 1.5% speedup to queries	2020-04-10 13:37:04 -04:00
Joey Hess	c8fec6ab03	Fix a minor bug that caused options provided with -c to be passed multiple times to git.	2020-03-16 13:06:44 -04:00
Joey Hess	c78b9b55b6	rename changeGitConfig to overrideGitConfig and avoid unncessary calls It's important that it be clear that it overrides a config, such that reloading the git config won't change it, and in particular, setConfig won't change it. Most of the calls to changeGitConfig were actually after setConfig, which was redundant and unncessary. So removed those. The only remaining one, besides --debug, is in the handling of repository-global config values. That one's ok, because the way mergeGitConfig is implemented, it does not override any value that is set in git config. If a value with a repo-global setting was passed to setConfig, it would set it in the git config, reload the git config, re-apply mergeGitConfig, and use the newly set value, which is the right thing.	2020-02-27 01:11:53 -04:00
Joey Hess	c089f395b0	Bugfix: Don't ignore --debug when it is followed by -c	2020-02-27 00:52:37 -04:00
Joey Hess	4acbb40112	git-annex config annex.largefiles annex.largefiles can be configured by git-annex config, to more easily set a default that will also be used by clones, without needing to shoehorn the expression into the gitattributes file. The git config and gitattributes override that. Whenever something is added to git-annex config, we have to consider what happens if a user puts a purposfully bad value in there. Or, if a new git-annex adds some new value that an old git-annex can't parse. In this case, a global annex.largefiles that can't be parsed currently makes an error be thrown. That might not be ideal, but the gitattribute behaves the same, and is almost equally repo-global. Performance notes: git-annex add and addurl construct a matcher once and uses it for every file, so the added time penalty for reading the global config log is minor. If the gitattributes annex.largefiles were deprecated, git-annex add would get around 2% faster (excluding hashing), because looking that up for each file is not fast. So this new way of setting it is progress toward speeding up add. git-annex smudge does need to load the log every time. As well as checking the git attribute. Not ideal. Setting annex.gitaddtoannex=false avoids both overheads.	2019-12-20 13:01:41 -04:00
Joey Hess	f08cc8218b	optimisation This was already optimised before, but profiling found that delEntry was around 1.5% of the total runtime of git-annex whereis. It was being called once per environment variable per file processed. Fixed by better caching. Since withIndexFile is almost always run with the same .git/annex/index file, it can cache the modified environment, rather than re-modifying it each time called. (cherry picked from commit `6535aea49a`)	2019-12-04 14:30:22 -04:00
Joey Hess	99536e3a0b	remove one more warningIO Had to generalize Git.Queue so it can run an Annex action, yipes. Only remaining warningIO are in the legacy chunk code.	2019-11-12 10:45:52 -04:00
Joey Hess	fef3cd055d	Removed support for git versions older than 2.1 debian oldoldstable has 2.1, and that's what i386ancient uses. It would be better to require git 2.2, which is needed to use adjusted branches, but can't do that w/o losing support for some old linux kernels or a complicated git backport.	2019-09-11 16:14:43 -04:00
Joey Hess	9671248fff	speed up enteringStage in non-concurrent mode Avoid a STM transaction. Also got rid of UnallocatedWorkerPool.	2019-06-19 15:47:54 -04:00
Joey Hess	659640e224	separate queue for cleanup actions When running multiple concurrent actions, the cleanup phase is run in a separate queue than the main action queue. This can make some commands faster, because less time is spent on bookkeeping in between each file transfer. But as far as I can see, nothing will be sped up much by this yet, because all the existing cleanup actions are very light-weight. This is just groundwork for deferring checksum verification to cleanup time. This change does mean that if the user expects -J2 will mean that they see no more than 2 jobs running at a time, they may be surprised to see 4 in some cases (if the cleanup actions are slow enough to notice). It might also make sense to enable background cleanup without the -J, for at least one cleanup action. Indeed, that's the behavior that -J1 has now. At some point in the future, it make make sense to make the behavior with no -J the same as -J1. The only reason it's not currently is that git-annex can build w/o concurrent-output, and also any bugs in concurrent-output (such as perhaps misbehaving on non-VT100 compatible terminals) are avoided by default by only using it when -J is used.	2019-06-05 17:54:35 -04:00
Joey Hess	c04b2af3e1	improved WorkerPool abstraction No behavior changes.	2019-06-05 14:26:48 -04:00
Joey Hess	b03e65d260	Improved locking when multiple git-annex processes are writing to the .git/index file	2019-05-06 15:15:12 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	e4e464da65	import command is updating tracking branch	2019-02-26 13:15:48 -04:00
Joey Hess	2e0e557e75	Support being built with ghc 8.0.1 (MonadFail) Tested on an older ghc by enabling MonadFailDesugaring globally. In TransferQueue, the lack of a MonadFail for STM exposed what would normally be a bug in the pattern matching, although in this case an earlier check that the queue was not empty avoided a pattern match failure.	2019-01-05 11:55:15 -04:00
Joey Hess	894716512d	add a UUIDDesc type containing a ByteString Groundwork for handling uuid.log using ByteString	2019-01-01 16:17:54 -04:00
Joey Hess	8be5a7269a	refactor getCurrentBranch Both Command.Sync and Annex.Ingest had their own versions of this. The one in Annex.Ingest used Git.Branch.currentUnsafe, but does not seem to need it. That is only checking to see if it's in an adjusted unlocked branch, and when in an adjusted branch, the branch does in fact exist, so the added check that Git.Branch.current does is fine. This commit was sponsored by Denis Dzyubenko on Patreon.	2018-10-19 17:29:18 -04:00
Joey Hess	759a87ad70	fix git command queue to be concurrency safe Probably not noticed until now because the queue is large enough that two threads each filling theirs at the same time and flushing is unlikely to happen. Also made explicit that each worker thread gets its own queue. I think that was the case before, but if something was put in the queue before worker threads were forked off, they could have each inherited the same queue. Could have gone with a single shared queue, but per-worker queues is more efficient, because a worker can add lots of stuff to its own queue without any locking. This commit was sponsored by Ole-Morten Duesund on Patreon.	2018-08-28 13:16:33 -04:00
Joey Hess	256d8f07e8	avoid insertWith' depreaction warning Switch to Data.Map.Strict everywhere that used it. There are still lots of lazy maps in git-annex. I think switching these is safe. The risk is that there might be a map that is used in a way that relies on the values not being evaluated to WHNF, and switching to strict might result in bad performance or memory use. So, I have not switched everything.	2018-04-22 13:28:31 -04:00
Joey Hess	2ec07bc29f	Avoid running annex.http-headers-command more than once.	2018-04-04 15:15:08 -04:00
Joey Hess	fa65f1d240	fix --json-progress --json to be same as --json --json-progress Fix behavior of --json-progress followed by --json, in which the latter option disabled the former. This commit was supported by the NSF-funded DataLad project.	2018-02-19 14:12:15 -04:00
Joey Hess	2b66492d6e	Improve startup time for commands that do not operate on remotes And for tab completion, by not unnessessarily statting paths to remotes, which used to cause eg, spin-up of removable drives. Got rid of the remotes member of Git.Repo. This was a bit painful. Remote.Git modifies the list of remotes as it reads their configs, so still need a persistent list of remotes. So, put it in as Annex.gitremotes. It's only populated by getGitRemotes, so commands like examinekey that don't care about remotes won't do so. This commit was sponsored by Jake Vosloo on Patreon.	2018-01-09 16:22:07 -04:00
Joey Hess	24f27ec39d	convert importfeed to youtube-dl Fully working, including --fast/--relaxed. Note that, while git-annex addurl --relaxed is not going to check youtube-dl, I kept git annex importfeed --relaxed checking it. Thinking is that, let's not break people's importfeed cron jobs, and importfeed does not typically have to check a large number of new items, so it's ok if it's a little bit slower when used with youtube playlist feeds. importfeed's behavior is also improved (?) when a feed has links in it to non-media files. Before, those were skipped. Now, the content of the link is downloaded. This had to be done, because trying to use youtube-dl is slow, and if those were skipped, it would have to check every time importfeed was run. While this behavior change may not be desirable for some feeds, that intersperse links to web pages with enclosures, it will be desirable for other feeds, that have non-enclosure directy links to media files. Remove old quvi modules. This commit was sponsored by Øyvind Andersen Holm.	2017-11-29 17:30:02 -04:00
Joey Hess	e1ac299ad0	better dup key with -J fix This avoids all the complication about redundant work discussed in the previous try at fixing this. At the expense of needing each command that could have the problem to be patched to simply wrap the action in onlyActionOn once the key is known. But there do not seem to be many such commands. onlyActionOn' should not be used with a CommandStart (or CommandPerform), although the types do allow it. onlyActionOn handles running the whole CommandStart chain. I couldn't immediately see a way to avoid mistken use of onlyActionOn'. This commit was supported by the NSF-funded DataLad project.	2017-10-17 18:48:53 -04:00
Joey Hess	68a49adcda	Improve behavior when -J transfers multiple files that point to the same key After a false start, I found a fairly non-intrusive way to deal with it. Although it only handles transfers -- there may be issues with eg concurrent dropping of the same key, or other operations. There is no added overhead when -J is not used, other than an added inAnnex check. When -J is used, it has to maintain and check a small Set, which should be negligible overhead. It could output some message saying that the transfer is being done by another thread. Or it could even display the same progress info for both files that are being downloaded since they have the same content. But I opted to keep it simple, since this is rather an edge case, so it just doesn't say anything about the transfer of the file until the other thread finishes. Since the deferred transfer action still runs, actions that do more than transfer content will still get a chance to do their other work. (An example of something that needs to do such other work is P2P.Annex, where the download always needs to receive the content from the peer.) And, if the first thread fails to complete a transfer, the second thread can resume it. But, this unfortunately means that there's a risk of redundant work being done to transfer a key that just got transferred. That's not ideal, but should never cause breakage; the same thing can occur when running two separate git-annex processes. The get/move/copy/mirror --from commands had extra inAnnex checks added, inside the download actions. Without those checks, the first thread downloaded the content, and then the second thread woke up and downloaded the same content redundantly. move/copy/mirror --to is left doing redundant uploads for now. It would need a second checkPresent of the remote inside the upload to avoid them, which would be expensive. A better way to avoid redundant work needs to be found.. This commit was supported by the NSF-funded DataLad project.	2017-10-17 17:10:50 -04:00
Joey Hess	d39c120afa	add annex-ignore-command and annex-sync-command configs Added remote configuration settings annex-ignore-command and annex-sync-command, which are dynamic equivilants of the annex-ignore and annex-sync configurations. For this I needed a new DynamicConfig infrastructure. Its implementation should be as fast as before when there is no dynamic config, and it caches so shell commands are only run once. Note that annex-ignore-command exits nonzero when the remote should be ignored. While that may seem backwards, it allows using the same command for it as for annex-sync-command when you want to disable both. This commit was sponsored by Trenton Cronholm on Patreon.	2017-08-17 13:54:14 -04:00
Joey Hess	3f4b671486	fix sshCleanup race using STM	2017-05-11 18:29:51 -04:00
Joey Hess	6992fe133b	Ssh password prompting improved when using -J When ssh connection caching is enabled (and when GIT_ANNEX_USE_GIT_SSH is not set), only one ssh password prompt will be made per host, and only one ssh password prompt will be made at a time. This also fixes a race in prepSocket's stale ssh connection stopping when run with -J. It was possible for one thread to start a cached ssh connection, and another thread to immediately stop it, resulting in excess connections being made. This commit was supported by the NSF-funded DataLad project.	2017-05-11 17:36:03 -04:00
Joey Hess	4c1e3210fa	annex.backend is the new name for what was annex.backends It takes a single key-value backend, rather than the unncessary and confusing list. The old option still works if set. Simplified some old old code too. This commit was sponsored by Thomas Hochstein on Patreon.	2017-05-09 15:04:07 -04:00
Joey Hess	0534152685	get -J: Improve distribution of jobs amoung remotes when there are more jobs than remotes. It was distributing jobs to remotes that were not being used by any other job. But, suppose that there are only 2 remotes, and -J10. In such a case, the first 2 downloads would be distributed amoung the 2 remotes, but the other 8 would all go to remote #1. Improved by keeping a counter of how many jobs are assigned to a remote, and prefer remotes with fewer jobs. Note use of Data.Map.Strict to avoid blowing up space. I kept the bang-patterns as-is, although probably not needed with Data.Map.Strict. This commit was sponsored by Jack Hill on Patreon.	2017-03-08 14:49:30 -04:00
Joey Hess	ed56dba868	annex.autocommit can be configured via git-annex config ... to control the default behavior in all clones of a repository. This includes a new Configurable data type, so the GitConfig type indicates which values can be configured this way. The implementation should be quite efficient; the config log is only read once, and only when a Configurable value has not already been set by git-config. Indeed, it would be nice in the future to extend this, so that git-config is itself only read on demand. Some commands may not need to look at the git configuration at all. This commit was sponsored by Trenton Cronholm on Patreon.	2017-02-03 13:58:53 -04:00

1 2 3 4 5 ...

275 commits