git-annex

Author	SHA1	Message	Date
Joey Hess	ce91f10132	fix annex.skipunknown false error propagation Propagate nonzero exit status from git ls-files when a specified file does not exist, or a specified directory does not contain any files checked into git. The recent completion of the annex.skipunknown transition exposed this bug, that has unfortunately been lurking all along. It is also possible that git ls-files errors out for some other reason -- perhaps a permission problem -- and this will also fix error propagation in such situations. Sponsored-by: Dartmouth College's Datalad project	2022-02-28 12:54:56 -04:00
Joey Hess	5b373a9dd2	read a consistent amount from pointer file A few places were reading the max symlink size of a pointer file, then passing tp parseLinkTargetOrPointer. Which is fine currently, but to support pointer files with lines of data after the pointer, enough has to be read that parseLinkTargetOrPointer can be assured of seeing enough of that data to know if it's correctly formatted. Sponsored-by: Dartmouth College's Datalad project	2022-02-23 12:52:34 -04:00
Joey Hess	835c50966a	reject batch options combined with non-batch options Reject combinations of --batch (or --batch-keys) with options like --all or --key or with filenames. Most commands ignored the non-batch items when batch mode was enabled. For some reason, addurl and dropkey both processed first the specified non-batch items, followed by entering batch mode. Changed them to also error out, for consistency. Sponsored-by: Dartmouth College's Datalad project	2022-01-26 13:00:19 -04:00
Joey Hess	7f6b2ca49c	handle overBranchFileContents with read-only unmerged git-annex branches This makes --all error out in that situation. Which is better than ignoring information from the branches. To really handle the branches right, overBranchFileContents would need to both query all the branches and union merge file contents (or perhaps not provide any file content), as well as diffing between branches to find files that are only present in the unmerged branches. And also, it would need to handle transitions.. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:30:51 -04:00
Joey Hess	d9d0fe5fa4	disable precaching git-annex branch when there are unmerged branches in a read-only repo The way precaching works, it can't merge in information from those branches efficiently, so just disable it and fall back to Annex.Branch.get in order to get the correct information. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:08:50 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	afe5d9e137	fix duplicated commands in git-annex-shell usage display	2021-10-11 15:42:27 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	ab7b5a492c	--batch-keys New --batch-keys option added to these commands: get, drop, move, copy, whereis git-annex-matching-options had to be reworded since some of its options can be used to match on keys, not only files. Sponsored-by: Luke Shumaker on Patreon	2021-08-25 14:21:12 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	47d3dccf19	whereused implemented except --historical Sponsored-by: Jack Hill on Patreon	2021-07-14 14:27:21 -04:00
Joey Hess	8a13bbedd6	--size-limit exit 101 Sponsored-by: Mark Reidenbach on Patreon	2021-06-04 16:43:47 -04:00
Joey Hess	771a122c9e	add --size-limit option When this option is not used, there should be effectively no added overhead, thanks to the optimisation in `b3cd0cc6ba`. When an action fails on a file, the size of the file still counts toward the size limit. This was necessary to support concurrency, but also generally seems like the right choice. Most commands that operate on annexed files support the option. export and import do not, and I don't know if it would make sense for export to.. Why would you want an incomplete export? sync doesn't, and while it would be easy to make it support it for transferring files, it's not clear if dropping files should also take the size limit into account. Commands like add that don't operate on annexed files don't support the option either. Exiting 101 not yet implemented. Sponsored-by: Denis Dzyubenko on Patreon	2021-06-04 16:16:53 -04:00
Joey Hess	b3cd0cc6ba	minor optimisation Avoid a second mvar access. Sponsored-by: Jochen Bartl on Patreon	2021-06-04 14:57:19 -04:00
Joey Hess	b5f5475ed6	New matching options --excludesamecontent and --includesamecontent The normalisation of filenames turns out to be the tricky part here, because the associated files coming out of the keys db may look like "./foo/bar" or "../bar". For the former to match a glob like "foo/", it needs to be normalised. Note that, on windows, normalise "./foo/bar" = "foo\\bar" which a glob like "foo/" won't match. So the glob is matched a second time, on the toInternalGitPath, so allowing the user to provide a glob with the slashes in either direction. However, this still won't support some wacky edge cases like the user providing a glob of "foo/bar\\*" Sponsored-by: Dartmouth College's Datalad project	2021-05-25 13:08:18 -04:00
Joey Hess	a58c90ccf4	skeleton of filter-branch command, with option parser	2021-05-14 10:59:48 -04:00
Joey Hess	0e830b6bb5	make remoteKeyToRemoteName safer If it's passed a ConfigKey such as annex.version, avoid returning an empty remote name and return Nothing instead. Also, foo.bar.baz is not treated as a remote named "bar".	2021-04-23 13:29:21 -04:00
Joey Hess	653b719472	fix --all to include not yet committed files from the journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. This does mean that --all can end up processing the same key more than once, but before the optimisations that introduced this bug, it used to also behave that way. So I didn't try to fix that; it's an edge case and anyway git-annex behaves well when run on the same key repeatedly. I am not too happy with the use of a MVar to buffer the list of files in the journal. I guess it doesn't defeat lazy streaming of the list, if that list is actually generated lazily, and anyway the size of the journal is normally capped and small, so if configs are changed to make it huge and this code path fire, git-annex using enough memory to buffer it all is not a large problem.	2021-04-21 15:40:32 -04:00
Joey Hess	74acf17a31	refactoring	2021-04-21 14:29:02 -04:00
Joey Hess	6eb3c0a6b4	fix branch precacheing bug by checking journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. When not using --all, precaching only gets triggered when the command actually needs location logs, and so there's no speed hit there. This is a minor speed hit for --all, because it precaches even when the location log is not actually going to be used, and so checking the journal is not necessary. It would have been possible to defer checking the journal until the cache gets used. But that would complicate the usual Branch.get code path with two different kinds of caches, and the speed hit is really minimal. A better way to speed up --all, later, would be to avoid precaching at all when the location log is not going to be used.	2021-04-21 14:02:15 -04:00
Joey Hess	2e9d4ac754	fix fastDebug to check if debugging is actually enabled Had to add to AnnexRead an indication of whether debugging is enabled. Could have just made setupConsole not install a debug output action that outputs, and have enableDebug be what installs that, but then in the common case where there is no debug selector, and so all debug output is selected, it would run the debug output action every time, which entails an IORef access. Which would make fastDebug too slow..	2021-04-06 16:28:37 -04:00
Joey Hess	d16d739ce2	implement fastDebug Most of the changes here involve global option parsing: GlobalSetter changed so it can both run an Annex action to set state, but can also change the AnnexRead value, which is immutable once the Annex monad is running. That allowed a debugselector value to be added to AnnexRead, seeded from the git config. The --debugfilter option's GlobalSetter then updates the AnnexRead. This improved GlobalSetter can later be used to move more stuff to AnnexRead. Things that don't involve a git config will be easier to move, and probably a lot of things can be moved eventually. fastDebug, while implemented, is not used anywhere yet. But it should be fast..	2021-04-06 15:24:28 -04:00
Joey Hess	1b645e1ace	added --debugfilter (and annex.debugfilter)	2021-04-05 15:31:10 -04:00
Joey Hess	c2f612292a	start splitting out readonly values from AnnexState Values in AnnexRead can be read more efficiently, without MVar overhead. Only a few things have been moved into there, and the performance increase so far is not likely to be noticable. This is groundwork for putting more stuff in there, particularly a value that indicates if debugging is enabled. The obvious next step is to change option parsing to not run in the Annex monad to set values in AnnexState, and instead return a pure value that gets stored in AnnexRead.	2021-04-02 15:51:44 -04:00
Joey Hess	a8b837aaef	add git ls-tree --long parser Not yet used, but allows getting the size of items in the tree fairly cheaply. I noticed that CmdLine.Seek uses ls-tree and the feeds the files into another long-running process to check their size. That would be an example of a place that might be sped up by using this. Although in that particular case, it only needs to know the size of unlocked files, not locked. And since enabling --long probably doubles the ls-tree runtime or more, the overhead of using it there may outwweigh the benefit.	2021-03-23 12:47:00 -04:00
Joey Hess	5545e78a1e	Make --debug also enable debugging in child git-annex processes Especially necessary with stalldetection using child processes for transfers. This commit was sponsored by Jack Hill on Patreon.	2021-03-22 14:25:28 -04:00
Joey Hess	a14001785e	fix --branch combined with --unlocked or --locked Since it's using git ls-tree anyway, can just look at the file modes to see if they're unlocked or are symlinks.	2021-03-02 13:47:27 -04:00
Joey Hess	cbf94fd13d	prep for fixing find --branch --unlocked Added LinkType to ProvidedInfo, and unified MatchingKey with ProvidedInfo. They're both used in the same way, so there was no real reason to keep separate. Note that addLocked and addUnlocked still set matchNeedsFileName, because to handle MatchingFile, they do need it. However, they don't use it when MatchingInfo is provided. This should be ok, the --branch case will be able skip checking matchNeedsFileName, since it will provide a filename in any case.	2021-03-02 13:39:31 -04:00
Joey Hess	ee4fd38ecf	remove unused contentFile = Nothing	2021-03-01 16:35:38 -04:00
Joey Hess	25e4ab7e81	Prevent combinations of options such as --all with --include Previously such nonsensical combinations always treated the matching option as if it didn't match. For now, made find --branch refuse matching options that need a filename, because one is not provided to them in a way they'll use. There's an open bug report to support it, but making it error out is better than the old behavior of not finding what it was asked to. Also, made --mimetype combined with eg --all work, by looking at the object file when operating on keys.	2021-03-01 16:25:23 -04:00
Joey Hess	eb594c710e	unregisterurl: New command Implemented by generalizing registerurl. Without the implicit batch mode of registerurl since that is only a backwards compatability thing (see commit `1d1054faa6`).	2021-03-01 14:28:24 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	7db4e62a90	remove accidental duplicated code The code in Annex.WorkerStage and Annex.Concurrent was 100% identical.	2021-02-03 15:23:52 -04:00
Joey Hess	aec2cf0abe	addon commands Seems only fair, that, like git runs git-annex, git-annex runs git-annex-foo. Implementation relies on O.forwardOptions, so that any options are passed through to the addon program. Note that this includes options before the subcommand, eg: git-annex -cx=y foo Unfortunately, git-annex eats the --help/-h options. This is because it uses O.hsubparser, which injects that option into each subcommand. Seems like this should be possible to avoid somehow, to let commands display their own --help, instead of the dummy one git-annex displays. The two step searching mirrors how git works, it makes finding git-annex-foo fast when "git annex foo" is run, but will also support fuzzy matching, once findAllAddonCommands gets implemented. This commit was sponsored by Dr. Land Raider on Patreon.	2021-02-02 16:32:49 -04:00
Joey Hess	e78d2c9642	move global options handling closer to Command definitions	2021-02-02 15:55:45 -04:00
Joey Hess	c8b1fa67b4	Behavior change: --trust-glacier option no longer overrides trust Since that can lead to data loss, which should never be enabled by an option other than --force. This commit was sponsored by Jake Vosloo on Patreon.	2021-01-07 10:37:43 -04:00
Joey Hess	2bf34fc17f	Behavior change: --trust option no longer overrides trust Since that can lead to data loss, which should never be enabled by an option other than --force. I suppose that using --trust was in some situation, safer than --force, because it doesn't entirely disable checking for data loss, but only disables checking involving data that is on the specified repository. But it seems better to be able to say that data loss only happens with --force. This commit was sponsored by Graham Spencer on Patreon.	2021-01-07 10:34:57 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	a3a19518d8	fix --time-limit It got broken in several ways by the streaming seeking optimisations around version 8.20201007. Moved time limit checking out of the matcher, which was a hack in the first place. So everywhere that uses Limit.getMatcher needs to check time limit. Well, almost everywhere. Command.Info uses it, but it does not make sense to time limit getting info. And Command.MultiCast uses it just to build up a list of files that then get passed to a command, so it would never have hit the timeout in a useful way. This implementation is a little more expensive when at time limit than necessary, since it continues seeking only to discard everything after the time limit. I did try making it close the file handles to force a faster shutdown, but that didn't work and hung. Could certianly be improved somehow, but seeking is probably not the expensive bit when a time limit is hit, so this seems acceptable for now.	2021-01-04 15:57:11 -04:00
Joey Hess	53fd1564b1	improve synopsis	2020-12-17 12:51:49 -04:00
Joey Hess	5e094d02d6	avoid using MatchingKey where MatchingFile can be used now This is actually matching worktree files, and now that a Key can be provided along with the file when doing that, using MatchingFile reflects that.	2020-12-14 17:54:25 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktreee files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	677003a6df	rename helper More consistent name with TransferrerPool	2020-12-09 13:24:24 -04:00
Joey Hess	05c0543e8e	move new interface to git-annex transfer This is to avoid breakage when upgrading or downgrading git-annex with a process running that uses the interface. It's better to keep the compatability code for a few years than worry about such breakage. This commit was sponsored by Brett Eisenberg on Patreon.	2020-12-09 12:33:56 -04:00
Joey Hess	5a1e73617d	finished this stage of the RawFilePath conversion Finally compiles again, and test suite passes. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-04 14:20:37 -04:00
Joey Hess	4bcb4030a5	more RawFilePath conversion 580/645 This commit was sponsored by Jack Hill on Patreon.	2020-11-03 18:34:27 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	7036d0a4c1	add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan This commit was sponsored by Jochen Bartl on Patreon.	2020-10-19 15:36:18 -04:00
Joey Hess	72644d919a	fix build warning	2020-10-19 14:48:21 -04:00
Joey Hess	9a5cd96f0d	Fix a memory leak introduced in the last release The problem was this line: cleanup = and <$> sequence (map snd v) That caused all of v to be held onto until the end, when the cleanup action was run. I could not seem to find a bang pattern that avoided the leak, so I resorted to a IORef, rather clunky, but not a performance problem because it will only be written once per git ls-files, so typically just 1 time. This commit was sponsored by Mark Reidenbach on Patreon.	2020-10-13 16:31:01 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	f624876dc2	remove zombie process in file seeking This was the last one marked as a zombie. There might be others I don't know about, but except for in the hypothetical case of a thread dying due to an async exception before it can wait on a process it started, I don't know of any. It would probably be safe to remove the reapZombies now, but let's wait and so that in its own commit in case it turns out to cause problems. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-09-25 11:38:42 -04:00
Joey Hess	ace02f41b0	seek: defer matcher check until more info is known Sped up seeking for files to operate on, when using options like --copies or --in, by around 20%. Benchmark showed an increase for --copies from 155 seconds to 121 seconds, and --in remote will be similar to that. For --in here, the speedup was less, 5-10% or so. (both warm cache) This commit was sponsored by Jack Hill on Patreon.	2020-09-24 17:59:12 -04:00
Joey Hess	3457b526ef	make git-annex add --no-check-gitignore not skip ignored files, same as with --force	2020-09-18 13:33:35 -04:00
Joey Hess	b6642dde8a	avoid interleaving command stages with Concurrency 1 Before, -J1 was different than no -J: It makes concurrent-output be used for display, and it actually can run concurrent jobs in some situations. Eg, a perform stage and a cleanup stage can both run. dupState is used in several places, and changes NonConcurrent to Concurrent 1. My concern is that this might, in some case, enable that concurrent behavior. And in particular, that it might get enabled in --batch mode, when the user is not expecting concurrent output because they did not pass -J. While I don't have a test case where that happens and causes out of order output, it looks like it could, and so prudent to make this change.	2020-09-16 12:10:45 -04:00
Joey Hess	877ef84a1b	support --batch -J --batch combined with -J now runs batch requests concurrently for many commands. Before, the combination was accepted, but did not enable concurrency. Since the output of batch requests can be in any order, --json with the new "input" field is recommended to be used, to determine which batch request each response corresponds to. If --json is not used, batch mode still runs concurrently, using the usual concurrent-output. That will not be very useful for most batch mode users, probably, but who knows. If a program was using --batch -J before, and was parsing non-json output, this could break it. But, it was relying on git-annex not supporting concurrency despite it being enabled, so it should have expected concurrent output. So, I think that's ok. annex.jobs does not enable concurrency in --batch mode, because that would confuse programs that use --batch but don't expect concurrency.	2020-09-16 12:10:37 -04:00
Joey Hess	77c42782d0	differentiate between concurrency enabled at command line and by git config The latter should not affect --batch mode.	2020-09-16 11:47:12 -04:00
Joey Hess	7a4f3ff345	remove callCommandAction' This is prep for making batchCommandAction use commandAction, which will enable concurrency for batch mode. Since commandAction can't return anything, have to handle the case of a CommandStart that chooses to do nothing in a different way.	2020-09-16 10:53:16 -04:00
Joey Hess	7b4c7a7ffc	refactor batchStart folded into only caller	2020-09-16 10:18:36 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	4c58433c48	avoid using MonadFail in ParseDuration There's no instance for Either String, so that makes it not as useful as it could be, so instead just return an Either String.	2020-08-15 15:53:35 -04:00
Joey Hess	506ffea5e6	stop symlink check once the top of the working tree is reached Avoid complaining that a file with "is beyond a symbolic link" when the filepath is absolute and the symlink in question is not actually inside the git repository. This assumes that inodes remain stable while the command is running. I think they always will, the filesystems where they are unstable change them across mounts. (If inodes were not stable, it would just complain about symlinks in the path that are not inside the working tree.) (On windows, I don't want to assume anything about inodes, they could be random numbers for all I know. But if they were, this would still be ok, as long as windows doesn't have symlinks that are detected by isSymbolicLink. Which seems a fair bet.)	2020-08-06 20:14:30 -04:00
Joey Hess	5d380c6c5c	when workTreeItems finds a problem with a parameter, don't go on to process it Part of workTreeItems is trying detect a case where git porcelain refuses to process a file, and where git ls-files silently outputs nothing. But, it's hard to perfectly replicate git's behavior, and besides, git's behavior could change. So it could be that we warn, but then git ls-files does not skip over it, and so git-annex also processes it after warning about it. So, if we think we have a problem with a parameter, display the warning, and skip processing it at all. Implementing this was complicated by needing to handle the case where all command-line parameters get filtered out this way. Which is different than the case where there are none, because we don't want to operate on all files in this new case..	2020-08-06 13:47:45 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	00865cdae8	Fix a bug in find --branch in the previous version inAnnex check was lost for that code path. To avoid more such mistakes, made withKeyOptions check it when the AnnexedFileSeeker specifies.	2020-07-24 12:05:28 -04:00
Joey Hess	1be92381ec	unify batch mode with non-batch by using AnnexedFileSeeker	2020-07-22 14:23:28 -04:00
Joey Hess	889603336a	fix reversion in skipping deleted files And add a test case for that. This certianly loses some of the 2x performance improvement in file seeking that seekFilteredKeys led to, because now it has to stat the worktree files again. Without benchmarking, I expect there will still be a sizable improvement, and also the git-annex branch precaching that seekFilteredKeys can do will still be a win of its approach. Also worth noting that lookupKey, when the file DNE, check if it's in an adjusted branch with hidden files, and if so, finds the key for the file anyway. That was intended to make git-annex sync --content be able to process those files, but a side effect was that, when a file was deleted but the deletion not yet staged, git-annex commands used to still list it. That was actually a bug. This commit fixes that bug too. (git-annex sync --content on such a branch does not use seekFilteredKeys so was not affected by the reversion or by this behavior change) This commit was sponsored by Jake Vosloo on Patreon.	2020-07-19 21:25:01 -04:00
Joey Hess	5dbb2924bb	remove unused function	2020-07-19 20:16:24 -04:00
Joey Hess	535cdc8d48	importfeed: Made checking known urls step around 10% faster. This was a bit disappointing, I was hoping for a 2x speedup. But, I think the metadata lookup is wasting a lot of time and also needs to be made to stream. The changes to catObjectStreamLsTree were benchmarked to not also speed up --all around 3% more. Seems I managed to make it polymorphic after all.	2020-07-14 12:47:51 -04:00
Joey Hess	75aab72d23	mostly done with location log precaching Some nice wins.	2020-07-13 17:04:02 -04:00
Joey Hess	a290792a4f	speed up seeking pointer files This solves the same problem as commit `b4d0f6dfc2` but in a better way, that should make processing pointer files maximally fast. If there is a mixture of pointer files and symlinks, the first symlinks until the pointer file are handled maximally fast, while the ones after that go via the slightly slower path.	2020-07-13 14:25:07 -04:00
Joey Hess	918b1faa3d	avoid hanging on exception	2020-07-13 12:36:15 -04:00
Joey Hess	88a7fb5cbb	convert all applicable commands to new 2x faster annexed file seeking This removes all calls to inAnnex, except for some involving --batch. It may be that the batch code could get a similar speedup, but I don't know if people habitually pass a huge number of files through --batch that git-annex does not need to do anything to process, so I skipped it for now. A few calls to ifAnnexed remain, and might be worth doing more to convert. In particular, Command.Sync has one that would probably speed it up by a good amount. (also removed some dead code from Command.Lock)	2020-07-10 15:45:38 -04:00
Joey Hess	b4d0f6dfc2	slower but sequential filtering of large files from pointer files There should still be a speedup seeking over pointer files, just not as large as the one seeking over symlinks.	2020-07-10 15:21:58 -04:00
Joey Hess	0f6b1ee048	check pointer file size This is all good, except for one small problem... When a pointer file has to be fed into the metadata cat-file, it's possible for a non-pointer file that comes after it to get fed into the main cat-file first, so the two files will be processed in a different order than the user specified. So, while this is the fast way, I guess I'll have to change it to be slower, but sequential..	2020-07-10 15:11:14 -04:00
Joey Hess	7a42a47902	renaming	2020-07-10 14:17:35 -04:00
Joey Hess	4c9ad1de46	optimisation: stream keys through git cat-file --buffer This is only implemented for git-annex get so far. It makes git-annex get nearly twice as fast in a repo with 10k files, all of them present! But, see the TODO for some caveats.	2020-07-10 13:54:52 -04:00
Joey Hess	d08c178f97	avoid catObjectStream skipping over unavailable shas Not needed as it's used for --all, but will be needed later.	2020-07-08 13:57:17 -04:00
Joey Hess	d010ab04be	sped up the --all option by 2x to 16x by using git cat-file --buffer This assumes that no location log files will have a newline or carriage return in their name. catObjectStream skips any such files due to cat-file not supporting them. Keys have been prevented from containing newlines since 2011, commit `480495beb4`. If some old repo had a key with a newline in it, --all will just skip processing that key. Other things, like .git/annex/unused files certianly assume no newlines in keys too, and AFAICR, such keys never actually worked. Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys generated before that point could perhaps contain a CR. (URL probably not, http probably doesn't support an URL with a raw CR in it.) So, added a warning in fsck about such keys. Although, fsck --all will naturally skip them, so won't be able to warn about them. Not entirely satisfactory, but I'll bet there are not really any such keys in existence. Thanks to Lukey for finding this optimisation.	2020-07-07 13:54:04 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	f912f8e5fd	refix bug in a better way Always run Git.Config.store, so when the git config gets reloaded, the override gets re-added to it, and changeGitRepo then calls extractGitConfig on it and sees the annex.* settings from the override. Remove any prior occurance of -c v and add it to the end. This way, -c foo=1 -c foo=2 -c foo=1 will pass -c foo=1 to git, rather than -c foo=2 Note that, if git had some multiline config that got built up by multiple -c's, this would not work still. But it never worked because before the bug got fixed in the first place, the -c value was repeated many times, so the multivalue thing would have been wrong. I don't think -c can be used with multiline configs anyway, though git-config does talk about them?	2020-07-02 13:32:33 -04:00
Joey Hess	ec0f8a6e74	Fix reversion that broke passing git configs with -c Reverting commit `c8fec6ab0`	2020-07-02 12:42:13 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	864ba4ecaa	disable buggy concurrency in Command.Export Fix a crash or potentially not all files being exported when sync -J --content is used with an export remote. Crash as described in fixed bug report. waitForAllRunningCommandActions inserted in several points where all the commandActions started before need to have finished before moving on to the next stage of the export. A race across those points could have maybe resulted in not all files being exported, or a wrong tree being export. For example, changeExport starting up an action like a rename of A to B. Then, with that action still running, fillExport uploading a new A, before the rename occurred. That race seems unlikely to have happened. There are some other ones that this also fixes.	2020-05-26 13:54:08 -04:00
Joey Hess	2a8fdfc7d8	Display a warning message when asked to operate on a file inside a directory that's a symbolic link to elsewhere This relicates git's behavior. It adds a few stat calls for the command line parameters, so there is some minor slowdown, but even with thousands of parameters it will not be very noticable, and git does the same statting in similar circumstances. Note that this does not prevent eg "git annex add symlink"; the symlink will be added to git as usual. And "git annex find symlink" will silently list nothing as well. It's only "symlink/foo" or "subdir/symlink/foo" that triggers the warning.	2020-05-11 15:03:35 -04:00
Joey Hess	cee6b344b4	cat-file resource pool Avoid running a large number of git cat-file child processes when run with a large -J value. This implementation takes care to avoid adding any overhead to git-annex when run without -J. When run with -J, there is a small bit of added overhead, to manipulate the resource pool. That optimisation added a fair bit of complexity.	2020-04-20 15:19:31 -04:00
Joey Hess	fada5c120c	remove unused import	2020-04-17 15:19:49 -04:00
Joey Hess	fe9cf1256e	move remoteList into dupState This does mean that RemoteDaemon.Transport.Tor's call runs it, otherwise no change, but this is groundwork for doing more such expensive actions in dupState.	2020-04-17 14:36:45 -04:00
Joey Hess	957a87b437	fix absolute filenames fed into --batch and git-annex info	2020-04-15 16:04:05 -04:00
Joey Hess	ca9c6c5f60	Fix a potential failure to parse git config Git has an obnoxious special case in git config, a line "foo" is the same as "foo = true". That means there is no way to examine the output of git config and tell if it was run with --null or not, since a "foo" in the first line could be such a boolean, or could be followed by its value on the next line if --null were used. So, rather than trying to do such a detection, track the style of config at all the points where it's generated.	2020-04-13 13:05:41 -04:00
Joey Hess	aeca7c2207	Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominiates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other damon modes bypass the optimisation, which would not help them anyway.	2020-04-09 13:54:43 -04:00
Joey Hess	c8fec6ab03	Fix a minor bug that caused options provided with -c to be passed multiple times to git.	2020-03-16 13:06:44 -04:00
Joey Hess	f6d629e483	changelog and minor style	2020-02-28 12:57:55 -04:00
Peter Simons	73cf523a4b	Fix build with ghc-8.8.x. The 'fail' method has been moved to the 'MonadFail' class. I made the changes so that the code still compiles with previous versions of 'base' that don't have the new MonadFail class exported by Prelude yet.	2020-02-28 12:54:20 -04:00
Joey Hess	c78b9b55b6	rename changeGitConfig to overrideGitConfig and avoid unncessary calls It's important that it be clear that it overrides a config, such that reloading the git config won't change it, and in particular, setConfig won't change it. Most of the calls to changeGitConfig were actually after setConfig, which was redundant and unncessary. So removed those. The only remaining one, besides --debug, is in the handling of repository-global config values. That one's ok, because the way mergeGitConfig is implemented, it does not override any value that is set in git config. If a value with a repo-global setting was passed to setConfig, it would set it in the git config, reload the git config, re-apply mergeGitConfig, and use the newly set value, which is the right thing.	2020-02-27 01:11:53 -04:00
Joey Hess	029c883713	Merge branch 'master' into v8	2020-02-19 14:32:11 -04:00
Joey Hess	aa949bbb7d	initremote --describe-other-params Does not yet include descriptions from external special remote programs.	2020-01-20 16:05:51 -04:00
Joey Hess	3cd3757236	annex.dotfiles The git add behavior changes could be avoided if it turns out to be really annoying, but then it would need to behave the old way when annex.dotfiles=false and the new way when annex.dotfiles=true. I'd rather not have the config option result in such divergent behavior as `git annex add .` skipping a dotfile (old) vs adding to annex (new). Note that the assistant always adds dotfiles to the annex. This is surprising, but not new behavior. Might be worth making it also honor annex.dotfiles, but I wonder if perhaps some user somewhere uses it and keeps large files in a directory that happens to begin with a dot. Since dotfiles and dotdirs are a unix culture thing, and the assistant users may not be part of that culture, it seems best to keep its current behavior for now.	2019-12-26 16:33:39 -04:00
Joey Hess	7d9dff5b05	Merge branch 'master' into bs and update changelog	2019-12-18 15:13:30 -04:00
Joey Hess	7fd5376334	inprogress: Support --key	2019-12-18 14:14:16 -04:00

1 2 3 4 5 ...

416 commits