git-annex

Author	SHA1	Message	Date
Joey Hess	a325524454	--json-exceptions Added a --json-exceptions option, which makes some exceptions be output in json. The distinction is that --json-error-messages is for messages relating to a particular ActionItem, while --json-exceptions is for messages that are not, eg ones for a file that does not exist. It's unfortunate that we need two switches with such a fine distinction between them, but I'm worried about maintaining backwards compatability in the json output, to avoid breaking anything that parses it, and this was the way to make sure I didn't. toplevelWarning is generally used for the latter kind of message. And the other calls to toplevelWarning could be converted to showException. The only possible gotcha is that if toplevelWarning is ever called after starting acting on a file, it will add to the --json-error-messages of the json displayed for that file and converting to showException would be a behavior change. That seems unlikely, but I didn't convery everything to avoid needing to satisfy myself it was not a concern. Sponsored-by: Dartmouth College's Datalad project	2023-04-25 17:05:33 -04:00
Joey Hess	9155ed1072	configremote New command, currently limited to changing autoenable= setting of a special remote. It will probably never be used for more than that given the limitations on it. Sponsored-by: Brock Spratlen on Patreon	2023-04-18 15:30:49 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Joey Hess	2b940f7725	registerurl, unregisterurl: Added --remote option This serves two purposes. --remote=web bypasses other special remotes that claim the url, same as addurl --raw. And, specifying some other remote allows making sure that an url is claimed by the remote you expect, which makes then using setpresentkey not be fragile. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-04-05 15:54:41 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	54ad1b4cfb	Windows: Support long filenames in more (possibly all) of the code Works around this bug in unix-compat: https://github.com/jacobstanley/unix-compat/issues/56 getFileStatus and other FilePath using functions in unix-compat do not do UNC conversion on Windows. Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the necessary conversion on windows to support long filenames. Audited all imports of System.PosixCompat.Files to make sure that no functions that operate on FilePath were imported from it. Instead, use the equvilants from Utility.RawFilePath. In particular the re-export of that module in Common had to be removed, which led to lots of other changes throughout the code. The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig make Utility.Directory not be needed to build setup. And so let it use Utility.RawFilePath, which depends on unix, which cannot be in setup-depends. Sponsored-by: Dartmouth College's Datalad project	2023-03-01 15:55:58 -04:00
Joey Hess	c209e0f643	add FIELD?=GLOB to git-annex view usage And also to vadd usage. Also added some other things to the usage that were omitted before to save space. Adding even FIELD?=GLOB made the git-annex --help list of commands grow too wide for an 80 column display. So, removed the description of parameters from that list of commands. Sponsored-By: Brock Spratlen on Patreon	2023-02-07 18:09:10 -04:00
Joey Hess	a6c1d9752b	move/copy: option parsing for --from with --to Allowing --from and --to as an alternative to --from or --to is hard to do with optparse-applicative! The obvious approach of (pfrom <\|> pto <\|> pfromandto) does not work when pfromandto uses the same option names as pfrom and pto do. It compiles but the generated parser does not work for all desired combinations. Instead, have to parse optionally from and optionally to. When neither is provided, the parser succeeds, but it's a result that can't be handled. So, have to giveup after option parsing. There does not seem to be a way to make an optparse-applicative Parser give up internally either. Also, need seek' because I first tried making fto be a where binding, but that resulted in a hang when git-annex move was run without --from or --to. I think because startConcurrency was not expecting the stages value to contain an exception and so ended up blocking. Sponsored-by: Dartmouth College's DANDI project	2023-01-18 14:42:39 -04:00
Joey Hess	f8bc208e89	findkeys: New command, very similar to git-annex find but operating on keys I've long been asked for `git-annex find --all` or something like that, but pushed back on it because I feel that the command is analagous to find(1) and so it would be surprising for it to list keys rather than files. So instead, add a new findkeys subcommand. Note that the use of withKeyOptions is rather strange because usually that is used to fall back to --all rather than listing files, but here it's made to default to --all like behavior and never list files. A performance thing that could be improved is that withKeyOptions always reads and caches location logs. But findkeys with no options does not need them, so it could be made faster. That caching does speed up options like --in though. This is really just a subset of a more general performance thing that --all reads location logs sometimes unncessarily. Anyway, it needs to read the location log in order to checkDead, and it seems good that findkeys does skip dead keys. Also, cleaned up comments on git-annex-find man page asking for --all option. Sponsored-by: Dartmouth College's DANDI project	2023-01-17 14:51:57 -04:00
Joey Hess	a522a41a42	fix misleading helper function name noworktreeitems was false for NoWorkTreeItems which was hard to understand. Sponsored-by: Dartmouth College's DANDI project	2023-01-17 14:25:38 -04:00
Joey Hess	0b2dd374d8	--anything and --nothing Added --anything (and --nothing). Eg, git-annex find --anything will list all annexed files whether or not the content is present. This is slightly faster and clearer than --include=* or --exclude=* While I can't imagine how --nothing will be used, preferred content expressions already had anything and nothing, so might as well support both as matching options as well. Sponsored-by: Dartmouth College's Datalad project	2022-12-20 15:44:09 -04:00
Joey Hess	731e806c96	use lookupKeyStaged in --batch code paths Make --batch mode handle unstaged annexed files consistently whether the file is unlocked or not. Before this, a unstaged locked file would have the symlink on disk examined and operated on in --batch mode, while an unstaged unlocked file would be skipped. Note that, when not in batch mode, unstaged files are skipped over too. That is actually somewhat new behavior; as late as 7.20191114 a command like `git-annex whereis .` would operate on unstaged locked files and skip over unstaged unlocked files. That changed during optimisation of CmdLine.Seek with apparently little fanfare or notice. Turns out that rmurl still behaved that way when given an unstaged file on the command line. It was changed to use lookupKeyStaged to handle its --batch mode. That also affected its non-batch mode, but since that's just catching up to the change earlier made to most other commands, I have not mentioed that in the changelog. It may be that other uses of lookupKey should also change to lookupKeyStaged. But it may also be that would slow down some things, or lead to unwanted behavior changes, so I've kept the changes minimal for now. An example of a place where the use of lookupKey is better than lookupKeyStaged is in Command.AddUrl, where it looks to see if the file already exists, and adds the url to the file when so. It does not matter there whether the file is staged or not (when it's locked). The use of lookupKey in Command.Unused likewise seems good (and faster). Sponsored-by: Nicholas Golder-Manning on Patreon	2022-10-26 14:43:06 -04:00
Joey Hess	b2ee2496ee	remove whenAnnexed and ifAnnexed In preparation for adding a new variation on lookupKey. Sponsored-by: Max Thoursie on Patreon	2022-10-26 14:06:32 -04:00
Joey Hess	ba7ecbc6a9	avoid flushing keys db queue after each Annex action The flush was only done Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex sqitched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses on onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project	2022-10-12 14:12:23 -04:00
Joey Hess	85dbc21c1c	fix typo	2022-10-07 12:30:07 -04:00
Joey Hess	70d2ece381	improve usage These commands operate on not only remotes, but any way a repository can be specified, including "here" etc. Sponsored-by: Graham Spencer on Patreon	2022-10-03 13:49:42 -04:00
Joey Hess	2478e9e03a	restage: New git-annex command, handles restaging unlocked files This is much easier and less failure-prone than having the user run git update-index --refresh themselves. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 16:29:59 -04:00
Joey Hess	66bd4f80b3	Improved handling of --time-limit when combined with -J When concurrency is enabled, there can be worker threads still running when the time limit is checked. Exiting right there does not give those threads time to finish what they're doing. Instead, the seeking is wrapped up, and git-annex then shuts down cleanly. The whole point of --time-limit existing, rather than using timeout(1) when running git-annex is to let git-annex finish the action(s) it is working on when the time limit is reached, and shut down cleanly. I noticed this problem when investigating why restagePointerFile might not have run after get/drop of an unlocked file. With --time-limit -J, a worker thread may have finished updating a work tree file, and be killed by the time limit check before it can run restagePointerFile. So despite --time-limit running the shutdown actions, the work tree file didn't get restaged. Sponsored-by: Dartmouth College's DANDI project	2022-09-22 12:54:52 -04:00
Joey Hess	eefc026370	fix reversion on skipping dead keys in --all/bare Fix a reversion that made dead keys not be skipped when operating on all keys via --all or in a bare repo. (Introduced in version 8.20200720) Also, improved the documentation of git-annex-dead, it does not only apply to fsck --all. Also, made git-annex fsck, when run on a file whose key is dead, display that. Before, it displayed that only when run with --all, but with this fix, it skips dead keys with --all. But it can still be run on a file that uses a dead key, and displaying "This key is dead" explains to the user why it does not consider missing content for it to be a problem. Sponsored-by: k0ld on Patreon	2022-09-13 14:38:13 -04:00
Joey Hess	8a4cfd4f2d	use getSymbolicLinkStatus not getFileStatus to avoid crash on broken symlink Fix crash importing from a directory special remote that contains a broken symlink. The crash was in listImportableContentsM but some other places in Remote.Directory also seemed like they could have the same problem. Also audited for other places that have such a problem. Not all calls to getFileStatus are bad, in some cases it's better to crash on something unexpected. For example, `git-annex import path` when the path is a broken symlink should crash, the same as when it does not exist. Many of the getFileStatus calls are like that, particularly when they involve .git/annex/objects which should never have a broken symlink in it. Fixed a few other possible cases of the problem. Sponsored-by: Lawrence Brogan on Patreon	2022-09-05 13:46:32 -04:00
Joey Hess	ed39979ac8	import: Avoid following symbolic links inside directories being imported Too big a footgun. This does not prevent attackers who can write to the directory being imported from racing the check. But they can cause anything to be imported anyway, so would be limited to getting the legacy import to follow into a directory they do not write to, and move files out of it into the annex. (The directory special remote does not have that problem since it does not move files.) Sponsored-by: Jack Hill on Patreon	2022-08-19 13:31:16 -04:00
Joey Hess	3a513cfe73	add --dry-run: New option This is intended for users who want to see what it would output in order to eg, check if a file would be added to git or the annex. It is not intended as a way for scripts to get information. Sponsored-by: Dartmouth College's Datalad project	2022-08-03 11:16:04 -04:00
Joey Hess	be19a68276	new matching options --want-get-by and --want-drop-by Sponsored-by: Graham Spencer on Patreon	2022-07-28 13:26:03 -04:00
Joey Hess	b223988e22	remove --backend from global options --backend is no longer a global option, and is only accepted by commands that actually need it. Three commands that used to support backend but don't any longer are watch, webapp, and assistant. It would be possible to make them support it, but I doubt anyone used the option with these. And in the case of webapp and assistant, the option was handled inconsistently, only taking affect when the command is run with an existing git-annex repo, not when it creates a new one. Also, renamed GlobalOption etc to AnnexOption. Because there are many options of this type that are not actually global (any more) and get added to commands that need them. Sponsored-by: Kevin Mueller on Patreon	2022-06-29 13:33:25 -04:00
Joey Hess	8040ecf9b8	final readonly values moves to AnnexRead At this point I've checked all AnnexState values and these were all that remained that could move. Pity that Annex.repo can't move, but it gets modified sometimes.. A couple of AnnexState values are set by options and could be AnnexRead, but happen to use Annex when being set. Sponsored-by: Max Thoursie on Patreon	2022-06-28 16:04:58 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	d266a41f8d	prevent numcopies or mincopies being configured to 0 Ignore annex.numcopies set to 0 in gitattributes or git config, or by git-annex numcopies or by --numcopies, since that configuration would make git-annex easily lose data. Same for mincopies. This is a continuation of the work to make data only be able to be lost when --force is used. It earlier led to the --trust option being disabled, and similar reasoning applies here. Most numcopies configs had docs that strongly discouraged setting it to 0 anyway. And I can't imagine a use case for setting to 0. Not that there might not be one, but it's just so far from the intended use case of git-annex, of managing and storing your data, that it does not seem like it makes sense to cater to such a hypothetical use case, where any git-annex drop can lose your data at any time. Using a smart constructor makes sure every place avoids 0. Note that this does mean that NumCopies is for the configured desired values, and not the actual existing number of copies, which of course can be 0. The name configuredNumCopies is used to make that clear. Sponsored-by: Brock Spratlen on Patreon	2022-03-28 15:20:34 -04:00
Joey Hess	ce91f10132	fix annex.skipunknown false error propagation Propagate nonzero exit status from git ls-files when a specified file does not exist, or a specified directory does not contain any files checked into git. The recent completion of the annex.skipunknown transition exposed this bug, that has unfortunately been lurking all along. It is also possible that git ls-files errors out for some other reason -- perhaps a permission problem -- and this will also fix error propagation in such situations. Sponsored-by: Dartmouth College's Datalad project	2022-02-28 12:54:56 -04:00
Joey Hess	5b373a9dd2	read a consistent amount from pointer file A few places were reading the max symlink size of a pointer file, then passing tp parseLinkTargetOrPointer. Which is fine currently, but to support pointer files with lines of data after the pointer, enough has to be read that parseLinkTargetOrPointer can be assured of seeing enough of that data to know if it's correctly formatted. Sponsored-by: Dartmouth College's Datalad project	2022-02-23 12:52:34 -04:00
Joey Hess	835c50966a	reject batch options combined with non-batch options Reject combinations of --batch (or --batch-keys) with options like --all or --key or with filenames. Most commands ignored the non-batch items when batch mode was enabled. For some reason, addurl and dropkey both processed first the specified non-batch items, followed by entering batch mode. Changed them to also error out, for consistency. Sponsored-by: Dartmouth College's Datalad project	2022-01-26 13:00:19 -04:00
Joey Hess	7f6b2ca49c	handle overBranchFileContents with read-only unmerged git-annex branches This makes --all error out in that situation. Which is better than ignoring information from the branches. To really handle the branches right, overBranchFileContents would need to both query all the branches and union merge file contents (or perhaps not provide any file content), as well as diffing between branches to find files that are only present in the unmerged branches. And also, it would need to handle transitions.. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:30:51 -04:00
Joey Hess	d9d0fe5fa4	disable precaching git-annex branch when there are unmerged branches in a read-only repo The way precaching works, it can't merge in information from those branches efficiently, so just disable it and fall back to Annex.Branch.get in order to get the correct information. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:08:50 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	afe5d9e137	fix duplicated commands in git-annex-shell usage display	2021-10-11 15:42:27 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	ab7b5a492c	--batch-keys New --batch-keys option added to these commands: get, drop, move, copy, whereis git-annex-matching-options had to be reworded since some of its options can be used to match on keys, not only files. Sponsored-by: Luke Shumaker on Patreon	2021-08-25 14:21:12 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	47d3dccf19	whereused implemented except --historical Sponsored-by: Jack Hill on Patreon	2021-07-14 14:27:21 -04:00
Joey Hess	8a13bbedd6	--size-limit exit 101 Sponsored-by: Mark Reidenbach on Patreon	2021-06-04 16:43:47 -04:00
Joey Hess	771a122c9e	add --size-limit option When this option is not used, there should be effectively no added overhead, thanks to the optimisation in `b3cd0cc6ba`. When an action fails on a file, the size of the file still counts toward the size limit. This was necessary to support concurrency, but also generally seems like the right choice. Most commands that operate on annexed files support the option. export and import do not, and I don't know if it would make sense for export to.. Why would you want an incomplete export? sync doesn't, and while it would be easy to make it support it for transferring files, it's not clear if dropping files should also take the size limit into account. Commands like add that don't operate on annexed files don't support the option either. Exiting 101 not yet implemented. Sponsored-by: Denis Dzyubenko on Patreon	2021-06-04 16:16:53 -04:00
Joey Hess	b3cd0cc6ba	minor optimisation Avoid a second mvar access. Sponsored-by: Jochen Bartl on Patreon	2021-06-04 14:57:19 -04:00
Joey Hess	b5f5475ed6	New matching options --excludesamecontent and --includesamecontent The normalisation of filenames turns out to be the tricky part here, because the associated files coming out of the keys db may look like "./foo/bar" or "../bar". For the former to match a glob like "foo/", it needs to be normalised. Note that, on windows, normalise "./foo/bar" = "foo\\bar" which a glob like "foo/" won't match. So the glob is matched a second time, on the toInternalGitPath, so allowing the user to provide a glob with the slashes in either direction. However, this still won't support some wacky edge cases like the user providing a glob of "foo/bar\\*" Sponsored-by: Dartmouth College's Datalad project	2021-05-25 13:08:18 -04:00
Joey Hess	a58c90ccf4	skeleton of filter-branch command, with option parser	2021-05-14 10:59:48 -04:00
Joey Hess	0e830b6bb5	make remoteKeyToRemoteName safer If it's passed a ConfigKey such as annex.version, avoid returning an empty remote name and return Nothing instead. Also, foo.bar.baz is not treated as a remote named "bar".	2021-04-23 13:29:21 -04:00
Joey Hess	653b719472	fix --all to include not yet committed files from the journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. This does mean that --all can end up processing the same key more than once, but before the optimisations that introduced this bug, it used to also behave that way. So I didn't try to fix that; it's an edge case and anyway git-annex behaves well when run on the same key repeatedly. I am not too happy with the use of a MVar to buffer the list of files in the journal. I guess it doesn't defeat lazy streaming of the list, if that list is actually generated lazily, and anyway the size of the journal is normally capped and small, so if configs are changed to make it huge and this code path fire, git-annex using enough memory to buffer it all is not a large problem.	2021-04-21 15:40:32 -04:00
Joey Hess	74acf17a31	refactoring	2021-04-21 14:29:02 -04:00
Joey Hess	6eb3c0a6b4	fix branch precacheing bug by checking journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. When not using --all, precaching only gets triggered when the command actually needs location logs, and so there's no speed hit there. This is a minor speed hit for --all, because it precaches even when the location log is not actually going to be used, and so checking the journal is not necessary. It would have been possible to defer checking the journal until the cache gets used. But that would complicate the usual Branch.get code path with two different kinds of caches, and the speed hit is really minimal. A better way to speed up --all, later, would be to avoid precaching at all when the location log is not going to be used.	2021-04-21 14:02:15 -04:00
Joey Hess	2e9d4ac754	fix fastDebug to check if debugging is actually enabled Had to add to AnnexRead an indication of whether debugging is enabled. Could have just made setupConsole not install a debug output action that outputs, and have enableDebug be what installs that, but then in the common case where there is no debug selector, and so all debug output is selected, it would run the debug output action every time, which entails an IORef access. Which would make fastDebug too slow..	2021-04-06 16:28:37 -04:00
Joey Hess	d16d739ce2	implement fastDebug Most of the changes here involve global option parsing: GlobalSetter changed so it can both run an Annex action to set state, but can also change the AnnexRead value, which is immutable once the Annex monad is running. That allowed a debugselector value to be added to AnnexRead, seeded from the git config. The --debugfilter option's GlobalSetter then updates the AnnexRead. This improved GlobalSetter can later be used to move more stuff to AnnexRead. Things that don't involve a git config will be easier to move, and probably a lot of things can be moved eventually. fastDebug, while implemented, is not used anywhere yet. But it should be fast..	2021-04-06 15:24:28 -04:00
Joey Hess	1b645e1ace	added --debugfilter (and annex.debugfilter)	2021-04-05 15:31:10 -04:00
Joey Hess	c2f612292a	start splitting out readonly values from AnnexState Values in AnnexRead can be read more efficiently, without MVar overhead. Only a few things have been moved into there, and the performance increase so far is not likely to be noticable. This is groundwork for putting more stuff in there, particularly a value that indicates if debugging is enabled. The obvious next step is to change option parsing to not run in the Annex monad to set values in AnnexState, and instead return a pure value that gets stored in AnnexRead.	2021-04-02 15:51:44 -04:00
Joey Hess	a8b837aaef	add git ls-tree --long parser Not yet used, but allows getting the size of items in the tree fairly cheaply. I noticed that CmdLine.Seek uses ls-tree and the feeds the files into another long-running process to check their size. That would be an example of a place that might be sped up by using this. Although in that particular case, it only needs to know the size of unlocked files, not locked. And since enabling --long probably doubles the ls-tree runtime or more, the overhead of using it there may outwweigh the benefit.	2021-03-23 12:47:00 -04:00
Joey Hess	5545e78a1e	Make --debug also enable debugging in child git-annex processes Especially necessary with stalldetection using child processes for transfers. This commit was sponsored by Jack Hill on Patreon.	2021-03-22 14:25:28 -04:00
Joey Hess	a14001785e	fix --branch combined with --unlocked or --locked Since it's using git ls-tree anyway, can just look at the file modes to see if they're unlocked or are symlinks.	2021-03-02 13:47:27 -04:00
Joey Hess	cbf94fd13d	prep for fixing find --branch --unlocked Added LinkType to ProvidedInfo, and unified MatchingKey with ProvidedInfo. They're both used in the same way, so there was no real reason to keep separate. Note that addLocked and addUnlocked still set matchNeedsFileName, because to handle MatchingFile, they do need it. However, they don't use it when MatchingInfo is provided. This should be ok, the --branch case will be able skip checking matchNeedsFileName, since it will provide a filename in any case.	2021-03-02 13:39:31 -04:00
Joey Hess	ee4fd38ecf	remove unused contentFile = Nothing	2021-03-01 16:35:38 -04:00
Joey Hess	25e4ab7e81	Prevent combinations of options such as --all with --include Previously such nonsensical combinations always treated the matching option as if it didn't match. For now, made find --branch refuse matching options that need a filename, because one is not provided to them in a way they'll use. There's an open bug report to support it, but making it error out is better than the old behavior of not finding what it was asked to. Also, made --mimetype combined with eg --all work, by looking at the object file when operating on keys.	2021-03-01 16:25:23 -04:00
Joey Hess	eb594c710e	unregisterurl: New command Implemented by generalizing registerurl. Without the implicit batch mode of registerurl since that is only a backwards compatability thing (see commit `1d1054faa6`).	2021-03-01 14:28:24 -04:00
Joey Hess	dd39e9e255	suggest when user may want annex.stalldetection When annex.stalldetection is not enabled, and a likely stall is detected, display a suggestion to enable it. Note that the progress meter display is not taken down when displaying the message, so it will display like this: 0% 8 B 0 B/s Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection 0% 10 B 0 B/s Although of course if it's really stalled, it will never update again after the message. Taking down the progress meter and starting a new one doesn't seem too necessary given how unusual this is, also this does help show the state it was at when it stalled. Use of uninterruptibleCancel here is ok, the thread it's canceling only does STM transactions and sleeps. The annex thread that gets forked off is separate to avoid it being canceled, so that it can be joined back at the end. A module cycle required moving from dupState the precaching of the remote list. Doing it at startConcurrency should cover all the cases where the remote list is used in concurrent actions. This commit was sponsored by Kevin Mueller on Patreon.	2021-02-03 15:57:19 -04:00
Joey Hess	7db4e62a90	remove accidental duplicated code The code in Annex.WorkerStage and Annex.Concurrent was 100% identical.	2021-02-03 15:23:52 -04:00
Joey Hess	aec2cf0abe	addon commands Seems only fair, that, like git runs git-annex, git-annex runs git-annex-foo. Implementation relies on O.forwardOptions, so that any options are passed through to the addon program. Note that this includes options before the subcommand, eg: git-annex -cx=y foo Unfortunately, git-annex eats the --help/-h options. This is because it uses O.hsubparser, which injects that option into each subcommand. Seems like this should be possible to avoid somehow, to let commands display their own --help, instead of the dummy one git-annex displays. The two step searching mirrors how git works, it makes finding git-annex-foo fast when "git annex foo" is run, but will also support fuzzy matching, once findAllAddonCommands gets implemented. This commit was sponsored by Dr. Land Raider on Patreon.	2021-02-02 16:32:49 -04:00
Joey Hess	e78d2c9642	move global options handling closer to Command definitions	2021-02-02 15:55:45 -04:00
Joey Hess	c8b1fa67b4	Behavior change: --trust-glacier option no longer overrides trust Since that can lead to data loss, which should never be enabled by an option other than --force. This commit was sponsored by Jake Vosloo on Patreon.	2021-01-07 10:37:43 -04:00
Joey Hess	2bf34fc17f	Behavior change: --trust option no longer overrides trust Since that can lead to data loss, which should never be enabled by an option other than --force. I suppose that using --trust was in some situation, safer than --force, because it doesn't entirely disable checking for data loss, but only disables checking involving data that is on the specified repository. But it seems better to be able to say that data loss only happens with --force. This commit was sponsored by Graham Spencer on Patreon.	2021-01-07 10:34:57 -04:00
Joey Hess	cc89699457	mincopies This is conceptually very simple, just making a 1 that was hard coded be exposed as a config option. The hard part was plumbing all that, and dealing with complexities like reading it from git attributes at the same time that numcopies is read. Behavior change: When numcopies is set to 0, git-annex used to drop content without requiring any copies. Now to get that (highly unsafe) behavior, mincopies also needs to be set to 0. It seemed better to remove that edge case, than complicate mincopies by ignoring it when numcopies is 0. This commit was sponsored by Denis Dzyubenko on Patreon.	2021-01-06 14:15:19 -04:00
Joey Hess	a3a19518d8	fix --time-limit It got broken in several ways by the streaming seeking optimisations around version 8.20201007. Moved time limit checking out of the matcher, which was a hack in the first place. So everywhere that uses Limit.getMatcher needs to check time limit. Well, almost everywhere. Command.Info uses it, but it does not make sense to time limit getting info. And Command.MultiCast uses it just to build up a list of files that then get passed to a command, so it would never have hit the timeout in a useful way. This implementation is a little more expensive when at time limit than necessary, since it continues seeking only to discard everything after the time limit. I did try making it close the file handles to force a faster shutdown, but that didn't work and hung. Could certianly be improved somehow, but seeking is probably not the expensive bit when a time limit is hit, so this seems acceptable for now.	2021-01-04 15:57:11 -04:00
Joey Hess	53fd1564b1	improve synopsis	2020-12-17 12:51:49 -04:00
Joey Hess	5e094d02d6	avoid using MatchingKey where MatchingFile can be used now This is actually matching worktree files, and now that a Key can be provided along with the file when doing that, using MatchingFile reflects that.	2020-12-14 17:54:25 -04:00
Joey Hess	01527b21d8	add key to FileInfo MatchingKey is not the thing to use when matching on actual worktreee files. Fix reversion in 8.20201116 that made include= and exclude= in preferred/required content expressions match a path relative to the current directory, rather than the path from the top of the repository.	2020-12-14 17:42:02 -04:00
Joey Hess	677003a6df	rename helper More consistent name with TransferrerPool	2020-12-09 13:24:24 -04:00
Joey Hess	05c0543e8e	move new interface to git-annex transfer This is to avoid breakage when upgrading or downgrading git-annex with a process running that uses the interface. It's better to keep the compatability code for a few years than worry about such breakage. This commit was sponsored by Brett Eisenberg on Patreon.	2020-12-09 12:33:56 -04:00
Joey Hess	5a1e73617d	finished this stage of the RawFilePath conversion Finally compiles again, and test suite passes. This commit was sponsored by Brock Spratlen on Patreon.	2020-11-04 14:20:37 -04:00
Joey Hess	4bcb4030a5	more RawFilePath conversion 580/645 This commit was sponsored by Jack Hill on Patreon.	2020-11-03 18:34:27 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	7036d0a4c1	add, import: Fix a reversion in 7.20191009 that broke handling of --largerthan and --smallerthan This commit was sponsored by Jochen Bartl on Patreon.	2020-10-19 15:36:18 -04:00
Joey Hess	72644d919a	fix build warning	2020-10-19 14:48:21 -04:00
Joey Hess	9a5cd96f0d	Fix a memory leak introduced in the last release The problem was this line: cleanup = and <$> sequence (map snd v) That caused all of v to be held onto until the end, when the cleanup action was run. I could not seem to find a bang pattern that avoided the leak, so I resorted to a IORef, rather clunky, but not a performance problem because it will only be written once per git ls-files, so typically just 1 time. This commit was sponsored by Mark Reidenbach on Patreon.	2020-10-13 16:31:01 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	f624876dc2	remove zombie process in file seeking This was the last one marked as a zombie. There might be others I don't know about, but except for in the hypothetical case of a thread dying due to an async exception before it can wait on a process it started, I don't know of any. It would probably be safe to remove the reapZombies now, but let's wait and so that in its own commit in case it turns out to cause problems. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-09-25 11:38:42 -04:00
Joey Hess	ace02f41b0	seek: defer matcher check until more info is known Sped up seeking for files to operate on, when using options like --copies or --in, by around 20%. Benchmark showed an increase for --copies from 155 seconds to 121 seconds, and --in remote will be similar to that. For --in here, the speedup was less, 5-10% or so. (both warm cache) This commit was sponsored by Jack Hill on Patreon.	2020-09-24 17:59:12 -04:00
Joey Hess	3457b526ef	make git-annex add --no-check-gitignore not skip ignored files, same as with --force	2020-09-18 13:33:35 -04:00
Joey Hess	b6642dde8a	avoid interleaving command stages with Concurrency 1 Before, -J1 was different than no -J: It makes concurrent-output be used for display, and it actually can run concurrent jobs in some situations. Eg, a perform stage and a cleanup stage can both run. dupState is used in several places, and changes NonConcurrent to Concurrent 1. My concern is that this might, in some case, enable that concurrent behavior. And in particular, that it might get enabled in --batch mode, when the user is not expecting concurrent output because they did not pass -J. While I don't have a test case where that happens and causes out of order output, it looks like it could, and so prudent to make this change.	2020-09-16 12:10:45 -04:00
Joey Hess	877ef84a1b	support --batch -J --batch combined with -J now runs batch requests concurrently for many commands. Before, the combination was accepted, but did not enable concurrency. Since the output of batch requests can be in any order, --json with the new "input" field is recommended to be used, to determine which batch request each response corresponds to. If --json is not used, batch mode still runs concurrently, using the usual concurrent-output. That will not be very useful for most batch mode users, probably, but who knows. If a program was using --batch -J before, and was parsing non-json output, this could break it. But, it was relying on git-annex not supporting concurrency despite it being enabled, so it should have expected concurrent output. So, I think that's ok. annex.jobs does not enable concurrency in --batch mode, because that would confuse programs that use --batch but don't expect concurrency.	2020-09-16 12:10:37 -04:00
Joey Hess	77c42782d0	differentiate between concurrency enabled at command line and by git config The latter should not affect --batch mode.	2020-09-16 11:47:12 -04:00
Joey Hess	7a4f3ff345	remove callCommandAction' This is prep for making batchCommandAction use commandAction, which will enable concurrency for batch mode. Since commandAction can't return anything, have to handle the case of a CommandStart that chooses to do nothing in a different way.	2020-09-16 10:53:16 -04:00
Joey Hess	7b4c7a7ffc	refactor batchStart folded into only caller	2020-09-16 10:18:36 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	4c58433c48	avoid using MonadFail in ParseDuration There's no instance for Either String, so that makes it not as useful as it could be, so instead just return an Either String.	2020-08-15 15:53:35 -04:00
Joey Hess	506ffea5e6	stop symlink check once the top of the working tree is reached Avoid complaining that a file with "is beyond a symbolic link" when the filepath is absolute and the symlink in question is not actually inside the git repository. This assumes that inodes remain stable while the command is running. I think they always will, the filesystems where they are unstable change them across mounts. (If inodes were not stable, it would just complain about symlinks in the path that are not inside the working tree.) (On windows, I don't want to assume anything about inodes, they could be random numbers for all I know. But if they were, this would still be ok, as long as windows doesn't have symlinks that are detected by isSymbolicLink. Which seems a fair bet.)	2020-08-06 20:14:30 -04:00
Joey Hess	5d380c6c5c	when workTreeItems finds a problem with a parameter, don't go on to process it Part of workTreeItems is trying detect a case where git porcelain refuses to process a file, and where git ls-files silently outputs nothing. But, it's hard to perfectly replicate git's behavior, and besides, git's behavior could change. So it could be that we warn, but then git ls-files does not skip over it, and so git-annex also processes it after warning about it. So, if we think we have a problem with a parameter, display the warning, and skip processing it at all. Implementing this was complicated by needing to handle the case where all command-line parameters get filtered out this way. Which is different than the case where there are none, because we don't want to operate on all files in this new case..	2020-08-06 13:47:45 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	00865cdae8	Fix a bug in find --branch in the previous version inAnnex check was lost for that code path. To avoid more such mistakes, made withKeyOptions check it when the AnnexedFileSeeker specifies.	2020-07-24 12:05:28 -04:00
Joey Hess	1be92381ec	unify batch mode with non-batch by using AnnexedFileSeeker	2020-07-22 14:23:28 -04:00
Joey Hess	889603336a	fix reversion in skipping deleted files And add a test case for that. This certianly loses some of the 2x performance improvement in file seeking that seekFilteredKeys led to, because now it has to stat the worktree files again. Without benchmarking, I expect there will still be a sizable improvement, and also the git-annex branch precaching that seekFilteredKeys can do will still be a win of its approach. Also worth noting that lookupKey, when the file DNE, check if it's in an adjusted branch with hidden files, and if so, finds the key for the file anyway. That was intended to make git-annex sync --content be able to process those files, but a side effect was that, when a file was deleted but the deletion not yet staged, git-annex commands used to still list it. That was actually a bug. This commit fixes that bug too. (git-annex sync --content on such a branch does not use seekFilteredKeys so was not affected by the reversion or by this behavior change) This commit was sponsored by Jake Vosloo on Patreon.	2020-07-19 21:25:01 -04:00
Joey Hess	5dbb2924bb	remove unused function	2020-07-19 20:16:24 -04:00
Joey Hess	535cdc8d48	importfeed: Made checking known urls step around 10% faster. This was a bit disappointing, I was hoping for a 2x speedup. But, I think the metadata lookup is wasting a lot of time and also needs to be made to stream. The changes to catObjectStreamLsTree were benchmarked to not also speed up --all around 3% more. Seems I managed to make it polymorphic after all.	2020-07-14 12:47:51 -04:00
Joey Hess	75aab72d23	mostly done with location log precaching Some nice wins.	2020-07-13 17:04:02 -04:00
Joey Hess	a290792a4f	speed up seeking pointer files This solves the same problem as commit `b4d0f6dfc2` but in a better way, that should make processing pointer files maximally fast. If there is a mixture of pointer files and symlinks, the first symlinks until the pointer file are handled maximally fast, while the ones after that go via the slightly slower path.	2020-07-13 14:25:07 -04:00
Joey Hess	918b1faa3d	avoid hanging on exception	2020-07-13 12:36:15 -04:00

1 2 3 4 5 ...

444 commits