git-annex

Author	SHA1	Message	Date
Joey Hess	8b74f01a26	split ProvidedInfo and UserProvidedInfo The latter is for git-annex matchexpression and matching against it can throw an exception. Splitting out the former reduces the potential for mistakes and avoids needing to worry about matching against that throwing an exception. This is more groundwork for matching largefiles while importing, without downloading content. This commit was sponsored by Graham Spencer on Patreon.	2020-09-28 12:12:38 -04:00
Joey Hess	00dbe35fbc	allow matching on files whose content is not present Anything that needs to examine the file content will fail to match, or fall back to other available information. But the intent is that the matcher be checked for matchNeedsFileContent and only be used if it does not, so the exact behavior doesn't much matter as it should never happen. The real point of this is to not need to provide a dummy content file when matching. This commit was sponsored by Martin D on Patreon.	2020-09-28 11:17:46 -04:00
Joey Hess	ace02f41b0	seek: defer matcher check until more info is known Sped up seeking for files to operate on, when using options like --copies or --in, by around 20%. Benchmark showed an increase for --copies from 155 seconds to 121 seconds, and --in remote will be similar to that. For --in here, the speedup was less, 5-10% or so. (both warm cache) This commit was sponsored by Jack Hill on Patreon.	2020-09-24 17:59:12 -04:00
Joey Hess	d89984b121	sync --all avoid unncessary first pass Sped up seeking to around twice as fast, by avoiding a pass over the worktree files when preferred content expressions of the local repo and remotes don't use include=/exclude=. Thanks to Lukey for identifying the optimisation. This commit was sponsored by Brock Spratlen on Patreon.	2020-09-24 15:12:09 -04:00
Joey Hess	c1b4d76e6b	make MatchFiles introspectable matchNeedsFileContent is not used yet, but shows how to add information about terminals. That one would be needed for https://git-annex.branchable.com/todo/sync_fast_import/ Note the tricky bit in Annex.FileMatcher.call where it folds over the included matcher to propagate the information. This commit was sponsored by Svenne Krap on Patreon.	2020-09-24 14:01:53 -04:00
Joey Hess	83df401d93	Merge branch 'batchasync' into master	2020-09-16 13:02:58 -04:00
Joey Hess	77c42782d0	differentiate between concurrency enabled at command line and by git config The latter should not affect --batch mode.	2020-09-16 11:47:12 -04:00
Joey Hess	8471df3b6d	rename Configurable for clarity	2020-09-16 11:16:48 -04:00
Joey Hess	fcf5d11c63	add "input" field to json output The use case of this field is mostly to support -J combined with --json. When that is implemented, a user will be able to look at the field to determine which of the requests they have sent it corresponds to. The field typically has a single value in its list, but in some cases mutliple values (eg 2 command-line params) are combined together and the list will have more. Note that json parsing was already non-strict, so old git-annex metadata --json --batch can be fed json produced by the new git-annex and will not stumble over the new field.	2020-09-15 16:22:44 -04:00
Joey Hess	3a05d53761	add SeekInput (not yet used) No behavior changes (hopefully), just adding SeekInput and plumbing it through to the JSON display code for later use. Over the course of 2 grueling days. withFilesNotInGit reimplemented in terms of seekHelper should be the only possible behavior change. It seems to test as behaving the same. Note that seekHelper dummies up the SeekInput in the case where segmentPaths' gives up on sorting the expanded paths because there are too many input paths. When SeekInput later gets exposed as a json field, that will result in it being a little bit wrong in the case where 100 or more paths are passed to a git-annex command. I think this is a subtle enough problem to not matter. If it does turn out to be a problem, fixing it would require splitting up the input parameters into groups of < 100, which would make git ls-files run perhaps more than is necessary. May want to revisit this, because that fix seems fairly low-impact.	2020-09-15 15:41:13 -04:00
Joey Hess	ddf963d019	deepseq all things returned from ResourceT http Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/ although I don't know if it does. My thinking is, ResourceT may allocate a resource and then free it, and a unforced thunk to that resource could result in reading memory that has since been overwritten by something else, or in a SEGV, depending. While that seems kind of like a bug in ResourceT to me, if it is what's happening, this will avoid it. If it's not, this doesn't really hurt much since the values are all smallish. This commit was sponsored by Graham Spencer on Patreon.	2020-09-14 18:30:06 -04:00
Joey Hess	d120c73302	sync, assistant: When merge.directoryRenames is not set, default it it to "false" Works better with automatic merge conflict resolution than git's ususual default of "conflict". This is not done when automatic merge conflict resolution is disabled. This commit was sponsored by Mark Reidenbach on Patreon.	2020-09-07 13:50:58 -04:00
Joey Hess	e36bae74da	Exposed annex.forward-retry git config One reason is, 5 is an arbitrary number so ought to be configurable. The real reason though, is I wanted to make the man page explain when forward retry can override annex.retry, and having a config made the man page easier to write.	2020-09-04 15:16:40 -04:00
Joey Hess	a47787934a	remove Show Cipher committed by accident, and could have been a security hole, but since this compiles, it was not	2020-09-01 18:11:22 -04:00
Joey Hess	4c58433c48	avoid using MonadFail in ParseDuration There's no instance for Either String, so that makes it not as useful as it could be, so instead just return an Either String.	2020-08-15 15:53:35 -04:00
Joey Hess	5a5873e052	fix bug caught by test suite	2020-07-31 16:11:50 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	c4cc2cdf4c	rename getKey to genKey for consistency with external backend protocol	2020-07-20 14:06:05 -04:00
Joey Hess	172743728e	move cryptographicallySecure into Backend type This is groundwork for external backends, but also makes sense to keep this information with the rest of a Backend's implementation. Also, removed isVerifiable. I noticed that the same information is encoded by whether a Backend implements verifyKeyContent or not.	2020-07-20 12:17:42 -04:00
Joey Hess	9483b10469	cache one more log file for metadata My worry was that a preferred content expression that matches on metadata would have removed the location log from cache, causing an expensive re-read when a Seek action later checked the location log. Especially when the --all optimisation in the previous commit pre-cached the location log. This also means that the --all optimisation could cache the metadata log too, if it wanted too, but not currently done. The cache is a list, with the most recently accessed file first. That optimises it for the common case of reading the same file twice, eg a get, examine, followed by set reads it twice. And sync --content reads the location log 3 times in a row commonly. But, as a list, it should not be made to be too long. I thought about expanding it to 5 items, but that seemed unlikely to be a win commonly enough to outweigh the extra time spent checking the cache. Clearly there could be some further benchmarking and tuning here.	2020-07-07 14:18:55 -04:00
Joey Hess	e72ec8b9b2	add back git-annex branch read cache The cache was removed way back in 2012, commit `3417c55189` Then I forgot I had removed it! I remember clearly multiple times when I thought, "this reads the same data twice, but the cache will avoid that being very expensive". The reason it was removed was it messed up the assistant noticing when other processes made changes. That same kind of problem has recently been addressed when adding the optimisation to avoid reading the journal unnecessarily. Indeed, enableInteractiveJournalAccess is run in just the right places, so can just piggyback on it to know when it's not safe to use the cache.	2020-07-06 12:22:33 -04:00
Joey Hess	57cceac569	simplify interface by removing size Add size to the returned key after the fact, unless the remote happened to add it itself.	2020-07-03 14:22:22 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory I need to check the rest and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	b651d3ede0	test: Fix some test cases that assumed git's default branch name git is making that configurable, and configuring it globally would break the test suite in a few places. No other part of git-annex assumes any branch name. Renamed a few placeholders to make that clearer. This commit was sponsored by Jake Vosloo on Patreon.	2020-06-23 16:40:51 -04:00
Joey Hess	a1d4c8e4ec	external: SETCREDS include creds in externalConfigChanges This makes the creds get saved, since only things recorded there will be saved. IIRC, unparsedRemoteConfig was not originally available when I implemented this; now that it is things get a bit simpler. More could probably be simplified, is externalConfigChanges needed at all? This does not entirely fix the bugs though, because creds are only embedded when embedcreds=yes, but not when encryption=pubkey is used without embedcreds=yes.	2020-06-16 17:24:24 -04:00
Joey Hess	89b2542d3c	annex.skipunknown with transition plan Added annex.skipunknown git config, that can be set to false to change the behavior of commands like `git annex get foo*`, to not skip over files/dirs that are not checked into git and are explicitly listed in the command line. Significant complexity was needed to handle git-annex add, which uses some git ls-files calls, but needs to not use --error-unmatch because of course the files are not known to git. annex.skipunknown is planned to change to default to false in a git-annex release in early 2022. There's a todo for that.	2020-05-28 15:55:17 -04:00
Joey Hess	e04a931439	improve transfer stages for some commands move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup actions in separate job pool from uploads. transferStages was confusingly named, it's only useful when doing downloads as then the verify actions can be run concurrently with other downloads. For commands that upload, there will be more concurrency from running cleanup actions in a separate job pool. As for sync, I left it using downloadStages although that's not optimal for the part of a sync that uploads. Perhaps it should use the union of both?	2020-05-26 11:55:50 -04:00
Joey Hess	2de40ba4da	improve docs around uri claiming	2020-05-21 11:58:57 -04:00
Joey Hess	6361074174	convert renameExport to throw exception Finishes the transition to make remote methods throw exceptions, rather than silently hide them. A bit on the fence about this one, because when renameExport fails, it falls back to deleting instead, and so does the user care why it failed? However, it did let me clean up several places in the code. This commit was sponsored by Ethan Aubin.	2020-05-15 15:08:09 -04:00
Joey Hess	037440ef36	convert removeExportDirectory to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 14:43:18 -04:00
Joey Hess	cdbfaae706	change removeExport to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Graham Spencer on Patreon.	2020-05-15 14:15:14 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	4814b444dd	make storeExport throw exceptions	2020-05-15 12:20:02 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	b50ee9cd0c	remove Preparer abstraction That had almost no benefit at all, and complicated things quite a lot. What I proably wanted this to be was something like ResourceT, but it was not. The few remotes that actually need some preparation done only once and reused used a MVar and not Preparer.	2020-05-13 11:56:21 -04:00
Joey Hess	5f5170b22b	remove SafeFilePath Move sanitizeFilePath call to where fromSafeFilePath had been.	2020-05-11 14:04:56 -04:00
Thomas Koch	8a0480daf3	Fix haddock parse error I run haddock with `cabal haddock --executables`. It fails with: Types/Remote.hs:271:17: error: parse error on input ‘->’ Apparently haddock does not like to find haddock blocks outside of declarations? In any case, this patch makes these type of errors go away. Afterwards, I see errors like these, that need to be investigated as a next step: haddock: internal error: internal: extractDecl CallStack (from HasCallStack): error, called at utils/haddock/haddock-api/src/Haddock/Interface/Create.hs:1116:12 in main:Haddock.Interface.Create	2020-05-11 08:40:13 +02:00
Joey Hess	6952060665	addurl --preserve-filename and a few related changes * addurl --preserve-filename: New option, uses server-provided filename without any sanitization, but with some security checking. Not yet implemented for remotes other than the web. * addurl, importfeed: Avoid adding filenames with leading '.', instead it will be replaced with '_'. This might be considered a security fix, but a CVE seems unwattanted. It was possible for addurl to create a dotfile, which could change behavior of some program. It was also possible for a web server to say the file name was ".git" or "foo/.git". That would not overrwrite the .git directory, but would cause addurl to fail; of course git won't add "foo/.git". sanitizeFilePath is too opinionated to remain in Utility, so moved it. The changes to mkSafeFilePath are because it used sanitizeFilePath. In particular: isDrive will never succeed, because "c:" gets munged to "c_" ".." gets sanitized now ".git" gets sanitized now It will never be null, because sanitizeFilePath keeps the length the same, and splitDirectories never returns a null path. Also, on the off chance a web server suggests a filename of "", ignore that, rather than trying to save to such a filename, which would fail in some way.	2020-05-08 16:22:55 -04:00
Joey Hess	cee6b344b4	cat-file resource pool Avoid running a large number of git cat-file child processes when run with a large -J value. This implementation takes care to avoid adding any overhead to git-annex when run without -J. When run with -J, there is a small bit of added overhead, to manipulate the resource pool. That optimisation added a fair bit of complexity.	2020-04-20 15:19:31 -04:00
Joey Hess	9cb69dbb76	support boolean git configs that are represented by the name of the setting with no value Eg"core.bare" is the same as "core.bare = true". Note that git treats "core.bare =" the same as "core.bare = false", so the code had to become more complicated in order to treat the absense of a value differently than an empty value. Ugh.	2020-04-13 13:35:22 -04:00
Joey Hess	2caf579718	cache annex index filename for 1.5% speedup to queries	2020-04-10 13:37:04 -04:00
Joey Hess	aeca7c2207	Sped up query commands that read the git-annex branch by around 5% The only price paid is one additional MVar read per write to the journal. Presumably writing a journal file dominiates over a MVar read time by several orders of magnitude. --batch does not get the speedup because then it needs to notice when another process has made a change. Also made the assistant and other damon modes bypass the optimisation, which would not help them anyway.	2020-04-09 13:54:43 -04:00
Joey Hess	6c81e0c8f1	ByteString Ref continued Several nice speed wins I think. At 340/633 files converted.	2020-04-07 13:27:11 -04:00
Joey Hess	d5d8259937	ByteString Ref continued Attoparsec parser for diff-tree. Changed fromRef back to producing a String, to avoid needing to convert every use of it. However, this does mean I'm going to miss some opportunities where fromRef is used and the result converted back to a ByteString. Would be worth revisiting that at some point maybe.	2020-04-07 11:54:27 -04:00
Joey Hess	279991604d	started converting Ref from String to ByteString This should make code that reads shas and refs from git faster. Does not compile yet, a lot needs to be done still.	2020-04-06 17:14:49 -04:00
Joey Hess	6a91471923	GETCONFIG name fix Fix regression that prevented external special remotes from using GETCONFIG to query values like "name". (Introduced in version 7.20200202.7.)	2020-03-09 12:38:04 -04:00
Joey Hess	ccd8c43dc8	git-annex config: guard against non-repo-global configs git-annex config: Only allow configs be set that are ones git-annex actually supports reading from repo-global config, to avoid confused users trying to set other configs with this.	2020-03-02 15:54:18 -04:00
Joey Hess	81e3faf810	Merge branch 'v7'	2020-02-26 18:15:18 -04:00

1 2 3 4 5 ...

593 commits