git-annex

Author	SHA1	Message	Date
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Joey Hess	a0badc5069	sync: Fix parsing of gcrypt::rsync:// urls that use a relative path Such an url is not valid; parseURI will fail on it. But git-annex doesn't actually need to parse the url, because all it needs to do to support syncing with it is know that it's not a local path, and use git pull and push. (Note that there is no good reason for the user to use such an url. An absolute url is valid and I patched git-remote-gcrypt to support them years ago. Still, users gonna do anything that tools allow, and git-remote-gcrypt still supports them.) Sponsored-by: Jack Hill on Patreon	2023-03-23 15:20:00 -04:00
Joey Hess	cfaae7e931	added an optional cost= configuration to all special remotes Note that when this is specified and an older git-annex is used to enableremote such a special remote, it will simply ignore the cost= field and use whatever the default cost is. In passing, fixed adb to support the remote.name.cost and remote.name.cost-command configs. Sponsored-by: Dartmouth College's DANDI project	2023-01-12 13:42:28 -04:00
Yaroslav Halchenko	0151976676	Typo fix unncessary -> unnecessary. Detected while reading recent CHANGELOG entry but then decided to apply to entire codebase and docs since why not?	2022-08-20 09:40:19 -04:00
Joey Hess	23c6e350cb	improve createDirectoryUnder to allow alternate top directories This should not change the behavior of it, unless there are multiple top directories, and then it should behave the same as if there was a single top directory that was actually above the directory to be created. Sponsored-by: Dartmouth College's Datalad project	2022-08-12 12:52:37 -04:00
Joey Hess	95a04920cf	remove objectDir'	2022-06-22 16:08:49 -04:00
Joey Hess	43701759a3	disable shellescape for rsync 3.2.4 rsync 3.2.4 broke backwards-compatability by preventing exposing filenames to the shell. Made the rsync and gcrypt special remotes detect this and disable shellescape. An alternative fix would have been to always set RSYNC_OLD_ARGS=1. Which would avoid the overhead of probing rsync --help for each affected remote. But that is really very fast to run, and it seemed better to switch to the modern code path rather than keeping on using the bad old code path. Sponsored-by: Tobias Ammann on Patreon	2022-05-03 12:12:41 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	dadbb510f6	incremental hashing for fileRetriever It uses tailVerify to hash the file while it's being written. This is able to sometimes avoid a separate checksum step. Although if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the checksum still happens. Testing with the directory special remote, incremental checksumming did not happen. But then I disabled the copy CoW probing, and it did work. What's going on with that is the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify will open the first, empty file, and so fails to read the content that gets written to the file that replaces it. The directory special remote really ought to be able to avoid needing to use tailVerify, and while other special remotes could do things that cause similar problems, they probably don't. And if they do, it just means the checksum doesn't get done incrementally. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 15:43:29 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not amoung the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	f8836306fa	remove "checking remotename" message This fixes fsck of a remote that uses chunking displaying (checking remotename) (checking remotename)" for every chunk. Also, some remotes displayed the message, and others did not, with no consistency. It was originally displayed only when accessing remotes that were expensive or might involve a password prompt, I think, but nothing in the API said when to do it so it became an inconsistent mess. Originally I thought fsck should always display it. But it only displays in fsck --from remote, so the user knows the remote is being accessed, so there is no reason to tell them it's accessing it over and over. It was also possible for git-annex move to sometimes display it twice, due to checking if content is present twice. But, the user of move specifies --from/--to, so it does not need to display when it's accessing the remote, as the user expects it to access the remote. git-annex get might display it, but only if the remote also supports hasKeyCheap, which is really only local git remotes, which didn't display it always; and in any case nothing displayed it before hasKeyCheap, which is checked first, so I don't think this needs to display it ever. mirror is like move. And that's all the main places it would have been displayed. This commit was sponsored by Jochen Bartl on Patreon.	2021-04-27 13:05:27 -04:00
Joey Hess	1d7fa63149	Added support for git-remote-gcrypt's rsync URIs Which access a remote using rsync over ssh, and which git pushes to much more efficiently than ssh urls. There was some old partial support for rsync URIs from 2013, but it seemed incomplete, and did not use rsync over ssh. Weird. I'm not sure if there's any remaining benefit to using the non-rsync url forms with gcrypt, now that this is implemented? Updated docs to encourage using the rsync urls. This commit was sponsored by Svenne Krap on Patreon.	2021-03-09 15:58:09 -04:00
Joey Hess	2aa4fab62a	avoid crashing when there are remotes using unparseable urls Including the non-standard URI form that git-remote-gcrypt uses for rsync. Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port number. But git could have a remote with that, it would try to run git-remote-ook to handle it. So, git-annex has to allow for such things, rather than crashing. This commit was sponsored by Luke Shumaker on Patreon.	2021-01-18 14:59:08 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, that the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	66497d39b3	convert git config reading to use hGetLineUntilExitOrEOF Much nicer than the old hack of waiting for a few seconds for stderr to be read.	2020-11-19 15:38:43 -04:00
Joey Hess	8f452416f7	more RawFilePath conversion Better to use Git.repoPath to get a filepath, Git.repoLocation does not always return one. This commit was sponsored by Ethan Aubin.	2020-10-30 13:00:12 -04:00
Joey Hess	5cfcf1f05f	cache remote.log Unlikely to speed up any of the existing uses much, but I want to use it in a message that might be displayed many times.	2020-09-22 13:52:26 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	b50ee9cd0c	remove Preparer abstraction That had almost no benefit at all, and complicated things quite a lot. What I proably wanted this to be was something like ResourceT, but it was not. The few remotes that actually need some preparation done only once and reused used a MVar and not Preparer.	2020-05-13 11:56:21 -04:00
Joey Hess	ca9c6c5f60	Fix a potential failure to parse git config Git has an obnoxious special case in git config, a line "foo" is the same as "foo = true". That means there is no way to examine the output of git config and tell if it was run with --null or not, since a "foo" in the first line could be such a boolean, or could be followed by its value on the next line if --null were used. So, rather than trying to do such a detection, track the style of config at all the points where it's generated.	2020-04-13 13:05:41 -04:00
Joey Hess	7f992ef59c	mostly finished with createDirectoryUnder conversion Remaining things needing converted are in the assistant, and Annex.Ssh. Every other remaining call to createDirectoryIfMissing True has been audited and is not relevant. The ones in Build/ of course don't get included in the program. Others included eg, Remote.Tahoe and Config.Files which both write to dotfiles under the home directory.	2020-03-06 11:57:15 -04:00
Joey Hess	6d58ca94d6	some easy createDirectoryUnder conversions	2020-03-05 15:20:10 -04:00
Joey Hess	81e3faf810	Merge branch 'v7'	2020-02-26 18:15:18 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	69f2d1dd43	remoteConfig rework remoteAnnexConfig will avoid bugs like `a3a674d15b` Use now more generic remoteConfig in a couple places that built non-annex config settings manually before.	2020-02-19 13:45:11 -04:00
Joey Hess	75059c9f3b	better error message when git config fails to parse remote config Rather than leaking the name of the temp file, just say the config parse failed, and where the config was downloaded from. Not closing the bug report because two issues were reported in the same bug report, because the universe wants me to continually re-read old unclosed bug reports to waste my time determining what still needs to be done.	2020-01-22 13:35:54 -04:00
Joey Hess	7038acf96c	add descriptions for all remote config fields not yet used	2020-01-20 15:20:04 -04:00
Joey Hess	201049cf93	gcrypt inherits shellescape setting from rsync, allow it	2020-01-20 15:13:49 -04:00
Joey Hess	99cb3e75f1	add LISTCONFIGS to external special remote protocol Special remote programs that use GETCONFIG/SETCONFIG are recommended to implement it. The description is not yet used, but will be useful later when adding a way to make initremote list all accepted configs. configParser now takes a RemoteConfig parameter. Normally, that's not needed, because configParser returns a parter, it does not parse it itself. But, it's needed to look at externaltype and work out what external remote program to run for LISTCONFIGS. Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported used to use NoUUID. The code that now checks for Nothing used to behave in some undefined way if the external program made requests that triggered it. Also, note that in externalSetup, once it generates external, it parses the RemoteConfig strictly. That generates a ParsedRemoteConfig, which is thrown away. The reason it's ok to throw that away, is that, if the strict parse succeeded, the result must be the same as the earlier, lenient parse. initremote of an external special remote now runs the program three times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least one of those, and it should be possible to only run the program once.	2020-01-17 16:07:17 -04:00
Joey Hess	c498269a88	convert configParser to Annex action and add passthrough option Needed so Remote.External can query the external program for its configs. When the external program does not support the query, the passthrough option will make all input fields be available.	2020-01-14 13:52:03 -04:00
Joey Hess	963239da5c	separate RemoteConfig parsing basically working Many special remotes are not updated yet and are commented out.	2020-01-14 12:35:08 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log This does not eliminate every place where a remote config is parsed and a default value is used if the parse false. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	686791c4ed	more RawFilePath Remove dup definitions and just use the RawFilePath one. </> etc are enough faster that it's probably faster than building a String directly, although I have not benchmarked.	2019-12-18 17:10:28 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	f3047d7186	include git-annex-shell back in Also pushed ConfigKey down into the Git modules, which is the bulk of the changes.	2019-12-02 11:51:52 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	9e8d40181f	remove some unncessary uses of warningIO warningIO is not concurrent output safe, and it doesn't go to --json-error-messages There are a few more that would be too hard to remove, and there are also several dozen direct prints to stderr still.	2019-11-12 10:07:27 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different peices of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generate in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	d1130ea04a	get rid of hardcoded "name" lookups Support "sameas-name" being set instead. In RenameRemote, rename which ever of the two is set.	2019-10-10 13:25:10 -04:00
Joey Hess	06c04ffe29	use storeUUIDIn	2019-10-10 12:12:09 -04:00
Joey Hess	fb7d92457f	support using gcrypt with git-lfs special remote	2019-08-05 13:43:45 -04:00
Joey Hess	8401b09e32	Allow setting up a gcrypt special remote with encryption=shared It was documented to work, but seems it has been broken for a while/forever.	2019-08-05 12:41:05 -04:00
Joey Hess	44de3fff0b	avoid rsync/gcrypt ssh startup delay with -J Avoid a delay at startup when concurrency is enabled and there are rsync or gcrypt special remotes, which was caused by git-annex opening a ssh connection to the remote too early. sshOptions makes a connection to the ssh server if one is not already open, when concurrency is enabled. Avoid doing that at startup, when the remote list is being built, but the remote may not be used at all. Instead, rsync/gcrypt now runs sshOptions once per ssh connection to the server. This should not be significant overhead since Remote.Git already has the same overhead (as do Bup and Ddar).	2019-06-13 11:16:38 -04:00

1 2 3

135 commits