git-annex

Author	SHA1	Message	Date
Joey Hess	2f2701137d	incremental verification for retrieval from all export remotes Only for export remotes so far, not export/import. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 13:49:33 -04:00
Joey Hess	798b33ba3d	simplify annex.bwlimit handling RemoteGitConfig parsing looks for annex.bwlimit when a remote does not have a per-remote config for it, so no need for a separate global config. Sponsored-by: Svenne Krap on Patreon	2021-09-22 10:52:01 -04:00
Joey Hess	18e00500ce	bwlimit Added annex.bwlimit and remote.name.annex-bwlimit config that works for git remotes and many but not all special remotes. This nearly works, at least for a git remote on the same disk. With it set to 100kb/1s, the meter displays an actual bandwidth of 128 kb/s, with occasional spikes to 160 kb/s. So it needs to delay just a bit longer... I'm unsure why. However, at the beginning a lot of data flows before it determines the right bandwidth limit. A granularity of less than 1s would probably improve that. And, I don't know yet if it makes sense to have it be 100kb/1s rather than 100kb/s. Is there a situation where the user would want a larger granularity? Does granularity need to be configurable at all? I only used that format for the config really in order to reuse an existing parser. This can't be supported for external special remotes, or for ones that themselves shell out to an external command. (Well, it could, but it would involve pausing and resuming the child process tree, which seems very hard to implement and very strange besides.) There could also be some built-in special remotes that it still doesn't work for, due to them not having a progress meter whose display blocks the bandwidth-using thread. But I don't think there are actually any that run a different thread for downloads than the thread that displays the progress meter. Sponsored-by: Graham Spencer on Patreon	2021-09-21 16:58:10 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	c4aba8e032	better handling of finishing up incomplete incremental verify Now it's run in VerifyStage. I thought about keeping the file handle open, and resuming reading where tailVerify left off. But that risks leaking open file handles, until the GC closes them, if the deferred verification does not get resumed. Since that could perhaps happen if there's an exception somewhere, I decided that was too unsafe. Instead, re-open the file, seek, and resume. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 14:52:59 -04:00
Joey Hess	dadbb510f6	incremental hashing for fileRetriever It uses tailVerify to hash the file while it's being written. This is able to sometimes avoid a separate checksum step. Although if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the checksum still happens. Testing with the directory special remote, incremental checksumming did not happen. But then I disabled the copy CoW probing, and it did work. What's going on with that is the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify will open the first, empty file, and so fails to read the content that gets written to the file that replaces it. The directory special remote really ought to be able to avoid needing to use tailVerify, and while other special remotes could do things that cause similar problems, they probably don't. And if they do, it just means the checksum doesn't get done incrementally. Sponsored-by: Dartmouth College's DANDI project	2021-08-13 15:43:29 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not among the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	0e830b6bb5	make remoteKeyToRemoteName safer If it's passed a ConfigKey such as annex.version, avoid returning an empty remote name and return Nothing instead. Also, foo.bar.baz is not treated as a remote named "bar".	2021-04-23 13:29:21 -04:00
Joey Hess	381f203d1a	refactor Avoiding using a callback simplifies this and should make it easier to implement incremental checksumming, which will need to happen partly in writeRetrievedContent and partly in retrieveChunks.	2021-02-16 16:03:28 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw an exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	b50ee9cd0c	remove Preparer abstraction That had almost no benefit at all, and complicated things quite a lot. What I probably wanted this to be was something like ResourceT, but it was not. The few remotes that actually need some preparation done only once and reused used a MVar and not Preparer.	2020-05-13 11:56:21 -04:00
Joey Hess	69f2d1dd43	remoteConfig rework remoteAnnexConfig will avoid bugs like `a3a674d15b` Use now more generic remoteConfig in a couple places that built non-annex config settings manually before.	2020-02-19 13:45:11 -04:00
Joey Hess	99cb3e75f1	add LISTCONFIGS to external special remote protocol Special remote programs that use GETCONFIG/SETCONFIG are recommended to implement it. The description is not yet used, but will be useful later when adding a way to make initremote list all accepted configs. configParser now takes a RemoteConfig parameter. Normally, that's not needed, because configParser returns a parser, it does not parse it itself. But, it's needed to look at externaltype and work out what external remote program to run for LISTCONFIGS. Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported used to use NoUUID. The code that now checks for Nothing used to behave in some undefined way if the external program made requests that triggered it. Also, note that in externalSetup, once it generates external, it parses the RemoteConfig strictly. That generates a ParsedRemoteConfig, which is thrown away. The reason it's ok to throw that away, is that, if the strict parse succeeded, the result must be the same as the earlier, lenient parse. initremote of an external special remote now runs the program three times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least one of those, and it should be possible to only run the program once.	2020-01-17 16:07:17 -04:00
Joey Hess	c498269a88	convert configParser to Annex action and add passthrough option Needed so Remote.External can query the external program for its configs. When the external program does not support the query, the passthrough option will make all input fields be available.	2020-01-14 13:52:03 -04:00
Joey Hess	963239da5c	separate RemoteConfig parsing basically working Many special remotes are not updated yet and are commented out.	2020-01-14 12:35:08 -04:00
Joey Hess	71f78fe45d	wip separate RemoteConfig parsing Remote now contains a ParsedRemoteConfig. The parsing happens when the Remote is constructed, rather than when individual configs are used. This is more efficient, and it lets initremote/enableremote reject configs that have unknown fields or unparsable values. It also allows for improved type safety, as shown in Remote.Helper.Encryptable where things that used to match on string configs now match on data types. This is a work in progress, it does not build yet. The main risk in this conversion is forgetting to add a field to RemoteConfigParser. That will prevent using that field with initremote/enableremote, and will prevent remotes that already are set up from seeing that configuration. So will need to check carefully that every field that getRemoteConfigValue is called on has been added to RemoteConfigParser. (One such case I need to remember is that credPairRemoteField needs to be included in the RemoteConfigParser.)	2020-01-13 12:39:21 -04:00
Joey Hess	f3047d7186	include git-annex-shell back in Also pushed ConfigKey down into the Git modules, which is the bulk of the changes.	2019-12-02 11:51:52 -04:00
Joey Hess	d7833def66	use ByteString for git config The parser and looking up config keys in the map should both be faster due to using ByteString. I had hoped this would speed up startup time, but any improvement to that was too small to measure. Seems worth keeping though. Note that the parser breaks up the ByteString, but a config map ends up pointing to the config as read, which is retained in memory until every value from it is no longer used. This can change memory usage patterns marginally, but won't affect git-annex.	2019-11-27 17:40:09 -04:00
Joey Hess	35d7ffe128	initremote --sameas fully working And using sameas remotes is working. Moved annex-config-uuid setting out of Remote.Helper.Special. EnableRemote will also have to set it.	2019-10-11 14:19:10 -04:00
Joey Hess	59908586f4	rename RemoteConfigKey to RemoteConfigField And some associated renames. I was going to have some values named fooKeyKey otherwise..	2019-10-10 15:44:05 -04:00
Joey Hess	d1130ea04a	get rid of hardcoded "name" lookups Support "sameas-name" being set instead. In RenameRemote, rename which ever of the two is set.	2019-10-10 13:25:10 -04:00
Joey Hess	92ff30df70	set annex-config-uuid when RemoteConfig contains a sameas-uuid Initremote sets that, so after both initremote and enableremote, the git config will be set. Any remote that does not use Annex.SpecialRemote won't set annex-config-uuid. But that's only Remote.Git, which doesn't use RemoteConfig anyway.	2019-10-10 12:58:59 -04:00
Joey Hess	46071a2435	use storeUUIDIn	2019-10-10 12:38:17 -04:00
Joey Hess	26c54d6ea3	make metered more generic Allow it to be used when the Key is not known.	2019-06-25 12:33:36 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	7b9701675e	Display progress bar when getting files from export remotes And moved the progress bar display into storeExport as well. This commit was sponsored by John Pellman on Patreon.	2019-01-31 13:34:12 -04:00
Joey Hess	c4977ec1ff	refactoring	2019-01-29 13:42:32 -04:00
Joey Hess	bc31b93c77	remote.name.annex-security-allow-unverified-downloads Added remote.name.annex-security-allow-unverified-downloads, a per-remote setting for annex.security.allow-unverified-downloads. This commit was sponsored by Brock Spratlen on Patreon.	2018-09-25 15:34:47 -04:00
Joey Hess	4315bb9e42	add retrievalSecurityPolicy This will be used to protect against CVE-2018-10859, where an encrypted special remote is fed the wrong encrypted data, and so tricked into decrypting something that the user encrypted with their gpg key and did not store in git-annex. It also protects against CVE-2018-10857, where a remote follows a http redirect to a file:// url or to a local private web server. While that's already been prevented in git-annex's own use of http, external special remotes, hooks, etc use other http implementations and could still be vulnerable. The policy is not yet enforced, this commit only adds the appropriate metadata to remotes. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-06-21 11:36:36 -04:00
Joey Hess	2927618d35	Added adb special remote which allows exporting files to Android devices. git annex testremote passes. exporttree not implemented yet, although the documentation talks about it, since it will be the main way this remote will be used. The adb push/pull progress is displayed for now; it would be better to consume it and use it to update the git-annex progress bar. This commit was sponsored by andrea rota.	2018-03-27 14:54:41 -04:00
Joey Hess	e16b069331	use total size from DATA Noticed that getting a key whose size is not known resulted in a progress display that didn't include the percent complete. Fixed for P2P by making the size sent with DATA be used to update the meter's total size. In order for rateLimitMeterUpdate to also learn the total size, had to make it be passed the Meter, and some other reorg in Utility.Metered was also done so that --json-progress can construct a Meter to pass to rateLimitMeterUpdate. When the fallback rsync is done, the progress display still doesn't include the percent complete. Only way to fix that seems to be to let rsync display its output again, but that would conflict with git-annex's own progress meter, which is also being displayed. This commit was sponsored by Henrik Riomar on Patreon.	2018-03-12 21:46:58 -04:00
Joey Hess	4e7e1fcff4	add gitAnnexTmpWorkDir and withTmpWorkDir Needed to run youtube-dl in, but could also be useful for other stuff. The tricky part of this was making the workdir be cleaned up whenever the tmp object file is cleaned up. This commit was sponsored by Ole-Morten Duesund on Patreon.	2017-11-29 13:53:39 -04:00
Joey Hess	f5edb16729	Display progress meter when uploading a key without size information Getting the size by statting the content file. This commit was supported by the NSF-funded DataLad project.	2017-11-14 16:40:49 -04:00
Joey Hess	a1730cd6af	adieu, MissingH Removed dependency on MissingH, instead depending on the split library. After laying groundwork for this since 2015, it was mostly straightforward. Added Utility.Tuple and Utility.Split. Eyeballed System.Path.WildMatch while implementing the same thing. Since MissingH's progress meter display was being used, I re-implemented my own. Bonus: Now progress is displayed for transfers of files of unknown size. This commit was sponsored by Shane-o on Patreon.	2017-05-16 01:03:52 -04:00
Joey Hess	b9ce477fa2	plumb RemoteGitConfig through to decryptCipher	2016-05-23 17:33:32 -04:00
Joey Hess	91df4c6b53	Pass the various gnupg-options configs to gpg in several cases where they were not before. Removed the instance LensGpgEncParams RemoteConfig because it encouraged code that does not take the RemoteGitConfig into account. RemoteType's setup was changed to take a RemoteGitConfig, although the only place that is able to provide a non-empty one is enableremote, when it's changing an existing remote. This led to several follow-on changes, and got RemoteGitConfig plumbed through.	2016-05-23 17:03:20 -04:00
Joey Hess	3f1aaa84c5	Added annex.gnupg-decrypt-options and remote.<name>.annex-gnupg-decrypt-options, which are passed to gpg when it's decrypting data. The naming is unfortunately not consistent, but the gnupg-options were only used for encrypting, and it's too late to change that. It would be nice to have a third setting that is always passed to gnupg, but ~/.gnupg/options can be used to specify such global options when really needed.	2016-05-10 13:03:56 -04:00
Joey Hess	b890f3a53d	Fix bug that prevented resuming of uploads to encrypted special remotes that used chunking. This bug could also expose the names of keys to such remotes. This is a low-severity security hole.	2016-04-27 12:54:43 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	e97fce35a6	Display progress meter in -J mode when downloading from the web. Including in addurl, and get --from web, but also in S3 and External special remotes when a web url is known for content in those remotes.	2015-11-16 21:00:54 -04:00
Joey Hess	2def1d0a23	other 80% of avoiding verification when hard linking to objects in shared repo In `c6632ee5c8`, it actually only handled uploading objects to a shared repository. To avoid verification when downloading objects from a shared repository, was a lot harder. On the plus side, if the process of downloading a file from a remote is able to verify its content on the side, the remote can indicate this now, and avoid the extra post-download verification. As of yet, I don't have any remotes (except Git) using this ability. Some more work would be needed to support it in special remotes. It would make sense for tahoe to implicitly verify things downloaded from it; as long as you trust your tahoe server (which typically runs locally), there's cryptographic integrity. OTOH, despite bup being based on shas, a bup repo under an attacker's control could have the git ref used for an object changed, and so a bup repo shouldn't implicitly verify. Indeed, tahoe seems unique in being trustworthy enough to implicitly verify.	2015-10-02 14:35:12 -04:00
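
Several commits in the table above (for example c20358b671 and 62e152f210) describe incremental verification: checksumming content while it is being received, rather than in a second pass over the finished file. A minimal Python sketch of that general idea follows. It is not git-annex's Haskell implementation; the function name `retrieve_with_incremental_verify`, the chunk size, and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib
import io
import os
import tempfile


def retrieve_with_incremental_verify(source, dest_path, expected_sha256,
                                     chunk_size=65536):
    """Copy a binary file-like `source` to `dest_path`, hashing as we go.

    Each chunk is fed to the hasher right after it is written, so no
    separate read of the finished file is needed. Returns True when the
    digest of the received bytes equals `expected_sha256` (a hex string).
    Hypothetical helper, not part of git-annex.
    """
    hasher = hashlib.sha256()
    with open(dest_path, "wb") as dest:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)
            hasher.update(chunk)
    return hasher.hexdigest() == expected_sha256


if __name__ == "__main__":
    payload = b"example content" * 10000
    expected = hashlib.sha256(payload).hexdigest()
    out_path = os.path.join(tempfile.mkdtemp(), "retrieved")
    ok = retrieve_with_incremental_verify(io.BytesIO(payload), out_path, expected)
    print("verified" if ok else "verification failed")
```

A caller that gets False back would still need to discard or re-fetch the file; the sketch only shows how the checksum can be computed during the transfer.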

86 commits