git-annex

Author	SHA1	Message	Date
Joey Hess	9286769d2c	let Remote.availability return Unavailable This is groundwork for making special remotes like borg be skipped by sync when on an offline drive. Added AVAILABILITY UNAVAILABLE response and the UNAVAILABLERESPONSE extension to the external special remote protocol. The extension is needed because old git-annex, if it sees that response, will display a warning message. (It does continue as if the remote is globally available, which is acceptable, and the warning is only displayed at initremote due to remote.name.annex-availability caching, but still it seemed best to make this a protocol extension.) The remote.name.annex-availability git config is no longer used, and is documented as such. It was only used by external special remotes to cache the availability, to avoid needing to start the external process every time. Now that availability is queried as an Annex action, the external is only started by sync (and the assistant), when they actually check availability. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-08-16 14:31:31 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	cfaae7e931	added an optional cost= configuration to all special remotes Note that when this is specified and an older git-annex is used to enableremote such a special remote, it will simply ignore the cost= field and use whatever the default cost is. In passing, fixed adb to support the remote.name.cost and remote.name.cost-command configs. Sponsored-by: Dartmouth College's DANDI project	2023-01-12 13:42:28 -04:00
Joey Hess	0ffc59d341	change retrieveExportWithContentIdentifier to take a list of ContentIdentifier This partly fixes an issue where there are duplicate files in the special remote, and the first file gets swapped with another duplicate, or deleted. The swap case is fixed by this, the deleted case will need other changes. This makes retrieveExportWithContentIdentifier take a list of allowed ContentIdentifier, same as storeExportWithContentIdentifier, removeExportWithContentIdentifier, and checkPresentExportWithContentIdentifier. Of the special remotes that support importtree, borg is a special case and does not use content identifiers, S3 I assume can't get mixed up like this, directory certainly has the problem, and adb also appears to have had the problem. Sponsored-by: Graham Spencer on Patreon	2022-09-20 13:19:42 -04:00
Joey Hess	50c2cac7e7	adb: Added configuration setting oldandroid=true To avoid using find -printf, which was first supported in Android around 2019-2020. Probing seems too fragile, and execing stat once per file is too slow to do when there's a faster way available, which brought me to an option... Sponsored-by: Brett Eisenberg on Patreon	2022-07-13 18:00:47 -04:00
Joey Hess	e8a601aa24	incremental verification for retrieval from import remotes Sponsored-by: Dartmouth College's Datalad project	2022-05-09 15:39:43 -04:00
Joey Hess	2f2701137d	incremental verification for retrieval from all export remotes Only for export remotes so far, not export/import. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 13:49:33 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrieving from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	a32ff6cef0	adb: Avoid find failing with "Argument list too long" The "+" argument only runs the command once, so is not safe to use. Using ";" instead would have been the simplest fix, but also the slowest. Since my phone has an xargs that supports -0, I piped find to xargs instead. Unsure how portable this will be, perhaps some androids don't have xargs -0 or find -printf to send null terminated output. The business with pipefail is necessary to make a failure of find cause the import to fail. Probably this works on all androids, but if not, it will probably just result in a failure of find being ignored. It would be possible to make ignorefinderror just disable setting pipefail, but then if some android has a shell that has pipefail enabled by default, ignorefinderror would not work, so I kept the || true approach for that. Sponsored-by: Max Thoursie on Patreon	2022-01-31 13:19:09 -04:00
Joey Hess	525473aa5a	adb: Added ignorefinderror configuration parameter On a phone with Calyxos, adb find in /sdcard complains: find: ./Android/data/com.android.providers.downloads.ui: Permission denied But otherwise works, so this option makes import and export work ok, except for that one app's data. Sponsored-by: Graham Spencer	2022-01-10 21:17:00 -04:00
Joey Hess	0584e096d1	comment	2022-01-03 13:53:34 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not among the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	f8836306fa	remove "checking remotename" message This fixes fsck of a remote that uses chunking displaying "(checking remotename) (checking remotename)" for every chunk. Also, some remotes displayed the message, and others did not, with no consistency. It was originally displayed only when accessing remotes that were expensive or might involve a password prompt, I think, but nothing in the API said when to do it so it became an inconsistent mess. Originally I thought fsck should always display it. But it only displays in fsck --from remote, so the user knows the remote is being accessed, so there is no reason to tell them it's accessing it over and over. It was also possible for git-annex move to sometimes display it twice, due to checking if content is present twice. But, the user of move specifies --from/--to, so it does not need to display when it's accessing the remote, as the user expects it to access the remote. git-annex get might display it, but only if the remote also supports hasKeyCheap, which is really only local git remotes, which didn't display it always; and in any case nothing displayed it before hasKeyCheap, which is checked first, so I don't think this needs to display it ever. mirror is like move. And that's all the main places it would have been displayed. This commit was sponsored by Jochen Bartl on Patreon.	2021-04-27 13:05:27 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	4f9969d0a1	optimisation for borg Skip needing to list importable contents when unchanged since last time.	2020-12-22 15:00:05 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	85cd79ea01	no importKey for android yet adb shell has sha256sum sha1sum and some others, so they could be used. They're provided by toybox, so seem about as likely to keep working as find and stat, which it already depends on. Or to not add a dep, could use stat the same as getExportContentIdentifier to get a mtime, and make a WORM key. But do I really want this to default to WORM? Unsure what's the best path, so punting for now.	2020-07-03 14:02:50 -04:00
Joey Hess	6361074174	convert renameExport to throw exception Finishes the transition to make remote methods throw exceptions, rather than silently hide them. A bit on the fence about this one, because when renameExport fails, it falls back to deleting instead, and so does the user care why it failed? However, it did let me clean up several places in the code. This commit was sponsored by Ethan Aubin.	2020-05-15 15:08:09 -04:00
Joey Hess	037440ef36	convert removeExportDirectory to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 14:43:18 -04:00
Joey Hess	cdbfaae706	change removeExport to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Graham Spencer on Patreon.	2020-05-15 14:15:14 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	4814b444dd	make storeExport throw exceptions	2020-05-15 12:20:02 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw an exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	b50ee9cd0c	remove Preparer abstraction That had almost no benefit at all, and complicated things quite a lot. What I probably wanted this to be was something like ResourceT, but it was not. The few remotes that actually need some preparation done only once and reused used a MVar and not Preparer.	2020-05-13 11:56:21 -04:00
Joey Hess	7ebc118776	adb: Better messages when the adb command is not installed After a user completely ignored the display of the exception probably because it didn't make sense.. This does make it a little bit slower since it checks adb is in path each time before running it. Also, it might display a lot of warnings about it not being installed. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-04-02 10:48:28 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	7038acf96c	add descriptions for all remote config fields not yet used	2020-01-20 15:20:04 -04:00
Joey Hess	c4ea3ca40a	ported almost all remotes, until my brain melted external is not started yet, and S3 is part way through and not compiling yet	2020-01-14 15:41:34 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log. This does not eliminate every place where a remote config is parsed and a default value is used if the parse fails. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	c19211774f	use filepath-bytestring for annex object manipulations git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation. Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements. All or nearly all places where a file is statted use RawFilePath now.	2019-12-11 15:25:07 -04:00
Joey Hess	650a631ef8	include all remotes back in	2019-12-02 12:26:33 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different pieces of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generated in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	5004381dd9	improve error display when storing to an export/import remote fails Prompted by the test suite on windows failing with "export foo failed" and no information about what went wrong. Note that only storeExportWithContentIdentifier has been converted. storeExport still returns a Bool and so exceptions may be hidden. However, storeExportWithContentIdentifier has many more failure modes, since it needs to avoid overwriting modified files. So it's more important it have better error display.	2019-08-13 12:05:00 -04:00
Joey Hess	cd86692c95	fix storeExportWithContentIdentifier	2019-04-09 19:15:20 -04:00
Joey Hess	7b6d0da9b8	adb import As well as adding the necessary methods, a few other changes to the adb remote: * Use ".annextmp" extension for temp files, to avoid conflict with other temp files. * Stop using "echo $?" to get exit status of command inside adb. There were two problems; first the "echo" just before it meant it was always 0! And secondly, it seems kind of random on my phone whether it's 1 or 0, not dependent on whether the command seems to have succeeded.	2019-04-09 17:52:41 -04:00
Joey Hess	2912429640	better indicate when special remotes do not support renameExport Avoid a warning message when renameExport is not supported, and just fall back to deleting with a subsequent re-upload. Especially needed for importtree remotes, where renameExport needs to be disabled. This changes the external special remote protocol, but in a backwards-compatible way. A reply of UNSUPPORTED-REQUEST to an older version of git-annex will cause it to make renameExport return False.	2019-03-11 12:53:24 -04:00
Joey Hess	ccc0684d21	no remotes support import yet	2019-02-20 16:59:04 -04:00
Joey Hess	9cebfd7002	purify exportActions Purifying exportActions will allow introspecting and modifying it, which is needed to add progress bar display to it. Only S3 and WebDAV ran an Annex action while constructing ExportActions. There was a small performance gain from them doing that, since a resource was able to be prepared and reused for multiple actions by Command.Export. As seen in commit `809cfbbd8a` and `5d394023eb` S3 and WebDAV actually create a new handle for each access in normal, non-export use. It doesn't seem worth making export use of them marginally more efficient than normal use. It would be better to do that work upfront when constructing the remote. Or perhaps use a MVar to cache a handle. This commit was sponsored by Nick Piper on Patreon.	2019-01-30 15:11:40 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	02630b39ee	add Remote.readonly Does nothing yet. Considered making bup readonly, but while the content can't be removed, it is able to delete a branch, so didn't. This commit was supported by the NSF-funded DataLad project.	2018-08-30 11:12:18 -04:00
Joey Hess	4315bb9e42	add retrievalSecurityPolicy This will be used to protect against CVE-2018-10859, where an encrypted special remote is fed the wrong encrypted data, and so tricked into decrypting something that the user encrypted with their gpg key and did not store in git-annex. It also protects against CVE-2018-10857, where a remote follows a http redirect to a file:// url or to a local private web server. While that's already been prevented in git-annex's own use of http, external special remotes, hooks, etc use other http implementations and could still be vulnerable. The policy is not yet enforced, this commit only adds the appropriate metadata to remotes. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-06-21 11:36:36 -04:00
Joey Hess	90a3afb60f	adb: Android serial numbers are not all 16 characters long, so accept other lengths. I can't find any documentation of how long it should be. Hard to imagine it being shorter than 4 characters though, so put that in as a conservative lower bound. This commit was sponsored by Nick Piper on Patreon.	2018-06-12 13:56:01 -04:00
Joey Hess	67e46229a5	change Remote.repo to Remote.getRepo This is groundwork for letting a repo be instantiated the first time it's actually used, instead of at startup. The only behavior change is that some old special cases for xmpp remotes were removed. Where before git-annex silently did nothing with those no-longer supported remotes, it may now fail in some way. The additional IO action should have no performance impact as long as it's simply return. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon	2018-06-04 15:30:26 -04:00

53 commits