git-annex

Author	SHA1	Message	Date
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor innefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	f5e09a1dbe	incremental verification for S3 Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:07:00 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not amoung the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	f8836306fa	remove "checking remotename" message This fixes fsck of a remote that uses chunking displaying (checking remotename) (checking remotename)" for every chunk. Also, some remotes displayed the message, and others did not, with no consistency. It was originally displayed only when accessing remotes that were expensive or might involve a password prompt, I think, but nothing in the API said when to do it so it became an inconsistent mess. Originally I thought fsck should always display it. But it only displays in fsck --from remote, so the user knows the remote is being accessed, so there is no reason to tell them it's accessing it over and over. It was also possible for git-annex move to sometimes display it twice, due to checking if content is present twice. But, the user of move specifies --from/--to, so it does not need to display when it's accessing the remote, as the user expects it to access the remote. git-annex get might display it, but only if the remote also supports hasKeyCheap, which is really only local git remotes, which didn't display it always; and in any case nothing displayed it before hasKeyCheap, which is checked first, so I don't think this needs to display it ever. mirror is like move. And that's all the main places it would have been displayed. This commit was sponsored by Jochen Bartl on Patreon.	2021-04-27 13:05:27 -04:00
Joey Hess	13c090b37a	use fastDebug everywhere it can be used None of these are likely to yeild a noticable speedup though.	2021-04-06 15:41:24 -04:00
Joey Hess	aaba83795b	switch from hslogger to purpose-built Utility.Debug This uses a DebugSelector, rather than debug levels, which will allow for a later option like --debug-from=Process to only see debuging about running processes. The module name that contains the thing being debugged is used as the DebugSelector (in most cases; does not need to be a hard and fast rule). Debug calls were changed to add that. hslogger did not display that first parameter to debugM, but the DebugSelector does get displayed. Also fastDebug will allow doing debugging in places that are used in tight loops, with the DebugSelector coming from the Annex Reader essentially for free. Not done yet.	2021-04-05 13:40:31 -04:00
Joey Hess	0e44c252c8	avoid getting creds from environment during autoenable When autoenabling special remotes of type S3, weddav, or glacier, do not take login credentials from environment variables, as the user may not be expecting the autoenable to happen, and may have those set for other purposes.	2021-03-17 09:41:12 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	4f9969d0a1	optimisation for borg Skip needing to list importable contents when unchanged since last time.	2020-12-22 15:00:05 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, that the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	e9db382308	avoid redundant set of a S3 verison ID that is already recorded I think this could cause unnecessary changes to the git-annex branch, and retrieveExportWithContentIdentifier is now also used for getting content from importtree=yes remotes, so it would happen more frequently so let's avoid.	2020-12-17 16:49:17 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	5844a54869	aws-0.22 improved its support for setting etags, which improves support for versioned S3 buckets. Remove placeholder version number I used when implementing the feature in aws. This commit was sponsored by Ethan Aubin.	2020-09-14 18:37:49 -04:00
Joey Hess	ddf963d019	deepseq all things returned from ResourceT http Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/ although I don't know if it does. My thinking is, ResourceT may allocate a resource and then free it, and a unforced thunk to that resource could result in reading memory that has since been overwritten by something else, or in a SEGV, depending. While that seems kind of like a bug in ResourceT to me, if it is what's happening, this will avoid it. If it's not, this doesn't really hurt much since the values are all smallish. This commit was sponsored by Graham Spencer on Patreon.	2020-09-14 18:30:06 -04:00
Joey Hess	ddcab38e4a	no importKey for S3 for now The Etag is sometimes a md5, but not if eg, there was a multipart upload. May revisit later if there's demand.	2020-07-03 13:53:14 -04:00
Joey Hess	3175015d1b	lockContent for S3 (with versioning=yes) and git-lfs Made several special remotes support locking content on them while dropping, which allows dropping from another special remote when the content will only remain on a special remote of these types. In both cases, verify the content is present actively, because it's certianly possible for things other than git-annex to have removed it. Worth thinking about what to do if at some later point, git-lfs gains support for dropping content, and a content locking operation. That would probably need a transition; first would need to make lockContent use the locking operation. Then, once enough time had passed that we can assume any git-annex operating on the git-lfs remote had that change, git-annex could finally allow dropping from git-lfs. Or, it could be that git-lfs gains support for dropping content, but not locking it. In that case, it seems this commit would need to be reverted, and then wait long enough for that git-annex to be everywhere, and only then can git-annex safely support dropping from git-lfs. So, the assumption made in this commit could lead to bother later.. But I think it's actually highly unlikely git-lfs does ever support dropping; it's outside their centralized model. Probably. :) Worth keeping in mind as the same assumption is made about other special remotes though. This commit was sponsored by Ethan Aubin.	2020-06-26 13:46:42 -04:00
Joey Hess	ad81feb053	fix implicit embedcreds regression Fix bug that made creds not be stored in git when a special remote was initialized with gpg encryption, but without an explicit embedcreds=yes. (Yet nother regression introduced in version 7.20200202.7. 5th so far.)	2020-06-16 18:00:19 -04:00
Joey Hess	41952204ce	S3: The REDUCED_REDUNDANCY storage class is no longer cheaper So stop documenting it, and stop offering it as a choice in the assistant. Removed the code that parses it into S3.ReducedRedundancy, because S3.OtherStorageClass with the value will work just the same and avoids a special case for a deprecated this.	2020-06-16 12:04:29 -04:00
Joey Hess	e63dcbf36c	fix embedcreds=yes reversion Fix bug that made enableremote of S3 and webdav remotes, that have embedcreds=yes, fail to set up the embedded creds, so accessing the remotes failed. (Regression introduced in version 7.20200202.7 in when reworking all the remote configs to be parsed.) Root problem is that parseEncryptionConfig excludes all other config keys except encryption ones, so it is then unable to find the credPairRemoteField. And since that field is not required to be present, it proceeds as if it's not, rather than failing in any visible way. This causes it to not find any creds, and so it does not cache them. When when the S3 remote tries to make a S3 connection, it finds no creds, so assumes it's being used in no-creds mode, and tries to find a public url. With no public url available, it fails, but the failure doesn't say a lack of creds is the problem. Fix is to provide setRemoteCredPair with a ParsedRemoteConfig, so the full set of configs of the remote can be parsed. A bit annoying to need to parse the remote config before the full config (as returned by setRemoteCredPair) is available, but this avoids the problem. I assume webdav also had the problem by inspection, but didn't try to reproduce it with it. Also, getRemoteCredPair used getRemoteConfigValue to get a ProposedAccepted String, but that does not seem right. Now that it runs that code, it crashed saying it had just a String. Remotes that have already been enableremoted, and so lack the cached creds file will work after this fix, because getRemoteCredPair will extract the creds from the remote config, writing the missing file. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-21 14:35:30 -04:00
Joey Hess	6361074174	convert renameExport to throw exception Finishes the transition to make remote methods throw exceptions, rather than silently hide them. A bit on the fence about this one, because when renameExport fails, it falls back to deleting instead, and so does the user care why it failed? However, it did let me clean up several places in the code. This commit was sponsored by Ethan Aubin.	2020-05-15 15:08:09 -04:00
Joey Hess	cdbfaae706	change removeExport to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Graham Spencer on Patreon.	2020-05-15 14:15:14 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	4814b444dd	make storeExport throw exceptions	2020-05-15 12:20:02 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	b50ee9cd0c	remove Preparer abstraction That had almost no benefit at all, and complicated things quite a lot. What I proably wanted this to be was something like ResourceT, but it was not. The few remotes that actually need some preparation done only once and reused used a MVar and not Preparer.	2020-05-13 11:56:21 -04:00
Joey Hess	1532d67c3e	S3: Support signature=v4 To use S3 Signature Version 4. Some S3 services seem to require v4, while others may only support v2, which remains the default. I'm also not sure if v4 works correctly in all cases, there is this upstream bug report: https://github.com/aristidb/aws/issues/262 I've only tested it against the default S3 endpoint.	2020-05-07 13:18:11 -04:00
Joey Hess	81e3faf810	Merge branch 'v7'	2020-02-26 18:15:18 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	67476fbc54	minor code simplification	2020-02-25 13:06:09 -04:00
Joey Hess	1883f7ef8f	support git remotes that need http basic auth using git credential to get the password One thing this doesn't do is wrap the password prompting inside the prompt action. So with -J, the output can be a bit garbled.	2020-01-22 16:16:19 -04:00
Joey Hess	2be4122bfc	include passthrough params in --describe-other-params	2020-01-20 16:53:27 -04:00
Joey Hess	7038acf96c	add descriptions for all remote config fields not yet used	2020-01-20 15:20:04 -04:00
Joey Hess	99cb3e75f1	add LISTCONFIGS to external special remote protocol Special remote programs that use GETCONFIG/SETCONFIG are recommended to implement it. The description is not yet used, but will be useful later when adding a way to make initremote list all accepted configs. configParser now takes a RemoteConfig parameter. Normally, that's not needed, because configParser returns a parter, it does not parse it itself. But, it's needed to look at externaltype and work out what external remote program to run for LISTCONFIGS. Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported used to use NoUUID. The code that now checks for Nothing used to behave in some undefined way if the external program made requests that triggered it. Also, note that in externalSetup, once it generates external, it parses the RemoteConfig strictly. That generates a ParsedRemoteConfig, which is thrown away. The reason it's ok to throw that away, is that, if the strict parse succeeded, the result must be the same as the earlier, lenient parse. initremote of an external special remote now runs the program three times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least one of those, and it should be possible to only run the program once.	2020-01-17 16:07:17 -04:00
Joey Hess	907ca937ab	use more field functions Using field functions consistently avoids possibility of typos and also helps ensure that all fields are added to RemoteConfigParsers (as long as I have remembered to add them when writing the functions).	2020-01-15 11:15:07 -04:00
Joey Hess	7f2bfd41d7	include credPairRemoteFields in RemoteConfigParsers Avoids parse error when the fields are added to RemoteConfig at setup time and it then gets parsed, also at setup time. After setup time, such internally added fields are not a problem, because they're Accepted. So it may not be necessary in all cases to list such internally added fields, but I think it's a good idea to always do so.	2020-01-15 10:57:45 -04:00
Joey Hess	0706d9d093	finish porting S3	2020-01-15 10:52:28 -04:00
Joey Hess	c4ea3ca40a	ported almost all remotes, until my brain melted external is not started yet, and S3 is part way through and not compiling yet	2020-01-14 15:41:34 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log This does not eliminate every place where a remote config is parsed and a default value is used if the parse false. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	650a631ef8	include all remotes back in	2019-12-02 12:26:33 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	890330f0fe	make --json-error-messages capture url download errors Convert Utility.Url to return Either String so the error message can be displated in the annex monad and so captured. (When curl is used, its errors are still not caught.)	2019-11-12 13:52:38 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different peices of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generate in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	c3975ff3b4	sameas RemoteConfig inheritance I found a way to avoid inheritance complicating anything outside of Logs.Remote. It seems fine to require all inherited values to be inherited and not set in the sameas remote's config. Since inherited values will be used for stuff like encryption and perhaps chunking, which control the actual content stored on the remote, it seems likely that there will not be any reason to need them to vary between two remotes that access the same underlying data store. The newer version of containers is free; the minimum ghc version is bundled with a newer version than that.	2019-10-10 15:58:22 -04:00

1 2 3 4 5 ...

296 commits