git-annex

Author	SHA1	Message	Date
Joey Hess	3ea835c7e8	proxied exporttree=yes versionedexport=yes remotes are not untrusted This removes versionedExport, which was only used by the S3 special remote. Instead, versionedexport=yes is a common way for remotes to indicate that they are versioned.	2024-08-08 15:24:19 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	7cef5e8f35	export tree: avoid confusing output about renaming files When a file in the export is renamed, and the remote's renameExport returned Nothing, renaming to the temp file would first say it was renaming, and appear to succeed, but actually what it did was delete the file. Then renaming from the temp file would not do anything, since the temp file is not present on the remote. This appeared as if a file got renamed to a temp file and left there. Note that exporttree=yes importree=yes remotes have their usual renameExport replaced with one that returns Nothing. (For reasons explained in Remote.Helper.ExportImport.) So this happened even with remotes that support renameExport. Fix by letting renameExport = Nothing when it's not supported at all. This avoids displaying the rename. Sponsored-by: Graham Spencer on Patreon	2024-03-09 13:50:26 -04:00
Joey Hess	9286769d2c	let Remote.availability return Unavilable This is groundwork for making special remotes like borg be skipped by sync when on an offline drive. Added AVAILABILITY UNAVAILABLE reponse and the UNAVAILABLERESPONSE extension to the external special remote protocol. The extension is needed because old git-annex, if it sees that response, will display a warning message. (It does continue as if the remote is globally available, which is acceptable, and the warning is only displayed at initremote due to remote.name.annex-availability caching, but still it seemed best to make this a protocol extension.) The remote.name.annex-availability git config is no longer used any more, and is documented as such. It was only used by external special remotes to cache the availability, to avoid needing to start the external process every time. Now that availability is queried as an Annex action, the external is only started by sync (and the assistant), when they actually check availability. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-08-16 14:31:31 -04:00
Joey Hess	11db986781	enable TypeOperators The use of ‘~’ without TypeOperators will become an error in a future GHC release. says new ghc	2023-08-01 18:33:39 -04:00
Joey Hess	33ba537728	deal with Amazon S3 breaking change for public=yes * S3: Amazon S3 buckets created after April 2023 do not support ACLs, so public=yes cannot be used with them. Existing buckets configured with public=yes will keep working. * S3: Allow setting publicurl=yes without public=yes, to support buckets that are configured with a Bucket Policy that allows public access. Sponsored-by: Joshua Antonishen on Patreon	2023-07-21 13:59:07 -04:00
Joey Hess	8b6c7bdbcc	filter out control characters in all other Messages This does, as a side effect, make long notes in json output not be indented. The indentation is only needed to offset them underneath the display of the file they apply to, so that's ok. Sponsored-by: Brock Spratlen on Patreon	2023-04-11 12:58:01 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	143de24769	fix build warning	2023-02-06 16:30:50 -04:00
Joey Hess	04ec726d3b	S3 region= S3: Support a region= configuration useful for some non-Amazon S3 implementations. This feature needs git-annex to be built with aws-0.24. datacenter= sets both the AWS hostname and region in one setting, which is easy when using AWS, but not useful for other hosts. So kept datacenter as-is, but added this additional config. Sponsored-By: Brett Eisenberg on Patreon	2023-02-06 14:08:45 -04:00
Joey Hess	cfaae7e931	added an optional cost= configuration to all special remotes Note that when this is specified and an older git-annex is used to enableremote such a special remote, it will simply ignore the cost= field and use whatever the default cost is. In passing, fixed adb to support the remote.name.cost and remote.name.cost-command configs. Sponsored-by: Dartmouth College's DANDI project	2023-01-12 13:42:28 -04:00
Joey Hess	27444459e9	fix build warning	2022-11-09 15:33:46 -04:00
Joey Hess	c69d340ce5	remove dangling where clause	2022-11-09 13:25:05 -04:00
Joey Hess	e100993935	complete support for S3 signature=anonymous aws-0.23 has been released. When built with an older aws, initremote will error out when run with signature=anonymous. And when a remote has been initialized with that by a version of git-annex that does support it, older versions will fail when the remote is accessed, with a useful error message. Sponsored-by: Dartmouth College's DANDI project	2022-11-04 16:20:28 -04:00
Joey Hess	de1e8201a6	Merge branch 'master' into anons3	2022-11-04 15:08:29 -04:00
Joey Hess	b4305315b2	S3: pass fileprefix into getBucket calls S3: Speed up importing from a large bucket when fileprefix= is set by only asking for files under the prefix. getBucket still returns the files with the prefix included, so the rest of the fileprefix stripping still works unchanged. Sponsored-by: Dartmouth College's DANDI project	2022-10-10 17:37:26 -04:00
Joey Hess	ca91c3ba91	S3: Support signature=anonymous to access a S3 bucket anonymously This can be used, for example, with importtree=yes to import from a public bucket. This needs a patch that has not yet landed in the aws library, and will need to be adjusted to support compiling with old versions of the library, so is not yet suitable for merging. See https://github.com/aristidb/aws/pull/281 The stack.yaml changes are provided to show how to build against the aws fork and will need to be reverted as well. Sponsored-by: Dartmouth College's DANDI project	2022-10-10 17:02:45 -04:00
Joey Hess	90f9671e00	future proof AWS.Credentials generation Avoid breaking when a field is added to the constructor. Sponsored-by: Dartmouth College's DANDI project	2022-10-10 16:33:21 -04:00
Joey Hess	0ffc59d341	change retrieveExportWithContentIdentifier to take a list of ContentIdentifier This partly fixes an issue where there are duplicate files in the special remote, and the first file gets swapped with another duplicate, or deleted. The swap case is fixed by this, the deleted case will need other changes. This makes retrieveExportWithContentIdentifier take a list of allowed ContentIdentifier, same as storeExportWithContentIdentifier, removeExportWithContentIdentifier, and checkPresentExportWithContentIdentifier. Of the special remotes that support importtree, borg is a special case and does not use content identifiers, S3 I assume can't get mixed up like this, directory certainly has the problem, and adb also appears to have had the problem. Sponsored-by: Graham Spencer on Patreon	2022-09-20 13:19:42 -04:00
Joey Hess	093ad89ead	S3: Avoid writing or checking the uuid file in the S3 bucket when importtree=yes or exporttree=yes It does not make sense for either; importing from an existing bucket should not write to it. And the user may not have write access at all. And exporting to a bucket should not write other files. Also this prevents the uuid file being imported after being written. Sponsored-by: Dartmouth College's DANDI project	2022-07-14 15:05:51 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	e8a601aa24	incremental verification for retrieval from import remotes Sponsored-by: Dartmouth College's Datalad project	2022-05-09 15:39:43 -04:00
Joey Hess	2f2701137d	incremental verification for retrieval from all export remotes Only for export remotes so far, not export/import. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 13:49:33 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrievin from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor innefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	4f42292b13	improve url download failure display * When downloading urls fail, explain which urls failed for which reasons. * web: Avoid displaying a warning when downloading one url failed but another url later succeeded. Some other uses of downloadUrl use urls that are effectively internal use, and should not all be displayed to the user on failure. Eg, Remote.Git tries different urls where content could be located depending on how the remote repo is set up. Exposing those urls to the user would lead to wild goose chases. So had to parameterize it to control whether it displays urls or not. A side effect of this change is that when there are some youtube urls and some regular urls, it will try regular urls first, even if the youtube urls are listed first. This seems like an improvement if anything, but in any case there's no defined order of urls that it's supposed to use. Sponsored-by: Dartmouth College's Datalad project	2021-09-01 15:33:38 -04:00
Joey Hess	f5e09a1dbe	incremental verification for S3 Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:07:00 -04:00
Joey Hess	d154e7022e	incremental verification for web special remote Except when configuration makes curl be used. It did not seem worth trying to tail the file when curl is downloading. But when an interrupted download is resumed, it does not read the whole existing file to hash it. Same reason discussed in commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long time with no progress being displayed. And also there's an open http request, which needs to be consumed; taking a long time to hash the file might cause it to time out. Also in passing implemented it for git and external special remotes when downloading from the web. Several others like S3 are within striking distance now as well. Sponsored-by: Dartmouth College's DANDI project	2021-08-18 15:02:22 -04:00
Joey Hess	b1622eb932	incremental verify for directory special remote Added fileRetriever', which will let the remaining special remotes eventually also support incremental verify. Sponsored-by: Dartmouth College's DANDI project	2021-08-16 16:51:33 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not amoung the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	f8836306fa	remove "checking remotename" message This fixes fsck of a remote that uses chunking displaying (checking remotename) (checking remotename)" for every chunk. Also, some remotes displayed the message, and others did not, with no consistency. It was originally displayed only when accessing remotes that were expensive or might involve a password prompt, I think, but nothing in the API said when to do it so it became an inconsistent mess. Originally I thought fsck should always display it. But it only displays in fsck --from remote, so the user knows the remote is being accessed, so there is no reason to tell them it's accessing it over and over. It was also possible for git-annex move to sometimes display it twice, due to checking if content is present twice. But, the user of move specifies --from/--to, so it does not need to display when it's accessing the remote, as the user expects it to access the remote. git-annex get might display it, but only if the remote also supports hasKeyCheap, which is really only local git remotes, which didn't display it always; and in any case nothing displayed it before hasKeyCheap, which is checked first, so I don't think this needs to display it ever. mirror is like move. And that's all the main places it would have been displayed. This commit was sponsored by Jochen Bartl on Patreon.	2021-04-27 13:05:27 -04:00
Joey Hess	13c090b37a	use fastDebug everywhere it can be used None of these are likely to yeild a noticable speedup though.	2021-04-06 15:41:24 -04:00
Joey Hess	aaba83795b	switch from hslogger to purpose-built Utility.Debug This uses a DebugSelector, rather than debug levels, which will allow for a later option like --debug-from=Process to only see debuging about running processes. The module name that contains the thing being debugged is used as the DebugSelector (in most cases; does not need to be a hard and fast rule). Debug calls were changed to add that. hslogger did not display that first parameter to debugM, but the DebugSelector does get displayed. Also fastDebug will allow doing debugging in places that are used in tight loops, with the DebugSelector coming from the Annex Reader essentially for free. Not done yet.	2021-04-05 13:40:31 -04:00
Joey Hess	0e44c252c8	avoid getting creds from environment during autoenable When autoenabling special remotes of type S3, weddav, or glacier, do not take login credentials from environment variables, as the user may not be expecting the autoenable to happen, and may have those set for other purposes.	2021-03-17 09:41:12 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	4f9969d0a1	optimisation for borg Skip needing to list importable contents when unchanged since last time.	2020-12-22 15:00:05 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, that the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	e9db382308	avoid redundant set of a S3 verison ID that is already recorded I think this could cause unnecessary changes to the git-annex branch, and retrieveExportWithContentIdentifier is now also used for getting content from importtree=yes remotes, so it would happen more frequently so let's avoid.	2020-12-17 16:49:17 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	5844a54869	aws-0.22 improved its support for setting etags, which improves support for versioned S3 buckets. Remove placeholder version number I used when implementing the feature in aws. This commit was sponsored by Ethan Aubin.	2020-09-14 18:37:49 -04:00
Joey Hess	ddf963d019	deepseq all things returned from ResourceT http Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/ although I don't know if it does. My thinking is, ResourceT may allocate a resource and then free it, and a unforced thunk to that resource could result in reading memory that has since been overwritten by something else, or in a SEGV, depending. While that seems kind of like a bug in ResourceT to me, if it is what's happening, this will avoid it. If it's not, this doesn't really hurt much since the values are all smallish. This commit was sponsored by Graham Spencer on Patreon.	2020-09-14 18:30:06 -04:00
Joey Hess	ddcab38e4a	no importKey for S3 for now The Etag is sometimes a md5, but not if eg, there was a multipart upload. May revisit later if there's demand.	2020-07-03 13:53:14 -04:00
Joey Hess	3175015d1b	lockContent for S3 (with versioning=yes) and git-lfs Made several special remotes support locking content on them while dropping, which allows dropping from another special remote when the content will only remain on a special remote of these types. In both cases, verify the content is present actively, because it's certianly possible for things other than git-annex to have removed it. Worth thinking about what to do if at some later point, git-lfs gains support for dropping content, and a content locking operation. That would probably need a transition; first would need to make lockContent use the locking operation. Then, once enough time had passed that we can assume any git-annex operating on the git-lfs remote had that change, git-annex could finally allow dropping from git-lfs. Or, it could be that git-lfs gains support for dropping content, but not locking it. In that case, it seems this commit would need to be reverted, and then wait long enough for that git-annex to be everywhere, and only then can git-annex safely support dropping from git-lfs. So, the assumption made in this commit could lead to bother later.. But I think it's actually highly unlikely git-lfs does ever support dropping; it's outside their centralized model. Probably. :) Worth keeping in mind as the same assumption is made about other special remotes though. This commit was sponsored by Ethan Aubin.	2020-06-26 13:46:42 -04:00
Joey Hess	ad81feb053	fix implicit embedcreds regression Fix bug that made creds not be stored in git when a special remote was initialized with gpg encryption, but without an explicit embedcreds=yes. (Yet nother regression introduced in version 7.20200202.7. 5th so far.)	2020-06-16 18:00:19 -04:00
Joey Hess	41952204ce	S3: The REDUCED_REDUNDANCY storage class is no longer cheaper So stop documenting it, and stop offering it as a choice in the assistant. Removed the code that parses it into S3.ReducedRedundancy, because S3.OtherStorageClass with the value will work just the same and avoids a special case for a deprecated this.	2020-06-16 12:04:29 -04:00
Joey Hess	e63dcbf36c	fix embedcreds=yes reversion Fix bug that made enableremote of S3 and webdav remotes, that have embedcreds=yes, fail to set up the embedded creds, so accessing the remotes failed. (Regression introduced in version 7.20200202.7 in when reworking all the remote configs to be parsed.) Root problem is that parseEncryptionConfig excludes all other config keys except encryption ones, so it is then unable to find the credPairRemoteField. And since that field is not required to be present, it proceeds as if it's not, rather than failing in any visible way. This causes it to not find any creds, and so it does not cache them. When when the S3 remote tries to make a S3 connection, it finds no creds, so assumes it's being used in no-creds mode, and tries to find a public url. With no public url available, it fails, but the failure doesn't say a lack of creds is the problem. Fix is to provide setRemoteCredPair with a ParsedRemoteConfig, so the full set of configs of the remote can be parsed. A bit annoying to need to parse the remote config before the full config (as returned by setRemoteCredPair) is available, but this avoids the problem. I assume webdav also had the problem by inspection, but didn't try to reproduce it with it. Also, getRemoteCredPair used getRemoteConfigValue to get a ProposedAccepted String, but that does not seem right. Now that it runs that code, it crashed saying it had just a String. Remotes that have already been enableremoted, and so lack the cached creds file will work after this fix, because getRemoteCredPair will extract the creds from the remote config, writing the missing file. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-21 14:35:30 -04:00
Joey Hess	6361074174	convert renameExport to throw exception Finishes the transition to make remote methods throw exceptions, rather than silently hide them. A bit on the fence about this one, because when renameExport fails, it falls back to deleting instead, and so does the user care why it failed? However, it did let me clean up several places in the code. This commit was sponsored by Ethan Aubin.	2020-05-15 15:08:09 -04:00

1 2 3 4 5 ...

321 commits