git-annex

Author	SHA1	Message	Date
Joey Hess	fe1b2dfb4b	speed up very first tree import by 25% Reading from the cidsdb is responsible for about 25% of the runtime of an import. Since the cidmap is used to store the same information in ram, the cidsdb is not written to during an import any longer. And so, if it started off empty (and updateFromLog wasn't needed), those reads can just be skipped. This is kind of a cheesy optimisation, since after any import from any special remote, the database will no longer be empty, so it's a single use optimisation. But it's probably not uncommon to start by importing a lot of files, and it can save a lot of time then. Sponsored-by: Brock Spratlen on Patreon	2023-06-02 13:30:30 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	0756f4453d	try retrieval from more than one export location when the first fails Combined with commit `0ffc59d341`, this fixes the case where there are duplicate files on the special remote, and the first gets modified/deleted, while the second is still present. directory, adb: Fixed a bug when importtree=yes, and multiple files in the special remote have the same content, that caused it to refuse to get a file from the special remote, incorrectly complaining that it had changed, due to only accepting the inode+mtime of one file (that was since modified or deleted) and not accepting the inode+mtime of other duplicate files. Sponsored-by: Max Thoursie on Patreon	2022-09-20 13:33:57 -04:00
Joey Hess	0ffc59d341	change retrieveExportWithContentIdentifier to take a list of ContentIdentifier This partly fixes an issue where there are duplicate files in the special remote, and the first file gets swapped with another duplicate, or deleted. The swap case is fixed by this, the deleted case will need other changes. This makes retrieveExportWithContentIdentifier take a list of allowed ContentIdentifier, same as storeExportWithContentIdentifier, removeExportWithContentIdentifier, and checkPresentExportWithContentIdentifier. Of the special remotes that support importtree, borg is a special case and does not use content identifiers, S3 I assume can't get mixed up like this, directory certainly has the problem, and adb also appears to have had the problem. Sponsored-by: Graham Spencer on Patreon	2022-09-20 13:19:42 -04:00
Yaroslav Halchenko	0151976676	Typo fix unncessary -> unnecessary. Detected while reading recent CHANGELOG entry but then decided to apply to entire codebase and docs since why not?	2022-08-20 09:40:19 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	54809e9eb3	fix untrustworthiness of import/export remotes Commit `36133f27c0` had a boolean flip in it, aaargh. Special remotes with importtree=yes or exporttree=yes are once again treated as untrusted, since files stored in them can be deleted or modified at any time. Sponsored-by: Kevin Mueller on Patreon	2022-05-09 15:53:23 -04:00
Joey Hess	e8a601aa24	incremental verification for retrieval from import remotes Sponsored-by: Dartmouth College's Datalad project	2022-05-09 15:39:43 -04:00
Joey Hess	90950a37e5	support incremental verification when retrieving from export/import remotes None of the special remotes do it yet, but this lays the groundwork. Added MustFinishIncompleteVerify so that, when an incremental verify is started but not complete, it can be forced to finish it. Otherwise, it would have skipped doing it when verification is disabled, but verification must always be done when retrievin from export remotes since files can be modified during retrieval. Note that retrieveExportWithContentIdentifier doesn't support incremental verification yet. And I'm not sure if it can -- it doesn't know the Key before it downloads the content. It seems a new API call would need to be split out of that, which is provided with the key. Sponsored-by: Dartmouth College's Datalad project	2022-05-09 12:25:04 -04:00
Joey Hess	f0754a61f5	plumb VerifyConfig into retrieveKeyFile This fixes the recent reversion that annex.verify is not honored, because retrieveChunks was passed RemoteVerify baser, but baser did not have export/import set up. Sponsored-by: Dartmouth College's DANDI project	2021-08-17 12:43:13 -04:00
Joey Hess	de482c7eeb	move verifyKeyContent to Annex.Verify The goal is that Database.Keys be able to use it; it can't use Annex.Content.Presence due to an import loop. Several other things also needed to be moved to Annex.Verify as a conseqence.	2021-07-27 14:07:23 -04:00
Joey Hess	0e44c252c8	avoid getting creds from environment during autoenable When autoenabling special remotes of type S3, weddav, or glacier, do not take login credentials from environment variables, as the user may not be expecting the autoenable to happen, and may have those set for other purposes.	2021-03-17 09:41:12 -04:00
Joey Hess	e10855e723	don't support dropping from thirdPartyPopulated for now This code I'm reverting works. But it has a problem: The export db and log and the ContentIdentifier db and log still list the content as being stored in the remote. So when I ran borg create again and stored the content in borg again in a new archive, git-annex sync noticed that, but since it didn't update the tree for the old archives, it then thought the content that had been removed from them was still in them, and so git-annex get failed in an ugly way: Include pattern 'tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' never matched. [2020-12-28 16:40:44.878952393] process [933616] done ExitFailure 1 user error (borg ["extract","/tmp/b::abs4","tmp/x/.git/annex/objects/pX/ZJ/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855/SHA256E-s0--e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"] exited 1) It does not seem worth it to update the git tree for the export when dropping content, that would make drop of many files very expensive in git tree objects created. So, let's not support this I suppose..	2020-12-28 16:48:38 -04:00
Joey Hess	bfdaee234f	support removal from thirdPartyPopulated also some other fixes to thirdPartyPopulated	2020-12-28 16:36:26 -04:00
Joey Hess	0990d74574	revert recent lockContent change lockContent should only be done when it's versioned	2020-12-28 16:05:14 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	5ce7fce74a	simplify adjustExportImport' is never called with both isexport and isimport False.	2020-12-28 15:06:47 -04:00
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	2e72590a48	avoid using export method when the remote only supports import	2020-12-23 13:40:56 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	706e2a63fb	fix logic error in thirdPartyPopulated handling	2020-12-21 13:24:07 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, that the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	f930176d6e	change info from export=yes to exporttree=yes and same for import for consistency	2020-12-17 17:06:50 -04:00
Joey Hess	77aedbef8b	fix call to warnExportImportConflict That needs a Remote that has the right export/import set up, not the input Remote, which does not yet.	2020-12-17 16:25:02 -04:00
Joey Hess	f2ecc6e0da	import remotes use ContentIdentifier for getting and checking content This is better than using the equivilant actions for export remotes, especially for getting content, since the ContentIdentifier checking means we can be sure (enough) that the content is valid to not force verification of content. Which allows getting keys of types that cannot be verified. Also, reorganized the internals of adjustExportImport which was becoming very hard to follow. Now it's clear what each method does in each case.	2020-12-17 15:55:31 -04:00
Joey Hess	5946e7136e	force verification after getting file from export remote This way, if annex.verify is disabled, it's still checked, since this is not a key/value store, it has to be checked.	2020-12-17 15:31:22 -04:00
Joey Hess	ceda8c0066	refactor common code	2020-12-17 14:17:09 -04:00
Joey Hess	4d2cd58ee5	provide missing remote actions for importree only remote Ah, it seemed too easy before when I was implementing importrree only, and it was because all the key-based actions needed to be handled too. Mostly copied from isexport, and this works. It does seem that an import remote could use retrieveExportWithContentIdentifier rather than retrieveExport, and checkPresentExportWithContentIdentifier rather than checkPresentExport, which would both be more accurate.	2020-12-17 13:46:34 -04:00
Joey Hess	6c890d62f6	initremote: Prevent enabling encryption with exporttree=yes/importtree=yes I do think this was a reversion, but I have not tracked back to what version. While involving the remote config, it's not the same class of problems that I kept having to chase down for a while after the remote config parser reworking.	2020-12-15 12:08:08 -04:00
Joey Hess	6a11b6fab8	Support special remotes that are configured with importtree=yes but without exporttree=yes There was no particular reason not to support this, other than maybe a lack of a use case. One use case would of course be a remote that you want to avoid overwriting content on. A new use case is the idea of importing from backups, eg borg, where exporting is not necessarily supported at all. This commit was sponsored by Brock Spratlen on Patreon.	2020-12-10 13:17:40 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory I need to check the rest and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	037440ef36	convert removeExportDirectory to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 14:43:18 -04:00
Joey Hess	cdbfaae706	change removeExport to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Graham Spencer on Patreon.	2020-05-15 14:15:14 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	4814b444dd	make storeExport throw exceptions	2020-05-15 12:20:02 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	e535da621c	Bugfix to getting content from an export remote with -J, when the export database was not yet populated. (cherry picked from commit `e520341500`)	2020-02-26 18:07:20 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	7038acf96c	add descriptions for all remote config fields not yet used	2020-01-20 15:20:04 -04:00
Joey Hess	99cb3e75f1	add LISTCONFIGS to external special remote protocol Special remote programs that use GETCONFIG/SETCONFIG are recommended to implement it. The description is not yet used, but will be useful later when adding a way to make initremote list all accepted configs. configParser now takes a RemoteConfig parameter. Normally, that's not needed, because configParser returns a parter, it does not parse it itself. But, it's needed to look at externaltype and work out what external remote program to run for LISTCONFIGS. Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported used to use NoUUID. The code that now checks for Nothing used to behave in some undefined way if the external program made requests that triggered it. Also, note that in externalSetup, once it generates external, it parses the RemoteConfig strictly. That generates a ParsedRemoteConfig, which is thrown away. The reason it's ok to throw that away, is that, if the strict parse succeeded, the result must be the same as the earlier, lenient parse. initremote of an external special remote now runs the program three times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least one of those, and it should be possible to only run the program once.	2020-01-17 16:07:17 -04:00
Joey Hess	c498269a88	convert configParser to Annex action and add passthrough option Needed so Remote.External can query the external program for its configs. When the external program does not support the query, the passthrough option will make all input fields be available.	2020-01-14 13:52:03 -04:00
Joey Hess	963239da5c	separate RemoteConfig parsing basically working Many special remotes are not updated yet and are commented out.	2020-01-14 12:35:08 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log This does not eliminate every place where a remote config is parsed and a default value is used if the parse false. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different peices of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generate in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	5004381dd9	improve error display when storing to an export/import remote fails Prompted by the test suite on windows failing to with "export foo failed" and no information about what went wrong. Note that only storeExportWithContentIdentifier has been converted. storeExport still returns a Bool and so exceptions may be hidden. However, storeExportWithContentIdentifier has many more failure modes, since it needs to avoid overwriting modified files. So it's more important it have better error display.	2019-08-13 12:05:00 -04:00
Joey Hess	15bd7d57ca	info: Show when a remote is configured with importtree	2019-04-23 14:27:43 -04:00

1 2

71 commits