git-annex

Author	SHA1	Message	Date
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	cfaae7e931	added an optional cost= configuration to all special remotes Note that when this is specified and an older git-annex is used to enableremote such a special remote, it will simply ignore the cost= field and use whatever the default cost is. In passing, fixed adb to support the remote.name.cost and remote.name.cost-command configs. Sponsored-by: Dartmouth College's DANDI project	2023-01-12 13:42:28 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	2c1288334d	Avoid running bup join concurrently with bup split On the bup mailing list, this was hypothesized as also being a concurrency problem. Sponsored-by: Svenne Krap on Patreon	2022-08-09 10:40:45 -04:00
Joey Hess	abd417d4fe	Avoid running multiple bup split processes concurrently Since bup split is not concurrency safe. Used a lock file so that 2 git-annex processes only run one bup split between them (per bup repo). (Concurrent writes from different git-annex repository clones to the same bup repo could still have concurrency problems.) Sponsored-by: Noam Kremen on Patreon	2022-08-08 18:54:06 -04:00
Joey Hess	5bc70e2da5	When bup split fails, display its stderr It seems worth noting here that I emailed bup's author about bup split being noisy on stderr even with -q in approximately 2011. That never got fixed. Its current repo on github only accepts pull requests, not bug reports. Needing to add such complexity to deal with such a longstanding unfixed issue is not fun. Sponsored-by: Kevin Mueller on Patreon	2022-08-05 13:57:20 -04:00
Joey Hess	f94908f2a6	improve output when storing to bup bup split outputs to stderr even with -q. This was discarded when using -J, but it was still outputting when not using -J, and so was git-annex. Sponsored-by: Nicholas Golder-Manning on Patreon	2022-08-05 12:29:33 -04:00
Joey Hess	c20358b671	incremental verify for byteRetriever special remotes Several special remotes verify content while it is being retrieved, avoiding a separate checksum pass. They are: S3, bup, ddar, and gcrypt (with a local repository). Not done when using chunking, yet. Complicated by Retriever needing to change to be polymorphic. Which in turn meant RankNTypes is needed, and also needed some code changes. The change in Remote.External does not change behavior at all but avoids the type checking failing because of a "rigid, skolem type" which "would escape its scope". So I refactored slightly to make the type checker's job easier there. Unfortunately, directory uses fileRetriever (except when chunked), so it is not among the improved ones. Fixing that would need a way for FileRetriever to return a Verification. But, since the file retrieved may be encrypted or chunked, it would be extra work to always incrementally checksum the file while retrieving it. Hm. Some other special remotes use fileRetriever, and so don't get incremental verification, but could be converted to byteRetriever later. One is GitLFS, which uses downloadConduit, which writes to the file, so could verify as it goes. Other special remotes like web could too, but don't use Remote.Helper.Special and so will need to be addressed separately. Sponsored-by: Dartmouth College's DANDI project	2021-08-11 14:20:38 -04:00
Joey Hess	f8836306fa	remove "checking remotename" message This fixes fsck of a remote that uses chunking displaying "(checking remotename) (checking remotename)" for every chunk. Also, some remotes displayed the message, and others did not, with no consistency. It was originally displayed only when accessing remotes that were expensive or might involve a password prompt, I think, but nothing in the API said when to do it so it became an inconsistent mess. Originally I thought fsck should always display it. But it only displays in fsck --from remote, so the user knows the remote is being accessed, so there is no reason to tell them it's accessing it over and over. It was also possible for git-annex move to sometimes display it twice, due to checking if content is present twice. But, the user of move specifies --from/--to, so it does not need to display when it's accessing the remote, as the user expects it to access the remote. git-annex get might display it, but only if the remote also supports hasKeyCheap, which is really only local git remotes, which didn't display it always; and in any case nothing displayed it before hasKeyCheap, which is checked first, so I don't think this needs to display it ever. mirror is like move. And that's all the main places it would have been displayed. This commit was sponsored by Jochen Bartl on Patreon.	2021-04-27 13:05:27 -04:00
Joey Hess	62e152f210	incremental checksum on download from ssh or p2p Checksum as content is received from a remote git-annex repository, rather than doing it in a second pass. Not tested at all yet, but I imagine it will work! Not implemented for any special remotes, and also not implemented for copies from local remotes. It may be that, for local remotes, it will suffice to use rsync, rely on its checksumming, and simply return Verified. (It would still make a checksumming pass when cp is used for COW, I guess.)	2021-02-09 17:03:27 -04:00
Joey Hess	3a66cd715f	avoid making absolute git remote path relative When a git remote is configured with an absolute path, use that path, rather than making it relative. If it's configured with a relative path, use that. Git.Construct.fromPath changed to preserve the path as-is, rather than making it absolute. And Annex.new changed to not convert the path to relative. Instead, Git.CurrentRepo.get generates a relative path. A few things that used fromAbsPath unnecessarily were changed in passing to use fromPath instead. I'm seeing fromAbsPath as a security check, while before it was being used in some cases when the path was known absolute already. It may be that fromAbsPath is not really needed, but only git-annex-shell uses it now, and I'm not 100% sure that there's not some input that would cause a relative path to be used, opening a security hole, without the security check. So left it as-is. Test suite passes and strace shows the configured remote url is used unchanged in the path into it. I can't be 100% sure there's not some code somewhere that takes an absolute path to the repo and converts it to relative and uses it, but it seems pretty unlikely that the code paths used for a git remote would call such code. One place I know of is gitAnnexLink, but I'm pretty sure that git remotes never deal with annex symlinks. If that did get called, it generates a path relative to cwd, which would have been wrong before this change as well, when operating on a remote.	2021-02-08 13:18:01 -04:00
Joey Hess	36133f27c0	move untrust forcing from Logs.Trust into Remote No behavior changes here, but this is groundwork for letting remotes such as borg vary untrust forcing depending on configuration.	2020-12-28 15:22:10 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	ca80c3154c	more RawFilePath conversion removeFile changed to removeLink, because AFAICS it should be fine to remove non-file things here. In particular, it's fine to remove a symlink, since we're about to write a symlink. (removeLink does not remove directories, so file, symlink, and unix socket are the only possibilities.)	2020-10-30 13:07:41 -04:00
Joey Hess	3ed797be0f	fix reversion From back in `4be94c67c7`. Caused the test suite to fail, when bup is installed, but was not noticed since the autobuilds don't have bup.	2020-06-05 19:06:09 -04:00
Joey Hess	ef0024444b	fix reversion It was not the wrong handle. The handle was not being closed, so bup kept running. Before `2670890b17`, the code was: `withHandle StdinHandle createProcessSuccess cmd feeder`. The stdin handle was not closed by the feeder. Testing this: `withHandle StdinHandle createProcessSuccess (proc "cat" []) (\h -> hPutStrLn h "hi")`. There's a rather long pause, a couple seconds, before it completes, but it does complete. With `hClose h`, it immediately completes. This must be the GC noticing that h is out of scope and closing it. It seems likely that the old code worked only by that accident. So, other similar changes made in that and nearby commits may also have this problem, and need to explicitly close handles that were somehow implicitly closed before.	2020-06-05 17:10:52 -04:00
Joey Hess	291774779f	use right handle	2020-06-05 16:45:12 -04:00
Joey Hess	2670890b17	convert to withCreateProcess for async exception safety This handles all createProcessSuccess callers, and aside from process pools, the complete conversion of all process running to async exception safety should be complete now. Also, was able to remove from Utility.Process the old API that I now know was not a good idea. And proof it was bad: The code size went down, despite there being a fair bit of boilerplate for some future API to reduce.	2020-06-04 15:45:52 -04:00
Joey Hess	438dbe3b66	convert to withCreateProcess for async exception safety This handles all sites where checkSuccessProcess/ignoreFailureProcess is used, except for one: Git.Command.pipeReadLazy That one will be significantly more work to convert to bracketing. (Also skipped Command.Assistant.autoStart, but it does not need to shut down the processes it started on exception because they are git-annex assistant daemons.) forceSuccessProcess is done, except for createProcessSuccess. All call sites of createProcessSuccess will need to be converted to bracketing. (process pools still todo also)	2020-06-04 12:44:09 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw an exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	b50ee9cd0c	remove Preparer abstraction That had almost no benefit at all, and complicated things quite a lot. What I probably wanted this to be was something like ResourceT, but it was not. The few remotes that actually need some preparation done only once and reused used a MVar and not Preparer.	2020-05-13 11:56:21 -04:00
Joey Hess	f85ca7dc80	fix all remaining -Wincomplete-uni-patterns warnings A couple of these were probably actual bugs in edge cases. Most of the changes I'm fine with. The fact that aeson's object returns something that we know will be an Object, but the type checker does not know is kind of annoying.	2020-04-15 13:55:08 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	7038acf96c	add descriptions for all remote config fields not yet used	2020-01-20 15:20:04 -04:00
Joey Hess	c4ea3ca40a	ported almost all remotes, until my brain melted external is not started yet, and S3 is part way through and not compiling yet	2020-01-14 15:41:34 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log This does not eliminate every place where a remote config is parsed and a default value is used if the parse fails. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.	2019-12-09 15:07:21 -04:00
Joey Hess	c20f4704a7	all commands building except for assistant also, changed ConfigValue to a newtype, and moved it into Git.Config.	2019-12-05 14:41:18 -04:00
Joey Hess	650a631ef8	include all remotes back in	2019-12-02 12:26:33 -04:00
Joey Hess	9828f45d85	add RemoteStateHandle This solves the problem of sameas remotes trampling over per-remote state. Used for: * per-remote state, of course * per-remote metadata, also of course * per-remote content identifiers, because two remote implementations could in theory generate the same content identifier for two different pieces of content While chunk logs are per-remote data, they don't use this, because the number and size of chunks stored is a common property across sameas remotes. External special remote had a complication, where it was theoretically possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or EXPORTSUPPORTED. Since the uuid of the remote is typically generated in Remote.setup, it would only be possible to pass a Maybe RemoteStateHandle into it, and it would otherwise have to construct its own. Rather than go that route, I decided to send an ERROR in this case. It seems unlikely that any existing external special remote will be affected. They would have to make up a git-annex key, and set state for some reason during INITREMOTE. I can imagine such a hack, but it doesn't seem worth complicating the code in such an ugly way to support it. Unfortunately, both TestRemote and Annex.Import needed the Remote to have a new field added that holds its RemoteStateHandle.	2019-10-14 13:51:42 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	ccc0684d21	no remotes support import yet	2019-02-20 16:59:04 -04:00
Joey Hess	d3ab5e626b	rename key2file and file2key What these generate is not really suitable to be used as a filename, which is why keyFile and fileKey further escape it. These are just serializing Keys. Also removed a quickcheck test that was very unlikely to test anything useful, since it relied on random chance creating something that looks like a serialized key. The other test is sufficient for testing what that was intended to test anyway.	2019-01-14 13:03:35 -04:00
Joey Hess	8b39db20b5	export appendonly support Make `git annex export` check appendonly when removing a file from an export, and not update the location log, since the remote still contains the content. This commit was supported by the NSF-funded DataLad project.	2018-08-30 11:18:20 -04:00
Joey Hess	02630b39ee	add Remote.readonly Does nothing yet. Considered making bup readonly, but while the content can't be removed, it is able to delete a branch, so didn't. This commit was supported by the NSF-funded DataLad project.	2018-08-30 11:12:18 -04:00
Joey Hess	4315bb9e42	add retrievalSecurityPolicy This will be used to protect against CVE-2018-10859, where an encrypted special remote is fed the wrong encrypted data, and so tricked into decrypting something that the user encrypted with their gpg key and did not store in git-annex. It also protects against CVE-2018-10857, where a remote follows a http redirect to a file:// url or to a local private web server. While that's already been prevented in git-annex's own use of http, external special remotes, hooks, etc use other http implementations and could still be vulnerable. The policy is not yet enforced, this commit only adds the appropriate metadata to remotes. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-06-21 11:36:36 -04:00
Joey Hess	67e46229a5	change Remote.repo to Remote.getRepo This is groundwork for letting a repo be instantiated the first time it's actually used, instead of at startup. The only behavior change is that some old special cases for xmpp remotes were removed. Where before git-annex silently did nothing with those no-longer supported remotes, it may now fail in some way. The additional IO action should have no performance impact as long as it's simply return. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon	2018-06-04 15:30:26 -04:00
Joey Hess	2927618d35	Added adb special remote which allows exporting files to Android devices. git annex testremote passes. exporttree not implemented yet, although the documentation talks about it, since it will be the main way this remote will be used. The adb push/pull progress is displayed for now; it would be better to consume it and use it to update the git-annex progress bar. This commit was sponsored by andrea rota.	2018-03-27 14:54:41 -04:00
Joey Hess	16eb2f976c	prevent exporttree=yes on remotes that don't support exports Don't allow "exporttree=yes" to be set when the special remote does not support exports. That would be confusing since the user would set up a special remote for exports, but `git annex export` to it would later fail. This commit was supported by the NSF-funded DataLad project.	2017-09-07 13:48:44 -04:00
Joey Hess	a4328b49d2	refactor ExportActions This will allow disabling exports for remotes that are not configured to allow them. Also, exportSupported will be useful for the external special remote to probe. This commit was supported by the NSF-funded DataLad project	2017-09-01 13:05:09 -04:00
Joey Hess	e55e445a36	add API for exporting Implemented so far for the directory special remote. Several remotes don't make sense to export to. Regular Git remotes, obviously, do not. Bup remotes almost certainly do not, since bup would need to be used to extract the export; same story for Ddar. Web and Bittorrent are download-only. GCrypt is always encrypted so exporting to it would be pointless. There's probably no point complicating the Hook remotes with exporting at this point. External, S3, Glacier, WebDAV, Rsync, and possibly Tahoe should be modified to support export. Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey interface, rather than adding a new interface. But, it seemed better to keep it separate, to avoid a complicated interface that sometimes encrypts/chunks key/value storage and sometimes uses non-key/value storage. Any common parts can be factored out. Note that storeExport is not atomic. doc/design/exporting_trees_to_special_remotes.mdwn has some things in the "resuming exports" section that bear on this decision. Basically, I don't think, at this time, that an atomic storeExport would help with resuming, because exports are not key/value storage, and we can't be sure that a partially uploaded file is the same content we're currently trying to export. Also, note that ExportLocation will always use unix path separators. This is important, because users may export from a mix of windows and unix, and it avoids complicating the API with path conversions, and ensures that in such a mix, they always use the same locations for exports. This commit was sponsored by Bruno BEAUFILS on Patreon.	2017-08-29 13:00:41 -04:00
Joey Hess	faecd73f32	Support GIT_SSH and GIT_SSH_COMMAND They are handled close to the same as they are by git. However, unlike git, git-annex sometimes needs to pass the -n parameter when using these. So, this has the potential for breaking some setup, and perhaps there ought to be an ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that if possible, so let's see if anyone complains. Almost all places where "ssh" was run have been changed to support the env vars. Anything still calling sshOptions does not support them. In particular, rsync special remotes don't. Seems that annex-rsync-transport already gives sufficient control there. (Fixed in passing: Remote.Helper.Ssh.toRepo used to extract remoteAnnexSshOptions and pass them to sshOptions, which was redundant since sshOptions also extracts those.) This commit was sponsored by Jeff Goeke-Smith on Patreon.	2017-03-17 16:20:37 -04:00
Joey Hess	f07af03018	Run ssh with -n whenever input is not being piped into it ... to avoid it consuming stdin that it shouldn't. This fixes git-annex-checkpresentkey --batch remote, which didn't output results for all keys passed into it. Other git-annex commands that communicate with a remote over ssh may also have been consuming stdin that they shouldn't have, which could have impacted using them in eg, shell scripts. For example, a shell script reading files from stdin and passing them to git annex drop would be impacted by this bug, whenever git annex drop ran git-annex-shell checkpresent, it would consume part/all of the stdin that the shell script was supposed to consume. Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which is used throughout git-annex to run ssh (in order for ssh connection caching to work). Every call site was checked to see if it used CreatePipe for stdin, and if not was marked NoConsumeStdin.	2017-02-15 15:08:46 -04:00
Joey Hess	5c804cf42e	add SetupStage parameter to RemoteType.setup Most remotes have an idempotent setup that can be reused for enableremote, but in a few cases, it needs to tell which, and whether a UUID was provided to setup was used. This is groundwork for making initremote be able to provide a UUID. It should not change any behavior. Note that it would be nice to make the UUID always be provided to setup, and make setup not need to generate and return a UUID. What prevented this simplification is Remote.Git.gitSetup, which needs to reuse the UUID of the git remote when setting it up, and so has to return that UUID. This commit was sponsored by Thom May on Patreon.	2017-02-07 14:55:58 -04:00
Joey Hess	9eb10caa27	Some optimisations to string splitting code. Turns out that Data.List.Utils.split is slow and makes a lot of allocations. Here's a much simpler single character splitter that behaves the same (even in wacky corner cases) while running in half the time and 75% the allocations. As well as being an optimisation, this helps move toward eliminating use of missingh. (Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and allocates even more.) I have not benchmarked the effect on git-annex, but would not be surprised to see some parsing of eg, large streams from git commands run twice as fast, and possibly in less memory. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2017-01-31 19:06:22 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for an error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	20bfbb28ac	improved refactoring ghc 8.0.1 didn't like runner because it used Rank2Types or something. Instead, factor out the feeder action.	2016-05-23 18:47:30 -04:00
Joey Hess	b9ce477fa2	plumb RemoteGitConfig through to decryptCipher	2016-05-23 17:33:32 -04:00
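Commit 9eb10caa27 above replaces the general list splitter `Data.List.Utils.split` with a single-character splitter. The sketch below shows that idea. It is a minimal illustration using only the Prelude. `splitOnChar` is a hypothetical name, and this is not git-annex's actual code. It is also not claimed to reproduce MissingH's behavior on every corner case; for example, this version returns `[""]` for empty input.

```haskell
-- Hypothetical single-character splitter, sketched after the idea in
-- commit 9eb10caa27; not the code actually committed to git-annex.
splitOnChar :: Eq a => a -> [a] -> [[a]]
splitOnChar c s = case break (== c) s of
  -- Found a separator: emit the field before it and continue after it.
  (field, _ : rest) -> field : splitOnChar c rest
  -- No separator left: the remainder is the final field.
  (field, [])       -> [field]

main :: IO ()
main = do
  print (splitOnChar ',' "a,,b")  -- ["a","","b"]
  print (splitOnChar ',' "a,")    -- ["a",""]
  print (splitOnChar ',' "")      -- [""]
```

Each step is a single `break (== c)` over the remaining input. A splitter on a separator list must instead test for a list prefix at every position. That extra work is plausibly where the speed and allocation difference comes from, though the sketch has not been benchmarked.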

173 commits