git-annex

Author	SHA1	Message	Date
Joey Hess	9f3a346f25	fix nested exception bug Fix reversion introduced in version 6.20180316 that caused git-annex to stop processing files when unable to contact a ssh remote. The bug was not in any of the changed lines, but this one in inAnnex: P2PHelper.checkpresent (Ssh.runProto rmt connpool (cantCheck rmt) fallback) key cantCheck throws an exception, but that parameter to runProto expects a value, which it returns. So, inAnnex is returning a Bool containing an exception. This defeats the usual checks for checkPresent throwing an exception, crashing git-annex. Fixed by making runProto take an `Annex a` instead of an `a`, so passing cantCheck to it doesn't nest exceptions. This commit was sponsored by andrea rota.	2018-07-03 13:10:43 -04:00
Joey Hess	b657242f5d	enforce retrievalSecurityPolicy Leveraged the existing verification code by making it also check the retrievalSecurityPolicy. Also, prevented getViaTmp from running the download action at all when the retrievalSecurityPolicy is going to prevent verifying and so storing it. Added annex.security.allow-unverified-downloads. A per-remote version would be nice to have too, but would need more plumbing, so KISS. (Bill the Cat reference not too over the top I hope. The point is to make this something the user reads the documentation for before using.) A few calls to verifyKeyContent and getViaTmp, that don't involve downloads from remotes, have RetrievalAllKeysSecure hard-coded. It was also hard-coded for P2P.Annex and Command.RecvKey, to match the values of the corresponding remotes. A few things use retrieveKeyFile/retrieveKeyFileCheap without going through getViaTmp. * Command.Fsck when downloading content from a remote to verify it. That content does not get into the annex, so this is ok. * Command.AddUrl when using a remote to download an url; this is new content being added, so this is ok. This commit was sponsored by Fernando Jimenez on Patreon.	2018-06-21 13:37:01 -04:00
Joey Hess	4315bb9e42	add retrievalSecurityPolicy This will be used to protect against CVE-2018-10859, where an encrypted special remote is fed the wrong encrypted data, and so tricked into decrypting something that the user encrypted with their gpg key and did not store in git-annex. It also protects against CVE-2018-10857, where a remote follows a http redirect to a file:// url or to a local private web server. While that's already been prevented in git-annex's own use of http, external special remotes, hooks, etc use other http implementations and could still be vulnerable. The policy is not yet enforced, this commit only adds the appropriate metadata to remotes. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-06-21 11:36:36 -04:00
Joey Hess	0f566ed242	removal of the rest of remoteGitConfig In keyUrls, the GitConfig is used only by annexLocations to support configured Differences. Since such configurations affect all clones of a repository, the local repo's GitConfig must have the same information as the remote's GitConfig would have. So, used getGitConfig to get the local GitConfig, which is cached and so available cheaply. That actually fixed a bug no one had ever noticed: keyUrls is used for remotes accessed over http. The full git config of such a remote is normally not available, so the remoteGitConfig that keyUrls used would not have the necessary information in it. In copyFromRemoteCheap', it uses gitAnnexLocation, which does need the GitConfig of the remote repo itself in order to check if it's crippled, supports symlinks, etc. So, made the State include that GitConfig, cached. The use of gitAnnexLocation is within a (not $ Git.repoIsUrl repo) guard, so it's local, and so its git config will always be read and available. (Note that gitAnnexLocation in turn calls annexLocations, so the Differences config it uses in this case comes from the remote repo's GitConfig and not from the local repo's GitConfig. As explained above this is ok since they must have the same value.) Not very happy with this mess of different GitConfigs not type-safe and some read only sometimes etc. Very hairy. Think I got this change right. Test suite passes. This commit was sponsored by Ethan Aubin.	2018-06-05 14:48:37 -04:00
Joey Hess	fc5888300f	fix annex-checkuuid Fixed annex-checkuuid implementation, so that remotes configured that way can be used. This was 100% broken from the first commit of it, oops. This commit was sponsored by Øyvind Andersen Holm.	2018-06-04 16:52:22 -04:00
Joey Hess	67e46229a5	change Remote.repo to Remote.getRepo This is groundwork for letting a repo be instantiated the first time it's actually used, instead of at startup. The only behavior change is that some old special cases for xmpp remotes were removed. Where before git-annex silently did nothing with those no-longer supported remotes, it may now fail in some way. The additional IO action should have no performance impact as long as it's simply return. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon	2018-06-04 15:30:26 -04:00
Joey Hess	c34152777b	Use http-conduit for url downloads by default, annex.web-options enables curl * For url downloads, git-annex now defaults to using a http library, rather than wget or curl. But, if annex.web-options is set, it will use curl. To use the .netrc file, run: git config annex.web-options --netrc * git-annex no longer uses wget (and wget is no longer shipped with git-annex builds). Note that curl is always run in silent mode, since the new API for download has a MeterUpdate and doesn't make way for curl progress output. It might be worth writing a parser for curl's progress output to update the meter when using it, but I didn't bother with this edge case for now. This commit was supported by the NSF-funded DataLad project.	2018-04-06 17:36:20 -04:00
Joey Hess	9b98d3f630	better HTTP connection reuse Enable HTTP connection reuse across multiple files, when git-annex uses http-conduit. Before, a new Manager was created each time Utility.Url used it. Now, a single Manager gets created the first time, so connections are reused. Doesn't help when external programs are used for url download, but does speed up addurl --fast, fsck --from web, etc. Testing fsck --fast --from web with 3 files, over high-latency satellite internet, it sped up from 19.37s to 14.96s. This commit was supported by the NSF-funded DataLad project.	2018-04-04 15:39:40 -04:00
Joey Hess	2ec07bc29f	Avoid running annex.http-headers-command more than once.	2018-04-04 15:15:08 -04:00
Joey Hess	46d4316954	implement annex.retry et al Added annex.retry, annex.retry-delay, and per-remote versions to configure transfer retries. This commit was supported by the NSF-funded DataLad project.	2018-03-29 13:04:07 -04:00
Joey Hess	31e1adc005	deal with unlocked files P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the file was detected to change content while it was being sent and so we may not have received the valid content of the file. Added new MustVerify constructor for Verification, which forces verification even when annex.verify=false etc. This is used when INVALID and in protocol version 0. As well as changing git-annex-shell p2pstdio, this makes git-annex tor remotes always force verification, since they don't yet use protocol version 1. Previously, annex.verify=false could skip verification when using tor remotes, and let bad data into the repository. This commit was sponsored by Jack Hill on Patreon.	2018-03-13 14:27:14 -04:00
Joey Hess	b96b845ffd	fix nested progress meters when using git-annex-shell fallback Caused an ugly blank line when the first progress meter was not used, but also it may have confused -J display.	2018-03-12 19:20:10 -04:00
Joey Hess	1c2c8995ac	hide rsync progress output when metered but not in other uses of rsync	2018-03-12 18:36:07 -04:00
Joey Hess	cb05ef06bf	fix lost metering for fallback rsyncs `08814327ff` accidentally got rid of it, when it removed commandMetered.	2018-03-12 18:22:48 -04:00
Joey Hess	c3df5d1f10	avoid double-connect to unreachable ssh remote When git-annex-shell p2pstdio fails with 255, it's because the ssh server is not reachable. Avoid running the fallback action in this case, since it would just try a second time to connect, and presumably fail. Note that the closed P2PSshConnection will not be stored in the pool, so the next request tries again to connect. This is just the right behavior; when the remote becomes reachable again, the same git-annex process will start using it. This commit was sponsored by Ole-Morten Duesund on Patreon.	2018-03-12 16:50:21 -04:00
Joey Hess	d7f54671bf	refactoring	2018-03-09 13:48:10 -04:00
Joey Hess	936ab43932	use P2P for locking keys The P2P protocol is now fully used for git-annex-shell. This commit was sponsored by Ewen McNeill on Patreon.	2018-03-09 13:42:55 -04:00
Joey Hess	08814327ff	use P2P protocol for checkpresent, retrieve, and store Note that, due to not using rsync to transfer files to ssh remotes any longer, permissions and other file metadata of annexed files will no longer be preserved when copying them to ssh remotes. Other remotes never supported preserving that information, so this is not considered a regression. Added NEWS item about this. Another significant side effect of this is that, even when rsync is run to retrieve a file, its progress display will no longer be shown, and instead the native git-annex progress display will appear. It would be possible to use the rsync progress display when rsync is used (old git-annex-shell and also retrieval from a local repository), but it would have complicated the code unnecessarily, and been inconsistent behavior. (I'd been thinking for a while about eliminating the rsync progress display, since it's got some annoying verbosities, including display of the key and the "(xfr#1, to-chk=0/1)" bit and was already somewhat inconsistent.) retrieveKeyFileCheap still uses rsync, since that ensures that it gets the actual file content from the remote. Using the P2P protocol would use the local content, as long as the local and remote size are the same. This commit was sponsored by John Pellman on Patreon.	2018-03-09 13:25:16 -04:00
Joey Hess	5bc0ab3f31	going AGPL Remote/Git.hs now contains AGPL licensed code, thus the license of git-annex as a whole is AGPL. This was already the case when git-annex was built with the webapp enabled. The AGPL license will apply to all code added to Remote/Git.hs in the future, which is going to include support for using `git-annex-shell p2pstdio`.	2018-03-09 01:03:46 -04:00
Joey Hess	6a59bc4845	use P2P protocol for drop Not yet used for everything else, but this is enough to verify that it works, and do some benchmarking. Some bugfixes included, which got it working. Also fallback to old actions has been verified to work correctly. Benchmarked dropping one thousand files from a ssh remote on localhost. Using the old git-annex 40.867 seconds. With the P2P protocol 9.905 seconds! This commit was sponsored by Jochen Bartl on Patreon.	2018-03-08 16:56:17 -04:00
Joey Hess	16af259209	refactor p2p remote action code Make a Remote.Helper.P2P using code that was in Remote.P2P, converted to use generic protocol runner actions. This will allow it to be reused in Remote.Git. This commit was sponsored by mo on Patreon.	2018-03-08 16:11:00 -04:00
Joey Hess	c036a380b2	p2p ssh connection pools Much like Remote.P2P, there's a pool of connections to a peer, in order to support concurrent operations. Deals with old git-annex-ssh on the remote that does not support p2pstdio, by only trying once to use it, and remembering if it's not supported. Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual purposes of something to detect to see that the connection is working, and a way to verify that it's connected to the right uuid. (There's a redundant uuid check since the uuid field is sent by git_annex_shell, but I anticipate that being removed later when the legacy git-annex-shell stuff gets removed.) Not entirely happy with Remote.Git.runSsh's behavior when the proto action fails. Running the fallback will work ok, but what will we do when the fallbacks later get removed? It might be better to try to reconnect, in case the connection got closed. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-03-08 15:11:31 -04:00
Joey Hess	f4103744c3	make sure that lockContentShared is always paired with an inAnnex check lockContentShared had a screwy caveat that it didn't verify that the content was present when locking it, but in the most common case, eg indirect mode, it failed to lock when the content is not present. That led to a few callers forgetting to check inAnnex when using it, but the potential data loss was unlikely to be noticed because it only affected direct mode I think. Fix data loss bug when the local repository uses direct mode, and a locally modified file is dropped from a remote repository. The bug caused the modified file to be counted as a copy of the original file. (This is not a severe bug because in such a situation, dropping from the remote and then modifying the file is allowed and has the same end result.) And, in content locking over tor, when the remote repository is in direct mode, it neglected to check that the content was actually present when locking it. This could cause git annex drop to remove the only copy of a file when it thought the tor remote had a copy. So, make lockContentShared do its own inAnnex check. This could perhaps be optimised for direct mode, to avoid the check then, since locking the content necessarily verifies it exists there, but I have not bothered with that. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2018-03-07 14:23:52 -04:00
Joey Hess	a28c541e23	add remote.<name>.annex-checkuuid Added remote.<name>.annex-checkuuid config, which can be set to false to disable the default checking of the uuid of remotes that point to directories. This can be useful to avoid unnecessary drive spin-ups and automounting. Note that the UUID check is still done before writing to the repository, to avoid writing to the wrong repository if it got relocated. Check is also done before checkPresent to avoid getting confused about what is in which repo. This is effectively the same as the use of git-annex-shell with a uuid to check that the remote repository is the expected one. Did not bother with the check for retrieveKeyFile because it doesn't matter if the wrong repo is used then. This commit was sponsored by Trenton Cronholm on Patreon.	2018-01-10 14:21:18 -04:00
Joey Hess	2b66492d6e	Improve startup time for commands that do not operate on remotes And for tab completion, by not unnecessarily statting paths to remotes, which used to cause eg, spin-up of removable drives. Got rid of the remotes member of Git.Repo. This was a bit painful. Remote.Git modifies the list of remotes as it reads their configs, so still need a persistent list of remotes. So, put it in as Annex.gitremotes. It's only populated by getGitRemotes, so commands like examinekey that don't care about remotes won't do so. This commit was sponsored by Jake Vosloo on Patreon.	2018-01-09 16:22:07 -04:00
Joey Hess	f5edb16729	Display progress meter when uploading a key without size information Getting the size by statting the content file. This commit was supported by the NSF-funded DataLad project.	2017-11-14 16:40:49 -04:00
Joey Hess	5c32196a37	fix process and FD leak Fix process and file descriptor leak that was exposed when git-annex was built with ghc 8.2.1. Apparently ghc has changed its behavior of GC of open file handles that are pipes to running processes. That broke git-annex test on OSX due to running out of FDs. Audited for all uses of Annex.new and made stopCoProcesses be called once it's done with the state. Fixed several places that might have leaked in other situations than running the test suite. This commit was sponsored by Ewen McNeill.	2017-09-29 22:36:08 -04:00
Joey Hess	16eb2f976c	prevent exporttree=yes on remotes that don't support exports Don't allow "exporttree=yes" to be set when the special remote does not support exports. That would be confusing since the user would set up a special remote for exports, but `git annex export` to it would later fail. This commit was supported by the NSF-funded DataLad project.	2017-09-07 13:48:44 -04:00
Joey Hess	28e2cad849	implement exporttree=yes configuration * Only export to remotes that were initialized to support it. * Prevent storing key/value on export remotes. * Prevent enabling exporttree=yes and encryption in the same remote. SetupStage Enable was changed to take the old RemoteConfig. This allowed only setting exporttree when initially setting up a remote, and not configuring it later after stuff might already be stored in the remote. Went with =yes rather than =true for consistency with other parts of git-annex. Changed docs accordingly. This commit was supported by the NSF-funded DataLad project.	2017-09-04 13:09:38 -04:00
Joey Hess	a4328b49d2	refactor ExportActions This will allow disabling exports for remotes that are not configured to allow them. Also, exportSupported will be useful for the external special remote to probe. This commit was supported by the NSF-funded DataLad project	2017-09-01 13:05:09 -04:00
Joey Hess	e55e445a36	add API for exporting Implemented so far for the directory special remote. Several remotes don't make sense to export to. Regular Git remotes, obviously, do not. Bup remotes almost certainly do not, since bup would need to be used to extract the export; same story for Ddar. Web and Bittorrent are download-only. GCrypt is always encrypted so exporting to it would be pointless. There's probably no point complicating the Hook remotes with exporting at this point. External, S3, Glacier, WebDAV, Rsync, and possibly Tahoe should be modified to support export. Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey interface, rather than adding a new interface. But, it seemed better to keep it separate, to avoid a complicated interface that sometimes encrypts/chunks key/value storage and sometimes uses non-key/value storage. Any common parts can be factored out. Note that storeExport is not atomic. doc/design/exporting_trees_to_special_remotes.mdwn has some things in the "resuming exports" section that bear on this decision. Basically, I don't think, at this time, that an atomic storeExport would help with resuming, because exports are not key/value storage, and we can't be sure that a partially uploaded file is the same content we're currently trying to export. Also, note that ExportLocation will always use unix path separators. This is important, because users may export from a mix of windows and unix, and it avoids complicating the API with path conversions, and ensures that in such a mix, they always use the same locations for exports. This commit was sponsored by Bruno BEAUFILS on Patreon.	2017-08-29 13:00:41 -04:00
Joey Hess	d39c120afa	add annex-ignore-command and annex-sync-command configs Added remote configuration settings annex-ignore-command and annex-sync-command, which are dynamic equivalents of the annex-ignore and annex-sync configurations. For this I needed a new DynamicConfig infrastructure. Its implementation should be as fast as before when there is no dynamic config, and it caches so shell commands are only run once. Note that annex-ignore-command exits nonzero when the remote should be ignored. While that may seem backwards, it allows using the same command for it as for annex-sync-command when you want to disable both. This commit was sponsored by Trenton Cronholm on Patreon.	2017-08-17 13:54:14 -04:00
Joey Hess	db1600b2de	de-Maybe remoteGitConfig It's always set, so does not need to be a Maybe.	2017-05-11 16:05:01 -04:00
Joey Hess	3c8eb59860	When a http remote does not expose an annex.uuid config, only warn about it once, not every time git-annex is run. Same behavior as for a ssh remote.	2017-03-29 12:43:47 -04:00
Joey Hess	c8e1e3dada	AssociatedFile newtype To prevent any further mistakes like `301aff34c4` This commit was sponsored by Francois Marier on Patreon.	2017-03-10 13:35:31 -04:00
Joey Hess	e6857e75a6	sync hack to make updateInstead work on eg FAT sync: When syncing with a local repository located on a crippled filesystem, run the post-receive hook there, since it wouldn't get run otherwise. This makes pushing to repos on FAT-formatted removable drives update them when receive.denyCurrentBranch=updateInstead. Made Remote.Git export onLocal, which was cleaned up to not have so many caveats about its use. This commit was sponsored by Jeff Goeke-Smith on Patreon.	2017-02-17 15:21:52 -04:00
Joey Hess	00464fbed7	have onLocal stop any coprocesses, not only cat-file I have not seen any other coprocesses being started, but let's avoid problems if any do for whatever reason.	2017-02-17 14:30:18 -04:00
Joey Hess	f07af03018	Run ssh with -n whenever input is not being piped into it ... to avoid it consuming stdin that it shouldn't. This fixes git-annex-checkpresentkey --batch remote, which didn't output results for all keys passed into it. Other git-annex commands that communicate with a remote over ssh may also have been consuming stdin that they shouldn't have, which could have impacted using them in eg, shell scripts. For example, a shell script reading files from stdin and passing them to git annex drop would be impacted by this bug, whenever git annex drop ran git-annex-shell checkpresent, it would consume part/all of the stdin that the shell script was supposed to consume. Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which is used throughout git-annex to run ssh (in order for ssh connection caching to work). Every call site was checked to see if it used CreatePipe for stdin, and if not was marked NoConsumeStdin.	2017-02-15 15:08:46 -04:00
Joey Hess	5c804cf42e	add SetupStage parameter to RemoteType.setup Most remotes have an idempotent setup that can be reused for enableremote, but in a few cases, it needs to tell which, and whether a UUID was provided to setup was used. This is groundwork for making initremote be able to provide a UUID. It should not change any behavior. Note that it would be nice to make the UUID always be provided to setup, and make setup not need to generate and return a UUID. What prevented this simplification is Remote.Git.gitSetup, which needs to reuse the UUID of the git remote when setting it up, and so has to return that UUID. This commit was sponsored by Thom May on Patreon.	2017-02-07 14:55:58 -04:00
Joey Hess	15be5c04a6	git-annex-shell, remotedaemon, git remote: Fix some memory DOS attacks. The attacker could just send a very large amount of data, with no \n and it would all be buffered in memory until the kernel killed git-annex or perhaps OOM killed some other more valuable process. This is a low impact security hole, only affecting communication between local git-annex and git-annex-shell on the remote system. (With either able to be the attacker). Only those with the right ssh key can do it. And, there are probably lots of ways to construct git repositories that make git use a lot of memory in various ways, which would have similar impact as this attack. The fix in P2P/IO.hs would have been higher impact, if it had made it to a released version, since it would have allowed DOSing the tor hidden service without needing to authenticate. (The LockContent and NotifyChanges instances may not be really exploitable; since the line is read and ignored, it probably gets read lazily and does not end up staying buffered in memory.)	2016-12-09 13:34:32 -04:00
Joey Hess	58f5d41cac	fix	2016-12-09 12:56:38 -04:00
Joey Hess	0f3a3ff1e5	make clear that log is only updated after successful removal This does not change behavior, because an exception is thrown on unsuccessful removal. But is clearer.	2016-12-09 12:54:18 -04:00
Joey Hess	b29088b8dc	stub Remote.P2P Similar to GCrypt remotes, P2P remotes have an url, so Remote.Git has to separate them out and handle them, passing off to Remote.P2P. This commit was sponsored by Ignacio on Patreon.	2016-12-06 12:27:58 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	8dcf79694d	enable forwardRetry for command-line transfers If a transfer fails for some reason, but some data managed to be sent, the transfer will be retried. (The assistant already did this.) Possible impacts: * More ssh prompts if ssh needs to prompt for a password to connect to a host, or is prompting about some other problem like a ssh key mismatch. * More data transfer due to retrying, especially when a remote does not support resuming a transfer. In the worst case, a lot of data will be transferred but it fails before the end, and then all that data gets transferred again plus one byte more; repeat until it manages to get the whole file.	2016-10-26 15:38:27 -04:00
Joey Hess	312ef4dfae	make --json-progress update meter when getting from git remote with rsync	2016-09-09 16:05:45 -04:00
Joey Hess	10ddf2c3bd	remove TransferObserver unused after last commit	2016-08-03 13:46:20 -04:00
Joey Hess	f4db181d9b	fix warning	2016-05-27 11:15:52 -04:00
Joey Hess	1b3bde0625	enableremote: Remove annex-ignore configuration from a remote.	2016-05-24 15:58:27 -04:00
Joey Hess	91df4c6b53	Pass the various gnupg-options configs to gpg in several cases where they were not before. Removed the instance LensGpgEncParams RemoteConfig because it encouraged code that does not take the RemoteGitConfig into account. RemoteType's setup was changed to take a RemoteGitConfig, although the only place that is able to provide a non-empty one is enableremote, when it's changing an existing remote. This led to several follow-on changes, and got RemoteGitConfig plumbed through.	2016-05-23 17:03:20 -04:00
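
The nested exception bug described in commit 9f3a346f25 can be reproduced in miniature. The following is only a sketch, not the git-annex code: `runProtoValue`, `runProtoAction`, `unreachable`, and `caughtByCaller` are hypothetical names, and plain `IO` stands in for the `Annex` monad. It shows why a fallback passed as a value escapes the caller's exception handler, while a fallback passed as an action does not.

```haskell
import Control.Exception (ErrorCall, try)

-- Hypothetical stand-in for the runner before the fix: the fallback is a
-- plain value, returned unevaluated when the connection yields nothing.
runProtoValue :: a -> IO (Maybe a) -> IO a
runProtoValue fallback action = maybe fallback id <$> action

-- Hypothetical stand-in for the runner after the fix: the fallback is an
-- action, so an exception in it is thrown while the runner executes.
runProtoAction :: IO a -> IO (Maybe a) -> IO a
runProtoAction fallback action = action >>= maybe fallback return

-- Simulates a remote that could not be contacted.
unreachable :: IO (Maybe Bool)
unreachable = return Nothing

-- Like the cantCheck in the commit message, this throws when evaluated.
-- It is polymorphic, so it can be used as a Bool or as an IO Bool.
cantCheck :: a
cantCheck = error "unable to check remote"

-- Reports whether the exception was caught by a handler wrapped around
-- the call itself. The returned Bool from the check is never forced here.
caughtByCaller :: IO Bool -> IO Bool
caughtByCaller check = do
  r <- try check :: IO (Either ErrorCall Bool)
  return (either (const True) (const False) r)

main :: IO ()
main = do
  old <- caughtByCaller (runProtoValue cantCheck unreachable)
  new <- caughtByCaller (runProtoAction cantCheck unreachable)
  putStrLn ("value fallback caught by caller:  " ++ show old)
  putStrLn ("action fallback caught by caller: " ++ show new)
```

With the value fallback, `try` sees a successful result whose Bool is an unevaluated thunk holding the exception, so the handler should not fire; with the action fallback, the exception is raised inside `try` and is caught.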

307 commits