git-annex

Author	SHA1	Message	Date
Joey Hess	46059ab0e5	split off versionedExport from appendonly S3 uses versionedExport, while GitLFS uses appendonly. This is groundwork for later changes.	2020-12-28 14:37:15 -04:00
Joey Hess	2e72590a48	avoid using export method when the remote only supports import	2020-12-23 13:40:56 -04:00
Joey Hess	e3d356fe84	borg: add subdir= config Note that, after changing it with enableremote, syncing won't rescan known archives in the borg repo using the changed config. Probably not a problem? Also used File in some places where filenames that could theoretically start with - are passed to borg, to avoid it confusing them with options.	2020-12-23 13:12:11 -04:00
Joey Hess	4254e2297d	implement retrieveExportWithContentIdentifier Moved out an XXX to a todo This seems about ready to merge..	2020-12-22 16:16:48 -04:00
Joey Hess	a9d639c5b5	borg can prompt	2020-12-22 15:48:17 -04:00
Joey Hess	df4942e179	notice when an archive that was seen before gets deleted	2020-12-22 15:45:06 -04:00
Joey Hess	523b7143e0	implemented checkPresentExportWithContentIdentifier	2020-12-22 15:34:41 -04:00
Joey Hess	4f9969d0a1	optimisation for borg Skip needing to list importable contents when unchanged since last time.	2020-12-22 15:00:05 -04:00
Joey Hess	e1ac42be77	convert listImportableContents to throwing exceptions	2020-12-22 14:24:29 -04:00
Joey Hess	5d8e4a7c74	avoid borg list of archives that have been listed before This makes sync a lot faster in the common case where there's no new backup. There's still room for it to be faster. Currently the old imported tree has to be traversed, to generate the ImportableContents. Which then gets turned around to generate the new imported tree, which is identical. So, it would be possible to just return a "no new imports", or an ImportableContents that has a way to graft in a tree. The latter is probably too far to go to optimise this, unless other things need it. The former might be worth it, but it's already pretty fast, since git ls-tree is pretty fast.	2020-12-22 14:06:40 -04:00
Joey Hess	7f7094a7cb	include borg archive name in tree, use empty ContentIdentifier It's unusual to use a ContentIdentifier that is not semi-unique for different contents. Note that in importKeys, it checks if a content identifier is one that's known before, to avoid downloading the same content twice. But that's done in a code path not used for borg repos, because they are thirdpartypopulated.	2020-12-22 11:53:00 -04:00
Joey Hess	bcd55b365c	import from borg is basically working Still some issues to deal with, see TODO and XXX. Here's what gets logged, for each key: cid log: 1608582045.832799227s 6720ebad-b20e-4460-a8f2-2477361aea75 !MjAyMC0xMi0yMVQxMTozMzoxNw==:!MjAyMC0xMi0yMVQxMzowNzoyNg== The "!Mj" are base64 encoded borg archive names, since mine were dates and contained some characters not allowed in cid logs unescaped. There were archives that each contained the key. This list will grow as more borg backups are done and learned about. tree generated: 120000 blob 5ef6a4615c084819b44cd4e3a31657664ddf643b x/dotgit/annex/objects/06/mv/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da/SHA256E-s30--a5d8532e64ec28f5491e25e7a6c1cb68f80507c1be6c1b35f8ec53d25413e5da 120000 blob 063a139d3021c8db60f5c576d29fada2b824d91c x/dotgit/annex/objects/72/PP/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893/SHA256E-s30--e80b09a854b4e4d99a76caaa6983b34272480e0b4fdb95d04234a54b4849b893 120000 blob b53b54916fd6abf21fedf796deca08d5ac7a75af x/dotgit/annex/objects/Ww/pk/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2/SHA256E-s30--6aac072a8ebf02a5807c4f15e77ed585a6c87b3b333ba625a3c8d6b4dc50a9f2 This commit was sponsored by Denis Dzyubenko on Patreon.	2020-12-21 16:37:55 -04:00
Joey Hess	15000dee07	improve thirdpartypopulated support May actually work now. Note that, importKey now has to add the size to the key if it's supposed to have size. Remote.Directory relied on the importer adding the size, which is no longer done, so it was changed; it was the only one. This way, importKey does not need to behave differently between regular and thirdpartypopulated imports.	2020-12-21 16:19:44 -04:00
Joey Hess	706e2a63fb	fix logic error in thirdPartyPopulated handling	2020-12-21 13:24:07 -04:00
Joey Hess	ca31d7e54f	refactor That code was not borg specific, and I can see making more remotes for other backup software.	2020-12-18 17:08:44 -04:00
Joey Hess	1c054f1cf7	started borg special remote Still need to implement 3 methods, but importKeyM looks like it will work well to find annex object files.	2020-12-18 16:56:54 -04:00
Joey Hess	3207e8293b	start borg special remote Compiles, but unusable so far.	2020-12-18 16:03:51 -04:00
Joey Hess	9a2c8757f3	add thirdPartyPopulated interface This is to support, eg a borg repo as a special remote, which is populated not by running git-annex commands, but by using borg. Then git-annex sync lists the content of the remote, learns which files are annex objects, and treats those as present in the remote. So, most of the import machinery is reused, to a new purpose. While normally importtree maintains a remote tracking branch, this does not, because the files stored in the remote are annex object files, not user-visible filenames. But, internally, a git tree is still generated, of the files on the remote that are annex objects. This tree is used by retrieveExportWithContentIdentifier, etc. As with other import/export remotes, the tree is recorded in the export log, and gets grafted into the git-annex branch. importKey changed to be able to return Nothing, to indicate when an ImportLocation is not an annex object and so should be skipped from being included in the tree. It did not seem to make sense to have git-annex import do this, since from the user's perspective, it's not like other imports. So only git-annex sync does it. Note that, git-annex sync does not yet download objects from such remotes that are preferred content. importKeys is run with content downloading disabled, to avoid getting the content of all objects. Perhaps what's needed is for seekSyncContent to be run with these remotes, but I don't know if it will just work (in particular, it needs to avoid trying to transfer objects to them), so I skipped that for now. (Untested and unused as of yet.) This commit was sponsored by Jochen Bartl on Patreon.	2020-12-18 15:23:58 -04:00
Joey Hess	f930176d6e	change info from export=yes to exporttree=yes and same for import for consistency	2020-12-17 17:06:50 -04:00
Joey Hess	e9db382308	avoid redundant set of an S3 version ID that is already recorded I think this could cause unnecessary changes to the git-annex branch, and retrieveExportWithContentIdentifier is now also used for getting content from importtree=yes remotes, so it would happen more frequently, so let's avoid it.	2020-12-17 16:49:17 -04:00
Joey Hess	77aedbef8b	fix call to warnExportImportConflict That needs a Remote that has the right export/import set up, not the input Remote, which does not yet.	2020-12-17 16:25:02 -04:00
Joey Hess	f2ecc6e0da	import remotes use ContentIdentifier for getting and checking content This is better than using the equivalent actions for export remotes, especially for getting content, since the ContentIdentifier checking means we can be sure (enough) that the content is valid to not force verification of content. Which allows getting keys of types that cannot be verified. Also, reorganized the internals of adjustExportImport which was becoming very hard to follow. Now it's clear what each method does in each case.	2020-12-17 15:55:31 -04:00
Joey Hess	5946e7136e	force verification after getting file from export remote This way, if annex.verify is disabled, it's still checked, since this is not a key/value store, it has to be checked.	2020-12-17 15:31:22 -04:00
Joey Hess	ceda8c0066	refactor common code	2020-12-17 14:17:09 -04:00
Joey Hess	4d2cd58ee5	provide missing remote actions for importtree only remote Ah, it seemed too easy before when I was implementing importtree only, and it was because all the key-based actions needed to be handled too. Mostly copied from isexport, and this works. It does seem that an import remote could use retrieveExportWithContentIdentifier rather than retrieveExport, and checkPresentExportWithContentIdentifier rather than checkPresentExport, which would both be more accurate.	2020-12-17 13:46:34 -04:00
Joey Hess	6c890d62f6	initremote: Prevent enabling encryption with exporttree=yes/importtree=yes I do think this was a reversion, but I have not tracked back to what version. While involving the remote config, it's not the same class of problems that I kept having to chase down for a while after the remote config parser reworking.	2020-12-15 12:08:08 -04:00
Joey Hess	230e1c88a9	improve display	2020-12-14 13:13:53 -04:00
Joey Hess	d3f78da0ed	propagate signals to the transferrer process group Done on unix, could not quite implement it on windows. The signal library gets part of the way needed for windows. But I had to open https://github.com/pmlodawski/signal/issues/1 because it lacks raiseSignal. Also, I don't know what the equivalent of getProcessGroupIDOf is on windows. And System.Process does not provide a way to send any signal to a process group except for SIGINT. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-12-11 15:32:00 -04:00
Joey Hess	a422a056f2	make getViaTmpFrom no longer update location log All callers adjusted to update it themselves. In Command.ReKey, and Command.SetKey, the cleanup action already did, so it was updating the log twice before. This fixes a bug when annex.stalldetection is set, as now Command.Transferrer can skip updating the location log, and let it be updated by the calling process.	2020-12-11 11:50:13 -04:00
Joey Hess	6a11b6fab8	Support special remotes that are configured with importtree=yes but without exporttree=yes There was no particular reason not to support this, other than maybe a lack of a use case. One use case would of course be a remote that you want to avoid overwriting content on. A new use case is the idea of importing from backups, eg borg, where exporting is not necessarily supported at all. This commit was sponsored by Brock Spratlen on Patreon.	2020-12-10 13:17:40 -04:00
Joey Hess	63839532c9	remove uses of warningIO It's not concurrent-output safe, and doesn't support --json-error-messages. Using Annex.makeRunner is a bit scary, because what if it's run in a different thread from an active annex action? Normally the same Annex state is not used concurrently in several threads, and it's not designed to be fully concurrency safe. (Annex.Concurrent exists to deal with that.) I think it will be ok in these simple cases though. Eg, when buffering a warning message to json, Annex.changeState is used, and it modifies the MVar in a concurrency safe way. The only warningIO remaining is not a problem.	2020-12-02 14:57:43 -04:00
Joey Hess	7776677a5f	Fix hang on shutdown of external special remote using ASYNC protocol extension. Reversion introduced in version 8.20201007, one release after the 1st release with the extension. Surprisingly, hClose can hang if another thread is reading from the handle. This is because it uses takeMVar. The use of cancel here does mean that, if receiveMessageAddonProcess or Remote.External.AsyncExtension.receiveloop allocated some resource in a non-async-exception safe way, they might not get a chance to clean it up. They do not appear to, and anyway, this only happens when git-annex is shutting down, so any resource that did leak would not be a problem. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-30 13:04:02 -04:00
Joey Hess	a3b714ddd9	finish fixing removeLink on windows `9cb250f7be` got the ones in RawFilePath, but there were others that used the one from unix-compat, which fails at runtime on windows. To avoid this, import System.PosixCompat.Files hiding removeLink This commit was sponsored by Ethan Aubin.	2020-11-24 13:20:44 -04:00
Joey Hess	613455e059	convert to use hGetLineUntilExitOrEOF It looks to me like the old code would have already dealt with the case of ssh starting a ssh daemon that inherits stderr and keeps it open. The ender thread closed the handle, which would unblock the other thread and let it exit. Using hGetLineUntilExitOrEOF makes this more explicit that it's dealt with and simplifies the code.	2020-11-19 16:13:31 -04:00
Joey Hess	66497d39b3	convert git config reading to use hGetLineUntilExitOrEOF Much nicer than the old hack of waiting for a few seconds for stderr to be read.	2020-11-19 15:38:43 -04:00
Kyle Meyer	9e09dcb2cf	BitTorrent: Fix build for "no torrent" code path The RawFilePath conversions missed a spot in the else arm of "#ifdef WITH_TORRENTPARSER".	2020-11-19 14:46:21 -04:00
Joey Hess	4b739fc460	Fix build on Windows Thanks to bug reporter for the patch.	2020-11-19 12:33:00 -04:00
Joey Hess	0896038ba7	annex.adjustedbranchrefresh Added annex.adjustedbranchrefresh git config to update adjusted branches set up by git-annex adjust --unlock-present/--hide-missing. Note, in a few cases, I was not able to make the adjusted branch be updated in calls to moveAnnex, because information about what file corresponds to a key is not available. They are: * If two files point to one file, then eg, `git annex get foo` will update the branch to unlock foo, but will not unlock bar, because it does not know about it. Might be fixable by making `git annex get bar` do something besides skipping bar? * git-annex-shell recvkey likewise (so sends over ssh from old versions of git-annex) * git-annex setkey * git-annex transferkey if the user does not use --file * git-annex multicast sends keys with no associated file info Doing a single full refresh at the end, after any incremental refresh, will deal with those edge cases.	2020-11-16 14:27:28 -04:00
Joey Hess	885974be99	add newtypes for QuickCheck to avoid LANG=C issues All properties changed to use them, except for prop_encode_c_decode_c_roundtrip, which already filtered to ascii for other reasons. A few modules had to be split out, because Setup does not build-depend on QuickCheck.	2020-11-09 20:21:18 -04:00
Joey Hess	1db49497e0	finished this stage of the RawFilePath conversion This commit was sponsored by Denis Dzyubenko on Patreon.	2020-11-06 14:10:58 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unnecessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	87f91ce563	more RawFilePath conversion 451/645	2020-10-30 15:55:59 -04:00
Joey Hess	ca80c3154c	more RawFilePath conversion removeFile changed to removeLink, because AFAICS it should be fine to remove non-file things here. In particular, it's fine to remove a symlink, since we're about to write a symlink. (removeLink does not remove directories, so file, symlink, and unix socket are the only possibilities.)	2020-10-30 13:07:41 -04:00
Joey Hess	8f452416f7	more RawFilePath conversion Better to use Git.repoPath to get a filepath, Git.repoLocation does not always return one. This commit was sponsored by Ethan Aubin.	2020-10-30 13:00:12 -04:00
Joey Hess	19694fb280	more RawFilePath conversion At this point I'll be done by new year's. This commit was sponsored by Ethan Aubin.	2020-10-30 12:51:34 -04:00
Joey Hess	681b44236a	more RawFilePath conversion at 377/645 This commit was sponsored by Svenne Krap on Patreon.	2020-10-29 14:20:57 -04:00
Joey Hess	b05015f772	fix name of lock file It was the stringification of a UUID, so "UUID \"foo\""	2020-10-29 10:53:01 -04:00
Joey Hess	e505c03bcc	more RawFilePath conversion nukeFile replaced with removeWhenExistsWith removeLink, which allows using RawFilePath. Utility.Directory cannot use RawFilePath since setup does not depend on posix. This commit was sponsored by Graham Spencer on Patreon.	2020-10-29 10:50:29 -04:00
Joey Hess	8d66f7ba0f	more RawFilePath conversion Added a RawFilePath createDirectory and kept making stuff build. Up to 296/645 This commit was sponsored by Mark Reidenbach on Patreon.	2020-10-28 17:25:59 -04:00
Joey Hess	b2bf099aa3	use removeDirGeneric here too for consistency And because it might be more robust on windows.	2020-10-23 16:12:47 -04:00
Joey Hess	4d063f12c6	turns out this was fixed in 2014	2020-10-22 19:54:26 -04:00
Joey Hess	b62e004c2c	update chunk log after speculated chunks are verified to be present Only done in checkPresentChunks, although retrieveChunks could also do it. Does not seem necessary though, because git-annex never retrieves content without first checking if it's present AFAICR. And really this will only be needed when using fsck. Putting it here, rather than in fsck, avoids breaking an abstraction boundary, and is nice and inexpensive.	2020-10-22 13:37:09 -04:00
Joey Hess	dad4be97c2	speculatively use remote's configured chunk size as a fallback When a special remote has chunking enabled, but no chunk sizes are recorded (or the recorded ones are not found), speculatively try chunks using the configured chunk size. This makes eg, git-annex fsck --from remote be able to fix up the location log of a file that the git-annex branch does not indicate is stored on the remote. Note that fsck does not fix up the chunk log to indicate the chunk size. So, changing the chunk config of the remote after that will still prevent accessing the chunks stored on it. Maybe fsck should, but I wanted to start with this and see if it's needed.	2020-10-22 13:11:06 -04:00
Joey Hess	2dd38b6403	switch to Haskell2010 When I put in Haskell98 this spring, I was under the mistaken apprehension that ghc defaulted to that. But actually its default is a third mode, which is closer to Haskell2010 but with some differences. The manual says "By default, GHC mainly aims to behave (mostly) like a Haskell 2010 compiler" Fixed two cases where the Haskell98 do indentation flexibility let wrongly indented code build. That is one of the places where ghc does not behave like Haskell2010 by default. The other place that I think I was concerned about, is GHC manual section 19.1.1.3. Expressions and patterns. But that only seems to affect code using bottoms, so would only affect pure functions throwing an error, which I don't think git-annex does in many places as it's pretty horrid style. And it would only affect rare cases like shown in that section. If it did happen, it would mean that the error was not thrown before specifying Haskell98, and then was. Haskell2010 behaves the same as Haskell98. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-10-19 11:26:16 -04:00
Joey Hess	4c32499e82	Parse youtube-dl progress output Which lets progress be displayed when doing concurrent downloads. Among other things, like --json-progress etc. The youtube-dl output is no longer displayed, except for any errors. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-09-29 17:53:48 -04:00
Joey Hess	084b502c7a	httpalso: Support being used with special remotes that do not have encryption= in their config.	2020-09-29 13:56:27 -04:00
Joey Hess	5cfcf1f05f	cache remote.log Unlikely to speed up any of the existing uses much, but I want to use it in a message that might be displayed many times.	2020-09-22 13:52:26 -04:00
Joey Hess	5844a54869	aws-0.22 improved its support for setting etags, which improves support for versioned S3 buckets. Remove placeholder version number I used when implementing the feature in aws. This commit was sponsored by Ethan Aubin.	2020-09-14 18:37:49 -04:00
Joey Hess	ddf963d019	deepseq all things returned from ResourceT http Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/ although I don't know if it does. My thinking is, ResourceT may allocate a resource and then free it, and an unforced thunk to that resource could result in reading memory that has since been overwritten by something else, or in a SEGV, depending. While that seems kind of like a bug in ResourceT to me, if it is what's happening, this will avoid it. If it's not, this doesn't really hurt much since the values are all smallish. This commit was sponsored by Graham Spencer on Patreon.	2020-09-14 18:30:06 -04:00
Joey Hess	6ea511beb4	Removed the S3 and WebDAV build flags So these special remotes are always supported. IIRC these build flags were added because the dep chains were a bit too long, or perhaps because the libraries were not available in Debian stable, or something like that. That was long ago, those reasons no longer apply, and users get confused when builtin special remotes are not available, so it seems best to remove the build flags now. If this does cause a problem it can be reverted of course.. This commit was sponsored by Jochen Bartl on Patreon.	2020-09-08 12:42:59 -04:00
Joey Hess	eed20fe3b7	fix some file modes in calls to withTmpFileIn to honor umask Also audited for other calls to openTempFile, and all are ok, except for viaTmp which will need further work. Remote.Directory fixed to set umask mode when writing to an export, although it has another one using viaTmp that's not fixed. Will make exports that are published via a http server running as another user work, for example. Remote.BitTorrent fixed to set umask mode when downloading the torrent file. Normally this does not matter as that file does not hang around after the download, but if a bittorrent download were started by one user, got interrupted and then another user ran it, this will let them access the torrent file created by the first user.	2020-09-02 14:36:08 -04:00
Joey Hess	26724fb331	display actual download errors Eg, when config prohibits accessing localhost, need to show that message, not a generic "download failed".	2020-09-02 12:21:10 -04:00
Joey Hess	31e5785bf7	avoid multiple download failed messages when learning Also only display one progress meter for all download attempts, to avoid a bunch of blank lines.	2020-09-02 12:01:50 -04:00
Joey Hess	854cd2ad47	httpalso: support exporttree=yes Also tested what happens if the other special remote has importtree=yes and exporttree=yes, and in that case, download via httpalso works too, without needing to implement any importtree methods here. It might be possible to make it automatically set exporttree=yes if the --sameas does. Didn't try, will probably be layering issues. Or perhaps it should be inherited by sameas like some other configs? But then, wouldn't it also make sense to inherit importtree=yes? But as shown here, it's not needed by this kind of remote.	2020-09-02 11:26:00 -04:00
Joey Hess	8656afd3e1	rename http special remote to httpalso "http" was too generic and easy to confuse with web. The new name makes clear it's used in addition to some other remote. And other protocols can use the same naming scheme.	2020-09-02 10:41:53 -04:00
Joey Hess	571ec900ac	Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https. With automatic layout learning!	2020-09-01 15:16:35 -04:00
Joey Hess	d00ce82418	fix hang if external program is not available startExternal' throws an exception, which left the externalAsync TMVar empty, so the next try to use it would hang.	2020-08-19 12:20:07 -04:00
Joey Hess	ad64079b44	fix some warnings	2020-08-15 14:33:18 -04:00
Joey Hess	f241a3cd3d	Display warning when external special remote does not start up properly, or is not usable I'm sure this used to work, but somewhere along the line something or things (getCost and getAvailability I think, probably others) started catching the exception and not displaying it. So, show warnings.	2020-08-14 15:38:31 -04:00
Joey Hess	198b709561	switch to TMVars for thread safety when using the async extension TVars were not updated atomically, which was ok when each thread got its own External that was the only thing using these TVars. But, with the async extension, several External instances can share the same var, so it needs to be a TMVar to avoid read/write conflicts. In particular, this makes PREPARE only be sent once.	2020-08-14 14:50:09 -04:00
Joey Hess	7da2d4dd2d	one jobid per thread And, relay ERROR on to all listening threads.	2020-08-14 14:24:46 -04:00
Joey Hess	72561563d9	rethought the async protocol some more Moving jobid generation to the git-annex side lets it be simplified a lot. Note that it will also be possible to generate one jobid per connection, rather than a new job per request. That will make overflow not an issue, and will avoid some work, and will simplify some of the code.	2020-08-13 20:18:06 -04:00
Joey Hess	59cbb42ee2	async proto fully tested and working Including with a concurrent capable remote program. However, this is not quite ready to merge, there's a TODO in the code.	2020-08-13 16:22:11 -04:00
Joey Hess	7546e686a2	async proto basically working Simplified the protocol by removing END-ASYNC. There's a STM crash when a non-async protocol message is sent, which needs to be fixed.	2020-08-13 15:52:12 -04:00
Joey Hess	c9e8cafb98	further work on external async relay	2020-08-12 16:25:53 -04:00
Joey Hess	15706e6991	relayer receive loop is done Receive loop looks right. Still need the send loop. And, a complication is that some messages git-annex sends need to be wrapped in REPLY_ASYNC, while others do not. So will probably need to split externalSend into two.	2020-08-12 15:56:58 -04:00
Joey Hess	06a4ab39fa	wip external remote async protocol extension	2020-08-12 15:17:53 -04:00
Joey Hess	3f8c808bd7	generalized ExternalState to not be limited to an ExternalAddonProcess Idea is for ASYNC extension, it will instead contain methods that communicate with the thread that handles all communication with the external process.	2020-08-12 12:30:45 -04:00
Joey Hess	5f4228dc2b	types for async protocol extension renamed AsyncMessage to ExceptionalMessage to make way for this new extension.	2020-08-12 12:04:12 -04:00
Joey Hess	f75be32166	external backends wip It's able to start them up, the only thing not implemented is generating and verifying keys. And, the key translation for HasExt.	2020-07-29 15:23:18 -04:00
Joey Hess	555fe669e1	refactoring in preparation for external backends	2020-07-29 12:00:27 -04:00
Joey Hess	2a45b5ae9a	avoid failure to lock content of removed file causing drop etc to fail This was already prevented in other ways, but as seen in commit `c30fd24d91`, those were a bit fragile. And I'm not sure races were avoided in every case before. At least a race between two separate git-annex processes, dropping the same content, seemed possible. This way, if locking fails, and the content is not present, it will always do the right thing. Also, it avoids the overhead of an unnecessary inAnnex check for every file. This commit was sponsored by Denis Dzyubenko on Patreon.	2020-07-25 11:59:33 -04:00
Joey Hess	57cceac569	simplify interface by removing size Add size to the returned key after the fact, unless the remote happened to add it itself.	2020-07-03 14:22:22 -04:00
Joey Hess	85cd79ea01	no importKey for android yet adb shell has sha256sum sha1sum and some others, so they could be used. They're provided by toybox, so seem about as likely to keep working as find and stat, which it already depends on. Or to not add a dep, could use stat the same as getExportContentIdentifier to get a mtime, and make a WORM key. But do I really want this to default to WORM? Unsure what's the best path, so punting for now.	2020-07-03 14:02:50 -04:00
Joey Hess	ddcab38e4a	no importKey for S3 for now The Etag is sometimes a md5, but not if eg, there was a multipart upload. May revisit later if there's demand.	2020-07-03 13:53:14 -04:00
Joey Hess	85506a7015	import: Added --no-content option, which avoids downloading files from a special remote Only supported by some special remotes: directory I need to check the rest and they're currently missing methods until I do. git-annex sync --no-content does not yet use this to do imports	2020-07-03 13:41:57 -04:00
Joey Hess	0f26782a73	fix windows build more	2020-07-02 12:01:09 -04:00
Joey Hess	00497fd38e	fix windows build	2020-07-02 11:46:26 -04:00
Joey Hess	8ad433d5f0	fix windows build	2020-07-02 11:35:05 -04:00
Joey Hess	8b22e0bf37	lockContent for tahoe Trivial since git-annex cannot remove, but do an active checkKey verification anyway, in case the data was lost somehow. This commit was sponsored by Ryan Newton on Patreon.	2020-06-26 14:23:21 -04:00
Joey Hess	3175015d1b	lockContent for S3 (with versioning=yes) and git-lfs Made several special remotes support locking content on them while dropping, which allows dropping from another special remote when the content will only remain on a special remote of these types. In both cases, verify the content is present actively, because it's certainly possible for things other than git-annex to have removed it. Worth thinking about what to do if at some later point, git-lfs gains support for dropping content, and a content locking operation. That would probably need a transition; first would need to make lockContent use the locking operation. Then, once enough time had passed that we can assume any git-annex operating on the git-lfs remote had that change, git-annex could finally allow dropping from git-lfs. Or, it could be that git-lfs gains support for dropping content, but not locking it. In that case, it seems this commit would need to be reverted, and then wait long enough for that git-annex to be everywhere, and only then can git-annex safely support dropping from git-lfs. So, the assumption made in this commit could lead to bother later.. But I think it's actually highly unlikely git-lfs does ever support dropping; it's outside their centralized model. Probably. :) Worth keeping in mind as the same assumption is made about other special remotes though. This commit was sponsored by Ethan Aubin.	2020-06-26 13:46:42 -04:00
Joey Hess	01eb863a14	Build with the git-lfs library when available Otherwise use the vendored copy as before. The library is in Debian testing but not stable. Once it reaches stable, the vendored copy can be removed. Did not add it to debian/control because IIRC that's used to build git-annex on stable too, possibly. However, the Debian maintainer will probably want to make the package depend on libghc-git-lfs-dev. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:21:25 -04:00
Joey Hess	aa1ad0b7ca	remove redundant imports Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-06-22 11:05:34 -04:00
Joey Hess	5c0dc7dc0b	fix a further bug	2020-06-16 18:14:36 -04:00
Joey Hess	ad81feb053	fix implicit embedcreds regression Fix bug that made creds not be stored in git when a special remote was initialized with gpg encryption, but without an explicit embedcreds=yes. (Yet another regression introduced in version 7.20200202.7. 5th so far.)	2020-06-16 18:00:19 -04:00
Joey Hess	a1d4c8e4ec	external: SETCREDS include creds in externalConfigChanges This makes the creds get saved, since only things recorded there will be saved. IIRC, unparsedRemoteConfig was not originally available when I implemented this; now that it is things get a bit simpler. More could probably be simplified, is externalConfigChanges needed at all? This does not entirely fix the bugs though, because creds are only embedded when embedcreds=yes, but not when encryption=pubkey is used without embedcreds=yes.	2020-06-16 17:24:24 -04:00
Joey Hess	4773713cc9	analysis of regression and fix related less serious regression	2020-06-16 15:16:36 -04:00
Joey Hess	a76b1ba3d6	local git remote autoinit improvements * Improve display of problems auto-initializing or upgrading local git remotes. * When a local git remote cannot be initialized because it has no git-annex branch or a .noannex file, avoid displaying a message about it.	2020-06-16 13:24:00 -04:00
Joey Hess	41952204ce	S3: The REDUCED_REDUNDANCY storage class is no longer cheaper So stop documenting it, and stop offering it as a choice in the assistant. Removed the code that parses it into S3.ReducedRedundancy, because S3.OtherStorageClass with the value will work just the same and avoids a special case for a deprecated thing.	2020-06-16 12:04:29 -04:00
Joey Hess	24ff5e2b29	use uninterruptibleMask Some recent changes to use mask missed that async exceptions can still be thrown inside it. The goal is to make sure a block of cleanup code runs entirely, w/o being interrupted by an async exception, so use uninterruptibleMask. Also, converted a few to bracket, which is nicer.	2020-06-09 15:02:56 -04:00
Joey Hess	a49d300545	async exception safety for external special remote processes Since an external process can be in the middle of some operation when an async exception is received, it has to be shut down then. Using cleanupProcess will close its IO handles and send it a SIGTERM. If a special remote chooses to catch SIGTERM, it's fine for it to do some cleanup then, but until it finishes, git-annex will be blocked waiting for it. If a special remote blocked SIGTERM, it would cause a hang. Mentioned in docs. Also, in passing, fixed a FD leak, it was not closing the error handle when shutting down the external. In practice that didn't matter before because it was only run when git-annex was itself shutting down, but now that it can run on exception, it would have been a problem.	2020-06-09 12:22:14 -04:00
Joey Hess	3ed797be0f	fix reversion From back in `4be94c67c7`. Caused the test suite to fail, when bup is installed, but was not noticed since the autobuilds don't have bup.	2020-06-05 19:06:09 -04:00
Joey Hess	e41f8c83f3	close stdin handles before waiting on commands Fixes reversion in recent conversions, the old code relied on the GC apparently, but the new code explicitly waits on the process, so must close stdin handle first or the command will never exit.	2020-06-05 17:27:49 -04:00
Joey Hess	ef0024444b	fix reversion It was not the wrong handle. The handle was not being closed, so bup kept running. Before `2670890b17`, the code was: withHandle StdinHandle createProcessSuccess cmd feeder The stdin handle was not closed by the feeder. Testing this: withHandle StdinHandle createProcessSuccess (proc "cat" []) (\h -> hPutStrLn h "hi") There's a rather long pause, a couple seconds, before it completes, but it does complete. With hClose h, it immediately completes. This must be the GC noticing that h is out of scope and closing it. It seems likely that the old code worked only by that accident. So, other similar changes made in that and nearby commits may also have this problem, and need to explicitly close handles that were somehow implicitly closed before.	2020-06-05 17:10:52 -04:00
Joey Hess	291774779f	use right handle	2020-06-05 16:45:12 -04:00
Joey Hess	1dd770b1af	fix file descriptor leak when importing from a directory special remote that is configured with exporttree=yes	2020-06-05 15:34:43 -04:00
Joey Hess	319f2a4afc	audit all uses of SomeException to avoid catching async exceptions Except for the assistant, which I think may use them between threads? Most of the uses of SomeException were already catching only async exceptions. But I did find a few places that were accidentally catching them.	2020-06-05 15:16:57 -04:00
Joey Hess	dca19099a9	async exception safety Masking ensures that EndStderrHandler gets written, so the helper threads shut down. However, nothing currently guarantees that calls to closeP2PSshConnection are async exception safe, so made a note about it. At this point, I've audited all calls to async, and made them all async exception safe, except for ones in the assistant, and a few in leaf commands (remotedaemon, enable-tor, multicast, p2p) which don't need to be.	2020-06-05 14:56:41 -04:00
Joey Hess	2670890b17	convert to withCreateProcess for async exception safety This handles all createProcessSuccess callers, and aside from process pools, the complete conversion of all process running to async exception safety should be complete now. Also, was able to remove from Utility.Process the old API that I now know was not a good idea. And proof it was bad: The code size went down, despite there being a fair bit of boilerplate for some future API to reduce.	2020-06-04 15:45:52 -04:00
Joey Hess	438dbe3b66	convert to withCreateProcess for async exception safety This handles all sites where checkSuccessProcess/ignoreFailureProcess is used, except for one: Git.Command.pipeReadLazy That one will be significantly more work to convert to bracketing. (Also skipped Command.Assistant.autoStart, but it does not need to shut down the processes it started on exception because they are git-annex assistant daemons..) forceSuccessProcess is done, except for createProcessSuccess. All call sites of createProcessSuccess will need to be converted to bracketing. (process pools still todo also)	2020-06-04 12:44:09 -04:00
Joey Hess	c429bbf2bd	remove workaround for old versions of process ghc 8.4.4 has process 1.6.3, which was the first version to include getPid.	2020-06-03 16:03:08 -04:00
Joey Hess	1ee5919d1e	make createProcess calls async exception safe Using cleanupProcess because withCreateProcess cannot run an Annex action, but the effect is the same as using it.	2020-06-03 15:30:30 -04:00
Joey Hess	484a74f073	auto-init autoenable=yes Try to enable special remotes configured with autoenable=yes when git-annex auto-initialization happens in a new clone of an existing repo. Previously, git-annex init had to be explicitly run to enable them. That was a bit of a wart of a special case for users to need to keep in mind. Special remotes cannot display anything when autoenabled this way, to avoid interfering with the output of git-annex query commands. Any error messages will be hidden, and if it fails, nothing is displayed. The user will realize the remote isn't enabled when they try to use it, and can run git-annex init manually then to try the autoenable again and see what failed. That seems like a reasonable approach, and it's less complicated than communicating something across a pipe in order to display it as a side message. Other reason not to do that is that, if the first command the user runs is one like git-annex find that has machine readable output, any message about autoenable failing would need to not be displayed anyway. So better to not display a failure message ever, for consistency. (Had to split out Remote.List.Util to avoid an import cycle.)	2020-05-27 12:40:35 -04:00
Joey Hess	c108fa16f1	improve error message when download from non-chunked remote fails Avoid "chunk retrieval failed" in this case, when it tries to fall back to using chunks. This commit was sponsored by Ethan Aubin.	2020-05-21 14:44:40 -04:00
Joey Hess	e63dcbf36c	fix embedcreds=yes reversion Fix bug that made enableremote of S3 and webdav remotes, that have embedcreds=yes, fail to set up the embedded creds, so accessing the remotes failed. (Regression introduced in version 7.20200202.7 when reworking all the remote configs to be parsed.) Root problem is that parseEncryptionConfig excludes all other config keys except encryption ones, so it is then unable to find the credPairRemoteField. And since that field is not required to be present, it proceeds as if it's not, rather than failing in any visible way. This causes it to not find any creds, and so it does not cache them. When the S3 remote tries to make a S3 connection, it finds no creds, so assumes it's being used in no-creds mode, and tries to find a public url. With no public url available, it fails, but the failure doesn't say a lack of creds is the problem. Fix is to provide setRemoteCredPair with a ParsedRemoteConfig, so the full set of configs of the remote can be parsed. A bit annoying to need to parse the remote config before the full config (as returned by setRemoteCredPair) is available, but this avoids the problem. I assume webdav also had the problem by inspection, but didn't try to reproduce it with it. Also, getRemoteCredPair used getRemoteConfigValue to get a ProposedAccepted String, but that does not seem right. Now that it runs that code, it crashed saying it had just a String. Remotes that have already been enableremoted, and so lack the cached creds file will work after this fix, because getRemoteCredPair will extract the creds from the remote config, writing the missing file. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-21 14:35:30 -04:00
Joey Hess	6361074174	convert renameExport to throw exception Finishes the transition to make remote methods throw exceptions, rather than silently hide them. A bit on the fence about this one, because when renameExport fails, it falls back to deleting instead, and so does the user care why it failed? However, it did let me clean up several places in the code. This commit was sponsored by Ethan Aubin.	2020-05-15 15:08:09 -04:00
Joey Hess	037440ef36	convert removeExportDirectory to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 14:43:18 -04:00
Joey Hess	cdbfaae706	change removeExport to throw exception Part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. This commit was sponsored by Graham Spencer on Patreon.	2020-05-15 14:15:14 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	4814b444dd	make storeExport throw exceptions	2020-05-15 12:20:02 -04:00
Joey Hess	4be94c67c7	make removeKey throw exceptions	2020-05-14 14:11:05 -04:00
Joey Hess	d9c7f81ba4	make retrieveKeyFile and retrieveKeyFileCheap throw exceptions Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw an exception when a remote doesn't support it.	2020-05-13 17:07:07 -04:00
Joey Hess	c1cd402081	make storeKey throw exceptions When storing content on remote fails, always display a reason why. Since the Storer used by special remotes already did, this mostly affects git remotes, but not entirely. For example, if git-lfs failed to connect to the endpoint, it used to silently return False.	2020-05-13 14:03:00 -04:00
Joey Hess	b50ee9cd0c	remove Preparer abstraction That had almost no benefit at all, and complicated things quite a lot. What I probably wanted this to be was something like ResourceT, but it was not. The few remotes that actually need some preparation done only once and reused used a MVar and not Preparer.	2020-05-13 11:56:21 -04:00
Joey Hess	be5caeaf51	catch more exceptions Just in case a non-IO exception might somehow be thrown.	2020-05-12 13:05:06 -04:00
Joey Hess	5f5170b22b	remove SafeFilePath Move sanitizeFilePath call to where fromSafeFilePath had been.	2020-05-11 14:04:56 -04:00
Joey Hess	69e2e4763e	only check --force at init time, not enable time git-lfs repos that encrypt the annexed content but not the git repo only need --force passed to initremote, allow enableremote and autoenable of such remotes without forcing again. Needing --force again particularly made autoenable of such a repo not work. And once such a repo has been set up, it seems a second --force when enabling it elsewhere has little added value. It does tell the user about the possibly insecure configuration, but if the git repo has already been pushed to that remote in the clear, data has already been exposed. The goal of that --force was not to prevent every situation where such an exposure can happen -- anyone who sets up a public git repo and pushes to it will expose things similarly and git-annex is not involved. Instead, the purpose of the --force is to point out to the user that they're asking for a configuration where encryption is inconsistently applied.	2020-05-07 15:59:29 -04:00
Joey Hess	1532d67c3e	S3: Support signature=v4 To use S3 Signature Version 4. Some S3 services seem to require v4, while others may only support v2, which remains the default. I'm also not sure if v4 works correctly in all cases, there is this upstream bug report: https://github.com/aristidb/aws/issues/262 I've only tested it against the default S3 endpoint.	2020-05-07 13:18:11 -04:00
Joey Hess	f9ed30de3b	avoid beware of the leopard situation * Display a warning message when a remote uses a protocol, such as git://, that git-annex does not support. Silently skipping such a remote was confusing behavior. It sets annex-ignore, so the warning is only displayed once. * Also display a warning message when a remote, without a known uuid, is located in a directory that does not currently exist, to avoid silently skipping such a remote. This is a bit more debatable, since git-annex get will say, try making repository available. And since it does not set annex-ignore, the warning will be displayed repeatedly. It's also an extreme edge case, I don't think I've ever seen it happen in real life.	2020-05-04 13:01:11 -04:00
Joey Hess	2aeb79249b	external: stop storing readonly=true in remote.log readonly=true is used to make an external special remote that does not need the external program to be installed. It was stored in the remote.log by default, and so every time it was specified in an enableremote or initremote, whatever value was used became the new default for subsequent enableremotes of that remote. That was surprising, and I consider it to be a bug. It does not make much sense to pass it to initremote because then how would you populate that remote with anything? You would have to enableremote elsewhere, and store content there. I'm assuming nobody used it that way. Someone might rely on passing it to enableremote once, and then that being inherited in other clones. But that is not how it's documented to be used. It is barely documented in git-annex at all, only in the external special remote protocol, and the documentation there says to "Document that this external special remote can be used in readonly mode." (by the user of it passing readonly=true to enableremote). The one external special remote that I know of that does document that is <https://github.com/bgilbert/gcsannex> (the one that motivated adding it). That one's docs do say to pass it to enableremote. So, it seemed safe to make this behavior change. If someone was in fact relying on one of those behaviors, all their current repos will still work as they configured them (although they will need to deal with the related change in `9f3c2dfeda`). In new clones, they will find enableremote fails, complaining the external program is not in path. An easy enough problem to recover from.	2020-04-23 15:21:26 -04:00
Joey Hess	9f3c2dfeda	stop using remote.name.annex-readonly for two distinct things	2020-04-23 14:56:03 -04:00
Joey Hess	cd1676d604	fix bug involving local git remote and out of date location log get --from, move --from: When used with a local git remote, these used to silently skip files that the location log thought were present on the remote, when the remote actually no longer contained them. Since that behavior could be surprising, now instead display a warning. I got very confused when I encountered this behavior, since it was silently skipping a file I needed that whereis said was on the remote. get without --from already displayed a "unable to access these remotes" message, which while a bit misleading in that the remote is likely accessible, but just doesn't contain the file, at least indicated something went wrong. Having get --from display a warning makes it in line with get w/o --from, so seems certainly ok. It might be there are situations where move --from is used, on eg a whole directory, and the user only wants to move whatever is present in the remote, and is perfectly ok with files that are not present being skipped. So I'm less sure about the new warning being ok there. OTOH, only local git remotes avoided displaying a warning in that case too, so this just brings them into line with other remotes. (Also note that this makes it a little bit faster when dealing with a lot of files, since it avoids a redundant stat of the file.)	2020-04-21 12:36:58 -04:00
Joey Hess	529f488ec4	fix a thundering herd problem Avoid repeatedly opening keys db when accessing a local git remote and -J is used. What was happening was that Remote.Git.onLocal created a new annex state as each thread started up. The way the MVar was used did not prevent that. And that, in turn, led to repeated opening of the keys db, as well as probably other extra work or resource use. Also managed to get rid of Annex.remoteannexstate, and it turned out there was an unnecessary Maybe in the keysdbhandle, since the handle starts out closed.	2020-04-17 17:09:29 -04:00
Joey Hess	f85ca7dc80	fix all remaining -Wincomplete-uni-patterns warnings A couple of these were probably actual bugs in edge cases. Most of the changes I'm fine with. The fact that aeson's object returns something that we know will be an Object, but the type checker does not know is kind of annoying.	2020-04-15 13:55:08 -04:00
Joey Hess	ca9c6c5f60	Fix a potential failure to parse git config Git has an obnoxious special case in git config, a line "foo" is the same as "foo = true". That means there is no way to examine the output of git config and tell if it was run with --null or not, since a "foo" in the first line could be such a boolean, or could be followed by its value on the next line if --null were used. So, rather than trying to do such a detection, track the style of config at all the points where it's generated.	2020-04-13 13:05:41 -04:00
Joey Hess	7ebc118776	adb: Better messages when the adb command is not installed After a user completely ignored the display of the exception probably because it didn't make sense.. This does make it a little bit slower since it checks adb is in path each time before running it. Also, it might display a lot of warnings about it not being installed. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-04-02 10:48:28 -04:00
Joey Hess	4b92bbe8d7	webdav: Made exporttree remotes faster by caching connection to the server Followed example of Remote.S3.	2020-03-20 12:48:43 -04:00
Joey Hess	a9d56a1abd	fix builds build	2020-03-10 13:50:46 -04:00
Joey Hess	6a91471923	GETCONFIG name fix Fix regression that prevented external special remotes from using GETCONFIG to query values like "name". (Introduced in version 7.20200202.7.)	2020-03-09 12:38:04 -04:00
Joey Hess	7f992ef59c	mostly finished with createDirectoryUnder conversion Remaining things needing converted are in the assistant, and Annex.Ssh. Every other remaining call to createDirectoryIfMissing True has been audited and is not relevant. The ones in Build/ of course don't get included in the program. Others included eg, Remote.Tahoe and Config.Files which both write to dotfiles under the home directory.	2020-03-06 11:57:15 -04:00
Joey Hess	6d58ca94d6	some easy createDirectoryUnder conversions	2020-03-05 15:20:10 -04:00
Joey Hess	ccd8c43dc8	git-annex config: guard against non-repo-global configs git-annex config: Only allow configs be set that are ones git-annex actually supports reading from repo-global config, to avoid confused users trying to set other configs with this.	2020-03-02 15:54:18 -04:00
Joey Hess	2366e7fb84	catch whereisKey exception and provide error messages when external programs neglect to * whereis: If a remote fails to report on urls where a key is located, display a warning, rather than giving up and not displaying any information. * When external special remotes fail but neglect to provide an error message, say what request failed, which is better than displaying an empty error message to the user.	2020-02-27 14:09:18 -04:00
Joey Hess	81e3faf810	Merge branch 'v7'	2020-02-26 18:15:18 -04:00
Joey Hess	e535da621c	Bugfix to getting content from an export remote with -J, when the export database was not yet populated. (cherry picked from commit `e520341500`)	2020-02-26 18:07:20 -04:00
Joey Hess	8af6d2c3c5	fix encryption of content to gcrypt and git-lfs Fix serious regression in gcrypt and encrypted git-lfs remotes. Since version 7.20200202.7, git-annex incorrectly stored content on those remotes without encrypting it. Problem was, Remote.Git enumerates all git remotes, including git-lfs and gcrypt. It then dispatches to those. So, Remote.List used the RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt, and that parser does not know about encryption fields, so did not include them in the ParsedRemoteConfig. (Also didn't include other fields specific to those remotes, perhaps chunking etc also didn't get through.) To fix, had to move RemoteConfig parsing down into the generate methods of each remote, rather than doing it in Remote.List. And a consequence of that was that ParsedRemoteConfig had to change to include the RemoteConfig that got parsed, so that testremote can generate a new remote based on an existing remote. (I would have rather fixed this just inside Remote.Git, but that was not practical, at least not w/o re-doing work that Remote.List already did. Big ugly mostly mechanical patch seemed preferable to making git-annex slower.)	2020-02-26 18:05:36 -04:00
Joey Hess	9050788b66	info: Fix display of the encryption value. (Some debugging junk had crept in.)	2020-02-26 15:02:23 -04:00
Joey Hess	e520341500	Bugfix to getting content from an export remote with -J, when the export database was not yet populated.	2020-02-26 14:57:29 -04:00
Joey Hess	67476fbc54	minor code simplification	2020-02-25 13:06:09 -04:00
Joey Hess	79a0435b77	automate remote.name.skipFetchAll initremote, enableremote: Set remote.name.skipFetchAll when the remote cannot be fetched from by git, so git fetch --all will not try to use it.	2020-02-19 13:58:26 -04:00

1446 commits