git-annex

Author	SHA1	Message	Date
Joey Hess	8b39db20b5	export appendonly support Make `git annex export` check appendonly when removing a file from an export, and not update the location log, since the remote still contains the content. This commit was supported by the NSF-funded DataLad project.	2018-08-30 11:18:20 -04:00
Joey Hess	02630b39ee	add Remote.readonly Does nothing yet. Considered making bup readonly, but while the content can't be removed, it is able to delete a branch, so didn't. This commit was supported by the NSF-funded DataLad project.	2018-08-30 11:12:18 -04:00
Joey Hess	44658d80ef	clarify comment	2018-08-29 10:55:52 -04:00
Joey Hess	1a02fc1159	Fix wrong sorting of remotes when using -J It was sorting by uuid, rather than cost! Avoid future bugs of this kind by changing the Ord to primarily compare by cost, with uuid only used when the cost is the same. This commit was supported by the NSF-funded DataLad project.	2018-08-03 13:10:50 -04:00
Joey Hess	4315bb9e42	add retrievalSecurityPolicy This will be used to protect against CVE-2018-10859, where an encrypted special remote is fed the wrong encrypted data, and so tricked into decrypting something that the user encrypted with their gpg key and did not store in git-annex. It also protects against CVE-2018-10857, where a remote follows a http redirect to a file:// url or to a local private web server. While that's already been prevented in git-annex's own use of http, external special remotes, hooks, etc use other http implementations and could still be vulnerable. The policy is not yet enforced, this commit only adds the appropriate metadata to remotes. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-06-21 11:36:36 -04:00
Joey Hess	67e46229a5	change Remote.repo to Remote.getRepo This is groundwork for letting a repo be instantiated the first time it's actually used, instead of at startup. The only behavior change is that some old special cases for xmpp remotes were removed. Where before git-annex silently did nothing with those no-longer supported remotes, it may now fail in some way. The additional IO action should have no performance impact as long as it's simply return. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon	2018-06-04 15:30:26 -04:00
Joey Hess	4015c5679a	force verification when resuming download When resuming a download and not using a rolling checksummer like rsync, the partial file we start with might contain garbage, in the case where a file changed as it was being downloaded. So, disabling verification on resumes risked a bad object being put into the annex. Even downloads with rsync are currently affected. It didn't seem worth the added complexity to special case those to prevent verification, especially since git-annex is using rsync less often now. This commit was sponsored by Brock Spratlen on Patreon.	2018-03-13 14:50:49 -04:00
Joey Hess	31e1adc005	deal with unlocked files P2P protocol version 1 adds VALID\|INVALID after DATA; INVALID means the file was detected to change content while it was being sent and so we may not have received the valid content of the file. Added new MustVerify constructor for Verification, which forces verification even when annex.verify=false etc. This is used when INVALID and in protocol version 0. As well as changing git-annex-shell p2psdio, this makes git-annex tor remotes always force verification, since they don't yet use protocol version 1. Previously, annex.verify=false could skip verification when using tor remotes, and let bad data into the repository. This commit was sponsored by Jack Hill on Patreon.	2018-03-13 14:27:14 -04:00
Joey Hess	e1f5c90c92	split out Types.Export	2017-09-15 16:46:03 -04:00
Joey Hess	e54a05612e	avoid unncessary db queries when exported directory can't be empty In rename foo/bar to foo/baz, foo can't be empty. In delete zxyyz, there's no exported directory (top doesn't count).	2017-09-15 16:30:49 -04:00
Joey Hess	c633144d28	remove empty directories when removing from export The subtle part of this is what happens when the remote fails to remove an empty directory. The removal from the export needs to fail in that case, so the removal will be tried again later. However, removeExportLocation has already been run and changed the export db, so if the next run checks getExportLocation, it might decide nothing remains to be done, leaving the empty directory. Dealt with that by making removeEmptyDirectories, handle a failure by calling addExportLocation, reverting the database changes so the next run will be guaranteed to try deleting the empty directory again. This commit was sponsored by Thomas Hochstein on Patreon.	2017-09-15 15:22:53 -04:00
Joey Hess	9f4ffe65e9	implement removeExportDirectory Not yet called by Command.Export. WebDAV needs this to clean up empty collections. Also, example.sh turned out to not be cleaning up directories when removing content from them, so it made sense for it to use this. Remote.Directory did not need it, and since its cleanup method for empty directories is more efficient than what Command.Export will need to do to find empty directories, it uses Nothing so that extra work can be avoided. This commit was sponsored by Thom May on Patreon.	2017-09-15 13:18:21 -04:00
Joey Hess	9c3622882b	export: cache connections for S3 and webdav	2017-09-12 16:59:04 -04:00
Joey Hess	a1b195d84c	External special remote protocol extended to support export. Also updated example.sh to support export. This commit was supported by the NSF-funded DataLad project.	2017-09-08 14:24:05 -04:00
Joey Hess	16eb2f976c	prevent exporttree=yes on remotes that don't support exports Don't allow "exporttree=yes" to be set when the special remote does not support exports. That would be confusing since the user would set up a special remote for exports, but `git annex export` to it would later fail. This commit was supported by the NSF-funded DataLad project.	2017-09-07 13:48:44 -04:00
Joey Hess	cae3704a44	export file renaming This is seriously super hairy. It has to handle interrupted exports, which may be resumed with the same or a different tree. It also has to recover from export conflicts, which could cause the wrong content to be renamed to a file. I think this works, or is close to working. See the update to the design for how it works. This is definitely not optimal, in that it does more renames than are necessary. It would probably be worth finding the keys that are really renamed and only renaming those. But let's get the "simple" approach to work first.. This commit was supported by the NSF-funded DataLad project.	2017-09-06 15:44:10 -04:00
Joey Hess	662f2a5ee7	git annex get from exports Straightforward enough, except for the needed belt-and-suspenders sanity checks to avoid foot shooting due to exports not being key/value stores. * Even when annex.verify=false, always verify from exports. * Only get files from exports that use a backend that supports checksum verification. * Never trust exports, even if the user says to, because then `git annex drop` would drop content if the export seemed to contain a copy. This commit was supported by the NSF-funded DataLad project.	2017-09-04 16:39:56 -04:00
Joey Hess	4da763439b	use export db to correctly handle duplicate files Removed uncorrect UniqueKey key in db schema; a key can appear multiple times with different files. The database has to be flushed after each removal. But when adding files to the export, lots of changes are able to be queued up w/o flushing. So it's still fairly efficient. If large removals of files from exports are too slow, an alternative would be to make two passes over the diff, one pass queueing deletions from the database, then a flush and the a second pass updating the location log. But that would use more memory, and need to look up exportKey twice per removed file, so I've avoided such optimisation yet. This commit was supported by the NSF-funded DataLad project.	2017-09-04 14:39:32 -04:00
Joey Hess	28e2cad849	implement exporttree=yes configuration * Only export to remotes that were initialized to support it. * Prevent storing key/value on export remotes. * Prevent enabling exporttree=yes and encryption in the same remote. SetupStage Enable was changed to take the old RemoteConfig. This allowed only setting exporttree when initially setting up a remote, and not configuring it later after stuff might already be stored in the remote. Went with =yes rather than =true for consistency with other parts of git-annex. Changed docs accordingly. This commit was supported by the NSF-funded DataLad project.	2017-09-04 13:09:38 -04:00
Joey Hess	a4328b49d2	refactor ExportActions This will allow disabling exports for remotes that are not configured to allow them. Also, exportSupported will be useful for the external special remote to probe. This commit was supported by the NSF-funded DataLad project	2017-09-01 13:05:09 -04:00
Joey Hess	bb08b1abd2	make storeExport atomic This avoids needing to deal with the complexity of partially transferred files in the export. We'd not be able to resume uploading to such a file anyway, so just avoid them. The implementation in Remote.Directory is not completely ideal, because it could leave the temp file hanging around in the export directory. This only happens if it's killed with -9, or there's a power failure; normally viaTmp cleans up after itself, even when interrupted. I could not see a better way to do it though, since the export directory might be the root of a filesystem. Also some design thoughts on resuming, which depend on storeExport being atomic. This commit was sponsored by Fernando Jimenez on Partreon.	2017-08-31 14:24:32 -04:00
Joey Hess	cca2764f91	provide file with content to export Rather than providing the key to export, provide the file. When exporting a treeish that contains files that are not annexed, this will let the content of those files also be exported. There's still a Key in the interface; it will be used by the external special remote protocol. A SHA1 key can be used when exporting non-annexed files. This commit was sponsored by Brock Spratlen on Patreon.	2017-08-29 13:57:42 -04:00
Joey Hess	e55e445a36	add API for exporting Implemented so far for the directory special remote. Several remotes don't make sense to export to. Regular Git remotes, obviously, do not. Bup remotes almost certianly do not, since bup would need to be used to extract the export; same store for Ddar. Web and Bittorrent are download-only. GCrypt is always encrypted so exporting to it would be pointless. There's probably no point complicating the Hook remotes with exporting at this point. External, S3, Glacier, WebDAV, Rsync, and possibly Tahoe should be modified to support export. Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey interface, rather than adding a new interface. But, it seemed better to keep it separate, to avoid a complicated interface that sometimes encrypts/chunks key/value storage and sometimes users non-key/value storage. Any common parts can be factored out. Note that storeExport is not atomic. doc/design/exporting_trees_to_special_remotes.mdwn has some things in the "resuming exports" section that bear on this decision. Basically, I don't think, at this time, that an atomic storeExport would help with resuming, because exports are not key/value storage, and we can't be sure that a partially uploaded file is the same content we're currently trying to export. Also, note that ExportLocation will always use unix path separators. This is important, because users may export from a mix of windows and unix, and it avoids complicating the API with path conversions, and ensures that in such a mix, they always use the same locations for exports. This commit was sponsored by Bruno BEAUFILS on Patreon.	2017-08-29 13:00:41 -04:00
Joey Hess	5c804cf42e	add SetupStage parameter to RemoteType.setup Most remotes have an idempotent setup that can be reused for enableremote, but in a few cases, it needs to tell which, and whether a UUID was provided to setup was used. This is groundwork for making initremote be able to provide a UUID. It should not change any behavior. Note that it would be nice to make the UUID always be provided to setup, and make setup not need to generate and return a UUID. What prevented this simplification is Remote.Git.gitSetup, which needs to reuse the UUID of the git remote when setting it up, and so has to return that UUID. This commit was sponsored by Thom May on Patreon.	2017-02-07 14:55:58 -04:00
Joey Hess	91df4c6b53	Pass the various gnupg-options configs to gpg in several cases where they were not before. Removed the instance LensGpgEncParams RemoteConfig because it encouraged code that does not take the RemoteGitConfig into account. RemoteType's setup was changed to take a RemoteGitConfig, although the only place that is able to provide a non-empty one is enableremote, when it's changing an existing remote. This led to several folow-on changes, and got RemoteGitConfig plumbed through.	2016-05-23 17:03:20 -04:00
Joey Hess	4c6095b6f5	content locking during drop working for local git remotes Only ssh remotes lack locking now	2015-10-09 13:12:58 -04:00
Joey Hess	ceb5819538	finish and use lockContent interface	2015-10-09 12:36:04 -04:00
Joey Hess	c75c79864d	support invalidating existing VerifiedCopys	2015-10-08 17:58:32 -04:00
Joey Hess	b1abe59193	add removeKey action to Remote Not implemented for any remotes yet; probably the git remote is the only one that will ever implement it.	2015-10-08 15:01:38 -04:00
Joey Hess	2def1d0a23	other 80% of avoding verification when hard linking to objects in shared repo In `c6632ee5c8`, it actually only handled uploading objects to a shared repository. To avoid verification when downloading objects from a shared repository, was a lot harder. On the plus side, if the process of downloading a file from a remote is able to verify its content on the side, the remote can indicate this now, and avoid the extra post-download verification. As of yet, I don't have any remotes (except Git) using this ability. Some more work would be needed to support it in special remotes. It would make sense for tahoe to implicitly verify things downloaded from it; as long as you trust your tahoe server (which typically runs locally), there's cryptographic integrity. OTOH, despite bup being based on shas, a bup repo under an attacker's control could have the git ref used for an object changed, and so a bup repo shouldn't implicitly verify. Indeed, tahoe seems unique in being trustworthy enough to implicitly verify.	2015-10-02 14:35:12 -04:00
Joey Hess	c5b8484c2e	Simplify setup process for a ssh remote. Now it suffices to run git remote add, followed by git-annex sync. Now the remote is automatically initialized for use by git-annex, where before the git-annex branch had to manually be pushed before using git-annex sync. Note that this involved changes to git-annex-shell, so if the remote is using an old version, the manual push is still needed. Implementation required git-annex-shell be changed, so configlist can autoinit a repository even when no git-annex branch has been pushed yet. Unfortunate because we'll have to wait for it to get deployed to servers before being able to rely on this change in the documentation. Did consider making git-annex sync push the git-annex branch to repos that didn't have a uuid, but this seemed difficult to do without complicating it in messy ways. It would be cleaner to split a command out from configlist to handle the initialization. But this is difficult without sacrificing backwards compatability, for users of old git-annex versions which would not use the new command.	2015-08-05 13:49:58 -04:00
Joey Hess	e27b97d364	Merge branch 'master' into concurrentprogress Conflicts: Command/Fsck.hs Messages.hs Remote/Directory.hs Remote/Git.hs Remote/Helper/Special.hs Types/Remote.hs debian/changelog git-annex.cabal	2015-05-12 13:23:22 -04:00
Joey Hess	234830b5c9	comment	2015-04-18 13:07:57 -04:00
Joey Hess	a2902cdaaf	add filename to progress bar, and display ok/failed at end This needed plumbing an AssociatedFile through retrieveKeyFileCheap.	2015-04-14 16:35:10 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	2cd84fcc8b	Expand checkurl to support recommended filename, and multi-file-urls This commit was sponsored by an anonymous bitcoiner.	2014-12-11 15:33:42 -04:00
Joey Hess	7ae16bb6f7	Revert "let url claims optionally include a suggested filename" This reverts commit `85df9c30e9`. Putting filename in the claim was a bad idea.	2014-12-11 14:09:57 -04:00
Joey Hess	85df9c30e9	let url claims optionally include a suggested filename	2014-12-11 12:47:57 -04:00
Joey Hess	30bf112185	Urls can now be claimed by remotes. This will allow creating, for example, a external special remote that handles magnet: and *.torrent urls.	2014-12-08 19:15:07 -04:00
Joey Hess	ee27298b91	implement CLAIMURL for external special remote	2014-12-08 13:57:13 -04:00
Joey Hess	cb6e16947d	add stub claimUrl	2014-12-08 13:40:15 -04:00
Joey Hess	a0297915c1	add per-remote-type info Now `git annex info $remote` shows info specific to the type of the remote, for example, it shows the rsync url. Remote types that support encryption or chunking also include that in their info. This commit was sponsored by Ævar Arnfjörð Bjarmason.	2014-10-21 14:36:09 -04:00
Joey Hess	6adbd50cd9	testremote: Add testing of behavior when remote is not available Added a mkUnavailable method, which a Remote can use to generate a version of itself that is not available. Implemented for several, but not yet all remotes. This allows testing that checkPresent properly throws an exceptions when it cannot check if a key is present or not. It also allows testing that the other methods don't throw exceptions in these circumstances. This immediately found several bugs, which this commit also fixes! * git remotes using ssh accidentially had checkPresent return an exception, rather than throwing it * The chunking code accidentially returned False rather than propigating an exception when there were no chunks and checkPresent threw an exception for the non-chunked key. This commit was sponsored by Carlo Matteo Capocasa.	2014-08-10 15:02:59 -04:00
Joey Hess	b4cf22a388	pushed checkPresent exception handling out of Remote implementations I tend to prefer moving toward explicit exception handling, not away from it, but in this case, I think there are good reasons to let checkPresent throw exceptions: 1. They can all be caught in one place (Remote.hasKey), and we know every possible exception is caught there now, which we didn't before. 2. It simplified the code of the Remotes. I think it makes sense for Remotes to be able to be implemented without needing to worry about catching exceptions inside them. (Mostly.) 3. Types.StoreRetrieve.Preparer can only work on things that return a Bool, which all the other relevant remote methods already did. I do not see a good way to generalize that type; my previous attempts failed miserably.	2014-08-06 13:45:19 -04:00
Joey Hess	58f727afdd	resume interrupted chunked uploads Leverage the new chunked remotes to automatically resume uploads. Sort of like rsync, although of course not as efficient since this needs to start at a chunk boundry. But, unlike rsync, this method will work for S3, WebDAV, external special remotes, etc, etc. Only directory special remotes so far, but many more soon! This implementation will also allow starting an upload from one repository, interrupting it, and then resuming the upload to the same remote from an entirely different repository. Note that I added a comment that storeKey should atomically move the content into place once it's all received. This was already an undocumented requirement -- it's necessary for hasKey to work reliably. This resume code just uses hasKey to find the first chunk that's missing. Note that if there are two uploads of the same key to the same chunked remote, one might resume at the point the other had gotten to, but both will then redundantly upload. As before. In the non-resume case, this adds one hasKey call per storeKey, and only if the remote is configured to use chunks. Future work: Try to eliminate that hasKey. Notice that eg, `git annex copy --to` checks if the key is present before sending it, so is already running hasKey.. which could perhaps be cached and reused. However, this additional overhead is not very large compared with transferring an entire large file, and the ability to resume is certianly worth it. There is an optimisation in place for small files, that avoids trying to resume if the whole file fits within one chunk. This commit was sponsored by Georg Bauer.	2014-07-28 14:35:52 -04:00
Joey Hess	153ace4524	fix handling of removal of keys that are not present	2014-07-28 14:14:01 -04:00
Joey Hess	904859d676	wording	2014-07-26 13:25:06 -04:00
Joey Hess	fa24ba2520	plumb creds from webapp to initremote Avoids abusing setting environment variables, which was always a hack and won't work on windows.	2014-02-11 14:07:56 -04:00
Joey Hess	c20f31a1ad	add GETAVAILABILITY to external special remote protocol And some reworking of types, and added an annex-availability git config setting.	2014-01-13 14:41:10 -04:00
Joey Hess	958312885f	webapp: Improve UI around remote that have no annex.uuid set, either because setup of them is incomplete, or because the remote git repository is not a git-annex repository. Complicated by such repositories potentially being repos that should have an annex.uuid, but it failed to be gotten, perhaps due to the past ssh repo setup bugs. This is handled now by an Upgrade Repository button.	2013-11-07 18:02:00 -04:00

1 2

83 commits