git-annex

Author	SHA1	Message	Date
Joey Hess	963239da5c	separate RemoteConfig parsing basically working Many special remotes are not updated yet and are commented out.	2020-01-14 12:35:08 -04:00
Joey Hess	71f78fe45d	wip separate RemoteConfig parsing Remote now contains a ParsedRemoteConfig. The parsing happens when the Remote is constructed, rather than when individual configs are used. This is more efficient, and it lets initremote/enableremote reject configs that have unknown fields or unparsable values. It also allows for improved type safety, as shown in Remote.Helper.Encryptable where things that used to match on string configs now match on data types. This is a work in progress, it does not build yet. The main risk in this conversion is forgetting to add a field to RemoteConfigParser. That will prevent using that field with initremote/enableremote, and will prevent remotes that already are set up from seeing that configuration. So will need to check carefully that every field that getRemoteConfigValue is called on has been added to RemoteConfigParser. (One such case I need to remember is that credPairRemoteField needs to be included in the RemoteConfigParser.)	2020-01-13 12:39:21 -04:00
Joey Hess	71ecfbfccf	be stricter about rejecting invalid configurations for remotes This is a first step toward that goal, using the ProposedAccepted type in RemoteConfig lets initremote/enableremote reject bad parameters that were passed in a remote's configuration, while avoiding enableremote rejecting bad parameters that have already been stored in remote.log This does not eliminate every place where a remote config is parsed and a default value is used if the parse false. But, I did fix several things that expected foo=yes/no and so confusingly accepted foo=true but treated it like foo=no. There are still some fields that are parsed with yesNo but not not checked when initializing a remote, and there are other fields that are parsed in other ways and not checked when initializing a remote. This also lays groundwork for rejecting unknown/typoed config keys.	2020-01-10 14:52:48 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	df5b0ffab3	inherit other fields I think this is all that need to be inherited.	2019-10-10 16:11:21 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	96aba8eff7	Revert "cache the serialization of a Key" This reverts commit `4536c93bb2`. That broke Read/Show of a Key, and unfortunately Key is read in at least one place; the GitAnnexDistribution data type. It would be worth bringing this optimisation back, but it would need either a custom Read/Show instance that preserves back-compat, or wrapping Key in a data type that contains the serialization, or changing how GitAnnexDistribution is serialized. Also, the Eq instance would need to compare keys with and without a cached seralization the same.	2019-01-16 16:21:59 -04:00
Joey Hess	4536c93bb2	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. It means that every place a Key has any of its fields changed, the cache has to be dropped. I've grepped and found them all. But, it would be better to avoid that gotcha somehow..	2019-01-14 16:37:28 -04:00
Joey Hess	2542fb58ed	fix giveup shadowing	2016-11-16 00:28:10 -04:00
Joey Hess	0a4479b8ec	Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors. ghc 8 added backtraces on uncaught errors. This is great, but git-annex was using error in many places for a error message targeted at the user, in some known problem case. A backtrace only confuses such a message, so omit it. Notably, commands like git annex drop that failed due to eg, numcopies, used to use error, so had a backtrace. This commit was sponsored by Ethan Aubin.	2016-11-15 21:29:54 -04:00
Joey Hess	b890f3a53d	Fix bug that prevented resuming of uploads to encrypted special remotes that used chunking. This bug could also expose the names of keys to such remotes. This is a low-severity security hole.	2016-04-27 12:54:43 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	6dad09a823	disable whereisKey for encrypted or chunked remotes This only makes sense for public repos, that are not chunked, so that there's a 1:1 from Key in the git-annex repo to file on the remote. Rather than making every remote implementation deal with that, just disable whereisKey when it doesn't make sense.	2015-08-19 14:16:01 -04:00
Joey Hess	addc82dab7	removed all uses of undefined from code base It's a code smell, can lead to hard to diagnose error messages.	2015-04-19 00:38:29 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	4f657aa14e	add getFileSize, which can get the real size of a large file on Windows Avoid using fileSize which maxes out at just 2 gb on Windows. Instead, use hFileSize, which doesn't have a bounded size. Fixes support for files > 2 gb on Windows. Note that the InodeCache code only needs to compare a file size, so it doesn't matter it the file size wraps. So it has been left as-is. This was necessary both to avoid invalidating existing inode caches, and because the code passed FileStatus around and would have become more expensive if it called getFileSize. This commit was sponsored by Christian Dietrich.	2015-01-20 17:09:24 -04:00
Joey Hess	a0297915c1	add per-remote-type info Now `git annex info $remote` shows info specific to the type of the remote, for example, it shows the rsync url. Remote types that support encryption or chunking also include that in their info. This commit was sponsored by Ævar Arnfjörð Bjarmason.	2014-10-21 14:36:09 -04:00
Joey Hess	7b50b3c057	fix some mixed space+tab indentation This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.	2014-10-09 15:09:11 -04:00
Joey Hess	6adbd50cd9	testremote: Add testing of behavior when remote is not available Added a mkUnavailable method, which a Remote can use to generate a version of itself that is not available. Implemented for several, but not yet all remotes. This allows testing that checkPresent properly throws an exceptions when it cannot check if a key is present or not. It also allows testing that the other methods don't throw exceptions in these circumstances. This immediately found several bugs, which this commit also fixes! * git remotes using ssh accidentially had checkPresent return an exception, rather than throwing it * The chunking code accidentially returned False rather than propigating an exception when there were no chunks and checkPresent threw an exception for the non-chunked key. This commit was sponsored by Carlo Matteo Capocasa.	2014-08-10 15:02:59 -04:00
Joey Hess	c784ef4586	unify exception handling into Utility.Exception Removed old extensible-exceptions, only needed for very old ghc. Made webdav use Utility.Exception, to work after some changes in DAV's exception handling. Removed Annex.Exception. Mostly this was trivial, but note that tryAnnex is replaced with tryNonAsync and catchAnnex replaced with catchNonAsync. In theory that could be a behavior change, since the former caught all exceptions, and the latter don't catch async exceptions. However, in practice, nothing in the Annex monad uses async exceptions. Grepping for throwTo and killThread only find stuff in the assistant, which does not seem related. Command.Add.undo is changed to accept a SomeException, and things that use it for rollback now catch non-async exceptions, rather than only IOExceptions.	2014-08-07 22:03:29 -04:00
Joey Hess	b4cf22a388	pushed checkPresent exception handling out of Remote implementations I tend to prefer moving toward explicit exception handling, not away from it, but in this case, I think there are good reasons to let checkPresent throw exceptions: 1. They can all be caught in one place (Remote.hasKey), and we know every possible exception is caught there now, which we didn't before. 2. It simplified the code of the Remotes. I think it makes sense for Remotes to be able to be implemented without needing to worry about catching exceptions inside them. (Mostly.) 3. Types.StoreRetrieve.Preparer can only work on things that return a Bool, which all the other relevant remote methods already did. I do not see a good way to generalize that type; my previous attempts failed miserably.	2014-08-06 13:45:19 -04:00
Joey Hess	b3fe23b552	remove redundant progress meter display code specialRemote handles all meter display, so this is redundant.	2014-08-03 16:18:40 -04:00
Joey Hess	4b16989e98	roll ChunkedEncryptable into Special and improve interface Allow disabling progress displays, for eg, rsync.	2014-08-03 15:40:01 -04:00
Joey Hess	00f92a7e59	whitespace	2014-08-03 01:21:38 -04:00
Joey Hess	de0da0aece	minor optimisation	2014-08-01 17:18:39 -04:00
Joey Hess	3991327d09	testremote: Test retrieveKeyFile resume And fixed a bug found by these tests; retrieveKeyFile would fail when the dest file was already complete. This commit was sponsored by Bradley Unterrheiner.	2014-08-01 17:16:20 -04:00
Joey Hess	9636cfd9e1	fix a fenchpost bug when resuming chunked store at end Discovered thanks to testremote command!	2014-08-01 16:29:39 -04:00
Joey Hess	8fce4e4bd7	fix chunk=0 Found by testremote	2014-08-01 15:36:11 -04:00
Joey Hess	89416ba2d9	only chunk stable keys The content of unstable keys can potentially be different in different repos, so eg, resuming a chunked upload started by another repo would corrupt data.	2014-07-30 10:34:39 -04:00
Joey Hess	a963d790d3	update progress after each chunk, at least This way, when the remote implementation neglects to update progress, there will still be a somewhat useful progress display, as long as chunks are used.	2014-07-29 20:31:16 -04:00
Joey Hess	53b87a859e	optimise case of remote that retrieves FileContent, when chunks and encryption are not being used No need to read whole FileContent only to write it back out to a file in this case. Can just rename! Yay. Also indidentially, fixed an attempt to open a file for write that was already opened for write, which caused a crash and deadlock.	2014-07-29 20:10:14 -04:00
Joey Hess	c0dc134cde	support chunking for all external special remotes! Removing code and at the same time adding great features, including upload/download resuming. This commit was sponsored by Romain Lenglet.	2014-07-29 18:50:20 -04:00
Joey Hess	bc9e4697b9	better type for Retriever Putting a callback in the Retriever type allows for the callback to remove the retrieved file when it's done with it. I did not really want to make Retriever be fixed to Annex Bool, but when I tried to use Annex a, I got into some type of type mess.	2014-07-29 18:41:41 -04:00
Joey Hess	47e522979c	allow Retriever action to update the progress meter Needed for eg, Remote.External. Generally, any Retriever that stores content in a file is responsible for updating the meter, while ones that procude a lazy bytestring cannot update the meter, so are not asked to.	2014-07-29 17:18:49 -04:00
Joey Hess	1d263e1e7e	lift types from IO to Annex Some remotes like External need to run store and retrieve actions in Annex, not IO. In order to do that lift, I had to dive pretty deep into the utilities, making Utility.Gpg and Utility.Tmp be partly converted to using MonadIO, and Control.Monad.Catch for exception handling. There should be no behavior changes in this commit. This commit was sponsored by Michael Barabanov.	2014-07-29 16:28:44 -04:00
Joey Hess	f5af470875	add ContentSource type, for remotes that act on files rather than ByteStrings Note that currently nothing cleans up a ContentSource's file, when eg, retrieving chunks.	2014-07-29 15:16:12 -04:00
Joey Hess	216fdbd6bd	fix non-checked hasKeyChunks	2014-07-29 15:07:32 -04:00
Joey Hess	58f727afdd	resume interrupted chunked uploads Leverage the new chunked remotes to automatically resume uploads. Sort of like rsync, although of course not as efficient since this needs to start at a chunk boundry. But, unlike rsync, this method will work for S3, WebDAV, external special remotes, etc, etc. Only directory special remotes so far, but many more soon! This implementation will also allow starting an upload from one repository, interrupting it, and then resuming the upload to the same remote from an entirely different repository. Note that I added a comment that storeKey should atomically move the content into place once it's all received. This was already an undocumented requirement -- it's necessary for hasKey to work reliably. This resume code just uses hasKey to find the first chunk that's missing. Note that if there are two uploads of the same key to the same chunked remote, one might resume at the point the other had gotten to, but both will then redundantly upload. As before. In the non-resume case, this adds one hasKey call per storeKey, and only if the remote is configured to use chunks. Future work: Try to eliminate that hasKey. Notice that eg, `git annex copy --to` checks if the key is present before sending it, so is already running hasKey.. which could perhaps be cached and reused. However, this additional overhead is not very large compared with transferring an entire large file, and the ability to resume is certianly worth it. There is an optimisation in place for small files, that avoids trying to resume if the whole file fits within one chunk. This commit was sponsored by Georg Bauer.	2014-07-28 14:35:52 -04:00
Joey Hess	80cc554c82	add ChunkMethod type and make Logs.Chunk use it, rather than assuming fixed size chunks (so eg, rolling hash chunks can be supported later) If a newer git-annex starts logging something else in the chunk log, it won't be used by this version, but it will be preserved when updating the log.	2014-07-28 13:19:08 -04:00
Joey Hess	9d4a766cd7	resume interrupted chunked downloads Leverage the new chunked remotes to automatically resume downloads. Sort of like rsync, although of course not as efficient since this needs to start at a chunk boundry. But, unlike rsync, this method will work for S3, WebDAV, external special remotes, etc, etc. Only directory special remotes so far, but many more soon! This implementation will also properly handle starting a download from one remote, interrupting, and resuming from another one, and so on. (Resuming interrupted chunked uploads is similarly doable, although slightly more expensive.) This commit was sponsored by Thomas Djärv.	2014-07-27 18:56:32 -04:00
Joey Hess	2996f0eb05	use existing chunks even when chunk=0 When chunk=0, always try the unchunked key first. This avoids the overhead of needing to read the git-annex branch to find the chunkcount. However, if the unchunked key is not present, go on and try the chunks. Also, when removing a chunked key, update the chunkcounts even when chunk=0.	2014-07-27 02:13:51 -04:00
Joey Hess	7afb057d60	reorg	2014-07-27 01:24:34 -04:00
Joey Hess	c3af4897c0	faster storeChunks No need to process each L.ByteString chunk, instead ask it to split. Doesn't seem to have really sped things up much, but it also made the code simpler. Note that this does (and already did) buffer in memory. It seems that only the directory special remote could take advantage of streaming chunks to files w/o buffering, so probably won't add an interface to allow for that.	2014-07-27 01:18:38 -04:00
Joey Hess	9a8c4bb21f	improve exception handling Push it down from needing to be done in every Storer, to being checked once inside ChunkedEncryptable. Also, catch exceptions from PrepareStorer and PrepareRetriever, just in case..	2014-07-26 23:26:10 -04:00
Joey Hess	867fd116a7	better exception display	2014-07-26 23:01:44 -04:00
Joey Hess	93be3296fc	fix another fallback bug	2014-07-26 22:47:52 -04:00
Joey Hess	86e8532c0a	allM has slightly better memory use	2014-07-26 22:34:40 -04:00
Joey Hess	67975bf50d	fix fallback to other chunk size when first does not have it	2014-07-26 22:25:50 -04:00
Joey Hess	d4d68f57e5	finish up basic chunked remote groundwork Chunk retrieval and reassembly, removal, and checking if all necessary chunks are present. This commit was sponsored by Damien Raude-Morvan.	2014-07-26 20:11:41 -04:00
Joey Hess	cf83697c33	reorg	2014-07-26 12:04:35 -04:00

1 2

66 commits