git-annex

Author	SHA1	Message	Date
Joey Hess	32e4368377	S3: support chunking The assistant defaults to 1MiB chunk size for new S3 special remotes. Which will work around a couple of bugs: http://git-annex.branchable.com/bugs/S3_memory_leaks/ http://git-annex.branchable.com/bugs/S3_upload_not_using_multipart/	2014-08-02 15:51:58 -04:00
Joey Hess	c3750901d8	specialize Preparer a bit, so resourcePrepare can be added The forall a. in Preparer made resourcePrepare not seem to be usable, so I specialized a to Bool. Which works for both Preparer Storer and Preparer Retriever, but wouldn't let the Preparer be used for hasKey as it currently stands.	2014-08-02 15:34:09 -04:00
Joey Hess	de0da0aece	minor optimisation	2014-08-01 17:18:39 -04:00
Joey Hess	3991327d09	testremote: Test retrieveKeyFile resume And fixed a bug found by these tests; retrieveKeyFile would fail when the dest file was already complete. This commit was sponsored by Bradley Unterrheiner.	2014-08-01 17:16:20 -04:00
Joey Hess	9636cfd9e1	fix a fencepost bug when resuming chunked store at end Discovered thanks to testremote command!	2014-08-01 16:29:39 -04:00
Joey Hess	8fce4e4bd7	fix chunk=0 Found by testremote	2014-08-01 15:36:11 -04:00
Joey Hess	b5ac627fee	WebDAV: Dropped support for DAV before 0.6.1. 0.6.1 is in testing, and stable does not have DAV at all, so I can dispense with this compatibility code	2014-07-30 11:20:35 -04:00
Joey Hess	89416ba2d9	only chunk stable keys The content of unstable keys can potentially be different in different repos, so eg, resuming a chunked upload started by another repo would corrupt data.	2014-07-30 10:34:39 -04:00
Joey Hess	a963d790d3	update progress after each chunk, at least This way, when the remote implementation neglects to update progress, there will still be a somewhat useful progress display, as long as chunks are used.	2014-07-29 20:31:16 -04:00
Joey Hess	444944c7a9	fix cleanup of FileContents once done with them when retrieving	2014-07-29 20:27:13 -04:00
Joey Hess	53b87a859e	optimise case of remote that retrieves FileContent, when chunks and encryption are not being used No need to read whole FileContent only to write it back out to a file in this case. Can just rename! Yay. Also incidentally, fixed an attempt to open a file for write that was already opened for write, which caused a crash and deadlock.	2014-07-29 20:10:14 -04:00
Joey Hess	c0dc134cde	support chunking for all external special remotes! Removing code and at the same time adding great features, including upload/download resuming. This commit was sponsored by Romain Lenglet.	2014-07-29 18:50:20 -04:00
Joey Hess	bc9e4697b9	better type for Retriever Putting a callback in the Retriever type allows for the callback to remove the retrieved file when it's done with it. I did not really want to make Retriever be fixed to Annex Bool, but when I tried to use Annex a, I got into some type of type mess.	2014-07-29 18:41:41 -04:00
Joey Hess	47e522979c	allow Retriever action to update the progress meter Needed for eg, Remote.External. Generally, any Retriever that stores content in a file is responsible for updating the meter, while ones that produce a lazy bytestring cannot update the meter, so are not asked to.	2014-07-29 17:18:49 -04:00
Joey Hess	1d263e1e7e	lift types from IO to Annex Some remotes like External need to run store and retrieve actions in Annex, not IO. In order to do that lift, I had to dive pretty deep into the utilities, making Utility.Gpg and Utility.Tmp be partly converted to using MonadIO, and Control.Monad.Catch for exception handling. There should be no behavior changes in this commit. This commit was sponsored by Michael Barabanov.	2014-07-29 16:28:44 -04:00
Joey Hess	f5af470875	add ContentSource type, for remotes that act on files rather than ByteStrings Note that currently nothing cleans up a ContentSource's file, when eg, retrieving chunks.	2014-07-29 15:16:12 -04:00
Joey Hess	216fdbd6bd	fix non-checked hasKeyChunks	2014-07-29 15:07:32 -04:00
Joey Hess	58f727afdd	resume interrupted chunked uploads Leverage the new chunked remotes to automatically resume uploads. Sort of like rsync, although of course not as efficient since this needs to start at a chunk boundary. But, unlike rsync, this method will work for S3, WebDAV, external special remotes, etc, etc. Only directory special remotes so far, but many more soon! This implementation will also allow starting an upload from one repository, interrupting it, and then resuming the upload to the same remote from an entirely different repository. Note that I added a comment that storeKey should atomically move the content into place once it's all received. This was already an undocumented requirement -- it's necessary for hasKey to work reliably. This resume code just uses hasKey to find the first chunk that's missing. Note that if there are two uploads of the same key to the same chunked remote, one might resume at the point the other had gotten to, but both will then redundantly upload. As before. In the non-resume case, this adds one hasKey call per storeKey, and only if the remote is configured to use chunks. Future work: Try to eliminate that hasKey. Notice that eg, `git annex copy --to` checks if the key is present before sending it, so is already running hasKey.. which could perhaps be cached and reused. However, this additional overhead is not very large compared with transferring an entire large file, and the ability to resume is certainly worth it. There is an optimisation in place for small files, that avoids trying to resume if the whole file fits within one chunk. This commit was sponsored by Georg Bauer.	2014-07-28 14:35:52 -04:00
Joey Hess	153ace4524	fix handling of removal of keys that are not present	2014-07-28 14:14:01 -04:00
Joey Hess	80cc554c82	add ChunkMethod type and make Logs.Chunk use it, rather than assuming fixed size chunks (so eg, rolling hash chunks can be supported later) If a newer git-annex starts logging something else in the chunk log, it won't be used by this version, but it will be preserved when updating the log.	2014-07-28 13:19:08 -04:00
Joey Hess	9d4a766cd7	resume interrupted chunked downloads Leverage the new chunked remotes to automatically resume downloads. Sort of like rsync, although of course not as efficient since this needs to start at a chunk boundary. But, unlike rsync, this method will work for S3, WebDAV, external special remotes, etc, etc. Only directory special remotes so far, but many more soon! This implementation will also properly handle starting a download from one remote, interrupting, and resuming from another one, and so on. (Resuming interrupted chunked uploads is similarly doable, although slightly more expensive.) This commit was sponsored by Thomas Djärv.	2014-07-27 18:56:32 -04:00
Joey Hess	2996f0eb05	use existing chunks even when chunk=0 When chunk=0, always try the unchunked key first. This avoids the overhead of needing to read the git-annex branch to find the chunkcount. However, if the unchunked key is not present, go on and try the chunks. Also, when removing a chunked key, update the chunkcounts even when chunk=0.	2014-07-27 02:13:51 -04:00
Joey Hess	7afb057d60	reorg	2014-07-27 01:24:34 -04:00
Joey Hess	bffd0e34b3	comment typo	2014-07-27 01:22:51 -04:00
Joey Hess	c3af4897c0	faster storeChunks No need to process each L.ByteString chunk, instead ask it to split. Doesn't seem to have really sped things up much, but it also made the code simpler. Note that this does (and already did) buffer in memory. It seems that only the directory special remote could take advantage of streaming chunks to files w/o buffering, so probably won't add an interface to allow for that.	2014-07-27 01:18:38 -04:00
Joey Hess	f3e47b16a5	better Preparer interface This will allow things like WebDAV to open a single persistent connection and reuse it for all the chunked data. The crazy types allow for some nice code reuse.	2014-07-27 00:30:04 -04:00
Joey Hess	9a8c4bb21f	improve exception handling Push it down from needing to be done in every Storer, to being checked once inside ChunkedEncryptable. Also, catch exceptions from PrepareStorer and PrepareRetriever, just in case..	2014-07-26 23:26:10 -04:00
Joey Hess	867fd116a7	better exception display	2014-07-26 23:01:44 -04:00
Joey Hess	0d89b65bfc	fix key checking when a directory special remote's directory is missing The best thing to do in this case is return Left, so that anything that tries to access it will fail.	2014-07-26 22:52:47 -04:00
Joey Hess	93be3296fc	fix another fallback bug	2014-07-26 22:47:52 -04:00
Joey Hess	86e8532c0a	allM has slightly better memory use	2014-07-26 22:34:40 -04:00
Joey Hess	67975bf50d	fix fallback to other chunk size when first does not have it	2014-07-26 22:25:50 -04:00
Joey Hess	adb6ca62ca	fix build	2014-07-26 20:21:36 -04:00
Joey Hess	34c6fdf5e3	fix build	2014-07-26 20:21:10 -04:00
Joey Hess	b2922c1d6d	convert directory special remote to using ChunkedEncryptable And clean up legacy chunking code, which is in its own module now. So much cleaner! This commit was sponsored by Henrik Ahlgren	2014-07-26 20:19:24 -04:00
Joey Hess	1400cbb032	Support for remotes that are chunkable and encryptable. I'd have liked to keep these two concepts entirely separate, but they are entangled: Storing a key in an encrypted and chunked remote needs to generate chunk keys, encrypt the keys, chunk the data, encrypt the chunks, and send them to the remote. Similar for retrieval, etc. So, here's an implementation of all of that. The total win here is that every remote was implementing encrypted storage and retrieval, and now it can move into this single place. I expect this to result in several hundred lines of code being removed from git-annex eventually! This commit was sponsored by Henrik Ahlgren.	2014-07-26 20:14:31 -04:00
Joey Hess	d4d68f57e5	finish up basic chunked remote groundwork Chunk retrieval and reassembly, removal, and checking if all necessary chunks are present. This commit was sponsored by Damien Raude-Morvan.	2014-07-26 20:11:41 -04:00
Joey Hess	cf83697c33	reorg	2014-07-26 12:04:35 -04:00
Joey Hess	e4cb50db33	Merge branch 'master' into newchunks	2014-07-26 12:02:48 -04:00
Joey Hess	005aded3e0	Fix cost calculation for non-encrypted remotes. Encryptable types of remotes that were not actually encrypted still had the encryptedRemoteCostAdj applied to their configured cost, which was a bug.	2014-07-25 17:29:59 -04:00
Joey Hess	9e8a4a0950	support new style chunking in directory special remote Only when storing non-encrypted so far, not retrieving or checking if a key is present or removing. This commit was sponsored by Renaud Casenave-Péré.	2014-07-25 16:21:01 -04:00
Joey Hess	ab4cce4114	core implementation of new style chunking Not yet used by any special remotes, but should not be too hard to add it to most of them. storeChunks is the hairy bit! It's loosely based on Remote.Directory.storeLegacyChunked. The object is read in using a lazy bytestring, which is streamed through, creating chunks as needed, without ever buffering more than 1 chunk in memory. Getting the progress meter update to work right was also fun, since progress meter values are absolute. Finessed by constructing an offset meter. This commit was sponsored by Richard Collins.	2014-07-25 16:20:32 -04:00
Joey Hess	ceea04e77f	move meteredWriteFileChunks out of legacy	2014-07-24 16:42:35 -04:00
Joey Hess	e2c44bf656	implement chunk logs Slightly tricky as they are not normal UUIDBased logs, but are instead maps from (uuid, chunksize) to chunkcount. This commit was sponsored by Frank Thomas.	2014-07-24 16:23:36 -04:00
Joey Hess	bbdb2c04d5	improve chunk data types	2014-07-24 15:08:07 -04:00
Joey Hess	9e2d49d441	prepare for new style chunking Moved old legacy chunking code, and cleaned up the directory and webdav remotes use of it, so when no chunking is configured, that code is not used. The config for new style chunking will be chunk=1M instead of chunksize=1M. There should be no behavior changes from this commit. This commit was sponsored by Andreas Laas.	2014-07-24 14:49:22 -04:00
Joey Hess	ec5ed2af9d	Set gcrypt-publish-participants when setting up a gcrypt repository, to avoid unnecessary passphrase prompts. This is a security/usability tradeoff. To avoid exposing the gpg key ids who can decrypt the repository, users can unset gcrypt-publish-participants. The gcrypt-publish-participants option is available in my fork of git-remote-gcrypt. This commit was sponsored by Christopher Kernahan.	2014-07-15 17:33:14 -04:00
Joey Hess	cdf61071bc	optimise handling of unavailable repos The exception handling resulted in git config --list being run twice for unavailable repos. This dials it back down to running it only once.	2014-07-15 14:45:27 -04:00
Joey Hess	bd514eb65a	catch exception when repo is really not available	2014-07-15 14:39:31 -04:00
Joey Hess	522a0922b8	sync: Fix git sync with local git remotes even when they don't have an annex.uuid set. Catch an exception when ensureInitialized is run in a non-initted repository. In this case, just read the git config, so that the Git.Repo object is not LocalUnknown, which is what is used to represent remotes on eg, drives that are not connected. The assistant already got this right, and like with the assistant, this causes an implicit git-annex init of the local remote on the second sync, once the git-annex branch has been pushed to it. See this comment for more analysis: http://git-annex.branchable.com/todo/Recovering_from_a_bad_sync/#comment-64e469a2c1969829ee149cbb41b1c138 This commit was sponsored by jscit.	2014-07-15 14:27:43 -04:00
Joey Hess	604740b720	S3: Deal with AWS ACL configurations that do not allow creating or checking the location of a bucket, but only reading and writing content to it.	2014-07-11 15:21:43 -04:00
Joey Hess	26ee27915a	refactor locking	2014-07-10 00:32:23 -04:00
Joey Hess	a44fd2c019	export CreateProcess fields from Utility.Process update code to avoid cwd and env redefinition warnings	2014-06-10 19:20:14 -04:00
Joey Hess	2f84659d51	fix build with old versions of bytestring	2014-06-06 14:04:35 -04:00
Joey Hess	0c2a14e4aa	fix dodgy use of Char8 I don't know if this was a bug, but I don't know if it was not a bug either. See also, http://git-annex.branchable.com/bugs/Truncated_file_transferred_via_S3/ where the file is not truncated, but mangled..	2014-05-27 20:31:25 -04:00
Joey Hess	c07343e4f7	initremote/enableremote: Basic support for using with regular git remotes initremote stores the location of an already existing git remote, and enableremote sets up a remote using its stored location.	2014-05-22 13:42:17 -04:00
Joey Hess	c34b5e09f8	factor out getRemoteGitConfig	2014-05-16 16:08:20 -04:00
Fraser Tweedale	4eb72392b4	execute remote.<name>.annex-shell on remote, if set It is useful to be able to specify an alternative git-annex-shell program to execute on the remote, e.g., to run a version not on the PATH. Use remote.<name>.annex-shell if specified, instead of the default "git-annex-shell" i.e., first so-named executable on the PATH.	2014-05-16 15:46:43 -04:00
Joey Hess	0b899fa2f1	show a much longer message when annex-ignore is automatically set, to help the user fix their problem	2014-05-16 12:58:50 -04:00
Joey Hess	b1cddea7e4	remove odd character that snuck in somehow and broke build	2014-05-15 16:36:19 -04:00
Robie Basak	4184566627	ddar special remote	2014-05-15 16:32:44 -04:00
Joey Hess	f00cb21037	Bring back rsync -p, but only when git-annex is running on a non-crippled file system. This is a better approach to fix #700282 while not unnecessarily losing file permissions on non-crippled systems.	2014-04-17 14:31:42 -04:00
Joey Hess	5af30678c7	factored out Utility.SimpleProtocol from the external special remote implementation	2014-04-05 13:29:28 -04:00
Joey Hess	3b8d5f03bb	Fix glacier repo creation bug Version 5.20140227 broke creation of glacier repositories, not including the datacenter and vault in their configuration. This bug is fixed, but glacier repositories set up with the broken version of git-annex need to have the datacenter and vault set in order to be usable. This can be done using git annex enableremote to add the missing settings. For details, see http://git-annex.branchable.com/bugs/problems_with_glacier/	2014-03-27 14:30:36 -04:00
Alberto Berti	0f7c2dd39b	Fix tahoe remote to work with latest tahoe (v. 1.10.0)	2014-03-26 00:31:02 +01:00
Joey Hess	e426fac273	add desktop notifications Motivation: Hook scripts for nautilus or other file managers need to provide the user with feedback that a file is being downloaded. This commit was sponsored by THM Schoemaker.	2014-03-22 14:12:19 -04:00
Joey Hess	40b599eff2	rsync special remote: Fix slashes when used on Windows.	2014-03-18 13:02:10 -04:00
Joey Hess	b63276309e	clean up cleanup action enumeration	2014-03-13 19:06:26 -04:00
Joey Hess	4d06037fdd	Fix zombie leak and general inefficiency when copying files to a local git repo. Benchmarking this with 1000 small files being copied, the time reduced from 15.98s to 14.64s -- an 8% improvement in the non-data-transfer overhead of git-annex copy.	2014-03-06 17:13:27 -04:00
Joey Hess	aa377ed567	webdav: When built with a new enough haskell DAV (0.6), disable the http response timeout, which was only 5 seconds.	2014-03-05 13:51:54 -04:00
Joey Hess	1f98d6fb00	glacier: Pass --region to glacier checkpresent. I suppose this is not necessary when it has a local cache, so I didn't notice it was missing.	2014-03-04 23:22:24 -04:00
Joey Hess	a1432bce2f	Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects. This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO of temp object files is too annoying (and if you don't want to keep partially transferred objects across reboots). .git/annex/misctmp must be on the same filesystem as the git work tree, since files are moved to there in a way that will not work cross-device, as well as symlinked into there. I first wanted to put the tmp objects in .git/annex/objects/tmp, but that would pose transition problems on upgrade when partially transferred objects existed. git annex info does not currently show the size of .git/annex/misctmp, since it should stay small. It would also be ok to make something clean it out, periodically.	2014-02-26 16:52:56 -04:00
Joey Hess	2aeb0750f9	more DAV url fixes for windows	2014-02-25 16:16:14 -04:00
Joey Hess	b1931d1cc1	add protocol-level debugging for dav	2014-02-25 15:58:44 -04:00
Joey Hess	2b66aaa763	Windows webdav: Fix DOS path separator bug. Use posix </> etc for urls.	2014-02-25 15:26:33 -04:00
Joey Hess	360ecb9f35	fix bare repo optimisation on Windows	2014-02-25 13:47:09 -04:00
Joey Hess	06142f4943	fix #740010 properly	2014-02-25 01:55:01 -04:00
Joey Hess	003fc2b7e1	add UrlOptions sum type	2014-02-24 22:00:25 -04:00
Joey Hess	c69d6eb035	Make annex.web-options be used in several places that call curl.	2014-02-24 21:29:37 -04:00
Joey Hess	d5a2b498f6	webdav: When built with DAV 0.6.0, use the new DAV monad to avoid locking files, which is not needed by git-annex's use of webdav, and does not work on Box.com.	2014-02-24 18:21:51 -04:00
Joey Hess	45e7040142	webapp: Fix creation of box.com, S3, and Glacier repositories, broken in 5.20140221.	2014-02-24 15:29:17 -04:00
Joey Hess	ded4ab5704	Fix handling of rsync remote urls containing a username, including rsync.net. This breakage seems to have been caused way back in `a1eded86`, but I am pretty sure rsync.net support has not been entirely broken since last April. AFAICS, the generated .ssh/config has not changed since then -- it has never included a Username setting line. So, I am puzzled at when this reversion was introduced. Note that the breakage only affected checkpresent and remove. Upload and download use the ssh connection caching, which includes a -l username.	2014-02-21 13:20:57 -04:00
Joey Hess	7d288d83c9	glacier: Do not try to run glacier vault create when an existing glacier remote is enabled.	2014-02-20 15:56:26 -04:00
Joey Hess	4e0be2792b	remove Read instance for Ref Removed instance, got it all to build using fromRef. (With a few things that really need to show something using a ref for debugging stubbed out.) Then added back Read instance, and made Logs.View use it for serialization. This changes the view log format.	2014-02-19 01:19:57 -04:00
Joey Hess	7b19c7d25b	cleanup thanks to Utility.PID	2014-02-11 15:39:51 -04:00
Joey Hess	fa24ba2520	plumb creds from webapp to initremote Avoids abusing setting environment variables, which was always a hack and won't work on windows.	2014-02-11 14:07:56 -04:00
Joey Hess	e885080d06	Add progress display for transfers to/from external special remotes.	2014-02-10 21:33:22 -04:00
Joey Hess	08afe3a1f6	fix failing test case on Windows ensure file being modified is all read before it's opened for write	2014-02-03 10:20:18 -04:00
Joey Hess	1572c460e8	avoid using openFile when withFile can be used Potentially fixes some FD leak if an action on an opened file handle fails for some reason. There have been some hard to reproduce reports of git-annex leaking FDs, and this may solve them.	2014-02-03 10:19:06 -04:00
Joey Hess	089c0109a2	Added ways to configure rsync options to be used only when uploading or downloading from a remote. Useful to eg limit upload bandwidth.	2014-02-02 16:06:34 -04:00
Joey Hess	070ed4a766	change a few renameFile's to rename AFAIK, none of these ever operate on directories, but nor do I want to explicitly check if they're files and fail if not.	2014-01-29 15:21:02 -04:00
Joey Hess	891c85cd88	use locking on Windows This is all the easy cases, where there was already a separate lock file.	2014-01-28 14:42:03 -04:00
Joey Hess	74b101d1dd	reorg	2014-01-26 16:36:31 -04:00
Joey Hess	1ca111620d	reorg	2014-01-26 16:32:55 -04:00
Joey Hess	5fc2d760ea	Optimise non-bare http remotes; no longer does a 404 to the wrong url every time before trying the right url. Needs annex-bare to be set to false, which is done when initially probing the uuid of a http remote.	2014-01-26 13:03:25 -04:00
Joey Hess	b40df4f0d0	reorganize numcopies code (no behavior changes) Move stuff into Logs.NumCopies. Add a NumCopies newtype. Better names for various serialization classes that are specific to one thing or another.	2014-01-21 16:08:59 -04:00
Joey Hess	b6ba0bd556	sync --content: New option that makes the content of annexed files be transferred. Similar to the assistant, this honors any configured preferred content expressions. I am not entirely happy with the implementation. It would be nicer if the seek function returned a list of actions which included the individual file gets and copies and drops, rather than the current list of calls to syncContent. This would allow getting rid of the somewhat redundant display of "sync file [ok|failed]" after the get/put display. But, to do that, withFilesInGit would need to somehow be able to construct such a mixed action list. And it would be less efficient than the current implementation, which is able to reuse several values between eg get and drop. Note that currently this does not try to satisfy numcopies when getting/putting files (numcopies are of course checked when dropping files!) This makes it like the assistant, and unlike get --auto and copy --auto, which do duplicate files when numcopies is not yet satisfied. I don't know if this is the right decision; it only seemed to make sense to have this parallel the assistant as far as possible to start with, since I know the assistant works. This commit was sponsored by Øyvind Andersen Holm.	2014-01-19 17:49:54 -04:00
Joey Hess	0d544649d0	catch exception checking if url exists when network is disconnected Leads to better failure message (or possibly fallback to another remote).	2014-01-16 21:24:17 -04:00
Joey Hess	207ac67aaa	avoid needing a build-dep on hxt for Data.AssocList	2014-01-14 16:42:10 -04:00
Joey Hess	d07f2d7865	Fix a long-standing bug that could cause the wrong index file to be used when committing to the git-annex branch, if GIT_INDEX_FILE is set in the environment. This typically resulted in git-annex branch log files being committed to the master branch and later showing up in the work tree. (These log files can be safely removed.)	2014-01-14 15:36:33 -04:00

605 commits