Commit graph

Joey Hess
6f4592966d make testremote work with gcrypt repos
This involved making Remote.Gcrypt.gen expect a Repo with a regular,
non-gcrypt path. Since that is what's stored as the Remote's gitrepo,
testremote can then modify it and feed it back into gen.
2014-08-04 08:42:04 -04:00
Joey Hess
d3778e631b remove write bit when storing to local gcrypt repo
Same as is done by rsync, and for regular git repos.
2014-08-03 20:25:44 -04:00
Joey Hess
d12becfdde fix removal from local gcrypt repo that had files stored using rsync
When files are stored using rsync, they have their write bit removed;
so does the directory they're put in. The local repo code did not turn
these bits back on, so failed to remove.
2014-08-03 20:21:46 -04:00
Joey Hess
8601f8f571 when not using rsync (for local gcrypt repo), display own progress meter 2014-08-03 20:19:04 -04:00
Joey Hess
1cd2273035 finally properly fixed ssh zombie leak
The leak was caused by the thread that ran ssh to send transferinfo
never waiting on its ssh process. Doh.
2014-08-03 20:14:20 -04:00
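
A minimal sketch of the fix pattern in the commit above (not the actual git-annex code; the host and command are placeholders): whatever forks the ssh must also wait on it, or the exited process lingers as a zombie.

    import System.Process (createProcess, proc, waitForProcess)

    -- Fork ssh, then reap it. Omitting the waitForProcess is exactly
    -- what leaks one zombie per ssh run.
    sendTransferInfo :: IO ()
    sendTransferInfo = do
        (_, _, _, pid) <- createProcess (proc "ssh" ["example-host", "true"])
        _ <- waitForProcess pid
        return ()
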
Joey Hess
b35f7983ff convert gcrypt to new regime, including chunking
Some reorg of Remote.Rsync code to export the things gcrypt needs.
2014-08-03 17:31:10 -04:00
Joey Hess
f5f961215b finish making rsync support chunking
This breaks gcrypt, which relies on some internals of the rsync remote.
To fix next..
2014-08-03 16:54:57 -04:00
Joey Hess
6c450aad1d move ugly rsync zombie workaround
This reaping of any processes came to cause problems when redoing the
rsync special remote: a gpg process that was running got waited on, and
the code that then checked its return code failed.

I cannot reproduce any zombies when using the rsync special remote.
But I still can when using a normal git remote, accessed over ssh.
There is 1 zombie per file downloaded without this horrible hack enabled.

So, move the hack to only be used in that case.
2014-08-03 16:53:29 -04:00
Joey Hess
b3fe23b552 remove redundant progress meter display code
specialRemote handles all meter display, so this is redundant.
2014-08-03 16:18:40 -04:00
Joey Hess
4b16989e98 roll ChunkedEncryptable into Special and improve interface
Allow disabling progress displays, for eg, rsync.
2014-08-03 15:40:01 -04:00
Joey Hess
00f92a7e59 whitespace 2014-08-03 01:21:38 -04:00
Joey Hess
d05b7b9182 better byteRetriever
Make the byteRetriever be passed the callback that consumes the bytestring.

This way, there are no worries about the lazy bytestring not all being read
when the resource that's creating it is closed.

Which in turn lets bup, ddar, and S3 each switch from using an unnecessary
fileRetriever to a byteRetriever. So, more efficient on chunks and encrypted
files.

The only remaining fileRetrievers are hook and external, which really do
retrieve to files.
2014-08-03 01:12:24 -04:00
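
A hedged sketch of the callback-passing shape described in the commit above (type and function names are illustrative): because the consumer runs inside the retriever, the file handle cannot be closed before the lazy bytestring has been fully read.

    import qualified Data.ByteString.Lazy as L
    import System.IO (IOMode (ReadMode), withFile)

    -- The retriever is handed the consumer of the bytes, rather than
    -- returning a lazy ByteString whose producer might be closed early.
    type ByteRetriever = (L.ByteString -> IO Bool) -> IO Bool

    fileByteRetriever :: FilePath -> ByteRetriever
    fileByteRetriever f consumer =
        withFile f ReadMode $ \h -> do
            b <- L.hGetContents h
            consumer b  -- handle stays open until the consumer is done
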
Joey Hess
19b71cfb8f convert ddar to new ChunkedEncryptable API (but do not support chunking)
Since ddar de-duplicates, I assume there is no benefit from chunking.

This has not been tested!
2014-08-02 18:58:48 -04:00
Joey Hess
b261df735d convert bup to new ChunkedEncryptable API (but do not support chunking)
bup already splits files and does rolling deltas, so there is no reason to
use chunking here.

The new API made it easier to add progress support for storeKey, so that's
done. Unfortunately, bup-split still outputs its own progress even with -q,
so it's a little ugly, but not too bad.

Made dropping remove the branch for an object, for two reasons:

1. The new API calls removeKey to roll back a storeKey when the content
   changed unexpectedly.
2. So that testremote will be happy.

Also, fixed a bug that caused a crash when removing the branch for an
object in rollback.
2014-08-02 18:48:49 -04:00
Joey Hess
7f5cd868d7 hook: use ChunkedEncryptable 2014-08-02 17:25:16 -04:00
Joey Hess
0eb1f057c4 convert glacier to new ChunkedEncryptable API (but do not support chunking)
Chunking would complicate the assistant's code that checks when a pending
retrieval of a key from glacier is done. It would perhaps be nice to
support it to allow resuming, but not right now.

Converting to the new API still simplifies the code.
2014-08-02 16:59:07 -04:00
Joey Hess
32e4368377 S3: support chunking
The assistant defaults to 1MiB chunk size for new S3 special remotes,
which will work around a couple of bugs:
  http://git-annex.branchable.com/bugs/S3_memory_leaks/
  http://git-annex.branchable.com/bugs/S3_upload_not_using_multipart/
2014-08-02 15:51:58 -04:00
Joey Hess
c3750901d8 specialize Preparer a bit, so resourcePrepare can be added
The forall a. in Preparer made resourcePrepare seem unusable, so
I specialized a to Bool. That works for both Preparer Storer and
Preparer Retriever, but wouldn't let the Preparer be used for hasKey
as it currently stands.
2014-08-02 15:34:09 -04:00
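
A guess at the shape of the problem in the commit above, with IO standing in for Annex to keep the sketch self-contained; these are not the real definitions.

    {-# LANGUAGE RankNTypes #-}

    -- With a rank-2 result type, a preparer must work for every a,
    -- which resourcePrepare could not satisfy:
    type PreparerPoly helper = forall a. (Maybe helper -> IO a) -> IO a

    -- Specializing a to Bool fits both Storer and Retriever callers,
    -- though, as noted above, not hasKey:
    type PreparerBool helper = (Maybe helper -> IO Bool) -> IO Bool
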
Joey Hess
de0da0aece minor optimisation 2014-08-01 17:18:39 -04:00
Joey Hess
3991327d09 testremote: Test retrieveKeyFile resume
And fixed a bug found by these tests; retrieveKeyFile would fail
when the dest file was already complete.

This commit was sponsored by Bradley Unterrheiner.
2014-08-01 17:16:20 -04:00
Joey Hess
9636cfd9e1 fix a fencepost bug when resuming chunked store at end
Discovered thanks to testremote command!
2014-08-01 16:29:39 -04:00
Joey Hess
8fce4e4bd7 fix chunk=0
Found by testremote
2014-08-01 15:36:11 -04:00
Joey Hess
b5ac627fee WebDAV: Dropped support for DAV before 0.6.1.
0.6.1 is in testing, and stable does not have DAV at all, so I can dispense
with this compatibility code.
2014-07-30 11:20:35 -04:00
Joey Hess
89416ba2d9 only chunk stable keys
The content of unstable keys can potentially be different in different
repos, so eg, resuming a chunked upload started by another repo would
corrupt data.
2014-07-30 10:34:39 -04:00
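
An illustrative predicate for the rule in the commit above (not git-annex's real key types): only keys whose content is fixed by the key itself are safe to chunk, since a resumed upload started by another repo must be byte-for-byte identical.

    data KeySketch = ChecksumKey String | UrlKey String

    isStableKey :: KeySketch -> Bool
    isStableKey (ChecksumKey _) = True   -- content pinned by its checksum
    isStableKey (UrlKey _)      = False  -- content behind a URL may change
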
Joey Hess
a963d790d3 update progress after each chunk, at least
This way, when the remote implementation neglects to update progress,
there will still be a somewhat useful progress display, as long as chunks
are used.
2014-07-29 20:31:16 -04:00
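
A minimal sketch of that fallback meter, assuming a hypothetical MeterUpdate that takes total bytes so far: even if the per-chunk storer never touches the meter, bumping it after each completed chunk yields coarse but useful progress.

    import Control.Monad (forM_)

    type MeterUpdate = Integer -> IO ()

    storeChunksWithMeter :: Integer -> [c] -> (c -> IO ()) -> MeterUpdate -> IO ()
    storeChunksWithMeter chunkSize chunks storer meter =
        forM_ (zip [1 ..] chunks) $ \(n, chunk) -> do
            storer chunk
            meter (n * chunkSize)  -- updated at least once per chunk
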
Joey Hess
444944c7a9 fix cleanup of FileContents once done with them when retrieving 2014-07-29 20:27:13 -04:00
Joey Hess
53b87a859e optimise case of remote that retrieves FileContent, when chunks and encryption are not being used
No need to read whole FileContent only to write it back out to a file in
this case. Can just rename! Yay.

Also, incidentally, fixed an attempt to open a file for write that was
already opened for write, which caused a crash and deadlock.
2014-07-29 20:10:14 -04:00
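
A sketch of the fast path above, using an illustrative stand-in for the ContentSource type (introduced in a commit further down this log): with no chunking or encryption to apply, the retrieved file is simply renamed into place.

    import qualified Data.ByteString.Lazy as L
    import System.Directory (renameFile)

    data ContentSketch = FileContent FilePath | ByteContent L.ByteString

    finishRetrieve :: ContentSketch -> FilePath -> IO ()
    finishRetrieve (FileContent tmp) dest = renameFile tmp dest  -- just rename
    finishRetrieve (ByteContent b)   dest = L.writeFile dest b   -- must write
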
Joey Hess
c0dc134cde support chunking for all external special remotes!
Removing code and at the same time adding great features, including
upload/download resuming.

This commit was sponsored by Romain Lenglet.
2014-07-29 18:50:20 -04:00
Joey Hess
bc9e4697b9 better type for Retriever
Putting a callback in the Retriever type allows for the callback to
remove the retrieved file when it's done with it.

I did not really want to make Retriever be fixed to Annex Bool,
but when I tried to use Annex a, I got into some type of type mess.
2014-07-29 18:41:41 -04:00
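
A hedged sketch of the callback-in-the-type idea, with IO Bool standing in for Annex Bool: since the callback runs inside the retriever, the retriever can remove the retrieved file the moment the callback is done with it.

    import Control.Exception (finally)
    import System.Directory (removeFile)

    type RetrieverSketch = (FilePath -> IO Bool) -> IO Bool

    retrieveToTmp :: FilePath -> RetrieverSketch
    retrieveToTmp tmp callback = callback tmp `finally` removeFile tmp
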
Joey Hess
47e522979c allow Retriever action to update the progress meter
Needed for eg, Remote.External.

Generally, any Retriever that stores content in a file is responsible for
updating the meter, while ones that procude a lazy bytestring cannot update
the meter, so are not asked to.
2014-07-29 17:18:49 -04:00
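
A sketch of that division of labor (names hypothetical): a file-writing retriever knows how many bytes it has written, so it is given the meter; a lazy-bytestring retriever cannot know how much has been consumed, so it is not.

    import qualified Data.ByteString.Lazy as L

    type MeterUpdate = Integer -> IO ()  -- total bytes so far

    data RetrieverKind
        = FileKind (MeterUpdate -> IO FilePath)
        | ByteKind (IO L.ByteString)
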
Joey Hess
1d263e1e7e lift types from IO to Annex
Some remotes like External need to run store and retrieve actions in Annex,
not IO. In order to do that lift, I had to dive pretty deep into the
utilities, making Utility.Gpg and Utility.Tmp be partly converted to using
MonadIO, and Control.Monad.Catch for exception handling.

There should be no behavior changes in this commit.

This commit was sponsored by Michael Barabanov.
2014-07-29 16:28:44 -04:00
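
The general pattern behind the conversion, in miniature (illustrative, not the actual Utility.Gpg or Utility.Tmp code): a helper pinned to IO is generalized over MonadIO so it can be called directly from richer monads such as Annex.

    import Control.Monad.IO.Class (MonadIO, liftIO)

    -- Before: usable only in IO.
    noteIO :: String -> IO ()
    noteIO = putStrLn

    -- After: usable in any MonadIO monad, Annex included.
    noteM :: MonadIO m => String -> m ()
    noteM = liftIO . putStrLn
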
Joey Hess
f5af470875 add ContentSource type, for remotes that act on files rather than ByteStrings
Note that currently nothing cleans up a ContentSource's file when, eg,
retrieving chunks.
2014-07-29 15:16:12 -04:00
Joey Hess
216fdbd6bd fix non-checked hasKeyChunks 2014-07-29 15:07:32 -04:00
Joey Hess
58f727afdd resume interrupted chunked uploads
Leverage the new chunked remotes to automatically resume uploads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundary.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also allow starting an upload from one repository,
interrupting it, and then resuming the upload to the same remote from
an entirely different repository.

Note that I added a comment that storeKey should atomically move the content
into place once it's all received. This was already an undocumented
requirement -- it's necessary for hasKey to work reliably. This resume code
just uses hasKey to find the first chunk that's missing.

Note that if there are two uploads of the same key to the same chunked remote,
one might resume at the point the other had gotten to, but both will then
redundantly upload. As before.

In the non-resume case, this adds one hasKey call per storeKey, and only
if the remote is configured to use chunks. Future work: Try to eliminate that
hasKey. Notice that eg, `git annex copy --to` checks if the key is present
before sending it, so is already running hasKey, which could perhaps
be cached and reused.

However, this additional overhead is not very large compared with
transferring an entire large file, and the ability to resume
is certainly worth it. There is an optimisation in place for small files,
that avoids trying to resume if the whole file fits within one chunk.

This commit was sponsored by Georg Bauer.
2014-07-28 14:35:52 -04:00
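
A hedged sketch of the probe described above: hasKey is asked about each chunk key in order, and the upload restarts at the first chunk boundary where content is missing.

    firstMissingChunk :: (k -> IO Bool) -> [k] -> IO Int
    firstMissingChunk hasKey = go 0
      where
        go n [] = return n
        go n (k : ks) = do
            present <- hasKey k
            if present then go (n + 1) ks else return n
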
Joey Hess
153ace4524 fix handling of removal of keys that are not present 2014-07-28 14:14:01 -04:00
Joey Hess
80cc554c82 add ChunkMethod type and make Logs.Chunk use it, rather than assuming fixed size chunks (so eg, rolling hash chunks can be supported later)
If a newer git-annex starts logging something else in the chunk log, it
won't be used by this version, but it will be preserved when updating the
log.
2014-07-28 13:19:08 -04:00
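
A guess at the type's shape based on the message above: fixed-size chunks for now, room for other methods later, and unknown log entries carried along rather than discarded.

    type ByteSize = Integer

    data ChunkMethod
        = FixedSizeChunks ByteSize
        | UnknownChunks String  -- written by a newer git-annex; preserved
        deriving (Eq, Ord, Show)
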
Joey Hess
9d4a766cd7 resume interrupted chunked downloads
Leverage the new chunked remotes to automatically resume downloads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundary.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also properly handle starting a download
from one remote, interrupting, and resuming from another one, and so on.

(Resuming interrupted chunked uploads is similarly doable, although
slightly more expensive.)

This commit was sponsored by Thomas Djärv.
2014-07-27 18:56:32 -04:00
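
A minimal sketch, assuming fixed-size chunks: the byte length of a partial download alone determines the chunk to resume at, no matter which remote supplied the earlier bytes.

    resumeAtChunk :: Integer -> Integer -> Integer
    resumeAtChunk chunkSize partialLen = partialLen `div` chunkSize
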
Joey Hess
2996f0eb05 use existing chunks even when chunk=0
When chunk=0, always try the unchunked key first. This avoids the overhead
of needing to read the git-annex branch to find the chunkcount.

However, if the unchunked key is not present, go on and try the chunks.

Also, when removing a chunked key, update the chunkcounts even when
chunk=0.
2014-07-27 02:13:51 -04:00
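
A sketch of that lookup order: with chunk=0, the unchunked key is tried first, and the chunk log is consulted only if that fails.

    retrieveWithFallback :: IO Bool -> IO Bool -> IO Bool
    retrieveWithFallback tryUnchunked tryChunked = do
        ok <- tryUnchunked
        if ok then return True else tryChunked
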
Joey Hess
7afb057d60 reorg 2014-07-27 01:24:34 -04:00
Joey Hess
bffd0e34b3 comment typo 2014-07-27 01:22:51 -04:00
Joey Hess
c3af4897c0 faster storeChunks
No need to process each L.ByteString chunk; instead, ask it to split itself.

Doesn't seem to have really sped things up much, but it also made the code
simpler.

Note that this does (and already did) buffer in memory. It seems that only
the directory special remote could take advantage of streaming chunks to
files w/o buffering, so I probably won't add an interface to allow for that.
2014-07-27 01:18:38 -04:00
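
A sketch of that splitting, using standard Data.ByteString.Lazy operations: the lazy bytestring is split at each chunk boundary in one step, instead of its internal strict pieces being walked one at a time.

    import Data.Int (Int64)
    import qualified Data.ByteString.Lazy as L

    splitChunks :: Int64 -> L.ByteString -> [L.ByteString]
    splitChunks n b
        | L.null b  = []
        | otherwise = let (h, t) = L.splitAt n b in h : splitChunks n t
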
Joey Hess
f3e47b16a5 better Preparer interface
This will allow things like WebDAV to open a single persistent connection
and reuse it for all the chunked data.

The crazy types allow for some nice code reuse.
2014-07-27 00:30:04 -04:00
Joey Hess
9a8c4bb21f improve exception handling
Push it down from needing to be done in every Storer,
to being checked once inside ChunkedEncryptable.

Also, catch exceptions from PrepareStorer and PrepareRetriever,
just in case.
2014-07-26 23:26:10 -04:00
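
A minimal sketch of hoisting the handling up a layer, as described above: the chunking code wraps each storer action once and maps any exception to failure, instead of every Storer catching its own.

    {-# LANGUAGE ScopedTypeVariables #-}

    import Control.Exception (SomeException, try)

    safely :: IO Bool -> IO Bool
    safely a = do
        r <- try a
        case r of
            Left (_ :: SomeException) -> return False
            Right ok -> return ok
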
Joey Hess
867fd116a7 better exception display 2014-07-26 23:01:44 -04:00
Joey Hess
0d89b65bfc fix key checking when a directory special remote's directory is missing
The best thing to do in this case is return Left, so that anything that
tries to access it will fail.
2014-07-26 22:52:47 -04:00
Joey Hess
93be3296fc fix another fallback bug 2014-07-26 22:47:52 -04:00
Joey Hess
86e8532c0a allM has slightly better memory use 2014-07-26 22:34:40 -04:00
Joey Hess
67975bf50d fix fallback to other chunk size when first does not have it 2014-07-26 22:25:50 -04:00
Joey Hess
adb6ca62ca fix build 2014-07-26 20:21:36 -04:00
Joey Hess
34c6fdf5e3 fix build 2014-07-26 20:21:10 -04:00