Commit graph

10 commits

Author SHA1 Message Date
Joey Hess
444944c7a9 fix cleanup of FileContents once done when them when retrieving 2014-07-29 20:27:13 -04:00
Joey Hess
53b87a859e optimise case of remote that retrieves FileContent, when chunks and encryption are not being used
No need to read whole FileContent only to write it back out to a file in
this case. Can just rename! Yay.

Also indidentially, fixed an attempt to open a file for write that was
already opened for write, which caused a crash and deadlock.
2014-07-29 20:10:14 -04:00
Joey Hess
bc9e4697b9 better type for Retriever
Putting a callback in the Retriever type allows for the callback to
remove the retrieved file when it's done with it.

I did not really want to make Retriever be fixed to Annex Bool,
but when I tried to use Annex a, I got into some type of type mess.
2014-07-29 18:41:41 -04:00
Joey Hess
47e522979c allow Retriever action to update the progress meter
Needed for eg, Remote.External.

Generally, any Retriever that stores content in a file is responsible for
updating the meter, while ones that procude a lazy bytestring cannot update
the meter, so are not asked to.
2014-07-29 17:18:49 -04:00
Joey Hess
f5af470875 add ContentSource type, for remotes that act on files rather than ByteStrings
Note that currently nothing cleans up a ContentSource's file, when eg,
retrieving chunks.
2014-07-29 15:16:12 -04:00
Joey Hess
58f727afdd resume interrupted chunked uploads
Leverage the new chunked remotes to automatically resume uploads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also allow starting an upload from one repository,
interrupting it, and then resuming the upload to the same remote from
an entirely different repository.

Note that I added a comment that storeKey should atomically move the content
into place once it's all received. This was already an undocumented
requirement -- it's necessary for hasKey to work reliably. This resume code
just uses hasKey to find the first chunk that's missing.

Note that if there are two uploads of the same key to the same chunked remote,
one might resume at the point the other had gotten to, but both will then
redundantly upload. As before.

In the non-resume case, this adds one hasKey call per storeKey, and only
if the remote is configured to use chunks. Future work: Try to eliminate that
hasKey. Notice that eg, `git annex copy --to` checks if the key is present
before sending it, so is already running hasKey.. which could perhaps
be cached and reused.

However, this additional overhead is not very large compared with
transferring an entire large file, and the ability to resume
is certianly worth it. There is an optimisation in place for small files,
that avoids trying to resume if the whole file fits within one chunk.

This commit was sponsored by Georg Bauer.
2014-07-28 14:35:52 -04:00
Joey Hess
9d4a766cd7 resume interrupted chunked downloads
Leverage the new chunked remotes to automatically resume downloads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also properly handle starting a download
from one remote, interrupting, and resuming from another one, and so on.

(Resuming interrupted chunked uploads is similarly doable, although
slightly more expensive.)

This commit was sponsored by Thomas Djärv.
2014-07-27 18:56:32 -04:00
Joey Hess
f3e47b16a5 better Preparer interface
This will allow things like WebDAV to opean a single persistent connection
and reuse it for all the chunked data.

The crazy types allow for some nice code reuse.
2014-07-27 00:30:04 -04:00
Joey Hess
9a8c4bb21f improve exception handling
Push it down from needing to be done in every Storer,
to being checked once inside ChunkedEncryptable.

Also, catch exceptions from PrepareStorer and PrepareRetriever,
just in case..
2014-07-26 23:26:10 -04:00
Joey Hess
1400cbb032 Support for remotes that are chunkable and encryptable.
I'd have liked to keep these two concepts entirely separate,
but that are entagled: Storing a key in an encrypted and chunked remote
need to generate chunk keys, encrypt the keys, chunk the data, encrypt the
chunks, and send them to the remote. Similar for retrieval, etc.

So, here's an implemnetation of all of that.

The total win here is that every remote was implementing encrypted storage
and retrival, and now it can move into this single place. I expect this
to result in several hundred lines of code being removed from git-annex
eventually!

This commit was sponsored by Henrik Ahlgren.
2014-07-26 20:14:31 -04:00