Commit graph

20727 commits

Author SHA1 Message Date
Joey Hess
1d263e1e7e lift types from IO to Annex
Some remotes like External need to run store and retrieve actions in Annex,
not IO. In order to do that lift, I had to dive pretty deep into the
utilities, making Utility.Gpg and Utility.Tmp be partly converted to using
MonadIO, and Control.Monad.Catch for exception handling.

There should be no behavior changes in this commit.

This commit was sponsored by Michael Barabanov.
2014-07-29 16:28:44 -04:00
Joey Hess
f5af470875 add ContentSource type, for remotes that act on files rather than ByteStrings
Note that currently nothing cleans up a ContentSource's file, when eg,
retrieving chunks.
2014-07-29 15:16:12 -04:00
Joey Hess
216fdbd6bd fix non-checked hasKeyChunks 2014-07-29 15:07:32 -04:00
Joey Hess
2474cf0032 make explicit the implicit requirement that CHECKPRESENT not say a key is present until it's all done being stored 2014-07-28 14:37:22 -04:00
Joey Hess
58f727afdd resume interrupted chunked uploads
Leverage the new chunked remotes to automatically resume uploads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also allow starting an upload from one repository,
interrupting it, and then resuming the upload to the same remote from
an entirely different repository.

Note that I added a comment that storeKey should atomically move the content
into place once it's all received. This was already an undocumented
requirement -- it's necessary for hasKey to work reliably. This resume code
just uses hasKey to find the first chunk that's missing.

Note that if there are two uploads of the same key to the same chunked remote,
one might resume at the point the other had gotten to, but both will then
redundantly upload. As before.

In the non-resume case, this adds one hasKey call per storeKey, and only
if the remote is configured to use chunks. Future work: Try to eliminate that
hasKey. Notice that eg, `git annex copy --to` checks if the key is present
before sending it, so is already running hasKey.. which could perhaps
be cached and reused.

However, this additional overhead is not very large compared with
transferring an entire large file, and the ability to resume
is certianly worth it. There is an optimisation in place for small files,
that avoids trying to resume if the whole file fits within one chunk.

This commit was sponsored by Georg Bauer.
2014-07-28 14:35:52 -04:00
Joey Hess
153ace4524 fix handling of removal of keys that are not present 2014-07-28 14:14:01 -04:00
Joey Hess
80cc554c82 add ChunkMethod type and make Logs.Chunk use it, rather than assuming fixed size chunks (so eg, rolling hash chunks can be supported later)
If a newer git-annex starts logging something else in the chunk log, it
won't be used by this version, but it will be preserved when updating the
log.
2014-07-28 13:19:08 -04:00
Joey Hess
a33dafae5a Merge branch 'master' of ssh://git-annex.branchable.com into newchunks 2014-07-28 13:03:43 -04:00
divB
6249c5f30b 2014-07-27 23:16:19 +00:00
divB
b886ed1e25 2014-07-27 23:15:25 +00:00
Joey Hess
6c46a92040 devblog 2014-07-27 19:12:45 -04:00
Joey Hess
9d4a766cd7 resume interrupted chunked downloads
Leverage the new chunked remotes to automatically resume downloads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also properly handle starting a download
from one remote, interrupting, and resuming from another one, and so on.

(Resuming interrupted chunked uploads is similarly doable, although
slightly more expensive.)

This commit was sponsored by Thomas Djärv.
2014-07-27 18:56:32 -04:00
Joey Hess
13bbb61a51 add key stability checking interface
Needed for resuming from chunks.

Url keys are considered not stable. I considered treating url keys with a
known size as stable, but just don't feel that is enough information.
2014-07-27 12:33:46 -04:00
Joey Hess
aad8cfe718 use map for faster backend name lookup 2014-07-27 12:24:12 -04:00
Joey Hess
85d17a698d Merge branch 'master' into newchunks
Conflicts:
	doc/design/assistant/chunks.mdwn
2014-07-27 12:24:03 -04:00
Joey Hess
729d38a763 update 2014-07-27 12:23:28 -04:00
Joey Hess
2996f0eb05 use existing chunks even when chunk=0
When chunk=0, always try the unchunked key first. This avoids the overhead
of needing to read the git-annex branch to find the chunkcount.

However, if the unchunked key is not present, go on and try the chunks.

Also, when removing a chunked key, update the chunkcounts even when
chunk=0.
2014-07-27 02:13:51 -04:00
Joey Hess
7afb057d60 reorg 2014-07-27 01:24:34 -04:00
Joey Hess
bffd0e34b3 comment typo 2014-07-27 01:22:51 -04:00
Joey Hess
c3af4897c0 faster storeChunks
No need to process each L.ByteString chunk, instead ask it to split.

Doesn't seem to have really sped things up much, but it also made the code
simpler.

Note that this does (and already did) buffer in memory. It seems that only
the directory special remote could take advantage of streaming chunks to
files w/o buffering, so probably won't add an interface to allow for that.
2014-07-27 01:18:38 -04:00
Joey Hess
f3e47b16a5 better Preparer interface
This will allow things like WebDAV to opean a single persistent connection
and reuse it for all the chunked data.

The crazy types allow for some nice code reuse.
2014-07-27 00:30:04 -04:00
Joey Hess
7db60269eb update does for chunking 2014-07-26 23:39:51 -04:00
Joey Hess
9a8c4bb21f improve exception handling
Push it down from needing to be done in every Storer,
to being checked once inside ChunkedEncryptable.

Also, catch exceptions from PrepareStorer and PrepareRetriever,
just in case..
2014-07-26 23:26:10 -04:00
Joey Hess
7496355031 add some more exception handling primitives 2014-07-26 23:24:27 -04:00
Joey Hess
867fd116a7 better exception display 2014-07-26 23:01:44 -04:00
Joey Hess
0d89b65bfc fix key checking when a directory special remote's directory is missing
The best thing to do in this case is return Left, so that anything that
tries to access it will fail.
2014-07-26 22:52:47 -04:00
Joey Hess
93be3296fc fix another fallback bug 2014-07-26 22:47:52 -04:00
Joey Hess
86e8532c0a allM has slightly better memory use 2014-07-26 22:34:40 -04:00
Joey Hess
67975bf50d fix fallback to other chunk size when first does not have it 2014-07-26 22:25:50 -04:00
Joey Hess
fc10959f68 Merge branch 'master' of ssh://git-annex.branchable.com 2014-07-26 20:52:27 -04:00
Joey Hess
5cd1025816 devblog 2014-07-26 20:51:58 -04:00
Joey Hess
275e284dda doc update for new chunking 2014-07-26 20:21:49 -04:00
Joey Hess
adb6ca62ca fix build 2014-07-26 20:21:36 -04:00
Joey Hess
34c6fdf5e3 fix build 2014-07-26 20:21:10 -04:00
Joey Hess
b2922c1d6d convert directory special remote to using ChunkedEncryptable
And clean up legacy chunking code, which is in its own module now.

So much cleaner!

This commit was sponsored by Henrik Ahlgren
2014-07-26 20:19:24 -04:00
Joey Hess
1400cbb032 Support for remotes that are chunkable and encryptable.
I'd have liked to keep these two concepts entirely separate,
but that are entagled: Storing a key in an encrypted and chunked remote
need to generate chunk keys, encrypt the keys, chunk the data, encrypt the
chunks, and send them to the remote. Similar for retrieval, etc.

So, here's an implemnetation of all of that.

The total win here is that every remote was implementing encrypted storage
and retrival, and now it can move into this single place. I expect this
to result in several hundred lines of code being removed from git-annex
eventually!

This commit was sponsored by Henrik Ahlgren.
2014-07-26 20:14:31 -04:00
Joey Hess
d4d68f57e5 finish up basic chunked remote groundwork
Chunk retrieval and reassembly, removal, and checking if all necessary
chunks are present.

This commit was sponsored by Damien Raude-Morvan.
2014-07-26 20:11:41 -04:00
Joey Hess
904859d676 wording 2014-07-26 13:25:06 -04:00
https://www.google.com/accounts/o8/id?id=AItOawmURXBzaYE1gmVc-X9eLAyDat_6rHPl670
45a5276e23 added output of ls -lb in git directory to show that the file is not added to the annex 2014-07-26 16:50:32 +00:00
https://www.google.com/accounts/o8/id?id=AItOawmURXBzaYE1gmVc-X9eLAyDat_6rHPl670
7bdac3a96b 2014-07-26 16:39:41 +00:00
Joey Hess
cf83697c33 reorg 2014-07-26 12:04:35 -04:00
Joey Hess
e4cb50db33 Merge branch 'master' into newchunks 2014-07-26 12:02:48 -04:00
https://www.google.com/accounts/o8/id?id=AItOawk9nck8WX8-ADF3Fdh5vFo4Qrw1I_bJcR8
712465c80e Added a comment 2014-07-26 14:57:53 +00:00
Joey Hess
655bdfd5bd Merge branch 'master' of ssh://git-annex.branchable.com 2014-07-25 20:57:52 -04:00
Joey Hess
8a46a89adc devblog 2014-07-25 20:56:28 -04:00
Joey Hess
005aded3e0 Fix cost calculation for non-encrypted remotes.
Encyptable types of remotes that were not actually encrypted still had
the encryptedRemoteCostAdj applied to their configured cost, which was a
bug.
2014-07-25 17:29:59 -04:00
Joey Hess
9e8a4a0950 support new style chunking in directory special remote
Only when storing non-encrypted so far, not retrieving or checking if a key
is present or removing.

This commit was sponsored by Renaud Casenave-Péré.
2014-07-25 16:21:01 -04:00
Joey Hess
ab4cce4114 core implementation of new style chunking
Not yet used by any special remotes, but should not be too hard to add it
to most of them.

storeChunks is the hairy bit! It's loosely based on
Remote.Directory.storeLegacyChunked. The object is read in using a lazy
bytestring, which is streamed though, creating chunks as needed, without
ever buffering more than 1 chunk in memory.

Getting the progress meter update to work right was also fun, since
progress meter values are absolute. Finessed by constructing an offset
meter.

This commit was sponsored by Richard Collins.
2014-07-25 16:20:32 -04:00
Joey Hess
8f93982df6 use same hash directories for chunked key as are used for its parent
This avoids a proliferation of hash directories when using new-style
chunking, and should improve performance since chunks are accessed
in sequence and so should have a common locality.

Of course, when a chunked key is encrypted, its hash directories have no
relation to the parent key.

This commit was sponsored by Christian Kellermann.
2014-07-25 16:09:23 -04:00
Joey Hess
1755c5de40 thought about chunk key hashing 2014-07-25 15:12:51 -04:00