S3: finish converting to aws library

Implemented the Retriever.

Unfortunately, it is a fileRetriever and not a byteRetriever.
It should be possible to convert this to a byteRetriever, but I got stuck:
The conduit sink needs to process individual chunks, but a byteRetriever
needs to pass a single L.ByteString to its callback for processing. I
looked into using unsafeInterleaveIO to build up the bytestring lazily,
but the sink is already operating under conduit's inversion of control,
and does not run directly in IO anyway.
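
To make the sticking point concrete, here is a minimal sketch of the
mismatch, using simplified stand-in types (the real Retriever types in
git-annex carry a Key and a progress meter) and the conduit 1.x API that
the diff below also uses. Without a laziness trick, the only way to hand
a byteRetriever-style callback a single lazy value from inside a conduit
pipeline is to drain the pipeline first, which buffers the whole object:

import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Simplified stand-ins for the two retriever shapes.
type FileRetriever = FilePath -> IO ()
type ByteRetriever = (L.ByteString -> IO Bool) -> IO Bool

-- A sink sees one strict chunk at a time, under conduit's control,
-- so satisfying the callback means collecting every chunk up front:
naiveByteRetriever :: Source IO S.ByteString -> ByteRetriever
naiveByteRetriever src callback = do
	chunks <- src $$ CL.consume     -- forces the entire download
	callback (L.fromChunks chunks)  -- only then hands it over

The fileRetriever below sidesteps this by letting the sink write each
chunk straight to disk as it arrives.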

On the plus side, no more memory leak.
Joey Hess 2014-08-09 15:58:01 -04:00
parent a6da13c1e9
commit 3659cb9efb
3 changed files with 35 additions and 12 deletions


@@ -15,6 +15,7 @@ import qualified Aws.S3 as S3
 import qualified Data.Text as T
 import qualified Data.Text.Encoding as T
 import qualified Data.ByteString.Lazy as L
+import qualified Data.ByteString as S
 import qualified Data.Map as M
 import Data.Char
 import Network.Socket (HostName)
@@ -23,6 +24,7 @@ import Network.HTTP.Client (defaultManagerSettings, managerResponseTimeout, resp
 import Network.HTTP.Types
 import Control.Monad.Trans.Resource
 import Control.Monad.Catch
+import Data.Conduit
 
 import Common.Annex
 import Types.Remote
@@ -36,6 +38,7 @@ import qualified Remote.Helper.AWS as AWS
 import Creds
 import Annex.UUID
 import Logs.Web
+import Utility.Metered
 
 type BucketName = String
@@ -145,14 +148,27 @@ store r h = fileStorer $ \k f p -> do
 
 	return True
 
+{- Implemented as a fileRetriever, that uses conduit to stream the chunks
+ - out to the file. Would be better to implement a byteRetriever, but
+ - that is difficult. -}
 retrieve :: S3Handle -> Retriever
-retrieve _h = error "TODO"
-{-
-	resourcePrepare (const $ s3Action r False) $ \(conn, bucket) ->
-		byteRetriever $ \k sink ->
-			liftIO (getObject conn $ bucketKey r bucket k)
-				>>= either s3Error (sink . obj_data)
--}
+retrieve h = fileRetriever $ \f k p -> liftIO $ runResourceT $ do
+	(fr, fh) <- allocate (openFile f WriteMode) hClose
+	let req = S3.getObject (hBucket h) (hBucketObject h k)
+	S3.GetObjectResponse { S3.gorResponse = rsp } <- sendS3Handle' h req
+	responseBody rsp $$+- sinkprogressfile fh p zeroBytesProcessed
+	release fr
+  where
+	sinkprogressfile fh meterupdate sofar = do
+		mbs <- await
+		case mbs of
+			Nothing -> return ()
+			Just bs -> do
+				let sofar' = sofar -- addBytesProcessed $ S.length bs
+				liftIO $ do
+					void $ meterupdate sofar'
+					S.hPut fh bs
+				sinkprogressfile fh meterupdate sofar'
 
 retrieveCheap :: Key -> FilePath -> Annex Bool
 retrieveCheap _ _ = return False
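
As an aside on the pattern: sinkprogressfile is just a hand-rolled
recursive Sink. A self-contained sketch of the same shape, with a plain
Integer counter standing in for git-annex's MeterUpdate from
Utility.Metered, and with the byte accounting enabled (the diff above
still has it commented out):

import qualified Data.ByteString as S
import Data.Conduit
import Control.Monad.IO.Class (liftIO)
import System.IO

-- Stand-in for Utility.Metered's MeterUpdate.
type MeterUpdate = Integer -> IO ()

-- Await one strict chunk at a time, report progress, write the chunk
-- out, and recurse until the upstream source is exhausted.
sinkProgressFile :: Handle -> MeterUpdate -> Integer -> Sink S.ByteString IO ()
sinkProgressFile fh meterupdate sofar = do
	mbs <- await
	case mbs of
		Nothing -> return ()
		Just bs -> do
			let sofar' = sofar + fromIntegral (S.length bs)
			liftIO $ do
				meterupdate sofar'
				S.hPut fh bs
			sinkProgressFile fh meterupdate sofar'

Drained with src $$ sinkProgressFile fh update 0, it holds only one
chunk in memory at a time, which is what fixes the download half of the
memory leak.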
@@ -289,9 +305,14 @@ sendS3Handle
 	=> S3Handle
 	-> req
 	-> Annex res
-sendS3Handle h = liftIO . runResourceT . call
-  where
-	call = AWS.pureAws (hawscfg h) (hs3cfg h) (hmanager h)
+sendS3Handle h r = liftIO $ runResourceT $ sendS3Handle' h r
+
+sendS3Handle'
+	:: (AWS.Transaction r a, AWS.ServiceConfiguration r ~ S3.S3Configuration)
+	=> S3Handle
+	-> r
+	-> ResourceT IO a
+sendS3Handle' h = AWS.pureAws (hawscfg h) (hs3cfg h) (hmanager h)
 
 withS3Handle :: RemoteConfig -> UUID -> (S3Handle -> Annex a) -> Annex a
 withS3Handle c u a = do
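
The sendS3Handle split is what makes the retrieve above work: the
primitive now lives in ResourceT IO, so retrieve, which already runs
under runResourceT to manage the file handle and the streaming response
body, can issue its request inside that existing scope instead of
opening a second one. A generic sketch of the shape, with hypothetical
names:

import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Resource

-- Hypothetical primitive: runs in ResourceT IO, so callers already
-- inside runResourceT can call it directly.
sendReq' :: String -> ResourceT IO String
sendReq' req = liftIO $ return ("sent: " ++ req)

-- Hypothetical wrapper: one-shot callers get a short-lived
-- ResourceT scope of their own.
sendReq :: MonadIO m => String -> m String
sendReq req = liftIO $ runResourceT $ sendReq' req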

debian/changelog

@@ -16,9 +16,9 @@ git-annex (5.20140718) UNRELEASED; urgency=medium
     were incompletely repaired before.
   * Fix cost calculation for non-encrypted remotes.
   * Display exception message when a transfer fails due to an exception.
-  * WebDAV: Sped up by avoiding making multiple http connections
+  * WebDAV, S3: Sped up by avoiding making multiple http connections
     when storing a file.
-  * WebDAV: Avoid buffering whole file in memory when uploading and
+  * WebDAV, S3: Avoid buffering whole file in memory when uploading and
     downloading.
   * WebDAV: Dropped support for DAV before 1.0.
   * S3: Switched to using the haskell aws library.


@@ -7,6 +7,8 @@ Sending a file to S3 causes a slow memory increase toward the file size.
 Copying the file back from S3 causes a slow memory increase toward the
 file size.
 
+> [[fixed|done]] too! --[[Joey]]
+
 The author of hS3 is aware of the problem, and working on it. I think I
 have identified the root cause of the buffering; it's done by hS3 so it can
 resend the data if S3 sends it a 307 redirect. --[[Joey]]