incremental hashing for fileRetriever

It uses tailVerify to hash the file while it's being written.

This can sometimes avoid a separate checksum step. However, if the file gets written quickly enough, tailVerify may not see it get created before the write finishes, and the separate checksum still happens.

Testing with the directory special remote, incremental checksumming did not happen. But when I disabled the copy CoW probing, it did work. What's going on is that the CoW probe creates an empty file on failure, then deletes it, and then the file is created again. tailVerify opens the first, empty file, and so fails to read the content that gets written to the file that replaces it.

The directory special remote really ought to be able to avoid needing to use tailVerify. Other special remotes could do things that cause similar problems, but they probably don't. And if they do, it just means the checksum doesn't get done incrementally.

Sponsored-by: Dartmouth College's DANDI project
2021-08-13 15:43:29 -04:00 · 2021-08-13 15:43:29 -04:00 · dadbb510f6
commit dadbb510f6
parent ff2dc5eb18
8 changed files with 80 additions and 49 deletions
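
The CoW probe problem described in the commit message can be reproduced in isolation. The following is only a minimal sketch, not code from this commit: `replacedFileDemo` and the file name template are hypothetical, it assumes a POSIX system (where a file that is still open can be unlinked), and it uses plain handles from base rather than tailVerify itself. A reader opens the empty file left by a failed probe, the file is then deleted and created again, and the reader is expected to stay attached to the old, empty file.

```haskell
import Control.Monad (when)
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Hypothetical helper, for illustration only.
-- Returns (bytes seen by the early reader, bytes actually on disk).
replacedFileDemo :: IO (S.ByteString, S.ByteString)
replacedFileDemo = do
    tmpdir <- getTemporaryDirectory
    -- 1. Stand-in for the failed probe: an empty file is left behind.
    (path, probe) <- openTempFile tmpdir "cow-probe.tmp"
    hClose probe
    -- 2. Stand-in for the tailing reader: it sees the file and opens it.
    tailer <- openBinaryFile path ReadMode
    -- 3. The empty file is deleted, then created again with the content.
    --    Assumes POSIX semantics for removing a file that is still open.
    removeFile path
    S.writeFile path content
    -- 4. The reader still holds the deleted file, not its replacement.
    seen <- S.hGetSome tailer 65536
    hClose tailer
    onDisk <- S.readFile path
    removeFile path
    return (seen, onDisk)
  where
    content = S8.pack "annexed content"

main :: IO ()
main = do
    (seen, onDisk) <- replacedFileDemo
    putStrLn ("early reader saw " ++ show (S.length seen) ++ " bytes")
    putStrLn ("file on disk holds " ++ show (S.length onDisk) ++ " bytes")
    when (S.null seen && not (S.null onDisk)) $
        putStrLn "the early reader missed the content"
```

If the reader comes back empty while the file on disk has content, incremental hashing of that handle cannot succeed, which matches the fallback to a separate checksum step that the message describes.
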
--- a/Remote/Helper/Special.hs
+++ b/Remote/Helper/Special.hs
@@ -1,6 +1,6 @@
 {- helpers for special remotes
 -
- - Copyright 2011-2020 Joey Hess <id@joeyh.name>
+ - Copyright 2011-2021 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}
@@ -39,6 +39,7 @@ import Annex.Common
 import Annex.SpecialRemote.Config
 import Types.StoreRetrieve
 import Types.Remote
+import Annex.Verify
 import Annex.UUID
 import Config
 import Config.Cost
@@ -54,6 +55,8 @@ import Git.Types
 import qualified Data.ByteString as S
 import qualified Data.ByteString.Lazy as L
 import qualified Data.Map as M
+import Control.Concurrent.STM
+import Control.Concurrent.Async

 {- Special remotes don't have a configured url, so Git.Repo does not
 - automatically generate remotes for them. This looks for a different
@@ -101,19 +104,33 @@ fileStorer a k (ByteContent b) m = withTmp k $ \f -> do
 byteStorer :: (Key -> L.ByteString -> MeterUpdate -> Annex ()) -> Storer
 byteStorer a k c m = withBytes c $ \b -> a k b m

-- A Retriever that writes the content of a Key to a provided file.
-- It is responsible for updating the progress meter as it retrieves data.
-fileRetriever :: (FilePath -> Key -> MeterUpdate -> Annex ()) -> Retriever
-fileRetriever a k m callback = do
-	f <- prepTmp k
-	a (fromRawFilePath f) k m
-	pruneTmpWorkDirBefore f (callback . FileContent . fromRawFilePath)
-
 -- A Retriever that generates a lazy ByteString containing the Key's
 -- content, and passes it to a callback action which will fully consume it
 -- before returning.
-byteRetriever :: (Key -> (L.ByteString -> Annex a) -> Annex a) -> Key -> MeterUpdate -> (ContentSource -> Annex a) -> Annex a
-byteRetriever a k _m callback = a k (callback . ByteContent)
+byteRetriever :: (Key -> (L.ByteString -> Annex a) -> Annex a) -> Key -> MeterUpdate -> Maybe IncrementalVerifier -> (ContentSource -> Annex a) -> Annex a
+byteRetriever a k _m _miv callback = a k (callback . ByteContent)
+
+-- A Retriever that writes the content of a Key to a provided file.
+-- The action is responsible for updating the progress meter as it 
+-- retrieves data. The incremental verifier is updated in the background as
+-- the action writes to the file.
+fileRetriever :: (FilePath -> Key -> MeterUpdate -> Annex ()) -> Retriever
+fileRetriever a k m miv callback = do
+	f <- prepTmp k
+	let retrieve = a (fromRawFilePath f) k m
+	case miv of
+		Nothing -> retrieve
+		Just iv -> do
+			finished <- liftIO newEmptyTMVarIO
+			t <- liftIO $ async $ tailVerify iv f finished
+			retrieve
+			liftIO $ atomically $ putTMVar finished ()
+			liftIO (wait t) >>= \case
+				Nothing -> noop
+				Just deferredverify -> do
+					showAction (descVerify iv)
+					liftIO deferredverify
+	pruneTmpWorkDirBefore f (callback . FileContent . fromRawFilePath)

 {- The base Remote that is provided to specialRemote needs to have
 - storeKey, retrieveKeyFile, removeKey, and checkPresent methods,