Make import --deduplicate and --skip-duplicates only hash once, not twice

import: --deduplicate and --skip-duplicates were implemented inefficiently; they unnecessarily hashed each file twice. They have been improved to only hash once.

The new approach is to lock down (minimally) and hash files, and then reuse that information when importing them. This was rather tricky, especially in detecting changes to files while they are being imported.

The output of import changed slightly. Before, it silently skipped over files with e.g. --skip-duplicates; now it shows each file as it starts to act on it. Since every file is hashed first thing, it would otherwise not be clear what file import is chewing on. (Actually, it wasn't clear before when any of the duplicates switches were used.)

This commit was sponsored by Alexander Thompson on Patreon.
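The hash-once idea described above can be sketched roughly as follows. This is a simplified pure model, not the code from this commit: `toyHash`, `Outcome`, `importFile`, and `importAll` are hypothetical names, the `LockedDown` record here only borrows its name from git-annex, and this `ingest` is a cut-down stand-in for the real function, which runs in the Annex monad. The model also leaves out detecting changes to files during import.

```haskell
import Data.Char (ord)
import Data.List (foldl', mapAccumL)
import qualified Data.Map.Strict as M
import Data.Maybe (fromMaybe)

type Key = Int

-- Stand-in for a real content hash (assumption: any deterministic hash works here).
toyHash :: String -> Key
toyHash = foldl' (\h c -> (h * 31 + ord c) `mod` 1000003) 7

-- A file that has been locked down, modelled as its path plus its content.
data LockedDown = LockedDown { ldPath :: FilePath, ldContent :: String }

data Outcome
  = Imported FilePath Key
  | SkippedDuplicate FilePath Key
  deriving (Eq, Show)

-- Simplified ingest: hashes only when no precomputed key is supplied.
ingest :: LockedDown -> Maybe Key -> Key
ingest ld = fromMaybe (toyHash (ldContent ld))

-- Hash each file once, use the key for the duplicate check,
-- then hand the same key to ingest so it does not hash again.
importFile :: M.Map Key FilePath -> (FilePath, String) -> (M.Map Key FilePath, Outcome)
importFile known (path, content) =
  let ld  = LockedDown path content
      key = toyHash (ldContent ld)          -- the only hash of this file
  in if key `M.member` known
       then (known, SkippedDuplicate (ldPath ld) key)
       else (M.insert (ingest ld (Just key)) path known, Imported (ldPath ld) key)

-- Import a batch of files, threading the set of already-known keys through.
importAll :: [(FilePath, String)] -> [Outcome]
importAll = snd . mapAccumL importFile M.empty
```

In this model, a caller with no precomputed key passes `Nothing` and `ingest` falls back to hashing, which mirrors the shape of the `ingest (Just $ LockedDown lockdownconfig ks) Nothing` call in the diff.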
2017-02-09 15:32:22 -04:00 · f617988a29
commit f617988a29
parent 30ab4ecc4b
5 changed files with 90 additions and 41 deletions
--- a/Assistant/Threads/Committer.hs
+++ b/Assistant/Threads/Committer.hs
@@ -322,7 +322,7 @@ handleAdds havelsof delayadd cs = returnWhen (null incomplete) $ do
 		doadd = sanitycheck ks $ do
 			(mkey, mcache) <- liftAnnex $ do
 				showStart "add" $ keyFilename ks
-				ingest $ Just $ LockedDown lockdownconfig ks
+				ingest (Just $ LockedDown lockdownconfig ks) Nothing
 			maybe (failedingest change) (done change mcache $ keyFilename ks) mkey
 	add _ _ = return Nothing