remove empty log files in transition

forget --drop-dead: Remove several classes of git-annex log files when they
become empty, further reducing the size of the git-annex branch.

Noticed while testing sameas uuid removal, but it could happen other times
too.

An empty log file is always treated by git-annex the same as no file
being present, and when the files are per-key, it can be a sizable space
saving to exclude them from the tree.
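The message depends on one equivalence: reading a missing log and reading an empty log must give the same parsed value. If that holds, unstaging an empty log cannot change anything git-annex reads back. Below is a minimal model of that property, not git-annex's code. The one-entry-per-line format and the names `LogEntry`, `parseEntries` and `readLog` are hypothetical.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L8

-- Hypothetical log entry: one raw line of a log file (not git-annex's type).
newtype LogEntry = LogEntry L8.ByteString
  deriving (Eq, Show)

-- Hypothetical parser: one entry per non-empty line.
parseEntries :: L8.ByteString -> [LogEntry]
parseEntries = map LogEntry . filter (not . L8.null) . L8.lines

-- Read a log that may be missing from the branch tree.
-- Nothing = no file present; Just content = the file's bytes.
-- An empty file parses to [], exactly like a missing one.
readLog :: Maybe L8.ByteString -> [LogEntry]
readLog = maybe [] parseEntries
```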
Joey Hess 2019-10-14 16:04:15 -04:00
parent 5e9a2cc37f
commit 4306dfbe68
GPG key ID: DB12DB0FF05F8F38
3 changed files with 27 additions and 30 deletions


@@ -583,22 +583,22 @@ performTransitionsLocked jl ts neednewlocalbranch transitionedrefs = do
 			content <- getStaged f
 			apply changers' f content
 	apply [] _ _ = return ()
-	apply (changer:rest) file content =
-		case changer file content of
-			RemoveFile -> do
-				Annex.Queue.addUpdateIndex
-					=<< inRepo (Git.UpdateIndex.unstageFile file)
-				-- File is deleted; can't run any other
-				-- transitions on it.
-				return ()
-			ChangeFile builder -> do
-				let content' = toLazyByteString builder
-				sha <- hashBlob content'
-				Annex.Queue.addUpdateIndex $ Git.UpdateIndex.pureStreamer $
-					Git.UpdateIndex.updateIndexLine sha TreeFile (asTopFilePath file)
-				apply rest file content'
-			PreserveFile ->
-				apply rest file content
+	apply (changer:rest) file content = case changer file content of
+		PreserveFile -> apply rest file content
+		ChangeFile builder -> do
+			let content' = toLazyByteString builder
+			if L.null content'
+				then do
+					Annex.Queue.addUpdateIndex
+						=<< inRepo (Git.UpdateIndex.unstageFile file)
+					-- File is deleted; can't run any other
+					-- transitions on it.
+					return ()
+				else do
+					sha <- hashBlob content'
+					Annex.Queue.addUpdateIndex $ Git.UpdateIndex.pureStreamer $
+						Git.UpdateIndex.updateIndexLine sha TreeFile (asTopFilePath file)
+					apply rest file content'
 
 checkBranchDifferences :: Git.Ref -> Annex ()
 checkBranchDifferences ref = do
@@ -666,3 +666,4 @@ rememberTreeish treeish graftpoint = lockJournal $ \jl -> do
 	-- and the index was updated to that above, so it's safe to
 	-- say that the index contains c'.
 	setIndexSha c'
+
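Here is a pure model of the reworked `apply` loop. Changers run in order. `PreserveFile` passes the content through unchanged. A `ChangeFile` whose serialized content is empty ends the chain with the file removed. The real loop queues `git update-index` operations in the `Annex` monad instead of returning a value. `Changer`, `Outcome` and `applyAll` are hypothetical names, and the extra arguments of `TransitionCalculator` (trust map, remote configs, file path) are dropped for brevity.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Builder (Builder, toLazyByteString, lazyByteString)

-- The two constructors left after this commit.
data FileTransition
  = ChangeFile Builder
  | PreserveFile

-- Hypothetical simplified changer: content in, transition out.
type Changer = L.ByteString -> FileTransition

-- Hypothetical summary of what the loop does to the file in the index.
data Outcome
  = Unstage             -- file removed from the index
  | Stage L.ByteString  -- file written with this content
  | Unchanged           -- every changer preserved the file
  deriving (Eq, Show)

-- Run changers in order; an empty ChangeFile result removes the file
-- and no later changer sees it.
applyAll :: [Changer] -> L.ByteString -> Outcome
applyAll = go Unchanged
  where
    go acc [] _ = acc
    go acc (changer:rest) content = case changer content of
      PreserveFile -> go acc rest content
      ChangeFile builder ->
        let content' = toLazyByteString builder
        in if L.null content'
             then Unstage
             else go (Stage content') rest content'
```

With `RemoveFile` gone, a changer requests deletion by returning `ChangeFile mempty`. The delete-or-write decision therefore lives in one place instead of in each log-specific branch.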


@@ -24,14 +24,12 @@ import Types.Remote
 import Annex.SpecialRemote.Config
 
 import qualified Data.Map as M
-import qualified Data.Set as S
 import qualified Data.ByteString.Lazy as L
 import qualified Data.Attoparsec.ByteString.Lazy as A
 import Data.ByteString.Builder
 
 data FileTransition
 	= ChangeFile Builder
-	| RemoveFile
 	| PreserveFile
 
 type TransitionCalculator = TrustMap -> M.Map UUID RemoteConfig -> FilePath -> L.ByteString -> FileTransition
@@ -69,19 +67,15 @@ dropDead trustmap remoteconfigmap f content = case getLogVariety f of
 		dropDeadFromMapLog trustmap' id $
 			UUIDBased.parseLogNew A.takeByteString content
 	Just (ChunkLog _) -> ChangeFile $
-		Chunk.buildLog $ dropDeadFromMapLog trustmap' fst $ Chunk.parseLog content
-	Just (PresenceLog _) ->
-		let newlog = Presence.compactLog $
-			dropDeadFromPresenceLog trustmap' $ Presence.parseLog content
-		in if null newlog
-			then RemoveFile
-			else ChangeFile $ Presence.buildLog newlog
-	Just RemoteMetaDataLog ->
-		let newlog = dropDeadFromRemoteMetaDataLog trustmap' $
+		Chunk.buildLog $ dropDeadFromMapLog trustmap' fst $
+			Chunk.parseLog content
+	Just (PresenceLog _) -> ChangeFile $ Presence.buildLog $
+		Presence.compactLog $
+			dropDeadFromPresenceLog trustmap' $
+				Presence.parseLog content
+	Just RemoteMetaDataLog -> ChangeFile $ MetaData.buildLog $
+		dropDeadFromRemoteMetaDataLog trustmap' $
 			MetaData.simplifyLog $ MetaData.parseLog content
-		in if S.null newlog
-			then RemoveFile
-			else ChangeFile $ MetaData.buildLog newlog
 	Just OtherLog -> PreserveFile
 	Nothing -> PreserveFile
   where
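In the old `dropDead`, only the presence and remote metadata branches checked for emptiness (`null newlog`, `S.null newlog`). The UUID-based and chunk logs always produced `ChangeFile`. Testing `L.null` on the serialized bytes in `apply` covers every branch, provided each builder emits no bytes for an empty log. The commit appears to rely on that, but this sketch assumes it rather than checking git-annex's builders. The line-per-entry format and the names `Entry`, `buildEntryLog`, `serializesEmpty` and `structurallyEmpty` are hypothetical.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.Set as S
import Data.ByteString.Builder (Builder, toLazyByteString, stringUtf8, charUtf8)

-- Hypothetical log line: a key and a value.
type Entry = (String, String)

-- Hypothetical builder: one line per entry and no header,
-- so an empty set serializes to no bytes at all.
buildEntryLog :: S.Set Entry -> Builder
buildEntryLog = foldMap line
  where
    line (k, v) = stringUtf8 k <> charUtf8 ' ' <> stringUtf8 v <> charUtf8 '\n'

-- Generic check in the style of the new apply: inspect serialized bytes.
serializesEmpty :: Builder -> Bool
serializesEmpty = L.null . toLazyByteString

-- Per-type check in the style of the removed `S.null newlog`.
structurallyEmpty :: S.Set Entry -> Bool
structurallyEmpty = S.null
```

Checking bytes rather than structure has one consequence: a builder that wrote a header even for an empty log would keep the file.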


@@ -7,6 +7,8 @@ git-annex (7.20191011) UNRELEASED; urgency=medium
     or EXPORTSUPPORTED will now get back an ERROR. That would be a very
     hackish thing for an external special remote to do, needing some kind
     of hard-coded key value to be used, so probably nothing will be affected.
+  * forget --drop-dead: Remove several classes of git-annex log files
+    when they become empty, further reducing the size of the git-annex branch.
 
  -- Joey Hess <id@joeyh.name>  Thu, 19 Sep 2019 11:11:19 -0400
 