remove empty log files in transition

forget --drop-dead: Remove several classes of git-annex log files when they
become empty, further reducing the size of the git-annex branch.

Noticed while testing sameas uuid removal, but it could happen at other
times too.

An empty log file is always treated by git-annex the same as no file
being present, and when the files are per-key, it can be a sizable space
saving to exclude them from the tree.
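
The rule described above (a rewritten log that comes out empty is dropped from the tree rather than staged as an empty blob) can be sketched in isolation. This is a minimal, pure sketch and not the actual git-annex code: `Changer`, `Outcome`, `applyAll` and `dropLinesMentioning` are hypothetical names made up for illustration, and the real code updates the git index instead of returning a value.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as L8
import Data.ByteString.Builder (Builder, lazyByteString, toLazyByteString)
import Data.List (isInfixOf)

-- Simplified stand-in for the transition result: there is no explicit
-- "remove" constructor, because emptiness is detected after building.
data FileTransition
    = ChangeFile Builder
    | PreserveFile

-- Hypothetical, simplified changer: file name and content in, transition out.
type Changer = FilePath -> L.ByteString -> FileTransition

-- Illustrative outcome of running every changer on one file.
data Outcome
    = Removed
    | Kept L.ByteString
    deriving (Eq, Show)

-- Run the changers in order. As soon as one of them produces empty
-- content, the file counts as removed and later changers are skipped.
applyAll :: [Changer] -> FilePath -> L.ByteString -> Outcome
applyAll [] _ content = Kept content
applyAll (changer:rest) file content = case changer file content of
    PreserveFile -> applyAll rest file content
    ChangeFile builder ->
        let content' = toLazyByteString builder
        in if L.null content'
            then Removed
            else applyAll rest file content'

-- Example changer (an assumption for this sketch): drop every line that
-- mentions the given identifier, loosely imitating dropping a dead uuid.
dropLinesMentioning :: String -> Changer
dropLinesMentioning uuid _ content = ChangeFile $ lazyByteString $
    L8.unlines $ filter (not . mentions) (L8.lines content)
  where
    mentions l = uuid `isInfixOf` L8.unpack l
```

With this shape, a changer never has to decide on its own whether a file should be removed; it only builds the new content, and the emptiness check lives in one place.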
Joey Hess 2019-10-14 16:04:15 -04:00
parent 5e9a2cc37f
commit 4306dfbe68
3 changed files with 27 additions and 30 deletions

View file

@@ -583,22 +583,22 @@ performTransitionsLocked jl ts neednewlocalbranch transitionedrefs = do
         content <- getStaged f
         apply changers' f content
     apply [] _ _ = return ()
-    apply (changer:rest) file content =
-        case changer file content of
-            RemoveFile -> do
-                Annex.Queue.addUpdateIndex
-                    =<< inRepo (Git.UpdateIndex.unstageFile file)
-                -- File is deleted; can't run any other
-                -- transitions on it.
-                return ()
-            ChangeFile builder -> do
-                let content' = toLazyByteString builder
-                sha <- hashBlob content'
-                Annex.Queue.addUpdateIndex $ Git.UpdateIndex.pureStreamer $
-                    Git.UpdateIndex.updateIndexLine sha TreeFile (asTopFilePath file)
-                apply rest file content'
-            PreserveFile ->
-                apply rest file content
+    apply (changer:rest) file content = case changer file content of
+        PreserveFile -> apply rest file content
+        ChangeFile builder -> do
+            let content' = toLazyByteString builder
+            if L.null content'
+                then do
+                    Annex.Queue.addUpdateIndex
+                        =<< inRepo (Git.UpdateIndex.unstageFile file)
+                    -- File is deleted; can't run any other
+                    -- transitions on it.
+                    return ()
+                else do
+                    sha <- hashBlob content'
+                    Annex.Queue.addUpdateIndex $ Git.UpdateIndex.pureStreamer $
+                        Git.UpdateIndex.updateIndexLine sha TreeFile (asTopFilePath file)
+                    apply rest file content'
 
 checkBranchDifferences :: Git.Ref -> Annex ()
 checkBranchDifferences ref = do
@@ -666,3 +666,4 @@ rememberTreeish treeish graftpoint = lockJournal $ \jl -> do
     -- and the index was updated to that above, so it's safe to
     -- say that the index contains c'.
     setIndexSha c'
+

View file

@@ -24,14 +24,12 @@ import Types.Remote
 import Annex.SpecialRemote.Config
 
 import qualified Data.Map as M
-import qualified Data.Set as S
 import qualified Data.ByteString.Lazy as L
 import qualified Data.Attoparsec.ByteString.Lazy as A
 import Data.ByteString.Builder
 
 data FileTransition
     = ChangeFile Builder
-    | RemoveFile
     | PreserveFile
 
 type TransitionCalculator = TrustMap -> M.Map UUID RemoteConfig -> FilePath -> L.ByteString -> FileTransition
@@ -69,19 +67,15 @@ dropDead trustmap remoteconfigmap f content = case getLogVariety f of
         dropDeadFromMapLog trustmap' id $
             UUIDBased.parseLogNew A.takeByteString content
     Just (ChunkLog _) -> ChangeFile $
-        Chunk.buildLog $ dropDeadFromMapLog trustmap' fst $ Chunk.parseLog content
-    Just (PresenceLog _) ->
-        let newlog = Presence.compactLog $
-            dropDeadFromPresenceLog trustmap' $ Presence.parseLog content
-        in if null newlog
-            then RemoveFile
-            else ChangeFile $ Presence.buildLog newlog
-    Just RemoteMetaDataLog ->
-        let newlog = dropDeadFromRemoteMetaDataLog trustmap' $
+        Chunk.buildLog $ dropDeadFromMapLog trustmap' fst $
+            Chunk.parseLog content
+    Just (PresenceLog _) -> ChangeFile $ Presence.buildLog $
+        Presence.compactLog $
+            dropDeadFromPresenceLog trustmap' $
+                Presence.parseLog content
+    Just RemoteMetaDataLog -> ChangeFile $ MetaData.buildLog $
+        dropDeadFromRemoteMetaDataLog trustmap' $
             MetaData.simplifyLog $ MetaData.parseLog content
-        in if S.null newlog
-            then RemoveFile
-            else ChangeFile $ MetaData.buildLog newlog
     Just OtherLog -> PreserveFile
     Nothing -> PreserveFile
   where

View file

@@ -7,6 +7,8 @@ git-annex (7.20191011) UNRELEASED; urgency=medium
     or EXPORTSUPPORTED will now get back an ERROR. That would be a very
     hackish thing for an external special remote to do, needing some kind
     of hard-coded key value to be used, so probably nothing will be affected.
+  * forget --drop-dead: Remove several classes of git-annex log files
+    when they become empty, further reducing the size of the git-annex branch.
 
  -- Joey Hess <id@joeyh.name> Thu, 19 Sep 2019 11:11:19 -0400