From eb59da9dd204b6c15ffd3e1cfb66acde5f0e904a Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Mon, 11 Dec 2023 15:04:06 -0400
Subject: [PATCH] Lower precision of timestamps in git-annex branch

This can reduce the size of the branch by up to 8%. My test was running
git-annex add 1000 times on one file each. Lots of different
high-resolution timestamps were recorded before and eliminating those,
after packing, the git repo was 8% smaller.

Due to the use of vector clocks, high resolution timestamps are not
necessary to make clear which information is most recent when eg, a
value is changed repeatedly in the same second. In such a case, the
vector clock will be advanced to the next second after the last
modification. For example, running

	git-annex numcopies 1; git-annex numcopies 2

The first will record the current second, while the next records the
second after that even if it runs in the same second.

As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that was
written earlier in a second over information written later in the same
second. Usually git-annex does not write conflicting information, but
there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store
the same object, and end up storing different remote state for some
reason, this can result in one that ran a tiny bit later winning. Such a
situation seems unlikely to be user visible. And a small amount of clock
skew could already result in such things.

The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run in 2
clones of a repository on the same machine at very close to the same
time. Then the user will know which they ran last, and git-annex won't.

If that did become a problem, this could be dialed back to eg log
milliseconds with still some space saving.
---
 Annex/VectorClock/Utility.hs | 12 +++++++++++-
 CHANGELOG                    |  2 ++
 Utility/TimeStamp.hs         | 10 +++++++++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/Annex/VectorClock/Utility.hs b/Annex/VectorClock/Utility.hs
index 1396dd54cb..76b74d9cd5 100644
--- a/Annex/VectorClock/Utility.hs
+++ b/Annex/VectorClock/Utility.hs
@@ -20,4 +20,14 @@ startVectorClock = go =<< getEnv "GIT_ANNEX_VECTOR_CLOCK"
 	go (Just s) = case parsePOSIXTime s of
 		Just t -> return (pure (CandidateVectorClock t))
 		Nothing -> timebased
-	timebased = return (CandidateVectorClock <$> getPOSIXTime)
+	-- Avoid using fractional seconds in the CandidateVectorClock.
+	-- This reduces the size of the packed git-annex branch by up
+	-- to 8%.
+	--
+	-- Due to the use of vector clocks, high resolution timestamps are
+	-- not necessary to make clear which information is most recent when
+	-- eg, a value is changed repeatedly in the same second. In such a
+	-- case, the vector clock will be advanced to the next second after
+	-- the last modification.
+	timebased = return $
+		CandidateVectorClock . truncateResolution 0 <$> getPOSIXTime
diff --git a/CHANGELOG b/CHANGELOG
index 862e594492..0a350cd892 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -17,6 +17,8 @@ git-annex (10.20231130) UNRELEASED; urgency=medium
   * sync: Fix locking problems during merge when annex.pidlock is set.
   * Avoid a problem with temp file names ending in "." on certian
     filesystems that have problems with such filenames.
+  * Lower precision of timestamps in git-annex branch, which can reduce the
+    size of the branch by up to 8%.
 
  -- Joey Hess Thu, 30 Nov 2023 14:48:12 -0400
 
diff --git a/Utility/TimeStamp.hs b/Utility/TimeStamp.hs
index b740d7bead..878d6f7299 100644
--- a/Utility/TimeStamp.hs
+++ b/Utility/TimeStamp.hs
@@ -1,6 +1,6 @@
 {- timestamp parsing and formatting
  -
- - Copyright 2015-2019 Joey Hess
+ - Copyright 2015-2023 Joey Hess
  -
  - License: BSD-2-clause
  -}
@@ -9,6 +9,7 @@ module Utility.TimeStamp (
 	parserPOSIXTime,
 	parsePOSIXTime,
 	formatPOSIXTime,
+	truncateResolution,
 ) where
 
 import Utility.Data
@@ -56,3 +57,10 @@ mkPOSIXTime n (d, dlen)
 
 formatPOSIXTime :: String -> POSIXTime -> String
 formatPOSIXTime fmt t = formatTime defaultTimeLocale fmt (posixSecondsToUTCTime t)
+
+{- Truncate the resolution to the specified number of decimal places. -}
+truncateResolution :: Int -> POSIXTime -> POSIXTime
+truncateResolution n t = secondsToNominalDiffTime $
+	fromIntegral ((truncate (nominalDiffTimeToSeconds t * d)) :: Integer) / d
+  where
+	d = 10 ^ n
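
Review note: for anyone who wants to try the truncation outside the git-annex tree, the following is a minimal standalone sketch. truncateResolution follows the definition added by this patch. advanceClock and the sample timestamps are hypothetical; they only illustrate the commit message's point that a second write within the same second ends up one second later, and they do not reproduce git-annex's actual vector clock code. The sketch assumes a version of the time package that provides nominalDiffTimeToSeconds and secondsToNominalDiffTime.

```haskell
import Data.Time.Clock (nominalDiffTimeToSeconds, secondsToNominalDiffTime)
import Data.Time.Clock.POSIX (POSIXTime)

-- Same logic as the function added to Utility/TimeStamp.hs:
-- keep n decimal places of the timestamp and drop the rest.
truncateResolution :: Int -> POSIXTime -> POSIXTime
truncateResolution n t = secondsToNominalDiffTime $
    fromIntegral (truncate (nominalDiffTimeToSeconds t * d) :: Integer) / d
  where
    d = 10 ^ n

-- Hypothetical helper, not part of git-annex: use the candidate time
-- if it is newer than the previously recorded one, otherwise step one
-- second past the previous value.
advanceClock :: POSIXTime -> Maybe POSIXTime -> POSIXTime
advanceClock candidate Nothing = candidate
advanceClock candidate (Just prev)
    | candidate > prev = candidate
    | otherwise = prev + 1

main :: IO ()
main = do
    -- Two made-up timestamps that fall within the same second.
    let early = 1702321446.123456 :: POSIXTime
        late = 1702321446.987654 :: POSIXTime
        w1 = advanceClock (truncateResolution 0 early) Nothing
        w2 = advanceClock (truncateResolution 0 late) (Just w1)
    -- At whole-second resolution the two timestamps are identical.
    print (truncateResolution 0 early == truncateResolution 0 late) -- True
    -- Millisecond resolution, the fallback the commit message mentions.
    print (truncateResolution 3 late == 1702321446.987) -- True
    -- The second write is recorded exactly one second after the first.
    print (w2 - w1 == 1) -- True
```

The first check also shows the trade-off described in the commit message: two clones writing within the same second produce equal timestamps, so ordering between them can no longer be decided by the clock alone.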