2018-10-30 03:13:36 +00:00
|
|
|
{- timestamp parsing and formatting
|
2015-05-10 19:23:38 +00:00
|
|
|
-
|
Lower precision of timestamps in git-annex branch
This can reduce the size of the branch by up to 8%. My test was
running git-annex add 1000 times on one file each.
Lots of different high-resolution timestamps were recorded before
and eliminating those, after packing, the git repo was 8% smaller.
Due to the use of vector clocks, high resolution timestamps are
not necessary to make clear which information is most recent when
eg, a value is changed repeatedly in the same second. In such a
case, the vector clock will be advanced to the next second after
the last modification. For example, running
git-annex numcopies 1; git-annex numcopies 2
The first will record the current second, while the next records
the second after that even if it runs in the same second.
As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that
was written earlier in a second over information written later in the
same second. Usually git-annex does not write conflicting information,
but there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store the
same object, and end up storing different remote state for some reason,
this can result in one that ran a tiny bit later winning. Such a situation
seems unlikely to be user visible. And a small amount of clock skew could
already result in such things.
The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run
in 2 clones of a repository on the same machine at very
close to the same time. Then the user will know which they ran last,
and git-annex won't.
If that did become a problem, this could be dialed back to eg log
milliseconds with still some space saving.
2023-12-11 19:04:06 +00:00
|
|
|
- Copyright 2015-2023 Joey Hess <id@joeyh.name>
|
2015-05-10 19:23:38 +00:00
|
|
|
-
|
2018-10-30 03:13:36 +00:00
|
|
|
- License: BSD-2-clause
|
2015-05-10 19:23:38 +00:00
|
|
|
-}
|
|
|
|
|
2023-12-27 19:33:46 +00:00
|
|
|
{-# LANGUAGE CPP #-}
|
|
|
|
|
2019-11-23 15:07:22 +00:00
|
|
|
module Utility.TimeStamp (
|
|
|
|
parserPOSIXTime,
|
|
|
|
parsePOSIXTime,
|
|
|
|
formatPOSIXTime,
|
Lower precision of timestamps in git-annex branch
This can reduce the size of the branch by up to 8%. My test was
running git-annex add 1000 times on one file each.
Lots of different high-resolution timestamps were recorded before
and eliminating those, after packing, the git repo was 8% smaller.
Due to the use of vector clocks, high resolution timestamps are
not necessary to make clear which information is most recent when
eg, a value is changed repeatedly in the same second. In such a
case, the vector clock will be advanced to the next second after
the last modification. For example, running
git-annex numcopies 1; git-annex numcopies 2
The first will record the current second, while the next records
the second after that even if it runs in the same second.
As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that
was written earlier in a second over information written later in the
same second. Usually git-annex does not write conflicting information,
but there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store the
same object, and end up storing different remote state for some reason,
this can result in one that ran a tiny bit later winning. Such a situation
seems unlikely to be user visible. And a small amount of clock skew could
already result in such things.
The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run
in 2 clones of a repository on the same machine at very
close to the same time. Then the user will know which they ran last,
and git-annex won't.
If that did become a problem, this could be dialed back to eg log
milliseconds with still some space saving.
2023-12-11 19:04:06 +00:00
|
|
|
truncateResolution,
|
2019-11-23 15:07:22 +00:00
|
|
|
) where
|
2015-05-10 19:23:38 +00:00
|
|
|
|
2019-01-02 17:13:17 +00:00
|
|
|
import Utility.Data
|
2016-09-29 18:04:53 +00:00
|
|
|
|
2015-05-10 19:23:38 +00:00
|
|
|
import Data.Time.Clock.POSIX
|
|
|
|
import Data.Time
|
2016-09-29 18:04:53 +00:00
|
|
|
import Data.Ratio
|
2019-01-02 16:26:07 +00:00
|
|
|
import Control.Applicative
|
|
|
|
import qualified Data.ByteString as B
|
2019-01-02 17:13:17 +00:00
|
|
|
import qualified Data.ByteString.Char8 as B8
|
2019-01-02 16:26:07 +00:00
|
|
|
import qualified Data.Attoparsec.ByteString as A
|
|
|
|
import Data.Attoparsec.ByteString.Char8 (char, decimal, signed, isDigit_w8)
|
2015-05-10 19:23:38 +00:00
|
|
|
|
2019-01-02 16:26:07 +00:00
|
|
|
{- Parses how POSIXTime shows itself: "1431286201.113452s"
|
|
|
|
- (The "s" is included for historical reasons and is optional.)
|
|
|
|
- Also handles the format with no decimal seconds. -}
|
|
|
|
parserPOSIXTime :: A.Parser POSIXTime
|
|
|
|
parserPOSIXTime = mkPOSIXTime
|
|
|
|
<$> signed decimal
|
|
|
|
<*> (declen <|> pure (0, 0))
|
|
|
|
<* optional (char 's')
|
|
|
|
where
|
|
|
|
declen :: A.Parser (Integer, Int)
|
|
|
|
declen = do
|
|
|
|
_ <- char '.'
|
|
|
|
b <- A.takeWhile isDigit_w8
|
|
|
|
let len = B.length b
|
|
|
|
d <- either fail pure $
|
|
|
|
A.parseOnly (decimal <* A.endOfInput) b
|
|
|
|
return (d, len)
|
|
|
|
|
2015-05-10 19:23:38 +00:00
|
|
|
parsePOSIXTime :: String -> Maybe POSIXTime
|
2019-01-02 17:13:17 +00:00
|
|
|
parsePOSIXTime s = eitherToMaybe $
|
|
|
|
A.parseOnly (parserPOSIXTime <* A.endOfInput) (B8.pack s)
|
2015-05-10 19:36:58 +00:00
|
|
|
|
2019-01-02 16:26:07 +00:00
|
|
|
{- This implementation allows for higher precision in a POSIXTime than
|
|
|
|
- supported by the system's Double, and avoids the complications of
|
|
|
|
- floating point. -}
|
|
|
|
mkPOSIXTime :: Integer -> (Integer, Int) -> POSIXTime
|
|
|
|
mkPOSIXTime n (d, dlen)
|
|
|
|
| n < 0 = fromIntegral n - fromRational r
|
|
|
|
| otherwise = fromIntegral n + fromRational r
|
|
|
|
where
|
|
|
|
r = d % (10 ^ dlen)
|
|
|
|
|
2015-05-10 19:36:58 +00:00
|
|
|
formatPOSIXTime :: String -> POSIXTime -> String
|
|
|
|
formatPOSIXTime fmt t = formatTime defaultTimeLocale fmt (posixSecondsToUTCTime t)
|
Lower precision of timestamps in git-annex branch
This can reduce the size of the branch by up to 8%. My test was
running git-annex add 1000 times on one file each.
Lots of different high-resolution timestamps were recorded before
and eliminating those, after packing, the git repo was 8% smaller.
Due to the use of vector clocks, high resolution timestamps are
not necessary to make clear which information is most recent when
eg, a value is changed repeatedly in the same second. In such a
case, the vector clock will be advanced to the next second after
the last modification. For example, running
git-annex numcopies 1; git-annex numcopies 2
The first will record the current second, while the next records
the second after that even if it runs in the same second.
As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that
was written earlier in a second over information written later in the
same second. Usually git-annex does not write conflicting information,
but there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store the
same object, and end up storing different remote state for some reason,
this can result in one that ran a tiny bit later winning. Such a situation
seems unlikely to be user visible. And a small amount of clock skew could
already result in such things.
The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run
in 2 clones of a repository on the same machine at very
close to the same time. Then the user will know which they ran last,
and git-annex won't.
If that did become a problem, this could be dialed back to eg log
milliseconds with still some space saving.
2023-12-11 19:04:06 +00:00
|
|
|
|
|
|
|
{- Truncate the resolution to the specified number of decimal places. -}
|
|
|
|
truncateResolution :: Int -> POSIXTime -> POSIXTime
|
2023-12-27 19:33:46 +00:00
|
|
|
#if MIN_VERSION_time(1,9,1)
|
Lower precision of timestamps in git-annex branch
This can reduce the size of the branch by up to 8%. My test was
running git-annex add 1000 times on one file each.
Lots of different high-resolution timestamps were recorded before
and eliminating those, after packing, the git repo was 8% smaller.
Due to the use of vector clocks, high resolution timestamps are
not necessary to make clear which information is most recent when
eg, a value is changed repeatedly in the same second. In such a
case, the vector clock will be advanced to the next second after
the last modification. For example, running
git-annex numcopies 1; git-annex numcopies 2
The first will record the current second, while the next records
the second after that even if it runs in the same second.
As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that
was written earlier in a second over information written later in the
same second. Usually git-annex does not write conflicting information,
but there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store the
same object, and end up storing different remote state for some reason,
this can result in one that ran a tiny bit later winning. Such a situation
seems unlikely to be user visible. And a small amount of clock skew could
already result in such things.
The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run
in 2 clones of a repository on the same machine at very
close to the same time. Then the user will know which they ran last,
and git-annex won't.
If that did become a problem, this could be dialed back to eg log
milliseconds with still some space saving.
2023-12-11 19:04:06 +00:00
|
|
|
truncateResolution n t = secondsToNominalDiffTime $
|
|
|
|
fromIntegral ((truncate (nominalDiffTimeToSeconds t * d)) :: Integer) / d
|
|
|
|
where
|
|
|
|
d = 10 ^ n
|
2023-12-27 19:33:46 +00:00
|
|
|
#else
|
|
|
|
truncateResolution _ t = t
|
|
|
|
#endif
|