git-annex/Logs/ContentIdentifier.hs
Joey Hess db89e39df6
partially fix concurrency issue in updating the rollingtotal
It's possible for two processes or threads to both be doing the same
operation at the same time. Eg, both dropping the same key. If one
finishes and updates the rollingtotal, then the other one needs to be
prevented from later updating the rollingtotal as well. And they could
finish at the same time, or with some time in between.

Addressed this by making updateRepoSize be called with the journal
locked, and only once it's been determined that there is an actual
location change to record in the log. updateRepoSize waits for the
database to be updated.

When there is a redundant operation, updateRepoSize won't be called,
and the redundant LiveUpdate will be removed from the database on
garbage collection.

But: There will be a window where the redundant LiveUpdate is still
visible in the db, and processes can see it, combine it with the
rollingtotal, and arrive at the wrong size. This is a small window, but
it still ought to be addressed. Unsure if it would always be safe to
remove the redundant LiveUpdate? Consider the case where two drops and a
get are all running concurrently somehow, and the order they finish is
[drop, get, drop]. The second drop seems redundant to the first, but
it would not be safe to remove it. While this seems unlikely, it's hard
to rule out that a get and drop at different stages can both be running
at the same time.
2024-08-26 09:43:32 -04:00

55 lines
1.6 KiB
Haskell

{- Remote content identifier logs.
-
- Copyright 2019 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
module Logs.ContentIdentifier (
module X,
recordContentIdentifier,
getContentIdentifiers,
) where
import Annex.Common
import Logs
import Logs.MapLog
import Types.Import
import Types.RemoteState
import qualified Annex.Branch
import Logs.ContentIdentifier.Pure as X
import qualified Annex
import qualified Data.Map as M
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NonEmpty
-- | Records a remote's content identifier and the key that it corresponds to.
--
-- A remote may use multiple content identifiers for the same key over time,
-- so ones that were recorded before are preserved.
recordContentIdentifier :: RemoteStateHandle -> ContentIdentifier -> Key -> Annex ()
recordContentIdentifier (RemoteStateHandle u) cid k = do
c <- currentVectorClock
config <- Annex.getGitConfig
void $ Annex.Branch.maybeChange
(Annex.Branch.RegardingUUID [u])
(remoteContentIdentifierLogFile config k)
(addcid c . parseLog)
noop
where
addcid c v
| cid `elem` l = Nothing -- no change needed
| otherwise = Just $ buildLog $
changeMapLog c u (cid :| l) v
where
m = simpleMap v
l = contentIdentifierList (M.lookup u m)
-- | Get all known content identifiers for a key.
getContentIdentifiers :: Key -> Annex [(RemoteStateHandle, [ContentIdentifier])]
getContentIdentifiers k = do
config <- Annex.getGitConfig
map (\(u, l) -> (RemoteStateHandle u, NonEmpty.toList l) )
. M.toList . simpleMap . parseLog
<$> Annex.Branch.get (remoteContentIdentifierLogFile config k)