git-annex/Logs/ContentIdentifier.hs
Joey Hess 9828f45d85
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:

* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
  could in theory generate the same content identifier for two different
  pieces of content

While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
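
The separation described above can be sketched with hypothetical stand-in types, which are not git-annex's real definitions. Two sameas remotes share one location UUID, but each carries its own RemoteStateHandle. Here the handle is assumed to wrap a UUID unique to that remote's configuration. Per-remote state is keyed by the handle, so the two remotes cannot overwrite each other. The chunk log, simplified here to a single chunk count, stays keyed by the shared UUID. The names Remote, locationUUID, stateHandle, setState, getState, recordChunks, remoteA and remoteB are illustrative only.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins; git-annex's real types differ.
newtype UUID = UUID String
  deriving (Eq, Ord, Show)

-- Key for per-remote state: wraps the UUID of one remote configuration
-- (an assumption of this sketch).
newtype RemoteStateHandle = RemoteStateHandle UUID
  deriving (Eq, Ord, Show)

data Remote = Remote
  { remoteName   :: String
  , locationUUID :: UUID               -- shared by all sameas remotes
  , stateHandle  :: RemoteStateHandle  -- unique to this remote
  }

-- Per-remote state, keyed by handle, so sameas remotes do not collide.
type StateLog = M.Map RemoteStateHandle String

-- Chunk log, keyed by the shared UUID, because how content is chunked
-- is common to all sameas remotes (simplified to a chunk count here).
type ChunkLog = M.Map UUID Int

setState :: Remote -> String -> StateLog -> StateLog
setState r = M.insert (stateHandle r)

getState :: Remote -> StateLog -> Maybe String
getState r = M.lookup (stateHandle r)

recordChunks :: Remote -> Int -> ChunkLog -> ChunkLog
recordChunks r = M.insert (locationUUID r)

-- Two hypothetical sameas remotes storing into the same place.
sharedUUID :: UUID
sharedUUID = UUID "shared"

remoteA, remoteB :: Remote
remoteA = Remote "a" sharedUUID (RemoteStateHandle (UUID "config-a"))
remoteB = Remote "b" sharedUUID (RemoteStateHandle (UUID "config-b"))
```

Setting state through remoteB leaves remoteA's state intact, whereas a chunk count recorded through either remote is visible to both. Had the state been keyed by locationUUID, the second setState would have replaced the first.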

External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generated in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
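
A minimal sketch of the ERROR behaviour, assuming a simplified request/reply model of the external special remote protocol. While INITREMOTE or EXPORTSUPPORTED is being handled, no handle exists yet. This is modelled as Nothing, and any state request then gets an ERROR. The types and handleState are hypothetical, and the real implementation is structured differently. Two further assumptions: SETSTATE gets no reply, and GETSTATE on unset state returns an empty value.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for the per-remote state handle.
newtype RemoteStateHandle = RemoteStateHandle String
  deriving (Eq, Ord, Show)

data Request
  = SetState String String  -- SETSTATE key value
  | GetState String         -- GETSTATE key
  deriving (Show)

data Reply
  = NoReply       -- assumed: SETSTATE gets no reply
  | Value String  -- VALUE reply to GETSTATE
  | Error String  -- ERROR sent back to the remote
  deriving (Eq, Show)

-- State store keyed by (handle, key).
type Store = M.Map (RemoteStateHandle, String) String

-- Nothing models INITREMOTE/EXPORTSUPPORTED, before the uuid exists.
handleState :: Maybe RemoteStateHandle -> Request -> Store -> (Reply, Store)
handleState Nothing _ st =
  (Error "remote state is not available during INITREMOTE or EXPORTSUPPORTED", st)
handleState (Just h) (SetState k v) st = (NoReply, M.insert (h, k) v st)
handleState (Just h) (GetState k) st =
  (Value (M.findWithDefault "" (h, k) st), st)
```

Refusing the request leaves the store unchanged, which avoids having to construct a throwaway handle during setup.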

Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 13:51:42 -04:00


{- Remote content identifier logs.
-
- Copyright 2019 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}

module Logs.ContentIdentifier (
    module X,
    recordContentIdentifier,
    getContentIdentifiers,
) where

import Annex.Common
import Logs
import Logs.MapLog
import Types.Import
import Types.RemoteState
import qualified Annex.Branch
import Logs.ContentIdentifier.Pure as X
import qualified Annex

import qualified Data.Map as M
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NonEmpty

-- | Records a remote's content identifier and the key that it corresponds to.
--
-- A remote may use multiple content identifiers for the same key over time,
-- so ones that were recorded before are preserved.
recordContentIdentifier :: RemoteStateHandle -> ContentIdentifier -> Key -> Annex ()
recordContentIdentifier (RemoteStateHandle u) cid k = do
    c <- liftIO currentVectorClock
    config <- Annex.getGitConfig
    Annex.Branch.change (remoteContentIdentifierLogFile config k) $
        buildLog . addcid c . parseLog
  where
    addcid c l = changeMapLog c u (cid :| contentIdentifierList (M.lookup u m)) l
      where
        m = simpleMap l

-- | Get all known content identifiers for a key.
getContentIdentifiers :: Key -> Annex [(RemoteStateHandle, [ContentIdentifier])]
getContentIdentifiers k = do
    config <- Annex.getGitConfig
    map (\(u, l) -> (RemoteStateHandle u, NonEmpty.toList l))
        . M.toList . simpleMap . parseLog
        <$> Annex.Branch.get (remoteContentIdentifierLogFile config k)
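
The log manipulation above can be modelled purely under some simplifying assumptions. A vector clock is taken to be just an Integer, and a per-key log is a Map from remote UUID to a timestamped entry. changeEntry, record and identifiers are hypothetical stand-ins for changeMapLog, recordContentIdentifier and getContentIdentifiers. LogEntry and simpleMap here are local simplifications, not git-annex's definitions, and reading and writing Annex.Branch is left out. As in the code above, recording prepends the new identifier and keeps the earlier ones, so the most recently recorded identifier comes first.

```haskell
import qualified Data.Map.Strict as M
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NonEmpty

-- Hypothetical simplified types.
newtype UUID = UUID String deriving (Eq, Ord, Show)
newtype ContentIdentifier = ContentIdentifier String deriving (Eq, Show)
type VectorClock = Integer

-- A log entry carries the clock of its last change plus the value.
data LogEntry v = LogEntry { changed :: VectorClock, value :: v }
  deriving (Eq, Show)

-- One log per key, mapping each remote's uuid to its identifiers.
type CidLog = M.Map UUID (LogEntry (NonEmpty ContentIdentifier))

-- Like changeMapLog: replace the remote's entry, stamped with clock c.
changeEntry :: VectorClock -> UUID -> v -> M.Map UUID (LogEntry v) -> M.Map UUID (LogEntry v)
changeEntry c u v = M.insert u (LogEntry c v)

-- Drop the clocks, keeping only the values.
simpleMap :: M.Map k (LogEntry v) -> M.Map k v
simpleMap = M.map value

-- Prepend the new identifier, preserving ones recorded earlier.
record :: VectorClock -> UUID -> ContentIdentifier -> CidLog -> CidLog
record c u cid l = changeEntry c u (cid :| old) l
  where
    old = maybe [] NonEmpty.toList (M.lookup u (simpleMap l))

-- Flatten to (remote, identifiers) pairs, as getContentIdentifiers does.
identifiers :: CidLog -> [(UUID, [ContentIdentifier])]
identifiers = map (\(u, l) -> (u, NonEmpty.toList l)) . M.toList . simpleMap
```

Recording identifier "v1" at clock 1 and then "v2" at clock 2 for the same remote leaves a single entry for that remote, stamped with clock 2 and holding both identifiers, newest first.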