merge git-annex branch in memory in read-only repository

Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.

To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.

This manages that mostly. There are some exceptions:

- When there is a transition in one of the remote git-annex branches
  that has not yet been applied to the local or other git-annex branches.
  Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
  it will not be able to show information coming from the other, not yet
  merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
  unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
  also importfeed (but importfeed cannot work in a read-only repo
  anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
  Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog

These warts make this not suitable to be merged yet.

This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.

When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.

Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.

Sponsored-by: Dartmouth College's Datalad project
This commit is contained in:
Joey Hess 2021-12-26 14:28:42 -04:00
parent da6aa6e944
commit 6d7ecd9e5d
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 124 additions and 82 deletions

View file

@ -2,7 +2,7 @@
-
- Runtime state about the git-annex branch, and a small cache.
-
- Copyright 2011-2020 Joey Hess <id@joeyh.name>
- Copyright 2011-2021 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@ -13,6 +13,7 @@ import Annex.Common
import Types.BranchState
import qualified Annex
import Logs
import qualified Git
import qualified Data.ByteString.Lazy as L
@ -30,29 +31,47 @@ checkIndexOnce a = unlessM (indexChecked <$> getState) $ do
a
changeState $ \s -> s { indexChecked = True }
data UpdateMade
= UpdateMade
{ refsWereMerged :: Bool
, journalClean :: Bool
}
| UpdateFailedPermissions
{ refsUnmerged :: [Git.Sha]
}
{- Runs an action to update the branch, if it's not been updated before
- in this run of git-annex.
-
- The action should return True if anything that was in the journal
- before got staged (or if the journal was empty). That lets an opmisation
- be done: The journal then does not need to be checked going forward,
- until new information gets written to it.
-
- When interactive access is enabled, the journal is always checked when
- reading values from the branch, and so this does not need to update
- the branch.
-
- When the action leaves the journal clean, by staging anything that
- was in it, an optimisation is enabled: The journal does not need to
- be checked going forward, until new information gets written to it.
-
- When the action is unable to update the branch due to a permissions
- problem,
-}
runUpdateOnce :: Annex Bool -> Annex BranchState
runUpdateOnce a = do
runUpdateOnce :: Annex UpdateMade -> Annex BranchState
runUpdateOnce update = do
st <- getState
if branchUpdated st || needInteractiveAccess st
then return st
else do
journalstaged <- a
let stf = \st' -> st'
{ branchUpdated = True
, journalIgnorable = journalstaged
}
um <- update
let stf = case um of
UpdateMade {} -> \st' -> st'
{ branchUpdated = True
, journalIgnorable = journalClean um
}
UpdateFailedPermissions {} -> \st' -> st'
{ branchUpdated = True
, journalIgnorable = False
, unmergedRefs = refsUnmerged um
, cachedFileContents = []
}
changeState stf
return (stf st)
@ -98,13 +117,13 @@ setCache file content = changeState $ \s -> s
| length l < logFilesToCache = (file, content) : l
| otherwise = (file, content) : Prelude.init l
getCache :: RawFilePath -> Annex (Maybe L.ByteString)
getCache file = (\st -> go (cachedFileContents st) st) <$> getState
getCache :: RawFilePath -> BranchState -> Maybe L.ByteString
getCache file state = go (cachedFileContents state)
where
go [] _ = Nothing
go ((f,c):rest) state
go [] = Nothing
go ((f,c):rest)
| f == file && not (needInteractiveAccess state) = Just c
| otherwise = go rest state
| otherwise = go rest
invalidateCache :: Annex ()
invalidateCache = changeState $ \s -> s { cachedFileContents = [] }