2011-12-12 21:38:46 +00:00
|
|
|
{- git-annex branch state management
|
|
|
|
-
|
2020-07-06 16:09:53 +00:00
|
|
|
- Runtime state about the git-annex branch, and a small cache.
|
2011-12-12 21:38:46 +00:00
|
|
|
-
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
- Copyright 2011-2021 Joey Hess <id@joeyh.name>
|
2011-12-12 21:38:46 +00:00
|
|
|
-
|
2019-03-13 19:48:14 +00:00
|
|
|
- Licensed under the GNU AGPL version 3 or higher.
|
2011-12-12 21:38:46 +00:00
|
|
|
-}
|
|
|
|
|
|
|
|
module Annex.BranchState where
|
|
|
|
|
2016-01-20 20:36:33 +00:00
|
|
|
import Annex.Common
|
2011-12-12 21:38:46 +00:00
|
|
|
import Types.BranchState
|
2021-12-28 17:23:32 +00:00
|
|
|
import Types.Transitions
|
2011-12-12 21:38:46 +00:00
|
|
|
import qualified Annex
|
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.
Especially when the --all optimisation in the previous commit
pre-cached the location log.
This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.
The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.
But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.
Clearly there could be some further benchmarking and tuning here.
2020-07-07 18:18:55 +00:00
|
|
|
import Logs
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
import qualified Git
|
2011-12-12 21:38:46 +00:00
|
|
|
|
2020-07-06 16:09:53 +00:00
|
|
|
import qualified Data.ByteString.Lazy as L
|
|
|
|
|
2011-12-12 21:38:46 +00:00
|
|
|
getState :: Annex BranchState
|
|
|
|
getState = Annex.getState Annex.branchstate
|
|
|
|
|
2012-01-14 18:31:16 +00:00
|
|
|
changeState :: (BranchState -> BranchState) -> Annex ()
|
2015-04-10 19:16:17 +00:00
|
|
|
changeState changer = Annex.changeState $ \s ->
|
|
|
|
s { Annex.branchstate = changer (Annex.branchstate s) }
|
2012-01-14 18:31:16 +00:00
|
|
|
|
2012-01-14 16:07:36 +00:00
|
|
|
{- Runs an action to check that the index file exists, if it's not been
|
|
|
|
- checked before in this run of git-annex. -}
|
|
|
|
checkIndexOnce :: Annex () -> Annex ()
|
|
|
|
checkIndexOnce a = unlessM (indexChecked <$> getState) $ do
|
|
|
|
a
|
2012-01-14 18:31:16 +00:00
|
|
|
changeState $ \s -> s { indexChecked = True }
|
2012-01-14 16:07:36 +00:00
|
|
|
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
data UpdateMade
|
|
|
|
= UpdateMade
|
|
|
|
{ refsWereMerged :: Bool
|
|
|
|
, journalClean :: Bool
|
|
|
|
}
|
|
|
|
| UpdateFailedPermissions
|
|
|
|
{ refsUnmerged :: [Git.Sha]
|
2021-12-28 17:23:32 +00:00
|
|
|
, newTransitions :: [TransitionCalculator]
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
}
|
|
|
|
|
2011-12-12 21:38:46 +00:00
|
|
|
{- Runs an action to update the branch, if it's not been updated before
|
2020-04-15 17:04:34 +00:00
|
|
|
- in this run of git-annex.
|
|
|
|
-
|
2021-04-02 15:56:50 +00:00
|
|
|
- When interactive access is enabled, the journal is always checked when
|
|
|
|
- reading values from the branch, and so this does not need to update
|
|
|
|
- the branch.
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
-
|
|
|
|
- When the action leaves the journal clean, by staging anything that
|
|
|
|
- was in it, an optimisation is enabled: The journal does not need to
|
|
|
|
- be checked going forward, until new information gets written to it.
|
|
|
|
-
|
|
|
|
- When the action is unable to update the branch due to a permissions
|
|
|
|
- problem,
|
2020-04-15 17:04:34 +00:00
|
|
|
-}
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
runUpdateOnce :: Annex UpdateMade -> Annex BranchState
|
|
|
|
runUpdateOnce update = do
|
2020-04-09 17:54:43 +00:00
|
|
|
st <- getState
|
2021-04-02 15:56:50 +00:00
|
|
|
if branchUpdated st || needInteractiveAccess st
|
2020-04-09 17:54:43 +00:00
|
|
|
then return st
|
|
|
|
else do
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
um <- update
|
|
|
|
let stf = case um of
|
|
|
|
UpdateMade {} -> \st' -> st'
|
|
|
|
{ branchUpdated = True
|
|
|
|
, journalIgnorable = journalClean um
|
|
|
|
}
|
|
|
|
UpdateFailedPermissions {} -> \st' -> st'
|
|
|
|
{ branchUpdated = True
|
|
|
|
, journalIgnorable = False
|
|
|
|
, unmergedRefs = refsUnmerged um
|
2021-12-28 17:23:32 +00:00
|
|
|
, unhandledTransitions = newTransitions um
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
, cachedFileContents = []
|
|
|
|
}
|
2020-04-09 17:54:43 +00:00
|
|
|
changeState stf
|
|
|
|
return (stf st)
|
2011-12-12 21:38:46 +00:00
|
|
|
|
|
|
|
{- Avoids updating the branch. A useful optimisation when the branch
|
|
|
|
- is known to have not changed, or git-annex won't be relying on info
|
2021-04-02 14:35:15 +00:00
|
|
|
- queried from it being as up-to-date as possible. -}
|
2011-12-12 21:38:46 +00:00
|
|
|
disableUpdate :: Annex ()
|
2012-01-14 18:31:16 +00:00
|
|
|
disableUpdate = changeState $ \s -> s { branchUpdated = True }
|
2020-04-09 17:54:43 +00:00
|
|
|
|
|
|
|
{- Called when a change is made to the journal. -}
|
|
|
|
journalChanged :: Annex ()
|
|
|
|
journalChanged = do
|
|
|
|
-- Optimisation: Typically journalIgnorable will already be True
|
|
|
|
-- (when one thing gets journalled, often other things do to),
|
|
|
|
-- so avoid an unnecessary write to the MVar that changeState
|
|
|
|
-- would do.
|
|
|
|
--
|
|
|
|
-- This assumes that another thread is not changing journalIgnorable
|
|
|
|
-- at the same time, but since runUpdateOnce is the only
|
|
|
|
-- thing that changes it, and it only runs once, that
|
|
|
|
-- should not happen.
|
|
|
|
st <- getState
|
|
|
|
when (journalIgnorable st) $
|
|
|
|
changeState $ \st' -> st' { journalIgnorable = False }
|
|
|
|
|
|
|
|
{- When git-annex is somehow interactive, eg in --batch mode,
|
|
|
|
- and needs to always notice changes made to the journal by other
|
|
|
|
- processes, this disables optimisations that avoid normally reading the
|
|
|
|
- journal.
|
2020-07-06 16:09:53 +00:00
|
|
|
-
|
|
|
|
- It also avoids using the cache, so changes committed by other processes
|
|
|
|
- will be seen.
|
2020-04-09 17:54:43 +00:00
|
|
|
-}
|
2020-07-06 16:09:53 +00:00
|
|
|
enableInteractiveBranchAccess :: Annex ()
|
|
|
|
enableInteractiveBranchAccess = changeState $
|
|
|
|
\s -> s { needInteractiveAccess = True }
|
|
|
|
|
|
|
|
setCache :: RawFilePath -> L.ByteString -> Annex ()
|
|
|
|
setCache file content = changeState $ \s -> s
|
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.
Especially when the --all optimisation in the previous commit
pre-cached the location log.
This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.
The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.
But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.
Clearly there could be some further benchmarking and tuning here.
2020-07-07 18:18:55 +00:00
|
|
|
{ cachedFileContents = add (cachedFileContents s) }
|
|
|
|
where
|
|
|
|
add l
|
|
|
|
| length l < logFilesToCache = (file, content) : l
|
|
|
|
| otherwise = (file, content) : Prelude.init l
|
2020-07-06 16:09:53 +00:00
|
|
|
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
getCache :: RawFilePath -> BranchState -> Maybe L.ByteString
|
|
|
|
getCache file state = go (cachedFileContents state)
|
2020-07-06 16:09:53 +00:00
|
|
|
where
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
go [] = Nothing
|
|
|
|
go ((f,c):rest)
|
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.
Especially when the --all optimisation in the previous commit
pre-cached the location log.
This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.
The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.
But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.
Clearly there could be some further benchmarking and tuning here.
2020-07-07 18:18:55 +00:00
|
|
|
| f == file && not (needInteractiveAccess state) = Just c
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
| otherwise = go rest
|
2020-07-06 16:09:53 +00:00
|
|
|
|
|
|
|
invalidateCache :: Annex ()
|
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.
Especially when the --all optimisation in the previous commit
pre-cached the location log.
This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.
The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.
But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.
Clearly there could be some further benchmarking and tuning here.
2020-07-07 18:18:55 +00:00
|
|
|
invalidateCache = changeState $ \s -> s { cachedFileContents = [] }
|