2011-12-12 21:38:46 +00:00
|
|
|
{- git-annex branch state management
|
|
|
|
-
|
2020-07-06 16:09:53 +00:00
|
|
|
- Runtime state about the git-annex branch, and a small cache.
|
2011-12-12 21:38:46 +00:00
|
|
|
-
|
share single BranchState among all threads
This fixes a problem when git-annex testremote is run against a cluster
accessed via the http server. Annex.Cluster uses the location log
to find nodes that contain a key when checking if the key is present or getting
it. Just after a key was stored to a cluster node, reading the location log
was not getting the UUID of that node.
Apparently the Annex action that wrote to the location log, and the one
that read from it were run with two different Annex states. The http server
does use several different Annex threads.
BranchState was part of the AnnexState, and so two threads could have
different BranchStates.
Moved BranchState to the AnnexRead, so all threads will see the common state.
This might possibly impact performance. If one thread is writing changes to the
branch, and another thread is reading from the branch, the writing thread will
now invalidate the BranchState's cache, which will cause the reading thread to
need to do extra work. But correctness is surely more important. If this is
found to have impacted performance, it could probably be dealt with by doing
smarter BranchState cache invalidation.
Another way this might impact performance is that the BranchState has a small
cache. If several threads were reading from the branch and relying on the value
they just read still being in the cache, now a cache miss will be more likely.
Increasing the BranchState cache to the number of jobs might be a good
idea to ameliorate that. But the cache is currently an inefficient list,
so making it large would need changes to the data types.
(Commit 4304f1b6aea19a5c402dc4f9d69aa4ff1c104c9b dealt with a follow-on
effect of the bug fixed here.)
2024-07-28 16:17:16 +00:00
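The effect of sharing one state among threads can be illustrated with a minimal, standalone sketch. It is not git-annex code: DemoState, getDemoState, changeDemoState and writeThenRead are hypothetical names, and the single list field only stands in for the real BranchState. The point shown is that when every thread holds the same MVar, an update made by a writer thread is visible to a later read from another thread.

```haskell
import Control.Concurrent

-- Hypothetical stand-in for BranchState; only one field is modelled.
newtype DemoState = DemoState { demoCache :: [String] }
  deriving (Show, Eq)

-- Same shape as getState: take a snapshot of the shared state.
getDemoState :: MVar DemoState -> IO DemoState
getDemoState = readMVar

-- Same shape as changeState: apply a pure update to the shared state.
changeDemoState :: MVar DemoState -> (DemoState -> DemoState) -> IO ()
changeDemoState v f = modifyMVar_ v (return . f)

-- A writer thread updates the shared state; the calling thread waits for
-- the writer to finish and then reads through the same MVar.
writeThenRead :: IO DemoState
writeThenRead = do
  shared <- newMVar (DemoState [])
  done <- newEmptyMVar
  _ <- forkIO $ do
    changeDemoState shared (\s -> s { demoCache = "uuid-of-node" : demoCache s })
    putMVar done ()
  takeMVar done
  getDemoState shared

main :: IO ()
main = writeThenRead >>= print
```

If each thread instead created its own MVar, the reader would still see the empty list, which corresponds to the symptom described in the commit message above.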
|
|
|
- Copyright 2011-2024 Joey Hess <id@joeyh.name>
|
2011-12-12 21:38:46 +00:00
|
|
|
-
|
2019-03-13 19:48:14 +00:00
|
|
|
- Licensed under the GNU AGPL version 3 or higher.
|
2011-12-12 21:38:46 +00:00
|
|
|
-}
|
|
|
|
|
|
|
|
module Annex.BranchState where
|
|
|
|
|
2016-01-20 20:36:33 +00:00
|
|
|
import Annex.Common
|
2011-12-12 21:38:46 +00:00
|
|
|
import Types.BranchState
|
2021-12-28 17:23:32 +00:00
|
|
|
import Types.Transitions
|
2011-12-12 21:38:46 +00:00
|
|
|
import qualified Annex
|
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.
Especially when the --all optimisation in the previous commit
pre-cached the location log.
This also means that the --all optimisation could cache the metadata log
too, if it wanted to, but that is not currently done.
The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.
But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.
Clearly there could be some further benchmarking and tuning here.
2020-07-07 18:18:55 +00:00
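The bounded, newest-first list described above can be sketched in isolation. This is an illustration under assumptions, not the module's code: cacheLimit, Cache, addEntry and lookupEntry are hypothetical names, the limit of 2 is chosen only for the demonstration (the real limit is logFilesToCache), and plain String and FilePath replace the ByteString types used by git-annex.

```haskell
-- Hypothetical limit, chosen for the demonstration only.
cacheLimit :: Int
cacheLimit = 2

-- Newest entry first.
type Cache = [(FilePath, String)]

-- Prepend the new entry; once the list is full, drop the oldest (last) one.
addEntry :: FilePath -> String -> Cache -> Cache
addEntry file content l
  | length l < cacheLimit = (file, content) : l
  | otherwise = (file, content) : init l

-- Linear scan from the front, so recently added files are found first.
lookupEntry :: FilePath -> Cache -> Maybe String
lookupEntry = lookup

main :: IO ()
main = do
  let c = addEntry "c.log" "3" (addEntry "b.log" "2" (addEntry "a.log" "1" []))
  print (map fst c)
  print (lookupEntry "a.log" c)
```

Every lookup walks the list, which is why the message argues for keeping the limit small.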
|
|
|
import Logs
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
import qualified Git
|
2011-12-12 21:38:46 +00:00
|
|
|
|
2024-07-28 16:17:16 +00:00
|
|
|
import Control.Concurrent
|
2020-07-06 16:09:53 +00:00
|
|
|
import qualified Data.ByteString.Lazy as L
|
|
|
|
|
2011-12-12 21:38:46 +00:00
|
|
|
getState :: Annex BranchState
|
2024-07-28 16:17:16 +00:00
|
|
|
getState = do
	v <- Annex.getRead Annex.branchstate
	liftIO $ readMVar v
|
2011-12-12 21:38:46 +00:00
|
|
|
|
2012-01-14 18:31:16 +00:00
|
|
|
changeState :: (BranchState -> BranchState) -> Annex ()
|
2024-07-28 16:17:16 +00:00
|
|
|
changeState changer = do
	v <- Annex.getRead Annex.branchstate
	liftIO $ modifyMVar_ v $ return . changer
|
2012-01-14 18:31:16 +00:00
|
|
|
|
2012-01-14 16:07:36 +00:00
|
|
|
{- Runs an action to check that the index file exists, if it's not been
 - checked before in this run of git-annex. -}
checkIndexOnce :: Annex () -> Annex ()
checkIndexOnce a = unlessM (indexChecked <$> getState) $ do
	a
|
2012-01-14 18:31:16 +00:00
|
|
|
	changeState $ \s -> s { indexChecked = True }
|
2012-01-14 16:07:36 +00:00
|
|
|
|
2021-12-26 18:28:42 +00:00
|
|
|
data UpdateMade
	= UpdateMade
		{ refsWereMerged :: Bool
		, journalClean :: Bool
		}
	| UpdateFailedPermissions
		{ refsUnmerged :: [Git.Sha]
|
2021-12-28 17:23:32 +00:00
|
|
|
		, newTransitions :: [TransitionCalculator]
|
2021-12-26 18:28:42 +00:00
|
|
|
		}
|
|
|
|
|
2011-12-12 21:38:46 +00:00
|
|
|
{- Runs an action to update the branch, if it's not been updated before
|
2020-04-15 17:04:34 +00:00
|
|
|
- in this run of git-annex.
|
|
|
|
-
|
2021-04-02 15:56:50 +00:00
|
|
|
- When interactive access is enabled, the journal is always checked when
|
|
|
|
- reading values from the branch, and so this does not need to update
|
|
|
|
- the branch.
|
2021-12-26 18:28:42 +00:00
|
|
|
-
|
|
|
|
- When the action leaves the journal clean, by staging anything that
|
|
|
|
- was in it, an optimisation is enabled: The journal does not need to
|
|
|
|
- be checked going forward, until new information gets written to it.
|
|
|
|
-
|
|
|
|
- When the action is unable to update the branch due to a permissions
|
2022-07-15 16:59:59 +00:00
|
|
|
- problem, the journal is still read every time.
|
2020-04-15 17:04:34 +00:00
|
|
|
-}
|
2021-12-26 18:28:42 +00:00
|
|
|
runUpdateOnce :: Annex UpdateMade -> Annex BranchState
|
|
|
|
runUpdateOnce update = do
|
2020-04-09 17:54:43 +00:00
|
|
|
	st <- getState
|
2021-04-02 15:56:50 +00:00
|
|
|
	if branchUpdated st || needInteractiveAccess st
|
2020-04-09 17:54:43 +00:00
|
|
|
		then return st
		else do
|
2021-12-26 18:28:42 +00:00
|
|
|
			um <- update
			let stf = case um of
				UpdateMade {} -> \st' -> st'
					{ branchUpdated = True
					, journalIgnorable = journalClean um
					}
				UpdateFailedPermissions {} -> \st' -> st'
					{ branchUpdated = True
					, journalIgnorable = False
					, unmergedRefs = refsUnmerged um
|
2021-12-28 17:23:32 +00:00
|
|
|
					, unhandledTransitions = newTransitions um
|
2021-12-26 18:28:42 +00:00
|
|
|
					, cachedFileContents = []
					}
|
2020-04-09 17:54:43 +00:00
|
|
|
			changeState stf
			return (stf st)
|
2011-12-12 21:38:46 +00:00
|
|
|
|
|
|
|
{- Avoids updating the branch. A useful optimisation when the branch
|
|
|
|
- is known to have not changed, or git-annex won't be relying on info
|
2021-04-02 14:35:15 +00:00
|
|
|
- queried from it being as up-to-date as possible. -}
|
2011-12-12 21:38:46 +00:00
|
|
|
disableUpdate :: Annex ()
|
2012-01-14 18:31:16 +00:00
|
|
|
disableUpdate = changeState $ \s -> s { branchUpdated = True }
|
2020-04-09 17:54:43 +00:00
|
|
|
|
|
|
|
{- Called when a change is made to the journal. -}
|
|
|
|
journalChanged :: Annex ()
|
|
|
|
journalChanged = do
|
|
|
|
	-- Optimisation: Typically journalIgnorable will already be False
	-- (when one thing gets journalled, often other things do too),
	-- so avoid an unnecessary write to the MVar that changeState
	-- would do.
	--
|
2022-07-15 17:43:46 +00:00
|
|
|
	-- This assumes that another thread is not setting journalIgnorable
|
2020-04-09 17:54:43 +00:00
|
|
|
	-- at the same time, but since runUpdateOnce is the only
|
2022-07-15 17:43:46 +00:00
|
|
|
	-- thing that sets it, and it only runs once, that
|
2020-04-09 17:54:43 +00:00
|
|
|
	-- should not happen.
	st <- getState
	when (journalIgnorable st) $
		changeState $ \st' -> st' { journalIgnorable = False }
|
|
|
|
|
|
|
|
{- When git-annex is somehow interactive, eg in --batch mode,
|
|
|
|
- and needs to always notice changes made to the journal by other
|
|
|
|
- processes, this disables optimisations that avoid normally reading the
|
|
|
|
- journal.
|
2020-07-06 16:09:53 +00:00
|
|
|
-
|
|
|
|
- It also avoids using the cache, so changes committed by other processes
|
|
|
|
- will be seen.
|
2020-04-09 17:54:43 +00:00
|
|
|
-}
|
2020-07-06 16:09:53 +00:00
|
|
|
enableInteractiveBranchAccess :: Annex ()
|
2022-07-15 17:43:46 +00:00
|
|
|
enableInteractiveBranchAccess = changeState $ \s -> s
|
|
|
|
	{ needInteractiveAccess = True
	, journalIgnorable = False
	}
|
2020-07-06 16:09:53 +00:00
|
|
|
|
|
|
|
setCache :: RawFilePath -> L.ByteString -> Annex ()
|
|
|
|
setCache file content = changeState $ \s -> s
|
2020-07-07 18:18:55 +00:00
|
|
|
	{ cachedFileContents = add (cachedFileContents s) }
  where
	add l
		| length l < logFilesToCache = (file, content) : l
		| otherwise = (file, content) : Prelude.init l
|
2020-07-06 16:09:53 +00:00
|
|
|
|
2021-12-26 18:28:42 +00:00
|
|
|
getCache :: RawFilePath -> BranchState -> Maybe L.ByteString
|
|
|
|
getCache file state = go (cachedFileContents state)
|
2020-07-06 16:09:53 +00:00
|
|
|
  where
|
2021-12-26 18:28:42 +00:00
|
|
|
    go [] = Nothing
|
|
|
|
    go ((f,c):rest)
|
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.
Especially when the --all optimisation in the previous commit
pre-cached the location log.
This also means that the --all optimisation could cache the metadata log
too, if it wanted to, but that is not currently done.
The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, e.g. a
get, examine, followed by a set reads it twice. And sync --content commonly
reads the location log 3 times in a row.
But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.
Clearly there could be some further benchmarking and tuning here.
2020-07-07 18:18:55 +00:00
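The list-based cache described in the commit message above can be sketched roughly as follows. This is an illustration only, not the git-annex implementation: `Cache`, `maxEntries`, `insertCache` and `lookupCache` are hypothetical names, and the bound of 3 is an arbitrary value chosen for the example.

```haskell
-- Hypothetical sketch of a small most-recently-used cache kept as a list.
-- The newest entry is at the head, so a repeated read of the same key is
-- found on the first comparison.
type Cache k v = [(k, v)]

-- Assumed size bound (illustrative value, not taken from the source).
maxEntries :: Int
maxEntries = 3

-- Insert or refresh an entry: remove any older copy of the key, put the
-- new pair at the front, and drop whatever falls beyond the bound.
insertCache :: Eq k => k -> v -> Cache k v -> Cache k v
insertCache k v cache = take maxEntries ((k, v) : filter ((/= k) . fst) cache)

-- Look up a key. On a hit, return the value together with the cache
-- reordered so that the hit entry is first.
lookupCache :: Eq k => k -> Cache k v -> Maybe (v, Cache k v)
lookupCache k cache = case lookup k cache of
    Nothing -> Nothing
    Just v -> Just (v, insertCache k v cache)
```

Because every lookup walks the list, the cost of a miss grows with the bound, which is the trade-off the commit message weighs when it decides against a longer list.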
|
|
|
        | f == file && not (needInteractiveAccess state) = Just c
|
|
|
|
        | otherwise = go rest
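To see how the `needInteractiveAccess` guard affects lookups, here is a self-contained version of `getCache` with simplified types. The `BranchState` record below is a stand-in that models only the two fields the function uses, and `String` replaces `RawFilePath` and `L.ByteString`; both are assumptions made so the example compiles on its own.

```haskell
-- Simplified stand-in for BranchState (assumption: only these two fields
-- matter here; the real record has more).
data BranchState = BranchState
    { cachedFileContents :: [(String, String)]
    , needInteractiveAccess :: Bool
    }

-- Same shape as the function above: scan the cache list front to back,
-- and treat every lookup as a miss while needInteractiveAccess is set.
getCache :: String -> BranchState -> Maybe String
getCache file state = go (cachedFileContents state)
  where
    go [] = Nothing
    go ((f, c) : rest)
        | f == file && not (needInteractiveAccess state) = Just c
        | otherwise = go rest
```

With `needInteractiveAccess = False`, a cached file is returned directly; with it set to `True`, the same call yields `Nothing`, so the caller falls back to reading the branch.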
|
2020-07-06 16:09:53 +00:00
|
|
|
|
|
|
|
invalidateCache :: Annex ()
|
|
|
|
invalidateCache = changeState $ \s -> s { cachedFileContents = [] }
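A minimal sketch of how invalidation might look when the state is shared between threads, as the file header's commit message describes. It assumes the shared state lives in an `MVar`; the `changeState` below is a hypothetical stand-in that takes the `MVar` explicitly instead of running in the `Annex` monad, and the record is reduced to a single field.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)

-- Reduced stand-in for BranchState (assumption: one field is enough here).
data BranchState = BranchState
    { cachedFileContents :: [(String, String)]
    }

-- Hypothetical changeState: apply a pure update to the shared state.
-- modifyMVar_ takes the MVar for the duration of the update, so
-- concurrent updates do not interleave.
changeState :: MVar BranchState -> (BranchState -> BranchState) -> IO ()
changeState var f = modifyMVar_ var (pure . f)

-- Empty the cache, mirroring the definition above.
invalidateCache :: MVar BranchState -> IO ()
invalidateCache var = changeState var $ \s -> s { cachedFileContents = [] }
```

Since every thread reads through the same `MVar`, an invalidation by a writing thread is seen by readers on their next lookup, at the price of extra cache misses for them.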
|