2011-06-22 19:58:30 +00:00
|
|
|
{- git-annex BranchState data type
|
|
|
|
-
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
- Copyright 2011-2021 Joey Hess <id@joeyh.name>
|
2011-06-22 19:58:30 +00:00
|
|
|
-
|
2019-03-13 19:48:14 +00:00
|
|
|
- Licensed under the GNU AGPL version 3 or higher.
|
2011-06-22 19:58:30 +00:00
|
|
|
-}
|
|
|
|
|
|
|
|
module Types.BranchState where
|
|
|
|
|
2020-07-06 16:09:53 +00:00
|
|
|
import Common
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
import qualified Git
|
2021-12-28 17:23:32 +00:00
|
|
|
import Types.Transitions
|
2020-07-06 16:09:53 +00:00
|
|
|
|
|
|
|
import qualified Data.ByteString.Lazy as L
|
|
|
|
|
2012-12-20 03:41:54 +00:00
|
|
|
data BranchState = BranchState
|
2020-04-09 17:54:43 +00:00
|
|
|
{ branchUpdated :: Bool
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
-- ^ has the branch been updated this run? (Or an update tried and
|
|
|
|
-- failed due to permissions.)
|
2020-04-09 17:54:43 +00:00
|
|
|
, indexChecked :: Bool
|
|
|
|
-- ^ has the index file been checked to exist?
|
|
|
|
, journalIgnorable :: Bool
|
|
|
|
-- ^ can reading the journal be skipped, while still getting
|
|
|
|
-- sufficiently up-to-date information from the branch?
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
, unmergedRefs :: [Git.Sha]
|
|
|
|
-- ^ when the branch was not able to be updated due to permissions,
|
|
|
|
-- these other git refs contain unmerged information and need to be
|
|
|
|
-- queried, along with the index and the journal.
|
2021-12-28 17:23:32 +00:00
|
|
|
, unhandledTransitions :: [TransitionCalculator]
|
|
|
|
-- ^ when the branch was not able to be updated due to permissions,
|
|
|
|
-- this is transitions that need to be applied when making queries.
|
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.
Especially when the --all optimisation in the previous commit
pre-cached the location log.
This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.
The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.
But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.
Clearly there could be some further benchmarking and tuning here.
2020-07-07 18:18:55 +00:00
|
|
|
, cachedFileContents :: [(RawFilePath, L.ByteString)]
|
|
|
|
-- ^ contents of a few files recently read from the branch
|
2020-07-06 16:09:53 +00:00
|
|
|
, needInteractiveAccess :: Bool
|
|
|
|
-- ^ do new changes written to the journal or branch by another
|
|
|
|
-- process need to be noticed while the current process is running?
|
|
|
|
-- (This makes the journal always be read, and avoids using the
|
|
|
|
-- cache.)
|
2024-05-15 21:33:38 +00:00
|
|
|
, alternateJournal :: Maybe RawFilePath
|
|
|
|
-- ^ use this directory for all journals, rather than the
|
|
|
|
-- gitAnnexJournalDir and gitAnnexPrivateJournalDir.
|
2012-12-20 03:41:54 +00:00
|
|
|
}
|
2011-06-22 19:58:30 +00:00
|
|
|
|
|
|
|
startBranchState :: BranchState
|
2024-05-15 21:33:38 +00:00
|
|
|
startBranchState = BranchState False False False [] [] [] False Nothing
|