{- management of the git-annex branch
 -
 - Copyright 2011-2024 Joey Hess <id@joeyh.name>
 -
 - Licensed under the GNU AGPL version 3 or higher.
 -}

{-# LANGUAGE OverloadedStrings #-}

module Annex.Branch (
    fullname,
    name,
    hasOrigin,
    hasSibling,
    siblingBranches,
    create,
    getBranch,
    UpdateMade(..),
    update,
    forceUpdate,
    updateTo,
    get,
    getHistorical,
    getRef,
    getUnmergedRefs,
    RegardingUUID(..),
    change,
    ChangeOrAppend(..),
    changeOrAppend,
    maybeChange,
    commitMessage,
    createMessage,
    commit,
    forceCommit,
    files,
    rememberTreeish,
    performTransitions,
    withIndex,
    precache,
    UnmergedBranches(..),
    FileContents,
    overBranchFileContents,
    overJournalFileContents,
    combineStaleJournalWithBranch,
    updatedFromTree,
) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import qualified Data.Set as S
import qualified Data.Map as M
import Data.Function
import Data.Char
import Data.ByteString.Builder
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar
import qualified System.FilePath.ByteString as P
import System.PosixCompat.Files (isRegularFile)

import Annex.Common
import Types.BranchState
import Annex.BranchState
import Annex.Journal
import Annex.GitOverlay
import Annex.Tmp
import qualified Git
import qualified Git.Command
import qualified Git.Ref
import qualified Git.RefLog
import qualified Git.Sha
import qualified Git.Branch
import qualified Git.UnionMerge
import qualified Git.UpdateIndex
import qualified Git.Tree
import qualified Git.LsTree
import Git.LsTree (lsTreeParams)
import qualified Git.HashObject
import Annex.HashObject
import Git.Types (Ref(..), fromRef, fromRef', RefDate, TreeItemType(..))
import Git.FilePath
import Annex.CatFile
import Git.CatFile (catObjectStreamLsTree)
import Annex.Perms
import Logs
import Logs.Transitions
import Logs.File
import Logs.Trust.Pure
import Logs.Remote.Pure
import Logs.Export.Pure
import Logs.Difference.Pure
import qualified Annex.Queue
import Types.Transitions
import Annex.Branch.Transitions
import qualified Annex
import Annex.Hook
import Utility.Directory.Stream
import Utility.Tmp
import qualified Utility.RawFilePath as R

{- Name of the branch that is used to store git-annex's information. -}
name :: Git.Ref
name = Git.Ref "git-annex"

{- Fully qualified name of the branch. -}
fullname :: Git.Ref
fullname = Git.Ref $ "refs/heads/" <> fromRef' name
|
code to update a git-annex branch
There is no suitable git hook to run code when pulling changes that
might need to be merged into the git-annex branch. The post-merge hook
is only run when changes are merged into HEAD, and it's possible,
and indeed likely that many pulls will only have changes in git-annex,
but not in HEAD, and not trigger it.
So, git-annex will have to take care to update the branch before reading
from it, to make sure it has merged in current info from remotes. Happily,
this can be done quite inexpensively, just a git-show-ref to list
branches, and a minimalized git-log to see if there are unmerged changes
on the branches. To further speed up, it will be done only once per
git-annex run, max.
2011-06-21 18:29:09 +00:00
|
|
|
|
2011-06-24 15:59:34 +00:00
|
|
|
{- Branch's name in origin. -}
|
improve type signatures with a Ref newtype
In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for
those. Note that this does not prevent mixing up of eg, refs and branches
at the type level. Since git really doesn't care, except rare cases like
git update-ref, or git tag -d, that seems ok for now.
There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref
may or may not be a tree-ish, depending on the object type, so there seems
no point in trying to represent it at the type level.
2011-11-16 06:23:34 +00:00
|
|
|
originname :: Git.Ref
|
2021-03-23 19:22:51 +00:00
|
|
|
originname = Git.Ref $ "refs/remotes/origin/" <> fromRef' name
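
{- For example, with the default name above, these evaluate to
 - (a sketch; the values follow directly from the definitions):
 -
 - >>> fromRef fullname
 - "refs/heads/git-annex"
 - >>> fromRef originname
 - "refs/remotes/origin/git-annex"
 -}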

{- Does origin/git-annex exist? -}
hasOrigin :: Annex Bool
hasOrigin = inRepo $ Git.Ref.exists originname

{- Does the git-annex branch or a sibling foo/git-annex branch exist? -}
hasSibling :: Annex Bool
hasSibling = not . null <$> siblingBranches

{- List of git-annex (shas, branches), including the main one and any
 - from remotes. Duplicates are filtered out. -}
siblingBranches :: Annex [(Git.Sha, Git.Branch)]
siblingBranches = inRepo $ Git.Ref.matchingUniq [name]
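
{- For example, in a repository with one remote named origin that has
 - its own git-annex branch, this might yield (a sketch; the shas are
 - hypothetical):
 -
 - >>> siblingBranches
 - [ (Ref "41feca...", Ref "refs/heads/git-annex")
 - , (Ref "b0a1c2...", Ref "refs/remotes/origin/git-annex")
 - ]
 -}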

{- Creates the branch, if it does not already exist. -}
create :: Annex ()
create = void getBranch

{- Returns the sha of the branch, creating it first if necessary. -}
getBranch :: Annex Git.Sha
getBranch = maybe (hasOrigin >>= go >>= use) return =<< branchsha
  where
    go True = do
        inRepo $ Git.Command.run
            [ Param "branch"
            , Param "--no-track"
            , Param $ fromRef name
            , Param $ fromRef originname
            ]
        fromMaybe (giveup $ "failed to create " ++ fromRef name)
            <$> branchsha
    go False = withIndex' True $ do
        -- Create the index file. This is not necessary, except to
        -- avoid a bug in git 2.37 that causes git write-tree to
        -- segfault when the index file does not exist.
        inRepo $ flip Git.UpdateIndex.streamUpdateIndex []
        cmode <- annexCommitMode <$> Annex.getGitConfig
        cmessage <- createMessage
        inRepo $ Git.Branch.commitAlways cmode cmessage fullname []
    use sha = do
        setIndexSha sha
        return sha
    branchsha = inRepo $ Git.Ref.sha fullname
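
{- When origin/git-annex exists, the branch creation above is roughly
 - equivalent to running this git command (a sketch):
 -
 -   git branch --no-track git-annex refs/remotes/origin/git-annex
 -}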

{- Ensures that the branch and index are up-to-date; should be
 - called before data is read from it. Runs only once per git-annex run.
 -
 - There is no suitable git hook to run code when pulling changes that
 - might need to be merged into the git-annex branch. The post-merge
 - hook is only run when changes are merged into HEAD, and it is
 - possible, indeed likely, that many pulls will only have changes on
 - the git-annex branch, not in HEAD, and so will not trigger it.
 -
 - So, git-annex has to take care to update the branch before reading
 - from it, to make sure it has merged in current info from remotes.
 - Happily, this can be done quite inexpensively: just a git show-ref
 - to list branches, and a minimal git log to see if there are
 - unmerged changes on the branches. -}
update :: Annex BranchState
update = runUpdateOnce $ updateTo =<< siblingBranches
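
{- Typical use is to update before reading a file from the branch
 - (a sketch; the file and binding names are illustrative):
 -
 -   st <- Annex.Branch.update
 -   content <- Annex.Branch.get somelogfile
 -}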

{- Forces an update even if one has already been run. -}
forceUpdate :: Annex UpdateMade
forceUpdate = updateTo =<< siblingBranches

{- Merges the specified Refs into the index, if they have any changes
 - not already in it. The Branch names are only used in the commit
 - message; it's even possible that the provided Branches have not been
 - updated to point to the Refs yet.
 -
 - The branch is fast-forwarded if possible, otherwise a merge commit
 - is made. Even when the branch is fast-forwarded, the index file is
 - still updated using the union merge strategy, since there is no
 - faster way to update the index. When the journal contains changes, a
 - fast-forward is never attempted, since committing those changes
 - would be vanishingly unlikely to leave the branch at a commit that
 - already exists in one of the remotes.
 -
 - Before Refs are merged into the index, it's important to first stage
 - the journal into the index. Otherwise, any changes in the journal
 - would later get staged, and might overwrite changes made during the
 - merge. This is only done if some of the Refs do need to be merged.
 -
 - Also handles performing any Transitions that have not yet been
 - performed, in either the local branch, or the Refs.
 -}
updateTo :: [(Git.Sha, Git.Branch)] -> Annex UpdateMade
updateTo pairs = ifM (annexMergeAnnexBranches <$> Annex.getGitConfig)
    ( updateTo' pairs
    , return (UpdateMade False False)
    )
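
{- Merging of sibling branches can be disabled entirely, which makes
 - updateTo return without doing anything (a sketch of the git
 - configuration that annexMergeAnnexBranches reads):
 -
 -   git config annex.merge-annex-branches false
 -}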
|
|
|
|
|
2020-04-15 17:04:34 +00:00
|
|
|
updateTo' :: [(Git.Sha, Git.Branch)] -> Annex UpdateMade
|
2018-02-22 18:25:32 +00:00
|
|
|
updateTo' pairs = do
|
2011-12-12 07:30:47 +00:00
|
|
|
-- ensure branch exists, and get its current ref
|
|
|
|
branchref <- getBranch
|
2013-08-28 19:57:42 +00:00
|
|
|
ignoredrefs <- getIgnoredRefs
|
2016-07-17 16:11:05 +00:00
|
|
|
let unignoredrefs = excludeset ignoredrefs pairs
|
2023-08-09 17:30:43 +00:00
|
|
|
(tomerge, notnewer) <- if null unignoredrefs
|
|
|
|
then return ([], [])
|
2016-07-17 16:11:05 +00:00
|
|
|
else do
|
|
|
|
mergedrefs <- getMergedRefs
|
2023-08-09 17:30:43 +00:00
|
|
|
partitionM isnewer $
|
|
|
|
excludeset mergedrefs unignoredrefs
|
merge git-annex branch in memory in read-only repository
Improved support for using git-annex in a read-only repository, git-annex
branch information from remotes that cannot be merged into the git-annex
branch will now not crash it, but will be merged in memory.
To avoid this making git-annex behave one way in a read-only repository,
and another way when it can write, it's important that Annex.Branch.get
return the same thing (modulo log file compaction) in both cases.
This manages that mostly. There are some exceptions:
- When there is a transition in one of the remote git-annex branches
that has not yet been applied to the local or other git-annex branches.
Transitions are not handled.
- `git-annex log` runs git log on the git-annex branch, and so
it will not be able to show information coming from the other, not yet
merged branches.
- Annex.Branch.files only looks at files in the git-annex branch and not
unmerged branches. This affects git-annex info output.
- Annex.Branch.hs.overBranchFileContents ditto. Affects --all and
also importfeed (but importfeed cannot work in a read-only repo
anyway).
- CmdLine.Seek.seekFilteredKeys when precaching location logs.
Note use of Annex.Branch.fullname
- Database.ContentIdentifier.needsUpdateFromLog and updateFromLog
These warts make this not suitable to be merged yet.
This readonly code path is more expensive, since it has to query several
branches. The value does get cached, but still large queries will be
slower in a read-only repository when there are unmerged git-annex
branches.
When annex.merge-annex-branches=false, updateTo skips doing anything,
and so the read-only repository code does not get triggered. So a user who
is bothered by the extra work can set that.
Other writes to the repository can still result in permissions errors.
This includes the initial creation of the git-annex branch, and of course
any writes to the git-annex branch.
Sponsored-by: Dartmouth College's Datalad project
2021-12-26 18:28:42 +00:00
|
|
|
{- In a read-only repository, catching permission denied lets
|
|
|
|
- query operations still work, although they will need to do
|
|
|
|
- additional work since the refs are not merged. -}
|
|
|
|
catchPermissionDenied
|
2021-12-28 17:23:32 +00:00
|
|
|
(const (updatefailedperms tomerge))
|
2023-08-09 17:30:43 +00:00
|
|
|
(go branchref tomerge notnewer)
|
2012-12-13 04:24:19 +00:00
|
|
|
where
|
2016-07-17 16:11:05 +00:00
|
|
|
excludeset s = filter (\(r, _) -> S.notMember r s)
|
2021-12-28 17:23:32 +00:00
|
|
|
|
2016-07-17 16:11:05 +00:00
|
|
|
isnewer (r, _) = inRepo $ Git.Branch.changed fullname r
|
2021-12-28 17:23:32 +00:00
|
|
|
|
2023-08-09 17:30:43 +00:00
|
|
|
go branchref tomerge notnewer = do
		dirty <- journalDirty gitAnnexJournalDir
		journalcleaned <- if null tomerge
			{- Even when no refs need to be merged, the index
			 - may still be updated if the branch has gotten ahead
			 - of the index, or just if the journal is dirty. -}
			then ifM (needUpdateIndex branchref)
				( lockJournal $ \jl -> do
					forceUpdateIndex jl branchref
					{- When there are journalled changes
					 - as well as the branch being updated,
					 - a commit needs to be done. -}
					when dirty $
						go' branchref dirty [] jl
					return True
				, if dirty
					then ifM (annexAlwaysCommit <$> Annex.getGitConfig)
						( lockJournal $ \jl -> do
							go' branchref dirty [] jl
							return True
						, return False
						)
					else return True
				)
			else lockJournal $ \jl -> do
				go' branchref dirty tomerge jl
				return True
		journalclean <- if journalcleaned
			then not <$> privateUUIDsKnown
			else pure False
		addMergedRefs notnewer
		return $ UpdateMade
			{ refsWereMerged = not (null tomerge)
			, journalClean = journalclean
			}

	go' branchref dirty tomerge jl = stagejournalwhen dirty jl $ do
		let (refs, branches) = unzip tomerge
		merge_desc <- if null tomerge
			then commitMessage
			else return $ "merging " ++
				unwords (map Git.Ref.describe branches) ++
				" into " ++ fromRef name
		localtransitions <- getLocalTransitions
		unless (null tomerge) $ do
			showSideAction (UnquotedString merge_desc)
			mapM_ checkBranchDifferences refs
			mergeIndex jl refs
		let commitrefs = nub $ fullname:refs
		ifM (handleTransitions jl localtransitions commitrefs)
			( runAnnexHook postUpdateAnnexHook
			, do
				ff <- if dirty
					then return False
					else inRepo $ Git.Branch.fastForward fullname refs
				if ff
					then updateIndex jl branchref
					else commitIndex jl branchref merge_desc commitrefs
			)
		addMergedRefs tomerge
		invalidateCacheAll

	stagejournalwhen dirty jl a
		| dirty = stageJournal jl a
		| otherwise = withIndex a

	-- Preparing for read-only branch access with unmerged remote refs.
	updatefailedperms tomerge = do
		let refs = map fst tomerge
		-- Gather any transitions that are new to either the
		-- local branch or a remote ref, which will need to be
		-- applied on the fly.
		localts <- getLocalTransitions
		remotets <- mapM getRefTransitions refs
		ts <- if all (localts ==) remotets
			then return []
			else
				let tcs = mapMaybe getTransitionCalculator $
					knownTransitionList $
						combineTransitions (localts:remotets)
				in if null tcs
					then return []
					else do
						config <- Annex.getGitConfig
						trustmap <- calcTrustMap <$> getStaged trustLog
						remoteconfigmap <- calcRemoteConfigMap <$> getStaged remoteLog
						return $ map (\c -> c trustmap remoteconfigmap config) tcs
		return $ UpdateFailedPermissions
			{ refsUnmerged = refs
			, newTransitions = ts
			}

{- Gets the content of a file, which may be in the journal, or in the index
 - (and committed to the branch).
 -
 - Returns an empty string if the file doesn't exist yet.
 -
 - Updates the branch if necessary, to ensure the most up-to-date available
 - content is returned.
 -
 - When permissions prevented updating the branch, reads the content from the
 - journal, plus the branch, plus all unmerged refs. In this case, any
 - transitions that have not been applied to all refs will be applied on
 - the fly.
 -}
get :: RawFilePath -> Annex L.ByteString
get file = do
	st <- update
	case getCache file st of
		Just content -> return content
		Nothing -> do
			content <- if journalIgnorable st
				then getRef fullname file
				else if null (unmergedRefs st)
					then getLocal file
					else unmergedbranchfallback st
			setCache file content
			return content
  where
	unmergedbranchfallback st = do
		l <- getLocal file
		bs <- forM (unmergedRefs st) $ \ref -> getRef ref file
		let content = l <> mconcat bs
		return $ applytransitions (unhandledTransitions st) content
	applytransitions [] content = content
	applytransitions (changer:rest) content = case changer file content of
		PreserveFile -> applytransitions rest content
		ChangeFile builder -> do
			let content' = toLazyByteString builder
			if L.null content'
				-- File is deleted, can't run any other
				-- transitions on it.
				then content'
				else applytransitions rest content'

{- When the git-annex branch is unable to be updated due to permissions,
 - and there are other git-annex branches that have not been merged into
 - it, this gets the refs of those branches. -}
getUnmergedRefs :: Annex [Git.Ref]
getUnmergedRefs = unmergedRefs <$> update

{- Used to cache the value of a file, which has been read from the branch
 - using some optimised method. The journal has to be checked, in case
 - it has a newer version of the file that has not reached the branch yet.
 -}
precache :: RawFilePath -> L.ByteString -> Annex ()
precache file branchcontent = do
	st <- getState
	content <- if journalIgnorable st
		then pure branchcontent
		else getJournalFileStale (GetPrivate True) file >>= return . \case
			NoJournalledContent -> branchcontent
			JournalledContent journalcontent -> journalcontent
			PossiblyStaleJournalledContent journalcontent ->
				branchcontent <> journalcontent
	setCache file content

{- Like get, but does not merge the branch, so the info returned may not
 - reflect changes in remotes.
 -
 - (Changing the value this returns, and then merging is always the
 - same as using get, and then changing its value.) -}
getLocal :: RawFilePath -> Annex L.ByteString
getLocal = getLocal' (GetPrivate True)

getLocal' :: GetPrivate -> RawFilePath -> Annex L.ByteString
getLocal' getprivate file = do
	fastDebug "Annex.Branch" ("read " ++ fromRawFilePath file)
	go =<< getJournalFileStale getprivate file
  where
	go NoJournalledContent = getRef fullname file
	go (JournalledContent journalcontent) = return journalcontent
	go (PossiblyStaleJournalledContent journalcontent) = do
		v <- getRef fullname file
		return (v <> journalcontent)

{- Gets the content of a file as staged in the branch's index. -}
getStaged :: RawFilePath -> Annex L.ByteString
getStaged = getRef indexref
  where
	-- This makes git cat-file be run with ":file",
	-- so it looks at the index.
	indexref = Ref ""

getHistorical :: RefDate -> RawFilePath -> Annex L.ByteString
getHistorical date file =
	-- This check avoids some ugly error messages when the reflog
	-- is empty.
	ifM (null <$> inRepo (Git.RefLog.get' [Param (fromRef fullname), Param "-n1"]))
		( giveup ("No reflog for " ++ fromRef fullname)
		, getRef (Git.Ref.dateRef fullname date) file
		)

getRef :: Ref -> RawFilePath -> Annex L.ByteString
getRef ref file = withIndex $ catFile ref file
|
2011-06-30 01:23:40 +00:00
|
|
|
|
2017-01-30 20:41:29 +00:00
|
|
|
{- Applies a function to modify the content of a file.
|
2011-12-13 01:12:51 +00:00
|
|
|
-
|
|
|
|
- Note that this does not cause the branch to be merged, it only
|
2023-03-14 02:39:16 +00:00
|
|
|
- modifies the current content of the file on the branch.
|
2011-12-13 01:12:51 +00:00
|
|
|
-}
|
start implementing hidden git-annex repositories
This adds a separate journal, which does not currently get committed to
an index, but is planned to be committed to .git/annex/index-private.
Changes that are regarding a UUID that is private will get written to
this journal, and so will not be published into the git-annex branch.
All log writing should have been made to indicate the UUID it's
regarding, though I've not verified this yet.
Currently, no UUIDs are treated as private yet, a way to configure that
is needed.
The implementation is careful to not add any additional IO work when
privateUUIDsKnown is False. It will skip looking at the private journal
at all. So this should be free, or nearly so, unless the feature is
used. When it is used, all branch reads will be about twice as expensive.
It is very lucky -- or very prudent design -- that Annex.Branch.change
and maybeChange are the only ways to change a file on the branch,
and Annex.Branch.set is only internal use. That lets Annex.Branch.get
always yield any private information that has been recorded, without
the risk that Annex.Branch.set might be called, with a non-private UUID,
and end up leaking the private information into the git-annex branch.
And, this relies on the way git-annex union merges the git-annex branch.
When reading a file, there can be a public and a private version, and
they are just concatenated together. That will be handled the same as if
there were two diverged git-annex branches that got union merged.
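The read-side behavior the message above relies on can be sketched in isolation. This is an illustrative model only, not git-annex's real code, and `combinePublicPrivate` is a hypothetical name:

```haskell
-- Illustrative sketch: when a branch file has both a public and a
-- private version, reading simply concatenates them -- the same result
-- a line-wise union merge of two diverged branches would produce.
import qualified Data.ByteString.Lazy.Char8 as L

combinePublicPrivate :: L.ByteString -> L.ByteString -> L.ByteString
combinePublicPrivate public private = public <> private
```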
change :: Journalable content => RegardingUUID -> RawFilePath -> (L.ByteString -> content) -> Annex ()
change ru file f = lockJournal $ \jl -> f <$> getToChange ru file >>= set jl ru file
partially fix concurrency issue in updating the rollingtotal
It's possible for two processes or threads to both be doing the same
operation at the same time. Eg, both dropping the same key. If one
finishes and updates the rollingtotal, then the other one needs to be
prevented from later updating the rollingtotal as well. And they could
finish at the same time, or with some time in between.
Addressed this by making updateRepoSize be called with the journal
locked, and only once it's been determined that there is an actual
location change to record in the log. updateRepoSize waits for the
database to be updated.
When there is a redundant operation, updateRepoSize won't be called,
and the redundant LiveUpdate will be removed from the database on
garbage collection.
But: There will be a window where the redundant LiveUpdate is still
visible in the db, and processes can see it, combine it with the
rollingtotal, and arrive at the wrong size. This is a small window, but
it still ought to be addressed. Unsure if it would always be safe to
remove the redundant LiveUpdate? Consider the case where two drops and a
get are all running concurrently somehow, and the order they finish is
[drop, get, drop]. The second drop seems redundant to the first, but
it would not be safe to remove it. While this seems unlikely, it's hard
to rule out that a get and drop at different stages can both be running
at the same time.

{- Applies a function which can modify the content of a file, or not.
 -
 - When the file was modified, runs the onchange action, and returns
 - True. The action is run while the journal is still locked,
 - so another concurrent call to this cannot happen while it is running. -}
maybeChange :: Journalable content => RegardingUUID -> RawFilePath -> (L.ByteString -> Maybe content) -> Annex () -> Annex Bool
maybeChange ru file f onchange = lockJournal $ \jl -> do
    v <- getToChange ru file
    case f v of
        Just jv ->
            let b = journalableByteString jv
            in if v /= b
                then do
                    set jl ru file b
                    onchange
                    return True
                else return False
        _ -> return False
split out appending to journal from writing, high level only
Currently this is not an improvement, but it allows for optimising
appendJournalFile later. With an optimised appendJournalFile, this will
greatly speed up access patterns like git-annex addurl of a lot of urls
to the same key, where the log file can grow rather large. Appending
rather than re-writing the journal file for each line can save a lot of
disk writes.
It still has to read the current journal or branch file, to check
if it can append to it, and so when the journal file does not exist yet,
it can write the old content from the branch to it. Probably the re-reads
are better cached by the filesystem than repeated writes. (If the
re-reads turn out to keep performance bad, they could be eliminated, at
the cost of not being able to compact the log when replacing old
information in it. That could be enabled by a switch.)
While the immediate need is to affect addurl writes, it was implemented
at the level of presence logs, so will also perhaps speed up location logs.
The only added overhead is the call to isNewInfo, which only needs to
compare ByteStrings. Helping to balance that out, it avoids compactLog
when it's able to append.
Sponsored-by: Dartmouth College's DANDI project
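The cost difference this message describes can be sketched like so; `appendLine` and `rewriteWithLine` are hypothetical stand-ins, not git-annex's journal API. Appending one log line writes O(line) bytes, while re-writing the journal writes the whole file each time.

```haskell
-- Sketch under stated assumptions: the two functions below model an
-- appendJournalFile-style update vs a write-the-whole-file update.
import qualified Data.ByteString.Char8 as B

-- O(line): just append the new log line.
appendLine :: FilePath -> B.ByteString -> IO ()
appendLine f line = B.appendFile f (line <> B.pack "\n")

-- O(file): read everything back and write it all out again.
rewriteWithLine :: FilePath -> B.ByteString -> IO ()
rewriteWithLine f line = do
    old <- B.readFile f
    B.writeFile f (old <> line <> B.pack "\n")
```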
data ChangeOrAppend t = Change t | Append t
{- Applies a function that can either modify the content of the file,
 - or append to the file. Appending can be more efficient when several
 - lines are written to a file in succession.
 -
 - When annex.alwayscompact=false, the function is not passed the content
 - of the journal file when the journal file already exists, and whatever
 - value it provides is always appended to the journal file. That avoids
 - reading the journal file, and so can be faster when many lines are being
 - written to it. The information that is recorded will be effectively the
 - same, only obsolete log lines will not get compacted.
 -
 - Currently, only appends when annex.alwayscompact=false. That is to
 - avoid appending when an older version of git-annex is also in use in the
 - same repository. An interrupted append could leave the journal file in a
 - state that would confuse the older version. This is planned to be
 - changed in a future repository version.
 -}
changeOrAppend :: Journalable content => RegardingUUID -> RawFilePath -> (L.ByteString -> ChangeOrAppend content) -> Annex ()
changeOrAppend ru file f = lockJournal $ \jl ->
    checkCanAppendJournalFile jl ru file >>= \case
        Just appendable -> ifM (annexAlwaysCompact <$> Annex.getGitConfig)
            ( do
                oldc <- getToChange ru file
                case f oldc of
                    Change newc -> set jl ru file newc
                    Append toappend ->
                        set jl ru file $
                            oldc <> journalableByteString toappend
                        -- Use this instead in v11
                        -- or whatever.
                        -- append jl file appendable toappend
            , case f mempty of
                -- Append even though a change was
                -- requested; since mempty was passed in,
                -- the lines requested to change are
                -- minimized.
                Change newc -> append jl file appendable newc
                Append toappend -> append jl file appendable toappend
            )
        Nothing -> do
            oldc <- getToChange ru file
            case f oldc of
                Change newc -> set jl ru file newc
                -- Journal file does not exist yet, so
                -- cannot append and have to write it all.
                Append toappend -> set jl ru file $
                    oldc <> journalableByteString toappend
{- Only get private information when the RegardingUUID is itself private. -}
getToChange :: RegardingUUID -> RawFilePath -> Annex L.ByteString
getToChange ru f = flip getLocal' f . GetPrivate =<< regardingPrivateUUID ru
{- Records new content of a file into the journal.
 -
 - This is not exported; all changes have to be made via change. This
 - ensures that information that was written to the branch is not
 - overwritten. Also, it avoids a get followed by a set without taking into
 - account whether private information was gotten from the private
 - git-annex index, and should not be written to the public git-annex
 - branch.
 -}
set :: Journalable content => JournalLocked -> RegardingUUID -> RawFilePath -> content -> Annex ()
set jl ru f c = do
    journalChanged
    setJournalFile jl ru f c
    fastDebug "Annex.Branch" ("set " ++ fromRawFilePath f)
    -- Could cache the new content, but it would involve
    -- evaluating a Journalable Builder twice, which is not very
    -- efficient. Instead, assume that it's not common to need to read
    -- a log file immediately after writing it.
    invalidateCache f
{- Appends content to the journal file. -}
append :: Journalable content => JournalLocked -> RawFilePath -> AppendableJournalFile -> content -> Annex ()
append jl f appendable toappend = do
    journalChanged
    appendJournalFile jl appendable toappend
    fastDebug "Annex.Branch" ("append " ++ fromRawFilePath f)
    invalidateCache f
{- Commit message used when making a commit of whatever data has changed
 - to the git-annex branch. -}
commitMessage :: Annex String
commitMessage = fromMaybe "update" <$> getCommitMessage

{- Commit message used when creating the branch. -}
createMessage :: Annex String
createMessage = fromMaybe "branch created" <$> getCommitMessage

getCommitMessage :: Annex (Maybe String)
getCommitMessage = do
    config <- Annex.getGitConfig
    case annexCommitMessageCommand config of
        Nothing -> return (annexCommitMessage config)
        Just cmd -> catchDefaultIO (annexCommitMessage config) $
            Just <$> liftIO (readProcess "sh" ["-c", cmd])
{- Stages the journal, and commits staged changes to the branch. -}
commit :: String -> Annex ()
commit = whenM (journalDirty gitAnnexJournalDir) . forceCommit

{- Commits the current index to the branch even without any journalled
 - changes. -}
forceCommit :: String -> Annex ()
forceCommit message = lockJournal $ \jl ->
    stageJournal jl $ do
        ref <- getBranch
        commitIndex jl ref message [fullname]
{- Commits the staged changes in the index to the branch.
 -
 - Ensures that the branch's index file is first updated to merge the state
 - of the branch at branchref, before running the commit action. This
 - is needed because the branch may have had changes pushed to it, that
 - are not yet reflected in the index.
 -
 - The branchref value can have been obtained using getBranch at any
 - previous point, though getting it a long time ago makes the race
 - more likely to occur.
 -
 - Note that changes may be pushed to the branch at any point in time!
 - So, there's a race. If the commit is made using the newly pushed tip of
 - the branch as its parent, and that ref has not yet been merged into the
 - index, then the result is that the commit will revert the pushed
 - changes, since they have not been merged into the index. This race
 - is detected and another commit made to fix it.
 -
 - (It's also possible for the branch to be overwritten,
 - losing the commit made here. But that's ok; the data is still in the
 - index and will get committed again later.)
 -}
commitIndex :: JournalLocked -> Git.Ref -> String -> [Git.Ref] -> Annex ()
commitIndex jl branchref message parents = do
    showStoringStateAction
    commitIndex' jl branchref message message 0 parents

commitIndex' :: JournalLocked -> Git.Ref -> String -> String -> Integer -> [Git.Ref] -> Annex ()
commitIndex' jl branchref message basemessage retrynum parents = do
    updateIndex jl branchref
    cmode <- annexCommitMode <$> Annex.getGitConfig
    committedref <- inRepo $ Git.Branch.commitAlways cmode message fullname parents
    setIndexSha committedref
    parentrefs <- commitparents <$> catObject committedref
    when (racedetected branchref parentrefs) $
        fixrace committedref parentrefs
  where
    -- look for "parent ref" lines and return the refs
    commitparents = map (Git.Ref . snd) . filter isparent .
        map (toassoc . L.toStrict) . L.split newline
    newline = fromIntegral (ord '\n')
    toassoc = separate' (== (fromIntegral (ord ' ')))
    isparent (k,_) = k == "parent"

    {- The race can be detected by checking the commit's
     - parent, which will be the newly pushed branch,
     - instead of the expected ref that the index was updated to. -}
    racedetected expectedref parentrefs
        | expectedref `elem` parentrefs = False -- good parent
        | otherwise = True -- race!

    {- To recover from the race, union merge the lost refs
     - into the index. -}
    fixrace committedref lostrefs = do
        showSideAction "recovering from race"
        let retrynum' = retrynum+1
        -- small sleep to let any activity that caused
        -- the race settle down
        liftIO $ threadDelay (100000 + fromInteger retrynum')
        mergeIndex jl lostrefs
        let racemessage = basemessage ++ " (recovery from race #" ++ show retrynum' ++ "; expected commit parent " ++ show branchref ++ " but found " ++ show lostrefs ++ " )"
        commitIndex' jl committedref racemessage basemessage retrynum' [committedref]
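The parent-based race check used in commitIndex' can be illustrated standalone. This sketch parses "parent" header lines out of a raw commit object, simplified to String rather than the ByteString parsing the real code uses; the names are illustrative:

```haskell
import Data.List (isPrefixOf)

-- Collect the refs named on "parent " header lines of a commit object.
commitParents :: String -> [String]
commitParents = map (drop (length "parent ")) . filter ("parent " `isPrefixOf`) . lines

-- A race is detected when the expected ref is not among the parents.
raceDetected :: String -> [String] -> Bool
raceDetected expectedref parentrefs = expectedref `notElem` parentrefs
```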
{- Lists all files on the branch, including ones in the journal
 - that have not been committed yet.
 -
 - There may be duplicates in the list, when the journal has files that
 - have not been written to the branch yet.
 -
 - In a read-only repository that has other git-annex branches that have
 - not been merged in, returns Nothing, because it's not possible to
 - efficiently handle that.
 -}
files :: Annex (Maybe ([RawFilePath], IO Bool))
files = do
    st <- update
    if not (null (unmergedRefs st))
        then return Nothing
        else do
            (bfs, cleanup) <- branchFiles
            jfs <- journalledFiles
            pjfs <- journalledFilesPrivate
            -- ++ forces the content of the first list to be
            -- buffered in memory, so use journalledFiles,
            -- which should be much smaller most of the time.
            -- branchFiles will stream as the list is consumed.
            let l = jfs ++ pjfs ++ bfs
            return (Just (l, cleanup))
{- Lists all files currently in the journal, but not files in the private
 - journal. -}
journalledFiles :: Annex [RawFilePath]
journalledFiles = getJournalledFilesStale gitAnnexJournalDir

journalledFilesPrivate :: Annex [RawFilePath]
journalledFilesPrivate = ifM privateUUIDsKnown
    ( getJournalledFilesStale gitAnnexPrivateJournalDir
    , return []
    )
{- Files in the branch, not including any from journalled changes,
 - and without updating the branch. -}
branchFiles :: Annex ([RawFilePath], IO Bool)
branchFiles = withIndex $ inRepo branchFiles'

branchFiles' :: Git.Repo -> IO ([RawFilePath], IO Bool)
branchFiles' = Git.Command.pipeNullSplit' $
    lsTreeParams Git.LsTree.LsTreeRecursive (Git.LsTree.LsTreeLong False)
        fullname
        [Param "--name-only"]
{- Populates the branch's index file with the current branch contents.
 -
 - This is only done when the index doesn't yet exist, and the index
 - is used to build up changes to be committed to the branch, and merge
 - in changes from other branches.
 -}
genIndex :: Git.Repo -> IO ()
genIndex g = Git.UpdateIndex.streamUpdateIndex g
    [Git.UpdateIndex.lsTree fullname g]
{- Merges the specified refs into the index.
 - Any changes staged in the index will be preserved. -}
mergeIndex :: JournalLocked -> [Git.Ref] -> Annex ()
mergeIndex jl branches = do
    prepareModifyIndex jl
    withHashObjectHandle $ \hashhandle ->
        withCatFileHandle $ \ch ->
            inRepo $ \g -> Git.UnionMerge.mergeIndex hashhandle ch g branches
{- Removes any stale git lock file, to avoid git falling over when
 - updating the index.
 -
 - Since all modifications of the index are performed inside this module,
 - and only when the journal is locked, the fact that the journal has to be
 - locked when this is called ensures that no other process is currently
 - modifying the index. So any index.lock file must be stale, caused
 - by git running when the system crashed, or the repository's disk was
 - removed, etc.
 -}
prepareModifyIndex :: JournalLocked -> Annex ()
prepareModifyIndex _jl = do
    index <- fromRepo gitAnnexIndex
    void $ liftIO $ tryIO $ R.removeLink (index <> ".lock")
2011-12-13 01:12:51 +00:00
|
|
|
{- Runs an action using the branch's index file. -}
withIndex :: Annex a -> Annex a
withIndex = withIndex' False

withIndex' :: Bool -> Annex a -> Annex a
withIndex' bootstrapping a = withIndexFile AnnexIndexFile $ \f -> do
	checkIndexOnce $ unlessM (liftIO $ doesFileExist f) $ do
		unless bootstrapping create
		createAnnexDirectory $ toRawFilePath $ takeDirectory f
		unless bootstrapping $ inRepo genIndex
	a

{- Updates the branch's index to reflect the current contents of the branch.
 - Any changes staged in the index will be preserved.
 -
 - Compares the ref stored in the lock file with the current
 - ref of the branch to see if an update is needed.
 -}
updateIndex :: JournalLocked -> Git.Ref -> Annex ()
updateIndex jl branchref = whenM (needUpdateIndex branchref) $
	forceUpdateIndex jl branchref

forceUpdateIndex :: JournalLocked -> Git.Ref -> Annex ()
forceUpdateIndex jl branchref = do
	withIndex $ mergeIndex jl [fullname]
	setIndexSha branchref

{- Checks if the index needs to be updated. -}
needUpdateIndex :: Git.Ref -> Annex Bool
needUpdateIndex branchref = do
	f <- fromRawFilePath <$> fromRepo gitAnnexIndexStatus
	committedref <- Git.Ref . firstLine' <$>
		liftIO (catchDefaultIO mempty $ B.readFile f)
	return (committedref /= branchref)

{- Record that the branch's index has been updated to correspond to a
 - given sha of the branch. -}
setIndexSha :: Git.Sha -> Annex ()
setIndexSha ref = do
	f <- fromRepo gitAnnexIndexStatus
	writeLogFile f $ fromRef ref ++ "\n"
	runAnnexHook postUpdateAnnexHook

{- Stages the journal into the index, and runs an action that
 - commits the index to the branch. Note that the action is run
 - inside withIndex so will automatically use the branch's index.
 -
 - Before staging, this removes any existing git index file lock.
 - This is safe to do because stageJournal is the only thing that
 - modifies this index file, and only one can run at a time, because
 - the journal is locked. So any existing git index file lock must be
 - stale, and the journal must contain any data that was in the process
 - of being written to the index file when it crashed.
 -}
stageJournal :: JournalLocked -> Annex () -> Annex ()
stageJournal jl commitindex = withIndex $ withOtherTmp $ \tmpdir -> do
	prepareModifyIndex jl
	g <- gitRepo
	st <- getState
	let dir = gitAnnexJournalDir st g
	(jlogf, jlogh) <- openjlog (fromRawFilePath tmpdir)
	withHashObjectHandle $ \h ->
		withJournalHandle gitAnnexJournalDir $ \jh ->
			Git.UpdateIndex.streamUpdateIndex g
				[genstream dir h jh jlogh]
	commitindex
	liftIO $ cleanup (fromRawFilePath dir) jlogh jlogf
  where
	genstream dir h jh jlogh streamer = readDirectory jh >>= \case
		Nothing -> return ()
		Just file -> do
			let path = dir P.</> toRawFilePath file
			unless (dirCruft file) $ whenM (isfile path) $ do
				sha <- Git.HashObject.hashFile h path
				hPutStrLn jlogh file
				streamer $ Git.UpdateIndex.updateIndexLine
					sha TreeFile (asTopFilePath $ fileJournal $ toRawFilePath file)
			genstream dir h jh jlogh streamer
	isfile file = isRegularFile <$> R.getFileStatus file
	-- Clean up the staged files, as listed in the temp log file.
	-- The temp file is used to avoid needing to buffer all the
	-- filenames in memory.
	cleanup dir jlogh jlogf = do
		hFlush jlogh
		hSeek jlogh AbsoluteSeek 0
		stagedfs <- lines <$> hGetContents jlogh
		mapM_ (removeFile . (dir </>)) stagedfs
		hClose jlogh
		removeWhenExistsWith (R.removeLink) (toRawFilePath jlogf)
	openjlog tmpdir = liftIO $ openTmpFileIn tmpdir "jlog"

getLocalTransitions :: Annex Transitions
getLocalTransitions =
	parseTransitionsStrictly "local"
		<$> getLocal transitionsLog

{- This is run after the refs have been merged into the index,
 - but before the result is committed to the branch.
 - (Which is why it's passed the contents of the local branch's
 - transition log before that merge took place.)
 -
 - When the refs contain transitions that have not yet been done locally,
 - the transitions are performed on the index, and a new branch
 - is created from the result.
 -
 - When there are transitions recorded locally that have not been done
 - to the remote refs, the transitions are performed in the index,
 - and committed to the existing branch. In this case, the untransitioned
 - remote refs cannot be merged into the branch (since transitions
 - throw away history), so they are added to the list of refs to ignore,
 - to avoid re-merging content from them again.
 -}
handleTransitions :: JournalLocked -> Transitions -> [Git.Ref] -> Annex Bool
handleTransitions jl localts refs = do
	remotets <- mapM getRefTransitions refs
	if all (localts ==) remotets
		then return False
		else do
			let m = M.fromList (zip refs remotets)
			let allts = combineTransitions (localts:remotets)
			let (transitionedrefs, untransitionedrefs) =
				partition (\r -> M.lookup r m == Just allts) refs
			performTransitionsLocked jl allts (localts /= allts) transitionedrefs
			ignoreRefs untransitionedrefs
			return True

{- Performs the specified transitions on the contents of the index file,
 - commits it to the branch, or creates a new branch.
 -}
performTransitions :: Transitions -> Bool -> [Ref] -> Annex ()
performTransitions ts neednewlocalbranch transitionedrefs = lockJournal $ \jl ->
	performTransitionsLocked jl ts neednewlocalbranch transitionedrefs

performTransitionsLocked :: JournalLocked -> Transitions -> Bool -> [Ref] -> Annex ()
performTransitionsLocked jl ts neednewlocalbranch transitionedrefs = do
	-- For simplicity & speed, we're going to use the Annex.Queue to
	-- update the git-annex branch, while it usually holds changes
	-- for the head branch. Flush any such changes.
	Annex.Queue.flush
	-- Stop any running git cat-files, to ensure that the
	-- getStaged calls below use the current index, and not some older
	-- one.
	catFileStop
	withIndex $ do
		prepareModifyIndex jl
		run $ mapMaybe getTransitionCalculator tlist
		Annex.Queue.flush
		if neednewlocalbranch
			then do
				cmode <- annexCommitMode <$> Annex.getGitConfig
				-- Creating a new empty branch must happen
				-- atomically, so if this is interrupted,
				-- it will not leave the new branch created
				-- but without exports grafted in.
				c <- inRepo $ Git.Branch.commitShaAlways
					cmode message transitionedrefs
				void $ regraftexports c
			else do
				ref <- getBranch
				ref' <- regraftexports ref
				commitIndex jl ref' message
					(nub $ fullname:transitionedrefs)
  where
	message
		| neednewlocalbranch && null transitionedrefs = "new branch for transition " ++ tdesc
		| otherwise = "continuing transition " ++ tdesc
	tdesc = show $ map describeTransition tlist
	tlist = knownTransitionList ts

	{- The changes to make to the branch are calculated and applied to
	 - the branch directly, rather than going through the journal,
	 - which would be inefficient. (And the journal is not designed
	 - to hold changes to every file in the branch at once.)
	 -
	 - When a file in the branch is changed by transition code,
	 - its new content is remembered and fed into the code for subsequent
	 - transitions.
	 -}
	run [] = noop
	run changers = do
		config <- Annex.getGitConfig
		trustmap <- calcTrustMap <$> getStaged trustLog
		remoteconfigmap <- calcRemoteConfigMap <$> getStaged remoteLog
		-- partially apply, improves performance
		let changers' = map (\c -> c trustmap remoteconfigmap config) changers
		(fs, cleanup) <- branchFiles
		forM_ fs $ \f -> do
			content <- getStaged f
			apply changers' f content
		liftIO $ void cleanup

	apply [] _ _ = return ()
	apply (changer:rest) file content = case changer file content of
		PreserveFile -> apply rest file content
		ChangeFile builder -> do
			let content' = toLazyByteString builder
			if L.null content'
				then do
					Annex.Queue.addUpdateIndex
						=<< inRepo (Git.UpdateIndex.unstageFile file)
					-- File is deleted; can't run any other
					-- transitions on it.
					return ()
				else do
					sha <- hashBlob content'
					Annex.Queue.addUpdateIndex $ Git.UpdateIndex.pureStreamer $
						Git.UpdateIndex.updateIndexLine sha TreeFile (asTopFilePath file)
					apply rest file content'

	-- Trees mentioned in export.log were grafted into the old
	-- git-annex branch to make sure they remain available.
	-- Re-graft the trees.
	regraftexports parent = do
		l <- exportedTreeishes . M.elems . parseExportLogMap
			<$> getStaged exportLog
		c <- regraft l parent
		inRepo $ Git.Branch.update' fullname c
		setIndexSha c
		return c
	  where
		regraft [] c = pure c
		regraft (et:ets) c =
			-- Verify that the tree object exists.
			catObjectDetails et >>= \case
				Just _ ->
					prepRememberTreeish et graftpoint c
						>>= regraft ets
				Nothing -> regraft ets c
		graftpoint = asTopFilePath exportTreeGraftPoint

checkBranchDifferences :: Git.Ref -> Annex ()
checkBranchDifferences ref = do
	theirdiffs <- allDifferences . parseDifferencesLog
		<$> catFile ref differenceLog
	mydiffs <- annexDifferences <$> Annex.getGitConfig
	when (theirdiffs /= mydiffs) $
		giveup "Remote repository is tuned in incompatible way; cannot be merged with local repository."

ignoreRefs :: [Git.Sha] -> Annex ()
ignoreRefs rs = do
	old <- getIgnoredRefs
	let s = S.unions [old, S.fromList rs]
	f <- fromRepo gitAnnexIgnoredRefs
	writeLogFile f $
		unlines $ map fromRef $ S.elems s

getIgnoredRefs :: Annex (S.Set Git.Sha)
getIgnoredRefs =
	S.fromList . mapMaybe Git.Sha.extractSha . fileLines' <$> content
  where
	content = do
		f <- fromRawFilePath <$> fromRepo gitAnnexIgnoredRefs
		liftIO $ catchDefaultIO mempty $ B.readFile f

addMergedRefs :: [(Git.Sha, Git.Branch)] -> Annex ()
addMergedRefs [] = return ()
addMergedRefs new = do
	old <- getMergedRefs'
	-- Keep only the newest sha for each branch.
	let l = nubBy ((==) `on` snd) (new ++ old)
	f <- fromRepo gitAnnexMergedRefs
	writeLogFile f $
		unlines $ map (\(s, b) -> fromRef s ++ '\t' : fromRef b) l

getMergedRefs :: Annex (S.Set Git.Sha)
getMergedRefs = S.fromList . map fst <$> getMergedRefs'

getMergedRefs' :: Annex [(Git.Sha, Git.Branch)]
getMergedRefs' = do
	f <- fromRawFilePath <$> fromRepo gitAnnexMergedRefs
	s <- liftIO $ catchDefaultIO mempty $ B.readFile f
	return $ map parse $ fileLines' s
  where
	parse l =
		let (s, b) = separate' (== (fromIntegral (ord '\t'))) l
		in (Ref s, Ref b)

{- Grafts a treeish into the branch at the specified location,
 - and then removes it. This ensures that the treeish won't get garbage
 - collected, and will always be available as long as the git-annex branch
 - is available.
 -
 - Returns the sha of the git commit made to the git-annex branch.
 -}
rememberTreeish :: Git.Ref -> TopFilePath -> Annex Git.Sha
rememberTreeish treeish graftpoint = lockJournal $ \jl -> do
	branchref <- getBranch
	updateIndex jl branchref
	c <- prepRememberTreeish treeish graftpoint branchref
	inRepo $ Git.Branch.update' fullname c
	-- The tree in c is the same as the tree in branchref,
	-- and the index was updated to that above, so it's safe to
	-- say that the index contains c.
	setIndexSha c
	return c

{- Create a series of commits that graft a tree onto the parent commit,
 - and then remove it. -}
prepRememberTreeish :: Git.Ref -> TopFilePath -> Git.Ref -> Annex Git.Sha
prepRememberTreeish treeish graftpoint parent = do
	origtree <- fromMaybe (giveup "unable to determine git-annex branch tree") <$>
		inRepo (Git.Ref.tree parent)
	addedt <- inRepo $ Git.Tree.graftTree treeish graftpoint origtree
	cmode <- annexCommitMode <$> Annex.getGitConfig
	c <- inRepo $ Git.Branch.commitTree cmode
		["graft"] [parent] addedt
	inRepo $ Git.Branch.commitTree cmode
		["graft cleanup"] [c] origtree

{- UnmergedBranches is used to indicate when a value was calculated in a
 - read-only repository that has other git-annex branches that have not
 - been merged in. The value does not include information from those
 - branches.
 -}
data UnmergedBranches t
	= UnmergedBranches t
	| NoUnmergedBranches t

type FileContents t b = Maybe (t, RawFilePath, Maybe (L.ByteString, Maybe b))

{- Runs an action on the content of selected files from the branch.
 - This is much faster than reading the content of each file in turn,
 - because it lets git cat-file stream content without blocking.
 -
 - The action is passed a callback that it can repeatedly call to read
 - the next file and its contents. When there are no more files, the
 - callback will return Nothing.
 -
 - Returns the accumulated result of the callback, as well as the sha of
 - the branch at the point it was read.
 -}
overBranchFileContents
	:: Bool
	-- ^ Should files in the journal be ignored? When False,
	-- the content of journalled files is combined with files in the
	-- git-annex branch. And also, at the end, the callback is run
	-- on each journalled file, in case some journalled files are new
	-- files that do not yet appear in the branch. Note that this means
	-- the callback can be run more than once on the same filename,
	-- and in this case it's also possible for the callback to be
	-- passed some of the same file content repeatedly.
	-> (RawFilePath -> Maybe v)
	-> (Annex (FileContents v Bool) -> Annex a)
	-> Annex (UnmergedBranches (a, Git.Sha))
overBranchFileContents ignorejournal select go = do
	st <- update
	let st' = if ignorejournal
		then st { journalIgnorable = True }
		else st
	v <- overBranchFileContents' select go st'
	return $ if not (null (unmergedRefs st))
		then UnmergedBranches v
		else NoUnmergedBranches v

overBranchFileContents'
	:: (RawFilePath -> Maybe v)
	-> (Annex (FileContents v Bool) -> Annex a)
	-> BranchState
	-> Annex (a, Git.Sha)
overBranchFileContents' select go st = do
	g <- Annex.gitRepo
	branchsha <- getBranch
	(l, cleanup) <- inRepo $ Git.LsTree.lsTree
		Git.LsTree.LsTreeRecursive
		(Git.LsTree.LsTreeLong False)
		branchsha
	let select' f = fmap (\v -> (v, f)) (select f)
	buf <- liftIO newEmptyMVar
	let go' reader = go $ liftIO reader >>= \case
		Just ((v, f), content) -> do
			content' <- checkjournal f content >>= return . \case
				Nothing -> Nothing
				Just c -> Just (c, Just False)
			return (Just (v, f, content'))
		Nothing
			| journalIgnorable st -> return Nothing
			| otherwise ->
				overJournalFileContents' buf (handlestale branchsha) select
	res <- catObjectStreamLsTree l (select' . getTopFilePath . Git.LsTree.file) g go'
		`finally` liftIO (void cleanup)
	return (res, branchsha)
  where
	checkjournal f branchcontent
		| journalIgnorable st = return branchcontent
		| otherwise = getJournalFileStale (GetPrivate True) f >>= return . \case
			NoJournalledContent -> branchcontent
			JournalledContent journalledcontent ->
				Just journalledcontent
			PossiblyStaleJournalledContent journalledcontent ->
				Just (fromMaybe mempty branchcontent <> journalledcontent)

	handlestale branchsha f journalledcontent = do
		-- This is expensive, but happens only when there is a
		-- private journal file.
		branchcontent <- getRef branchsha f
		return (combineStaleJournalWithBranch branchcontent journalledcontent, Just True)

combineStaleJournalWithBranch :: L.ByteString -> L.ByteString -> L.ByteString
|
|
|
|
combineStaleJournalWithBranch branchcontent journalledcontent =
|
|
|
|
branchcontent <> journalledcontent
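Since log files on the git-annex branch are union-merged line by line, duplicated or re-appended lines are harmless, which is why combining a possibly-stale journal with branch content can be a plain append. A standalone sketch of that combination (plain Haskell, not the module itself):

```haskell
-- Standalone sketch: combining possibly-stale journalled content with
-- branch content is a simple append, mirroring
-- combineStaleJournalWithBranch above.
import qualified Data.ByteString.Lazy.Char8 as L8

combine :: L8.ByteString -> L8.ByteString -> L8.ByteString
combine branchcontent journalledcontent =
	branchcontent <> journalledcontent

main :: IO ()
main = L8.putStr (combine (L8.pack "old line\n") (L8.pack "new line\n"))
```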
|
2024-08-14 07:19:30 +00:00
|
|
|
|
|
|
|
{- Like overBranchFileContents but only reads the content of journalled
|
2024-08-14 17:46:44 +00:00
|
|
|
- files.
|
2024-08-14 07:19:30 +00:00
|
|
|
-}
|
|
|
|
overJournalFileContents
|
2024-08-14 17:46:44 +00:00
|
|
|
:: (RawFilePath -> L.ByteString -> Annex (L.ByteString, Maybe b))
|
|
|
|
-- ^ Called with the journalled file content when the journalled
|
|
|
|
-- content may be stale or lack information committed to the
|
|
|
|
-- git-annex branch.
|
|
|
|
-> (RawFilePath -> Maybe v)
|
2024-08-14 20:04:18 +00:00
|
|
|
-> (Annex (FileContents v b) -> Annex a)
|
2024-08-14 07:19:30 +00:00
|
|
|
-> Annex a
|
2024-08-14 17:46:44 +00:00
|
|
|
overJournalFileContents handlestale select go = do
|
2024-08-14 07:19:30 +00:00
|
|
|
buf <- liftIO newEmptyMVar
|
|
|
|
go $ overJournalFileContents' buf handlestale select
|
|
|
|
|
|
|
|
overJournalFileContents'
|
|
|
|
:: MVar ([RawFilePath], [RawFilePath])
|
2024-08-14 17:46:44 +00:00
|
|
|
-> (RawFilePath -> L.ByteString -> Annex (L.ByteString, Maybe b))
|
2024-08-14 07:19:30 +00:00
|
|
|
-> (RawFilePath -> Maybe a)
|
2024-08-14 20:04:18 +00:00
|
|
|
-> Annex (FileContents a b)
|
2024-08-14 07:19:30 +00:00
|
|
|
overJournalFileContents' buf handlestale select =
|
|
|
|
liftIO (tryTakeMVar buf) >>= \case
|
|
|
|
Nothing -> do
|
|
|
|
jfs <- journalledFiles
|
|
|
|
pjfs <- journalledFilesPrivate
|
|
|
|
drain jfs pjfs
|
|
|
|
Just (jfs, pjfs) -> drain jfs pjfs
|
|
|
|
where
|
|
|
|
drain fs pfs = case getnext fs pfs of
|
2023-10-24 17:06:54 +00:00
|
|
|
Just (v, f, fs', pfs') -> do
|
|
|
|
liftIO $ putMVar buf (fs', pfs')
|
2022-07-20 14:57:28 +00:00
|
|
|
content <- getJournalFileStale (GetPrivate True) f >>= \case
|
2021-10-26 17:43:50 +00:00
|
|
|
NoJournalledContent -> return Nothing
|
|
|
|
JournalledContent journalledcontent ->
|
2024-08-14 17:46:44 +00:00
|
|
|
return (Just (journalledcontent, Nothing))
|
2024-08-14 07:19:30 +00:00
|
|
|
PossiblyStaleJournalledContent journalledcontent ->
|
|
|
|
Just <$> handlestale f journalledcontent
|
2021-04-21 19:40:32 +00:00
|
|
|
return (Just (v, f, content))
|
|
|
|
Nothing -> do
|
2023-10-24 17:06:54 +00:00
|
|
|
liftIO $ putMVar buf ([], [])
|
2021-04-21 19:40:32 +00:00
|
|
|
return Nothing
|
2021-10-26 17:43:50 +00:00
|
|
|
|
2023-10-24 17:06:54 +00:00
|
|
|
getnext [] [] = Nothing
|
|
|
|
getnext (f:fs) pfs = case select f of
|
|
|
|
Nothing -> getnext fs pfs
|
|
|
|
Just v -> Just (v, f, fs, pfs)
|
|
|
|
getnext [] (pf:pfs) = case select pf of
|
|
|
|
Nothing -> getnext [] pfs
|
|
|
|
Just v -> Just (v, pf, [], pfs)
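The getnext helper above drains the regular journal file list before the private one, skipping files the selector rejects. A self-contained sketch of that pattern, with the selector passed explicitly (generic in the file and value types, not the Annex code itself):

```haskell
-- Standalone sketch of the getnext selection pattern: take from the
-- first list while it is nonempty, then from the second, skipping
-- entries the selector rejects, and return the remaining lists so the
-- caller can resume where it left off.
getnext :: (f -> Maybe v) -> [f] -> [f] -> Maybe (v, f, [f], [f])
getnext _ [] [] = Nothing
getnext select (f:fs) pfs = case select f of
	Nothing -> getnext select fs pfs
	Just v -> Just (v, f, fs, pfs)
getnext select [] (pf:pfs) = case select pf of
	Nothing -> getnext select [] pfs
	Just v -> Just (v, pf, [], pfs)

main :: IO ()
main = print (getnext sel ["a", "skip"] ["p"])
  where
	sel "skip" = Nothing
	sel x = Just (length x)
```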
|
2021-10-26 17:43:50 +00:00
|
|
|
|
sqlite database for importfeed
importfeed: Use a caching database to avoid needing to list urls on every
run, and avoid using too much memory.
Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.
Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.
Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or it would entangle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.
Sponsored-by: unqueued on Patreon
2023-10-23 20:12:26 +00:00
|
|
|
{- Check if the git-annex branch has been updated from the oldtree.
|
|
|
|
- If so, returns the tuple of the old and new trees. -}
|
|
|
|
updatedFromTree :: Git.Sha -> Annex (Maybe (Git.Sha, Git.Sha))
|
|
|
|
updatedFromTree oldtree =
|
|
|
|
inRepo (Git.Ref.tree fullname) >>= \case
|
|
|
|
Just currtree | currtree /= oldtree ->
|
|
|
|
return $ Just (oldtree, currtree)
|
|
|
|
_ -> return Nothing
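Once the current tree sha has been looked up, the updatedFromTree check reduces to a pure comparison. A standalone sketch with a hypothetical Sha newtype (the real code uses Git.Sha and runs in the Annex monad):

```haskell
-- Standalone sketch of the updatedFromTree check: given the previously
-- seen tree and the current one (Nothing when the ref lookup failed),
-- return the (old, new) pair only when the branch has changed.
newtype Sha = Sha String deriving (Eq, Show)

updatedFromTree :: Sha -> Maybe Sha -> Maybe (Sha, Sha)
updatedFromTree oldtree (Just currtree)
	| currtree /= oldtree = Just (oldtree, currtree)
updatedFromTree _ _ = Nothing

main :: IO ()
main = do
	print (updatedFromTree (Sha "aaa") (Just (Sha "bbb")))
	print (updatedFromTree (Sha "aaa") (Just (Sha "aaa")))
```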
|