2011-06-21 20:08:09 +00:00
|
|
|
{- management of the git-annex branch
|
code to update a git-annex branch
There is no suitable git hook to run code when pulling changes that
might need to be merged into the git-annex branch. The post-merge hook
is only run when changes are merged into HEAD, and it's possible,
and indeed likely that many pulls will only have changes in git-annex,
but not in HEAD, and not trigger it.
So, git-annex will have to take care to update the branch before reading
from it, to make sure it has merged in current info from remotes. Happily,
this can be done quite inexpensively, just a git-show-ref to list
branches, and a minimalized git-log to see if there are unmerged changes
on the branches. To further speed up, it will be done only once per
git-annex run, max.
2011-06-21 18:29:09 +00:00
|
|
|
-
|
2013-08-28 19:57:42 +00:00
|
|
|
- Copyright 2011-2013 Joey Hess <joey@kitenet.net>
|
2011-06-21 18:29:09 +00:00
|
|
|
-
|
|
|
|
- Licensed under the GNU GPL version 3 or higher.
|
|
|
|
-}
|
|
|
|
|
2011-10-04 04:40:47 +00:00
|
|
|
module Annex.Branch (
|
2012-01-06 19:40:04 +00:00
|
|
|
fullname,
|
2011-12-13 01:12:51 +00:00
|
|
|
name,
|
|
|
|
hasOrigin,
|
|
|
|
hasSibling,
|
2011-12-30 19:57:28 +00:00
|
|
|
siblingBranches,
|
2011-06-22 19:58:30 +00:00
|
|
|
create,
|
2011-06-21 20:08:09 +00:00
|
|
|
update,
|
2011-12-30 19:57:28 +00:00
|
|
|
forceUpdate,
|
|
|
|
updateTo,
|
2011-06-21 23:11:55 +00:00
|
|
|
get,
|
2014-02-06 16:43:56 +00:00
|
|
|
getHistorical,
|
2011-06-21 23:11:55 +00:00
|
|
|
change,
|
2011-06-22 21:47:06 +00:00
|
|
|
commit,
|
2013-10-23 16:58:01 +00:00
|
|
|
forceCommit,
|
2011-06-24 15:59:34 +00:00
|
|
|
files,
|
2013-07-28 19:27:36 +00:00
|
|
|
withIndex,
|
2013-08-28 19:57:42 +00:00
|
|
|
performTransitions,
|
2011-06-21 20:08:09 +00:00
|
|
|
) where
|
2011-06-21 18:29:09 +00:00
|
|
|
|
Fix encoding of data written to git-annex branch. Avoid truncating unicode characters to 8 bits.
Allow any encoding to be used, as with filenames (but utf8 is the sane
choice). Affects metadata and repository descriptions, and preferred
content expressions.
The question of what's the right encoding for the git-annex branch is a
vexing one. utf-8 would be a nice choice, but this leaves the possibility
of bad data getting into a git-annex branch somehow, and this resulting in
git-annex crashing with encoding errors, which is a failure mode I want to
avoid.
(Also, preferred content expressions can refer to filenames, and filenames
can have any encoding, so limiting to utf-8 would not be ideal.)
The union merge code already took care to not assume any encoding for a
file. Except it assumes that any \n is a literal newline, and not part of
some encoding of a character that happens to contain a newline. (At least
utf-8 avoids using newline for anything except literal newlines.)
Adapted the git-annex branch code to use this same approach.
Note that there is a potential interop problem with Windows, since
FileSystemEncoding doesn't work there, and instead things are always
decoded as utf-8. If someone uses non-utf8 encoding for data on the
git-annex branch, this can lead to an encoding error on windows. However,
this commit doesn't actually make that any worse, because the union merge
code would similarly fail with an encoding error on windows in that
situation.
This commit was sponsored by Kyle Meyer.
2014-05-27 18:16:33 +00:00
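The encoding-agnostic approach described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: `byteLines` and `newlineByte` are hypothetical names, and the only assumption made about the data is that the byte 0x0A always means a literal newline, which holds for utf-8.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- The newline byte (0x0A). In utf-8 this byte never occurs inside a
-- multi-byte character, so splitting on it is safe without decoding.
newlineByte :: Word8
newlineByte = 10

-- Hypothetical helper: split raw file content into lines without
-- assuming any text encoding. Empty lines are dropped.
byteLines :: L.ByteString -> [L.ByteString]
byteLines = filter (not . L.null) . L.split newlineByte
```

Because nothing is ever decoded, a line holding bytes that are invalid in utf-8 passes through unchanged instead of raising an encoding error.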
|
|
|
import qualified Data.ByteString.Lazy as L
|
2013-08-28 19:57:42 +00:00
|
|
|
import qualified Data.Set as S
|
|
|
|
import qualified Data.Map as M
|
2014-05-27 18:16:33 +00:00
|
|
|
import Data.Bits.Utils
|
2011-06-21 18:29:09 +00:00
|
|
|
|
2011-10-05 20:02:51 +00:00
|
|
|
import Common.Annex
|
2011-12-12 21:38:46 +00:00
|
|
|
import Annex.BranchState
|
2011-12-12 22:03:28 +00:00
|
|
|
import Annex.Journal
|
2014-02-18 21:38:23 +00:00
|
|
|
import Annex.Index
|
2011-06-30 17:16:57 +00:00
|
|
|
import qualified Git
|
2011-12-14 19:56:11 +00:00
|
|
|
import qualified Git.Command
|
2011-12-12 22:23:24 +00:00
|
|
|
import qualified Git.Ref
|
2013-08-28 19:57:42 +00:00
|
|
|
import qualified Git.Sha
|
2011-12-13 01:12:51 +00:00
|
|
|
import qualified Git.Branch
|
2011-12-13 01:24:55 +00:00
|
|
|
import qualified Git.UnionMerge
|
2012-06-06 04:03:08 +00:00
|
|
|
import qualified Git.UpdateIndex
|
2012-02-14 18:35:52 +00:00
|
|
|
import Git.HashObject
|
2012-06-06 18:29:10 +00:00
|
|
|
import Git.Types
|
|
|
|
import Git.FilePath
|
2011-10-04 04:40:47 +00:00
|
|
|
import Annex.CatFile
|
2012-04-21 20:59:49 +00:00
|
|
|
import Annex.Perms
|
2013-08-31 21:38:33 +00:00
|
|
|
import Logs
|
2013-08-28 19:57:42 +00:00
|
|
|
import Logs.Transitions
|
2013-08-31 21:38:33 +00:00
|
|
|
import Logs.Trust.Pure
|
2013-08-28 19:57:42 +00:00
|
|
|
import Annex.ReplaceFile
|
2013-08-31 21:38:33 +00:00
|
|
|
import qualified Annex.Queue
|
|
|
|
import Annex.Branch.Transitions
|
2011-06-21 18:29:09 +00:00
|
|
|
|
2011-06-21 21:39:45 +00:00
|
|
|
{- Name of the branch that is used to store git-annex's information. -}
|
improve type signatures with a Ref newtype
In git, a Ref can be a Sha, or a Branch, or a Tag. I added type aliases for
those. Note that this does not prevent mixing up of eg, refs and branches
at the type level. Since git really doesn't care, except rare cases like
git update-ref, or git tag -d, that seems ok for now.
There's also a tree-ish, but let's just use Ref for it. A given Sha or Ref
may or may not be a tree-ish, depending on the object type, so there seems
no point in trying to represent it at the type level.
2011-11-16 06:23:34 +00:00
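The trade-off in this commit message (a newtype for refs, with aliases that the type checker does not tell apart) can be illustrated with a small sketch. The real definitions live in Git.Types, which is not shown here and may differ in detail; `qualify` and `mixedUp` are hypothetical names used only for illustration.

```haskell
-- Sketch only: assumed shape of the Ref newtype and its aliases.
newtype Ref = Ref String
  deriving (Eq, Ord, Show)

-- Aliases document intent but are interchangeable to the type checker.
type Branch = Ref
type Sha = Ref

fromRef :: Ref -> String
fromRef (Ref s) = s

-- Hypothetical helper: qualify a short branch name under refs/heads/.
qualify :: Branch -> Ref
qualify b = Ref ("refs/heads/" ++ fromRef b)

-- This type-checks even though a Sha is passed where a Branch is
-- expected, which is the limitation the commit message accepts.
mixedUp :: Sha -> Ref
mixedUp = qualify
```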
|
|
|
name :: Git.Ref
|
|
|
|
name = Git.Ref "git-annex"
|
2011-06-21 18:29:09 +00:00
|
|
|
|
2011-06-21 21:39:45 +00:00
|
|
|
{- Fully qualified name of the branch. -}
|
2011-11-16 06:23:34 +00:00
|
|
|
fullname :: Git.Ref
|
2014-02-19 05:09:17 +00:00
|
|
|
fullname = Git.Ref $ "refs/heads/" ++ fromRef name
|
2011-06-21 18:29:09 +00:00
|
|
|
|
2011-06-24 15:59:34 +00:00
|
|
|
{- Branch's name in origin. -}
|
2011-11-16 06:23:34 +00:00
|
|
|
originname :: Git.Ref
|
2014-02-19 05:09:17 +00:00
|
|
|
originname = Git.Ref $ "origin/" ++ fromRef name
|
2011-06-24 15:59:34 +00:00
|
|
|
|
2011-12-13 01:12:51 +00:00
|
|
|
{- Does origin/git-annex exist? -}
|
|
|
|
hasOrigin :: Annex Bool
|
|
|
|
hasOrigin = inRepo $ Git.Ref.exists originname
|
slow, stupid, and safe index updating
Always merge the git-annex branch into .git/annex/index before making a
commit from the index.
This ensures that, when the branch has been changed in any way
(by a push being received, or changes pulled directly into it, or
even by the user checking it out, and committing a change), the index
reflects those changes.
This is much too slow; it needs to be optimised to only update the
index when the branch has really changed, not every time.
Also, there is an unhandled race, when a change is made to the branch
right after the index gets updated. I left it in for now because it's
unlikely and I didn't want to complicate things with additional locking
yet.
2011-12-11 18:51:20 +00:00
|
|
|
|
2011-12-13 01:12:51 +00:00
|
|
|
{- Does the git-annex branch or a sibling foo/git-annex branch exist? -}
|
|
|
|
hasSibling :: Annex Bool
|
|
|
|
hasSibling = not . null <$> siblingBranches
|
2011-06-21 21:39:45 +00:00
|
|
|
|
2011-12-13 01:12:51 +00:00
|
|
|
{- List of git-annex (refs, branches), including the main one and any
|
|
|
|
- from remotes. Duplicate refs are filtered out. -}
|
|
|
|
siblingBranches :: Annex [(Git.Ref, Git.Branch)]
|
2013-05-21 22:24:29 +00:00
|
|
|
siblingBranches = inRepo $ Git.Ref.matchingUniq [name]
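As a rough model of what listing sibling branches involves, the sketch below parses `git show-ref` output (one `<sha> <refname>` pair per line) and keeps the refs named git-annex. It is not the implementation of `Git.Ref.matchingUniq`, which is not shown here; `parseShowRef` and `siblings` are hypothetical names, and the sketch assumes "duplicate refs" means refs that point at the same sha.

```haskell
import Data.Function (on)
import Data.List (isSuffixOf, nubBy)

-- Parse `git show-ref` output into (sha, refname) pairs.
-- Lines that do not consist of exactly two words are skipped.
parseShowRef :: String -> [(String, String)]
parseShowRef out = [ (sha, ref) | l <- lines out, [sha, ref] <- [words l] ]

-- Hypothetical helper: keep refs named git-annex, dropping any ref whose
-- sha was already seen earlier in the list.
siblings :: String -> [(String, String)]
siblings =
  nubBy ((==) `on` fst)
    . filter (("/git-annex" `isSuffixOf`) . snd)
    . parseShowRef
```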
|
2011-06-22 19:58:30 +00:00
|
|
|
|
|
|
|
{- Creates the branch, if it does not already exist. -}
|
|
|
|
create :: Annex ()
|
2012-06-12 15:32:06 +00:00
|
|
|
create = void getBranch
|
2011-12-12 07:30:47 +00:00
|
|
|
|
|
|
|
{- Returns the ref of the branch, creating it first if necessary. -}
|
2012-01-10 19:36:54 +00:00
|
|
|
getBranch :: Annex Git.Ref
|
|
|
|
getBranch = maybe (hasOrigin >>= go >>= use) return =<< branchsha
|
2012-12-13 04:24:19 +00:00
|
|
|
where
|
|
|
|
go True = do
|
2013-03-03 17:39:07 +00:00
|
|
|
inRepo $ Git.Command.run
|
2014-02-19 05:09:17 +00:00
|
|
|
[Param "branch", Param $ fromRef name, Param $ fromRef originname]
|
|
|
|
fromMaybe (error $ "failed to create " ++ fromRef name)
|
2012-12-13 04:24:19 +00:00
|
|
|
<$> branchsha
|
|
|
|
go False = withIndex' True $
|
2014-07-04 15:36:59 +00:00
|
|
|
inRepo $ Git.Branch.commitAlways Git.Branch.AutomaticCommit "branch created" fullname []
|
2012-12-13 04:24:19 +00:00
|
|
|
use sha = do
|
|
|
|
setIndexSha sha
|
|
|
|
return sha
|
|
|
|
branchsha = inRepo $ Git.Ref.sha fullname
|
2011-06-22 18:18:49 +00:00
|
|
|
|
2012-09-15 19:40:13 +00:00
|
|
|
{- Ensures that the branch and index are up-to-date; should be
|
2012-09-15 23:47:23 +00:00
|
|
|
- called before data is read from it. Runs only once per git-annex run. -}
|
2011-12-30 19:57:28 +00:00
|
|
|
update :: Annex ()
|
avoid unnecessary transfer scans when syncing a disconnected remote
Found a very cheap way to determine when a disconnected remote has
diverged, and has new content that needs to be transferred: Piggyback on
the git-annex branch update, which already checks for divergence.
However, this does not check if new content has appeared locally while
disconnected, that should be transferred to the remote.
Also, this does not handle cases where the two git repos are in sync,
but their content syncing has not caught up yet.
This code could have its efficiency improved:
* When multiple remotes are synced, if any one has diverged, they're
all queued for transfer scans.
* The transfer scanner could be told whether the remote has new content,
the local repo has new content, or both, and could optimise its scan
accordingly.
2012-08-22 18:51:11 +00:00
|
|
|
update = runUpdateOnce $ void $ updateTo =<< siblingBranches
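The "only once per git-annex run" behaviour can be pictured with a flag held in mutable state. The sketch below is an assumption about the general shape, not the definition of `runUpdateOnce` (which lives in Annex.BranchState and is not shown here); `runOnce` and `demo` are hypothetical names, and setting the flag before running the action is a choice made for this sketch.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for runUpdateOnce: run the action only if the
-- flag has not been set yet, setting the flag before the action runs.
runOnce :: IORef Bool -> IO () -> IO ()
runOnce done action = do
  already <- readIORef done
  if already
    then return ()
    else do
      writeIORef done True
      action

-- Calls runOnce three times and reports how often the action ran.
demo :: IO Int
demo = do
  done <- newIORef False
  count <- newIORef (0 :: Int)
  let bump = modifyIORef count (+ 1)
  runOnce done bump
  runOnce done bump
  runOnce done bump
  readIORef count
```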
|
2011-12-30 19:57:28 +00:00
|
|
|
|
|
|
|
{- Forces an update even if one has already been run. -}
|
2012-08-22 18:51:11 +00:00
|
|
|
forceUpdate :: Annex Bool
|
2011-12-30 19:57:28 +00:00
|
|
|
forceUpdate = updateTo =<< siblingBranches
|
|
|
|
|
|
|
|
{- Merges the specified Refs into the index, if they have any changes not
|
|
|
|
- already in it. The Branch names are only used in the commit message;
|
|
|
|
- it's even possible that the provided Branches have not been updated to
|
|
|
|
- point to the Refs yet.
|
2012-09-15 22:34:46 +00:00
|
|
|
-
|
|
|
|
- The branch is fast-forwarded if possible, otherwise a merge commit is
|
|
|
|
- made.
|
2011-10-09 20:19:09 +00:00
|
|
|
-
|
2012-09-15 22:34:46 +00:00
|
|
|
- Before Refs are merged into the index, it's important to first stage the
|
merge: Use fast-forward merges when possible.
Thanks Valentin Haenel for a test case showing how non-fast-forward merges
could result in an ongoing pull/merge/push cycle.
While the git-annex branch is fast-forwarded, git-annex's index file is still
updated using the union merge strategy as before. There's no other way to
update the index that would be any faster.
It is possible that a union merge and a fast-forward result in different file
contents: Files should have the same lines, but a union merge may change
their order. If this happens, the next commit made to the git-annex branch
will have some unnecessary changes to line orders, but the consistency
of data should be preserved.
Note that when the journal contains changes, a fast-forward is never attempted,
which is fine, because committing those changes would be vanishingly unlikely
to leave the git-annex branch at a commit that already exists in one of
the remotes.
The real difficulty is handling the case where multiple remotes have all
changed. git-annex does find the best (ie, newest) one and fast forwards
to it. If the remotes are diverged, no fast-forward is done at all. It would
be possible to pick one, fast forward to it, and make a merge commit to
the rest, but I see no benefit to adding that complexity.
Determining the best of N changed remotes requires N*2+1 calls to git-log, but
these are fast git-log calls, and N is typically small. Also, typically
some or all of the remote refs will be the same, and git-log is not called to
compare those. In the real world I expect this will almost always add only
1 git-log call to the merge process. (Which already makes N anyway.)
2011-11-06 19:18:45 +00:00
|
|
|
- journal into the index. Otherwise, any changes in the journal would
|
|
|
|
- later get staged, and might overwrite changes made during the merge.
|
2012-09-15 22:34:46 +00:00
|
|
|
- This is only done if some of the Refs do need to be merged.
|
2011-10-09 20:19:09 +00:00
|
|
|
-
|
2013-08-28 19:57:42 +00:00
|
|
|
- Also handles performing any Transitions that have not yet been
|
|
|
|
- performed, in either the local branch, or the Refs.
|
|
|
|
-
|
2012-08-22 18:51:11 +00:00
|
|
|
- Returns True if any refs were merged in, False otherwise.
|
2011-10-09 20:19:09 +00:00
|
|
|
-}
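The ordering described in the comment above can be summarised as a small pure model. This is a simplification built from the comment, not a transcription of `updateTo`: it ignores Transitions and the index-only refresh, and `Step` and `planUpdate` are hypothetical names.

```haskell
-- Steps of the simplified update model.
data Step = StageJournal | MergeRefs | FastForward | CommitIndex
  deriving (Eq, Show)

-- Arguments: do any refs need merging, is the journal dirty, and could
-- the branch be fast-forwarded to the merged refs?
planUpdate :: Bool -> Bool -> Bool -> [Step]
planUpdate haveRefs dirty canFastForward
  | not haveRefs = []
  | otherwise = [StageJournal | dirty] ++ [MergeRefs] ++ [final]
  where
    -- A fast-forward is only attempted when the journal is clean.
    final
      | not dirty && canFastForward = FastForward
      | otherwise = CommitIndex
```

Staging the journal comes first so that journalled changes cannot later overwrite what the merge wrote to the index.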
|
2012-08-22 18:51:11 +00:00
|
|
|
updateTo :: [(Git.Ref, Git.Branch)] -> Annex Bool
|
2011-12-30 19:57:28 +00:00
|
|
|
updateTo pairs = do
|
2011-12-12 07:30:47 +00:00
|
|
|
-- ensure branch exists, and get its current ref
|
|
|
|
branchref <- getBranch
|
2012-09-15 23:47:23 +00:00
|
|
|
dirty <- journalDirty
|
2013-08-28 19:57:42 +00:00
|
|
|
ignoredrefs <- getIgnoredRefs
|
|
|
|
(refs, branches) <- unzip <$> filterM (isnewer ignoredrefs) pairs
|
2012-09-15 22:34:46 +00:00
|
|
|
if null refs
|
2012-12-13 04:45:27 +00:00
|
|
|
{- Even when no refs need to be merged, the index
|
2012-09-15 23:47:23 +00:00
|
|
|
- may still be updated if the branch has gotten ahead
|
|
|
|
- of the index. -}
|
2013-10-03 18:41:57 +00:00
|
|
|
then whenM (needUpdateIndex branchref) $ lockJournal $ \jl -> do
|
2013-10-03 19:43:08 +00:00
|
|
|
forceUpdateIndex jl branchref
|
2012-09-15 23:47:23 +00:00
|
|
|
{- When there are journalled changes
|
|
|
|
- as well as the branch being updated,
|
|
|
|
- a commit needs to be done. -}
|
|
|
|
when dirty $
|
2013-10-03 18:41:57 +00:00
|
|
|
go branchref True [] [] jl
|
2012-09-15 23:47:23 +00:00
|
|
|
else lockJournal $ go branchref dirty refs branches
|
|
|
|
return $ not $ null refs
|
2012-12-13 04:24:19 +00:00
|
|
|
where
|
2013-08-28 19:57:42 +00:00
|
|
|
isnewer ignoredrefs (r, _)
|
|
|
|
| S.member r ignoredrefs = return False
|
|
|
|
| otherwise = inRepo $ Git.Branch.changed fullname r
|
2013-10-03 18:41:57 +00:00
|
|
|
go branchref dirty refs branches jl = withIndex $ do
|
|
|
|
cleanjournal <- if dirty then stageJournal jl else return noop
|
2012-12-13 04:24:19 +00:00
|
|
|
let merge_desc = if null branches
|
|
|
|
then "update"
|
|
|
|
else "merging " ++
|
|
|
|
unwords (map Git.Ref.describe branches) ++
|
2014-02-19 05:09:17 +00:00
|
|
|
" into " ++ fromRef name
|
2013-08-28 19:57:42 +00:00
|
|
|
localtransitions <- parseTransitionsStrictly "local"
|
2013-10-03 18:41:57 +00:00
|
|
|
<$> getLocal transitionsLog
|
2012-12-13 04:24:19 +00:00
|
|
|
unless (null branches) $ do
|
|
|
|
showSideAction merge_desc
|
2013-10-03 19:43:08 +00:00
|
|
|
mergeIndex jl refs
|
2013-08-28 19:57:42 +00:00
|
|
|
let commitrefs = nub $ fullname:refs
|
2013-10-03 18:41:57 +00:00
|
|
|
unlessM (handleTransitions jl localtransitions commitrefs) $ do
|
2013-09-03 20:31:32 +00:00
|
|
|
ff <- if dirty
|
|
|
|
then return False
|
|
|
|
else inRepo $ Git.Branch.fastForward fullname refs
|
|
|
|
if ff
|
2013-10-03 19:43:08 +00:00
|
|
|
then updateIndex jl branchref
|
2013-10-23 16:58:01 +00:00
|
|
|
else commitIndex jl branchref merge_desc commitrefs
|
2012-12-13 04:24:19 +00:00
|
|
|
liftIO cleanjournal
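The rule from the fast-forward commit message (pick the newest ref if it contains all the others, and do nothing if the refs have diverged) can be modelled on an in-memory commit graph. This sketch does not reflect how `Git.Branch.fastForward` is written, which relies on git-log and is not shown here; `History`, `ancestors` and `fastForwardTarget` are hypothetical names.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Commit = String

-- Maps each commit to its parents.
type History = M.Map Commit [Commit]

-- All commits reachable from the given one, including itself.
ancestors :: History -> Commit -> S.Set Commit
ancestors h c = go S.empty [c]
  where
    go seen [] = seen
    go seen (x:xs)
      | x `S.member` seen = go seen xs
      | otherwise = go (S.insert x seen) (M.findWithDefault [] x h ++ xs)

-- A ref is a valid fast-forward target only if the current branch and
-- every other candidate ref are among its ancestors.
fastForwardTarget :: History -> Commit -> [Commit] -> Maybe Commit
fastForwardTarget h current refs =
  case filter containsAll refs of
    (best:_) -> Just best
    [] -> Nothing
  where
    containsAll r = all (`S.member` ancestors h r) (current : refs)
```

With a history where `b` follows `a`, `c` follows `b`, and `d` also follows `a`, the refs `b` and `c` yield `c` as the target, while `b` and `d` have diverged and yield `Nothing`.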
|
2011-06-23 15:37:26 +00:00
|
|
|
|
Fix a bug in the git-annex branch handling code that could cause info from a remote to not be merged and take effect immediately.
This bug was turned up by the test suite, running fsck in direct mode.
A repository was cloned, was put into direct mode, was fscked, and fsck
incorrectly said that no copy existed of a file, that was actually present
in origin.
This turned out to occur because fsck first did an Annex.Branch.change,
recording that it did not locally have the file. That was recorded in the
journal. Since neither git annex direct nor the fsck had yet needed to
read any info from the branch, but had only made changes to it, the
origin/git-annex branch was not yet merged in. So the journal got a
location log entry written to it, but this did not include
the location log info for the origin. When fsck then did an
Annex.Branch.get, it trusted the journal was consistent, and returned it,
again w/o merging from origin/git-annex. This latter behavior is the
actual bug.
Refer to commit e9bfa8eaed3ff59a4c0bc8d4d677bc493177807c for the thinking
behind it being ok to make a change to a file on the branch, without
first merging the branch. That thinking still stands. However, it means
that files in the journal cannot be trusted to be consistent if the branch
has not been merged. So, to fix, just ensure the branch gets merged, even
when reading from the journal.
In tests, this does not seem to cause any extra merging. Except, of course,
in the one case described above. But git annex add, etc, are able to make
changes w/o first merging the branch.
2013-05-20 19:14:59 +00:00
|
|
|
{- Gets the content of a file, which may be in the journal, or in the index
|
|
|
|
- (and committed to the branch).
|
2012-09-15 22:34:46 +00:00
|
|
|
-
|
|
|
|
- Updates the branch if necessary, to ensure the most up-to-date available
|
2013-08-29 20:41:59 +00:00
|
|
|
- content is returned.
|
2011-06-23 15:37:26 +00:00
|
|
|
-
|
|
|
|
- Returns an empty string if the file doesn't exist yet. -}
|
2011-06-21 21:39:45 +00:00
|
|
|
get :: FilePath -> Annex String
|
2013-05-20 19:14:59 +00:00
|
|
|
get file = do
|
|
|
|
update
|
2013-10-03 18:41:57 +00:00
|
|
|
getLocal file
|
2011-11-12 19:15:57 +00:00
|
|
|
|
|
|
|
{- Like get, but does not merge the branch, so the info returned may not
|
2013-05-20 19:14:59 +00:00
|
|
|
- reflect changes in remotes.
|
|
|
|
- (Changing the value this returns, and then merging is always the
|
|
|
|
- same as using get, and then changing its value.) -}
|
2013-10-03 18:41:57 +00:00
|
|
|
getLocal :: FilePath -> Annex String
|
|
|
|
getLocal file = go =<< getJournalFileStale file
|
2012-12-13 04:24:19 +00:00
|
|
|
where
|
2013-05-20 19:14:59 +00:00
|
|
|
go (Just journalcontent) = return journalcontent
|
2013-08-31 21:38:33 +00:00
|
|
|
go Nothing = getRaw file
|
2013-10-03 18:41:57 +00:00
|
|
|
|
2013-08-31 21:38:33 +00:00
|
|
|
getRaw :: FilePath -> Annex String
getRaw = getRef fullname

getHistorical :: RefDate -> FilePath -> Annex String
getHistorical date = getRef (Git.Ref.dateRef fullname date)

getRef :: Ref -> FilePath -> Annex String
|
Fix encoding of data written to git-annex branch. Avoid truncating unicode characters to 8 bits.
Allow any encoding to be used, as with filenames (but utf8 is the sane
choice). Affects metadata and repository descriptions, and preferred
content expressions.
The question of what's the right encoding for the git-annex branch is a
vexing one. utf-8 would be a nice choice, but this leaves the possibility
of bad data getting into a git-annex branch somehow, and this resulting in
git-annex crashing with encoding errors, which is a failure mode I want to
avoid.
(Also, preferred content expressions can refer to filenames, and filenames
can have any encoding, so limiting to utf-8 would not be ideal.)
The union merge code already took care to not assume any encoding for a
file. Except it assumes that any \n is a literal newline, and not part of
some encoding of a character that happens to contain a newline. (At least
utf-8 avoids using newline for anything except literal newlines.)
Adapted the git-annex branch code to use this same approach.
Note that there is a potential interop problem with Windows, since
FileSystemEncoding doesn't work there, and instead things are always
decoded as utf-8. If someone uses non-utf8 encoding for data on the
git-annex branch, this can lead to an encoding error on windows. However,
this commit doesn't actually make that any worse, because the union merge
code would similarly fail with an encoding error on windows in that
situation.
This commit was sponsored by Kyle Meyer.
2014-05-27 18:16:33 +00:00
|
|
|
getRef ref file = withIndex $ decodeBS <$> catFile ref file

{- Applies a function to modify the content of a file.
 -
 - Note that this does not cause the branch to be merged, it only
 - modifies the current content of the file on the branch.
 -}
change :: FilePath -> (String -> String) -> Annex ()
change file a = lockJournal $ \jl -> a <$> getLocal file >>= set jl file

{- Records new content of a file into the journal -}
set :: JournalLocked -> FilePath -> String -> Annex ()
set = setJournalFile

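The read-modify-write shape of `change`, and the way `getLocal` prefers journalled content over the committed branch, can be sketched in isolation. This is a toy model only: `Store` and its two maps are hypothetical stand-ins for the real branch and journal, there is no locking, and treating a missing file as the empty string is an assumption of the sketch.

```haskell
import qualified Data.Map as M
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in: the committed branch and the journal are both
-- modelled as maps from file name to content.
data Store = Store
    { branch :: M.Map FilePath String
    , journal :: M.Map FilePath String
    }

-- Like getLocal: journalled content wins, otherwise fall back to the
-- branch. (Assumption: a file missing from both reads as "".)
getLocal :: FilePath -> Store -> String
getLocal file s = case M.lookup file (journal s) of
    Just journalcontent -> journalcontent
    Nothing -> fromMaybe "" (M.lookup file (branch s))

-- Like set: new content is recorded in the journal only.
set :: FilePath -> String -> Store -> Store
set file content s = s { journal = M.insert file content (journal s) }

-- Like change: read the local value, apply the function, write it back.
change :: FilePath -> (String -> String) -> Store -> Store
change file a s = set file (a (getLocal file s)) s

main :: IO ()
main = do
    let s0 = Store (M.fromList [("uuid.log", "a\n")]) M.empty
        s1 = change "uuid.log" (++ "b\n") s0
    -- The journal now holds the modified content...
    putStr (getLocal "uuid.log" s1)
    -- ...while the committed branch is untouched until a commit.
    print (M.lookup "uuid.log" (branch s1))
```

In this model the branch map only changes when the journal is staged and committed, which mirrors why `change` does not need to merge the branch first.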
{- Stages the journal, and commits staged changes to the branch. -}
commit :: String -> Annex ()
commit = whenM journalDirty . forceCommit

{- Commits the current index to the branch even without any journalled
 - changes. -}
forceCommit :: String -> Annex ()
forceCommit message = lockJournal $ \jl -> do
    cleanjournal <- stageJournal jl
    ref <- getBranch
    withIndex $ commitIndex jl ref message [fullname]
    liftIO cleanjournal

{- Commits the staged changes in the index to the branch.
 -
 - Ensures that the branch's index file is first updated to the state
 - of the branch at branchref, before running the commit action. This
 - is needed because the branch may have had changes pushed to it, that
 - are not yet reflected in the index.
 -
 - Also safely handles a race that can occur if a change is being pushed
 - into the branch at the same time. When the race happens, the commit will
 - be made on top of the newly pushed change, but without the index file
 - being updated to include it. The result is that the newly pushed
 - change is reverted. This race is detected and another commit made
 - to fix it.
 -
 - The branchref value can have been obtained using getBranch at any
 - previous point, though getting it a long time ago makes the race
 - more likely to occur.
 -}
commitIndex :: JournalLocked -> Git.Ref -> String -> [Git.Ref] -> Annex ()
commitIndex jl branchref message parents = do
    showStoringStateAction
    commitIndex' jl branchref message parents

commitIndex' :: JournalLocked -> Git.Ref -> String -> [Git.Ref] -> Annex ()
commitIndex' jl branchref message parents = do
    updateIndex jl branchref
    committedref <- inRepo $ Git.Branch.commitAlways Git.Branch.AutomaticCommit message fullname parents
    setIndexSha committedref
    parentrefs <- commitparents <$> catObject committedref
    when (racedetected branchref parentrefs) $
        fixrace committedref parentrefs
  where
    -- look for "parent ref" lines and return the refs
    commitparents = map (Git.Ref . snd) . filter isparent .
        map (toassoc . decodeBS) . L.split newline
    newline = c2w8 '\n'
    toassoc = separate (== ' ')
    isparent (k,_) = k == "parent"

    {- The race can be detected by checking the commit's
     - parent, which will be the newly pushed branch,
     - instead of the expected ref that the index was updated to. -}
    racedetected expectedref parentrefs
        | expectedref `elem` parentrefs = False -- good parent
        | otherwise = True -- race!

    {- To recover from the race, union merge the lost refs
     - into the index, and recommit on top of the bad commit. -}
    fixrace committedref lostrefs = do
        mergeIndex jl lostrefs
        commitIndex jl committedref racemessage [committedref]

    racemessage = message ++ " (recovery from race)"

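The race check in `commitIndex'` comes down to two pure steps: pull the `parent` lines out of the raw commit object, then test whether the expected ref is among them. A minimal sketch follows, using plain `String`s in place of `Git.Ref` and lazy ByteStrings. The names `commitParents` and `raceDetected` are local to the example, and unlike the code above the sketch stops scanning at the first blank line, so the commit message body is never inspected.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Extract parent ids from the text of a raw commit object. Header lines
-- look like "parent <sha>"; the headers end at the first blank line.
commitParents :: String -> [String]
commitParents =
    mapMaybe (stripPrefix "parent ") . takeWhile (not . null) . lines

-- A race happened when the commit was not made on top of the ref that
-- the index had been updated to.
raceDetected :: String -> [String] -> Bool
raceDetected expectedref parentrefs = expectedref `notElem` parentrefs

main :: IO ()
main = do
    -- Made-up commit object text, shortened for the example.
    let obj = unlines
            [ "tree 1111"
            , "parent aaaa"
            , "author someone"
            , ""
            , "update"
            ]
    print (commitParents obj)
    print (raceDetected "aaaa" (commitParents obj)) -- expected parent present
    print (raceDetected "bbbb" (commitParents obj)) -- parent is something else
```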
{- Lists all files on the branch. There may be duplicates in the list. -}
files :: Annex [FilePath]
files = do
    update
    (++)
        <$> branchFiles
        <*> getJournalledFilesStale

{- Files in the branch, not including any from journalled changes,
 - and without updating the branch. -}
branchFiles :: Annex [FilePath]
branchFiles = withIndex $ inRepo $ Git.Command.pipeNullSplitZombie
    [ Params "ls-tree --name-only -r -z"
    , Param $ fromRef fullname
    ]

{- Populates the branch's index file with the current branch contents.
 -
 - This is only done when the index doesn't yet exist, and the index
 - is used to build up changes to be committed to the branch, and merge
 - in changes from other branches.
 -}
genIndex :: Git.Repo -> IO ()
genIndex g = Git.UpdateIndex.streamUpdateIndex g
    [Git.UpdateIndex.lsTree fullname g]

{- Merges the specified refs into the index.
 - Any changes staged in the index will be preserved. -}
mergeIndex :: JournalLocked -> [Git.Ref] -> Annex ()
mergeIndex jl branches = do
    prepareModifyIndex jl
    h <- catFileHandle
    inRepo $ \g -> Git.UnionMerge.mergeIndex h g branches

{- Removes any stale git lock file, to avoid git falling over when
 - updating the index.
 -
 - Since all modifications of the index are performed inside this module,
 - and only when the journal is locked, the fact that the journal has to be
 - locked when this is called ensures that no other process is currently
 - modifying the index. So any index.lock file must be stale, caused
 - by git running when the system crashed, or the repository's disk was
 - removed, etc.
 -}
prepareModifyIndex :: JournalLocked -> Annex ()
prepareModifyIndex _jl = do
    index <- fromRepo gitAnnexIndex
    void $ liftIO $ tryIO $ removeFile $ index ++ ".lock"

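The stale lock removal can be tried out on its own with only `base`, `directory` and `filepath`. In this sketch `removeStaleLock` is a hypothetical name, `try` from `Control.Exception` plays the role that `tryIO` plays above, and the index path under the temporary directory is made up for the demonstration. The key property is that a missing lock file is not an error.

```haskell
import Control.Exception (IOException, try)
import Control.Monad (void)
import System.Directory (doesFileExist, getTemporaryDirectory, removeFile)
import System.FilePath ((</>))

-- Remove <index>.lock if present; any IO error (such as the lock file
-- not existing) is deliberately ignored.
removeStaleLock :: FilePath -> IO ()
removeStaleLock index =
    void (try (removeFile (index ++ ".lock")) :: IO (Either IOException ()))

main :: IO ()
main = do
    tmp <- getTemporaryDirectory
    -- Hypothetical index path, used only for this demonstration.
    let index = tmp </> "sketch-annex-index"
    writeFile (index ++ ".lock") ""
    removeStaleLock index -- removes the leftover lock
    removeStaleLock index -- harmless when no lock exists
    doesFileExist (index ++ ".lock") >>= print
```

This is only safe under the condition the comment above spells out: some other lock (there, the journal lock) must already guarantee that no live process owns the index lock.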
{- Runs an action using the branch's index file. -}
withIndex :: Annex a -> Annex a
withIndex = withIndex' False
withIndex' :: Bool -> Annex a -> Annex a
withIndex' bootstrapping a = do
    f <- liftIO . absPath =<< fromRepo gitAnnexIndex
    withIndexFile f $ do
        checkIndexOnce $ unlessM (liftIO $ doesFileExist f) $ do
            unless bootstrapping create
            createAnnexDirectory $ takeDirectory f
            unless bootstrapping $ inRepo genIndex
        a

{- Updates the branch's index to reflect the current contents of the branch.
 - Any changes staged in the index will be preserved.
 -
 - Compares the ref stored in the lock file with the current
 - ref of the branch to see if an update is needed.
 -}
updateIndex :: JournalLocked -> Git.Ref -> Annex ()
updateIndex jl branchref = whenM (needUpdateIndex branchref) $
    forceUpdateIndex jl branchref

forceUpdateIndex :: JournalLocked -> Git.Ref -> Annex ()
forceUpdateIndex jl branchref = do
    withIndex $ mergeIndex jl [fullname]
    setIndexSha branchref

{- Checks if the index needs to be updated. -}
needUpdateIndex :: Git.Ref -> Annex Bool
needUpdateIndex branchref = do
    f <- fromRepo gitAnnexIndexStatus
    committedref <- Git.Ref . firstLine <$>
        liftIO (catchDefaultIO "" $ readFileStrict f)
    return (committedref /= branchref)

{- Record that the branch's index has been updated to correspond to a
 - given ref of the branch. -}
setIndexSha :: Git.Ref -> Annex ()
setIndexSha ref = do
    f <- fromRepo gitAnnexIndexStatus
    liftIO $ writeFile f $ fromRef ref ++ "\n"
    setAnnexFilePerm f

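`needUpdateIndex` and `setIndexSha` form a small protocol around a status file: one writes the ref the index corresponds to, the other compares the first line of that file against the current branch ref. A self-contained sketch with plain `String` refs is below. The function names and the temp file are assumptions of the example, and the file is read strictly so that the handle is closed before the file is written again, which is the same concern `readFileStrict` addresses above.

```haskell
import Control.Exception (IOException, evaluate, try)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (hClose, openTempFile)

-- Record which ref the index now corresponds to.
recordIndexRef :: FilePath -> String -> IO ()
recordIndexRef f ref = writeFile f (ref ++ "\n")

-- First line of the status file, or "" when it cannot be read.
-- The whole file is forced so the handle is closed before returning.
readRecordedRef :: FilePath -> IO String
readRecordedRef f = do
    r <- try (readFile f >>= \s -> evaluate (length s) >> return s)
        :: IO (Either IOException String)
    return $ case r of
        Right s -> takeWhile (/= '\n') s
        Left _ -> ""

-- The index needs updating when the recorded ref differs from the
-- branch's current ref.
needUpdate :: FilePath -> String -> IO Bool
needUpdate f branchref = (/= branchref) <$> readRecordedRef f

main :: IO ()
main = do
    tmp <- getTemporaryDirectory
    (f, h) <- openTempFile tmp "index-status"
    hClose h
    before <- needUpdate f "abc123" -- nothing recorded yet
    recordIndexRef f "abc123"
    same <- needUpdate f "abc123"   -- recorded ref matches
    moved <- needUpdate f "def456"  -- branch has moved on
    print (before, same, moved)
    removeFile f
```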
{- Stages the journal into the index and returns an action that will
 - clean up the staged journal files, which should only be run once
 - the index has been committed to the branch.
 -
 - Before staging, this removes any existing git index file lock.
 - This is safe to do because stageJournal is the only thing that
 - modifies this index file, and only one can run at a time, because
 - the journal is locked. So any existing git index file lock must be
 - stale, and the journal must contain any data that was in the process
 - of being written to the index file when it crashed.
 -}
stageJournal :: JournalLocked -> Annex (IO ())
stageJournal jl = withIndex $ do
    prepareModifyIndex jl
    g <- gitRepo
    let dir = gitAnnexJournalDir g
    (jlogf, jlogh) <- openjlog
    withJournalHandle $ \jh -> do
        h <- hashObjectStart g
        Git.UpdateIndex.streamUpdateIndex g
            [genstream dir h jh jlogh]
        hashObjectStop h
    return $ cleanup dir jlogh jlogf
  where
    genstream dir h jh jlogh streamer = do
        v <- readDirectory jh
        case v of
            Nothing -> return ()
            Just file -> do
                unless (dirCruft file) $ do
                    let path = dir </> file
                    sha <- hashFile h path
                    hPutStrLn jlogh file
                    streamer $ Git.UpdateIndex.updateIndexLine
                        sha FileBlob (asTopFilePath $ fileJournal file)
                genstream dir h jh jlogh streamer
    -- Clean up the staged files, as listed in the temp log file.
    -- The temp file is used to avoid needing to buffer all the
    -- filenames in memory.
    cleanup dir jlogh jlogf = do
        hFlush jlogh
        hSeek jlogh AbsoluteSeek 0
        stagedfs <- lines <$> hGetContents jlogh
        mapM_ (removeFile . (dir </>)) stagedfs
        hClose jlogh
        nukeFile jlogf
    openjlog = do
        tmpdir <- fromRepo gitAnnexTmpMiscDir
        createAnnexDirectory tmpdir
        liftIO $ openTempFile tmpdir "jlog"

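The temp log trick used by `cleanup` (write each staged name to a file while streaming, then rewind and read the names back lazily) can be shown with `System.IO` alone. This is a sketch under simplifying assumptions: the file names are made up, and the names are printed rather than deleted.

```haskell
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO
    ( SeekMode (AbsoluteSeek), hClose, hFlush, hGetContents
    , hPutStrLn, hSeek, openTempFile )

main :: IO ()
main = do
    tmp <- getTemporaryDirectory
    -- openTempFile returns a handle opened for both reading and writing.
    (jlogf, jlogh) <- openTempFile tmp "jlog"
    -- While staging, log each processed file name (made-up names here).
    mapM_ (hPutStrLn jlogh) ["aaa.log", "bbb.log"]
    -- Later: rewind and lazily read the names back, one line at a time.
    hFlush jlogh
    hSeek jlogh AbsoluteSeek 0
    stagedfs <- lines <$> hGetContents jlogh
    mapM_ putStrLn stagedfs -- the real code removes each file instead
    hClose jlogh
    removeFile jlogf
```

Because `hGetContents` is lazy, the list of names is consumed as it is read, so memory use stays small even when the journal holds a very large number of files.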
{- This is run after the refs have been merged into the index,
 - but before the result is committed to the branch.
 - (Which is why it's passed the contents of the local branch's
 - transition log before that merge took place.)
 -
 - When the refs contain transitions that have not yet been done locally,
 - the transitions are performed on the index, and a new branch
 - is created from the result.
 -
 - When there are transitions recorded locally that have not been done
 - to the remote refs, the transitions are performed in the index,
 - and committed to the existing branch. In this case, the untransitioned
 - remote refs cannot be merged into the branch (since transitions
 - throw away history), so they are added to the list of refs to ignore,
 - to avoid re-merging content from them again.
 -}
handleTransitions :: JournalLocked -> Transitions -> [Git.Ref] -> Annex Bool
handleTransitions jl localts refs = do
    m <- M.fromList <$> mapM getreftransition refs
    let remotets = M.elems m
    if all (localts ==) remotets
        then return False
        else do
            let allts = combineTransitions (localts:remotets)
            let (transitionedrefs, untransitionedrefs) =
                    partition (\r -> M.lookup r m == Just allts) refs
            performTransitionsLocked jl allts (localts /= allts) transitionedrefs
            ignoreRefs untransitionedrefs
            return True
  where
    getreftransition ref = do
        ts <- parseTransitionsStrictly "remote" . decodeBS
            <$> catFile ref transitionsLog
        return (ref, ts)

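The decision logic of `handleTransitions` is pure once each ref's transition log has been read, so it can be sketched without any git access. In the sketch below `Transitions` is modelled as a set of strings and `combine` as set union; both are assumptions made for the example, as are the ref names. `plan` returns `Nothing` when every remote agrees with the local log, and otherwise the combined transitions, whether a new local branch is needed, and the refs split into transitioned and untransitioned.

```haskell
import Data.List (partition)
import qualified Data.Map as M
import qualified Data.Set as S

type Ref = String
-- Toy stand-in for the real transition log type.
type Transitions = S.Set String

-- Assumed combination rule for this sketch: the union of all logs.
combine :: [Transitions] -> Transitions
combine = S.unions

plan :: Transitions -> [(Ref, Transitions)]
     -> Maybe (Transitions, Bool, [Ref], [Ref])
plan localts reftransitions
    | all (localts ==) remotets = Nothing
    | otherwise =
        Just (allts, localts /= allts, transitionedrefs, untransitionedrefs)
  where
    m = M.fromList reftransitions
    refs = map fst reftransitions
    remotets = M.elems m
    allts = combine (localts : remotets)
    -- A ref counts as transitioned only if it already has every transition.
    (transitionedrefs, untransitionedrefs) =
        partition (\r -> M.lookup r m == Just allts) refs

main :: IO ()
main = do
    let forget = S.fromList ["ForgetGitHistory"]
    -- Local and origin have done the transition; laptop has not.
    print (plan forget [("origin", forget), ("laptop", S.empty)])
    -- Everyone agrees: nothing to do.
    print (plan forget [("origin", forget)])
```

In the first call no new local branch is needed, since the local log already equals the combined one; `laptop` lands in the untransitioned list, which corresponds to the refs passed to `ignoreRefs` above.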
ignoreRefs :: [Git.Ref] -> Annex ()
ignoreRefs rs = do
    old <- getIgnoredRefs
    let s = S.unions [old, S.fromList rs]
    f <- fromRepo gitAnnexIgnoredRefs
    replaceFile f $ \tmp -> liftIO $ writeFile tmp $
        unlines $ map fromRef $ S.elems s

getIgnoredRefs :: Annex (S.Set Git.Ref)
getIgnoredRefs = S.fromList . mapMaybe Git.Sha.extractSha . lines <$> content
  where
    content = do
        f <- fromRepo gitAnnexIgnoredRefs
        liftIO $ catchDefaultIO "" $ readFile f

{- Performs the specified transitions on the contents of the index file,
 - commits it to the branch, or creates a new branch.
 -}
performTransitions :: Transitions -> Bool -> [Ref] -> Annex ()
performTransitions ts neednewlocalbranch transitionedrefs = lockJournal $ \jl ->
    performTransitionsLocked jl ts neednewlocalbranch transitionedrefs
performTransitionsLocked :: JournalLocked -> Transitions -> Bool -> [Ref] -> Annex ()
performTransitionsLocked jl ts neednewlocalbranch transitionedrefs = do
    -- For simplicity & speed, we're going to use the Annex.Queue to
    -- update the git-annex branch, while it usually holds changes
    -- for the head branch. Flush any such changes.
    Annex.Queue.flush
    withIndex $ do
        prepareModifyIndex jl
        run $ mapMaybe getTransitionCalculator $ transitionList ts
        Annex.Queue.flush
        if neednewlocalbranch
            then do
                committedref <- inRepo $ Git.Branch.commitAlways Git.Branch.AutomaticCommit message fullname transitionedrefs
                setIndexSha committedref
            else do
                ref <- getBranch
                commitIndex jl ref message (nub $ fullname:transitionedrefs)
  where
    message
        | neednewlocalbranch && null transitionedrefs = "new branch for transition " ++ tdesc
        | otherwise = "continuing transition " ++ tdesc
    tdesc = show $ map describeTransition $ transitionList ts

    {- The changes to make to the branch are calculated and applied to
     - the branch directly, rather than going through the journal,
     - which would be inefficient. (And the journal is not designed
     - to hold changes to every file in the branch at once.)
     -
     - When a file in the branch is changed by transition code,
     - that value is remembered and fed into the code for subsequent
     - transitions.
     -}
    run [] = noop
    run changers = do
        trustmap <- calcTrustMap <$> getRaw trustLog
        fs <- branchFiles
        hasher <- inRepo hashObjectStart
        forM_ fs $ \f -> do
            content <- getRaw f
            apply changers hasher f content trustmap
        liftIO $ hashObjectStop hasher
    apply [] _ _ _ _ = return ()
    apply (changer:rest) hasher file content trustmap =
        case changer file content trustmap of
            RemoveFile -> do
                Annex.Queue.addUpdateIndex
                    =<< inRepo (Git.UpdateIndex.unstageFile file)
                -- File is deleted; can't run any other
                -- transitions on it.
                return ()
            ChangeFile content' -> do
                sha <- inRepo $ hashObject BlobObject content'
                Annex.Queue.addUpdateIndex $ Git.UpdateIndex.pureStreamer $
                    Git.UpdateIndex.updateIndexLine sha FileBlob (asTopFilePath file)
                apply rest hasher file content' trustmap
            PreserveFile ->
                apply rest hasher file content trustmap
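The chaining behaviour of `apply`, where each transition sees the content left by the previous one and a removal stops the chain, can be illustrated with a pure sketch. The constructor names are borrowed from the code above, but the type is redefined locally; the `Changer` type drops the trust map argument, and the two example changers are invented for the demonstration. Instead of queueing index updates, `applyAll` simply returns the final content, or `Nothing` for a removed file.

```haskell
-- Local redefinition for the sketch; constructor names follow the code above.
data FileTransition = ChangeFile String | RemoveFile | PreserveFile

-- Simplified changer: file name and current content in, decision out.
type Changer = FilePath -> String -> FileTransition

-- Run the changers in order, feeding each changed content to the next.
applyAll :: [Changer] -> FilePath -> String -> Maybe String
applyAll [] _ content = Just content
applyAll (changer:rest) file content = case changer file content of
    RemoveFile -> Nothing -- file is deleted; later changers never run
    ChangeFile content' -> applyAll rest file content'
    PreserveFile -> applyAll rest file content

-- Invented example changers.
stripSpaces :: Changer
stripSpaces _ content
    | ' ' `elem` content = ChangeFile (filter (/= ' ') content)
    | otherwise = PreserveFile

dropEmpty :: Changer
dropEmpty _ content
    | null content = RemoveFile
    | otherwise = PreserveFile

main :: IO ()
main = do
    print (applyAll [stripSpaces, dropEmpty] "f" "a b")
    -- dropEmpty sees the content as changed by stripSpaces:
    print (applyAll [stripSpaces, dropEmpty] "f" "  ")
    -- In the other order, dropEmpty runs before the spaces are stripped:
    print (applyAll [dropEmpty, stripSpaces] "f" "  ")
```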