make import tree from remote generate a merge commit
This way no history is lost: neither what was exported to the remote nor the history of changes imported from it. No complicated correlation of two possibly very different histories is needed; just record what we know, and git merge will do a good job.

Also, it notices when the remote tracking branch doesn't need to be updated and avoids doing anything, so noop imports are super cheap.

The only catch is that, since the commits generated for imports from the remote don't have a stable date or author/committer, each (non-noop) import generates different commits for the same imported trees. So, when the imported remote tracking branch is merged into master and then a change is imported again, there will be an extra series of commits, and this will get more and more expensive each time.

This seems to call for making stable commits for imports. That also seems like a good idea so that importing in several repositories has the same result.
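The catch with unstable commits can be illustrated with a toy model. This is only a sketch: `Commit`, `commitId`, and `importCommit` are hypothetical names, and an FNV-1a hash stands in for the SHA that git computes over a commit object. It shows that the same imported tree yields a different commit id whenever the date differs, and the same id when the metadata is held fixed.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.Word (Word64)

-- Toy model of a git commit (hypothetical, for illustration only).
data Commit = Commit
  { commitTree :: String
  , commitParents :: [Word64]
  , commitAuthor :: String
  , commitDate :: Integer
  , commitMessage :: String
  } deriving (Show)

-- FNV-1a over the serialized commit, standing in for git's object hash.
commitId :: Commit -> Word64
commitId = foldl step 14695981039346656037 . show
  where
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- An import commit for a tree, made at the given date.
importCommit :: Integer -> String -> Commit
importCommit date tree =
  Commit tree [] "importer" date "import from remote"
```

Under this model, two imports of the same tree at different times produce unrelated commits, which is why repeated imports pile up extra commits after a merge. Fixing the date and author would make the id a function of the tree and parents alone.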
parent
b69d11ec42
commit
1503b86a14
3 changed files with 71 additions and 106 deletions
138
Annex/Import.hs
|
@ -60,19 +60,16 @@ data ImportTreeConfig
|
|||
|
||||
{- Configures how to build an import commit. -}
|
||||
data ImportCommitConfig = ImportCommitConfig
|
||||
{ importCommitParent :: Maybe Sha
|
||||
-- ^ Commit to use as a parent of the import commit.
|
||||
{ importCommitTracking :: Maybe Sha
|
||||
-- ^ Current commit on the remote tracking branch.
|
||||
, importCommitMode :: Git.Branch.CommitMode
|
||||
, importCommitMessage :: String
|
||||
}
|
||||
|
||||
{- Builds a commit for an import from a special remote.
|
||||
-
|
||||
- When a remote provided a history of versions of files,
|
||||
- builds a corresponding tree of git commits.
|
||||
-
|
||||
- When there are no changes to commit on top of the importCommitParent,
|
||||
- returns Nothing.
|
||||
- When there are no changes to make (importCommitTracking
|
||||
- already matches what was imported), returns Nothing.
|
||||
-
|
||||
- After importing from a remote, exporting the same thing back to the
|
||||
- remote should be a no-op. So, the export log and database are
|
||||
|
@ -89,11 +86,11 @@ buildImportCommit
|
|||
-> ImportableContents Key
|
||||
-> Annex (Maybe Ref)
|
||||
buildImportCommit remote importtreeconfig importcommitconfig importable =
|
||||
case importCommitParent importcommitconfig of
|
||||
case importCommitTracking importcommitconfig of
|
||||
Nothing -> go Nothing
|
||||
Just basecommit -> inRepo (Git.Ref.tree basecommit) >>= \case
|
||||
Just trackingcommit -> inRepo (Git.Ref.tree trackingcommit) >>= \case
|
||||
Nothing -> go Nothing
|
||||
Just _ -> go (Just basecommit)
|
||||
Just _ -> go (Just trackingcommit)
|
||||
where
|
||||
basetree = case importtreeconfig of
|
||||
ImportTree -> emptyTree
|
||||
|
@ -102,27 +99,15 @@ buildImportCommit remote importtreeconfig importcommitconfig importable =
|
|||
ImportTree -> Nothing
|
||||
ImportSubTree dir _ -> Just dir
|
||||
|
||||
go basecommit = do
|
||||
go trackingcommit = do
|
||||
imported@(History finaltree _) <-
|
||||
buildImportTrees basetree subdir importable
|
||||
whatToCommit basecommit imported >>= \case
|
||||
Just (toadd, basecommit') -> do
|
||||
finalcommit <- mkcommits basecommit' toadd
|
||||
buildImportCommit' importcommitconfig trackingcommit imported >>= \case
|
||||
Just finalcommit -> do
|
||||
updatestate finaltree
|
||||
return (Just finalcommit)
|
||||
Nothing -> return Nothing
|
||||
|
||||
mkcommits basecommit (History importedtree hs) = do
|
||||
parents <- mapM (mkcommits basecommit) (S.toList hs)
|
||||
let commitparents = if null parents
|
||||
then catMaybes [basecommit]
|
||||
else parents
|
||||
inRepo $ Git.Branch.commitTree
|
||||
(importCommitMode importcommitconfig)
|
||||
(importCommitMessage importcommitconfig)
|
||||
commitparents
|
||||
importedtree
|
||||
|
||||
updatestate committedtree = do
|
||||
importedtree <- case subdir of
|
||||
Nothing -> pure committedtree
|
||||
|
@ -174,73 +159,50 @@ buildImportCommit remote importtreeconfig importcommitconfig importable =
|
|||
Export.runExportDiffUpdater updater db oldtree finaltree
|
||||
Export.closeDb db
|
||||
|
||||
{- Finds what to commit to update a basecommit with an imported History
|
||||
- of git trees.
|
||||
-
|
||||
- Returns the part of the imported history that should be committed,
|
||||
- as well as the commit sha that it should be committed on top of.
|
||||
- Typically, the latter is the same as the basecommit.
|
||||
-
|
||||
- This uses skipOldHistory to try to match up common trees.
|
||||
- Sometimes, that matching doesn't work. For example, a remote without an
|
||||
- atomic rename operation might result in an imported History with two trees
|
||||
- for each rename, one with the old file removed and another with the new file
|
||||
- added. Since the remote tracking branch is updated on export to the git
|
||||
- commit that was exported, the basecommit could have a single tree for a
|
||||
- rename.
|
||||
-
|
||||
- In that situation, the top tree in the History will match the
|
||||
- basecommit's tree, but then there will be a run of different trees
|
||||
- before they re-converge. That is detected, and the History returned is
|
||||
- truncated to the part above the re-convergence point, to be committed
|
||||
- on top of the re-convergence point.
|
||||
-}
|
||||
whatToCommit :: Maybe Sha -> History Sha -> Annex (Maybe (History Sha, Maybe Sha))
|
||||
whatToCommit (Just basecommit) importedhistory = getknownhistory >>= return . \case
|
||||
Just knownhistory -> whatToCommit' importedhistory basecommit knownhistory
|
||||
Nothing -> Just (importedhistory, Nothing)
|
||||
buildImportCommit' :: ImportCommitConfig -> Maybe Sha -> History Sha -> Annex (Maybe Sha)
|
||||
buildImportCommit' importcommitconfig mtrackingcommit imported@(History ti _) =
|
||||
case mtrackingcommit of
|
||||
Nothing -> Just <$> mkcommits imported
|
||||
Just trackingcommit -> do
|
||||
-- Get history of tracking branch to at most
|
||||
-- one more level deep, so sametodepth will
|
||||
-- always have enough history to compare,
|
||||
-- but unnecessary history won't be loaded.
|
||||
let maxdepth = succ (historyDepth imported)
|
||||
inRepo (getHistoryToDepth maxdepth trackingcommit)
|
||||
>>= go trackingcommit
|
||||
where
|
||||
getknownhistory = inRepo $
|
||||
getHistoryToDepth (historyDepth importedhistory) basecommit
|
||||
whatToCommit Nothing importedhistory = return $ Just (importedhistory, Nothing)
|
||||
go _ Nothing = Just <$> mkcommits imported
|
||||
go trackingcommit (Just h)
|
||||
| sametodepth imported h' = return Nothing
|
||||
| t == ti && any (sametodepth imported) (S.toList s) = return Nothing
|
||||
| otherwise = do
|
||||
ci <- mkcommits imported
|
||||
-- Make a merge commit, with one side being the
|
||||
-- import, and the other being the trackingcommit.
|
||||
-- This way the history as imported is preserved,
|
||||
-- even when it differs from the history as exported,
|
||||
-- and git merge will understand that the history
|
||||
-- is connected.
|
||||
let parents =
|
||||
[ trackingcommit
|
||||
, ci
|
||||
]
|
||||
Just <$> mkcommit parents ti
|
||||
where
|
||||
h'@(History t s) = mapHistory historyCommitTree h
|
||||
|
||||
whatToCommit' :: History Sha -> Sha -> History HistoryCommit -> Maybe (History Sha, Maybe Sha)
|
||||
whatToCommit' importedhistory basecommit knownhistory@(History ktop _) =
|
||||
case skipOldHistory (mapHistory historyCommitTree knownhistory) importedhistory of
|
||||
Nothing -> Nothing
|
||||
Just newhistory@(History ntop _)
|
||||
| ntop /= historyCommitTree ktop ->
|
||||
Just (newhistory, Just basecommit)
|
||||
-- XXX find convergence point
|
||||
| otherwise -> undefined
|
||||
|
||||
{- Finds the part of the importedhistory of git trees that is new and
|
||||
- should be committed on top of the knownhistory, skipping parts that have
|
||||
- already been committed.
|
||||
-
|
||||
- Will be Nothing if the knownhistory is already present at the top of
|
||||
- the importedhistory.
|
||||
-
|
||||
- In the simple case where there is only one level of importedhistory,
|
||||
- when the knownhistory has the same tree at its top, there's nothing
|
||||
- to commit. And otherwise it should be committed on top of the knownhistory.
|
||||
-
|
||||
- In the complex case where there are several levels of importedhistory,
|
||||
- finds the point where it first starts matching up with the knownhistory.
|
||||
-
|
||||
- The knownhistory does not need to be complete; it can be
|
||||
- truncated to the same depth as the importedhistory to avoid reading
|
||||
- in a lot of past history.
|
||||
-}
|
||||
skipOldHistory :: Ord t => History t -> History t -> Maybe (History t)
|
||||
skipOldHistory knownhistory importedhistory@(History top rest)
|
||||
| sametodepth importedhistory knownhistory = Nothing
|
||||
| otherwise = Just $
|
||||
History top $ S.fromList $ catMaybes $
|
||||
map (skipOldHistory knownhistory) (S.toList rest)
|
||||
where
|
||||
sametodepth a b = a == truncateHistoryToDepth (historyDepth a) b
|
||||
|
||||
mkcommits (History importedtree hs) = do
|
||||
parents <- mapM mkcommits (S.toList hs)
|
||||
mkcommit parents importedtree
|
||||
mkcommit parents tree = inRepo $ Git.Branch.commitTree
|
||||
(importCommitMode importcommitconfig)
|
||||
(importCommitMessage importcommitconfig)
|
||||
parents
|
||||
tree
|
||||
|
||||
{- Builds a history of git trees reflecting the ImportableContents.
|
||||
-
|
||||
- When a subdir is provided, imported tree is grafted into the basetree at
|
||||
|
|
|
@ -265,9 +265,9 @@ seekRemote remote branch msubdir = do
|
|||
Just tree -> mk tree
|
||||
Nothing -> giveup $ "Unable to find base tree for branch " ++ fromRef branch
|
||||
|
||||
parentcommit <- fromtrackingbranch Git.Ref.sha
|
||||
let importcommitconfig = ImportCommitConfig parentcommit ManualCommit importmessage
|
||||
let commitimport = commitRemote remote branch tb parentcommit importtreeconfig importcommitconfig
|
||||
trackingcommit <- fromtrackingbranch Git.Ref.sha
|
||||
let importcommitconfig = ImportCommitConfig trackingcommit ManualCommit importmessage
|
||||
let commitimport = commitRemote remote branch tb trackingcommit importtreeconfig importcommitconfig
|
||||
|
||||
importabletvar <- liftIO $ newTVarIO Nothing
|
||||
void $ includeCommandAction (listContents remote importabletvar)
|
||||
|
@ -298,7 +298,7 @@ listContents remote tvar = do
|
|||
return True
|
||||
|
||||
commitRemote :: Remote -> Branch -> RemoteTrackingBranch -> Maybe Sha -> ImportTreeConfig -> ImportCommitConfig -> ImportableContents Key -> CommandStart
|
||||
commitRemote remote branch tb parentcommit importtreeconfig importcommitconfig importable = do
|
||||
commitRemote remote branch tb trackingcommit importtreeconfig importcommitconfig importable = do
|
||||
showStart' "update" (Just $ fromRef $ fromRemoteTrackingBranch tb)
|
||||
next $ do
|
||||
importcommit <- buildImportCommit remote importtreeconfig importcommitconfig importable
|
||||
|
@ -308,7 +308,7 @@ commitRemote remote branch tb parentcommit importtreeconfig importcommitconfig i
|
|||
-- Update the tracking branch. Done even when there
|
||||
-- is nothing new to import, to make sure it exists.
|
||||
updateremotetrackingbranch importcommit =
|
||||
case importcommit <|> parentcommit of
|
||||
case importcommit <|> trackingcommit of
|
||||
Just c -> do
|
||||
setRemoteTrackingBranch tb c
|
||||
return True
|
||||
|
|
|
@ -42,23 +42,26 @@ and `git annex sync --content` can be configured to use it.
|
|||
while the S3 history is
|
||||
[[newname, foo, bar], [foo, bar], [oldname, foo, bar], etc]
|
||||
|
||||
(Both of the "etc" values are the same.
|
||||
(Both of the "etc" values are the same.)
|
||||
|
||||
While perhaps a heuristic to detect renames could be added to the History
|
||||
comparison, maybe it would be better to make it notice that these
|
||||
are the same after a certain point, and so preserve the divergence,
|
||||
but with a less ugly history.
|
||||
comparison, better would be to generate a merge between the git commit
|
||||
that was exported to the remote before, and the imported history from the
|
||||
remote. The merge just needs to have as its tree the current imported
|
||||
tree.
|
||||
|
||||
Ie, rather than creating a history like:
|
||||
This way whatever happened on the remote as a consequence of
|
||||
exports and other changes is preserved in the git history in full.
|
||||
|
||||
[[newname, foo, bar], [foo, bar], [oldname, foo, bar], etc, [newname, foo, bar], [oldname, foo, bar], etc]
|
||||
When creating such a merge, first check if the old value of the remote
|
||||
tracking branch matches the imported history. If so, nothing to do.
|
||||
|
||||
Create:
|
||||
Next, check if the old value of the remote tracking branch is a merge,
|
||||
and its tree matches the top of the imported history, and one
|
||||
of its parents matches the full imported history. If so, nothing to do
|
||||
because that is what we want to generate.
|
||||
|
||||
[[newname, foo, bar], [oldname, foo, bar], etc]
|
||||
|
||||
and then upon merge of that s3/master, there will be two lines of
|
||||
development that branch out after "etc" and rejoin at the top.
|
||||
Otherwise, commit the imported history and generate a merge commit.
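The three checks above can be sketched as a pure decision function. This is a minimal sketch under assumptions, not the code in Annex/Import.hs: the `History` type here is a simplified stand-in that holds trees only, and `Action`, `decide`, and `sameToDepth` are hypothetical names chosen for illustration.

```haskell
import qualified Data.Set as S

-- Simplified stand-in for a history: a tree at the top, plus the
-- set of histories it descends from.
data History t = History t (S.Set (History t))
  deriving (Eq, Ord, Show)

historyDepth :: History t -> Int
historyDepth (History _ ps)
  | S.null ps = 1
  | otherwise = 1 + maximum (map historyDepth (S.toList ps))

truncateHistoryToDepth :: Ord t => Int -> History t -> History t
truncateHistoryToDepth n (History t ps)
  | n <= 1 = History t S.empty
  | otherwise = History t (S.map (truncateHistoryToDepth (n - 1)) ps)

-- Does the known history match the imported one, as deep as the
-- imported history goes?
sameToDepth :: Ord t => History t -> History t -> Bool
sameToDepth imported known =
  imported == truncateHistoryToDepth (historyDepth imported) known

data Action = NothingToDo | CommitImportOnly | CommitAndMerge
  deriving (Eq, Show)

-- First argument: the imported history. Second: the history of the
-- remote tracking branch, if the branch exists.
decide :: Ord t => History t -> Maybe (History t) -> Action
decide _ Nothing = CommitImportOnly
decide imported@(History ti _) (Just tracking@(History t parents))
  -- Tracking branch already matches the import.
  | sameToDepth imported tracking = NothingToDo
  -- Tracking branch is already the merge we would generate.
  | t == ti && any (sameToDepth imported) (S.toList parents) = NothingToDo
  | otherwise = CommitAndMerge
```

For example, if the tracking branch is a merge whose tree equals the top imported tree and one of whose parents is the full imported history, `decide` returns `NothingToDo`; if the tracking branch holds only the previously exported commit, it returns `CommitAndMerge`.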
|
||||
|
||||
* S3 buckets can be set up to allow reads and listing by an anonymous user.
|
||||
That should allow importing from such a bucket, but the S3 remote
|
||||
|
|