graft exported tree into git-annex branch

So it will be available later and elsewhere, even after GC.

I first thought to use git update-index to do this, but feeding it a line
with a tree object seems to always cause it to generate a git subtree
merge. So, fell back to using the Git.Tree interface to manipulate the
trees, and not involving the git-annex branch index file at all.

This commit was sponsored by Andreas Karlsson.
Joey Hess 2017-08-31 18:06:49 -04:00
parent 978885247e
commit 5483ea90ec
GPG key ID: DB12DB0FF05F8F38
5 changed files with 35 additions and 15 deletions

View file

@ -21,6 +21,7 @@ module Annex.Branch (
maybeChange,
commit,
forceCommit,
+getBranch,
files,
withIndex,
performTransitions,

View file

@ -70,7 +70,9 @@ seek :: ExportOptions -> CommandSeek
seek o = do
r <- getParsed (exportRemote o)
new <- fromMaybe (error "unknown tree") <$>
-inRepo (Git.Ref.sha (exportTreeish o))
+-- Dereference the tree pointed to by the branch, commit,
+-- or tag.
+inRepo (Git.Ref.tree (exportTreeish o))
old <- getExport (uuid r)
when (length old > 1) $

View file

@ -14,6 +14,7 @@ module Git.Tree (
recordTree,
TreeItem(..),
adjustTree,
+treeMode,
) where
import Common
@ -94,12 +95,15 @@ mkTree (MkTreeHandle cp) l = CoProcess.query cp send receive
send h = do
forM_ l $ \i -> hPutStr h $ case i of
TreeBlob f fm s -> mkTreeOutput fm BlobObject s f
-RecordedSubTree f s _ -> mkTreeOutput 0o040000 TreeObject s f
+RecordedSubTree f s _ -> mkTreeOutput treeMode TreeObject s f
NewSubTree _ _ -> error "recordSubTree internal error; unexpected NewSubTree"
TreeCommit f fm s -> mkTreeOutput fm CommitObject s f
hPutStr h "\NUL" -- signal end of tree to --batch
receive h = getSha "mktree" (hGetLine h)
+treeMode :: FileMode
+treeMode = 0o040000
mkTreeOutput :: FileMode -> ObjectType -> Sha -> TopFilePath -> String
mkTreeOutput fm ot s f = concat
[ showOct fm ""
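A note on the mode constant factored out above: `showOct` drops leading zeros, so the tree mode is rendered as `40000`. The sketch below is not the remainder of `mkTreeOutput`, which is cut off in this view. `formatEntry` is a hypothetical helper that assumes the `<mode> SP <type> SP <sha> TAB <path>` input format documented for `git mktree`, NUL-terminated as with `-z`, and uses plain `String`s in place of git-annex's `ObjectType`, `Sha`, and `TopFilePath`.

```haskell
import Numeric (showOct)
import System.Posix.Types (FileMode)

-- Same value as the treeMode added in this commit.
treeMode :: FileMode
treeMode = 0o040000

-- Hypothetical helper (not git-annex's mkTreeOutput): builds one
-- NUL-terminated input record in the format assumed for `git mktree -z`.
formatEntry :: FileMode -> String -> String -> FilePath -> String
formatEntry fm objtype sha path = concat
    [ showOct fm ""   -- octal mode, without leading zeros
    , " ", objtype    -- "blob", "tree" or "commit"
    , " ", sha
    , "\t", path
    , "\NUL"
    ]
```

For example, `formatEntry treeMode "tree" sha "export.tree"` would describe a subtree entry like the one `graftTreeish` records.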

View file

@ -12,6 +12,9 @@ import qualified Data.Map as M
import Annex.Common
import qualified Annex.Branch
import qualified Git
+import qualified Git.Branch
+import Git.Tree
+import Git.FilePath
import Logs
import Logs.UUIDBased
import Annex.UUID
@ -40,6 +43,9 @@ data ExportChange = ExportChange
-- newTreeish. This way, when multiple repositories are exporting to
-- the same special remote, there's no conflict as long as they move
-- forward in lock-step.
+--
+-- Also, the newTreeish is grafted into the git-annex branch. This is done
+-- to ensure that it's available later.
recordExport :: UUID -> ExportChange -> Annex ()
recordExport remoteuuid ec = do
c <- liftIO currentVectorClock
@ -50,6 +56,7 @@ recordExport remoteuuid ec = do
. changeLog c u val
. M.mapWithKey (updateothers c u)
. parseLogNew parseExportLog
+graftTreeish (newTreeish ec)
where
updateothers c u theiru le@(LogEntry _ (ExportLog t remoteuuid'))
| u == theiru || remoteuuid' /= remoteuuid || t `notElem` oldTreeish ec = le
@ -65,3 +72,20 @@ parseExportLog :: String -> Maybe ExportLog
parseExportLog s = case words s of
(t:u:[]) -> Just $ ExportLog (Git.Ref t) (toUUID u)
_ -> Nothing
+-- To prevent git-annex branch merge conflicts, the treeish is
+-- first grafted in and then removed in a subsequent commit.
+graftTreeish :: Git.Ref -> Annex ()
+graftTreeish treeish = do
+branchref <- Annex.Branch.getBranch
+Tree t <- inRepo $ getTree branchref
+t' <- inRepo $ recordTree $ Tree $
+RecordedSubTree (asTopFilePath graftpoint) treeish [] : t
+commit <- inRepo $ Git.Branch.commitTree Git.Branch.AutomaticCommit
+"export tree" [branchref] t'
+origtree <- inRepo $ recordTree (Tree t)
+commit' <- inRepo $ Git.Branch.commitTree Git.Branch.AutomaticCommit
+"export tree cleanup" [commit] origtree
+inRepo $ Git.Branch.update' Annex.Branch.fullname commit'
+where
+graftpoint = "export.tree"
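The graft-then-cleanup sequence can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not git-annex code: a tree is a flat map from path to object id, and `Commit`, `graftAndCleanup` and `reachable` are hypothetical names. It shows that the branch tip ends up with its original tree, while the grafted treeish stays reachable through the branch history, which is what keeps git's GC from pruning it.

```haskell
import qualified Data.Map as M

type Sha = String

-- Toy tree: path -> object id. Real git trees are nested.
type Tree = M.Map FilePath Sha

-- Toy commit: a tree plus parent commits.
data Commit = Commit { commitTree :: Tree, parents :: [Commit] }

-- Commit once with the treeish grafted in at graftpoint, then commit
-- again with the original tree, as graftTreeish does with two commits.
graftAndCleanup :: FilePath -> Sha -> Commit -> Commit
graftAndCleanup graftpoint treeish tip = cleanup
  where
    orig = commitTree tip
    grafted = Commit (M.insert graftpoint treeish orig) [tip]
    cleanup = Commit orig [grafted]

-- Every object id referenced by a commit or any of its ancestors.
reachable :: Commit -> [Sha]
reachable c = M.elems (commitTree c) ++ concatMap reachable (parents c)
```

In this model the tip's tree never contains `export.tree`, so two repositories that graft different treeishes have nothing to conflict over at that path when their branches are merged.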

View file

@ -133,12 +133,6 @@ key/value stores. The content of a file can change, and if multiple
repositories can export a special remote, they can be out of sync about
what files are exported to it.
-To avoid such problems, when updating an exported file on a special remote,
-the key could be recorded there too. But, this would have to be done
-atomically, and checked atomically when downloading the file. Special
-remotes lack atomicity guarantees for file storage, let alone for file
-retrieval.
Possible solution: Make exporttree=true cause the special remote to
be untrusted, and rely on annex.verify to catch cases where the content
of a file on a special remote has changed. This would work well enough
@ -205,13 +199,8 @@ In this case, git-annex knows both exported trees. Have the user provide
a tree that resolves the conflict as they desire (it could be the same as
one of the exported trees, or some merge of them or an entirely new tree).
The UI to do this can just be another `git annex export $tree --to remote`.
-To resolve, diff each exported tree in turn against the resolving tree. If a
-file differs, re-export that file. In some cases this will do unnecessary
-re-uploads, but it's reasonably efficient.
-The documentation should suggest strongly only exporting to a given special
-remote from a single repository, or having some other rule that avoids
-export conflicts.
+To resolve, diff each exported tree in turn against the resolving tree
+and delete all files that differ.
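A rough sketch of that resolution rule, under stated assumptions: trees are modelled as flat maps from path to content id, and a file "differs" when it is missing from the resolving tree or has different content there. `differing` and `filesToDelete` are hypothetical names, not part of git-annex.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Sha = String

-- Toy tree: path -> content id.
type Tree = M.Map FilePath Sha

-- Paths of an exported tree that are absent from the resolving tree,
-- or present there with different content.
differing :: Tree -> Tree -> S.Set FilePath
differing resolving exported =
    M.keysSet (M.differenceWith keepIfChanged exported resolving)
  where
    keepIfChanged old new
        | old == new = Nothing   -- same content: nothing to delete
        | otherwise = Just old   -- content differs: delete it

-- Diff each exported tree in turn against the resolving tree and
-- collect every file that differs.
filesToDelete :: Tree -> [Tree] -> S.Set FilePath
filesToDelete resolving = S.unions . map (differing resolving)
```

With a resolving tree `{a: 1, b: 2}` and exported trees `{a: 1, b: 9}` and `{a: 1, c: 3}`, this selects `b` and `c` for deletion and leaves `a` alone.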
## when to update export.log for efficient resuming of exports