export: Added --from option

This is similar to git-annex copy --from --to, in that it downloads a
local copy, locks it for removal, uploads it, and drops it. Removal of
the temporary local copy is done without verifying numcopies for the
same reason as that command.
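
A minimal sketch of that sequence, assuming hypothetical stand-in actions
(`Steps`, `exportVia` and `loggingSteps` are illustrative names, not
git-annex's API). It only shows the ordering and that the temporary copy
is dropped even when the upload throws:

```haskell
import Control.Exception (finally)
import Data.IORef

-- Hypothetical stand-ins for the real transfer actions.
data Steps = Steps
    { download :: IO Bool        -- fetch a temporary local copy
    , lockForRemoval :: IO ()    -- lock the local copy for removal
    , upload :: IO Bool          -- send the local copy to the export remote
    , dropLocal :: IO ()         -- remove the temporary local copy
    }

-- Download, lock, upload, and always drop the local copy afterwards.
-- If the download fails, nothing is locked or dropped.
exportVia :: Steps -> IO Bool
exportVia s = do
    ok <- download s
    if ok
        then do
            lockForRemoval s
            upload s `finally` dropLocal s
        else return False

-- Steps that only append their names to a log, for demonstration.
loggingSteps :: IORef [String] -> Bool -> Steps
loggingSteps ref downloadok = Steps
    { download = note "download" >> return downloadok
    , lockForRemoval = note "lock"
    , upload = note "upload" >> return True
    , dropLocal = note "drop"
    }
  where
    note x = modifyIORef ref (++ [x])
```

Running `exportVia` with `loggingSteps` and a successful download records
the steps in the order download, lock, upload, drop.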

I do wonder, looking at this, if there's a race where the local copy
gets used as a copy to allow some other drop in the narrow window after
it is downloaded and before it gets locked for removal. That would need
some other repository to have an out-of-date location log that says this
repository contains a copy of the key, in order for it to try to use it
as a copy. If there is such a race, git-annex copy/move would also be
vulnerable to it. It would be better to lock it for removal before
starting to download it! That is possible in v10 repositories, which do
use a separate content lock file.

Note that, when the exported tree contains several files that use the
same key, it will be downloaded repeatedly, once for each upload that
needs it. It would be possible to avoid that extra work, but it would
complicate this since the local copy would need to be preserved, locked
for removal, until the end. Also, that would mean that interrupting the
export could leave a lot of temporarily downloaded keys in the
local repository, while currently it can only leave one.
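
To make the cost concrete, here is a small sketch that counts downloads
per key. It assumes a simplified model in which every listed file still
needs to be uploaded and no key is present locally. The `Key` alias and
both function names are hypothetical, not taken from git-annex:

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical simplification: a key is just its name.
type Key = String

-- One download per exported file that uses the key.
downloadsPerKey :: [(FilePath, Key)] -> M.Map Key Int
downloadsPerKey tree = M.fromListWith (+) [ (k, 1) | (_, k) <- tree ]

-- Keys that would be downloaded more than once.
repeatedDownloads :: [(FilePath, Key)] -> [(Key, Int)]
repeatedDownloads = M.toList . M.filter (> 1) . downloadsPerKey
```

For a tree where two files share one key and a third file has its own
key, only the shared key shows up in `repeatedDownloads`, with a count
of 2.
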
Joey Hess 2024-08-08 12:04:39 -04:00
parent bd677bb65a
commit 7294d23d78
GPG key ID: DB12DB0FF05F8F38
6 changed files with 76 additions and 31 deletions

@@ -2,6 +2,7 @@ git-annex (10.20240732) UNRELEASED; urgency=medium
   * Remove debug output (to stderr) accidentially included in
     last version.
+  * export: Added --from option.
   * Special remotes configured with exporttree=yes annexobjects=yes
     can store objects in .git/annex/objects, as well as an exported tree.
   * Support proxying to special remotes configured with

@@ -55,6 +55,7 @@ data ExportOptions = ExportOptions
 	{ exportTreeish :: Git.Ref
 	-- ^ can be a tree, a branch, a commit, or a tag
 	, exportRemote :: DeferredParse Remote
+	, sourceRemote :: [DeferredParse Remote]
 	, exportTracking :: Bool
 	}
@@ -62,6 +63,7 @@ optParser :: CmdParamsDesc -> Parser ExportOptions
 optParser _ = ExportOptions
 	<$> (Git.Ref <$> parsetreeish)
 	<*> (mkParseRemoteOption <$> parseToOption)
+	<*> many (mkParseRemoteOption <$> parseFromOption)
 	<*> parsetracking
   where
 	parsetreeish = argument str
@@ -84,6 +86,9 @@ seek o = startConcurrency commandStages $ do
 	unlessM (isExportSupported r) $
 		giveup "That remote does not support exports."
 
+	srcrs <- concat . Remote.byCost
+		<$> mapM getParsed (sourceRemote o)
+
 	-- handle deprecated option
 	when (exportTracking o) $
 		setConfig (remoteAnnexConfig r "tracking-branch")
@@ -94,15 +99,15 @@ seek o = startConcurrency commandStages $ do
 		inRepo (Git.Ref.tree (exportTreeish o))
 
 	mtbcommitsha <- getExportCommit r (exportTreeish o)
-	seekExport r tree mtbcommitsha
+	seekExport r tree mtbcommitsha srcrs
 
-seekExport :: Remote -> ExportFiltered Git.Ref -> Maybe (RemoteTrackingBranch, Sha) -> CommandSeek
-seekExport r tree mtbcommitsha = do
+seekExport :: Remote -> ExportFiltered Git.Ref -> Maybe (RemoteTrackingBranch, Sha) -> [Remote] -> CommandSeek
+seekExport r tree mtbcommitsha srcrs = do
 	db <- openDb (uuid r)
 	writeLockDbWhile db $ do
 		changeExport r db tree
 		unlessM (Annex.getRead Annex.fast) $ do
-			void $ fillExport r db tree mtbcommitsha
+			void $ fillExport r db tree mtbcommitsha srcrs
 	closeDb db
 
 -- | When the treeish is a branch like master or refs/heads/master
@@ -241,8 +246,8 @@ newtype AllFilled = AllFilled { fromAllFilled :: Bool }
 --
 -- Once all exported files have reached the remote, updates the
 -- remote tracking branch.
-fillExport :: Remote -> ExportHandle -> ExportFiltered Git.Ref -> Maybe (RemoteTrackingBranch, Sha) -> Annex Bool
-fillExport r db (ExportFiltered newtree) mtbcommitsha = do
+fillExport :: Remote -> ExportHandle -> ExportFiltered Git.Ref -> Maybe (RemoteTrackingBranch, Sha) -> [Remote] -> Annex Bool
+fillExport r db (ExportFiltered newtree) mtbcommitsha srcrs = do
 	(l, cleanup) <- inRepo $ Git.LsTree.lsTree
 		Git.LsTree.LsTreeRecursive
 		(Git.LsTree.LsTreeLong False)
@@ -250,7 +255,7 @@ fillExport r db (ExportFiltered newtree) mtbcommitsha = do
 	cvar <- liftIO $ newMVar (FileUploaded False)
 	allfilledvar <- liftIO $ newMVar (AllFilled True)
 	commandActions $
-		map (startExport r db cvar allfilledvar) l
+		map (startExport r srcrs db cvar allfilledvar) l
 	void $ liftIO $ cleanup
 	waitForAllRunningCommandActions
@@ -263,8 +268,8 @@ fillExport r db (ExportFiltered newtree) mtbcommitsha = do
 	liftIO $ fromFileUploaded <$> takeMVar cvar
 
-startExport :: Remote -> ExportHandle -> MVar FileUploaded -> MVar AllFilled -> Git.LsTree.TreeItem -> CommandStart
-startExport r db cvar allfilledvar ti = do
+startExport :: Remote -> [Remote] -> ExportHandle -> MVar FileUploaded -> MVar AllFilled -> Git.LsTree.TreeItem -> CommandStart
+startExport r srcrs db cvar allfilledvar ti = do
 	ek <- exportKey (Git.LsTree.sha ti)
 	stopUnless (notrecordedpresent ek) $
 		starting ("export " ++ name r) ai si $
@@ -272,7 +277,7 @@ startExport r db cvar allfilledvar ti = do
 				( next $ cleanupExport r db ek loc False
 				, do
 					liftIO $ modifyMVar_ cvar (pure . const (FileUploaded True))
-					performExport r db ek af (Git.LsTree.sha ti) loc allfilledvar
+					performExport r srcrs db ek af (Git.LsTree.sha ti) loc allfilledvar
 				)
   where
 	loc = mkExportLocation f
@@ -295,25 +300,10 @@ startExport r db cvar allfilledvar ti = do
 				else notElem (uuid r) <$> loggedLocations ek
 			)
 
-performExport :: Remote -> ExportHandle -> Key -> AssociatedFile -> Sha -> ExportLocation -> MVar AllFilled -> CommandPerform
-performExport r db ek af contentsha loc allfilledvar = do
+performExport :: Remote -> [Remote] -> ExportHandle -> Key -> AssociatedFile -> Sha -> ExportLocation -> MVar AllFilled -> CommandPerform
+performExport r srcrs db ek af contentsha loc allfilledvar = do
 	sent <- tryNonAsync $ if not (isGitShaKey ek)
-		then tryrenameannexobject $ ifM (inAnnex ek)
-			( notifyTransfer Upload af $
-				-- alwaysUpload because the same key
-				-- could be used for more than one export
-				-- location, and concurrently uploading
-				-- of the content should still be allowed.
-				alwaysUpload (uuid r) ek af Nothing stdRetry $ \pm -> do
-					let rollback = void $
-						performUnexport r db [ek] loc
-					sendAnnex ek Nothing rollback $ \f _sz ->
-						Remote.action $
-							storer f ek loc pm
-			, do
-				showNote "not available"
-				return False
-			)
+		then tryrenameannexobject $ sendannexobject
 		-- Sending a non-annexed file.
 		else withTmpFile "export" $ \tmp h -> do
 			b <- catObject contentsha
@@ -333,6 +323,44 @@ performExport r db ek af contentsha loc allfilledvar = do
   where
 	storer = storeExport (exportActions r)
 
+	sendannexobject = ifM (inAnnex ek)
+		( sendlocalannexobject
+		, firstM remotehaskey srcrs >>= \case
+			Nothing -> do
+				showNote "not available"
+				return False
+			Just srcr -> getsendannexobject srcr
+		)
+
+	sendlocalannexobject = sendwith $ \p -> do
+		let rollback = void $
+			performUnexport r db [ek] loc
+		sendAnnex ek Nothing rollback $ \f _sz ->
+			Remote.action $
+				storer f ek loc p
+
+	sendwith a =
+		notifyTransfer Upload af $
+			-- alwaysUpload because the same key
+			-- could be used for more than one export
+			-- location, and concurrently uploading
+			-- of the content should still be allowed.
+			alwaysUpload (uuid r) ek af Nothing stdRetry a
+
+	remotehaskey srcr = either (const False) id <$> Remote.hasKey srcr ek
+
+	-- Similar to Command.Move.fromToPerform, use a regular download
+	-- of a local copy, lock early, and drop the local copy after sending.
+	getsendannexobject srcr = do
+		showAction $ UnquotedString $ "from " ++ Remote.name srcr
+		ifM (notifyTransfer Download af $ download srcr ek af stdRetry)
+			( lockContentForRemoval ek (return False) $ \contentlock -> do
+				showAction $ UnquotedString $ "to " ++ Remote.name r
+				sendlocalannexobject
+					`finally` removeAnnex contentlock
+			, return False
+			)
+
 	tryrenameannexobject fallback
 		| annexObjects (Remote.config r) = do
 			case renameExport (exportActions r) of

@@ -79,7 +79,7 @@ proxyExportTree = do
 			Just t -> do
 				tree <- filterExport r t
 				mtbcommitsha <- getExportCommit r b
-				seekExport r tree mtbcommitsha
+				seekExport r tree mtbcommitsha []
 
 parseHookInput :: B.ByteString -> [((Sha, Sha), Ref)]
 parseHookInput = mapMaybe parse . B8.lines

@@ -1019,7 +1019,7 @@ seekExportContent' o rs (mcurrbranch, madj)
 				| tree == currtree -> do
 					filteredtree <- Command.Export.filterExport r tree
 					Command.Export.changeExport r db filteredtree
-					Command.Export.fillExport r db filteredtree mtbcommitsha
+					Command.Export.fillExport r db filteredtree mtbcommitsha []
 				| otherwise -> cannotupdateexport r db Nothing False
 			(Nothing, _, _) -> cannotupdateexport r db (Just (Git.fromRef b ++ " does not exist")) True
 			(_, Nothing, _) -> cannotupdateexport r db (Just "no branch is currently checked out") True
@@ -1062,7 +1062,7 @@ seekExportContent' o rs (mcurrbranch, madj)
 		-- filling in any files that did not get transferred
 		-- to the existing exported tree.
 		let filteredtree = Command.Export.ExportFiltered tree
-		Command.Export.fillExport r db filteredtree mtbcommitsha
+		Command.Export.fillExport r db filteredtree mtbcommitsha []
 	fillexistingexport r _ _ _ = do
 		warnExportImportConflict r
 		return False

@@ -77,6 +77,20 @@ so the overwritten modification is not lost.)
   Specify the special remote to export to.
 
+* `--from=remote`
+
+  When the content of a file is not available in the local repository,
+  this option lets it be downloaded from another remote, and sent on to the
+  destination remote. The file will be temporarily stored on local disk,
+  but will never enter the local repository.
+
+  This option can be repeated multiple times.
+
+  It is possible to use --from with the same remote as --to. If the tree
+  contains several files with the same content, and the remote being
+  exported to already contains one copy of the content, this allows making
+  a copy by downloading the content from it.
+
 * `--tracking`
 
   This is a deprecated way to set "remote.<name>.annex-tracking-branch".

@@ -33,6 +33,8 @@ Planned schedule of work:
 * Working on `exportreeplus` branch which is groundwork for proxying to
   exporttree=yes special remotes. Need to merge it to master.
 
+* A proxied exporttree=yes special remote is not untrusted, and should be.
+
 * Handle cases where a single key is used by multiple files in the exported
   tree. Need to download from the special remote in order to export
   multiple copies to it. (In particular, this is needed when using