export: Added --from option

This is similar to git-annex copy --from --to, in that it downloads a
local copy, locks it for removal, uploads it, and drops it. The
temporary local copy is removed without verifying numcopies, for the
same reason as in that command.
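
A minimal sketch of that sequence, not the actual implementation. The
Log type and the step, exportFrom, and demo names are hypothetical
stand-ins that only record the order of operations. It assumes, as the
use of `finally` in the diff suggests, that the temporary copy is
dropped even when the upload throws.

```haskell
import Control.Exception (finally)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in: a log of the steps taken, newest first.
type Log = IORef [String]

step :: Log -> String -> IO ()
step l s = modifyIORef l (s :)

-- Download a temporary local copy; if that works, lock it for removal,
-- upload it, and drop it. The drop runs even if the upload throws.
exportFrom :: Log -> IO Bool -> IO Bool -> IO Bool
exportFrom l download upload = do
    step l "download"
    ok <- download
    if ok
        then do
            step l "lock for removal"
            (step l "upload" >> upload)
                `finally` step l "drop local copy"
        else return False

-- Runs the sequence with a download and an upload that both succeed,
-- and returns the steps in the order they happened.
demo :: IO [String]
demo = do
    l <- newIORef []
    _ <- exportFrom l (return True) (return True)
    reverse <$> readIORef l
```

When the download fails, nothing is locked or dropped, and the result is
False, which matches the shape of getsendannexobject in the diff.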

I do wonder, looking at this, if there's a race where the local copy
gets counted as a copy that allows some other drop, in the narrow window
after it is downloaded and before it gets locked for removal. That would
need some other repository to have an out of date location log that says
this repository contains a copy of the key, in order for it to try to
use it as a copy. If there is such a race, git-annex copy/move would
also be vulnerable to it. It would be better to lock it for removal
before starting to download it! That is possible in v10 repositories,
which do use a separate content lock file.

Note that, when the exported tree contains several files that use the
same key, it will be downloaded repeatedly, once for each upload that
needs it. It would be possible to avoid that extra work, but it would
complicate this, since the local copy would need to be preserved, locked
for removal, until the end. Also, that would mean that interrupting the
export could leave a lot of temporarily downloaded keys in the local
repository, while currently it can only leave one.
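
To make that cost concrete, here is a small sketch that counts downloads
per key. It assumes that no file in the tree is on the destination
remote yet and that no key is present locally. Key and downloadsPerKey
are hypothetical names, not git-annex's.

```haskell
import Data.List (group, sort)

-- Hypothetical simplification: a key is just a string here.
type Key = String

-- Each exported file that needs an upload triggers its own download,
-- so a key is downloaded once per file that uses it.
downloadsPerKey :: [(FilePath, Key)] -> [(Key, Int)]
downloadsPerKey files =
    [ (k, length g) | g@(k:_) <- group (sort (map snd files)) ]
```

For example, `downloadsPerKey [("a", "K1"), ("b", "K1"), ("c", "K2")]`
gives `[("K1",2),("K2",1)]`: K1 is fetched twice, but at most one
temporary copy exists locally at any moment.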
Joey Hess 2024-08-08 12:04:39 -04:00
parent bd677bb65a
commit 7294d23d78
GPG key ID: DB12DB0FF05F8F38
6 changed files with 76 additions and 31 deletions

View file

@@ -2,6 +2,7 @@ git-annex (10.20240732) UNRELEASED; urgency=medium
   * Remove debug output (to stderr) accidentially included in
     last version.
+  * export: Added --from option.
   * Special remotes configured with exporttree=yes annexobjects=yes
     can store objects in .git/annex/objects, as well as an exported tree.
   * Support proxying to special remotes configured with

View file

@@ -55,6 +55,7 @@ data ExportOptions = ExportOptions
     { exportTreeish :: Git.Ref
     -- ^ can be a tree, a branch, a commit, or a tag
     , exportRemote :: DeferredParse Remote
+    , sourceRemote :: [DeferredParse Remote]
     , exportTracking :: Bool
     }
@@ -62,6 +63,7 @@ optParser :: CmdParamsDesc -> Parser ExportOptions
 optParser _ = ExportOptions
     <$> (Git.Ref <$> parsetreeish)
     <*> (mkParseRemoteOption <$> parseToOption)
+    <*> many (mkParseRemoteOption <$> parseFromOption)
     <*> parsetracking
   where
     parsetreeish = argument str
@@ -84,6 +86,9 @@ seek o = startConcurrency commandStages $ do
     unlessM (isExportSupported r) $
         giveup "That remote does not support exports."
+    srcrs <- concat . Remote.byCost
+        <$> mapM getParsed (sourceRemote o)
+
     -- handle deprecated option
     when (exportTracking o) $
         setConfig (remoteAnnexConfig r "tracking-branch")
@@ -94,15 +99,15 @@ seek o = startConcurrency commandStages $ do
         inRepo (Git.Ref.tree (exportTreeish o))
     mtbcommitsha <- getExportCommit r (exportTreeish o)
-    seekExport r tree mtbcommitsha
+    seekExport r tree mtbcommitsha srcrs
-seekExport :: Remote -> ExportFiltered Git.Ref -> Maybe (RemoteTrackingBranch, Sha) -> CommandSeek
-seekExport r tree mtbcommitsha = do
+seekExport :: Remote -> ExportFiltered Git.Ref -> Maybe (RemoteTrackingBranch, Sha) -> [Remote] -> CommandSeek
+seekExport r tree mtbcommitsha srcrs = do
     db <- openDb (uuid r)
     writeLockDbWhile db $ do
         changeExport r db tree
         unlessM (Annex.getRead Annex.fast) $ do
-            void $ fillExport r db tree mtbcommitsha
+            void $ fillExport r db tree mtbcommitsha srcrs
     closeDb db
 -- | When the treeish is a branch like master or refs/heads/master
@@ -241,8 +246,8 @@ newtype AllFilled = AllFilled { fromAllFilled :: Bool }
 --
 -- Once all exported files have reached the remote, updates the
 -- remote tracking branch.
-fillExport :: Remote -> ExportHandle -> ExportFiltered Git.Ref -> Maybe (RemoteTrackingBranch, Sha) -> Annex Bool
-fillExport r db (ExportFiltered newtree) mtbcommitsha = do
+fillExport :: Remote -> ExportHandle -> ExportFiltered Git.Ref -> Maybe (RemoteTrackingBranch, Sha) -> [Remote] -> Annex Bool
+fillExport r db (ExportFiltered newtree) mtbcommitsha srcrs = do
     (l, cleanup) <- inRepo $ Git.LsTree.lsTree
         Git.LsTree.LsTreeRecursive
         (Git.LsTree.LsTreeLong False)
@@ -250,7 +255,7 @@ fillExport r db (ExportFiltered newtree) mtbcommitsha = do
     cvar <- liftIO $ newMVar (FileUploaded False)
     allfilledvar <- liftIO $ newMVar (AllFilled True)
     commandActions $
-        map (startExport r db cvar allfilledvar) l
+        map (startExport r srcrs db cvar allfilledvar) l
     void $ liftIO $ cleanup
     waitForAllRunningCommandActions
@@ -263,8 +268,8 @@ fillExport r db (ExportFiltered newtree) mtbcommitsha = do
     liftIO $ fromFileUploaded <$> takeMVar cvar
-startExport :: Remote -> ExportHandle -> MVar FileUploaded -> MVar AllFilled -> Git.LsTree.TreeItem -> CommandStart
-startExport r db cvar allfilledvar ti = do
+startExport :: Remote -> [Remote] -> ExportHandle -> MVar FileUploaded -> MVar AllFilled -> Git.LsTree.TreeItem -> CommandStart
+startExport r srcrs db cvar allfilledvar ti = do
     ek <- exportKey (Git.LsTree.sha ti)
     stopUnless (notrecordedpresent ek) $
         starting ("export " ++ name r) ai si $
@@ -272,7 +277,7 @@ startExport r db cvar allfilledvar ti = do
                 ( next $ cleanupExport r db ek loc False
                 , do
                     liftIO $ modifyMVar_ cvar (pure . const (FileUploaded True))
-                    performExport r db ek af (Git.LsTree.sha ti) loc allfilledvar
+                    performExport r srcrs db ek af (Git.LsTree.sha ti) loc allfilledvar
                 )
   where
     loc = mkExportLocation f
@@ -295,25 +300,10 @@ startExport r db cvar allfilledvar ti = do
                 else notElem (uuid r) <$> loggedLocations ek
             )
-performExport :: Remote -> ExportHandle -> Key -> AssociatedFile -> Sha -> ExportLocation -> MVar AllFilled -> CommandPerform
-performExport r db ek af contentsha loc allfilledvar = do
+performExport :: Remote -> [Remote] -> ExportHandle -> Key -> AssociatedFile -> Sha -> ExportLocation -> MVar AllFilled -> CommandPerform
+performExport r srcrs db ek af contentsha loc allfilledvar = do
     sent <- tryNonAsync $ if not (isGitShaKey ek)
-        then tryrenameannexobject $ ifM (inAnnex ek)
-            ( notifyTransfer Upload af $
-                -- alwaysUpload because the same key
-                -- could be used for more than one export
-                -- location, and concurrently uploading
-                -- of the content should still be allowed.
-                alwaysUpload (uuid r) ek af Nothing stdRetry $ \pm -> do
-                    let rollback = void $
-                            performUnexport r db [ek] loc
-                    sendAnnex ek Nothing rollback $ \f _sz ->
-                        Remote.action $
-                            storer f ek loc pm
-            , do
-                showNote "not available"
-                return False
-            )
+        then tryrenameannexobject $ sendannexobject
         -- Sending a non-annexed file.
         else withTmpFile "export" $ \tmp h -> do
             b <- catObject contentsha
@@ -333,6 +323,44 @@ performExport r db ek af contentsha loc allfilledvar = do
   where
     storer = storeExport (exportActions r)
+    sendannexobject = ifM (inAnnex ek)
+        ( sendlocalannexobject
+        , firstM remotehaskey srcrs >>= \case
+            Nothing -> do
+                showNote "not available"
+                return False
+            Just srcr -> getsendannexobject srcr
+        )
+
+    sendlocalannexobject = sendwith $ \p -> do
+        let rollback = void $
+                performUnexport r db [ek] loc
+        sendAnnex ek Nothing rollback $ \f _sz ->
+            Remote.action $
+                storer f ek loc p
+
+    sendwith a =
+        notifyTransfer Upload af $
+            -- alwaysUpload because the same key
+            -- could be used for more than one export
+            -- location, and concurrently uploading
+            -- of the content should still be allowed.
+            alwaysUpload (uuid r) ek af Nothing stdRetry a
+
+    remotehaskey srcr = either (const False) id <$> Remote.hasKey srcr ek
+
+    -- Similar to Command.Move.fromToPerform, use a regular download
+    -- of a local copy, lock early, and drop the local copy after sending.
+    getsendannexobject srcr = do
+        showAction $ UnquotedString $ "from " ++ Remote.name srcr
+        ifM (notifyTransfer Download af $ download srcr ek af stdRetry)
+            ( lockContentForRemoval ek (return False) $ \contentlock -> do
+                showAction $ UnquotedString $ "to " ++ Remote.name r
+                sendlocalannexobject
+                    `finally` removeAnnex contentlock
+            , return False
+            )
+
     tryrenameannexobject fallback
         | annexObjects (Remote.config r) = do
             case renameExport (exportActions r) of

View file

@@ -79,7 +79,7 @@ proxyExportTree = do
         Just t -> do
             tree <- filterExport r t
             mtbcommitsha <- getExportCommit r b
-            seekExport r tree mtbcommitsha
+            seekExport r tree mtbcommitsha []
 parseHookInput :: B.ByteString -> [((Sha, Sha), Ref)]
 parseHookInput = mapMaybe parse . B8.lines

View file

@@ -1019,7 +1019,7 @@ seekExportContent' o rs (mcurrbranch, madj)
                 | tree == currtree -> do
                     filteredtree <- Command.Export.filterExport r tree
                     Command.Export.changeExport r db filteredtree
-                    Command.Export.fillExport r db filteredtree mtbcommitsha
+                    Command.Export.fillExport r db filteredtree mtbcommitsha []
                 | otherwise -> cannotupdateexport r db Nothing False
             (Nothing, _, _) -> cannotupdateexport r db (Just (Git.fromRef b ++ " does not exist")) True
             (_, Nothing, _) -> cannotupdateexport r db (Just "no branch is currently checked out") True
@@ -1062,7 +1062,7 @@ seekExportContent' o rs (mcurrbranch, madj)
         -- filling in any files that did not get transferred
         -- to the existing exported tree.
         let filteredtree = Command.Export.ExportFiltered tree
-        Command.Export.fillExport r db filteredtree mtbcommitsha
+        Command.Export.fillExport r db filteredtree mtbcommitsha []
     fillexistingexport r _ _ _ = do
         warnExportImportConflict r
         return False

View file

@@ -77,6 +77,20 @@ so the overwritten modification is not lost.)
   Specify the special remote to export to.
+* `--from=remote`
+
+  When the content of a file is not available in the local repository,
+  this option lets it be downloaded from another remote, and sent on to the
+  destination remote. The file will be temporarily stored on local disk,
+  but will never enter the local repository.
+
+  This option can be repeated multiple times.
+
+  It is possible to use --from with the same remote as --to. If the tree
+  contains several files with the same content, and the remote being
+  exported to already contains one copy of the content, this allows making
+  a copy by downloading the content from it.
+
 * `--tracking`
   This is a deprecated way to set "remote.<name>.annex-tracking-branch".

View file

@@ -33,6 +33,8 @@ Planned schedule of work:
* Working on `exportreeplus` branch which is groundwork for proxying to
exporttree=yes special remotes. Need to merge it to master.
* A proxied exporttree=yes special remote is not untrusted, and should be.
* Handle cases where a single key is used by multiple files in the exported
tree. Need to download from the special remote in order to export
multiple copies to it. (In particular, this is needed when using