From ee076b68f51e9ae4d4a0e9f4cd1efffb9dce538b Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 4 Aug 2024 11:17:42 -0400 Subject: [PATCH] strong verification on retrieval from annexobjects location The file in the annexobjects location may have been renamed from a previously exported file that got deleted in a subsequent export. Or it may be renamed to annexobjects temporarily before being renamed to another name (to handle eg pairwise renames). But, an exported file is not guaranteed to contain the content of the key that the local repository last exported there. Another tree could have been exported from elsewhere in the meantime. So, files in annexobjects do not necessarily have the content of their key. And so have to be strongly verified when retrieving. The same as is done when retrieving exported files. --- Remote/Helper/ExportImport.hs | 36 ++++++++++--------- ...xporttree_remotes_could_store_any_key.mdwn | 17 +++++---- doc/todo/git-annex_proxies.mdwn | 7 ---- 3 files changed, 29 insertions(+), 31 deletions(-) diff --git a/Remote/Helper/ExportImport.hs b/Remote/Helper/ExportImport.hs index 73137c728a..09df4c58b4 100644 --- a/Remote/Helper/ExportImport.hs +++ b/Remote/Helper/ExportImport.hs @@ -184,15 +184,11 @@ adjustExportImport' isexport isimport annexobjects r rs gc = do then lockContent r else Nothing , retrieveKeyFile = \k af dest p vc -> - if isimport + if isimport || isexport then supportversionedretrieve k af dest p vc $ - supportretrieveannexobject dbv k dest p $ - retrieveKeyFileFromImport dbv ciddbv k af dest p - else if isexport - then supportversionedretrieve k af dest p vc $ - supportretrieveannexobject dbv k dest p $ - retrieveKeyFileFromExport dbv k af dest p - else retrieveKeyFile r k af dest p vc + supportretrieveannexobject dbv k af dest p $ + retrieveFromImportOrExport (tryexportlocs dbv k) ciddbv k af dest p + else retrieveKeyFile r k af dest p vc , retrieveKeyFileCheap = if versioned then retrieveKeyFileCheap r else Nothing @@ -371,12 +367,16 @@ adjustExportImport' isexport isimport annexobjects r rs gc = do db <- getciddb ciddbv liftIO $ ContentIdentifier.getContentIdentifiers db rs k + retrieveFromImportOrExport getlocs ciddbv k af dest p + | isimport = retrieveFromImport getlocs ciddbv k af dest p + | otherwise = retrieveFromExport getlocs k af dest p + -- Keys can be retrieved using retrieveExport, but since that -- retrieves from a path in the remote that another writer could -- have replaced with content not of the requested key, the content -- has to be strongly verified. - retrieveKeyFileFromExport dbv k _af dest p = ifM (isVerifiable k) - ( tryexportlocs dbv k $ \loc -> + retrieveFromExport getlocs k _af dest p = ifM (isVerifiable k) + ( getlocs $ \loc -> retrieveExport (exportActions r) k loc dest p >>= return . \case UnVerified -> MustVerify IncompleteVerify iv -> MustFinishIncompleteVerify iv @@ -384,15 +384,15 @@ adjustExportImport' isexport isimport annexobjects r rs gc = do , giveup $ "exported content cannot be verified due to using the " ++ decodeBS (formatKeyVariety (fromKey keyVariety k)) ++ " backend" ) - retrieveKeyFileFromImport dbv ciddbv k af dest p = do + retrieveFromImport getlocs ciddbv k af dest p = do cids <- getkeycids ciddbv k if not (null cids) - then tryexportlocs dbv k $ \loc -> + then getlocs $ \loc -> snd <$> retrieveExportWithContentIdentifier (importActions r) loc cids dest (Left k) p -- In case a content identifier is somehow missing, -- try this instead. else if isexport - then retrieveKeyFileFromExport dbv k af dest p + then retrieveFromExport getlocs k af dest p else giveup "no content identifier is recorded, unable to retrieve" checkpresentwith k a = ifM a @@ -437,13 +437,15 @@ adjustExportImport' isexport isimport annexobjects r rs gc = do ) _ -> giveup "This key is part of the exported tree, so can only be removed by exporting a tree that does not include it." - retrieveannexobject k dest p = - retrieveExport (exportActions r) k (annexobjectlocation k) dest p + retrieveannexobject k af dest p = + retrieveFromExport getlocs k af dest p + where + getlocs a = a (annexobjectlocation k) - supportretrieveannexobject dbv k dest p a + supportretrieveannexobject dbv k af dest p a | annexobjects = tryNonAsync a >>= \case Right res -> return res - Left err -> tryNonAsync (retrieveannexobject k dest p) >>= \case + Left err -> tryNonAsync (retrieveannexobject k af dest p) >>= \case Right res -> return res -- Both failed, so which exception to -- throw? If there are known export diff --git a/doc/todo/exporttree_remotes_could_store_any_key.mdwn b/doc/todo/exporttree_remotes_could_store_any_key.mdwn index 2ad104e815..0762ae9172 100644 --- a/doc/todo/exporttree_remotes_could_store_any_key.mdwn +++ b/doc/todo/exporttree_remotes_could_store_any_key.mdwn @@ -80,8 +80,6 @@ also remove from the objects location. ---- -# trust - Could a remote with annexobjects=yet and exporttree=yes but without importtree=yes not be forced to be untrusted? @@ -92,11 +90,12 @@ If the annexobjects directory only gets keys uploaded to it, and never had exported files renamed into it, its content will always be as expected, and perhaps the remote does not need to be untrusted. -OTOH, if an exported file that is being deleted in an updated export gets -renamed into the annexobjects directory, it's possible that the file has in -fact been overwritten with other content (by git-annex in another clone of -the repository), and so the object in annexobjects would not be as -expected. So unfortunately, it seems that rename can't be done. +OTOH, if an exported file that is being deleted (or pairwise renamed) in an +updated export gets renamed into the annexobjects directory, it's possible +that the file has in fact been overwritten with other content (by git-annex +in another clone of the repository), and so the object in annexobjects +would not be as expected. So unfortunately, it seems that rename can't be +done without forcing untrusted. Note that, exporting a new tree can still delete any file at any time. If the remote is not untrusted, that could violate numcopies. @@ -114,6 +113,10 @@ the annexobjects directory, and the other for the exported files. This clean separation avoids the above problem. But would be confusing for the user. HOWEVER, what if the two were treated as parts of the same cluster....? +This may be worth revisiting later, but for now, I am leaning to keeping it +untrusted, and following down that line to make it as performant as +possible. + --- Implementing in the "exportreeplus" branch --[[Joey]] diff --git a/doc/todo/git-annex_proxies.mdwn b/doc/todo/git-annex_proxies.mdwn index 429026cb09..b7fa222717 100644 --- a/doc/todo/git-annex_proxies.mdwn +++ b/doc/todo/git-annex_proxies.mdwn @@ -36,13 +36,6 @@ Planned schedule of work: * `git-annex export` when renaming an exported file to a temporary name should use the annexobjects location. -* Make annexobjects=true remotes not be untrusted, if possible. See todo. - - Alternatively, if they do need to be untrusted, the retrieval from the - annexobjects location may also need to do strong verification of the - content, if exported files ever get renamed into the annexobjects - location. - ## items deferred until later for p2p protocol over http * `git-annex p2phttp` should support serving several repositories at the same