strong verification on retrieval from annexobjects location

The file in the annexobjects location may have been renamed from a
previously exported file that got deleted in a subsequent export.
Or it may be renamed to annexobjects temporarily before being renamed to
another name (to handle eg pairwise renames).

But, an exported file is not guaranteed to contain the content of the
key that the local repository last exported there. Another tree could
have been exported from elsewhere in the meantime.

So, files in annexobjects do not necessarily have the content of their
key. And so have to be strongly verified when retrieving. The same as
is done when retrieving exported files.
This commit is contained in:
Joey Hess 2024-08-04 11:17:42 -04:00
parent fe01a1e7e1
commit ee076b68f5
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
3 changed files with 29 additions and 31 deletions

View file

@ -184,14 +184,10 @@ adjustExportImport' isexport isimport annexobjects r rs gc = do
then lockContent r then lockContent r
else Nothing else Nothing
, retrieveKeyFile = \k af dest p vc -> , retrieveKeyFile = \k af dest p vc ->
if isimport if isimport || isexport
then supportversionedretrieve k af dest p vc $ then supportversionedretrieve k af dest p vc $
supportretrieveannexobject dbv k dest p $ supportretrieveannexobject dbv k af dest p $
retrieveKeyFileFromImport dbv ciddbv k af dest p retrieveFromImportOrExport (tryexportlocs dbv k) ciddbv k af dest p
else if isexport
then supportversionedretrieve k af dest p vc $
supportretrieveannexobject dbv k dest p $
retrieveKeyFileFromExport dbv k af dest p
else retrieveKeyFile r k af dest p vc else retrieveKeyFile r k af dest p vc
, retrieveKeyFileCheap = if versioned , retrieveKeyFileCheap = if versioned
then retrieveKeyFileCheap r then retrieveKeyFileCheap r
@ -371,12 +367,16 @@ adjustExportImport' isexport isimport annexobjects r rs gc = do
db <- getciddb ciddbv db <- getciddb ciddbv
liftIO $ ContentIdentifier.getContentIdentifiers db rs k liftIO $ ContentIdentifier.getContentIdentifiers db rs k
retrieveFromImportOrExport getlocs ciddbv k af dest p
| isimport = retrieveFromImport getlocs ciddbv k af dest p
| otherwise = retrieveFromExport getlocs k af dest p
-- Keys can be retrieved using retrieveExport, but since that -- Keys can be retrieved using retrieveExport, but since that
-- retrieves from a path in the remote that another writer could -- retrieves from a path in the remote that another writer could
-- have replaced with content not of the requested key, the content -- have replaced with content not of the requested key, the content
-- has to be strongly verified. -- has to be strongly verified.
retrieveKeyFileFromExport dbv k _af dest p = ifM (isVerifiable k) retrieveFromExport getlocs k _af dest p = ifM (isVerifiable k)
( tryexportlocs dbv k $ \loc -> ( getlocs $ \loc ->
retrieveExport (exportActions r) k loc dest p >>= return . \case retrieveExport (exportActions r) k loc dest p >>= return . \case
UnVerified -> MustVerify UnVerified -> MustVerify
IncompleteVerify iv -> MustFinishIncompleteVerify iv IncompleteVerify iv -> MustFinishIncompleteVerify iv
@ -384,15 +384,15 @@ adjustExportImport' isexport isimport annexobjects r rs gc = do
, giveup $ "exported content cannot be verified due to using the " ++ decodeBS (formatKeyVariety (fromKey keyVariety k)) ++ " backend" , giveup $ "exported content cannot be verified due to using the " ++ decodeBS (formatKeyVariety (fromKey keyVariety k)) ++ " backend"
) )
retrieveKeyFileFromImport dbv ciddbv k af dest p = do retrieveFromImport getlocs ciddbv k af dest p = do
cids <- getkeycids ciddbv k cids <- getkeycids ciddbv k
if not (null cids) if not (null cids)
then tryexportlocs dbv k $ \loc -> then getlocs $ \loc ->
snd <$> retrieveExportWithContentIdentifier (importActions r) loc cids dest (Left k) p snd <$> retrieveExportWithContentIdentifier (importActions r) loc cids dest (Left k) p
-- In case a content identifier is somehow missing, -- In case a content identifier is somehow missing,
-- try this instead. -- try this instead.
else if isexport else if isexport
then retrieveKeyFileFromExport dbv k af dest p then retrieveFromExport getlocs k af dest p
else giveup "no content identifier is recorded, unable to retrieve" else giveup "no content identifier is recorded, unable to retrieve"
checkpresentwith k a = ifM a checkpresentwith k a = ifM a
@ -437,13 +437,15 @@ adjustExportImport' isexport isimport annexobjects r rs gc = do
) )
_ -> giveup "This key is part of the exported tree, so can only be removed by exporting a tree that does not include it." _ -> giveup "This key is part of the exported tree, so can only be removed by exporting a tree that does not include it."
retrieveannexobject k dest p = retrieveannexobject k af dest p =
retrieveExport (exportActions r) k (annexobjectlocation k) dest p retrieveFromExport getlocs k af dest p
where
getlocs a = a (annexobjectlocation k)
supportretrieveannexobject dbv k dest p a supportretrieveannexobject dbv k af dest p a
| annexobjects = tryNonAsync a >>= \case | annexobjects = tryNonAsync a >>= \case
Right res -> return res Right res -> return res
Left err -> tryNonAsync (retrieveannexobject k dest p) >>= \case Left err -> tryNonAsync (retrieveannexobject k af dest p) >>= \case
Right res -> return res Right res -> return res
-- Both failed, so which exception to -- Both failed, so which exception to
-- throw? If there are known export -- throw? If there are known export

View file

@ -80,8 +80,6 @@ also remove from the objects location.
---- ----
# trust
Could a remote with annexobjects=yet and exporttree=yes but without Could a remote with annexobjects=yet and exporttree=yes but without
importtree=yes not be forced to be untrusted? importtree=yes not be forced to be untrusted?
@ -92,11 +90,12 @@ If the annexobjects directory only gets keys uploaded to it, and never had
exported files renamed into it, its content will always be as expected, and exported files renamed into it, its content will always be as expected, and
perhaps the remote does not need to be untrusted. perhaps the remote does not need to be untrusted.
OTOH, if an exported file that is being deleted in an updated export gets OTOH, if an exported file that is being deleted (or pairwise renamed) in an
renamed into the annexobjects directory, it's possible that the file has in updated export gets renamed into the annexobjects directory, it's possible
fact been overwritten with other content (by git-annex in another clone of that the file has in fact been overwritten with other content (by git-annex
the repository), and so the object in annexobjects would not be as in another clone of the repository), and so the object in annexobjects
expected. So unfortunately, it seems that rename can't be done. would not be as expected. So unfortunately, it seems that rename can't be
done without forcing untrusted.
Note that, exporting a new tree can still delete any file at any time. Note that, exporting a new tree can still delete any file at any time.
If the remote is not untrusted, that could violate numcopies. If the remote is not untrusted, that could violate numcopies.
@ -114,6 +113,10 @@ the annexobjects directory, and the other for the exported files. This
clean separation avoids the above problem. But would be confusing for the clean separation avoids the above problem. But would be confusing for the
user. HOWEVER, what if the two were treated as parts of the same cluster....? user. HOWEVER, what if the two were treated as parts of the same cluster....?
This may be worth revisiting later, but for now, I am leaning to keeping it
untrusted, and following down that line to make it as performant as
possible.
--- ---
Implementing in the "exportreeplus" branch --[[Joey]] Implementing in the "exportreeplus" branch --[[Joey]]

View file

@ -36,13 +36,6 @@ Planned schedule of work:
* `git-annex export` when renaming an exported file to a temporary name * `git-annex export` when renaming an exported file to a temporary name
should use the annexobjects location. should use the annexobjects location.
* Make annexobjects=true remotes not be untrusted, if possible. See todo.
Alternatively, if they do need to be untrusted, the retrieval from the
annexobjects location may also need to do strong verification of the
content, if exported files ever get renamed into the annexobjects
location.
## items deferred until later for p2p protocol over http ## items deferred until later for p2p protocol over http
* `git-annex p2phttp` should support serving several repositories at the same * `git-annex p2phttp` should support serving several repositories at the same