From 94d8bfb1581a004d08a740453218fe599b1f883a Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Wed, 13 Feb 2019 16:28:02 -0400
Subject: [PATCH] finally an API happy with

---
 doc/todo/import_tree.mdwn | 44 ++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/doc/todo/import_tree.mdwn b/doc/todo/import_tree.mdwn
index b0ab8cedc9..72e49f112b 100644
--- a/doc/todo/import_tree.mdwn
+++ b/doc/todo/import_tree.mdwn
@@ -99,7 +99,9 @@ import tree, and an export then overwrites it with something else.
 One solution would be to only allow one of importtree or exporttree
 to a given remote. This reduces the use cases a lot though, and perhaps
 so far that the import tree feature is not worth building. The adb
-special remote needs both.
+special remote needs both. Also, such a limitation seems like one that
+users might try to work around by initializing two remotes using the same
+data and trying to use one for import and the other for export.
 
 Really fixing this race needs locking or an atomic operation. Locking seems
 unlikely to be a portable enough solution.
@@ -153,8 +155,12 @@ that git-annex did not know the file already had.
 with version-id-marker set to the previous version of the file,
 should list only the previous and current versions; if there's an
 intermediate version then the race occurred and it could roll the change
-back, or otherwise recover the overwritten version.
-(Note that there's a risk of a second race occuring during rollback.)
+back, or otherwise recover the overwritten version. This could be done at
+import time, to detect a previous race, and recover from it; importing
+a tree with the file(s) that were overwritten due to the race, leading to a
+tree import conflict that the user can resolve. This likely generalizes
+to importing a sequence of trees, so each version written to S3 gets
+imported.
 
 ----
 
@@ -194,7 +200,7 @@ importing from the remote.
 Pulling all of the above together, this is an extension to the
 ExportActions api.
 
-	listContents :: Annex [(ExportLocation, ContentIdentifier)]
+	listContents :: Annex (Tree [(ExportLocation, ContentIdentifier)])
 
 	getContentIdentifier :: ExportLocation -> Annex (Maybe ContentIdentifier)
 
@@ -202,21 +208,43 @@ ExportActions api.
 
 	storeExportWithContentIdentifier :: FilePath -> Key -> ExportLocation -> MeterUpdate -> Annex (Maybe ContentIdentifier)
 
+listContents finds the current set of files that are stored in the remote,
+some of which may have been written by other programs than git-annex,
+along with their content identifiers. It returns a list of those, often in
+a single node tree.
+
+listContents may also find past versions of files that are stored in the
+remote, when it supports storing multiple versions of files. Since it
+returns a tree of lists of files, it can represent anything from a linear
+history to a full branching version control history.
+
 retrieveExportWithContentIdentifier is used when downloading a new file from
 the remote that listContents found. retrieveExport can't be used because
 it has a Key parameter and the key is not yet known in this case.
 (The callback generating a key will let eg S3 record the S3 version id for
 the key.)
 
+retrieveExportWithContentIdentifier should detect when the file it's
+downloaded may not match the requested content identifier (eg when
+something else wrote to it), and fail in that case.
+
 storeExportWithContentIdentifier is used to get the content identifier
 corresponding to what it stores. It can either get the content
 identifier in reply to the store (as S3 does with versioning), or it can
 store to a temp location, get the content identifier of that, and then
-rename the content into place. When there's a race with a concurrent
-writer, it needs to avoid getting the wrong ContentIdentifier for data
-written by the other writer.
+rename the content into place.
 
-TODO what's needed to work around the other race condition discussed above?
+storeExportWithContentIdentifier must avoid overwriting any file that may
+have been written to the remote by something else (unless that version of
+the file can later be recovered by listContents), so it will typically
+need to query for the content identifier before moving the new content
+into place.
+
+storeExportWithContentIdentifier needs to handle the case when there's a
+race with a concurrent writer. It needs to avoid getting the wrong
+ContentIdentifier for data written by the other writer. It may detect such
+races and fail, or it could succeed and overwrite the other file, so long
+as it can later be recovered by listContents.
 
 ----