starting api design
parent b7991248db
commit 87987c78cf
1 changed files with 50 additions and 21 deletions
@@ -11,7 +11,7 @@ that has the modifications in it.
Updating the working copy is then done by merging the import treeish.
This way, conflicts will be detected and handled as normal by git.

----
## content identifiers

The remote is responsible for collecting a list of
files currently in it, along with some content identifier. That data is
@@ -53,7 +53,28 @@ the same remote? In that case, perhaps different trees would be imported,
and merged into master. So the two repositories then have differing
masters, which can be reconciled in merge as usual.

----
Since exporttree remotes don't have content identifier information yet, it
needs to be collected the first time import tree is used. (Or import
everything, but that is probably too expensive.) Any modifications made to
exported files before the first import tree would not be noticed. This seems
acceptable as long as this only affects exporttree remotes created before
this feature was added.

What if repo A is being used to import tree from R for a while, and the
user gets used to editing files on R and importing them. Then they stop
using A and switch to clone B. It would not have the content identifier
information that A did. It seems that in this case, B needs to re-download
everything, to build up the map of content identifiers.
(Anything could have changed since the last time A imported).
That seems too expensive!

Would storing content identifiers in the git-annex branch be too
expensive? Probably not. For S3 with versioning, a content identifier is
already stored. When the content identifier is (mtime, size, inode),
that's a small amount of data. The maximum size of a content identifier
could be limited to the size of a typical hash, and if a remote for some
reason gets something larger, it could simply hash it to generate
the content identifier.
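
The size limit described above could be sketched roughly as follows. This is only an illustration: the `ContentIdentifier` newtype, the 32 byte limit, and the name `mkContentIdentifier` are assumptions, and FNV-1a stands in for a real hash purely to keep the example free of dependencies beyond `bytestring`.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.Bits (xor)
import Data.Word (Word64)
import Numeric (showHex)

-- Hypothetical stand-in for the ContentIdentifier type.
newtype ContentIdentifier = ContentIdentifier B.ByteString
    deriving (Eq, Show)

-- Assumed limit: 32 bytes, the size of a SHA-256 digest.
maxContentIdentifierSize :: Int
maxContentIdentifierSize = 32

-- FNV-1a (64 bit). Used only so this sketch needs no hashing library;
-- a real implementation would want a cryptographic hash.
fnv1a64 :: B.ByteString -> Word64
fnv1a64 = B.foldl' step 14695981039346656037
  where
    step h b = (h `xor` fromIntegral b) * 1099511628211

-- Small identifiers are kept as-is; oversized ones are replaced
-- by the hex encoding of their hash.
mkContentIdentifier :: B.ByteString -> ContentIdentifier
mkContentIdentifier raw
    | B.length raw <= maxContentIdentifierSize = ContentIdentifier raw
    | otherwise = ContentIdentifier (B8.pack (showHex (fnv1a64 raw) ""))
```

With this shape, an (mtime, size, inode) triple serialized to a short string passes through unchanged, while an unusually long identifier from a remote is reduced to a fixed-size value before it is stored.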

## race conditions TODO

@@ -152,25 +173,6 @@ Since this is acceptable in git, I suppose we can accept it here too..

----

Since exporttree remotes don't have content identifier information yet, it
needs to be collected the first time import tree is used. (Or import
everything, but that is probably too expensive.) Any modifications made to
exported files before the first import tree would not be noticed. This seems
acceptable as long as this only affects exporttree remotes created before
this feature was added.

What if repo A is being used to import tree from R for a while, and the
user gets used to editing files on R and importing them. Then they stop
using A and switch to clone B. It would not have the content identifier
information that A did (unless it's stored in git-annex branch rather than
locally). It seems that in this case, B needs to re-download everything,
since anything could have changed since the last time A imported.
That seems too expensive!

Would storing content identifiers in the git-annex branch be too expensive?

----

If multiple repos can access the remote at the same time, then there's a
potential problem when one is exporting a new tree, and the other one is
importing from the remote.
@@ -187,6 +189,33 @@ importing from the remote.
> to be on the remote. (May need to reword that prompt.)
> --[[Joey]]

## api design

Pulling all of the above together, this is an extension to the
ExportActions api.

    listContents :: Annex [(ExportLocation, ContentIdentifier)]

    getContentIdentifier :: ExportLocation -> Annex (Maybe ContentIdentifier)

    retrieveExportWithContentIdentifier :: ExportLocation -> ContentIdentifier -> FilePath -> MeterUpdate -> Annex Bool

    storeExportWithContentIdentifier :: FilePath -> Key -> ExportLocation -> MeterUpdate -> Annex (Maybe ContentIdentifier)

retrieveExportWithContentIdentifier is used when downloading a new file from
the remote that listContents found. retrieveExport can't be used because
it has a Key parameter and the key is not yet known in this case.
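
One way an import pass might combine listContents and retrieveExportWithContentIdentifier is sketched below. The types are simplified stand-ins (`Annex` is just `IO` here), and `ImportActions`, `needsRetrieval`, `importChanged`, and the `dest` parameter are hypothetical names introduced for illustration, not part of the design.

```haskell
import qualified Data.Map.Strict as M

-- Stand-ins for the real types, which are richer than this.
type Annex = IO
newtype ExportLocation = ExportLocation FilePath deriving (Eq, Ord, Show)
newtype ContentIdentifier = ContentIdentifier String deriving (Eq, Show)
type MeterUpdate = Integer -> IO ()

-- Only the two actions an import pass needs, as a hypothetical record.
data ImportActions = ImportActions
    { listContents :: Annex [(ExportLocation, ContentIdentifier)]
    , retrieveExportWithContentIdentifier
        :: ExportLocation -> ContentIdentifier -> FilePath -> MeterUpdate -> Annex Bool
    }

-- Pure part: which listed files are new or changed, relative to the
-- content identifiers recorded by the previous import?
needsRetrieval
    :: M.Map ExportLocation ContentIdentifier
    -> [(ExportLocation, ContentIdentifier)]
    -> [(ExportLocation, ContentIdentifier)]
needsRetrieval known = filter changed
  where
    changed (loc, cid) = M.lookup loc known /= Just cid

-- Download every new or changed file. No Key is passed, since the key
-- can only be computed once the content has arrived.
importChanged
    :: ImportActions
    -> M.Map ExportLocation ContentIdentifier
    -> (ExportLocation -> FilePath) -- hypothetical: where each download goes
    -> Annex [(ExportLocation, Bool)]
importChanged r known dest = do
    listed <- listContents r
    mapM fetch (needsRetrieval known listed)
  where
    fetch (loc, cid) = do
        ok <- retrieveExportWithContentIdentifier r loc cid (dest loc) (\_ -> return ())
        return (loc, ok)
```

Files whose content identifier matches the recorded one are skipped entirely, which is what makes a repeated import cheap.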

storeExportWithContentIdentifier is used to get the content identifier
corresponding to what was just stored. It can either get the content
identifier in reply to the store (as S3 does with versioning), or it can
store to a temp location, get the content identifier of that, and then
rename the content into place. When there's a race with a concurrent
writer, it needs to avoid getting the ContentIdentifier for data written by
the other writer.
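
For a remote that is a plain directory, the temp-location variant might look something like this sketch. It assumes a content identifier of (mtime, size), leaving out the inode to avoid a dependency on the `unix` package; `storeViaTemp` and its `writerId` parameter are hypothetical, and the Key and MeterUpdate parameters are omitted for brevity.

```haskell
import Control.Exception (IOException, try)
import Data.Time.Clock (UTCTime)
import System.Directory (copyFile, getFileSize, getModificationTime, renameFile)
import System.FilePath ((</>), (<.>))

-- Assumed content identifier for a directory remote: (mtime, size).
data ContentIdentifier = ContentIdentifier UTCTime Integer
    deriving (Eq, Show)

-- Store src at loc inside remoteDir. The content identifier is read from
-- the temp file, which only this writer wrote, so it cannot describe data
-- written by a concurrent writer to the final location. writerId is an
-- assumed per-writer unique string that keeps temp names from colliding.
storeViaTemp :: FilePath -> String -> FilePath -> FilePath -> IO (Maybe ContentIdentifier)
storeViaTemp remoteDir writerId src loc =
    either (const Nothing) Just
        <$> (try store :: IO (Either IOException ContentIdentifier))
  where
    dest = remoteDir </> loc
    tmp = dest <.> ("tmp-" ++ writerId)
    store = do
        copyFile src tmp
        cid <- ContentIdentifier <$> getModificationTime tmp <*> getFileSize tmp
        -- A rename within one filesystem keeps the file's mtime and size,
        -- so the identifier taken from tmp still matches at dest.
        renameFile tmp dest
        return cid
```

The rename can still replace a file that another writer put in place a moment earlier. That is a separate race from the one this function guards against, and the sketch does not attempt to handle it.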

TODO what's needed to work around the other race condition discussed above?

----

See also, [[adb_special_remote]]