devblog
parent 352612618c
commit e65455fdda
1 changed files with 27 additions and 0 deletions
doc/devblog/day_581__starting_import_from_S3.mdwn
Normal file
@@ -0,0 +1,27 @@
Started today on `git annex import` from S3, in the "import-from-s3"
branch.

It looks like I'm going to support both versioned and unversioned buckets;
the latter will need `--force` to initialize, since using one can lose data.

One thought I had about that is: It's probably better for git-annex to be
able to import data from an unversioned S3 bucket with caveats about
avoiding unsafe operations (export) that could lose data, than it is for
git-annex to not be able to import from the bucket at all, guaranteeing
that past versions of modified files will be lost. (Rationalization is a
powerful drug.)

To support unversioned buckets, some kind of stable content identifier is
needed other than the S3 version id. Luckily, S3 has etags, which are the
md5sum of the content (at least for objects uploaded in a single part), so
they will work great. But the `aws` Haskell library needs one small change
to return an etag, so this will be blocked on that change.

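As a rough illustration of using an etag as a content identifier, here is a small sketch in Python (not the Haskell that git-annex is written in). All function names are made up for this example. It assumes the etag arrives wrapped in double quotes, as S3 returns it, and that a multipart etag carries a `-N` suffix and so is not a plain md5sum.

```python
import hashlib


def normalize_etag(etag: str) -> str:
    """Strip the double quotes S3 wraps around an etag and lowercase it."""
    return etag.strip().strip('"').lower()


def is_plain_md5_etag(etag: str) -> bool:
    """True when the etag looks like a bare 32-digit hex md5sum.

    Multipart uploads produce etags with a "-N" suffix, which fail this check.
    """
    e = normalize_etag(etag)
    return len(e) == 32 and all(c in "0123456789abcdef" for c in e)


def content_changed(known_etag: str, listed_etag: str) -> bool:
    """Compare the etag recorded earlier with the one in a fresh listing."""
    return normalize_etag(known_etag) != normalize_etag(listed_etag)


def etag_matches_content(etag: str, content: bytes) -> bool:
    """Check downloaded content against a plain md5 etag."""
    if not is_plain_md5_etag(etag):
        return False  # cannot verify a multipart-style etag this way
    return normalize_etag(etag) == hashlib.md5(content).hexdigest()
```

Only equality of etags is needed to notice that a file in the bucket has been modified; the md5 check is a bonus that applies when the etag happens to be a plain md5sum.
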
I've gotten listing importable contents from S3 working for unversioned
buckets, including dealing with S3's limit of 1000 items per list request
by paging.

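The paging loop looks roughly like the following Python sketch. It is an illustration only: `fake_list_page` is a hypothetical stand-in for a single S3 list request, not a real API call, and the loop assumes the last key of a truncated page can be passed back as the marker for the next request.

```python
from typing import List, Optional, Tuple

PAGE_LIMIT = 1000  # S3 returns at most this many keys per list request


def fake_list_page(
    bucket: List[str], marker: Optional[str], limit: int = PAGE_LIMIT
) -> Tuple[List[str], bool]:
    """Hypothetical stand-in for one S3 list request.

    Returns up to `limit` keys that sort after `marker`, plus a flag
    saying whether more keys remain (the listing was truncated).
    """
    keys = sorted(k for k in bucket if marker is None or k > marker)
    return keys[:limit], len(keys) > limit


def list_all_keys(bucket: List[str], limit: int = PAGE_LIMIT) -> List[str]:
    """Collect every key by requesting pages until one is not truncated."""
    all_keys: List[str] = []
    marker: Optional[str] = None
    while True:
        page, truncated = fake_list_page(bucket, marker, limit)
        all_keys.extend(page)
        if not truncated or not page:
            break
        marker = page[-1]  # resume after the last key already seen
    return all_keys
```
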
Listing importable contents from versioned buckets is harder, because
it needs to synthesize a git version history from the information that S3
provides. I think I have a method for doing this that will generate the
trees that users will expect to see, and also will generate the same past
trees every time, avoiding a proliferation of git trees. Next step:
converting my prose description of how to do that into Haskell.
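
The method itself is not spelled out here, so the following Python sketch is only one plausible way to get deterministic trees, and should not be read as what git-annex does. It assumes each S3 version record carries a key, a version id, a modification time, and a delete-marker flag (the `Version` type and `synthesize_history` are invented for this example). Records are replayed in a fixed sort order, and one tree is emitted per distinct modification time.

```python
from typing import Dict, List, NamedTuple, Tuple


class Version(NamedTuple):
    key: str
    version_id: str
    mtime: int             # last-modified time, seconds since the epoch
    deleted: bool = False  # True for a delete marker


# A tree is a sorted tuple of (key, version_id) pairs, so equal trees compare equal.
Tree = Tuple[Tuple[str, str], ...]


def synthesize_history(versions: List[Version]) -> List[Tuple[int, Tree]]:
    """Replay version records in a fixed order, one tree per distinct mtime."""
    ordered = sorted(versions, key=lambda v: (v.mtime, v.key, v.version_id))
    current: Dict[str, str] = {}
    history: List[Tuple[int, Tree]] = []
    for i, v in enumerate(ordered):
        if v.deleted:
            current.pop(v.key, None)
        else:
            current[v.key] = v.version_id
        # Emit a tree only after the last record sharing this mtime.
        is_last_at_time = i + 1 == len(ordered) or ordered[i + 1].mtime != v.mtime
        if is_last_at_time:
            tree = tuple(sorted(current.items()))
            if not history or history[-1][1] != tree:
                history.append((v.mtime, tree))
    return history
```

Because the output depends only on the set of version records and not on the order S3 lists them in, listing the same bucket again reproduces the same past trees, which is the property that avoids a proliferation of git trees.
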