Support Amazon S3 as a file storage backend.

There's a Haskell library that looks good. Not yet in Debian.

Multiple ways of using S3 are possible. The current plan is to have an
S3BUCKET backend, derived from Backend.File, so it caches files locally
and can transfer files between systems too, without involving S3.

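A minimal sketch of that derivation, using hypothetical types (not
git-annex's real Backend interface): S3BUCKET reuses Backend.File's
operations, extending store to also upload, and retrieve to fall back
to S3.

```haskell
-- Hypothetical backend record; the real Backend.File interface differs.
data Backend = Backend
  { name        :: String
  , storeKey    :: FilePath -> String -> IO Bool
  , retrieveKey :: String -> FilePath -> IO Bool
  }

-- Stand-in for Backend.File: the local cache and remote-to-remote copies.
fileBackend :: Backend
fileBackend = Backend
  { name        = "file"
  , storeKey    = \_ _ -> return True   -- pretend the local store succeeds
  , retrieveKey = \_ _ -> return False  -- pretend no local/remote copy found
  }

-- S3BUCKET: Backend.File behavior, plus S3 as an extra place to store
-- and fetch. uploadToS3/downloadFromS3 are placeholder S3 transfers.
s3Backend :: Backend
s3Backend = fileBackend
  { name        = "S3BUCKET"
  , storeKey    = \file key -> do
      ok <- storeKey fileBackend file key
      if ok then uploadToS3 key file else return False
  , retrieveKey = \key file -> do
      ok <- retrieveKey fileBackend key file
      if ok then return True else downloadFromS3 key file
  }

uploadToS3 :: String -> FilePath -> IO Bool
uploadToS3 _ _ = return True

downloadFromS3 :: String -> FilePath -> IO Bool
downloadFromS3 _ _ = return True
```
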
get will try to get it from S3 or from a remote. An annex.s3.cost setting
can configure the cost of S3 vs the cost of other remotes.

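For example, get could order candidates by cost and try them
cheapest-first; the remote names and cost numbers here are illustrative:

```haskell
import Data.List (sortOn)

-- Illustrative remotes and costs; annex.s3.cost would supply the S3 entry.
remotes :: [(String, Int)]
remotes = [("origin", 100), ("s3", 200)]

-- Try remotes cheapest-first until one produces the file.
tryGet :: (String -> IO Bool) -> IO Bool
tryGet fetch = go (map fst (sortOn snd remotes))
  where
    go []     = return False
    go (r:rs) = do
      ok <- fetch r
      if ok then return True else go rs
```

Setting annex.s3.cost above a local remote's cost would make get prefer
the local copy and only fall back to S3.
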
add will always upload a copy to S3.

Each file in the S3 bucket is assumed to be in the annex. So unused
will show files in the bucket that nothing points to, and dropunused
will remove them.

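In outline, that unused scan is a set difference; the inputs here (a
bucket listing and the set of keys the repository references) are assumed:

```haskell
import qualified Data.Set as S

-- Objects in the bucket that no key in the repository points to.
unusedInBucket :: [String] -> [String] -> [String]
unusedInBucket bucketObjects referencedKeys =
  S.toList (S.fromList bucketObjects `S.difference` S.fromList referencedKeys)
```
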
For numcopies counting, S3 will count as 1 copy (or maybe more?), so if
numcopies=2, then you don't fully trust S3 and request that git-annex
ensure one other copy.

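The check behind that, sketched with S3 counted as a configurable number
of copies:

```haskell
-- Is it safe to drop the local copy? The copies that remain elsewhere
-- must still satisfy numcopies. Counting S3 as 1 copy with numcopies=2
-- forces at least one copy on some other remote.
safeToDrop :: Int -> Int -> Int -> Bool
safeToDrop numcopies s3Copies remoteCopies =
  s3Copies + remoteCopies >= numcopies
```

So `safeToDrop 2 1 0` is False: with only the S3 copy left, the drop has
to be refused until another remote holds a copy.
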
drop will remove a file locally, but keep it in S3. drop --force *might*
remove it from S3. TBD.

annex.s3.bucket would configure the bucket to use. (And an env var or
something would configure the password.) The bucket would also be encoded
in the keys, though, so the configured bucket would only be used when
adding new files. A system could move from one bucket to another over
time while still having legacy files in an earlier one; perhaps you move
to Europe and want new files to be put in that region.

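A sketch of that key encoding, with an illustrative "bucket:name"
serialization (not git-annex's actual key format):

```haskell
-- Each key records which bucket holds its object, so legacy keys keep
-- working after annex.s3.bucket is pointed at a new bucket.
data S3Key = S3Key { keyBucket :: String, keyName :: String }

formatKey :: S3Key -> String
formatKey (S3Key b n) = b ++ ":" ++ n

parseKey :: String -> Maybe S3Key
parseKey s = case break (== ':') s of
  (b, ':' : n) -> Just (S3Key b n)
  _            -> Nothing
```
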
And `git annex migrate --backend=S3BUCKET --force` could move files
between datacenters!

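Per file, that migration could look like the following sketch; download
and upload are placeholders standing in for S3 GET/PUT:

```haskell
-- Move one key's object from the bucket recorded in its key to the
-- currently configured bucket, then rewrite the key.
migrateKey :: String -> (String, String) -> IO (String, String)
migrateKey newBucket (oldBucket, name) = do
  download oldBucket name "tmp"   -- fetch from the legacy bucket
  upload newBucket name "tmp"     -- store into the new bucket
  return (newBucket, name)        -- key now points at the new bucket
  where
    download _ _ _ = return ()    -- placeholder S3 GET
    upload   _ _ _ = return ()    -- placeholder S3 PUT
```
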
Problem: Then the only way for unused to know what buckets are in use
is to see what keys point to them -- but if the last file from a bucket
is deleted, it would then not be able to say that the files in that
bucket are all unused. Need a cached list of recently seen S3 buckets?

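That cache could just be a persisted set of bucket names, merged on every
scan, so a bucket stays known even after its last key is gone:

```haskell
import qualified Data.Set as S

-- Union the buckets mentioned by current keys into the cached set; the
-- result is written back out, so emptied buckets are still checked.
rememberBuckets :: S.Set String -> [String] -> S.Set String
rememberBuckets cached bucketsFromKeys =
  cached `S.union` S.fromList bucketsFromKeys
```
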
-----

One problem with this is what key metadata to include. Should it be like
WORM? Or like SHA1? Or just a new unique identifier for each file? It
might be worth having S3 variants of *all* the Backend.File derived
backends.

More blue-sky, it might be nice to be able to union or stack together
multiple backends, so S3BUCKET+SHA1 or S3BUCKET+WORM. That would likely
be hard to get right.

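As a very rough sketch of stacking, the inner backend could supply the
content identifier and the S3BUCKET layer the location; composing the
cache, transfer, and drop semantics is the part that would be hard:

```haskell
-- A stacked key: the inner backend (SHA1, WORM, ...) names the content,
-- the outer S3BUCKET layer records where the object lives.
data StackedKey = StackedKey
  { bucket  :: String  -- S3BUCKET layer
  , content :: String  -- inner backend's identifier, e.g. a SHA1
  }

-- Build a stacked key from any inner key generator.
stackKey :: (FilePath -> IO String) -> String -> FilePath -> IO StackedKey
stackKey innerKey bkt file = StackedKey bkt <$> innerKey file
```
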
Less blue-sky, if the S3 capability were added directly to Backend.File,
and the bucket name were configured by annex.s3.bucket, then any existing
annexed file could be upgraded to also store on S3.

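In that variant the S3 step is just conditional on configuration; a
minimal sketch, assuming a hypothetical putToBucket for the S3 PUT:

```haskell
-- Backend.File's store path, extended: when annex.s3.bucket is unset,
-- behavior is unchanged; when set, the object is also pushed to S3.
maybeStoreOnS3 :: Maybe String -> String -> FilePath -> IO Bool
maybeStoreOnS3 Nothing _ _ = return True            -- no S3 configured
maybeStoreOnS3 (Just bucket) key file = putToBucket bucket key file

putToBucket :: String -> String -> FilePath -> IO Bool
putToBucket _ _ _ = return True                     -- placeholder S3 PUT
```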