git-annex/doc/walkthrough/Internet_Archive_via_S3.mdwn
Joey Hess 57428c356e heh
2011-05-16 13:33:33 -04:00

49 lines
2.2 KiB
Markdown

[The Internet Archive](http://www.archive.org/) allows members to upload
collections using an Amazon S3
[compatible API](http://www.archive.org/help/abouts3.txt), and this can
be used with git-annex's [[special_remotes/S3]] support.
So, you can locally archive things with git-annex, define remotes that
correspond to "items" at the Internet Archive, and use git-annex to upload
your files to there. Of course, your use of the Internet Archive must
comply with their [terms of service](http://www.archive.org/about/terms.php).
Sign up for an account, and get your access keys here:
<http://www.archive.org/account/s3.php>
# export AWS_ACCESS_KEY_ID=blahblah
# export AWS_SECRET_ACCESS_KEY=xxxxxxx
Specify `host=s3.us.archive.org` when doing `initremote` to set up
a remote at the Archive. This will enable a special Internet Archive mode:
Encryption is not allowed; you are required to specify a bucket name
rather than having git-annex pick a random one; and you can optionally
specify `x-archive-meta*` headers to add metadata as explained in their
[documentation](http://www.archive.org/help/abouts3.txt).
[[!template id=note text="""
/!\ There seems to be a bug in either hS3 or the archive that breaks
authentication when the bucket name contains spaces or upper-case letters..
use all lowercase and no spaces when making the bucket with `initremote`.
"""]]
# git annex initremote archive-panama type=S3 \
host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
x-archive-meta-mediatype=texts x-archive-meta-language=eng \
x-archive-meta-title="original Panama Canal lock design blueprints"
initremote archive-panama (Internet Archive mode) ok
# git annex describe archive-panama "a man, a plan, a canal: panama"
describe archive-panama ok
Then you can annex files and copy them to the remote as usual:
# git annex add photo1.jpeg --backend=SHA1E
add photo1.jpeg (checksum...) ok
# git annex copy photo1.jpeg --fast --to archive-panama
copy (to archive-panama...) ok
Note the use of the SHA1E [[backend|backends]]. It makes most sense
to use the WORM or SHA1E backend for files that will be stored in
the Internet Archive, since the key name will be exposed as the filename
there, and since the Archive does special processing of files based on
their extension.