1332e6cec0
In a test, I uploaded a pdf, and several files were derived from it. After removing the pdf, the derived files went away after approximatly half an hour. This window does not seem worth warning about every time. Documented it in the tip.
75 lines
3.2 KiB
Markdown
75 lines
3.2 KiB
Markdown
[The Internet Archive](http://www.archive.org/) allows members to upload
|
|
collections using an Amazon S3
|
|
[compatible API](http://www.archive.org/help/abouts3.txt), and this can
|
|
be used with git-annex's [[special_remotes/S3]] support.
|
|
|
|
So, you can locally archive things with git-annex, define remotes that
|
|
correspond to "items" at the Internet Archive, and use git-annex to upload
|
|
your files to there. Of course, your use of the Internet Archive must
|
|
comply with their [terms of service](http://www.archive.org/about/terms.php).
|
|
|
|
A nice added feature is that whenever git-annex sends a file to the
|
|
Internet Archive, it records its url, the same as if you'd run `git annex
|
|
addurl`. So any users who can clone your repository can download the files
|
|
from archive.org, without needing any login or password info.
|
|
The url to the content in the Internet Archive is also displayed by
|
|
`git annex whereis`. This makes the Internet Archive a nice way to
|
|
publish the large files associated with a public git repository.
|
|
|
|
## webapp setup
|
|
|
|
Just go to "Add Another Repository", pick "Internet Archive",
|
|
and you're on your way.
|
|
|
|
## basic setup
|
|
|
|
Sign up for an account, and get your access keys here:
|
|
<http://www.archive.org/account/s3.php>
|
|
|
|
# export AWS_ACCESS_KEY_ID=blahblah
|
|
# export AWS_SECRET_ACCESS_KEY=xxxxxxx
|
|
|
|
Specify `host=s3.us.archive.org` when doing `initremote` to set up
|
|
a remote at the Archive. This will enable a special Internet Archive mode:
|
|
Encryption is not allowed; you are required to specify a bucket name
|
|
rather than having git-annex pick a random one; and you can optionally
|
|
specify `x-archive-meta*` headers to add metadata as explained in their
|
|
[documentation](http://www.archive.org/help/abouts3.txt).
|
|
|
|
# git annex initremote archive-panama type=S3 \
|
|
host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
|
|
x-archive-meta-mediatype=texts x-archive-meta-language=eng \
|
|
x-archive-meta-title="original Panama Canal lock design blueprints"
|
|
initremote archive-panama (Internet Archive mode) ok
|
|
# git annex describe archive-panama "a man, a plan, a canal: panama"
|
|
describe archive-panama ok
|
|
|
|
Then you can annex files and copy them to the remote as usual:
|
|
|
|
# git annex add photo1.jpeg --backend=SHA256E
|
|
add photo1.jpeg (checksum...) ok
|
|
# git annex copy photo1.jpeg --fast --to archive-panama
|
|
copy (to archive-panama...) ok
|
|
|
|
## update lag
|
|
|
|
It may take a while for archive.org to make files publically visible after
|
|
they've been uploaded.
|
|
|
|
While files can be removed from the Internet Archive,
|
|
[derived versions](https://archive.org/help/derivatives.php)
|
|
of some files may continued to be stored there for a while
|
|
after the originals were removed.
|
|
|
|
## exporting trees
|
|
|
|
By default, files stored in the Internet Archive will show up there named
|
|
by their git-annex key, not the original filename. If the filenames
|
|
are important, you can run `git annex initremote` with an additional
|
|
parameter "exporttree=yes", and then use [[git-annex-export]] to publish
|
|
a tree of files to the Internet Archive.
|
|
|
|
Note that the Internet Archive may not support certian characters
|
|
in filenames ([see FAQ](http://archive.org/about/faqs.php#1099)).
|
|
If exporting a filename fails due to such limitations, you would need
|
|
to rename it in your git annex repository in order to export it.
|