S3 export (untested)

It opens an HTTP connection per exported file, but then so does git
annex copy --to s3.

Decided not to munge exported filenames for IA; there is too large a chance
of the munging producing confusing results. Instead, exporting files whose
names IA does not support (e.g. names containing spaces) will fail.

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2017-09-08 15:41:31 -04:00
parent a1b195d84c
commit 44cd5ae313
GPG key ID: DB12DB0FF05F8F38
5 changed files with 121 additions and 64 deletions


@ -55,31 +55,14 @@ from it. Also, git-annex whereis will tell you a public url for the file
on archive.org. (It may take a while for archive.org to make the file
publicly visible.)
Note the use of the SHA256E [[backend|backends]] when adding files. That is
the default backend used by git-annex, but even if you don't normally use
it, it makes most sense to use the WORM or SHA256E backend for files that
will be stored in the Internet Archive, since the key name will be exposed
as the filename there, and since the Archive does special processing of
files based on their extension.
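The following minimal shell sketch shows what "the key name will be exposed as the filename" looks like in practice by approximating the SHA256E key name for a file. `sha256e_key` is a hypothetical helper and is not part of git-annex. It assumes GNU coreutils' `sha256sum` and keeps only the final extension, while git-annex applies its own rules to long or multi-part extensions, so real key names can differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of git-annex): approximates the SHA256E key
# name for a file. Simplified: keeps only the final extension; git-annex has
# its own rules for long or multi-part extensions such as .tar.gz.
sha256e_key() {
    local f=$1
    local size hash base ext=""
    size=$(( $(wc -c < "$f") ))              # byte count, padding stripped
    hash=$(sha256sum "$f" | cut -d' ' -f1)   # hex SHA-256 digest (GNU coreutils)
    base=${f##*/}                            # drop leading directories
    if [[ $base == *.* ]]; then
        ext=".${base##*.}"                   # keep only the final extension
    fi
    printf 'SHA256E-s%s--%s%s\n' "$size" "$hash" "$ext"
}
```

For example, `sha256e_key photo.jpg` prints a name of the form `SHA256E-s<size>--<sha256>.jpg`. The trailing `.jpg` is the part that the Archive's extension-based processing can still see, which is why an "E" backend is preferred here.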
## exporting trees
## publishing only one subdirectory
By default, files stored in the Internet Archive will show up there named
by their git-annex key, not the original filename. If the filenames
are important, you can run `git annex initremote` with an additional
parameter "exporttree=yes", and then use [[git-annex-export]] to publish
a tree of files to the Internet Archive.
Perhaps you have a repository with lots of files in it, and only want
to publish some of them to a particular Internet Archive item. Of course
you can specify which files to send manually, but it's useful to
configure [[preferred_content]] settings so git-annex knows what content
you want to store in the Internet Archive.
One way to do this is to use the "public" repository type.
    git annex enableremote archive-panama preferreddir=panama
    git annex wanted archive-panama standard
    git annex group archive-panama public
Now anything in a "panama" directory will be sent to that remote,
and anything else won't. You can use `git annex copy --auto` or the
assistant and it'll do the right thing.
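To preview which files would count as being "in a panama directory", you can use a rough shell approximation of that matching. `in_preferred_dir` is a hypothetical filter and does not reproduce git-annex's implementation. It assumes that a matching directory may appear at any depth in the tree. git-annex itself evaluates the preferred content expression when copying, so treat this only as a preview.

```shell
#!/usr/bin/env bash
# Hypothetical filter (an approximation, not git-annex's matcher): reads paths
# one per line on stdin and prints those inside a directory named $1 at any
# depth. A file merely named $1, or $1.txt, does not match.
in_preferred_dir() {
    local dir=$1 path
    while IFS= read -r path; do
        # Prefix "/" so a top-level "$dir/..." matches the same pattern.
        case "/$path" in
            */"$dir"/*) printf '%s\n' "$path" ;;
        esac
    done
}
```

For example, `git ls-files | in_preferred_dir panama` lists tracked files under any `panama` directory.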
When setting up an Internet Archive item using the webapp, this
configuration is automatically done, using an item name that the user
enters as the name of the subdirectory.
Note that the Internet Archive does not support filenames containing
whitespace and some other characters. Exporting files with such names will
fail; rename the file and re-export.
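Before exporting, you can find the names that will fail with a small shell check. `find_whitespace_names` is a hypothetical helper, not part of git-annex. It checks only for whitespace, because the note names whitespace but does not list the other characters the Internet Archive rejects, so passing this check does not guarantee that an export will succeed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-export check: reads paths one per line on stdin and prints
# each one containing whitespace. Deliberately incomplete: the other characters
# the Internet Archive rejects are not checked.
find_whitespace_names() {
    local path
    while IFS= read -r path; do
        if [[ $path =~ [[:space:]] ]]; then
            printf '%s\n' "$path"
        fi
    done
}
```

For example, `git ls-files | find_whitespace_names` lists tracked files with spaces in their names. git quotes names containing tabs or other control characters in that output, so such names need separate handling.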