S3: Allow removing files from IA, but warn about derived versions potentially still existing there.

Removal works; only derived versions are a potential issue, so allow removing
with a warning. This way, unexporting a file works, and behavior is
consistent with IA remotes whether or not exporttree=yes.

Also tested exporting filenames containing unicode, spaces, and underscores.
All worked, despite the IA's FAQ saying such filenames are not supported.

This commit was sponsored by Trenton Cronholm on Patreon.
Joey Hess 2017-09-12 12:33:08 -04:00
parent 7f0e2a4685
commit 267f47c473
GPG key ID: DB12DB0FF05F8F38
4 changed files with 33 additions and 23 deletions

@@ -11,9 +11,10 @@ comply with their [terms of service](http://www.archive.org/about/terms.php).
A nice added feature is that whenever git-annex sends a file to the
Internet Archive, it records its url, the same as if you'd run `git annex
addurl`. So any users who can clone your repository can download the files
from archive.org, without needing any login or password info. This makes
the Internet Archive a nice way to publish the large files associated with
a public git repository.
from archive.org, without needing any login or password info.
The url to the content in the Internet Archive is also displayed by
`git annex whereis`. This makes the Internet Archive a nice way to
publish the large files associated with a public git repository.
## webapp setup
@@ -50,10 +51,15 @@ Then you can annex files and copy them to the remote as usual:
# git annex copy photo1.jpeg --fast --to archive-panama
copy (to archive-panama...) ok
Once a file has been stored on archive.org, it cannot be (easily) removed
from it. Also, git-annex whereis will tell you a public url for the file
on archive.org. (It may take a while for archive.org to make the file
publicly visible.)
It may take a while for archive.org to make files publicly visible after
they've been uploaded.
## removing files
While files can be removed from the Internet Archive,
[derived versions](https://archive.org/help/derivatives.php)
of some files may continue to be stored there after the originals
have been removed. git-annex warns about this when removing files.
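As a minimal sketch of removing a file's content from the Internet Archive, assuming the `archive-panama` remote and `photo1.jpeg` file from the example above, the usual `git annex drop --from` command applies:

```sh
# Remove the content of photo1.jpeg from the archive-panama remote
# (names reused from the copy example above). Derived versions that
# archive.org generated from it may remain there afterwards.
git annex drop --from archive-panama photo1.jpeg
```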
## exporting trees
@@ -63,6 +69,7 @@ are important, you can run `git annex initremote` with an additional
parameter "exporttree=yes", and then use [[git-annex-export]] to publish
a tree of files to the Internet Archive.
Note that the Internet Archive does not support filenames containing
whitespace and some other characters. Exporting such problem filenames will
fail; you can rename the file and re-export.
Note that the Internet Archive may not support certain characters
in filenames ([see FAQ](http://archive.org/about/faqs.php#1099)).
If exporting a filename fails due to such limitations, you will need
to rename it in your git-annex repository in order to export it.
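A rough sketch of that rename-and-re-export cycle follows. It assumes a hypothetical remote named `archive-export` that was initialized with exporttree=yes, a hypothetical file `problem name.jpeg`, and that the exported tree is the `master` branch:

```sh
# Rename the file that failed to export (hypothetical filenames).
git mv "problem name.jpeg" problem-name.jpeg
git commit -m "rename file for export to the Internet Archive"
# Export the updated tree again; the export remote (hypothetical name)
# is updated to match the tree, including the renamed file.
git annex export master --to archive-export
```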

@@ -29,8 +29,6 @@ Work is in progress. Todo list:
Would need git-annex sync to export to the master tree?
This is similar to the little-used preferreddir= preferred content
setting and the "public" repository group.
* Test export to IA via S3. In particular, does removing an exported file
work?
Low priority: