reorg
parent 66fa4c947c
commit 617bdc740f
12 changed files with 5 additions and 18 deletions
49
doc/tips/Internet_Archive_via_S3.mdwn
Normal file
@@ -0,0 +1,49 @@
[The Internet Archive](http://www.archive.org/) allows members to upload
collections using an Amazon S3
[compatible API](http://www.archive.org/help/abouts3.txt), and this can
be used with git-annex's [[special_remotes/S3]] support.

So, you can locally archive things with git-annex, define remotes that
correspond to "items" at the Internet Archive, and use git-annex to upload
your files there. Of course, your use of the Internet Archive must
comply with their [terms of service](http://www.archive.org/about/terms.php).

Sign up for an account, and get your access keys here:
<http://www.archive.org/account/s3.php>

    # export AWS_ACCESS_KEY_ID=blahblah
    # export AWS_SECRET_ACCESS_KEY=xxxxxxx

Specify `host=s3.us.archive.org` when doing `initremote` to set up
a remote at the Archive. This will enable a special Internet Archive mode:
Encryption is not allowed; you are required to specify a bucket name
rather than having git-annex pick a random one; and you can optionally
specify `x-archive-meta*` headers to add metadata as explained in their
[documentation](http://www.archive.org/help/abouts3.txt).

[[!template id=note text="""
/!\ There seems to be a bug in either hS3 or the archive that breaks
authentication when the bucket name contains spaces or upper-case letters.
Use all lowercase and no spaces when making the bucket with `initremote`.
"""]]
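Since a bad bucket name only shows up later as an authentication failure, it can help to check the name before running `initremote`. The following is a minimal sketch, assuming bash; `valid_bucket_name` is a hypothetical helper, not something git-annex provides, and it only checks the two conditions mentioned in the note.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of git-annex): succeeds only when the
# bucket name is non-empty and has no whitespace or upper-case letters.
valid_bucket_name() {
    case "$1" in
        ''|*[[:space:]]*|*[[:upper:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in panama-canal-lock-blueprints "Panama Canal"; do
    if valid_bucket_name "$name"; then
        echo "ok: $name"
    else
        echo "rejected: $name"
    fi
done
```

The first name is accepted and the second is rejected, so a wrapper script could refuse to call `initremote` for the latter.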

    # git annex initremote archive-panama type=S3 \
        host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
        x-archive-meta-mediatype=texts x-archive-meta-language=eng \
        x-archive-meta-title="original Panama Canal lock design blueprints"
    initremote archive-panama (Internet Archive mode) ok
    # git annex describe archive-panama "a man, a plan, a canal: panama"
    describe archive-panama ok

Then you can annex files and copy them to the remote as usual:

    # git annex add photo1.jpeg --backend=SHA1E
    add photo1.jpeg (checksum...) ok
    # git annex copy photo1.jpeg --fast --to archive-panama
    copy (to archive-panama...) ok

Note the use of the SHA1E [[backend|backends]]. It makes most sense
to use the WORM or SHA1E backend for files that will be stored in
the Internet Archive, since the key name will be exposed as the filename
there, and since the Archive does special processing of files based on
their extension.
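To see why the extension matters, here is a rough sketch of how a SHA1E-style key name could be put together. The `sha1e_key` function is hypothetical, the exact key layout is an assumption for illustration, and the sketch assumes GNU `sha1sum` is installed; compare it against a real key in your own annex before relying on it.

```shell
#!/usr/bin/env bash
set -eu

# Assumed layout: backend name, size in bytes, SHA1 checksum, and then
# the file's own extension, which is what the Archive gets to see.
sha1e_key() {
    local file=$1 size sum ext
    size=$(wc -c < "$file" | tr -d ' ')
    sum=$(sha1sum "$file" | cut -d' ' -f1)
    ext=${file##*.}
    printf 'SHA1E-s%s--%s.%s\n' "$size" "$sum" "$ext"
}

tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/photo1.jpeg"
key=$(sha1e_key "$tmp/photo1.jpeg")
echo "$key"
rm -rf "$tmp"
```

The printed name still ends in `.jpeg`, whereas a plain checksum key would carry no hint of the file type.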
16
doc/tips/migrating_data_to_a_new_backend.mdwn
Normal file
@@ -0,0 +1,16 @@
Maybe you started out using the WORM backend, and have now configured
git-annex to use SHA1. But files you added to the annex before still
use the WORM backend. There is a simple command that can migrate that
data:

    # git annex migrate my_cool_big_file
    migrate my_cool_big_file (checksum...) ok

You can only migrate files whose content is currently available. Other
files will be skipped.

After migrating a file to a new backend, the old content in the old backend
will still be present. That is necessary because multiple files
can point to the same content. The `git annex unused` subcommand can be
used to clear up that detritus later. Note that hard links are used,
to avoid wasting disk space.
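The point about hard links can be seen with plain shell tools. This is only an illustration of hard link behaviour, assuming bash and a filesystem that supports hard links; the names `old-key` and `new-key` are made up and do not reflect git-annex's real object layout.

```shell
#!/usr/bin/env bash
set -eu

tmp=$(mktemp -d)
printf 'big file content\n' > "$tmp/old-key"

# A hard link is a second name for the same data on disk, so the
# content is not stored twice.
ln "$tmp/old-key" "$tmp/new-key"
if [ "$tmp/old-key" -ef "$tmp/new-key" ]; then
    same_data=yes
else
    same_data=no
fi
echo "same data: $same_data"

# Removing the old name leaves the content reachable under the new one.
rm "$tmp/old-key"
survivor=$(cat "$tmp/new-key")
echo "after cleanup: $survivor"
rm -rf "$tmp"
```

This is why cleaning up the old keys later does not cost you the migrated content.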
36
doc/tips/powerful_file_matching.mdwn
Normal file
@@ -0,0 +1,36 @@
git-annex has a powerful syntax for making it act on only certain files.

The simplest thing is to exclude some files, using wild cards:

    git annex get --exclude '*.mp3' --exclude '*.ogg'

But you can also exclude files that git-annex's [[location_tracking]]
information indicates are present in a given repository. For example,
if you want to populate newarchive with files, but not those already
on oldarchive, you could do it like this:

    git annex copy --not --in oldarchive --to newarchive

Without the --not, --in makes it act on files that *are* in the specified
repository. So, to drop local copies of files that are also on oldarchive:

    git annex drop --in oldarchive

Or maybe you're curious which files have a lot of copies, and then
also want to know which files have only one copy:

    git annex find --copies 7
    git annex find --not --copies 2

The above are the simple examples of specifying what files git-annex
should act on. But you can specify anything you can dream up by combining
the things above, with --and --or -( and -). Those last two strange-looking
options are parentheses, for grouping other options. You will probably
have to escape them from your shell.
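If the escaping looks mysterious, this small sketch shows what the invoked program actually receives. It assumes bash; `show_args` is a hypothetical stand-in for any command, and no git-annex call is made.

```shell
#!/usr/bin/env bash
# Print each argument on its own line, as the invoked program would see it.
show_args() { printf '%s\n' "$@"; }

backslashed=$(show_args -\( --in usbdrive -\))
quoted=$(show_args '-(' --in usbdrive '-)')

echo "$backslashed"
[ "$backslashed" = "$quoted" ] && echo "both spellings pass the same arguments"
```

Backslashes and single quotes are interchangeable here; either way the program sees a plain `-(` and `-)`.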

Here are the mp3 files that are in either of two repositories, but have
fewer than 3 copies:

    git annex find --not --exclude '*.mp3' --and \
        -\( --in usbdrive --or --in archive -\) --and \
        --not --copies 3
19
doc/tips/recover_data_from_lost+found.mdwn
Normal file
@@ -0,0 +1,19 @@
Suppose something goes wrong, and fsck puts all the files in lost+found.
It's actually very easy to recover from this disaster.

First, check out the git repository again. Then, in the new checkout:

    $ mkdir recovered-content
    $ sudo mv ../lost+found/* recovered-content
    $ sudo chown you:you recovered-content
    $ chmod -R u+w recovered-content
    $ git annex add recovered-content
    $ git rm recovered-content
    $ git commit -m "recovered some content"
    $ git annex fsck

This works because when git-annex adds the same content that was in
the repository before, all the old links to that content start working
again. This works particularly well if the SHA* backends are used, but even
with the default backend it will work pretty well, as long as fsck
preserved the modification time of the files.
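The reason the old links start working again can be shown with a toy model. This sketch assumes bash and GNU `sha1sum`; the `objects` directory and file names are invented for illustration and are not git-annex's real layout. It only demonstrates the idea that content stored under its checksum ends up at the same place when it is added a second time.

```shell
#!/usr/bin/env bash
set -eu

tmp=$(mktemp -d)
mkdir "$tmp/objects"

# Toy model: store content under its SHA1 checksum and point a symlink
# at it, roughly as an annexed file points at its content.
printf 'precious data\n' > "$tmp/original"
key=$(sha1sum "$tmp/original" | cut -d' ' -f1)
mv "$tmp/original" "$tmp/objects/$key"
ln -s "objects/$key" "$tmp/my_file"

# Simulate the disaster: the content is displaced and the link dangles.
mv "$tmp/objects/$key" "$tmp/recovered-junk-name"
[ -e "$tmp/my_file" ] && before=working || before=broken

# Re-adding the recovered content yields the same checksum, so it lands
# at the same key and the old link resolves again.
newkey=$(sha1sum "$tmp/recovered-junk-name" | cut -d' ' -f1)
mv "$tmp/recovered-junk-name" "$tmp/objects/$newkey"
[ -e "$tmp/my_file" ] && after=working || after=broken

echo "before: $before, after: $after"
rm -rf "$tmp"
```

Nothing about the link itself had to change; only the content had to come back under the same key.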
28
doc/tips/untrusted_repositories.mdwn
Normal file
@@ -0,0 +1,28 @@
Suppose you have a USB thumb drive and are using it as a git annex
repository. You don't trust the drive, because you could lose it, or
accidentally run it through the laundry. Or, maybe you have a drive that
you know is dying, and you'd like to be warned if there are any files
on it not backed up somewhere else. Maybe the drive has already died
or been lost.

You can let git-annex know that you don't trust a repository, and it will
adjust its behavior to avoid relying on that repository's continued
availability.

    # git annex untrust usbdrive
    untrust usbdrive ok

Now when you do a fsck, you'll be warned appropriately:

    # git annex fsck .
    fsck my_big_file
      Only these untrusted locations may have copies of this file!
        05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive
      Back it up to trusted locations with git-annex copy.
    failed

Also, git-annex will refuse to drop a file from elsewhere just because
it can see a copy on the untrusted repository.

It's also possible to tell git-annex that you have an unusually high
level of trust for a repository. See [[trust]] for details.
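The drop rule described above can be pictured with a toy model. This is a simplified sketch, assuming bash, and not git-annex's actual implementation: each copy is written as a made-up `repository:trustlevel` pair, and copies on untrusted repositories simply do not count toward the number needed. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of the check, not git-annex's implementation. Each copy is
# written as "repository:trustlevel"; untrusted copies are not counted.
count_reliable_copies() {
    local n=0 entry
    for entry in "$@"; do
        case "${entry##*:}" in
            untrusted) ;;
            *) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# safe_to_drop NEEDED COPY...  succeeds when enough reliable copies remain.
safe_to_drop() {
    local needed=$1
    shift
    [ "$(count_reliable_copies "$@")" -ge "$needed" ]
}

if safe_to_drop 1 usbdrive:untrusted; then echo "drop ok"; else echo "drop refused"; fi
if safe_to_drop 1 usbdrive:untrusted server:semitrusted; then echo "drop ok"; else echo "drop refused"; fi
```

With only the untrusted thumb drive holding a copy the drop is refused; once another repository has the file, it goes through.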
37
doc/tips/using_Amazon_S3.mdwn
Normal file
@@ -0,0 +1,37 @@
git-annex extends git's usual remotes with some [[special_remotes]] that
are not git repositories. This way you can set up a remote using, say,
Amazon S3, and use git-annex to transfer files into the cloud.

First, export your S3 credentials:

    # export ANNEX_S3_ACCESS_KEY_ID="08TJMT99S3511WOZEP91"
    # export ANNEX_S3_SECRET_ACCESS_KEY="s3kr1t"

Now, create a gpg key, if you don't already have one. This will be used
to encrypt everything stored in S3, for your privacy. Once you have
a gpg key, run `gpg --list-secret-keys` to look up its key id, something
like "2512E3C7".

Next, create the S3 remote, and describe it.

    # git annex initremote cloud type=S3 encryption=2512E3C7
    initremote cloud (encryption setup with gpg key C910D9222512E3C7) (checking bucket) (creating bucket in US) (gpg) ok
    # git annex describe cloud "at Amazon's US datacenter"
    describe cloud ok

The configuration for the S3 remote is stored in git. So making another
repository use the same S3 remote is easy:

    # cd /media/usb/annex
    # git pull laptop
    # git annex initremote cloud
    initremote cloud (gpg) (checking bucket) ok

Now the remote can be used like any other remote.

    # git annex copy my_cool_big_file --to cloud
    copy my_cool_big_file (gpg) (checking cloud...) (to cloud...) ok
    # git annex move video/hackity_hack_and_kaxxt.mov --to cloud
    move video/hackity_hack_and_kaxxt.mov (checking cloud...) (to cloud...) ok

See [[special_remotes/S3]] for details.
11
doc/tips/using_the_SHA1_backend.mdwn
Normal file
@@ -0,0 +1,11 @@
A handy alternative to the default [[backend|backends]] is the
SHA1 backend. This backend provides more git-style assurance that your data
has not been damaged. And the checksum means that when you add the same
content to the annex twice, only one copy need be stored in the backend.

The only reason it's not the default is that it needs to checksum
files when they're added to the annex, and this can slow things down
significantly for really big files. To make SHA1 the default, just
add something like this to `.gitattributes`:

    * annex.backend=SHA1
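The "only one copy" claim follows directly from how checksums behave, which this sketch demonstrates outside of git-annex. It assumes bash and GNU `sha1sum`; the file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
set -eu

tmp=$(mktemp -d)
printf 'same bytes\n' > "$tmp/copy1"
printf 'same bytes\n' > "$tmp/copy2"
printf 'other bytes\n' > "$tmp/different"

# Identical content gives an identical checksum regardless of file name,
# which is why a checksum-named store needs to keep that content only once.
sum1=$(sha1sum "$tmp/copy1" | cut -d' ' -f1)
sum2=$(sha1sum "$tmp/copy2" | cut -d' ' -f1)
sum3=$(sha1sum "$tmp/different" | cut -d' ' -f1)

unique=$(printf '%s\n' "$sum1" "$sum2" "$sum3" | sort -u | wc -l | tr -d ' ')
echo "3 files, $unique distinct checksums"
rm -rf "$tmp"
```

Three files go in, but only two distinct checksums come out, so only two pieces of content would need storing.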
32
doc/tips/using_the_web.mdwn
Normal file
@@ -0,0 +1,32 @@
The web can be used as a [[special_remote|special_remotes]] too.

    # git annex addurl http://example.com/video.mpeg
    addurl example.com_video.mpeg (downloading http://example.com/video.mpeg)
    ########################################################## 100.0%
    ok

Now the file is downloaded, and has been added to the annex like any other
file. So it can be renamed, copied to other repositories, and so on.

Note that git-annex assumes that, if the web site does not 404, the file is
still present on the web, and this counts as one [[copy|copies]] of the
file. So it will let you remove your last copy, trusting it can be
downloaded again:

    # git annex drop example.com_video.mpeg
    drop example.com_video.mpeg (checking http://example.com/video.mpeg) ok

If you don't [[trust]] the web to this degree, just let git-annex know:

    # git annex untrust web
    untrust web ok

With the result that it will hang onto files:

    # git annex drop example.com_video.mpeg
    drop example.com_video.mpeg (unsafe)
      Could only verify the existence of 0 out of 1 necessary copies
      Also these untrusted repositories may contain the file:
        00000000-0000-0000-0000-000000000001 -- web
      (Use --force to override this check, or adjust annex.numcopies.)
    failed
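The file name in the `addurl` output above appears to be derived from the URL. The following is a guess at that naming, assuming bash: drop the scheme, then replace each `/` with `_`. The function `url_to_filename` is hypothetical, and the real rule in git-annex may handle more cases than this.

```shell
#!/usr/bin/env bash
# Guess at the naming seen in the example output: drop the scheme, then
# replace each "/" with "_". The real rule may handle more cases.
url_to_filename() {
    local rest=${1#*://}
    printf '%s\n' "${rest//\//_}"
}

url_to_filename http://example.com/video.mpeg
```

For the example URL this prints `example.com_video.mpeg`, matching the name shown in the `addurl` output.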
19
doc/tips/what_to_do_when_you_lose_a_repository.mdwn
Normal file
@@ -0,0 +1,19 @@
So you lost a thumb drive containing a git-annex repository. Or a hard
drive died, or some other misfortune has befallen your data.

Unless you configured backups, git-annex can't get your data back. But it
can help you deal with the loss.

First, go somewhere that knows about the lost repository, and mark it as
untrusted.

    git annex untrust usbdrive

To remind yourself later what happened, you can change its description, too:

    git annex describe usbdrive "USB drive lost in Timbuktu. Probably gone forever."

This retains the [[location_tracking]] information for the repository.
Maybe you'll find the drive later. Maybe that's impossible. Either way,
this lets git-annex tell you why a file is no longer accessible, and
it keeps git-annex from relying on that drive to hold any content.