split the walkthrough and inline back together
parent
9e49a71282
commit
98e246b49b
20 changed files with 386 additions and 420 deletions
|
@ -2,423 +2,24 @@ A walkthrough of the basic features of git-annex.
|
||||||
|
|
||||||
[[!toc]]
|
|
||||||
|
|
||||||
## creating a repository
|
[[!inline feeds=no pagenames="""
|
||||||
|
creating_a_repository
|
||||||
This is very straightforward. Just give git-annex a description of the repository.
|
adding_a_remote
|
||||||
|
adding_files
|
||||||
# mkdir ~/annex
|
renaming_files
|
||||||
# cd ~/annex
|
getting_file_content
|
||||||
# git init
|
transferring_files:_When_things_go_wrong
|
||||||
# git annex init "my laptop"
|
removing_files
|
||||||
|
removing_files:_When_things_go_wrong
|
||||||
## adding a remote
|
modifying_annexed_files
|
||||||
|
using_ssh_remotes
|
||||||
Like any other git repository, git-annex repositories have remotes.
|
moving_file_content_between_repositories
|
||||||
Let's start by adding a USB drive as a remote.
|
using_the_URL_backend
|
||||||
|
using_the_SHA1_backend
|
||||||
# sudo mount /media/usb
|
migrating_data_to_a_new_backend
|
||||||
# cd /media/usb
|
unused_data
|
||||||
# git clone ~/annex
|
fsck:_verifying_your_data
|
||||||
# cd annex
|
fsck:_when_things_go_wrong
|
||||||
# git annex init "portable USB drive"
|
backups
|
||||||
# git remote add laptop ~/annex
|
untrusted_repositories
|
||||||
# cd ~/annex
|
"""]]
|
||||||
# git remote add usbdrive /media/usb/annex
|
|
||||||
|
|
||||||
This is all standard ad-hoc distributed git repository setup.
|
|
||||||
The only git-annex specific part is telling it the name
|
|
||||||
of the new repository created on the USB drive.
|
|
||||||
|
|
||||||
Notice that both repos are set up as remotes of one another. This lets
|
|
||||||
either get annexed files from the other. You'll want to do that even
|
|
||||||
if you are using git in a more centralized fashion.
|
|
||||||
|
|
||||||
## adding files
|
|
||||||
|
|
||||||
# cd ~/annex
|
|
||||||
# cp /tmp/big_file .
|
|
||||||
# cp /tmp/debian.iso .
|
|
||||||
# git annex add .
|
|
||||||
add big_file ok
|
|
||||||
add debian.iso ok
|
|
||||||
# git commit -a -m added
|
|
||||||
|
|
||||||
When you add a file to the annex and commit it, only a symlink to
|
|
||||||
the annexed content is committed. The content itself is stored in
|
|
||||||
git-annex's backend.
|
|
||||||
|
|
||||||
## renaming files
|
|
||||||
|
|
||||||
# cd ~/annex
|
|
||||||
# git mv big_file my_cool_big_file
|
|
||||||
# mkdir iso
|
|
||||||
# git mv debian.iso iso/
|
|
||||||
# git commit -m moved
|
|
||||||
|
|
||||||
You can use any normal git operations to move files around, or even
|
|
||||||
make copies or delete them.
|
|
||||||
|
|
||||||
Notice that, since annexed files are represented by symlinks,
|
|
||||||
the symlink will break when the file is moved into a subdirectory.
|
|
||||||
But, git-annex will fix this up for you when you commit --
|
|
||||||
it has a pre-commit hook that watches for and corrects broken symlinks.
|
|
||||||
|
|
||||||
## getting file content
|
|
||||||
|
|
||||||
A repository does not always have all annexed file contents available.
|
|
||||||
When you need the content of a file, you can use "git annex get" to
|
|
||||||
make it available.
|
|
||||||
|
|
||||||
We can use this to copy everything in the laptop's annex to the
|
|
||||||
USB drive.
|
|
||||||
|
|
||||||
# cd /media/usb/annex
|
|
||||||
# git pull laptop master
|
|
||||||
# git annex get .
|
|
||||||
get my_cool_big_file (copying from laptop...) ok
|
|
||||||
get iso/debian.iso (copying from laptop...) ok
|
|
||||||
|
|
||||||
Notice that you had to git pull from laptop first; this lets git-annex know
|
|
||||||
what has changed in laptop, and so it knows about the files present there and
|
|
||||||
can get them.
|
|
||||||
|
|
||||||
## transferring files: When things go wrong
|
|
||||||
|
|
||||||
After a while, you'll have several annexes, with different file contents.
|
|
||||||
You don't have to try to keep all that straight; git-annex does
|
|
||||||
[[location_tracking]] for you. If you ask it to get a file and the drive
|
|
||||||
or file server is not accessible, it will let you know what it needs to get
|
|
||||||
it:
|
|
||||||
|
|
||||||
# git annex get video/hackity_hack_and_kaxxt.mov
|
|
||||||
get video/hackity_hack_and_kaxxt.mov (not available)
|
|
||||||
Unable to access these remotes: usbdrive, server
|
|
||||||
Try making some of these repositories available:
|
|
||||||
5863d8c0-d9a9-11df-adb2-af51e6559a49 -- my home file server
|
|
||||||
58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive
|
|
||||||
ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive
|
|
||||||
failed
|
|
||||||
# sudo mount /media/usb
|
|
||||||
# git annex get video/hackity_hack_and_kaxxt.mov
|
|
||||||
get video/hackity_hack_and_kaxxt.mov (copying from usbdrive...) ok
|
|
||||||
# git commit -a -m "got a video I want to rewatch on the plane"
|
|
||||||
|
|
||||||
## removing files
|
|
||||||
|
|
||||||
You can always drop files safely. Git-annex checks that some other annex
|
|
||||||
has the file before removing it.
|
|
||||||
|
|
||||||
# git annex drop iso/debian.iso
|
|
||||||
drop iso/debian.iso ok
|
|
||||||
# git commit -a -m "freed up space"
|
|
||||||
|
|
||||||
## removing files: When things go wrong
|
|
||||||
|
|
||||||
Before dropping a file, git-annex wants to be able to look at other
|
|
||||||
remotes, and verify that they still have a file. After all, it could
|
|
||||||
have been dropped from them too. If the remotes are not mounted/available,
|
|
||||||
you'll see something like this.
|
|
||||||
|
|
||||||
# git annex drop important_file other.iso
|
|
||||||
drop important_file (unsafe)
|
|
||||||
Could only verify the existence of 0 out of 1 necessary copies
|
|
||||||
Unable to access these remotes: usbdrive
|
|
||||||
Try making some of these repositories available:
|
|
||||||
58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive
|
|
||||||
ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive
|
|
||||||
(Use --force to override this check, or adjust annex.numcopies.)
|
|
||||||
failed
|
|
||||||
drop other.iso (unsafe)
|
|
||||||
Could only verify the existence of 0 out of 1 necessary copies
|
|
||||||
No other repository is known to contain the file.
|
|
||||||
(Use --force to override this check, or adjust annex.numcopies.)
|
|
||||||
failed
|
|
||||||
|
|
||||||
Here you might --force it to drop `important_file` if you [[trust]] your backup.
|
|
||||||
But `other.iso` looks to have never been copied to anywhere else, so if
|
|
||||||
it's something you want to hold onto, you'd need to transfer it to
|
|
||||||
some other repository before dropping it.
|
|
||||||
|
|
||||||
## modifying annexed files
|
|
||||||
|
|
||||||
Normally, the content of files in the annex is prevented from being modified.
|
|
||||||
That's a good thing: it might be the only copy, and you wouldn't
|
|
||||||
want to lose it in a fumblefingered mistake.
|
|
||||||
|
|
||||||
# echo oops > my_cool_big_file
|
|
||||||
bash: my_cool_big_file: Permission denied
|
|
||||||
|
|
||||||
In order to modify a file, it should first be unlocked.
|
|
||||||
|
|
||||||
# git annex unlock my_cool_big_file
|
|
||||||
unlock my_cool_big_file (copying...) ok
|
|
||||||
|
|
||||||
That replaces the symlink that normally points at its content with a copy
|
|
||||||
of the content. You can then modify the file like any regular file. Because
|
|
||||||
it is a regular file.
|
|
||||||
|
|
||||||
(If you decide you don't need to modify the file after all, or want to discard
|
|
||||||
modifications, just use `git annex lock`.)
|
|
||||||
|
|
||||||
When you `git commit`, git-annex's pre-commit hook will automatically
|
|
||||||
notice that you are committing an unlocked file, and add its new content
|
|
||||||
to the annex. The file will be replaced with a symlink to the new content,
|
|
||||||
and this symlink is what gets committed to git in the end.
|
|
||||||
|
|
||||||
# echo "now smaller, but even cooler" > my_cool_big_file
|
|
||||||
# git commit my_cool_big_file -m "changed an annexed file"
|
|
||||||
add my_cool_big_file ok
|
|
||||||
[master 64cda67] changed an annexed file
|
|
||||||
2 files changed, 2 insertions(+), 1 deletions(-)
|
|
||||||
create mode 100644 .git-annex/WORM:1289672605:30:file.log
|
|
||||||
|
|
||||||
There is one problem with using `git commit` like this: Git wants to first
|
|
||||||
stage the entire contents of the file in its index. That can be slow for
|
|
||||||
big files (sorta why git-annex exists in the first place). So, the
|
|
||||||
automatic handling on commit is a nice safety feature, since it prevents
|
|
||||||
the file content being accidentally committed into git. But when working with
|
|
||||||
big files, it's faster to explicitly add them to the annex yourself
|
|
||||||
before committing.
|
|
||||||
|
|
||||||
# echo "now smaller, but even cooler yet" > my_cool_big_file
|
|
||||||
# git annex add my_cool_big_file
|
|
||||||
add my_cool_big_file ok
|
|
||||||
# git commit my_cool_big_file -m "changed an annexed file"
|
|
||||||
|
|
||||||
## using ssh remotes
|
|
||||||
|
|
||||||
So far in this walkthrough, git-annex has been used with a remote
|
|
||||||
repository on a USB drive. But it can also be used with a git remote
|
|
||||||
that is truly remote: a host accessed by ssh.
|
|
||||||
|
|
||||||
Say you have a desktop on the same network as your laptop and want
|
|
||||||
to clone the laptop's annex to it:
|
|
||||||
|
|
||||||
# git clone ssh://mylaptop/home/me/annex ~/annex
|
|
||||||
# cd ~/annex
|
|
||||||
# git annex init "my desktop"
|
|
||||||
|
|
||||||
Now you can get files and they will be transferred (using `rsync`):
|
|
||||||
|
|
||||||
# git annex get my_cool_big_file
|
|
||||||
get my_cool_big_file (getting UUID for origin...) (copying from origin...)
|
|
||||||
WORM:1285650548:2159:my_cool_big_file 100% 2159 2.1KB/s 00:00
|
|
||||||
ok
|
|
||||||
|
|
||||||
When you drop files, git-annex will ssh over to the remote and make
|
|
||||||
sure the file's content is still there before removing it locally:
|
|
||||||
|
|
||||||
# git annex drop my_cool_big_file
|
|
||||||
drop my_cool_big_file (checking origin..) ok
|
|
||||||
|
|
||||||
Note that normally git-annex prefers to use non-ssh remotes, like
|
|
||||||
a USB drive, before ssh remotes. They are assumed to be faster/cheaper to
|
|
||||||
access, if available. There is an annex-cost setting you can configure in
|
|
||||||
`.git/config` to adjust which repositories it prefers. See
|
|
||||||
[[the_man_page|git-annex]] for details.
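As a rough sketch of the cost setting, assuming it is stored per remote under the key `remote.<name>.annex-cost` (check the man page for the exact key and defaults in your version), plain `git config` can write it and read it back. The values 50 and 300 are arbitrary illustrations, the repository is a throwaway one, and neither remote is ever contacted.

```shell
#!/bin/sh
# Sketch only: sets an assumed per-remote cost key in a scratch repository.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Register two remotes; adding a remote does not connect to it.
git remote add usbdrive /media/usb/annex
git remote add origin ssh://mylaptop/home/me/annex

# Illustrative costs: a lower number is meant to be preferred.
git config remote.usbdrive.annex-cost 50
git config remote.origin.annex-cost 300

usb_cost=$(git config --get remote.usbdrive.annex-cost)
ssh_cost=$(git config --get remote.origin.annex-cost)
echo "usbdrive cost: $usb_cost, origin cost: $ssh_cost"
```

The same keys can be edited by hand in `.git/config`; `git config` is just a convenient way to avoid typos.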
|
|
||||||
|
|
||||||
Also, note that you need full shell access for this to work --
|
|
||||||
git-annex needs to be able to ssh in and run commands.
|
|
||||||
|
|
||||||
## moving file content between repositories
|
|
||||||
|
|
||||||
Often you will want to move some file contents from a repository to some
|
|
||||||
other one. For example, your laptop's disk is getting full; time to move
|
|
||||||
some files to an external disk before moving another file from a file
|
|
||||||
server to your laptop. Doing that by hand (by using `git annex get` and
|
|
||||||
`git annex drop`) is possible, but a bit of a pain. `git annex move`
|
|
||||||
makes it very easy.
|
|
||||||
|
|
||||||
# git annex move my_cool_big_file --to usbdrive
|
|
||||||
move my_cool_big_file (moving to usbdrive...) ok
|
|
||||||
# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
|
|
||||||
move video/hackity_hack_and_kaxxt.mov (moving from fileserver...)
|
|
||||||
WORM:1274316523:86050597:hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
|
|
||||||
ok
|
|
||||||
|
|
||||||
## using the URL backend
|
|
||||||
|
|
||||||
git-annex has multiple key-value [[backends]]. So far this walkthrough has
|
|
||||||
demonstrated the default, WORM (Write Once, Read Many) backend.
|
|
||||||
|
|
||||||
Another handy backend is the URL backend, which can fetch a file's content
|
|
||||||
from remote URLs. Here's how to set up some files in your repository
|
|
||||||
that use this backend:
|
|
||||||
|
|
||||||
# git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
|
|
||||||
fromkey somefile ok
|
|
||||||
# git commit -m "added a file from the Internet Archive"
|
|
||||||
|
|
||||||
Now if you ask git-annex to get that file, it will download it,
|
|
||||||
and cache it locally.
|
|
||||||
|
|
||||||
# git annex get somefile
|
|
||||||
get somefile (downloading)
|
|
||||||
#########################################################################100.0%
|
|
||||||
ok
|
|
||||||
|
|
||||||
You can always drop files downloaded by the URL backend. It is assumed
|
|
||||||
that the URL is stable; no local backup is kept.
|
|
||||||
|
|
||||||
# git annex drop somefile
|
|
||||||
drop somefile (ok)
|
|
||||||
|
|
||||||
## using the SHA1 backend
|
|
||||||
|
|
||||||
Another handy alternative to the default [[backend|backends]] is the
|
|
||||||
SHA1 backend. This backend provides more git-style assurance that your data
|
|
||||||
has not been damaged. And the checksum means that when you add the same
|
|
||||||
content to the annex twice, only one copy need be stored in the backend.
|
|
||||||
|
|
||||||
The only reason it's not the default is that it needs to checksum
|
|
||||||
files when they're added to the annex, and this can slow things down
|
|
||||||
significantly for really big files. To make SHA1 the default, just
|
|
||||||
add something like this to `.gitattributes`:
|
|
||||||
|
|
||||||
* annex.backend=SHA1
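The deduplication claim can be illustrated without git-annex at all. The sketch below assumes GNU `sha1sum` is installed and uses scratch files with made-up names; it only shows that identical bytes give identical checksums, not how git-annex formats its keys.

```shell
#!/bin/sh
# Sketch only: identical content hashes to the same SHA1, so a
# checksum-keyed store needs to keep just one copy of it.
set -eu
work=$(mktemp -d)

printf 'same bytes\n'  > "$work/a"
printf 'same bytes\n'  > "$work/b"   # different name, same content
printf 'other bytes\n' > "$work/c"

# Read from stdin so the file name does not appear in the output.
sum_a=$(sha1sum < "$work/a" | cut -d' ' -f1)
sum_b=$(sha1sum < "$work/b" | cut -d' ' -f1)
sum_c=$(sha1sum < "$work/c" | cut -d' ' -f1)

if [ "$sum_a" = "$sum_b" ]; then
    echo "a and b share a checksum and could share one stored copy"
fi
if [ "$sum_a" != "$sum_c" ]; then
    echo "c differs and would be stored separately"
fi
```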
|
|
||||||
|
|
||||||
## migrating data to a new backend
|
|
||||||
|
|
||||||
Maybe you started out using the WORM backend, and have now configured
|
|
||||||
git-annex to use SHA1. But files you added to the annex before still
|
|
||||||
use the WORM backend. There is a simple command that can migrate that
|
|
||||||
data:
|
|
||||||
|
|
||||||
# git annex migrate my_cool_big_file
|
|
||||||
migrate my_cool_big_file (checksum...) ok
|
|
||||||
|
|
||||||
You can only migrate files whose content is currently available. Other
|
|
||||||
files will be skipped.
|
|
||||||
|
|
||||||
After migrating a file to a new backend, the old content in the old backend
|
|
||||||
will still be present. That is necessary because multiple files
|
|
||||||
can point to the same content. The `git annex unused` subcommand can be
|
|
||||||
used to clear up that detritus later. Note that hard links are used,
|
|
||||||
to avoid wasting disk space.
|
|
||||||
|
|
||||||
## unused data
|
|
||||||
|
|
||||||
It's possible for data to accumulate in the annex that no files point to
|
|
||||||
anymore. One way it can happen is if you `git rm` a file without
|
|
||||||
first calling `git annex drop`. And, when you modify an annexed file, the old
|
|
||||||
content of the file remains in the annex. Another way is when migrating
|
|
||||||
between backends.
|
|
||||||
|
|
||||||
This might be historical data you want to preserve, so git-annex defaults to
|
|
||||||
preserving it. So from time to time, you may want to check for such data and
|
|
||||||
eliminate it to save space.
|
|
||||||
|
|
||||||
# git annex unused
|
|
||||||
unused (checking for unused data...)
|
|
||||||
Some annexed data is no longer pointed to by any files in the repository.
|
|
||||||
NUMBER KEY
|
|
||||||
1 WORM:1289672605:3:file
|
|
||||||
2 WORM:1289672605:14:file
|
|
||||||
(To see where data was previously used, try: git log --stat -S'KEY')
|
|
||||||
(To remove unwanted data: git-annex dropunused NUMBER)
|
|
||||||
ok
|
|
||||||
|
|
||||||
After running `git annex unused`, you can follow the instructions to examine
|
|
||||||
the history of files that used the data, and if you decide you don't need that
|
|
||||||
data anymore, you can easily remove it:
|
|
||||||
|
|
||||||
# git annex dropunused 1
|
|
||||||
dropunused 1 ok
|
|
||||||
|
|
||||||
Hint: To drop a lot of unused data, use a command like this:
|
|
||||||
|
|
||||||
# git annex dropunused `seq 1 1000`
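The `git log --stat -S'KEY'` hint printed by `git annex unused` relies on git's pickaxe search. Here is a minimal sketch of how that search behaves, using a throwaway repository in which a plain file named `pointer` stands in for an annexed symlink; the helper `git_c` and the commit identity are hypothetical.

```shell
#!/bin/sh
# Sketch only: find the commits that introduced or removed a key string.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical helper: supply an identity so commits work in a bare sandbox.
git_c() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

key='WORM:1289672605:3:file'
echo "$key" > pointer
git add pointer
git_c commit -q -m "add pointer"

echo 'WORM:1289672605:14:file' > pointer   # content changed, old key gone
git_c commit -q -a -m "modify pointer"

# -S lists commits where the number of occurrences of the string changed.
git log --format=%s -S"$key"
count=$(git log --format=%s -S"$key" | wc -l | tr -d ' ')
echo "commits touching the old key: $count"
```

Both commits are reported: the one that added the key and the one that replaced it, which is usually enough to decide whether the old content still matters.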
|
|
||||||
|
|
||||||
## fsck: verifying your data
|
|
||||||
|
|
||||||
You can use the fsck subcommand to check for problems in your data.
|
|
||||||
What can be checked depends on the [[backend|backends]] you've used to store
|
|
||||||
the data. For example, when you use the SHA1 backend, fsck will verify that
|
|
||||||
the checksums of your files are good. Fsck also checks that the annex.numcopies
|
|
||||||
setting is satisfied for all files.
|
|
||||||
|
|
||||||
# git annex fsck
|
|
||||||
unused (checking for unused data...) ok
|
|
||||||
fsck my_cool_big_file (checksum...) ok
|
|
||||||
...
|
|
||||||
|
|
||||||
You can also specify the files to check. This is particularly useful if
|
|
||||||
you're using SHA1 and don't want to spend a long time checksumming everything.
|
|
||||||
|
|
||||||
# git annex fsck my_cool_big_file
|
|
||||||
fsck my_cool_big_file (checksum...) ok
|
|
||||||
|
|
||||||
## fsck: When things go wrong
|
|
||||||
|
|
||||||
Fsck never deletes possibly bad data; instead it will be moved to
|
|
||||||
`.git/annex/bad/` for you to recover. Here is a sample of what fsck
|
|
||||||
might say about a badly messed up annex:
|
|
||||||
|
|
||||||
# git annex fsck
|
|
||||||
fsck my_cool_big_file (checksum...)
|
|
||||||
git-annex: Bad file content; moved to .git/annex/bad/SHA1:7da006579dd64330eb2456001fd01948430572f2
|
|
||||||
git-annex: ** No known copies of the file exist!
|
|
||||||
failed
|
|
||||||
fsck important_file
|
|
||||||
git-annex: Only 1 of 2 copies exist. Run git annex get somewhere else to back it up.
|
|
||||||
failed
|
|
||||||
git-annex: 2 failed
|
|
||||||
|
|
||||||
## backups
|
|
||||||
|
|
||||||
git-annex can be configured to require that more than one copy of a file exists,
|
|
||||||
as a simple backup for your data. This is controlled by the "annex.numcopies"
|
|
||||||
setting, which defaults to 1 copy. Let's change that to require 2 copies,
|
|
||||||
and send a copy of every file to a USB drive.
|
|
||||||
|
|
||||||
# echo "* annex.numcopies=2" >> .gitattributes
|
|
||||||
# git annex copy . --to usbdrive
|
|
||||||
|
|
||||||
Now when we try to `git annex drop` a file, it will verify that it
|
|
||||||
knows of 2 other repositories that have a copy before removing its
|
|
||||||
content from the current repository.
|
|
||||||
|
|
||||||
You can also vary the number of copies needed, depending on the file name.
|
|
||||||
So, if you want 3 copies of all your flac files, but only 1 copy of oggs:
|
|
||||||
|
|
||||||
# echo "*.ogg annex.numcopies=1" >> .gitattributes
|
|
||||||
# echo "*.flac annex.numcopies=3" >> .gitattributes
|
|
||||||
|
|
||||||
Or, you might want to make a directory for important stuff, and configure
|
|
||||||
it so anything put in there is backed up more thoroughly:
|
|
||||||
|
|
||||||
# mkdir important_stuff
|
|
||||||
# echo "* annex.numcopies=3" > important_stuff/.gitattributes
|
|
||||||
|
|
||||||
For more details about the numcopies setting, see [[copies]].
|
|
||||||
|
|
||||||
## untrusted repositories
|
|
||||||
|
|
||||||
Suppose you have a USB thumb drive and are using it as a git annex
|
|
||||||
repository. You don't trust the drive, because you could lose it, or
|
|
||||||
accidentally run it through the laundry. Or, maybe you have a drive that
|
|
||||||
you know is dying, and you'd like to be warned if there are any files
|
|
||||||
on it not backed up somewhere else. Maybe the drive has already died
|
|
||||||
or been lost.
|
|
||||||
|
|
||||||
You can let git-annex know that you don't trust a repository, and it will
|
|
||||||
adjust its behavior to avoid relying on that repository's continued
|
|
||||||
availability.
|
|
||||||
|
|
||||||
# git annex untrust usbdrive
|
|
||||||
untrust usbdrive ok
|
|
||||||
|
|
||||||
Now when you do a fsck, you'll be warned appropriately:
|
|
||||||
|
|
||||||
# git annex fsck .
|
|
||||||
fsck my_big_file
|
|
||||||
Only these untrusted locations may have copies of this file!
|
|
||||||
05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive
|
|
||||||
Back it up to trusted locations with git-annex copy.
|
|
||||||
failed
|
|
||||||
|
|
||||||
Also, git-annex will refuse to drop a file from elsewhere just because
|
|
||||||
it can see a copy on the untrusted repository.
|
|
||||||
|
|
||||||
It's also possible to tell git-annex that you have an unusually high
|
|
||||||
level of trust for a repository. See [[trust]] for details.
|
|
||||||
|
|
19
doc/walkthrough/adding_a_remote.mdwn
Normal file
|
@ -0,0 +1,19 @@
|
||||||
|
Like any other git repository, git-annex repositories have remotes.
|
||||||
|
Let's start by adding a USB drive as a remote.
|
||||||
|
|
||||||
|
# sudo mount /media/usb
|
||||||
|
# cd /media/usb
|
||||||
|
# git clone ~/annex
|
||||||
|
# cd annex
|
||||||
|
# git annex init "portable USB drive"
|
||||||
|
# git remote add laptop ~/annex
|
||||||
|
# cd ~/annex
|
||||||
|
# git remote add usbdrive /media/usb/annex
|
||||||
|
|
||||||
|
This is all standard ad-hoc distributed git repository setup.
|
||||||
|
The only git-annex specific part is telling it the name
|
||||||
|
of the new repository created on the USB drive.
|
||||||
|
|
||||||
|
Notice that both repos are set up as remotes of one another. This lets
|
||||||
|
either get annexed files from the other. You'll want to do that even
|
||||||
|
if you are using git in a more centralized fashion.
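The two-way remote setup can be tried with plain git in scratch directories. This is a sketch under assumptions: the paths below are temporary stand-ins for `~/annex` and `/media/usb`, and the `git annex init` steps are left out so that it runs without git-annex installed.

```shell
#!/bin/sh
# Sketch only: make two repositories remotes of one another.
set -eu
base=$(mktemp -d)
laptop="$base/home/annex"      # stand-in for ~/annex
usb="$base/media/usb"          # stand-in for the USB mount point
mkdir -p "$laptop" "$usb"

git init -q "$laptop"

cd "$usb"
git clone -q "$laptop" annex 2>/dev/null   # cloning an empty repo only warns
cd annex
git remote add laptop "$laptop"

cd "$laptop"
# The URL names the clone directory itself, not just the mount point.
git remote add usbdrive "$usb/annex"

usb_url=$(git config --get remote.usbdrive.url)
laptop_url=$(git -C "$usb/annex" config --get remote.laptop.url)
echo "laptop sees usbdrive at: $usb_url"
echo "usb clone sees laptop at: $laptop_url"
```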
|
11
doc/walkthrough/adding_files.mdwn
Normal file
|
@ -0,0 +1,11 @@
|
||||||
|
# cd ~/annex
|
||||||
|
# cp /tmp/big_file .
|
||||||
|
# cp /tmp/debian.iso .
|
||||||
|
# git annex add .
|
||||||
|
add big_file ok
|
||||||
|
add debian.iso ok
|
||||||
|
# git commit -a -m added
|
||||||
|
|
||||||
|
When you add a file to the annex and commit it, only a symlink to
|
||||||
|
the annexed content is committed. The content itself is stored in
|
||||||
|
git-annex's backend.
|
25
doc/walkthrough/backups.mdwn
Normal file
|
@ -0,0 +1,25 @@
|
||||||
|
git-annex can be configured to require that more than one copy of a file exists,
|
||||||
|
as a simple backup for your data. This is controlled by the "annex.numcopies"
|
||||||
|
setting, which defaults to 1 copy. Let's change that to require 2 copies,
|
||||||
|
and send a copy of every file to a USB drive.
|
||||||
|
|
||||||
|
# echo "* annex.numcopies=2" >> .gitattributes
|
||||||
|
# git annex copy . --to usbdrive
|
||||||
|
|
||||||
|
Now when we try to `git annex drop` a file, it will verify that it
|
||||||
|
knows of 2 other repositories that have a copy before removing its
|
||||||
|
content from the current repository.
|
||||||
|
|
||||||
|
You can also vary the number of copies needed, depending on the file name.
|
||||||
|
So, if you want 3 copies of all your flac files, but only 1 copy of oggs:
|
||||||
|
|
||||||
|
# echo "*.ogg annex.numcopies=1" >> .gitattributes
|
||||||
|
# echo "*.flac annex.numcopies=3" >> .gitattributes
|
||||||
|
|
||||||
|
Or, you might want to make a directory for important stuff, and configure
|
||||||
|
it so anything put in there is backed up more thoroughly:
|
||||||
|
|
||||||
|
# mkdir important_stuff
|
||||||
|
# echo "* annex.numcopies=3" > important_stuff/.gitattributes
|
||||||
|
|
||||||
|
For more details about the numcopies setting, see [[copies]].
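To see which value a given path would end up with, `git check-attr` can be asked directly. The sketch below recreates the `.gitattributes` lines from this page in a scratch repository; the file names queried are invented and do not need to exist, and `numcopies_for` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: resolve annex.numcopies for a few example paths.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf '%s\n' \
    '* annex.numcopies=2' \
    '*.ogg annex.numcopies=1' \
    '*.flac annex.numcopies=3' > .gitattributes
mkdir important_stuff
echo '* annex.numcopies=3' > important_stuff/.gitattributes

# Hypothetical helper: print only the attribute value for one path.
numcopies_for() {
    git check-attr annex.numcopies -- "$1" | sed 's/.*: //'
}

flac=$(numcopies_for song.flac)
ogg=$(numcopies_for song.ogg)
other=$(numcopies_for notes.txt)
nested=$(numcopies_for important_stuff/song.ogg)
echo "flac=$flac ogg=$ogg other=$other important_stuff/ogg=$nested"
```

Later lines win over earlier ones within a file, and a `.gitattributes` in a subdirectory overrides the top-level one for paths beneath it, which is why the ogg file inside `important_stuff` resolves to 3.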
|
6
doc/walkthrough/creating_a_repository.mdwn
Normal file
|
@ -0,0 +1,6 @@
|
||||||
|
This is very straightforward. Just give git-annex a description of the repository.
|
||||||
|
|
||||||
|
# mkdir ~/annex
|
||||||
|
# cd ~/annex
|
||||||
|
# git init
|
||||||
|
# git annex init "my laptop"
|
16
doc/walkthrough/fsck:_verifying_your_data.mdwn
Normal file
|
@ -0,0 +1,16 @@
|
||||||
|
You can use the fsck subcommand to check for problems in your data.
|
||||||
|
What can be checked depends on the [[backend|backends]] you've used to store
|
||||||
|
the data. For example, when you use the SHA1 backend, fsck will verify that
|
||||||
|
the checksums of your files are good. Fsck also checks that the annex.numcopies
|
||||||
|
setting is satisfied for all files.
|
||||||
|
|
||||||
|
# git annex fsck
|
||||||
|
unused (checking for unused data...) ok
|
||||||
|
fsck my_cool_big_file (checksum...) ok
|
||||||
|
...
|
||||||
|
|
||||||
|
You can also specify the files to check. This is particularly useful if
|
||||||
|
you're using SHA1 and don't want to spend a long time checksumming everything.
|
||||||
|
|
||||||
|
# git annex fsck my_cool_big_file
|
||||||
|
fsck my_cool_big_file (checksum...) ok
|
13
doc/walkthrough/fsck:_when_things_go_wrong.mdwn
Normal file
|
@ -0,0 +1,13 @@
|
||||||
|
Fsck never deletes possibly bad data; instead it will be moved to
|
||||||
|
`.git/annex/bad/` for you to recover. Here is a sample of what fsck
|
||||||
|
might say about a badly messed up annex:
|
||||||
|
|
||||||
|
# git annex fsck
|
||||||
|
fsck my_cool_big_file (checksum...)
|
||||||
|
git-annex: Bad file content; moved to .git/annex/bad/SHA1:7da006579dd64330eb2456001fd01948430572f2
|
||||||
|
git-annex: ** No known copies of the file exist!
|
||||||
|
failed
|
||||||
|
fsck important_file
|
||||||
|
git-annex: Only 1 of 2 copies exist. Run git annex get somewhere else to back it up.
|
||||||
|
failed
|
||||||
|
git-annex: 2 failed
|
16
doc/walkthrough/getting_file_content.mdwn
Normal file
|
@ -0,0 +1,16 @@
|
||||||
|
A repository does not always have all annexed file contents available.
|
||||||
|
When you need the content of a file, you can use "git annex get" to
|
||||||
|
make it available.
|
||||||
|
|
||||||
|
We can use this to copy everything in the laptop's annex to the
|
||||||
|
USB drive.
|
||||||
|
|
||||||
|
# cd /media/usb/annex
|
||||||
|
# git pull laptop master
|
||||||
|
# git annex get .
|
||||||
|
get my_cool_big_file (copying from laptop...) ok
|
||||||
|
get iso/debian.iso (copying from laptop...) ok
|
||||||
|
|
||||||
|
Notice that you had to git pull from laptop first; this lets git-annex know
|
||||||
|
what has changed in laptop, and so it knows about the files present there and
|
||||||
|
can get them.
|
16
doc/walkthrough/migrating_data_to_a_new_backend.mdwn
Normal file
|
@ -0,0 +1,16 @@
|
||||||
|
Maybe you started out using the WORM backend, and have now configured
|
||||||
|
git-annex to use SHA1. But files you added to the annex before still
|
||||||
|
use the WORM backend. There is a simple command that can migrate that
|
||||||
|
data:
|
||||||
|
|
||||||
|
# git annex migrate my_cool_big_file
|
||||||
|
migrate my_cool_big_file (checksum...) ok
|
||||||
|
|
||||||
|
You can only migrate files whose content is currently available. Other
|
||||||
|
files will be skipped.
|
||||||
|
|
||||||
|
After migrating a file to a new backend, the old content in the old backend
|
||||||
|
will still be present. That is necessary because multiple files
|
||||||
|
can point to the same content. The `git annex unused` subcommand can be
|
||||||
|
used to clear up that detritus later. Note that hard links are used,
|
||||||
|
to avoid wasting disk space.
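The note about hard links can be illustrated with ordinary files. This sketch uses made-up directory names as stand-ins for the two backends; it shows only the filesystem behaviour, not git-annex's actual storage layout.

```shell
#!/bin/sh
# Sketch only: two names for one file share an inode, so no space is duplicated.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir old_backend new_backend

echo "big content" > old_backend/key
ln old_backend/key new_backend/key      # hard link, not a copy

inode_old=$(ls -i old_backend/key | awk '{print $1}')
inode_new=$(ls -i new_backend/key | awk '{print $1}')
echo "inodes: $inode_old $inode_new"

# Removing the old name leaves the content reachable through the new one.
rm old_backend/key
survivor=$(cat new_backend/key)
echo "still readable: $survivor"
```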
|
43
doc/walkthrough/modifying_annexed_files.mdwn
Normal file
|
@ -0,0 +1,43 @@
|
||||||
|
Normally, the content of files in the annex is prevented from being modified.
|
||||||
|
That's a good thing: it might be the only copy, and you wouldn't
|
||||||
|
want to lose it in a fumblefingered mistake.
|
||||||
|
|
||||||
|
# echo oops > my_cool_big_file
|
||||||
|
bash: my_cool_big_file: Permission denied
|
||||||
|
|
||||||
|
In order to modify a file, it should first be unlocked.
|
||||||
|
|
||||||
|
# git annex unlock my_cool_big_file
|
||||||
|
unlock my_cool_big_file (copying...) ok
|
||||||
|
|
||||||
|
That replaces the symlink that normally points at its content with a copy
|
||||||
|
of the content. You can then modify the file like any regular file. Because
|
||||||
|
it is a regular file.
|
||||||
|
|
||||||
|
(If you decide you don't need to modify the file after all, or want to discard
|
||||||
|
modifications, just use `git annex lock`.)
|
||||||
|
|
||||||
|
When you `git commit`, git-annex's pre-commit hook will automatically
|
||||||
|
notice that you are committing an unlocked file, and add its new content
|
||||||
|
to the annex. The file will be replaced with a symlink to the new content,
|
||||||
|
and this symlink is what gets committed to git in the end.
|
||||||
|
|
||||||
|
# echo "now smaller, but even cooler" > my_cool_big_file
|
||||||
|
# git commit my_cool_big_file -m "changed an annexed file"
|
||||||
|
add my_cool_big_file ok
|
||||||
|
[master 64cda67] changed an annexed file
|
||||||
|
2 files changed, 2 insertions(+), 1 deletions(-)
|
||||||
|
create mode 100644 .git-annex/WORM:1289672605:30:file.log
|
||||||
|
|
||||||
|
There is one problem with using `git commit` like this: Git wants to first
|
||||||
|
stage the entire contents of the file in its index. That can be slow for
|
||||||
|
big files (sorta why git-annex exists in the first place). So, the
|
||||||
|
automatic handling on commit is a nice safety feature, since it prevents
|
||||||
|
the file content being accidentally committed into git. But when working with
|
||||||
|
big files, it's faster to explicitly add them to the annex yourself
|
||||||
|
before committing.
|
||||||
|
|
||||||
|
# echo "now smaller, but even cooler yet" > my_cool_big_file
|
||||||
|
# git annex add my_cool_big_file
|
||||||
|
add my_cool_big_file ok
|
||||||
|
# git commit my_cool_big_file -m "changed an annexed file"
|
|
@ -0,0 +1,13 @@
|
||||||
|
Often you will want to move some file contents from a repository to some
|
||||||
|
other one. For example, your laptop's disk is getting full; time to move
|
||||||
|
some files to an external disk before moving another file from a file
|
||||||
|
server to your laptop. Doing that by hand (by using `git annex get` and
|
||||||
|
`git annex drop`) is possible, but a bit of a pain. `git annex move`
|
||||||
|
makes it very easy.
|
||||||
|
|
||||||
|
# git annex move my_cool_big_file --to usbdrive
|
||||||
|
move my_cool_big_file (moving to usbdrive...) ok
|
||||||
|
# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
|
||||||
|
move video/hackity_hack_and_kaxxt.mov (moving from fileserver...)
|
||||||
|
WORM:1274316523:86050597:hackity_hack_and_kax 100% 82MB 199.1KB/s 07:02
|
||||||
|
ok
|
6
doc/walkthrough/removing_files.mdwn
Normal file
|
@ -0,0 +1,6 @@
|
||||||
|
You can always drop files safely. Git-annex checks that some other annex
|
||||||
|
has the file before removing it.
|
||||||
|
|
||||||
|
# git annex drop iso/debian.iso
|
||||||
|
drop iso/debian.iso ok
|
||||||
|
# git commit -a -m "freed up space"
|
24
doc/walkthrough/removing_files:_When_things_go_wrong.mdwn
Normal file
|
@ -0,0 +1,24 @@
Before dropping a file, git-annex wants to be able to look at other
remotes, and verify that they still have the file. After all, it could
have been dropped from them too. If the remotes are not mounted/available,
you'll see something like this:

    # git annex drop important_file other.iso
    drop important_file (unsafe)
      Could only verify the existence of 0 out of 1 necessary copies
      Unable to access these remotes: usbdrive
      Try making some of these repositories available:
        58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive
        ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive
      (Use --force to override this check, or adjust annex.numcopies.)
    failed
    drop other.iso (unsafe)
      Could only verify the existence of 0 out of 1 necessary copies
      No other repository is known to contain the file.
      (Use --force to override this check, or adjust annex.numcopies.)
    failed

Here you might --force it to drop `important_file` if you [[trust]] your backup.
But `other.iso` looks to have never been copied to anywhere else, so if
it's something you want to hold onto, you'd need to transfer it to
some other repository before dropping it.
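
If you script your drops, it can help to make the `--force` step an explicit opt-in. This is a minimal sketch under that assumption: `careful_drop` and the `ALLOW_FORCE` variable are hypothetical names invented for illustration; only `git annex drop` and its `--force` option come from git-annex itself.

```shell
# Hypothetical helper: try the normal, checked drop first. Only fall back
# to --force when the caller opts in by setting ALLOW_FORCE=yes.
careful_drop() {
    file=$1
    if git annex drop "$file"; then
        return 0
    fi
    if [ "${ALLOW_FORCE:-no}" = "yes" ]; then
        # Skips the copy check; if no other copy exists, the content is gone.
        git annex drop --force "$file"
    else
        echo "refusing to force-drop $file; copy it to another repository first" >&2
        return 1
    fi
}
```

With this, `careful_drop other.iso` fails loudly when the check cannot be satisfied, while `ALLOW_FORCE=yes careful_drop important_file` retries with `--force`.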
13
doc/walkthrough/renaming_files.mdwn
Normal file
@ -0,0 +1,13 @@
    # cd ~/annex
    # git mv big_file my_cool_big_file
    # mkdir iso
    # git mv debian.iso iso/
    # git commit -m moved

You can use any normal git operations to move files around, or even
make copies or delete them.

Notice that, since annexed files are represented by symlinks,
the symlink will break when the file is moved into a subdirectory.
But, git-annex will fix this up for you when you commit --
it has a pre-commit hook that watches for and corrects broken symlinks.
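
If you'd rather not leave a dangling symlink around until the next commit, the repair can be run by hand with `git annex fix`. A small sketch, assuming you want move-and-repair in one step; `annex_mv` is a hypothetical helper name, not something git-annex provides.

```shell
# Hypothetical helper: rename an annexed file with git, then repair
# its symlink right away instead of waiting for the pre-commit hook.
annex_mv() {
    src=$1
    dest=$2
    git mv "$src" "$dest" || return 1
    # Re-point any symlink under dest that the move left broken.
    git annex fix "$dest"
}
```

For example, `annex_mv debian.iso iso/` would move the file and then fix up the symlinks under `iso/`.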
@ -0,0 +1,18 @@
After a while, you'll have several annexes, with different file contents.
You don't have to try to keep all that straight; git-annex does
[[location_tracking]] for you. If you ask it to get a file and the drive
or file server is not accessible, it will let you know what it needs to get
it:

    # git annex get video/hackity_hack_and_kaxxt.mov
    get video/hackity_hack_and_kaxxt.mov (not available)
      Unable to access these remotes: usbdrive, server
      Try making some of these repositories available:
        5863d8c0-d9a9-11df-adb2-af51e6559a49 -- my home file server
        58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive
        ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive
    failed
    # sudo mount /media/usb
    # git annex get video/hackity_hack_and_kaxxt.mov
    get video/hackity_hack_and_kaxxt.mov (copying from usbdrive...) ok
    # git commit -a -m "got a video I want to rewatch on the plane"
28
doc/walkthrough/untrusted_repositories.mdwn
Normal file
@ -0,0 +1,28 @@
Suppose you have a USB thumb drive and are using it as a git-annex
repository. You don't trust the drive, because you could lose it, or
accidentally run it through the laundry. Or, maybe you have a drive that
you know is dying, and you'd like to be warned if there are any files
on it not backed up somewhere else. Maybe the drive has already died
or been lost.

You can let git-annex know that you don't trust a repository, and it will
adjust its behavior to avoid relying on that repository's continued
availability.

    # git annex untrust usbdrive
    untrust usbdrive ok

Now when you do a fsck, you'll be warned appropriately:

    # git annex fsck .
    fsck my_big_file
      Only these untrusted locations may have copies of this file!
        05e296c4-2989-11e0-bf40-bad1535567fe -- portable USB drive
      Back it up to trusted locations with git-annex copy.
    failed

Also, git-annex will refuse to drop a file from elsewhere just because
it can see a copy on the untrusted repository.

It's also possible to tell git-annex that you have an unusually high
level of trust for a repository. See [[trust]] for details.
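
Acting on that fsck warning might look like the sketch below. It assumes the untrusted drive is currently mounted so the content can be fetched, and that you have some other remote you do trust; `rescue_from_untrusted` is a hypothetical helper name, and the remote you pass is whatever you use for backups.

```shell
# Hypothetical helper: pull a file's content off an untrusted repository
# and store another copy on a remote you trust.
rescue_from_untrusted() {
    file=$1
    backup_remote=$2
    # Fetch the content into this repository (e.g. from the untrusted drive).
    git annex get "$file" || return 1
    # Copy it on to the backup remote, then re-run fsck on just that file.
    git annex copy "$file" --to "$backup_remote" && git annex fsck "$file"
}
```

For example, `rescue_from_untrusted my_big_file fileserver` would get the file, copy it to `fileserver`, and check it again.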
30
doc/walkthrough/unused_data.mdwn
Normal file
@ -0,0 +1,30 @@
It's possible for data to accumulate in the annex that no files point to
anymore. One way it can happen is if you `git rm` a file without
first calling `git annex drop`. And, when you modify an annexed file, the old
content of the file remains in the annex. Another way is when migrating
between backends.

This might be historical data you want to preserve, so git-annex defaults to
preserving it. From time to time, though, you may want to check for such
data and eliminate it to save space.

    # git annex unused
    unused (checking for unused data...)
      Some annexed data is no longer pointed to by any files in the repository.
        NUMBER KEY
        1 WORM:1289672605:3:file
        2 WORM:1289672605:14:file
      (To see where data was previously used, try: git log --stat -S'KEY')
      (To remove unwanted data: git-annex dropunused NUMBER)
    ok

After running `git annex unused`, you can follow the instructions to examine
the history of files that used the data, and if you decide you don't need that
data anymore, you can easily remove it:

    # git annex dropunused 1
    dropunused 1 ok

Hint: To drop a lot of unused data, use a command like this:

    # git annex dropunused `seq 1 1000`
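
The `seq` trick can be wrapped up so the count is a parameter. This is just a sketch; `drop_first_unused` is a hypothetical helper name. The numbers refer to the listing printed by the most recent `git annex unused`, so run that first.

```shell
# Hypothetical helper: drop the first N entries from the most recent
# `git annex unused` listing, using seq to generate the numbers.
drop_first_unused() {
    count=$1
    # Word splitting of the seq output is intended: one argument per number.
    git annex dropunused $(seq 1 "$count")
}
```

`drop_first_unused 2` would expand to `git annex dropunused 1 2`, removing both keys shown in the example listing.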
33
doc/walkthrough/using_ssh_remotes.mdwn
Normal file
@ -0,0 +1,33 @@
So far in this walkthrough, git-annex has been used with a remote
repository on a USB drive. But it can also be used with a git remote
that is truly remote, a host accessed by ssh.

Say you have a desktop on the same network as your laptop and want
to clone the laptop's annex to it:

    # git clone ssh://mylaptop/home/me/annex ~/annex
    # cd ~/annex
    # git annex init "my desktop"

Now you can get files and they will be transferred (using `rsync`):

    # git annex get my_cool_big_file
    get my_cool_big_file (getting UUID for origin...) (copying from origin...)
    WORM:1285650548:2159:my_cool_big_file 100% 2159 2.1KB/s 00:00
    ok

When you drop files, git-annex will ssh over to the remote and make
sure the file's content is still there before removing it locally:

    # git annex drop my_cool_big_file
    drop my_cool_big_file (checking origin..) ok

Note that normally git-annex prefers to use non-ssh remotes, like
a USB drive, before ssh remotes. They are assumed to be faster/cheaper to
access, if available. There is an annex-cost setting you can configure in
`.git/config` to adjust which repositories it prefers. See
[[the_man_page|git-annex]] for details.

Also, note that you need full shell access for this to work --
git-annex needs to be able to ssh in and run commands. Or at least,
your shell needs to be able to run the [[git-annex-shell]] command.
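
The annex-cost setting is a per-remote git config key, so it can be set with `git config` rather than by editing `.git/config` by hand. A sketch, with `set_remote_cost` as a hypothetical helper name and the cost value chosen arbitrarily for illustration; check the man page for the defaults before picking a number.

```shell
# Hypothetical helper: set the cost git-annex associates with a remote.
# Remotes with a lower cost are preferred over ones with a higher cost.
set_remote_cost() {
    remote=$1
    cost=$2
    git config "remote.$remote.annex-cost" "$cost"
}
```

For example, `set_remote_cost origin 50` writes `annex-cost = 50` into the `[remote "origin"]` section of `.git/config`.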
11
doc/walkthrough/using_the_SHA1_backend.mdwn
Normal file
@ -0,0 +1,11 @@
Another handy alternative to the default [[backend|backends]] is the
SHA1 backend. This backend provides more git-style assurance that your data
has not been damaged. And the checksum means that when you add the same
content to the annex twice, only one copy need be stored in the backend.

The only reason it's not the default is that it needs to checksum
files when they're added to the annex, and this can slow things down
significantly for really big files. To make SHA1 the default, just
add something like this to `.gitattributes`:

    * annex.backend=SHA1
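
If you set this up from a script, it's worth not appending the same line twice. A minimal sketch, assuming it is run from the top of the repository; `set_default_backend` is a hypothetical helper name.

```shell
# Hypothetical helper: make sure .gitattributes in the current directory
# selects the given backend for all files, without adding duplicates.
set_default_backend() {
    line="* annex.backend=$1"
    # -x matches the whole line, -F treats it literally (the * is not a pattern).
    grep -qxF -- "$line" .gitattributes 2>/dev/null \
        || printf '%s\n' "$line" >> .gitattributes
}
```

After `set_default_backend SHA1`, you would `git add .gitattributes` and commit it like any other file, so that clones pick up the same setting.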
24
doc/walkthrough/using_the_URL_backend.mdwn
Normal file
@ -0,0 +1,24 @@
git-annex has multiple key-value [[backends]]. So far this walkthrough has
demonstrated the default, WORM (Write Once, Read Many) backend.

Another handy backend is the URL backend, which can fetch file content
from remote URLs. Here's how to set up some files in your repository
that use this backend:

    # git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
    fromkey somefile ok
    # git commit -m "added a file from the Internet Archive"

Now if you ask git-annex to get that file, it will download it,
and cache it locally.

    # git annex get somefile
    get somefile (downloading)
    #########################################################################100.0%
    ok

You can always drop files downloaded by the URL backend. It is assumed
that the URL is stable; no local backup is kept.

    # git annex drop somefile
    drop somefile (ok)