Joey Hess 2010-10-19 15:59:40 -04:00
parent 6ef1c2d2da
commit 7bc4435ffd
10 changed files with 202 additions and 150 deletions

3
.gitignore

@ -1,2 +1,5 @@
build/* build/*
git-annex git-annex
git-annex.1
doc/.ikiwiki
html


@ -51,7 +51,7 @@ options = [
Option ['f'] ["force"] (NoArg Force) "allow actions that may lose annexed data" Option ['f'] ["force"] (NoArg Force) "allow actions that may lose annexed data"
] ]
header = "Usage: git-annex [" ++ (join "|" $ map cmdname cmds) ++ "] ..." header = "Usage: git-annex " ++ (join "|" $ map cmdname cmds) ++ " [path ...]"
usage = usageInfo header options ++ "\nSubcommands:\n" ++ cmddescs usage = usageInfo header options ++ "\nSubcommands:\n" ++ cmddescs
where where


@ -8,10 +8,6 @@ install:
install -d $(DESTDIR)/usr/bin install -d $(DESTDIR)/usr/bin
install git-annex $(DESTDIR)/usr/bin install git-annex $(DESTDIR)/usr/bin
clean:
rm -rf build git-annex
rm -rf doc/.ikiwiki html
# If ikiwiki is available, build static html docs suitable for being # If ikiwiki is available, build static html docs suitable for being
# shipped in the software package. # shipped in the software package.
ifeq ($(shell which ikiwiki),) ifeq ($(shell which ikiwiki),)
@ -21,7 +17,12 @@ IKIWIKI=ikiwiki
endif endif
docs: docs:
./mdwn2man git-annex 1 doc/git-annex.mdwn > git-annex.1
$(IKIWIKI) doc html -v --wikiname git-annex --plugin=goodstuff \ $(IKIWIKI) doc html -v --wikiname git-annex --plugin=goodstuff \
--no-usedirs --no-usedirs
clean:
rm -rf build git-annex git-annex.1
rm -rf doc/.ikiwiki html
.PHONY: git-annex .PHONY: git-annex


@ -1,15 +0,0 @@
# Build static html docs suitable for being shipped in the software
# package. This depends on ikiwiki being installed to build the docs.
ifeq ($(shell which ikiwiki),)
IKIWIKI=echo "** ikiwiki not found" >&2 ; echo ikiwiki
else
IKIWIKI=ikiwiki
endif
all:
$(IKIWIKI) `pwd` html -v --wikiname FooBar --plugin=goodstuff \
--exclude=html --exclude=Makefile
clean:
rm -rf .ikiwiki html

21
doc/backends.mdwn Normal file

@ -0,0 +1,21 @@
git-annex uses a key-value abstraction layer to allow file contents to be
stored in different ways. In theory, any key-value storage system could be
used to store file contents.
When a file is annexed, a key is generated from its content and/or metadata.
The file checked into git symlinks to the key. This key can later be used
to retrieve the file's content (its value).
Multiple pluggable backends are supported, and more than one can be used
to store different files' contents in a given repository.
* `WORM` ("Write Once, Read Many") This backend stores the file's content
only in `.git/annex/`, and assumes that any file with the same basename,
size, and modification time has the same content. So with this backend,
files can be moved around, but should never be added to or changed.
This is the default, and the least expensive backend.
* `SHA1` -- This backend stores the file's content in
`.git/annex/`, with a name based on its sha1 checksum. This backend allows
modifications of files to be tracked. Its need to generate checksums
can make it slower for large files.
* `URL` -- This backend downloads the file's content from an external URL.
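
The difference between the `WORM` and `SHA1` backends described above can be sketched in a few lines of Python. This is only an illustration, not git-annex's own code (git-annex is written in Haskell): the function names `worm_key` and `sha1_key` and the exact key layout (`WORM:mtime:size:basename`, `SHA1:hexdigest`) are assumptions made for this example.

```python
import hashlib
import os


def worm_key(path):
    """Hypothetical WORM-style key: built from metadata only.

    Uses the basename, size and modification time, so the file's
    content is never read. The key layout is an assumption.
    """
    st = os.stat(path)
    return "WORM:{}:{}:{}".format(
        int(st.st_mtime), st.st_size, os.path.basename(path)
    )


def sha1_key(path, chunk_size=1 << 20):
    """Hypothetical SHA1-style key: built from the full content.

    Reads the whole file in chunks, which is why a checksumming
    backend is slower for large files.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "SHA1:" + digest.hexdigest()
```

In this sketch, a file whose content changes while its basename, size, and modification time stay the same keeps its `worm_key` but gets a new `sha1_key`, which is the trade-off the list above describes.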


@ -0,0 +1,12 @@
TODO: implement below
git-annex does use a lot of symlinks. Specifically, relative symlinks,
that are checked into git. To allow you to move those around without
annoyance, git-annex can run as a post-commit hook. This way, you can `git mv`
a symlink to an annexed file, and as soon as you commit, it will be fixed
up.
`git annex init` tries to set up a post-commit hook that is itself a symlink
back to git-annex. If you want to have your own shell script in the post-commit
hook, just make it call `git annex` with no parameters. git-annex will detect
when it's run from a git hook and do the necessary fixups.

30
doc/copies.mdwn Normal file

@ -0,0 +1,30 @@
The WORM and SHA1 key-value [[backends|backend]] store data inside
your git repository's `.git` directory, not in some external data store.
It's important that data not get lost by an ill-considered `git annex drop`
command. So, when using those backends, git-annex can be configured to try
to keep N copies of a file's content available across all repositories. By
default, N is 1; it is configured by annex.numcopies.
`git annex drop` attempts to check with other git remotes, to check that N
copies of the file exist. If enough repositories cannot be verified to have
it, it will retain the file content to avoid data loss.
For example, consider three repositories: Server, Laptop, and USB. Both Server
and USB have a copy of a file, and N=1. If on Laptop, you `git annex get
$file`, this will transfer it from either Server or USB (depending on which
is available), and there are now 3 copies of the file.
Suppose you want to free up space on Laptop again, and you `git annex drop` the file
there. If USB is connected, or Server can be contacted, git-annex can check
that it still has a copy of the file, and the content is removed from
Laptop. But if USB is currently disconnected, and Server also cannot be
contacted, it can't verify that it is safe to drop the file, and will
refuse to do so.
With N=2, in order to drop the file content from Laptop, it would need access
to both USB and Server.
Note that different repositories can be configured with different values of
N. So just because Laptop has N=2, this does not prevent the number of
copies falling to 1, when USB and Server have N=1.
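
The drop check described above can be modelled roughly as follows. This is a sketch under stated assumptions, not git-annex's implementation: the names `verified_copies` and `can_drop` are hypothetical, and each remote is reduced to a `(reachable, has_content)` pair.

```python
def verified_copies(remotes):
    """Return the names of remotes that could be contacted and
    confirmed to hold the file's content.

    `remotes` maps a remote name to a (reachable, has_content) pair;
    this representation is an assumption made for the example.
    """
    return [
        name
        for name, (reachable, has_content) in remotes.items()
        if reachable and has_content
    ]


def can_drop(numcopies, remotes):
    """Allow the drop only if at least `numcopies` other repositories
    were verified to have the content. An unreachable remote counts
    as unverified, even if it actually holds a copy.
    """
    return len(verified_copies(remotes)) >= numcopies
```

For the Laptop example, `can_drop(1, {"Server": (False, True), "USB": (True, True)})` allows the drop because USB is verified, while the same call with `numcopies` set to 2 refuses it until Server can also be contacted.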


@ -1,3 +1,13 @@
# NAME
git-annex - manage files with git, without checking their contents in
# SYNOPSIS
git annex subcommand [path ...]
# DESCRIPTION
git-annex allows managing files with git, without checking the file git-annex allows managing files with git, without checking the file
contents into git. While that may seem paradoxical, it is useful when contents into git. While that may seem paradoxical, it is useful when
dealing with files larger than git can currently easily handle, whether due dealing with files larger than git can currently easily handle, whether due
@ -11,157 +21,94 @@ versioned files, which is convenient for maintaining documents, Makefiles,
etc that are associated with annexed files but that benefit from full etc that are associated with annexed files but that benefit from full
revision control. revision control.
My motivation for git-annex was the growing number of external drives I When a file is annexed, its content is moved into a key-value store, and
use. Some are used to archive data, others hold backups, and yet others a symlink is made that points to the content. These symlinks are checked into
come with me when I'm away from home to carry data that doesn't fit on my git and versioned like regular files. You can move them around, delete
netbook. Maintaining all that was a nightmare, lots of ad-hoc moving files them, and so on. Pushing to another git repository will make git-annex
around, rsyncing files (unison is too slow), and deleting multiple copies there aware of the annexed file, and it can be used to retrieve its
of files from multiple places. I realized what what I needed was a form of content from the key-value store.
revision control where each drive was a repository, and where copying the
files around, and deciding which copies were safe to delete was automated.
I posted about this to the VCS-home mailing list and got a great suggestion
to make it support arbitrary key-value stores, for more generality and
flexability. A week of coding later, and git-annex is born.
Enough broad picture, here's how it actually looks: # EXAMPLES
* `git annex add $file` moves the file into `.git/annex/`, and replaces # git annex get video/hackity_hack_and_kaxxt.mov
it with a symlink pointing at the annexed file, and then calls `git add` get video/_why_hackity_hack_and_kaxxt.mov (not available)
to version the *symlink*. (If the file has already been annexed, it does I was unable to access these remotes: server
nothing.) Try making some of these repositories available:
5863d8c0-d9a9-11df-adb2-af51e6559a49 -- my home file server
If you then use normal git push/pull commands, the annexed file content 58d84e8a-d9ae-11df-a1aa-ab9aa8c00826 -- portable USB drive
won't be transferred between repositories, but the symlinks will be. ca20064c-dbb5-11df-b2fe-002170d25c55 -- backup SATA drive
So different clones of a repository can have different sets of annexed failed
files available. # sudo mount /media/usb
# git remote add usbdrive /media/usb
You can move the symlink around, copy it, delete it, etc, and commit changes # git annex get video/hackity_hack_and_kaxxt.mov
as desired using git. Reading the symlink will always get you the annexed get video/hackity_hack_and_kaxxt.mov (copying from usbdrive...) ok
file content, or the link may be broken if the content is not currently # git commit -a -m "got a video I want to rewatch on the plane"
available.
* `git annex get $file` is used to transfer a specified file from the # git annex add iso
backend storage to the current repository. add iso/Debian_5.0.iso ok
* `git annex drop $file` indicates that you no longer want the file's # git commit -a -m "saving Debian CD for later"
content to be available in this repository.
* `git annex file $file` adjusts the symlink for the file to point to its # git annex push usbdrive iso
content again. Use this if you've moved the file around. error: push not yet implemented!
* `git annex unannex $file` undoes a `git annex add`. But use `git annex drop` # git annex drop iso
if you're just done with a file; only use `unannex` if you drop iso/Debian_5.0.iso ok
accidentially added a file. (You can also run this on all your annexed # git commit -a -m "freed up space"
files come the Singularity. ;-)
* `git annex init "some description"` allows associating some description
(such as "USB archive drive 1") with a repository. This can help with
finding it later, see "Location Tracking" below.
Oh yeah, "$file" in the above can be any number of files, or directories, # SUBCOMMANDS
same as you'd pass to "git add" or "git rm".
So "git annex add ." or "git annex get dir/" work fine.
## key-value storage Like many git commands, git-annex can be passed a path that
is either a file or a directory. In the latter case it acts on all relevant
files in the directory.
git-annex uses a key-value abstraction layer to allow file contents to be Many git-annex subcommands will stage changes for later `git commit` by you.
stored in different ways. In theory, any key-value storage system could be
used to store the file contents, and git-annex would then retrieve them
as needed and put them in `.git/annex/`.
When a file is annexed, a key is generated from its content and/or metadata. * add [path ...]
The file checked into git symlinks to the key. This key can later be used
to retrieve the file's content (its value). This key generation must be
stable for a given file content, name, and size.
Multiple pluggable backends are supported, and more than one can be used Adds files in the path to the annex. Files that are already checked into
to store different files' contents in a given repository. git, or that git has been configured to ignore will be silently skipped.
* `WORM` ("Write Once, Read Many") This backend stores the file's content * get [path ...]
only in `.git/annex/`, and assumes that any file with the same basename,
size, and modification time has the same content. So with this backend,
files can be moved around, but should never be added to or changed.
This is the default, and the least expensive backend.
* `SHA1` -- This backend stores the file's content in
`.git/annex/`, with a name based on its sha1 checksum. This backend allows
modifications of files to be tracked. Its need to generate checksums
can make it slow for large files.
* `URL` -- This backend downloads the file's content from an external URL.
## copies Makes the content of annexed files available in this repository. Depending
on the backend used, this will involve copying them from another repository,
or downloading them, or transferring them from some kind of key-value store.
The WORM and SHA1 key-value backends store data inside your git repository. * drop [path ...]
It's important that data not get lost by an ill-though `git annex drop`
command. So, then using those backends, git-annex can be configured to try
to keep N copies of a file's content available across all repositories. By
default, N is 1; it is configured by annex.numcopies.
`git annex drop` attempts to check with other git remotes, to check that N Drops the content of annexed files from this repository.
copies of the file exist. If enough repositories cannot be verified to have
it, it will retain the file content to avoid data loss.
For example, consider three repositories: Server, Laptop, and USB. Both Server git-annex may refuse to drop a content if the backend does not think
and USB have a copy of a file, and N=1. If on Laptop, you `git annex get it is safe to do so.
$file`, this will transfer it from either Server or USB (depending on which
is available), and there are now 3 copies of the file.
Suppose you want to free up space on Laptop again, and you `git annex drop` the file * unannex [path ...]
there. If USB is connected, or Server can be contacted, git-annex can check
that it still has a copy of the file, and the content is removed from
Laptop. But if USB is currently disconnected, and Server also cannot be
contacted, it can't verify that it is safe to drop the file, and will
refuse to do so.
With N=2, in order to drop the file content from Laptop, it would need access Use this to undo an accidental add command. This is not the command you
to both USB and Server. should use if you intentionally annexed a file and don't want its contents
any more. In that case you should use `git annex drop` instead, and you
can also `git rm` the file.
Note that different repositories can be configured with different values of * init "description"
N. So just because Laptop has N=2, this does not prevent the number of
copies falling to 1, when USB and Server have N=1.
## location tracking Initializes git-annex with a description of the git repository.
This is an optional, but recommended step.
git-annex keeps track of in which repositories it last saw a file's content. * fix [path ...]
This location tracking information is stored in `.git-annex/$key.log`.
Repositories record their UUID and the date when they get or drop
a file's content. (Git is configured to use a union merge for this file,
so the lines may be in arbitrary order, but it will never conflict.)
This location tracking information is useful if you have multiple Fixes up symlinks that have become broken to again point to annexed content.
repositories, and not all are always accessible. For example, perhaps one This is useful to run if you have been moving the symlinks around.
is on a home file server, and you are away from home. Then git-annex can
tell you what git remote it needs access to in order to get a file:
# git annex get myfile # OPTIONS
get myfile (need access to one of these remotes: home)
git-annex: get myfile failed
Another way the location tracking comes in handy is if you put repositories * --force
on removable USB drives, that might be archived away offline in a safe
place. In this sort of case, you probably don't have a git remotes
configured for every USB drive. So git-annex may have to resort to talking
about repository UUIDs. If you have previously used "git annex init"
to attach descriptions to those repositories, it will include their
descriptions to help you with finding them:
# git annex get myfile Force unsafe actions, such as dropping a file's content when no other
get myfile (No available git remotes have the file.) source of it can be verified to still exist. Use with care.
It has been seen before in these repositories:
c0a28e06-d7ef-11df-885c-775af44f8882 -- USB archive drive 1
e1938fee-d95b-11df-96cc-002170d25c55
git-annex: get myfile failed
## symlink farming commit hook ## CONFIGURATION
git-annex does use a lot of symlinks. Specicially, relative symlinks, Like other git commands, git-annex is configured via `.git/config`.
that are checked into git. To allow you to move those around without
annoyance, git-annex can run as a post-commit hook. This way, you can `git mv`
a symlink to an annexed file, and as soon as you commit, it will be fixed
up.
`git annex init` tries to set up a post-commit hook that is itself a symlink * `annex.uuid` -- a unique UUID for this repository (automatically set)
back to git-annex. If you want to have your own shell script in the post-commit
hook, just make it call `git annex` with no parameters. git-annex will detect
when it's run from a git hook and do the necessary fixups.
## configuration
* `annex.uuid` -- a unique UUID for this repository
* `annex.numcopies` -- number of copies of files to keep across all * `annex.numcopies` -- number of copies of files to keep across all
repositories (default: 1) repositories (default: 1)
* `annex.backends` -- space-separated list of names of * `annex.backends` -- space-separated list of names of
@ -176,6 +123,24 @@ when it's run from a git hook and do the necessary fixups.
* `remote.<name>.annex-uuid` -- git-annex caches UUIDs of repositories * `remote.<name>.annex-uuid` -- git-annex caches UUIDs of repositories
here. here.
## contact # FILES
Joey Hess <joey@kitenet.net> These files are used, in your git repository:
`.git/annex/` contains the annexed file contents that are currently
available. Annexed files in your git repository symlink to that content.
`.git-annex/uuid.log` is used to map between repository UUID and
descriptions. You may edit it.
`.git-annex/*.log` is where git-annex records its content tracking
information. These files should be committed to git.
`.git-annex/.gitattributes` is configured to use git's union merge driver
to avoid conflicts when merging files in the `.git-annex` directory.
# AUTHOR
Joey Hess <joey@ikiwiki.info>
Warning: this page is automatically made into a man page via [mdwn2man](http://git.ikiwiki.info/?p=ikiwiki;a=blob;f=mdwn2man;hb=HEAD). Edit with care


@ -11,12 +11,19 @@ versioned files, which is convenient for maintaining documents, Makefiles,
etc that are associated with annexed files but that benefit from full etc that are associated with annexed files but that benefit from full
revision control. revision control.
* [[man page|git-annex]]
* **[[download]]** * **[[download]]**
* [[news]] * [[news]]
* [[bugs]] * [[bugs]]
* [[contact]] * [[contact]]
## documentation
* [[man page|git-annex]]
* [[key-value backends|backends]] for data storage
* [[location_tracking]] reminds you where git-annex has seen files
* git-annex prevents accidental data loss by [[tracking copies|copies]]
of your files
---- ----
git-annex's wiki is powered by [Ikiwiki](http://ikiwiki.info/) and git-annex's wiki is powered by [Ikiwiki](http://ikiwiki.info/) and


@ -0,0 +1,28 @@
git-annex keeps track of in which repositories it last saw a file's content.
This location tracking information is stored in `.git-annex/$key.log`.
Repositories record their UUID and the date when they get or drop
a file's content. (Git is configured to use a union merge for this file,
so the lines may be in arbitrary order, but it will never conflict.)
This location tracking information is useful if you have multiple
repositories, and not all are always accessible. For example, perhaps one
is on a home file server, and you are away from home. Then git-annex can
tell you what git remote it needs access to in order to get a file:
# git annex get myfile
get myfile (not available)
I was unable to access these remotes: home
Another way the location tracking comes in handy is if you put repositories
on removable USB drives, that might be archived away offline in a safe
place. In this sort of case, you probably don't have git remotes
configured for every USB drive. So git-annex may have to resort to talking
about repository UUIDs. If you have previously used "git annex init"
to attach descriptions to those repositories, it will include their
descriptions to help you with finding them:
# git annex get myfile
get myfile (not available)
Try making some of these repositories available:
c0a28e06-d7ef-11df-885c-775af44f8882 -- USB archive drive 1
e1938fee-d95b-11df-96cc-002170d25c55
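
Why a union merge is safe for these logs can be shown with a simplified model. The log line format used here (`timestamp status uuid`, with status `1` for "has the content" and `0` for "dropped it") is an assumption for illustration, as are the function names; git's real union merge driver works on conflicting hunks rather than whole files.

```python
def union_merge(ours, theirs):
    """Simplified union merge: keep every distinct line from both
    sides. Line order carries no meaning, so nothing can conflict.
    """
    seen = set()
    merged = []
    for line in ours + theirs:
        if line not in seen:
            seen.add(line)
            merged.append(line)
    return merged


def current_locations(lines):
    """Return the UUIDs whose most recent log entry says the content
    is present. Lines use the hypothetical format
    'timestamp status uuid'.
    """
    latest = {}
    for line in lines:
        timestamp, status, uuid = line.split()
        timestamp = float(timestamp)
        if uuid not in latest or timestamp > latest[uuid][0]:
            latest[uuid] = (timestamp, status)
    return sorted(u for u, (_, s) in latest.items() if s == "1")
```

Because each line carries its own date, the newest entry per repository wins regardless of the order in which the merged lines end up.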