Joey Hess 2011-03-04 12:31:01 -04:00
parent c5c7eaf009
commit 69c14d130b
7 changed files with 70 additions and 2 deletions

View file

@@ -0,0 +1,13 @@
In git, there can be multiple clones of a repository. Each clone can
be independently modified, and clones can push or pull changes to
one another to get back in sync.

git-annex preserves that fundamental distributed nature of git, while
dropping the requirement that, once in sync, each clone contains all the data
that was committed to each other clone. Instead of storing the content
of a file in the repository, git-annex stores a pointer to the content.
Each git-annex repository is responsible for storing some of the content,
and can copy it to or from other repositories. [[Location_tracking]]
information is committed to git, to let repositories inform other
repositories what file contents they have available.
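
To make the pointer idea concrete, here is a minimal Python sketch of the model. It assumes a key derived from a SHA-256 hash of the file's content. The `Repo` class, its methods, and the `SHA256-<hex>` key form are hypothetical and do not reproduce git-annex's real key or storage format. What it shows is that git only needs to carry the small path-to-key pointers, while each clone's content store may or may not hold the data.

```python
import hashlib

def key_for(content: bytes) -> str:
    """Derive a content-addressed key (hypothetical format) from file bytes."""
    return "SHA256-" + hashlib.sha256(content).hexdigest()

class Repo:
    """Toy model of one clone: pointers that git tracks, plus a local content store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pointers: dict[str, str] = {}  # path -> key; the part git would carry
        self.store: dict[str, bytes] = {}   # key -> content; kept outside git

    def add(self, path: str, content: bytes) -> str:
        key = key_for(content)
        self.pointers[path] = key
        self.store[key] = content
        return key

    def has_content(self, path: str) -> bool:
        return self.pointers[path] in self.store

    def drop(self, path: str) -> None:
        # Free the local copy of the content; the pointer stays.
        self.store.pop(self.pointers[path], None)

    def pull_pointers(self, other: "Repo") -> None:
        # Syncing through git brings over pointers only, never the content.
        self.pointers.update(other.pointers)

    def get_content(self, other: "Repo", path: str) -> None:
        # Copy one file's content from another clone that has it.
        key = self.pointers[path]
        self.store[key] = other.store[key]

server = Repo("server")
laptop = Repo("laptop")
server.add("video.mkv", b"...lots of bytes...")
laptop.pull_pointers(server)
print(laptop.has_content("video.mkv"))  # False
laptop.get_content(server, "video.mkv")
print(laptop.has_content("video.mkv"))  # True
```

After `pull_pointers`, the laptop knows about `video.mkv` without holding its content; that is the requirement git-annex drops.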

doc/future_proofing.mdwn

@@ -0,0 +1,24 @@
Imagine putting a git-annex drive in a time capsule. In 20, or 50, or 100
years, you'd like its contents to be as accessible as possible to whoever
digs it up.

This is a hard problem. git-annex cannot completely solve it, but it does
its best not to contribute to the problem. Here are some aspects of the
problem:

* How are files accessed? git-annex is careful to add minimal complexity
to accessing files in a repository. Nothing needs to be done to extract
files from the repository; they are there on disk in the usual way,
with just some symlinks pointing at the annexed file contents.
Neither git-annex nor git is needed to get at the file contents.
* What file formats are used? Will they still be readable? To deal with
this, it's best to stick to plain text files and the most common
image, sound, etc. formats. Consider storing the same content in multiple
formats.
* What filesystem is used on the drive? Will that filesystem still be
available?
* What is the hardware interface of the drive? Will hardware still exist
to talk to it?
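
The first point can be checked with nothing but the Python standard library. This sketch assumes a hypothetical layout where annexed content lives under `.git/annex/objects/<key>/` and the working-tree file is a relative symlink to it; the `make_annexed_file` helper and that layout are illustrations, not git-annex's exact directory structure. Reading the file is an ordinary `open()` that follows the symlink.

```python
import os
import tempfile

def make_annexed_file(repo: str, name: str, key: str, content: bytes) -> str:
    """Store content under a hypothetical object directory and symlink to it.

    `name` is assumed to be a top-level file in the repository.
    """
    obj_dir = os.path.join(repo, ".git", "annex", "objects", key)
    os.makedirs(obj_dir, exist_ok=True)
    obj_path = os.path.join(obj_dir, key)
    with open(obj_path, "wb") as f:
        f.write(content)
    link_path = os.path.join(repo, name)
    # A relative target keeps the link valid if the drive is mounted elsewhere.
    os.symlink(os.path.relpath(obj_path, repo), link_path)
    return link_path

with tempfile.TemporaryDirectory() as repo:
    link = make_annexed_file(repo, "notes.txt", "KEY-example", b"time capsule\n")
    print(os.path.islink(link))  # True
    with open(link, "rb") as f:  # plain file access; no git or git-annex involved
        print(f.read())          # b'time capsule\n'
```

In this sketch, the relative symlink target is what lets the file resolve wherever the drive ends up mounted.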


@@ -30,3 +30,11 @@
situations. It lacks git-annex's support for widely distributed storage,
using only a single backend data store. It also does not support
partial checkouts of file contents, like git-annex does.
* git-annex is also not [boar](http://code.google.com/p/boar/),
although it shares many of its goals and characteristics. Boar implements
its own version control system, rather than simply embracing and
extending git. And while boar supports distributed clones of a repository,
it does not support keeping different files in different clones of the
same repository, which git-annex does, and is an important feature for
large-scale archiving.

doc/repomap.png (binary image, 126 KiB, not shown)


@@ -0,0 +1,14 @@
git-annex can transfer data to or from any of a repository's git remotes.
Depending on where the remote is, the data transfer is done using rsync
(over ssh, with automatic resume), or plain cp (with copy-on-write
optimisations on supported filesystems).

It's equally easy to transfer a single file to or from a repository,
or to launch a retrieval of a massive pile of files from whatever
repositories they are scattered among.

git-annex automatically uses whatever remotes are currently accessible,
preferring ones that are less expensive to talk to.
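
The selection rule can be sketched in Python as below. This is a hypothetical model, not git-annex's code: each remote carries a numeric cost (git-annex's own knob for this is the `remote.<name>.annex-cost` setting, but the costs and remote names here are made up), and the cheapest accessible remote that has the content wins. The transfer commands are assumptions chosen to match the behaviour described above: `rsync --partial` over ssh, so an interrupted transfer leaves a partial file that a rerun can build on, and GNU `cp --reflink=auto`, which makes a copy-on-write clone where the filesystem supports it and a normal copy otherwise. Treating any location containing `:` as an ssh remote is a simplification.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Remote:
    name: str
    location: str     # "host:/path" for an ssh remote, or a local directory path
    cost: int         # lower means cheaper to talk to
    accessible: bool  # e.g. drive plugged in, host reachable
    has_key: bool     # location tracking says this remote holds the content

def pick_remote(remotes: list[Remote]) -> Optional[Remote]:
    """Return the cheapest accessible remote that has the content, or None."""
    candidates = [r for r in remotes if r.accessible and r.has_key]
    return min(candidates, key=lambda r: r.cost, default=None)

def transfer_command(remote: Remote, src_rel: str, dest: str) -> list[str]:
    """Build an argv that fetches src_rel from the remote into dest (sketch)."""
    src = f"{remote.location}/{src_rel}"
    if ":" in remote.location:
        # ssh remote: rsync over ssh, keeping partial files so a rerun can resume
        return ["rsync", "-e", "ssh", "--partial", src, dest]
    # local remote: cp, requesting a copy-on-write clone where supported
    return ["cp", "--reflink=auto", src, dest]

remotes = [
    Remote("server", "example.com:/srv/annex", cost=200, accessible=True, has_key=True),
    Remote("usbdrive", "/media/usb/annex", cost=100, accessible=True, has_key=True),
]
best = pick_remote(remotes)
print(best.name)  # usbdrive
print(transfer_command(best, "objects/KEY", "/tmp/KEY"))
# ['cp', '--reflink=auto', '/media/usb/annex/objects/KEY', '/tmp/KEY']
```

If the USB drive were unplugged (`accessible=False`), the same call would fall back to the ssh server and an rsync command.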
[[!img repomap.png caption="A real-world repository interconnection map
(generated by git-annex map)"]]


@@ -10,9 +10,11 @@ When she has 1 bar on her cell, Alice queues up interesting files on her
server for later. At a coffee shop, she has git-annex download them to her
USB drive. High in the sky or in a remote cabin, she catches up on
podcasts, videos, and games, first letting git-annex copy them from
her USB drive to the netbook (this saves battery power).
([[more about transferring data|transferring_data]])
When she's done, she tells git-annex which to keep and which to remove.
They're all removed from her netbook to save space, and Alice knows
that next time she syncs up to the net, her changes will be synced back
to her server.
([[more about distributed version control|distributed_version_control]])


@@ -11,8 +11,15 @@ without worry about accidentally deleting anything.
When Bob needs access to some files, git-annex can tell him which drive(s)
they're on, and easily make them available. Indeed, every drive knows what
is on every other drive.
([[more about location tracking|location_tracking]])
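
One way such location information can answer "which drive is it on?" is sketched below in Python. It assumes each clone records timestamped present/absent entries for a file's content, and that logs from different clones are combined by a simple union, with the newest entry for each repository winning. The `LogEntry` layout, the drive names, and the `merge_logs` and `whereis` functions are hypothetical, not git-annex's actual log format.

```python
from typing import NamedTuple

class LogEntry(NamedTuple):
    timestamp: float
    present: bool  # True if the repository gained the content, False if it dropped it
    repo: str      # repository identifier (hypothetical names below)

def merge_logs(*logs: list[LogEntry]) -> list[LogEntry]:
    """Union-merge the logs from several clones; identical entries collapse."""
    return sorted(set().union(*logs))

def whereis(log: list[LogEntry]) -> set[str]:
    """Return the repositories whose newest entry says the content is present."""
    latest: dict[str, LogEntry] = {}
    for entry in sorted(log):
        latest[entry.repo] = entry  # later timestamps overwrite earlier ones
    return {repo for repo, e in latest.items() if e.present}

# drive2's clone has seen drive1 drop the file; drive1's own log is older.
drive1_log = [LogEntry(1.0, True, "drive1")]
drive2_log = [
    LogEntry(1.0, True, "drive1"),
    LogEntry(2.0, True, "drive2"),
    LogEntry(3.0, False, "drive1"),
]
print(whereis(merge_logs(drive1_log, drive2_log)))  # {'drive2'}
```

Because the merge is a plain union, clones can exchange logs in any order and still agree on the answer.
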
Bob thinks long-term, and so he's glad that git-annex uses a simple
repository format. He knows his files will be accessible in the future
even if the world has forgotten about git-annex and git.
([[more about future-proofing|future_proofing]])
Run in a cron job, git-annex adds new files to archival drives at night. It
also helps Bob keep track of intentional and unintentional copies of
files, and logs information he can use to decide when it's time to duplicate
the content of old drives.
([[more about backup copies|copies]])
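
The backup-copies check comes down to counting, for each file's content, how many repositories hold it and flagging anything below a desired minimum. git-annex calls that minimum `numcopies`; in this Python sketch, the `locations` mapping, the drive names, and the `under_replicated` function are hypothetical illustrations of what a nightly job might report.

```python
def under_replicated(locations: dict[str, set[str]], numcopies: int) -> dict[str, int]:
    """Return {key: copy_count} for every key held by fewer than `numcopies` repositories."""
    return {key: len(repos) for key, repos in locations.items() if len(repos) < numcopies}

# Hypothetical data: which archival drives hold each file's content.
locations = {
    "photos-2009.tar": {"drive1", "drive2"},
    "taxes-2010.pdf": {"drive3"},
}
print(under_replicated(locations, numcopies=2))  # {'taxes-2010.pdf': 1}
```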