git-annex/doc/scalability.mdwn

git-annex is designed for scalability. The key points are:

* Arbitrarily large files can be managed. The only constraint
  on file size is how large a file your filesystem can hold.

  While git-annex does checksum files by default, there
  is a [[WORM_backend|backends]] available that avoids the checksumming
  overhead, so you can add new, enormous files very fast. This also
  allows git-annex to be used on systems with very slow disk IO.

* Memory usage should be constant. This is a "should", because there
  can sometimes be leaks (and this is one of Haskell's weak spots),
  but git-annex is designed so that it does not need to hold all
  the details about your repository in memory.

  The one exception is that [[todo/git-annex_unused_eats_memory]],
  because it *does* need to hold the whole repo state in memory. But
  that is still considered a bug, and will hopefully be fixed one day.
  Luckily, that command is not often used.

* Many files can be managed. The limiting factor is git's own
  ability to scale to repositories with a lot of files, and this will
  improve as git improves. Scaling to hundreds of thousands of files
  is not a problem; beyond that, git will start to get slow.

  To some degree, git-annex works around inefficiencies in git; for
  example it batches input sent to certain git commands that are slow
  when run in an enormous repository.

* It can use as much or as little bandwidth as is available. In
  particular, any interrupted file transfer can be resumed by git-annex.
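
The idea behind resuming an interrupted transfer can be illustrated with a
minimal Python sketch. This is not git-annex's implementation; it only shows
the general technique, assuming that whatever the destination already holds
is a correct prefix of the source. The name `resume_copy` and the chunk size
are hypothetical. Because data moves in fixed-size chunks, memory use stays
constant no matter how large the file is.

```python
import os

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; bounds memory use


def resume_copy(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst, continuing after whatever dst already contains.

    Assumes any existing content of dst is a correct prefix of src.
    Returns the number of bytes copied by this call.
    """
    # Bytes already transferred by an earlier, interrupted attempt.
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    copied = 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)  # skip what the destination already has
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied
```

Calling `resume_copy` again after an interruption only transfers the missing
tail, and calling it on a finished copy transfers nothing. A real tool would
also verify the result, for example against a checksum.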

add scalability page 2012-02-14 22:50:25 +00:00
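
The batching of input to git commands mentioned above can be sketched in
Python as follows. This is an illustration of the general technique, not
git-annex's own code: rather than starting one process per file, a single
process is started and all inputs are fed to it on standard input. The helper
name `run_batched` is hypothetical, and the sketch assumes the command prints
exactly one output line per input line and that no input contains a newline.

```python
import subprocess


def run_batched(command, inputs, cwd=None):
    """Feed all inputs to ONE process, one per line, and pair up the output.

    Assumes the command writes exactly one line of output per input line.
    Returns a dict mapping each input to its output line.
    """
    inputs = list(inputs)
    if not inputs:
        return {}  # nothing to do, so do not start a process at all
    result = subprocess.run(
        command,
        input="".join(item + "\n" for item in inputs),
        capture_output=True,
        text=True,
        check=True,
        cwd=cwd,
    )
    outputs = result.stdout.splitlines()
    if len(outputs) != len(inputs):
        raise ValueError("expected one output line per input")
    return dict(zip(inputs, outputs))
```

For example, `run_batched(["git", "hash-object", "--stdin-paths"], paths)`
would hash every listed file with a single git process, since
`--stdin-paths` makes `git hash-object` read file names from standard input.
In a repository with many files, this avoids the cost of starting git once
per file.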