diff --git a/doc/index.mdwn b/doc/index.mdwn
index 8bbffab4ad..9ba5d5c316 100644
--- a/doc/index.mdwn
+++ b/doc/index.mdwn
@@ -49,6 +49,7 @@ files with git.
 * [[encryption]]
 * [[bare_repositories]]
 * [[internals]]
+* [[scalability]]
 * [[design]]
 * [[what git annex is not|not]]
 * [[sitemap]]
diff --git a/doc/scalability.mdwn b/doc/scalability.mdwn
new file mode 100644
index 0000000000..71e21ac4c2
--- /dev/null
+++ b/doc/scalability.mdwn
@@ -0,0 +1,31 @@
+git-annex is designed for scalability. The key points are:
+
+* Arbitrarily large files can be managed. The only constraint
+  on file size is how large a file your filesystem can hold.
+
+  While git-annex does checksum files by default, there
+  is a [[WORM_backend|backends]] available that avoids the checksumming
+  overhead, so you can add new, enormous files very fast. This also
+  allows it to be used on systems with very slow disk IO.
+
+* Memory usage should be constant. This is a "should", because there
+  can sometimes be leaks (and this is one of Haskell's weak spots),
+  but git-annex is designed so that it does not need to hold all
+  the details about your repository in memory.
+
+  The one exception is that [[todo/git-annex_unused_eats_memory]],
+  because it *does* need to hold the whole repo state in memory. But
+  that is still considered a bug, and hoped to be solved one day.
+  Luckily, that command is not often used.
+
+* Many files can be managed. The limiting factor is git's own
+  limitations in scaling to repositories with a lot of files, and as git
+  improves this will improve. Scaling to hundreds of thousands of files
+  is not a problem; beyond that, git will start to get slow.
+
+  To some degree, git-annex works around inefficiencies in git; for
+  example it batches input sent to certain git commands that are slow
+  when run in an enormous repository.
+
+* It can use as much, or as little, bandwidth as is available. In
+  particular, any interrupted file transfer can be resumed by git-annex.
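
Note on the last point of the new page: the idea of resuming an interrupted transfer can be illustrated with a small Python sketch. This is not how git-annex implements it; `resume_copy` and its parameters are hypothetical names, and the sketch assumes the partial destination file is an unmodified prefix of the source (it does no verification). It also reads in fixed-size chunks, so memory use stays constant regardless of file size.

```python
import os


def resume_copy(src_path, dest_path, chunk_size=64 * 1024):
    """Copy src_path to dest_path, continuing after any partial dest file.

    Hypothetical helper for illustration only. Assumes an existing
    dest_path holds a correct prefix of the source. Returns the number
    of bytes written during this call.
    """
    # Bytes already present from an earlier, interrupted attempt.
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    written = 0
    with open(src_path, "rb") as src, open(dest_path, "ab") as dest:
        # Skip the part of the source that was already transferred.
        src.seek(offset)
        while True:
            # Read one bounded chunk at a time to keep memory use constant.
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dest.write(chunk)
            written += len(chunk)
    return written
```

Calling `resume_copy` again after an interruption only transfers the missing tail; calling it on a complete copy writes nothing.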