git-annex is designed for scalability. The key points are:

* Arbitrarily large files can be managed. The only constraint
  on file size is how large a file your filesystem can hold.

  While git-annex does checksum files by default, there
  is a [[WORM_backend|backends]] available that avoids the checksumming
  overhead, so you can add new, enormous files very fast. This also
  allows it to be used on systems with very slow disk IO.

* Memory usage should be constant. This is a "should", because there
  can sometimes be leaks (and this is one of Haskell's weak spots),
  but git-annex is designed so that it does not need to hold all
  the details about your repository in memory.

  The one exception is that [[todo/git-annex_unused_eats_memory]],
  because it *does* need to hold the whole repo state in memory. But
  that is still considered a bug, and will hopefully be solved one day.
  Luckily, that command is not often used.

* Many files can be managed. The limiting factor is git's own
  ability to scale to repositories with a lot of files, and this will
  improve as git improves. Scaling to hundreds of thousands of files
  is not a problem; beyond that, git will start to get slow.

  To some degree, git-annex works around inefficiencies in git; for
  example, it batches input sent to certain git commands that are slow
  when run in an enormous repository.

* It can use as much or as little bandwidth as is available. In
  particular, any interrupted file transfer can be resumed by git-annex.

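
The batching idea mentioned above can be sketched with plain git. This is only an illustration, assuming a throwaway repository and made-up file names; `git hash-object --stdin-paths` is used here as one example of a git command that accepts many inputs on stdin, not as a statement of which commands git-annex batches.

```shell
# Throwaway repository with a few files (names are illustrative).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
for i in 1 2 3; do echo "content $i" > "file$i"; done

# One git process per file: the process startup cost is paid for every file.
for f in file1 file2 file3; do git hash-object "$f"; done > unbatched.txt

# One git process for all files: the paths are streamed on stdin.
printf '%s\n' file1 file2 file3 | git hash-object --stdin-paths > batched.txt

# Both approaches produce the same object IDs.
cmp unbatched.txt batched.txt && echo "same results"
```

With three files the difference is negligible; the saving only becomes noticeable when the per-process overhead is multiplied by a very large number of files.
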
## scalability tips

* If the files are so big that checksumming becomes a bottleneck, consider
  using the [[WORM_backend|backends]]. You can always `git annex migrate`
  files to a checksumming backend later on.

* If you're adding a huge number of files at once (hundreds of thousands),
  you'll soon notice that git-annex periodically stops and says
  "Recording state in git" while it runs a `git add` command that
  becomes increasingly expensive. Consider raising `annex.queuesize`
  to a higher value, at the expense of git-annex using more memory.

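
Both tips come down to git configuration, so they can be sketched as below. This is a minimal example under some assumptions: it uses a throwaway repository, the value `102400` is only illustrative, and the `annex.backend` setting name is the one read by recent git-annex versions (older releases used `annex.backends`).

```shell
# Throwaway repository, so the settings below do not touch a real one.
repo=$(mktemp -d)
git init -q "$repo"

# Prefer the WORM backend for newly added files, which skips checksumming.
# Assumption: recent git-annex reads annex.backend; older releases used annex.backends.
git -C "$repo" config annex.backend WORM

# Allow a longer queue of pending git commands, at the cost of more memory.
# 102400 is an illustrative value, not a recommendation.
git -C "$repo" config annex.queuesize 102400

# Show what was recorded.
git -C "$repo" config --get-regexp '^annex\.'
```

In a real repository the same `git config` calls would be run from inside it, without the `-C "$repo"` part. Files added under WORM can still be moved to a checksumming backend later with `git annex migrate`.
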