
git-annex is designed for scalability. The key points are:

* Arbitrarily large files can be managed. The only constraint
  on file size is how large a file your filesystem can hold.

  While git-annex does checksum files by default, there
  is a [[WORM_backend|backends]] available that avoids the checksumming
  overhead, so you can add new, enormous files very fast. This also
  allows it to be used on systems with very slow disk IO.

* Memory usage should be constant. This is a "should", because there
  can sometimes be leaks (and this is one of Haskell's weak spots),
  but git-annex is designed so that it does not need to hold all
  the details about your repository in memory.

  The one exception is that [[todo/git-annex_unused_eats_memory]],
  because it *does* need to hold the whole repo state in memory. But
  that is still considered a bug, and will hopefully be solved one day.
  Luckily, that command is not often used.

* Many files can be managed. The limiting factor is git's own
  limitations in scaling to repositories with a lot of files, and as git
  improves this will improve. Scaling to hundreds of thousands of files
  is not a problem; beyond that, git will start to get slow.

  To some degree, git-annex works around inefficiencies in git; for
  example it batches input sent to certain git commands that are slow
  when run in an enormous repository.

* It can use as much or as little bandwidth as is available. In
  particular, any interrupted file transfer can be resumed by git-annex.

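The trade-off between checksumming and the WORM backend mentioned in the list above can be illustrated with a rough sketch. The two helper functions here are hypothetical and the key formats are simplified approximations, not git-annex's actual implementation. The point is only that a metadata-based key needs a single `stat` call, while a checksum-based key has to read every byte of the file.

```python
import hashlib
import os


def worm_style_key(path):
    """Build a key from file metadata only; the file's contents are never read.

    Assumption: the "WORM-s<size>-m<mtime>--<name>" layout is a simplified
    illustration of a metadata-based key.
    """
    st = os.stat(path)
    return "WORM-s{}-m{}--{}".format(
        st.st_size, int(st.st_mtime), os.path.basename(path)
    )


def checksum_style_key(path, chunk_size=1024 * 1024):
    """Build a key from a SHA-256 digest; every byte of the file is read.

    Assumption: the "SHA256-s<size>--<digest>" layout is a simplified
    illustration of a checksum-based key.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so memory use stays constant
        # no matter how large the file is.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "SHA256-s{}--{}".format(os.path.getsize(path), digest.hexdigest())
```

The cost of `worm_style_key` does not depend on file size, whereas `checksum_style_key` grows linearly with it. That is why the checksum becomes the bottleneck for enormous files or slow disks.
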
## scalability tips

* If the files are so big that checksumming becomes a bottleneck, consider
  using the [[WORM_backend|backends]]. You can always `git annex migrate`
  files to a checksumming backend later on.

* If you're adding a huge number of files at once (hundreds of thousands),
  you'll soon notice that git-annex periodically stops and says
  "Recording state in git" while it runs a `git add` command that
  becomes increasingly expensive. Consider setting `annex.queuesize`
  to a higher value, at the expense of it using more memory.
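
The queue size trade-off described in the last tip can be sketched as a generic batching queue. This is a minimal illustration, not git-annex's code: the `BatchQueue` class and the `count_flushes` helper are hypothetical names, and the flush callback stands in for the expensive `git add` step.

```python
class BatchQueue:
    """Collect items and pass them to `flush_fn` in batches of at most `queue_size`."""

    def __init__(self, queue_size, flush_fn):
        if queue_size < 1:
            raise ValueError("queue_size must be at least 1")
        self.queue_size = queue_size
        self.flush_fn = flush_fn
        self.pending = []      # items held in memory until the next flush
        self.flush_count = 0   # how many times the expensive step has run

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.queue_size:
            self.flush()

    def flush(self):
        # Skip the expensive step entirely when nothing is queued.
        if self.pending:
            self.flush_fn(self.pending)
            self.flush_count += 1
            self.pending = []


def count_flushes(num_files, queue_size):
    """Return how many flushes are needed to process `num_files` items."""
    queue = BatchQueue(queue_size, flush_fn=lambda batch: None)
    for i in range(num_files):
        queue.add("file{}".format(i))
    queue.flush()  # flush whatever is left at the end
    return queue.flush_count
```

With 25 items, `count_flushes(25, 10)` runs the expensive step three times, while `count_flushes(25, 100)` runs it once but holds all 25 items in memory until the end. In an actual repository the setting would be changed with `git config annex.queuesize <number>`.
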