Initial writeup of tips for repos with large file count
parent
d096cd5c40
commit
ea60ce4066
1 changed files with 48 additions and 0 deletions
48
doc/tips/Repositories_with_large_number_of_files.mdwn
Normal file
@ -0,0 +1,48 @@
Just as git does not scale well with large files, it can also become painful to work with when you have a large *number* of files. Below are things I have found to minimise the pain.
# Using version 4 index files
During operations which affect the index, git writes an entirely new index out to .git/index.lock and then replaces .git/index with it. With a large number of files, this index file can be quite large and take several seconds to write every time you manipulate the index!

This can be mitigated by switching the index to version 4, which uses pathname prefix compression to reduce the file size:

    git update-index --index-version 4

*NOTE: The git documentation warns that this version may not be supported by other git implementations like JGit and libgit2.*

Personally, I saw a reduction from 516MB to 206MB (*40% of original size*) and got a much more responsive git!

It may also be worth doing the same to git-annex's index:

    GIT_INDEX_FILE=.git/annex/index git update-index --index-version 4

I gained much less here, though: 89MB to 86MB (96% of original size).
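To see whether the conversion helped in your own repository, it can be handy to record the index version and size before and after. The following is only a minimal sketch: the function names `index_version`, `index_size` and `report_index` are made up for this example, and it assumes the documented index header layout (the 4-byte signature `DIRC` followed by a 4-byte big-endian version number).

```shell
#!/bin/sh
# Sketch: report an index file's format version and size.
# All function names here are hypothetical helpers, not git commands.

# The index header is the 4-byte signature "DIRC" followed by a 4-byte
# big-endian version number. Known versions are small (2, 3 or 4), so the
# last of those four bytes (offset 7) is enough to identify the version.
index_version() {
    od -An -tu1 -j7 -N1 "$1" | tr -d ' \n'
}

# Size of the index file in bytes.
index_size() {
    wc -c < "$1" | tr -d ' '
}

# Print "PATH: version V, N bytes" for one index file.
report_index() {
    printf '%s: version %s, %s bytes\n' \
        "$1" "$(index_version "$1")" "$(index_size "$1")"
}
```

From the top of the repository, running `report_index .git/index` (and `report_index .git/annex/index` for git-annex's index) before and after the `git update-index --index-version 4` commands gives the two figures to compare.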
# Packing
I have automatic gc disabled so that I control when it is run:

    git config gc.auto 0

As a result, I end up with a lot of loose objects, which also cause slowness in git. I check how many loose objects I have with:

    git count-objects

When the count reaches a threshold (~25000), I pack those loose objects and clean things up:

    git repack -d
    git gc
    git prune
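The check-then-pack routine above could be wrapped up roughly as follows. This is a sketch rather than a tested maintenance script: the function names are invented for the example, and it parses the `count:` line of `git count-objects -v` instead of the plain `git count-objects` output, purely because that line is easier to pick apart.

```shell
#!/bin/sh
# Sketch: pack loose objects once their count passes a threshold.
# All function names here are hypothetical helpers.

# Extract the loose object count from `git count-objects -v` output on stdin;
# that output contains a "count: N" line.
parse_loose_count() {
    awk '$1 == "count:" { print $2 }'
}

# Succeeds when COUNT ($1) has reached THRESHOLD ($2).
over_threshold() {
    [ "$1" -ge "$2" ]
}

# Run the pack and clean-up commands from the text if the current repository
# has at least THRESHOLD loose objects (default 25000, the figure used above).
pack_if_needed() {
    threshold=${1:-25000}
    count=$(git count-objects -v | parse_loose_count)
    if over_threshold "$count" "$threshold"; then
        git repack -d && git gc && git prune
    else
        printf '%s loose objects, below %s: nothing to do\n' "$count" "$threshold"
    fi
}
```

Calling `pack_if_needed` from inside the repository uses the default threshold; `pack_if_needed 50000` would use a different one.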
# File count per directory
If listing the files in a directory takes a long time, git(-annex) will naturally be slowed down by the same bottleneck.

You can avoid this by capping the number of files per directory at somewhere between 5000 and 20000 (the right limit depends on the filesystem and its settings).

[fpart](http://contribs.martymac.org/fpart/) can be a very useful tool to achieve this.
## Topics discussing this sort of usage
* [[forum/Handling_a_large_number_of_files]]
* [[forum/__34__git_annex_sync__34___synced_after_8_hours]]