Added a comment
parent 45187482da
commit 25f2881cae
1 changed file with 20 additions and 0 deletions
@@ -0,0 +1,20 @@
[[!comment format=mdwn
 username="CandyAngel"
 subject="comment 4"
 date="2015-06-18T14:03:08Z"
 content="""
From what I have gleaned about dir_index, it speeds up finding particular files in a directory but slows down enumeration of directory contents. It could very well end up as six of one, half a dozen of the other, as git-annex does a bit of both.
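
Just to make those two access patterns concrete, here's a tiny illustrative sketch (the directory path and file names are made up, and this isn't anything git-annex itself runs) contrasting a lookup of one known file with a full enumeration:

```python
import os
import time

# Hypothetical directory standing in for an annex object directory.
DIR = "/tmp/dir_index_demo"
COUNT = 100000

# Populate the directory once (skipped if it already exists).
if not os.path.isdir(DIR):
    os.makedirs(DIR)
    for i in range(COUNT):
        open(os.path.join(DIR, "file%06d" % i), "w").close()

def timed(label, fn):
    start = time.time()
    fn()
    print("%-22s %.6f s" % (label, time.time() - start))

# Looking up one particular file: the case dir_index is meant to speed up.
timed("stat one known file", lambda: os.stat(os.path.join(DIR, "file012345")))

# Enumerating the whole directory: the case that reportedly gets slower.
timed("list all entries", lambda: list(os.scandir(DIR)))
```
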
So yeah, I've benchmarked this a little (just with a loopback ext3 partition, dropping caches each time, so *caveat emptor*) and my findings match the above.
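
For anyone wanting to poke at this themselves, the harness doesn't need to be anything fancier than the rough sketch below (the mount point and file counts are made-up assumptions, dropping caches needs root, and this isn't the exact script behind the numbers quoted here):

```python
import os
import time

# Illustrative settings -- assumed mount point of a loopback ext3 image.
TEST_DIR = "/mnt/ext3test"
FILE_COUNTS = [125, 16000, 64000, 102400]

def drop_caches():
    # Ask the kernel to drop page/dentry/inode caches (requires root).
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

def populate(path, count):
    # Create `count` empty files in `path`.
    os.makedirs(path, exist_ok=True)
    for i in range(count):
        open(os.path.join(path, "file%06d" % i), "w").close()

def cold_enumeration(path):
    # Time one enumeration of the directory with cold caches.
    drop_caches()
    start = time.time()
    entries = os.listdir(path)
    return len(entries), time.time() - start

for count in FILE_COUNTS:
    path = os.path.join(TEST_DIR, str(count))
    populate(path, count)
    n, secs = cold_enumeration(path)
    print("%7d files enumerated in %.2f seconds" % (n, secs))
```
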
Without dir_index, enumerating 102400 files takes 56% more time than enumerating 125 files (0.69 seconds vs. 1.08 seconds), but with dir_index it takes 715% more time (0.6 vs. 4.86 seconds). However, the initial creation of the files was much faster with dir_index than without (though I wasn't timing that bit).

The results I've gotten so far show that (with dir_index):
* min time: Increases after 64000 files
* max time: Increases after 128000 files
* mean time: Increases after 16000 files
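
The min/max/mean here are over repeated cold enumerations of the same directory; a minimal sketch of how such figures could be collected (assuming directories like those in the sketch above are already populated, and again with an illustrative mount point):

```python
import os
import time
from statistics import mean

TEST_DIR = "/mnt/ext3test"   # same illustrative mount point as above
ITERATIONS = 100             # matches the longer run mentioned below

def drop_caches():
    # Drop page/dentry/inode caches between runs (requires root).
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

def cold_listing(path):
    drop_caches()
    start = time.time()
    os.listdir(path)
    return time.time() - start

# Assumes directories named after their file counts already exist.
for count in [16000, 64000, 102400, 128000]:
    path = os.path.join(TEST_DIR, str(count))
    times = [cold_listing(path) for _ in range(ITERATIONS)]
    print("%7d files: min %.2fs  max %.2fs  mean %.2fs"
          % (count, min(times), max(times), mean(times)))
```
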
It's pretty easy to see where the 20000 files number may have come from.

I'll post the numbers once I've finished this longer run (100 iterations), in case anyone is interested. My conclusion is that having dir_index enabled and keeping to 16000 files per directory is optimal.
"""]]