devblog

2019-01-14 19:00:38 -04:00 · 2019-01-14 19:00:38 -04:00 · d79ac08532
commit d79ac08532
parent f289663611
2 changed files with 27 additions and 1 deletions
--- a/2
+++ b/2
@@ -14,7 +14,7 @@ git-annex (7.20181212) UNRELEASED; urgency=medium
  * importfeed: Better error message when downloading the feed fails.
  * Some optimisations, including a 10x faster timestamp parser,
    a 7x faster key parser, and improved parsing and serialization of
-    git-annex branch data. Some commands will run up to 15% faster.
+    git-annex branch data. Many commands will run 5-15% faster.
  * Stricter parser for keys doesn't allow doubled fields or out of order fields.
  * The benchmark command, which only had some old benchmarking of the sqlite
    databases before, now allows benchmarking any other git-annex commands.
--- a/doc/devblog/day_566__stopping_place.mdwn
+++ b/doc/devblog/day_566__stopping_place.mdwn
@@ -0,0 +1,26 @@
+I said I was going to stop with the ByteString conversion, but then I
+looked at [[/profiling]], and I knew I couldn't stop there --
+conversion between String and ByteString had become a major cost center.
+
+So today I converted all the code that reads and parses symlinks and pointer files
+to ByteString, so ByteString is now used all the way from disk to Key. Also
+put in some caching, so git-annex does not need to re-serialize a Key
+that it has just deserialized from a ByteString.
+
+There's still some ByteString to String conversion when generating
+FilePaths; avoiding that will need an equivalent of System.FilePath that
+operates on RawFilePath, and I don't think there is one yet? But the
+[[/profiling]] does show improvement; it's more and more dominated by IO
+operations that can't be sped up, and less by slow code.
+
+This really does feel like a stopping place now.
+
+Updated benchmarks (compared to last git-annex release):
+
+find on 10000 files, none present... 8% speedup  
+whereis on 1000 files............... 12% speedup  
+info on dir with 1000 files......... 7% speedup  
+local get ; drop of 1000 files...... 4% speedup  
+setting metadata in 1000 files...... 8% speedup  
+getting metadata from 1000 files.... 7% speedup  
+finding a single file out of 1000 that has a given metadata value... 8% speedup
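
The caching mentioned in the post (not re-serializing a Key that was just deserialized from a ByteString) can be illustrated with a minimal sketch. This is not git-annex's actual code: the `Key` record, its fields, and the functions `mkKey`, `parseKey`, `parseSize`, `buildKey` and `serializeKey` are all hypothetical and simplified to a backend, an optional size, and a name. The sketch also assumes canonical input, so the bytes kept at parse time match what a fresh serialization would produce.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit)

-- Hypothetical, simplified key (backend, optional size, name).
-- git-annex's real Key has more fields and a different representation.
data Key = Key
  { keyBackend    :: B.ByteString
  , keySize       :: Maybe Integer
  , keyName       :: B.ByteString
  , keySerialized :: Maybe B.ByteString -- bytes the key was parsed from, if any
  } deriving (Show)

-- Build a key in code; nothing has been serialized yet, so no cache.
mkKey :: B.ByteString -> Maybe Integer -> B.ByteString -> Key
mkKey backend size name = Key backend size name Nothing

-- Parse "BACKEND--name" or "BACKEND-sSIZE--name", keeping the input bytes.
-- Assumes canonical input (for example no leading zeros in SIZE), so the
-- kept bytes equal what buildKey would produce.
parseKey :: B.ByteString -> Maybe Key
parseKey b = do
  let (front, rest) = B.breakSubstring "--" b
  name <- B.stripPrefix "--" rest
  let (backend, fields) = B.break (== '-') front
  size <- parseSize fields
  if B.null backend || B.null name
    then Nothing
    else Just (Key backend size name (Just b))

-- Parse the optional "-sSIZE" field; any other leftover text is rejected.
parseSize :: B.ByteString -> Maybe (Maybe Integer)
parseSize fields
  | B.null fields = Just Nothing
  | otherwise = do
      digits <- B.stripPrefix "-s" fields
      if B.null digits || not (B.all isDigit digits)
        then Nothing
        else do
          (n, _) <- B.readInteger digits
          Just (Just n)

-- Serialize from scratch, ignoring any cached bytes.
buildKey :: Key -> B.ByteString
buildKey k = B.concat
  [ keyBackend k
  , maybe "" (\n -> B.append "-s" (B.pack (show n))) (keySize k)
  , "--"
  , keyName k
  ]

-- Reuse the bytes kept at parse time when they exist.
serializeKey :: Key -> B.ByteString
serializeKey k = maybe (buildKey k) id (keySerialized k)
```

With this shape, a key that came off disk as a ByteString is written back out without rebuilding it, and only keys constructed in code pay for `buildKey`. The trade-off is an extra field per key, and the parser has to be strict enough that the cached bytes are always the canonical form.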