diff --git a/doc/design/assistant/blog/day_7__bugfixes.mdwn b/doc/design/assistant/blog/day_7__bugfixes.mdwn new file mode 100644 index 0000000000..3704969e30 --- /dev/null +++ b/doc/design/assistant/blog/day_7__bugfixes.mdwn @@ -0,0 +1,45 @@ +Kickstarter is over. Yay! + +Today I worked on the bug where `git annex watch` turned regular files +that were already checked into git into symlinks. So I made it check +if a file is already in git before trying to add it to the annex. + +The tricky part was doing this check quickly. Unless I want to write my +own git index parser (or use one from Hackage), this check requires running +`git ls-files`, once per file to be added. That won't fly if a huge +tree of files is being moved or unpacked into the watched directory. + +Instead, I made it only do the check during `git annex watch`'s initial +scan of the tree. This should be ok, because once it's running, you +won't be adding new files to git anyway, since it'll automatically annex +new files. This is good enough for now, but there are at least two problems +with it: + +* Someone might `git merge` in a branch that has some regular files, + and it would add the merged in files to the annex. +* Once `git annex watch` is running, if you modify a file that was + checked into git as a regular file, the new version will be added + to the annex. + +I'll probably come back to this issue, and may well find myself directly +querying git's index. + +--- + +I've started work to fix the memory leak I see when running `git annex +watch` in a large repository (40 thousand files). As always with a Haskell +memory leak, I crack open [Real World Haskell's chapter on profiling](http://book.realworldhaskell.org/read/profiling-and-optimization.html). + +Eventually this yields a nice graph of the problem: + +[[!img profile.png alt="memory profile"]] + +So, looks like a few minor memory leaks, and one huge leak. Stared +at this for a while and trying a few things, and got a much better result: + +[[!img profile2.png alt="memory profile"]] + +I may come back later and try to improve this further, but it's not bad memory +usage. But, it's still rather slow to start up in such a large repository, +and its initial scan is still doing too much work. I need to optimize +more.. diff --git a/doc/design/assistant/blog/day_7__bugfixes/profile.png b/doc/design/assistant/blog/day_7__bugfixes/profile.png new file mode 100644 index 0000000000..702af1ca00 Binary files /dev/null and b/doc/design/assistant/blog/day_7__bugfixes/profile.png differ diff --git a/doc/design/assistant/blog/day_7__bugfixes/profile2.png b/doc/design/assistant/blog/day_7__bugfixes/profile2.png new file mode 100644 index 0000000000..4d487d02a2 Binary files /dev/null and b/doc/design/assistant/blog/day_7__bugfixes/profile2.png differ