update and blog for the day
the last of the bad bugs is fixed!
parent af7b6319d7
commit bd8319e78c
2 changed files with 91 additions and 37 deletions

doc/design/assistant/blog/day_10__lsof.mdwn (new file, +54 lines)
@@ -0,0 +1,54 @@
A rather frustrating and long day coding went like this:

## 1-3 pm

Wrote a single function; all any Haskell programmer needs to know
about it is its type signature:

    Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]

When I spend another hour or two taking a unix utility like lsof and
parsing its output, which in this case is in a rather complicated
machine-parsable format, I often wish unix streams were strongly
typed, which would avoid this bother.
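The shape of that parsing can be sketched in a few lines. This is a minimal illustration, not git-annex's actual Lsof module: it assumes `lsof` was run with `-F` field output, where each line carries a one-letter field tag (`a` for access mode, `n` for file name), and it recovers only the file/mode pairs, ignoring the per-process fields.

```haskell
-- A minimal sketch, not git-annex's actual Lsof module: parse the
-- field-per-line output of `lsof -F an`, where 'a' prefixes an access
-- mode (r/w/u) and 'n' prefixes a file name.
data LsofOpenMode = OpenReadWrite | OpenReadOnly | OpenWriteOnly | OpenUnknown
  deriving (Show, Eq)

parseFields :: [String] -> [(FilePath, LsofOpenMode)]
parseFields = go OpenUnknown
  where
    go _ [] = []
    go mode (l:ls) = case l of
      ('a':a) -> go (toMode a) ls               -- remember the access mode
      ('n':f) -> (f, mode) : go OpenUnknown ls  -- emit file with last mode
      _       -> go mode ls                     -- skip p/c/f and other fields
    toMode "r" = OpenReadOnly
    toMode "w" = OpenWriteOnly
    toMode "u" = OpenReadWrite
    toMode _   = OpenUnknown

main :: IO ()
main = print (parseFields ["p123", "cbash", "f3", "aw", "n/tmp/x"])
-- prints [("/tmp/x",OpenWriteOnly)]
```

Feeding it real output would mean wrapping it around something like `readProcess "lsof" [...] ""`; the real module also has to cope with the NUL-delimited `-F0` variant and the process-level `p`/`c` fields.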

## 3-9 pm

Six hours spent making it defer annexing files until the commit thread
wakes up and is about to make a commit. Why did it take so horribly long?
Well, there were a number of complications, and some really bad bugs
involving races that were hard to reproduce reliably enough to deal with.

In other words, I was lost in the weeds for a lot of those hours...

At one point, something glorious happened, and it was always making exactly
one commit for batch mode modifications of a lot of files (like untarring
them). Unfortunately, I had to lose that gloriousness due to another
potential race, which, while unlikely, would have made the program deadlock
if it happened.

So, it's back to making 2 or 3 commits per batch mode change. I also have
a buglet that sometimes causes a second, empty commit after a file is
added. I know why (the inotify event for the symlink comes in late, after
the commit); I will try to improve the commit frequency later.

## 9-11 pm

Put the capstone on the day's work, by calling lsof on a directory full
of hardlinks to the files that are about to be annexed, to check whether
any are still open for write.

This works great! Starting up `git annex watch` when processes have files
open is no longer a problem, and even if you're evil enough to try having
multiple processes open the same file, it will complain and not annex it
until all the writers close it.
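The decision the capstone makes can be sketched as a tiny pure check. Names and types here are assumptions for illustration, not git-annex's real code: a file is annexed only if lsof shows no process holding it open with a write-capable mode, and unknown modes are treated as unsafe to err on the side of not annexing.

```haskell
-- Hedged sketch of the capstone check, with assumed names: annex a file
-- only if no lsof entry for it shows a write-capable (or unknown) mode.
data LsofOpenMode = OpenReadWrite | OpenReadOnly | OpenWriteOnly | OpenUnknown
  deriving (Show, Eq)

safeToAnnex :: [(FilePath, LsofOpenMode)] -> FilePath -> Bool
safeToAnnex open f = and
  [ m == OpenReadOnly | (f', m) <- open, f' == f ]

main :: IO ()
main = print ( safeToAnnex [("/tmp/a", OpenWriteOnly)] "/tmp/a"
             , safeToAnnex [("/tmp/a", OpenReadOnly)] "/tmp/a" )
-- prints (False,True)
```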

(Well, someone really evil could turn the write bit back on after git annex
clears it, and open the file again, but then really evil people can do
that to files in `.git/annex/objects` too, and they'll get their just
deserts when `git annex fsck` runs. So, that's ok..)

----

Anyway, I will beat on it more tomorrow, and if all is well, this will
finally go out to the beta testers.
@@ -5,43 +5,6 @@ There is a `watch` branch in git that adds the command.

## known bugs

* A process has a file open for write, another one closes it, and so
  it's added. Then the first process modifies it.

  Or, a process has a file open for write when `git annex watch` starts
  up, so the file is added to the annex. If the process later continues
  writing, it changes the content in the annex.

  Either way, content in the annex gets modified, and fsck will later
  catch the inconsistency.

  Possible fixes:

  * Somehow track or detect if a file is open for write by any process.
    `lsof` could be used, although it would be a little slow.

    Here's one way to avoid the slowdown: When a file is being added,
    set it read-only, and hard-link it into a quarantine directory,
    remembering both filenames. Then use the batch change mode code to
    detect batch adds and bundle them together. Just before committing,
    lsof the quarantine directory. Any files in it that are still open
    for write can just have their write bit turned back on and be
    deleted from quarantine, to be handled when their writers close
    them. Files that pass quarantine get added as usual. This avoids
    repeated lsof calls slowing down adds, but does add a constant
    overhead (a 0.25 second lsof call) before any add gets committed.

  * Or, when possible, making a copy-on-write copy before adding the
    file would avoid this.
  * Or, as a last resort, make an expensive copy of the file and add
    that.
  * Tracking file opens and closes with inotify could tell if any other
    processes have the file open. But there are problems: it doesn't
    seem to differentiate between files opened for read and for write.
    And there would still be a race after the last close and before the
    file is injected into the annex, where it could be opened for write
    again. That would need to be detected, and the annex injection
    undone somehow.

* If a file is checked into git as a normal file and gets modified
  (or merged, etc), it will be converted into an annexed file.
  See [[blog/day_7__bugfixes]]

@@ -140,3 +103,40 @@ Many races need to be dealt with by this code. Here are some of them.

- coalesce related add/rm events for speed and less disk IO **done**
- don't annex `.gitignore` and `.gitattributes` files **done**
- run as a daemon **done**
- A process has a file open for write, another one closes it, and so
  it's added. Then the first process modifies it.

  Or, a process has a file open for write when `git annex watch` starts
  up, so the file is added to the annex. If the process later continues
  writing, it changes the content in the annex.

  Either way, content in the annex gets modified, and fsck will later
  catch the inconsistency.

  Possible fixes:

  * Somehow track or detect if a file is open for write by any process.
    `lsof` could be used, although it would be a little slow.

    Here's one way to avoid the slowdown: When a file is being added,
    set it read-only, and hard-link it into a quarantine directory,
    remembering both filenames. Then use the batch change mode code to
    detect batch adds and bundle them together. Just before committing,
    lsof the quarantine directory. Any files in it that are still open
    for write can just have their write bit turned back on and be
    deleted from quarantine, to be handled when their writers close
    them. Files that pass quarantine get added as usual. This avoids
    repeated lsof calls slowing down adds, but does add a constant
    overhead (a 0.25 second lsof call) before any add gets committed.
    **done**

  * Or, when possible, making a copy-on-write copy before adding the
    file would avoid this.
  * Or, as a last resort, make an expensive copy of the file and add
    that.
  * Tracking file opens and closes with inotify could tell if any other
    processes have the file open. But there are problems: it doesn't
    seem to differentiate between files opened for read and for write.
    And there would still be a race after the last close and before the
    file is injected into the annex, where it could be opened for write
    again. That would need to be detected, and the annex injection
    undone somehow.
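The quarantine step marked **done** above boils down to one partition per commit: files lsof still shows open for write get released back (write bit restored, dropped from quarantine), the rest get added. A sketch under assumed names, not git-annex's actual code:

```haskell
import Data.List (partition)

-- Quarantine triage sketch, with assumed names: given all quarantined
-- files and those lsof still reports open for write, split them into
-- (released back to the watcher, safe to add now).
triage :: [FilePath] -> [FilePath] -> ([FilePath], [FilePath])
triage quarantined openForWrite = partition (`elem` openForWrite) quarantined

main :: IO ()
main = print (triage ["a", "b", "c"] ["b"])  -- prints (["b"],["a","c"])
```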