update and blog for the day

the last of the bad bugs is fixed!
Joey Hess 2012-06-15 22:59:32 -04:00
parent af7b6319d7
commit bd8319e78c
2 changed files with 91 additions and 37 deletions


@@ -0,0 +1,54 @@
A rather frustrating and long day coding went like this:
## 1-3 pm
Wrote a single function. All any Haskell programmer needs to know about it
is its type signature:

    Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]

When I'm spending another hour or two taking a unix utility like lsof and
parsing its output, which in this case is in a rather complicated
machine-parsable output format, I often wish unix streams were strongly
typed, which would avoid this bother.
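
To give a feel for the kind of parsing involved, here is a minimal sketch, not the actual `Lsof` module. It assumes lsof was run with `-F` field output restricted to the process id (`p`), file descriptor (`f`), access mode (`a`) and name (`n`) fields, one field per line. The names `OpenMode`, `OpenFile` and `parseLsof` are made up for illustration, and the pid stands in for the fuller `ProcessInfo`.

```haskell
-- | How a process has a file open, from lsof's `a` field.
data OpenMode = OpenReadOnly | OpenWriteOnly | OpenReadWrite | OpenUnknown
  deriving (Show, Eq)

-- | Simplified stand-in for (FilePath, LsofOpenMode, ProcessInfo).
type OpenFile = (FilePath, OpenMode, Int)

-- | Parser state: current process, and the file set being collected.
data St = St
  { stPid  :: Maybe Int
  , stName :: Maybe FilePath
  , stMode :: OpenMode
  }

parseMode :: String -> OpenMode
parseMode "r" = OpenReadOnly
parseMode "w" = OpenWriteOnly
parseMode "u" = OpenReadWrite
parseMode _   = OpenUnknown

-- | Parse line-oriented field output: a `p` line starts a process set,
-- an `f` line starts a file set, `a` and `n` fill in the file set.
parseLsof :: String -> [OpenFile]
parseLsof = go (St Nothing Nothing OpenUnknown) . lines
  where
    go st [] = flush st
    go st (l:ls) = case l of
      'p':rest -> flush st ++ go (St (readPid rest) Nothing OpenUnknown) ls
      'f':_    -> flush st ++ go st { stName = Nothing, stMode = OpenUnknown } ls
      'a':rest -> go st { stMode = parseMode rest } ls
      'n':name -> go st { stName = Just name } ls
      _        -> go st ls  -- ignore fields this sketch does not handle

    -- Emit the collected file set, if it is complete.
    flush (St (Just p) (Just n) m) = [(n, m, p)]
    flush _                        = []

    readPid s = case reads s of
      [(n, "")] -> Just n
      _         -> Nothing

-- | Hand-written sample input, not captured from a real lsof run.
sampleOutput :: String
sampleOutput = unlines
  [ "p1234", "f3", "aw", "n/tmp/quarantine/a"
  , "f4", "ar", "n/tmp/quarantine/b"
  , "p99", "f5", "au", "n/tmp/quarantine/a"
  ]
```

Running `parseLsof sampleOutput` yields one tuple per open file, which is roughly the shape that `Lsof.queryDir` returns. A strongly typed stream would make all of the above unnecessary.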
## 3-9 pm
Six hours spent making it defer annexing files until the commit thread
wakes up and is about to make a commit. Why did it take so horribly long?
Well, there were a number of complications, and some really bad bugs
involving races that were hard to reproduce reliably enough to deal with.
In other words, I was lost in the weeds for a lot of those hours...
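
The basic shape of deferring adds until commit time can be sketched as below. This is only an illustration of the idea, not the code in the `watch` branch: `PendingAdds`, `deferAdd` and `takePending` are hypothetical names, and a plain `MVar` holding a list is assumed to be enough for the hand-off between the watcher and the commit thread.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, swapMVar)

-- | Files seen by the watcher that have not been annexed yet.
newtype PendingAdds = PendingAdds (MVar [FilePath])

newPendingAdds :: IO PendingAdds
newPendingAdds = PendingAdds <$> newMVar []

-- | Called by the watcher: remember the file instead of annexing it now.
deferAdd :: PendingAdds -> FilePath -> IO ()
deferAdd (PendingAdds v) f = modifyMVar_ v (return . (f :))

-- | Called by the commit thread just before committing: take every
-- pending file in one step (oldest first) and leave the queue empty.
takePending :: PendingAdds -> IO [FilePath]
takePending (PendingAdds v) = reverse <$> swapMVar v []
```

Because the queue is emptied in a single `swapMVar`, a file added while the commit thread is working lands in the next batch rather than being lost. The real races are of course elsewhere, between inotify events and the filesystem, and this sketch does nothing about those.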
At one point, something glorious happened, and it was always making exactly
one commit for batch mode modifications of a lot of files (like untarring
them). Unfortunately, I had to lose that gloriousness due to another
potential race, which, while unlikely, would have made the program deadlock
if it happened.
So, it's back to making 2 or 3 commits per batch mode change. I also have a
buglet that sometimes causes a second, empty commit after a file is added.
I know why (the inotify event for the symlink comes in late,
after the commit); will try to improve commit frequency later.
## 9-11 pm
Put the capstone on the day's work, by calling lsof on a directory full
of hardlinks to the files that are about to be annexed, to check if any
are still open for write.
This works great! Starting up `git annex watch` when processes have files
open is no longer a problem, and even if you're evil enough to try having
multiple processes open the same file, it will complain and not annex it
until all the writers close it.
(Well, someone really evil could turn the write bit back on after git annex
clears it, and open the file again, but then really evil people can do
that to files in `.git/annex/objects` too, and they'll get their just
deserts when `git annex fsck` runs. So, that's ok..)
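
The decision made after the lsof call can be sketched as a pure function. This is a simplified illustration under assumptions, not the real code: `OpenMode` and `checkQuarantine` are hypothetical names, the `Int` is a process id standing in for `ProcessInfo`, and an unknown open mode is conservatively treated as a possible writer.

```haskell
import Data.List (partition)

-- | How a process has a file open, as reported by lsof.
data OpenMode = OpenReadOnly | OpenWriteOnly | OpenReadWrite | OpenUnknown
  deriving (Show, Eq)

-- | Anything other than a plain read-only open might still modify the file.
openForWrite :: OpenMode -> Bool
openForWrite OpenReadOnly = False
openForWrite _            = True

-- | Given the quarantined files and what lsof reported for the quarantine
-- directory, split the files into (safe to annex, still has a writer).
-- A file stays out of the annex as long as any process may be writing it.
checkQuarantine
  :: [FilePath]
  -> [(FilePath, OpenMode, Int)]
  -> ([FilePath], [FilePath])
checkQuarantine files open = partition safe files
  where
    busy   = [f | (f, mode, _) <- open, openForWrite mode]
    safe f = f `notElem` busy
```

Files in the second list would be left alone until their writers close them. Files in the first list go on to be annexed as usual.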
----
Anyway, will beat on it more tomorrow, and if all is well, this will finally
go out to the beta testers.


@@ -5,43 +5,6 @@ There is a `watch` branch in git that adds the command.
## known bugs
* A process has a file open for write, another one closes it,
and so it's added. Then the first process modifies it.
Or, a process has a file open for write when `git annex watch` starts
up, it will be added to the annex. If the process later continues
writing, it will change content in the annex.
This changes content in the annex, and fsck will later catch
the inconsistency.
Possible fixes:
* Somehow track or detect if a file is open for write by any processes.
`lsof` could be used, although it would be a little slow.
Here's one way to avoid the slowdown: When a file is being added,
set it read-only, and hard-link it into a quarantine directory,
remembering both filenames.
Then use the batch change mode code to detect batch adds and bundle
them together.
Just before committing, lsof the quarantine directory. Any files in
it that are still open for write can just have their write bit turned
back on and be deleted from quarantine, to be handled when their writer
closes. Files that pass quarantine get added as usual. This avoids
repeated lsof calls slowing down adds, but does add a constant factor
overhead (0.25 seconds lsof call) before any add gets committed.
* Or, when possible, making a copy on write copy before adding the file
would avoid this.
* Or, as a last resort, make an expensive copy of the file and add that.
* Tracking file opens and closes with inotify could tell if any other
processes have the file open. But there are problems.. It doesn't
seem to differentiate between files opened for read and for write.
And there would still be a race after the last close and before it's
injected into the annex, where it could be opened for write again.
Would need to detect that and undo the annex injection or something.
* If a file is checked into git as a normal file and gets modified
(or merged, etc), it will be converted into an annexed file.
See [[blog/day_7__bugfixes]]
@@ -140,3 +103,40 @@ Many races need to be dealt with by this code. Here are some of them.
- coalesce related add/rm events for speed and less disk IO **done**
- don't annex `.gitignore` and `.gitattributes` files **done**
- run as a daemon **done**
- A process has a file open for write, another one closes it,
and so it's added. Then the first process modifies it.
Or, a process has a file open for write when `git annex watch` starts
up, it will be added to the annex. If the process later continues
writing, it will change content in the annex.
This changes content in the annex, and fsck will later catch
the inconsistency.
Possible fixes:
* Somehow track or detect if a file is open for write by any processes.
`lsof` could be used, although it would be a little slow.
Here's one way to avoid the slowdown: When a file is being added,
set it read-only, and hard-link it into a quarantine directory,
remembering both filenames.
Then use the batch change mode code to detect batch adds and bundle
them together.
Just before committing, lsof the quarantine directory. Any files in
it that are still open for write can just have their write bit turned
back on and be deleted from quarantine, to be handled when their writer
closes. Files that pass quarantine get added as usual. This avoids
repeated lsof calls slowing down adds, but does add a constant factor
overhead (0.25 seconds lsof call) before any add gets committed. **done**
* Or, when possible, making a copy on write copy before adding the file
would avoid this.
* Or, as a last resort, make an expensive copy of the file and add that.
* Tracking file opens and closes with inotify could tell if any other
processes have the file open. But there are problems.. It doesn't
seem to differentiate between files opened for read and for write.
And there would still be a race after the last close and before it's
injected into the annex, where it could be opened for write again.
Would need to detect that and undo the annex injection or something.