update and blog for the day
the last of the bad bugs is fixed!
parent af7b6319d7
commit bd8319e78c
2 changed files with 91 additions and 37 deletions

doc/design/assistant/blog/day_10__lsof.mdwn (new file, +54 lines)
@@ -0,0 +1,54 @@
A rather frustrating and long day coding went like this:

## 1-3 pm

Wrote a single function; all any Haskell programmer needs to know
about it is its type signature:

    Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]

When I spend another hour or two taking a unix utility like lsof and
parsing its output, which in this case is in a rather complicated
machine-parsable format, I often wish unix streams were strongly
typed, which would avoid this bother.
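The shape of that parsing can be sketched in a few lines. This is a minimal illustration, not git-annex's actual Lsof module: it assumes `lsof` was run with `-F` field output, where each line carries a one-letter field tag (`a` for access mode, `n` for file name), and it recovers only the file/mode pairs, ignoring the per-process fields.

```haskell
-- A minimal sketch, not git-annex's actual Lsof module: parse the
-- field-per-line output of `lsof -F an`, where 'a' prefixes an access
-- mode (r/w/u) and 'n' prefixes a file name.
data LsofOpenMode = OpenReadWrite | OpenReadOnly | OpenWriteOnly | OpenUnknown
  deriving (Show, Eq)

parseFields :: [String] -> [(FilePath, LsofOpenMode)]
parseFields = go OpenUnknown
  where
    go _ [] = []
    go mode (l:ls) = case l of
      ('a':a) -> go (toMode a) ls               -- remember the access mode
      ('n':f) -> (f, mode) : go OpenUnknown ls  -- emit file with last mode
      _       -> go mode ls                     -- skip p/c/f and other fields
    toMode "r" = OpenReadOnly
    toMode "w" = OpenWriteOnly
    toMode "u" = OpenReadWrite
    toMode _   = OpenUnknown

main :: IO ()
main = print (parseFields ["p123", "cbash", "f3", "aw", "n/tmp/x"])
-- prints [("/tmp/x",OpenWriteOnly)]
```

Feeding it real output would mean wrapping it around something like `readProcess "lsof" [...] ""`; the real module also has to cope with the NUL-delimited `-F0` variant and the process-level `p`/`c` fields.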

## 3-9 pm

Six hours spent making it defer annexing files until the commit thread
wakes up and is about to make a commit. Why did it take so horribly long?
Well, there were a number of complications, and some really bad bugs
involving races that were hard to reproduce reliably enough to deal with.

In other words, I was lost in the weeds for a lot of those hours...

At one point, something glorious happened, and it was always making exactly
one commit for batch mode modifications of a lot of files (like untarring
them). Unfortunately, I had to lose that gloriousness due to another
potential race, which, while unlikely, would have made the program deadlock
if it happened.

So, it's back to making 2 or 3 commits per batch mode change. I also have
a buglet that sometimes causes a second, empty commit after a file is
added. I know why (the inotify event for the symlink comes in late, after
the commit); I will try to improve the commit frequency later.

## 9-11 pm

Put the capstone on the day's work, by calling lsof on a directory full
of hardlinks to the files that are about to be annexed, to check whether
any are still open for write.

This works great! Starting up `git annex watch` when processes have files
open is no longer a problem, and even if you're evil enough to try having
multiple processes open the same file, it will complain and not annex it
until all the writers close it.
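The decision the capstone makes can be sketched as a tiny pure check. Names and types here are assumptions for illustration, not git-annex's real code: a file is annexed only if lsof shows no process holding it open with a write-capable mode, and unknown modes are treated as unsafe to err on the side of not annexing.

```haskell
-- Hedged sketch of the capstone check, with assumed names: annex a file
-- only if no lsof entry for it shows a write-capable (or unknown) mode.
data LsofOpenMode = OpenReadWrite | OpenReadOnly | OpenWriteOnly | OpenUnknown
  deriving (Show, Eq)

safeToAnnex :: [(FilePath, LsofOpenMode)] -> FilePath -> Bool
safeToAnnex open f = and
  [ m == OpenReadOnly | (f', m) <- open, f' == f ]

main :: IO ()
main = print ( safeToAnnex [("/tmp/a", OpenWriteOnly)] "/tmp/a"
             , safeToAnnex [("/tmp/a", OpenReadOnly)] "/tmp/a" )
-- prints (False,True)
```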

(Well, someone really evil could turn the write bit back on after git annex
clears it, and open the file again, but then really evil people can do
that to files in `.git/annex/objects` too, and they'll get their just
deserts when `git annex fsck` runs. So, that's ok..)

----

Anyway, I will beat on it more tomorrow, and if all is well, this will
finally go out to the beta testers.
@@ -5,43 +5,6 @@ There is a `watch` branch in git that adds the command.

## known bugs

* A process has a file open for write, another one closes it, and so
  it's added. Then the first process modifies it.

  Or, a process has a file open for write when `git annex watch` starts
  up, so the file is added to the annex. If the process later continues
  writing, it changes the content in the annex.

  Either way, content in the annex gets modified, and fsck will later
  catch the inconsistency.

  Possible fixes:

  * Somehow track or detect if a file is open for write by any process.
    `lsof` could be used, although it would be a little slow.

    Here's one way to avoid the slowdown: When a file is being added,
    set it read-only, and hard-link it into a quarantine directory,
    remembering both filenames. Then use the batch change mode code to
    detect batch adds and bundle them together. Just before committing,
    lsof the quarantine directory. Any files in it that are still open
    for write can just have their write bit turned back on and be
    deleted from quarantine, to be handled when their writers close
    them. Files that pass quarantine get added as usual. This avoids
    repeated lsof calls slowing down adds, but does add a constant
    overhead (a 0.25 second lsof call) before any add gets committed.

  * Or, when possible, making a copy-on-write copy before adding the
    file would avoid this.
  * Or, as a last resort, make an expensive copy of the file and add
    that.
  * Tracking file opens and closes with inotify could tell if any other
    processes have the file open. But there are problems: it doesn't
    seem to differentiate between files opened for read and for write.
    And there would still be a race after the last close and before the
    file is injected into the annex, where it could be opened for write
    again. That would need to be detected, and the annex injection
    undone somehow.

* If a file is checked into git as a normal file and gets modified
  (or merged, etc), it will be converted into an annexed file.
  See [[blog/day_7__bugfixes]]

@@ -140,3 +103,40 @@ Many races need to be dealt with by this code. Here are some of them.

- coalesce related add/rm events for speed and less disk IO **done**
- don't annex `.gitignore` and `.gitattributes` files **done**
- run as a daemon **done**
- A process has a file open for write, another one closes it, and so
  it's added. Then the first process modifies it.

  Or, a process has a file open for write when `git annex watch` starts
  up, so the file is added to the annex. If the process later continues
  writing, it changes the content in the annex.

  Either way, content in the annex gets modified, and fsck will later
  catch the inconsistency.

  Possible fixes:

  * Somehow track or detect if a file is open for write by any process.
    `lsof` could be used, although it would be a little slow.

    Here's one way to avoid the slowdown: When a file is being added,
    set it read-only, and hard-link it into a quarantine directory,
    remembering both filenames. Then use the batch change mode code to
    detect batch adds and bundle them together. Just before committing,
    lsof the quarantine directory. Any files in it that are still open
    for write can just have their write bit turned back on and be
    deleted from quarantine, to be handled when their writers close
    them. Files that pass quarantine get added as usual. This avoids
    repeated lsof calls slowing down adds, but does add a constant
    overhead (a 0.25 second lsof call) before any add gets committed.
    **done**

  * Or, when possible, making a copy-on-write copy before adding the
    file would avoid this.
  * Or, as a last resort, make an expensive copy of the file and add
    that.
  * Tracking file opens and closes with inotify could tell if any other
    processes have the file open. But there are problems: it doesn't
    seem to differentiate between files opened for read and for write.
    And there would still be a race after the last close and before the
    file is injected into the annex, where it could be opened for write
    again. That would need to be detected, and the annex injection
    undone somehow.
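The quarantine step marked **done** above boils down to one partition per commit: files lsof still shows open for write get released back (write bit restored, dropped from quarantine), the rest get added. A sketch under assumed names, not git-annex's actual code:

```haskell
import Data.List (partition)

-- Quarantine triage sketch, with assumed names: given all quarantined
-- files and those lsof still reports open for write, split them into
-- (released back to the watcher, safe to add now).
triage :: [FilePath] -> [FilePath] -> ([FilePath], [FilePath])
triage quarantined openForWrite = partition (`elem` openForWrite) quarantined

main :: IO ()
main = print (triage ["a", "b", "c"] ["b"])  -- prints (["b"],["a","c"])
```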