update and blog for the day
the last of the bad bugs is fixed!
parent af7b6319d7
commit bd8319e78c
2 changed files with 91 additions and 37 deletions
54
doc/design/assistant/blog/day_10__lsof.mdwn
Normal file
@@ -0,0 +1,54 @@
A rather frustrating and long day coding went like this:

## 1-3 pm

Wrote a single function, of which all any Haskell programmer needs to know
is its type signature:

    Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]

When I'm spending another hour or two taking a unix utility like lsof and
parsing its output, which in this case is in a rather complicated
machine-parsable output format, I often wish unix streams were strongly
typed, which would avoid this bother.
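
To give a feel for the parsing involved, here is a minimal sketch of a pure parser for `lsof -F` style field output, in which every line starts with a one-character field tag (`p` process ID, `c` command name, `f` file descriptor, `a` access mode, `n` file name). The `LsofOpenMode` and `ProcessInfo` definitions, the `parseLsof` name, and the sample data are assumptions made for this example, not necessarily what git-annex does. The sketch also assumes that the access field comes before the name field within each file set.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-ins for the types named in the signature above.
data LsofOpenMode = OpenReadWrite | OpenReadOnly | OpenWriteOnly | OpenUnknown
  deriving (Show, Eq)

data ProcessInfo = ProcessInfo { procPid :: Int, procCmd :: String }
  deriving (Show, Eq)

-- Map the value of an access-mode field to an open mode.
parseMode :: String -> LsofOpenMode
parseMode "u" = OpenReadWrite
parseMode "r" = OpenReadOnly
parseMode "w" = OpenWriteOnly
parseMode _   = OpenUnknown

-- Walk the field lines, remembering the current process and access mode,
-- and emit one triple for each file name field.
parseLsof :: String -> [(FilePath, LsofOpenMode, ProcessInfo)]
parseLsof = go Nothing OpenUnknown . lines
  where
    go _ _ [] = []
    go cur mode (l:ls) = case l of
      ('p':pid)  -> go (mkProc <$> readMaybe pid) OpenUnknown ls
      ('c':cmd)  -> go (setCmd cmd <$> cur) mode ls
      ('f':_)    -> go cur OpenUnknown ls        -- a new file set begins
      ('a':m)    -> go cur (parseMode m) ls
      ('n':name) -> case cur of
        Just p  -> (name, mode, p) : go cur OpenUnknown ls
        Nothing -> go cur mode ls                -- name with no process; skip
      _          -> go cur mode ls               -- ignore other fields
    mkProc pid = ProcessInfo pid ""
    setCmd cmd p = p { procCmd = cmd }

-- Made-up sample input: one process with two open files.
sample :: String
sample = unlines
  [ "p123", "ctar"
  , "f4", "aw", "n/tmp/quarantine/a"
  , "f5", "ar", "n/tmp/quarantine/b"
  ]

main :: IO ()
main = mapM_ print (parseLsof sample)
```

Keeping the parser pure means it can be tested on canned text, with the actual `lsof` invocation left to a thin IO wrapper.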

## 3-9 pm

Six hours spent making it defer annexing files until the commit thread
wakes up and is about to make a commit. Why did it take so horribly long?
Well, there were a number of complications, and some really bad bugs
involving races that were hard to reproduce reliably enough to deal with.

In other words, I was lost in the weeds for a lot of those hours...

At one point, something glorious happened, and it was always making exactly
one commit for batch mode modifications of a lot of files (like untarring
them). Unfortunately, I had to lose that gloriousness due to another
potential race, which, while unlikely, would have made the program deadlock
if it happened.

So, it's back to making 2 or 3 commits per batch mode change. I also have a
buglet that sometimes causes a second empty commit after a file is added.
I know why (the inotify event for the symlink gets in late,
after the commit); will try to improve commit frequency later.

## 9-11 pm

Put the capstone on the day's work, by calling lsof on a directory full
of hardlinks to the files that are about to be annexed, to check if any
are still open for write.

This works great! Starting up `git annex watch` when processes have files
open is no longer a problem, and even if you're evil enough to try having
multiple processes open the same file, it will complain and not annex it
until all the writers close it.

(Well, someone really evil could turn the write bit back on after git annex
clears it, and open the file again, but then really evil people can do
that to files in `.git/annex/objects` too, and they'll get their just
deserts when `git annex fsck` runs. So, that's ok..)
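
The check described above boils down to splitting the quarantined files into those that are safe to annex and those that still have a writer. A minimal sketch, assuming triples shaped like the result of `Lsof.queryDir`: the type definitions, `openForWrite`, and `checkQuarantine` are hypothetical names for this example. Treating an unknown open mode as a possible writer is a conservative choice made here, not something the post specifies.

```haskell
import Data.List (partition)

-- Hypothetical stand-ins for the types in the Lsof.queryDir signature.
data LsofOpenMode = OpenReadWrite | OpenReadOnly | OpenWriteOnly | OpenUnknown
  deriving (Show, Eq)

data ProcessInfo = ProcessInfo { procPid :: Int, procCmd :: String }
  deriving (Show, Eq)

-- True if a process holding the file in this mode might still modify it.
-- An unknown mode is treated as a possible writer (an assumption).
openForWrite :: LsofOpenMode -> Bool
openForWrite OpenReadOnly = False
openForWrite _            = True

-- Split the quarantined files into (safe to annex, still being written).
-- A file with several writers stays in the second list until none remain.
checkQuarantine
  :: [FilePath]
  -> [(FilePath, LsofOpenMode, ProcessInfo)]
  -> ([FilePath], [FilePath])
checkQuarantine files open = partition (`notElem` busy) files
  where
    busy = [f | (f, mode, _) <- open, openForWrite mode]

main :: IO ()
main = print (checkQuarantine ["q/a", "q/b", "q/c"] openFiles)
  where
    openFiles =
      [ ("q/a", OpenWriteOnly, ProcessInfo 10 "tar")
      , ("q/a", OpenReadWrite, ProcessInfo 11 "vim")
      , ("q/b", OpenReadOnly,  ProcessInfo 12 "cat")
      ]
```

Files in the second list would be left alone and picked up again once their writers close them.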

----

Anyway, will beat on it more tomorrow, and if all is well, this will finally
go out to the beta testers.

@@ -5,43 +5,6 @@ There is a `watch` branch in git that adds the command.

## known bugs

-* A process has a file open for write, another one closes it,
-and so it's added. Then the first process modifies it.
-
-Or, a process has a file open for write when `git annex watch` starts
-up, it will be added to the annex. If the process later continues
-writing, it will change content in the annex.
-
-This changes content in the annex, and fsck will later catch
-the inconsistency.
-
-Possible fixes:
-
-* Somehow track or detect if a file is open for write by any processes.
-`lsof` could be used, although it would be a little slow.
-
-Here's one way to avoid the slowdown: When a file is being added,
-set it read-only, and hard-link it into a quarantine directory,
-remembering both filenames.
-Then use the batch change mode code to detect batch adds and bundle
-them together.
-Just before committing, lsof the quarantine directory. Any files in
-it that are still open for write can just have their write bit turned
-back on and be deleted from quarantine, to be handled when their writer
-closes. Files that pass quarantine get added as usual. This avoids
-repeated lsof calls slowing down adds, but does add a constant factor
-overhead (0.25 seconds lsof call) before any add gets committed.
-
-* Or, when possible, making a copy on write copy before adding the file
-would avoid this.
-* Or, as a last resort, make an expensive copy of the file and add that.
-* Tracking file opens and closes with inotify could tell if any other
-processes have the file open. But there are problems.. It doesn't
-seem to differentiate between files opened for read and for write.
-And there would still be a race after the last close and before it's
-injected into the annex, where it could be opened for write again.
-Would need to detect that and undo the annex injection or something.
-
* If a file is checked into git as a normal file and gets modified
(or merged, etc), it will be converted into an annexed file.
See [[blog/day_7__bugfixes]]

@@ -140,3 +103,40 @@ Many races need to be dealt with by this code. Here are some of them.
- coalesce related add/rm events for speed and less disk IO **done**
- don't annex `.gitignore` and `.gitattributes` files **done**
- run as a daemon **done**
- A process has a file open for write, another one closes it,
and so it's added. Then the first process modifies it.

Or, a process has a file open for write when `git annex watch` starts
up, it will be added to the annex. If the process later continues
writing, it will change content in the annex.

This changes content in the annex, and fsck will later catch
the inconsistency.

Possible fixes:

* Somehow track or detect if a file is open for write by any processes.
`lsof` could be used, although it would be a little slow.

Here's one way to avoid the slowdown: When a file is being added,
set it read-only, and hard-link it into a quarantine directory,
remembering both filenames.
Then use the batch change mode code to detect batch adds and bundle
them together.
Just before committing, lsof the quarantine directory. Any files in
it that are still open for write can just have their write bit turned
back on and be deleted from quarantine, to be handled when their writer
closes. Files that pass quarantine get added as usual. This avoids
repeated lsof calls slowing down adds, but does add a constant factor
overhead (0.25 seconds lsof call) before any add gets committed. **done**

* Or, when possible, making a copy on write copy before adding the file
would avoid this.
* Or, as a last resort, make an expensive copy of the file and add that.
* Tracking file opens and closes with inotify could tell if any other
processes have the file open. But there are problems.. It doesn't
seem to differentiate between files opened for read and for write.
And there would still be a race after the last close and before it's
injected into the annex, where it could be opened for write again.
Would need to detect that and undo the annex injection or something.
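
The quarantine steps in the first fix above (set the file read-only, hard-link it into a quarantine directory, remember both names, and undo this for files that are still being written) could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: `quarantine` and `release` are hypothetical names, the link is named after the file's base name (so two files with the same base name would collide), and the quarantine directory is assumed to be on the same filesystem as the file, since hard links cannot cross filesystems.

```haskell
import System.Directory
  ( createDirectoryIfMissing, getPermissions, getTemporaryDirectory
  , removeFile, setOwnerWritable, setPermissions )
import System.FilePath ((</>), takeFileName)
import System.Posix.Files (createLink)

-- Clear the owner write bit and hard-link the file into the quarantine
-- directory, returning both names so the caller can remember the pairing.
quarantine :: FilePath -> FilePath -> IO (FilePath, FilePath)
quarantine qdir file = do
  createDirectoryIfMissing True qdir
  perms <- getPermissions file
  setPermissions file (setOwnerWritable False perms)
  let link = qdir </> takeFileName file   -- simplification: base name only
  createLink file link
  return (file, link)

-- Undo the quarantine for a file that is still open for write:
-- delete the hard link and turn the write bit back on.
release :: (FilePath, FilePath) -> IO ()
release (file, link) = do
  removeFile link
  perms <- getPermissions file
  setPermissions file (setOwnerWritable True perms)

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let file = tmp </> "quarantine-demo.txt"
  writeFile file "hello\n"
  pair <- quarantine (tmp </> "quarantine-demo") file
  print pair
  release pair
  removeFile file
```

Files that pass the lsof check would be added to the annex and their quarantine links removed. Only the failure path is shown here.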