Merge branch 'master' into watch
commit bf3339e5b7
6 changed files with 177 additions and 37 deletions
54
doc/design/assistant/blog/day_10__lsof.mdwn
Normal file
@ -0,0 +1,54 @@
A rather frustrating and long day coding went like this:

## 1-3 pm

Wrote a single function, of which all any Haskell programmer needs to know
is its type signature:

    Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]

When I'm spending another hour or two taking a unix utility like lsof and
parsing its output, which in this case is in a rather complicated
machine-parsable output format, I often wish unix streams were strongly
typed, which would avoid this bother.
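
For a feel of what that parsing involves, here is a minimal sketch (not git-annex's actual `Lsof` module) that reads lsof's field output. In that format each line starts with a one-character field identifier: `p` for the PID, `c` for the command name, `a` for the access mode (`r`, `w` or `u`) and `n` for the file name. The `LsofOpenMode` and `ProcessInfo` definitions are assumptions made for illustration. The parser also assumes newline-terminated fields rather than the NUL-terminated `-F0` form.

```haskell
import System.Process (readProcessWithExitCode)

-- Hypothetical stand-ins for the types named in the signature above.
data LsofOpenMode = OpenReadOnly | OpenWriteOnly | OpenReadWrite | OpenUnknown
  deriving (Show, Eq)

-- Process ID and command name.
data ProcessInfo = ProcessInfo Int String
  deriving (Show, Eq)

-- Run lsof on one directory (+d does not recurse) and parse its field output.
-- lsof exits non-zero when it finds no open files, so the exit code is
-- ignored and only stdout is parsed.
queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]
queryDir dir = do
  (_, out, _) <- readProcessWithExitCode "lsof" ["-Fpcan", "+d", dir] ""
  return (parseLsof out)

-- Parse newline-terminated lsof -F output, one field per line. A 'p' line
-- starts a new process, an 'f' line starts a new file of that process, and
-- the 'n' line carries the file name, so a result is emitted there.
parseLsof :: String -> [(FilePath, LsofOpenMode, ProcessInfo)]
parseLsof = go (ProcessInfo 0 "") OpenUnknown . lines
  where
    go _ _ [] = []
    go pinfo@(ProcessInfo pid _) mode (l : ls) = case l of
      'p' : rest -> go (ProcessInfo (readPid rest) "") OpenUnknown ls
      'c' : cmd -> go (ProcessInfo pid cmd) mode ls
      'f' : _ -> go pinfo OpenUnknown ls
      'a' : m -> go pinfo (toMode m) ls
      'n' : name -> (name, mode, pinfo) : go pinfo mode ls
      _ -> go pinfo mode ls -- other fields are ignored

    readPid s = case reads s of
      [(n, "")] -> n
      _ -> 0

    toMode "r" = OpenReadOnly
    toMode "w" = OpenWriteOnly
    toMode "u" = OpenReadWrite
    toMode _ = OpenUnknown
```

Keeping only the `OpenWriteOnly` and `OpenReadWrite` entries of the result gives the files that are still open for write.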

## 3-9 pm

Six hours spent making it defer annexing files until the commit thread
wakes up and is about to make a commit. Why did it take so horribly long?
Well, there were a number of complications, and some really bad bugs
involving races that were hard to reproduce reliably enough to deal with.

In other words, I was lost in the weeds for a lot of those hours...
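
Deferring the annexing amounts to a hand-off between the watcher thread and the commit thread. Here is a minimal sketch. It assumes one shared `MVar` of pending paths; the names `PendingAdds`, `queueAdd` and `takePending` are hypothetical, and this is not how git-annex structures its threads. The watcher only records paths, and the commit thread takes the whole batch at once when it wakes up.

```haskell
import Control.Concurrent.MVar

-- Paths the watcher has seen but not yet annexed, newest first.
newtype PendingAdds = PendingAdds (MVar [FilePath])

newPendingAdds :: IO PendingAdds
newPendingAdds = PendingAdds <$> newMVar []

-- Called from the watcher thread: record the path and return immediately.
queueAdd :: PendingAdds -> FilePath -> IO ()
queueAdd (PendingAdds v) path = modifyMVar_ v (return . (path :))

-- Called from the commit thread when it is about to commit: atomically take
-- everything queued so far (oldest first) and leave the queue empty.
takePending :: PendingAdds -> IO [FilePath]
takePending (PendingAdds v) = modifyMVar v (\paths -> return ([], reverse paths))
```

A commit loop built on this would annex whatever `takePending` returns, and it could skip committing altogether when the list is empty.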

At one point, something glorious happened, and it was always making exactly
one commit for batch mode modifications of a lot of files (like untarring
them). Unfortunately, I had to lose that gloriousness due to another
potential race, which, while unlikely, would have made the program deadlock
if it happened.

So, it's back to making 2 or 3 commits per batch mode change. I also have a
buglet that sometimes causes a second empty commit after a file is added.
I know why (the inotify event for the symlink gets in late,
after the commit); will try to improve commit frequency later.
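
One way to spot batch mode changes like an untar is to group change events by time. Under this approach, a run of events with no gap longer than some quiet period counts as one batch and gets one commit. The sketch below assumes each event carries a timestamp in seconds and that events arrive in time order. `batchBy` and its quiet-period parameter are hypothetical, not the watcher's actual batch detection.

```haskell
-- Group time-stamped events into batches. A new batch starts whenever the
-- gap since the previous event is longer than the quiet period (seconds).
-- Events are assumed to be in time order.
batchBy :: Double -> [(Double, a)] -> [[a]]
batchBy _ [] = []
batchBy quiet ((t0, x0) : rest) = go [x0] t0 rest
  where
    -- acc holds the current batch in reverse order.
    go acc _ [] = [reverse acc]
    go acc lastT ((t, x) : more)
      | t - lastT > quiet = reverse acc : go [x] t more
      | otherwise = go (x : acc) t more
```

A longer quiet period gives fewer commits, at the cost of more delay before each commit.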

## 9-11 pm

Put the capstone on the day's work, by calling lsof on a directory full
of hardlinks to the files that are about to be annexed, to check if any
are still open for write.
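
The quarantine step can be sketched with GHC's `unix` package. First, clear the file's write bits so that new attempts by non-root users to open it for write fail. Then hard-link it into a quarantine directory that lsof can be pointed at. Processes that already had the file open for write are unaffected by the permission change, and catching those is the job of the lsof check. `quarantine` and `release` are hypothetical helpers. The sketch assumes the quarantine directory is on the same filesystem as the file (hard links cannot cross filesystems) and ignores clashes between files that share a base name.

```haskell
import Data.Bits (complement)
import System.Directory (createDirectoryIfMissing, removeFile)
import System.FilePath (takeFileName, (</>))
import System.Posix.Files
  ( createLink
  , fileMode
  , getFileStatus
  , groupWriteMode
  , intersectFileModes
  , otherWriteMode
  , ownerWriteMode
  , setFileMode
  , unionFileModes
  )
import System.Posix.Types (FileMode)

-- All write permission bits: owner, group and other.
writeBits :: FileMode
writeBits = ownerWriteMode `unionFileModes` groupWriteMode `unionFileModes` otherWriteMode

-- Make a file read-only and hard-link it into the quarantine directory.
-- Returns the path of the link, which is what lsof will report.
quarantine :: FilePath -> FilePath -> IO FilePath
quarantine qdir file = do
  st <- getFileStatus file
  setFileMode file (fileMode st `intersectFileModes` complement writeBits)
  createDirectoryIfMissing True qdir
  let link = qdir </> takeFileName file
  createLink file link
  return link

-- For a file lsof showed is still open for write: turn the owner write bit
-- back on and remove its quarantine link, so it can be handled again when
-- the writer closes it. (Only the owner bit is restored in this sketch.)
release :: FilePath -> FilePath -> IO ()
release file link = do
  st <- getFileStatus file
  setFileMode file (fileMode st `unionFileModes` ownerWriteMode)
  removeFile link
```

Files whose links lsof reports as still open for write go through `release`, and the rest can be annexed.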

This works great! Starting up `git annex watch` when processes have files
open is no longer a problem, and even if you're evil enough to try having
multiple processes open the same file, it will complain and not annex it
until all the writers close it.

(Well, someone really evil could turn the write bit back on after git annex
clears it, and open the file again, but then really evil people can do
that to files in `.git/annex/objects` too, and they'll get their just
deserts when `git annex fsck` runs. So, that's ok..)

----

Anyway, will beat on it more tomorrow, and if all is well, this will finally
go out to the beta testers.
|
@ -0,0 +1,9 @@
[[!comment format=mdwn
username="http://dieter-be.myopenid.com/"
nickname="dieter"
subject="comment 1"
date="2012-06-16T09:14:26Z"
content="""
maybe at some point, your tool could show \"warning, the following files are still open and are hence not being annexed\"
to avoid any nasty surprises of a file not being annexed and the user not realizing it.
"""]]
|
@ -7,43 +7,6 @@ There is a `watch` branch in git that adds the command.

## known bugs

* A process has a file open for write, another one closes it,
  and so it's added. Then the first process modifies it.

  Or, if a process has a file open for write when `git annex watch` starts
  up, it will be added to the annex. If the process later continues
  writing, it will change content in the annex.

  This changes content in the annex, and fsck will later catch
  the inconsistency.

  Possible fixes:

  * Somehow track or detect if a file is open for write by any processes.
    `lsof` could be used, although it would be a little slow.

    Here's one way to avoid the slowdown: When a file is being added,
    set it read-only, and hard-link it into a quarantine directory,
    remembering both filenames.
    Then use the batch change mode code to detect batch adds and bundle
    them together.
    Just before committing, lsof the quarantine directory. Any files in
    it that are still open for write can just have their write bit turned
    back on and be deleted from quarantine, to be handled when their writer
    closes. Files that pass quarantine get added as usual. This avoids
    repeated lsof calls slowing down adds, but does add a constant factor
    overhead (0.25 seconds lsof call) before any add gets committed.

  * Or, when possible, making a copy-on-write copy before adding the file
    would avoid this.
  * Or, as a last resort, make an expensive copy of the file and add that.
  * Tracking file opens and closes with inotify could tell if any other
    processes have the file open. But there are problems.. It doesn't
    seem to differentiate between files opened for read and for write.
    And there would still be a race after the last close and before it's
    injected into the annex, where it could be opened for write again.
    Would need to detect that and undo the annex injection or something.

* If a file is checked into git as a normal file and gets modified
  (or merged, etc), it will be converted into an annexed file.
  See [[blog/day_7__bugfixes]]
@ -54,6 +17,51 @@ There is a `watch` branch in git that adds the command.

I'd also like to support OSX and if possible the BSDs.

* kqueue ([haskell bindings](http://hackage.haskell.org/package/kqueue))
  is supported by FreeBSD, OSX, and other BSDs.

  In kqueue, to watch for changes to a file, you have to have an open file
  descriptor to the file. This wouldn't scale.

  Apparently, a directory can be watched, and events are generated when
  files are added/removed from it. You then have to scan to find which
  files changed. [example](https://developer.apple.com/library/mac/#samplecode/FileNotification/Listings/Main_c.html#//apple_ref/doc/uid/DTS10003143-Main_c-DontLinkElementID_3)

  Gamin does the best it can with just kqueue, supplemented by polling.
  The source file `server/gam_kqueue.c` makes for interesting reading.
  Using gamin to do the heavy lifting is one option.
  ([haskell bindings](http://hackage.haskell.org/package/hlibfam) for FAM;
  gamin shares the API)

  kqueue does not seem to provide a way to tell when a file gets closed,
  only when it's initially created. Poses problems..

  * [man page](http://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&format=html)
  * <https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/kqueue.py> (good example program)

* hfsevents ([haskell bindings](http://hackage.haskell.org/package/hfsevents))
  is OSX specific.

  Originally it was only directory level, and you were only told a
  directory had changed and not which file. Based on the haskell
  binding's code, from OSX 10.7.0, file level events were added.

  This will be harder for me to develop for, since I don't have access to
  OSX machines..

  hfsevents does not seem to provide a way to tell when a file gets closed,
  only when it's initially created. Poses problems..

  * <https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/FSEvents_ProgGuide/Introduction/Introduction.html>
  * <http://pypi.python.org/pypi/MacFSEvents/0.2.8> (good example program)
  * <https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/fsevents.py> (good example program)

* Windows has a Win32 ReadDirectoryChangesW, and perhaps other things.
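
Since a kqueue event on a directory only says that something in it changed, a rescan has to work out which files changed. Here is a minimal sketch of that comparison. It assumes each snapshot maps file names to a (modification time, size) pair. `Snapshot`, `Change` and `diffSnapshots` are hypothetical names, and collecting a snapshot (for instance with `getDirectoryContents` and `getFileStatus`) is left out.

```haskell
import qualified Data.Map as Map

-- A directory listing: file name -> (modification time, size).
type Snapshot = Map.Map FilePath (Integer, Integer)

data Change = Added FilePath | Removed FilePath | Modified FilePath
  deriving (Show, Eq)

-- Compare the listing taken before a directory event with the one after it.
diffSnapshots :: Snapshot -> Snapshot -> [Change]
diffSnapshots old new =
  map Added (Map.keys (Map.difference new old))
    ++ map Removed (Map.keys (Map.difference old new))
    ++ [ Modified f
       | (f, (before, after)) <- Map.toList (Map.intersectionWith (,) old new)
       , before /= after
       ]
```

In this scheme a rename shows up as one removal plus one addition. A change that leaves both the modification time and the size the same is missed.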

## beyond Linux

I'd also like to support OSX and if possible the BSDs.

* kqueue ([haskell bindings](http://hackage.haskell.org/package/kqueue))
  is supported by FreeBSD, OSX, and other BSDs.
@ -171,3 +179,40 @@ Many races need to be dealt with by this code. Here are some of them.
- coalesce related add/rm events for speed and less disk IO **done**
- don't annex `.gitignore` and `.gitattributes` files **done**
- run as a daemon **done**
- A process has a file open for write, another one closes it,
  and so it's added. Then the first process modifies it.

  Or, if a process has a file open for write when `git annex watch` starts
  up, it will be added to the annex. If the process later continues
  writing, it will change content in the annex.

  This changes content in the annex, and fsck will later catch
  the inconsistency.

  Possible fixes:

  * Somehow track or detect if a file is open for write by any processes.
    `lsof` could be used, although it would be a little slow.

    Here's one way to avoid the slowdown: When a file is being added,
    set it read-only, and hard-link it into a quarantine directory,
    remembering both filenames.
    Then use the batch change mode code to detect batch adds and bundle
    them together.
    Just before committing, lsof the quarantine directory. Any files in
    it that are still open for write can just have their write bit turned
    back on and be deleted from quarantine, to be handled when their writer
    closes. Files that pass quarantine get added as usual. This avoids
    repeated lsof calls slowing down adds, but does add a constant factor
    overhead (0.25 seconds lsof call) before any add gets committed. **done**

  * Or, when possible, making a copy-on-write copy before adding the file
    would avoid this.
  * Or, as a last resort, make an expensive copy of the file and add that.
  * Tracking file opens and closes with inotify could tell if any other
    processes have the file open. But there are problems.. It doesn't
    seem to differentiate between files opened for read and for write.
    And there would still be a race after the last close and before it's
    injected into the annex, where it could be opened for write again.
    Would need to detect that and undo the annex injection or something.
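
The quarantine check in the **done** item above comes down to splitting each batch using lsof's answer. Here is a minimal sketch. It assumes the lsof output has already been reduced to the set of quarantine paths open in a write mode (`w` or `u` in lsof's access-mode field). `Quarantined` and `splitQuarantine` are hypothetical names.

```haskell
import Data.List (partition)
import qualified Data.Set as Set

-- A pending add: the work tree file and its hard link in the quarantine
-- directory. Both names are kept, as the design note says.
data Quarantined = Quarantined
  { workFile :: FilePath
  , quarantineLink :: FilePath
  }
  deriving (Show, Eq)

-- Split one batch into (ready to annex, still open for write). The caller
-- is expected to turn the write bit back on for the second list and drop
-- those files from quarantine until their writer closes them.
splitQuarantine :: Set.Set FilePath -> [Quarantined] -> ([Quarantined], [Quarantined])
splitQuarantine openForWrite = partition (not . stillOpen)
  where
    stillOpen q = quarantineLink q `Set.member` openForWrite
```

Because lsof runs once per batch rather than once per file, its cost (about 0.25 seconds, per the note above) is paid once per commit.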
|
@ -0,0 +1,16 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.6.135"
subject="comment 1"
date="2012-06-15T19:25:59Z"
content="""
Sure, you can simply:

    cp annexedfile ~

Or just attach the file right from the git repository to an email, like any other file. Should work fine.

If you wanted to copy a whole directory to export, you'd need to use the -L flag to make cp follow the symlinks and copy the real contents:

    cp -r -L annexeddirectory /media/usbdrive/
"""]]
|
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://denis.laxalde.org/"
nickname="dlax"
subject="nautilus"
date="2012-06-15T19:57:31Z"
content="""
Ah! I was fooled by nautilus which is not able to properly handle symlinks when copying. It copies links instead of target [[!gnomebug 623580]].
"""]]
|
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.6.135"
subject="comment 3"
date="2012-06-16T03:26:37Z"
content="""
That nautilus behavior is a bad thing when trying to export files out, but it's a good thing when just moving files around inside your repository...
"""]]