Merge branch 'master' into watch
commit
bf3339e5b7
6 changed files with 177 additions and 37 deletions
54
doc/design/assistant/blog/day_10__lsof.mdwn
Normal file
@@ -0,0 +1,54 @@
A rather frustrating and long day coding went like this:

## 1-3 pm

Wrote a single function, of which all any Haskell programmer needs to know
is its type signature:

    Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)]

When I'm spending another hour or two taking a unix utility like lsof and
parsing its output, which in this case is in a rather complicated
machine-parsable output format, I often wish unix streams were strongly
typed, which would avoid this bother.
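To give a flavour of the bother, here is a minimal sketch of parsing lsof's field output (`lsof -F`), where every line carries one field tagged by its first character (`p` pid, `c` command, `f` file descriptor, `a` access mode, `n` name). This is not the code from the commit: `OpenMode`, `ProcessInfo` and `parseFields` are hypothetical stand-ins, and the sketch assumes the name field arrives after the access field within each file's set of lines.

```haskell
-- Hypothetical stand-ins for the types in the signature above.
data OpenMode = OpenReadOnly | OpenWriteOnly | OpenReadWrite | OpenUnknown
    deriving (Eq, Show)

data ProcessInfo = ProcessInfo { procPid :: Int, procCmd :: String }
    deriving (Eq, Show)

-- lsof reports access as r (read), w (write) or u (read and write).
parseMode :: String -> OpenMode
parseMode "r" = OpenReadOnly
parseMode "w" = OpenWriteOnly
parseMode "u" = OpenReadWrite
parseMode _   = OpenUnknown

-- Fall back to 0 for an unparsable pid (a simplification of this sketch).
readPid :: String -> Int
readPid s = case reads s of
    [(n, "")] -> n
    _         -> 0

-- Walk the lines, remembering the current process and access mode,
-- and emit one result each time a name field is seen.
parseFields :: String -> [(FilePath, OpenMode, ProcessInfo)]
parseFields = go (ProcessInfo 0 "") OpenUnknown . lines
  where
    go _ _ [] = []
    go p m (l:ls) = case l of
        ('p':pid)  -> go (ProcessInfo (readPid pid) "") OpenUnknown ls
        ('c':cmd)  -> go (p { procCmd = cmd }) m ls
        ('f':_)    -> go p OpenUnknown ls       -- a new file set starts
        ('a':acc)  -> go p (parseMode acc) ls
        ('n':name) -> (name, m, p) : go p m ls
        _          -> go p m ls                 -- ignore other fields
```

Loaded into GHCi, `parseFields` can be fed a captured lsof output string. A real version would also have to cope with NUL-separated output and escaped characters in file names.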

## 3-9 pm

Six hours spent making it defer annexing files until the commit thread
wakes up and is about to make a commit. Why did it take so horribly long?
Well, there were a number of complications, and some really bad bugs
involving races that were hard to reproduce reliably enough to deal with.

In other words, I was lost in the weeds for a lot of those hours...

At one point, something glorious happened, and it was always making exactly
one commit for batch mode modifications of a lot of files (like untarring
them). Unfortunately, I had to lose that gloriousness due to another
potential race, which, while unlikely, would have made the program deadlock
if it happened.

So, it's back to making 2 or 3 commits per batch mode change. I also have a
buglet that sometimes causes a second empty commit after a file is added.
I know why (the inotify event for the symlink gets in late,
after the commit); will try to improve commit frequency later.

## 9-11 pm

Put the capstone on the day's work, by calling lsof on a directory full
of hardlinks to the files that are about to be annexed, to check if any
are still open for write.

This works great! Starting up `git annex watch` when processes have files
open is no longer a problem, and even if you're evil enough to try having
multiple processes open the same file, it will complain and not annex it
until all the writers close it.
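The check itself boils down to something like the following sketch. It assumes lsof results shaped like the triples that `Lsof.queryDir` returns. The `OpenMode` type and `splitOpenForWrite` are hypothetical names, and treating an unknown mode as writable is a cautious choice made for this sketch, not something taken from the real code.

```haskell
import Data.List (partition)

-- Hypothetical stand-in for the open mode reported by lsof.
data OpenMode = OpenReadOnly | OpenWriteOnly | OpenReadWrite | OpenUnknown
    deriving (Eq, Show)

-- Could a process holding the file in this mode still write to it?
-- Unknown is treated as writable, to err on the side of not annexing.
writable :: OpenMode -> Bool
writable OpenReadOnly = False
writable _            = True

-- Split the candidate files into (still open for write, safe to annex).
-- A file counts as busy if any process has it open in a writable mode.
splitOpenForWrite :: [(FilePath, OpenMode, p)] -> [FilePath] -> ([FilePath], [FilePath])
splitOpenForWrite lsofResults = partition (`elem` busy)
  where
    busy = [f | (f, m, _) <- lsofResults, writable m]
```

Files in the first list would be left alone until their writers close them; files in the second list can go on to be annexed.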

(Well, someone really evil could turn the write bit back on after git annex
clears it, and open the file again, but then really evil people can do
that to files in `.git/annex/objects` too, and they'll get their just
deserts when `git annex fsck` runs. So, that's ok..)

----

Anyway, will beat on it more tomorrow, and if all is well, this will finally
go out to the beta testers.
@@ -0,0 +1,9 @@
[[!comment format=mdwn
username="http://dieter-be.myopenid.com/"
nickname="dieter"
subject="comment 1"
date="2012-06-16T09:14:26Z"
content="""
maybe at some point, your tool could show \"warning, the following files are still open and are hence not being annexed\"
to avoid any nasty surprises of a file not being annexed and the user not realizing it.
"""]]
@@ -7,43 +7,6 @@ There is a `watch` branch in git that adds the command.

## known bugs

* A process has a file open for write, another one closes it,
  and so it's added. Then the first process modifies it.

  Or, if a process has a file open for write when `git annex watch` starts
  up, the file will be added to the annex. If the process later continues
  writing, it will change content in the annex.

  This changes content in the annex, and fsck will later catch
  the inconsistency.

  Possible fixes:

  * Somehow track or detect if a file is open for write by any processes.
    `lsof` could be used, although it would be a little slow.

    Here's one way to avoid the slowdown: When a file is being added,
    set it read-only, and hard-link it into a quarantine directory,
    remembering both filenames.
    Then use the batch change mode code to detect batch adds and bundle
    them together.
    Just before committing, lsof the quarantine directory. Any files in
    it that are still open for write can just have their write bit turned
    back on and be deleted from quarantine, to be handled when their writer
    closes. Files that pass quarantine get added as usual. This avoids
    repeated lsof calls slowing down adds, but does add a constant factor
    overhead (a 0.25 second lsof call) before any add gets committed.

  * Or, when possible, making a copy-on-write copy before adding the file
    would avoid this.
  * Or, as a last resort, make an expensive copy of the file and add that.
  * Tracking file opens and closes with inotify could tell if any other
    processes have the file open. But there are problems.. It doesn't
    seem to differentiate between files opened for read and for write.
    And there would still be a race after the last close and before it's
    injected into the annex, where it could be opened for write again.
    Would need to detect that and undo the annex injection or something.

* If a file is checked into git as a normal file and gets modified
  (or merged, etc), it will be converted into an annexed file.
  See [[blog/day_7__bugfixes]]
@@ -54,6 +17,51 @@ There is a `watch` branch in git that adds the command.

I'd also like to support OSX and if possible the BSDs.

* kqueue ([haskell bindings](http://hackage.haskell.org/package/kqueue))
  is supported by FreeBSD, OSX, and other BSDs.

  In kqueue, to watch for changes to a file, you have to have an open file
  descriptor to the file. This wouldn't scale.

  Apparently, a directory can be watched, and events are generated when
  files are added/removed from it. You then have to scan to find which
  files changed. [example](https://developer.apple.com/library/mac/#samplecode/FileNotification/Listings/Main_c.html#//apple_ref/doc/uid/DTS10003143-Main_c-DontLinkElementID_3)

  Gamin does the best it can with just kqueue, supplemented by polling.
  The source file `server/gam_kqueue.c` makes for interesting reading.
  Using gamin to do the heavy lifting is one option.
  ([haskell bindings](http://hackage.haskell.org/package/hlibfam) for FAM;
  gamin shares the API)

  kqueue does not seem to provide a way to tell when a file gets closed,
  only when it's initially created. Poses problems..

  * [man page](http://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&format=html)
  * <https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/kqueue.py> (good example program)

* hfsevents ([haskell bindings](http://hackage.haskell.org/package/hfsevents))
  is OSX specific.

  Originally it was only directory level, and you were only told a
  directory had changed and not which file. Based on the haskell
  binding's code, from OSX 10.7.0, file level events were added.

  This will be harder for me to develop for, since I don't have access to
  OSX machines..

  hfsevents does not seem to provide a way to tell when a file gets closed,
  only when it's initially created. Poses problems..

  * <https://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/FSEvents_ProgGuide/Introduction/Introduction.html>
  * <http://pypi.python.org/pypi/MacFSEvents/0.2.8> (good example program)
  * <https://github.com/gorakhargosh/watchdog/blob/master/src/watchdog/observers/fsevents.py> (good example program)

* Windows has a Win32 ReadDirectoryChangesW, and perhaps other things.
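The "scan to find which files changed" step that a directory-level kqueue event forces could look roughly like this sketch: keep the previous listing of the directory, take a new one when an event arrives, and diff the two. `DirChange`, `snapshot` and `diffListing` are hypothetical names. The sketch only notices added and removed entries; spotting modified files would need something more, such as comparing mtimes.

```haskell
import qualified Data.Set as S
import System.Directory (listDirectory)

-- What happened to a directory entry between two scans.
data DirChange = Added FilePath | Removed FilePath
    deriving (Eq, Show)

-- Record the current set of entries in a directory.
snapshot :: FilePath -> IO (S.Set FilePath)
snapshot dir = S.fromList <$> listDirectory dir

-- Compare the previous snapshot with the new one.
-- Removals are listed first, then additions, each in sorted order.
diffListing :: S.Set FilePath -> S.Set FilePath -> [DirChange]
diffListing old new =
    map Removed (S.toList (old `S.difference` new))
        ++ map Added (S.toList (new `S.difference` old))
```

The rescan costs time proportional to the size of the directory on every event, which is part of why this approach is less attractive than inotify's per-file events.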

## beyond Linux

I'd also like to support OSX and if possible the BSDs.

* kqueue ([haskell bindings](http://hackage.haskell.org/package/kqueue))
  is supported by FreeBSD, OSX, and other BSDs.

@@ -171,3 +179,40 @@ Many races need to be dealt with by this code. Here are some of them.
- coalesce related add/rm events for speed and less disk IO **done**
- don't annex `.gitignore` and `.gitattributes` files **done**
- run as a daemon **done**
- A process has a file open for write, another one closes it,
  and so it's added. Then the first process modifies it.

  Or, if a process has a file open for write when `git annex watch` starts
  up, the file will be added to the annex. If the process later continues
  writing, it will change content in the annex.

  This changes content in the annex, and fsck will later catch
  the inconsistency.

  Possible fixes:

  * Somehow track or detect if a file is open for write by any processes.
    `lsof` could be used, although it would be a little slow.

    Here's one way to avoid the slowdown: When a file is being added,
    set it read-only, and hard-link it into a quarantine directory,
    remembering both filenames.
    Then use the batch change mode code to detect batch adds and bundle
    them together.
    Just before committing, lsof the quarantine directory. Any files in
    it that are still open for write can just have their write bit turned
    back on and be deleted from quarantine, to be handled when their writer
    closes. Files that pass quarantine get added as usual. This avoids
    repeated lsof calls slowing down adds, but does add a constant factor
    overhead (a 0.25 second lsof call) before any add gets committed. **done**

  * Or, when possible, making a copy-on-write copy before adding the file
    would avoid this.
  * Or, as a last resort, make an expensive copy of the file and add that.
  * Tracking file opens and closes with inotify could tell if any other
    processes have the file open. But there are problems.. It doesn't
    seem to differentiate between files opened for read and for write.
    And there would still be a race after the last close and before it's
    injected into the annex, where it could be opened for write again.
    Would need to detect that and undo the annex injection or something.
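The quarantine steps from the first possible fix can be sketched as below, using the `unix`, `directory` and `filepath` packages. This is an illustration under assumptions, not the code that was committed: `quarantine` and `release` are hypothetical names, and naming the hard link after the file's basename is a simplification that would collide for two files with the same name in different directories.

```haskell
import Data.Bits (complement)
import System.Directory (createDirectoryIfMissing)
import System.FilePath ((</>), takeFileName)
import System.Posix.Files
import System.Posix.Types (FileMode)

-- All three write permission bits.
writeBits :: FileMode
writeBits = ownerWriteMode `unionFileModes` groupWriteMode `unionFileModes` otherWriteMode

-- Set a file read-only by clearing every write bit.
preventWrite :: FilePath -> IO ()
preventWrite f = do
    s <- getFileStatus f
    setFileMode f (fileMode s `intersectFileModes` complement writeBits)

-- Turn the owner's write bit back on.
allowWrite :: FilePath -> IO ()
allowWrite f = do
    s <- getFileStatus f
    setFileMode f (fileMode s `unionFileModes` ownerWriteMode)

-- Make the file read-only and hard-link it into the quarantine
-- directory, returning both filenames so they can be remembered.
quarantine :: FilePath -> FilePath -> IO (FilePath, FilePath)
quarantine qdir f = do
    createDirectoryIfMissing True qdir
    preventWrite f
    let q = qdir </> takeFileName f
    createLink f q
    return (f, q)

-- For a file lsof found still open for write: remove it from
-- quarantine and make the original writable again.
release :: (FilePath, FilePath) -> IO ()
release (f, q) = do
    removeLink q
    allowWrite f
```

Just before a commit, lsof would be run once over the quarantine directory. Files it reports as open for write go through `release`, and the rest are added as usual.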

@@ -0,0 +1,16 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.6.135"
subject="comment 1"
date="2012-06-15T19:25:59Z"
content="""
Sure, you can simply:

    cp annexedfile ~

Or just attach the file right from the git repository to an email, like any other file. Should work fine.

If you wanted to copy a whole directory to export, you'd need to use the -L flag to make cp follow the symlinks and copy the real contents:

    cp -r -L annexeddirectory /media/usbdrive/
"""]]
@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://denis.laxalde.org/"
nickname="dlax"
subject="nautilus"
date="2012-06-15T19:57:31Z"
content="""
Ah! I was fooled by nautilus which is not able to properly handle symlinks when copying. It copies links instead of target [[!gnomebug 623580]].
"""]]
@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="4.154.6.135"
subject="comment 3"
date="2012-06-16T03:26:37Z"
content="""
That nautilus behavior is a bad thing when trying to export files out, but it's a good thing when just moving files around inside your repository...
"""]]