From 3a05b66cf9197ab3fcac7753f38d1d92429f6f16 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Fri, 15 Jun 2012 19:25:59 +0000 Subject: [PATCH 1/8] Added a comment --- ...t_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 doc/forum/exporting_annexed_files/comment_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment diff --git a/doc/forum/exporting_annexed_files/comment_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment b/doc/forum/exporting_annexed_files/comment_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment new file mode 100644 index 0000000000..69fc46245f --- /dev/null +++ b/doc/forum/exporting_annexed_files/comment_1_e08e4c79588e17fb2f1cdf53d9fab7ea._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="4.154.6.135" + subject="comment 1" + date="2012-06-15T19:25:59Z" + content=""" +Sure, you can simply: + + cp annexedfile ~ + +Or just attach the file right from the git repository to an email, like any other file. Should work fine. 
+ +If you wanted to copy a whole directory to export, you'd need to use the -L flag to make cp follow the symlinks and copy the real contents: + + cp -r -L annexeddirectory /media/usbdrive/ +"""]] From 0641d6053321b1ef3049554929065402e3d25f6d Mon Sep 17 00:00:00 2001 From: "http://denis.laxalde.org/" Date: Fri, 15 Jun 2012 19:57:31 +0000 Subject: [PATCH 2/8] Added a comment: nautilus --- .../comment_2_15dc3024417b5b2ff3544a08beacab34._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/forum/exporting_annexed_files/comment_2_15dc3024417b5b2ff3544a08beacab34._comment diff --git a/doc/forum/exporting_annexed_files/comment_2_15dc3024417b5b2ff3544a08beacab34._comment b/doc/forum/exporting_annexed_files/comment_2_15dc3024417b5b2ff3544a08beacab34._comment new file mode 100644 index 0000000000..3621f9b895 --- /dev/null +++ b/doc/forum/exporting_annexed_files/comment_2_15dc3024417b5b2ff3544a08beacab34._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://denis.laxalde.org/" + nickname="dlax" + subject="nautilus" + date="2012-06-15T19:57:31Z" + content=""" +Ah! I was fooled by nautilus which is not able to properly handle symlinks when copying. It copies links instead of target [[!gnomebug 623580]]. +"""]] From bd8319e78ccf13d52ecf14b4f0c86ebf141671ab Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 15 Jun 2012 22:59:32 -0400 Subject: [PATCH 3/8] update and blog for the day the last of the bad bugs is fixed! 
--- doc/design/assistant/blog/day_10__lsof.mdwn | 54 +++++++++++++++ doc/design/assistant/inotify.mdwn | 74 ++++++++++----------- 2 files changed, 91 insertions(+), 37 deletions(-) create mode 100644 doc/design/assistant/blog/day_10__lsof.mdwn diff --git a/doc/design/assistant/blog/day_10__lsof.mdwn b/doc/design/assistant/blog/day_10__lsof.mdwn new file mode 100644 index 0000000000..32b6705714 --- /dev/null +++ b/doc/design/assistant/blog/day_10__lsof.mdwn @@ -0,0 +1,54 @@ +A rather frustrating and long day coding went like this: + +## 1-3 pm + +Wrote a single function, of which all any Haskell programmer needs to know +is its type signature: + + Lsof.queryDir :: FilePath -> IO [(FilePath, LsofOpenMode, ProcessInfo)] + +When I'm spending another hour or two taking a unix utility like lsof and +parsing its output, which in this case is in a rather complicated +machine-parsable output format, I often wish unix streams were strongly +typed, which would avoid this bother. + +## 3-9 pm + +Six hours spent making it defer annexing files until the commit thread +wakes up and is about to make a commit. Why did it take so horribly long? +Well, there were a number of complications, and some really bad bugs +involving races that were hard to reproduce reliably enough to deal with. + +In other words, I was lost in the weeds for a lot of those hours... + +At one point, something glorious happened, and it was always making exactly +one commit for batch mode modifications of a lot of files (like untarring +them). Unfortunately, I had to lose that gloriousness due to another +potential race, which, while unlikely, would have made the program deadlock +if it happened. + +So, it's back to making 2 or 3 commits per batch mode change. I also have a +buglet that sometimes causes a second empty commit after a file is added. +I know why (the inotify event for the symlink gets in late, +after the commit); will try to improve commit frequency later. 
+ +## 9-11 pm + +Put the capstone on the day's work, by calling lsof on a directory full +of hardlinks to the files that are about to be annexed, to check if any +are still open for write. + +This works great! Starting up `git annex watch` when processes have files +open is no longer a problem, and even if you're evil enough to try having +multiple processes open the same file, it will complain and not annex it +until all the writers close it. + +(Well, someone really evil could turn the write bit back on after git annex +clears it, and open the file again, but then really evil people can do +that to files in `.git/annex/objects` too, and they'll get their just +deserts when `git annex fsck` runs. So, that's ok..) + +---- + +Anyway, will beat on it more tomorrow, and if all is well, this will finally +go out to the beta testers. diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn index baa420b4e9..0b0eb430c0 100644 --- a/doc/design/assistant/inotify.mdwn +++ b/doc/design/assistant/inotify.mdwn @@ -5,43 +5,6 @@ There is a `watch` branch in git that adds the command. ## known bugs -* A process has a file open for write, another one closes it, - and so it's added. Then the first process modifies it. - - Or, a process has a file open for write when `git annex watch` starts - up, it will be added to the annex. If the process later continues - writing, it will change content in the annex. - - This changes content in the annex, and fsck will later catch - the inconsistency. - - Possible fixes: - - * Somehow track or detect if a file is open for write by any processes. - `lsof` could be used, although it would be a little slow. - - Here's one way to avoid the slowdown: When a file is being added, - set it read-only, and hard-link it into a quarantine directory, - remembering both filenames. - Then use the batch change mode code to detect batch adds and bundle - them together. - Just before committing, lsof the quarantine directory. 
Any files in - it that are still open for write can just have their write bit turned - back on and be deleted from quarantine, to be handled when their writer - closes. Files that pass quarantine get added as usual. This avoids - repeated lsof calls slowing down adds, but does add a constant factor - overhead (0.25 seconds lsof call) before any add gets committed. - - * Or, when possible, making a copy on write copy before adding the file - would avoid this. - * Or, as a last resort, make an expensive copy of the file and add that. - * Tracking file opens and closes with inotify could tell if any other - processes have the file open. But there are problems.. It doesn't - seem to differentiate between files opened for read and for write. - And there would still be a race after the last close and before it's - injected into the annex, where it could be opened for write again. - Would need to detect that and undo the annex injection or something. - * If a file is checked into git as a normal file and gets modified (or merged, etc), it will be converted into an annexed file. See [[blog/day_7__bugfixes]] @@ -140,3 +103,40 @@ Many races need to be dealt with by this code. Here are some of them. - coleasce related add/rm events for speed and less disk IO **done** - don't annex `.gitignore` and `.gitattributes` files **done** - run as a daemon **done** +- A process has a file open for write, another one closes it, + and so it's added. Then the first process modifies it. + + Or, a process has a file open for write when `git annex watch` starts + up, it will be added to the annex. If the process later continues + writing, it will change content in the annex. + + This changes content in the annex, and fsck will later catch + the inconsistency. + + Possible fixes: + + * Somehow track or detect if a file is open for write by any processes. + `lsof` could be used, although it would be a little slow. 
+ + Here's one way to avoid the slowdown: When a file is being added, + set it read-only, and hard-link it into a quarantine directory, + remembering both filenames. + Then use the batch change mode code to detect batch adds and bundle + them together. + Just before committing, lsof the quarantine directory. Any files in + it that are still open for write can just have their write bit turned + back on and be deleted from quarantine, to be handled when their writer + closes. Files that pass quarantine get added as usual. This avoids + repeated lsof calls slowing down adds, but does add a constant factor + overhead (0.25 seconds lsof call) before any add gets committed. **done** + + * Or, when possible, making a copy on write copy before adding the file + would avoid this. + * Or, as a last resort, make an expensive copy of the file and add that. + * Tracking file opens and closes with inotify could tell if any other + processes have the file open. But there are problems.. It doesn't + seem to differentiate between files opened for read and for write. + And there would still be a race after the last close and before it's + injected into the annex, where it could be opened for write again. + Would need to detect that and undo the annex injection or something. 
+ From eaf4cbec541673c25cfbb2104eb67ac5818de1b1 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sat, 16 Jun 2012 03:26:37 +0000 Subject: [PATCH 4/8] Added a comment --- .../comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/forum/exporting_annexed_files/comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment diff --git a/doc/forum/exporting_annexed_files/comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment b/doc/forum/exporting_annexed_files/comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment new file mode 100644 index 0000000000..db6f90d881 --- /dev/null +++ b/doc/forum/exporting_annexed_files/comment_3_86f0e0f767a84a0f583e121d36cb7d48._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="4.154.6.135" + subject="comment 3" + date="2012-06-16T03:26:37Z" + content=""" +That nautilus behavior is a bad thing when trying to export files out, but it's a good thing when just moving files around inside your repository... 
+"""]] From 2a9c5ebfaa16799c2d45555be6414f4a45b893e7 Mon Sep 17 00:00:00 2001 From: "http://dieter-be.myopenid.com/" Date: Sat, 16 Jun 2012 09:14:27 +0000 Subject: [PATCH 5/8] Added a comment --- .../comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 doc/design/assistant/blog/day_10__lsof/comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment diff --git a/doc/design/assistant/blog/day_10__lsof/comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment b/doc/design/assistant/blog/day_10__lsof/comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment new file mode 100644 index 0000000000..9d970da22e --- /dev/null +++ b/doc/design/assistant/blog/day_10__lsof/comment_1_9b8c28c85c979f32e5c295b6a03c048e._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="http://dieter-be.myopenid.com/" + nickname="dieter" + subject="comment 1" + date="2012-06-16T09:14:26Z" + content=""" +maybe at some point, your tool could show \"warning, the following files are still open and are hence not being annexed\" +to avoid any nasty surprises of a file not being annexed and the user not realizing it. +"""]] From da261b31de349f99e518f590017f3a10078bc6dd Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 17 Jun 2012 01:25:48 -0400 Subject: [PATCH 6/8] surveyed the OSX and BSD options for file monitoring --- doc/design/assistant/inotify.mdwn | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn index 0b0eb430c0..2cd6654488 100644 --- a/doc/design/assistant/inotify.mdwn +++ b/doc/design/assistant/inotify.mdwn @@ -3,6 +3,8 @@ inotify for changes, and automatically annexing new files, etc. There is a `watch` branch in git that adds the command. +[[!toc]] + ## known bugs * If a file is checked into git as a normal file and gets modified @@ -11,6 +13,35 @@ There is a `watch` branch in git that adds the command. 
* When you `git annex unlock` a file, it will immediately be re-locked. +## beyond Linux + +I'd also like to support OSX and if possible the BSDs. + +* kqueue ([haskell bindings](http://hackage.haskell.org/package/kqueue)) + is supported by FreeBSD, OSX, and other BSDs. + + From what I can find, kqueue does not provide full directory watching + capabilities. To watch a file, you have to have an open file descriptor + to the file. This wouldn't scale. + + Gamin does the best it can with just kqueue, supplimented by polling. + The source file `server/gam_kqueue.c` makes for interesting reading. + Using gamin to do the heavy lifting is one option. + ([haskell bindings](http://hackage.haskell.org/package/hlibfam) for FAM; + gamin shares the API) + +* hfsevents ([haskell bindings](http://hackage.haskell.org/package/hfsevents)) + is OSX specific. + + Originally it was only directory level, and you were only told a + directory had changed and not which file. Based on the haskell + binding's code, from OSX 10.7.0, file level events were added. + + This will be harder for me to develop for, since I don't have access to + OSX machines.. + +* Windows has a Win32 ReadDirectoryChangesW, and perhaps other things. + ## todo - Support OSes other than Linux; it only uses inotify currently. From 31c15aa9b9285733c6874bf5c7a89fccdd01b5d2 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 17 Jun 2012 01:34:10 -0400 Subject: [PATCH 7/8] update --- doc/design/assistant/inotify.mdwn | 3 +++ 1 file changed, 3 insertions(+) diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn index 2cd6654488..e549c4d9d9 100644 --- a/doc/design/assistant/inotify.mdwn +++ b/doc/design/assistant/inotify.mdwn @@ -40,6 +40,9 @@ I'd also like to support OSX and if possible the BSDs. This will be harder for me to develop for, since I don't have access to OSX machines.. 
+ [This perl module](http://search.cpan.org/~agrundma/Mac-FSEvents-0.02/lib/Mac/FSEvents/Event.pm) + has the best description I've found of what the flags mean. + * Windows has a Win32 ReadDirectoryChangesW, and perhaps other things. ## todo From ec197feec062c59760a931aafb5d3087b921999a Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 17 Jun 2012 02:08:28 -0400 Subject: [PATCH 8/8] update --- doc/design/assistant/inotify.mdwn | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn index e549c4d9d9..cae6298094 100644 --- a/doc/design/assistant/inotify.mdwn +++ b/doc/design/assistant/inotify.mdwn @@ -20,9 +20,12 @@ I'd also like to support OSX and if possible the BSDs. * kqueue ([haskell bindings](http://hackage.haskell.org/package/kqueue)) is supported by FreeBSD, OSX, and other BSDs. - From what I can find, kqueue does not provide full directory watching - capabilities. To watch a file, you have to have an open file descriptor - to the file. This wouldn't scale. + In kqueue, to watch for changes to a file, you have to have an open file + descriptor to the file. This wouldn't scale. + + Apparently, a directory can be watched, and events are generated when + files are added/removed from it. You then have to scan to find which + files changed. [example](https://developer.apple.com/library/mac/#samplecode/FileNotification/Listings/Main_c.html#//apple_ref/doc/uid/DTS10003143-Main_c-DontLinkElementID_3) Gamin does the best it can with just kqueue, supplimented by polling. The source file `server/gam_kqueue.c` makes for interesting reading. @@ -30,6 +33,12 @@ I'd also like to support OSX and if possible the BSDs. ([haskell bindings](http://hackage.haskell.org/package/hlibfam) for FAM; gamin shares the API) + kqueue does not seem to provide a way to tell when a file gets closed, + only when it's initially created. Poses problems.. 
+ + * [man page](http://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&format=html) + * (good example program) + * hfsevents ([haskell bindings](http://hackage.haskell.org/package/hfsevents)) is OSX specific. @@ -40,8 +49,12 @@ I'd also like to support OSX and if possible the BSDs. This will be harder for me to develop for, since I don't have access to OSX machines.. - [This perl module](http://search.cpan.org/~agrundma/Mac-FSEvents-0.02/lib/Mac/FSEvents/Event.pm) - has the best description I've found of what the flags mean. + hfsevents does not seem to provide a way to tell when a file gets closed, + only when it's initially created. Poses problems.. + + * + * (good example program) + * (good example program) * Windows has a Win32 ReadDirectoryChangesW, and perhaps other things.