From 1f09ae686ef35f8fd2d973754f8e1efd99161f4a Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Wed, 27 Jun 2012 21:11:39 -0400 Subject: [PATCH 01/23] update --- doc/design/assistant/inotify.mdwn | 11 +++++++++++ doc/design/assistant/syncing.mdwn | 2 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn index 47b8c84a34..f783f9a7df 100644 --- a/doc/design/assistant/inotify.mdwn +++ b/doc/design/assistant/inotify.mdwn @@ -125,6 +125,17 @@ Many races need to be dealt with by this code. Here are some of them. Not a problem; The removal event removes the old file from the index, and the add event adds the new one. +* Symlink appears, but is then deleted before it can be processed. + + Leads to an ugly message, otherwise no problem: + + ./me: readSymbolicLink: does not exist (No such file or directory) + + Here `me` is a file that was in a conflicted merge, which got + removed as part of the resolution. This is probably coming from the watcher + thread, which sees the newly added symlink (created by the git merge), + but finds it deleted (by the conflict resolver) by the time it processes it. + ## done - on startup, add any files that have appeared since last run **done** diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 3e90e6b105..8b681ac100 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -17,7 +17,7 @@ all the other git clones, at both the git level and the key/value level. 1. Also, detect if a push failed due to not being up-to-date, pull, and repush. **done** 2. Use a git merge driver that adds both conflicting files, - so conflicts never break a sync. + so conflicts never break a sync. **done** 3. Investigate the XMPP approach like dvcs-autosync does, or other ways of signaling a change out of band. 4. Add a hook, so when there's a change to sync, a program can be run From b2327f04c6484d47ad2bf7194934af956fe9e953 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawmURXBzaYE1gmVc-X9eLAyDat_6rHPl670" Date: Thu, 28 Jun 2012 12:37:29 +0000 Subject: [PATCH 02/23] --- doc/bugs/watcher_commits_unlocked_files.mdwn | 43 ++++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 doc/bugs/watcher_commits_unlocked_files.mdwn diff --git a/doc/bugs/watcher_commits_unlocked_files.mdwn b/doc/bugs/watcher_commits_unlocked_files.mdwn new file mode 100644 index 0000000000..b807593763 --- /dev/null +++ b/doc/bugs/watcher_commits_unlocked_files.mdwn @@ -0,0 +1,43 @@ +When having "git annex watch" running, unlocking files causes the watcher to immediately lock/commit them. Observe: + + bram@falafel% git annex unlock + unlock 01 - Crunchy Joe (featuring Sakhile Moleshe).flac (copying...) ok + unlock 02 - Get Busy Living (featuring Emily Bruce).flac (copying...) ok + unlock 03 - Show You How.flac (copying...) ok + unlock 04 - Call Me (featuring Monique Hellenberg).flac (copying...) ok + unlock 05 - Humbug (featuring Sakhile Moleshe).flac (copying...) ok + unlock 06 - Brush Your Hair.flac (copying...) ok + unlock 07 - We Come Together (featuring Sakhile Moleshe).flac (copying...) ok + unlock 08 - In Too Deep (featuring Emily Bruce).flac (copying...) ok + unlock 09 - My Rainbow.flac (copying...) ok + unlock 10 - Big Band Wolf.flac (copying...) ok + (Recording state in git...) 
+ bram@falafel% ls -l 01\ -\ Crunchy\ Joe\ \(featuring\ Sakhile\ Moleshe\).flac + lrwxrwxrwx 1 bram bram 208 Jul 18 2011 01 - Crunchy Joe (featuring Sakhile Moleshe).flac -> ../../.git/annex/objects/KX/15/SHA256E-s23981083--5ffd30042e313f8e10cf51ded59c369dd03a600fa3b8c13962f833694af449b5.flac/SHA256E-s23981083--5ffd30042e313f8e10cf51ded59c369dd03a600fa3b8c13962f833694af449b5.flac + bram@falafel% tail ~/Media/.git/annex/daemon.log + add ./Uncategorized/Goldfish - Get Busy Living (2010)/04 - Call Me (featuring Monique Hellenberg).flac (checksum...) ok + add ./Uncategorized/Goldfish - Get Busy Living (2010)/03 - Show You How.flac (checksum...) ok + add ./Uncategorized/Goldfish - Get Busy Living (2010)/02 - Get Busy Living (featuring Emily Bruce).flac (checksum...) ok + add ./Uncategorized/Goldfish - Get Busy Living (2010)/10 - Big Band Wolf.flac (checksum...) ok + add ./Uncategorized/Goldfish - Get Busy Living (2010)/09 - My Rainbow.flac (checksum...) ok + add ./Uncategorized/Goldfish - Get Busy Living (2010)/08 - In Too Deep (featuring Emily Bruce).flac (checksum...) ok + add ./Uncategorized/Goldfish - Get Busy Living (2010)/07 - We Come Together (featuring Sakhile Moleshe).flac (checksum...) ok + add ./Uncategorized/Goldfish - Get Busy Living (2010)/06 - Brush Your Hair.flac (checksum...) ok + (Recording state in git...) + (Recording state in git...) + bram@falafel% git annex watch --stop + bram@falafel% git annex unlock + unlock 01 - Crunchy Joe (featuring Sakhile Moleshe).flac (copying...) ok + unlock 02 - Get Busy Living (featuring Emily Bruce).flac (copying...) ok + unlock 03 - Show You How.flac (copying...) ok + unlock 04 - Call Me (featuring Monique Hellenberg).flac (copying...) ok + unlock 05 - Humbug (featuring Sakhile Moleshe).flac (copying...) ok + unlock 06 - Brush Your Hair.flac (copying...) ok + unlock 07 - We Come Together (featuring Sakhile Moleshe).flac (copying...) ok + unlock 08 - In Too Deep (featuring Emily Bruce).flac (copying...) ok + unlock 09 - My Rainbow.flac (copying...) ok + unlock 10 - Big Band Wolf.flac (copying...) ok + bram@falafel% ls -l 01\ -\ Crunchy\ Joe\ \(featuring\ Sakhile\ Moleshe\).flac + -rw-r--r-- 1 bram bram 23981083 Jul 18 2011 01 - Crunchy Joe (featuring Sakhile Moleshe).flac + +This is using git-annex 3.20120624 on Ubuntu, compiled with cabal (I upgraded my libghc-stm-dev package, as you mentioned in another bug, to get the watch command working on this version). 
From e4596a133e1c6781bd8dd369448f11dc602d0d28 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" Date: Thu, 28 Jun 2012 13:39:19 +0000 Subject: [PATCH 03/23] Added a comment --- .../comment_1_f70e1912fde0eee59e208307df06b503._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment diff --git a/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment b/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment new file mode 100644 index 0000000000..a06b8fe822 --- /dev/null +++ b/doc/bugs/watcher_commits_unlocked_files/comment_1_f70e1912fde0eee59e208307df06b503._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" + nickname="Jimmy" + subject="comment 1" + date="2012-06-28T13:39:18Z" + content=""" +That is a known problem/bug which is listed at [[design/assistant/inotify]] +"""]] From 343ecf999a1ecb700ba2973763fc9237576dcc1c Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 28 Jun 2012 14:00:25 -0400 Subject: [PATCH 04/23] post my current set of ideas for handling unlocking --- doc/bugs/watcher_commits_unlocked_files.mdwn | 67 ++++++++------------ 1 file changed, 26 insertions(+), 41 deletions(-) diff --git a/doc/bugs/watcher_commits_unlocked_files.mdwn b/doc/bugs/watcher_commits_unlocked_files.mdwn index b807593763..ef64921f1a 100644 --- a/doc/bugs/watcher_commits_unlocked_files.mdwn +++ b/doc/bugs/watcher_commits_unlocked_files.mdwn @@ -1,43 +1,28 @@ -When having "git annex watch" running, unlocking files causes the watcher to immediately lock/commit them. Observe: +When having "git annex watch" running, unlocking files causes the watcher +to immediately lock/commit them. - bram@falafel% git annex unlock - unlock 01 - Crunchy Joe (featuring Sakhile Moleshe).flac (copying...) ok - unlock 02 - Get Busy Living (featuring Emily Bruce).flac (copying...) ok - unlock 03 - Show You How.flac (copying...) ok - unlock 04 - Call Me (featuring Monique Hellenberg).flac (copying...) ok - unlock 05 - Humbug (featuring Sakhile Moleshe).flac (copying...) ok - unlock 06 - Brush Your Hair.flac (copying...) ok - unlock 07 - We Come Together (featuring Sakhile Moleshe).flac (copying...) ok - unlock 08 - In Too Deep (featuring Emily Bruce).flac (copying...) ok - unlock 09 - My Rainbow.flac (copying...) ok - unlock 10 - Big Band Wolf.flac (copying...) ok - (Recording state in git...) - bram@falafel% ls -l 01\ -\ Crunchy\ Joe\ \(featuring\ Sakhile\ Moleshe\).flac - lrwxrwxrwx 1 bram bram 208 Jul 18 2011 01 - Crunchy Joe (featuring Sakhile Moleshe).flac -> ../../.git/annex/objects/KX/15/SHA256E-s23981083--5ffd30042e313f8e10cf51ded59c369dd03a600fa3b8c13962f833694af449b5.flac/SHA256E-s23981083--5ffd30042e313f8e10cf51ded59c369dd03a600fa3b8c13962f833694af449b5.flac - bram@falafel% tail ~/Media/.git/annex/daemon.log - add ./Uncategorized/Goldfish - Get Busy Living (2010)/04 - Call Me (featuring Monique Hellenberg).flac (checksum...) ok - add ./Uncategorized/Goldfish - Get Busy Living (2010)/03 - Show You How.flac (checksum...) ok - add ./Uncategorized/Goldfish - Get Busy Living (2010)/02 - Get Busy Living (featuring Emily Bruce).flac (checksum...) ok - add ./Uncategorized/Goldfish - Get Busy Living (2010)/10 - Big Band Wolf.flac (checksum...) 
ok - add ./Uncategorized/Goldfish - Get Busy Living (2010)/09 - My Rainbow.flac (checksum...) ok - add ./Uncategorized/Goldfish - Get Busy Living (2010)/08 - In Too Deep (featuring Emily Bruce).flac (checksum...) ok - add ./Uncategorized/Goldfish - Get Busy Living (2010)/07 - We Come Together (featuring Sakhile Moleshe).flac (checksum...) ok - add ./Uncategorized/Goldfish - Get Busy Living (2010)/06 - Brush Your Hair.flac (checksum...) ok - (Recording state in git...) - (Recording state in git...) - bram@falafel% git annex watch --stop - bram@falafel% git annex unlock - unlock 01 - Crunchy Joe (featuring Sakhile Moleshe).flac (copying...) ok - unlock 02 - Get Busy Living (featuring Emily Bruce).flac (copying...) ok - unlock 03 - Show You How.flac (copying...) ok - unlock 04 - Call Me (featuring Monique Hellenberg).flac (copying...) ok - unlock 05 - Humbug (featuring Sakhile Moleshe).flac (copying...) ok - unlock 06 - Brush Your Hair.flac (copying...) ok - unlock 07 - We Come Together (featuring Sakhile Moleshe).flac (copying...) ok - unlock 08 - In Too Deep (featuring Emily Bruce).flac (copying...) ok - unlock 09 - My Rainbow.flac (copying...) ok - unlock 10 - Big Band Wolf.flac (copying...) ok - bram@falafel% ls -l 01\ -\ Crunchy\ Joe\ \(featuring\ Sakhile\ Moleshe\).flac - -rw-r--r-- 1 bram bram 23981083 Jul 18 2011 01 - Crunchy Joe (featuring Sakhile Moleshe).flac +---- -This is using git-annex 3.20120624 on Ubuntu, compiled with cabal (I upgraded my libghc-stm-dev package, as you mentioned in another bug, to get the watch command working on this version). +Possible approaches: + +* The watcher could detect unlocked files by checking if newly added files + are a typechange of a file already in git. But this would add git overhead + to every file add. +* `git annex unlock` could add some type of flag file, which the assistant + could check. This would work fine, for users who want to use `git annex + unlock` with the assistant. That's probably not simple enough for most + users, though. +* There could be a UI in the assistant to pick a file and unlock it. + The assistant would have its own list of files it knows are unlocked. + But I'm trying to avoid mandatory UI to use the assistant. +* Perhaps instead, have a directory, like "edit". The assistant could notice + when files move into this special directory, and automatically unlock them. + Then when they're moved out, automatically commit them. +* Alternatively, files that are moved out of the repository entirely could be + automatically unlocked, and then when they're moved back in, it would + automatically do the right thing. This may be worth implementing in + combination with the "edit" directory, as different use cases would work + better with one or the other. However, I don't currently get inotify + events when files are moved out of the repository (well, I do, but it + just says "file moved", with no forwarding address, so I don't know + how to find the file to unlock it. From 6cc3eb97dbd665bdaedf0a28f315d62c169dbe9d Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 28 Jun 2012 14:06:22 -0400 Subject: [PATCH 05/23] update --- doc/design/assistant/inotify.mdwn | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/doc/design/assistant/inotify.mdwn b/doc/design/assistant/inotify.mdwn index f783f9a7df..7b600090ad 100644 --- a/doc/design/assistant/inotify.mdwn +++ b/doc/design/assistant/inotify.mdwn @@ -8,13 +8,15 @@ available! 
* If a file is checked into git as a normal file and gets modified (or merged, etc), it will be converted into an annexed file. - See [[blog/day_7__bugfixes]] + See [[blog/day_7__bugfixes]]. * When you `git annex unlock` a file, it will immediately be re-locked. + See [[bugs/watcher_commits_unlocked_files]]. * Kqueue has to open every directory it watches, so too many directories will run it out of the max number of open files (typically 1024), and fail. I may need to fork off multiple watcher processes to handle this. + See [[bugs/Issue_on_OSX_with_some_system_limits]]. ## beyond Linux @@ -42,6 +44,8 @@ I'd also like to support OSX and if possible the BSDs. * [man page](http://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&sektion=0&format=html) * (good example program) + *kqueue is now supported* + * hfsevents ([haskell bindings](http://hackage.haskell.org/package/hfsevents)) is OSX specific. @@ -71,9 +75,6 @@ I'd also like to support OSX and if possible the BSDs. - honor .gitignore, not adding files it excludes (difficult, probably needs my own .gitignore parser to avoid excessive running of git commands to check for ignored files) -- Possibly, when a directory is moved out of the annex location, - unannex its contents. (Does inotify tell us where the directory moved - to so we can access it?) ## the races From cd0ab91c91e84b726dbc3da39e57893bd1417ee9 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Thu, 28 Jun 2012 18:22:43 -0400 Subject: [PATCH 06/23] blog for the day --- .../blog/day_19__random_improvements.mdwn | 50 +++++++++++++++++++ 1 file changed, 50 insertions(+) create mode 100644 doc/design/assistant/blog/day_19__random_improvements.mdwn diff --git a/doc/design/assistant/blog/day_19__random_improvements.mdwn b/doc/design/assistant/blog/day_19__random_improvements.mdwn new file mode 100644 index 0000000000..93c1296bab --- /dev/null +++ b/doc/design/assistant/blog/day_19__random_improvements.mdwn @@ -0,0 +1,50 @@ +Random improvements day.. + +Got the merge conflict resolution code working in `git annex assistant`. + +Did some more fixes to the pushing and pulling code, covering some cases +I missed earlier. + +Git syncing seems to work well for me now; I've seen it recover +from a variety of error conditions, including merge conflicts and repos +that were temporarily unavailable. + +---- + +There is definitely a MVar deadlock if the merger thread's inotify event +handler tries to run code in the Annex monad. Luckily, it doesn't +currently seem to need to do that, so I have put off debugging what's going +on there. + +Reworked how the inotify thread runs, to avoid the two inotify threads +in the assistant now from both needing to wait for program termination, +in a possibly conflicting manner. + +Hmm, that *seems* to have fixed the MVar deadlock problem. + +---- + +Been thinking about how to fix [[bugs/watcher_commits_unlocked_files]]. +Posted some thoughts there. + +It's about time to move on to data [[syncing]]. While eventually that will +need to build a map of the repo network to efficiently sync data over the +fastest paths, I'm thinking that I'll first write a dumb version. So, two +more threads: + +1. Uploads new data to every configured remote. Triggered by the watcher + thread when it adds content. Easy; just use a `TSet` of Keys to send. + +2. Downloads new data from the cheapest remote that has it. COuld be + triggered by the + merger thread, after it merges in a git sync. Rather hard; how does it + work out what new keys are in the tree without scanning it all? 
Scan + through the git history to find newly created files? Maybe the watcher + triggers this thread instead, when it sees a new symlink, without data, + appear. + +Both threads will need to be able to be stopped, and restarted, as needed +to control the data transfer. And a lot of other control smarts will +eventually be needed, but my first pass will be to do a straightforward +implementation. Once it's done, the git annex assistant will be basically +usable. From 6b84f23317b77a4caf923fb9ab907e39e8cc926d Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" Date: Fri, 29 Jun 2012 12:02:49 +0000 Subject: [PATCH 07/23] Added a comment --- ...mment_2_b14e697c211843163285aaa8de5bf4c6._comment | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment diff --git a/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment b/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment new file mode 100644 index 0000000000..17dcf76343 --- /dev/null +++ b/doc/bugs/Issue_on_OSX_with_some_system_limits/comment_2_b14e697c211843163285aaa8de5bf4c6._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" + nickname="Jimmy" + subject="comment 2" + date="2012-06-29T12:02:48Z" + content=""" +Doing, + + sudo sysctl -w kern.maxfilesperproc=400000 + +Somewhat works for me, git-annex watch at least starts up and takes a while to scan the directory, but it's not ideal. Also, creating files seems to work okay, when I remove a file the changes don't seem to get pushed across my other repos, running a sync on the remote repo fixes things. +"""]] From 29335bf32685ee665f9ec5acbcfe7f8edabd1b96 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 29 Jun 2012 10:00:05 -0400 Subject: [PATCH 08/23] pointlessness --- Annex.hs | 2 +- Common.hs | 1 + Logs/Presence.hs | 2 +- Remote/Directory.hs | 2 +- Utility/Applicative.hs | 16 ++++++++++++++++ 5 files changed, 20 insertions(+), 3 deletions(-) create mode 100644 Utility/Applicative.hs diff --git a/Annex.hs b/Annex.hs index 38168334dd..32edeff5c1 100644 --- a/Annex.hs +++ b/Annex.hs @@ -128,7 +128,7 @@ newState gitrepo = AnnexState {- Makes an Annex state object for the specified git repo. - Ensures the config is read, if it was not already. -} new :: Git.Repo -> IO AnnexState -new gitrepo = newState <$> Git.Config.read gitrepo +new = newState <$$> Git.Config.read {- performs an action in the Annex monad -} run :: AnnexState -> Annex a -> IO (a, AnnexState) diff --git a/Common.hs b/Common.hs index 3475024601..7f07781ce9 100644 --- a/Common.hs +++ b/Common.hs @@ -26,6 +26,7 @@ import Utility.SafeCommand as X import Utility.Path as X import Utility.Directory as X import Utility.Monad as X +import Utility.Applicative as X import Utility.FileSystemEncoding as X import Utility.PartialPrelude as X diff --git a/Logs/Presence.hs b/Logs/Presence.hs index 933426718b..e75e1e4e6d 100644 --- a/Logs/Presence.hs +++ b/Logs/Presence.hs @@ -48,7 +48,7 @@ addLog file line = Annex.Branch.change file $ \s -> {- Reads a log file. - Note that the LogLines returned may be in any order. -} readLog :: FilePath -> Annex [LogLine] -readLog file = parseLog <$> Annex.Branch.get file +readLog = parseLog <$$> Annex.Branch.get {- Parses a log file. Unparseable lines are ignored. 
-} parseLog :: String -> [LogLine] diff --git a/Remote/Directory.hs b/Remote/Directory.hs index a5b0ff2a25..f618f518ed 100644 --- a/Remote/Directory.hs +++ b/Remote/Directory.hs @@ -272,7 +272,7 @@ retrieveCheap d _ k f = liftIO $ withStoredFiles Nothing d k go remove :: FilePath -> ChunkSize -> Key -> Annex Bool remove d chunksize k = liftIO $ withStoredFiles chunksize d k go where - go files = all id <$> mapM removefile files + go = all id <$$> mapM removefile removefile file = catchBoolIO $ do let dir = parentDir file allowWrite dir diff --git a/Utility/Applicative.hs b/Utility/Applicative.hs new file mode 100644 index 0000000000..64400c8012 --- /dev/null +++ b/Utility/Applicative.hs @@ -0,0 +1,16 @@ +{- applicative stuff + - + - Copyright 2012 Joey Hess + - + - Licensed under the GNU GPL version 3 or higher. + -} + +module Utility.Applicative where + +{- Like <$> , but supports one level of currying. + - + - foo v = bar <$> action v == foo = bar <$$> action + -} +(<$$>) :: Functor f => (a -> b) -> (c -> f a) -> c -> f b +f <$$> v = fmap f . v +infixr 4 <$$> From e7182ad1191b42d3431f14ced24d0a87ab91495e Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 29 Jun 2012 11:59:25 -0400 Subject: [PATCH 09/23] further design --- doc/design/assistant/syncing.mdwn | 40 +++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 8b681ac100..99474928c4 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -39,6 +39,46 @@ and with appropriate rate limiting and control facilities. This probably will need lots of refinements to get working well. +### first pass: flood syncing + +Before mapping the network, the best we can do is flood all files out to every +reachable remote. This is worth doing first, since it's the simplest way to +get the basic functionality of the assistant to work. And we'll need this +anyway. + + data ToTransfer = ToUpload Key | ToDownload Key + type ToTransferChan = TChan [ToTransfer] + +* ToUpload added by the watcher thread when it adds content. +* ToDownload added by the watcher thread when it seens new symlinks + that lack content. + +Transfer threads started/stopped as necessary to move data. +May sometimes want multiple threads downloading, or uploading, or even both. + + data TransferID = TransferThread ThreadID | TransferProcess Pid + data Direction = Uploading | Downloading + data Transfer = Transfer Direction Key TransferID EpochTime Integer + -- add [Transfer] to DaemonStatus + +The assistant needs to find out when `git-annex-shell` is receiving or +sending (triggered by another remote), so it can add data for those too. +This is important to avoid uploading content to a remote that is already +downloading it from us, or vice versa, as well as to in future let the web +app manage transfers as user desires. + +For files being received, it can see the temp file, but other than lsof +there's no good way to find the pid (and I'd rather not kill blindly). + +For files being sent, there's no filesystem indication. So git-annex-shell +(and other git-annex transfer processes) should write a status file to disk. + +Can use file locking on these status files to claim upload/download rights, +which will avoid races. + +This status file can also be updated periodically to show amount of transfer +complete (necessary for tracking uploads). 
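As a rough illustration of the locking idea above (a minimal sketch under assumed names, not the implementation git-annex will actually use; `claimTransfer` and `releaseTransfer` are made up for this example), a transfer could be claimed by taking a non-blocking POSIX write lock on its status file, so that a second process attempting the same transfer fails immediately instead of duplicating work:

    import System.Posix.IO
    import System.Posix.Types (Fd)
    import System.IO (SeekMode(AbsoluteSeek))
    import System.Directory (removeFile)
    import Control.Exception (try, IOException)

    -- Claim a transfer by write-locking its status file without blocking.
    -- Returns the open Fd on success (the lock is held for as long as the
    -- Fd stays open), or Nothing if another process already holds the lock.
    claimTransfer :: FilePath -> IO (Maybe Fd)
    claimTransfer statusfile = do
        fd <- openFd statusfile ReadWrite (Just 0o644)
            defaultFileFlags { trunc = True }
        r <- try (setLock fd (WriteLock, AbsoluteSeek, 0, 0))
            :: IO (Either IOException ())
        case r of
            Left _ -> closeFd fd >> return Nothing
            Right () -> do
                _ <- fdWrite fd "transfer started\n" -- real file: start time, associated file, etc
                return (Just fd)

    -- Removing the status file and closing the Fd releases the claim.
    releaseTransfer :: FilePath -> Fd -> IO ()
    releaseTransfer statusfile fd = do
        removeFile statusfile
        closeFd fd

Because fcntl locks are released automatically when a process exits, a crashed transfer cannot leave a stale claim behind, only a stale status file, which a checker can detect by finding the file present but unlocked.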
+ ## other considerations It would be nice if, when a USB drive is connected, From 61786c52ad128fe39346241ef47e50ac41afb774 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 29 Jun 2012 14:03:03 -0400 Subject: [PATCH 10/23] releasing version 3.20120629 --- debian/changelog | 4 ++-- git-annex.cabal | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/debian/changelog b/debian/changelog index 46afb6e4d5..96d85da278 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,4 +1,4 @@ -git-annex (3.20120625) UNRELEASED; urgency=low +git-annex (3.20120629) unstable; urgency=low * cabal: Only try to use inotify on Linux. * Version build dependency on STM, and allow building without it, @@ -11,7 +11,7 @@ git-annex (3.20120625) UNRELEASED; urgency=low in their names. * sync: Automatically resolves merge conflicts. - -- Joey Hess Mon, 25 Jun 2012 11:38:12 -0400 + -- Joey Hess Fri, 29 Jun 2012 10:17:49 -0400 git-annex (3.20120624) unstable; urgency=low diff --git a/git-annex.cabal b/git-annex.cabal index f559406959..0bd35e14fe 100644 --- a/git-annex.cabal +++ b/git-annex.cabal @@ -1,5 +1,5 @@ Name: git-annex -Version: 3.20120625 +Version: 3.20120629 Cabal-Version: >= 1.8 License: GPL Maintainer: Joey Hess From 0ed7db5f3ac87405f56eb27adb9fdaf42bc49125 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 29 Jun 2012 14:03:37 -0400 Subject: [PATCH 11/23] add news item for git-annex 3.20120629 --- doc/design/assistant/syncing.mdwn | 2 ++ doc/news/version_3.20120605.mdwn | 11 ----------- doc/news/version_3.20120629.mdwn | 12 ++++++++++++ 3 files changed, 14 insertions(+), 11 deletions(-) delete mode 100644 doc/news/version_3.20120605.mdwn create mode 100644 doc/news/version_3.20120629.mdwn diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 99474928c4..7c6ef16d39 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -46,6 +46,8 @@ reachable remote. This is worth doing first, since it's the simplest way to get the basic functionality of the assistant to work. And we'll need this anyway. +### transfer tracking + data ToTransfer = ToUpload Key | ToDownload Key type ToTransferChan = TChan [ToTransfer] diff --git a/doc/news/version_3.20120605.mdwn b/doc/news/version_3.20120605.mdwn deleted file mode 100644 index ed0a091771..0000000000 --- a/doc/news/version_3.20120605.mdwn +++ /dev/null @@ -1,11 +0,0 @@ -git-annex 3.20120605 released with [[!toggle text="these changes"]] -[[!toggleable text=""" - * sync: Show a nicer message if a user tries to sync to a special remote. - * lock: Reset unlocked file to index, rather than to branch head. - * import: New subcommand, pulls files from a directory outside the annex - and adds them. - * Fix display of warning message when encountering a file that uses an - unsupported backend. - * Require that the SHA256 backend can be used when building, since it's the - default. - * Preserve parent environment when running hooks of the hook special remote."""]] \ No newline at end of file diff --git a/doc/news/version_3.20120629.mdwn b/doc/news/version_3.20120629.mdwn new file mode 100644 index 0000000000..e6b98ae997 --- /dev/null +++ b/doc/news/version_3.20120629.mdwn @@ -0,0 +1,12 @@ +git-annex 3.20120629 released with [[!toggle text="these changes"]] +[[!toggleable text=""" + * cabal: Only try to use inotify on Linux. + * Version build dependency on STM, and allow building without it, + which disables the watch command. 
+ * Avoid ugly failure mode when moving content from a local repository + that is not available. + * Got rid of the last place that did utf8 decoding. + * Accept arbitrarily encoded repository filepaths etc when reading + git config output. This fixes support for remotes with unusual characters + in their names. + * sync: Automatically resolves merge conflicts."""]] \ No newline at end of file From c79625290a9e17e8c9f6f0ed93a0e23a5ef0126c Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 29 Jun 2012 14:12:16 -0400 Subject: [PATCH 12/23] improving transfer data types and design --- doc/design/assistant/syncing.mdwn | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-) diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 7c6ef16d39..02811f07ef 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -48,20 +48,24 @@ anyway. ### transfer tracking - data ToTransfer = ToUpload Key | ToDownload Key - type ToTransferChan = TChan [ToTransfer] - -* ToUpload added by the watcher thread when it adds content. -* ToDownload added by the watcher thread when it seens new symlinks +* Upload added to queue by the watcher thread when it adds content. +* Download added to queue by the watcher thread when it seens new symlinks that lack content. - -Transfer threads started/stopped as necessary to move data. -May sometimes want multiple threads downloading, or uploading, or even both. +* Transfer threads started/stopped as necessary to move data. + (May sometimes want multiple threads downloading, or uploading, or even both.) + + type TransferQueue = TChan [Transfer] + data Transfer = Upload Key Remote | Download Key Remote data TransferID = TransferThread ThreadID | TransferProcess Pid - data Direction = Uploading | Downloading - data Transfer = Transfer Direction Key TransferID EpochTime Integer - -- add [Transfer] to DaemonStatus + type AmountComplete = Integer + type StartedTime = EpochTime + data TransferInfo = TransferInfo TransferID StartedTime AmountComplete + -- add (M.Map Transfer TransferInfo) to DaemonStatus + + startTransfer :: Transfer -> Annex TransferID + + stopTransfer :: TransferID -> IO () The assistant needs to find out when `git-annex-shell` is receiving or sending (triggered by another remote), so it can add data for those too. From 660f81d2b2d8393577771c5f51e9da5f0ba00e22 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 29 Jun 2012 15:44:14 -0400 Subject: [PATCH 13/23] blog for the day --- .../blog/day_20__data_transfer_design.mdwn | 51 +++++++++++++++++++ doc/design/assistant/progressbars.mdwn | 2 +- doc/design/assistant/syncing.mdwn | 4 +- 3 files changed, 54 insertions(+), 3 deletions(-) create mode 100644 doc/design/assistant/blog/day_20__data_transfer_design.mdwn diff --git a/doc/design/assistant/blog/day_20__data_transfer_design.mdwn b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn new file mode 100644 index 0000000000..2733f09bc4 --- /dev/null +++ b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn @@ -0,0 +1,51 @@ +Today is a planning day. I have only a few days left before I'm off to +Nicaragua for [DebConf](http://debconf12.debconf.org/), where I'll only +have smaller chunks of time without interruptions. So it's important to get +some well-defined smallish chunks designed that I can work on later. See +bulleted action items below. Each should be around 1-2 hours unless it +turns out to be 8 hours... 
:) + +First, worked on writing down a design, and some data types, for data transfer +tracking (see [[syncing]] page). Found that writing down these simple data +types before I started slinging code has clarified things a lot for me. + +Most importantly, I realized that I will need to modify `git-annex-shell` +to record on disk what transfers it's doing, so the assistant can get that +information and use it to both avoid redundant transfers (potentially a big +problem!), and later to allow the user to control them using the web app. + +So these will be the first steps as I move toward implementing data +transfer tracking and naive flood fill transferring. + +* on-disk transfers in progress information files (read/write/enumerate) +* locking for the files, so redundant transfer races can be detected, + and failed transfers noticed +* update files as transfers proceed. See [[progressbars]] + (updating for downloads is easy; for uploads is hard) +* add Transfer queue TChan +* enqueue Transfers (Uploads) as new files are added to the annex by + Watcher. +* enqueue Tranferrs (Downloads) as new dangling symlinks are noticed by + Watcher. +* add TransferInfo Map to DaemonStatus for tracking transfers in progress. +* Poll transfer in progress info files for changes (use inotify again! + wow! hammer, meet nail..), and update the TransferInfo Map +* Write basic Transfer handling thread. Multiple such threads need to be + able to be run at once. Each will need its own independant copy of the + Annex state monad. +* Write transfer control thread, which decides when to launch transfers. +* At startup, and possibly periodically, look for files we have that + location tracking indicates remotes do not, and enqueue Uploads for + them. Also, enqueue Downloads for any files we're missing. + +While eventually the user will be able to use the web app to prioritize +transfers, stop and start, throttle, etc, it's important to get the default +behavior right. So I'm thinking about things like how to prioritize uploads +vs downloads, when it's appropriate to have multiple downloads running at +once, etc. + +* Find a way to probe available outgoing bandwidth, to throttle so + we don't bufferbloat the network to death. +* git-annex needs a simple speed control knob, which can be plumbed + through to, at least, rsync. A good job for an hour in an + airport somewhere. diff --git a/doc/design/assistant/progressbars.mdwn b/doc/design/assistant/progressbars.mdwn index 2ade05aa57..ee73842743 100644 --- a/doc/design/assistant/progressbars.mdwn +++ b/doc/design/assistant/progressbars.mdwn @@ -9,6 +9,6 @@ To get this info for downloads, git-annex can watch the file as it arrives and use its size. TODO: What about uploads? Will i have to parse rsync's progresss output? -Feed it via a named pipe? Ugh. +Feed it via a named pipe? Ugh. Check into librsync. This is one of those potentially hidden but time consuming problems. diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 02811f07ef..ce7f9673b5 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -58,9 +58,9 @@ anyway. 
data Transfer = Upload Key Remote | Download Key Remote data TransferID = TransferThread ThreadID | TransferProcess Pid - type AmountComplete = Integer + type BytesComplete = Integer type StartedTime = EpochTime - data TransferInfo = TransferInfo TransferID StartedTime AmountComplete + data TransferInfo = TransferInfo TransferID StartedTime BytesComplete -- add (M.Map Transfer TransferInfo) to DaemonStatus startTransfer :: Transfer -> Annex TransferID From 49136c22d052ea83d0c9468e7cce28d08d961923 Mon Sep 17 00:00:00 2001 From: Philipp Kern Date: Sat, 30 Jun 2012 15:00:00 +0200 Subject: [PATCH 14/23] doc/download.mdwn: document no-s3 and assistant branches --- doc/download.mdwn | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/download.mdwn b/doc/download.mdwn index f0f17e141d..242de13c39 100644 --- a/doc/download.mdwn +++ b/doc/download.mdwn @@ -18,6 +18,7 @@ others need some manual work. See [[install]] for details. The git repository has some branches: +* `assistant` contains the new change-tracking daemon * `ghc7.0` supports versions of ghc older than 7.4, which had a major change to filename encoding. * `old-monad-control` is for systems that don't have a newer monad-control @@ -25,6 +26,7 @@ The git repository has some branches: * `no-ifelse` avoids using the IFelse library (merge it into master if you need it) * `no-bloom` avoids using bloom filters. (merge it into master if you need it) +* `no-s3` avoids using the S3 library (merge it into master if you need it) * `debian-stable` contains the latest backport of git-annex to Debian stable. * `tweak-fetch` adds support for the git tweak-fetch hook, which has From 768036f3dd42bb4b733679cfcd8af1ee42dcd70c Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawnHrjHxJAm39x8DR4bnbazQO6H0nMNuY9c" Date: Sat, 30 Jun 2012 14:34:12 +0000 Subject: [PATCH 15/23] Added a comment: sha256 alternative --- .../comment_12_60d13f2c8e008af1041bea565a392c83._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment diff --git a/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment b/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment new file mode 100644 index 0000000000..e2e85aaa94 --- /dev/null +++ b/doc/install/OSX/comment_12_60d13f2c8e008af1041bea565a392c83._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawnHrjHxJAm39x8DR4bnbazQO6H0nMNuY9c" + nickname="Damien" + subject="sha256 alternative" + date="2012-06-30T14:34:11Z" + content=""" +in reply to comment 6: On my Mac (10.7.4) there's `/usr/bin/shasum -a 256 ` command that will produce the same output as `sha256sum `. 
+"""]] From edee8ad05b2fe8487c05cdcdacafa19a75151931 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawnHrjHxJAm39x8DR4bnbazQO6H0nMNuY9c" Date: Sun, 1 Jul 2012 17:03:58 +0000 Subject: [PATCH 16/23] Added a comment: gnu commands --- .../comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment diff --git a/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment b/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment new file mode 100644 index 0000000000..e5ce62b138 --- /dev/null +++ b/doc/install/OSX/comment_13_a6f48c87c2d6eabe379d6e10a6cac453._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawnHrjHxJAm39x8DR4bnbazQO6H0nMNuY9c" + nickname="Damien" + subject="gnu commands" + date="2012-07-01T17:03:57Z" + content=""" +…and another approach to the same problem: apparently git-annex also relies on the GNU coreutils (for instance, when doing `git annex get .`, `cp` complains about `illegal option -- -`). I do have the GNU coreutils installed with Homebrew, but they are all prefixed with `g`. So maybe you should try `gsha256sum` and `gcp` before `sha256sum` and `cp`, that seems like a more general solution. +"""]] From be0e38bcc38405afec3283e31e8628e8c6a494aa Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 1 Jul 2012 14:29:00 -0400 Subject: [PATCH 17/23] add transfer information files --- Locations.hs | 6 ++ Logs/Transfer.hs | 159 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 165 insertions(+) create mode 100644 Logs/Transfer.hs diff --git a/Locations.hs b/Locations.hs index cd3f55d466..082a72a506 100644 --- a/Locations.hs +++ b/Locations.hs @@ -18,6 +18,7 @@ module Locations ( gitAnnexBadDir, gitAnnexBadLocation, gitAnnexUnusedLog, + gitAnnexTransferDir, gitAnnexJournalDir, gitAnnexJournalLock, gitAnnexIndex, @@ -127,6 +128,11 @@ gitAnnexBadLocation key r = gitAnnexBadDir r keyFile key gitAnnexUnusedLog :: FilePath -> Git.Repo -> FilePath gitAnnexUnusedLog prefix r = gitAnnexDir r (prefix ++ "unused") +{- .git/annex/transfer/ is used is used to record keys currently + - being transferred. -} +gitAnnexTransferDir :: Git.Repo -> FilePath +gitAnnexTransferDir r = addTrailingPathSeparator $ gitAnnexDir r "transfer" + {- .git/annex/journal/ is used to journal changes made to the git-annex - branch -} gitAnnexJournalDir :: Git.Repo -> FilePath diff --git a/Logs/Transfer.hs b/Logs/Transfer.hs new file mode 100644 index 0000000000..ab99304d1f --- /dev/null +++ b/Logs/Transfer.hs @@ -0,0 +1,159 @@ +{- git-annex transfer log files + - + - Copyright 2012 Joey Hess + - + - Licensed under the GNU GPL version 3 or higher. + -} + +module Logs.Transfer where + +import Common.Annex +import Types.Remote +import Remote +import Annex.Perms +import Annex.Exception +import qualified Git + +import qualified Data.Map as M +import Control.Concurrent +import System.Posix.Process +import System.Posix.Types +import Data.Time.Clock + +{- Enough information to uniquely identify a transfer, used as the filename + - of the transfer information file. -} +data Transfer = Transfer Direction Remote Key + deriving (Show) + +{- Information about a Transfer, stored in the transfer information file. 
-} +data TransferInfo = TransferInfo + { transferPid :: Maybe ProcessID + , transferThread :: Maybe ThreadId + , startedTime :: UTCTime + , bytesComplete :: Maybe Integer + , associatedFile :: Maybe FilePath + } + deriving (Show) + +data Direction = Upload | Download + +instance Show Direction where + show Upload = "upload" + show Download = "download" + +readDirection :: String -> Maybe Direction +readDirection "upload" = Just Upload +readDirection "download" = Just Download +readDirection _ = Nothing + +{- Runs a transfer action. Creates and locks the transfer information file + - while the action is running. Will throw an error if the transfer is + - already in progress. + -} +transfer :: Transfer -> Maybe FilePath -> Annex a -> Annex a +transfer transfer file a = do + createAnnexDirectory =<< fromRepo gitAnnexTransferDir + tfile <- fromRepo $ transferFile transfer + mode <- annexFileMode + info <- liftIO $ TransferInfo + <$> pure Nothing -- pid not stored in file, so omitted for speed + <*> pure Nothing -- threadid not stored in file, so omitted for speed + <*> getCurrentTime + <*> pure Nothing -- not 0; transfer may be resuming + <*> pure file + bracketIO (setup tfile mode info) (cleanup tfile) a + where + setup tfile mode info = do + fd <- openFd tfile ReadWrite (Just mode) + defaultFileFlags { trunc = True } + locked <- catchMaybeIO $ + setLock fd (WriteLock, AbsoluteSeek, 0, 0) + when (locked == Nothing) $ + error $ "transfer already in progress" + fdWrite fd $ writeTransferInfo info + return fd + cleanup tfile fd = do + removeFile tfile + closeFd fd + +{- If a transfer is still running, returns its TransferInfo. -} +checkTransfer :: Transfer -> Annex (Maybe TransferInfo) +checkTransfer transfer = do + mode <- annexFileMode + tfile <- fromRepo $ transferFile transfer + mfd <- liftIO $ catchMaybeIO $ + openFd tfile ReadOnly (Just mode) defaultFileFlags + case mfd of + Nothing -> return Nothing -- failed to open file; not running + Just fd -> do + locked <- liftIO $ + getLock fd (WriteLock, AbsoluteSeek, 0, 0) + case locked of + Nothing -> do + liftIO $ closeFd fd + return Nothing + Just (pid, _) -> liftIO $ do + handle <- fdToHandle fd + info <- readTransferInfo pid + <$> hGetContentsStrict handle + closeFd fd + return info + +{- Gets all currently running transfers. -} +getTransfers :: Annex [(Transfer, TransferInfo)] +getTransfers = do + uuidmap <- remoteMap id + transfers <- catMaybes . map (parseTransferFile uuidmap) <$> findfiles + infos <- mapM checkTransfer transfers + return $ map (\(t, Just i) -> (t, i)) $ + filter running $ zip transfers infos + where + findfiles = liftIO . dirContentsRecursive + =<< fromRepo gitAnnexTransferDir + running (_, i) = isJust i + +{- The transfer information file to use for a given Transfer. -} +transferFile :: Transfer -> Git.Repo -> FilePath +transferFile (Transfer direction remote key) repo = + gitAnnexTransferDir repo + show direction + show (uuid remote) + keyFile key + +{- Parses a transfer information filename to a Transfer. -} +parseTransferFile :: M.Map UUID Remote -> FilePath -> Maybe Transfer +parseTransferFile uuidmap file = + case drop (length bits - 3) bits of + [direction, uuid, key] -> Transfer + <$> readDirection direction + <*> M.lookup (toUUID uuid) uuidmap + <*> fileKey key + _ -> Nothing + where + bits = splitDirectories file + +writeTransferInfo :: TransferInfo -> String +writeTransferInfo info = unwords + -- transferPid is not included; instead obtained by looking at + -- the process that locks the file. 
+ -- transferThread is not included; not relevant for other processes + [ show $ startedTime info + -- bytesComplete is not included; changes too fast + , fromMaybe "" $ associatedFile info -- comes last, may contain spaces + ] + +readTransferInfo :: ProcessID -> String -> Maybe TransferInfo +readTransferInfo pid s = + case bits of + [time] -> TransferInfo + <$> pure (Just pid) + <*> pure Nothing + <*> readish time + <*> pure Nothing + <*> pure filename + _ -> Nothing + where + (bits, filebits) = splitAt 1 $ split " " s + filename + | null filebits = Nothing + | otherwise = Just $ join " " filebits From 72988bae34030295f029b36e859d28bd45f7dbc1 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 1 Jul 2012 15:04:29 -0400 Subject: [PATCH 18/23] tested; bugfixes --- Logs/Transfer.hs | 52 +++++++++++++++++++++++++++--------------------- 1 file changed, 29 insertions(+), 23 deletions(-) diff --git a/Logs/Transfer.hs b/Logs/Transfer.hs index ab99304d1f..ab569aa0d4 100644 --- a/Logs/Transfer.hs +++ b/Logs/Transfer.hs @@ -1,4 +1,4 @@ -{- git-annex transfer log files +{- git-annex transfer information files - - Copyright 2012 Joey Hess - @@ -16,7 +16,6 @@ import qualified Git import qualified Data.Map as M import Control.Concurrent -import System.Posix.Process import System.Posix.Types import Data.Time.Clock @@ -46,14 +45,20 @@ readDirection "upload" = Just Upload readDirection "download" = Just Download readDirection _ = Nothing +upload :: Remote -> Key -> FilePath -> Annex a -> Annex a +upload remote key file a = transfer (Transfer Upload remote key) (Just file) a + +download :: Remote -> Key -> FilePath -> Annex a -> Annex a +download remote key file a = transfer (Transfer Download remote key) (Just file) a + {- Runs a transfer action. Creates and locks the transfer information file - while the action is running. Will throw an error if the transfer is - already in progress. -} transfer :: Transfer -> Maybe FilePath -> Annex a -> Annex a -transfer transfer file a = do - createAnnexDirectory =<< fromRepo gitAnnexTransferDir - tfile <- fromRepo $ transferFile transfer +transfer t file a = do + tfile <- fromRepo $ transferFile t + createAnnexDirectory $ takeDirectory tfile mode <- annexFileMode info <- liftIO $ TransferInfo <$> pure Nothing -- pid not stored in file, so omitted for speed @@ -61,16 +66,18 @@ transfer transfer file a = do <*> getCurrentTime <*> pure Nothing -- not 0; transfer may be resuming <*> pure file - bracketIO (setup tfile mode info) (cleanup tfile) a + bracketIO (prep tfile mode info) (cleanup tfile) a where - setup tfile mode info = do + prep tfile mode info = do fd <- openFd tfile ReadWrite (Just mode) defaultFileFlags { trunc = True } locked <- catchMaybeIO $ setLock fd (WriteLock, AbsoluteSeek, 0, 0) when (locked == Nothing) $ error $ "transfer already in progress" - fdWrite fd $ writeTransferInfo info + h <- fdToHandle fd + hPutStr h $ writeTransferInfo info + hFlush h return fd cleanup tfile fd = do removeFile tfile @@ -78,9 +85,9 @@ transfer transfer file a = do {- If a transfer is still running, returns its TransferInfo. 
-} checkTransfer :: Transfer -> Annex (Maybe TransferInfo) -checkTransfer transfer = do +checkTransfer t = do mode <- annexFileMode - tfile <- fromRepo $ transferFile transfer + tfile <- fromRepo $ transferFile t mfd <- liftIO $ catchMaybeIO $ openFd tfile ReadOnly (Just mode) defaultFileFlags case mfd of @@ -93,9 +100,9 @@ checkTransfer transfer = do liftIO $ closeFd fd return Nothing Just (pid, _) -> liftIO $ do - handle <- fdToHandle fd + h <- fdToHandle fd info <- readTransferInfo pid - <$> hGetContentsStrict handle + <$> hGetContentsStrict h closeFd fd return info @@ -114,32 +121,31 @@ getTransfers = do {- The transfer information file to use for a given Transfer. -} transferFile :: Transfer -> Git.Repo -> FilePath -transferFile (Transfer direction remote key) repo = - gitAnnexTransferDir repo - show direction - show (uuid remote) - keyFile key +transferFile (Transfer direction remote key) r = gitAnnexTransferDir r + show direction + fromUUID (uuid remote) + keyFile key {- Parses a transfer information filename to a Transfer. -} parseTransferFile :: M.Map UUID Remote -> FilePath -> Maybe Transfer parseTransferFile uuidmap file = case drop (length bits - 3) bits of - [direction, uuid, key] -> Transfer + [direction, u, key] -> Transfer <$> readDirection direction - <*> M.lookup (toUUID uuid) uuidmap + <*> M.lookup (toUUID u) uuidmap <*> fileKey key _ -> Nothing where bits = splitDirectories file writeTransferInfo :: TransferInfo -> String -writeTransferInfo info = unwords +writeTransferInfo info = unlines -- transferPid is not included; instead obtained by looking at -- the process that locks the file. -- transferThread is not included; not relevant for other processes [ show $ startedTime info -- bytesComplete is not included; changes too fast - , fromMaybe "" $ associatedFile info -- comes last, may contain spaces + , fromMaybe "" $ associatedFile info -- comes last; arbitrary content ] readTransferInfo :: ProcessID -> String -> Maybe TransferInfo @@ -153,7 +159,7 @@ readTransferInfo pid s = <*> pure filename _ -> Nothing where - (bits, filebits) = splitAt 1 $ split " " s + (bits, filebits) = splitAt 1 $ lines s filename | null filebits = Nothing - | otherwise = Just $ join " " filebits + | otherwise = Just $ unlines filebits From e5fd8b67b7dc3321b13c9b01c36cc7f4d01e1ad8 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 1 Jul 2012 15:18:36 -0400 Subject: [PATCH 19/23] get, move, copy: Now refuse to do anything when the requested file transfer is already in progress by another process. Note this is per-remote, so trying to get the same file from multiple remotes can still let duplicate downloads run. (And uploading the same file to multiple remotes is not duplicate at all of course.) get, move, and copy are the only git-annex subcommands that transfer files, but there's still git-annex-shell recvkey and sendkey to deal with too. I considered modifying retrieveKeyFile or getViaTmp, but they are called by other code that does not involve expensive file transfers (migrate) or that does file transfers that should not be checked by this (fsck --from). 
--- Command/Get.hs | 19 ++++++++++--------- Command/Move.hs | 17 +++++++++-------- debian/changelog | 7 +++++++ 3 files changed, 26 insertions(+), 17 deletions(-) diff --git a/Command/Get.hs b/Command/Get.hs index c4ba483126..35e25d9751 100644 --- a/Command/Get.hs +++ b/Command/Get.hs @@ -12,6 +12,7 @@ import Command import qualified Remote import Annex.Content import qualified Command.Move +import Logs.Transfer def :: [Command] def = [withOptions [Command.Move.fromOption] $ command "get" paramPaths seek @@ -25,24 +26,24 @@ start :: Maybe Remote -> FilePath -> (Key, Backend) -> CommandStart start from file (key, _) = stopUnless (not <$> inAnnex key) $ autoCopies file key (<) $ \_numcopies -> case from of - Nothing -> go $ perform key + Nothing -> go $ perform key file Just src -> -- get --from = copy --from stopUnless (Command.Move.fromOk src key) $ - go $ Command.Move.fromPerform src False key + go $ Command.Move.fromPerform src False key file where go a = do showStart "get" file - next a + next a -perform :: Key -> CommandPerform -perform key = stopUnless (getViaTmp key $ getKeyFile key) $ +perform :: Key -> FilePath -> CommandPerform +perform key file = stopUnless (getViaTmp key $ getKeyFile key file) $ next $ return True -- no cleanup needed {- Try to find a copy of the file in one of the remotes, - and copy it to here. -} -getKeyFile :: Key -> FilePath -> Annex Bool -getKeyFile key file = dispatch =<< Remote.keyPossibilities key +getKeyFile :: Key -> FilePath -> FilePath -> Annex Bool +getKeyFile key file dest = dispatch =<< Remote.keyPossibilities key where dispatch [] = do showNote "not available" @@ -64,7 +65,7 @@ getKeyFile key file = dispatch =<< Remote.keyPossibilities key | Remote.hasKeyCheap r = either (const False) id <$> Remote.hasKey r key | otherwise = return True - docopy r continue = do + docopy r continue = download r key file $ do showAction $ "from " ++ Remote.name r - ifM (Remote.retrieveKeyFile r key file) + ifM (Remote.retrieveKeyFile r key dest) ( return True , continue) diff --git a/Command/Move.hs b/Command/Move.hs index 6ec7cd90ab..8bba468783 100644 --- a/Command/Move.hs +++ b/Command/Move.hs @@ -16,6 +16,7 @@ import qualified Remote import Annex.UUID import qualified Option import Logs.Presence +import Logs.Transfer def :: [Command] def = [withOptions options $ command "move" paramPaths seek @@ -68,9 +69,9 @@ toStart dest move file key = do then stop -- not here, so nothing to do else do showMoveAction move file - next $ toPerform dest move key -toPerform :: Remote -> Bool -> Key -> CommandPerform -toPerform dest move key = moveLock move key $ do + next $ toPerform dest move key file +toPerform :: Remote -> Bool -> Key -> FilePath -> CommandPerform +toPerform dest move key file = moveLock move key $ do -- Checking the remote is expensive, so not done in the start step. -- In fast mode, location tracking is assumed to be correct, -- and an explicit check is not done, when copying. 
When moving, @@ -88,7 +89,7 @@ toPerform dest move key = moveLock move key $ do stop Right False -> do showAction $ "to " ++ Remote.name dest - ok <- Remote.storeKey dest key + ok <- upload dest key file $ Remote.storeKey dest key if ok then finish else do @@ -118,7 +119,7 @@ fromStart src move file key where go = stopUnless (fromOk src key) $ do showMoveAction move file - next $ fromPerform src move key + next $ fromPerform src move key file fromOk :: Remote -> Key -> Annex Bool fromOk src key | Remote.hasKeyCheap src = @@ -129,11 +130,11 @@ fromOk src key u <- getUUID remotes <- Remote.keyPossibilities key return $ u /= Remote.uuid src && elem src remotes -fromPerform :: Remote -> Bool -> Key -> CommandPerform -fromPerform src move key = moveLock move key $ +fromPerform :: Remote -> Bool -> Key -> FilePath -> CommandPerform +fromPerform src move key file = moveLock move key $ ifM (inAnnex key) ( handle move True - , do + , download src key file $ do showAction $ "from " ++ Remote.name src ok <- getViaTmp key $ Remote.retrieveKeyFile src key handle move ok diff --git a/debian/changelog b/debian/changelog index 96d85da278..babd1786de 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,3 +1,10 @@ +git-annex (3.20120630) UNRELEASED; urgency=low + + * get, move, copy: Now refuse to do anything when the requested file + transfer is already in progress by another process. + + -- Joey Hess Sun, 01 Jul 2012 15:04:37 -0400 + git-annex (3.20120629) unstable; urgency=low * cabal: Only try to use inotify on Linux. From 8c10f377146e6599054488f47a3a742f6a7c5ae2 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 1 Jul 2012 16:10:00 -0400 Subject: [PATCH 20/23] bugfixes fdToHandle seems to close the fd avoid excess trailing newline --- Logs/Transfer.hs | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/Logs/Transfer.hs b/Logs/Transfer.hs index ab569aa0d4..fe93b90b43 100644 --- a/Logs/Transfer.hs +++ b/Logs/Transfer.hs @@ -21,20 +21,25 @@ import Data.Time.Clock {- Enough information to uniquely identify a transfer, used as the filename - of the transfer information file. -} -data Transfer = Transfer Direction Remote Key - deriving (Show) +data Transfer = Transfer + { transferDirection :: Direction + , transferRemote :: Remote + , transferKey :: Key + } + deriving (Show, Eq, Ord) {- Information about a Transfer, stored in the transfer information file. -} data TransferInfo = TransferInfo - { transferPid :: Maybe ProcessID + { startedTime :: UTCTime + , transferPid :: Maybe ProcessID , transferThread :: Maybe ThreadId - , startedTime :: UTCTime , bytesComplete :: Maybe Integer , associatedFile :: Maybe FilePath } - deriving (Show) + deriving (Show, Eq, Ord) data Direction = Upload | Download + deriving (Eq, Ord) instance Show Direction where show Upload = "upload" @@ -61,9 +66,9 @@ transfer t file a = do createAnnexDirectory $ takeDirectory tfile mode <- annexFileMode info <- liftIO $ TransferInfo - <$> pure Nothing -- pid not stored in file, so omitted for speed + <$> getCurrentTime + <*> pure Nothing -- pid not stored in file, so omitted for speed <*> pure Nothing -- threadid not stored in file, so omitted for speed - <*> getCurrentTime <*> pure Nothing -- not 0; transfer may be resuming <*> pure file bracketIO (prep tfile mode info) (cleanup tfile) a @@ -103,7 +108,7 @@ checkTransfer t = do h <- fdToHandle fd info <- readTransferInfo pid <$> hGetContentsStrict h - closeFd fd + hClose h return info {- Gets all currently running transfers. 
-} @@ -152,9 +157,9 @@ readTransferInfo :: ProcessID -> String -> Maybe TransferInfo readTransferInfo pid s = case bits of [time] -> TransferInfo - <$> pure (Just pid) + <$> readish time + <*> pure (Just pid) <*> pure Nothing - <*> readish time <*> pure Nothing <*> pure filename _ -> Nothing @@ -162,4 +167,4 @@ readTransferInfo pid s = (bits, filebits) = splitAt 1 $ lines s filename | null filebits = Nothing - | otherwise = Just $ unlines filebits + | otherwise = Just $ join "\n" filebits From 7225c2bfc0c7149e646fa9af998da983e3fa8bc8 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 1 Jul 2012 16:59:54 -0400 Subject: [PATCH 21/23] record transfer information on local git remotes In order to record a semi-useful filename associated with the key, this required plumbing the filename all the way through to the remotes' storeKey and retrieveKeyFile. Note that there is potential for deadlock here, narrowly avoided. Suppose the repos are A and B. A sends file foo to B, and at the same time, B gets file foo from A. So, A locks its upload transfer info file, and then locks B's download transfer info file. At the same time, B is taking the two locks in the opposite order. This is only not a deadlock because the lock code does not wait, and aborts. So one of A or B's transfers will be aborted and the other transfer will continue. Whew! --- Command/Fsck.hs | 2 +- Command/Get.hs | 4 ++-- Command/Move.hs | 8 +++++--- Command/Status.hs | 20 ++++++++++++++++++++ Logs/Transfer.hs | 33 +++++++++++++++------------------ Remote/Bup.hs | 10 +++++----- Remote/Directory.hs | 8 ++++---- Remote/Git.hs | 32 ++++++++++++++++++++------------ Remote/Helper/Encryptable.hs | 14 +++++++------- Remote/Helper/Hooks.hs | 4 ++-- Remote/Hook.hs | 8 ++++---- Remote/Rsync.hs | 12 ++++++------ Remote/S3.hs | 10 +++++----- Remote/Web.hs | 10 +++++----- Types/Remote.hs | 7 +++++-- debian/changelog | 1 + 16 files changed, 107 insertions(+), 76 deletions(-) diff --git a/Command/Fsck.hs b/Command/Fsck.hs index 7bfc46f4a6..10cca489b1 100644 --- a/Command/Fsck.hs +++ b/Command/Fsck.hs @@ -94,7 +94,7 @@ performRemote key file backend numcopies remote = ( return True , ifM (Annex.getState Annex.fast) ( return False - , Remote.retrieveKeyFile remote key tmp + , Remote.retrieveKeyFile remote key Nothing tmp ) ) diff --git a/Command/Get.hs b/Command/Get.hs index 35e25d9751..a5901ba664 100644 --- a/Command/Get.hs +++ b/Command/Get.hs @@ -65,7 +65,7 @@ getKeyFile key file dest = dispatch =<< Remote.keyPossibilities key | Remote.hasKeyCheap r = either (const False) id <$> Remote.hasKey r key | otherwise = return True - docopy r continue = download r key file $ do + docopy r continue = download (Remote.uuid r) key (Just file) $ do showAction $ "from " ++ Remote.name r - ifM (Remote.retrieveKeyFile r key dest) + ifM (Remote.retrieveKeyFile r key (Just file) dest) ( return True , continue) diff --git a/Command/Move.hs b/Command/Move.hs index 8bba468783..e7c11e80d3 100644 --- a/Command/Move.hs +++ b/Command/Move.hs @@ -89,7 +89,8 @@ toPerform dest move key file = moveLock move key $ do stop Right False -> do showAction $ "to " ++ Remote.name dest - ok <- upload dest key file $ Remote.storeKey dest key + ok <- upload (Remote.uuid dest) key (Just file) $ + Remote.storeKey dest key (Just file) if ok then finish else do @@ -134,9 +135,10 @@ fromPerform :: Remote -> Bool -> Key -> FilePath -> CommandPerform fromPerform src move key file = moveLock move key $ ifM (inAnnex key) ( handle move True - , download src key file $ do + , download (Remote.uuid 
src) key (Just file) $ do showAction $ "from " ++ Remote.name src - ok <- getViaTmp key $ Remote.retrieveKeyFile src key + ok <- getViaTmp key $ + Remote.retrieveKeyFile src key (Just file) handle move ok ) where diff --git a/Command/Status.hs b/Command/Status.hs index 2540a92da8..eff21bb509 100644 --- a/Command/Status.hs +++ b/Command/Status.hs @@ -31,6 +31,7 @@ import Logs.Trust import Remote import Config import Utility.Percentage +import Logs.Transfer -- a named computation that produces a statistic type Stat = StatState (Maybe (String, StatState String)) @@ -70,6 +71,7 @@ fast_stats = , remote_list SemiTrusted "semitrusted" , remote_list UnTrusted "untrusted" , remote_list DeadTrusted "dead" + , transfer_list , disk_size ] slow_stats :: [Stat] @@ -170,6 +172,24 @@ bloom_info = stat "bloom filter size" $ json id $ do return $ size ++ note +transfer_list :: Stat +transfer_list = stat "transfers in progress" $ nojson $ lift $ do + uuidmap <- Remote.remoteMap id + ts <- getTransfers + if null ts + then return "none" + else return $ pp uuidmap "" $ sort ts + where + pp _ c [] = c + pp uuidmap c ((t, i):xs) = "\n\t" ++ line uuidmap t i ++ pp uuidmap c xs + line uuidmap t i = unwords + [ show (transferDirection t) ++ "ing" + , fromMaybe (show $ transferKey t) (associatedFile i) + , if transferDirection t == Upload then "to" else "from" + , maybe (fromUUID $ transferRemote t) Remote.name $ + M.lookup (transferRemote t) uuidmap + ] + disk_size :: Stat disk_size = stat "available local disk space" $ json id $ lift $ calcfree diff --git a/Logs/Transfer.hs b/Logs/Transfer.hs index fe93b90b43..526241f935 100644 --- a/Logs/Transfer.hs +++ b/Logs/Transfer.hs @@ -8,13 +8,11 @@ module Logs.Transfer where import Common.Annex -import Types.Remote -import Remote import Annex.Perms import Annex.Exception import qualified Git +import Types.Remote -import qualified Data.Map as M import Control.Concurrent import System.Posix.Types import Data.Time.Clock @@ -23,7 +21,7 @@ import Data.Time.Clock - of the transfer information file. -} data Transfer = Transfer { transferDirection :: Direction - , transferRemote :: Remote + , transferRemote :: UUID , transferKey :: Key } deriving (Show, Eq, Ord) @@ -50,11 +48,11 @@ readDirection "upload" = Just Upload readDirection "download" = Just Download readDirection _ = Nothing -upload :: Remote -> Key -> FilePath -> Annex a -> Annex a -upload remote key file a = transfer (Transfer Upload remote key) (Just file) a +upload :: UUID -> Key -> AssociatedFile -> Annex a -> Annex a +upload u key file a = transfer (Transfer Upload u key) file a -download :: Remote -> Key -> FilePath -> Annex a -> Annex a -download remote key file a = transfer (Transfer Download remote key) (Just file) a +download :: UUID -> Key -> AssociatedFile -> Annex a -> Annex a +download u key file a = transfer (Transfer Download u key) file a {- Runs a transfer action. Creates and locks the transfer information file - while the action is running. Will throw an error if the transfer is @@ -83,10 +81,10 @@ transfer t file a = do h <- fdToHandle fd hPutStr h $ writeTransferInfo info hFlush h - return fd - cleanup tfile fd = do + return h + cleanup tfile h = do removeFile tfile - closeFd fd + hClose h {- If a transfer is still running, returns its TransferInfo. -} checkTransfer :: Transfer -> Annex (Maybe TransferInfo) @@ -114,8 +112,7 @@ checkTransfer t = do {- Gets all currently running transfers. 
-} getTransfers :: Annex [(Transfer, TransferInfo)] getTransfers = do - uuidmap <- remoteMap id - transfers <- catMaybes . map (parseTransferFile uuidmap) <$> findfiles + transfers <- catMaybes . map parseTransferFile <$> findfiles infos <- mapM checkTransfer transfers return $ map (\(t, Just i) -> (t, i)) $ filter running $ zip transfers infos @@ -126,18 +123,18 @@ getTransfers = do {- The transfer information file to use for a given Transfer. -} transferFile :: Transfer -> Git.Repo -> FilePath -transferFile (Transfer direction remote key) r = gitAnnexTransferDir r +transferFile (Transfer direction u key) r = gitAnnexTransferDir r show direction - fromUUID (uuid remote) + fromUUID u keyFile key {- Parses a transfer information filename to a Transfer. -} -parseTransferFile :: M.Map UUID Remote -> FilePath -> Maybe Transfer -parseTransferFile uuidmap file = +parseTransferFile :: FilePath -> Maybe Transfer +parseTransferFile file = case drop (length bits - 3) bits of [direction, u, key] -> Transfer <$> readDirection direction - <*> M.lookup (toUUID u) uuidmap + <*> pure (toUUID u) <*> fileKey key _ -> Nothing where diff --git a/Remote/Bup.hs b/Remote/Bup.hs index f1a36e468e..0d1b606d3d 100644 --- a/Remote/Bup.hs +++ b/Remote/Bup.hs @@ -108,8 +108,8 @@ bupSplitParams r buprepo k src = do return $ bupParams "split" buprepo (os ++ [Param "-n", Param (bupRef k), src]) -store :: Git.Repo -> BupRepo -> Key -> Annex Bool -store r buprepo k = do +store :: Git.Repo -> BupRepo -> Key -> AssociatedFile -> Annex Bool +store r buprepo k _f = do src <- inRepo $ gitAnnexLocation k params <- bupSplitParams r buprepo k (File src) liftIO $ boolSystem "bup" params @@ -122,11 +122,11 @@ storeEncrypted r buprepo (cipher, enck) k = do withEncryptedHandle cipher (L.readFile src) $ \h -> pipeBup params (Just h) Nothing -retrieve :: BupRepo -> Key -> FilePath -> Annex Bool -retrieve buprepo k f = do +retrieve :: BupRepo -> Key -> AssociatedFile -> FilePath -> Annex Bool +retrieve buprepo k _f d = do let params = bupParams "join" buprepo [Param $ bupRef k] liftIO $ catchBoolIO $ do - tofile <- openFile f WriteMode + tofile <- openFile d WriteMode pipeBup params Nothing (Just tofile) retrieveCheap :: BupRepo -> Key -> FilePath -> Annex Bool diff --git a/Remote/Directory.hs b/Remote/Directory.hs index f618f518ed..6b158730e8 100644 --- a/Remote/Directory.hs +++ b/Remote/Directory.hs @@ -122,8 +122,8 @@ withCheckedFiles check (Just _) d k a = go $ locations d k withStoredFiles :: ChunkSize -> FilePath -> Key -> ([FilePath] -> IO Bool) -> IO Bool withStoredFiles = withCheckedFiles doesFileExist -store :: FilePath -> ChunkSize -> Key -> Annex Bool -store d chunksize k = do +store :: FilePath -> ChunkSize -> Key -> AssociatedFile -> Annex Bool +store d chunksize k _f = do src <- inRepo $ gitAnnexLocation k metered k $ \meterupdate -> storeHelper d chunksize k $ \dests -> @@ -242,8 +242,8 @@ storeHelper d chunksize key a = prep <&&> check <&&> go preventWrite dir return (not $ null stored) -retrieve :: FilePath -> ChunkSize -> Key -> FilePath -> Annex Bool -retrieve d chunksize k f = metered k $ \meterupdate -> +retrieve :: FilePath -> ChunkSize -> Key -> AssociatedFile -> FilePath -> Annex Bool +retrieve d chunksize k _ f = metered k $ \meterupdate -> liftIO $ withStoredFiles chunksize d k $ \files -> catchBoolIO $ do meteredWriteFile' meterupdate f files feeder diff --git a/Remote/Git.hs b/Remote/Git.hs index 60a881803a..0b839c9a5e 100644 --- a/Remote/Git.hs +++ b/Remote/Git.hs @@ -21,6 +21,7 @@ import qualified Git.Config 
import qualified Git.Construct import qualified Annex import Logs.Presence +import Logs.Transfer import Annex.UUID import qualified Annex.Content import qualified Annex.BranchState @@ -219,14 +220,19 @@ dropKey r key ] {- Tries to copy a key's content from a remote's annex to a file. -} -copyFromRemote :: Git.Repo -> Key -> FilePath -> Annex Bool -copyFromRemote r key file +copyFromRemote :: Git.Repo -> Key -> AssociatedFile -> FilePath -> Annex Bool +copyFromRemote r key file dest | not $ Git.repoIsUrl r = guardUsable r False $ do params <- rsyncParams r - loc <- liftIO $ gitAnnexLocation key r - rsyncOrCopyFile params loc file - | Git.repoIsSsh r = rsyncHelper =<< rsyncParamsRemote r True key file - | Git.repoIsHttp r = Annex.Content.downloadUrl (keyUrls r key) file + u <- getUUID + -- run copy from perspective of remote + liftIO $ onLocal r $ do + ensureInitialized + loc <- inRepo $ gitAnnexLocation key + upload u key file $ + rsyncOrCopyFile params loc dest + | Git.repoIsSsh r = rsyncHelper =<< rsyncParamsRemote r True key dest + | Git.repoIsHttp r = Annex.Content.downloadUrl (keyUrls r key) dest | otherwise = error "copying from non-ssh, non-http repo not supported" copyFromRemoteCheap :: Git.Repo -> Key -> FilePath -> Annex Bool @@ -236,23 +242,25 @@ copyFromRemoteCheap r key file liftIO $ catchBoolIO $ createSymbolicLink loc file >> return True | Git.repoIsSsh r = ifM (Annex.Content.preseedTmp key file) - ( copyFromRemote r key file + ( copyFromRemote r key Nothing file , return False ) | otherwise = return False {- Tries to copy a key's content to a remote's annex. -} -copyToRemote :: Git.Repo -> Key -> Annex Bool -copyToRemote r key +copyToRemote :: Git.Repo -> Key -> AssociatedFile -> Annex Bool +copyToRemote r key file | not $ Git.repoIsUrl r = guardUsable r False $ commitOnCleanup r $ do keysrc <- inRepo $ gitAnnexLocation key params <- rsyncParams r + u <- getUUID -- run copy from perspective of remote liftIO $ onLocal r $ do ensureInitialized - Annex.Content.saveState True `after` - Annex.Content.getViaTmp key - (rsyncOrCopyFile params keysrc) + download u key file $ + Annex.Content.saveState True `after` + Annex.Content.getViaTmp key + (rsyncOrCopyFile params keysrc) | Git.repoIsSsh r = commitOnCleanup r $ do keysrc <- inRepo $ gitAnnexLocation key rsyncHelper =<< rsyncParamsRemote r False key keysrc diff --git a/Remote/Helper/Encryptable.hs b/Remote/Helper/Encryptable.hs index 789a1d9964..6d5405d9e0 100644 --- a/Remote/Helper/Encryptable.hs +++ b/Remote/Helper/Encryptable.hs @@ -59,14 +59,14 @@ encryptableRemote c storeKeyEncrypted retrieveKeyFileEncrypted r = cost = cost r + encryptedRemoteCostAdj } where - store k = cip k >>= maybe - (storeKey r k) + store k f = cip k >>= maybe + (storeKey r k f) (`storeKeyEncrypted` k) - retrieve k f = cip k >>= maybe - (retrieveKeyFile r k f) - (\enck -> retrieveKeyFileEncrypted enck k f) - retrieveCheap k f = cip k >>= maybe - (retrieveKeyFileCheap r k f) + retrieve k f d = cip k >>= maybe + (retrieveKeyFile r k f d) + (\enck -> retrieveKeyFileEncrypted enck k d) + retrieveCheap k d = cip k >>= maybe + (retrieveKeyFileCheap r k d) (\_ -> return False) withkey a k = cip k >>= maybe (a k) (a . 
snd) cip = cipherKey c diff --git a/Remote/Helper/Hooks.hs b/Remote/Helper/Hooks.hs index d85959062e..0a6b22081e 100644 --- a/Remote/Helper/Hooks.hs +++ b/Remote/Helper/Hooks.hs @@ -27,8 +27,8 @@ addHooks' r Nothing Nothing = r addHooks' r starthook stophook = r' where r' = r - { storeKey = \k -> wrapper $ storeKey r k - , retrieveKeyFile = \k f -> wrapper $ retrieveKeyFile r k f + { storeKey = \k f -> wrapper $ storeKey r k f + , retrieveKeyFile = \k f d -> wrapper $ retrieveKeyFile r k f d , retrieveKeyFileCheap = \k f -> wrapper $ retrieveKeyFileCheap r k f , removeKey = \k -> wrapper $ removeKey r k , hasKey = \k -> wrapper $ hasKey r k diff --git a/Remote/Hook.hs b/Remote/Hook.hs index 5fb793e65f..9e8d3c620d 100644 --- a/Remote/Hook.hs +++ b/Remote/Hook.hs @@ -101,8 +101,8 @@ runHook hooktype hook k f a = maybe (return False) run =<< lookupHook hooktype h return False ) -store :: String -> Key -> Annex Bool -store h k = do +store :: String -> Key -> AssociatedFile -> Annex Bool +store h k _f = do src <- inRepo $ gitAnnexLocation k runHook h "store" k (Just src) $ return True @@ -112,8 +112,8 @@ storeEncrypted h (cipher, enck) k = withTmp enck $ \tmp -> do liftIO $ withEncryptedContent cipher (L.readFile src) $ L.writeFile tmp runHook h "store" enck (Just tmp) $ return True -retrieve :: String -> Key -> FilePath -> Annex Bool -retrieve h k f = runHook h "retrieve" k (Just f) $ return True +retrieve :: String -> Key -> AssociatedFile -> FilePath -> Annex Bool +retrieve h k _f d = runHook h "retrieve" k (Just d) $ return True retrieveCheap :: String -> Key -> FilePath -> Annex Bool retrieveCheap _ _ _ = return False diff --git a/Remote/Rsync.hs b/Remote/Rsync.hs index 6207e14253..887c68339a 100644 --- a/Remote/Rsync.hs +++ b/Remote/Rsync.hs @@ -99,8 +99,8 @@ rsyncUrls o k = map use annexHashes use h = rsyncUrl o h k rsyncEscape o (f f) f = keyFile k -store :: RsyncOpts -> Key -> Annex Bool -store o k = rsyncSend o k <=< inRepo $ gitAnnexLocation k +store :: RsyncOpts -> Key -> AssociatedFile -> Annex Bool +store o k _f = rsyncSend o k <=< inRepo $ gitAnnexLocation k storeEncrypted :: RsyncOpts -> (Cipher, Key) -> Key -> Annex Bool storeEncrypted o (cipher, enck) k = withTmp enck $ \tmp -> do @@ -108,8 +108,8 @@ storeEncrypted o (cipher, enck) k = withTmp enck $ \tmp -> do liftIO $ withEncryptedContent cipher (L.readFile src) $ L.writeFile tmp rsyncSend o enck tmp -retrieve :: RsyncOpts -> Key -> FilePath -> Annex Bool -retrieve o k f = untilTrue (rsyncUrls o k) $ \u -> rsyncRemote o +retrieve :: RsyncOpts -> Key -> AssociatedFile -> FilePath -> Annex Bool +retrieve o k _ f = untilTrue (rsyncUrls o k) $ \u -> rsyncRemote o -- use inplace when retrieving to support resuming [ Param "--inplace" , Param u @@ -117,11 +117,11 @@ retrieve o k f = untilTrue (rsyncUrls o k) $ \u -> rsyncRemote o ] retrieveCheap :: RsyncOpts -> Key -> FilePath -> Annex Bool -retrieveCheap o k f = ifM (preseedTmp k f) ( retrieve o k f , return False ) +retrieveCheap o k f = ifM (preseedTmp k f) ( retrieve o k undefined f , return False ) retrieveEncrypted :: RsyncOpts -> (Cipher, Key) -> Key -> FilePath -> Annex Bool retrieveEncrypted o (cipher, enck) _ f = withTmp enck $ \tmp -> do - ifM (retrieve o enck tmp) + ifM (retrieve o enck undefined tmp) ( liftIO $ catchBoolIO $ do withDecryptedContent cipher (L.readFile tmp) $ L.writeFile f return True diff --git a/Remote/S3.hs b/Remote/S3.hs index 18d4915dcb..dca08fff8b 100644 --- a/Remote/S3.hs +++ b/Remote/S3.hs @@ -113,8 +113,8 @@ s3Setup u c = handlehost $ M.lookup 
"host" c -- be human-readable M.delete "bucket" defaults -store :: Remote -> Key -> Annex Bool -store r k = s3Action r False $ \(conn, bucket) -> do +store :: Remote -> Key -> AssociatedFile -> Annex Bool +store r k _f = s3Action r False $ \(conn, bucket) -> do dest <- inRepo $ gitAnnexLocation k res <- liftIO $ storeHelper (conn, bucket) r k dest s3Bool res @@ -149,12 +149,12 @@ storeHelper (conn, bucket) r k file = do xheaders = filter isxheader $ M.assocs $ fromJust $ config r isxheader (h, _) = "x-amz-" `isPrefixOf` h -retrieve :: Remote -> Key -> FilePath -> Annex Bool -retrieve r k f = s3Action r False $ \(conn, bucket) -> do +retrieve :: Remote -> Key -> AssociatedFile -> FilePath -> Annex Bool +retrieve r k _f d = s3Action r False $ \(conn, bucket) -> do res <- liftIO $ getObject conn $ bucketKey r bucket k case res of Right o -> do - liftIO $ L.writeFile f $ obj_data o + liftIO $ L.writeFile d $ obj_data o return True Left e -> s3Warning e diff --git a/Remote/Web.hs b/Remote/Web.hs index 5fc592326c..2516240ab3 100644 --- a/Remote/Web.hs +++ b/Remote/Web.hs @@ -51,21 +51,21 @@ gen r _ _ = remotetype = remote } -downloadKey :: Key -> FilePath -> Annex Bool -downloadKey key file = get =<< getUrls key +downloadKey :: Key -> AssociatedFile -> FilePath -> Annex Bool +downloadKey key _file dest = get =<< getUrls key where get [] = do warning "no known url" return False get urls = do showOutput -- make way for download progress bar - downloadUrl urls file + downloadUrl urls dest downloadKeyCheap :: Key -> FilePath -> Annex Bool downloadKeyCheap _ _ = return False -uploadKey :: Key -> Annex Bool -uploadKey _ = do +uploadKey :: Key -> AssociatedFile -> Annex Bool +uploadKey _ _ = do warning "upload to web not supported" return False diff --git a/Types/Remote.hs b/Types/Remote.hs index 9bac2ca0f8..c7628165c7 100644 --- a/Types/Remote.hs +++ b/Types/Remote.hs @@ -33,6 +33,9 @@ data RemoteTypeA a = RemoteType { instance Eq (RemoteTypeA a) where x == y = typename x == typename y +{- A filename associated with a Key, for display to user. -} +type AssociatedFile = Maybe FilePath + {- An individual remote. -} data RemoteA a = Remote { -- each Remote has a unique uuid @@ -42,9 +45,9 @@ data RemoteA a = Remote { -- Remotes have a use cost; higher is more expensive cost :: Int, -- Transfers a key to the remote. - storeKey :: Key -> a Bool, + storeKey :: Key -> AssociatedFile -> a Bool, -- retrieves a key's contents to a file - retrieveKeyFile :: Key -> FilePath -> a Bool, + retrieveKeyFile :: Key -> AssociatedFile -> FilePath -> a Bool, -- retrieves a key's contents to a tmp file, if it can be done cheaply retrieveKeyFileCheap :: Key -> FilePath -> a Bool, -- removes a key's contents diff --git a/debian/changelog b/debian/changelog index babd1786de..c279614ca9 100644 --- a/debian/changelog +++ b/debian/changelog @@ -2,6 +2,7 @@ git-annex (3.20120630) UNRELEASED; urgency=low * get, move, copy: Now refuse to do anything when the requested file transfer is already in progress by another process. + * status: Lists transfers that are currently in progress. 
-- Joey Hess Sun, 01 Jul 2012 15:04:37 -0400 From c53da2b04a2e802253bfbbfd4e00e02807d6de77 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 1 Jul 2012 17:11:20 -0400 Subject: [PATCH 22/23] blog for the day --- .../blog/day_21__transfer_tracking.mdwn | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 doc/design/assistant/blog/day_21__transfer_tracking.mdwn diff --git a/doc/design/assistant/blog/day_21__transfer_tracking.mdwn b/doc/design/assistant/blog/day_21__transfer_tracking.mdwn new file mode 100644 index 0000000000..79c0b64387 --- /dev/null +++ b/doc/design/assistant/blog/day_21__transfer_tracking.mdwn @@ -0,0 +1,28 @@ +Worked today on two action items from my last blog post: + +* on-disk transfers in progress information files (read/write/enumerate) +* locking for the files, so redundant transfer races can be detected, + and failed transfers noticed + +That's all done, and used by the `get`, `copy`, and `move` subcommands. + +Also, I made `git-annex status` use that information to display any +file transfers that are currently in progress: + + joey@gnu:~/lib/sound/misc>git annex status + [...] + transfers in progress: + downloading Vic-303.mp3 from leech + +(Webapp, here we come!) + +However... Files being sent or received by `git-annex-shell` don't yet +have this transfer info recorded. The problem is that to do so, +`git-annex-shell` will need to be run with a `--remote=` parameter. But +old versions will of course fail when run with such an unknown parameter. + +This is a problem I last faced in December 2011 when adding the `--uuid=` +parameter. That time I punted and required the remote `git-annex-shell` be +updated to a new enough version to accept it. But as git-annex gets more widely +used and packaged, that's becoming less an option. I need to find a real +solution to this problem. From 2d2bfe9809f8d8d5862bc12fbe40c2e25b2405a3 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Sun, 1 Jul 2012 20:55:20 -0400 Subject: [PATCH 23/23] reorg --- .../blog/day_20__data_transfer_design.mdwn | 33 +---------------- doc/design/assistant/syncing.mdwn | 37 ++++++++++++++++--- 2 files changed, 33 insertions(+), 37 deletions(-) diff --git a/doc/design/assistant/blog/day_20__data_transfer_design.mdwn b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn index 2733f09bc4..4f47ae63c4 100644 --- a/doc/design/assistant/blog/day_20__data_transfer_design.mdwn +++ b/doc/design/assistant/blog/day_20__data_transfer_design.mdwn @@ -2,8 +2,8 @@ Today is a planning day. I have only a few days left before I'm off to Nicaragua for [DebConf](http://debconf12.debconf.org/), where I'll only have smaller chunks of time without interruptions. So it's important to get some well-defined smallish chunks designed that I can work on later. See -bulleted action items below. Each should be around 1-2 hours unless it -turns out to be 8 hours... :) +bulleted action items below (now moved to [[syncing]]. Each +should be around 1-2 hours unless it turns out to be 8 hours... :) First, worked on writing down a design, and some data types, for data transfer tracking (see [[syncing]] page). Found that writing down these simple data @@ -14,38 +14,9 @@ to record on disk what transfers it's doing, so the assistant can get that information and use it to both avoid redundant transfers (potentially a big problem!), and later to allow the user to control them using the web app. -So these will be the first steps as I move toward implementing data -transfer tracking and naive flood fill transferring. 
- -* on-disk transfers in progress information files (read/write/enumerate) -* locking for the files, so redundant transfer races can be detected, - and failed transfers noticed -* update files as transfers proceed. See [[progressbars]] - (updating for downloads is easy; for uploads is hard) -* add Transfer queue TChan -* enqueue Transfers (Uploads) as new files are added to the annex by - Watcher. -* enqueue Tranferrs (Downloads) as new dangling symlinks are noticed by - Watcher. -* add TransferInfo Map to DaemonStatus for tracking transfers in progress. -* Poll transfer in progress info files for changes (use inotify again! - wow! hammer, meet nail..), and update the TransferInfo Map -* Write basic Transfer handling thread. Multiple such threads need to be - able to be run at once. Each will need its own independant copy of the - Annex state monad. -* Write transfer control thread, which decides when to launch transfers. -* At startup, and possibly periodically, look for files we have that - location tracking indicates remotes do not, and enqueue Uploads for - them. Also, enqueue Downloads for any files we're missing. - While eventually the user will be able to use the web app to prioritize transfers, stop and start, throttle, etc, it's important to get the default behavior right. So I'm thinking about things like how to prioritize uploads vs downloads, when it's appropriate to have multiple downloads running at once, etc. -* Find a way to probe available outgoing bandwidth, to throttle so - we don't bufferbloat the network to death. -* git-annex needs a simple speed control knob, which can be plumbed - through to, at least, rsync. A good job for an hour in an - airport somewhere. diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index ce7f9673b5..c18badb533 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -1,6 +1,37 @@ Once files are added (or removed or moved), need to send those changes to all the other git clones, at both the git level and the key/value level. +## action items + +* on-disk transfers in progress information files (read/write/enumerate) + **done** +* locking for the files, so redundant transfer races can be detected, + and failed transfers noticed **done** +* transfer info for git-annex-shell (problem: how to add a switch + with the necessary info w/o breaking backwards compatability?) +* update files as transfers proceed. See [[progressbars]] + (updating for downloads is easy; for uploads is hard) +* add Transfer queue TChan +* enqueue Transfers (Uploads) as new files are added to the annex by + Watcher. +* enqueue Tranferrs (Downloads) as new dangling symlinks are noticed by + Watcher. +* add TransferInfo Map to DaemonStatus for tracking transfers in progress. +* Poll transfer in progress info files for changes (use inotify again! + wow! hammer, meet nail..), and update the TransferInfo Map +* Write basic Transfer handling thread. Multiple such threads need to be + able to be run at once. Each will need its own independant copy of the + Annex state monad. +* Write transfer control thread, which decides when to launch transfers. +* At startup, and possibly periodically, look for files we have that + location tracking indicates remotes do not, and enqueue Uploads for + them. Also, enqueue Downloads for any files we're missing. +* Find a way to probe available outgoing bandwidth, to throttle so + we don't bufferbloat the network to death. 
+* git-annex needs a simple speed control knob, which can be plumbed + through to, at least, rsync. A good job for an hour in an + airport somewhere. + ## git syncing 1. Can use `git annex sync`, which already handles bidirectional syncing. @@ -55,12 +86,6 @@ anyway. (May sometimes want multiple threads downloading, or uploading, or even both.) type TransferQueue = TChan [Transfer] - data Transfer = Upload Key Remote | Download Key Remote - - data TransferID = TransferThread ThreadID | TransferProcess Pid - type BytesComplete = Integer - type StartedTime = EpochTime - data TransferInfo = TransferInfo TransferID StartedTime BytesComplete -- add (M.Map Transfer TransferInfo) to DaemonStatus startTransfer :: Transfer -> Annex TransferID
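The design notes at the end of the final hunk sketch a `TransferQueue` as a `TChan`, drained by one or more transfer handling threads. Below is a minimal, self-contained Haskell sketch of that idea, assuming the `Transfer` record shape introduced in `Logs/Transfer.hs` by patch 21. The helper names (`newTransferQueue`, `queueTransfer`, `transferWorker`) and the stand-in `Key`/`UUID` types are hypothetical and only illustrate the enqueue/drain flow, not git-annex's actual implementation; the notes use `TChan [Transfer]`, while this sketch enqueues one `Transfer` at a time for simplicity.

    -- Minimal sketch of a TChan-based transfer queue (not git-annex code).
    -- The Transfer record mirrors the shape defined in Logs/Transfer.hs;
    -- Key and UUID are stand-ins, and the function names are hypothetical.
    module TransferQueueSketch where

    import Control.Concurrent.STM

    data Direction = Upload | Download
        deriving (Show, Eq)

    type Key = String   -- stand-in for git-annex's Key type
    type UUID = String  -- stand-in for git-annex's UUID type

    data Transfer = Transfer
        { transferDirection :: Direction
        , transferRemote :: UUID
        , transferKey :: Key
        } deriving (Show, Eq)

    type TransferQueue = TChan Transfer

    newTransferQueue :: IO TransferQueue
    newTransferQueue = newTChanIO

    -- Watcher threads enqueue work as new files or symlinks appear.
    queueTransfer :: TransferQueue -> Transfer -> IO ()
    queueTransfer q = atomically . writeTChan q

    -- A transfer handling thread blocks until work arrives, then runs it.
    -- Several such workers can drain the same queue concurrently, each
    -- with its own copy of whatever per-thread state it needs.
    transferWorker :: TransferQueue -> (Transfer -> IO Bool) -> IO ()
    transferWorker q run = loop
      where
        loop = do
            t <- atomically (readTChan q)
            _ok <- run t
            loop

One reason a `TChan` fits here: a worker that calls `readTChan` on an empty channel simply retries inside STM, so idle transfer threads sleep without polling until the watcher enqueues something.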