From f5cb8ed6906aff0f793b9bea9714227ece66e474 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawldKnauegZulM7X6JoHJs7Gd5PnDjcgx-E" Date: Fri, 6 Jul 2012 00:12:16 +0000 Subject: [PATCH 01/10] Added a comment: Source code --- ...comment_1_59fd4f1ffe96c412f613dc86276e7dbd._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/design/assistant/blog/day_25__transfer_queueing/comment_1_59fd4f1ffe96c412f613dc86276e7dbd._comment diff --git a/doc/design/assistant/blog/day_25__transfer_queueing/comment_1_59fd4f1ffe96c412f613dc86276e7dbd._comment b/doc/design/assistant/blog/day_25__transfer_queueing/comment_1_59fd4f1ffe96c412f613dc86276e7dbd._comment new file mode 100644 index 0000000000..0a5b6c0991 --- /dev/null +++ b/doc/design/assistant/blog/day_25__transfer_queueing/comment_1_59fd4f1ffe96c412f613dc86276e7dbd._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawldKnauegZulM7X6JoHJs7Gd5PnDjcgx-E" + nickname="Matt" + subject="Source code" + date="2012-07-06T00:12:15Z" + content=""" +Hi Joey, + +Is the source code for git-annex assistant available somewhere? 
+"""]] From c944f50fc103107fb1082355dc804fb3255fe9a9 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Fri, 6 Jul 2012 00:21:43 +0000 Subject: [PATCH 02/10] Added a comment --- .../comment_2_93bf768a67117e873af5732ecf08dc78._comment | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 doc/design/assistant/blog/day_25__transfer_queueing/comment_2_93bf768a67117e873af5732ecf08dc78._comment diff --git a/doc/design/assistant/blog/day_25__transfer_queueing/comment_2_93bf768a67117e873af5732ecf08dc78._comment b/doc/design/assistant/blog/day_25__transfer_queueing/comment_2_93bf768a67117e873af5732ecf08dc78._comment new file mode 100644 index 0000000000..6c0ca0781d --- /dev/null +++ b/doc/design/assistant/blog/day_25__transfer_queueing/comment_2_93bf768a67117e873af5732ecf08dc78._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + subject="comment 2" + date="2012-07-06T00:21:43Z" + content=""" +It's in the `assistant` branch of git://git-annex.branchable.com/ +"""]] From 4104785bcbc26ebc8976fb4c44246f3539a067d9 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawncBlzaDI248OZGjKQMXrLVQIx4XrZrzFo" Date: Fri, 6 Jul 2012 08:10:00 +0000 Subject: [PATCH 03/10] --- doc/forum/Wishlist:_mark_remotes_offline.mdwn | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 doc/forum/Wishlist:_mark_remotes_offline.mdwn diff --git a/doc/forum/Wishlist:_mark_remotes_offline.mdwn b/doc/forum/Wishlist:_mark_remotes_offline.mdwn new file mode 100644 index 0000000000..046c62210f --- /dev/null +++ b/doc/forum/Wishlist:_mark_remotes_offline.mdwn @@ -0,0 +1,12 @@ +I have several remotes which are not always accessible. For example they can +be on hosts only accessible by LAN or on a portable hard drive which is not +plugged in. When running sync these remotes are checked as well, leading to +unnecessary error messages and possibly git-annex waiting for a few minutes +on each remote for a timeout. 
+ +In this situation it would be useful to mark some remotes as offline +(`git annex offline `), so that git-annex would not even attempt +to contact them. Then, I could configure my system to automatically, for example, +mark a portable hard disk remote online when plugging it in, and offline when +unplugging it, and similarly marking remotes offline and online depending on +whether I have an internet connection or a connection to a specific network. From 7a5229eb3b2347d7cd00901f20d6909afcea1d60 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawkSq2FDpK2n66QRUxtqqdbyDuwgbQmUWus" Date: Fri, 6 Jul 2012 08:48:10 +0000 Subject: [PATCH 04/10] --- ...ist:_options_for_syncing_meta-data_and_data.mdwn | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 doc/forum/Wishlist:_options_for_syncing_meta-data_and_data.mdwn diff --git a/doc/forum/Wishlist:_options_for_syncing_meta-data_and_data.mdwn b/doc/forum/Wishlist:_options_for_syncing_meta-data_and_data.mdwn new file mode 100644 index 0000000000..d1df6628ea --- /dev/null +++ b/doc/forum/Wishlist:_options_for_syncing_meta-data_and_data.mdwn @@ -0,0 +1,13 @@ +Since _transfer queueing_ and syncing of data works now in the assistant branch (been playing with it), there are times when I really don't want to sync the data, I would like to just sync meta-data and manually do a _get_ on files that I would want or selectively sync data in a subtree. + +It would be nice to have the syncing/watch feature to have the option of syncing only *meta-data* or *meta-data and data*, I think this sort of option was already planned? It would also be nice to be able to automatically sync data for only a subtree. + +My use case is, I have a big stash of files somewhere at home or work, and I want to keep what I am actually using on my laptop and be able to selectively just take a subtree or a set of subtree's of files. 
I would not always want to suck down all the data but still have the functionality to add files and push them upstream and sync meta-data. + +that is... + +> * Site A: big master annex in a server room with lots of disk (or machines), watches a directory and syncs both data and meta-data; it should always try and pull data from all its child repos. That way I will always have a master copy of my data somewhere. It would be even nicer if I could have clones of the annex, where each annex is on a different machine which is configured to only sync a subtree of files, so I can distribute my annex across different systems and disks. +> * Site A: machine A: syncs Folder A +> * Site A: machine B: syncs Folder B +> * and so on with selectively syncing sites and directories +> * Laptop: has a clone of the annex, and watches a directory, syncs meta-data as usual and only uploads files to a remote (all or a designated one), but it never downloads files automatically, or downloads should only occur inside a selected subtree. 
From 1cae1bf79cb0e1679f71a9acb690b2d09162e7a1 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Fri, 6 Jul 2012 13:04:07 +0000 Subject: [PATCH 05/10] Added a comment --- ...ent_1_9e3901f0123abb66034cce95cc5a941a._comment | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 doc/forum/Wishlist:_mark_remotes_offline/comment_1_9e3901f0123abb66034cce95cc5a941a._comment diff --git a/doc/forum/Wishlist:_mark_remotes_offline/comment_1_9e3901f0123abb66034cce95cc5a941a._comment b/doc/forum/Wishlist:_mark_remotes_offline/comment_1_9e3901f0123abb66034cce95cc5a941a._comment new file mode 100644 index 0000000000..c24a786c96 --- /dev/null +++ b/doc/forum/Wishlist:_mark_remotes_offline/comment_1_9e3901f0123abb66034cce95cc5a941a._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + subject="comment 1" + date="2012-07-06T13:04:07Z" + content=""" +You can already do this: + + git config remote.foo.annex-ignore true + +There's no need to do anything for portable drives that are sometimes mounted and sometimes not -- git-annex will automatically avoid using repositories in directories that do not currently exist. + +I thought git-annex also had a way to run a command and use its exit status to control whether a repo was +ignored or not, but it seems I never actually implemented that. It might be worth adding, although the command would necessarily run whenever git-annex is transferring data around. 
+"""]] From bde355a65bf78e76a966f015af1642c95794b7cf Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 6 Jul 2012 14:21:26 -0400 Subject: [PATCH 06/10] reorg, and add a link to a good forum post todo --- doc/design/assistant/syncing.mdwn | 94 ++++++++++++++++--------------- 1 file changed, 48 insertions(+), 46 deletions(-) diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index de07ef2d22..6dd1f79b3f 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -3,27 +3,6 @@ all the other git clones, at both the git level and the key/value level. ## action items -* on-disk transfers in progress information files (read/write/enumerate) - **done** -* locking for the files, so redundant transfer races can be detected, - and failed transfers noticed **done** -* transfer info for git-annex-shell **done** -* update files as transfers proceed. See [[progressbars]] - (updating for downloads is easy; for uploads is hard) -* add Transfer queue TChan **done** -* add TransferInfo Map to DaemonStatus for tracking transfers in progress. - **done** -* Poll transfer in progress info files for changes (use inotify again! - wow! hammer, meet nail..), and update the TransferInfo Map **done** -* enqueue Transfers (Uploads) as new files are added to the annex by - Watcher. **done** -* enqueue Tranferrs (Downloads) as new dangling symlinks are noticed by - Watcher. **done** -* Write basic Transfer handling thread. Multiple such threads need to be - able to be run at once. Each will need its own independant copy of the - Annex state monad. **done** -* Write transfer control thread, which decides when to launch transfers. - **done** * Check that download transfer triggering code works (when a symlink appears and the remote does *not* upload to us. * Investigate why transfers seem to block other git-annex assistant work. @@ -36,30 +15,21 @@ all the other git clones, at both the git level and the key/value level. 
through to, at least, rsync. A good job for an hour in an airport somewhere. -## git syncing +## longer-term TODO -1. Can use `git annex sync`, which already handles bidirectional syncing. - When a change is committed, launch the part of `git annex sync` that pushes - out changes. **done**; changes are pushed out to all remotes in parallel -1. Watch `.git/refs/remotes/` for changes (which would be pushed in from - another node via `git annex sync`), and run the part of `git annex sync` - that merges in received changes, and follow it by the part that pushes out - changes (sending them to any other remotes). - [The watching can be done with the existing inotify code! This avoids needing - any special mechanism to notify a remote that it's been synced to.] - **done** -1. Periodically retry pushes that failed. **done** (every half an hour) -1. Also, detect if a push failed due to not being up-to-date, pull, - and repush. **done** -2. Use a git merge driver that adds both conflicting files, - so conflicts never break a sync. **done** -3. Investigate the XMPP approach like dvcs-autosync does, or other ways of +* Investigate the XMPP approach like dvcs-autosync does, or other ways of signaling a change out of band. -4. Add a hook, so when there's a change to sync, a program can be run +* Add a hook, so when there's a change to sync, a program can be run and do its own signaling. -5. --debug will show often unnecessary work being done. Optimise. -6. It would be nice if, when a USB drive is connected, +* --debug will show often unnecessary work being done. Optimise. +* It would be nice if, when a USB drive is connected, syncing starts automatically. Use dbus on Linux? +* This assumes the network is connected. It's often not, so the + [[cloud]] needs to be used to bridge between LANs. +* Configurability, including only enabling git syncing but not data transfer; + only uploading new files but not downloading, and only downloading + files in some directories and not others. 
See for use cases: + [[forum/Wishlist:_options_for_syncing_meta-data_and_data]] ## data syncing @@ -84,10 +54,42 @@ reachable remote. This is worth doing first, since it's the simplest way to get the basic functionality of the assistant to work. And we'll need this anyway. -## other considerations +## done -It would be nice if, when a USB drive is connected, -syncing starts automatically. Use dbus on Linux? +1. Can use `git annex sync`, which already handles bidirectional syncing. + When a change is committed, launch the part of `git annex sync` that pushes + out changes. **done**; changes are pushed out to all remotes in parallel +1. Watch `.git/refs/remotes/` for changes (which would be pushed in from + another node via `git annex sync`), and run the part of `git annex sync` + that merges in received changes, and follow it by the part that pushes out + changes (sending them to any other remotes). + [The watching can be done with the existing inotify code! This avoids needing + any special mechanism to notify a remote that it's been synced to.] + **done** +1. Periodically retry pushes that failed. **done** (every half an hour) +1. Also, detect if a push failed due to not being up-to-date, pull, + and repush. **done** +2. Use a git merge driver that adds both conflicting files, + so conflicts never break a sync. **done** -This assumes the network is connected. It's often not, so the -[[cloud]] needs to be used to bridge between LANs. +* on-disk transfers in progress information files (read/write/enumerate) + **done** +* locking for the files, so redundant transfer races can be detected, + and failed transfers noticed **done** +* transfer info for git-annex-shell **done** +* update files as transfers proceed. See [[progressbars]] + (updating for downloads is easy; for uploads is hard) +* add Transfer queue TChan **done** +* add TransferInfo Map to DaemonStatus for tracking transfers in progress. 
+ **done** +* Poll transfer in progress info files for changes (use inotify again! + wow! hammer, meet nail..), and update the TransferInfo Map **done** +* enqueue Transfers (Uploads) as new files are added to the annex by + Watcher. **done** +* enqueue Transfers (Downloads) as new dangling symlinks are noticed by + Watcher. **done** +* Write basic Transfer handling thread. Multiple such threads need to be + able to be run at once. Each will need its own independent copy of the + Annex state monad. **done** +* Write transfer control thread, which decides when to launch transfers. + **done** From 8489419debfdfb3fb58ba447afd6a7e340d99b62 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 6 Jul 2012 14:56:41 -0600 Subject: [PATCH 07/10] todo --- doc/design/assistant/syncing.mdwn | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index 6dd1f79b3f..abeb74fb7b 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -14,6 +14,8 @@ all the other git clones, at both the git level and the key/value level. * git-annex needs a simple speed control knob, which can be plumbed through to, at least, rsync. A good job for an hour in an airport somewhere. +* file transfers run in background processes, which means they + probably don't participate in ssh connection caching. Verify this and fix. 
## longer-term TODO From 27ac0ec332d4d3cd8ee03d16b7e22d0498157b14 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 6 Jul 2012 14:58:30 -0600 Subject: [PATCH 08/10] ssh connection caching is ok, but there is another problem --- doc/design/assistant/syncing.mdwn | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn index abeb74fb7b..9c607f992d 100644 --- a/doc/design/assistant/syncing.mdwn +++ b/doc/design/assistant/syncing.mdwn @@ -14,8 +14,7 @@ all the other git clones, at both the git level and the key/value level. * git-annex needs a simple speed control knob, which can be plumbed through to, at least, rsync. A good job for an hour in an airport somewhere. -* file transfers run in background processes, which means they - probably don't participate in ssh connection caching. Verify this and fix. +* file transfer processes are not waited for, contain the zombies. ## longer-term TODO From 2c4b39be4f68b53c2d2bc3647b17789b316fc542 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 6 Jul 2012 17:06:05 -0600 Subject: [PATCH 09/10] blog for the day --- .../assistant/blog/day_26__dying_drives.mdwn | 28 +++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 doc/design/assistant/blog/day_26__dying_drives.mdwn diff --git a/doc/design/assistant/blog/day_26__dying_drives.mdwn b/doc/design/assistant/blog/day_26__dying_drives.mdwn new file mode 100644 index 0000000000..109ceb19ec --- /dev/null +++ b/doc/design/assistant/blog/day_26__dying_drives.mdwn @@ -0,0 +1,28 @@ +My laptop's SSD died this morning. I had some work from yesterday +committed to the git repo on it, but not pushed as it didn't build. +Luckily I was able to get that off the SSD, which is now a read-only +drive -- even mounting it fails with fsck write errors. + +Wish I'd realized the SSD was dying before the day before my trip to +Nicaragua.. 
+Getting back to a useful laptop used most of my time and energy today. + +I did manage to fix transfers to not block the rest of the assistant's +threads. Problem was that, without Haskell's threaded runtime, waiting +on something like a rsync command blocks all threads. To fix this, +transfers now are run in separate processes. + +Also added code to allow multiple transfers to run at once. Each transfer +takes up a slot, with the number of free slots tracked by a `QSemN`. +This allows the transfer starting thread to block until a slot frees up, +and then run the transfer. + +This needs to be extended to be aware of transfers initiated by remotes. +The transfer watcher thread should detect those starting and stopping +and update the `QSemN` accordingly. It would also be nice if transfers +initiated by remotes would be delayed when there are no free slots for them +... but I have not thought of a good way to do that. + +There's a bug somewhere in the new transfer code, when two transfers are +queued close together, the second one is lost and doesn't happen. +Would debug this, but I'm spent for the day. From 8ad844e45c3f8a65ef5b725e9c6ac0f414b50fa4 Mon Sep 17 00:00:00 2001 From: Joey Hess Date: Fri, 6 Jul 2012 17:22:56 -0600 Subject: [PATCH 10/10] fix leading period before two-element extensions --- Backend/SHA.hs | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/Backend/SHA.hs b/Backend/SHA.hs index 95ce4a7701..cf61139e00 100644 --- a/Backend/SHA.hs +++ b/Backend/SHA.hs @@ -102,9 +102,13 @@ keyValueE size source = keyValue size source >>= maybe (return Nothing) addE } selectExtension :: FilePath -> String -selectExtension = join "." . reverse . take 2 . takeWhile shortenough . - reverse . split "." . takeExtensions +selectExtension f + | null es = "" + | otherwise = join "." ("":es) where + es = filter (not . null) $ reverse $ + take 2 $ takeWhile shortenough $ + reverse $ split "." 
$ takeExtensions f shortenough e | '\n' `elem` e = False -- newline in extension?! | otherwise = length e <= 4 -- long enough for "jpeg"
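
For reviewers who want to try the `selectExtension` change from PATCH 10/10 outside the git-annex tree, here is a minimal standalone sketch. It is not the file as committed. The patch uses `join` and `split`, which are assumed here to come from a string utility library, so this sketch swaps in `intercalate` from `Data.List` and a small hypothetical helper, `splitOnChar`. The logic otherwise follows the added lines of the hunk: keep at most the last two extension components, stop at the first one that is too long, and always put the leading period back.

```haskell
import Data.List (intercalate)
import System.FilePath (takeExtensions)

-- Hypothetical stand-in for the `split "."` used in the patch:
-- splits a string on a single delimiter character.
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar d rest

-- Mirrors the patched selectExtension: the result is either empty
-- or starts with a period, for one- and two-element extensions alike.
selectExtension :: FilePath -> String
selectExtension f
  | null es   = ""
  | otherwise = intercalate "." ("" : es)
  where
    es = filter (not . null) $ reverse $
         take 2 $ takeWhile shortEnough $
         reverse $ splitOnChar '.' $ takeExtensions f
    shortEnough e
      | '\n' `elem` e = False        -- newline in extension?!
      | otherwise     = length e <= 4 -- long enough for "jpeg"
```

With this sketch, `selectExtension "archive.tar.gz"` gives `".tar.gz"`, whereas the pre-patch pipeline would have dropped the empty leading component in the `take 2` step and produced `"tar.gz"`. A file with no extension, or whose last extension is longer than four characters, gives `""`.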