Merge branch 'master' into assistant

Conflicts:
    debian/changelog

Updated changelog for assistant and webapp

commit b12db9ef92
17 changed files with 284 additions and 11 deletions

@@ -0,0 +1,36 @@
Today, added a thread that deals with recovering when there's been a loss
of network connectivity. When the network's down, the normal immediate
syncing of changes of course doesn't work. So this thread detects when the
network comes back up, and does a pull+push to network remotes, and
triggers scanning for file content that needs to be transferred.

I used dbus again, to detect events generated by both network-manager and
wicd when they've successfully brought an interface up. Or, if they're not
available, it polls every 30 minutes.
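
A minimal sketch of just the polling fallback, assuming a hypothetical
`handleConnection` action standing in for the pull+push and transfer scan
(this is an illustration, not the actual thread):

    import Control.Concurrent (threadDelay)
    import Control.Monad (forever)

    -- When neither network-manager nor wicd is available on dbus,
    -- fall back to waking up every 30 minutes and assuming the
    -- network may have changed. handleConnection is a stand-in for
    -- the pull+push and transfer scan described above.
    pollingThread :: IO () -> IO ()
    pollingThread handleConnection = forever $ do
        threadDelay (30 * 60 * 1000000) -- 30 minutes in microseconds
        handleConnection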

When the network comes up, in addition to the git pull+push, it also
currently does a full scan of the repo to find files whose contents
need to be transferred to get fully back into sync.

I think it'll be ok for some git pulls and pushes to happen when
moving to a new network, or resuming a laptop (or every 30 minutes when
resorting to polling). But the transfer scan is currently really too heavy
to be appropriate to do every time in those situations. I have an idea for
avoiding that scan when the remote's git-annex branch has not changed. But
I need to refine it, to handle cases like this (a sketch of the decision
logic follows the list):

1. a new remote is added
2. file contents start being transferred to (or from) it
3. the network is taken down
4. all the queued transfers fail
5. the network comes back up
6. the transfer scan needs to know the remote was not all in sync
   before #3, and so should do a full scan despite the git-annex branch
   not having changed
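
A rough sketch of the decision logic I have in mind; the `RemoteStatus`
type is invented for illustration, not actual git-annex code:

    -- Invented for illustration; not actual git-annex code.
    data RemoteStatus = RemoteStatus
        { branchChanged :: Bool -- did its git-annex branch change?
        , fullyInSync :: Bool   -- had all queued transfers succeeded?
        }

    -- A full scan is needed when the remote's git-annex branch
    -- changed, or when we were not fully in sync before the
    -- disconnection (step 6 above).
    scanNeeded :: RemoteStatus -> Bool
    scanNeeded r = branchChanged r || not (fullyInSync r)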

---

Doubled the RAM in my netbook, which I use for all development. Yesod needs
rather a lot of RAM to compile and link, and this should make me quite a
lot more productive. I was struggling with OOM killing bits of chromium
during my last week of development.
@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI"
 nickname="Paul"
 subject="Amazon Glacier"
 date="2012-08-23T06:32:24Z"
 content="""
Do you think git-annex could support [Amazon Glacier](http://aws.amazon.com/glacier/) as a backend?
"""]]

21 doc/design/assistant/blog/day_62__smarter_syncing.mdwn Normal file
@@ -0,0 +1,21 @@
Woke up this morning with most of the design for a smarter approach to
[[syncing]] in my head. (This is why I sometimes slip up and tell people I
work on this project 12 hours a day..)

To keep the current `assistant` branch working while I make changes
that temporarily break working use cases, I've started
developing in a new branch, `assistant-wip`.

In it, I've started getting rid of unnecessary expensive transfer scans.

The first optimisation I've done is to detect when a remote that was
disconnected has diverged its `git-annex` branch from the local branch.
Only when that's the case does a new transfer scan need to be done, to find
out what new stuff might have become available on that remote and caused
the change to its branch while it was disconnected.
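
For illustration, the divergence check could look something like this
simplified sketch, which shells out to `git rev-parse` (not the real
implementation):

    import System.Process (readProcess)

    -- Simplified sketch, not the real implementation: compare the
    -- local git-annex branch with the remote's tracking branch.
    -- A transfer scan is only warranted when they differ.
    gitAnnexDiverged :: String -> IO Bool
    gitAnnexDiverged remote = do
        ours   <- readProcess "git" ["rev-parse", "git-annex"] ""
        theirs <- readProcess "git" ["rev-parse", remote ++ "/git-annex"] ""
        return (ours /= theirs)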

That optimisation broke a lot of stuff. I have a plan to fix it written
down in [[syncing]]. It'll involve keeping track of whether a transfer
scan has ever been done (if not, one should be run), and recording logs
when transfers failed, so those failed transfers can be retried when the
remote gets reconnected.
26 doc/design/assistant/blog/day_63__transfer_retries.mdwn Normal file
@@ -0,0 +1,26 @@
Implemented everything I planned out yesterday: Expensive scans are only
done once per remote (unless the remote changed while it was disconnected),
and failed transfers are logged so they can be retried later.

Changed the TransferScanner to prefer to scan low-cost remotes first,
as a crude form of scheduling lower-cost transfers first.
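
The scheduling really is just a sort by cost; a sketch, with an invented
`Remote` type standing in for the real one:

    import Data.List (sortBy)
    import Data.Ord (comparing)

    -- The Remote type here is invented for illustration.
    data Remote = Remote { remoteName :: String, remoteCost :: Int }

    -- Scan cheap remotes (eg, locally mounted drives) before
    -- expensive networked ones, so low-cost transfers queue first.
    scanOrder :: [Remote] -> [Remote]
    scanOrder = sortBy (comparing remoteCost)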

A whole bunch of interesting syncing scenarios should work now. I have not
tested them all in detail, but to the best of my knowledge, all these
should work:

* Connect to the network. It starts syncing with a networked remote.
  Disconnect the network. Reconnect, and it resumes where it left off.
* Migrate between networks (ie, home to cafe to work). Any transfers
  that can only happen on one LAN are retried on each new network you
  visit, until they succeed.

One that is not working, but is soooo close:

* Plug in a removable drive. Some transfers start. Yank the plug.
  Plug it back in. All necessary transfers resume, and it ends up
  fully in sync, no matter how many times you yank that cable.

That's not working because of an infelicity in the MountWatcher.
It doesn't notice when the drive gets unmounted, so it ignores
the new mount event.

@@ -0,0 +1,10 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
 nickname="Justin"
 subject="comment 1"
 date="2012-08-23T21:25:48Z"
 content="""
Do encrypted rsync remotes resume quickly as well?

One thing I noticed was that if a `copy --to` an encrypted rsync remote gets interrupted, it will remove the tmp file and re-encrypt the whole file before resuming rsync.
"""]]

33 doc/design/assistant/blog/day_64__syncing_robustly.mdwn Normal file
@@ -0,0 +1,33 @@
Working toward getting the data syncing to happen robustly,
so a bunch of improvements:

* Got unmount events to be noticed, so unplugging and replugging
  a removable drive will resume the syncing to it. There's really no
  good unmount event available on dbus in KDE, so it uses a heuristic
  there.
* Avoid requeuing a download from a remote that no longer has a key.
* Run a full scan on startup, for multiple reasons, including dealing with
  crashes.

Ran into a strange issue: Occasionally the assistant will run `git-annex
copy` and it will not transfer the requested file. It seems that
when the copy command runs `git ls-files`, it does not see the file
it's supposed to act on in its output.

Eventually I figured out what's going on: When updating the git-annex
branch, it sets `GIT_INDEX_FILE`, and of course environment settings are
not thread-safe! So there's a race between threads that access
the git-annex branch, and the Transferrer thread, or any other thread
that might expect to look at the normal git index.

Unfortunately, I don't have a fix for this yet.. Git's only interface for
using a different index file is `GIT_INDEX_FILE`. It seems I have a lot of
code to tear apart, to push back the setenv until after forking every git
command. :(
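
To illustrate the direction, a minimal sketch of that per-command
approach, using System.Process's `env` field; the `gitWithIndex` helper
is invented, and this is not what the code does yet:

    import System.Environment (getEnvironment)
    import System.Process

    -- Sketch of the planned fix, not what the code does yet: pass
    -- GIT_INDEX_FILE in the environment of the one forked git
    -- command that needs it, instead of mutating the whole
    -- process's environment with setenv.
    gitWithIndex :: FilePath -> [String] -> IO ()
    gitWithIndex indexFile args = do
        environ <- getEnvironment
        let environ' = ("GIT_INDEX_FILE", indexFile)
                : filter ((/= "GIT_INDEX_FILE") . fst) environ
        (_, _, _, p) <- createProcess
            (proc "git" args) { env = Just environ' }
        _ <- waitForProcess p
        return ()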

Before I figured out the root problem, I developed a workaround for the
symptom I was seeing. I added a `git-annex transferkey`, which is
optimised to be run by the assistant, and avoids running `git ls-files`, so
avoids the problem. While I plan to fix this environment variable problem
properly, `transferkey` turns out to be so much faster than how it was
using `copy` that I'm going to keep it.

@@ -3,9 +3,16 @@ all the other git clones, at both the git level and the key/value level.

## immediate action items

* At startup, and possibly periodically, or when the network connection
  changes, or some heuristic suggests that a remote was disconnected from
  us for a while, queue remotes for processing by the TransferScanner.
* The syncing code currently doesn't run for special remotes. While
  transferring the git info about special remotes could be a complication,
  if we assume that's synced between existing git remotes, it should be
  possible for them to do file transfers to/from special remotes.
* Often several remotes will be queued for full TransferScanner scans,
  and the scan does the same thing for each .. so it would be better to
  combine them into one scan in such a case.
* Sometimes a Download gets queued from a slow remote, and then a fast
  remote becomes available, and a Download is queued from it. Would be
  good to sort the transfer queue to run fast Downloads (and Uploads) first.
* Ensure that when a remote receives content, and updates its location log,
  it syncs that update back out. Prerequisite for:
  * After git sync, identify new content that we don't have that is now available

@@ -34,14 +41,17 @@ all the other git clones, at both the git level and the key/value level.

  files in some directories and not others. See for use cases:
  [[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
* speed up git syncing by using the cached ssh connection for it too.
  Will need to use `GIT_SSH`, which needs to point to a command to run,
  not a shell command line. Beware that the network connection may have
  bounced and the cached ssh connection not be usable.
* Map the network of git repos, and use that map to calculate
  optimal transfers to keep the data in sync. Currently a naive flood fill
  is done instead.
* Find a more efficient way for the TransferScanner to find the transfers
  that need to be done to sync with a remote. Currently it walks the git
  working copy and checks each file. That probably needs to be done once,
  but further calls to the TransferScanner could eg, look at the delta
  between the last scan and the current one in the git-annex branch.

## misc todo

@@ -163,3 +173,42 @@ redone to check it.

  finishes. **done**
* Test MountWatcher on KDE, and add whatever dbus events KDE emits when
  drives are mounted. **done**
* It would be nice if, when a USB drive is connected,
  syncing starts automatically. Use dbus on Linux? **done**
* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
  broke content syncing in some situations, which need to be added back.
  **done**

  Now syncing a disconnected remote only starts a transfer scan if the
  remote's git-annex branch has diverged, which indicates it probably has
  new files. But that leaves open the cases where the local repo has
  new files; where the two repos' git branches are in sync, but the
  content transfers are lagging behind; and where the transfer scan has
  never been run.

  Need to track locally whether we're believed to be in sync with a remote.
  This includes:

  * All local content has been transferred to it successfully.
  * The remote has been scanned once for data to transfer from it, and all
    transfers initiated by that scan succeeded.

  Note the complication that, if the remote has initiated a transfer, our
  queued transfer will be thrown out as unnecessary. But if its transfer
  then fails, that needs to be noticed.

  If we're going to track failed transfers, we could just set a flag,
  and use that flag later to initiate a new transfer scan. We need a flag
  in any case, to ensure that a transfer scan is run for each new remote.
  The flag could be `.git/annex/transfer/scanned/uuid`.

  But, if failed transfers are tracked, we could also record them, in
  order to retry them later, without the scan. I'm thinking about a
  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
  which failed transfer log files could be moved to. (A sketch of this
  layout follows the list.)
* A remote may lose content it had before, so when requeuing
  a failed download, check the location log to see if the remote still has
  the content, and if not, queue a download from elsewhere. (And, a remote
  may get content we were uploading from elsewhere, so check the location
  log when queuing a failed Upload too.) **done**
* Fix MountWatcher to notice umounts and remounts of drives. **done**
* Run transfer scan on startup. **done**
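
For concreteness, a tiny sketch of the scanned flag and failed-transfer
directory layout floated above; the helper names are invented, and this
is not git-annex's actual code:

    import System.FilePath ((</>))

    data Direction = Upload | Download

    -- Invented helpers illustrating the layout; not actual code.
    -- Flag recording that a remote (by uuid) has been scanned once.
    scannedFlag :: String -> FilePath
    scannedFlag uuid =
        ".git" </> "annex" </> "transfer" </> "scanned" </> uuid

    -- Where a failed transfer's log file could be moved, so it can
    -- be retried later without a full scan.
    failedTransferDir :: Direction -> String -> FilePath
    failedTransferDir d uuid =
        ".git" </> "annex" </> "transfer" </> "failed" </> sub </> uuid
      where
        sub = case d of
            Upload -> "upload"
            Download -> "download"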