Merge remote-tracking branch 'origin/master'

commit bc6eaa4ebb

5 changed files with 115 additions and 3 deletions
@@ -0,0 +1,36 @@

Today, added a thread that deals with recovering when there's been a loss
of network connectivity. When the network's down, the normal immediate
syncing of changes of course doesn't work. So this thread detects when the
network comes back up, does a pull+push to network remotes, and
triggers scanning for file content that needs to be transferred.

I used dbus again, to detect events generated by both network-manager and
wicd when they've successfully brought an interface up. Or, if they're not
available, it polls every 30 minutes.
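For the curious, the kind of dbus match involved can be sketched from the shell. This is illustration only: the assistant uses Haskell dbus bindings, not `dbus-monitor`, and only the NetworkManager signal is shown (wicd has its own signals on `org.wicd.daemon`).

```shell
# Sketch only: the dbus match rule such a thread can watch for.
# NetworkManager emits a StateChanged signal on connectivity transitions.
rule="type='signal',interface='org.freedesktop.NetworkManager',member='StateChanged'"
echo "would watch: $rule"
# A real watcher would run something along the lines of:
#   dbus-monitor --system "$rule"
# and, on a transition to a connected state, do the pull+push and queue
# a transfer scan; with neither daemon available, fall back to polling
# every 30 minutes.
```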
When the network comes up, in addition to the git pull+push, it also
currently does a full scan of the repo to find files whose contents
need to be transferred to get fully back into sync.
I think it'll be ok for some git pulls and pushes to happen when
moving to a new network, or resuming a laptop (or every 30 minutes when
resorting to polling). But the transfer scan is currently really too heavy
to be appropriate to do every time in those situations. I have an idea for
avoiding that scan when the remote's git-annex branch has not changed. But
I need to refine it, to handle cases like this:

1. a new remote is added
2. file contents start being transferred to (or from) it
3. the network is taken down
4. all the queued transfers fail
5. the network comes back up
6. the transfer scan needs to know the remote was not all in sync
   before #3, and so should do a full scan despite the git-annex branch
   not having changed
---

Doubled the ram in my netbook, which I use for all development. Yesod needs
rather a lot of ram to compile and link, and this should make me quite a
lot more productive. I was struggling with OOM killing bits of chromium
during my last week of development.
@@ -0,0 +1,8 @@

[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI"
 nickname="Paul"
 subject="Amazon Glacier"
 date="2012-08-23T06:32:24Z"
 content="""
Do you think git-annex could support [Amazon Glacier](http://aws.amazon.com/glacier/) as a backend?
"""]]
doc/design/assistant/blog/day_62__smarter_syncing.mdwn (new file, 21 lines)
@@ -0,0 +1,21 @@

Woke up this morning with most of the design for a smarter approach to
[[syncing]] in my head. (This is why I sometimes slip up and tell people I
work on this project 12 hours a day..)

To keep the current `assistant` branch working while I make changes
that break use cases that are working, I've started
developing in a new branch, `assistant-wip`.
In it, I've started getting rid of unnecessary, expensive transfer scans.

The first optimisation I've done is to detect when a remote that was
disconnected has diverged its `git-annex` branch from the local branch.
Only when that's the case does a new transfer scan need to be done, to find
out what new stuff might be available on that remote, to have caused the
change to its branch while it was disconnected.
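The divergence test itself is cheap git plumbing. Here's a rough shell sketch of the idea (the real code is Haskell; the throwaway repo and the simulated remote tracking ref below are purely for illustration):

```shell
# Sketch: a remote's git-annex branch has diverged iff it has commits
# that the local git-annex branch lacks.
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m 'annex state 1'
git branch git-annex                                    # local git-annex branch
git update-ref refs/remotes/origin/git-annex git-annex  # remote in sync

diverged() {
    [ "$(git rev-list --count git-annex..refs/remotes/origin/git-annex)" -gt 0 ]
}

diverged && echo "scan needed" || echo "no scan needed"

# Simulate the remote recording new annex state while disconnected:
git commit -q --allow-empty -m 'annex state 2'
git update-ref refs/remotes/origin/git-annex HEAD
diverged && echo "scan needed" || echo "no scan needed"
```

The two-dot range `git-annex..origin/git-annex` counts only commits reachable from the remote ref but not the local branch, so a remote that is merely behind does not trigger a scan.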
That broke a lot of stuff. I have a plan to fix it written down in
[[syncing]]. It'll involve keeping track of whether a transfer scan has
ever been done (if not, one should be run), and recording logs when
transfers fail, so those failed transfers can be retried when the
remote gets reconnected.
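The retry decision for a failed download can be sketched like this. The location log lookup is faked here as a plain list of remotes believed to have the key; the real assistant would consult git-annex's location log, and the remote names are made up:

```shell
# Sketch: when requeuing a failed download, only retry from the failed
# remote if the location log still says it has the content; otherwise
# fetch from some other remote that does.
requeue_failed_download() {
    remote=$1; haves=$2   # $haves: remotes the location log lists for the key
    case " $haves " in
        *" $remote "*) echo "retry download from $remote" ;;
        *)             echo "queue download from another remote" ;;
    esac
}

requeue_failed_download usbdrive "usbdrive server"   # still has it
requeue_failed_download usbdrive "server"            # lost it meanwhile
```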
@@ -3,9 +3,42 @@ all the other git clones, at both the git level and the key/value level.
 
 ## immediate action items
 
-* At startup, and possibly periodically, or when the network connection
-  changes, or some heuristic suggests that a remote was disconnected from
-  us for a while, queue remotes for processing by the TransferScanner.
+* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
+  broke content syncing in some situations, which need to be added back.
+
+  Now syncing a disconnected remote only starts a transfer scan if the
+  remote's git-annex branch has diverged, which indicates it probably has
+  new files. But that leaves open the cases where the local repo has
+  new files; where the two repos' git branches are in sync, but the
+  content transfers are lagging behind; and where the transfer scan has
+  never been run.
+
+  Need to track locally whether we're believed to be in sync with a remote.
+  This includes:
+
+  * All local content has been transferred to it successfully.
+  * The remote has been scanned once for data to transfer from it, and all
+    transfers initiated by that scan succeeded.
+
+  Note the complication that, if it's initiated a transfer, our queued
+  transfer will be thrown out as unnecessary. But if its transfer then
+  fails, that needs to be noticed.
+
+  If we're going to track failed transfers, we could just set a flag,
+  and use that flag later to initiate a new transfer scan. We need a flag
+  in any case, to ensure that a transfer scan is run for each new remote.
+  The flag could be `.git/annex/transfer/scanned/uuid`.
+
+  But, if failed transfers are tracked, we could also record them, in
+  order to retry them later, without the scan. I'm thinking about a
+  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
+  which failed transfer log files could be moved to.
+
+  Note that a remote may lose content it had before, so when requeuing
+  a failed download, we should check the location log to see if it still
+  has the content, and if not, queue a download from elsewhere. (And a
+  remote may get content we were uploading from elsewhere, so check the
+  location log when queuing a failed upload too.)
+
 * Ensure that when a remote receives content, and updates its location log,
   it syncs that update back out. Prerequisite for:
 * After git sync, identify new content that we don't have that is now available
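On disk, the proposed scanned-flag plus failed-transfer directories could look like the following shell sketch. This is only an illustration of the design note above, run in a throwaway directory; the uuid and the transfer log file name are made up:

```shell
cd "$(mktemp -d)"
uuid=cfd5ba5c-0000-0000-0000-000000000000   # hypothetical remote uuid

# The flag ensures a transfer scan is run at least once per remote.
flag=.git/annex/transfer/scanned/$uuid
mkdir -p "$(dirname "$flag")"
[ -e "$flag" ] || echo "scan needed"
touch "$flag"   # record that the scan has been done

# Failed transfers get their log files parked here for later retry,
# instead of forcing a full rescan.
mkdir -p ".git/annex/transfer/failed/upload/$uuid" \
         ".git/annex/transfer/failed/download/$uuid"
echo "stub transfer log" > ".git/annex/transfer/failed/download/$uuid/SHA256-s0--deadbeef"
ls ".git/annex/transfer/failed/download/$uuid"
```

On reconnect, retrying is then just a matter of walking the `failed/{upload,download}/uuid/` directory and requeueing each logged transfer, consulting the location log first as noted above.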
@@ -43,6 +76,10 @@
   that need to be done to sync with a remote. Currently it walks the git
   working copy and checks each file.
 
+## misc todo
+
+* --debug will show often unnecessary work being done. Optimise.
+
 ## data syncing
 
 There are two parts to data syncing. First, map the network and second,
@@ -157,3 +194,5 @@ redone to check it.
   finishes. **done**
 * Test MountWatcher on KDE, and add whatever dbus events KDE emits when
   drives are mounted. **done**
+* It would be nice if, when a USB drive is connected,
+  syncing starts automatically. Use dbus on Linux? **done**
@@ -0,0 +1,8 @@

[[!comment format=mdwn
 username="https://me.yahoo.com/speredenn#aaf38"
 nickname="Jean-Baptiste Carré"
 subject="comment 3"
 date="2012-08-21T18:15:48Z"
 content="""
You're totally right: the UUIDs are the same. So it shouldn't matter if there are many repositories pointing to the same folder, as you state. Thanks a lot!
"""]]