From d5d4b8db345b3e4a81b2ad2d7a4f2e5f1e30d519 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Thu, 23 Aug 2012 16:24:22 -0400
Subject: [PATCH 1/5] update

---
 doc/design/assistant/syncing.mdwn | 79 +++++++++++++++----------------
 1 file changed, 39 insertions(+), 40 deletions(-)

diff --git a/doc/design/assistant/syncing.mdwn b/doc/design/assistant/syncing.mdwn
index 83c5e9d223..071ea2730a 100644
--- a/doc/design/assistant/syncing.mdwn
+++ b/doc/design/assistant/syncing.mdwn
@@ -3,42 +3,12 @@ all the other git clones, at both the git level and the key/value level.
 
 ## immediate action items
 
-* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
-  broke content syncing in some situations, which need to be added back.
-
-  Now syncing a disconnected remote only starts a transfer scan if the
-  remote's git-annex branch has diverged, which indicates it probably has
-  new files. But that leaves open the cases where the local repo has
-  new files; and where the two repos git branches are in sync, but the
-  content transfers are lagging behind; and where the transfer scan has
-  never been run.
-
-  Need to track locally whether we're believed to be in sync with a remote.
-  This includes:
-  * All local content has been transferred to it successfully.
-  * The remote has been scanned once for data to transfer from it, and all
-    transfers initiated by that scan succeeded.
-
-  Note the complication that, if it's initiated a transfer, our queued
-  transfer will be thrown out as unnecessary. But if its transfer then
-  fails, that needs to be noticed.
-
-  If we're going to track failed transfers, we could just set a flag,
-  and use that flag later to initiate a new transfer scan. We need a flag
-  in any case, to ensure that a transfer scan is run for each new remote.
-  The flag could be `.git/annex/transfer/scanned/uuid`.
-
-  But, if failed transfers are tracked, we could also record them, in
-  order to retry them later, without the scan. I'm thinking about a
-  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
-  which failed transfer log files could be moved to.
-
-  Note that a remote may lose content it had before, so when requeuing
-  a failed download, should check the location log to see if it still has
+* Fix MountWatcher to notice umounts and remounts of drives.
+* A remote may lose content it had before, so when requeuing
+  a failed download, check the location log to see if the remote still has
   the content, and if not, queue a download from elsewhere. (And, a remote
   may get content we were uploading from elsewhere, so check the location
   log when queuing a failed Upload too.)
-
 * Ensure that when a remote receives content, and updates its location
   log, it syncs that update back out. Prerequisite for:
 * After git sync, identify new content that we don't have that is now available
@@ -67,18 +37,17 @@ all the other git clones, at both the git level and the key/value level.
   files in some directories and not others. See for use cases:
   [[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
 * speed up git syncing by using the cached ssh connection for it too
-  (will need to use `GIT_SSH`, which needs to point to a command to run,
-  not a shell command line)
+  Will need to use `GIT_SSH`, which needs to point to a command to run,
+  not a shell command line. Beware that the network connection may have
+  bounced and the cached ssh connection not be usable.
 * Map the network of git repos, and use that map to calculate optimal
   transfers to keep the data in sync.
   Currently a naive flood fill is done instead.
 * Find a more efficient way for the TransferScanner to find the transfers
   that need to be done to sync with a remote. Currently it walks the git
-  working copy and checks each file.
-
-## misc todo
-
-* --debug will show often unnecessary work being done. Optimise.
+  working copy and checks each file. That probably needs to be done once,
+  but further calls to the TransferScanner could eg, look at the delta
+  between the last scan and the current one in the git-annex branch.
 
 ## data syncing
 
@@ -196,3 +165,33 @@ redone to check it.
   drives are mounted. **done**
 * It would be nice if, when a USB drive is connected, syncing starts
   automatically. Use dbus on Linux? **done**
+* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
+  broke content syncing in some situations, which need to be added back.
+  **done**
+
+  Now syncing a disconnected remote only starts a transfer scan if the
+  remote's git-annex branch has diverged, which indicates it probably has
+  new files. But that leaves open the cases where the local repo has
+  new files; and where the two repos git branches are in sync, but the
+  content transfers are lagging behind; and where the transfer scan has
+  never been run.
+
+  Need to track locally whether we're believed to be in sync with a remote.
+  This includes:
+  * All local content has been transferred to it successfully.
+  * The remote has been scanned once for data to transfer from it, and all
+    transfers initiated by that scan succeeded.
+
+  Note the complication that, if it's initiated a transfer, our queued
+  transfer will be thrown out as unnecessary. But if its transfer then
+  fails, that needs to be noticed.
+
+  If we're going to track failed transfers, we could just set a flag,
+  and use that flag later to initiate a new transfer scan. We need a flag
+  in any case, to ensure that a transfer scan is run for each new remote.
+  The flag could be `.git/annex/transfer/scanned/uuid`.
+
+  But, if failed transfers are tracked, we could also record them, in
+  order to retry them later, without the scan. I'm thinking about a
+  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
+  which failed transfer log files could be moved to.

From 73c24e05d86e0d8ef5c312fde92e7898154a01e2 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Thu, 23 Aug 2012 16:27:21 -0400
Subject: [PATCH 2/5] blog for the day

---
 .../blog/day_63__transfer_retries.mdwn | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 doc/design/assistant/blog/day_63__transfer_retries.mdwn

diff --git a/doc/design/assistant/blog/day_63__transfer_retries.mdwn b/doc/design/assistant/blog/day_63__transfer_retries.mdwn
new file mode 100644
index 0000000000..d668f507ba
--- /dev/null
+++ b/doc/design/assistant/blog/day_63__transfer_retries.mdwn
@@ -0,0 +1,26 @@
+Implemented everything I planned out yesterday: Expensive scans are only
+done once per remote (unless the remote changed while it was disconnected),
+and failed transfers are logged so they can be retried later.
+
+Changed the TransferScanner to prefer to scan low cost remotes first,
+as a crude form of scheduling lower-cost transfers first.
+
+A whole bunch of interesting syncing scenarios should work now. I have not
+tested them all in detail, but to the best of my knowledge, all these
+should work:
+
+* Connect to the network. It starts syncing with a networked remote.
+  Disconnect the network. Reconnect, and it resumes where it left off.
+* Migrate between networks (ie, home to cafe to work). Any transfers
+  that can only happen on one LAN are retried on each new network you
+  visit, until they succeed.
+
+One that is not working, but is soooo close:
+
+* Plug in a removable drive. Some transfers start. Yank the plug.
+  Plug it back in. All necessary transfers resume, and it ends up
+  fully in sync, no matter how many times you yank that cable.
+
+That's not working because of an infelicity in the MountWatcher.
+It doesn't notice when the drive gets unmounted, so it ignores
+the new mount event.

From 476c60ce1fd1b5298d639c77103958b929d87a42 Mon Sep 17 00:00:00 2001
From: "https://www.google.com/accounts/o8/id?id=AItOawnY9ObrNrQuRp8Xs0XvdtJJssm5cp4NMZA"
Date: Thu, 23 Aug 2012 21:00:12 +0000
Subject: [PATCH 3/5] Added a comment: Rackspace Cloud Files support?

---
 ...comment_6_78da9e233882ec0908962882ea8c4056._comment | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment

diff --git a/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment b/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment
new file mode 100644
index 0000000000..742dbedc2f
--- /dev/null
+++ b/doc/special_remotes/S3/comment_6_78da9e233882ec0908962882ea8c4056._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawnY9ObrNrQuRp8Xs0XvdtJJssm5cp4NMZA"
+ nickname="alan"
+ subject="Rackspace Cloud Files support?"
+ date="2012-08-23T21:00:11Z"
+ content="""
+Any chance I could bribe you to setup Rackspace Cloud Files support? We are using them and would hate to have a S3 bucket only for this.
+
+https://github.com/rackspace/python-cloudfiles
+"""]]

From bb949ed2d278baaa4cc8542ff7c477dc535aeb3e Mon Sep 17 00:00:00 2001
From: "https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
Date: Thu, 23 Aug 2012 21:25:48 +0000
Subject: [PATCH 4/5] Added a comment

---
 ...comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment

diff --git a/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment b/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment
new file mode 100644
index 0000000000..119aee2c91
--- /dev/null
+++ b/doc/design/assistant/blog/day_63__transfer_retries/comment_1_990d4eb6066c4e2b9ddb3cabef32e4b9._comment
@@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
+ nickname="Justin"
+ subject="comment 1"
+ date="2012-08-23T21:25:48Z"
+ content="""
+Do encrypted rsync remotes resume quickly as well?
+
+One thing I noticed was that if a copy --to an encrypted rsync remote gets interrupted it will remove the tmp file and re-encrypt the whole file before resuming rsync.
+"""]] From d25f407e6767c8ce9214fcc7c503178cfa3fa9f5 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawla3gLc6_rHuggFfy7o7eGMPvPztFZTrUQ" Date: Fri, 24 Aug 2012 08:44:56 +0000 Subject: [PATCH 5/5] --- .../fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn b/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn index e15529c645..883c53d36f 100644 --- a/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn +++ b/doc/bugs/fsck_thinks_file_content_is_bad_when_it_isn__39__t.mdwn @@ -22,3 +22,9 @@ The original file also has sha512 ead9db1f34739014a216239d9624bce74d92fe723de065 >> And what sha512 does the file in .git/annex/bad have **now**? (fsck >> preserves the original filename; this says nothing about what the >> current checksum is, if the file has been corrupted). --[[Joey]] + +The same, as it's the file I was trying to inject: + +ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d .git/annex/bad/SHA512E-s94402560--ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d.Moon.avi + +That's what puzzles me, it is the same file, but for some weird reason git annex thinks it's not.