Joey Hess 2012-08-23 16:24:22 -04:00
parent dfb6709064
commit d5d4b8db34


@@ -3,42 +3,12 @@ all the other git clones, at both the git level and the key/value level.
## immediate action items
* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
broke content syncing in some situations; that syncing needs to be added back.
Now syncing a disconnected remote only starts a transfer scan if the
remote's git-annex branch has diverged, which indicates it probably has
new files. But that leaves open the cases where the local repo has
new files; and where the two repos' git branches are in sync, but the
content transfers are lagging behind; and where the transfer scan has
never been run.
Need to track locally whether we're believed to be in sync with a remote.
This includes:
* All local content has been transferred to it successfully.
* The remote has been scanned once for data to transfer from it, and all
transfers initiated by that scan succeeded.
Note the complication that, if the remote has initiated a transfer, our queued
transfer will be thrown out as unnecessary. But if its transfer then
fails, that needs to be noticed.
If we're going to track failed transfers, we could just set a flag,
and use that flag later to initiate a new transfer scan. We need a flag
in any case, to ensure that a transfer scan is run for each new remote.
The flag could be `.git/annex/transfer/scanned/uuid`.
But, if failed transfers are tracked, we could also record them, in
order to retry them later, without the scan. I'm thinking about a
directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
which failed transfer log files could be moved to.
* Fix MountWatcher to notice umounts and remounts of drives.
* A remote may lose content it had before, so when requeuing
a failed download, check the location log to see if the remote still has
the content, and if not, queue a download from elsewhere. (Likewise, a
remote may have received, from some other repository, content we failed to
upload to it, so check the location log when requeuing a failed upload too.)
* Ensure that when a remote receives content, and updates its location log,
it syncs that update back out. Prerequisite for:
* After git sync, identify new content that we don't have that is now available
@@ -67,18 +37,17 @@ all the other git clones, at both the git level and the key/value level.
files in some directories and not others. See for use cases:
[[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
* speed up git syncing by using the cached ssh connection for it too.
Will need to use `GIT_SSH`, which needs to point to a command to run,
not a shell command line. Beware that the network connection may have
bounced and the cached ssh connection may no longer be usable.
(A rough sketch of the `GIT_SSH` approach is included further below.)
* Map the network of git repos, and use that map to calculate
optimal transfers to keep the data in sync. Currently a naive flood fill
is done instead.
* Find a more efficient way for the TransferScanner to find the transfers
that need to be done to sync with a remote. Currently it walks the git
working copy and checks each file. That probably needs to be done once,
but further calls to the TransferScanner could, eg, look at the delta
between the last scan and the current one in the git-annex branch.
(A sketch of that delta approach is included further below.)

## misc todo

* --debug will often show unnecessary work being done. Optimise.
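
For the `GIT_SSH` item above, here is a minimal sketch of the idea in
Haskell. The socket path, wrapper script location, and the hard-coded
`git push` are illustrative assumptions, not git-annex's actual
implementation. `GIT_SSH` must name a command rather than a shell
snippet, so the sketch writes a tiny script and points git at it.

    import System.Directory (createDirectoryIfMissing)
    import System.Environment (getEnvironment)
    import System.Posix.Files (setFileMode)
    import System.Process

    main :: IO ()
    main = do
        -- Assumed location of the socket kept by ssh connection caching.
        let sockfile = ".git/annex/ssh/example.com.sock"
        -- Hypothetical place to put the GIT_SSH wrapper script.
        let wrapper = ".git/annex/ssh-wrapper"
        createDirectoryIfMissing True ".git/annex"
        -- The wrapper reuses the cached connection via its ControlPath.
        -- A real version would probably run `ssh -O check` first, since
        -- the network may have bounced and left the socket stale.
        writeFile wrapper $ unlines
            [ "#!/bin/sh"
            , "exec ssh -o ControlPath=" ++ sockfile ++ " \"$@\""
            ]
        setFileMode wrapper 0o755
        -- Run git with GIT_SSH pointing at the wrapper.
        environ <- getEnvironment
        (_, _, _, p) <- createProcess (proc "git" ["push", "origin"])
            { env = Just (("GIT_SSH", wrapper) : environ) }
        _ <- waitForProcess p
        return ()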
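
And for the TransferScanner item, a sketch of the delta idea: instead of
re-walking the working copy, ask git which location log files changed on
the git-annex branch since the sha recorded at the previous scan. The
helper name and the place that sha would be stored are assumptions.

    import Data.List (isSuffixOf)
    import System.Process (readProcess)

    -- Location logs in the git-annex branch end in ".log"; each changed
    -- one corresponds to a key whose presence information changed since
    -- the last scan.
    changedLogs :: String -> IO [FilePath]
    changedLogs lastscanned = do
        out <- readProcess "git"
            [ "diff", "--name-only"
            , lastscanned ++ "..git-annex"
            ] ""
        return $ filter (".log" `isSuffixOf`) (lines out)

    main :: IO ()
    main = do
        -- The sha recorded at the previous scan would come from local
        -- state (eg, a file under .git/annex/); hard-coded here.
        changed <- changedLogs "0123456789abcdef0123456789abcdef01234567"
        mapM_ putStrLn changed
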
## data syncing
@@ -196,3 +165,33 @@ redone to check it.
drives are mounted. **done**
* It would be nice if, when a USB drive is connected,
syncing starts automatically. Use dbus on Linux? **done**
* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
broke content syncing in some situations; that syncing needs to be added back.
**done**
Now syncing a disconnected remote only starts a transfer scan if the
remote's git-annex branch has diverged, which indicates it probably has
new files. But that leaves open the cases where the local repo has
new files; and where the two repos' git branches are in sync, but the
content transfers are lagging behind; and where the transfer scan has
never been run.
Need to track locally whether we're believed to be in sync with a remote.
This includes:
* All local content has been transferred to it successfully.
* The remote has been scanned once for data to transfer from it, and all
transfers initiated by that scan succeeded.
Note the complication that, if the remote has initiated a transfer, our queued
transfer will be thrown out as unnecessary. But if its transfer then
fails, that needs to be noticed.
If we're going to track failed transfers, we could just set a flag,
and use that flag later to initiate a new transfer scan. We need a flag
in any case, to ensure that a transfer scan is run for each new remote.
The flag could be `.git/annex/transfer/scanned/uuid`.
But, if failed transfers are tracked, we could also record them, in
order to retry them later, without the scan. I'm thinking about a
directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
which failed transfer log files could be moved to.
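
A minimal sketch of the bookkeeping described in this item, using the
flag file and failed-transfer directory layout proposed above. The helper
names and the simplified Direction type are stand-ins, not git-annex's
real API.

    import Data.Char (toLower)
    import System.Directory (createDirectoryIfMissing, doesFileExist, renameFile)
    import System.FilePath ((</>), takeFileName)

    data Direction = Upload | Download deriving Show

    -- Flag recording that a full transfer scan has been done for a remote.
    scannedFlag :: String -> FilePath
    scannedFlag uuid = ".git/annex/transfer/scanned" </> uuid

    markScanned :: String -> IO ()
    markScanned uuid = do
        createDirectoryIfMissing True ".git/annex/transfer/scanned"
        writeFile (scannedFlag uuid) ""

    -- A transfer scan is needed for any remote not yet flagged.
    needsScan :: String -> IO Bool
    needsScan uuid = fmap not (doesFileExist (scannedFlag uuid))

    -- Directory that failed transfer log files can be moved into, so they
    -- can be retried later without another full scan.
    failedDir :: Direction -> String -> FilePath
    failedDir direction uuid =
        ".git/annex/transfer/failed" </> map toLower (show direction) </> uuid

    -- Stash a failed transfer's log file for later retry. Before retrying
    -- a failed Download, the location log should still be consulted, since
    -- the remote may no longer have the content.
    recordFailed :: Direction -> String -> FilePath -> IO ()
    recordFailed direction uuid transferlog = do
        let dir = failedDir direction uuid
        createDirectoryIfMissing True dir
        renameFile transferlog (dir </> takeFileName transferlog)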