Merge branch 'master' into assistant

Conflicts:
	debian/changelog

Updated changelog for assistant and webapp.

commit b12db9ef92
17 changed files with 284 additions and 11 deletions
debian/changelog
@@ -1,13 +1,22 @@
-git-annex (3.20120808) UNRELEASED; urgency=low
+git-annex (3.20120826) UNRELEASED; urgency=low
+
+  * assistant: New command, a daemon which does everything watch does,
+    as well as automatically syncing file contents between repositories.
+  * webapp: New command (and FreeDesktop menu item) that allows managing
+    and configuring the assistant in a web browser.
+  * init: If no description is provided for a new repository, one will
+    automatically be generated, like "joey@gnu:~/foo"
+
+ -- Joey Hess <joeyh@debian.org>  Mon, 27 Aug 2012 13:27:39 -0400
+
+git-annex (3.20120825) unstable; urgency=low
 
   * S3: Add fileprefix setting.
   * Pass --use-agent to gpg when in no tty mode. Thanks, Eskild Hustvedt.
-  * init: If no description is provided for a new repository, one will
-    automatically be generated, like "joey@gnu:~/foo"
   * Bugfix: Fix fsck in SHA*E backends, when the key contains composite
     extensions, as added in 3.20120721.
 
- -- Joey Hess <joeyh@debian.org>  Thu, 09 Aug 2012 13:51:47 -0400
+ -- Joey Hess <joeyh@debian.org>  Sat, 25 Aug 2012 10:00:10 -0400
 
 git-annex (3.20120807) unstable; urgency=low
@@ -22,3 +22,14 @@ The original file also has sha512 ead9db1f34739014a216239d9624bce74d92fe723de065
 >> And what sha512 does the file in .git/annex/bad have **now**? (fsck
 >> preserves the original filename; this says nothing about what the
 >> current checksum is, if the file has been corrupted). --[[Joey]]
+
+The same, as it's the file I was trying to inject:
+
+ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d .git/annex/bad/SHA512E-s94402560--ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d.Moon.avi
+
+That's what puzzles me: it is the same file, but for some weird reason git-annex thinks it's not.
+
+> Ok, reproduced and fixed the bug. The "E" backends recently got support
+> for 2 levels of filename extensions, but were not made to drop them both
+> when fscking. [[done]] (I'll release a fixed version probably tomorrow;
+> fix is in git now.) --[[Joey]]
@@ -0,0 +1,36 @@
Today, added a thread that deals with recovering when there's been a loss
of network connectivity. When the network's down, the normal immediate
syncing of changes of course doesn't work. So this thread detects when the
network comes back up, and does a pull+push to network remotes, and
triggers scanning for file content that needs to be transferred.

I used dbus again, to detect events generated by both network-manager and
wicd when they've successfully brought an interface up. Or, if they're not
available, it polls every 30 minutes.
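
The dbus-or-polling fallback described above can be distilled into a short sketch. This is Python rather than the Haskell the assistant is written in, and the helper names are hypothetical; it only illustrates the control flow: block on a desktop-bus signal when one is available, otherwise fall back to a 30-minute poll, and react to each "network up" event with a pull+push plus a transfer scan.

```python
import time

def watch_network(have_dbus, wait_for_dbus_event, poll_interval=30 * 60):
    """Yield a stream of 'network came up' events: driven by
    network-manager/wicd dbus signals when available, otherwise by a
    30-minute poll (wait_for_dbus_event is a hypothetical blocking helper)."""
    while True:
        if have_dbus:
            wait_for_dbus_event()       # blocks until an interface-up signal
        else:
            time.sleep(poll_interval)   # crude fallback when dbus is absent
        yield "network-up"

def on_network_up(remotes):
    """On each event: git pull+push with every network remote, then queue
    a scan for file content that needs transferring."""
    actions = [("pull+push", r) for r in remotes]
    actions.append(("transfer-scan", remotes))
    return actions
```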
When the network comes up, in addition to the git pull+push, it also
currently does a full scan of the repo to find files whose contents
need to be transferred to get fully back into sync.

I think it'll be ok for some git pulls and pushes to happen when
moving to a new network, or resuming a laptop (or every 30 minutes when
resorting to polling). But the transfer scan is currently really too heavy
to be appropriate to do every time in those situations. I have an idea for
avoiding that scan when the remote's git-annex branch has not changed. But
I need to refine it, to handle cases like this:

1. a new remote is added
2. file contents start being transferred to (or from) it
3. the network is taken down
4. all the queued transfers fail
5. the network comes back up
6. the transfer scan needs to know the remote was not all in sync
   before #3, and so should do a full scan despite the git-annex branch
   not having changed
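
The decision those steps describe reduces to a small predicate. A minimal sketch, assuming the three inputs that the scenario above implies (the names are made up; the real logic lives in the assistant's Haskell code):

```python
def needs_full_scan(branch_diverged, ever_scanned, had_failed_transfers):
    """Decide whether reconnecting to a remote warrants an expensive
    transfer scan, covering the cases in the numbered list above."""
    if not ever_scanned:
        return True   # a new remote has never been checked for content
    if had_failed_transfers:
        return True   # queued transfers failed while the network was down,
                      # so the remote was not all in sync before the outage
    return branch_diverged  # otherwise only scan if the branch changed
```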
---

Doubled the ram in my netbook, which I use for all development. Yesod needs
rather a lot of ram to compile and link, and this should make me quite a
lot more productive. I was struggling with OOM killing bits of chromium
during my last week of development.
@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI"
 nickname="Paul"
 subject="Amazon Glacier"
 date="2012-08-23T06:32:24Z"
 content="""
Do you think git-annex could support [Amazon Glacier](http://aws.amazon.com/glacier/) as a backend?
"""]]
doc/design/assistant/blog/day_62__smarter_syncing.mdwn (new file)
@@ -0,0 +1,21 @@
Woke up this morning with most of the design for a smarter approach to
[[syncing]] in my head. (This is why I sometimes slip up and tell people I
work on this project 12 hours a day..)

To keep the current `assistant` branch working while I make changes
that break use cases that are working, I've started
developing in a new branch, `assistant-wip`.

In it, I've started getting rid of unnecessary expensive transfer scans.

First optimisation I've done is to detect when a remote that was
disconnected has diverged its `git-annex` branch from the local branch.
Only when that's the case does a new transfer scan need to be done, to find
out what new stuff might be available on that remote, to have caused the
change to its branch, while it was disconnected.

That broke a lot of stuff. I have a plan to fix it written down in
[[syncing]]. It'll involve keeping track of whether a transfer scan has
ever been done (if not, one should be run), and recording logs when
transfers failed, so those failed transfers can be retried when the
remote gets reconnected.
doc/design/assistant/blog/day_63__transfer_retries.mdwn (new file)
@@ -0,0 +1,26 @@
Implemented everything I planned out yesterday: Expensive scans are only
done once per remote (unless the remote changed while it was disconnected),
and failed transfers are logged so they can be retried later.

Changed the TransferScanner to prefer to scan low cost remotes first,
as a crude form of scheduling lower-cost transfers first.
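
Scanning in cost order amounts to an ordinary sort on each remote's configured cost (mirroring git-annex's per-remote `annex-cost` setting, where lower is preferred). A minimal sketch; the pair representation is an assumption for illustration:

```python
def order_for_scanning(remotes):
    """Return remotes cheapest-first, so the scan queues transfers
    involving low-cost remotes before expensive ones.
    Each remote is a (name, cost) pair."""
    return sorted(remotes, key=lambda r: r[1])
```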
A whole bunch of interesting syncing scenarios should work now. I have not
tested them all in detail, but to the best of my knowledge, all these
should work:

* Connect to the network. It starts syncing with a networked remote.
  Disconnect the network. Reconnect, and it resumes where it left off.
* Migrate between networks (ie, home to cafe to work). Any transfers
  that can only happen on one LAN are retried on each new network you
  visit, until they succeed.

One that is not working, but is soooo close:

* Plug in a removable drive. Some transfers start. Yank the plug.
  Plug it back in. All necessary transfers resume, and it ends up
  fully in sync, no matter how many times you yank that cable.

That's not working because of an infelicity in the MountWatcher.
It doesn't notice when the drive gets unmounted, so it ignores
the new mount event.
@@ -0,0 +1,10 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
 nickname="Justin"
 subject="comment 1"
 date="2012-08-23T21:25:48Z"
 content="""
Do encrypted rsync remotes resume quickly as well?

One thing I noticed was that if a copy --to an encrypted rsync remote gets interrupted, it will remove the tmp file and re-encrypt the whole file before resuming rsync.
"""]]
doc/design/assistant/blog/day_64__syncing_robustly.mdwn (new file)
@@ -0,0 +1,33 @@
Working toward getting the data syncing to happen robustly,
so a bunch of improvements.

* Got unmount events to be noticed, so unplugging and replugging
  a removable drive will resume the syncing to it. There's really no
  good unmount event available on dbus in kde, so it uses a heuristic
  there.
* Avoid requeuing a download from a remote that no longer has a key.
* Run a full scan on startup, for multiple reasons, including dealing with
  crashes.

Ran into a strange issue: Occasionally the assistant will run `git-annex
copy` and it will not transfer the requested file. It seems that
when the copy command runs `git ls-files`, it does not see the file
it's supposed to act on in its output.

Eventually I figured out what's going on: When updating the git-annex
branch, it sets `GIT_INDEX_FILE`, and of course environment settings are
not thread-safe! So there's a race between threads that access
the git-annex branch, and the Transferrer thread, or any other thread
that might expect to look at the normal git index.

Unfortunately, I don't have a fix for this yet.. Git's only interface for
using a different index file is `GIT_INDEX_FILE`. It seems I have a lot of
code to tear apart, to push back the setenv until after forking every git
command. :(

Before I figured out the root problem, I developed a workaround for the
symptom I was seeing. I added a `git-annex transferkey`, which is
optimised to be run by the assistant, and avoids running `git ls-files`, so
avoids the problem. While I plan to fix this environment variable problem
properly, `transferkey` turns out to be so much faster than how it was
using `copy` that I'm going to keep it.
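
The proper fix sketched above (pushing the setenv past the fork) amounts to giving each child process its own environment instead of mutating the parent's. A Python illustration of the idea, standing in for the Haskell process-spawning code:

```python
import os
import subprocess
import sys

def run_with_private_env(var, value, argv):
    """Run a child with VAR set only in its own environment -- the
    thread-safe alternative to a global setenv that races with other
    threads reading the parent environment."""
    env = dict(os.environ, **{var: value})
    return subprocess.run(argv, env=env, capture_output=True, text=True)

# The child sees the variable...
out = run_with_private_env(
    "GIT_INDEX_FILE", "/tmp/branch-index",
    [sys.executable, "-c", "import os; print(os.environ['GIT_INDEX_FILE'])"],
).stdout.strip()
# ...while the parent's environment is left untouched.
```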

@@ -3,9 +3,16 @@ all the other git clones, at both the git level and the key/value level.
 
 ## immediate action items
 
-* At startup, and possibly periodically, or when the network connection
-  changes, or some heuristic suggests that a remote was disconnected from
-  us for a while, queue remotes for processing by the TransferScanner.
+* The syncing code currently doesn't run for special remotes. While
+  transferring the git info about special remotes could be a complication,
+  if we assume that's synced between existing git remotes, it should be
+  possible for them to do file transfers to/from special remotes.
+* Often several remotes will be queued for full TransferScanner scans,
+  and the scan does the same thing for each .. so it would be better to
+  combine them into one scan in such a case.
+* Sometimes a Download gets queued from a slow remote, and then a fast
+  remote becomes available, and a Download is queued from it. Would be
+  good to sort the transfer queue to run fast Downloads (and Uploads) first.
 * Ensure that when a remote receives content, and updates its location log,
   it syncs that update back out. Prerequisite for:
 * After git sync, identify new content that we don't have that is now available
@@ -34,14 +41,17 @@ all the other git clones, at both the git level and the key/value level.
   files in some directories and not others. See for use cases:
   [[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
 * speed up git syncing by using the cached ssh connection for it too
-  (will need to use `GIT_SSH`, which needs to point to a command to run,
-  not a shell command line)
+  Will need to use `GIT_SSH`, which needs to point to a command to run,
+  not a shell command line. Beware that the network connection may have
+  bounced and the cached ssh connection not be usable.
 * Map the network of git repos, and use that map to calculate
   optimal transfers to keep the data in sync. Currently a naive flood fill
   is done instead.
 * Find a more efficient way for the TransferScanner to find the transfers
   that need to be done to sync with a remote. Currently it walks the git
-  working copy and checks each file.
+  working copy and checks each file. That probably needs to be done once,
+  but further calls to the TransferScanner could eg, look at the delta
+  between the last scan and the current one in the git-annex branch.
 
 ## misc todo
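
The `GIT_SSH` item in the hunk above hinges on the variable naming an executable, not a shell command line. One way to satisfy that while reusing a cached connection is an OpenSSH connection-multiplexing wrapper; this is a hypothetical sketch, not what git-annex actually ships, and the wrapper path is made up:

```shell
# Hypothetical GIT_SSH wrapper: git runs "$GIT_SSH host command...", so it
# must be a single executable. The wrapper reuses a cached ssh connection
# via OpenSSH connection multiplexing (ControlMaster/ControlPath).
mkdir -p "$HOME/.ssh"
cat > "$HOME/.ssh/annex-git-ssh" <<'EOF'
#!/bin/sh
exec ssh -o ControlMaster=auto -o ControlPersist=yes \
    -o ControlPath="$HOME/.ssh/annex-cache-%r@%h:%p" "$@"
EOF
chmod +x "$HOME/.ssh/annex-git-ssh"
# Usage: GIT_SSH="$HOME/.ssh/annex-git-ssh" git pull someremote
```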
@@ -163,3 +173,42 @@ redone to check it.
   finishes. **done**
 * Test MountWatcher on KDE, and add whatever dbus events KDE emits when
   drives are mounted. **done**
+* It would be nice if, when a USB drive is connected,
+  syncing starts automatically. Use dbus on Linux? **done**
+* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
+  broke content syncing in some situations, which need to be added back.
+  **done**
+
+  Now syncing a disconnected remote only starts a transfer scan if the
+  remote's git-annex branch has diverged, which indicates it probably has
+  new files. But that leaves open the cases where the local repo has
+  new files; and where the two repos git branches are in sync, but the
+  content transfers are lagging behind; and where the transfer scan has
+  never been run.
+
+  Need to track locally whether we're believed to be in sync with a remote.
+  This includes:
+
+  * All local content has been transferred to it successfully.
+  * The remote has been scanned once for data to transfer from it, and all
+    transfers initiated by that scan succeeded.
+
+  Note the complication that, if it's initiated a transfer, our queued
+  transfer will be thrown out as unnecessary. But if its transfer then
+  fails, that needs to be noticed.
+
+  If we're going to track failed transfers, we could just set a flag,
+  and use that flag later to initiate a new transfer scan. We need a flag
+  in any case, to ensure that a transfer scan is run for each new remote.
+  The flag could be `.git/annex/transfer/scanned/uuid`.
+
+  But, if failed transfers are tracked, we could also record them, in
+  order to retry them later, without the scan. I'm thinking about a
+  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
+  which failed transfer log files could be moved to.
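
The flag-and-log scheme proposed above is simple enough to sketch. Paths follow the layout named in the text; the helper names are hypothetical and the real implementation would be Haskell:

```python
from pathlib import Path

def mark_scanned(git_dir, uuid):
    """Record that a full transfer scan has been done for a remote,
    as a flag file under .git/annex/transfer/scanned/ (per the layout
    proposed above)."""
    flag = Path(git_dir, "annex", "transfer", "scanned", uuid)
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()

def ever_scanned(git_dir, uuid):
    """A missing flag means a transfer scan must be run for this remote."""
    return Path(git_dir, "annex", "transfer", "scanned", uuid).exists()

def record_failed(git_dir, direction, uuid, transfer_log):
    """Move a failed transfer's log file into
    .git/annex/transfer/failed/{upload,download}/uuid/ so it can be
    retried later without a full scan."""
    dest = Path(git_dir, "annex", "transfer", "failed", direction, uuid)
    dest.mkdir(parents=True, exist_ok=True)
    Path(transfer_log).rename(dest / Path(transfer_log).name)
```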
+* A remote may lose content it had before, so when requeuing
+  a failed download, check the location log to see if the remote still has
+  the content, and if not, queue a download from elsewhere. (And, a remote
+  may get content we were uploading from elsewhere, so check the location
+  log when queuing a failed Upload too.) **done**
+* Fix MountWatcher to notice umounts and remounts of drives. **done**
+* Run transfer scan on startup. **done**
doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn (new file)
@@ -0,0 +1,3 @@
I tried to compile the assistant branch on Ubuntu 12.04, but it depends on the DBus library, which does not compile with some gibberish errors. Is there a way to solve this?

@@ -0,0 +1,28 @@
[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="4.152.246.119"
 subject="comment 1"
 date="2012-08-25T13:06:31Z"
 content="""
Hmm, let's see...

If the gibberish error is ouyay orgotfay otay otay elltay emay utwhay ethay
roreay asway, then we can figure it out, surely..

If the gibberish error looks something like Ḩ̶̞̗̓ͯ̅͒ͪͫe̢ͦ̊ͭͭͤͣ̂͏̢̳̦͔̬ͅ ̣̘̹̄̕͢Ç̛͈͔̹̮̗͈͓̞ͨ͂͑ͅo̿ͥͮ̿͢͏̧̹̗̪͇̫m̷̢̞̙͑̊̔ͧ̍ͩ̇̚ę̜͑̀͝s̖̱̝̩̞̻͐͂̐́̂̇̆͂

.. your use of cabal
has accidentally summoned Cthulhu! Back slowly away from the monitor!

Otherwise, you might try installing the `libdbus-1-dev` package with apt,
which might make cabal install the haskell dbus bindings successfully. Or
you could just install the `libghc-dbus-dev` package, which contains the
necessary haskell library pre-built. But I don't know if it's in Ubuntu
12.04; it only seems to be available in quantal:
<http://packages.ubuntu.com/search?keywords=libghc-dbus-dev>

Or you could even build it with the Makefile, rather than using cabal.
The Makefile has a `-DWITH_DBUS` setting in it that can be removed to build
the fallback mode that doesn't use dbus.
"""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="4.152.246.119"
 subject="comment 2"
 date="2012-08-25T13:11:37Z"
 content="""
I fnordgot to mention, cabal can be configured to not build with dbus too. The relevant incantation is:

	cabal install git-annex --flags=\"-Dbus\"
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="https://me.yahoo.com/speredenn#aaf38"
 nickname="Jean-Baptiste Carré"
 subject="comment 3"
 date="2012-08-21T18:15:48Z"
 content="""
You're totally right: The UUIDs are the same. So it shouldn't matter if there are many repositories pointing to the same folder, as you state it. Thanks a lot!
"""]]

@@ -11,6 +11,11 @@ sudo cabal update
 cabal install git-annex --bindir=$HOME/bin
 </pre>
 
+Do not forget to add your ~/bin folder to your PATH variable. In your .bashrc, for example:
+<pre>
+PATH=~/bin:/usr/local/bin:$PATH
+</pre>
+
 See also:
 
 * [[forum/OSX__39__s_haskell-platform_statically_links_things]]
doc/news/version_3.20120825.mdwn (new file)
@@ -0,0 +1,6 @@
git-annex 3.20120825 released with [[!toggle text="these changes"]]
[[!toggleable text="""
   * S3: Add fileprefix setting.
   * Pass --use-agent to gpg when in no tty mode. Thanks, Eskild Hustvedt.
   * Bugfix: Fix fsck in SHA*E backends, when the key contains composite
     extensions, as added in 3.20120721."""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawnY9ObrNrQuRp8Xs0XvdtJJssm5cp4NMZA"
 nickname="alan"
 subject="Rackspace Cloud Files support?"
 date="2012-08-23T21:00:11Z"
 content="""
Any chance I could bribe you to set up Rackspace Cloud Files support? We are using them and would hate to have an S3 bucket only for this.

https://github.com/rackspace/python-cloudfiles
"""]]

@@ -1,5 +1,5 @@
 Name: git-annex
-Version: 3.20120807
+Version: 3.20120825
 Cabal-Version: >= 1.8
 License: GPL
 Maintainer: Joey Hess <joey@kitenet.net>