Merge branch 'master' into assistant

Conflicts:
	debian/changelog

Updated changelog for assistant and webapp.

commit b12db9ef92
17 changed files with 284 additions and 11 deletions
debian/changelog
@@ -1,13 +1,22 @@
-git-annex (3.20120808) UNRELEASED; urgency=low
+git-annex (3.20120826) UNRELEASED; urgency=low
+
+  * assistant: New command, a daemon which does everything watch does,
+    as well as automatically syncing file contents between repositories.
+  * webapp: New command (and FreeDesktop menu item) that allows managing
+    and configuring the assistant in a web browser.
+  * init: If no description is provided for a new repository, one will
+    automatically be generated, like "joey@gnu:~/foo"
+
+ -- Joey Hess <joeyh@debian.org>  Mon, 27 Aug 2012 13:27:39 -0400
+
+git-annex (3.20120825) unstable; urgency=low
 
   * S3: Add fileprefix setting.
   * Pass --use-agent to gpg when in no tty mode. Thanks, Eskild Hustvedt.
-  * init: If no description is provided for a new repository, one will
-    automatically be generated, like "joey@gnu:~/foo"
   * Bugfix: Fix fsck in SHA*E backends, when the key contains composite
     extensions, as added in 3.20120721.
 
- -- Joey Hess <joeyh@debian.org>  Thu, 09 Aug 2012 13:51:47 -0400
+ -- Joey Hess <joeyh@debian.org>  Sat, 25 Aug 2012 10:00:10 -0400
 
 git-annex (3.20120807) unstable; urgency=low
@@ -22,3 +22,14 @@ The original file also has sha512 ead9db1f34739014a216239d9624bce74d92fe723de065
 >> And what sha512 does the file in .git/annex/bad have **now**? (fsck
 >> preserves the original filename; this says nothing about what the
 >> current checksum is, if the file has been corrupted). --[[Joey]]
+
+The same, as it's the file I was trying to inject:
+
+ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d .git/annex/bad/SHA512E-s94402560--ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d.Moon.avi
+
+That's what puzzles me: it is the same file, but for some weird reason git-annex thinks it's not.
+
+> Ok, reproduced and fixed the bug. The "E" backends recently got support
+> for 2 levels of filename extensions, but were not made to drop them both
+> when fscking. [[done]] (I'll release a fixed version probably tomorrow;
+> fix is in git now.) --[[Joey]]
@@ -0,0 +1,36 @@
Today, added a thread that deals with recovering when there's been a loss
of network connectivity. When the network's down, the normal immediate
syncing of changes of course doesn't work. So this thread detects when the
network comes back up, and does a pull+push to network remotes, and
triggers scanning for file content that needs to be transferred.

I used dbus again, to detect events generated by both network-manager and
wicd when they've successfully brought an interface up. Or, if they're not
available, it polls every 30 minutes.
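
The dbus-or-polling fallback described above can be distilled into a short sketch. This is Python rather than the Haskell the assistant is written in, and the helper names are hypothetical; it only illustrates the control flow: block on a desktop-bus signal when one is available, otherwise fall back to a 30-minute poll, and react to each "network up" event with a pull+push plus a transfer scan.

```python
import time

def watch_network(have_dbus, wait_for_dbus_event, poll_interval=30 * 60):
    """Yield a stream of 'network came up' events: driven by
    network-manager/wicd dbus signals when available, otherwise by a
    30-minute poll (wait_for_dbus_event is a hypothetical blocking helper)."""
    while True:
        if have_dbus:
            wait_for_dbus_event()       # blocks until an interface-up signal
        else:
            time.sleep(poll_interval)   # crude fallback when dbus is absent
        yield "network-up"

def on_network_up(remotes):
    """On each event: git pull+push with every network remote, then queue
    a scan for file content that needs transferring."""
    actions = [("pull+push", r) for r in remotes]
    actions.append(("transfer-scan", remotes))
    return actions
```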
When the network comes up, in addition to the git pull+push, it also
currently does a full scan of the repo to find files whose contents
need to be transferred to get fully back into sync.

I think it'll be ok for some git pulls and pushes to happen when
moving to a new network, or resuming a laptop (or every 30 minutes when
resorting to polling). But the transfer scan is currently really too heavy
to be appropriate to do every time in those situations. I have an idea for
avoiding that scan when the remote's git-annex branch has not changed. But
I need to refine it, to handle cases like this:

1. a new remote is added
2. file contents start being transferred to (or from) it
3. the network is taken down
4. all the queued transfers fail
5. the network comes back up
6. the transfer scan needs to know the remote was not all in sync
   before #3, and so should do a full scan despite the git-annex branch
   not having changed
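
The decision those steps describe reduces to a small predicate. A minimal sketch, assuming the three inputs that the scenario above implies (the names are made up; the real logic lives in the assistant's Haskell code):

```python
def needs_full_scan(branch_diverged, ever_scanned, had_failed_transfers):
    """Decide whether reconnecting to a remote warrants an expensive
    transfer scan, covering the cases in the numbered list above."""
    if not ever_scanned:
        return True   # a new remote has never been checked for content
    if had_failed_transfers:
        return True   # queued transfers failed while the network was down,
                      # so the remote was not all in sync before the outage
    return branch_diverged  # otherwise only scan if the branch changed
```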
---

Doubled the ram in my netbook, which I use for all development. Yesod needs
rather a lot of ram to compile and link, and this should make me quite a
lot more productive. I was struggling with OOM killing bits of chromium
during my last week of development.
@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI"
 nickname="Paul"
 subject="Amazon Glacier"
 date="2012-08-23T06:32:24Z"
 content="""
Do you think git-annex could support [Amazon Glacier](http://aws.amazon.com/glacier/) as a backend?
"""]]
doc/design/assistant/blog/day_62__smarter_syncing.mdwn (new file)
@@ -0,0 +1,21 @@
Woke up this morning with most of the design for a smarter approach to
[[syncing]] in my head. (This is why I sometimes slip up and tell people I
work on this project 12 hours a day..)

To keep the current `assistant` branch working while I make changes
that break use cases that are working, I've started
developing in a new branch, `assistant-wip`.

In it, I've started getting rid of unnecessary expensive transfer scans.

First optimisation I've done is to detect when a remote that was
disconnected has diverged its `git-annex` branch from the local branch.
Only when that's the case does a new transfer scan need to be done, to find
out what new stuff might be available on that remote, to have caused the
change to its branch, while it was disconnected.

That broke a lot of stuff. I have a plan to fix it written down in
[[syncing]]. It'll involve keeping track of whether a transfer scan has
ever been done (if not, one should be run), and recording logs when
transfers failed, so those failed transfers can be retried when the
remote gets reconnected.
doc/design/assistant/blog/day_63__transfer_retries.mdwn (new file)
@@ -0,0 +1,26 @@
Implemented everything I planned out yesterday: Expensive scans are only
done once per remote (unless the remote changed while it was disconnected),
and failed transfers are logged so they can be retried later.

Changed the TransferScanner to prefer to scan low cost remotes first,
as a crude form of scheduling lower-cost transfers first.
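
Scanning in cost order amounts to an ordinary sort on each remote's configured cost (mirroring git-annex's per-remote `annex-cost` setting, where lower is preferred). A minimal sketch; the pair representation is an assumption for illustration:

```python
def order_for_scanning(remotes):
    """Return remotes cheapest-first, so the scan queues transfers
    involving low-cost remotes before expensive ones.
    Each remote is a (name, cost) pair."""
    return sorted(remotes, key=lambda r: r[1])
```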
A whole bunch of interesting syncing scenarios should work now. I have not
tested them all in detail, but to the best of my knowledge, all these
should work:

* Connect to the network. It starts syncing with a networked remote.
  Disconnect the network. Reconnect, and it resumes where it left off.
* Migrate between networks (ie, home to cafe to work). Any transfers
  that can only happen on one LAN are retried on each new network you
  visit, until they succeed.

One that is not working, but is soooo close:

* Plug in a removable drive. Some transfers start. Yank the plug.
  Plug it back in. All necessary transfers resume, and it ends up
  fully in sync, no matter how many times you yank that cable.

That's not working because of an infelicity in the MountWatcher.
It doesn't notice when the drive gets unmounted, so it ignores
the new mount event.
@@ -0,0 +1,10 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
 nickname="Justin"
 subject="comment 1"
 date="2012-08-23T21:25:48Z"
 content="""
Do encrypted rsync remotes resume quickly as well?

One thing I noticed was that if a copy --to an encrypted rsync remote gets interrupted, it will remove the tmp file and re-encrypt the whole file before resuming rsync.
"""]]
doc/design/assistant/blog/day_64__syncing_robustly.mdwn (new file)
@@ -0,0 +1,33 @@
Working toward getting the data syncing to happen robustly,
so a bunch of improvements.

* Got unmount events to be noticed, so unplugging and replugging
  a removable drive will resume the syncing to it. There's really no
  good unmount event available on dbus in kde, so it uses a heuristic
  there.
* Avoid requeuing a download from a remote that no longer has a key.
* Run a full scan on startup, for multiple reasons, including dealing with
  crashes.

Ran into a strange issue: Occasionally the assistant will run `git-annex
copy` and it will not transfer the requested file. It seems that
when the copy command runs `git ls-files`, it does not see the file
it's supposed to act on in its output.

Eventually I figured out what's going on: When updating the git-annex
branch, it sets `GIT_INDEX_FILE`, and of course environment settings are
not thread-safe! So there's a race between threads that access
the git-annex branch, and the Transferrer thread, or any other thread
that might expect to look at the normal git index.

Unfortunately, I don't have a fix for this yet.. Git's only interface for
using a different index file is `GIT_INDEX_FILE`. It seems I have a lot of
code to tear apart, to push back the setenv until after forking every git
command. :(

Before I figured out the root problem, I developed a workaround for the
symptom I was seeing. I added a `git-annex transferkey`, which is
optimised to be run by the assistant, and avoids running `git ls-files`, so
avoids the problem. While I plan to fix this environment variable problem
properly, `transferkey` turns out to be so much faster than how it was
using `copy` that I'm going to keep it.
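
The proper fix sketched above (pushing the setenv past the fork) amounts to giving each child process its own environment instead of mutating the parent's. A Python illustration of the idea, standing in for the Haskell process-spawning code:

```python
import os
import subprocess
import sys

def run_with_private_env(var, value, argv):
    """Run a child with VAR set only in its own environment -- the
    thread-safe alternative to a global setenv that races with other
    threads reading the parent environment."""
    env = dict(os.environ, **{var: value})
    return subprocess.run(argv, env=env, capture_output=True, text=True)

# The child sees the variable...
out = run_with_private_env(
    "GIT_INDEX_FILE", "/tmp/branch-index",
    [sys.executable, "-c", "import os; print(os.environ['GIT_INDEX_FILE'])"],
).stdout.strip()
# ...while the parent's environment is left untouched.
```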

@@ -3,9 +3,16 @@ all the other git clones, at both the git level and the key/value level.
 
 ## immediate action items
 
-* At startup, and possibly periodically, or when the network connection
-  changes, or some heuristic suggests that a remote was disconnected from
-  us for a while, queue remotes for processing by the TransferScanner.
+* The syncing code currently doesn't run for special remotes. While
+  transferring the git info about special remotes could be a complication,
+  if we assume that's synced between existing git remotes, it should be
+  possible for them to do file transfers to/from special remotes.
+* Often several remotes will be queued for full TransferScanner scans,
+  and the scan does the same thing for each .. so it would be better to
+  combine them into one scan in such a case.
+* Sometimes a Download gets queued from a slow remote, and then a fast
+  remote becomes available, and a Download is queued from it. Would be
+  good to sort the transfer queue to run fast Downloads (and Uploads) first.
 * Ensure that when a remote receives content, and updates its location log,
   it syncs that update back out. Prerequisite for:
 * After git sync, identify new content that we don't have that is now available
@@ -34,14 +41,17 @@ all the other git clones, at both the git level and the key/value level.
   files in some directories and not others. See for use cases:
   [[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
 * speed up git syncing by using the cached ssh connection for it too
-  (will need to use `GIT_SSH`, which needs to point to a command to run,
-  not a shell command line)
+  Will need to use `GIT_SSH`, which needs to point to a command to run,
+  not a shell command line. Beware that the network connection may have
+  bounced and the cached ssh connection not be usable.
 * Map the network of git repos, and use that map to calculate
   optimal transfers to keep the data in sync. Currently a naive flood fill
   is done instead.
 * Find a more efficient way for the TransferScanner to find the transfers
   that need to be done to sync with a remote. Currently it walks the git
-  working copy and checks each file.
+  working copy and checks each file. That probably needs to be done once,
+  but further calls to the TransferScanner could eg, look at the delta
+  between the last scan and the current one in the git-annex branch.
 
 ## misc todo
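
The `GIT_SSH` item in the hunk above hinges on the variable naming an executable, not a shell command line. One way to satisfy that while reusing a cached connection is an OpenSSH connection-multiplexing wrapper; this is a hypothetical sketch, not what git-annex actually ships, and the wrapper path is made up:

```shell
# Hypothetical GIT_SSH wrapper: git runs "$GIT_SSH host command...", so it
# must be a single executable. The wrapper reuses a cached ssh connection
# via OpenSSH connection multiplexing (ControlMaster/ControlPath).
mkdir -p "$HOME/.ssh"
cat > "$HOME/.ssh/annex-git-ssh" <<'EOF'
#!/bin/sh
exec ssh -o ControlMaster=auto -o ControlPersist=yes \
    -o ControlPath="$HOME/.ssh/annex-cache-%r@%h:%p" "$@"
EOF
chmod +x "$HOME/.ssh/annex-git-ssh"
# Usage: GIT_SSH="$HOME/.ssh/annex-git-ssh" git pull someremote
```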
@@ -163,3 +173,42 @@ redone to check it.
   finishes. **done**
 * Test MountWatcher on KDE, and add whatever dbus events KDE emits when
   drives are mounted. **done**
+* It would be nice if, when a USB drive is connected,
+  syncing starts automatically. Use dbus on Linux? **done**
+* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
+  broke content syncing in some situations, which need to be added back.
+  **done**
+
+  Now syncing a disconnected remote only starts a transfer scan if the
+  remote's git-annex branch has diverged, which indicates it probably has
+  new files. But that leaves open the cases where the local repo has
+  new files; and where the two repos git branches are in sync, but the
+  content transfers are lagging behind; and where the transfer scan has
+  never been run.
+
+  Need to track locally whether we're believed to be in sync with a remote.
+  This includes:
+
+  * All local content has been transferred to it successfully.
+  * The remote has been scanned once for data to transfer from it, and all
+    transfers initiated by that scan succeeded.
+
+  Note the complication that, if it's initiated a transfer, our queued
+  transfer will be thrown out as unnecessary. But if its transfer then
+  fails, that needs to be noticed.
+
+  If we're going to track failed transfers, we could just set a flag,
+  and use that flag later to initiate a new transfer scan. We need a flag
+  in any case, to ensure that a transfer scan is run for each new remote.
+  The flag could be `.git/annex/transfer/scanned/uuid`.
+
+  But, if failed transfers are tracked, we could also record them, in
+  order to retry them later, without the scan. I'm thinking about a
+  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
+  which failed transfer log files could be moved to.
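
The flag-and-log scheme proposed above is simple enough to sketch. Paths follow the layout named in the text; the helper names are hypothetical and the real implementation would be Haskell:

```python
from pathlib import Path

def mark_scanned(git_dir, uuid):
    """Record that a full transfer scan has been done for a remote,
    as a flag file under .git/annex/transfer/scanned/ (per the layout
    proposed above)."""
    flag = Path(git_dir, "annex", "transfer", "scanned", uuid)
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()

def ever_scanned(git_dir, uuid):
    """A missing flag means a transfer scan must be run for this remote."""
    return Path(git_dir, "annex", "transfer", "scanned", uuid).exists()

def record_failed(git_dir, direction, uuid, transfer_log):
    """Move a failed transfer's log file into
    .git/annex/transfer/failed/{upload,download}/uuid/ so it can be
    retried later without a full scan."""
    dest = Path(git_dir, "annex", "transfer", "failed", direction, uuid)
    dest.mkdir(parents=True, exist_ok=True)
    Path(transfer_log).rename(dest / Path(transfer_log).name)
```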
+* A remote may lose content it had before, so when requeuing
+  a failed download, check the location log to see if the remote still has
+  the content, and if not, queue a download from elsewhere. (And, a remote
+  may get content we were uploading from elsewhere, so check the location
+  log when queuing a failed Upload too.) **done**
+* Fix MountWatcher to notice umounts and remounts of drives. **done**
+* Run transfer scan on startup. **done**
doc/forum/DBus_on_Ubuntu_12.04__63__.mdwn (new file)
@@ -0,0 +1,3 @@
I tried to compile the assistant branch on Ubuntu 12.04, but it depends on the DBus library, which does not compile with some gibberish errors. Is there a way to solve this?

@@ -0,0 +1,28 @@
[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="4.152.246.119"
 subject="comment 1"
 date="2012-08-25T13:06:31Z"
 content="""
Hmm, let's see...

If the gibberish error is ouyay orgotfay otay otay elltay emay utwhay ethay
roreay asway, then we can figure it out, surely..

If the gibberish error looks something like Ḩ̶̞̗̓ͯ̅͒ͪͫe̢ͦ̊ͭͭͤͣ̂͏̢̳̦͔̬ͅ ̣̘̹̄̕͢Ç̛͈͔̹̮̗͈͓̞ͨ͂͑ͅo̿ͥͮ̿͢͏̧̹̗̪͇̫m̷̢̞̙͑̊̔ͧ̍ͩ̇̚ę̜͑̀͝s̖̱̝̩̞̻͐͂̐́̂̇̆͂

.. your use of cabal
has accidentally summoned Cthulhu! Back slowly away from the monitor!

Otherwise, you might try installing the `libdbus-1-dev` package with apt,
which might make cabal install the haskell dbus bindings successfully. Or
you could just install the `libghc-dbus-dev` package, which contains the
necessary haskell library pre-built. But I don't know if it's in Ubuntu
12.04; it only seems to be available in quantal:
<http://packages.ubuntu.com/search?keywords=libghc-dbus-dev>

Or you could even build it with the Makefile, rather than using cabal.
The Makefile has a `-DWITH_DBUS` setting in it that can be removed to build
the fallback mode that doesn't use dbus.
"""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
 username="http://joeyh.name/"
 ip="4.152.246.119"
 subject="comment 2"
 date="2012-08-25T13:11:37Z"
 content="""
I fnordgot to mention, cabal can be configured to not build with dbus too. The relevant incantation is:

	cabal install git-annex --flags=\"-Dbus\"
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
 username="https://me.yahoo.com/speredenn#aaf38"
 nickname="Jean-Baptiste Carré"
 subject="comment 3"
 date="2012-08-21T18:15:48Z"
 content="""
You're totally right: The UUIDs are the same. So it shouldn't matter if there are many repositories pointing to the same folder, as you state it. Thanks a lot!
"""]]

@@ -11,6 +11,11 @@ sudo cabal update
 cabal install git-annex --bindir=$HOME/bin
 </pre>
 
+Do not forget to add your ~/bin folder to your PATH variable. In your .bashrc, for example:
+<pre>
+PATH=~/bin:/usr/local/bin:$PATH
+</pre>
+
 See also:
 
 * [[forum/OSX__39__s_haskell-platform_statically_links_things]]
doc/news/version_3.20120825.mdwn (new file)
@@ -0,0 +1,6 @@
git-annex 3.20120825 released with [[!toggle text="these changes"]]
[[!toggleable text="""
   * S3: Add fileprefix setting.
   * Pass --use-agent to gpg when in no tty mode. Thanks, Eskild Hustvedt.
   * Bugfix: Fix fsck in SHA*E backends, when the key contains composite
     extensions, as added in 3.20120721."""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawnY9ObrNrQuRp8Xs0XvdtJJssm5cp4NMZA"
 nickname="alan"
 subject="Rackspace Cloud Files support?"
 date="2012-08-23T21:00:11Z"
 content="""
Any chance I could bribe you to set up Rackspace Cloud Files support? We are using them and would hate to have an S3 bucket only for this.

https://github.com/rackspace/python-cloudfiles
"""]]

@@ -1,5 +1,5 @@
 Name: git-annex
-Version: 3.20120807
+Version: 3.20120825
 Cabal-Version: >= 1.8
 License: GPL
 Maintainer: Joey Hess <joey@kitenet.net>