Merge remote-tracking branch 'origin/master'

Joey Hess 2012-08-24 12:17:24 -04:00
commit bc6eaa4ebb
5 changed files with 115 additions and 3 deletions


@ -0,0 +1,36 @@
Today, added a thread that deals with recovering when there's been a loss
of network connectivity. When the network's down, the normal immediate
syncing of changes of course doesn't work. So this thread detects when the
network comes back up, and does a pull+push to network remotes, and
triggers scanning for file content that needs to be transferred.
I used dbus again, to detect events generated by both network-manager and
wicd when they've successfully brought an interface up. Or, if they're not
available, it polls every 30 minutes.
When the network comes up, in addition to the git pull+push, it also
currently does a full scan of the repo to find files whose contents
need to be transferred to get fully back into sync.
I think it'll be ok for some git pulls and pushes to happen when
moving to a new network, or resuming a laptop (or every 30 minutes when
resorting to polling). But the transfer scan is currently really too heavy
to be appropriate to do every time in those situations. I have an idea for
avoiding that scan when the remote's git-annex branch has not changed. But
I need to refine it, to handle cases like this:
1. a new remote is added
2. file contents start being transferred to (or from) it
3. the network is taken down
4. all the queued transfers fail
5. the network comes back up
6. the transfer scan needs to know the remote was not all in sync
before #3, and so should do a full scan despite the git-annex branch
not having changed
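
A minimal sketch of the decision this scenario calls for, written in Python for brevity. The `RemoteState` record and both of its fields are hypothetical names used only for illustration; none of this is existing git-annex code.

```python
from dataclasses import dataclass


@dataclass
class RemoteState:
    """Hypothetical per-remote bookkeeping; field names are illustrative."""
    branch_changed: bool         # remote's git-annex branch changed while disconnected
    in_sync_before_outage: bool  # False if queued transfers were pending or had failed


def needs_full_scan(state: RemoteState) -> bool:
    """Decide whether reconnecting to a remote warrants a full transfer scan."""
    # A changed branch suggests the remote has new content to look for.
    if state.branch_changed:
        return True
    # Even with an unchanged branch, transfers that failed or never finished
    # (steps 2-4 above) mean the remote was not fully in sync.
    return not state.in_sync_before_outage
```

The case from the list above is `RemoteState(branch_changed=False, in_sync_before_outage=False)`, which still asks for a full scan.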
---
Doubled the ram in my netbook, which I use for all development. Yesod needs
rather a lot of ram to compile and link, and this should make me quite a
lot more productive. I was struggling with OOM killing bits of chromium
during my last week of development.


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI"
nickname="Paul"
subject="Amazon Glacier"
date="2012-08-23T06:32:24Z"
content="""
Do you think git-annex could support [Amazon Glacier](http://aws.amazon.com/glacier/) as a backend?
"""]]


@ -0,0 +1,21 @@
Woke up this morning with most of the design for a smarter approach to
[[syncing]] in my head. (This is why I sometimes slip up and tell people I
work on this project 12 hours a day..)
To keep the current `assistant` branch working while I make changes
that break use cases that are working, I've started
developing in a new branch, `assistant-wip`.
In it, I've started getting rid of unnecessary expensive transfer scans.
The first optimisation I've done is to detect when the `git-annex` branch
of a remote that was disconnected has diverged from the local branch.
Only when that's the case does a new transfer scan need to be done, to find
out what new stuff became available on that remote while it was
disconnected and caused the change to its branch.
That broke a lot of stuff. I have a plan to fix it written down in
[[syncing]]. It'll involve keeping track of whether a transfer scan has
ever been done (if not, one should be run), and recording logs when
transfers failed, so those failed transfers can be retried when the
remote gets reconnected.
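
A rough Python sketch of that bookkeeping, using the directory layout proposed on the [[syncing]] page (`.git/annex/transfer/scanned/uuid` and `.git/annex/transfer/failed/{upload,download}/uuid/`). The function names and the one-empty-file-per-key naming are assumptions made for illustration, not git-annex's actual code.

```python
from pathlib import Path
from typing import List


def scanned_flag(annex_dir: Path, uuid: str) -> Path:
    """Path of the flag file recording that a remote has been scanned."""
    return annex_dir / "transfer" / "scanned" / uuid


def mark_scanned(annex_dir: Path, uuid: str) -> None:
    """Record that a transfer scan has completed for this remote."""
    flag = scanned_flag(annex_dir, uuid)
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()


def needs_scan(annex_dir: Path, uuid: str) -> bool:
    """A remote that has never been scanned needs one scan."""
    return not scanned_flag(annex_dir, uuid).exists()


def record_failed(annex_dir: Path, direction: str, uuid: str, key: str) -> Path:
    """Leave a log file behind for a failed transfer so it can be retried."""
    if direction not in ("upload", "download"):
        raise ValueError("direction must be 'upload' or 'download'")
    # Assumption: one empty file per key; a real log would hold more detail.
    log = annex_dir / "transfer" / "failed" / direction / uuid / key
    log.parent.mkdir(parents=True, exist_ok=True)
    log.touch()
    return log


def failed_transfers(annex_dir: Path, direction: str, uuid: str) -> List[str]:
    """List the keys whose transfers failed, sorted for a stable retry order."""
    log_dir = annex_dir / "transfer" / "failed" / direction / uuid
    if not log_dir.is_dir():
        return []
    return sorted(p.name for p in log_dir.iterdir())
```

On reconnect, the idea would be to retry whatever `failed_transfers` returns and only fall back to a full scan when `needs_scan` is true.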


@ -3,9 +3,42 @@ all the other git clones, at both the git level and the key/value level.
## immediate action items
* At startup, and possibly periodically, or when the network connection
changes, or some heuristic suggests that a remote was disconnected from
us for a while, queue remotes for processing by the TransferScanner.
* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
broke content syncing in some situations, which need to be added back.
Now syncing a disconnected remote only starts a transfer scan if the
remote's git-annex branch has diverged, which indicates it probably has
new files. But that leaves open the cases where the local repo has
new files; and where the two repos' git branches are in sync, but the
content transfers are lagging behind; and where the transfer scan has
never been run.
Need to track locally whether we're believed to be in sync with a remote.
This includes:
* All local content has been transferred to it successfully.
* The remote has been scanned once for data to transfer from it, and all
transfers initiated by that scan succeeded.
Note the complication that, if the remote has initiated a transfer, our
queued transfer will be thrown out as unnecessary. But if its transfer then
fails, that needs to be noticed.
If we're going to track failed transfers, we could just set a flag,
and use that flag later to initiate a new transfer scan. We need a flag
in any case, to ensure that a transfer scan is run for each new remote.
The flag could be `.git/annex/transfer/scanned/uuid`.
But, if failed transfers are tracked, we could also record them, in
order to retry them later, without the scan. I'm thinking about a
directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
which failed transfer log files could be moved to.
Note that a remote may lose content it had before, so when requeuing
a failed download, we should check the location log to see if it still has
the content, and if not, queue a download from elsewhere. (And, a remote
may get content we were uploading from elsewhere, so check the location
log when queuing a failed upload too.)
* Ensure that when a remote receives content, and updates its location log,
it syncs that update back out. Prerequisite for:
* After git sync, identify new content that we don't have that is now available
@ -43,6 +76,10 @@ all the other git clones, at both the git level and the key/value level.
that need to be done to sync with a remote. Currently it walks the git
working copy and checks each file.
## misc todo
* --debug often shows unnecessary work being done. Optimise.
## data syncing
There are two parts to data syncing. First, map the network and second,
@ -157,3 +194,5 @@ redone to check it.
finishes. **done**
* Test MountWatcher on KDE, and add whatever dbus events KDE emits when
drives are mounted. **done**
* It would be nice if, when a USB drive is connected,
syncing starts automatically. Use dbus on Linux? **done**


@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://me.yahoo.com/speredenn#aaf38"
nickname="Jean-Baptiste Carré"
subject="comment 3"
date="2012-08-21T18:15:48Z"
content="""
You're totally right: The UUIDs are the same. So it shouldn't matter if there are many repositories pointing to the same folder, as you state it. Thanks a lot!
"""]]