Merge remote-tracking branch 'origin/master'

commit bc6eaa4ebb

5 changed files with 115 additions and 3 deletions
@@ -0,0 +1,36 @@

Today, added a thread that deals with recovering when there's been a loss
of network connectivity. When the network's down, the normal immediate
syncing of changes of course doesn't work. So this thread detects when the
network comes back up, does a pull+push to network remotes, and
triggers scanning for file content that needs to be transferred.

I used dbus again, to detect events generated by both network-manager and
wicd when they've successfully brought an interface up. Or, if they're not
available, it polls every 30 minutes.
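For the curious, the kind of dbus match involved can be sketched from the shell. This is illustration only: the assistant uses Haskell dbus bindings, not `dbus-monitor`, and only the NetworkManager signal is shown (wicd has its own signals on `org.wicd.daemon`).

```shell
# Sketch only: the dbus match rule such a thread can watch for.
# NetworkManager emits a StateChanged signal on connectivity transitions.
rule="type='signal',interface='org.freedesktop.NetworkManager',member='StateChanged'"
echo "would watch: $rule"
# A real watcher would run something along the lines of:
#   dbus-monitor --system "$rule"
# and, on a transition to a connected state, do the pull+push and queue
# a transfer scan; with neither daemon available, fall back to polling
# every 30 minutes.
```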
When the network comes up, in addition to the git pull+push, it also
currently does a full scan of the repo to find files whose contents
need to be transferred to get fully back into sync.
I think it'll be ok for some git pulls and pushes to happen when
moving to a new network, or resuming a laptop (or every 30 minutes when
resorting to polling). But the transfer scan is currently really too heavy
to be appropriate to do every time in those situations. I have an idea for
avoiding that scan when the remote's git-annex branch has not changed. But
I need to refine it, to handle cases like this:

1. a new remote is added
2. file contents start being transferred to (or from) it
3. the network is taken down
4. all the queued transfers fail
5. the network comes back up
6. the transfer scan needs to know the remote was not all in sync
   before #3, and so should do a full scan despite the git-annex branch
   not having changed
---

Doubled the ram in my netbook, which I use for all development. Yesod needs
rather a lot of ram to compile and link, and this should make me quite a
lot more productive. I was struggling with OOM killing bits of chromium
during my last week of development.
@@ -0,0 +1,8 @@

[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmubB1Sj2rwFoVdZYvGV0ACaQUJQyiJXJI"
 nickname="Paul"
 subject="Amazon Glacier"
 date="2012-08-23T06:32:24Z"
 content="""
Do you think git-annex could support [Amazon Glacier](http://aws.amazon.com/glacier/) as a backend?
"""]]
doc/design/assistant/blog/day_62__smarter_syncing.mdwn (new file, 21 lines)
@@ -0,0 +1,21 @@

Woke up this morning with most of the design for a smarter approach to
[[syncing]] in my head. (This is why I sometimes slip up and tell people I
work on this project 12 hours a day..)

To keep the current `assistant` branch working while I make changes
that break use cases that are working, I've started
developing in a new branch, `assistant-wip`.
In it, I've started getting rid of unnecessary, expensive transfer scans.

The first optimisation I've done is to detect when a remote that was
disconnected has diverged its `git-annex` branch from the local branch.
Only when that's the case does a new transfer scan need to be done, to find
out what new stuff might be available on that remote, to have caused the
change to its branch while it was disconnected.
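The divergence test itself is cheap git plumbing. Here's a rough shell sketch of the idea (the real code is Haskell; the throwaway repo and the simulated remote tracking ref below are purely for illustration):

```shell
# Sketch: a remote's git-annex branch has diverged iff it has commits
# that the local git-annex branch lacks.
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name dev
git commit -q --allow-empty -m 'annex state 1'
git branch git-annex                                    # local git-annex branch
git update-ref refs/remotes/origin/git-annex git-annex  # remote in sync

diverged() {
    [ "$(git rev-list --count git-annex..refs/remotes/origin/git-annex)" -gt 0 ]
}

diverged && echo "scan needed" || echo "no scan needed"

# Simulate the remote recording new annex state while disconnected:
git commit -q --allow-empty -m 'annex state 2'
git update-ref refs/remotes/origin/git-annex HEAD
diverged && echo "scan needed" || echo "no scan needed"
```

The two-dot range `git-annex..origin/git-annex` counts only commits reachable from the remote ref but not the local branch, so a remote that is merely behind does not trigger a scan.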
That broke a lot of stuff. I have a plan to fix it written down in
[[syncing]]. It'll involve keeping track of whether a transfer scan has
ever been done (if not, one should be run), and recording logs when
transfers fail, so those failed transfers can be retried when the
remote gets reconnected.
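The retry decision for a failed download can be sketched like this. The location log lookup is faked here as a plain list of remotes believed to have the key; the real assistant would consult git-annex's location log, and the remote names are made up:

```shell
# Sketch: when requeuing a failed download, only retry from the failed
# remote if the location log still says it has the content; otherwise
# fetch from some other remote that does.
requeue_failed_download() {
    remote=$1; haves=$2   # $haves: remotes the location log lists for the key
    case " $haves " in
        *" $remote "*) echo "retry download from $remote" ;;
        *)             echo "queue download from another remote" ;;
    esac
}

requeue_failed_download usbdrive "usbdrive server"   # still has it
requeue_failed_download usbdrive "server"            # lost it meanwhile
```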
@@ -3,9 +3,42 @@ all the other git clones, at both the git level and the key/value level.
 
 ## immediate action items
 
-* At startup, and possibly periodically, or when the network connection
-  changes, or some heuristic suggests that a remote was disconnected from
-  us for a while, queue remotes for processing by the TransferScanner.
+* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
+  broke content syncing in some situations, which need to be added back.
+
+  Now syncing a disconnected remote only starts a transfer scan if the
+  remote's git-annex branch has diverged, which indicates it probably has
+  new files. But that leaves open the cases where the local repo has
+  new files; where the two repos' git branches are in sync, but the
+  content transfers are lagging behind; and where the transfer scan has
+  never been run.
+
+  Need to track locally whether we're believed to be in sync with a remote.
+  This includes:
+
+  * All local content has been transferred to it successfully.
+  * The remote has been scanned once for data to transfer from it, and all
+    transfers initiated by that scan succeeded.
+
+  Note the complication that, if it's initiated a transfer, our queued
+  transfer will be thrown out as unnecessary. But if its transfer then
+  fails, that needs to be noticed.
+
+  If we're going to track failed transfers, we could just set a flag,
+  and use that flag later to initiate a new transfer scan. We need a flag
+  in any case, to ensure that a transfer scan is run for each new remote.
+  The flag could be `.git/annex/transfer/scanned/uuid`.
+
+  But, if failed transfers are tracked, we could also record them, in
+  order to retry them later, without the scan. I'm thinking about a
+  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
+  which failed transfer log files could be moved to.
+
+  Note that a remote may lose content it had before, so when requeuing
+  a failed download, we should check the location log to see if it still
+  has the content, and if not, queue a download from elsewhere. (And a
+  remote may get content we were uploading from elsewhere, so check the
+  location log when queuing a failed upload too.)
+
 * Ensure that when a remote receives content, and updates its location log,
   it syncs that update back out. Prerequisite for:
 * After git sync, identify new content that we don't have that is now available
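On disk, the proposed scanned-flag plus failed-transfer directories could look like the following shell sketch. This is only an illustration of the design note above, run in a throwaway directory; the uuid and the transfer log file name are made up:

```shell
cd "$(mktemp -d)"
uuid=cfd5ba5c-0000-0000-0000-000000000000   # hypothetical remote uuid

# The flag ensures a transfer scan is run at least once per remote.
flag=.git/annex/transfer/scanned/$uuid
mkdir -p "$(dirname "$flag")"
[ -e "$flag" ] || echo "scan needed"
touch "$flag"   # record that the scan has been done

# Failed transfers get their log files parked here for later retry,
# instead of forcing a full rescan.
mkdir -p ".git/annex/transfer/failed/upload/$uuid" \
         ".git/annex/transfer/failed/download/$uuid"
echo "stub transfer log" > ".git/annex/transfer/failed/download/$uuid/SHA256-s0--deadbeef"
ls ".git/annex/transfer/failed/download/$uuid"
```

On reconnect, retrying is then just a matter of walking the `failed/{upload,download}/uuid/` directory and requeueing each logged transfer, consulting the location log first as noted above.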
@@ -43,6 +76,10 @@
   that need to be done to sync with a remote. Currently it walks the git
   working copy and checks each file.
 
+## misc todo
+
+* --debug will show often unnecessary work being done. Optimise.
+
 ## data syncing
 
 There are two parts to data syncing. First, map the network and second,
@@ -157,3 +194,5 @@ redone to check it.
   finishes. **done**
 * Test MountWatcher on KDE, and add whatever dbus events KDE emits when
   drives are mounted. **done**
+* It would be nice if, when a USB drive is connected,
+  syncing starts automatically. Use dbus on Linux? **done**
@@ -0,0 +1,8 @@

[[!comment format=mdwn
 username="https://me.yahoo.com/speredenn#aaf38"
 nickname="Jean-Baptiste Carré"
 subject="comment 3"
 date="2012-08-21T18:15:48Z"
 content="""
You're totally right: the UUIDs are the same. So it shouldn't matter if there are many repositories pointing to the same folder, as you state. Thanks a lot!
"""]]