Merge branch 'master' of ssh://git-annex.branchable.com
commit 13fa141cd3
5 changed files with 91 additions and 40 deletions
@ -22,3 +22,9 @@ The original file also has sha512 ead9db1f34739014a216239d9624bce74d92fe723de065

>> And what sha512 does the file in .git/annex/bad have **now**? (fsck
>> preserves the original filename; this says nothing about what the
>> current checksum is, if the file has been corrupted). --[[Joey]]

The same, as it's the file I was trying to inject:

    ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d .git/annex/bad/SHA512E-s94402560--ead9db1f34739014a216239d9624bce74d92fe723de06505f9b94cb4c063142ba42b04546f11d3d33869b736e40ded2ff779cb32b26aa10482f09407df0f3c8d.Moon.avi

That's what puzzles me: it is the same file, but for some weird reason git-annex thinks it's not.
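To double-check this by hand, here is a minimal standalone sketch (not git-annex's fsck code; it assumes GNU coreutils' `sha512sum` is on the PATH) that recomputes the checksum of a file in `.git/annex/bad` and compares it against the hash embedded in its SHA512E key name:

```haskell
-- Standalone sketch, not git-annex code.
import System.Environment (getArgs)
import System.FilePath (takeFileName)
import System.Process (readProcess)

-- Pull the 128-hex-digit sha512 out of a key-derived name like
-- "SHA512E-s94402560--<sha512>.Moon.avi" (it is the part after "--").
checksumFromKey :: FilePath -> String
checksumFromKey = take 128 . afterDashes . takeFileName
  where
    afterDashes ('-':'-':rest) = rest
    afterDashes (_:rest)       = afterDashes rest
    afterDashes []             = []

main :: IO ()
main = do
  [file] <- getArgs
  -- sha512sum prints "<hash>  <name>"; the hash is the first word.
  actual <- head . words <$> readProcess "sha512sum" [file] ""
  putStrLn $ if actual == checksumFromKey file
    then "checksum still matches the key; the content is intact"
    else "checksum mismatch; the content no longer matches its key"
```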

doc/design/assistant/blog/day_63__transfer_retries.mdwn (new file, 26 lines)
@ -0,0 +1,26 @@

Implemented everything I planned out yesterday: Expensive scans are only
done once per remote (unless the remote changed while it was disconnected),
and failed transfers are logged so they can be retried later.

Changed the TransferScanner to prefer to scan low cost remotes first,
as a crude form of scheduling lower-cost transfers first.
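The ordering itself is simple; a toy sketch (invented `Remote` type, not the assistant's real one) of queueing scans in cost order:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Invented stand-in for the assistant's remote representation.
data Remote = Remote { remoteName :: String, remoteCost :: Int }
  deriving Show

-- Scan low cost remotes first, so the transfers they queue start before
-- transfers involving more expensive remotes.
scanOrder :: [Remote] -> [Remote]
scanOrder = sortBy (comparing remoteCost)

main :: IO ()
main = mapM_ print $ scanOrder
  [ Remote "cloud" 200     -- e.g. an encrypted rsync remote on the net
  , Remote "usbdrive" 100  -- e.g. a locally mounted drive
  ]
```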

A whole bunch of interesting syncing scenarios should work now. I have not
tested them all in detail, but to the best of my knowledge, all these
should work:

* Connect to the network. It starts syncing with a networked remote.
  Disconnect the network. Reconnect, and it resumes where it left off.
* Migrate between networks (ie, home to cafe to work). Any transfers
  that can only happen on one LAN are retried on each new network you
  visit, until they succeed.

One that is not working, but is soooo close:

* Plug in a removable drive. Some transfers start. Yank the plug.
  Plug it back in. All necessary transfers resume, and it ends up
  fully in sync, no matter how many times you yank that cable.

That's not working because of an infelicity in the MountWatcher.
It doesn't notice when the drive gets unmounted, so it ignores
the new mount event.

@ -0,0 +1,10 @@

[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo"
 nickname="Justin"
 subject="comment 1"
 date="2012-08-23T21:25:48Z"
 content="""
Do encrypted rsync remotes resume quickly as well?

One thing I noticed was that if a `copy --to` an encrypted rsync remote gets interrupted, it will remove the tmp file and re-encrypt the whole file before resuming rsync.
"""]]

@ -3,42 +3,12 @@ all the other git clones, at both the git level and the key/value level.

## immediate action items

* Fix MountWatcher to notice umounts and remounts of drives.
* A remote may lose content it had before, so when requeuing
  a failed download, check the location log to see if the remote still has
  the content, and if not, queue a download from elsewhere. (And, a remote
  may get content we were uploading from elsewhere, so check the location
  log when queuing a failed Upload too.) See the sketch after this list.
* Ensure that when a remote receives content, and updates its location log,
  it syncs that update back out. Prerequisite for:
* After git sync, identify new content that we don't have that is now available
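A hedged sketch of that requeue check (simplified types and a plain `Map` standing in for the location log; none of this is git-annex's real API):

```haskell
import qualified Data.Map as M

type UUID = String
type Key  = String

-- Which remotes are believed to have each key, per the location log.
type LocationLog = M.Map Key [UUID]

data Transfer = Download UUID Key | Upload UUID Key
  deriving Show

-- Requeue a failed download, checking the location log first: keep the
-- original remote if it still has the key, retarget at another remote
-- that has it, or give up if no remote has the content any more.
requeueDownload :: LocationLog -> UUID -> Key -> Maybe Transfer
requeueDownload locs remote key =
  case M.findWithDefault [] key locs of
    uuids
      | remote `elem` uuids -> Just (Download remote key)
      | (u:_) <- uuids      -> Just (Download u key)
      | otherwise           -> Nothing
```

The symmetric check for a failed Upload would consult the same log to see whether the remote already received the content from elsewhere.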

@ -67,18 +37,17 @@ all the other git clones, at both the git level and the key/value level.

  files in some directories and not others. See for use cases:
  [[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
* speed up git syncing by using the cached ssh connection for it too.
  Will need to use `GIT_SSH`, which needs to point to a command to run,
  not a shell command line. Beware that the network connection may have
  bounced and the cached ssh connection not be usable. (See the second
  sketch after this list.)
* Map the network of git repos, and use that map to calculate
  optimal transfers to keep the data in sync. Currently a naive flood fill
  is done instead.
* Find a more efficient way for the TransferScanner to find the transfers
  that need to be done to sync with a remote. Currently it walks the git
  working copy and checks each file. That probably needs to be done once,
  but further calls to the TransferScanner could, for example, look at the
  delta between the last scan and the current one in the git-annex branch,
  as sketched below.
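On the last item, a sketch of what a delta-based rescan could look like (an idea from the text, not implemented code; it shells out to git). Location information on the git-annex branch lives in per-key `*.log` files, so diffing the ref recorded at the last scan against the branch tip yields only the keys whose state changed:

```haskell
import Data.List (isSuffixOf)
import System.Process (readProcess)

-- Per-key location log files that changed on the git-annex branch since
-- the ref recorded at the last scan; only these keys need rechecking.
changedSinceLastScan :: String -> IO [FilePath]
changedSinceLastScan lastScannedRef =
  filter (".log" `isSuffixOf`) . lines
    <$> readProcess "git"
          ["diff", "--name-only", lastScannedRef, "git-annex"] ""
```

And for the `GIT_SSH` item: since the variable must name a program, not a shell command line, the connection-caching options have to live in a wrapper executable, and git gets pointed at it via the environment. A sketch, assuming such a wrapper already exists:

```haskell
import System.Environment (getEnvironment)
import System.Exit (ExitCode)
import System.Process (createProcess, env, proc, waitForProcess)

-- Run git with GIT_SSH pointing at a wrapper program that adds the
-- "-o ControlMaster=auto -o ControlPath=..." options; GIT_SSH itself
-- cannot carry arguments.
gitWithCachedSsh :: FilePath -> [String] -> IO ExitCode
gitWithCachedSsh sshWrapper args = do
  environ <- getEnvironment
  let environ' = ("GIT_SSH", sshWrapper)
               : filter ((/= "GIT_SSH") . fst) environ
  (_, _, _, h) <- createProcess (proc "git" args) { env = Just environ' }
  waitForProcess h
```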

## data syncing

@ -196,3 +165,33 @@ redone to check it.

  drives are mounted. **done**
* It would be nice if, when a USB drive is connected,
  syncing starts automatically. Use dbus on Linux? **done**
* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
  broke content syncing in some situations, which need to be added back.
  **done**

Now syncing a disconnected remote only starts a transfer scan if the
remote's git-annex branch has diverged, which indicates it probably has
new files. But that leaves open the cases where the local repo has
new files; and where the two repos' git branches are in sync, but the
content transfers are lagging behind; and where the transfer scan has
never been run.

Need to track locally whether we're believed to be in sync with a remote.
This includes:

* All local content has been transferred to it successfully.
* The remote has been scanned once for data to transfer from it, and all
  transfers initiated by that scan succeeded.

Note the complication that, if the remote has initiated a transfer, our
queued transfer will be thrown out as unnecessary. But if its transfer
then fails, that needs to be noticed.

If we're going to track failed transfers, we could just set a flag,
and use that flag later to initiate a new transfer scan. We need a flag
in any case, to ensure that a transfer scan is run for each new remote.
The flag could be `.git/annex/transfer/scanned/uuid`.

But, if failed transfers are tracked, we could also record them, in
order to retry them later, without the scan. I'm thinking about a
directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
which failed transfer log files could be moved to.
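A sketch of both pieces of bookkeeping, using exactly the paths proposed above (the function names are hypothetical, not git-annex's API):

```haskell
import System.Directory
  (createDirectoryIfMissing, doesFileExist, renameFile)
import System.FilePath ((</>), takeFileName)

type UUID = String

data Direction = Upload | Download

dirName :: Direction -> String
dirName Upload   = "upload"
dirName Download = "download"

-- The per-remote flag: .git/annex/transfer/scanned/<uuid>.
-- If it is absent, a full transfer scan has never been run for the remote.
scannedFlag :: FilePath -> UUID -> FilePath
scannedFlag gitdir u =
  gitdir </> "annex" </> "transfer" </> "scanned" </> u

needsScan :: FilePath -> UUID -> IO Bool
needsScan gitdir u = not <$> doesFileExist (scannedFlag gitdir u)

-- On failure, move the transfer's log file into
-- .git/annex/transfer/failed/{upload,download}/<uuid>/ so it can be
-- requeued later without another full scan.
recordFailed :: FilePath -> Direction -> UUID -> FilePath -> IO ()
recordFailed gitdir d u logfile = do
  let dir = gitdir </> "annex" </> "transfer" </> "failed"
              </> dirName d </> u
  createDirectoryIfMissing True dir
  renameFile logfile (dir </> takeFileName logfile)
```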

@ -0,0 +1,10 @@

[[!comment format=mdwn
 username="https://www.google.com/accounts/o8/id?id=AItOawnY9ObrNrQuRp8Xs0XvdtJJssm5cp4NMZA"
 nickname="alan"
 subject="Rackspace Cloud Files support?"
 date="2012-08-23T21:00:11Z"
 content="""
Any chance I could bribe you to set up Rackspace Cloud Files support? We are using them and would hate to have an S3 bucket only for this.

https://github.com/rackspace/python-cloudfiles
"""]]