## immediate action items
* Fix MountWatcher to notice umounts and remounts of drives.
* A remote may lose content it had before, so when requeuing
  a failed download, check the location log to see if the remote still has
  the content, and if not, queue a download from elsewhere. (And, a remote
  may get content we were uploading from elsewhere, so check the location
  log when queuing a failed Upload too.)
* Ensure that when a remote receives content, and updates its location log,
  it syncs that update back out. Prerequisite for:
  * After git sync, identify new content that we don't have that is now available
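
The location-log check described above can be sketched as follows; the plain-dict `location_log` and the remote names are hypothetical stand-ins for git-annex's real location tracking:

```python
def requeue_failed_download(key, failed_remote, location_log):
    """Pick where to retry a failed download of key.

    location_log maps each key to the set of remotes believed to have
    its content (a stand-in for the real git-annex location log).
    Returns the remote to retry from, or None if none has the content.
    """
    holders = location_log.get(key, set())
    if failed_remote in holders:
        # The remote is still believed to have the content; retry there.
        return failed_remote
    # The remote lost the content; queue a download from elsewhere.
    return min(holders) if holders else None

def should_retry_upload(key, remote, location_log):
    # Mirror-image check for a failed upload: skip the retry if the
    # remote has meanwhile received the content from elsewhere.
    return remote not in location_log.get(key, set())
```

In the real system the pick from `holders` would presumably weigh remote cost rather than taking an arbitrary one.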
  files in some directories and not others. See for use cases:
  [[forum/Wishlist:_options_for_syncing_meta-data_and_data]]
* Speed up git syncing by using the cached ssh connection for it too.
  Will need to use `GIT_SSH`, which needs to point to a command to run,
  not a shell command line. Beware that the network connection may have
  bounced and the cached ssh connection not be usable.
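
A minimal way to meet that restriction is a wrapper executable; this sketch splices OpenSSH connection-multiplexing options into the invocation git makes (the ControlPath location is an assumption, not the one git-annex chose):

```python
#!/usr/bin/env python3
# Hypothetical GIT_SSH wrapper. git runs `$GIT_SSH [options] host command`,
# so GIT_SSH must name a real executable, not a shell command line.
import os
import sys

def build_ssh_command(git_args, control_dir="~/.ssh/annex-mux"):
    control_path = os.path.join(os.path.expanduser(control_dir), "%r@%h:%p")
    return [
        "ssh",
        # auto: reuse a cached master connection if one is alive, and
        # fall back to a fresh connection if the network bounced and
        # the cached connection is no longer usable.
        "-o", "ControlMaster=auto",
        "-o", "ControlPersist=yes",
        "-o", "ControlPath=" + control_path,
    ] + list(git_args)

def main():
    # Replace this process with the real ssh invocation.
    os.execvp("ssh", build_ssh_command(sys.argv[1:]))
```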
* Map the network of git repos, and use that map to calculate
  optimal transfers to keep the data in sync. Currently a naive flood fill
  is done instead.
* Find a more efficient way for the TransferScanner to find the transfers
  that need to be done to sync with a remote. Currently it walks the git
  working copy and checks each file. That probably needs to be done once,
  but further calls to the TransferScanner could, eg, look at the delta
  between the last scan and the current one in the git-annex branch.

## misc todo

* --debug will often show unnecessary work being done. Optimise.
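
The delta idea in the TransferScanner item above might look like this, assuming each scan of the git-annex branch can be summarised as a map from key to the set of locations recorded for it (producing that summary is elided):

```python
def scan_delta(previous, current):
    """Keys whose recorded locations changed between two scans.

    Only these keys need to be re-examined for possible transfers,
    instead of walking the whole working copy again.
    """
    changed = set()
    for key in set(previous) | set(current):
        if previous.get(key) != current.get(key):
            changed.add(key)
    return changed
```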
## data syncing

redone to check it.

drives are mounted. **done**
* It would be nice if, when a USB drive is connected,
  syncing starts automatically. Use dbus on Linux? **done**
* Optimisations in 5c3e14649ee7c404f86a1b82b648d896762cbbc2 temporarily
  broke content syncing in some situations, which need to be added back.
  **done**
  Now syncing a disconnected remote only starts a transfer scan if the
  remote's git-annex branch has diverged, which indicates it probably has
  new files. But that leaves open the cases where the local repo has
  new files; where the two repos' git branches are in sync but the
  content transfers are lagging behind; and where the transfer scan has
  never been run.
  Need to track locally whether we're believed to be in sync with a remote.
  This includes:

  * All local content has been transferred to it successfully.
  * The remote has been scanned once for data to transfer from it, and all
    transfers initiated by that scan succeeded.
  Note the complication that, if the remote has itself initiated a transfer,
  our queued transfer will be thrown out as unnecessary. But if the remote's
  transfer then fails, that needs to be noticed.
  If we're going to track failed transfers, we could just set a flag,
  and use that flag later to initiate a new transfer scan. We need a flag
  in any case, to ensure that a transfer scan is run for each new remote.
  The flag could be `.git/annex/transfer/scanned/uuid`.
  But if failed transfers are tracked, we could also record them, in
  order to retry them later without a full scan. I'm thinking of a
  directory like `.git/annex/transfer/failed/{upload,download}/uuid/`,
  which failed transfer log files could be moved to.
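
Both ideas, the scanned flag and the failed-transfer directory, can be sketched over the plain filesystem. Only the two paths come from the text above; the helper names are made up, and the real implementation would live in git-annex's Haskell code:

```python
import os
import shutil

def scanned_flag(git_dir, uuid):
    # .git/annex/transfer/scanned/uuid
    return os.path.join(git_dir, "annex", "transfer", "scanned", uuid)

def needs_scan(git_dir, uuid):
    # A transfer scan must be run once for each new remote.
    return not os.path.exists(scanned_flag(git_dir, uuid))

def mark_scanned(git_dir, uuid):
    path = scanned_flag(git_dir, uuid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

def record_failed(git_dir, direction, uuid, transfer_log):
    # Move a failed transfer's log file into
    # .git/annex/transfer/failed/{upload,download}/uuid/
    # so it can be retried later without a full scan.
    assert direction in ("upload", "download")
    dest = os.path.join(git_dir, "annex", "transfer", "failed", direction, uuid)
    os.makedirs(dest, exist_ok=True)
    shutil.move(transfer_log, dest)

def failed_transfers(git_dir, direction, uuid):
    d = os.path.join(git_dir, "annex", "transfer", "failed", direction, uuid)
    return sorted(os.listdir(d)) if os.path.isdir(d) else []
```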