comment
This commit is contained in:
parent
629026bdbc
commit
3353ff236a
1 changed files with 42 additions and 0 deletions
|
@ -0,0 +1,42 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 3"""
|
||||
date="2020-07-02T19:06:17Z"
|
||||
content="""
|
||||
I see why you want this, but how is git-annex supposed to know when it has
|
||||
the right size batch of transfers ready?
|
||||
|
||||
It could look at the -J number, and wait until it's sent that many
|
||||
TRANSFER requests, and call that a batch. But it could be that some jobs run
|
||||
transfers and other jobs do other things (eg CHECKPRESENT or checking
|
||||
that the content is locally present and skipping doing a download).
|
||||
Leading to either a deadlock or a long time stalled out before
|
||||
beginning any transfers.
|
||||
|
||||
One strategy that would work for git-annex is to start the first transfer
|
||||
immediately. While that transfer is running, hold off on starting any more,
|
||||
batching up the requests. Send each batch of transfers after the last batch
|
||||
finishes. And with some messages for framing a batch of transfers for
|
||||
remotes that care.
|
||||
|
||||
What that would naturally result in is, at -Jn, batches of size
|
||||
`[1, n-1, n-1, ..., m < n]` unless transfers were happening faster
|
||||
than git-annex was able to queue up new ones.
|
||||
|
||||
So that's pretty good. But I don't know if it's ideal for every special
|
||||
remote.
|
||||
|
||||
A special remote could implement the same strategy with no help from
|
||||
git-annex, and no changes to my proposed protocol. All you have to do is
|
||||
wait for that first TRANSFER request, call it a batch and start it, and
|
||||
gather the next batch, etc.
|
||||
|
||||
Or, if you know your remote works well with a certian batch size of transfers,
|
||||
you could gather up TRANSFER requests until you have the optimal number,
|
||||
or until a timeout, and then start the batch.
|
||||
|
||||
I don't know if that would work for globus, but it seems like a valid
|
||||
strategy for some hypothetical remotes. Since a remote can implement either
|
||||
strategy, maybe it's better to let them make use of remote-specific
|
||||
knowledge and not put the explicit batching in git-annex?
|
||||
"""]]
|
Loading…
Add table
Add a link
Reference in a new issue