designing new git-annex-shell multi
This commit was supported by the NSF-funded DataLad project.
parent dd63b4e744
commit f42baedd4c
3 changed files with 42 additions and 2 deletions
@ -0,0 +1,33 @@
+As shown by benchmarks in
+*[[here|todo/speed_up_transfers_over_ssh+rsync_--_directly_reuse_the_same_connection__63__]]*,
+there is some overhead for each file transfer to an rsync special remote, to
+set up the connection. The idea is to extend git-annex-shell with a command or
+commands that don't use rsync for transferring objects, and that can handle
+transferring or otherwise operating on multiple objects inside a single TCP
+session.
+
+This might only be used when it doesn't need to resume transfer of a file;
+it could fall back to rsync for resuming.
+
+Of course, when talking with a git-annex-shell that does not support this
+new command, git-annex would still need to fall back to the old commands
+using rsync. It should also remember for the session that the remote doesn't
+support the new command.
+
+It could use sftp, but that seems kind of difficult; it would need to lock
+down sftp-server to only write annexed objects to the right repository.
+And, using sftp would mean that git-annex would need to figure out the
+filenames to use for annexed objects in the remote repository, rather than
+letting git-annex-shell on the remote work that out.
+
+So, it seems better to not use sftp, and instead roll our own simple
+file transfer protocol.
+
+So, "git-annex-shell -c multi" would speak a protocol over stdin/stdout
+that essentially contains the commands inannex, lockcontent, dropkey,
+recvkey, and sendkey.
+
+P2P.Protocol already contains a similar protocol, used over Tor.
+That protocol even supports resuming interrupted transfers.
+It has stuff including auth that this wouldn't need, but it would be
+good to unify with it as much as possible.
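
Reviewer note: the design above could be prototyped roughly as follows. This is only a sketch under stated assumptions, not the protocol git-annex uses: the wire format (space-separated text commands, a byte count before object content, the `ok`/`error`/`data` replies) is invented for illustration, as are the names `serve`, `Session` and `Unsupported`. `lockcontent` is left out for brevity. The first half shows several commands being answered over one connection; the second half shows a client remembering, for the rest of the session, that a remote lacks the new command and falling back to rsync.

```python
import io


def serve(inp, out, store):
    """Answer commands read from `inp` until EOF, all over one connection.

    `store` is a dict of key name -> bytes standing in for the annex
    object directory. The wire format is a made-up illustration.
    """
    while True:
        line = inp.readline()
        if not line:
            break  # peer closed the connection
        parts = line.decode("ascii").split()
        if not parts:
            continue  # ignore empty lines
        cmd, args = parts[0], parts[1:]
        if cmd == "inannex" and len(args) == 1:
            out.write(b"1\n" if args[0] in store else b"0\n")
        elif cmd == "dropkey" and len(args) == 1:
            store.pop(args[0], None)
            out.write(b"ok\n")
        elif cmd == "recvkey" and len(args) == 2 and args[1].isdigit():
            # The peer sends LEN bytes of object content for us to store.
            store[args[0]] = inp.read(int(args[1]))
            out.write(b"ok\n")
        elif cmd == "sendkey" and len(args) == 1 and args[0] in store:
            data = store[args[0]]
            out.write(b"data %d\n" % len(data) + data)
        else:
            out.write(b"error\n")


class Unsupported(Exception):
    """Raised (hypothetically) when the remote lacks the new command."""


class Session:
    """Client-side state for one remote, for the life of one session."""

    def __init__(self, try_multi, use_rsync):
        self.try_multi = try_multi    # callable(key); may raise Unsupported
        self.use_rsync = use_rsync    # callable(key); the old rsync path
        self.multi_supported = True   # optimistic until proven otherwise

    def transfer(self, key):
        if self.multi_supported:
            try:
                return self.try_multi(key)
            except Unsupported:
                self.multi_supported = False  # remember; do not probe again
        return self.use_rsync(key)
```

Keeping object naming on the server side (the `store` lookup here) mirrors the point made in the text that git-annex-shell, not the client, should work out where objects live.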
@ -1,6 +1,10 @@
-A sftp backend would be nice because gpg operations could be pipelined to the network transfer, not requiring the creation of a full file to disk with gpg before the network transmission, as it happens with rsync.
+A sftp special remote would be nice because gpg operations could be
+pipelined to the network transfer, not requiring the creation of a full
+file to disk with gpg before the network transmission, as it happens with
+the rsync special remote.
 
-There should be some libraries that can handle the sftp connections and transfers. I read that even curl has support for that.
+There should be some libraries that can handle the sftp connections and
+transfers. I read that even curl has support for that.
 
 > Another reason to build this is that sftp has a `SFTP_FXP_STAT`
 > that can get disk free space information. "echo df | sftp user@host"
@ -32,3 +32,6 @@ ATM, even with ControlPersist=yes, on a fast interconnection between hosts (so i
 
 
 both hosts do not show any high CPU load
+
+> [[closed|done]]; wrung out all the perf gains we can without
+> [[accellerate_ssh_remotes_with_git-annex-shell_mass_protocol]] --[[Joey]]