[[!comment format=mdwn
 username="https://anarc.at/openid/"
 nickname="anarcat"
 subject="thanks for considering this!"
 date="2016-09-22T12:43:11Z"
 content="""
> (Let's not discuss the behavior of copy --to when the file is not
> locally present here; there is plenty of other discussion of that in
> eg http://bugs.debian.org/671179)

Agreed, it's kind of secondary.

> git-annex's special remote API does not allow remote-to-remote
> transfers without spooling it to a file on disk first.

yeah, i noticed that when writing my own special remote.
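
for what it's worth, the reason is visible in the external special
remote protocol: git-annex always hands the helper the path of a file
to RETRIEVE into or STORE from, so the content has to land on the
filesystem at some point. a sketch of such an exchange, with the key
and path invented:

    # git-annex -> helper, then helper -> git-annex (key and path made up)
    TRANSFER RETRIEVE SHA256E-s1048576--abc123 .git/annex/tmp/SHA256E-s1048576--abc123
    TRANSFER-SUCCESS RETRIEVE SHA256E-s1048576--abc123
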
> And it's not possible to do using rsync on either end, AFAICS.

That is correct.

> It would be possible in some other cases but this would need to be
> implemented for each type of remote as a new API call.

... and would fail for most, so there's little benefit there.

how about a socket or FIFO of some sort? i know those break a lot of
semantics (e.g. `[ -f /tmp/fifo ]` fails in bash) but they could be a
solution...
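
a minimal sketch of what i mean, where `download` and `upload` are
hypothetical stand-ins for the two halves of the transfer, not real
git-annex interfaces:

    mkfifo /tmp/fifo
    [ -p /tmp/fifo ]                 # true: it's a pipe; [ -f /tmp/fifo ] is false
    download --output /tmp/fifo &    # hypothetical: writes the content into the pipe
    upload --input /tmp/fifo         # hypothetical: reads the stream; no full copy on disk
    wait
    rm -f /tmp/fifo
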
> Modern systems tend to have quite a large disk cache, so it's quite
> possible that going via a temp file on disk is not going to use a
> lot of disk IO to write and read it when the read and write occur
> fairly close together.

true. there are also in-memory files that could be used, although I
don't think this would work across different process spaces.
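
the cache point is easy enough to check: write a spool file and read
it right back, and dd's own throughput numbers show the re-read being
served from RAM rather than the platter (assuming a machine with a
few hundred MB of free memory):

    dd if=/dev/zero of=/tmp/spool bs=1M count=256    # lands in the page cache first
    dd if=/tmp/spool of=/dev/null bs=1M              # immediate re-read: mostly from cache
    rm -f /tmp/spool
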
> The main benefit from streaming would probably be if it could run
> the download and the upload concurrently.

for me, the main benefit would be to deal with low disk space
conditions, which are quite common on my machines: i often cram the
disk almost to capacity with good stuff i want to listen to
later... git-annex allows me to freely remove stuff when i need the
space, but it often means i am close to 99% capacity on the media
drives i use.

> But that would only be a benefit sometimes. With an asymmetric
> connection, saturating the uplink tends to swamp downloads. Also,
> if download is faster than upload, it would have to throttle
> downloads (which complicates the remote API much more), or buffer
> them to memory (which has its own complications).

that is true.

> Streaming the download to the upload would at best speed things up
> by a factor of 2. It would probably work nearly as well to upload
> the previously downloaded file while downloading the next file.

presented like that, it's true that the benefits of streaming are not
good enough to justify the complexity - the only problem is large
files and low local disk space... but maybe we can delegate that
solution to the user: \"free up at least enough space for one of those
files you want to transfer\".
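
in the meantime, that per-file discipline can be spelled out with
commands that already exist, keeping only one file's worth of local
space in use; `src` and `dst` are placeholder remote names:

    for f in podcasts/*; do
        git annex get --from src "$f" &&
        git annex copy --to dst "$f" &&
        git annex drop "$f"
    done
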
[... -J magic stuff ...]

> And there is a complication with running that at the same time as eg
> git annex get of the same file. It would be surprising for get to
> succeed (because copy has already temporarily downloaded the file)
> and then have the file later get dropped. So, it seems that copy
> --from --to would need to stash the content away in a temp file
> somewhere instead of storing it in the annex proper.

My thoughts exactly: actually copying the files to the local repo
introduces all sorts of weird --numcopies nastiness and race
conditions, it seems to me.

thanks for considering this!
"""]]