Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2018-10-06 19:53:06 -04:00
commit e7ff1c6762
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
8 changed files with 212 additions and 0 deletions

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 4"
date="2018-10-05T19:56:24Z"
content="""
If the s3 remote claims s3:// URLs, does the bucket name have to be a DNS domain? I thought that when a special remote claims a URL, it can interpret it however it wants?
"""]]

@@ -0,0 +1,18 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2018-10-05T20:20:42Z"
content="""
Yes, for `s3://` URLs there is only a bucket name, not a domain. Since a bucket name is allowed to contain `.`, some folks have started using their project's domain name as the bucket name (e.g. `openneuro.org`, `images.cocodataset.org`). If you then access such a bucket directly via URL, the full domain name becomes e.g. http://images.cocodataset.org.s3.amazonaws.com, which starts causing trouble if you try to access it via https:
[[!format sh \"\"\"
$> wget -S https://images.cocodataset.org.s3.amazonaws.com
--2018-10-05 16:19:48-- https://images.cocodataset.org.s3.amazonaws.com/
Resolving images.cocodataset.org.s3.amazonaws.com (images.cocodataset.org.s3.amazonaws.com)... 52.216.18.32
Connecting to images.cocodataset.org.s3.amazonaws.com (images.cocodataset.org.s3.amazonaws.com)|52.216.18.32|:443... connected.
The certificate's owner does not match hostname images.cocodataset.org.s3.amazonaws.com
\"\"\"]]
We have started to provide workarounds for this in datalad.
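One workaround (just a sketch of the idea here, not the exact datalad code) is to fall back to S3's path-style addressing, where the bucket name goes into the path instead of the hostname, so the request goes to plain `s3.amazonaws.com`:
[[!format sh \"\"\"
# path-style URL: bucket in the path, not in the hostname,
# so the TLS certificate for s3.amazonaws.com itself is valid
$> wget -S https://s3.amazonaws.com/images.cocodataset.org/
\"\"\"]]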
"""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 6"
date="2018-10-05T21:31:33Z"
content="""
My current planned solution is to write an external special remote that claims s3:// URLs and downloads them; then I can use `addurl --fast`. My use case is that, if I run a batch job that reads inputs from s3 and writes outputs to s3, what I get at the end are pointers to s3, and I want to check these results into git-annex. Ideally there'd be a way for me to tell the batch system to use git-annex to send things to s3, but currently that's not possible.
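Roughly, such a remote only has to answer the CLAIMURL, CHECKURL and TRANSFER RETRIEVE requests of the external special remote protocol. An untested sketch (the script name and the `aws s3 cp` call are just placeholders for whatever downloader gets used):
[[!format sh \"\"\"
#!/bin/sh
# sketch of a git-annex-remote-s3url script: claims s3:// URLs and
# retrieves them with the AWS CLI; storing, removal and presence
# checks are stubbed out, and error handling is minimal.
echo VERSION 1
while read -r cmd a1 a2 rest; do
    case "$cmd" in
        INITREMOTE) echo INITREMOTE-SUCCESS ;;
        PREPARE)    echo PREPARE-SUCCESS ;;
        CLAIMURL)   # claim any s3:// URL and interpret it ourselves
            case "$a1" in
                s3://*) echo CLAIMURL-SUCCESS ;;
                *)      echo CLAIMURL-FAILURE ;;
            esac ;;
        CHECKURL)   # size not known without an extra request
            echo CHECKURL-CONTENTS UNKNOWN ;;
        TRANSFER)   # TRANSFER RETRIEVE Key File
            key=$a2; file=$rest
            if [ "$a1" = RETRIEVE ]; then
                echo "GETURLS $key s3://"    # ask git-annex for the recorded URL
                url=
                while read -r r v; do
                    [ -z "$v" ] && break     # a bare VALUE ends the list
                    [ -z "$url" ] && url=$v
                done
                # stdout is the protocol channel, so send download output to stderr
                if [ -n "$url" ] && aws s3 cp "$url" "$file" >&2; then
                    echo "TRANSFER-SUCCESS RETRIEVE $key"
                else
                    echo "TRANSFER-FAILURE RETRIEVE $key could not download $url"
                fi
            else
                echo "TRANSFER-FAILURE $a1 $key storing is not supported"
            fi ;;
        CHECKPRESENT) echo "CHECKPRESENT-UNKNOWN $a1 not implemented here" ;;
        REMOVE)       echo "REMOVE-FAILURE $a1 removal not supported" ;;
        *)            echo UNSUPPORTED-REQUEST ;;
    esac
done
\"\"\"]]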
Question: if an external special remote claims a URL that a built-in special remote could handle, does the external special remote take priority?
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 7"
date="2018-10-05T21:41:45Z"
content="""
FWIW, the datalad special remote already supports downloading from such s3:// URLs.
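e.g., once the datalad external remote is enabled in a repository, something like this should work (the bucket/key below is just an example):
[[!format sh \"\"\"
# assumes the datalad special remote is set up, e.g. via something like
#   git annex initremote datalad type=external externaltype=datalad encryption=none autoenable=true
$> git annex addurl --fast s3://images.cocodataset.org/zips/val2017.zip
\"\"\"]]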
"""]]

@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 2"
date="2018-10-05T18:24:18Z"
content="""
Thanks for adding annex.jobs. Could you make it so that setting it to 0 means \"use all available processors\"? I use git-annex on AWS instances, and reserve instances with different processor counts at different times.
\"git-annex is rarely cpu-bound\" -- I thought parallelization helps by parallelizing I/O operations such as file transfers?
"""]]