Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2018-10-06 19:53:06 -04:00
commit e7ff1c6762
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
8 changed files with 212 additions and 0 deletions

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 4"
date="2018-10-05T19:56:24Z"
content="""
If the s3 remote claims s3:// URLs, does the bucket name have to be a DNS domain? I thought that when a special remote claims a URL, it can interpret it however it wants?
"""]]

@@ -0,0 +1,18 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2018-10-05T20:20:42Z"
content="""
Yes, for `s3://` URLs there is only a bucket name, not a domain. Since a bucket name is allowed to contain `.`, some folks have started using their project's domain name as the bucket name (e.g. `openneuro.org`, `images.cocodataset.org`). If you then access such a bucket directly via URL, the full domain name becomes e.g. http://images.cocodataset.org.s3.amazonaws.com, which starts causing trouble if you try to access it via https:
[[!format sh \"\"\"
$> wget -S https://images.cocodataset.org.s3.amazonaws.com
--2018-10-05 16:19:48-- https://images.cocodataset.org.s3.amazonaws.com/
Resolving images.cocodataset.org.s3.amazonaws.com (images.cocodataset.org.s3.amazonaws.com)... 52.216.18.32
Connecting to images.cocodataset.org.s3.amazonaws.com (images.cocodataset.org.s3.amazonaws.com)|52.216.18.32|:443... connected.
The certificate's owner does not match hostname images.cocodataset.org.s3.amazonaws.com
\"\"\"]]
We have started to provide workarounds for this in datalad.
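One workaround (just a sketch of the idea here, not the exact datalad code) is to fall back to S3's path-style addressing, where the bucket name goes into the path instead of the hostname, so the request goes to plain `s3.amazonaws.com`:
[[!format sh \"\"\"
# path-style URL: bucket in the path, not in the hostname,
# so the TLS certificate for s3.amazonaws.com itself is valid
$> wget -S https://s3.amazonaws.com/images.cocodataset.org/
\"\"\"]]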
"""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 6"
date="2018-10-05T21:31:33Z"
content="""
My current planned solution is to write an external special remote that claims s3:// URLs and downloads them; then I can use `addurl --fast`. My use case is that, if I run a batch job that reads inputs from s3 and writes outputs to s3, what I get at the end are pointers to s3, and I want to check these results into git-annex. Ideally there'd be a way for me to tell the batch system to use git-annex to send things to s3, but currently that's not possible.
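Roughly, such a remote only has to answer the CLAIMURL, CHECKURL and TRANSFER RETRIEVE requests of the external special remote protocol. An untested sketch (the script name and the `aws s3 cp` call are just placeholders for whatever downloader gets used):
[[!format sh \"\"\"
#!/bin/sh
# sketch of a git-annex-remote-s3url script: claims s3:// URLs and
# retrieves them with the AWS CLI; storing, removal and presence
# checks are stubbed out, and error handling is minimal.
echo VERSION 1
while read -r cmd a1 a2 rest; do
    case "$cmd" in
        INITREMOTE) echo INITREMOTE-SUCCESS ;;
        PREPARE)    echo PREPARE-SUCCESS ;;
        CLAIMURL)   # claim any s3:// URL and interpret it ourselves
            case "$a1" in
                s3://*) echo CLAIMURL-SUCCESS ;;
                *)      echo CLAIMURL-FAILURE ;;
            esac ;;
        CHECKURL)   # size not known without an extra request
            echo CHECKURL-CONTENTS UNKNOWN ;;
        TRANSFER)   # TRANSFER RETRIEVE Key File
            key=$a2; file=$rest
            if [ "$a1" = RETRIEVE ]; then
                echo "GETURLS $key s3://"    # ask git-annex for the recorded URL
                url=
                while read -r r v; do
                    [ -z "$v" ] && break     # a bare VALUE ends the list
                    [ -z "$url" ] && url=$v
                done
                # stdout is the protocol channel, so send download output to stderr
                if [ -n "$url" ] && aws s3 cp "$url" "$file" >&2; then
                    echo "TRANSFER-SUCCESS RETRIEVE $key"
                else
                    echo "TRANSFER-FAILURE RETRIEVE $key could not download $url"
                fi
            else
                echo "TRANSFER-FAILURE $a1 $key storing is not supported"
            fi ;;
        CHECKPRESENT) echo "CHECKPRESENT-UNKNOWN $a1 not implemented here" ;;
        REMOVE)       echo "REMOVE-FAILURE $a1 removal not supported" ;;
        *)            echo UNSUPPORTED-REQUEST ;;
    esac
done
\"\"\"]]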
Question: if an external special remote claims a URL that a built-in special remote could handle, does the external special remote take priority?
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 7"
date="2018-10-05T21:41:45Z"
content="""
FWIW, the datalad special remote already supports downloading from such s3:// URLs.
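e.g., once the datalad external remote is enabled in a repository, something like this should work (the bucket/key below is just an example):
[[!format sh \"\"\"
# assumes the datalad special remote is set up, e.g. via something like
#   git annex initremote datalad type=external externaltype=datalad encryption=none autoenable=true
$> git annex addurl --fast s3://images.cocodataset.org/zips/val2017.zip
\"\"\"]]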
"""]]

@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 2"
date="2018-10-05T18:24:18Z"
content="""
Thanks for adding annex.jobs. Could you make it so that setting it to 0 means \"use all available processors\"? I use git-annex on AWS instances, and reserve instances with different processor counts at different times.
\"git-annex is rarely cpu-bound\" -- I thought parallelization helps by parallelizing I/O operations such as file transfers?
"""]]