From 366f7f81efe3e367709c8f8df6468f72ca8074c8 Mon Sep 17 00:00:00 2001
From: yarikoptic
Date: Mon, 1 Oct 2018 16:29:19 +0000
Subject: [PATCH 1/4] Added a comment

---
 .../comment_2_3580025d828b27072530eb8ccda9bdf4._comment | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 doc/todo/configuration_option_for_default___34__mode__34___on_crippled_file_systems/comment_2_3580025d828b27072530eb8ccda9bdf4._comment

diff --git a/doc/todo/configuration_option_for_default___34__mode__34___on_crippled_file_systems/comment_2_3580025d828b27072530eb8ccda9bdf4._comment b/doc/todo/configuration_option_for_default___34__mode__34___on_crippled_file_systems/comment_2_3580025d828b27072530eb8ccda9bdf4._comment
new file mode 100644
index 0000000000..319002b34e
--- /dev/null
+++ b/doc/todo/configuration_option_for_default___34__mode__34___on_crippled_file_systems/comment_2_3580025d828b27072530eb8ccda9bdf4._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="yarikoptic"
+ avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
+ subject="comment 2"
+ date="2018-10-01T16:29:19Z"
+ content="""
+I think that is correct.
+But isn't `annex init` also indirectly invoked by any annex command, e.g. if I just do `git clone URL ; cd DIR; git annex get FILEs`?
+"""]]
From e8e9299d7d53f94d3705450d0e2c74d406c4d945 Mon Sep 17 00:00:00 2001
From: yarikoptic
Date: Mon, 1 Oct 2018 17:13:58 +0000
Subject: [PATCH 2/4] initial findings on the "smart HTTP" and git-annex

---
 ...te_serving_via___34__smart_HTTP__34__.mdwn | 110 ++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn

diff --git a/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn b/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn
new file mode 100644
index 0000000000..4376dd0666
--- /dev/null
+++ b/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn
@@ -0,0 +1,110 @@
+### Please describe the problem.
+
+Our http://datasets.datalad.org has been providing git annex repos, some of which with the content, via a "dummy" HTTP support of git. For various reasons (performance, progress reporting by git upon clone) we want to switch to use [Smart HTTP](https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP) git-http-backend backend. Sample deployment is at http://datasets-dev.datalad.org/.
+I followed the docs to set it up and only added one more configuration tune up
+
+```
+ RewriteEngine On
+ RewriteCond "%{HTTP_USER_AGENT}" "(git)"
+ RewriteRule ^(.*)$ "/git/$1" [PT]
+```
+
+so that people could still browse the website in the browser, but whenever `git` tries to access it, we direct to the `git-http-backend` CGI serving under `/git/` prefix (`ScriptAlias /git/ /usr/lib/git-core/git-http-backend/`).
+
+Everything seems to work nicely on git side, BUT I am having difficulty to make git-annex being able to serve annexed files from it:
+
+### What version of git-annex are you using? On what operating system?
+
+6.20180913+git149-g23bd27773-1~ndall+1
+
+
+### Please provide any additional information below.
+
+[[!format sh """
+$> builtin cd /tmp/; rm -rf raiders; git clone http://datasets-dev.datalad.org/labs/haxby/raiders/ ; cd raiders; git annex get sub-rid000005/anat/sub-rid000005_run-01_T1w_defacemask.nii.gz Cloning into 'raiders'...
+remote: Counting objects: 17926, done.
+remote: Compressing objects: 100% (7203/7203), done.
+remote: Total 17926 (delta 7356), reused 15517 (delta 6237)
+Receiving objects: 100% (17926/17926), 1.23 MiB | 6.53 MiB/s, done.
+Resolving deltas: 100% (7356/7356), done.
+README.md masks/ stimulus/ sub-rid000014/ sub-rid000028/ sub-rid000038/ task-raiders_bold.json
+dataset_description.json scripts/ sub-rid000005/ sub-rid000015/ sub-rid000029/ sub-rid000042/
+derivatives/ sourcedata/ sub-rid000011/ sub-rid000020/ sub-rid000033/ sub-rid000043/
+(merging origin/git-annex into git-annex...)
+(recording state in git...)
+get sub-rid000005/anat/sub-rid000005_run-01_T1w_defacemask.nii.gz download failed: Not Found
+
+ Remote origin not usable by git-annex; setting annex-ignore
+(not available)
+ Try making some of these repositories available:
+ 41e5039d-1750-43d2-8bea-89897d969326 -- /mnt/datasets/datalad/crawl/labs/haxby/raiders
+ 87d7db62-683d-43b2-b594-baeb420ae7a6 -- .
+ afde6679-1f2f-41f2-935a-93e7e3d70274 -- nastase@head1:~/BIDS/haxby/raiders
+ de53ce43-2c07-4971-8de8-0445c596f7dc -- datalad-public-ro
+
+ (Note that these git remotes have annex-ignore set: origin)
+failed
+(recording state in git...)
+git-annex: get: 1 failed
+"""]]
+
+fails because `config` file is under `.git/` subdirectory there and git-annex doesn't try to access it at all to deduce the uuid, thus marking origin as annex-ignore.
+
+But if I add that `.git` suffix to the url, then:
+
+[[!format sh """
+(git)hopa:/tmp/raiders[master]
+$> builtin cd /tmp/; rm -rf raiders; git clone http://datasets-dev.datalad.org/labs/haxby/raiders/.git/ ; cd raiders; git annex get sub-rid000005/anat/sub-rid000005_run-01_T1w_defacemask.nii.gz
+Cloning into 'raiders'...
+remote: Counting objects: 17926, done.
+remote: Compressing objects: 100% (7203/7203), done.
+remote: Total 17926 (delta 7356), reused 15517 (delta 6237)
+Receiving objects: 100% (17926/17926), 1.23 MiB | 5.08 MiB/s, done.
+Resolving deltas: 100% (7356/7356), done.
+README.md masks/ stimulus/ sub-rid000014/ sub-rid000028/ sub-rid000038/ task-raiders_bold.json
+dataset_description.json scripts/ sub-rid000005/ sub-rid000015/ sub-rid000029/ sub-rid000042/
+derivatives/ sourcedata/ sub-rid000011/ sub-rid000020/ sub-rid000033/ sub-rid000043/
+(merging origin/git-annex into git-annex...)
+(recording state in git...)
+get sub-rid000005/anat/sub-rid000005_run-01_T1w_defacemask.nii.gz (from origin...)
+download failed: Not Found
+download failed: Not Found
+
+ Unable to access these remotes: origin
+
+ Try making some of these repositories available:
+ 41e5039d-1750-43d2-8bea-89897d969326 -- /mnt/datasets/datalad/crawl/labs/haxby/raiders
+ 87d7db62-683d-43b2-b594-baeb420ae7a6 -- .
+ afde6679-1f2f-41f2-935a-93e7e3d70274 -- nastase@head1:~/BIDS/haxby/raiders
+ de53ce43-2c07-4971-8de8-0445c596f7dc -- datalad-public-ro [origin]
+failed
+(recording state in git...)
+git-annex: get: 1 failed
+"""]]
+because it fails to find those two files under `.git/annex/objects`, here is apache log file
+[[!format apache """
+10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//config HTTP/1.1" 206 501 "-" "-"
+10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//annex/objects/Z8/f1/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz HTTP/1.1" 404 243 "-" "git-annex/6.20180913+git149-g23bd27773-1~ndall+1"
+10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//annex/objects/681/5d0/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz HTTP/1.1" 404 243 "-" "git-annex/6.20180913+git149-g23bd27773-1~ndall+1"
+"""]]
+
+where it seems to assume different layout:
+
+[[!format sh """
+$> ls -dl $webroot//labs/haxby/raiders/.git/annex/objects/*/*/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz
+drwxrwsr-x 1 yoh datalad 104 Sep 26 2016 /mnt/btrfs/manual-snapshots/srv-20180928/datasets.datalad.org/www///labs/haxby/raiders/.git/annex/objects/Z8/f1/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/
+"""]]
+
+
+which git-annex assumes when working with the dummy HTTP:
+```
+10.31.191.134 - - [01/Oct/2018:13:09:53 -0400] "GET /labs/haxby/raiders/.git//config HTTP/1.1" 206 501 "-" "-"
+10.31.191.134 - - [01/Oct/2018:13:09:53 -0400] "GET /labs/haxby/raiders/.git//annex/objects/Z8/f1/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz HTTP/1.1" 200 41679 "-" "git-annex/6.20180913+git149-g23bd27773-1~ndall+1"
+```
+
+So I wonder if I need to do something on my end in configuring apache2, or something could/should be done on git-annex side? Ideally I would like to be able to just clone them without specifying `.git/` suffix to the url.
+
+But also note that `git-annex` seems to not even provide any agent value while trying to access `config` file:
+```
+10.31.191.134 - - [01/Oct/2018:13:12:45 -0400] "GET /labs/haxby/raiders/.git//config HTTP/1.1" 206 501 "-" "-"
+```
From 6e41f970e8dfb359c738b64a27862e0e6f744178 Mon Sep 17 00:00:00 2001
From: yarikoptic
Date: Mon, 1 Oct 2018 17:25:13 +0000
Subject: [PATCH 3/4]

---
 ..._git_repo_website_serving_via___34__smart_HTTP__34__.mdwn | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn b/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn
index 4376dd0666..53faabf7ca 100644
--- a/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn
+++ b/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn
@@ -82,11 +82,12 @@ failed
 git-annex: get: 1 failed
 """]]
 because it fails to find those two files under `.git/annex/objects`, here is apache log file
-[[!format apache """
+
+```
 10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//config HTTP/1.1" 206 501 "-" "-"
 10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//annex/objects/Z8/f1/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz HTTP/1.1" 404 243 "-" "git-annex/6.20180913+git149-g23bd27773-1~ndall+1"
 10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//annex/objects/681/5d0/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz HTTP/1.1" 404 243 "-" "git-annex/6.20180913+git149-g23bd27773-1~ndall+1"
-"""]]
+```

 where it seems to assume different layout:

From 8df00a084a45b274c060107852a096954af5369d Mon Sep 17 00:00:00 2001
From: yarikoptic
Date: Mon, 1 Oct 2018 17:40:05 +0000
Subject: [PATCH 4/4]

---
 ..._repo_website_serving_via___34__smart_HTTP__34__.mdwn | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn b/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn
index 53faabf7ca..234a930eb6 100644
--- a/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn
+++ b/doc/bugs/unable_to_access_annexed_files_from_a_git_repo_website_serving_via___34__smart_HTTP__34__.mdwn
@@ -3,9 +3,12 @@
 Our http://datasets.datalad.org has been providing git annex repos, some of which with the content, via a "dummy" HTTP support of git. For various reasons (performance, progress reporting by git upon clone) we want to switch to use [Smart HTTP](https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP) git-http-backend backend. Sample deployment is at http://datasets-dev.datalad.org/.
 I followed the docs to set it up and only added one more configuration tune up

+
 ```
  RewriteEngine On
+
  RewriteCond "%{HTTP_USER_AGENT}" "(git)"
+
  RewriteRule ^(.*)$ "/git/$1" [PT]
 ```

@@ -83,9 +86,12 @@ git-annex: get: 1 failed
 """]]
 because it fails to find those two files under `.git/annex/objects`, here is apache log file

+
 ```
 10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//config HTTP/1.1" 206 501 "-" "-"
+
 10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//annex/objects/Z8/f1/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz HTTP/1.1" 404 243 "-" "git-annex/6.20180913+git149-g23bd27773-1~ndall+1"
+
 10.31.191.134 - - [01/Oct/2018:13:01:58 -0400] "GET /labs/haxby/raiders/.git//annex/objects/681/5d0/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz HTTP/1.1" 404 243 "-" "git-annex/6.20180913+git149-g23bd27773-1~ndall+1"
 ```

@@ -98,14 +104,17 @@ drwxrwsr-x 1 yoh datalad 104 Sep 26 2016 /mnt/btrfs/manual-snapshots/srv-201809


 which git-annex assumes when working with the dummy HTTP:
+
 ```
 10.31.191.134 - - [01/Oct/2018:13:09:53 -0400] "GET /labs/haxby/raiders/.git//config HTTP/1.1" 206 501 "-" "-"
+
 10.31.191.134 - - [01/Oct/2018:13:09:53 -0400] "GET /labs/haxby/raiders/.git//annex/objects/Z8/f1/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz/MD5E-s41438--06c245e709e7d40a90ed48c6c3b58295.nii.gz HTTP/1.1" 200 41679 "-" "git-annex/6.20180913+git149-g23bd27773-1~ndall+1"
 ```

 So I wonder if I need to do something on my end in configuring apache2, or something could/should be done on git-annex side? Ideally I would like to be able to just clone them without specifying `.git/` suffix to the url.

 But also note that `git-annex` seems to not even provide any agent value while trying to access `config` file:
+
 ```
 10.31.191.134 - - [01/Oct/2018:13:12:45 -0400] "GET /labs/haxby/raiders/.git//config HTTP/1.1" 206 501 "-" "-"
 ```
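
A minimal sketch, in Python rather than Apache configuration, of how the `RewriteCond` pattern `(git)` from this report would classify the User-Agent values seen in the access logs. It assumes that Apache applies the pattern as an unanchored, case-sensitive regular expression, and that the `"-"` logged for the `config` request means no User-Agent header was sent. The function name `matches_git_rewrite` is hypothetical, and `git/2.19.0` and `Mozilla/5.0` are made-up example agents.

```python
import re

# Pattern copied from the RewriteCond in the report. Assumption: Apache
# evaluates it as an unanchored, case-sensitive regular expression.
GIT_AGENT_PATTERN = re.compile(r"(git)")


def matches_git_rewrite(user_agent: str) -> bool:
    """Return True if the RewriteCond pattern would match this User-Agent."""
    return GIT_AGENT_PATTERN.search(user_agent) is not None


# The empty string stands for the request logged with "-" as its agent
# (assumed to mean that no User-Agent header was sent). The git-annex value
# is copied from the access log; the last two are hypothetical examples.
agents = [
    "",
    "git-annex/6.20180913+git149-g23bd27773-1~ndall+1",
    "git/2.19.0",
    "Mozilla/5.0",
]

for agent in agents:
    print(f"{agent!r}: matches={matches_git_rewrite(agent)}")
```

Under those assumptions the agent-less `config` request would bypass the rewrite, while requests carrying the git-annex agent would be passed to `/git/`. Whether that accounts for the 404 responses is not confirmed here.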