Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2022-09-21 10:41:12 -04:00
commit b072410e06
GPG key ID: DB12DB0FF05F8F38
7 changed files with 324 additions and 0 deletions


@ -0,0 +1,92 @@
### Please describe the problem.
`git status` reports unstaged changes to `.dandi/assets.json`, even though there are no actual changes relative to the index:
```shell
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   .dandi/assets.json

no changes added to commit (use "git add" and/or "git commit -a")
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git annex status
M ./.dandi/assets.json
```
However, git shows no diff, and the file's SHA-256 checksum matches the key:
```shell
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git diff --cached
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git show -- .dandi/assets.json
commit b859efed7ddb2ff31cc26168f40676c572d2798f (HEAD -> draft, github/draft, github/HEAD)
Author: DANDI User <info@dandiarchive.org>
Date: Fri Sep 16 22:22:29 2022 +0000
[backups2datalad] 66 files added
diff --git a/.dandi/assets.json b/.dandi/assets.json
index d3ef95e1ee..62fe372810 100644
--- a/.dandi/assets.json
+++ b/.dandi/assets.json
@@ -1 +1 @@
-/annex/objects/SHA256E-s69400783--8b576786d3926ab0e84809b4131cdc5a8f631674d378afa343e7dcd84f011c90.json
+/annex/objects/SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ sha256sum .dandi/assets.json
6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14 .dandi/assets.json
```
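A SHA256E key records both the content size (`-s69507227`) and the SHA-256 digest, which is why the `sha256sum` comparison above is a meaningful check. The following is a minimal Python sketch of that comparison, assuming the simplified key layout `BACKEND-sSIZE--HASH.EXT` seen here; it ignores optional key fields, and `parse_key` / `file_matches_key` are hypothetical helper names, not git-annex APIs.

```python
import hashlib
import os
import re

# Simplified key layout assumed here: BACKEND-sSIZE--HASH[.EXT], as in
# SHA256E-s69507227--6a0a...f14.json. Optional key fields (mtime,
# chunking) are not handled by this sketch.
KEY_RE = re.compile(
    r"^(?P<backend>SHA256E?)-s(?P<size>\d+)--(?P<hash>[0-9a-f]{64})(?P<ext>.*)$"
)


def parse_key(key: str) -> dict:
    # Split a SHA256/SHA256E key into backend, size, hex digest and extension.
    m = KEY_RE.match(key)
    if m is None:
        raise ValueError(f"not a simple SHA256/SHA256E key: {key!r}")
    return {
        "backend": m["backend"],
        "size": int(m["size"]),
        "hash": m["hash"],
        "ext": m["ext"],
    }


def file_matches_key(path: str, key: str) -> bool:
    # Compare a file's size and SHA-256 digest with those recorded in the key.
    info = parse_key(key)
    if os.path.getsize(path) != info["size"]:
        return False  # size mismatch: no need to hash
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == info["hash"]
```

Calling `file_matches_key(".dandi/assets.json", "SHA256E-s69507227--6a0a91c4158d316ab8ad9bd8ebf7579b9c3c579e1035c48134246b6a5d2f6f14.json")` performs the same comparison as the `sha256sum` call above, plus a size check.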
I think the tricky part may be that this repository is at version 10:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config annex.version
10
```
although I thought we had kept it at 8. However, I do have this user-wide config setting:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git config filter.annex.process
git-annex filter-process
```
It was recommended to me as a way to speed up operations while avoiding the upgrade to version 10, but I guess running the most recent git-annex once led to the upgrade, since all the other repositories are still at version 8, as I expected:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ grep -h '\<version =' ../*/.git/config | sort | uniq -c
1 version = 10
186 version = 8
```
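For checking many repositories without `grep`, here is a rough Python equivalent that reads `annex.version` from each sibling repository's `.git/config`. It is only a sketch: it treats git config as INI via `configparser`, which copes with simple files like these but is not a full git config parser (no `include` handling, for instance), and the function names are hypothetical. Running `git config --file <path> annex.version` per repository would be the authoritative alternative.

```python
import configparser
import glob
import os
from collections import Counter


def annex_version_from_text(text: str):
    # Treat git config as INI. This copes with tab-indented keys,
    # [remote "origin"]-style sections and repeated keys, but it is
    # not a full git config parser.
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, allow_no_value=True
    )
    parser.read_string(text)
    return parser.get("annex", "version", fallback=None)


def tally_versions(parent_dir: str) -> Counter:
    # Count annex.version across <parent_dir>/*/.git/config;
    # a None entry means the repository has no annex.version set.
    counts = Counter()
    pattern = os.path.join(parent_dir, "*", ".git", "config")
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            counts[annex_version_from_text(f.read())] += 1
    return counts
```

Unlike the `grep` pipeline, this only looks at `version` inside the `[annex]` section, and it counts each repository once.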
Having the file reported as modified makes our script fail, since it sanity-checks that it operates only on a clean repository.
`git reset --hard` seems to have mitigated all of that:
```
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git reset --hard
HEAD is now at b859efed7d [backups2datalad] 66 files added
(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000026$ git status
On branch draft
Your branch is up to date with 'github/draft'.

nothing to commit, working tree clean
```
I will now rerun our script and see what state I end up in (although, again, this repository is already at version 10, so the behavior may differ).
### What steps will reproduce the problem?
I think I get it after I `annex move` that file away and then `annex get` it back. Just for my own reference: this git-annex repository is produced by https://github.com/dandi/dandisets/blob/draft/tools/backups2datalad-update-cron
### What version of git-annex are you using? On what operating system?
10.20220822-g84f1875 (conda build); originally observed with the earlier 10.20220724-ge30d846
[[!meta author=yoh]]
[[!tag projects/dandi]]

doc/forum/HTTP_uploads.mdwn

@ -0,0 +1,133 @@
Does git-annex support **uploading** over HTTP? I learned how to set up a public (anonymous, download-only) HTTP remote -- a regular git remote, **not** a special remote -- by following [[tips/setup_a_public_repository_on_a_web_site/]]. Now I also want private repos (so, non-anonymous downloads), and, for completeness, uploads.
I know those Apache-based instructions don't cover these cases, so I'm working on extending them. I've [ported those instructions into Gitea](https://github.com/neuropoly/gitea/pull/1): now I can `git clone` using both Gitea's SSH and HTTP URLs, `git annex get` works, and permissions are enforced on private repos.
So I've got non-anonymous downloads covered. But how do I make `git annex sync --content` (or, equivalently, `git annex copy --to origin`) upload when the remote is an HTTP URL? Does it know how? I've experimented and haven't been able to get it to work, but it doesn't give me a clear error rejecting my attempt either.
```
[kousu@nigiri CANDICE-fMRI-]$ git config annex.debug true
[kousu@nigiri CANDICE-fMRI-]$ git annex copy --to origin
[2022-05-08 02:52:23.217458919] (Utility.Process) process [28036] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2022-05-08 02:52:23.219000083] (Utility.Process) process [28036] done ExitSuccess
[2022-05-08 02:52:23.219516464] (Utility.Process) process [28037] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2022-05-08 02:52:23.221135085] (Utility.Process) process [28037] done ExitSuccess
[2022-05-08 02:52:23.22179462] (Utility.Process) process [28038] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..6c3b23ed5b89e73fdb170e4dfb3dc4f7324acd87","--pretty=%H","-n1"]
[2022-05-08 02:52:23.224259745] (Utility.Process) process [28038] done ExitSuccess
[2022-05-08 02:52:23.225976282] (Utility.Process) process [28039] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2022-05-08 02:52:23.228332152] (Utility.Process) process [28040] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--error-unmatch","--"]
[2022-05-08 02:52:23.229159541] (Utility.Process) process [28041] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2022-05-08 02:52:23.22958377] (Utility.Process) process [28042] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2022-05-08 02:52:23.230213916] (Utility.Process) process [28039] done ExitSuccess
[2022-05-08 02:52:23.231313697] (Utility.Process) process [28043] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[...]
copy sub-BAN04/anat/sub-BAN02_T1w.nii.gz [2022-05-08 02:52:53.078448893] (Utility.Url) Request {
host = "data.praxisinstitute.org.dev.neuropoly.org"
port = 443
secure = True
requestHeaders = [("Accept-Encoding",""),("User-Agent","git-annex/10.20220322-g959beeea9")]
path = "/UofC/CANDICE-fMRI-.git/annex/objects/433/2b9/SHA256E-s6677912--29be5e1eef0d3b5fcd6817999c9055f6c088d58b37c7c09bb87c440fb8037c81.nii.gz/SHA256E-s6677912--29be5e1eef0d3b5fcd6817999c9055f6c088d58b37c7c09bb87c440fb8037c81.nii.gz"
queryString = ""
method = "HEAD"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
proxySecureMode = ProxySecureWithConnect
}
[2022-05-08 02:52:53.16577961] (Utility.Process) process [28097] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","credential","fill"]
Username for 'https://data.praxisinstitute.org.dev.neuropoly.org': kousu
Password for 'https://kousu@data.praxisinstitute.org.dev.neuropoly.org':
[2022-05-08 02:52:56.868778349] (Utility.Process) process [28097] done ExitSuccess
[2022-05-08 02:52:56.869119387] (Utility.Url) Request {
host = "data.praxisinstitute.org.dev.neuropoly.org"
port = 443
secure = True
requestHeaders = [("Accept-Encoding",""),("Authorization","<REDACTED>"),("User-Agent","git-annex/10.20220322-g959beeea9")]
path = "/UofC/CANDICE-fMRI-.git/annex/objects/433/2b9/SHA256E-s6677912--29be5e1eef0d3b5fcd6817999c9055f6c088d58b37c7c09bb87c440fb8037c81.nii.gz/SHA256E-s6677912--29be5e1eef0d3b5fcd6817999c9055f6c088d58b37c7c09bb87c440fb8037c81.nii.gz"
queryString = ""
method = "HEAD"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
proxySecureMode = ProxySecureWithConnect
}
[2022-05-08 02:52:56.954725712] (Utility.Process) process [28098] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","credential","reject"]
[2022-05-08 02:52:56.956656836] (Utility.Process) process [28098] done ExitSuccess
[2022-05-08 02:52:56.957140664] (Utility.Url) Request {
host = "data.praxisinstitute.org.dev.neuropoly.org"
port = 443
secure = True
requestHeaders = [("Accept-Encoding",""),("User-Agent","git-annex/10.20220322-g959beeea9")]
path = "/UofC/CANDICE-fMRI-.git/annex/objects/93/kp/SHA256E-s6677912--29be5e1eef0d3b5fcd6817999c9055f6c088d58b37c7c09bb87c440fb8037c81.nii.gz/SHA256E-s6677912--29be5e1eef0d3b5fcd6817999c9055f6c088d58b37c7c09bb87c440fb8037c81.nii.gz"
queryString = ""
method = "HEAD"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
proxySecureMode = ProxySecureWithConnect
}
[2022-05-08 02:52:57.041148655] (Utility.Process) process [28099] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","credential","fill"]
Username for 'https://data.praxisinstitute.org.dev.neuropoly.org': kousu
Password for 'https://kousu@data.praxisinstitute.org.dev.neuropoly.org':
[2022-05-08 02:53:00.095495453] (Utility.Process) process [28099] done ExitSuccess
[2022-05-08 02:53:00.095774849] (Utility.Url) Request {
host = "data.praxisinstitute.org.dev.neuropoly.org"
port = 443
secure = True
requestHeaders = [("Accept-Encoding",""),("Authorization","<REDACTED>"),("User-Agent","git-annex/10.20220322-g959beeea9")]
path = "/UofC/CANDICE-fMRI-.git/annex/objects/93/kp/SHA256E-s6677912--29be5e1eef0d3b5fcd6817999c9055f6c088d58b37c7c09bb87c440fb8037c81.nii.gz/SHA256E-s6677912--29be5e1eef0d3b5fcd6817999c9055f6c088d58b37c7c09bb87c440fb8037c81.nii.gz"
queryString = ""
method = "HEAD"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
proxySecureMode = ProxySecureWithConnect
}
[2022-05-08 02:53:00.183376947] (Utility.Process) process [28100] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","credential","reject"]
[2022-05-08 02:53:00.18501078] (Utility.Process) process [28100] done ExitSuccess
(not found) failed
[2022-05-08 02:53:00.185253039] (Utility.Process) process [28043] done ExitSuccess
[2022-05-08 02:53:00.185356286] (Utility.Process) process [28042] done ExitSuccess
[2022-05-08 02:53:00.185460462] (Utility.Process) process [28041] done ExitSuccess
[2022-05-08 02:53:00.185552806] (Utility.Process) process [28040] done ExitSuccess
copy: 4 failed
```
I observe that `git-annex` issues a HEAD request to find out whether the file already exists (presumably to check whether it needs to be uploaded), hits the auth wall, asks for credentials, and reissues the HEAD request with them, which _successfully_ gets a 404 (I know this from watching the server logs), but then it says "(not found) failed". Then it reissues the same two requests for the hashdirmixed ([[internals/hashing]]) variant of the URLs.
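The two URL shapes in the log come from git-annex's two object-directory hashing schemes. As a minimal Python sketch, assuming `hashdirlower` takes the first six hex digits of the MD5 of the key and splits them into two three-character directories (our reading of [[internals/hashing]], not verified against the git-annex source), the first candidate URL could be built as follows; `hashdirlower` and `object_url` are hypothetical names.

```python
import hashlib


def hashdirlower(key: str) -> str:
    # Assumed scheme: MD5 of the key name, with the first 3 and next 3
    # hex digits used as two directory levels (the shape of "433/2b9").
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:3]}/{digest[3:6]}"


def object_url(base: str, key: str) -> str:
    # Hypothetical helper reproducing the request-path layout in the log:
    # <repo>/annex/objects/<hashdir>/<key>/<key>
    return f"{base.rstrip('/')}/annex/objects/{hashdirlower(key)}/{key}/{key}"
```

The second pair of requests (`.../93/kp/...`) uses the mixed-case `hashdirmixed` scheme, which this sketch does not attempt.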
Why is it issuing a HEAD at all if it isn't going to follow up with a PUT or a POST?
I'm posting this in `forum` because I can't tell if this is a `bug` or a `wishlist` item. Is git-annex supposed to be able to upload over HTTP or not? It can upload to S3, why not to regular HTTP remotes? Would you be interested in exploring this feature if it doesn't yet exist?
<details><summary>version</summary>
```
[kousu@nigiri ~]$ git annex version
git-annex version: 10.20220504-g4e4c44ed8
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.30 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.11 persistent-sqlite-2.13.0.3 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
```
</details>
My motivation for HTTP uploads is that I want my users to be able to do everything using only one credential; special remotes imply adding an extra credential, and ssh remotes mean managing ssh keys. I know it may be less safe, but getting people to adopt this technology is already difficult due to the number of nearly identical but incompatible apps, and asking them to manage extra credentials is often the final breaking point. If I can say "just make up one password", it'll go down easier. Hopefully you can give me a clear answer about whether this is/isn't/will/won't be a thing, and then I'll know where to focus my efforts next :)
Thanks!


@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Gus"
avatar="http://cdn.libravatar.org/avatar/665626c67ab3ee7e842183f6f659e120"
subject="comment 5"
date="2022-09-20T22:07:30Z"
content="""
Thank you, helpful community, for your assistance.
joey, I stand corrected regarding the documentation. I guess it was the magnitude of the consequences of that detail that took me by surprise.
"""]]


@ -0,0 +1,13 @@
[[!comment format=mdwn
username="rinomizu5@5ead4c82685c65d7717dbd5591b80425036ae9e3"
nickname="rinomizu5"
avatar="http://cdn.libravatar.org/avatar/62478823018c68821064febcda7e5d4f"
subject="&quot;not inbackend=URL&quot; is failed with parse error"
date="2022-09-21T07:04:35Z"
content="""
I tried `git annex wanted gin \"not inbackend=URL\"` because I don't want to sync to the gin remote when a key uses the URL backend, but it failed with a parse error.
The syntax says `inbackend=name`; is `URL` not a valid `name` here?
Please advise me what to do.
"""]]


@ -0,0 +1,13 @@
[[!comment format=mdwn
username="rinomizu5@5ead4c82685c65d7717dbd5591b80425036ae9e3"
nickname="rinomizu5"
avatar="http://cdn.libravatar.org/avatar/62478823018c68821064febcda7e5d4f"
subject="&quot;not inbackend=URL&quot; is failed with parse error"
date="2022-09-21T07:04:55Z"
content="""
I tried `git annex wanted gin \"not inbackend=URL\"` because I don't want to sync to the gin remote when a key uses the URL backend, but it failed with a parse error.
The syntax says `inbackend=name`; is `URL` not a valid `name` here?
Please advise me what to do.
"""]]


@ -0,0 +1,49 @@
[[!comment format=mdwn
username="nick.guenther@e418ed3c763dff37995c2ed5da4232a7c6cee0a9"
nickname="nick.guenther"
avatar="http://cdn.libravatar.org/avatar/9e85c6ca61c3f877fef4f91c2bf6e278"
subject="comment 5"
date="2022-09-20T21:30:51Z"
content="""
We are hosting two datasets on S3 that allow anonymous downloads: https://github.com/spine-generic/data-multi-subject, https://github.com/spine-generic/data-single-subject.
You can try it right now:
```
p115628@joplin:~/datasets/t$ git clone https://github.com/spine-generic/data-multi-subject
Cloning into 'data-multi-subject'...
remote: Enumerating objects: 123344, done.
remote: Counting objects: 100% (26796/26796), done.
remote: Compressing objects: 100% (19491/19491), done.
remote: Total 123344 (delta 6052), reused 25545 (delta 5941), pack-reused 96548
Receiving objects: 100% (123344/123344), 15.50 MiB | 8.48 MiB/s, done.
Resolving deltas: 100% (46253/46253), done.
p115628@joplin:~/datasets/t$ cd data-multi-subject/
p115628@joplin:~/datasets/t/data-multi-subject$ git annex get
(merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get derivatives/labels/sub-amu01/anat/sub-amu01_T1w_labels-disc-manual.nii.gz (from amazon...)
(checksum...) ok
get derivatives/labels/sub-amu01/anat/sub-amu01_T1w_seg-manual.nii.gz (from amazon...)
(checksum...) ok
get derivatives/labels/sub-amu01/anat/sub-amu01_T2star_seg-manual.nii.gz (from amazon...)
(checksum...) ok
get derivatives/labels/sub-amu01/anat/sub-amu01_T2w_labels-disc-manual.nii.gz (from amazon...)
[...]
```
The trick was simply to set `public=yes` and `publicurl` when running `initremote`. The final config I have stored is:
```
5a5447a8-a9b8-49bc-8276-01a62632b502 autoenable=true bucket=data-multi-subject---spine-generic---neuropoly datacenter=ca-central-1 encryption=none host=s3.ca-central-1.amazonaws.com name=amazon port=443 public=yes publicurl=https://data-multi-subject---spine-generic---neuropoly.s3.ca-central-1.amazonaws.com signature=v4 storageclass=STANDARD type=S3 timestamp=1661783824.374956s
```
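The stored line above is one remote's entry: a UUID followed by `key=value` fields. A small Python sketch for pulling such a line apart and checking the two settings credited above, assuming values never contain spaces (true for this line, not verified for git-annex in general); `parse_remote_config` and `allows_anonymous_download` are hypothetical names.

```python
def parse_remote_config(line: str):
    # Assumed format, as in the stored line above: a UUID, then
    # space-separated key=value fields (values assumed to contain no spaces).
    uuid, *fields = line.split()
    config = dict(field.split("=", 1) for field in fields)
    return uuid, config


def allows_anonymous_download(config: dict) -> bool:
    # The two settings credited above for anonymous S3 downloads.
    return config.get("public") == "yes" and bool(config.get("publicurl"))
```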
Why would `importtree` behave so differently?
"""]]


@ -0,0 +1,14 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 6"
date="2022-09-20T22:50:46Z"
content="""
Interesting idea. Maybe my invocation is still incomplete (no host or datacenter), but with the following one I am still prompted for credentials:
```
git annex initremote s3-origin type=S3 importtree=yes encryption=none autoenable=true bucket=dandiarchive fileprefix=zarr-checksums/2ac71edb-738c-40ac-bd8c-8ca985adaa12/ public=yes publicurl=https://dandiarchive.s3.amazonaws.com/ signature=v4 storageclass=STANDARD port=443
```
Or maybe when you ran `initremote` you already had those credential variables exported? ;)
"""]]