Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2020-02-24 12:47:47 -04:00
commit 08d0b44f4d
GPG key ID: DB12DB0FF05F8F38
7 changed files with 183 additions and 0 deletions

@ -0,0 +1,67 @@
### Please describe the problem.
I created a dataset with datalad and added multiple remotes/buckets on a private S3-like server (minio).
On the HPC I am able to get the data and push it.
On another machine, I cannot get the data; here is the debug output.
$ git-annex get -v -d --from s3unf.phantom.anat.mri sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz
[2020-02-21 16:20:09.399086] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2020-02-21 16:20:09.405552] process done ExitSuccess
[2020-02-21 16:20:09.405884] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-02-21 16:20:09.411446] process done ExitSuccess
[2020-02-21 16:20:09.413223] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..050f99f2712d25b581b58a3adc6af83b8d5345b0","--pretty=%H","-n1"]
[2020-02-21 16:20:09.419815] process done ExitSuccess
[2020-02-21 16:20:09.420835] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-02-21 16:20:09.421976] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-02-21 16:20:09.456813] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2020-02-21 16:20:09.462354] process done ExitSuccess
[2020-02-21 16:20:09.462577] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2020-02-21 16:20:09.468639] process done ExitSuccess
[2020-02-21 16:20:09.468923] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz"]
get sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz (from s3unf.phantom.anat.mri...)
[2020-02-21 16:20:09.585446] String to sign: "GET\n\n\nFri, 21 Feb 2020 21:20:09 GMT\n/phantom.anat.mri/SHA256E-s28509481-S1073741824-C1--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
[2020-02-21 16:20:09.585844] Host: "<redacted_hostname>"
[2020-02-21 16:20:09.586001] Path: "/phantom.anat.mri/SHA256E-s28509481-S1073741824-C1--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
[2020-02-21 16:20:09.586261] Query string: ""
[2020-02-21 16:20:09.586423] Header: [("Date","Fri, 21 Feb 2020 21:20:09 GMT"),("Authorization","AWS manager:k/aaaaaaaaaaaaaaaaaaaaaa=")]
[2020-02-21 16:20:09.593542] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
[2020-02-21 16:20:09.594433] String to sign: "GET\n\n\nFri, 21 Feb 2020 21:20:09 GMT\n/phantom.anat.mri/SHA256E-s28509481--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
[2020-02-21 16:20:09.594667] Host: "<redacted_hostname>"
[2020-02-21 16:20:09.594819] Path: "/phantom.anat.mri/SHA256E-s28509481--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
[2020-02-21 16:20:09.595006] Query string: ""
[2020-02-21 16:20:09.595153] Header: [("Date","Fri, 21 Feb 2020 21:20:09 GMT"),("Authorization","AWS manager:wwwwwwwwwwwwwwwwwww ")]
[2020-02-21 16:20:09.596938] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
failed
[2020-02-21 16:20:09.600838] process done ExitSuccess
[2020-02-21 16:20:09.601651] process done ExitSuccess
git-annex: get: 1 failed
When I run that command, no request reaches the server (nothing shows up in the minio trace logs), and I don't know why.
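For anyone comparing the debug output between the two machines: the `String to sign` and `Authorization` lines have the shape of AWS Signature Version 2, where the header is `AWS <access key>:<signature>` and the signature is the base64-encoded HMAC-SHA1 of the string to sign. A minimal sketch of that derivation follows. The secret key and object path are made-up placeholders, the function name is hypothetical, and this is not git-annex's own code.

```python
import base64
import hashlib
import hmac


def authorization_header(access_key: str, secret_key: str, string_to_sign: str) -> str:
    """Build a Signature Version 2 style Authorization header value."""
    # HMAC-SHA1 over the exact bytes of the string to sign, keyed by the secret.
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    # The 20-byte digest is base64-encoded before going into the header.
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key}:{signature}"


# Placeholder inputs: the date matches the log, the key and path are invented.
EXAMPLE_STRING_TO_SIGN = (
    "GET\n\n\nFri, 21 Feb 2020 21:20:09 GMT\n/phantom.anat.mri/example-key"
)
EXAMPLE_HEADER = authorization_header(
    "manager", "not-a-real-secret", EXAMPLE_STRING_TO_SIGN
)
```

Because the date is part of the signed string, a machine with a skewed clock or a different stored secret would produce a different signature for the same object path.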
### What steps will reproduce the problem?
### What version of git-annex are you using? On what operating system?
latest package in conda-forge on ubuntu
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes.

@ -0,0 +1,51 @@
I expected `git annex init --version=N`, where N is a version in
`autoUpgradeableVersions`, to auto-upgrade in a way that lands on a
supported version. For example, with a git-annex built from a recent
master (835a214de), `supportedVersions` contains only 8, and I
expected

    git init && git annex init --version=6

to result in a v8 repo. Instead it lands on v7:

    git config annex.version # => 7

Shouldn't the auto-upgrade mapping by `init` be carried all the way
through until a supported version is reached?
I was able to get the behavior I expected with the change below.
[[!format diff """
diff --git a/Command/Init.hs b/Command/Init.hs
index db6cb14fb..ad7ed8962 100644
--- a/Command/Init.hs
+++ b/Command/Init.hs
@@ -36,11 +36,15 @@ optParser desc = InitOptions
 parseRepoVersion :: Monad m => String -> m RepoVersion
 parseRepoVersion s = case RepoVersion <$> readish s of
 	Nothing -> fail $ "version parse error"
-	Just v
-		| v `elem` supportedVersions -> return v
-		| otherwise -> case M.lookup v autoUpgradeableVersions of
-			Just v' -> return v'
-			Nothing -> fail $ s ++ " is not a currently supported repository version"
+	Just v -> case check v of
+		Nothing -> fail $ s ++ " is not a currently supported repository version"
+		Just supported -> return supported
+  where
+	check v'
+		| v' `elem` supportedVersions = Just v'
+		| otherwise = case M.lookup v' autoUpgradeableVersions of
+			Just v'' -> check v''
+			Nothing -> Nothing
 
 seek :: InitOptions -> CommandSeek
 seek = commandAction . start
"""]]
I'm quite confident that my attempt above is not pretty, but if I got
lucky and it's passable Haskell, I'd be happy to send a patch.
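To see the difference in isolation, here is a small sketch (in Python, not the Haskell from the diff above) that compares a single lookup with following the mapping until a supported version is reached. The map contents `{6: 7, 7: 8}` are only inferred from the behavior described in this report, not copied from git-annex, and the function names are hypothetical.

```python
from typing import Dict, Optional, Set

# Assumed values for illustration only, inferred from this report.
SUPPORTED_VERSIONS: Set[int] = {8}
AUTO_UPGRADEABLE_VERSIONS: Dict[int, int] = {6: 7, 7: 8}


def single_step(requested: int, supported: Set[int],
                upgrades: Dict[int, int]) -> Optional[int]:
    """Look the version up once, as in the behavior reported here."""
    if requested in supported:
        return requested
    return upgrades.get(requested)


def follow_chain(requested: int, supported: Set[int],
                 upgrades: Dict[int, int]) -> Optional[int]:
    """Keep following the mapping until a supported version is reached."""
    seen: Set[int] = set()
    version = requested
    while version not in supported:
        # Stop on a dead end, or on a cycle in a malformed mapping.
        if version in seen or version not in upgrades:
            return None
        seen.add(version)
        version = upgrades[version]
    return version
```

With the assumed values, `single_step(6, ...)` stops at 7, while `follow_chain(6, ...)` continues to 8. The cycle guard is extra caution for the sketch; the recursive `check` in the diff relies on the mapping being acyclic.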
[[!meta author=kyle]]
[[!tag projects/datalad]]

@ -0,0 +1,30 @@
I've been trying out exporting to S3 with Wasabi. It has worked for some files, but for one large file I had to set a partsize value; otherwise the upload would be rejected.
Even with a partsize value, though, the upload still fails. It gets right to the end and then fails with `WrongRequestBodyStreamSize`.
```
export wasabi 2020-01-24/guix_data_service_full.dump
100% 7.31 GiB 45 MiB/s 0s
HttpExceptionRequest Request {
host = "data-guix-gnu-org-database-dumps.s3.wasabisys.com"
port = 443
secure = True
requestHeaders = [("Date","Sat, 22 Feb 2020 08:49:58 GMT"),("Authorization","<REDACTED>")]
path = "/2020-01-24/guix_data_service_full.dump"
queryString = "partNumber=78&uploadId=MaRGWhxX1uZZ12IdGSSsc3rinnMgmKMfOcXRPuSo1zxzCviuBR37gxh_4gAmtvOmiRyqJjCMfgO4zhqKOB2Fi8WPLE2b7Iix-2gXkXgcIvLVlijoixwRS-UcllhHhG38"
method = "PUT"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(WrongRequestBodyStreamSize 31419314 29814466)
failed
```
I've tried with partsizes from 100MB to 5GB, and I'm on a pretty recent version of git-annex: 7.20200219. I'm not sure if this is an issue in git-annex, or the aws or http-client dependencies.
Thanks,
Chris
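As far as I understand http-client, the two numbers in `WrongRequestBodyStreamSize` are the expected and the actual body size, so part 78 was announced as 31419314 bytes while only 29814466 bytes were streamed. The sketch below shows the bookkeeping a multipart upload depends on: splitting a file into parts and checking that each part streams exactly the announced length. The function and exception names are hypothetical and the sizes are toy values; it does not reproduce git-annex, aws, or http-client code.

```python
from typing import Iterable, List, Tuple


def plan_parts(total_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        # Every part is part_size long except possibly the last one.
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


class BodySizeMismatch(Exception):
    """Raised when the streamed byte count differs from the announced one."""

    def __init__(self, expected: int, actual: int) -> None:
        super().__init__(f"expected {expected} bytes, streamed {actual}")
        self.expected = expected
        self.actual = actual


def check_streamed_size(expected: int, chunks: Iterable[bytes]) -> int:
    """Count the bytes in chunks and compare them with the announced length."""
    actual = sum(len(chunk) for chunk in chunks)
    if actual != expected:
        raise BodySizeMismatch(expected, actual)
    return actual
```

A mismatch on the final part, as in the transcript, would be consistent with the announced length and the bytes actually read being computed from different views of the file, but the report alone does not show which component is responsible.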

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="locking granularity"
date="2020-02-20T22:35:33Z"
content="""
Joey,
I very much value all the work/thought you've put into making git-annex robust, starting with choosing Haskell.
As to why the question keeps coming up...
I often find myself wanting to use git-annex in what seems to me non-standard ways, so it's possible the usage pattern wasn't planned/optimized/tested for.
E.g. with `git-annex-get --batch` the typical usage would be to feed a large batch of keys at once, and to not have other git-annex processes running at the time.
The git-annex test suite [[does not test under concurrency|todo/add_tests_under_concurrency]]. I've run into intermittent failures with concurrent operations that
were fixed by disabling concurrency. I'll try at some point to isolate reproducible examples of these failures, but they do happen quite consistently on my system.
I understand that git-annex is parallelism-safe in that parallelism does not cause data loss. But things short of data loss, like intermittent failures/deadlocks, are something I still need to work around, when using git-annex as a building block in larger automated workflows.
"""]]

@ -55,4 +55,6 @@ designed to interoperate with it.
* [AnnexRemote](https://github.com/Lykos153/AnnexRemote) is a Python library for creating new external special remotes.
* [Git::Annex](https://metacpan.org/release/Git-Annex) is a Perl interface to git-annex, including a script `annex-review-unused` for interactively processing git-annex-unused output.
See also [[not]] for software that is *not* related to git-annex, but similar.

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2020-02-21T00:07:53Z"
content="""
[done](https://git-annex.branchable.com/todo/Provide_a_way_to_white_list_local_networks___40__not_just_specific_IPs__41__/)
"""]]

@ -0,0 +1,5 @@
Original whining and use case [are buried in the comments to the CVE fix announcement](https://git-annex.branchable.com/security/#comment-7d9435f13213b85896a3424c342c9a1f). In summary: the `http_proxy` environment variable could point to some address within a local network (e.g. `10.0.0.1/24`). `annex.security.allowed-ip-addresses` (or the older `annex.security.allowed-http-addresses`) seems to support only a whitespace-separated list of addresses (too tedious to list for sizeable private networks) or "all" (too much), but not networks.
In the case of `http_proxy` it would also be valuable to be able to limit access to specific (e.g. privileged) port(s) (thus going back to `http` from a generic `ip`), to avoid opening access to a malicious user who provides URLs on the same private network on some other port. I think the address and port "wishlist" items are related, so I am filing them as a single issue.
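To make the request concrete, here is a rough sketch of the matching being asked for: allow-list entries that may be single addresses, CIDR networks, or "all", plus an optional port restriction. It uses Python's standard `ipaddress` module. The function name, the entry syntax, and the port set are assumptions for illustration, not existing git-annex configuration.

```python
import ipaddress
from typing import Iterable, Optional, Set


def is_allowed(address: str, port: int, allowed: Iterable[str],
               allowed_ports: Optional[Set[int]] = None) -> bool:
    """Check an address and port against a hypothetical allow-list."""
    # An optional port set narrows the rule, e.g. to one proxy port.
    if allowed_ports is not None and port not in allowed_ports:
        return False
    ip = ipaddress.ip_address(address)
    for entry in allowed:
        if entry == "all":
            return True
        # strict=False accepts entries with host bits set, such as
        # 10.0.0.1/24, and treats them as the enclosing network.
        # A bare address is parsed as a single-host network.
        network = ipaddress.ip_network(entry, strict=False)
        if ip in network:
            return True
    return False
```

For example, `is_allowed("10.0.0.42", 3128, ["10.0.0.1/24"], {3128})` accepts a proxy on the private network, while the same address on port 8080 is rejected.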
[[!meta author=yoh]]
[[!tag projects/datalad]]