Merge branch 'master' of ssh://git-annex.branchable.com
commit 08d0b44f4d
7 changed files with 183 additions and 0 deletions
@ -0,0 +1,67 @@
### Please describe the problem.

I created a dataset with datalad and added multiple remotes/buckets on a private S3-like server (minio).
On the HPC I am able to get the data and push it.
On another machine, I cannot get the data; here is the debug output.

$ git-annex get -v -d --from s3unf.phantom.anat.mri sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz
[2020-02-21 16:20:09.399086] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
[2020-02-21 16:20:09.405552] process done ExitSuccess
[2020-02-21 16:20:09.405884] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
[2020-02-21 16:20:09.411446] process done ExitSuccess
[2020-02-21 16:20:09.413223] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..050f99f2712d25b581b58a3adc6af83b8d5345b0","--pretty=%H","-n1"]
[2020-02-21 16:20:09.419815] process done ExitSuccess
[2020-02-21 16:20:09.420835] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
[2020-02-21 16:20:09.421976] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
[2020-02-21 16:20:09.456813] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
[2020-02-21 16:20:09.462354] process done ExitSuccess
[2020-02-21 16:20:09.462577] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
[2020-02-21 16:20:09.468639] process done ExitSuccess
[2020-02-21 16:20:09.468923] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz"]
get sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz (from s3unf.phantom.anat.mri...)
[2020-02-21 16:20:09.585446] String to sign: "GET\n\n\nFri, 21 Feb 2020 21:20:09 GMT\n/phantom.anat.mri/SHA256E-s28509481-S1073741824-C1--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
[2020-02-21 16:20:09.585844] Host: "<redacted_hostname>"
[2020-02-21 16:20:09.586001] Path: "/phantom.anat.mri/SHA256E-s28509481-S1073741824-C1--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
[2020-02-21 16:20:09.586261] Query string: ""
[2020-02-21 16:20:09.586423] Header: [("Date","Fri, 21 Feb 2020 21:20:09 GMT"),("Authorization","AWS manager:k/aaaaaaaaaaaaaaaaaaaaaa=")]
[2020-02-21 16:20:09.593542] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
[2020-02-21 16:20:09.594433] String to sign: "GET\n\n\nFri, 21 Feb 2020 21:20:09 GMT\n/phantom.anat.mri/SHA256E-s28509481--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
[2020-02-21 16:20:09.594667] Host: "<redacted_hostname>"
[2020-02-21 16:20:09.594819] Path: "/phantom.anat.mri/SHA256E-s28509481--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
[2020-02-21 16:20:09.595006] Query string: ""
[2020-02-21 16:20:09.595153] Header: [("Date","Fri, 21 Feb 2020 21:20:09 GMT"),("Authorization","AWS manager:wwwwwwwwwwwwwwwwwww ")]
[2020-02-21 16:20:09.596938] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
failed
[2020-02-21 16:20:09.600838] process done ExitSuccess
[2020-02-21 16:20:09.601651] process done ExitSuccess
git-annex: get: 1 failed

When running that command, no request reaches the server (nothing shows up in the minio trace logs), and I don't know why.
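
For anyone reading the debug output: the `String to sign` and `Authorization` lines have the shape of AWS Signature Version 2, where the header value is `AWS <access key>:<signature>` and the signature is the base64-encoded HMAC-SHA1 of the string to sign. Below is a minimal Python sketch of that computation, which may help when comparing what the two machines send. The secret key and the object name are made-up placeholders, and this is an illustration of the scheme, not git-annex's own code.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key: str, string_to_sign: str) -> str:
    """Return the base64-encoded HMAC-SHA1 signature used by AWS Signature V2."""
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key: str, secret_key: str, string_to_sign: str) -> str:
    """Build the Authorization header value: 'AWS <access key>:<signature>'."""
    return f"AWS {access_key}:{sign_v2(secret_key, string_to_sign)}"


# Same layout as the "String to sign" lines in the log:
# method, Content-MD5, Content-Type, Date, canonicalized resource.
# The object name is a placeholder, not a real key.
string_to_sign = (
    "GET\n\n\nFri, 21 Feb 2020 21:20:09 GMT\n/phantom.anat.mri/EXAMPLE-KEY.nii.gz"
)

# "manager" is the access key shown in the log; the secret is a placeholder.
print(authorization_header("manager", "placeholder-secret", string_to_sign))
```

An HMAC-SHA1 digest is 20 bytes, so a complete signature is 28 base64 characters ending in a single `=`.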

### What steps will reproduce the problem?


### What version of git-annex are you using? On what operating system?

latest package in conda-forge on Ubuntu

### Please provide any additional information below.

[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log


# End of transcript or log.
"""]]

### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

Yes.

@ -0,0 +1,51 @@
I expected `git annex init --version=N`, where N is a version in
`autoUpgradeableVersions`, to auto-upgrade in a way that lands on a
supported version. For example, with a git-annex built from a recent
master (835a214de), `supportedVersions` contains only 8, and I
expected

    git init && git annex init --version=6

to result in a v8 repo. Instead it lands on v7:

    git config annex.version # => 7

Shouldn't the auto-upgrade mapping by `init` be carried all the way
through until a supported version is reached?

I was able to get the behavior I expected with the change below.

[[!format diff """
diff --git a/Command/Init.hs b/Command/Init.hs
index db6cb14fb..ad7ed8962 100644
--- a/Command/Init.hs
+++ b/Command/Init.hs
@@ -36,11 +36,15 @@ optParser desc = InitOptions
 parseRepoVersion :: Monad m => String -> m RepoVersion
 parseRepoVersion s = case RepoVersion <$> readish s of
     Nothing -> fail $ "version parse error"
-    Just v
-        | v `elem` supportedVersions -> return v
-        | otherwise -> case M.lookup v autoUpgradeableVersions of
-            Just v' -> return v'
-            Nothing -> fail $ s ++ " is not a currently supported repository version"
+    Just v -> case check v of
+        Nothing -> fail $ s ++ " is not a currently supported repository version"
+        Just supported -> return supported
+  where
+    check v'
+        | v' `elem` supportedVersions = Just v'
+        | otherwise = case M.lookup v' autoUpgradeableVersions of
+            Just v'' -> check v''
+            Nothing -> Nothing
 
 seek :: InitOptions -> CommandSeek
 seek = commandAction . start
"""]]

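
The idea in the patch above, following the auto-upgrade mapping repeatedly until a supported version is reached, can be sketched outside Haskell as well. The Python below is only an illustration under stated assumptions: the function name is hypothetical, and of the table contents only "supported is just 8" and "6 maps to 7" are taken from this report; the other mapping entries are invented for the example. Unlike the recursive `check`, this version also stops if the mapping ever contained a cycle.

```python
from typing import Dict, List, Optional


def resolve_version(
    requested: int,
    supported: List[int],
    auto_upgradeable: Dict[int, int],
) -> Optional[int]:
    """Follow the auto-upgrade mapping until a supported version is reached.

    Returns None if the chain ends at a version that is neither supported
    nor auto-upgradeable, or if the mapping loops back on itself.
    """
    seen = set()
    version = requested
    while version not in supported:
        if version in seen or version not in auto_upgradeable:
            return None  # dead end, or a cycle in the mapping
        seen.add(version)
        version = auto_upgradeable[version]
    return version


# Hypothetical tables: only "supported == [8]" and "6 maps to 7" come from
# the report; the remaining entries are assumptions for illustration.
supported_versions = [8]
auto_upgradeable_versions = {5: 6, 6: 7, 7: 8}

print(resolve_version(6, supported_versions, auto_upgradeable_versions))
```

With these tables a request for version 6 is carried through 7 to 8, which is the behavior the report asks for.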
I'm quite confident that my attempt above is not pretty, but if I got
lucky and it's passable Haskell, I'd be happy to send a patch.

[[!meta author=kyle]]
[[!tag projects/datalad]]

@ -0,0 +1,30 @@
I've been trying out exporting to S3 with Wasabi. It has worked for some files, but for one large file I had to set a partsize value; otherwise the upload would be rejected.

Even with a partsize value, the upload still fails though. It gets right to the end and then fails with `WrongRequestBodyStreamSize`.

```
export wasabi 2020-01-24/guix_data_service_full.dump
100% 7.31 GiB 45 MiB/s 0s
HttpExceptionRequest Request {
  host = "data-guix-gnu-org-database-dumps.s3.wasabisys.com"
  port = 443
  secure = True
  requestHeaders = [("Date","Sat, 22 Feb 2020 08:49:58 GMT"),("Authorization","<REDACTED>")]
  path = "/2020-01-24/guix_data_service_full.dump"
  queryString = "partNumber=78&uploadId=MaRGWhxX1uZZ12IdGSSsc3rinnMgmKMfOcXRPuSo1zxzCviuBR37gxh_4gAmtvOmiRyqJjCMfgO4zhqKOB2Fi8WPLE2b7Iix-2gXkXgcIvLVlijoixwRS-UcllhHhG38"
  method = "PUT"
  proxy = Nothing
  rawBody = False
  redirectCount = 10
  responseTimeout = ResponseTimeoutDefault
  requestVersion = HTTP/1.1
}
(WrongRequestBodyStreamSize 31419314 29814466)
failed
```

I've tried with partsizes from 100MB to 5GB, and I'm on a pretty recent version of git-annex: 7.20200219. I'm not sure if this is an issue in git-annex, or the aws or http-client dependencies.
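
To make the part arithmetic easier to reason about, here is a small Python sketch of how a file is cut into parts for a given partsize. It assumes the usual S3 multipart limits (parts between 5 MiB and 5 GiB, at most 10,000 parts); whether Wasabi enforces exactly the same limits is an assumption, and this is not how git-annex or the aws library compute it internally. The function name and the example sizes are hypothetical. As far as the http-client documentation goes, the two numbers in `WrongRequestBodyStreamSize` are the expected and the actual body size, so the body streamed for part 78 seems to have been shorter than the length announced for it.

```python
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART = 5 * MIB          # assumed S3 minimum (applies to all but the last part)
MAX_PART = 5 * 1024 * MIB   # assumed S3 maximum part size (5 GiB)
MAX_PARTS = 10_000          # assumed S3 maximum number of parts


def split_into_parts(total_size: int, part_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for every part of a multipart upload."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size outside the assumed 5 MiB..5 GiB range")
    parts = []
    offset = 0
    while offset < total_size:
        # Every part is part_size long except possibly the last one.
        length = min(part_size, total_size - offset)
        parts.append((offset, length))
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


# Illustrative numbers only: a 1 GiB file with 100 MiB parts.
parts = split_into_parts(1024 * MIB, 100 * MIB)
print(len(parts), parts[-1])
```

The lengths always add up to the file size, and only the last part can be shorter than the partsize.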

Thanks,

Chris

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="locking granularity"
date="2020-02-20T22:35:33Z"
content="""
Joey,

I very much value all the work/thought you've put into making git-annex robust, starting with choosing Haskell.

As to why the question keeps coming up...

I often find myself wanting to use git-annex in what seem to me non-standard ways, so it's possible the usage pattern wasn't planned/optimized/tested for.
E.g. with `git-annex-get --batch` the typical usage would be to feed a large batch of keys at once, and to not have other git-annex processes running at the time.
The git-annex test suite [[does not test under concurrency|todo/add_tests_under_concurrency]]. I've run into intermittent failures with concurrent operations that
were fixed by disabling concurrency. I'll try at some point to isolate reproducible examples of these failures, but they do happen quite consistently on my system.

I understand that git-annex is parallelism-safe in that parallelism does not cause data loss. But things short of data loss, like intermittent failures/deadlocks, are something I still need to work around when using git-annex as a building block in larger automated workflows.

"""]]

@ -55,4 +55,6 @@ designed to interoperate with it.

* [AnnexRemote](https://github.com/Lykos153/AnnexRemote) is a python library for creating new external special remotes.

* [Git::Annex](https://metacpan.org/release/Git-Annex) is a Perl interface to git-annex, including a script `annex-review-unused` for interactively processing git-annex-unused output

See also [[not]] for software that is *not* related to git-annex, but similar.

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 5"
date="2020-02-21T00:07:53Z"
content="""
[done](https://git-annex.branchable.com/todo/Provide_a_way_to_white_list_local_networks___40__not_just_specific_IPs__41__/)
"""]]

@ -0,0 +1,5 @@
Original whining and use case [is buried in the comments to the CVE fix announcement](https://git-annex.branchable.com/security/#comment-7d9435f13213b85896a3424c342c9a1f). In summary: the `http_proxy` environment variable could point to some address within a local network (e.g. `10.0.0.1/24`). `security.allowed-ip-addresses` (or the older `annex.security.allowed-http-addresses`) seems to support only a whitespace-separated list of addresses (too tedious to list for sizeable private networks) or "all" (too much), but not networks.
In the case of `http_proxy` it would also be valuable to be able to limit access to specific (e.g. privileged) port(s) (thus kicking back to `http` from a generic `ip`), to avoid opening access to some malicious user providing URLs on the same private network on some other port. I think the address and port "wishlist" items are related, so I am filing them as a single issue.
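
A minimal Python sketch of the kind of check being requested, using the standard `ipaddress` module. The function name, the rule format with optional ports, and the example values are all hypothetical; nothing here reflects how git-annex parses `security.allowed-ip-addresses` today.

```python
import ipaddress
from typing import Iterable, Optional, Set, Tuple

# Each rule is (network in CIDR notation, allowed ports or None for any port).
Rule = Tuple[str, Optional[Set[int]]]


def is_allowed(address: str, port: int, rules: Iterable[Rule]) -> bool:
    """Return True if address:port matches at least one (network, ports) rule."""
    ip = ipaddress.ip_address(address)
    for cidr, ports in rules:
        # strict=False accepts host-style input such as "10.0.0.1/24"
        # and treats it as the network 10.0.0.0/24.
        network = ipaddress.ip_network(cidr, strict=False)
        if ip.version == network.version and ip in network:
            if ports is None or port in ports:
                return True
    return False


# Hypothetical rule set: a proxy on ports 80/443 inside a private /24.
rules = [("10.0.0.1/24", {80, 443})]

print(is_allowed("10.0.0.17", 443, rules))
print(is_allowed("10.0.0.17", 8080, rules))
print(is_allowed("192.168.1.5", 443, rules))
```

A single CIDR entry covers the whole private network, and the optional port set keeps other services on the same network out of reach.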

[[!meta author=yoh]]
[[!tag projects/datalad]]