diff --git a/doc/bugs/S3-remote_fail_to_get_data_on_some_clones__47__machines.mdwn b/doc/bugs/S3-remote_fail_to_get_data_on_some_clones__47__machines.mdwn
new file mode 100644
index 0000000000..06fec3b320
--- /dev/null
+++ b/doc/bugs/S3-remote_fail_to_get_data_on_some_clones__47__machines.mdwn
@@ -0,0 +1,67 @@
+### Please describe the problem.
+
+I created a dataset with datalad and added multiple remotes/buckets on a private S3-like server (minio).
+On the HPC I am able to get the data and push it.
+On another machine, I cannot get the data; here is the debug info.
+
+
+$ git-annex get -v -d --from s3unf.phantom.anat.mri sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz
+[2020-02-21 16:20:09.399086] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
+[2020-02-21 16:20:09.405552] process done ExitSuccess
+[2020-02-21 16:20:09.405884] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
+[2020-02-21 16:20:09.411446] process done ExitSuccess
+[2020-02-21 16:20:09.413223] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","log","refs/heads/git-annex..050f99f2712d25b581b58a3adc6af83b8d5345b0","--pretty=%H","-n1"]
+[2020-02-21 16:20:09.419815] process done ExitSuccess
+[2020-02-21 16:20:09.420835] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
+[2020-02-21 16:20:09.421976] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
+[2020-02-21 16:20:09.456813] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","symbolic-ref","-q","HEAD"]
+[2020-02-21 16:20:09.462354] process done ExitSuccess
+[2020-02-21 16:20:09.462577] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","refs/heads/master"]
+[2020-02-21 16:20:09.468639] process done ExitSuccess
+[2020-02-21 16:20:09.468923] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--cached","-z","--","sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz"]
+get sub-01/ses-ana001/anat/sub-01_ses-ana001_T1w.nii.gz (from s3unf.phantom.anat.mri...)
+[2020-02-21 16:20:09.585446] String to sign: "GET\n\n\nFri, 21 Feb 2020 21:20:09 GMT\n/phantom.anat.mri/SHA256E-s28509481-S1073741824-C1--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
+[2020-02-21 16:20:09.585844] Host: ""
+[2020-02-21 16:20:09.586001] Path: "/phantom.anat.mri/SHA256E-s28509481-S1073741824-C1--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
+[2020-02-21 16:20:09.586261] Query string: ""
+[2020-02-21 16:20:09.586423] Header: [("Date","Fri, 21 Feb 2020 21:20:09 GMT"),("Authorization","AWS manager:k/aaaaaaaaaaaaaaaaaaaaaa=")]
+[2020-02-21 16:20:09.593542] Response metadata: S3: request ID=, x-amz-id-2=
+[2020-02-21 16:20:09.594433] String to sign: "GET\n\n\nFri, 21 Feb 2020 21:20:09 GMT\n/phantom.anat.mri/SHA256E-s28509481--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
+[2020-02-21 16:20:09.594667] Host: ""
+[2020-02-21 16:20:09.594819] Path: "/phantom.anat.mri/SHA256E-s28509481--7eb345c88a120d934cb390ad0385149cb6ae8540a2ed6689251cba22b7832306.nii.gz"
+[2020-02-21 16:20:09.595006] Query string: ""
+[2020-02-21 16:20:09.595153] Header: [("Date","Fri, 21 Feb 2020 21:20:09 GMT"),("Authorization","AWS manager:wwwwwwwwwwwwwwwwwww ")]
+[2020-02-21 16:20:09.596938] Response metadata: S3: request ID=, x-amz-id-2=
+failed
+[2020-02-21 16:20:09.600838] process done ExitSuccess
+[2020-02-21 16:20:09.601651] process done ExitSuccess
+git-annex: get: 1 failed
+
+
+When running that command, no request is sent to the server (nothing shows up in the minio trace logs), and I don't know why.
+
+
+
+### What steps will reproduce the problem?
+
+
+
+### What version of git-annex are you using? On what operating system?
+
+latest package in conda-forge on ubuntu
+
+
+### Please provide any additional information below.
+
+[[!format sh """
+# If you can, paste a complete transcript of the problem occurring here.
+# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
+
+
+# End of transcript or log.
+"""]]
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+
+Yes.
+
diff --git a/doc/bugs/__39__init_--version__61__N__39___can_land_on_unsupported_N+1.mdwn b/doc/bugs/__39__init_--version__61__N__39___can_land_on_unsupported_N+1.mdwn
new file mode 100644
index 0000000000..035ca3c76c
--- /dev/null
+++ b/doc/bugs/__39__init_--version__61__N__39___can_land_on_unsupported_N+1.mdwn
@@ -0,0 +1,51 @@
+I expected `git annex init --version=N`, where N is a version in
+`autoUpgradeableVersions`, to auto-upgrade in a way that lands on a
+supported version. For example, with a git-annex built from a recent
+master (835a214de), `supportedVersions` contains only 8, and I
+expected
+
+    git init && git annex init --version=6
+
+to result in a v8 repo. Instead it lands on v7:
+
+    git config annex.version # => 7
+
+Shouldn't the auto-upgrade mapping by `init` be carried all the way
+through until a supported version is reached?
+
+I was able to get the behavior I expected with the change below.
+
+[[!format diff """
+diff --git a/Command/Init.hs b/Command/Init.hs
+index db6cb14fb..ad7ed8962 100644
+--- a/Command/Init.hs
++++ b/Command/Init.hs
+@@ -36,11 +36,15 @@ optParser desc = InitOptions
+ parseRepoVersion :: Monad m => String -> m RepoVersion
+ parseRepoVersion s = case RepoVersion <$> readish s of
+     Nothing -> fail $ "version parse error"
+-    Just v
+-        | v `elem` supportedVersions -> return v
+-        | otherwise -> case M.lookup v autoUpgradeableVersions of
+-            Just v' -> return v'
+-            Nothing -> fail $ s ++ " is not a currently supported repository version"
++    Just v -> case check v of
++        Nothing -> fail $ s ++ " is not a currently supported repository version"
++        Just supported -> return supported
++  where
++    check v'
++        | v' `elem` supportedVersions = Just v'
++        | otherwise = case M.lookup v' autoUpgradeableVersions of
++            Just v'' -> check v''
++            Nothing -> Nothing
+ 
+ seek :: InitOptions -> CommandSeek
+ seek = commandAction . start
+"""]]
+
+I'm quite confident that my attempt above is not pretty, but if I got
+lucky and it's passable Haskell, I'd be happy to send a patch.
+
+
+[[!meta author=kyle]]
+[[!tag projects/datalad]]
diff --git a/doc/forum/WrongRequestBodyStreamSize_exporting_large_file_to_S3.mdwn b/doc/forum/WrongRequestBodyStreamSize_exporting_large_file_to_S3.mdwn
new file mode 100644
index 0000000000..1a93839d75
--- /dev/null
+++ b/doc/forum/WrongRequestBodyStreamSize_exporting_large_file_to_S3.mdwn
@@ -0,0 +1,30 @@
+I've been trying out exporting to S3 with Wasabi. It has worked for some files, but for one large file I had to set a partsize value, otherwise the upload would be rejected.
+
+Even with a partsize value, the upload still fails though. It gets right to the end and then fails with `WrongRequestBodyStreamSize`.
+
+```
+export wasabi 2020-01-24/guix_data_service_full.dump
+100% 7.31 GiB 45 MiB/s 0s
+  HttpExceptionRequest Request {
+    host = "data-guix-gnu-org-database-dumps.s3.wasabisys.com"
+    port = 443
+    secure = True
+    requestHeaders = [("Date","Sat, 22 Feb 2020 08:49:58 GMT"),("Authorization","")]
+    path = "/2020-01-24/guix_data_service_full.dump"
+    queryString = "partNumber=78&uploadId=MaRGWhxX1uZZ12IdGSSsc3rinnMgmKMfOcXRPuSo1zxzCviuBR37gxh_4gAmtvOmiRyqJjCMfgO4zhqKOB2Fi8WPLE2b7Iix-2gXkXgcIvLVlijoixwRS-UcllhHhG38"
+    method = "PUT"
+    proxy = Nothing
+    rawBody = False
+    redirectCount = 10
+    responseTimeout = ResponseTimeoutDefault
+    requestVersion = HTTP/1.1
+  }
+   (WrongRequestBodyStreamSize 31419314 29814466)
+failed
+```
+
+I've tried partsizes from 100MB to 5GB, and I'm on a fairly recent version of git-annex: 7.20200219. I'm not sure whether this is an issue in git-annex or in its aws or http-client dependencies.
+
+Thanks,
+
+Chris
diff --git a/doc/forum/locks_during_--batch_operations/comment_2_b19ed6657064c11bf54673f1b2ff4860._comment b/doc/forum/locks_during_--batch_operations/comment_2_b19ed6657064c11bf54673f1b2ff4860._comment
new file mode 100644
index 0000000000..4cee1a00a6
--- /dev/null
+++ b/doc/forum/locks_during_--batch_operations/comment_2_b19ed6657064c11bf54673f1b2ff4860._comment
@@ -0,0 +1,20 @@
+[[!comment format=mdwn
+ username="Ilya_Shlyakhter"
+ avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
+ subject="locking granularity"
+ date="2020-02-20T22:35:33Z"
+ content="""
+Joey,
+
+I very much value all the work/thought you've put into making git-annex robust, starting with choosing Haskell.
+
+As to why the question keeps coming up...
+
+I often find myself wanting to use git-annex in ways that seem non-standard to me, so it's possible the usage pattern wasn't planned/optimized/tested for.
+E.g. with `git-annex-get --batch` the typical usage would be to feed a large batch of keys at once, and to not have other git-annex processes running at the time.
+The git-annex test suite [[does not test under concurrency|todo/add_tests_under_concurrency]]. I've run into intermittent failures with concurrent operations that
+went away when I disabled concurrency. I'll try at some point to isolate reproducible examples of these failures, but they do happen quite consistently on my system.
+
+I understand that git-annex is parallelism-safe in that parallelism does not cause data loss. But things short of data loss, like intermittent failures/deadlocks, are something I still need to work around when using git-annex as a building block in larger automated workflows.
+
+"""]]
diff --git a/doc/related_software.mdwn b/doc/related_software.mdwn
index b067e8df99..551908e94d 100644
--- a/doc/related_software.mdwn
+++ b/doc/related_software.mdwn
@@ -55,4 +55,6 @@ designed to interoperate with it.
 * [AnnexRemote](https://github.com/Lykos153/AnnexRemote) is a python library
   for creating new external special remotes.
 
+* [Git::Annex](https://metacpan.org/release/Git-Annex) is a Perl interface to git-annex, including a script `annex-review-unused` for interactively processing git-annex-unused output.
+
 See also [[not]] for software that is *not* related to git-annex, but similar.
diff --git a/doc/security/comment_5_616b08553430a64beab196571c5b9cdc._comment b/doc/security/comment_5_616b08553430a64beab196571c5b9cdc._comment
new file mode 100644
index 0000000000..0bc9fba814
--- /dev/null
+++ b/doc/security/comment_5_616b08553430a64beab196571c5b9cdc._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="yarikoptic"
+ avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
+ subject="comment 5"
+ date="2020-02-21T00:07:53Z"
+ content="""
+[done](https://git-annex.branchable.com/todo/Provide_a_way_to_white_list_local_networks___40__not_just_specific_IPs__41__/)
+"""]]
diff --git a/doc/todo/Provide_a_way_to_white_list_local_networks___40__not_just_specific_IPs__41__.mdwn b/doc/todo/Provide_a_way_to_white_list_local_networks___40__not_just_specific_IPs__41__.mdwn
new file mode 100644
index 0000000000..0cf990e830
--- /dev/null
+++ b/doc/todo/Provide_a_way_to_white_list_local_networks___40__not_just_specific_IPs__41__.mdwn
@@ -0,0 +1,5 @@
+Original whining and use case [is buried in the comments to the CVE fix announcement](https://git-annex.branchable.com/security/#comment-7d9435f13213b85896a3424c342c9a1f). In summary: the `http_proxy` environment variable could point to some address within a local network (e.g. `10.0.0.1/24`). `annex.security.allowed-ip-addresses` (or the older `annex.security.allowed-http-addresses`) seems to support only a whitespace-separated list of addresses (too tedious to list for sizeable private networks) or "all" (too much); it does not support networks.
+In the case of `http_proxy` it would also be valuable to be able to limit access to specific (e.g. privileged) port(s) (thus kicking back to `http` from a generic `ip`), to avoid opening access to some malicious user providing URLs on the same private network on some other port. I think the address and port "wishlist" items are related, so I am filing them in a single issue.
+
+[[!meta author=yoh]]
+[[!tag projects/datalad]]
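The todo item asks for two things: allowlisting whole networks and optionally restricting ports. Both reduce to a CIDR membership test plus a port check. The Haskell below is a minimal sketch of that matching logic under stated assumptions. It is not git-annex's implementation and implies no git-annex configuration syntax. The names `AllowEntry`, `parseCIDR`, `mkEntry` and `allowed` are hypothetical. The sketch handles IPv4 only. Attaching an optional port list to each network entry is this sketch's own assumption about how the port wishlist could be modelled.

```haskell
import Data.Bits (complement, shiftL, (.&.))
import Data.Word (Word32)
import Text.Read (readMaybe)

-- Hypothetical allowlist entry: an IPv4 network (address, prefix length)
-- plus an optional list of permitted ports (Nothing = any port).
data AllowEntry = AllowEntry (Word32, Int) (Maybe [Int])

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Parse a dotted-quad IPv4 address such as "10.0.0.1" into a Word32.
parseIPv4 :: String -> Maybe Word32
parseIPv4 s = do
  octets <- mapM readOctet (splitOn '.' s)
  case octets of
    [_, _, _, _] -> Just (foldl (\acc o -> acc `shiftL` 8 + o) 0 octets)
    _            -> Nothing
  where
    readOctet t = case readMaybe t :: Maybe Int of
      Just n | n >= 0 && n <= 255 -> Just (fromIntegral n)
      _                           -> Nothing

-- Parse "a.b.c.d/n"; a bare address is treated as a /32 network.
parseCIDR :: String -> Maybe (Word32, Int)
parseCIDR s = case break (== '/') s of
  (ip, '/' : len) -> do
    addr <- parseIPv4 ip
    n <- readMaybe len
    if n >= 0 && n <= 32 then Just (addr, n) else Nothing
  (ip, _) -> fmap (\addr -> (addr, 32)) (parseIPv4 ip)

-- Network mask for a prefix length; /0 is special-cased to avoid a 32-bit shift.
maskFor :: Int -> Word32
maskFor 0 = 0
maskFor n = complement 0 `shiftL` (32 - n)

-- True when addr falls inside the given network.
inNetwork :: (Word32, Int) -> Word32 -> Bool
inNetwork (net, len) addr = addr .&. m == net .&. m
  where
    m = maskFor len

-- Build an entry from CIDR text and an optional port list.
mkEntry :: String -> Maybe [Int] -> Maybe AllowEntry
mkEntry cidr ports = fmap (\net -> AllowEntry net ports) (parseCIDR cidr)

-- Is a connection to ip:port permitted by any entry?
allowed :: [AllowEntry] -> String -> Int -> Bool
allowed entries ip port = maybe False permits (parseIPv4 ip)
  where
    permits addr = any (matches addr) entries
    matches addr (AllowEntry net ports) =
      inNetwork net addr && maybe True (port `elem`) ports
```

Take an entry built with `mkEntry "10.0.0.0/24" (Just [3128])`. With it, `allowed` accepts `10.0.0.7` on port 3128. It rejects the same host on port 8080, and it rejects `10.0.1.7` on any port. Keeping "network" and "port list" as separate parts of one entry mirrors the two wishlist items. A real implementation would also need IPv6 support and a decision about how such entries are spelled in configuration.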