Merge branch 'master' into fromto

Joey Hess 2023-01-23 17:44:44 -04:00
commit 3585481470
GPG key ID: DB12DB0FF05F8F38
7 changed files with 160 additions and 0 deletions

@ -0,0 +1,30 @@
### Please describe the problem.
When I attempt to create an S3 remote against my garage[1] cluster, it errors with the following:
```
$ git annex initremote garage type=S3 encryption=none host=my-s3-endpoint.domain.com protocol=https bucket=git-annex requeststyle=path datacenter=garage signature=v4
initremote garage (checking bucket...) (creating bucket in garage...)
git-annex: S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "AuthorizationHeaderMalformed", s3ErrorMessage = "Authorization header malformed, expected scope: 20230118/my-s3-endpoint.domain.com/s3/aws4_request", s3ErrorResource = Just "/git-annex/", s3ErrorHostId = Nothing, s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
initremote: 1 failed
$ git annex initremote garage type=S3 encryption=none host=my-s3-endpoint.domain.com protocol=https bucket=git-annex requeststyle=path datacenter=garage
initremote garage (checking bucket...) (creating bucket in garage...)
git-annex: S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "InvalidRequest", s3ErrorMessage = "Bad request: Unsupported authorization method", s3ErrorResource = Just "/git-annex/", s3ErrorHostId = Nothing, s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
failed
initremote: 1 failed
```
Garage appears to support v4 signatures: https://garagehq.deuxfleurs.fr/documentation/reference-manual/s3-compatibility/#high-level-features - and other S3 tooling works against the endpoint.
### What version of git-annex are you using? On what operating system?
Fedora Silverblue 37 / git-annex-10.20221212-1.fc37.x86_64
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yes, many years ago - now trying to get it up and running with my self-hosted S3 endpoint.
[1]: https://garagehq.deuxfleurs.fr/

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="jpds"
avatar="http://cdn.libravatar.org/avatar/24d746ec6a7726b162c12ecceb3ee267"
subject="comment 1"
date="2023-01-18T22:57:58Z"
content="""
Error on Garage's side is triggered here: https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/fcc5033466e58e3beec05ee7748d33522b6b32b0/src/api/signature/payload.rs#L297
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="jpds"
avatar="http://cdn.libravatar.org/avatar/24d746ec6a7726b162c12ecceb3ee267"
subject="comment 2"
date="2023-01-19T15:09:01Z"
content="""
I took a look at the credentialv4 structure at https://github.com/aristidb/aws/blob/9bdc4ee018d0d9047c0434eeb21e2383afaa9ccf/Aws/Core.hs#L621 and found it curious that it has the region inside the scope (as the garage code does). However, in my error message from git-annex, the hostname of the S3 service is what's inside the scope instead of the 'garage' region name.
I therefore adjusted the garage API's configuration to have the FQDN as the region and then... git-annex Just Worked.
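
To make the mismatch concrete, here is a minimal shell sketch of how a v4 credential scope is laid out (`<date>/<region>/<service>/aws4_request`). The `credential_scope` helper is hypothetical and only formats the string; it does no signing. The first call reproduces the scope from the error message in the bug report, and the second is what I would assume a server configured with the region `garage` compares against.

```shell
# Hypothetical helper: format a SigV4 credential scope.
# $1 = date (YYYYMMDD), $2 = region, $3 = service
credential_scope() {
    printf '%s/%s/%s/aws4_request\n' "$1" "$2" "$3"
}

# Scope seen in the error message: the hostname sits in the region slot.
credential_scope 20230118 my-s3-endpoint.domain.com s3
# prints 20230118/my-s3-endpoint.domain.com/s3/aws4_request

# Scope built from the region name passed as datacenter=garage (assumption).
credential_scope 20230118 garage s3
# prints 20230118/garage/s3/aws4_request
```

The two strings differ only in the second field, which fits with the request being accepted once the server's region was set to the FQDN.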
"""]]

@ -0,0 +1,43 @@
[[!comment format=mdwn
username="jpds"
avatar="http://cdn.libravatar.org/avatar/24d746ec6a7726b162c12ecceb3ee267"
subject="comment 3"
date="2023-01-19T16:28:19Z"
content="""
I believe the fix for this is:
```
diff --git a/Remote/S3.hs b/Remote/S3.hs
index f5014202e..49f2ebd58 100644
--- a/Remote/S3.hs
+++ b/Remote/S3.hs
@@ -948,8 +948,8 @@ s3Configuration c = cfg
         | otherwise -> AWS.HTTP
     cfg = case getRemoteConfigValue signatureField c of
         Just (SignatureVersion 4) ->
-            S3.s3v4 proto endpoint False S3.SignWithEffort
-        _ -> S3.s3 proto endpoint False
+            S3.s3v4 proto datacenter False S3.SignWithEffort
+        _ -> S3.s3 proto datacenter False
 
 data S3Info = S3Info
     { bucket :: S3.Bucket
```
...however I cannot test it myself right now as it's failing to compile on another bit of code:
```
[452 of 679] Compiling Remote.S3
git/joeyh/git-annex.branchable.com/Remote/S3.hs:922:68: error:
• Couldn't match type B8.ByteString with [Char]
Expected type: String
Actual type: B8.ByteString
• In the first argument of T.pack, namely datacenter
In the second argument of ($), namely T.pack datacenter
In the expression: AWS.s3HostName $ T.pack datacenter
|
922 | | h == AWS.s3DefaultHost = AWS.s3HostName $ T.pack datacenter
| ^^^^^^^^^^
```
"""]]

@ -0,0 +1,26 @@
Hey Joey,
If I understand correctly, the default content expression (when it's empty, e.g. after a `git annex init` or `git clone ...; git annex sync`) is apparently `anything`. This means that a `git annex sync --content` (or just `git annex sync` if `annex.synccontent` is set to `true`) will fetch all files.
It would be very handy if there was something like:
[[!format bash """
git annex config --set annex.defaultwanted ...
git annex config --set annex.defaultgroup ...
git annex config --set annex.defaultgroupwanted ...
git annex config --set annex.defaultrequired ...
# and the corresponding git variant for user-overriding
git config [--global|--system] annex.defaultwanted ...
git config [--global|--system] annex.defaultgroup ...
git config [--global|--system] annex.defaultgroupwanted ...
git config [--global|--system] annex.defaultrequired ...
"""]]
These defaults would be applied when `git annex` initializes a repository (i.e. gives it an `annex.uuid`, e.g. `git annex init` or `git annex sync` of a fresh clone of a repo with annex).
I like my annexed/datalad repos (mostly research data next to analysis code for collaboration) to have `annex.synccontent = true` so people can just run `git annex sync` (after `datalad save`/`git annex add`) and be sure afterwards that everything is in order and safe. However, as the default `wanted` is (apparently) `anything`, they also get all files, which they probably don't want, unless they run `git annex wanted . present` manually (and manual boilerplate config and extra steps are always nice to automate). Something like `git annex config --set annex.defaultwanted present` would solve this.
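
For comparison, this is roughly the boilerplate each collaborator has to run in a fresh clone today, and which a default like `annex.defaultwanted present` would make unnecessary. It is only a sketch: the function name `annex_apply_defaults` is made up for illustration, and it assumes it is run from inside the clone.

```shell
# Hypothetical helper: per-clone setup that currently has to be done by hand.
annex_apply_defaults() {
    # Initialize the clone so it gets an annex.uuid.
    git annex init || return 1
    # Only want content that is already present in this clone.
    git annex wanted . present || return 1
    # Let a plain `git annex sync` also transfer content (local git config).
    git config annex.synccontent true || return 1
}
```

A collaborator would run `annex_apply_defaults` once after `git clone`, before the first `git annex sync`.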
Thanks again very much for git-annex, I love it! 💛
Yann

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="yarikoptic"
avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4"
subject="comment 6"
date="2023-01-18T17:55:49Z"
content="""
FWIW: I also feel that the 2nd one (no effect on a copy that may already be present locally) would be preferable.
"""]]

@ -0,0 +1,35 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2023-01-20T15:11:50Z"
content="""
I've started on an implementation of this, in the `fromto` branch.
Downloading to a local temp file has some complications which make me want
to avoid it if possible. For one thing, these temp files would have to
somehow get cleaned up after an interrupted move. For another, two
concurrent move processes from different remotes to different remotes would
need to either use separate temp files (wasting disk space) or locking so
only one uses the temp file at one time. The existing code in
Annex.Transfer would have to be parameterized with the temp file to use,
but then the transfer log/lock files that are used by that code would be
problematic. So perhaps that Annex.Transfer code could not be reused, but
then it would need to independently deal with resuming, locking, and stall
detection.
So, I'm considering downloading --from the remote as usual, populating the
local annex with the content, sending that --to the remote, and then
dropping the local copy. That has its own complications, but they seem
mostly smaller. There are, though, two small races that I have not been able
to resolve yet, which would cause `git-annex move --from --to`, when
run concurrently with a `git-annex get` type process, to leave the local
copy not present at the end (see [[!commit a46c385aec2584419330c5dbb571c19ceb92f6fb]]).
That would be surprising behavior, but also unlikely to happen.
(And perhaps not too surprising, since running `git-annex move --to`
concurrently with `git-annex get` can of course result in the local copy
not being present at the end..)
The latter approach also has the problem that, when the file is unlocked, the
unlocked file would get populated after downloading the content, which would be
unnecessary work.
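
Done by hand with existing commands, the latter approach looks roughly like the sketch below. This is only an approximation for illustration, not the code in the `fromto` branch; the function name `move_from_to` and its arguments are invented here, and it does nothing about the races described above.

```shell
# Hypothetical helper: move FILE from remote SRC to remote DST via the local annex.
# Usage: move_from_to SRC DST FILE
move_from_to() {
    src=$1 dst=$2 file=$3
    # `git annex find --in=here` lists the file only if its content is present locally.
    if [ -n "$(git annex find --in=here "$file")" ]; then
        # Already present: leave the local copy alone, send a copy on,
        # then drop the content from the source remote.
        git annex copy --to="$dst" "$file" &&
        git annex drop --from="$src" "$file"
    else
        # Not present: pull it into the local annex (removing it from SRC),
        # then push it on to DST, which drops the local copy again.
        git annex move --from="$src" "$file" &&
        git annex move --to="$dst" "$file"
    fi
}
```

For example, `move_from_to origin backup data.bin` would be the manual counterpart of `git-annex move --from origin --to backup data.bin`.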
"""]]