Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2023-05-27 13:09:48 -04:00
commit 595adac6ea
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38

View file

@ -0,0 +1,63 @@
### Please describe the problem.
When using `git annex copy` to copy ~4M files to s3 I am running into what appears to be a bug in the library used to access s3.
### What steps will reproduce the problem?
I am using git annex initially through datalad. The problem occurs when running these commands run from inside a clone of a repository that exists as a RIA store on the local file system. The RIA store remote is named `output-storage`. The s3 special remote is named `fcp-indi`. The command producing the error is `git annex copy --all --from output-storage --to fcp-indi -J 9`.
### What version of git-annex are you using? On what operating system?
```
git-annex version: 10.20230330-g98a3ba0ea
build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: linux x86_64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
```
The OS is Centos7 x86_64, with 10 CPUs available and 64GB RAM. Reducing the available RAM will cause the error to happen earlier.
### Please provide any additional information below.
The error can appear as :
```
S3Error {s3StatusCode = Status {statusCode = 500, statusMessage = "Internal Server Error"}, s3ErrorCode = "InternalError", s3ErrorMessage = "We encountered an internal error. Please try again.", s3ErrorResource = Nothing, s3ErrorHostId = Just "[shasum redacted]=", s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
ok
git-annex: out of memory
```
or
```
copy MD5E-s874193--d602a02b0cfebc008f9da8b02ade0114.nii.gz (from output-storage..
.) (checksum...) (to fcp-indi...)
HttpExceptionRequest Request {
host = "fcp-indi.s3.amazonaws.com"
port = 80
secure = False
requestHeaders = [("Date","Wed, 24 May 2023 19:19:47 GMT"),("Content-Ty
pe","application/gzip"),("Authorization","<REDACTED>"),("x-amz-acl","public-rea
d"),("x-amz-storage-class","STANDARD")]
path = "/data/Projects/RBC/HBN_CPAC/MD5E-s874193--d602a02b0cfebc008f9da8b02ade0114.nii.gz"
queryString = ""
method = "PUT"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
proxySecureMode = ProxySecureWithConnect
}
(ConnectionFailure Network.Socket.getAddrInfo (called with preferred socket type/protocol: AddrInfo {addrFlags = [], addrFamily = AF_UNSPEC, addrSocketType = Stream, addrProtocol = 6, addrAddress = 0.0.0.0:0, addrCanonName = Nothing}, host name: Just "fcp-indi.s3.amazonaws.com", service name: Just "80"): does not exist (Temporary failure in name resolution))
failed
```
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Lots of luck, it's a main part of our workflow!