Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2023-12-20 12:52:46 -04:00
commit da5726e790
GPG key ID: DB12DB0FF05F8F38
4 changed files with 159 additions and 0 deletions


@ -777,3 +777,7 @@ backups, where it gives structure to my image-based backup routines, so you coul
say I'm a believer. :)
[[!meta author=jkniiv]]
### Update 20 Dec 2023
P.S. This is a bit embarrassing, but I found out this is [[notabug|done]], cf. my comment --[[jkniiv]]


@ -0,0 +1,67 @@
[[!comment format=mdwn
username="jkniiv"
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
subject="my report was actually a User Failure on my part"
date="2023-12-19T23:02:15Z"
content="""
Uh-oh, after adding the option `--test-debug` to the `test` subcommand I got a lead on
the real culprit and it wasn't git-annex but libmagic:
[[!format sh \"\"\"
[...snip...]
ok
[2023-12-19 22:52:17.5370274] (Utility.Process) process [21636] done ExitSuccess
[2023-12-19 22:52:17.5460328] (Utility.Process) process [10460] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
athspecs\",\"-c\",\"annex.debug=true\",\"ls-files\",\"-z\",\"--modified\",\"--\",\"foo\"]
[2023-12-19 22:52:17.5970346] (Utility.Process) process [10460] done ExitSuccess
[2023-12-19 22:52:17.6030281] (Utility.Process) process [9208] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa
thspecs\",\"-c\",\"annex.debug=true\",\"diff\",\"--name-only\",\"--diff-filter=T\",\"-z\",\"--cached\",\"--\",\"foo\"]
[2023-12-19 22:52:17.6550242] (Utility.Process) process [9208] done ExitSuccess
(recording state in git...)
[2023-12-19 22:52:17.6650254] (Utility.Process) process [24064] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
[2023-12-19 22:52:17.7090277] (Utility.Process) process [24064] done ExitSuccess
[2023-12-19 22:52:17.7240248] (Utility.Process) process [22248] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
[2023-12-19 22:52:17.7700259] (Utility.Process) process [22248] done ExitSuccess
[2023-12-19 22:52:17.7780282] (Utility.Process) process [23188] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
athspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
[2023-12-19 22:52:17.8270228] (Utility.Process) process [23188] done ExitSuccess
[2023-12-19 22:52:17.8340274] (Utility.Process) process [17648] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
[2023-12-19 22:52:17.8420331] (Utility.Process) process [21928] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
athspecs\",\"-c\",\"annex.debug=true\",\"diff-index\",\"--raw\",\"-z\",\"-r\",\"--no-renames\",\"-l0\",\"--cached\",\"refs/heads/git-annex\",
\"--\"]
[2023-12-19 22:52:17.8980416] (Utility.Process) process [21928] done ExitSuccess
[2023-12-19 22:52:17.9060216] (Utility.Process) process [17648] done ExitSuccess
[2023-12-19 22:52:17.920024] (Utility.Process) process [18680] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa
thspecs\",\"-c\",\"annex.debug=true\",\"write-tree\"]
[2023-12-19 22:52:17.982027] (Utility.Process) process [18680] done ExitSuccess
[2023-12-19 22:52:17.9890276] (Utility.Process) process [10892] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
athspecs\",\"-c\",\"annex.debug=true\",\"commit-tree\",\"bb68ce15d91f27eec51749b9481ede5cc6bcd190\",\"--no-gpg-sign\",\"-p\",\"refs/he
ads/git-annex\"]
[2023-12-19 22:52:18.0490233] (Utility.Process) process [10892] done ExitSuccess
[2023-12-19 22:52:18.0570304] (Utility.Process) process [6552] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa
thspecs\",\"-c\",\"annex.debug=true\",\"update-ref\",\"refs/heads/git-annex\",\"71b9bb13e4244bd2233de7ca7cf7fc01011f1e62\"]
[2023-12-19 22:52:18.1170343] (Utility.Process) process [6552] done ExitSuccess
[2023-12-19 22:52:18.1300416] (Utility.Process) process [10116] done ExitSuccess
[2023-12-19 22:52:18.1350241] (Utility.Process) process [20976] done ExitSuccess
[2023-12-19 22:52:18.1400231] (Utility.Process) process [9880] done ExitSuccess
[2023-12-19 22:52:18.1450249] (Utility.Process) process [872] done ExitSuccess
C:\Users\jkniiv\AppData\Local/.magic/magic.mgc, 1: Warning: offset `∟♦▲FAIL (2.36s)
.\\Test\\Framework.hs:83:
add with SHA1 failed with unexpected exit code
Use -p '(/Init Tests.add/||/Init Tests/)&&/add/' to rerun this test only.
1 out of 2 tests failed (8.86s)
git-annex: thread blocked indefinitely in an STM transaction
\"\"\"]]
So the real error turned out to be a user failure of mine: libmagic (or the msys2 package `mingw-w64-x86_64-file`)
had a recent update and the new library didn't like my previous magic database located in the fallback location
`%localappdata%\.magic\magic.mgc`. By copying the msys2 file `/mingw64/share/misc/magic.mgc` to the aforementioned
location, the whole issue cleared itself and this report became moot.
"""]]
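
The fix described in the comment above can be sketched as a small shell helper. This is only an illustration, not something from the original report: the function name `refresh_magic_db` and the `.bak` backup step are assumptions, and the paths in the usage comment are the ones mentioned in the comment.

```shell
# Hypothetical helper: replace a stale libmagic database with a fresh copy,
# keeping the previous file next to it as a .bak backup.
refresh_magic_db() {
    src=$1
    dest=$2
    # Refuse to touch the destination if there is nothing to copy.
    [ -f "$src" ] || { echo "source not found: $src" >&2; return 1; }
    mkdir -p "$(dirname "$dest")"
    if [ -f "$dest" ]; then
        mv "$dest" "$dest.bak"
    fi
    cp "$src" "$dest"
}

# Possible usage from an msys2 shell (assumes LOCALAPPDATA is set there):
#   refresh_magic_db /mingw64/share/misc/magic.mgc "$LOCALAPPDATA/.magic/magic.mgc"
```

Keeping the old database as a backup makes it easy to compare the two files or roll back if the new one turns out not to be the cause.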


@ -0,0 +1,66 @@
### Please describe the problem.
I'm not sure if what I am experiencing is a bug, or just something I am doing incorrectly.
I am running into an issue with git-annex-import where it seems to stall and then uses all the RAM it can get until the system terminates the process.
### What steps will reproduce the problem?
Here are my prep steps:
```sh
git init
git annex initremote s3-data type=S3 encryption=none port=443 protocol=https public=no \
importtree=yes versioning=yes host=$S3HOST bucket=$BUCKET fileprefix=primary_folder/
git annex wanted s3-data "exclude=subfolder-*/* and include=specialfilename1.*"
git annex import main --from s3-data --skip-duplicates --backend MD5E --jobs=4
```
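
To sanity-check which paths the `wanted` expression in the prep steps is meant to select, the two globs can be approximated with shell `case` patterns. This is a rough sketch only: the function name `is_wanted` is made up, paths are assumed to be relative to `fileprefix`, and git-annex's own glob matching may differ in details from the shell's.

```shell
# Hypothetical approximation of:
#   exclude=subfolder-*/* and include=specialfilename1.*
# Returns 0 (true) when a path would be wanted, 1 otherwise.
is_wanted() {
    # exclude=subfolder-*/* : anything under a subfolder-* directory is rejected
    case $1 in
        subfolder-*/*) return 1 ;;
    esac
    # include=specialfilename1.* : only these names are accepted
    case $1 in
        specialfilename1.*) return 0 ;;
    esac
    return 1
}

# Print the verdict for a few example paths.
for path in specialfilename1.json specialfilename1.csv subfolder-100101/sessions.json; do
    if is_wanted "$path"; then
        echo "wanted:   $path"
    else
        echo "excluded: $path"
    fi
done
```

Under these assumptions only the two `specialfilename1.*` files should come out as wanted, which is the expectation the rest of this report is based on.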
### What version of git-annex are you using? On what operating system?
Originally, 10.20230321 on Debian Bookworm
I also tried 10.20231129 on Debian Bookworm with the same results
### Please provide any additional information below.
There are around 22000 files under the prefix I am trying to import from, amounting to around 115 GB. However, most of that data belongs to many separate subdatasets underneath this one, all of which have worked fine without any issue.
There are only 2 files I am actually trying to import, though there are several versions of each at this location (about 70 each, for a total of 140). In this example, they are `specialfilename1.json` and `specialfilename1.csv`.
When I use debug mode on the import command, a lot of information is printed, but it mostly seems to amount to filenames that I would expect to be excluded by my `git-annex-wanted` command. The output looks like the following until it stops; the process then uses up the available RAM before inevitably being terminated.
```
[2023-12-18 23:47:54.924814306] (Utility.Process) process [11787] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2023-12-18 23:47:54.929719575] (Utility.Process) process [11787] done ExitSuccess
[2023-12-18 23:47:54.930053775] (Utility.Process) process [11789] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2023-12-18 23:47:54.935010757] (Utility.Process) process [11789] done ExitSuccess
[2023-12-18 23:47:54.935508638] (Utility.Process) process [11790] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..a6d7c6ae03747e23c2bedbecc8d1a5afeabe5220","--pretty=%H","-n1"]
[2023-12-18 23:47:54.940887356] (Utility.Process) process [11790] done ExitSuccess
[2023-12-18 23:47:54.941241371] (Utility.Process) process [11791] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..27a0dedea083605106c614a159d48fd3daa92284","--pretty=%H","-n1"]
[2023-12-18 23:47:54.947198539] (Utility.Process) process [11791] done ExitSuccess
[2023-12-18 23:47:54.949236306] (Utility.Process) process [11792] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
[2023-12-18 23:47:54.971233148] (Utility.Process) process [11793] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/remotes/s3-data/main"]
[2023-12-18 23:47:54.97613841] (Utility.Process) process [11793] done ExitFailure 1
...
String to sign: "GET\n\n\nMon, 18 Dec 2023 23:47:57 GMT\n/bucketname/?versions"
[2023-12-18 23:47:57.978207613] (Remote.S3) Host: "bucketname.s3-us-east-1.amazonaws.com"
[2023-12-18 23:47:57.978237701] (Remote.S3) Path: "/"
[2023-12-18 23:47:57.978260264] (Remote.S3) Query string: "versions&key-marker=primary_folder%subfolder-100101%2sessions.json&prefix=primary_folder%2F&version-id-marker=Xg1KUaCh6tpvJ2E1juz4qobn.w3.x9k"
[2023-12-18 23:47:57.978337803] (Remote.S3) Header: [("Date","Mon, 18 Dec 2023 23:47:57 GMT"),("Authorization","AWS [Redacted]")]
[2023-12-18 23:47:58.003376688] (Remote.S3) Response status: Status {statusCode = 200, statusMessage = "OK"}
[2023-12-18 23:47:58.00343782] (Remote.S3) Response header 'Transfer-Encoding': 'chunked'
[2023-12-18 23:47:58.003472907] (Remote.S3) Response header 'x-amz-request-id': 'tx00000925b31abf8c9c162-006580da2d-19170577-default'
[2023-12-18 23:47:58.003527331] (Remote.S3) Response header 'Content-Type': 'application/xml'
[2023-12-18 23:47:58.003557859] (Remote.S3) Response header 'Date': 'Mon, 18 Dec 2023 23:47:58 GMT
```
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
In the past, on smaller versions of data structured the same way, this setup has worked, and I don't run into this issue.
I'm not exactly sure how to troubleshoot further and I am feeling stuck. Is there something else I can be doing to see more info about what's happening behind the scenes?


@ -0,0 +1,22 @@
[[!comment format=mdwn
username="unqueued"
avatar="http://cdn.libravatar.org/avatar/3bcbe0c9e9825637ad7efa70f458640d"
subject="comment 6"
date="2023-12-19T18:50:08Z"
content="""
Whoa, thanks for implementing that Joey! Can't wait to give it a try!
FYI, one of the cases I was talking about before, where I repeatedly import keys in MD5E format, is that I construct an annex repo, set web urls, and deal with mirroring further down the pipeline.
Code isn't great, just something I threw together years ago:
https://github.com/unqueued/annex-drive-share
Because it is gdrive, I can get MD5s and filenames with rclone urls for web remotes.
The way I use it is to init a new annex repo (reusing the same uuid), and then absorb into primary downstream repo overwriting filenames and letting sync update any new keys with the primary repo. Considered using subtree.
It does end up causing merge commits to build up in the git-annex branch, but I might want to run this on a server without sharing an entire repo.
It happens to make sense for me because I have an unlimited @edu gdrive account and it can work great for some workflows as an intermediate file store.
"""]]
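
As a rough illustration of the workflow described in the comment above, an MD5E key can be assembled from an MD5, a size, and a filename obtained elsewhere (for instance from a remote listing). This sketch is not taken from the linked repository: the helper name `make_md5e_key` is hypothetical, and the extension handling is deliberately simplified to "everything after the last dot", whereas git-annex applies its own rules (see `annex.maxextensionlength`).

```shell
# Hypothetical helper: build a key of the form MD5E-s<size>--<md5><.ext>
# from an MD5 checksum, a size in bytes, and a filename.
make_md5e_key() {
    md5=$1
    size=$2
    name=$3
    base=${name##*/}              # drop any leading directories
    case $base in
        *.*) ext=.${base##*.} ;;  # simplified: keep only the last extension
        *)   ext= ;;
    esac
    printf 'MD5E-s%s--%s%s\n' "$size" "$md5" "$ext"
}

# Example: the MD5 of an empty file, size 0.
make_md5e_key d41d8cd98f00b204e9800998ecf8427e 0 docs/empty.txt
```

A key built this way could then be handed to commands such as `git annex fromkey` and `git annex registerurl` to create the file entry and attach its web url, assuming the checksum and size really match the remote content.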