From 19edfc69a79f42aa1ac8c285f713abd5846be58e Mon Sep 17 00:00:00 2001 From: lemondata Date: Tue, 19 Dec 2023 00:17:00 +0000 Subject: [PATCH 1/4] --- ...ort_stalls_and_uses_all_ram_available.mdwn | 66 +++++++++++++++++++ 1 file changed, 66 insertions(+) create mode 100644 doc/bugs/git-annex-import_stalls_and_uses_all_ram_available.mdwn diff --git a/doc/bugs/git-annex-import_stalls_and_uses_all_ram_available.mdwn b/doc/bugs/git-annex-import_stalls_and_uses_all_ram_available.mdwn new file mode 100644 index 0000000000..3efe9ba570 --- /dev/null +++ b/doc/bugs/git-annex-import_stalls_and_uses_all_ram_available.mdwn @@ -0,0 +1,66 @@ +### Please describe the problem. + +I'm not sure whether what I am experiencing is a bug or just something I am doing incorrectly. + +I am running into an issue with git-annex-import where it seems to stall and use all the RAM it can until the system terminates the process. + +### What steps will reproduce the problem? + +Here are my prep steps: + +```sh +git init + +git annex initremote s3-data type=S3 encryption=none port=443 protocol=https public=no \ +importtree=yes versioning=yes host=$S3HOST bucket=$BUCKET fileprefix=primary_folder/ + +git annex wanted s3-data "exclude=subfolder-*/* and include=specialfilename1.*" + +git annex import main --from s3-data --skip-duplicates --backend MD5E --jobs=4 +``` + +### What version of git-annex are you using? On what operating system? + +Originally, 10.20230321 on Debian Bookworm. + +I also tried 10.20231129 on Debian Bookworm, with the same results. + +### Please provide any additional information below. + +There are around 22000 files under the prefix I am trying to import from, amounting to around 115 GB. However, most of that data belongs to many separate subdatasets underneath this one. These have all worked fine, without any issue. + +There are only 2 files I am actually trying to import, though each has many versions at this location (about 70 each, 140 in total).
In this example, those are `specialfilename1.json` and `specialfilename1.csv`. + +When I run the import command in debug mode, a lot of information is printed, but most of it seems to be filenames that I would expect to be excluded by my `git-annex-wanted` expression. The output looks like the following until it stops; the process then consumes all available RAM until it is inevitably terminated. + +``` +[2023-12-18 23:47:54.924814306] (Utility.Process) process [11787] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"] +[2023-12-18 23:47:54.929719575] (Utility.Process) process [11787] done ExitSuccess +[2023-12-18 23:47:54.930053775] (Utility.Process) process [11789] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"] +[2023-12-18 23:47:54.935010757] (Utility.Process) process [11789] done ExitSuccess +[2023-12-18 23:47:54.935508638] (Utility.Process) process [11790] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..a6d7c6ae03747e23c2bedbecc8d1a5afeabe5220","--pretty=%H","-n1"] +[2023-12-18 23:47:54.940887356] (Utility.Process) process [11790] done ExitSuccess +[2023-12-18 23:47:54.941241371] (Utility.Process) process [11791] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..27a0dedea083605106c614a159d48fd3daa92284","--pretty=%H","-n1"] +[2023-12-18 23:47:54.947198539] (Utility.Process) process [11791] done ExitSuccess +[2023-12-18 23:47:54.949236306] (Utility.Process) process [11792] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"] +[2023-12-18 23:47:54.971233148] (Utility.Process) process [11793] read: git
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/remotes/s3-data/main"] +[2023-12-18 23:47:54.97613841] (Utility.Process) process [11793] done ExitFailure 1 +... + +String to sign: "GET\n\n\nMon, 18 Dec 2023 23:47:57 GMT\n/bucketname/?versions" +[2023-12-18 23:47:57.978207613] (Remote.S3) Host: "bucketname.s3-us-east-1.amazonaws.com" +[2023-12-18 23:47:57.978237701] (Remote.S3) Path: "/" +[2023-12-18 23:47:57.978260264] (Remote.S3) Query string: "versions&key-marker=primary_folder%subfolder-100101%2sessions.json&prefix=primary_folder%2F&version-id-marker=Xg1KUaCh6tpvJ2E1juz4qobn.w3.x9k" +[2023-12-18 23:47:57.978337803] (Remote.S3) Header: [("Date","Mon, 18 Dec 2023 23:47:57 GMT"),("Authorization","AWS [Redacted]")] +[2023-12-18 23:47:58.003376688] (Remote.S3) Response status: Status {statusCode = 200, statusMessage = "OK"} +[2023-12-18 23:47:58.00343782] (Remote.S3) Response header 'Transfer-Encoding': 'chunked' +[2023-12-18 23:47:58.003472907] (Remote.S3) Response header 'x-amz-request-id': 'tx00000925b31abf8c9c162-006580da2d-19170577-default' +[2023-12-18 23:47:58.003527331] (Remote.S3) Response header 'Content-Type': 'application/xml' +[2023-12-18 23:47:58.003557859] (Remote.S3) Response header 'Date': 'Mon, 18 Dec 2023 23:47:58 GMT +``` + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +In the past, on smaller versions of data structured the same way, this setup worked and I didn't run into this issue. + +I'm not exactly sure how to troubleshoot further, and I am feeling stuck. Is there something else I can do to see more information about what's happening behind the scenes?
From 09acfef0b616ebad7c7d3be27fd4547494b95e3e Mon Sep 17 00:00:00 2001 From: unqueued Date: Tue, 19 Dec 2023 18:50:08 +0000 Subject: [PATCH 2/4] Added a comment --- ..._3db8d28e13f6508e22e8488a6faaa8b2._comment | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 doc/forum/Revisiting_migration_and_multiple_keys/comment_6_3db8d28e13f6508e22e8488a6faaa8b2._comment diff --git a/doc/forum/Revisiting_migration_and_multiple_keys/comment_6_3db8d28e13f6508e22e8488a6faaa8b2._comment b/doc/forum/Revisiting_migration_and_multiple_keys/comment_6_3db8d28e13f6508e22e8488a6faaa8b2._comment new file mode 100644 index 0000000000..83a860c358 --- /dev/null +++ b/doc/forum/Revisiting_migration_and_multiple_keys/comment_6_3db8d28e13f6508e22e8488a6faaa8b2._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="unqueued" + avatar="http://cdn.libravatar.org/avatar/3bcbe0c9e9825637ad7efa70f458640d" + subject="comment 6" + date="2023-12-19T18:50:08Z" + content=""" +Whoa, thanks for implementing that, Joey! Can't wait to give it a try! + + +FYI, one of the cases I mentioned before, where I repeatedly import keys in MD5E format, is one where I construct an annex repo, set web URLs, and deal with mirroring further down the pipeline. + +The code isn't great, just something I threw together years ago: +https://github.com/unqueued/annex-drive-share + +Because it is gdrive, I can get MD5s and filenames with rclone URLs for web remotes. + +The way I use it is to init a new annex repo (reusing the same uuid), and then absorb it into the primary downstream repo, overwriting filenames and letting sync update any new keys with the primary repo. I considered using subtree. + +It does end up causing merge commits to build up in the git-annex branch, but I might want to run this on a server without sharing an entire repo. + +It happens to make sense for me because I have an unlimited @edu gdrive account, and it works great for some workflows as an intermediate file store.
+"""]] From 191dde2857caa0823da343c330469237e0e331c7 Mon Sep 17 00:00:00 2001 From: jkniiv Date: Tue, 19 Dec 2023 23:02:16 +0000 Subject: [PATCH 3/4] Added a comment: my report was actually a User Failure on my part --- ..._7ac6938faef9111da701080a62c4d7e9._comment | 67 +++++++++++++++++++ 1 file changed, 67 insertions(+) create mode 100644 doc/bugs/__39__Production__39___build_doesn__39__t_pass_testsuite_on_Win32/comment_1_7ac6938faef9111da701080a62c4d7e9._comment diff --git a/doc/bugs/__39__Production__39___build_doesn__39__t_pass_testsuite_on_Win32/comment_1_7ac6938faef9111da701080a62c4d7e9._comment b/doc/bugs/__39__Production__39___build_doesn__39__t_pass_testsuite_on_Win32/comment_1_7ac6938faef9111da701080a62c4d7e9._comment new file mode 100644 index 0000000000..6b70005ccf --- /dev/null +++ b/doc/bugs/__39__Production__39___build_doesn__39__t_pass_testsuite_on_Win32/comment_1_7ac6938faef9111da701080a62c4d7e9._comment @@ -0,0 +1,67 @@ +[[!comment format=mdwn + username="jkniiv" + avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d" + subject="my report was actually a User Failure on my part" + date="2023-12-19T23:02:15Z" + content=""" +Uh-oh, after adding the option `--test-debug` to the `test` subcommand I got a lead on +the real culprit and it wasn't git-annex but libmagic: + +[[!format sh \"\"\" +[...snip...] 
+ok +[2023-12-19 22:52:17.5370274] (Utility.Process) process [21636] done ExitSuccess +[2023-12-19 22:52:17.5460328] (Utility.Process) process [10460] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p +athspecs\",\"-c\",\"annex.debug=true\",\"ls-files\",\"-z\",\"--modified\",\"--\",\"foo\"] +[2023-12-19 22:52:17.5970346] (Utility.Process) process [10460] done ExitSuccess +[2023-12-19 22:52:17.6030281] (Utility.Process) process [9208] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa +thspecs\",\"-c\",\"annex.debug=true\",\"diff\",\"--name-only\",\"--diff-filter=T\",\"-z\",\"--cached\",\"--\",\"foo\"] +[2023-12-19 22:52:17.6550242] (Utility.Process) process [9208] done ExitSuccess +(recording state in git...) +[2023-12-19 22:52:17.6650254] (Utility.Process) process [24064] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p +athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"] +[2023-12-19 22:52:17.7090277] (Utility.Process) process [24064] done ExitSuccess +[2023-12-19 22:52:17.7240248] (Utility.Process) process [22248] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p +athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"] +[2023-12-19 22:52:17.7700259] (Utility.Process) process [22248] done ExitSuccess +[2023-12-19 22:52:17.7780282] (Utility.Process) process [23188] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p +athspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"] +[2023-12-19 22:52:17.8270228] (Utility.Process) process [23188] done ExitSuccess +[2023-12-19 22:52:17.8340274] (Utility.Process) process [17648] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p +athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"] +[2023-12-19 22:52:17.8420331] (Utility.Process) process [21928] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p 
+athspecs\",\"-c\",\"annex.debug=true\",\"diff-index\",\"--raw\",\"-z\",\"-r\",\"--no-renames\",\"-l0\",\"--cached\",\"refs/heads/git-annex\", +\"--\"] +[2023-12-19 22:52:17.8980416] (Utility.Process) process [21928] done ExitSuccess +[2023-12-19 22:52:17.9060216] (Utility.Process) process [17648] done ExitSuccess +[2023-12-19 22:52:17.920024] (Utility.Process) process [18680] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa +thspecs\",\"-c\",\"annex.debug=true\",\"write-tree\"] +[2023-12-19 22:52:17.982027] (Utility.Process) process [18680] done ExitSuccess +[2023-12-19 22:52:17.9890276] (Utility.Process) process [10892] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p +athspecs\",\"-c\",\"annex.debug=true\",\"commit-tree\",\"bb68ce15d91f27eec51749b9481ede5cc6bcd190\",\"--no-gpg-sign\",\"-p\",\"refs/he +ads/git-annex\"] +[2023-12-19 22:52:18.0490233] (Utility.Process) process [10892] done ExitSuccess +[2023-12-19 22:52:18.0570304] (Utility.Process) process [6552] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa +thspecs\",\"-c\",\"annex.debug=true\",\"update-ref\",\"refs/heads/git-annex\",\"71b9bb13e4244bd2233de7ca7cf7fc01011f1e62\"] +[2023-12-19 22:52:18.1170343] (Utility.Process) process [6552] done ExitSuccess +[2023-12-19 22:52:18.1300416] (Utility.Process) process [10116] done ExitSuccess +[2023-12-19 22:52:18.1350241] (Utility.Process) process [20976] done ExitSuccess +[2023-12-19 22:52:18.1400231] (Utility.Process) process [9880] done ExitSuccess +[2023-12-19 22:52:18.1450249] (Utility.Process) process [872] done ExitSuccess +C:\Users\jkniiv\AppData\Local/.magic/magic.mgc, 1: Warning: offset `∟♦▲FAIL (2.36s) + .\\Test\\Framework.hs:83: + add with SHA1 failed with unexpected exit code + Use -p '(/Init Tests.add/||/Init Tests/)&&/add/' to rerun this test only. 
+ +1 out of 2 tests failed (8.86s) +git-annex: thread blocked indefinitely in an STM transaction + +\"\"\"]] + +So the real error turned out to be a user failure of mine: libmagic (or the msys2 package `mingw-w64-x86_64-file`) +had a recent update, and the new library didn't like my previous magic database located in the fallback location +`%localappdata%\.magic\magic.mgc`. By copying the msys2 file `/mingw64/share/misc/magic.mgc` to the aforementioned +location, the whole issue cleared itself and this report became moot. + + +"""]] From 6c0259018a1ea033591561c09cd9195d6b21c7a0 Mon Sep 17 00:00:00 2001 From: jkniiv Date: Tue, 19 Dec 2023 23:12:21 +0000 Subject: [PATCH 4/4] close bug as notabug due to user error --- ...tion__39___build_doesn__39__t_pass_testsuite_on_Win32.mdwn | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/doc/bugs/__39__Production__39___build_doesn__39__t_pass_testsuite_on_Win32.mdwn b/doc/bugs/__39__Production__39___build_doesn__39__t_pass_testsuite_on_Win32.mdwn index 13e29221ab..4965962860 100644 --- a/doc/bugs/__39__Production__39___build_doesn__39__t_pass_testsuite_on_Win32.mdwn +++ b/doc/bugs/__39__Production__39___build_doesn__39__t_pass_testsuite_on_Win32.mdwn @@ -777,3 +777,7 @@ backups, where it gives structure to my image-based backup routines, so you coul say I'm a believer. :) [[!meta author=jkniiv]] + +### Update 20 Dec 2023 + +P.S. This is a bit embarrassing, but I found out this is [[notabug|done]], cf. my comment --[[jkniiv]]