Merge branch 'master' of ssh://git-annex.branchable.com
commit
da5726e790
4 changed files with 159 additions and 0 deletions
|
@ -777,3 +777,7 @@ backups, where it gives structure to my image-based backup routines, so you coul
|
||||||
say I'm a believer. :)
|
||||||
|
|
||||||
[[!meta author=jkniiv]]
|
||||||
|
|
||||||
|
### Update 20 Dec 2023
|
||||||
|
|
||||||
|
P.S. This is a bit embarrassing, but I found out this is [[notabug|done]], cf. my comment --[[jkniiv]]
|
||||||
|
|
|
@ -0,0 +1,67 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="jkniiv"
|
||||||
|
avatar="http://cdn.libravatar.org/avatar/05fd8b33af7183342153e8013aa3713d"
|
||||||
|
subject="my report was actually a User Failure on my part"
|
||||||
|
date="2023-12-19T23:02:15Z"
|
||||||
|
content="""
|
||||||
|
Uh-oh, after adding the option `--test-debug` to the `test` subcommand I got a lead on
|
||||||
|
the real culprit and it wasn't git-annex but libmagic:
|
||||||
|
|
||||||
|
[[!format sh \"\"\"
|
||||||
|
[...snip...]
|
||||||
|
ok
|
||||||
|
[2023-12-19 22:52:17.5370274] (Utility.Process) process [21636] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:17.5460328] (Utility.Process) process [10460] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
|
||||||
|
athspecs\",\"-c\",\"annex.debug=true\",\"ls-files\",\"-z\",\"--modified\",\"--\",\"foo\"]
|
||||||
|
[2023-12-19 22:52:17.5970346] (Utility.Process) process [10460] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:17.6030281] (Utility.Process) process [9208] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa
|
||||||
|
thspecs\",\"-c\",\"annex.debug=true\",\"diff\",\"--name-only\",\"--diff-filter=T\",\"-z\",\"--cached\",\"--\",\"foo\"]
|
||||||
|
[2023-12-19 22:52:17.6550242] (Utility.Process) process [9208] done ExitSuccess
|
||||||
|
(recording state in git...)
|
||||||
|
[2023-12-19 22:52:17.6650254] (Utility.Process) process [24064] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
|
||||||
|
athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
|
||||||
|
[2023-12-19 22:52:17.7090277] (Utility.Process) process [24064] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:17.7240248] (Utility.Process) process [22248] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
|
||||||
|
athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
|
||||||
|
[2023-12-19 22:52:17.7700259] (Utility.Process) process [22248] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:17.7780282] (Utility.Process) process [23188] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
|
||||||
|
athspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
|
||||||
|
[2023-12-19 22:52:17.8270228] (Utility.Process) process [23188] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:17.8340274] (Utility.Process) process [17648] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
|
||||||
|
athspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"]
|
||||||
|
[2023-12-19 22:52:17.8420331] (Utility.Process) process [21928] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
|
||||||
|
athspecs\",\"-c\",\"annex.debug=true\",\"diff-index\",\"--raw\",\"-z\",\"-r\",\"--no-renames\",\"-l0\",\"--cached\",\"refs/heads/git-annex\",
|
||||||
|
\"--\"]
|
||||||
|
[2023-12-19 22:52:17.8980416] (Utility.Process) process [21928] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:17.9060216] (Utility.Process) process [17648] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:17.920024] (Utility.Process) process [18680] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa
|
||||||
|
thspecs\",\"-c\",\"annex.debug=true\",\"write-tree\"]
|
||||||
|
[2023-12-19 22:52:17.982027] (Utility.Process) process [18680] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:17.9890276] (Utility.Process) process [10892] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-p
|
||||||
|
athspecs\",\"-c\",\"annex.debug=true\",\"commit-tree\",\"bb68ce15d91f27eec51749b9481ede5cc6bcd190\",\"--no-gpg-sign\",\"-p\",\"refs/he
|
||||||
|
ads/git-annex\"]
|
||||||
|
[2023-12-19 22:52:18.0490233] (Utility.Process) process [10892] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:18.0570304] (Utility.Process) process [6552] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pa
|
||||||
|
thspecs\",\"-c\",\"annex.debug=true\",\"update-ref\",\"refs/heads/git-annex\",\"71b9bb13e4244bd2233de7ca7cf7fc01011f1e62\"]
|
||||||
|
[2023-12-19 22:52:18.1170343] (Utility.Process) process [6552] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:18.1300416] (Utility.Process) process [10116] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:18.1350241] (Utility.Process) process [20976] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:18.1400231] (Utility.Process) process [9880] done ExitSuccess
|
||||||
|
[2023-12-19 22:52:18.1450249] (Utility.Process) process [872] done ExitSuccess
|
||||||
|
C:\Users\jkniiv\AppData\Local/.magic/magic.mgc, 1: Warning: offset `∟♦▲FAIL (2.36s)
|
||||||
|
.\\Test\\Framework.hs:83:
|
||||||
|
add with SHA1 failed with unexpected exit code
|
||||||
|
Use -p '(/Init Tests.add/||/Init Tests/)&&/add/' to rerun this test only.
|
||||||
|
|
||||||
|
1 out of 2 tests failed (8.86s)
|
||||||
|
git-annex: thread blocked indefinitely in an STM transaction
|
||||||
|
|
||||||
|
\"\"\"]]
|
||||||
|
|
||||||
|
So the real error turned out to be a user failure of mine: libmagic (or the msys2 package `mingw-w64-x86_64-file`)
|
||||||
|
had a recent update and the new library didn't like my previous magic database located in the fallback location
|
||||||
|
`%localappdata%\.magic\magic.mgc`. By copying the msys2 file `/mingw64/share/misc/magic.mgc` to the aforementioned
|
||||||
|
location, the whole issue cleared itself and this report became moot.
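
The copy step described above can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: `refresh_magic_db` is a hypothetical helper name, and the usage line assumes an msys2 shell in which `LOCALAPPDATA` is set and `cp` accepts that path.

```sh
# Hypothetical helper: copy a magic database into a fallback directory.
# Both arguments are illustrative; adjust them to your own setup.
refresh_magic_db() {
    src=$1       # e.g. /mingw64/share/misc/magic.mgc
    dest_dir=$2  # e.g. "$LOCALAPPDATA/.magic"

    # Refuse to continue if the source database does not exist.
    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }

    # Create the fallback directory if needed, then overwrite the old database.
    mkdir -p "$dest_dir" && cp -- "$src" "$dest_dir/magic.mgc"
}

# Usage from an msys2 shell (assumes LOCALAPPDATA is exported):
# refresh_magic_db /mingw64/share/misc/magic.mgc "$LOCALAPPDATA/.magic"
```

Keeping a backup of the old `magic.mgc` before overwriting it may be worthwhile if the previous database was customized.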
|
||||||
|
|
||||||
|
|
||||||
|
"""]]
|
|
@ -0,0 +1,66 @@
|
||||||
|
### Please describe the problem.
|
||||||
|
|
||||||
|
I'm not sure if what I am experiencing is a bug, or just something I am doing incorrectly.
|
||||||
|
|
||||||
|
I am running into an issue with git-annex-import where it seems to stall and then uses all available RAM until the system terminates the process.
|
||||||
|
|
||||||
|
### What steps will reproduce the problem?
|
||||||
|
|
||||||
|
Here are my prep steps:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
git init
|
||||||
|
|
||||||
|
git annex initremote s3-data type=S3 encryption=none port=443 protocol=https public=no \
|
||||||
|
importtree=yes versioning=yes host=$S3HOST bucket=$BUCKET fileprefix=primary_folder/
|
||||||
|
|
||||||
|
git annex wanted s3-data "exclude=subfolder-*/* and include=specialfilename1.*"
|
||||||
|
|
||||||
|
git annex import main --from s3-data --skip-duplicates --backend MD5E --jobs=4
|
||||||
|
```
|
||||||
|
|
||||||
|
### What version of git-annex are you using? On what operating system?
|
||||||
|
|
||||||
|
Originally, 10.20230321 on Debian Bookworm
|
||||||
|
|
||||||
|
I also tried 10.20231129 on Debian Bookworm with the same results
|
||||||
|
|
||||||
|
### Please provide any additional information below.
|
||||||
|
|
||||||
|
There are around 22000 files under the prefix I am trying to import from, and it amounts to around 115 GB. However, most of that data is part of many separate subdatasets underneath this one. These have all worked fine and without any issue.
|
||||||
|
|
||||||
|
There are only 2 files I am actually trying to import, though there are several versions (about 70 each, for a total of 140) of them at this location. In this example, those are `specialfilename1.json` & `specialfilename1.csv`
|
||||||
|
|
||||||
|
When I use debug mode on the import command a lot of information is printed, but it mostly seems to amount to filenames that I would expect to be excluded based on my `git-annex-wanted` command. The output looks like the following until it stops; the process then uses up all available RAM before inevitably being terminated.
|
||||||
|
|
||||||
|
```
|
||||||
|
[2023-12-18 23:47:54.924814306] (Utility.Process) process [11787] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
|
||||||
|
[2023-12-18 23:47:54.929719575] (Utility.Process) process [11787] done ExitSuccess
|
||||||
|
[2023-12-18 23:47:54.930053775] (Utility.Process) process [11789] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
|
||||||
|
[2023-12-18 23:47:54.935010757] (Utility.Process) process [11789] done ExitSuccess
|
||||||
|
[2023-12-18 23:47:54.935508638] (Utility.Process) process [11790] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..a6d7c6ae03747e23c2bedbecc8d1a5afeabe5220","--pretty=%H","-n1"]
|
||||||
|
[2023-12-18 23:47:54.940887356] (Utility.Process) process [11790] done ExitSuccess
|
||||||
|
[2023-12-18 23:47:54.941241371] (Utility.Process) process [11791] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..27a0dedea083605106c614a159d48fd3daa92284","--pretty=%H","-n1"]
|
||||||
|
[2023-12-18 23:47:54.947198539] (Utility.Process) process [11791] done ExitSuccess
|
||||||
|
[2023-12-18 23:47:54.949236306] (Utility.Process) process [11792] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
|
||||||
|
[2023-12-18 23:47:54.971233148] (Utility.Process) process [11793] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/remotes/s3-data/main"]
|
||||||
|
[2023-12-18 23:47:54.97613841] (Utility.Process) process [11793] done ExitFailure 1
|
||||||
|
...
|
||||||
|
|
||||||
|
String to sign: "GET\n\n\nMon, 18 Dec 2023 23:47:57 GMT\n/bucketname/?versions"
|
||||||
|
[2023-12-18 23:47:57.978207613] (Remote.S3) Host: "bucketname.s3-us-east-1.amazonaws.com"
|
||||||
|
[2023-12-18 23:47:57.978237701] (Remote.S3) Path: "/"
|
||||||
|
[2023-12-18 23:47:57.978260264] (Remote.S3) Query string: "versions&key-marker=primary_folder%subfolder-100101%2sessions.json&prefix=primary_folder%2F&version-id-marker=Xg1KUaCh6tpvJ2E1juz4qobn.w3.x9k"
|
||||||
|
[2023-12-18 23:47:57.978337803] (Remote.S3) Header: [("Date","Mon, 18 Dec 2023 23:47:57 GMT"),("Authorization","AWS [Redacted]")]
|
||||||
|
[2023-12-18 23:47:58.003376688] (Remote.S3) Response status: Status {statusCode = 200, statusMessage = "OK"}
|
||||||
|
[2023-12-18 23:47:58.00343782] (Remote.S3) Response header 'Transfer-Encoding': 'chunked'
|
||||||
|
[2023-12-18 23:47:58.003472907] (Remote.S3) Response header 'x-amz-request-id': 'tx00000925b31abf8c9c162-006580da2d-19170577-default'
|
||||||
|
[2023-12-18 23:47:58.003527331] (Remote.S3) Response header 'Content-Type': 'application/xml'
|
||||||
|
[2023-12-18 23:47:58.003557859] (Remote.S3) Response header 'Date': 'Mon, 18 Dec 2023 23:47:58 GMT
|
||||||
|
```
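
One rough way to see how far the listing got before the stall is to count the paginated `versions` requests in a saved copy of the debug output and look at the last `key-marker` that was requested. The sketch below assumes the debug output was redirected to a file; the file name `import-debug.log` and both function names are hypothetical, and the patterns are based only on the log lines shown above.

```sh
# Count the S3 version-listing requests recorded in a saved debug log.
count_listing_requests() {
    grep -c '(Remote.S3) Query string: "versions' "$1"
}

# Print the most recent key-marker, i.e. roughly where the listing had got to.
last_key_marker() {
    grep -o 'key-marker=[^&"]*' "$1" | tail -n 1
}

# Usage (assumes the debug output was saved to import-debug.log):
# count_listing_requests import-debug.log
# last_key_marker import-debug.log
```

If the count keeps growing while the marker moves through keys that should be excluded, that would suggest the whole prefix is being listed before any filtering happens, though this is only a guess from the log format.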
|
||||||
|
|
||||||
|
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
|
||||||
|
|
||||||
|
In the past, on smaller versions of data structured the same way, this setup has worked, and I don't run into this issue.
|
||||||
|
|
||||||
|
I'm not exactly sure how to troubleshoot further and I am feeling stuck. Is there something else I can be doing to see more info about what's happening behind the scenes?
|
|
@ -0,0 +1,22 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="unqueued"
|
||||||
|
avatar="http://cdn.libravatar.org/avatar/3bcbe0c9e9825637ad7efa70f458640d"
|
||||||
|
subject="comment 6"
|
||||||
|
date="2023-12-19T18:50:08Z"
|
||||||
|
content="""
|
||||||
|
Whoa, thanks for implementing that Joey! Can't wait to give it a try!
|
||||||
|
|
||||||
|
|
||||||
|
FYI, one of the cases I was talking about before, where I repeatedly import keys in MD5E format, is that I construct an annex repo, set web URLs, and deal with mirroring further down the pipeline.
|
||||||
|
|
||||||
|
Code isn't great, just something I threw together years ago:
|
||||||
|
https://github.com/unqueued/annex-drive-share
|
||||||
|
|
||||||
|
Because it is gdrive, I can get MD5s and filenames with rclone urls for web remotes.
|
||||||
|
|
||||||
|
The way I use it is to init a new annex repo (reusing the same uuid), and then absorb it into the primary downstream repo, overwriting filenames and letting sync update any new keys with the primary repo. I considered using subtree.
|
||||||
|
|
||||||
|
It does end up causing merge commits to build up in the git-annex branch, but I might want to run this on a server without sharing an entire repo.
|
||||||
|
|
||||||
|
It happens to make sense for me because I have an unlimited @edu gdrive account and it can work great for some workflows as an intermediate file store.
|
||||||
|
"""]]
|