From f47ee1f7c05ad70ef2f7c40f59969d36c3a1b134 Mon Sep 17 00:00:00 2001
From: "hello@da0030bba070302e85904b4d73db61fb4af7bced"
Date: Thu, 16 Jan 2025 18:17:52 +0000
Subject: [PATCH 1/9] Added a comment: Still happening, managed to get a reproduction (maybe ?)

---
 ..._6032081d9f96b09e5eed32dc28ec7738._comment | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 doc/bugs/Packfile_does_not_match_digest__58___gcrypt_with_assistant/comment_18_6032081d9f96b09e5eed32dc28ec7738._comment

diff --git a/doc/bugs/Packfile_does_not_match_digest__58___gcrypt_with_assistant/comment_18_6032081d9f96b09e5eed32dc28ec7738._comment b/doc/bugs/Packfile_does_not_match_digest__58___gcrypt_with_assistant/comment_18_6032081d9f96b09e5eed32dc28ec7738._comment
new file mode 100644
index 0000000000..d69c21e45e
--- /dev/null
+++ b/doc/bugs/Packfile_does_not_match_digest__58___gcrypt_with_assistant/comment_18_6032081d9f96b09e5eed32dc28ec7738._comment
@@ -0,0 +1,30 @@
+[[!comment format=mdwn
+ username="hello@da0030bba070302e85904b4d73db61fb4af7bced"
+ nickname="hello"
+ subject="Still happening, managed to get a reproduction (maybe ?)"
+ date="2025-01-16T18:17:51Z"
+ content="""
+Hi, I have stumbled upon this specific bug today too.
+
+I use `git-annex`: `10.20240927-3` (latest on Manjaro) with `git-remote-gcrypt`: `1.5-1` (latest at the time) and I now have the same issue.
+
+    $ git push master
+    gcrypt: Decrypting manifest
+    gpg: Signature made jeu. 16 janv. 2025 18:56:55 CET
+    gpg: using EDDSA key ***
+    gpg: Good signature from \"*** <***>\" [ultimate]
+    gcrypt: Due to a longstanding bug, this push implicitly has --force.
+    gcrypt: Consider explicitly passing --force, and setting
+    gcrypt: gcrypt's require-explicit-force-push git config key.
+    gcrypt: Repacking remote , ...
+    gcrypt: Packfile *** does not match digest!
+    fatal: early EOF
+    error: failed to push some refs to 'gcrypt::ssh://librarian@:/~/library.git'
+
+I tried to reproduce the issue and it seems that it is easy to force this to happen if
+- you have a `git annex assistant` running in the annex
+- copy a large directory (I used a `.flac` music album, 2.1Gi) to the annex so that it is uploaded to the `gcrypt` remote automatically
+- issue a `git annex sync --content` while the assistant is trying to upload the content to the remote
+
+
+"""]]
From 92ba4c915d0d12235ef1e58beba395bbfb7275c0 Mon Sep 17 00:00:00 2001
From: goglu6
Date: Fri, 17 Jan 2025 05:12:14 +0000
Subject: [PATCH 2/9]

---
 ...r_seems_to_deadlock_for_huge_worktree.mdwn | 77 +++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn

diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
new file mode 100644
index 0000000000..40b1994e6a
--- /dev/null
+++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
@@ -0,0 +1,77 @@
+### Please describe the problem.
+
+I have a pretty big repository with around 300 000 files in the workdir of a branch.
+I wanted to unlock all those files from that branch on a machine, so I tried to use git-annex-adjust --unlock.
+Sadly, the command do not seems to finish, ever.
+
+Executing the command with debug from a clone(to avoid interacting with the broken index from the first), it seems to deadlock after executing between 10000 and 20000 "thawing" processes when executing the filter-process logic over the files in the worktree.
+The problems seems to be reproducible with any repository with a lot of files in the worktree as far as I can tell, independant of file size.
+
+The infinite loop make higher-level commands like git annex sync also deadlock when checkout-ing the unlocked branch for any reason.
+Also, because the filtering is not completely applied, the index is pretty scrambled, its easier to clone the repo and move the annex than fix it, for me at least.
+
+### What steps will reproduce the problem?
+
+Here is a minimum set of bash commands that generate the deadlock on my end:
+(https://github.com/klieret/RandomFileTree for the randomfiletree python command used)
+
+    mkdir test_data
+    # Create about 280 000 empty and random files (can still happen with non-empty files)
+    randomfiletree test_data -d 30 -f 250 -r 3
+    cd test_data
+    git init
+    git annex init
+    git commit -m "empty commit" --allow-empty
+    git annex add
+    git commit -m "add all empty files"
+
+    # This will get stuck after around ~10000-20000 processes from Utility.Process in the debug log while the git annex thaws files into unlocked files
+    # The deadlock seems to happens after outputing the start of a new thawing, ctrl-c seems to be the only end state for this
+    git adjust --unlock --debug
+
+
+### What version of git-annex are you using? On what operating system?
+
+Happens on both:
+Archlinux [normal package]
+
+    git-annex version: 10.20240831-g9d29b99ac4074884d33fd25ef81baed5a11d0244
+    build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV
+    dependency versions: aws-0.24.2 bloomfilter-2.0.1.2 crypton-1.0.0 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.2 http-client-0.7.17 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1
+    key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
+    remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
+    operating system: linux x86_64
+    supported repository versions: 8 9 10
+    upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
+    local repository version: 10
+
+and
+
+Debian Bookworm [Compiled via "building from source on Debian"]
+
+    git-annex version: 10.20250102-gaba8ee1ca1d571cada979ef47becb2a75379d63b
+    build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
+    dependency versions: aws-0.22.1 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.1 ghc-9.0.2 http-client-0.7.13.1 persistent-sqlite-2.13.1.0
+    torrent-10000.1.1 uuid-1.3.15 yesod-1.6.2.1
+    key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X*
+    remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external
+    operating system: linux x86_64
+    supported repository versions: 8 9 10
+    upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
+    local repository version: 10
+
+### Please provide any additional information below.
+
+Excerpt of the last lines from the huge debug log:
+
+    a blocking[2025-01-16 23:30:27.913022014] (Utility.Process) process [493397] done ExitSuccess
+    [2025-01-16 23:30:27.91309169] (Annex.Perms) thawing content .git/annex/othertmp/BKQKGR.0/BKQKGR
+
+Given the huge debug log produced, it may be easier to reproduce the bug to have it than copying it here. If wanted, I can generated one as required.
+
+Repeatedly calling this(and ctrl-c it when it inevitably get stuck) seems to eventually unlock the files, but its not really a valid solution in my case.
+git annex smudge --update --debug
+
+### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
+I really like git-annex, it allowed me to deduplicate the files in the big repository described above without much issues except for this bug.
+
From 1d58c62da873419c5dc071c3bb965446886335a5 Mon Sep 17 00:00:00 2001
From: goglu6
Date: Fri, 17 Jan 2025 05:13:55 +0000
Subject: [PATCH 3/9]

---
 doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
index 40b1994e6a..4a5022d67f 100644
--- a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
+++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
@@ -27,7 +27,7 @@ Here is a minimum set of bash commands that generate the deadlock on my end:
 
     # This will get stuck after around ~10000-20000 processes from Utility.Process in the debug log while the git annex thaws files into unlocked files
     # The deadlock seems to happens after outputing the start of a new thawing, ctrl-c seems to be the only end state for this
-    git adjust --unlock --debug
+    git annex adjust --unlock --debug
 
 
 ### What version of git-annex are you using? On what operating system?
From 8ec6d7cfdd658d77df72b28700f1e1f0bb3915d7 Mon Sep 17 00:00:00 2001
From: goglu6
Date: Fri, 17 Jan 2025 06:23:41 +0000
Subject: [PATCH 4/9]

---
 .../Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
index 4a5022d67f..1516481f44 100644
--- a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
+++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
@@ -64,13 +64,13 @@ Debian Bookworm [Compiled via "building from source on Debian"]
 
 Excerpt of the last lines from the huge debug log:
 
-    a blocking[2025-01-16 23:30:27.913022014] (Utility.Process) process [493397] done ExitSuccess
+    [2025-01-16 23:30:27.913022014] (Utility.Process) process [493397] done ExitSuccess
     [2025-01-16 23:30:27.91309169] (Annex.Perms) thawing content .git/annex/othertmp/BKQKGR.0/BKQKGR
 
-Given the huge debug log produced, it may be easier to reproduce the bug to have it than copying it here. If wanted, I can generated one as required.
+Given the huge debug log produced, it may be easier to reproduce the bug to have it than copying it here. If wanted, I can generate one as required.
 
 Repeatedly calling this(and ctrl-c it when it inevitably get stuck) seems to eventually unlock the files, but its not really a valid solution in my case.
-git annex smudge --update --debug
+    git annex smudge --update --debug
 
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
 I really like git-annex, it allowed me to deduplicate the files in the big repository described above without much issues except for this bug.
From a1641206de0b61433fb7b1855433fba67bc6d13f Mon Sep 17 00:00:00 2001
From: goglu6
Date: Fri, 17 Jan 2025 06:24:07 +0000
Subject: [PATCH 5/9]

---
 doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
index 1516481f44..e6ba8e5bb1 100644
--- a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
+++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
@@ -70,6 +70,7 @@ Excerpt of the last lines from the huge debug log:
 Given the huge debug log produced, it may be easier to reproduce the bug to have it than copying it here. If wanted, I can generate one as required.
 
 Repeatedly calling this(and ctrl-c it when it inevitably get stuck) seems to eventually unlock the files, but its not really a valid solution in my case.
+
     git annex smudge --update --debug
 
 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
From 5d370fdfece807afea55564a0f63f740ee49805c Mon Sep 17 00:00:00 2001
From: goglu6
Date: Fri, 17 Jan 2025 06:29:39 +0000
Subject: [PATCH 6/9]

---
 .../Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
index e6ba8e5bb1..e4d448f828 100644
--- a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
+++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
@@ -5,11 +5,13 @@ I wanted to unlock all those files from that branch on a machine, so I tried to
 Sadly, the command do not seems to finish, ever.
 
 Executing the command with debug from a clone(to avoid interacting with the broken index from the first), it seems to deadlock after executing between 10000 and 20000 "thawing" processes when executing the filter-process logic over the files in the worktree.
-The problems seems to be reproducible with any repository with a lot of files in the worktree as far as I can tell, independant of file size.
+The problem seems to be reproducible with any repository with a lot of files in the worktree as far as I can tell, independant of file size.
 
-The infinite loop make higher-level commands like git annex sync also deadlock when checkout-ing the unlocked branch for any reason.
+The deadlock described makes higher-level commands like git annex sync also block indefinitely when checkout-ing the unlocked branch for any reason.
 Also, because the filtering is not completely applied, the index is pretty scrambled, its easier to clone the repo and move the annex than fix it, for me at least.
 
+I call the behavior "deadlock" due to the absence of outpout and low cpu usage on the process when in that state. This seems to indicate some kind of multiprocessing deadlock to me.
+
 ### What steps will reproduce the problem?
 
 Here is a minimum set of bash commands that generate the deadlock on my end:
From 171de7c00eee3a841c54d70dc1282112bdef164a Mon Sep 17 00:00:00 2001
From: goglu6
Date: Fri, 17 Jan 2025 06:30:22 +0000
Subject: [PATCH 7/9]

---
 doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
index e4d448f828..5530cd570f 100644
--- a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
+++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree.mdwn
@@ -10,7 +10,7 @@ The problem seems to be reproducible with any repository with a lot of files in
 The deadlock described makes higher-level commands like git annex sync also block indefinitely when checkout-ing the unlocked branch for any reason.
 Also, because the filtering is not completely applied, the index is pretty scrambled, its easier to clone the repo and move the annex than fix it, for me at least.
 
-I call the behavior "deadlock" due to the absence of outpout and low cpu usage on the process when in that state. This seems to indicate some kind of multiprocessing deadlock to me.
+I call the behavior "deadlock" due to the absence of debug log output and low cpu usage on the process when in that state. This seems to indicate some kind of multiprocessing deadlock to me.
 
 ### What steps will reproduce the problem?
 
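The reproduction recipe in patches 2/9 through 7/9 relies on the third-party `randomfiletree` tool. Since the report says the hang shows up with any worktree holding a lot of files, regardless of file size, a dependency-free stand-in can be sketched in POSIX shell. This is a sketch under that assumption: `make_tree` is a hypothetical helper, and it builds a simple two-level layout of empty files rather than `randomfiletree`'s random tree.

```shell
#!/bin/sh
# Hypothetical stand-in for `randomfiletree test_data -d 30 -f 250 -r 3`.
# Assumption: only the number of files matters for triggering the hang,
# not the directory layout or file contents.
# Usage: make_tree ROOT NDIRS NFILES
# Creates ROOT/dir1 .. ROOT/dirNDIRS, each holding NFILES empty files.
make_tree() {
    root=$1 ndirs=$2 nfiles=$3
    d=1
    while [ "$d" -le "$ndirs" ]; do
        dir="$root/dir$d"
        mkdir -p "$dir"
        f=1
        while [ "$f" -le "$nfiles" ]; do
            : > "$dir/file$f"   # create (or truncate to) an empty file
            f=$((f + 1))
        done
        d=$((d + 1))
    done
}
```

For example, `make_tree test_data 1120 250` creates 1120 × 250 = 280 000 empty files, about the count in the report; the remaining steps (`git init`, `git annex init`, `git annex add`, commit, then `git annex adjust --unlock --debug`) stay as written.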
From 159496a0c60035aa135147b7af642fe63e25a5cf Mon Sep 17 00:00:00 2001
From: "hello@da0030bba070302e85904b4d73db61fb4af7bced"
Date: Fri, 17 Jan 2025 11:11:51 +0000
Subject: [PATCH 8/9]

---
 ...om_corrupted_encrypted_GITBUNDLE_file.mdwn | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 doc/tips/Recovering_from_corrupted_encrypted_GITBUNDLE_file.mdwn

diff --git a/doc/tips/Recovering_from_corrupted_encrypted_GITBUNDLE_file.mdwn b/doc/tips/Recovering_from_corrupted_encrypted_GITBUNDLE_file.mdwn
new file mode 100644
index 0000000000..d3d175b0af
--- /dev/null
+++ b/doc/tips/Recovering_from_corrupted_encrypted_GITBUNDLE_file.mdwn
@@ -0,0 +1,63 @@
+Yesterday, while setting up a `hybrid`-encrypted rsync special remote with `git-remote-annex` support, I stumbled upon this bug. The annex remote would corrupt itself, and unfortunately no amount of `git annex repair` would fix it.
+
+```
+$ git annex push
+push
+Full remote url: annex::?encryption=hybrid&rsyncurl=&type=rsync
+gpg: zlib inflate problem: invalid block type
+  user error (gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","13","--decrypt"] exited 2)
+git-annex: Failed to download GITBUNDLE-s---77e0ff1580b971da4e39b15ba22439d66e3c5729adea2d7df8643438ef900c49
+Full remote url: annex::?encryption=hybrid&rsyncurl=&type=rsync
+gpg: zlib inflate problem: invalid block type
+  user error (gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","13","--decrypt"] exited 2)
+git-annex: Failed to download GITBUNDLE-s---77e0ff1580b971da4e39b15ba22439d66e3c5729adea2d7df8643438ef900c49
+
+  Pushing to failed.
+failed
+push: 1 failed
+```
+
+I then inspected the temporary files in `.git/annex/tmp` and found the aforementioned **GITBUNDLE**, as well as its encrypted counterpart.
+
+```
+$ tree .git/annex/tmp/
+.git/annex/tmp/
+├── GITBUNDLE-s---77e0ff1580b971da4e39b15ba22439d66e3c5729adea2d7df8643438ef900c49
+└── GPGHMACSHA1--f2d78638494030b34841f7a30ffb2800816ef839
+```
+
+I wanted to confirm that the file was really corrupted, so I obtained the cipher for my hybrid-encrypted remote like so:
+
+```
+$ cipher=$(git show git-annex:remote.log | grep 'name=' | grep -oP 'cipher\=.*? ' | sed 's/cipher=//')
+```
+
+Then I confirmed that the encrypted file was indeed corrupted by decrypting it manually:
+
+```
+$ echo $cipher | base64 -d | gpg -d | tail -c +257 | gpg --batch --passphrase-fd 0 --decrypt .git/annex/tmp/GPGHMACSHA1--f2d78638494030b34841f7a30ffb2800816ef839 > /dev/null
+gpg: encrypted with rsa2048 key, ID , created 1970-01-01
+      "... <...>"
+gpg: encrypted with cv25519 key, ID , created 1970-01-01
+      "... <...>"
+gpg: AES256.CFB encrypted data
+gpg: encrypted with 1 passphrase
+gpg: zlib inflate problem: invalid block type
+```
+
+I then searched the annex directory for an uncorrupted copy of the **GITBUNDLE** and found that it did exist:
+
+```
+$ find . -type f -name "GITBUNDLE-s---77e0ff1580b971da4e39b15ba22439d66e3c5729adea2d7df8643438ef900c49"
+./.git/annex/objects/Vg/Mm/GITBUNDLE-s---77e0ff1580b971da4e39b15ba22439d66e3c5729adea2d7df8643438ef900c49/GITBUNDLE-s---77e0ff1580b971da4e39b15ba22439d66e3c5729adea2d7df8643438ef900c49
+```
+
+Now that I have a way to get the symmetric cipher, as well as the source file and its encrypted filename, I can re-encrypt it:
+
+```
+$ echo $cipher | base64 -d | gpg -d | tail -c +257 | gpg --batch --passphrase-fd 0 --output /tmp/GPGHMACSHA1--f2d78638494030b34841f7a30ffb2800816ef839 --symmetric ./.git/annex/objects/Vg/Mm/GITBUNDLE-s---77e0ff1580b971da4e39b15ba22439d66e3c5729adea2d7df8643438ef900c49/GITBUNDLE-s---77e0ff1580b971da4e39b15ba22439d66e3c5729adea2d7df8643438ef900c49
+```
+
+Then I uploaded it to replace the corrupted one on the remote.
+
+I then confirmed it was fixed by issuing a `git annex sync --content`.
From ad974c3bd2640db8f7b62cce2f8f0c254cb58858 Mon Sep 17 00:00:00 2001
From: "hello@da0030bba070302e85904b4d73db61fb4af7bced"
Date: Fri, 17 Jan 2025 11:14:55 +0000
Subject: [PATCH 9/9] Added a comment: Feature idea

---
 .../comment_1_ad2fc3f81a714dde9a39a27948ae0e5f._comment | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 doc/tips/Recovering_from_corrupted_encrypted_GITBUNDLE_file/comment_1_ad2fc3f81a714dde9a39a27948ae0e5f._comment

diff --git a/doc/tips/Recovering_from_corrupted_encrypted_GITBUNDLE_file/comment_1_ad2fc3f81a714dde9a39a27948ae0e5f._comment b/doc/tips/Recovering_from_corrupted_encrypted_GITBUNDLE_file/comment_1_ad2fc3f81a714dde9a39a27948ae0e5f._comment
new file mode 100644
index 0000000000..d51ad42610
--- /dev/null
+++ b/doc/tips/Recovering_from_corrupted_encrypted_GITBUNDLE_file/comment_1_ad2fc3f81a714dde9a39a27948ae0e5f._comment
@@ -0,0 +1,9 @@
+[[!comment format=mdwn
+ username="hello@da0030bba070302e85904b4d73db61fb4af7bced"
+ nickname="hello"
+ subject="Feature idea"
+ date="2025-01-17T11:14:55Z"
+ content="""
+This could be detected and handled directly by `git annex repair` or the `git annex assistant` daemon.
+Unfortunately I have no Haskell knowledge, and the barrier to entry seems too big for me to contribute right now.
+"""]]
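The recovery tip in patch 8/9 pulls the cipher out of `remote.log` with `grep 'name=' | grep -oP 'cipher\=.*? ' | sed ...`. That pipeline prints one cipher per encrypted remote, so with several encrypted remotes `$cipher` ends up holding more than one value. Also, `grep -P` is not available in every grep. What follows is a hedged POSIX-shell sketch, not part of git-annex. `remote_cipher`, `cipher_head` and `cipher_passphrase` are hypothetical names. `remote_cipher` selects the line for a single remote by its `name=` field, and the two split helpers mirror the tip's `tail -c +257`. The tip itself only establishes that the bytes after the first 256 work as the gpg passphrase. That the first 256 bytes are the HMAC key behind `GPGHMACSHA1--…` filenames is my assumption.

```shell
#!/bin/sh
# Hypothetical helpers for the manual recovery steps in the tip above.

# remote_cipher NAME: read remote.log text on stdin and print the cipher=
# value from each line whose name= field is exactly NAME (one value per
# matching line). Assumes space-separated key=value fields, the same
# layout the tip's own grep pipeline relies on.
remote_cipher() {
    awk -v want="name=$1" '
    {
        hit = 0; c = ""
        for (i = 1; i <= NF; i++) {
            if ($i == want) hit = 1
            # "cipher=" is 7 characters, so the value starts at offset 8
            if (index($i, "cipher=") == 1) c = substr($i, 8)
        }
        if (hit && c != "") print c
    }'
}

# Split a decrypted cipher read on stdin at byte 256, as the tip's
# `tail -c +257` does. head -c is not POSIX, but GNU, BSD and BusyBox
# head all support it.
cipher_head()       { head -c 256; }   # first 256 bytes (assumed: HMAC key)
cipher_passphrase() { tail -c +257; }  # the rest, which the tip feeds to gpg

# Example (myremote and FILE are placeholders):
#   git show git-annex:remote.log | remote_cipher myremote | base64 -d \
#     | gpg -d | cipher_passphrase \
#     | gpg --batch --passphrase-fd 0 --decrypt FILE > /dev/null
```

Selecting by `name=` rather than taking every `cipher=` field keeps `$cipher` to a single value. Splitting through `head`/`tail` keeps the bytes untouched, just as the tip's pipeline does.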