Merge branch 'master' into sqlite

Joey Hess 2019-11-21 17:26:50 -04:00
commit d4661959de
GPG key ID: DB12DB0FF05F8F38
152 changed files with 2443 additions and 462 deletions

@ -0,0 +1,12 @@
The OSX .dmg contains a few binaries in git-core like git-remote-http.
They have been adjusted by otool to link to libraries in the same directory
as the binary. However, the libraries are not located in the git-core
directory, but in its parent directory, and so the git-core binaries don't
link.
I don't think this is a new regression, but I'm not entirely sure.
Seems that OSXMkLibs could symlink ../lib into git-core.
--[[Joey]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,52 @@
[[!comment format=mdwn
username="xwvvvvwx"
avatar="http://cdn.libravatar.org/avatar/7198160b33539b5b1b2d56ca85c562d9"
subject="comment 14"
date="2019-11-21T17:32:31Z"
content="""
I just reproduced this when pushing to a gcrypt remote on rsync.net using the assistant. There is only one client pushing to the gcrypt remote.
It happened during the initial sync of a moderately large amount of data (~22G). Perhaps this has something to do with it?
I could reproduce the issue by cloning with gcrypt directly (`git clone gcrypt::ssh://....`).
I was able to recover by following the steps outlined in Schnouki's comment (#12), but this is obviously quite an unsatisfactory fix.
I am using annex to replicate important personal data, and I find this issue highly concerning.
Foolishly, I did not keep a copy of the bad repo before I force-pushed over it on the remote, so I do not have a copy available to experiment with :(
---
## logs
`daemon.log` excerpt: [https://ipfs.io/ipfs/QmcoPuTLY2v5FWPABQLVwgyqW5WdsvkBbVS33cJh6zjzi4](https://ipfs.io/ipfs/QmcoPuTLY2v5FWPABQLVwgyqW5WdsvkBbVS33cJh6zjzi4)
`git clone` output:
```
[annex@xwvvvvwx:~]$ git clone gcrypt::ssh://<URL> remote
Cloning into 'remote'...
gcrypt: Decrypting manifest
gpg: Signature made Thu 21 Nov 2019 04:02:40 PM CET
gpg: using RSA key 92E9F58E9F8C6845423C251AACD9A98951774194
gpg: Good signature from \"git-annex <annex@xwvvvvwx.com>\" [ultimate]
gcrypt: Remote ID is :id:tWrcOFKu2yX7y+jLDLxm
gcrypt: Packfile e7b619864585f3c921b491fd041127cf0ae33c4480810610dcb2e37ec46a82be does not match digest!
fatal: early EOF
```
`git annex version`:
```
git-annex version: 7.20191114
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.21.1 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.2.0.1 ghc-8.6.5 http-client-0.6.4 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
```
"""]]

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-11-13T19:29:34Z"
content="""
--debug might provide some clue in its http dump.
The ParseError comes from attoparsec. Seems likely that aeson/aws is what's
using it there, and that it is failing to parse something from S3.
Of course, the malloc error suggests a low-level memory problem, probably
from C code. I don't think git-annex contains anything like that, so it
must be from a dependency.
The S3 signature being wrong again points to the aws library, or something
lower level. And then the following double free is another low-level memory
problem.
So there's a pattern, and it seems to extend across linux and OSX.
Kind of wondering if something in the library stack is somehow failing to
be concurrency safe. If two http requests end up using the same memory,
it would kind of explain all of this.
"""]]

doc/bugs/cygwin.mdwn Normal file
@ -0,0 +1,406 @@
Cygwin does not work with the installed Windows version of git-annex
### What steps will reproduce the problem?
* Install the Windows version of git-annex.
* Run `git annex test` under Cygwin: 65 out of 101 tests fail.
* Run `git annex test` under Git Bash for Windows: all 101 tests pass.
* NOTE: the installed Windows version of git-lfs works fine under both Cygwin and Git Bash for Windows.
### What version of git-annex are you using? On what operating system?
git-annex version: 7.20191106-ge486fd5e0
build flags: Assistant Webapp Pairing S3 WebDAV TorrentParser Feeds Testsuite
dependency versions: aws-0.21.1 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: mingw32 x86_64
supported repository versions: 7
upgrade supported from repository versions: 2 3 4 5 6
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
Cygwin ~/git..t_tools/wget/cache (test)
(506)$ git-annex test
Tests
QuickCheck
prop_encode_decode_roundtrip: OK (0.04s)
+++ OK, passed 1000 tests.
prop_encode_c_decode_c_roundtrip: OK (0.03s)
+++ OK, passed 1000 tests.
prop_isomorphic_key_encode: OK
+++ OK, passed 1000 tests.
prop_isomorphic_shellEscape: OK (0.02s)
+++ OK, passed 1000 tests.
prop_isomorphic_shellEscape_multiword: OK (0.70s)
+++ OK, passed 1000 tests.
prop_isomorphic_configEscape: OK (0.02s)
+++ OK, passed 1000 tests.
prop_parse_show_Config: OK (0.04s)
+++ OK, passed 1000 tests.
prop_upFrom_basics: OK (0.02s)
+++ OK, passed 1000 tests.
prop_relPathDirToFile_basics: OK (0.03s)
+++ OK, passed 1000 tests.
prop_relPathDirToFile_regressionTest: OK
+++ OK, passed 1 test.
prop_cost_sane: OK
+++ OK, passed 1 test.
prop_matcher_sane: OK
+++ OK, passed 1 test.
prop_HmacSha1WithCipher_sane: OK
+++ OK, passed 1 test.
prop_VectorClock_sane: OK
+++ OK, passed 1 test.
prop_addMapLog_sane: OK
+++ OK, passed 1 test.
prop_verifiable_sane: OK (0.07s)
+++ OK, passed 1000 tests.
prop_segment_regressionTest: OK
+++ OK, passed 1 test.
prop_read_write_transferinfo: OK (0.04s)
+++ OK, passed 1000 tests.
prop_read_show_inodecache: OK (0.02s)
+++ OK, passed 1000 tests.
prop_parse_build_presence_log: OK (1.27s)
+++ OK, passed 1000 tests.
prop_parse_build_contentidentifier_log: OK (1.23s)
+++ OK, passed 1000 tests.
prop_read_show_TrustLevel: OK
+++ OK, passed 1 test.
prop_parse_build_TrustLevelLog: OK
+++ OK, passed 1 test.
prop_hashes_stable: OK
+++ OK, passed 1 test.
prop_mac_stable: OK
+++ OK, passed 1 test.
prop_schedule_roundtrips: OK (0.01s)
+++ OK, passed 1000 tests.
prop_past_sane: OK
+++ OK, passed 1 test.
prop_duration_roundtrips: OK
+++ OK, passed 1000 tests.
prop_metadata_sane: OK (0.86s)
+++ OK, passed 1000 tests.
prop_metadata_serialize: OK (0.84s)
+++ OK, passed 1000 tests.
prop_branchView_legal: OK (0.77s)
+++ OK, passed 1000 tests.
prop_viewPath_roundtrips: OK (0.03s)
+++ OK, passed 1000 tests.
prop_view_roundtrips: OK (0.52s)
+++ OK, passed 1000 tests.
prop_viewedFile_rountrips: OK (0.02s)
+++ OK, passed 1000 tests.
prop_b64_roundtrips: OK
+++ OK, passed 1000 tests.
prop_standardGroups_parse: OK
+++ OK, passed 1 test.
Unit Tests v7 adjusted unlocked branch
add dup: Init Tests
init: init test repo
Detected a filesystem without fifo support.
Disabling ssh connection caching.
Detected a crippled filesystem.
Disabling core.symlinks.
(scanning for unlocked files...)
Entering an adjusted branch where files are unlocked as this filesystem does not support locked files.
not found .
git-annex.exe: pre-commit: 1 failed
Failed to enter adjusted branch!
ok
(recording state in git...)
not found .
git-annex.exe: pre-commit: 1 failed
FAIL (6.92s)
.\\Test\\Framework.hs:469:
git commit failed
add: add foo
ok
(recording state in git...)
add sha1foo
ok
(recording state in git...)
not found .
git-annex.exe: pre-commit: 1 failed
FAIL (8.10s)
Test.hs:303:
git commit failed
2 out of 2 tests failed (15.02s)
FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
add extras: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
export_import: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
export_import_subdir: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
shared clone: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
log: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
import: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
reinject: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
unannex (no copy): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
unannex (with copy): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
drop (no remote): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
drop (with remote): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
drop (untrusted remote): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
get: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
get (ssh remote): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
move: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
move (ssh remote): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
copy: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
lock: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
lock --force: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
edit (no pre-commit): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
edit (pre-commit): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
partial commit: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
fix: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
trust: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
fsck (basics): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
fsck (bare): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
fsck (local untrusted): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
fsck (remote untrusted): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
fsck --from remote: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
migrate: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
migrate (via gitattributes): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
unused: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
describe: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
find: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
merge: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
info: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
version: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
sync: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
union merge regression: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
adjusted branch merge regression: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
adjusted branch subtree regression: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution (adjusted branch): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution movein regression: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution (mixed directory and file): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution symlink bit: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution (uncommitted local file): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution (removed file): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution (nonannexed file): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution (nonannexed symlink): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
conflict resolution (mixed locked and unlocked file): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
map: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
uninit: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
uninit (in git-annex branch): FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
upgrade: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
whereis: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
hook remote: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
directory remote: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
rsync remote: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
bup remote: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
crypto: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
preferred content: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
add subdirs: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
addurl: FAIL
Exception: init tests failed! cannot continue
CallStack (from HasCallStack):
error, called at .\\Test\\Framework.hs:427:33 in main:Test.Framework
65 out of 101 tests failed (21.64s)
(Failures above could be due to a bug in git-annex, or an incompatibility
with utilities, such as git, installed on this system.)
# End of transcript or log.
"""]]

@ -22,3 +22,8 @@ If user convenience was something to strive for here, it should technically be p
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[fixed|done]], and I also converted a number of other places
> where an error could leak through to stderr, although there are still
> some places where direct writes to stderr happen -- I'll probably never
> be able to guarantee --json-error-messages catches every possible stderr
> output. --[[Joey]]

@ -0,0 +1,13 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-11-19T17:12:41Z"
content="""
I think that you can accomplish what you want by making the directory
you're importing from a directory special remote with exporttree=yes
importtree=yes, and then using the new `git annex import master --from remote`.
If that does not do what you want, I'd prefer to look at making it be able
to do so. I hope to eventually remove the legacy git-annex import from
directory, since we have this new more general interface.
"""]]

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2019-11-13T19:37:16Z"
content="""
The signal 11 is very significant. It points to a problem in a lower-level
library (or ghc runtime), or perhaps a bad memory problem. git-annex does
not itself contain any code that can segfault, afaik.
Almost certainly the same as the other bug.
"""]]

@ -0,0 +1,73 @@
Originally I was trying to reproduce [datalad/issues/3653](https://github.com/datalad/datalad/issues/3653), assuming that multiple files pointed to the same key.
That was not the case, but my attempt revealed another bug: git-annex's inability to "obtain" files in parallel when several of them point to the same key:
<details>
<summary>setup of original repo (click to expand)</summary>
[[!format sh """
/tmp > mkdir src; (cd src; git init; git annex init; dd if=/dev/zero of=1 count=1024 bs=1024; for f in {2..10}; do cp 1 $f; done ; git annex add *; git commit -m added; )
Initialized empty Git repository in /tmp/src/.git/
init (scanning for unlocked files...)
ok
(recording state in git...)
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00106651 s, 983 MB/s
add 1
ok
add 10
ok
add 2
ok
add 3
ok
add 4
ok
add 5
ok
add 6
ok
add 7
ok
add 8
ok
add 9
ok
(recording state in git...)
[master (root-commit) 63b1163] added
10 files changed, 10 insertions(+)
create mode 120000 1
create mode 120000 10
create mode 120000 2
create mode 120000 3
create mode 120000 4
create mode 120000 5
create mode 120000 6
create mode 120000 7
create mode 120000 8
create mode 120000 9
"""]]
</details>
And this is what happens when we try to get the same key in parallel:
[[!format sh """
/tmp > git clone src dst; (cd dst; git annex get -J 5 *; )
Cloning into 'dst'...
done.
(merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)
get 2 (from origin...) (checksum...)
git-annex: thread blocked indefinitely in an STM transaction
failed
git-annex: thread blocked indefinitely in an MVar operation
"""]]
I feel like this is an old issue, but I failed to find a trace of it in a quick search.
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-11-13T16:34:34Z"
content="""
Reproduced.
After building git-annex with the DebugLocks flag, I got this:
debugLocks, called at ./Annex/Transfer.hs:248:18 in main:Annex.Transfer
debugLocks, called at ./CmdLine/Action.hs:263:26 in main:CmdLine.Action
Which points to pickRemote and ensureOnlyActionOn. But pickRemote
does no STM actions when there's only 1 remote, so it must really be
the latter.
Also, I notice that when 5 files to get are provided, it crashes, but with
fewer than 5, it succeeds.
Even this trivial case crashes: `git annex get -J1 1 2`
"""]]

@ -0,0 +1,83 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2019-11-13T17:07:29Z"
content="""
Ok, I see the bug. ensureOnlyActionOn does an STM
retry if it finds in the activekeys map that some other thread
is operating on the same key.
But there is no running STM transaction that will update
the map. So, STM detects that the retry would deadlock.
It's not really a deadlock, because once the other thread finishes,
it will update the map to remove itself. But STM can't know that.
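
The detection itself can be reproduced in isolation. This is a minimal sketch, not git-annex code: the names `waitForFlag` and `demo` are made up for illustration, and it assumes nothing beyond the `stm` and `base` libraries. A transaction retries on a `TVar` that no other thread can ever write, so the GHC runtime raises `BlockedIndefinitelyOnSTM` rather than hanging forever.

```haskell
import Control.Concurrent.STM
import Control.Exception

-- Hypothetical helper: block (via STM retry) until the flag becomes True.
waitForFlag :: TVar Bool -> IO ()
waitForFlag tv = atomically $ readTVar tv >>= check

-- Nothing else holds a reference to the TVar, so no thread can ever
-- set the flag. The runtime notices that the blocked thread can never
-- be woken and throws BlockedIndefinitelyOnSTM into it.
demo :: IO (Either BlockedIndefinitelyOnSTM ())
demo = do
    tv <- newTVarIO False
    try (waitForFlag tv)
```

Showing the `Left` value from `demo` should give the same "thread blocked indefinitely in an STM transaction" message as in the bug report. The runtime decides this from reachability of the `TVar`, so it cannot take into account a thread that would only update the variable by some route it cannot see.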
The solution will be to not use STM for waiting on the other thread.
Hmm, I tried the obvious approach, using a MVar semaphore to wait for the
thread, but that just resulted in more STM and MVar deadlocks.
I don't understand why after puzzling over it for two hours. I did
instrument all calls to atomically, and it looks, unfortunately, like
the one in finishCommandActions is deadlocking. If the problem extends
beyond ensureOnlyActionOn it may be much more complicated.
Here is a patch that does not work, and I don't know why:
diff --git a/CmdLine/Action.hs b/CmdLine/Action.hs
index 87298a95f..bf4bdd589 100644
--- a/CmdLine/Action.hs
+++ b/CmdLine/Action.hs
@@ -268,16 +268,30 @@ ensureOnlyActionOn k a = debugLocks $
go ConcurrentPerCpu = goconcurrent
goconcurrent = do
tv <- Annex.getState Annex.activekeys
- bracket (setup tv) id (const a)
- setup tv = liftIO $ do
+ bracketIO (setup tv) id (const a)
+ setup tv = do
+ mysem <- newEmptyMVar
mytid <- myThreadId
- atomically $ do
+ finishsetup <- atomically $ do
m <- readTVar tv
case M.lookup k m of
- Just tid
- | tid /= mytid -> retry
- | otherwise -> return $ return ()
+ Just (tid, theirsem)
+ | tid /= mytid -> return $ do
+ -- wait for the other
+ -- thread to finish, and
+ -- retry (STM retry would
+ -- deadlock)
+ readMVar theirsem
+ setup tv
+ | otherwise -> return $
+ -- same thread, so no
+ -- blocking
+ return $ return ()
Nothing -> do
- writeTVar tv $! M.insert k mytid m
- return $ liftIO $ atomically $
- modifyTVar tv $ M.delete k
+ writeTVar tv $! M.insert k (mytid, mysem) m
+ return $ return $ do
+ atomically $ modifyTVar tv $
+ M.delete k
+ -- indicate finished
+ putMVar mysem ()
+ finishsetup
diff --git a/Annex.hs b/Annex.hs
index 9eb4c5f39..936399ae7 100644
--- a/Annex.hs
+++ b/Annex.hs
@@ -143,7 +143,7 @@ data AnnexState = AnnexState
, existinghooks :: M.Map Git.Hook.Hook Bool
, desktopnotify :: DesktopNotify
, workers :: Maybe (TMVar (WorkerPool AnnexState))
- , activekeys :: TVar (M.Map Key ThreadId)
+ , activekeys :: TVar (M.Map Key (ThreadId, MVar ()))
, activeremotes :: MVar (M.Map (Types.Remote.RemoteA Annex) Integer)
, keysdbhandle :: Maybe Keys.DbHandle
, cachedcurrentbranch :: (Maybe (Maybe Git.Branch, Maybe Adjustment))
"""]]

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2019-11-13T19:07:49Z"
content="""
Tried going back to c04b2af3e1a8316e7cf640046ad0aa68826650ed,
which is before the separation of perform and cleanup stages.
The same code was in onlyActionOn back then. And the test case does not
crash.
So, that gives a good commit to start a bisection. Which will probably
find the bug was introduced in the separation of perform and cleanup stages,
because that added a lot of STM complexity.
(Have to cherry-pick 018b5b81736a321f3eb9762a2afb7124e19dbdf9
onto those old commits to make them build with current libraries.)
"""]]

@ -0,0 +1,83 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2019-11-13T21:22:07Z"
content="""
A simplified version of the patch above, which converts ensureOnlyActionOn
to not use STM at all.
With this patch, the test case still STM deadlocks. So this seems to be
proof that the actual problem is not in ensureOnlyActionOn.
diff --git a/Annex.hs b/Annex.hs
index 9eb4c5f39..9baf7755a 100644
--- a/Annex.hs
+++ b/Annex.hs
@@ -143,7 +143,7 @@ data AnnexState = AnnexState
, existinghooks :: M.Map Git.Hook.Hook Bool
, desktopnotify :: DesktopNotify
, workers :: Maybe (TMVar (WorkerPool AnnexState))
- , activekeys :: TVar (M.Map Key ThreadId)
+ , activekeys :: MVar (M.Map Key (ThreadId, MVar ()))
, activeremotes :: MVar (M.Map (Types.Remote.RemoteA Annex) Integer)
, keysdbhandle :: Maybe Keys.DbHandle
, cachedcurrentbranch :: (Maybe (Maybe Git.Branch, Maybe Adjustment))
@@ -154,7 +154,7 @@ data AnnexState = AnnexState
newState :: GitConfig -> Git.Repo -> IO AnnexState
newState c r = do
emptyactiveremotes <- newMVar M.empty
- emptyactivekeys <- newTVarIO M.empty
+ emptyactivekeys <- newMVar M.empty
o <- newMessageState
sc <- newTMVarIO False
return $ AnnexState
diff --git a/CmdLine/Action.hs b/CmdLine/Action.hs
index 87298a95f..a8c2bd205 100644
--- a/CmdLine/Action.hs
+++ b/CmdLine/Action.hs
@@ -22,7 +22,7 @@ import Remote.List
import Control.Concurrent
import Control.Concurrent.Async
import Control.Concurrent.STM
-import GHC.Conc
+import GHC.Conc (getNumProcessors)
import qualified Data.Map.Strict as M
import qualified System.Console.Regions as Regions
@@ -267,17 +267,22 @@ ensureOnlyActionOn k a = debugLocks $
go (Concurrent _) = goconcurrent
go ConcurrentPerCpu = goconcurrent
goconcurrent = do
- tv <- Annex.getState Annex.activekeys
- bracket (setup tv) id (const a)
- setup tv = liftIO $ do
+ mv <- Annex.getState Annex.activekeys
+ bracketIO (setup mv) id (const a)
+ setup mv = do
mytid <- myThreadId
- atomically $ do
- m <- readTVar tv
- case M.lookup k m of
- Just tid
- | tid /= mytid -> retry
- | otherwise -> return $ return ()
- Nothing -> do
- writeTVar tv $! M.insert k mytid m
- return $ liftIO $ atomically $
- modifyTVar tv $ M.delete k
+ m <- takeMVar mv
+ let ready sem = do
+ putMVar mv $! M.insert k (mytid, sem) m
+ return $ do
+ modifyMVar_ mv $ pure . M.delete k
+ putMVar sem ()
+ case M.lookup k m of
+ Nothing -> ready =<< newEmptyMVar
+ Just (tid, sem)
+ | tid /= mytid -> do
+ takeMVar sem
+ ready sem
+ | otherwise -> do
+ putMVar mv m
+ return noop
"""]]

@ -0,0 +1,25 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2019-11-13T21:42:58Z"
content="""
finishCommandActions is reaching the retry case, and STM deadlocks there.
The WorkerPool is getting into a state where allIdle is False, and is not
leaving it, perhaps due to an earlier STM deadlock. (There seem to be two
different ones.)
Also, I notice with --json-error-messages:
{"command":"get","note":"from origin...\nchecksum...","success":false,"key":"SHA256E-s524288--07854d2fef297a06ba81685e660c332de36d5d18d546927d30daad6d7fda1541","error-messages":["git-annex: thread blocked indefinitely in an STM transaction"],"file":"1"}
So the thread that actually gets to run on the key is somehow reaching a
STM deadlock.
Which made me wonder if that thread deadlocks on enteringStage.
And it seems so. If Command.Get is changed to use commandStages
rather than transferStages, the test case succeeds.
Like finishCommandActions, enteringStage has a STM retry if it needs to
wait for something to happen to the WorkerPool. So again it looks like
the WorkerPool is getting screwed up.
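
The waiting pattern described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the real code: `Worker` here is a stand-in with no stage information, and `waitAllIdle`, `finishOne` and `demoPool` are hypothetical names. The real WorkerPool carries more state.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

-- Simplified stand-in for a pool entry (the real one also records a stage).
data Worker = IdleWorker | ActiveWorker
    deriving (Eq, Show)

-- True when no worker in the pool is active.
allIdle :: [Worker] -> Bool
allIdle = all (== IdleWorker)

-- Block (via STM retry) until every worker in the pool is idle.
waitAllIdle :: TVar [Worker] -> STM ()
waitAllIdle pool = readTVar pool >>= check . allIdle

-- Mark the first active worker as idle, as a finishing thread would.
finishOne :: TVar [Worker] -> STM ()
finishOne pool = modifyTVar' pool go
  where
    go (ActiveWorker : ws) = IdleWorker : ws
    go (w : ws) = w : go ws
    go [] = []

-- One worker finishes shortly after the waiter starts waiting.
demoPool :: IO [Worker]
demoPool = do
    pool <- newTVarIO [ActiveWorker, IdleWorker]
    _ <- forkIO $ threadDelay 10000 >> atomically (finishOne pool)
    atomically (waitAllIdle pool)
    readTVarIO pool
```

In this sketch the waiter is only woken because another thread eventually runs `finishOne`. If a worker is left active and nothing will ever mark it idle, `waitAllIdle` can never succeed, and the retry ends in the "blocked indefinitely" error instead.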
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="catch-all deadlock breaker"
date="2019-11-13T22:33:59Z"
content="""
Not sure if feasible, but maybe a [[catch-all deadlock breaker|todo/more_extensive_retries_to_mask_transient_failures]] could be implemented to mask this and other deadlocks?
The moon landings software [[had something|https://www.ibiblio.org/apollo/hrst/archive/1033.pdf]] [[like this|https://history.nasa.gov/computers/Ch2-6.html]], and it worked [[pretty well|https://www.wsj.com/articles/apollo-11-had-a-hidden-hero-software-11563153001]]...
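
A rough sketch of what such a breaker might look like, assuming the deadlocks surface as GHC's `BlockedIndefinitelyOnSTM` and `BlockedIndefinitelyOnMVar` exceptions. The name `retryOnDeadlock` is hypothetical and nothing like it is claimed to exist in git-annex.

```haskell
import Control.Exception

-- Run an action, re-running it when the runtime reports that the
-- thread was blocked indefinitely. At most `attempts` runs are made;
-- after the last failed run the exception is rethrown.
retryOnDeadlock :: Int -> IO a -> IO a
retryOnDeadlock attempts action = action `catches`
    [ Handler (\e@BlockedIndefinitelyOnSTM -> again (toException e))
    , Handler (\e@BlockedIndefinitelyOnMVar -> again (toException e))
    ]
  where
    again e
        | attempts > 1 = retryOnDeadlock (attempts - 1) action
        | otherwise = throwIO e
```

This only masks the symptom. Re-running is safe only if the action can be repeated from the start, and shared state left behind by the failed run (such as a worker still marked active) may make the next attempt block in the same way.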
"""]]

@ -0,0 +1,69 @@
[[!comment format=mdwn
username="joey"
subject="""comment 6"""
date="2019-11-14T15:20:13Z"
content="""
Added tracing of changes to the WorkerPool.

    joey@darkstar:/tmp/dst>git annex get -J1 1 2 --json
    ("initial pool",WorkerPool UsedStages {initialStage = TransferStage, stageSet = fromList [TransferStage,VerifyStage]} [IdleWorker TransferStage,IdleWorker VerifyStage] 2)
    ("starting worker",WorkerPool UsedStages {initialStage = TransferStage, stageSet = fromList [TransferStage,VerifyStage]} [ActiveWorker TransferStage,IdleWorker VerifyStage] 1)

Transfer starts for file 1

    (("change stage from",TransferStage,"to",VerifyStage),WorkerPool UsedStages {initialStage = TransferStage, stageSet = fromList [TransferStage,VerifyStage]} [IdleWorker TransferStage,ActiveWorker VerifyStage] 1)

Transfer complete, verifying starts.

    ("starting worker",WorkerPool UsedStages {initialStage = TransferStage, stageSet = fromList [TransferStage,VerifyStage]} [ActiveWorker TransferStage,ActiveWorker VerifyStage] 0)

This second thread is being started to process file 2.
It starts in TransferStage, but it will be blocked from doing anything
by ensureOnlyActionOn.

    ("finishCommandActions starts with",WorkerPool UsedStages {initialStage = TransferStage, stageSet = fromList [TransferStage,VerifyStage]} [ActiveWorker TransferStage,ActiveWorker VerifyStage] 0)
    ("finishCommandActions observes",WorkerPool UsedStages {initialStage = TransferStage, stageSet = fromList [TransferStage,VerifyStage]} [ActiveWorker TransferStage,ActiveWorker VerifyStage] 0)

All files have threads to process them started, so finishCommandActions starts up.
It will retry since the threads are still running.

    (("change stage from",VerifyStage,"to",TransferStage),WorkerPool UsedStages {initialStage = TransferStage, stageSet = fromList [TransferStage,VerifyStage]} [IdleWorker VerifyStage,ActiveWorker TransferStage] 0)

The first thread is done with verification, and
the stage is being restored to transfer.

The 0 means that there are 0 spareVals. Normally, the number of spareVals
should be the same as the number of IdleWorkers, so it should be 1.
It's 0 because the thread is in the process of changing between stages.

The thread should at this point be waiting for an idle TransferStage
slot to become available. The second thread still has that active.
It seems that wait never completes, because a trace I had after that wait
never got printed.

    ("finishCommandActions observes",WorkerPool UsedStages {initialStage = TransferStage, stageSet = fromList [TransferStage,VerifyStage]} [IdleWorker VerifyStage,ActiveWorker TransferStage] 0)

It retries again, because of the active worker and also because spareVals
is not the same as IdleWorkers.

    git-annex: thread blocked indefinitely in an STM transaction

Deadlock.

Looks like that second thread that got into transfer stage
never leaves it, and then the first thread, which wants to
restore back to transfer stage, is left waiting forever for it. And so is
finishCommandActions.

Aha! The second thread is in fact still in ensureOnlyActionOn.
So it's waiting on the first thread to finish. But the first thread can't
transition back to TransferStage because the second thread has stolen it.
Now it makes sense.

So, one way to fix this would be to add a new stage, which is used for
threads that are just starting. Then the second thread would be in
StartStage, and the first thread would not be prevented from transitioning
back to TransferStage. Would need to make sure that, once a thread leaves
StartStage, it never transitions back to it.
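
To make the slot-stealing concrete, here is a toy model of the trace above. It is only a sketch under simplifying assumptions, not the real WorkerPool: the pool is a plain list of slots, spareVals are left out, and `claim`, `release`, `changeStage` and `scenario` are hypothetical names. It replays the sequence of events and reports whether the first thread can get back into TransferStage.

```haskell
import Control.Concurrent.STM
import Data.List (delete)

data Stage = StartStage | TransferStage | VerifyStage
    deriving (Eq, Show)

data Worker = IdleWorker Stage | ActiveWorker Stage
    deriving (Eq, Show)

type Pool = TVar [Worker]

-- | Take an idle slot for a stage, retrying until one is available.
claim :: Pool -> Stage -> STM ()
claim pool s = do
    ws <- readTVar pool
    if IdleWorker s `elem` ws
        then writeTVar pool (ActiveWorker s : delete (IdleWorker s) ws)
        else retry

-- | Give an active slot back to the pool.
release :: Pool -> Stage -> STM ()
release pool s =
    modifyTVar' pool (\ws -> IdleWorker s : delete (ActiveWorker s) ws)

-- | Move a worker from one stage to another.
changeStage :: Pool -> Stage -> Stage -> STM ()
changeStage pool from to = release pool from >> claim pool to

-- | Replay the trace. The second thread starts in the given stage.
-- Returns True if the first thread can return to TransferStage.
scenario :: [Stage] -> Stage -> IO Bool
scenario stages startingStage = atomically $ do
    pool <- newTVar (map IdleWorker stages)
    claim pool TransferStage                    -- thread 1 transfers file 1
    changeStage pool TransferStage VerifyStage  -- thread 1 starts verifying
    claim pool startingStage                    -- thread 2 starts for file 2
    release pool VerifyStage                    -- thread 1 is done verifying
    -- orElse turns "would block forever" into False instead of hanging.
    (claim pool TransferStage >> return True) `orElse` return False

main :: IO ()
main = do
    stolen <- scenario [TransferStage, VerifyStage] TransferStage
    fixed <- scenario [StartStage, TransferStage, VerifyStage] StartStage
    print (stolen, fixed) -- prints (False,True)
```

With only the two stages, the second thread occupies the TransferStage slot and the first thread's transition back would block. With a separate StartStage slot for threads that are just starting, the transition goes through.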
"""]]

View file

@ -0,0 +1,79 @@
### Please describe the problem.
Since then, a few changes have been made to the Makefile (I will not guess which one broke it), and as a result the git shipped in the neurodebian git-annex-standalone package is unable to clone over https://:
[[!format sh """
$> /usr/lib/git-annex.linux/git clone https://github.com/afni/afni_ci_test_data.git
Cloning into 'afni_ci_test_data'...
fatal: unable to find remote helper for 'https'
"""]]
<details>
<summary>diff between list of files in 7.20190819+git60-gcdb679818 and 7.20191017+git2-g7b13db551 package builds shows many git-* missing</summary>
[[!format sh """
lena:/tmp
$> ls 7.2019*/usr/lib/git-annex.linux/exe/
7.20190819/usr/lib/git-annex.linux/exe/:
cp@ git-diff-index@ git-mktag@ git-sh-i18n--envsubst@
curl@ git-diff-tree@ git-mktree@ git-shell@
git@ git-difftool@ git-multi-pack-index@ git-shortlog@
git-add@ git-fast-export@ git-mv@ git-show@
git-am@ git-fast-import@ git-name-rev@ git-show-branch@
git-annex@ git-fetch@ git-notes@ git-show-index@
git-annex-shell@ git-fetch-pack@ git-pack-objects@ git-show-ref@
git-annotate@ git-fmt-merge-msg@ git-pack-redundant@ git-stage@
git-apply@ git-for-each-ref@ git-pack-refs@ git-status@
git-archive@ git-format-patch@ git-patch-id@ git-stripspace@
git-bisect--helper@ git-fsck@ git-prune@ git-submodule--helper@
git-blame@ git-fsck-objects@ git-prune-packed@ git-symbolic-ref@
git-branch@ git-gc@ git-pull@ git-tag@
git-bundle@ git-get-tar-commit-id@ git-push@ git-unpack-file@
git-cat-file@ git-grep@ git-range-diff@ git-unpack-objects@
git-check-attr@ git-hash-object@ git-read-tree@ git-update-index@
git-check-ignore@ git-help@ git-rebase@ git-update-ref@
git-check-mailmap@ git-http-backend@ git-rebase--interactive@ git-update-server-info@
git-check-ref-format@ git-http-fetch@ git-receive-pack@ git-upload-archive@
git-checkout@ git-http-push@ git-reflog@ git-upload-pack@
git-checkout-index@ git-imap-send@ git-remote@ git-var@
git-cherry@ git-index-pack@ git-remote-ext@ git-verify-commit@
git-cherry-pick@ git-init@ git-remote-fd@ git-verify-pack@
git-clean@ git-init-db@ git-remote-ftp@ git-verify-tag@
git-clone@ git-interpret-trailers@ git-remote-ftps@ git-whatchanged@
git-column@ git-log@ git-remote-http@ git-worktree@
git-commit@ git-ls-files@ git-remote-https@ git-write-tree@
git-commit-graph@ git-ls-remote@ git-remote-testsvn@ localedef@
git-commit-tree@ git-ls-tree@ git-remote-tor-annex@ lsof@
git-config@ git-mailinfo@ git-repack@ rsync@
git-count-objects@ git-mailsplit@ git-replace@ sh@
git-credential@ git-merge@ git-rerere@ ssh@
git-credential-cache@ git-merge-base@ git-reset@ ssh-keygen@
git-credential-cache--daemon@ git-merge-file@ git-rev-list@ tar@
git-credential-store@ git-merge-index@ git-rev-parse@ uname@
git-daemon@ git-merge-ours@ git-revert@ xargs@
git-describe@ git-merge-recursive@ git-rm@
git-diff@ git-merge-subtree@ git-send-pack@
git-diff-files@ git-merge-tree@ git-serve@
7.20191017/usr/lib/git-annex.linux/exe/:
cp@ git-credential-cache--daemon@ git-http-push@ git-sh-i18n--envsubst@ sh@
curl@ git-credential-store@ git-imap-send@ git-shell@ ssh@
git@ git-daemon@ git-receive-pack@ git-upload-pack@ ssh-keygen@
git-annex@ git-fast-import@ git-remote-http@ localedef@ tar@
git-annex-shell@ git-http-backend@ git-remote-testsvn@ lsof@ uname@
git-credential-cache@ git-http-fetch@ git-remote-tor-annex@ rsync@ xargs@
"""]]
</details>
so maybe that is related.
Unfortunately, in datalad we had no test for cloning over https, so I added such an integration test in https://github.com/datalad/datalad/pull/3867 to at least detect such regressions in the future before they hit userland.
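
The comparison of the two listings can also be done mechanically. The following is a rough sketch, not part of any existing tooling: `missingFrom` is a hypothetical helper, and the two lists are short excerpts typed in from the listings above rather than read from the packages.

```haskell
import qualified Data.Set as Set

-- | Names present in the old listing but absent from the new one,
-- in ascending order. (Hypothetical helper for illustration.)
missingFrom :: [String] -> [String] -> [String]
missingFrom old new =
    Set.toAscList (Set.fromList old `Set.difference` Set.fromList new)

-- Excerpts of the remote helpers from the two exe/ listings above.
oldHelpers, newHelpers :: [String]
oldHelpers =
    [ "git-remote-ftp", "git-remote-ftps", "git-remote-http"
    , "git-remote-https", "git-remote-testsvn" ]
newHelpers = [ "git-remote-http", "git-remote-testsvn" ]

main :: IO ()
main = mapM_ putStrLn (missingFrom oldHelpers newHelpers)
```

On these excerpts it reports `git-remote-ftp`, `git-remote-ftps` and `git-remote-https` as missing, which matches the "unable to find remote helper for 'https'" failure.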
[[!meta author=yoh]]
[[!tag projects/datalad]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-11-14T18:14:35Z"
content="""
It will be caused either by 5463f97ca216cd261f7a1da08aa8a62cef415a71 or by
a new version of git reorganizing files (or both).
"""]]