reproduced and some analysis

This commit is contained in:
Joey Hess 2020-05-01 14:58:24 -04:00
parent f9f4f29cc2
commit e4a626018f
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
2 changed files with 75 additions and 0 deletions

View file

@ -0,0 +1,35 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2020-05-01T17:53:57Z"
content="""
I was able to reproduce the progress bar sticking at 100%. I made two local
repos in the assistant, each a remote of the other (not using ssh), and
added a file to one. It reached 100% fairly quickly, and then sat there
for a minute or so, before getting unstuck.
(The bug submitter's log includes "hPutStr: illegal operation (handle is
closed)" and I did not always see that in my test. May be unrelated.)
While stuck like that, I noticed that git-annex info also said the transfer
was still in progress. So apparently transferkeys is being delayed from
finishing the transfer for some reason.
Looking at ps, I noticed that "git-annex smudge --clean -- thefile" was a
child process of transferkeys. (The file was unlocked, and was the file I
had just added and that it was stuck on.) This may be a red herring; I
don't always see that when it's stuck.
On a hunch, I stopped the assistant process that was running in the sibling
repo of the one where I was adding files. So there was only a single
assistant running, not two. That seems to have gotten rid of the delay, or
at least most of it. It still takes around 6s longer than I would have
expected at the end.
Running transferkeys manually, and feeding it requests in its internal
protocol, transfers happened in well under a second. It did not do any
smudging then.
My guess so far is that something about the two assistants causes a race,
which it has to delay for a while to recover from.
"""]]

View file

@ -0,0 +1,40 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2020-05-01T18:57:22Z"
content="""
stracing transferkeys after it got stuck, it was stuck until
this completed:
wait4(461218, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 461218
That pid was "git --git-dir=../annex2/.git --work-tree=../annex2
--literal-pathspecs --literal-pathspecs -c core.safecrlf=false update-index
-q --refresh -z --stdin", and it did have a git-annex smudge --clean that
seemed stuck for a while.
Then it proceeded doing this:
rename("../annex2/.git/annexindex.0/index", "../annex2/.git/index") = 0
annex2 is the remote. That temp directory and the git update-index process
says restagePointerFile is what got stuck.
I managed to strace the stuck git-annex smudge --clean, and it's in a loop:
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
(Just before that loop, it seemed to be waiting for some
child processes, but I didn't catch which ones they were.)
After many iterations:
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
munmap(0x7f4438549000, 41947136) = 0
munmap(0x7f443ad4a000, 41947136) = 0
exit_group(0) = ?
+++ exited with 0 +++
"""]]