Added a comment: Many network sockets with associated fds hanging around

https://www.google.com/accounts/o8/id?id=AItOawlmRpGORNKWimtzqItvwm4I6cn16vx8OvU 2013-08-15 18:00:43 +00:00 committed by admin
parent 244b14d14d
commit f5829d0fb7

@ -0,0 +1,137 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlmRpGORNKWimtzqItvwm4I6cn16vx8OvU"
nickname="hayden"
subject="Many network sockets with associated fds hanging around"
date="2013-08-15T18:00:42Z"
content="""
I see something similar in the logs, and after roughly 10 minutes the webapp dies.
So I think I am hitting the same problem as the user above.
Sometimes I get thread deaths and restart requests, but the root cause appears to match the scenario mentioned above.
Often the webapp just hangs, but always when it hits the fd ulimit, which is 1024 on this system.
git-annex version is 4.20130802-g1452ac3, from the statically linked Linux binary tar.gz download.
Test setup is a from-scratch assistant startup on Ubuntu 12.04.
Not exactly a clean Ubuntu though, so it may be difficult to reproduce the problem at your end.
I fired up the webapp with a cleaned-out config. No signs of leaks until an annex is created.
After creating an empty annex, fds leak at about 1 per second.
Strace'ing the main process only shows 8-byte writes (see below) at the same rate as the leak.
Sometimes the fd leak stops before the resource limit, sometimes not.
Creating a new annex on top of an existing directory tree with many files is a pretty reliable trigger though.
Startup scan finishes and fds leak away until the ulimit is hit.

    hayden@orca:~/gamma$ ls /proc/26319/fd
    0 10 12 14 16 18 2 21 23 25 27 29 30 32 34 36 38 4 41 43 45 6 8
    1 11 13 15 17 19 20 22 24 26 28 3 31 33 35 37 39 40 42 44 5 7 9
    hayden@orca:~/gamma$ ls /proc/26319/fd
    0 10 12 14 16 18 2 21 23 25 27 29 30 32 34 36 38 4 41 43 45 6 8
    1 11 13 15 17 19 20 22 24 26 28 3 31 33 35 37 39 40 42 44 5 7 9
    hayden@orca:~/gamma$ ls /proc/26319/fd
    0 10 12 14 16 18 2 21 23 25 27 29 30 32 34 36 38 4 41 43 45 5 7 9
    1 11 13 15 17 19 20 22 24 26 28 3 31 33 35 37 39 40 42 44 46 6 8
    hayden@orca:~/gamma$ ls /proc/26319/fd/43
    /proc/26319/fd/43
    hayden@orca:~/gamma$ ls -l /proc/26319/fd/43
    ls -l /proc/26319/fd/43
    lrwx------ 1 hayden hayden 64 Aug 14 21:10 /proc/26319/fd/43 -> socket:[568994]
    hayden@orca:~/gamma$ lsof | grep 568994
    git-annex 26319 hayden 43u IPv4 568994 0t0 UDP 224.0.0.251:55556
    hayden@orca:~/gamma$ uname -a
    Linux orca 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
    hayden@orca:~/gamma$ cat /etc/issue
    Ubuntu 12.04 LTS \n \l
    hayden@orca:~/gamma$ fuser 55556/udp -a
    55556/udp: 4415 26319 27080 27083
    hayden@orca:~/gamma$ ps aux | grep 4415
    hayden 4415 0.0 0.0 66716 3036 ? S Aug12 0:03 curl -s --head -L http://127.0.0.1:38464/?auth=da9c4aba4cc2db9cf78574753f6e94d8031c6a7bdf8bfe100bde868f57b81fd751965cc9d68a9afac79f826d257a256a04ce62615a7f23cd7c925969dda1c7b8 -w %{http_code}
    hayden 27458 0.0 0.0 10612 924 pts/0 S+ 21:50 0:00 grep --color=auto 4415
    hayden@orca:~/gamma$ ps aux | grep 26319
    hayden 26319 0.1 0.7 497188 28560 pts/5 Sl 21:09 0:04 git-annex webapp
    hayden 26338 3.4 3.4 1035572 137864 pts/5 Sl 21:09 1:25 /usr/lib/firefox/firefox /tmp/webapp26319.html
    hayden 27460 0.0 0.0 10612 920 pts/0 S+ 21:50 0:00 grep --color=auto 26319
    hayden@orca:~/gamma$ ps aux | grep 27080
    hayden 27080 0.0 0.0 16476 1288 pts/5 S 21:33 0:00 git --git-dir=/home/hayden/boo/.git --work-tree=/home/hayden/boo cat-file --batch
    hayden 27462 0.0 0.0 10612 924 pts/0 S+ 21:50 0:00 grep --color=auto 27080
    hayden@orca:~/gamma$ ps aux | grep 27083
    hayden 27083 0.0 0.0 16476 1060 pts/5 S 21:33 0:00 git --git-dir=/home/hayden/boo/.git --work-tree=/home/hayden/boo check-attr -z --stdin annex.backend annex.numcopies --
    hayden 27464 0.0 0.0 10612 920 pts/0 S+ 21:51 0:00 grep --color=auto 27083

----> has 579 open fds at this point but this number holds stable over 10 min
(copy in new tree to provoke)
(no change)
----> restart daemon in gui to provoke
(new process has open fds slowly climbing after startup scan)
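
The manual `ls -l /proc/<pid>/fd` inspection above can be scripted. The following is only a rough sketch, assuming a Linux `/proc` layout; the function name `classify_fds` is made up for illustration and is not part of git-annex. It counts a process's open fds by kind, so a growing `socket` count stands out from ordinary files.

```python
import os
from collections import Counter

# Link-target prefixes the kernel uses for non-file descriptors.
SPECIAL_PREFIXES = ('socket:', 'pipe:', 'anon_inode:')

def classify_fds(fd_dir):
    # Count the symlinks in an fd directory (e.g. /proc/26319/fd) by kind.
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir and readlink; skip it.
            continue
        if target.startswith(SPECIAL_PREFIXES):
            kind = target.split(':', 1)[0]
        else:
            kind = 'file'
        counts[kind] += 1
    return counts
```

Calling `classify_fds('/proc/26319/fd')` once a minute and comparing the `socket` count between samples should show whether the leaked fds are all sockets like fd 43.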
The strace output looks like this, repeating every second (trimmed):

    futex(0x34f001c, FUTEX_WAIT_PRIVATE, 51, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 202
    futex(0x34f001c, FUTEX_WAIT_PRIVATE, 51, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 202
    futex(0x34f001c, FUTEX_WAIT_PRIVATE, 51, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a) = 202
    futex(0x34f001c, FUTEX_WAIT_PRIVATE, 51, NULL) = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    write(6, \"\377\0\0\0\0\0\0\0\", 8) = 8
    rt_sigreturn(0x2) = 202
    futex(0x34f001c, FUTEX_WAIT_PRIVATE, 51, NULL^C <unfinished ...>
    Process 29803 detached

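
The 8-byte payload in the `write` call is the size of an unsigned 64-bit counter, which is the only write size a Linux eventfd accepts. Whether fd 6 really is an eventfd was not checked here, so treat that as an assumption; `ls -l /proc/<pid>/fd/6` would settle it. A small sketch of how the payload decodes on this little-endian x86_64 machine (the helper name `decode_counter` is illustrative):

```python
import struct

# Payload copied from the strace line above (octal \377 is 0xff).
payload = b'\xff\x00\x00\x00\x00\x00\x00\x00'

def decode_counter(buf):
    # Interpret exactly 8 bytes as a little-endian unsigned 64-bit integer.
    (value,) = struct.unpack('<Q', buf)
    return value

print(decode_counter(payload))  # prints 255
```

If that reading is right, the write is a once-per-second wakeup signal rather than data, and the leaked descriptors come from whatever that wakeup triggers, not from the write itself.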
After roughly 10 minutes the fd count reaches the limit:

    hayden@orca:~/boo$ ls /proc/29803/fd | wc -l
    1017
    hayden@orca:~/boo$ ls /proc/29803/fd | wc -l
    1023
    hayden@orca:~/boo$ ls /proc/29803/fd | wc -l
    1024
    hayden@orca:~/boo$ ls /proc/29803/fd | wc -l
    1024
    hayden@orca:~/boo$ ls /proc/29803/fd | wc -l
    1024
    hayden@orca:~/boo$ ls /proc/29803/fd | wc -l
    1024
    hayden@orca:~/boo$ ulimit -a
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 31164
    max locked memory (kbytes, -l) 64
    max memory size (kbytes, -m) unlimited
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 31164
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited
    hayden@orca:~/boo$

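
`ulimit -a` reports the limits of the shell, not necessarily those of the running daemon; the per-process values are in `/proc/<pid>/limits`. The sketch below is a hedged illustration of comparing an fd count against that soft limit. The helper names and the `margin` parameter are hypothetical, and the hard limit in the usage example is a placeholder, since only the soft limit of 1024 was observed.

```python
def soft_nofile_limit(limits_text):
    # Return the soft 'Max open files' limit from /proc/<pid>/limits text,
    # or None if the line is missing or the limit is unlimited.
    for line in limits_text.splitlines():
        if line.startswith('Max open files'):
            # Columns: 'Max', 'open', 'files', soft, hard, units
            soft = line.split()[3]
            return None if soft == 'unlimited' else int(soft)
    return None

def near_limit(open_fds, soft_limit, margin=16):
    # True when fewer than `margin` descriptors remain below the soft limit.
    return soft_limit is not None and open_fds >= soft_limit - margin
```

For example, with `text = 'Max open files  1024  4096  files'` (hard limit value invented), `near_limit(1017, soft_nofile_limit(text))` is true, while the stable 579 seen earlier is not close to the limit.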
Once the limit is reached the webapp hangs, and a number of interesting crashes may occur. I've also seen the particular error from the previous user's log (on a big tree).

    [2013-08-14 21:56:12 CEST] main: starting assistant version 4.20130802-g1452ac3
    (scanning...) [2013-08-14 21:56:12 CEST] Watcher: Performing startup scan
    (started...) DaemonStatus crashed: /home/hayden/boo/.git/annex/: openTempFile: resource exhausted (Too many open files)
    [2013-08-14 22:06:12 CEST] DaemonStatus: warning DaemonStatus crashed: /home/hayden/boo/.git/annex/: openTempFile: resource exhausted (Too many open files)

Is any of the above helpful? Is there anything else useful you'd like me to test?
I'd guess something odd in my Ubuntu setup provokes this, as otherwise more users would see it.
"""]]