Hypermail

From: Willy Tarreau <w#1wt.eu>
Date: Sat, 16 Feb 2008 19:54:46 +0100

On Sat, Feb 16, 2008 at 01:20:03PM -0500, Marc Breslow wrote:
> Thanks for continuing to help me trouble-shoot here. Wasn't able to upgrade
> the kernel today as we had issues with failover. Will try again next
> saturday morning.

OK.

> You asked:
> What's the approximative session rate and data rate ?
>
> We're peaking at about 30Mbps. There are two types of data going through
> haproxy:
> - HTTP - 90% of requests have maybe 30 bytes of data outside of the HTTP
> overhead. These 90% requests are AJAX polling calls that are *held open on
> the server for 20 seconds* before returning a response that effectively says
> "no messages waiting for you".

yes, fantastic "light" technology... :-)

> - TCP - there's a lot of data going through here at high rates. I'd
> estimate that this makes up 20Mbps of the 30. Yesterday for about 60
> minutes we didn't have the TCP traffic going through here. We were
> bypassing the load balancer during that period. The CPU usage of haproxy
> remained at the same 100% level during that period. These TCP sessions are
> persistent, btw. There are about 100 of them and once the sockets are
> established they stick around for days at a clip.

OK, however this is still what I consider a small to medium load.

> You asked about the size of the machine:
> Dual Core Pentium E2160, 1.8 GHz, 1MB Cache, 800MHz Front Side Bus
> 2GB RAM
it's already quite a correct setup !

> Glad to hear the configuration sounds ok and that health-checks are not an
> issue at all.
>
> Here's the stats output from strace:
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 93.05 3.276951 1146 2859 poll
> 1.56 0.054937 12 4532 8 send
> 1.29 0.045360 5 8370 40 recv
> 1.04 0.036664 9 3941 close
> 0.68 0.023890 7 3235 1314 accept
> 0.67 0.023423 12 1938 1926 connect
> 0.47 0.016556 9 1926 socket
> 0.37 0.013019 7 1877 shutdown
> 0.36 0.012678 3 3847 setsockopt
> 0.30 0.010513 3 3847 fcntl64
> 0.21 0.007548 3 2860 gettimeofday
> 0.00 0.000103 4 28 getsockopt
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 3.521642 39260 3288 total

Ah very interesting things there. I think I should suggest people do this more often.

poll() is used instead of epoll(). That means that for large numbers of file descriptors, the system spends a lot of time there. That's strange, I thought you were running with epoll. Could you run "haproxy -vv" (and nothing else so that it does not conflict with your running config) ? Here it says :

willy#pcw:haproxy-1.3$ ./haproxy -vv
HA-Proxy version 1.3.14-55 2008/02/15
Copyright 2000-2007 Willy Tarreau <w#1wt.eu>

Build options :

  TARGET  = linux24e
  CPU     = pm
  CC      = gcc-4.1
  CFLAGS  = -g -DBUFSIZE=8030 -DMAXREWRITE=1030 -DSYSTEM_MAXCONN=40000
  OPTIONS = USE_CTTPROXY=1 USE_LINUX_TPROXY=1 USE_REGPARM=1 USE_STATIC_PCRE=1

Now, when you start it with this simple fake config with -V, it will report which pollers are available and the one it will use :

willy#pcw:haproxy-1.3$ cat status.cfg
backend test

balance roundrobin

willy#pcw:haproxy-1.3$ ./haproxy -V -f status.cfg Available polling systems :

     sepoll : pref=400,  test result OK
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK

Total: 4 (4 usable), will use sepoll.
Using sepoll() as the polling mechanism. <<<===== this one is good [ALERT] 046/195318 (26338) : [./haproxy.main()] No enabled listener found (check the <listen> keywords) ! Exiting.

Now, analyzing your trace, I see that you get a medium load :

> 0.68 0.023890 7 3235 1314 accept

3235-1314 = 1921 connections accepted in 3.52 seconds = 545 connections/s.

> 0.67 0.023423 12 1938 1926 connect

Don't focus on the accept() errors, they are caused by the last one in a series returning EAGAIN which is expected in asynchronous accept(). Same for connect errors here, they are mostly caused by EINPROGRESS due to asynchronous connect().

What is very interesting is the first line :

> 93.05 3.276951 1146 2859 poll

haproxy spends 93% of its time in poll(), which means it does almost nothing.

Here's for example how it behaves on my Geode at 600 hits/s under strace :

root#ALOHA-500:~# strace -c -p 945
Process 945 attached - interrupt to quit Process 945 detached
% time seconds usecs/call calls errors syscall

------ ----------- ----------- --------- --------- ----------------
 46.74    0.016419           6      2817           shutdown
 18.65    0.006553           1      5746           send
  9.51    0.003342           0     11424      2844 recv
  8.00    0.002809           0      5632           close
  5.57    0.001957           1      2861      2861 connect
  3.25    0.001143           0      5709           fcntl64
  2.11    0.000742           0      2861           socket
  2.04    0.000717           4       185           epoll_wait
  1.91    0.000671           0      3014       166 accept
  1.57    0.000550           0      5709           setsockopt
  0.64    0.000225           0      2844           epoll_ctl
  0.00    0.000000           0       327           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    0.035128                 49129      5871 total

As you can see, the time is spend in "real" system calls such as shutdown, send, recv, ... and epoll_wait() which is the equivalent of poll() only accounts for 2% instead of 93% like in your case.

So my guess is that poll() is already going mad in your case with 600 conns/s and a few hundred concurrent connections. That seems very low, but clearly possible.

If you're doubting about the build options, please try with the pre-compiled binary from the haproxy site. That way you will be sure. I think we're on the way to solve your issue, keep patience :-)

Best regards,
Willy Received on 2008/02/16 19:54

Re: high cpu utilization