Re: Running haproxy on the cluster nodes

From: Willy Tarreau <w#1wt.eu>
Date: Wed, 12 Dec 2007 07:02:14 +0100


On Tue, Dec 11, 2007 at 08:15:51PM -0500, Martin Goldman wrote:
> Thanks Willy. I got excited there for a second, but I tried making the
> suggested changes, and would you believe me if I said it didn't seem to
> help?
>
> As you suggested, I updated the conf file with maxconn set to 100000, just
> so I could be sure that wouldn't be a bottleneck. I recompiled for linux26,
> and got the appropriate output this time:
>
> martin#kramer:~/haproxy-1.3.13.1$ sudo /usr/sbin/haproxy -f /etc/haproxy.cfg
> -V
> Available polling systems :
> sepoll : pref=400, test result OK
> epoll : pref=300, test result OK
> poll : pref=200, test result OK
> select : pref=150, test result OK
> Total: 4 (4 usable), will use sepoll.
> Using sepoll() as the polling mechanism.
>
> I re-ran the apachebench, and the requests per second achieved were still
> lower than that of each of the web servers individually. I did two tests:
>
> - 500-byte file, 1000 concurrent requests, 500,000 total requests:
> Individual node = 14,000 requests/second; Cluster = 13,100 requests/second
> - 100 KB file, 100 concurrent requests, 50,000 total requests: Individual
> node = 780 requests/second; Cluster = 400 requests/second

This is really not expected.

Among the possibilities I see, a poor network chip or driver would best explain these symptoms. For instance, if you only have a PCI NIC, it is limited to around 800 Mbps in+out, which would explain why you don't saturate the gigabit with your apache alone, and why throughput is halved through haproxy. But as I said, I would expect such machines to have decent chips.
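To make the bandwidth argument concrete, here is a back-of-the-envelope awk sketch (assuming the 100 KB file is exactly 102400 bytes, and counting body bytes only, ignoring headers and TCP/IP overhead) that converts the request rates reported above into line rates:

```shell
# Line rates implied by the 100 KB test figures (body bytes only,
# assuming 100 KB = 102400 bytes; protocol overhead ignored):
awk 'BEGIN {
    mbit = 102400 * 8 / 1000000            # Mbit per response body
    printf "individual node: %.0f Mbit/s\n", 780 * mbit
    printf "through haproxy: %.0f Mbit/s per side, %.0f Mbit/s in+out\n", \
           400 * mbit, 2 * 400 * mbit
}'
```

The node alone moves roughly 639 Mbit/s, already short of a gigabit, and the proxy carries about 655 Mbit/s in+out, which is suspiciously close to the ~800 Mbps practical ceiling of a 32-bit/33 MHz PCI bus mentioned above.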

In fact, it is essential that you first manage to reach gigabit speed on your individual nodes. As long as we don't know why that is not possible, we won't find any solution.

> I don't know if there's anything else you can think of, but I certainly
> appreciate all the ideas thus far.

I also have another idea for an additional test. By default, apache-bench uses keep-alive requests, but haproxy transforms them into close requests. Since a keep-alive request needs fewer packets than a close request, and since we don't yet know what is happening at the network level, it is possible that this difference explains the performance drop.

You could try running apache-bench with keep-alive disabled on the individual nodes (I believe it is the -k option but I'm not sure), then comment out the "option httpclose" line in your haproxy config so that it does not transform the requests. It will not be good for logging and load balancing, but it will show whether this is what lowers your performance.
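As a sketch of that comparison (host names and file paths here are placeholders, not from the original setup; note that in ab's documented semantics, -k is the flag that *enables* keep-alive, and requests are close requests by default):

```shell
# Bench one node directly with close requests (ab's default, no -k),
# which matches what haproxy produces when "option httpclose" is set:
ab -n 50000 -c 100 http://node1/100k.bin

# Then, after commenting out "option httpclose" in haproxy.cfg so the
# proxy no longer rewrites the Connection header, bench both a node
# and the cluster with keep-alive enabled for an apples-to-apples run:
ab -k -n 50000 -c 100 http://node1/100k.bin
ab -k -n 50000 -c 100 http://cluster-vip/100k.bin
```

If the node-vs-cluster gap shrinks once both sides use the same connection mode, the keep-alive/close difference is the likely culprit.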

Last, are you sure that both of your nodes respond correctly for the 500-byte file? If only one of them responds fast, the cluster as a whole will show lower performance.
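A quick way to check this (a sketch only; the node names and file path are placeholders) is to time the small file on each backend directly and compare:

```shell
# Fetch the 500-byte file from each backend and print the total
# transfer time; both nodes should report similar figures:
for h in node1 node2; do
    curl -s -o /dev/null -w "$h: %{time_total}s\n" "http://$h/500b.html"
done
```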

Best regards,
Willy
