Hi!
I have tried to benchmark HAProxy 1.4.15 using two Spirent Avalanche appliances. The HAProxy box has the following NICs:
driver: igb
version: 2.1.0-k2
firmware-version: 1.2-1
Those are multiqueue cards and their interrupts are spread across the 8 processors for both TX and RX.
http://www.intel.com/products/server/adapters/pro1000pt-dualport/pro1000pt-dualport-overview.htm
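For reference, this is roughly how I check that the interrupts are really spread across the CPUs (the interface name and IRQ number below are just examples, not my exact values):

    grep eth0 /proc/interrupts          # one line per TX/RX queue, with per-CPU interrupt counters
    cat /proc/irq/45/smp_affinity       # affinity mask of one of those queue IRQs (45 is an example)
    # pinning that queue to CPU 1 would be: echo 2 > /proc/irq/45/smp_affinity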
The setup is fairly simple. The HAProxy box is connected to a Nortel 5530 switch using an active/active bond (balance-xor on the Linux side, MLT on the Nortel side). Both Avalanches are also connected to this switch using 2 links each. One of them acts as a reflector (web server). Each link is mapped to a set of clients (on the regular Avalanche) or acts as a set of servers (on the reflector).
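For completeness, the Linux side of the bond looks roughly like this (interface names and the address are examples, not my exact configuration):

    modprobe bonding
    echo balance-xor > /sys/class/net/bond0/bonding/mode    # mode 2, matching the static MLT on the Nortel side
    echo 100 > /sys/class/net/bond0/bonding/miimon          # link monitoring every 100 ms
    ip link set bond0 up
    ifenslave bond0 eth0 eth1                               # the two links going to the 5530
    ip addr add 172.31.200.10/24 dev bond0                  # example address only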
Offloading is enabled.
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
MTU is set to 1500 (no jumbo frames)
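(The settings above were read with ethtool; "eth0" below is just a placeholder for the actual interface:)

    ethtool -i eth0                                  # driver / version / firmware, as quoted above
    ethtool -k eth0                                  # the offload settings listed above
    ethtool -K eth0 rx on tx on sg on tso on gso on  # how I would (re)enable them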
╭────────────────────────────────────────────────────╮
│ 5530 switch ┌─┐  ┌─┐  ┌─┐  ┌─┐  ┌─┐  ┌─┐  ┌─┐  ┌─┐ │
│             └┬┘  └┬┘  └┬┘  └┬┘  └┬┘  └┬┘  └─┘  └─┘ │
╰──────────────┼────┼────┼────┼────┼────┼────────────╯
               │    │    │    │    │    └─────────────┐
               │    │    │    │    └──────────────┐   │
╭──────────────┼────┼─╮╭─┼────┼────────────────╮  │   │
│ Avalanche   ┌┴┐  ┌┴┐││┌┴┐  ┌┴┐  Avalanche    │  │   │
│ (clients)   └─┘  └─┘││└─┘  └─┘ (reflector)   │  │   │
╰─────────────────────╯╰───────────────────────╯  │   │
                                                  │   │
                                       ╭──────────┼───┼─╮
                                       │ HAProxy ┌┴┐ ┌┴┐│
                                       │         └─┘ └─┘│
                                       ╰─────────────────╯
The Avalanche simulates 256 clients on each port, which target the 4 IPs configured in HAProxy. The reflector simulates 4 web servers, 2 on each port. Those servers serve 1 KB pages. Here is my haproxy configuration:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    user haproxy
    group haproxy
    nbproc 1
    daemon
    stats socket /var/run/haproxy.socket

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option splice-auto
    retries 3
    option redispatch
    contimeout 5s
    clitimeout 50s
    srvtimeout 50s

listen poolbench
    bind 172.31.200.10:80
    bind 172.31.201.10:80
    bind 172.31.202.10:80
    bind 172.31.203.10:80
    mode http
    option splice-response
    stats enable
    option httpchk /
    option dontlog-normal
    option log-health-checks
    balance roundrobin
    server real1 172.31.208.2:80
    server real2 172.31.209.2:80
    server real3 172.31.210.2:80
    server real4 172.31.211.2:80
HA-Proxy version 1.4.15 2011/04/08
Copyright 2000-2010 Willy Tarreau <w#1wt.eu>
Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_REGPARM=1 USE_PCRE=1

Default settings :
Encrypted password support via crypt(3): yes
Available polling systems :
      sepoll : pref=400, test result OK
       epoll : pref=300, test result OK
        poll : pref=200, test result OK
      select : pref=150, test result OK
Total: 4 (4 usable), will use sepoll.
With this configuration, I get 10 000 HTTP req/s. The haproxy process takes 100% CPU. Changing "maxconn" or disabling splice does not change anything. If I use 6 haproxy processes, I can get to 30 000 HTTP req/s; all haproxy processes then take 100% CPU. Moreover, I am pretty sure the Avalanche is not the bottleneck, since we can bench more than 120 000 HTTP req/s with the same setup. I have also tried to pin haproxy to 1 CPU (with taskset) and I still get 10 000 HTTP req/s.
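To be clear about what I mean by those two tests, it was something along these lines (config path and CPU number are examples):

    # 6-process test: same configuration, except "nbproc 6" in the global section
    haproxy -f /etc/haproxy/haproxy.cfg
    # single-process test pinned to one CPU:
    taskset -c 0 haproxy -f /etc/haproxy/haproxy.cfg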
Now, if I look at http://haproxy.1wt.eu/#perf, I see that I should be able to achieve 40 000 HTTP req/s, which is four times what I am getting. What is wrong with my setup? Why does enabling/disabling splice not affect my results? Is there a way to fetch the 2.6.27-wt5 kernel used for those tests?
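For what it's worth, I suppose something like this would show whether splice() is actually being called (strace adds a lot of overhead, so only as a quick check; it assumes a single haproxy process):

    strace -c -p "$(pidof haproxy)"    # let it run during the bench, then Ctrl-C
    # splice should show up prominently in the syscall summary when splicing is effective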
A side question now. Enabling the use of multiple processes would make it possible to leverage the power of modern multi-core machines (we now get 6 cores per CPU on recent servers). However, this is discouraged. One drawback is the inability to get reliable stats. Is this being worked on? We could spawn a master process that exposes the stats socket. This master would gather stats from the other processes, using the same protocol as on the socket but over pipes, then aggregate them and send the result back to the client.
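In the meantime, an external script querying one stats socket per process gives an idea of what this aggregation would look like (the socket names below are invented, assuming each process gets its own socket):

    for sock in /var/run/haproxy?.socket; do
        echo "show stat" | socat stdio "unix-connect:$sock"
    done | awk -F, '
        /^#/ { header = $0; next }      # keep one copy of the CSV header
        NF   { stot[$1","$2] += $8 }    # sum cumulative sessions (stot, 8th field) per proxy/server
        END  { print header
               for (k in stot) print k, stot[k] }'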
Thanks for any insight on the performance part.