Hi!
I have tried to benchmark HAProxy 1.4.15 using two Spirent Avalanche appliances. The HAProxy box uses the following network cards:
driver: igb
version: 2.1.0-k2
firmware-version: 1.2-1
These are multiqueue cards, and their TX and RX interrupts are spread across the 8 processors.
http://www.intel.com/products/server/adapters/pro1000pt-dualport/pro1000pt-dualport-overview.htm
The setup is fairly simple. The HAProxy box is connected to a Nortel 5530 switch using an active/active bond (balance-xor on the Linux side, MLT on the Nortel side). Both Avalanches are also connected to this switch using 2 links each. One of them acts as a reflector (web server). Each link is mapped to a set of clients (for the regular Avalanche) or acts as a set of servers (for the reflector).
Offloading is enabled.
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
The MTU is set to 1500 (no jumbo frames).
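For reference, here is the kind of quick check I would use to confirm that the queue interrupts really land on all 8 cores (my own rough sketch, not part of the bench itself; the "eth" match is just an assumption about the interface names):

#!/usr/bin/env python3
# Rough sketch: print per-CPU interrupt counters for the NIC queues,
# to confirm that the igb RX/TX queues are spread across the cores.
# The "eth" substring match is an assumption; adjust to your interface names.

def nic_irq_spread(match="eth"):
    with open("/proc/interrupts") as f:
        cpus = f.readline().split()            # header line: CPU0 CPU1 ... CPU7
        for line in f:
            if match not in line:
                continue
            fields = line.split()
            irq = fields[0].rstrip(":")
            counts = fields[1:1 + len(cpus)]   # one counter per CPU
            name = " ".join(fields[1 + len(cpus):])
            print(irq, name, dict(zip(cpus, counts)))

if __name__ == "__main__":
    nic_irq_spread()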
╭──────────────────────────────────────────────────╮
│ 5530 switch   ┌─┐ ┌─┐  ┌─┐ ┌─┐  ┌─┐ ┌─┐ ┌─┐ ┌─┐  │
│               └┬┘ └┬┘  └┬┘ └┬┘  └┬┘ └┬┘ └─┘ └─┘  │
╰────────────────┼───┼────┼───┼────┼───┼───────────╯
                 │   │    │   │    │   └─────────────┐
                 │   │    │   │    └─────────────┐   │
╭────────────────┼───┼─╮╭─┼───┼────────────────╮ │   │
│ Avalanche     ┌┴┐ ┌┴┐││┌┴┐ ┌┴┐ Avalanche     │ │   │
│ (clients)     └─┘ └─┘││└─┘ └─┘ (reflector)   │ │   │
╰──────────────────────╯╰──────────────────────╯ │   │
                                                 │   │
                                      ╭──────────┼───┼─╮
                                      │ HAProxy ┌┴┐ ┌┴┐│
                                      │         └─┘ └─┘│
                                      ╰────────────────╯
The Avalanche simulates 256 clients on each port, hitting the 4 IP addresses configured in HAProxy. The reflector simulates 4 web servers, 2 on each port. Those servers serve 1 KB pages. Here is my haproxy configuration:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    user haproxy
    group haproxy
    nbproc 1
    daemon
    stats socket /var/run/haproxy.socket

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option splice-auto
    retries 3
    option redispatch
    contimeout 5s
    clitimeout 50s
    srvtimeout 50s

listen poolbench
    bind 172.31.200.10:80
    bind 172.31.201.10:80
    bind 172.31.202.10:80
    bind 172.31.203.10:80
    mode http
    option splice-response
    stats enable
    option httpchk /
    option dontlog-normal
    option log-health-checks
    balance roundrobin
    server real1 172.31.208.2:80
    server real2 172.31.209.2:80
    server real3 172.31.210.2:80
    server real4 172.31.211.2:80
HA-Proxy version 1.4.15 2011/04/08
Copyright 2000-2010 Willy Tarreau <w#1wt.eu>
Build options :
  TARGET  = linux26
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing
  OPTIONS = USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_REGPARM=1 USE_PCRE=1
Default settings :
Encrypted password support via crypt(3): yes
Available polling systems :
sepoll : pref=400, test result OK
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 4 (4 usable), will use sepoll.
With this configuration, I get 10 000 HTTP req/s, and the haproxy process takes 100% CPU. Changing "maxconn" or disabling splice does not change anything. If I use 6 haproxy processes, I can reach 30 000 HTTP req/s; in that case, all the haproxy processes take 100% CPU. Moreover, I am pretty sure the Avalanche is not the bottleneck, since we can bench more than 120 000 HTTP req/s with the same setup. I have also tried to pin haproxy to 1 CPU (with taskset) and I still get 10 000 HTTP req/s.
Now, if I look at http://haproxy.1wt.eu/#perf, I see that I should be able to achieve 40 000 HTTP req/s, which is four times what I am getting. What is wrong with my setup? Why does enabling/disabling splice not affect my results? Is there a way to fetch the 2.6.27-wt5 kernel used for those tests?
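To figure out whether splicing is actually engaged during a run, I suppose I can ask the stats socket. Here is a rough, untested sketch using the socket declared above; I am assuming the build reports the Maxpipes/PipesUsed/PipesFree fields in "show info":

#!/usr/bin/env python3
# Rough sketch: dump the pipe/connection counters from the stats socket
# during the bench. If PipesUsed stays at 0, splicing is not being used.
import socket

def show_info(path="/var/run/haproxy.socket"):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    s.sendall(b"show info\n")
    data = b""
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        data += chunk
    s.close()
    # response is a list of "Key: value" lines
    return dict(line.split(": ", 1)
                for line in data.decode().splitlines() if ": " in line)

info = show_info()
for key in ("Maxpipes", "PipesUsed", "PipesFree", "CurrConns"):
    print(key, "=", info.get(key, "n/a"))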
A side question now. Enabling the use of multiple processes would make it possible to leverage the power of modern multi-core machines (we now get 6 cores per CPU on recent servers). However, this is discouraged. One drawback is the inability to get reliable stats. Is this problem being worked on? We could spawn a master process that exposes the stats socket; this master would grab stats from the other processes using the same protocol as the socket, but over pipes, aggregate them, and send the result back to the client.
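To make the idea more concrete, here is a very rough illustration (nothing that exists today, just my own sketch): an external aggregator that sums the "show stat" CSV counters from several per-instance stats sockets. The socket paths below are purely hypothetical, e.g. one haproxy instance per socket:

#!/usr/bin/env python3
# Rough sketch of the aggregation step: fetch "show stat" CSV from each
# per-instance stats socket and naively sum every numeric field, grouped
# by proxy name and server name.
import csv, io, socket

# Hypothetical per-instance socket paths (not from the configuration above).
SOCKETS = ["/var/run/haproxy.socket.%d" % i for i in range(1, 7)]

def show_stat(path):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    s.sendall(b"show stat\n")
    data = b""
    while True:
        chunk = s.recv(65536)
        if not chunk:
            break
        data += chunk
    s.close()
    text = data.decode().lstrip("# ")        # header starts with "# pxname,svname,..."
    return list(csv.DictReader(io.StringIO(text)))

totals = {}
for path in SOCKETS:
    for row in show_stat(path):
        key = (row["pxname"], row["svname"])
        agg = totals.setdefault(key, {})
        for field, value in row.items():
            if isinstance(value, str) and value.isdigit():   # naively sum numeric fields
                agg[field] = agg.get(field, 0) + int(value)

for (px, sv), agg in sorted(totals.items()):
    print(px, sv, "stot=%s bin=%s bout=%s"
          % (agg.get("stot"), agg.get("bin"), agg.get("bout")))

A real master would of course talk to its children over pipes with the same protocol instead of extra sockets, but the aggregation step would look like this.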
Thanks for any insight on the performance part.