Hi,
We run HAProxy on Amazon EC2 for HTTP load balancing. On Monday (August 11) we upgraded seven load balancers across two of our products to 1.3.20: four servers, all in one product, from 1.3.15.8, and three servers, all in the other product, from 1.3.18. We kept the config files the same.
We finished replacing the load balancers by 2300 UTC on August 11. At about 0900 UTC on August 12, the first cluster (the one upgraded from 1.3.15.8) started showing performance issues, enough to set off our monitoring systems. Response times were several seconds. Logging on to one of the load balancers, I saw normal CPU and memory usage, but netstat -anp showed more than 30k lines, the majority in TIME_WAIT state.
For background, the load balancers each point to the same pool of about 60 servers, which at the time were doing about 20-30 sessions per server, with the servers reporting about 80 requests per second (nominally 60% of peak). At this point we put the old load balancers back into production and found that they still worked fine. At around 1200 UTC on August 12, a nearly identical state occurred on the other set of load balancers (the ones upgraded from 1.3.18).
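For anyone who wants to compare socket states between the old and new load balancers, here is a minimal sketch of how the netstat output could be summarized per TCP state. It is not something we ran at the time. The function name count_tcp_states and the sample addresses are made up for illustration, and it assumes the usual Linux netstat column layout (state in the sixth column).

```python
from collections import Counter

def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP sockets per state in text printed by `netstat -an` or `netstat -anp`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed Linux layout: proto recv-q send-q local foreign state [pid/program]
        # Header lines and UDP rows are skipped by the checks below.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts

# Hypothetical sample input, not real output from our servers.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.5:80 198.51.100.7:51234 TIME_WAIT
tcp 0 0 10.0.0.5:80 198.51.100.8:40022 TIME_WAIT
tcp 0 0 10.0.0.5:43210 10.0.0.9:80 ESTABLISHED
udp 0 0 0.0.0.0:68 0.0.0.0:*
"""

for state, n in count_tcp_states(SAMPLE).most_common():
    print(f"{n:8d} {state}")
```

On a live machine the same function could be fed the real output, for example from subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout, and run on both an old and a new load balancer to see whether the TIME_WAIT share differs.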
If anyone can see any issues please let me know.
I have pasted a representative haproxy.cfg file below:
# this config needs haproxy-1.1.28 or haproxy-1.2.1
global
#log 127.0.0.1 local0 info
#log 127.0.0.1 local1 notice
#log loghost local0 info
defaults
#log global
mode http
#option httplog
option dontlognull
option redispatch
retries 3
maxconn 75000
contimeout 5000
clitimeout 50000
srvtimeout 2000
frontend openx *:80
#log global
maxconn 75000
option forwardfor
default_backend openx_ec2_hosted_http
backend openx_ec2_hosted_http
mode http
#balance roundrobin
balance leastconn
option abortonclose
option httpclose
#remove the line below if not 1.3.20
#option httpchk HEAD /health.chk
timeout queue 500
#option forceclose
server crt.hosted.bigd04 10.252.102.128:80 check maxconn 150 weight 2
...
server crt.hosted.d03 10.252.203.175:80 check maxconn 50
...
server crt.hosted.d75 10.209.81.155:80 check maxconn 30
frontend openx_ssl *:443
#log global
mode tcp
maxconn 75000
option forwardfor
default_backend openx_ec2_hosted_ssl
backend openx_ec2_hosted_ssl
mode tcp
#balance roundrobin
balance leastconn
option abortonclose
option httpclose
#option forceclose
server crt.hosted.bigd04-ssl 10.252.102.128:443 check maxconn 150
...
server crt.hosted.d03-ssl 10.252.203.175:443 check maxconn 30
Received on 2009/08/12 21:50