I currently have Apache acting as a proxy/load balancer in front of several backend application servers. It works well enough, but as our traffic ramps up I feel we need a more capable proxy, and haproxy seems to be the standard for very high traffic load balancing. (Note: this is all set up on Rackspace Cloud servers.) So far I've been able to configure all the functionality I need: sticky sessions via cookie, and an ACL that sends certain requests to a different set of application servers.
The only problem I'm having is that the round-trip time per request has gone up dramatically. With Apache or nginx, for both HTTP and HTTPS connections, the average RTT per request is around 50ms (which I consider good). With haproxy the average HTTP RTT is a bit over 100ms, and adding stunnel4 in front to terminate the HTTPS connections adds roughly another 50ms, bringing the total upwards of 175-225ms per request.
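(For the curious, numbers like these are easy to reproduce with curl's timing variables; example.com below is just a placeholder for the balancer's public address:)

    # time to TCP connect, time to first byte, and total time for one request
    curl -o /dev/null -s \
      -w "connect=%{time_connect}s firstbyte=%{time_starttransfer}s total=%{time_total}s\n" \
      http://example.com/

    # the same request over HTTPS through stunnel4 on :443
    curl -k -o /dev/null -s \
      -w "connect=%{time_connect}s firstbyte=%{time_starttransfer}s total=%{time_total}s\n" \
      https://example.com/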
This is strange, because when I prototyped this layout at home on real hardware (i.e. no virtual machines), haproxy had a higher minimum RTT than nginx, but it did not add anywhere near 50ms, even with stunnel both in front of and again behind haproxy (to keep traffic to the backend servers fully encrypted), and there was no significant increase under very heavy load either. Granted, that was all on a LAN, but seeing +50ms per request in the cloud just for using haproxy instead of apache/nginx seems out of the ordinary. Another +50ms for terminating an HTTPS request with stunnel and routing it to a local static page? That's crazy.
My theory was that perhaps, in the virtualized environment, routing a request through the localhost interface before sending it back out over the internal LAN IP (which is how haproxy/stunnel hand off their connections) somehow hits a virtualization buffer/queue that drags it out to the host OS network stack before it is routed back in as a localhost request. That could explain the added latency for every new localhost-based hop. But when I ping localhost on a cloud server I get 0.000ms RTT, and the same for my eth0 and eth1 interfaces, which is how it should be: no added network latency, and in fact no measurable latency at all.
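Of course, ping only exercises ICMP, so it doesn't necessarily say anything about the TCP path a proxied request actually takes. A more direct test of the theory (a sketch; 10.0.0.11 is a placeholder for a backend's LAN IP) is to time the same request straight to a backend and then through the local haproxy frontend, from the haproxy box itself:

    # request a backend directly over the internal LAN (placeholder IP)
    curl -o /dev/null -s -w "direct:      total=%{time_total}s\n" http://10.0.0.11:80/
    # the same request through haproxy listening on this box
    curl -o /dev/null -s -w "via haproxy: total=%{time_total}s\n" http://127.0.0.1:80/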
If a proxy configuration that accepts SSL at the proxy and opens another SSL tunnel to a backend server adds a 250-300ms latency penalty right off the top, that is just not acceptable. My site will feel sluggish and unresponsive. (And I will need encryption all the way through to meet PCI compliance.)
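For context, the encrypted-both-ways layout is basically two stunnel4 hops around haproxy, along the lines of the sketch below (placeholder paths and addresses, not my exact configs): one service in server mode terminating HTTPS in front of haproxy, and one in client mode behind it re-encrypting traffic out to a backend, with the relevant haproxy server line pointed at the local client tunnel instead of the backend.

    ; placeholder certificate path
    cert = /etc/stunnel/site.pem

    ; terminate HTTPS in front of haproxy and hand it plain HTTP on localhost
    [https-in]
    accept  = 443
    connect = 127.0.0.1:80

    ; client mode: re-encrypt traffic from haproxy back out to a backend
    ; (the haproxy server line would then point at 127.0.0.1:8443)
    ; 10.0.0.11 is a placeholder backend address
    [to-backend]
    client  = yes
    accept  = 127.0.0.1:8443
    connect = 10.0.0.11:443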
I've tried every option related to keepalive and no keepalive...
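(By "every option" I mean roughly the usual haproxy connection-mode options, swapped in and out of the defaults section one at a time:)

    # default tunnel mode (none of the options below set)
    option httpclose                  # passively add "Connection: close" in both directions
    option forceclose                 # actively close the server-side connection after the response
    option http-server-close          # keep-alive to the client, close to the server
    option http-pretend-keepalive     # combined with the above, for servers that
                                      # misbehave when told to close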
This is my current haproxy.cfg
global
    log 127.0.0.1 local0
    maxconn 4096
    user haproxy
    group haproxy
    daemon
    # for hatop
    stats socket /var/run/haproxy.sock mode 0600 level admin

defaults
    log global
    mode http
    option httplog
    option logasap
    option redispatch
    retries 3
    maxconn 2000
    timeout connect 10s
    timeout client 60s
    timeout server 60s
    option http-server-close
    option http-pretend-keepalive
    option forwardfor
    cookie BALANCEID

frontend http_proxy
    bind 0.0.0.0:80
    default_backend cluster1
    acl is_gfish path_beg /gfish_url
    use_backend cluster2 if is_gfish

backend cluster1
    server swww1 :80 cookie balancer.www1
    server swww2 :80 cookie balancer.www2

backend cluster2
    # replace the HTTP version in the request header (from 1.1 to 1.0)
    # because glassfish will give a 0 byte response otherwise
    reqrep ^(.*)\ HTTP/[^\ ]+$ \1\ HTTP/1.0
    server gfish1 :8080 cookie balancer.www1
    server gfish2 :8080 cookie balancer.www2
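One note on the config above: with option httplog, each request is logged with a Tq/Tw/Tc/Tr/Tt timer field (time to receive the request, time spent queued, time to connect to the server, server response time, total time), which should show where the extra ~50ms is actually spent. Capturing those logs just needs something listening on 127.0.0.1, e.g. an rsyslog drop-in along these lines (rsyslog assumed; the filename and log path are only examples):

    # /etc/rsyslog.d/haproxy.conf
    $ModLoad imudp
    $UDPServerAddress 127.0.0.1
    $UDPServerRun 514
    local0.*    /var/log/haproxy.log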
I was also thinking that the reqrep header rewriting might add some latency, but I don't know how much, and I can't test it since I don't manage the glassfish servers on the back end and can't try them with different settings. But since stunnel+haproxy adds latency to all requests, not just the glassfish ones, I don't really know. Any ideas for debugging this would be greatly appreciated.