Hi Willy,
thanks for you analysis!
On Friday 11 July 2008 22:09:13 Willy Tarreau wrote:
> Since you put me the doubt, I've just checked the code. The number of bytes
> logged is the number of bytes READ on the socket. So if you see 66749 bytes
> in the request buffer, it really means that only that many bytes (including
> the headers and the request line) have reached the process.
OK.
To see some more information from apache I have added %I to its log format. This way I get the total size of the request. However, this is the number of bytes before SSL decryption, so it is just a not very accurate estimate of the deciphered request.
So, here is my last 502 error. Haproxy says: Jul 24 09:18:41 127.0.0.1 haproxy[4541]: 127.0.0.1:37099 [24/Jul/2008:09:15:56.316] myapp_http myapp_default/web8 6038/0/0/-1/164829 502 204 - - SHVN 3/0/0/0 0/0 {87.xxx.yyy.zzz|161272|58502|0|58502} "POST /AttachDocument.Asp?SID=569137&action=143&state=0&binaryattach=1&TaskID=NEW_1&TSK_ORD_OrderID=180292 HTTP/1.0" So the content length is 161272 bytes, but the request size is only 58502 byte, which is not even the half.
And this is what apache says:
87.xxx.yyy.zzz - - [24/Jul/2008:09:15:56 +0200] "POST
/AttachDocument.Asp?SID=569137&action=143&state=0&binaryattach=1&TaskID=NEW_1&TSK_ORD_OrderID=180292
HTTP/1.1" 502 123 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;
.NET CLR 1.1.4322; .NET CLR 2.0.50727)" 161272 - 10.3.0.41 369 162850 -
Here we see the content length header is again 161272 (not surprisingly) and
the total request size is 162850 byte.
This makes 1578 bytes difference, which sounds reasonable for the headers and
the SSL overhead when I compare it to other (successfull) requests.
So it seems apache gets the complete request.
Another interesting detail:
From the lines before and after the above line of the apache access log file I
conclude that it was written at 9:22:05.
Haproxy wrote its logfile line at 09:18:41. So this means it took apache more
than 4 minutes to send the 502 response to the client?
Something seems really broken here.
> That would mean
> several possibilities (from the client to the server) :
> - MSIE not sending you all the data
It seems MSIE is behaving ok and sends all data.
> - the retransmit mechanism between windows and the OS hosting your apache
> proxies not correctly working (eg: no retransmit after a FIN has been
> sent, which could explain why it hits close to the end).
Also seems to work.
> - problem with apache (not much likely)
hmm I hope not, but could be possible
> - problem between apache's OS and haproxy's OS => possible, given the
> recent issues with TCP retransmits not correctly working on some
> linux 2.6.
Haproxy and apache are running on the same machine. Apache connects to haproxy
locally via 127.0.0.1:84.
I think problems with TCP retransmits aren't an issue with local connections,
are they?
> - problem in linux sometimes not waking up haproxy for the last data
> block (not much likely)
> - problem with haproxy not always being woken up when data comes in
> (possible, but sounds strange)
> - problem with haproxy not refreshing some timeouts which would
> finally strike (possible). However from your example above, the
> request lasted more than two minutes, which is enormous for only
> 66 kB of data (it is only 480 Bps), so having a timeout problem
> with that little data sounds really weird.
> - problem with haproxy sometimes not sending the last data chunk
> to the server (possible).
> - problem with the server not reading all the data (not ACKing the
> last block for instance, which would clobber the request buffer).
>
> To rule out the two last possibilities, you could try the following :
> - report the request buffer's data length in the logs. You should
> find it in sess->req->l. You could also check that sess->req->total
> matches the request size you have already logged, as it is the
> source for this data. The request buffer should be empty if we're
> waiting for the client, or populated if we're waiting for the
> server (one of the last two possibilities).
I've added those. The value of sess->req->total always matches s-
>logs.bytes_in. And sess->req->l is always 0.
So this would mean that the problem lies between apache and haproxy?
> Also, since the requests are small, you might want to retry using
> larger buffers so that one full request fits in the buffer. For that,
> you need to rebuild with SMALL_OPTS="-DBUFSIZE=131072" for 128kB
> buffers for instance. That way, you'll know how large of the request
> apache did send.
OK. I'll try that.
> While I am thinking about it, could you check that you have the
> client timeout set to the same value as the server timeout ?
Yes its the same, my (relevant part of the) config file is below.
> Last resort will unfortunately be using tcpdump to try to capture
> a full request both on the front and back sides. I really hope
> that you won't have to tcpdump on an SSL connection :-/
Well I should try. But, since these 502 errors happen relatively seldom compared to the total number of requests I'll have to capture really large amounts of data.
But before I start with tcpdump I think I'll try another kernel first. Can you recommend a specific version?
Thanks for you help.
Best regards,
-Rainer
log 127.0.0.1 local6 chroot /var/lib/haproxy user haproxy group haproxy daemon defaults log global mode http option httplog option dontlognull retries 3 option redispatch option persist # maxconn 2000 contimeout 30000 clitimeout 600000 srvtimeout 600000 frontend myapp_http bind 10.3.0.41:80 bind 10.4.0.41:80 bind *:84 capture request header X-Forwarded-For len 15 capture request header content-length len 15 default_backend myapp_default backend myapp_default stats enable stats auth Administrator:****** cookie SERVERID insert indirect nocache balance roundrobin option httpclose option forwardfor except 127.0.0.1 fullconn 300 option httpchk HEAD /loadbalancer/server_available.html server web4 10.4.0.4:80 cookie srv4 check weight 7 minconn 4 maxconn 30 server web8 10.4.0.31:80 cookie srv8 check weight 50 minconn 16 maxconn 60 server web10 10.4.0.33:80 cookie srv10 check weight 50 minconn 16maxconn 60 Received on 2008/07/24 16:35
This archive was generated by hypermail 2.2.0 : 2008/07/24 16:45 CEST