On 30/11/08 07:52, Willy Tarreau wrote:
> Hello Kai,
>
> On Sun, Nov 30, 2008 at 01:28:05AM +0000, Kai Krueger wrote:
>
>>> From your description, it looks like this is what is happening. However,
>>> this must be independent of the LB algo. What might be happening though,
>>> is that the last server to be identified as failing gets all the requests.
>>>
>>>
>> Is there a way to try and find out if this is the case? I think at those
>> times, there were still other backends up.
>>
>
> You can only check this by carefully analysing the logs. You will see
> at what moment exactly a server goes down, and when requests are sent
> to it.
>
>
>> Below I have a log which I think captures the effect. (Similar config
>> to the one before, but with some maxconn and check inter settings tuned
>> to make it easier to trigger this, and it was run on a different machine.)
>>
>
> Ah, exactly the thing I wanted to see ;-)
>
>
>> Nov 30 00:21:31 aiputerlx haproxy[10400]: Proxy ROMA started.
>> Nov 30 00:21:45 aiputerlx haproxy[10400]: 127.0.0.1:51993
>> [30/Nov/2008:00:21:39.125] ROMA ROMA/mat 0/0/0/2339/6830 200 8909913 - -
>> ---- 3/3/3/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>>
>
> This one was before the last check.
>
>
>> Nov 30 00:21:47 aiputerlx haproxy[10400]: Server ROMA/mat is DOWN. 2
>> active and 0 backup servers left. 2 sessions active, 0 requeued, 0
>> remaining in queue.
>> Nov 30 00:21:55 aiputerlx haproxy[10400]: 127.0.0.1:52006
>> [30/Nov/2008:00:21:45.544] ROMA ROMA/mat 0/0/0/4754/9698 200 8909913 - -
>> ---- 4/4/4/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>> Nov 30 00:21:56 aiputerlx haproxy[10400]: 127.0.0.1:52009
>> [30/Nov/2008:00:21:46.908] ROMA ROMA/mat 2/0/0/3830/8578 200 8909913 - -
>> ---- 3/3/3/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>>
>
> These last two were sent before the server was seen as down.
>
>
>> Nov 30 00:22:03 aiputerlx haproxy[10400]: 127.0.0.1:52014
>> [30/Nov/2008:00:21:48.399] ROMA ROMA/mat 0/6843/0/3320/14661 200 8909913
>> - - ---- 2/2/2/0/0 0/1 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30
>> HTTP/1.0"
>>
>
> This one was sent after the server was marked as down. Strange.
>
>
>> Nov 30 00:22:05 aiputerlx haproxy[10400]: 127.0.0.1:52001
>> [30/Nov/2008:00:21:44.025] ROMA ROMA/mat 0/0/13/10559/21837 200 8938528
>> - - ---- 1/1/1/0/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>> Nov 30 00:22:18 aiputerlx haproxy[10400]: 127.0.0.1:51998
>> [30/Nov/2008:00:21:42.519] ROMA ROMA/Quiky 0/0/130/3176/35580 200 8938503
>> - - ---- 0/0/0/0/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>>
>
> These two were sent before. I don't understand why they were not reported
> in the DOWN message ("2 sessions active" should have been "4").
>
>
>> Nov 30 00:22:26 aiputerlx haproxy[10400]: Server ROMA/mat is UP. 3
>> active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
>>
>> According to the log, Server ROMA/mat is DOWN at 00:21:47, yet the
>> 4th request, which was sent at 00:21:48, was still queued to ROMA/mat,
>> which at that time was still down, as it does not come back up until
>> 00:22:26. It might have received a valid health check response and been
>> in the process of going from down to up. Am I perhaps misreading the
>> logs here?
>>
>
> No, you're reading the logs correctly and I don't understand the problem
> either. There are two problems above IMHO :
> - 2 sessions were reported on the server while 4 were really being processed
> - 1 session was sent to the server after it was marked down.
>
Are the numbers reported at the time of the server going down the numbers
for that server, or for the entire backend? Weren't there only two sessions
being processed by the server, with the others queued in the backend?
> Could you try to change the LB algo from leastconn to roundrobin and
> reproduce the test above ? If only leastconn triggers the issue, then
> there might be a bug in its set_server_down() function. Maybe sometimes
> a server is not removed from the tree.
>
I have done the test with roundrobin now too, and it looks like the
balancing mechanism doesn't matter, as the problem also occurs with roundrobin.
Nov 30 10:39:37 aiputerlx haproxy[16291]: Proxy ROMA started.
Nov 30 10:40:22 aiputerlx haproxy[16291]: Server ROMA/localhost is DOWN.
2 active and 0 backup servers left. 2 sessions active, 0 requeued, 0
remaining in queue.
Nov 30 10:40:23 aiputerlx haproxy[16291]: 127.0.0.1:56102
[30/Nov/2008:10:40:18.500] ROMA ROMA/localhost 0/0/0/1429/5268 200
5436019 - - ---- 4/4/4/1/0 0/0 "GET
/api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:26 aiputerlx haproxy[16291]: 127.0.0.1:56112
[30/Nov/2008:10:40:22.249] ROMA ROMA/localhost 0/0/0/1417/4173 200
5436019 - - ---- 6/6/6/1/0 0/0 "GET
/api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:27 aiputerlx haproxy[16291]: 127.0.0.1:56121
[30/Nov/2008:10:40:24.791] ROMA ROMA/localhost 0/1631/1/1/1633 503 619 -
- CD-- 5/5/5/3/0 0/2 "GET /api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:28 aiputerlx haproxy[16291]: 127.0.0.1:56123
[30/Nov/2008:10:40:25.727] ROMA ROMA/localhost 0/696/0/1004/1701 503 619
- - CD-- 5/5/5/2/0 0/3 "GET /api/0.5/map?bbox=8.20,49.07,8.40,49.30
HTTP/1.0"
Nov 30 10:40:28 aiputerlx haproxy[16291]: 127.0.0.1:56128
[30/Nov/2008:10:40:26.783] ROMA ROMA/localhost 0/646/2/0/648 503 619 - -
CD-- 4/4/4/2/0 0/1 "GET /api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:29 aiputerlx haproxy[16291]: 127.0.0.1:56115
[30/Nov/2008:10:40:23.057] ROMA ROMA/localhost 0/711/0/1584/6251 200
5436019 - - ---- 6/6/6/1/0 0/1 "GET
/api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:29 aiputerlx haproxy[16291]: 127.0.0.1:56132
[30/Nov/2008:10:40:28.351] ROMA ROMA/localhost 0/958/0/25/984 503 619 -
- CD-- 5/5/5/3/0 0/2 "GET /api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:30 aiputerlx haproxy[16291]: 127.0.0.1:56131
[30/Nov/2008:10:40:28.039] ROMA ROMA/localhost 0/1269/0/1002/2272 503
619 - - CD-- 4/4/4/2/0 0/1 "GET /api/0.5/map?bbox=8.20,49.07,8.54,49.30
HTTP/1.0"
Nov 30 10:40:32 aiputerlx haproxy[16291]: 127.0.0.1:56119
[30/Nov/2008:10:40:24.410] ROMA ROMA/localhost 0/2012/0/2329/7978 200
5436019 - - ---- 3/3/3/1/0 0/1 "GET
/api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:33 aiputerlx haproxy[16291]: 127.0.0.1:56134
[30/Nov/2008:10:40:29.272] ROMA ROMA/localhost 0/36/1/1943/4710 200
5436019 - - ---- 2/2/2/0/0 0/3 "GET
/api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:42 aiputerlx haproxy[16291]: 127.0.0.1:56108
[30/Nov/2008:10:40:21.321] ROMA ROMA/mat 0/0/32/7096/21248 200 5446937 -
- ---- 1/1/1/0/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:40:56 aiputerlx haproxy[16291]: 127.0.0.1:56105
[30/Nov/2008:10:40:19.720] ROMA ROMA/Quiky 0/0/152/2077/37209 200
5446912 - - ---- 0/0/0/0/0 0/0 "GET
/api/0.5/map?bbox=8.20,49.07,8.40,49.30 HTTP/1.0"
Nov 30 10:41:03 aiputerlx haproxy[16291]: Server ROMA/localhost is UP. 3
active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
>
>>>> Is there a way to prevent this from happening?
>>>>
>>>>
>>> Not really since the connections are already established, it's too late.
>>> You can shorten your srvtimeout though. It will abort the requests earlier.
>>> Yours are really huge (20mn for HTTP is really huge). But anyway, once
>>> connected to the server, the request is immediately sent, and even if you
>>> close, you only close one-way so that server will still process the
>>> request.
>>>
>>>
>> The backend servers have to query a database to generate the response
>> and some of these replies can occasionally take up to 10 - 20 minutes to
>> process. This is why we chose the very long request timeout so that even
>> these can be handled.
>>
>
> The problem is that you have large timeouts for all requests. Can't you
> identify some requests which might take long, and switch them to a distinct
> backend section with a large timeout ?
>
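(For reference, the kind of split suggested above would look roughly like the
following in 1.3-style configuration. This is only a sketch: the ACL pattern,
section names, addresses and timeout values are illustrative, not taken from
our real config, and it assumes "mode http" is set in the defaults section.)

    frontend ROMA_in
        bind :8000
        # hypothetical path prefix for requests known to take long
        acl slow_req url_beg /slow-api
        use_backend ROMA_slow if slow_req
        default_backend ROMA_fast

    backend ROMA_slow
        # generous server timeout only for the potentially long queries
        srvtimeout 1200000   # 20 minutes, in milliseconds
        server mat1 79.143.240.199:80 maxconn 1 check inter 10s rise 3 fall 1

    backend ROMA_fast
        # much shorter timeout for everything else
        srvtimeout 60000     # 1 minute
        server mat2 79.143.240.199:80 maxconn 1 check inter 10s rise 3 fall 1
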
That would be difficult, as a priori we can't really tell whether a
query is going to take long. The queries go to a geospatial database
(www.openstreetmap.org), so depending on how much data is available
in a given bounding box, the time can vary from a few seconds to many
minutes.
>
>>> In fact, what I wanted to add in the past, was the ability to either
>>> modulate weights depending on the error ratios, or speed-up health
>>> checks when an error has been detected in a response, so that a failing
>>> server can be identified faster.
>>>
>> That would probably be useful to detect these problems and take the
>> backend server offline faster. I had a similar idea and tried hacking
>> the source code a bit. I ended up adding a call to set_server_down() in
>> process_srv() of proto_http.c if the reply is a 500 error. It seems to
>> have worked so far, but I haven't understood the code well enough to know
>> whether this is even remotely valid or whether it has nasty race
>> conditions or other problems. Is this a safe thing to do?
>>
>
> For it to work, you need to set s->health = s->rise before calling the
> function (otherwise it does nothing as you might have noticed). I don't
> see any reason for this to be unsafe.
>
Great. We will probably use this then, and if we also get to the bottom of
the problem above, I think that should be sufficient for us for the moment.
Yes, I had inserted the line s->health = s->rise too. I also have a
patch to display the number of 50x errors a server has returned on the
haproxy stats page, next to the response errors, in case you are interested.
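For the record, the hack boils down to something like the following, inside
the response handling of process_srv() in proto_http.c (1.3.x). Treat it as a
sketch only: the exact placement, the 5xx range and the SRV_RUNNING guard are
assumptions, and the field names are from memory of the 1.3 sources.

    /* If the server answered with a 5xx, force it into the DOWN state as if
     * its health checks had failed.  't' is the session and 'txn' its HTTP
     * transaction, as used elsewhere in process_srv(). */
    if (txn->status >= 500 && txn->status <= 504 &&
        t->srv && (t->srv->state & SRV_RUNNING)) {
        t->srv->health = t->srv->rise;  /* without this, set_server_down() does nothing */
        set_server_down(t->srv);
    }
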
>
>>> Also, is it on purpose that your "inter"
>>> parameter is set to 10 seconds ? You need at least 20 seconds in your
>>> config to detect a faulty server. Depending on the load or your site,
>>> this might impact a huge number of requests. Isn't it possible to set
>>> shorter intervals ? I commonly use 1s, and sometimes even 100ms on some
>>> servers (with a higher fall parameter). Of course it depends on the work
>>> performed on the server for each health-check.
>>>
>>>
>> It could probably be reduced somewhat, although if there is a queue,
>> even a short interval may see a lot of failed requests,
>> as the failing server can drain away the entire queue in the meantime.
>>
>
> You shouldn't have a queue on the server itself, otherwise haproxy's
> health checks could fail. You should only use haproxy's queuing
> mechanism as you do with the servers' maxconn parameters. However,
> you should ensure that these values are low enough to let a health
> check pass through, since health-checks bypass the queue. So if
> your server can process 10 concurrent requests, having maxconn 9
> leaves one slot for a health-check, and this one can be fast.
>
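(To put concrete numbers on the sizing rule above, purely as an illustration,
with values that are examples rather than our real config: if a backend server
can really process 10 concurrent requests, a line like this keeps one slot free
for the health check, which bypasses the queue, and detects a failure within
about 3 seconds.)

    server srv1 10.0.0.1:80 maxconn 9 check inter 1s fall 3 rise 2
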
I don't think we had a queue on the server itself, at least there
shouldn't have been one: all requests are independent, so there is no need
for session persistence in our case. What I think happened is the
following scenario: we have 3 servers, and at that time all of them were
saturated with maxconn connections, so requests got queued up in the
proxy queue. Now one request finishes on a server that has just become
overloaded but is not yet marked down. As it is the only server with a
free connection slot, the proxy sends the next session to that server. But
because it has become overloaded, the server replies immediately (within a
few milliseconds) with a 503, so the connection slot becomes free again and
the next session from the proxy queue is sent to that same server. Of
course, the shorter the check interval, the smaller the chance that a
request finishes on a server that has become overloaded but not yet been
marked down.
>
>>> There are two problems to replay a request. The first one is that the
>>> request is not in the buffer anymore once the connection is established
>>> to the server. We could imagine mechanisms to block it under some
>>> circumstances. The second problem is that as you say, haproxy cannot
>>> know which requests are safe to replay. HTTP defines idempotent methods
>>> such as GET, PUT, DELETE, ... which are normally safe to replay.
>>> Obviously now GET is not safe anymore, and with cookies it's even
>>> more complicated because a backend server can identify a session
>>> for a request and make that session's state progress for each request.
>>> Also imagine what can happen if the user presses Stop, clicks a different
>>> link, and by this time, haproxy finally gives up on former request and
>>> replays it on another server. You can seriously affect the usability of
>>> a site or even its security (you don't want a login page to be replayed
>>> past the logout link for instance).
>>>
>>> So there are a lot of complex cases where replaying is very dangerous.
>>>
>>>
>> It is true that in many cases it is dangerous to requeue requests, but
>> it would be nice if there were a configuration parameter with which one
>> could tell haproxy that in this particular case one knows for sure that
>> requeueing is safe.
>>
>
> This could be selected via ACLs for instance. The real problem right
> now is still to keep the request in the buffer once it has already been
> sent.
>
>
>>> I'd be more tempted by adding an option to not return anything upon
>>> certain types of errors, so that the browser can decide itself whether
>>> to replay or not. Oh BTW you can already do that using "errorfile".
>>> Simply return an empty file for 502, 503 and 504, and your clients
>>> will decide whether they should replay or not when a server fails to
>>> respond.
>>>
>>>
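(For reference, the errorfile variant mentioned above would look roughly like
this; the path is only an example, and the files would simply be empty so that
the client receives nothing and can decide itself whether to retry.)

    # illustrative: serve empty responses for gateway errors instead of
    # haproxy's built-in error pages
    errorfile 502 /etc/haproxy/errors/empty.http
    errorfile 503 /etc/haproxy/errors/empty.http
    errorfile 504 /etc/haproxy/errors/empty.http
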
>> Well, I had hoped to use haproxy to mask these errors and provide a
>> higher availability and not shift the burden of retrying to the client.
>>
>
> I understand your point of view, but HTTP was designed so that it's up
> to the client to decide what to do in abnormal conditions. In particular,
> HTTP 1.1 is very clear on this. There are a lot of situations where a
> server or a client cannot complete an operation and must break the
> connection without sending data so that the other end knows.
>
> I think that in your case it's different, because you don't use real
> browsers, but you use HTTP as a transport protocol between various
> systems. In this case, haproxy abstracts TCP problems from the HTTP
> layer, but the HTTP errors still have to be processed on the client
> just as it would have to handle TCP errors if it were using TCP.
>
> I'm still noting your need for replayable requests. It might be easier
> to implement in future versions than it is right now. I still think
> that there's little chance we see this before keep-alive though...
>
Thank you. With the patch to additionally use the real requests as
health checks, it shouldn't be too much of a problem for us anymore, as
we can cope with the odd request failing.
>
>>>> P.S. In case it helps, I have attached the configuration we use with
>>>> haproxy (version 1.3.15.5, running on freeBSD)
>>>>
>>>>
>>> Thanks, that's a good reflex, people often forget that important part ;-)
>>>
>>>
>> :-)
>>
>> Here are the things that changed in the config that generated the log
>> file above; this time it was run on a Linux box:
>>
>> server mat localhost:80 maxconn 6 weight 10 check inter 1s rise 30 fall 1
>> server Quiky 68.49.216.76:80 maxconn 1 weight 10 check inter 10s rise 3 fall 1
>> server mat 79.143.240.199:80 maxconn 1 weight 10 check inter 10s rise 3 fall 1
>>
>
> Look, you have two servers called "mat". That explains a lot of things
> in the log above ! Some of the requests were for the first one, others for
> the second one. Could you please recheck with different names for all of
> them ?
>
Oops, sorry, what a silly mistake. I have now changed the first server's
name to localhost; it still seems to behave similarly though. Below is
another log of the problem with least-connection balancing.
Kai
> Regards,
> Willy
>
>
Nov 30 10:31:17 aiputerlx haproxy[15892]: Proxy ROMA started.
Nov 30 10:31:39 aiputerlx haproxy[15892]: 127.0.0.1:53274 [30/Nov/2008:10:31:34.561] ROMA ROMA/localhost 0/0/0/1371/4863 200 5436019 - - ---- 4/4/4/2/0 0/0 "GET