Hi,
On 30/11/08 07:52, Willy Tarreau wrote:
> ...
> Ah, exactly the thing I wanted to see ;-)
>
>
>> Nov 30 00:21:31 aiputerlx haproxy[10400]: Proxy ROMA started.
>> Nov 30 00:21:45 aiputerlx haproxy[10400]: 127.0.0.1:51993
>> [30/Nov/2008:00:21:39.125] ROMA ROMA/mat 0/0/0/2339/6830 200 8909913 - -
>> ---- 3/3/3/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>>
>
> This one was before the last check.
>
>
>> Nov 30 00:21:47 aiputerlx haproxy[10400]: Server ROMA/mat is DOWN. 2
>> active and 0 backup servers left. 2 sessions active, 0 requeued, 0
>> remaining in queue.
>> Nov 30 00:21:55 aiputerlx haproxy[10400]: 127.0.0.1:52006
>> [30/Nov/2008:00:21:45.544] ROMA ROMA/mat 0/0/0/4754/9698 200 8909913 - -
>> ---- 4/4/4/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>> Nov 30 00:21:56 aiputerlx haproxy[10400]: 127.0.0.1:52009
>> [30/Nov/2008:00:21:46.908] ROMA ROMA/mat 2/0/0/3830/8578 200 8909913 - -
>> ---- 3/3/3/1/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>>
>
> These last two were sent before the the server was seen as down.
>
>
>> Nov 30 00:22:03 aiputerlx haproxy[10400]: 127.0.0.1:52014
>> [30/Nov/2008:00:21:48.399] ROMA ROMA/mat 0/6843/0/3320/14661 200 8909913
>> - - ---- 2/2/2/0/0 0/1 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30
>> HTTP/1.0"
>>
>
> This one was sent after the server was marked as down. Strange.
>
>
>> Nov 30 00:22:05 aiputerlx haproxy[10400]: 127.0.0.1:52001
>> [30/Nov/2008:00:21:44.025] ROMA ROMA/mat 0/0/13/10559/21837 200 8938528
>> - - ---- 1/1/1/0/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>> Nov 30 00:22:18 aiputerlx haproxy[10400]: 127.0.0.1:51998
>> [30/Nov/2008:00:21:42.519] ROMA ROMA/Quiky 0/0/130/3176/35580 200 8938503
>> - - ---- 0/0/0/0/0 0/0 "GET /api/0.5/map?bbox=8.20,49.07,8.50,49.30 HTTP/1.0"
>>
>
> These one were sent before. I don't understand why they were not reported
> in the DOWN message ("2 sessions active", should have been "4").
>
>
>> Nov 30 00:22:26 aiputerlx haproxy[10400]: Server ROMA/mat is UP. 3
>> active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
>>
>> According to the log, Server ROMA/mat is DOWN at 00:21:47, however the
>> 4th request which was sent at 00:21:48 was still queued to ROMA/mat
>> which at that time was still down, as it is not up till 00:22:26. It
>> might have received a valid health check message and might have been
>> down going up. Am I perhaps misreading the logs here?
>>
>
> No, you're reading the logs correctly and I don't understand the problem
> either. There are two problems above IMHO :
> - 2 sessions were reported on the server while 4 were really being processed
> - 1 session was sent to the server after it was marked down.
>
> Could you try to change the LB algo from leastconn to roundrobin and
> reproduce the test above ? If only leastconn triggers the issue, then
> there might be a bug in its set_server_down() function. Maybe sometimes
> a server is not removed from the tree.
>
Ok, I think I might have tracked the problem down a bit more. I still
don't have a full grasp of which path exactly the requests go through,
but I might have identified one potential way of preventing this. The
problem is not that the server is not removed from the tree. That
happens correctly. However there seem to be several paths in
proto_http.c which bypass the load balancing tree with comments of the
form "We used to have a free connection slot. Since we'll never use it,
we have to inform the server that it may be used by another session. "
However the sever could have gone down in the mean time and neither
may_dequeue_tasks(server) nor process_srv_queue(server) check for the
status of the server before wakeing up tasks on that server. Adding a
check in either of these functions seem to do the trick, but again, I am
not sure if this is the right fix, or if my analysis was correct.
> ...
>
> Regards,
> Willy
>
>
Kai
Received on 2008/11/30 21:24
This archive was generated by hypermail 2.2.0 : 2008/11/30 21:30 CET