From: Krzysztof Oledzki <ole#ans.pl>
Date: Sat, 20 Oct 2007 14:54:00 +0200 (CEST)

On Sat, 20 Oct 2007, Willy Tarreau wrote:

> Hi Krzysztof,

Hi Willy,

> On Sat, Oct 20, 2007 at 12:21:49AM +0200, Krzysztof Oledzki wrote:
>> Hello,
>>
>> This is maybe not strictly haproxy related but I believe that it is worth
>> to notice that recently there were two quite important fixes that can
>> dramatically improve performance of haproxy installed on a linux server
>> with conntrack enabled, especially on the most recent kernels (2.6.22+?)
>> that have tcp port randomisation feature implemented:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=17311393f969090ab060540bd9dbe7dc885a76d5
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bc34b841556aad437baf4199744e55500bfa2088
>>
>> If any of you are interested, there is a full thread describing the
>> problem:
>> http://marc.info/?t=119081130100010&r=2&w=4
>> http://marc.info/?t=119081130100010&r=1&w=4
>
> Quite interesting, it reminds me of the old days when I put netfilter-based
> firewalls in production for the first time... I got 10% drops because
> at that time it would not accept a SYN during TIME_WAIT.

This is exactly what I get, but I managed to work around it temporarily by allowing haproxy to set up a pool of source addresses that are used in round-robin mode:

backend some-name

         mode http
         balance roundrobin
         cookie SERVERID insert indirect nocache

         retries 4
         redispatch

         source 192.168.150.11
         source 192.168.150.12
         source 192.168.150.13
         source 192.168.150.14
         source 192.168.150.15
         source 192.168.150.16
         source 192.168.150.17
         source 192.168.150.18
         source 192.168.150.19

         server (...)

It helped _a lot_, but it still did not resolve the problem completely: I still get about 1% unsuccessful connections. Unfortunately Linux can still pick a port that is waiting in TIME_WAIT, even if there are other "free" ports.
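
For illustration only, here is a rough Python sketch of what the "source pool" does: each new outgoing connection is bound to the next address of the pool before connecting. This is not the haproxy patch itself; the class name SourcePool and its methods are made up for the example, and the addresses simply mirror the config above.

```python
import itertools
import socket

# Hypothetical pool, mirroring the "source" lines in the backend above.
SOURCE_POOL = ["192.168.150.%d" % i for i in range(11, 20)]


class SourcePool:
    """Hands out local source addresses in round-robin order."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("source pool must not be empty")
        self._cycle = itertools.cycle(addresses)

    def next_source(self):
        """Return the next source address of the pool."""
        return next(self._cycle)

    def connect(self, server, timeout=5.0):
        """Open a TCP connection to `server` from the next pool address."""
        src = self.next_source()
        # Port 0 lets the kernel choose the (possibly randomized) source port.
        return socket.create_connection(
            server, timeout=timeout, source_address=(src, 0)
        )
```

Calling pool.connect(("10.0.0.1", 80)) (a placeholder server address) would then use 192.168.150.11 for the first connection, 192.168.150.12 for the second, and so on, so the same (source address, source port) pair comes back less often.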

> I remember having worked with Joszef precisely on the part which was
> changed above, and I'm not sure that those changes are enough.

Did your work get pushed into the kernel?

> In fact, what is strange is that the TCP stack on the peer accepts the
> SYN. I'm very used to encountering this problem when testing firewalls
> for instance. You simply chain an HTTP client, a firewall which randomizes
> ISN (PIX or OpenBSD) and an HTTP server.

No, there is no firewall which randomizes ISN, only Linux & Windows. Both ISN and port randomization are performed by my Linux server (IP stack feature).

> The common problem is that once you have rolled over the range of source
> ports, the traffic falls down to a very low rate, and you observe this :

With the newest kernels (which include the source port randomization code) this problem may appear _much_ sooner, as there is no need to roll over the port range to hit a previously used port. This is the reason why the "source pool" only made it less likely to happen.
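
A crude back-of-the-envelope model shows why more source addresses only lower the odds. It assumes the source port is picked uniformly at random and that every old connection lingers for a fixed time; the port range (32768-61000), the 120 second linger time and the rate of 25 connections per second are assumptions chosen for the example, not measurements from my setup.

```python
def collision_probability(conn_rate, linger_secs, ports, sources=1):
    """Chance that a random (source address, port) pair is still lingering.

    Simplified steady-state model: about conn_rate * linger_secs old
    tuples are lingering at any time, out of ports * sources possible ones.
    """
    tuples = ports * sources
    lingering = min(conn_rate * linger_secs, tuples)
    return lingering / tuples


PORTS = 61000 - 32768 + 1  # assumed ephemeral port range
RATE = 25                  # hypothetical connections per second to one server
LINGER = 120               # assumed seconds an old entry stays around

one = collision_probability(RATE, LINGER, PORTS)
nine = collision_probability(RATE, LINGER, PORTS, sources=9)
print(f"1 source address:   {one:.1%}")
print(f"9 source addresses: {nine:.1%}")
```

With sequential port allocation there would be no collision at all in this model as long as RATE * LINGER stays below PORTS, while with random allocation the chance is non-zero from the very first reuse. Nine addresses divide it by nine but never bring it to zero.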

> 1. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
> 2. C <--[ACK(ACK=Z)]---------> FW <--[ACK(ACK=Z)]---------- S
> 3. C ---[RST(SEQ=Z+1)]-------> FW ---[RST(SEQ=Z+1)]-------> S
> ( 3 seconds delay )
> 4. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
> 5. C <--[SYN/ACK(ACK=X+1)]---> FW <--[SYN/ACK(ACK=Y+1)]---- S
> 6. C ---[ACK(SEQ=X+1)]-------> FW ---[ACK(SEQ=X+1)]-------> S
>
> The reason is S getting a SYN with a SEQ lower than what it has in its
> table for a TIME_WAIT session. Thus it naturally just sends an ACK to
> remind its peer where it was last time, but the peer obviously refuses
> a simple ACK in response to a SYN, then sends an RST which definitely
> terminates the session on S. When C retries its SYN, S is happy and
> accepts it.
>
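
The exchange in steps 1-6 above, and the effect of timestamps, can be sketched roughly as follows. This is a simplified model in Python, not the code of any real TCP stack: the names TimeWait, seq_after and server_reply are invented for the example, and the rule "accept the SYN if its SEQ or its timestamp is newer than what the TIME_WAIT entry remembers" is my reading of the behaviour described here.

```python
from dataclasses import dataclass
from typing import Optional


def seq_after(a, b):
    """True if 32-bit value a is after b, taking wrap-around into account."""
    return 0 < ((a - b) & 0xFFFFFFFF) < 0x80000000


@dataclass
class TimeWait:
    rcv_nxt: int                     # next SEQ expected on the old session
    ts_recent: Optional[int] = None  # last timestamp seen, if any


def server_reply(table, key, seq, tsval=None):
    """Reply of server S to a SYN for connection `key` (simplified)."""
    old = table.get(key)
    if old is None:
        return "SYN/ACK"  # no old session: normal handshake
    newer_seq = seq_after(seq, old.rcv_nxt)
    newer_ts = (
        tsval is not None
        and old.ts_recent is not None
        and seq_after(tsval, old.ts_recent)
    )
    if newer_seq or newer_ts:
        del table[key]    # the old TIME_WAIT entry is recycled
        return "SYN/ACK"
    return "ACK"          # step 2: remind the peer where we were


key = ("192.168.150.11", 40000, "10.0.0.1", 80)  # placeholder tuple
table = {key: TimeWait(rcv_nxt=5000)}
first = server_reply(table, key, seq=1000)   # steps 1-2: lower SEQ
table.pop(key, None)                         # step 3: the RST kills the entry
second = server_reply(table, key, seq=1000)  # steps 4-5: retry is accepted
print(first, second)
```

With timestamps, an entry such as TimeWait(rcv_nxt=5000, ts_recent=100) would accept a SYN with seq=1000 and tsval=200 right away, without the ACK/RST detour and the 3 second retry.
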
> The two solutions I know to this problem are :
> 1) enable PAWS (echo 1 > tcp_timestamps)
> This is the cleaner method as it was invented exactly for that
> problem of ISNs rolling over in too short a time. It requires
> both the client, the server and the firewall to support it,
> though. But while the real problem would be on the firewall,
> we can note that those which are able to randomize ISNs generally
> support PAWS.

Yes, I'm using timestamps; maybe this explains why my Windows server accepts such connections.

> 2) disable randomization on the firewall. This is also a solution,
> but it more often hides a real problem than fixes it. In fact,
> while adding random breaks compliance with the RFC (which clearly
> states that ISNs must monotonically grow), it also highlights a
> real problem with the way the connections are handled in the
> whole chain.

Like I said, there is no randomization on the firewall. Strictly speaking, there is no other device (except an L2 switch) between haproxy (Linux) and the Windows servers.

> In your case, you fixed the firewall, which was the first one to
> block. But I'm surprised that the server accepts your SYNs. Maybe
> it's because the TCP stack is different (windows). As Patrick said
> in the discussion, it would be better to add PAWS to netfilter
> (and 8 more bytes aren't that much of a problem, considering the
> current size of the session table).
>
> I'll see if the patches are also relevant to my 2.4-based kernels
> (since I still get quite a higher performance with 2.4 than 2.6).

OK. BTW: what do you think about this "source pool" idea? Initially I thought it was only a workaround for a bug that exists outside haproxy, but since I have already mentioned this patch, I am starting to wonder whether such functionality may be useful. If so, I can clean up this patch and push it to you.

Best regards,

Krzysztof Olędzki

Received on 2007/10/20 14:54

Re: haproxy & linux firewall (netfilter)