On Sat, 20 Oct 2007, Willy Tarreau wrote:
> On Sat, Oct 20, 2007 at 02:54:00PM +0200, Krzysztof Oledzki wrote:
>>> Quite interesting, it reminds me of the old days when I put netfilter-based
>>> firewalls in production for the first time.... I got 10% drops because
>>> at that time it would not accept a SYN during TIME_WAIT.
>>
>> This is exactly what I get, but I managed to work around it temporarily
>> by allowing haproxy to set up a pool of addresses used in round-robin mode:
>>
>> backend some-name
>> mode http
>> balance roundrobin
>> cookie SERVERID insert indirect nocache
>>
>> retries 4
>> redispatch
>>
>> source 192.168.150.11
>> source 192.168.150.12
>> source 192.168.150.13
>> source 192.168.150.14
>> source 192.168.150.15
>> source 192.168.150.16
>> source 192.168.150.17
>> source 192.168.150.18
>> source 192.168.150.19
>>
>> server (...)
>
> I assume that you put *one* source address per server entry.
No, with each new connection haproxy takes the next source address from the above list. There is no source address defined per server.
<CUT>
>> Did your work get pushed into the kernel?
>
> Yes, and in fact you're using it ;-)
>
> $ grep -iA3 willy /usr/src/linux-2.6.20/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
> * Willy Tarreau:
> * - State table bugfixes
> * - More robust state changes
> * - Tuning timer parameters
OK, thank you.
> I had to add several states to the FSM and to fix several transitions too.
> We've been working hard with Jozsef, because it's very tempting to reject
> non-conforming traffic, but we must refrain from it. I used to grab logs and
> captures on the production system, try to analyze, reproduce, and propose
> fixes. Jozsef is a very nice person to work with BTW.
Indeed, I can confirm this. :)
>>> In fact, what is strange is that the TCP stack on the peer accepts the
>>> SYN. I'm very used to encountering this problem when testing firewalls
>>> for instance. You simply chain an HTTP client, a firewall which randomizes
>>> ISN (PIX or OpenBSD) and an HTTP server.
>>
>> No, there is no firewall which randomizes ISN, only Linux & Windows. Both
>> ISN and port randomization are performed by my Linux server (IP stack
>> feature).
>
> What is very strange is that linux uses random increments, so your ISNs
> should not wrap in a matter of a few seconds.
Good point. I need to investigate this.
>>> The common problem is that once you have rolled over the range of source
>>> ports, the traffic falls down to a very low rate, and you observe this :
>>
>> With the newest kernels (src port randomization code is there) this problem
>> may appear _much_ faster as there is no need to roll over to hit a
>> previously used port. This is the reason why the "source pool" only made
>> this less likely to happen.
>
> I think there has been another change to randomize ISNs, otherwise I cannot
> explain what you get!
I am not sure exactly when (2.6.21 or 2.6.22) but there is new code for source port randomization. Dunno about ISNs. :(
>>> 1. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
>>> 2. C <--[ACK(ACK=Z)]---------> FW <--[ACK(ACK=Z)]---------- S
>>> 3. C ---[RST(SEQ=Z+1)]-------> FW ---[RST(SEQ=Z+1)]-------> S
>>> ( 3 seconds delay )
>>> 4. C ---[SYN(SEQ=X)]---------> FW ---[SYN(SEQ=Y)]---------> S
>>> 5. C <--[SYN/ACK(ACK=X+1)]---> FW <--[SYN/ACK(ACK=Y+1)]---- S
>>> 6. C ---[ACK(SEQ=X+1)]-------> FW ---[ACK(SEQ=X+1)]-------> S
>>>
>>> The reason is S getting a SYN with a SEQ lower than what it has in its
>>> table for a TIME_WAIT session. Thus it naturally just sends an ACK to
>>> remind its peer where it was last time, but the peer obviously refuses
>>> a simple ACK in response to a SYN, then sends an RST which definitely
>>> terminates the session on S. When C retries its SYN, S is happy and
>>> accepts it.
>>>
>>> The two solutions I know to this problem are :
>>> 1) enable PAWS (echo 1 > tcp_timestamps)
>>> This is the cleaner method as it was invented exactly for that
>>> problem of ISNs rolling over in too short a time. It requires
>>> both the client, the server and the firewall to support it,
>>> though. But while the real problem would be on the firewall,
>>> we can note that those which are able to randomize ISNs generally
>>> support PAWS.
>>
>> Yes, I'm using timestamps, maybe this explains why my Windows server
>> accepts such connections.
>
> Maybe (I said *maybe*) linux completely randomizes the ISNs when timestamps
> are enabled ? You may want to retry with timestamps disabled. Anyway, I
> think it would be time to implement PAWS in netfilter :-/
I agree but I do not feel brave enough to do it myself. :|
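To make the TIME_WAIT rule discussed above concrete, here is a rough sketch in C of how a SYN arriving on a TIME_WAIT 4-tuple is usually judged: it is accepted if its sequence number is beyond what was last expected, or, with timestamps (PAWS), if its timestamp is newer than the last one seen. This is only an illustration under my own assumptions, not the netfilter or kernel code; struct tw_state and both function names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial-number comparison on the 32-bit sequence space:
 * true if a is "after" b, taking wrap-around into account.
 * Relies on the usual two's-complement conversion to int32_t. */
bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Hypothetical summary of what the receiver remembers about a
 * connection that sits in TIME_WAIT. */
struct tw_state {
    uint32_t rcv_nxt;   /* next sequence number expected from the peer */
    bool     has_ts;    /* timestamps were in use on the old connection */
    uint32_t ts_recent; /* last timestamp value seen from the peer */
};

/* Decide whether a new SYN may reuse the TIME_WAIT 4-tuple.
 * Without timestamps only a higher sequence number is accepted;
 * with timestamps a newer TSval is enough even if SEQ is lower. */
bool tw_accept_syn(const struct tw_state *tw, uint32_t syn_seq,
                   bool syn_has_ts, uint32_t syn_tsval)
{
    if (seq_after(syn_seq, tw->rcv_nxt))
        return true;
    if (tw->has_ts && syn_has_ts && seq_after(syn_tsval, tw->ts_recent))
        return true;
    return false; /* the receiver would answer with a plain ACK instead */
}
```

The false case corresponds to step 2 of the trace quoted above: the SYN is answered with a bare ACK, the client resets, and only the retried SYN gets through.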
<CUT>
>> OK. BTW: what do you think about this "source pool" idea? Initially I
>> thought that it was only a workaround for a bug that existed outside
>> haproxy, but since I already mentioned this patch I started wondering
>> if such functionality may be useful. If so, I can clean up this patch and
>> push it to you.
>
> Since I've already implemented it in another program, I know that when
> you do this, you also need to manage the source ports yourself.
Only when you want to implement it this way. In my solution it simply was:
+struct source_addr_pool {
+	struct sockaddr_in addr;
+	struct source_addr_pool *next;
+};
+
 struct proxy {
 (...)
-	struct sockaddr_in source_addr;	/* the address to which we want to bind for connect() */
+	struct source_addr_pool *source_addr;	/* pool of addresses to which we want to bind for connect() */
+	struct source_addr_pool *curr_sa;	/* the address to which we want to bind for connect() */
 }
 (...)
-	if (bind(fd, (struct sockaddr *)&s->be->source_addr, sizeof(s->be->source_addr)) == -1) {
+	struct source_addr_pool *sap = s->be->curr_sa;
+
+	s->be->curr_sa=sap->next?sap->next:s->be->source_addr;
+	if (bind(fd, (struct sockaddr *)&(sap->addr), sizeof(sap->addr)) == -1) {
Of course this is only a short example, there are more places requiring changes.
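For readers who want to try the idea outside haproxy, the same rotation can be sketched as a small standalone piece of C. It reuses the struct source_addr_pool layout from the excerpt; pool_entry_init(), pool_next() and bind_from_pool() are hypothetical helper names invented for this illustration and are not part of the actual patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

struct source_addr_pool {
    struct sockaddr_in addr;
    struct source_addr_pool *next;
};

/* Fill one pool entry from a dotted-quad string. Port 0 lets the
 * kernel choose the source port. Returns 0 on success, -1 if the
 * address cannot be parsed. */
int pool_entry_init(struct source_addr_pool *e, const char *ip,
                    struct source_addr_pool *next)
{
    memset(e, 0, sizeof(*e));
    e->addr.sin_family = AF_INET;
    e->addr.sin_port = 0;
    e->next = next;
    return inet_pton(AF_INET, ip, &e->addr.sin_addr) == 1 ? 0 : -1;
}

/* Return the entry to use for this connection and advance the
 * cursor, wrapping back to the head at the end of the list. */
struct source_addr_pool *pool_next(struct source_addr_pool *head,
                                   struct source_addr_pool **cursor)
{
    struct source_addr_pool *sap = *cursor ? *cursor : head;

    if (sap == NULL)
        return NULL;
    *cursor = sap->next ? sap->next : head;
    return sap;
}

/* Bind fd to the next address of the pool before connect().
 * Returns the result of bind(), or 0 if the pool is empty. */
int bind_from_pool(int fd, struct source_addr_pool *head,
                   struct source_addr_pool **cursor)
{
    struct source_addr_pool *sap = pool_next(head, cursor);

    if (sap == NULL)
        return 0; /* no pool configured: leave the socket unbound */
    return bind(fd, (struct sockaddr *)&sap->addr, sizeof(sap->addr));
}
```

A caller would build the list once from the configured addresses, keep one cursor per backend, and call bind_from_pool() on each new socket right before connect(). As in the patch, source ports are left to the kernel.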
<CUT>
> I'm not really sure this is interesting to do. In your case, the bug is
> between linux and the firewall which runs on it (netfilter). It's not
> expected that if you enable timestamps exactly to fix this problem, it
> makes the problem worse !
OK. I am also not sure and that is why I have never pushed this.
Best regards,
Krzysztof Olędzki

Received on 2007/10/20 16:03