
From: Lincoln <linxbetter#gmail.com>
Date: Thu, 3 Dec 2009 00:29:55 -0500

Hi Willy, I agree it's pretty confusing.

I should have been clearer - the problem does not happen every time, it's very random. But when it happens it always follows that exact pattern - that's what I meant to say.

I actually have somaxconn set to 10000 so I don't think that's the issue.

At this point I'm thinking about scrapping my EC2 instances and trying 2 new ones - you never know.

Just in case you have any other insights here's the output from the 3 commands you mentioned. Thanks again for all your help!

Lincoln

root#lb1:~$ uname -a
Linux domU-12-31-39-0A-92-72 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 i686 i386 GNU/Linux

root#lb1:~$ netstat -i
Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 67999261      0      0      0 70299595      0      0      0 BMRU
lo    16436   0  8045554      0      0      0  8045554      0      0      0 LRU

root#lb1:~$ netstat -s
Ip:

    76004137 total packets received
    2 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    76004135 incoming packets delivered
    78424996 requests sent out
Icmp:

    1700441 ICMP messages received
    6 input ICMP message failed.
    ICMP input histogram:

        destination unreachable: 1485599
        echo requests: 74856
        echo replies: 139986

    1559234 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:

        destination unreachable: 1484378
        echo replies: 74856

Tcp:

    15400091 active connections openings
    1500044 passive connection openings
    2110125 failed connection attempts
    646 connection resets received
    1 connections established
    72811429 segments received
    74607063 segments send out
    56735 segments retransmited
    1510 bad segments received.
    781257 resets sent
Udp:

    7887 packets received
    1484378 packets to unknown port received.
    0 packet receive errors
    1781057 packets sent
UdpLite:
TcpExt:

    2722 invalid SYN cookies received
    1922 resets received for embryonic SYN_RECV sockets
    712136 TCP sockets finished time wait in fast timer
    22808 time wait sockets recycled by time stamp
    851007 TCP sockets finished time wait in slow timer
    39530 passive connections rejected because of time stamp
    160 packets rejects in established connections because of timestamp
    585822 delayed acks sent
    23 delayed acks further delayed because of locked socket
    Quick ack mode was activated 6840 times
    19917 packets directly queued to recvmsg prequeue.
    338 packets directly received from prequeue
    15636688 packets header predicted
    26116808 acknowledgments not containing data received
    936810 predicted acknowledgments
    130 times recovered from packet loss due to fast retransmit
    7603 times recovered from packet loss due to SACK data
    3 bad SACKs received
    Detected reordering 22 times using FACK
    Detected reordering 6 times using SACK
    Detected reordering 13 times using reno fast retransmit
    Detected reordering 119 times using time stamp
    117 congestion windows fully recovered
    542 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 43
    14626 congestion windows recovered after partial ack
    6847 TCP data loss events
    60 timeouts after reno fast retransmit
    1965 timeouts after SACK recovery
    306 timeouts in loss state
    12099 fast retransmits
    3795 forward retransmits
    9935 retransmits in slow start
    23335 other TCP timeouts
    TCPRenoRecoveryFail: 74
    739 sack retransmits failed
    6890 DSACKs sent for old packets
    3367 DSACKs received
    19 DSACKs for out of order packets received
    643 connections reset due to unexpected data
    240 connections reset due to early user close
    201 connections aborted due to timeout
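
For anyone who wants to compare these counters between runs, here is a minimal sketch in Python that parses the "NUMBER description" lines of the Tcp: section above and derives two rough ratios. The helper names (parse_counters, share) are made up for illustration, and the ratios are only coarse indicators, not a diagnosis.

```python
import re

# Excerpt of the Tcp: section from the netstat -s output above,
# with the counter names spelled exactly as netstat printed them.
TCP_SECTION = """\
    15400091 active connections openings
    1500044 passive connection openings
    2110125 failed connection attempts
    646 connection resets received
    1 connections established
    72811429 segments received
    74607063 segments send out
    56735 segments retransmited
    1510 bad segments received.
    781257 resets sent
"""


def parse_counters(text):
    """Map each 'NUMBER description' line to {description: NUMBER}."""
    counters = {}
    for line in text.splitlines():
        # Leading count, then the description, ignoring a trailing period.
        match = re.match(r"\s*(\d+)\s+(.+?)\.?\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def share(part, whole):
    """Return part/whole as a float, or 0.0 when whole is zero."""
    return part / whole if whole else 0.0


counters = parse_counters(TCP_SECTION)

# Fraction of outgoing segments that were retransmissions.
retrans = share(counters["segments retransmited"],
                counters["segments send out"])

# Failed attempts relative to all connection openings (active + passive).
openings = (counters["active connections openings"]
            + counters["passive connection openings"])
failed = share(counters["failed connection attempts"], openings)

print(f"retransmitted segments: {retrans:.4%}")
print(f"failed attempts / openings: {failed:.2%}")
```

Running the same parse on a second capture taken a few minutes later and subtracting the counters would show which ones are still climbing.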

On Thu, Dec 3, 2009 at 12:16 AM, Willy Tarreau <w#1wt.eu> wrote:

> On Wed, Dec 02, 2009 at 07:44:40PM -0500, Lincoln wrote:
> > Thanks Willy for offering to help us out with this.
> >
> > We are running on an Amazon EC2 m1small instance which is very common for a
> > load balancer machine.
> >
> > I changed /proc/sys/net/ipv4/tcp_timestamps to 1 - unfortunately to no
> > effect.
>
> OK.
>
> > Here are my iptables settings (nothing special here that I can see - I
> > haven't modified anything):
> > root#lb1:~$ iptables -L
> > Chain INPUT (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target prot opt source destination
>
> OK so most likely it was not even loaded.
>
> > I would like to try accepting INVALIDs as you suggest - just to see if that
> > addresses the problem before digging deeper. Unfortunately I'm not very
> > familiar with iptables - could you show me what I should run to try that?
>
> you don't need to because you don't have any iptables rules, so those are
> implicitly allowed. The common case I was talking about was when people
> explicitly drop packets in invalid state.
>
> > If not that, perhaps something else about the EC2 infrastructure is using
> > sequence number randomization? Are there other things I can look for?
>
> If you don't have iptables, then your machine should have sent either a
> SYN/ACK or an ACK. If you really took the trace from the machine itself,
> then I have no explanation about the problem :-(
>
> You said that in every trace it was the same pattern, ie the first
> packet which was accepted was the SYN without timestamps. Are you
> absolutely sure it's *always* the case and it's not just random ?
> I'm asking because the system might refrain from sending a SYN/ACK
> when the TCP SYN backlog is full, which is completely independent
> from the SYN packet's shape. Your TCP parameter tuning was OK,
> but for the backlog you also need to set /proc/sys/net/core/somaxconn
> to a large value otherwise it serves as a max. By default it's very
> low (128). Try setting it to 10000 (you need to restart haproxy for
> the change to take effect).
>
> A "uname -a", "netstat -i" and "netstat -s" can help too.
>
> Regards,
> Willy
>
>
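
The somaxconn cap described in the quoted message can be checked with a minimal sketch like the one below, in Python. It assumes a Linux host that exposes /proc/sys/net/core/somaxconn; the function names are hypothetical, and the 10000 figure is simply the value suggested above.

```python
from pathlib import Path

# Location of the setting named in the quoted message (Linux only).
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return the kernel's somaxconn value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """A listen() backlog larger than somaxconn is capped at somaxconn."""
    return min(requested, somaxconn)


requested = 10000  # value suggested in the quoted message
limit = read_somaxconn()
if limit is None:
    print("somaxconn is not readable on this system")
else:
    capped = effective_backlog(requested, limit)
    print(f"somaxconn={limit}, requested={requested}, effective={capped}")
```

As the quoted message notes, a process that is already listening keeps the backlog it was given when it started, so the proxy has to be restarted after the value is raised.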
Received on 2009/12/03 06:29

Re: weird tcp syn/ack problem