Hi Willy, I agree it's pretty confusing.
I should have been clearer: the problem does not happen every time; it occurs at random. But when it does happen, it always follows that exact pattern. That's what I meant to say.
I actually have somaxconn set to 10000 so I don't think that's the issue.
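A minimal sketch for double-checking the values the kernel is actually using, since a setting can be lost after a reboot. The helper name `show_sysctl` is made up for illustration, and `tcp_max_syn_backlog` is not mentioned in this thread; it is included on the assumption that it is one of the backlog-related parameters worth looking at.

```shell
#!/bin/sh
# Hypothetical helper: print "path = value" for a file under /proc/sys,
# or a note when the file cannot be read on this kernel.
show_sysctl() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s = (not readable)\n' "$1"
    fi
}

# Parameters discussed in this thread.
show_sysctl /proc/sys/net/core/somaxconn
show_sysctl /proc/sys/net/ipv4/tcp_timestamps
# Assumption: not named in the thread, but related to the SYN backlog.
show_sysctl /proc/sys/net/ipv4/tcp_max_syn_backlog
```

As Willy notes in the quoted message, haproxy has to be restarted before a changed somaxconn applies to its listening sockets.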
At this point I'm thinking about scrapping my EC2 instances and trying 2 new ones - you never know.
Just in case you have any other insights, here's the output from the three commands you mentioned. Thanks again for all your help!
Lincoln
root#lb1:~$ uname -a
Linux domU-12-31-39-0A-92-72 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 i686 i386 GNU/Linux
root#lb1:~$ netstat -i
Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 67999261      0      0      0 70299595      0      0      0 BMRU
lo    16436   0  8045554      0      0      0  8045554      0      0      0 LRU
root#lb1:~$ netstat -s
Ip:
76004137 total packets received
2 with invalid addresses
0 forwarded
0 incoming packets discarded
76004135 incoming packets delivered
78424996 requests sent out
Icmp:
1700441 ICMP messages received
6 input ICMP message failed.
ICMP input histogram:
destination unreachable: 1485599
echo requests: 74856
echo replies: 139986
ICMP output histogram:
destination unreachable: 1484378
echo replies: 74856
Tcp:
15400091 active connections openings
1500044 passive connection openings
2110125 failed connection attempts
646 connection resets received
1 connections established
72811429 segments received
74607063 segments send out
56735 segments retransmited
1510 bad segments received.
781257 resets sent
Udp:
7887 packets received
1484378 packets to unknown port received.
0 packet receive errors
1781057 packets sent
UdpLite:
TcpExt:
2722 invalid SYN cookies received
1922 resets received for embryonic SYN_RECV sockets
712136 TCP sockets finished time wait in fast timer
22808 time wait sockets recycled by time stamp
851007 TCP sockets finished time wait in slow timer
39530 passive connections rejected because of time stamp
160 packets rejects in established connections because of timestamp
585822 delayed acks sent
23 delayed acks further delayed because of locked socket
Quick ack mode was activated 6840 times
19917 packets directly queued to recvmsg prequeue.
338 packets directly received from prequeue
15636688 packets header predicted
26116808 acknowledgments not containing data received
936810 predicted acknowledgments
130 times recovered from packet loss due to fast retransmit
7603 times recovered from packet loss due to SACK data
3 bad SACKs received
Detected reordering 22 times using FACK
Detected reordering 6 times using SACK
Detected reordering 13 times using reno fast retransmit
Detected reordering 119 times using time stamp
117 congestion windows fully recovered
542 congestion windows partially recovered using Hoe heuristic
TCPDSACKUndo: 43
14626 congestion windows recovered after partial ack
6847 TCP data loss events
60 timeouts after reno fast retransmit
1965 timeouts after SACK recovery
306 timeouts in loss state
12099 fast retransmits
3795 forward retransmits
9935 retransmits in slow start
23335 other TCP timeouts
TCPRenoRecoveryFail: 74
739 sack retransmits failed
6890 DSACKs sent for old packets
3367 DSACKs received
19 DSACKs for out of order packets received
643 connections reset due to unexpected data
240 connections reset due to early user close
201 connections aborted due to timeout
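
A minimal sketch for cutting a dump like the one above down to the lines that relate to SYN handling and the listen queue, which is where a full backlog would be expected to show up. The function name `backlog_counters` is made up for illustration, and the patterns are an assumption: the exact counter wording depends on the net-tools version, and netstat may omit counters that are zero.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -s` output on stdin and keep only
# the lines that mention SYN handling or the listen queue.
backlog_counters() {
    # `|| true` keeps the exit status at 0 when nothing matches.
    grep -i -E 'listen|SYN' || true
}

# Usage (not run here):
#   netstat -s | backlog_counters
```

Run against the output above, this would keep the "invalid SYN cookies received" and "embryonic SYN_RECV sockets" lines and drop the rest.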
On Thu, Dec 3, 2009 at 12:16 AM, Willy Tarreau <w#1wt.eu> wrote:
> On Wed, Dec 02, 2009 at 07:44:40PM -0500, Lincoln wrote:
> > Thanks Willy for offering to help us out with this.
> >
> > We are running on an Amazon EC2 m1.small instance which is very common for a
> > load balancer machine.
> >
> > I changed /proc/sys/net/ipv4/tcp_timestamps to 1 - unfortunately to no
> > effect.
>
> OK.
>
> > Here are my iptables settings (nothing special here that I can see - I
> > haven't modified anything):
> > root#lb1:~$ iptables -L
> > Chain INPUT (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain FORWARD (policy ACCEPT)
> > target prot opt source destination
> >
> > Chain OUTPUT (policy ACCEPT)
> > target prot opt source destination
>
> OK so most likely it was not even loaded.
>
> > I would like to try accepting INVALIDs as you suggest - just to see if that
> > addresses the problem before digging deeper. Unfortunately I'm not very
> > familiar with iptables - could you show me what I should run to try that?
>
> You don't need to because you don't have any iptables rules, so those are
> implicitly allowed. The common case I was talking about was when people
> explicitly drop packets in invalid state.
>
> > If not that, perhaps something else about the EC2 infrastructure is using
> > sequence number randomization? Are there other things I can look for?
>
> If you don't have iptables, then your machine should have sent either a
> SYN/ACK or an ACK. If you really took the trace from the machine itself,
> then I have no explanation about the problem :-(
>
> You said that in every trace it was the same pattern, i.e. the first
> packet which was accepted was the SYN without timestamps. Are you
> absolutely sure it's *always* the case and it's not just random?
> I'm asking because the system might refrain from sending a SYN/ACK
> when the TCP SYN backlog is full, which is completely independent
> from the SYN packet's shape. Your TCP parameter tuning was OK,
> but for the backlog you also need to set /proc/sys/net/core/somaxconn
> to a large value otherwise it serves as a max. By default it's very
> low (128). Try setting it to 10000 (you need to restart haproxy for
> the change to take effect).
>
> A "uname -a", "netstat -i" and "netstat -s" can help too.
>
> Regards,
> Willy
>
>
Received on 2009/12/03 06:29