Hi Willy, thanks for all your help with this issue. I upgraded to Ubuntu
with a recent kernel and poof, the problem disappeared.
Thanks,
Lincoln
On Thu, Dec 3, 2009 at 12:59 AM, Willy Tarreau <w#1wt.eu> wrote:
> On Thu, Dec 03, 2009 at 12:29:55AM -0500, Lincoln wrote:
> > Hi Willy, I agree it's pretty confusing.
> >
> > I should have been clearer - the problem does not happen every time, it's
> > very random. But when it happens it always follows that exact pattern -
> > that's what I meant to say.
>
> OK, that's what I understood first, but I wanted confirmation.
>
> > I actually have somaxconn set to 10000 so I don't think that's the issue.
>
> Indeed.
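For reference, Linux's listen(2) silently caps the requested backlog at net.core.somaxconn, so a high somaxconn only matters if the application itself asks for a large backlog. Below is a minimal Python sketch of that rule. It assumes the usual /proc path. `read_somaxconn` and `effective_backlog` are hypothetical helpers, and the 1024 backlog is just an example value, not the one used in this setup:

```python
from pathlib import Path

# Standard Linux location of the net.core.somaxconn sysctl.
SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path: Path = SOMAXCONN_PATH) -> int:
    """Read the system-wide cap on listen() backlogs."""
    return int(path.read_text().strip())


def effective_backlog(requested: int, somaxconn: int) -> int:
    """Backlog the kernel actually uses: listen(2) silently caps it at somaxconn."""
    return min(requested, somaxconn)


if __name__ == "__main__" and SOMAXCONN_PATH.exists():
    cap = read_somaxconn()
    # A proxy calling listen(fd, 1024) gets min(1024, cap) as its accept queue size.
    print(f"somaxconn={cap}, listen(fd, 1024) -> backlog {effective_backlog(1024, cap)}")
```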
>
> > At this point I'm thinking about scrapping my EC2 instances and trying 2
> > new ones - you never know.
>
> One large site I know about had problems with some instances that were
> a lot slower than others, and looked like they were randomly losing a
> lot of packets (probably sharing the same machine as others saturating
> the bandwidth). When they switched to other instances, they discovered
> that some of them were immediately receiving attacks, most likely
> because they were abandoned by sites being attacked. It seems like
> what works well is already used and what you can find unused is probably
> bad... This site finally moved off there to solve their problems, which
> were impossible to debug in virtualized environments.
>
> > Just in case you have any other insights here's the output from the 3
> > commands you mentioned. Thanks again for all your help!
> >
> > Lincoln
> >
> > root#lb1:~$ uname -a
> > Linux domU-12-31-39-0A-92-72 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 i686 i386 GNU/Linux
>
> I don't know if it's the latest Xen kernel available, but 2.6.21 does not
> sound like one of the best kernels to me, so maybe that can explain things,
> though I'm not specifically aware of issues in it. Don't you have anything
> more recent for these boxes? This kernel was built almost 2 years ago, and
> given the number of critical security vulnerabilities since, there must
> have been updates.
>
> > root#lb1:~$ netstat -i
> > Kernel Interface table
> > Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
> > eth0   1500   0 67999261      0      0      0 70299595      0      0      0 BMRU
> > lo    16436   0  8045554      0      0      0  8045554      0      0      0 LRU
>
> OK no drop here.
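Willy's reading of the interface table can be automated. The columns to watch are RX-ERR, RX-DRP and RX-OVR, along with their TX counterparts. Below is a small sketch that assumes net-tools style `netstat -i` output. Because it keys on the header names, it should also cope with versions that omit the Met column. The function names are hypothetical:

```python
from __future__ import annotations

# Counters that indicate lost or rejected frames on an interface.
ERROR_COLUMNS = ("RX-ERR", "RX-DRP", "RX-OVR", "TX-ERR", "TX-DRP", "TX-OVR")

# The `netstat -i` output posted in this thread.
SAMPLE = """Kernel Interface table
Iface   MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 67999261      0      0      0 70299595      0      0      0 BMRU
lo    16436   0  8045554      0      0      0  8045554      0      0      0 LRU
"""


def parse_netstat_i(text: str) -> dict[str, dict[str, str]]:
    """Parse `netstat -i` output into {interface: {column: value}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Skip the banner line; the header row is the one starting with "Iface".
    header_idx = next(i for i, line in enumerate(lines) if line.split()[0] == "Iface")
    columns = lines[header_idx].split()
    return {
        fields[0]: dict(zip(columns, fields))
        for fields in (line.split() for line in lines[header_idx + 1:])
    }


def nonzero_errors(table: dict[str, dict[str, str]]) -> dict[str, dict[str, int]]:
    """Return only the error/drop/overrun counters that are above zero."""
    result: dict[str, dict[str, int]] = {}
    for iface, row in table.items():
        bad = {col: int(row[col]) for col in ERROR_COLUMNS if col in row and int(row[col]) > 0}
        if bad:
            result[iface] = bad
    return result


if __name__ == "__main__":
    print(nonzero_errors(parse_netstat_i(SAMPLE)))  # → {}
```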
>
> > root#lb1:~$ netstat -s
> > Tcp:
> > 15400091 active connections openings
> > 1500044 passive connection openings
> > 2110125 failed connection attempts
>
> Is it expected that you have that many failed connection
> attempts? Maybe one of your servers is down and it's just
> the health checks count, but it looks large for a health
> check. It's possible that we have the same problem on both
> sides.
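The `netstat -s` lines come from two files. The Tcp section (ActiveOpens, PassiveOpens, AttemptFails) is in /proc/net/snmp, and the TcpExt section is in /proc/net/netstat. Below is a minimal sketch for reading those files directly and putting the failed attempts in proportion. The helper names are hypothetical. Dividing by active plus passive opens is only a rough yardstick, because AttemptFails counts failures on both the connecting and the accepting side:

```python
from __future__ import annotations

from pathlib import Path


def parse_proc_counters(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/snmp or /proc/net/netstat into {section: {counter: value}}.

    Both files alternate a header line ("Tcp: RtoAlgorithm ... ActiveOpens ...")
    with a value line carrying the same prefix ("Tcp: 1 ... 15400091 ...").
    """
    lines = [line for line in text.splitlines() if line.strip()]
    counters: dict[str, dict[str, int]] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        section, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        counters[section] = {n: int(v) for n, v in zip(names.split(), nums.split())}
    return counters


def attempt_fail_ratio(tcp: dict[str, int]) -> float:
    """AttemptFails relative to all opens (active + passive): a rough failure rate."""
    opens = tcp["ActiveOpens"] + tcp["PassiveOpens"]
    return tcp["AttemptFails"] / opens if opens else 0.0


if __name__ == "__main__":
    # Counters from the `netstat -s` output in this thread.
    thread = {"ActiveOpens": 15400091, "PassiveOpens": 1500044, "AttemptFails": 2110125}
    print(f"{attempt_fail_ratio(thread):.1%}")  # → 12.5%
    snmp = Path("/proc/net/snmp")
    if snmp.exists():
        print(f"{attempt_fail_ratio(parse_proc_counters(snmp.read_text())['Tcp']):.1%}")
```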
>
> > TcpExt:
> > 2722 invalid SYN cookies received
>
> Do you have SYN cookies enabled? If so, could you try disabling
> them?
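On Linux the switch is the net.ipv4.tcp_syncookies sysctl. This is the same knob that `sysctl net.ipv4.tcp_syncookies` displays and `sysctl -w` sets. Below is a small sketch that reads and toggles it through /proc. The helpers are hypothetical. The optional `path` argument exists only so the functions can be tried against a scratch file:

```python
from pathlib import Path

# Standard Linux location of the net.ipv4.tcp_syncookies sysctl.
SYNCOOKIES_PATH = Path("/proc/sys/net/ipv4/tcp_syncookies")


def syncookies_enabled(path: Path = SYNCOOKIES_PATH) -> bool:
    """True when tcp_syncookies is non-zero, i.e. SYN cookies may be sent."""
    return int(path.read_text().strip()) != 0


def set_syncookies(enabled: bool, path: Path = SYNCOOKIES_PATH) -> None:
    """Same effect as `sysctl -w net.ipv4.tcp_syncookies=1` (or =0).

    Writing the real file needs root, and the change does not survive a reboot.
    """
    path.write_text("1\n" if enabled else "0\n")


if __name__ == "__main__" and SYNCOOKIES_PATH.exists():
    print("SYN cookies enabled:", syncookies_enabled())
```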
>
> > 1922 resets received for embryonic SYN_RECV sockets
> > 712136 TCP sockets finished time wait in fast timer
>
> That sounds like a lot. How many connections per second do you get
> on average? And from the same IP address?
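One way to answer the connections-per-second question is to sample a cumulative counter twice and divide the difference by the interval. Below is a sketch that assumes PassiveOpens in /proc/net/snmp (incoming, passively opened connections) is the counter of interest. `read_passive_opens` and `counter_rate` are hypothetical helpers:

```python
import time
from pathlib import Path
from typing import Callable


def read_passive_opens(path: Path = Path("/proc/net/snmp")) -> int:
    """Return the Tcp PassiveOpens counter (incoming opens since boot)."""
    tcp_lines = [line.split() for line in path.read_text().splitlines() if line.startswith("Tcp:")]
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]  # header row, then value row
    return int(dict(zip(names, values))["PassiveOpens"])


def counter_rate(read: Callable[[], int], interval: float = 10.0,
                 sleep: Callable[[float], None] = time.sleep) -> float:
    """Average per-second growth of a monotonically increasing counter.

    Assumes `sleep` waits roughly `interval` seconds; it is injectable for testing.
    """
    before = read()
    sleep(interval)
    after = read()
    return (after - before) / interval
```

On a live box, `counter_rate(read_passive_opens)` gives incoming connections per second, averaged over ten seconds. These are global counters, so a per-source-IP breakdown would need connection-level data instead.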
>
> > 39530 passive connections rejected because of time stamp
>
> Troubling! This looks like what you're experiencing. I don't know
> under what conditions it can happen. Maybe the sender's clock
> is going backwards when it reuses the same connection?
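For context, the timestamp check behind that counter is PAWS (Protection Against Wrapped Sequences, RFC 1323 / RFC 7323). Under PAWS, a segment whose TSval is older than the most recent timestamp remembered for that peer is discarded. As far as I know, Linux applies this to incoming SYNs against a per-peer cached timestamp when TIME_WAIT recycling (net.ipv4.tcp_tw_recycle) is enabled. A peer whose timestamp clock goes backwards, as Willy suspects, would therefore have new connections rejected. Below is a simplified sketch of the comparison. The function names are hypothetical, and this is not the kernel's actual code:

```python
# TCP timestamps are 32-bit and wrap around, so "older" is decided with
# serial-number arithmetic rather than a plain "<".
MASK32 = 0xFFFFFFFF
HALF_RANGE = 0x80000000


def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp `a` is earlier than `b`, allowing for wraparound."""
    # (a - b) modulo 2**32 lands in the upper half exactly when it is negative as s32.
    return ((a - b) & MASK32) >= HALF_RANGE


def paws_rejects(seg_tsval: int, ts_recent: int) -> bool:
    """Simplified PAWS check: reject a segment whose TSval is older than the last
    timestamp remembered for that peer (`ts_recent`).

    Real stacks add tolerances (for example for long-idle connections) that are
    left out here.
    """
    return ts_before(seg_tsval, ts_recent)


if __name__ == "__main__":
    # A peer whose clock jumped backwards between two connections:
    print(paws_rejects(seg_tsval=1000, ts_recent=5000))  # → True
    # Wraparound: 5 comes just after 0xFFFFFFF0, so it is accepted.
    print(paws_rejects(seg_tsval=5, ts_recent=0xFFFFFFF0))  # → False
```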
>
> Willy
>
>
Received on 2009/12/09 01:16