Here's the result: http://pastie.org/387928
This box used to run everything (much of which has now been moved to other clusters). If I can't get it to behave, it'll be doing nothing soon :)
log/messages isn't large enough to trigger the misbehavior, but hopefully it'll show something... I can't really run it against the nginx log (which is massive), because I always have to kill the grep before enough backend checks fail to cause a site outage.
On Thu, Feb 12, 2009 at 6:44 PM, John Lauro <john.lauro#covenanteyes.com> wrote:
> > I stopped logging so much in haproxy, but I get the same thing if I
> > grep the nginx logs on this server: haproxy's mongrel backend checks
> > start failing. I've noticed it only happens when using httpchk (or at
> > least it happens much, much more quickly).
> >
> > Here's an iostat I ran -- the first two are during the grep on the
> > nginx logs; the last one is after I finished:
>
> The iostat looks ok.
>
> Cut-n-paste the following (or run it from a script) so we can get a better idea
> of the box's general load and see whether anything turns up:
>
> cat /proc/interrupts
> free
> netstat --inet -n | awk '{ print $6 }' | sort | uniq -c
> ulimit -a
> vmstat 1 10 & ( sleep 5 ; grep whatever /var/log/messages >/dev/null )
> cat /proc/interrupts
> echo lsof count `lsof | wc -l`
>
> What type of disk subsystem do you have? Given how it chokes when doing a
> grep, it almost sounds like you might have a faulty driver. You do realize
> 8 cores is overkill for this, unless you are running other stuff on the
> box.
>
> The two checks on /proc/interrupts are there because we need the difference
> between them, to see whether something (especially disk I/O) is generating
> too many interrupts.
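Since the two /proc/interrupts snapshots only mean something as a difference, a small helper can do the subtraction instead of eyeballing two dumps. This is a minimal sketch, assuming each `cat /proc/interrupts` is redirected to a file; the `irq_delta` function and the file names are hypothetical, not part of any existing tool. It sums the per-CPU counter columns on each line and skips the CPU header line.

```shell
#!/bin/sh
# irq_delta BEFORE AFTER
# Hypothetical helper: prints "IRQ: increase" for each line of two saved
# /proc/interrupts snapshots, summing the per-CPU counter columns.
irq_delta() {
  awk '
    # Sum the leading numeric columns (one per CPU) of the current line.
    function total(   i, s) {
      s = 0
      for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) s += $i
      return s
    }
    # First file: remember the totals (FNR == 1 is the CPU header line).
    NR == FNR { if (FNR > 1) before[$1] = total(); next }
    # Second file: print how much each IRQ counter grew.
    FNR > 1 { print $1, total() - before[$1] }
  ' "$1" "$2"
}
```

For example, save `cat /proc/interrupts > /tmp/irq.before` before the grep and `cat /proc/interrupts > /tmp/irq.after` after it, then `irq_delta /tmp/irq.before /tmp/irq.after | sort -k2 -nr | head` lists the IRQs that grew the most, which should point at the disk controller if it is the one flooding the box.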
Received on 2009/02/13 05:20