I'm confused :)
I stopped logging so much in haproxy, but I get the same thing if I grep the nginx logs on this server: haproxy's mongrel backend checks start failing. I've noticed it only happens when using httpchk (or at least it happens much, much more quickly).
Here's an iostat I ran -- the first two are during the grep on the nginx logs; the last one is after I finished:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 580.32 116.43 2094.73 11.76 359234.08 1049.90 171.04 6.56 3.12 0.48 101.44 dm-0 0.00 0.00 2675.25 127.99 359234.08 1023.94 128.51 35.64 12.82 0.36 101.44 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 242.11 407.49 776.72 245.34 148764.37 5222.67 150.66 2.88 2.85 0.48 48.83 dm-0 0.00 0.00 1018.22 652.63 148660.73 5221.05 92.10 10.99 6.59 0.29 48.83 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 3.46 211.79 7.32 83.94 102.44 2365.85 27.05 0.05 0.51 0.37 3.39 dm-0 0.00 0.00 10.77 295.73 102.44 2365.85 8.05 0.12 0.40 0.11 3.37 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Top showed cpu use around 16% for grep. There was still ram available. It's an 8 core machine.
Here's a bit out of the haproxy config:
global
        log 127.0.0.1 local0 notice #notice
        #log 127.0.0.1 local1 alert
        daemon
        nbproc 1
        maxconn 32000
        ulimit-n 65536
        # and uid, gid, daemon, maxconn...
defaults
        # setup logging
        log global
        option tcplog
        option httplog
        option dontlognull
        # setup statistics pages
        stats   enable
        stats uri  /haproxy_stats
        stats refresh 10
        stats auth    xxxxxxxxxxxxxx
        mode            http
        retries         20
        option          abortonclose
        option          redispatch
        timeout connect 10  # time waiting for backend server to accept
        timeout client  80s  # data from client (browser)
        timeout server  120s  # data from server (mongrel)
frontend rails *:7000
        # set connection limits
        maxconn         700
        default_backend mongrels
backend mongrels
        fullconn        80
        option httpchk HEAD /videos/heartbeat
        option abortonclose
        balance leastconn
        reqrep ^([^\ ]*)\ //videos(.*)     \1\ /videos\2
        server webapp01-101 10.2.0.3:8101 minconn 1 maxconn 5 check
inter 1s fastinter 200 rise 1 fall 1
        server webapp01-102 10.2.0.3:8102 minconn 1 maxconn 5 check
inter 1s fastinter 200 rise 1 fall 1
I'm not really sure what to try next.
On Wed, Feb 11, 2009 at 7:36 PM, John Lauro <john.lauro#covenanteyes.com> wrote:
> If you prefix the filename with a -, that tells syslog not to flush the data
> to disk immediately (which is only good in case of a kernel crash, as you
> might not miss a log entry leading to it, ha!), and it is inefficient
> forcing a sync every log entry when you have a log entry more at least every
> few seconds, and much more so when you have hundreds or thousands a second.
>
> Note, this isn't exactly standard in all versions of syslog.  Do a "man
> syslog.conf" to make sure it applies to your OS, but generally is standard
> with Linux, and other operating systems have others ways to allow for normal
> buffering.
>
> So, instead of a line like:
> *.debug;mail.none;authpriv.none;cron.none               /var/log/messages
> Use the following:
> *.debug;mail.none;authpriv.none;cron.none               -/var/log/messages
>
>
> How are you doing your log rotates?  That seems strange that it would cause
> problems, as that should be a relatively light process, unless you are doing
> something like compressing the logs when you rotate.  It should be just be a
> mv and then send a -HUP to syslogd.  Of course, if a simple grep is causing
> issues, then it sounds like you definitely have some sort of resource
> contention problem.
>
>
>
>> -----Original Message-----
>> From: Michael Fortson [mailto:mfortson#gmail.com]
>> Sent: Wednesday, February 11, 2009 8:38 PM
>> To: John Lauro
>> Cc: James Brady; haproxy#formilux.org
>> Subject: Re: Reducing I/O load of logging
>>
>> Hi John,
>>
>> Can you explain what you mean by making sure there's a dash in the
>> name? Is this for something like syslog via /var/log/messages?
>>
>> We've had some major issues when logrotate kicks in, knocking haproxy
>> offline (it reports most of the back-end servers as down and starts
>> throwing 503s). It's a busy site with 80 servers configured in the
>> backend. Reducing logrotate to 3 hour intervals shortened the
>> downtimes but it still happened. It does the same thing if I try to
>> grep the log.
>>
>> I've had to turn off logrotate for now...
>>
>>
>>
>>
>>
>> On Wed, Feb 11, 2009 at 10:01 AM, John Lauro
>> <john.lauro#covenanteyes.com> wrote:
>> > It shouldn't be too hard for a machine to keep up with logging.  How
>> are you
>> > logging?  standard syslog?   Make sure you have - in front of the
>> filename?
>> > How connections per second are you logging?
>> >
>> >
>> >
>> > Haven't done it with Haproxy, but have with other things that
>> generate tons
>> > of logs.
>> >
>> >
>> >
>> > what you could do is dump the logs (don't forget the - as part of the
>> file
>> > name) to /dev/shm/ (assuming linux), and then rotate the logs once a
>> minute
>> > and process them.  That way, you will not have any disk I/O from the
>> logs,
>> > but would increase memory requirements.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > From: James Brady [mailto:james.colin.brady#gmail.com]
>> > Sent: Wednesday, February 11, 2009 12:14 PM
>> > To: haproxy#formilux.org
>> > Subject: Reducing I/O load of logging
>> >
>> >
>> >
>> > The machine we run HAProxy on is approaching its limits in terms of
>> disk I/O
>> > due to our debug-level logging.
>> >
>> >
>> >
>> > The CPU is barely used, memory is no problem at all - the bottleneck
>> will
>> > soon be the rate at which we can log to disk. Because the machine is
>> more
>> > than adequate in all other areas, we'd really like to reduce the I/O
>> load of
>> > logging.
>> >
>> >
>> >
>> > We are using the debug log output to tell us the response time and
>> status
>> > code for various URLs - this is an essential task we can't do
>> without.
>> >
>> >
>> >
>> > Is there any way to get this information without logging and
>> > post-processing every request? Can HAProxy dump out averaged
>> statistics like
>> > this? Can we somehow reduce the I/O load by just logging the exact
>> fields
>> > we're interested in?
>> >
>> >
>> >
>> > Many thanks!
>> >
>> > James
>
>
>
Received on 2009/02/13 02:14
This archive was generated by hypermail 2.2.0 : 2009/02/13 03:30 CET