I'm confused :)
I stopped logging so much in haproxy, but I get the same thing if I grep the nginx logs on this server: haproxy's mongrel backend checks start failing. I've noticed it only happens when using httpchk (or at least it happens much, much more quickly).
Here's an iostat I ran -- the first two samples are from during the grep on the nginx logs; the last one is from after it finished:
Device:   rrqm/s   wrqm/s      r/s     w/s     rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sda       580.32   116.43  2094.73   11.76  359234.08   1049.90    171.04      6.56    3.12   0.48  101.44
dm-0        0.00     0.00  2675.25  127.99  359234.08   1023.94    128.51     35.64   12.82   0.36  101.44
dm-1        0.00     0.00     0.00    0.00       0.00      0.00      0.00      0.00    0.00   0.00    0.00

Device:   rrqm/s   wrqm/s      r/s     w/s     rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sda       242.11   407.49   776.72  245.34  148764.37   5222.67    150.66      2.88    2.85   0.48   48.83
dm-0        0.00     0.00  1018.22  652.63  148660.73   5221.05     92.10     10.99    6.59   0.29   48.83
dm-1        0.00     0.00     0.00    0.00       0.00      0.00      0.00      0.00    0.00   0.00    0.00

Device:   rrqm/s   wrqm/s      r/s     w/s     rsec/s    wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sda         3.46   211.79     7.32   83.94     102.44   2365.85     27.05      0.05    0.51   0.37    3.39
dm-0        0.00     0.00    10.77  295.73     102.44   2365.85      8.05      0.12    0.40   0.11    3.37
dm-1        0.00     0.00     0.00    0.00       0.00      0.00      0.00      0.00    0.00   0.00    0.00
Top showed CPU use around 16% for grep, and there was still RAM available. It's an 8-core machine.
Here's a bit out of the haproxy config:
global
    log 127.0.0.1 local0 notice
    #log 127.0.0.1 local1 alert
    daemon
    nbproc 1
    maxconn 32000
    ulimit-n 65536
    # and uid, gid, daemon, maxconn...

defaults
    # setup logging
    log global
    option tcplog
    option httplog
    option dontlognull
    # setup statistics pages
    stats enable
    stats uri /haproxy_stats
    stats refresh 10
    stats auth xxxxxxxxxxxxxx
    mode http
    retries 20
    option abortonclose
    option redispatch
    timeout connect 10   # time waiting for backend server to accept
    timeout client 80s   # data from client (browser)
    timeout server 120s  # data from server (mongrel)

frontend rails *:7000
    # set connection limits
    maxconn 700
    default_backend mongrels

backend mongrels
    fullconn 80
    option httpchk HEAD /videos/heartbeat
    option abortonclose
    balance leastconn
    reqrep ^([^\ ]*)\ //videos(.*) \1\ /videos\2
    server webapp01-101 10.2.0.3:8101 minconn 1 maxconn 5 check inter 1s fastinter 200 rise 1 fall 1
    server webapp01-102 10.2.0.3:8102 minconn 1 maxconn 5 check inter 1s fastinter 200 rise 1 fall 1
I'm not really sure what to try next.
On Wed, Feb 11, 2009 at 7:36 PM, John Lauro <john.lauro#covenanteyes.com> wrote:
> If you prefix the filename with a -, that tells syslog not to flush
> the data to disk after every entry. Syncing on every entry is only
> useful in case of a kernel crash, so that you don't miss a log entry
> leading up to it (ha!), and it is inefficient when you get a log entry
> at least every few seconds, and much more so when you get hundreds or
> thousands a second.
>
> Note, this isn't exactly standard in all versions of syslog. Do a "man
> syslog.conf" to make sure it applies to your OS, but it is generally
> standard with Linux, and other operating systems have other ways to
> allow for normal buffering.
>
> So, instead of a line like:
> *.debug;mail.none;authpriv.none;cron.none /var/log/messages
> Use the following:
> *.debug;mail.none;authpriv.none;cron.none -/var/log/messages
>
>
> How are you doing your log rotates? It seems strange that rotation
> would cause problems, as it should be a relatively light process,
> unless you are doing something like compressing the logs when you
> rotate. It should just be a mv and then a -HUP to syslogd. Of course,
> if a simple grep is causing issues, then it sounds like you definitely
> have some sort of resource contention problem.
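>
> For example, the whole rotation can be as small as this (the log file
> and pid file paths here are illustrative, not taken from your setup):
>
>     mv /var/log/haproxy.log /var/log/haproxy.log.1
>     kill -HUP $(cat /var/run/syslogd.pid)
>
> The -HUP makes syslogd close and reopen its log files, so it starts a
> fresh haproxy.log; nothing is copied or compressed, so there is almost
> no extra I/O.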
>
>> -----Original Message-----
>> From: Michael Fortson [mailto:mfortson#gmail.com]
>> Sent: Wednesday, February 11, 2009 8:38 PM
>> To: John Lauro
>> Cc: James Brady; haproxy#formilux.org
>> Subject: Re: Reducing I/O load of logging
>>
>> Hi John,
>>
>> Can you explain what you mean by making sure there's a dash in the
>> name? Is this for something like syslog via /var/log/messages?
>>
>> We've had some major issues when logrotate kicks in, knocking haproxy
>> offline (it reports most of the back-end servers as down and starts
>> throwing 503s). It's a busy site with 80 servers configured in the
>> backend. Reducing the logrotate interval to 3 hours shortened the
>> downtimes, but they still happened. It does the same thing if I try
>> to grep the log.
>>
>> I've had to turn off logrotate for now...
>>
>> On Wed, Feb 11, 2009 at 10:01 AM, John Lauro
>> <john.lauro#covenanteyes.com> wrote:
>> > It shouldn't be too hard for a machine to keep up with logging. How
>> > are you logging? Standard syslog? Make sure you have a - in front
>> > of the filename. How many connections per second are you logging?
>> >
>> > Haven't done it with HAProxy, but have with other things that
>> > generate tons of logs.
>> >
>> > What you could do is dump the logs (don't forget the - as part of
>> > the file name) to /dev/shm/ (assuming Linux), and then rotate the
>> > logs once a minute and process them. That way, you will not have
>> > any disk I/O from the logs, but it would increase memory
>> > requirements.
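>> >
>> > A sketch of that setup, assuming Linux with a sysklogd-style syslog
>> > (the facility, the paths, and the process_logs step are all
>> > illustrative):
>> >
>> >     # /etc/syslog.conf: buffered ("-") write to a tmpfs-backed file
>> >     local0.*        -/dev/shm/haproxy.log
>> >
>> > plus a small script run from cron once a minute
>> > (* * * * * /usr/local/bin/rotate.sh):
>> >
>> >     #!/bin/sh
>> >     # move the file aside, make syslogd reopen it, then post-process
>> >     mv /dev/shm/haproxy.log /dev/shm/haproxy.log.1
>> >     kill -HUP $(cat /var/run/syslogd.pid)
>> >     process_logs /dev/shm/haproxy.log.1
>> >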
>> > From: James Brady [mailto:james.colin.brady#gmail.com]
>> > Sent: Wednesday, February 11, 2009 12:14 PM
>> > To: haproxy#formilux.org
>> > Subject: Reducing I/O load of logging
>> >
>> > The machine we run HAProxy on is approaching its limits in terms
>> > of disk I/O due to our debug-level logging.
>> >
>> > The CPU is barely used, memory is no problem at all - the
>> > bottleneck will soon be the rate at which we can log to disk.
>> > Because the machine is more than adequate in all other areas, we'd
>> > really like to reduce the I/O load of logging.
>> >
>> > We are using the debug log output to tell us the response time and
>> > status code for various URLs - this is an essential task we can't
>> > do without.
>> >
>> > Is there any way to get this information without logging and
>> > post-processing every request? Can HAProxy dump out averaged
>> > statistics like this? Can we somehow reduce the I/O load by just
>> > logging the exact fields we're interested in?
>> >
>> > Many thanks!
>> >
>> > James
Received on 2009/02/13 02:14