Thanks, I'm certainly on the same page with you... I've run into this
kind of thing before when there's a runaway process and the like.
That said, I've run the Ruby production log through their log analysis tool (request-log-analyzer). That tool will chew on the logs and give you the min, max, and average load times for any transaction that occurs in Ruby... thing is... I don't see any query taking over a few seconds.
Essentially, from what I can tell, Apache is responding within seconds... or if it's not, it's not tossing any errors into the access or error logs.
PhilD
On Fri, Jul 30, 2010 at 9:52 AM, Willy Tarreau <w#1wt.eu> wrote:
> Hi Phil,
>
> On Wed, Jul 28, 2010 at 02:06:57PM -0400, Phil Dupont wrote:
> > I could use a hand... we recently switched to HAProxy to do some
> > basic load-balancing for our site. Nothing too fancy... basically just
> > splitting traffic between two Apache servers running a Ruby on Rails
> > site.
> >
> > Mostly, everything is fantastic. However, periodically we get server
> > timeouts (sH in the error logs) and users see the 504 Gateway Timeout
> > error. Not a ton, but a few hundred throughout the day.
>
> That's very common with applications that have a few complex
> requests which can take ages. A few per day may mean that there is
> 100% CPU for as long as the timeouts you have configured, or 100%
> I/O on the database, but such a short burst is not large enough to
> show up in graphs or monitoring.
>
> > What's killing me is that there doesn't appear to be a logical reason...
> > CPU load is non-existent on all the boxes (less than 1%), memory use on
> > the boxes is low (we have about a gig of RAM free on each), connections
> > are fine (at most 30 concurrent Apache connections on each box), and we
> > don't see any runaway processes on the MySQL database.
>
> That would really match what I describe above. If you're not looking
> at the exact moment the problem manifests itself, you can't see anything.
>
> > Further, when I look at the Apache and Ruby error and connection logs...
> > I don't see any errors being tossed.
>
> You should isolate the 504s in haproxy's logs; you'll get a lot of
> information about where the time is spent and whether they are always
> the same requests or not. I suspect only one request is concerned, and
> you'll find "sH" flags indicating that the server failed to respond in
> time, with the associated time in the last field of the timers.
>
> > Basically, from what I can tell, for no good reason we're randomly
> > getting timeouts...
> >
> > Any ideas where I can look for the cause of the problem? Anyone else
> > encounter this? Anything else I should consider looking at?
>
> Quite frankly, the most common cause of 504s is long database requests,
> but I don't know what the cause is in your case.
>
> Regards,
> Willy
>
>
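A minimal Python sketch of the filtering Willy suggests, assuming haproxy logs with the default "option httplog" format of the 1.4 era (timers Tq/Tw/Tc/Tr/Tt, a 4-character termination state, and the quoted request line at the end). It keeps only 504 responses whose termination state starts with "sH" (server timed out before sending response headers), groups them by server and URL path with the query string dropped, and reports how many there were plus the largest total time Tt, the last timer field. The regex, the function name summarize_server_timeouts and the sample log lines are hypothetical illustrations, not part of haproxy.

```python
import re
from collections import defaultdict
from typing import Iterable

# Hypothetical parser for an assumed haproxy 1.4-style "option httplog" line, e.g.:
# ... [06/Feb/2009:12:14:14.655] http-in static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 "GET /index.html HTTP/1.1"
HTTPLOG_RE = re.compile(
    r'\[(?P<accept_date>\d{2}/\w{3}/\d{4}:[\d:.]+)\] '
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/\+?(?P<tt>\d+) '
    r'(?P<status>-?\d+) \+?\d+ \S+ \S+ (?P<term>\S{4}) '
    r'.* "(?P<request>[^"]*)"?$'
)


def summarize_server_timeouts(lines: Iterable[str]) -> dict[tuple[str, str], tuple[int, int]]:
    """Map (server, path) -> (number of 504/sH entries, largest Tt in ms)."""
    stats: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        m = HTTPLOG_RE.search(line.rstrip())
        # Keep only 504s where the server timed out before sending headers ("sH").
        if not m or m["status"] != "504" or not m["term"].startswith("sH"):
            continue
        parts = m["request"].split()
        # "GET /path?query HTTP/1.1" -> "/path"; fall back to the raw request text.
        path = parts[1].split("?", 1)[0] if len(parts) > 1 else m["request"]
        entry = stats[(m["server"], path)]
        entry[0] += 1
        entry[1] = max(entry[1], int(m["tt"]))
    return {key: (count, max_tt) for key, (count, max_tt) in stats.items()}


# Made-up sample lines (hypothetical hosts, frontend "www", backend "rails").
sample = [
    'Jul 30 09:12:01 lb haproxy[1234]: 10.0.0.5:40211 [30/Jul/2010:09:12:01.120] www rails/app1 0/0/0/-1/50001 504 194 - - sH-- 25/25/12/6/0 0/0 "GET /reports/export?id=7 HTTP/1.1"',
    'Jul 30 09:12:03 lb haproxy[1234]: 10.0.0.6:40212 [30/Jul/2010:09:12:03.400] www rails/app2 0/0/1/35/36 200 5120 - - ---- 24/24/11/5/0 0/0 "GET / HTTP/1.1"',
    'Jul 30 11:40:17 lb haproxy[1234]: 10.0.0.7:40300 [30/Jul/2010:11:40:17.002] www rails/app1 0/0/0/-1/50002 504 194 - - sH-- 30/30/15/8/0 0/0 "GET /reports/export?id=9 HTTP/1.1"',
]
print(summarize_server_timeouts(sample))  # {('app1', '/reports/export'): (2, 50002)}
```

If nearly all entries land on one path, as Willy suspects, that is the request to chase on the Rails and MySQL side; a Tt roughly equal to the configured server timeout is consistent with a backend that never answered at all, which would also explain why the Rails log summary shows nothing slow.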
Received on 2010/07/30 20:57