Make sure you set KeepAlive to Off in Apache. That keeps a client from
queuing more than one request at a time without opening multiple
connections. You can also have haproxy do this for you with option
httpclose, even if keep-alive is enabled in Apache.
Once every request needs its own connection, you could then use --hitcount
(from the recent match) in iptables rules to limit the number of new
connections per second per IP address.
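
Something along these lines is what I have in mind. It is only a rough
sketch: the port, the window, the hit count and the list name are all
placeholders I made up, so tune them for your site. It prints the iptables
commands by default; run it as root with IPT=iptables to actually apply
them. It only makes sense with KeepAlive Off (or option httpclose in
haproxy), since otherwise one connection can carry many requests.

```shell
#!/bin/sh
# Sketch only: per-IP limit on NEW connections using the iptables
# "recent" match. Every value below is an assumption, not a recommendation.
# Dry run by default: the commands are printed, not executed.
IPT="${IPT:-echo iptables}"
PORT=80          # assumed: port that Apache or haproxy listens on
WINDOW=1         # assumed: length of the counting window, in seconds
HITS=10          # assumed: new connections tolerated per IP per window
LIST=httpbots    # hypothetical name for the tracking list

limit_rules() {
    # Drop a new connection when this source IP already has $HITS
    # entries in the list within the last $WINDOW seconds.
    $IPT -A INPUT -p tcp --dport "$PORT" -m state --state NEW \
        -m recent --name "$LIST" --update --seconds "$WINDOW" \
        --hitcount "$HITS" -j DROP
    # Otherwise record the source IP and let the packet continue
    # down the chain.
    $IPT -A INPUT -p tcp --dport "$PORT" -m state --state NEW \
        -m recent --name "$LIST" --set
}

limit_rules
```

Because the rules are appended with -A, the DROP check is evaluated before
the rule that records the address. You would probably want an ACCEPT rule
for your local IPs ahead of both.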
> -----Original Message-----
> From: Wout Mertens [mailto:wout.mertens#gmail.com]
> Sent: Monday, November 16, 2009 9:19 AM
> To: John Lauro
> Cc: haproxy#formilux.org
> Subject: Re: Preventing bots from starving other users?
>
> On Nov 16, 2009, at 2:43 PM, John Lauro wrote:
>
> > Oops, my bad... It's actually tc and not iptables. Google tc qdisc
> > for some info.
> >
> > You could allow your local IPs to go unrestricted, and throttle all
> > other IPs to 512kb/sec, for example.
>
> Hmmm... The problem isn't the data rate, it's the work associated with
> incoming requests. As soon as a 500 byte request hits, the web server
> has to do a lot of work.
>
> > What software is this running on? I assume it's not running under
> > Apache or there would be some ways to tune Apache. As others have
> > mentioned, telling the crawlers to behave themselves or totally
> > ignore the wiki with a robots file is probably best.
>
> Well the web server is Apache, but surprisingly Apache doesn't allow
> for tuning this particular case. Suppose normal request traffic looks
> like (A are users)
>
> Time ->
>
> A A AA A A AAA A AA A
>
> With the bot this becomes
>
> ABBBBBBBBBB A BBBBA BBA BBBBBA AABBBBBB
>
> So you can see that normal users are just swamped out of "slots". The
> webserver can render about 9 pages at the same time without impact, but
> it takes a second or more to render. At first I set MaxClients to 9,
> which makes it so the web server doesn't swap to death, but if the bots
> have 8 requests queued up, and then another 8, and another 8, regular
> users have no chance of decent interactivity...
>
> This may be a corner case due to slow serving, because I'm having a
> hard time finding a way to throttle the bots. I suppose that normally
> you'd just add servers...
>
> Wout.
Received on 2009/11/16 15:38