On Fri, Nov 20, 2009 at 01:55:37PM -0800, Jose Avila(Tachu) wrote:
> Thanks Willy, this usually happens when I send restart signals to a process while the older process has not finished yet.
>
> e.g.
>
> Process 1000 is currently active, so I restart haproxy with -sf 1000; the new process is 1001.
> Killing process 1000 takes, say, 30 seconds, and within those same 30 seconds my autoscaler adds or removes another server and restarts haproxy with -sf 1001.
OK, so in fact you have multiple processes running at the same time when this happens.
> I'm not sure which of the two gets into D state, but it only happens when I'm scaling more than one host at a time. The server is currently processing about 200k requests per minute, so it takes a while to restart an instance.
It should not take long anyway, because the new process can bind to the port as soon as the old one has released it, which is almost instant. The fact that there are still established connections is irrelevant here. What can take long is the time the old process remains alive: it stays around until its last session completes. But it will not bother the new process.
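To see how many haproxy processes overlap after a series of reloads, and which state each one is in, a small script can scan /proc. This is a minimal sketch assuming a Linux /proc filesystem; the helper names `parse_stat` and `list_processes` are hypothetical, not part of haproxy. It reports the single-letter state field from /proc/<pid>/stat, where "D" means uninterruptible sleep.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    left, _, right = stat_line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def list_processes(name="haproxy", proc="/proc"):
    """Return sorted (pid, state) pairs for every process whose comm is `name`."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if comm == name:
            found.append((pid, state))
    return sorted(found)


if __name__ == "__main__":
    for pid, state in list_processes():
        flag = "  <-- uninterruptible sleep" if state == "D" else ""
        print(f"{pid} {state}{flag}")
```

Several old PIDs in "S" state alongside the newest one are expected while their last sessions drain; a process stuck in "D" is the one worth investigating.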
Oh, something just occurred to me. Check your free RAM when the problem happens. It's quite possible that having multiple concurrent processes makes your system swap, which would exactly explain a D state. This is why I build with dlmalloc: it uses mmap() and is therefore able to release unused memory back to the system.
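A quick way to capture free RAM and swap usage at the moment the problem shows up is to read /proc/meminfo. This is a minimal sketch assuming Linux; `parse_meminfo` and `memory_report` are hypothetical helper names, and the report only uses the standard `MemFree`, `SwapTotal` and `SwapFree` fields (values in kB).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: first numeric value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def memory_report(info):
    """Summarize free RAM and swap usage (all values in kB)."""
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "mem_free_kb": info.get("MemFree", 0),
        "swap_used_kb": swap_used,
        "swap_in_use": swap_used > 0,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(memory_report(parse_meminfo(f.read())))
```

Swap being in use is only a hint; the si/so columns of `vmstat 1` show whether pages are actually moving in and out of swap while the extra processes are alive.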
> I've changed my script to add or remove only one server at a time; let's see if that helps.
>
> On another note, I've been looking for a concise guide on which kernel parameters I can tweak to improve performance. I have to say I'm already impressed by how well it handles traffic, but I would like to try to squeeze a bit more out of it at peaks. Each of my load balancers balances about 100 backend servers and processes an average of 3k-5k requests per second.
Check the list archives; there have already been some posts on the subject. The principle is always the same: don't use conntrack on the system, increase somaxconn and tcp_max_syn_backlog, enlarge the source port range, set tcp_tw_reuse to 1, and reduce the default tcp_rmem/tcp_wmem values. Once that's done, you can observe the system and fine-tune further for your specific usage.
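Before and after changing anything, it helps to record the current values of the knobs listed above. The sketch below only reads them from /proc/sys (it never writes) and assumes Linux; the helper names are hypothetical. The conntrack check is an assumption: it looks for `nf_conntrack_max` (newer kernels) or `ip_conntrack_max` (older kernels), and the exact path can differ between kernel versions.

```python
import os

# Dotted sysctl names for the knobs mentioned above; values are only read.
SYSCTLS = [
    "net.core.somaxconn",
    "net.ipv4.tcp_max_syn_backlog",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_tw_reuse",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
]

# Assumed locations that exist only while connection tracking is loaded.
CONNTRACK_PATHS = [
    ("net", "netfilter", "nf_conntrack_max"),
    ("net", "ipv4", "netfilter", "ip_conntrack_max"),
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return os.path.join(root, *name.split("."))


def read_sysctls(names=SYSCTLS, root="/proc/sys"):
    """Return {name: value}, with None for knobs this kernel lacks."""
    values = {}
    for name in names:
        try:
            with open(sysctl_path(name, root)) as f:
                # Collapse tabs/newlines, e.g. "4096\t87380\t6291456\n".
                values[name] = " ".join(f.read().split())
        except OSError:
            values[name] = None
    return values


def conntrack_loaded(root="/proc/sys"):
    """True if any of the assumed conntrack sysctl files is present."""
    return any(os.path.exists(os.path.join(root, *p)) for p in CONNTRACK_PATHS)


if __name__ == "__main__":
    for name, value in read_sysctls().items():
        print(f"{name} = {value}")
    print("conntrack loaded:", conntrack_loaded())
```

Changes themselves are normally applied with `sysctl -w` or persisted in /etc/sysctl.conf; keeping the inspection read-only makes it safe to run on a loaded production box.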
Regards,
Willy
Received on 2009/11/21 22:58