Re: Maintenance mode

From: Alexander Staubo <alex#bengler.no>
Date: Fri, 12 Sep 2008 14:34:50 +0200


[Reposting this as the ML manager rejected it.]

On Thu, Sep 11, 2008 at 6:35 AM, Willy Tarreau <w#1wt.eu> wrote:
> In fact, what would be possible right now would be to start your backup
> server only when you are putting your servers in maintenance mode. That
> way, it will return the maintenance page only if you're working on them.
> But I agree that it's not very satisfactory either.

Yeah, that's not great. I intend to route requests to a static file served by the same Nginx server we use for all other static assets. (Static assets obviously remain available during the maintenance period.) I don't want to maintain a separate Nginx config + init.d script just for that page.

> Using SHM would indeed be possible. There is just one thing I don't like
> with IPCs in general, it is that they are not cleaned up when a process
> dies. It's very common to find lots of remaining IPCs on a system where
> apps use them. So we have to find a way to ensure we either :
> - always clean them up
> - always use the same key (eg: put it in the conf)

I didn't know that. Does the same apply to mmap()? That's an alternative method of sharing memory which I am pretty sure is cleaned up automatically, since it's dependent on a file handle and not an arbitrary segment key.
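
To make the mmap() idea concrete, here is a minimal sketch, assuming a file-backed MAP_SHARED mapping that holds a single int. The helper name map_flag and the flag file path are hypothetical, not anything HAProxy provides. The mapping itself goes away when the process exits, though the backing file stays on disk until someone unlinks it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map one int backed by the file at `path`, shared
 * between every process that maps the same file. Returns NULL on failure. */
static volatile int *map_flag(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;

    /* Make sure the file is exactly large enough to back one int. */
    if (ftruncate(fd, sizeof(int)) < 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */

    return p == MAP_FAILED ? NULL : (volatile int *)p;
}
```

An external script (or a tiny tool using the same helper) would write 1 or 0 through the returned pointer, and the proxy would read it through its own mapping of the same file.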

>> Having such variables at hand would also let you do other tricks not
>> specifically related to maintenance. For example, you can have
>> external monitoring scripts that modify the behaviour of HAProxy based
>> on some sort of load parameter.
>
> There is still something to manage. If we use SHMs, we need to lock access
> to variables. Semaphores are out of the question, as they're extremely expensive
> and dangerous. Spinlocks are possible and ideal but require that we link with
> libpthread, which is a new added dependency. We could also just count on atomic
> writes on integers on most architectures, and just read the variable twice to
> ensure that it is stable. This would be the cheapest in fact.

Yep, just assume atomic integers.
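
The "read the variable twice" approach quoted above could look roughly like this. It is only a sketch under the stated assumption that aligned int loads and stores are atomic on the target architecture; the function name read_stable and the retry limit are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Read a shared int twice and accept the value only when both reads agree.
 * Gives up after max_tries attempts. Returns 1 and stores the value in
 * *out on success, 0 otherwise. Assumes aligned int accesses are atomic. */
static int read_stable(const volatile int *var, int max_tries, int *out)
{
    assert(var != NULL && out != NULL);

    for (int i = 0; i < max_tries; i++) {
        int first = *var;   /* volatile forces two separate loads */
        int second = *var;
        if (first == second) {
            *out = first;
            return 1;
        }
    }
    return 0;
}
```

No locking and no libpthread dependency are involved; the cost is one extra load per check.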

> In the mean time, I could propose you an alternative : make use of environment
> variables in the configuration, and just reload your config.
>
> Your config would look like this :
>
> acl maintenance_mode env_int(HAPROXY_MAINT) gt 0
> use_backend maintenance if maintenance_mode
>
> When you want to put it in maintenance mode, simply restart it that way :
> $ HAPROXY_MAINT=1 haproxy -f /etc/haproxy.cfg -sf $(pidof haproxy)
>
> then perform your changes and when finished :
> $ HAPROXY_MAINT=0 haproxy -f /etc/haproxy.cfg -sf $(pidof haproxy)
>
> We could even add the ability to set internal variables on the command line :
> $ haproxy -f /etc/haproxy.cfg -s HAPROXY_MAINT=1 -sf $(pidof haproxy)
>
> The advantage is that we will need it one day or another, and it is very
> easy to implement (the acl function will be getenv() followed by atol()).
> Later, a CLI will make it possible to set/unset variables.

That's definitely a workable approach, but one that requires shutting down the old HAProxy process and starting a new one, thus throwing away existing proxy state, including stats. I don't like restarting processes unnecessarily, especially not just to change a little flag.
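
For reference, the getenv() followed by atol() lookup from the quoted proposal would presumably boil down to something like the following. This is a sketch only: env_int here is a hypothetical C helper named after the proposed ACL function, not existing HAProxy code, and treating an unset variable as 0 is my assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: integer value of an environment variable.
 * An unset variable is treated as 0 (an assumption, not HAProxy behaviour). */
static long env_int(const char *name)
{
    assert(name != NULL);

    const char *val = getenv(name);
    return val ? atol(val) : 0;
}

/* Mirrors "acl maintenance_mode env_int(HAPROXY_MAINT) gt 0". */
static int maintenance_mode(void)
{
    return env_int("HAPROXY_MAINT") > 0;
}
```

Since the environment is inherited when the process starts, the value can only change with a new process, which is exactly the restart I would like to avoid.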

I'm also suspicious of the current process handling and how it would play into this scenario. In my experience, the previous HAProxy tends to hang around for a while before it shuts down. Our init.d script comes from Ubuntu or Debian, and simply does a "-sf $(cat $PIDFILE)". That looks like a potential race condition if someone switches the flag so quickly that there are *two* HAProxy processes still running at the time of the next restart.

The last time I tried that, I ended up with three HAProxy processes, two of them running with the same "-sf <old pid>" argument, implying that the new PID file had not been written yet at the time of restart, and further implying that the second process had sent the shutdown signal to the wrong process. The fact that there's a single PID file being used, but there can be multiple HAProxy processes, sounds dangerous to me -- or at best something that will confuse things. Indeed, we have had cases where multiple processes have been found running, apparently indefinitely.

Of course, the environment variable support would be better than nothing, so if you have a patch up your sleeve I would love to try it out. :-)

Alexander.

Received on 2008/09/12 14:34
