
From: dormando <dormando#rydia.net>
Date: Sun, 12 Sep 2010 14:53:52 -0700 (PDT)

> The problem is not the client but the server. When you're resending a
> request to it, you have to know whether it may have started processing
> your past request or not.

Yeah, my point is that semantically it doesn't seem to make a difference whether or not the LB is there if you close the connection mid-error. See below.

> Unfortunately, when doing multiplexing, it's possible that it was the first
> request for this client, and then it will not retry, but immediately return
> an error. However, I think this is a reasonable compromise. Connection drops
> are not *that* frequent and leaving it to the client to decide whether to
> repost or not is the only way to keep safe.

Yeah. It almost always indicates a real problem on the backend: either something specific to the page the client is trying to render, or some random bug. I view it as only slightly incorrect, since I can construct cases where it would fail the same way with a remote client issuing multiple requests over a single connection (the first returns an image, the second segfaults the server; just because it's keep-alive doesn't necessarily mean the two requests were related to the same click).

> Not exactly due to the specifics of multiplexing explained above.

I'm having a hard time figuring out how the client can tell the difference in a case of reusing the same keepalive connection over several clicks. If click 1 returns a page, opens a backend keepalive conn, then click 2 segfaults the server, will it retry?

Hmm, maybe never mind. Think I see what you mean here.

> But as far as I understand it, you only have to process idempotent requests
> with mogilefs. I'm not saying those are easy, but if you keep a copy of the
> request, you can safely retry them, which makes it easier to hide the issue
> to the end user ;-)

Nope, not at all. This is a feature particular to *perlbal*, which doesn't mean the request comes through mogilefs. It's just also a part of mogilefs :)

Many very very large websites load balance their dynamic rendering farm with sets of perlbal servers. Which means POSTs, dynamic GETs, etc. There're other situations where (some?) browsers will resubmit things on their own (timeouts), and situations where users will go bananas on the submit button, so having the "resubmit on error" case in there didn't change how things were developed at all.

In fact, I've honestly never noticed it. When we were deploying varnish recently I had cooked in some retry logic which was causing us pain for a while. Turns out we were accidentally retrying POSTs, which normally would've been okay, but varnish has some extra POST-dupe-defying logic where it'll empty out the POST body on a redispatched request :)
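
For illustration only, here is a rough Python sketch of the kind of guard that would have avoided the accidental POST retries. It is not perlbal's, varnish's, or haproxy's actual logic; the names `may_redispatch`, `IDEMPOTENT_METHODS`, and `max_attempts` are made up for this example, and it assumes the HTTP method alone is a good enough signal (which, per the above, is not always true for dynamic GETs).

```python
# Methods that RFC 7231 defines as idempotent. POST is deliberately absent.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"})


def may_redispatch(method, attempts_so_far, max_attempts=2):
    """Return True if a failed request may be sent to a backend again.

    Hypothetical helper: `attempts_so_far` counts dispatches already made,
    and `max_attempts` is an assumed cap on total dispatches.
    """
    if attempts_so_far >= max_attempts:
        return False  # give up and let the client decide whether to repost
    return method.upper() in IDEMPOTENT_METHODS


# Usage: a GET that failed once may be retried, a POST may not.
print(may_redispatch("GET", 1))
print(may_redispatch("POST", 1))
```

With a check like this in front of the redispatch path, a failed POST goes straight back to the client as an error instead of being replayed.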

We just fixed the reason why they were failing in the first place and disabled the redispatch code.

Received on 2010/09/12 23:53

Re: HAPROXY & pool of keep-alive connections to backend