Hi Adam,
On Thu, Jan 10, 2008 at 03:29:46PM -0600, Adam Fritzler wrote:
> Hi guys,
>
> I've been running haproxy in production for several months, and it's been great. However, recently I discovered a problem with responses of a certain size being truncated.
>
> curl -sfv 'http://xxx/x'
> * About to connect() to xxx port nnnn
> * Trying n.n.n.n... connected
> * Connected to xxx port nnnn
> > GET /x HTTP/1.1
> > User-Agent: curl/7.15.5 (x86_64-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5
> > Host: xxx:nnnn
> > Accept: */*
> >
> < HTTP/1.0 200 OK
> < Content-Length: 6104
> < Content-Type: text/plain
> < Connection: close
> transfer closed with 1876 bytes remaining to read
> * Closing connection #0
>
> Note the warning from curl about "transfer closed with 1876 bytes remaining to read". Indeed, the output file ends up being only 4228 bytes. This happens reliably for responses 6-8 kB in size, but not for small responses or for significantly larger ones. (I assume this is some sort of timing issue and that this just happens to be where the boundary falls, not anything related to that number of bytes specifically.)
>
> tcpdump shows that the backend is sending the complete response, but haproxy is sending RST to the backend and FIN to the client. The haproxy log shows a successful "- - ----".
Oh, that's really bad. It's a serious problem. Could you post your config file? I really want to reproduce this. Change your IP addresses and remove anything sensitive, but please do not touch the timeouts or the options.
> Has anyone seen this before?
No, this is completely new to me. Also, a lot of people are running several flavours of 1.3 with epoll in production. Could you indicate what kernel version you're using, what gcc version, what type of machine, etc.? In fact, anything which might explain either timing or code differences. If it's a timing bug, it's quite possible that you have all the conditions required to trigger it while others never see it.
Oh, BTW, could you try with the binary from haproxy's site? At least it will tell us once and for all whether it's a build-time problem or a run-time problem.
> I can seemingly reliably 'fix' it by recompiling without epoll support and using traditional poll. I'm running 1.3.13.1 in production, but I just tried with 1.3.14.1, and the behavior is the same -- works with poll, fails with epoll (sepoll). However, with 1.3.14.1 I did not use USE_MY_EPOLL, as that did not compile on my test box.
In fact, you do not need to remove epoll support at build time. You can start haproxy with "-ds" to disable sepoll and "-de" to disable epoll, or alternatively add "nosepoll" and "noepoll" to the global section of your configuration.
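For instance, a minimal sketch of the global section with both pollers disabled (only these two keywords matter here; keep the rest of your global section as it is):

```
global
    # disable both epoll variants so that haproxy falls back to poll()
    noepoll     # same effect as "-de" on the command line
    nosepoll    # same effect as "-ds" on the command line
```

With only "nosepoll" set, haproxy should fall back to the plain epoll poller, which would help tell whether the problem is specific to sepoll.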
Thanks,
Willy
Received on 2008/01/11 06:38