Hi there,
I put the LB back live and it's been going strong for about 5 hours. I
checked the established connections with netstat -antpoe, grepping for the
client IP addresses to see whether connections were lingering, and I'm happy
to say that everything seems to be behaving normally.
We haven't had much traffic though, so the real test will be the weekend.
Thanks a lot, guys.
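A rough way to spot lingering connections in that netstat -antpoe output is to count sockets per TCP state and see which peers hold sockets in states that should be short-lived. The Python sketch below is a hypothetical helper (count_tcp_states and lingering_by_peer are not part of any tool). It assumes the Linux net-tools column layout, where State is the sixth column, and it treats a growing count of the listed states as a hint rather than proof of a problem.

```python
from collections import Counter

# States that usually point at connections not being closed cleanly when
# their counts keep growing (a heuristic, not a hard rule).
LINGERING = {"CLOSE_WAIT", "FIN_WAIT1", "FIN_WAIT2", "LAST_ACK", "SYN_RECV"}


def _tcp_rows(netstat_output: str):
    """Yield the split fields of each tcp/tcp6 data row, skipping headers."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto Recv-Q Send-Q Local Foreign State ...
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            yield fields


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP sockets per state in `netstat -ant[poe]` text."""
    return Counter(fields[5] for fields in _tcp_rows(netstat_output))


def lingering_by_peer(netstat_output: str) -> Counter:
    """Count sockets in a LINGERING state per remote IP address."""
    peers = Counter()
    for fields in _tcp_rows(netstat_output):
        if fields[5] in LINGERING:
            # Foreign address is "ip:port"; rsplit also copes with IPv6 forms.
            peers[fields[4].rsplit(":", 1)[0]] += 1
    return peers
```

Usage: save the output with `netstat -antpoe > netstat.txt` and pass the file's text to both functions; comparing snapshots taken a few minutes apart shows whether any state is piling up.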
On 7 February 2010 11:37, Hank A. Paulson <hap#spamproof.nospammail.net> wrote:
> I don't know if those will solve the problem (I doubt they will), but if
> you put the machine back into the traffic stream - try to get a few outputs
> if things are going badly:
>
> * stats output from haproxy (socket or web page, pref socket)
> * netstat -antpoe output
> * netstat -s output
> * free -m output
> * haproxy http logs
> * iptables config output, if any
> * be sure to have a tail -f /var/log/messages running before you start the
> test to watch for conntrack and other messages
>
> That will provide clues to what may be the problem(s).
> Others will probably have ideas of other things to look for/capture while
> trying the configuration.
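The haproxy stats listed above can be exported as CSV: the `show stat` command on a stats socket returns it, and the stats page also offers a CSV export (with the `stats uri /` used later in this thread, probably http://10.0.1.50:8080/;csv, which is an assumption). The config quoted in this thread does not define a `stats socket`, so one would have to be added to use the socket. A minimal Python sketch, assuming the CSV text has already been captured, that lists every server whose status is not a plain UP; `servers_not_up` is a hypothetical helper name.

```python
import csv
import io


def servers_not_up(stat_csv: str) -> list[tuple[str, str, str]]:
    """Return (proxy, server, status) for every server not reported as "UP".

    Expects haproxy's CSV stats format: a header line starting with
    "# pxname,svname,..." that includes a "status" column.
    FRONTEND/BACKEND summary rows are skipped.
    """
    text = stat_csv.lstrip()
    if text.startswith("#"):
        text = text[1:].lstrip()  # "# pxname,..." becomes a normal CSV header
    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        status = row.get("status") or ""
        # Anything but a plain "UP" is reported, including transitional
        # states such as "UP 1/3" or "DOWN 1/2".
        if status != "UP":
            problems.append((row["pxname"], row["svname"], status))
    return problems
```

Run periodically while the LB is in the traffic stream, this gives a timestamped record of which web server flapped, which is easier to line up with the logs than watching the stats page.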
>
>
> On 2/7/10 2:20 AM, Peter Griffin wrote:
>
>> Hi there,
>> OK, I disabled selinux and increased check inter to 30s. I enabled an
>> http check of an .ashx file because ASP is critical to the operation of
>> the site. It was already there, but I had disabled it earlier because of
>> the problems we were having:
>> option httpchk HEAD /testip.ashx HTTP/1.1\r\nHost:\ www.oursite.com
>>
>>
>> With regards to free, I'm ashamed to say that yes I did go after the
>> first line.
>>
>
> It happens to people who claim to be very linux savvy, so don't worry about
> it.
>
>
>> I also did a yum upgrade, but will postpone 1.4rc1 until I see how this
>> change responds. Will put the LB back online when the traffic is not
>> that heavy, as I cannot risk another outage and hence my job :)
>>
>> Will post a reply tomorrow afternoon.
>>
>> Thank you so much you've been great.
>>
>>
>>
>>
>>
>> On 7 February 2010 02:06, Hank A. Paulson <hap#spamproof.nospammail.net> wrote:
>>
>> You have selinux on, so it may be unhappy with some part of haproxy
>> - the directory it uses, the socket listeners, etc. Turn it off (if
>> you can) until you get everything working ok. Turning it off
>> requires a reboot.
>>
>> To see if it is on:
>> # sestatus
>> google for how to turn it off
>>
>> I would back off the check inter to 30s or so and make it an http
>> check of a file that you know exists, if you can have any static
>> files on your servers. This will allow you to see that haproxy is
>> able to find that file, get a 200 response and verify that the
>> server is up.
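To see by hand roughly what such a check sees, a HEAD request can be sent to one web server directly. A minimal Python sketch using the standard http.client module; `http_check` is a hypothetical helper. Its pass rule (any 2xx or 3xx status) follows how haproxy's `option httpchk` judges a response, while a 200 is the clean answer to look for. It does not reproduce haproxy's check interval or rise/fall counters.

```python
import http.client


def http_check(host: str, port: int = 80, path: str = "/favicon.ico",
               timeout: float = 5.0) -> bool:
    """Send one HEAD request and report whether the server looks healthy.

    Any 2xx or 3xx status counts as a pass, similar to `option httpchk`;
    refused, reset or timed-out connections count as a failure.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return False  # no usable HTTP answer: treat the server as down
    finally:
        conn.close()
    return 200 <= status < 400
```

For example, `http_check("10.0.1.108")` and `http_check("10.0.1.109")` probe the two backends from the config in this thread; running it from the LB itself exercises the same network path haproxy uses.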
>>
>> Also, when you say "free mem going down to 45Mb" are you looking at
>> the first line of "free" or the second line? Ignore the first line,
>> it is designed to cause panic. eg:
>>
>> $ free -m
>>              total       used       free     shared    buffers     cached
>> Mem:         32244      32069        174          0          0      19578
>> -/+ buffers/cache:      12490      19753
>> Swap:         4095          0       4095
>>
>> OMG, I only have 174MB of my 32GB of memory available!?!
>> - no, really 19.75 GB is still available.
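The "-/+ buffers/cache" free figure is simply free + buffers + cached: 174 + 0 + 19578 = 19752 MB. free prints 19753, presumably because it adds the exact kB values before converting to MB. A small Python sketch of the same arithmetic from /proc/meminfo; `meminfo_kb` and `reclaimable_free_mb` are hypothetical helper names. Kernels from 3.14 onwards also expose a MemAvailable field, which is a better estimate than this sum but did not exist when this thread was written.

```python
def meminfo_kb(meminfo_text: str) -> dict[str, int]:
    """Parse /proc/meminfo lines like "MemFree:  178176 kB" into {name: value}."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[name.strip()] = int(parts[0])
    return values


def reclaimable_free_mb(meminfo_text: str) -> int:
    """Free memory counting buffers and page cache as available, like the
    "-/+ buffers/cache" line of old `free -m` (kB are summed, then converted)."""
    m = meminfo_kb(meminfo_text)
    return (m["MemFree"] + m["Buffers"] + m["Cached"]) // 1024
```

On the LB itself: `reclaimable_free_mb(open("/proc/meminfo").read())`.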
>>
>> On your haproxy config, if you log errors separately then you can
>> tail -f that error-only log and watch it as you start up haproxy.
>> And why not do http logging if you are doing http mode? Maybe I am
>> missing something.
>>
>> Making it an http check of a file you know exists verifies that the
>> server is really up and responding fully, not just opening a socket.
>> If you can switch to 1.4rc1, you get a lot more info about the health
>> check/health status on the stats page, and you can set
>> option log-health-checks as an additional aid to troubleshooting.
>>
>>
>> global
>>     log 127.0.0.1 local0
>>     log 127.0.0.1 local1 notice
>>     #log loghost local0 info
>>     maxconn 4096
>>     chroot /var/lib/haproxy
>>     user haproxy
>>     group haproxy
>>     daemon
>>     # debug
>>     #quiet
>>
>> defaults
>>     log global
>>     mode http
>>     # option httplog
>>     # "option" keywords are not accepted in "global";
>>     # log-separate-errors belongs in defaults/frontend/listen
>>     option log-separate-errors
>>     option dontlognull
>>     retries 3
>>     option redispatch
>>     maxconn 4096
>>     contimeout 5s
>>     clitimeout 30s
>>     srvtimeout 30s
>>
>> listen loadbalancer :80
>>     mode http
>>     balance roundrobin
>>     option forwardfor except 10.0.1.50
>>     option httpclose
>>     option httplog
>>     option httpchk HEAD /favicon.ico
>>     cookie SERVERID insert indirect nocache
>>     server WEB01 10.0.1.108:80 cookie A check inter 30s
>>     server WEB05 10.0.1.109:80 cookie B check inter 30s
>>
>> listen statistics 10.0.1.50:8080
>>     stats enable
>>     stats auth stats:stats
>>     stats uri /
>>
>> [BTW, did you do a yum upgrade (not yum update) after your install of
>> F12? "yum update" misses certain kinds of packaging changes, while "yum
>> upgrade" covers all updates, even if the name of a package changes; yum
>> upgrade should be the default used in yum examples. I ask because many
>> people don't do this, and many security fixes and other package bug
>> fixes have been posted.]
>>
>>
>> On 2/6/10 6:59 AM, Peter Griffin wrote:
>>
>> Hi Will,
>> Yes, X-Windows is installed, but the default init is runlevel 3 and I
>> have not started X for the past couple of days. The video card is an
>> add-on card, so I rule out shared memory.
>>
>> With regard to eth1, I ran iptraf and can see that there is no traffic
>> on eth1, so I'd rule this out as well. I thought about listening for
>> stunnel requests on eth1 10.0.1.51 and connecting to haproxy on
>> 10.0.1.50, but maybe this will cause more problems...
>> I had already ftp'd a file of some 70MB to another machine on the same
>> VLAN and I did not see any problems whatsoever. What I'm planning to do
>> now is to set up the LB in another environment with another 2 web
>> servers and 1 DB server and stress the hell out of it. Then I can also
>> test the network traffic using iperf.
>> Will report back in a few days, thank you once more.
>>
>>
>>
>>
>> On 6 February 2010 14:29, Willy Tarreau <w#1wt.eu> wrote:
>>
>> On Sat, Feb 06, 2010 at 01:16:00PM +0100, Peter Griffin wrote:
>> > Both http & https. Also both web servers started to take it in turns to
>> > report as DOWN, but the second one more frequently than the first.
>> >
>> > I ran ethtool eth0 and can verify that it's full-duplex 1Gbps:
>>
>> OK.
>>
>> > I'm attaching dmesg, I don't understand most of it.
>>
>> Well, it shows some video driver issues, which are unrelated (did you
>> start a graphics environment on your LB?). It seems it's reserving some
>> memory (64 or 512MB, I don't understand well) for the video. I hope it's
>> not a card with shared memory, as the higher the resolution, the lower
>> the remaining memory bandwidth for normal work.
>>
>> But I don't see any iptables related issue there, so that's fine.
>>
>> Stupid question, are you sure that your traffic passes via eth0 (the gig
>> one)? I'm asking because eth1 is a cheap 100 Mbps realtek 8139, and if
>> you got the routing wrong, it could explain a lot of networking issues!
>>
>> > I'll try to send a file
>> > in both directions to saturate the link as you suggested.
>>
>> OK.
>>
>> When doing that, don't bench the disks, just the network. For that,
>> create "sparse files", which are empty files for which the kernel
>> produces zeroes on the fly, and send them to /dev/null. E.g. with ftp:
>>
>> machine1$ dd if=/dev/null bs=1M count=0 seek=1024 of=1g.bin
>>
>> machine2$ ftp machine1
>> ftp> recv 1g.bin /dev/null
>>
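The dd command above makes a 1 GiB sparse file by seeking past the end of an empty file without writing any data. For reference, the same can be done from Python by extending a new, empty file with truncate(), which leaves a hole on filesystems that support them. `make_sparse` and `allocated_bytes` are hypothetical helper names, and `allocated_bytes` relies on st_blocks, so it is Unix-only.

```python
import os


def make_sparse(path: str, size: int = 1024 * 1024 * 1024) -> None:
    """Create `path` as a sparse file of `size` bytes (1 GiB by default).

    Roughly equivalent to: dd if=/dev/null bs=1M count=0 seek=1024 of=path
    The file reads back as zeroes, but no data blocks are written.
    """
    with open(path, "wb") as f:
        f.truncate(size)


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated on disk (st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512
```

Comparing `os.path.getsize(path)` with `allocated_bytes(path)` confirms the file is sparse before using it as the ftp payload.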
>>
>> Regards,
>> Willy
>>
>>
>>
>>
>>
>
Received on 2010/02/08 19:12
This archive was generated by hypermail 2.2.0 : 2010/02/08 19:15 CET