Re: Low performance in Solaris 9

From: <manuelsspace-listas#yahoo.com>
Date: Wed, 7 May 2008 16:03:16 -0700 (PDT)


Hello + thanks,

I have changed some items:

  1. rebuilt haproxy; it is 1.2.17 now
  2. compiled in pcre too; it was omitted before
  3. reduced haproxy's config to:

global

        daemon
        maxconn 1000      # warning: this has to be 3 times the expected value!
        log localhost local0

defaults
        mode    http
        balance roundrobin
        option  dontlognull
        option  httpclose
        retries 1
        redispatch
        maxconn         2000
        contimeout      5000
        clitimeout      50000
        srvtimeout      50000
        stats enable


listen  backend1
        bind    :8001
        log     global
        option  httplog
        capture request header X-Forwarded-For len 15
        option  httpchk /check.html
        cookie  PROXY_SERVERID
        # always down
        server  apache1 10.27.40.81:1024 maxconn 100 check inter 2000 fall 3 cookie server1
        # small pc, sometimes down
        server  apache2 10.21.19.27:80   maxconn 2  check inter 2000 fall 3 cookie server2
        # always running
        server  apache3 10.27.40.81:80   maxconn 200 check inter 2000 fall 3 cookie server14
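
Since "stats enable" is set, the per-server sessions and queue can be
watched during a run via the stats page. This assumes the default
"stats uri" (/haproxy?stats), since none is configured above; any HTTP
client or browser will do:

  $ wget -qO- 'http://localhost:8001/haproxy?stats'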



  4. ran a 1K object test, 10 threads for 200 seconds
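
(Roughly how the run was launched; a sketch from the Pylot docs, where
the flags -a/-d/-x and the testcases.xml file name are assumptions
depending on the local Pylot setup:)

  $ python run.py -a 10 -d 200 -x testcases.xml   # 10 agents, 200s duration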

$ prstat -T -v 1 -u msoto 10

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
  4255 msoto    4.9  26   -   -   -   - 0.0   -  4K  2K 75K   0 haproxy/1
 ...

 $ vmstat 10
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free   re  mf pi po fr de sr dd s0 -- --   in    sy   cs us sy id
 0 0 0 1206800 184952  5  40  1  0  0  0  0  0  0  0  0  496   269  132  1  1 98
 0 0 0 1186000 161104  0   1  0  0  0  0  0  0  0  0  0 3803 11096 1800  7 42 51
 0 0 0 1186000 161104  0   0  0  0  0  0  0  0  0  0  0 3722 10801 1755  9 38 53
 0 0 0 1186000 161104  0   0  0  0  0  0  0  0  0  0  0 3782 10982 1794  7 38 56
 0 0 0 1186000 161104  0   0  0  0  0  0  0  0  0  0  0 3774 10926 1781  7 39 54
 0 0 0 1186000 161104  0   0  0  0  0  0  0  0  0  0  0 3802 11133 1817  7 41 52
 0 0 0 1186000 161104  0   0  0  0  0  0  0  0  0  0  0 3736 10912 1791  6 40 54
 ...

[################100%##################]  200s/200s

Requests: 32835
Errors: 117
Avg Response Time: 0.053
Avg Throughput: 163.79
Current Throughput: 169
Bytes Received: 336373758

$ netstat|grep 10.21.19.27|awk '{print $7}'|sort | uniq -c
  15 ESTABLISHED
3831 TIME_WAIT
# This is a high value; all the snapshots were near these numbers. This may be the issue.

HAProxy does not report any errors; the connections are being rejected at the OS layer.
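
(To keep sampling the state distribution during a run, a simple sh loop
works; the 5-second interval is arbitrary, and -an keeps the output
numeric so the IP address matches:)

  $ while :; do netstat -an | grep 10.21.19.27 | awk '{print $7}' | sort | uniq -c; sleep 5; done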

At the backend, this is what I saw.

$ netstat|awk '/sg5ts14:http/{print $6}'|sort | uniq -c

      1 FIN_WAIT1
      8 FIN_WAIT2
      1 SYN_RECV

   2941 TIME_WAIT
...
$ netstat|awk '/sg5ts14:http/{print $6}'|sort | uniq -c

      5 FIN_WAIT2
   4425 TIME_WAIT
...

When the test is executed directly from the client to the backend, no more than 97 TIME_WAIT sockets were observed with 10 simultaneous connections, and keepalive was observed in the server status.

Do I have to increase the rlim_fd_cur & rlim_fd_max variables to cover ESTABLISHED + TIME_WAIT sockets, or is there a parameter to reduce the TIME_WAIT timeout?
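
(For reference, these are the Solaris knobs I mean; the values below are
only illustrative, not recommendations:)

  # TIME_WAIT hold time in milliseconds; check the current value before changing it
  $ ndd -get /dev/tcp tcp_time_wait_interval
  $ ndd -set /dev/tcp tcp_time_wait_interval 60000

  # fd limits are set in /etc/system and take effect at the next boot
  set rlim_fd_cur=4096
  set rlim_fd_max=4096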

Thanks,
Manuel Soto

Hi,

On Mon 05.05.2008 08:56, manuelsspace-listas#yahoo.com wrote:

>Hello List,
>
>   The -Vd output is:
>
>
>root#sunexplor # ./haproxy -f digitel.cfg -Vd
>
>[WARNING] 125/093451 (2187) : parsing [digitel.cfg:34]: keyword 'redispatch'
>is deprecated, please use 'option redispatch' instead.
>Available polling systems :
>       poll : pref=200,  test result OK
>     select : pref=150,  test result OK
>Total: 2 (2 usable), will use poll.
>Using poll() as the polling mechanism.

Thanks

>The test was established as 2 scenarios:
>a) test using haproxy, 1 client and 1 backend.
>b) the same client and the same backend but without haproxy. In this
>case, 200 threads x 200 seconds, loading a 10k html file each time
>
>We have observed 100 connections at top load + 100 queued in the haproxy
>console for apache3. I don't know if this is related to the 200 for poll
>shown in the -Vd output

Nope, the 200 is only the event poller's preference value.

>Which client? Pylot 1.1, command line mode (http://www.pylot.org)

>Haproxy config:

[snipp]

>maxconn         2000
>contimeout      5000
>clitimeout      50000
>srvtimeout      50000
>stats enable

[snipp]

>Haproxy will be used internally to balance web services: 300 users
>expected, plus up to 100 threads running in a weblogic instance
>requesting services or smpp (up to 100 tps approx.)

Do you see any CLOSE_WAITs and/or TIME_WAITs during the test?

What does

prstat -T -v 1
vmstat 1

show at runtime?

Cheers