Hi Krzysztof,
I have several comments on your patch (which is globally correct).
On Sun, Oct 21, 2007 at 03:08:15AM +0200, Krzysztof Oledzki wrote:
> >Be careful, I don't want to add new features to 1.2.17, or minor
> >features only. It's basically in fixes-only mode right now. However,
> >adding 2 columns in the stats page in 1.3 is OK for me.
>
> Funny, funny. I have been working on similar functionality for some time
> now. I was planing to push this patch yesterday (read as: before the
> midnight) but my initial version only calculated time since last status
> change. It was nearly a trivial joke to add what you described. So here it
> is, with small delay required by additional changes. ;)
>
> This patch implements new statistics for SLA calculation by adding new
> filed 'Dwntime' with total down time since restart (both HTTP/CSV) and
> extending status filed (HTTP) or inserting a new one (CVS) with time
> showing how long each server/backend is in a current state. Additionaly,
> down transations are also calculated and displayed for backends, so it is
> possible to know how many times selected backend was down, generating "No
> server is available to handle this request." error.
>
> New information are presentetd in two different ways:
> - for HTTP: a "human redable form", one of "100000d 23h", "23h 59m" or
> "59m 59s"
Please compact the time as much as possible. Simply remove spaces. Also, you can compact the number of days. I even think that we can report the number of days in digital format. Eg: 12.5d. Also, keep in mind that with unix time, you'll never be able to report more than 24855 days.
> - for CSV: seconds
>
> I believe that seconds resolution is enough.
Yes, for that it's enough.
> As there are more columns in the status page I decided to shrink some
> names to make more space:
> - Check -> Chk
> - Down -> Dwn
> - Weight -> Wght
OK, however, I would like the order of the columns not to change, and their name to remain unchanged in the CSV output. The reason is that people write scripts to convert the output into whatever monitoring solution they use, so if we extend the format, we must not break their tools. That means that additional fields must append to the right (at least in the CSV output).
> Making described changes I also made some improvements and fixed some
> small bugs:
> - do not set server down if it is down already
> - do not set server up if it is up already
> - don't increment s->health above s->rise
You mean above s->rise+s->fall I presume ? Because from what I have in mind, the server is down below rise and up between rise and rise+fall.
> - fix colspan in multiple places (mostly broken by my previous patch)
OK, I had already fixed the number of columns as the first post-1.3.13 patch, but I did not notice that colspan was wrong.
> - add missing "status" header to CSV
oops, thanks!
> - fix order of retries/redispatches in server (CSV)
I thought I got it right. Strange...
> - s/Tthen/Then/
> - s/server/backend/ in DATA_ST_PX_BE
OK.
> To prevent future mistakes I also added new 'END' field with '.', so now
> it is easy to check if there are no fields missing after new ones were
> added.
Hmmm I don't like that very much, because we basically dump garbage on the output just in order to catch our programming mistakes more easily. I can understand why you find it useful, but I think we can control that in many other ways. I found that counting the commas was enough to confirm that the status field was missing. We could even write a simple awk/perl script to ensure that a CSV output is correct.
> Best regards,
>
> Krzysztof Ol?dzki
> diff -Nur haproxy-1.3.13-orig/include/common/time.h haproxy-1.3.13-sla/include/common/time.h
> --- haproxy-1.3.13-orig/include/common/time.h 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/include/common/time.h 2007-10-20 22:12:50.000000000 +0200
> @@ -27,6 +27,11 @@
> #include <common/config.h>
> #include <common/standard.h>
>
> +#define SEC hz
> +#define MINUTE (60 * SEC)
> +#define HOUR (60 * MINUTE)
> +#define DAY (24 * HOUR)
Please do not write macros which are specific to one function's context. In this case, it will only work in the function which has an "hz" variable. I'd prefer that you define SEC to 1 and use "10*SEC*hz" in your function when you need the conversion.
> @@ -480,6 +485,7 @@
> tv1; \
> })
>
> +char *human_time(unsigned, int);
Please name your arguments above. I have no way to know how I can use this function.
> diff -Nur haproxy-1.3.13-orig/include/proto/backend.h haproxy-1.3.13-sla/include/proto/backend.h
> --- haproxy-1.3.13-orig/include/proto/backend.h 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/include/proto/backend.h 2007-10-21 01:42:20.000000000 +0200
> @@ -39,7 +39,7 @@
>
> void recount_servers(struct proxy *px);
> void recalc_server_map(struct proxy *px);
> -
> +unsigned be_downtime(struct proxy *);
same here.
> diff -Nur haproxy-1.3.13-orig/include/proto/server.h haproxy-1.3.13-sla/include/proto/server.h
> --- haproxy-1.3.13-orig/include/proto/server.h 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/include/proto/server.h 2007-10-21 01:41:32.000000000 +0200
> @@ -31,7 +31,7 @@
>
> #include <proto/queue.h>
>
> -
> +unsigned srv_downtime(struct server *);
and here.
> diff -Nur haproxy-1.3.13-orig/include/types/proxy.h haproxy-1.3.13-sla/include/types/proxy.h
> --- haproxy-1.3.13-orig/include/types/proxy.h 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/include/types/proxy.h 2007-10-21 02:22:26.000000000 +0200
> @@ -111,6 +111,11 @@
> unsigned int maxconn; /* max # of active sessions on the frontend */
> unsigned int fullconn; /* #conns on backend above which servers are used at full load */
> struct in_addr except_net, except_mask; /* don't x-forward-for for this address. FIXME: should support IPv6 */
> +
> + unsigned down_trans; /* up-down transitions */
> + unsigned down_time; /* total time the proxy was down */
> + time_t last_change; /* last time, when the state was changed */
Use a long instead of a time_t for last_change, because it's awkward to call time() every time you need it. There is already a variable for this : "now". It is a timeval, so everywhere you need the current time, you can simply use "now.tv_sec". It is really cleaner IMHO because it is guaranteed not to change during a "cycle". Also, it avoids many useless syscalls.
> diff -Nur haproxy-1.3.13-orig/include/types/server.h haproxy-1.3.13-sla/include/types/server.h
> --- haproxy-1.3.13-orig/include/types/server.h 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/include/types/server.h 2007-10-20 20:06:15.000000000 +0200
> @@ -86,10 +86,14 @@
> unsigned wscore; /* weight score, used during srv map computation */
>
> unsigned failed_checks, down_trans; /* failed checks and up-down transitions */
> + unsigned down_time; /* total time the server was down */
> + time_t last_change; /* last time, when the state was changed */
same here.
> diff -Nur haproxy-1.3.13-orig/src/backend.c haproxy-1.3.13-sla/src/backend.c
> --- haproxy-1.3.13-orig/src/backend.c 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/src/backend.c 2007-10-21 02:12:57.000000000 +0200
> @@ -16,6 +16,7 @@
> #include <stdlib.h>
> #include <syslog.h>
> #include <string.h>
> +#include <time.h>
you don't need time.h anymore :-)
> @@ -717,6 +718,13 @@
> return 0;
> }
>
> +unsigned be_downtime(struct proxy *px) {
> +
> + if (px->srv_act || px->srv_bck)
> + return px->down_time;
> +
> + return (unsigned)(px->down_time + time(NULL)-px->last_change);
this translates into :
+ return (unsigned)(px->down_time + now.tv_sec - px->last_change);
> diff -Nur haproxy-1.3.13-orig/src/cfgparse.c haproxy-1.3.13-sla/src/cfgparse.c
> --- haproxy-1.3.13-orig/src/cfgparse.c 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/src/cfgparse.c 2007-10-21 02:45:33.000000000 +0200
> @@ -22,6 +22,7 @@
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> +#include <time.h>
>
> #include <common/cfgparse.h>
> #include <common/config.h>
> @@ -543,7 +544,8 @@
> tv_eternity(&curproxy->srvtimeout);
> tv_eternity(&curproxy->contimeout);
> tv_eternity(&curproxy->appsession_timeout);
> -
> +
> + time(&curproxy->last_change);
same here : curproxy->last_change = now.tv_sec.
> diff -Nur haproxy-1.3.13-orig/src/checks.c haproxy-1.3.13-sla/src/checks.c
> --- haproxy-1.3.13-orig/src/checks.c 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/src/checks.c 2007-10-21 02:47:23.000000000 +0200
> @@ -57,48 +57,54 @@
> struct session *sess;
> int xferred;
>
> + if (!(s->state & SRV_RUNNING))
> + return; // already down, do nothing
I would prefer not to rely on SRV_RUNNING inside checks.c. The reason is that it is checks.c which is responsible for setting this flag for others. In some sort, it is an output. All the checks algorithms relies on s->health, and I'd really like to keep that as the primary source of information.
(...)
> - if (s->health == s->rise) {
> - recount_servers(s->proxy);
> - s->proxy->map_state |= PR_MAP_RECALC;
> -
Also, BTW, the patch is wrong, because s->health can only equal s->rise when the server is *going* down. So we enter the if only during the transition. With the change, we enter it every time a server is UP (= mostly everytime). If you really want to remove this "if" block, you can simply revert the condition and return early :
+ if (s->health != s->rise) + return; // server is not going down
But I'm not sure it bring anything.
> @@ -455,12 +461,25 @@
> /* there was a test running */
> if (s->result > 0) { /* good server detected */
> //fprintf(stderr, "process_chk: 9\n");
> - s->health++; /* was bad, stays for a while */
> + if (s->health<s->rise+s->fall-1)
> + s->health++; /* was bad, stays for a while */
OK for this one.
> if (s->health >= s->rise) {
> - s->state |= SRV_RUNNING;
>
> - if (s->health == s->rise) {
> + if (!(s->state & SRV_RUNNING)) {
same as above about the transition vs state.
> diff -Nur haproxy-1.3.13-orig/src/dumpstats.c haproxy-1.3.13-sla/src/dumpstats.c
> --- haproxy-1.3.13-orig/src/dumpstats.c 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/src/dumpstats.c 2007-10-21 02:42:10.000000000 +0200
> @@ -47,6 +47,7 @@
> #include <proto/proto_uxst.h>
> #include <proto/senddata.h>
> #include <proto/session.h>
> +#include <proto/server.h>
>
> /* This function parses a "stats" statement in the "global" section. It returns
> * -1 if there is any error, otherwise zero. If it returns -1, it may write an
> @@ -186,9 +187,9 @@
> "dreq,dresp,"
> "ereq,econ,eresp,"
> "wretr,wredis,"
> - "weight,act,bck,"
> - "chkfail,chkdown"
> - "\n");
> + "lastchg,status,weight,act,bck,"
> + "chkfail,chkdown,downtime,"
> + "END\n");
OK for "status", but please move "lastchg" to the right (just before downtime, it would be perfect there).
(...)
I have not recounted all the commas, but I trust you on this.
> diff -Nur haproxy-1.3.13-orig/src/time.c haproxy-1.3.13-sla/src/time.c
> --- haproxy-1.3.13-orig/src/time.c 2007-10-18 22:38:22.000000000 +0200
> +++ haproxy-1.3.13-sla/src/time.c 2007-10-21 02:48:02.000000000 +0200
> @@ -142,6 +142,32 @@
> return __tv_isgt(tv1, tv2);
> }
>
> +char *human_time(unsigned t, int hz) {
> + static char rv[sizeof(" 100000d 23h")+1]; // longest of " 23h 59m" and " 59m 59s"
you cannot reach 100000 days, so you can strip one char here. However, I don't know what happens with negative time. This would be interesting to check (you see, when people brutaly set the system's time by hand in the past).
> + char *p = rv;
> +
> + if (t < HOUR) {
> + if (t >= MINUTE)
> + p += sprintf(p, " %dm", t/MINUTE);
> +
> + if (t%MINUTE/SEC || !t)
please even if it looks only like cosmetics, put spaces between your variables and operators everywhere. I don't care a dime about the 80 columns limit (as you might have noticed), but I want code that I can read at 3am when my eyes are starting to cross. And at 3am,
if (t % MINUTE / SEC || !t)
is more readable than :
if (t%MINUTE/SEC || !t)
As you see, that's a lot of small changes, but nothing particularly critical.
BTW, I generally do not change people's patches before committing them. Some contributors may find it insulting that I fix things in their back, and others get annoyed because they maintain incremental patches on their side.
On the other hand, I know it's a waste of time to play ping-pong with patches just to fix minor issues, hence why I've recently slightly modified some patches I was merging. Depending on what you expect when sending a patch, do not hesitate to tell me when you're OK with me doing minor changes if I notice anything while merging, to ask for review when it is what you're waiting for, or to explicitly tell me that you want a commit or reject but no change, for instance because you have other changes pending, or because you want to discuss the reason for changes before the commit. I have no problem adapting to people's expectations, we just have to be in sync.
Thanks,
Willy
Received on 2007/10/21 10:01
This archive was generated by hypermail 2.2.0 : 2007/11/04 19:21 CET