Re: [PATCH] Add support to HTTP proxy operation

From: Willy Tarreau <w#1wt.eu>
Date: Sat, 29 Sep 2007 11:34:46 +0200


Hi Alex,

On Tue, Sep 25, 2007 at 03:46:13PM +0200, Alexandre Cassen wrote:
> Hi,
>
> The following patch add support to HTTP proxy processing. You can set
> backend in this operation mode by using "option proxy".

Well, that seems reasonable to me. Following the same principle, you may be interested in implementing the CONNECT method as described in RFC2817. It's mostly trivial, you have :

CONNECT ip:port HTTP/1.[01]
[some potential headers, all useless for us] [empty line]
--- And now it's just like a TCP connection ---

> Current code
> doesn't support asynchonous dns resolution while processing incoming
> client request. It only support plain IP address processing.

I understand that for some specific usages, it's not a big problem. HAproxy is a reverse-proxy, targetted at outside-to-inside usages, and sometimes inside-to-inside, but its main users do not really need a inside-to-outside proxy (squid does it very well).

I have a few comments on the patch.

> Index: src/backend.c
> ===================================================================
> --- src/backend.c (révision 2)
> +++ src/backend.c (révision 3)
> @@ -163,7 +163,13 @@
> return SRV_STATUS_INTERNAL;
>
> if (!(s->flags & SN_ASSIGNED)) {
> - if (s->be->options & PR_O_BALANCE) {
> + if (s->be->options & PR_O_HTTP_PROXY) {
> + if (s->srv_addr.sin_addr.s_addr == 0)
> + return SRV_STATUS_NOSRV;
> + s->flags |= SN_ASSIGNED;
> + return SRV_STATUS_OK;
> + }
> + else if (s->be->options & PR_O_BALANCE) {
> if (s->flags & SN_DIRECT) {
> s->flags |= SN_ASSIGNED;
> return SRV_STATUS_OK;

As much as possible, I *try* to put the most common cases in the very first ifs, in order to reduce the number of tests. Thus, I would prefer PR_O_HTTP_PROXY option to be checked *after* PR_O_BALANCE if that is still compatible with the way it works.

> @@ -228,7 +234,15 @@
> fprintf(stderr,"assign_server_address : s=%p\n",s);
> #endif
>
> - if ((s->flags & SN_DIRECT) || (s->be->options & PR_O_BALANCE)) {
> + /*
> + * If HTTP PROXY option is set, then server is already assigned
> + * during incoming client request parsing.
> + */
> + if (s->be->options & PR_O_HTTP_PROXY) {
> + s->flags |= SN_ADDR_SET;
> + return SRV_STATUS_OK;
> + }
> + else if ((s->flags & SN_DIRECT) || (s->be->options & PR_O_BALANCE)) {
> /* A server is necessarily known for this session */
> if (!(s->flags & SN_ASSIGNED))
> return SRV_STATUS_INTERNAL;

Same here.

> +static int get_ip_address_from_uri(const char *addr, const char *end, unsigned long *dst)
> +{
> + static char digits[] = "0123456789";
> + int saw_digit, octets, ch;
> + u_char tmp[4], *tp;
> + const char *cp = addr;
> +
> + saw_digit = 0;
> + octets = 0;
> + *(tp = tmp) = 0;
> +
> + while (addr != end && (ch = *addr++) != '/' && ch != ':') {
> + const char *pch;
> + if ((pch = strchr(digits, ch)) != NULL) {
> + u_int new = *tp * 10 + (pch - digits);
> + if (new > 255)
> + return 0;

Strictly speaking, this test is not correct. An IP address may very well be a simple 32bit integer. Most implementations validate a host number on any network, which means that all those addresses are valid and equivalent (try them with ping) :

    192.168.1.13    => host 13 on network 192.168.1
    192.168.269     => host 269 on network 192.168
    192.11010317    => host 11010317 on network 192
    3232235789      => host 3232235789 on global network

Not many people use them, except for local addresses with many zeroes inside them, eg: 127.1, 10.1, ...

It's not critical, but I often get annoyed when I see that Opera does not understand when I type "127.1:3000" in it.

> +static int get_srv_from_uri(struct sockaddr_in *addr, char *buffer, int len)
> +{

(...)

> + /*
> + * http:// sanity check.
> + * Firstly, we try to find :// pattern.
> + */
> + for (curr = buffer; url_code != 0x98 && curr != buffer+len; curr++) {
> + if (*curr == ':' || *curr == '/')
> + url_code += (int) *curr;
> + }

It took me some time to figure out what this trick was. I like the idea but not the implementation, because it could very well match "://", "/:/", "//:". So I suggest that you shift the word on the left and remove the two tests, to achieve something like this :

	curr = buffer;
        while (curr < buffer+len && url_code != 0x3A2F2F) {
                url_code  = ((url_code & 0xFFFF) << 8);
                url_code += (unsigned char)*curr++;
        }

It will only catch "://", cut the object code in half, and will use about half cycles (about 5 cycles/char instead of about 10).

> +
> + /*
> + * Secondly, if :// pattern is found, verify parsed stuff
> + * before pattern is matching our http pattern.
> + * If so parse ip address and port in uri.
> + *
> + * WARNING: Current code doesn't support dynamic async dns resolver.
> + */
> + if (url_code == 0x98) { /* We got a :// */
> + if (curr-cp == 7 && /* Only support HTTP proxy */
> + ((*cp == 'h' || *cp == 'H') &&
> + (*(cp+1) == 't' || *(cp+1) == 'T') &&
> + (*(cp+2) == 't' || *(cp+2) == 'T') &&
> + (*(cp+3) == 'p' || *(cp+3) == 'P'))) {

Hmmm let's use the same trick here with "HTTP" :

	unsigned int http_code = 0;
        while (cp < curr - 3)
		http_code = (http_code << 8) + *cp++;
        http_code |= 0x20202020;  // turn everything to lower case

	if (http_code == 0x68747470) // "http"
                 ...

Overall, it looks mostly good. Please think about providing a minimal documentation about new features. Basically, add the keyword in the "configuration" manual, and say a few words about it and provide one example of how to use it. I expect the configuration manual to replace the reference manual in the long term, as it will be more maintainable.

Thanks!
Willy Received on 2007/09/29 11:34

This archive was generated by hypermail 2.2.0 : 2007/11/04 19:21 CET