AWS : Load Balancing with HAProxy (High Availability Proxy)

bogotobogo.com site search:

What's HAProxy

HAProxy, which stands for High Availability Proxy, is a popular open source software TCP/HTTP Load Balancer and proxying solution. Its most common use is to improve the performance and reliability of a server environment by distributing the workload across multiple servers (e.g. web, application, database).

HAProxy is a free, very fast and reliable solution offering high availability, load balancing, and proxying for TCP and HTTP-based applications. It is particularly suited for very high traffic web sites and powers quite a number of the world's most visited ones. Over the years it has become the de-facto standard opensource load balancer, is now shipped with most mainstream Linux distributions, and is often deployed by default in cloud platforms. Since it does not advertise itself, we only know it's used when the admins report it :-)

- from http://www.haproxy.org/

According to wiki, it is used in many high-profile environments, including: GitHub, Bitbucket, Stack Overflow, Reddit, Tumblr, and Twitter and is used in the OpsWorks product from Amazon Web Services.

In this article, we will see an overview of what HAProxy is, basic load-balancing terminology, and examples of how it might be used to improve the performance and reliability of our own AWS server environment.

Reverse Proxy vs forward proxy

A reverse proxy is a type of proxy server that requests network resources on behalf of external clients.
A forward proxy is a type of proxy server that requests network resources on behalf of internal clients.

Unlike a forward proxy, a reverse proxy does not require any client side configuration and all network requests are handled transparently by the reverse proxy.

Why Reverse Proxy?

We can add extra layer of security to the internet facing services.
We can stream contents from internal network services to internet users.
Also, we can add high availability and load balancing to our network.

Server setup

We have two servers on AWS:

Server 1 - 54.153.75.103 (172.31.6.124)
Server 2 - 52.8.238.145 (172.31.6.125)
HAProxy - 52.8.237.49 (172.31.6.126)

On servers 1 and 2, we need to install apache:

$ sudo apt-get install apache2

The HAProxy can be installed like this:

$ sudo apt-get install haproxy

$ haproxy --version
HA-Proxy version 1.6.3 2015/12/25

Pic credit : http://www.haproxy.org/

Access Control List (ACL)

When we talk about load balancing, ACLs are used to test some condition and perform an action (e.g. select a server, or block a request) based on the test result. Use of ACLs allows flexible network traffic forwarding based on a variety of factors like pattern-matching and the number of connections to a backend, for example:

acl url_blog path_beg /blog

This ACL is matched if the path of a user's request begins with /blog. This would match a request of http://ourdomain.com/blog/blog-entry-1, for example.

http://mysite.com/blog/blog-1

HAProxy setup

By default, HAProxy is disabled. So, we need to turn it on by setting the value to 1 in /etc/default/haproxy:

# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"

We can check if the HAProxy is running:

$ sudo service haproxy status
haproxy not running.

HAProxy config - global & defaults

Now, we want to config the HAProxy in /etc/haproxy/haproxy.cfg. Here is the default configuration:

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        contimeout 5000
        clitimeout 50000
        srvtimeout 50000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

The log directive mentions a syslog server to which log messages will be sent. On Ubuntu rsyslog is already installed and running but it doesn't listen on any IP address.

The user and group directives changes the HAProxy process to the specified user/group.

Let's look at the default section. The values to be modified are the various timeout directives. The connect option (contimeout) specifies the maximum time to wait for a connection attempt.

The client (clitimeout) and server (srvtimeout) timeouts apply when the client or server is expected to acknowledge or send data during the TCP process. HAProxy recommends setting the client and server timeouts to the same value.

HAProxy config - frontend & backend

A frontend defines how requests should be forwarded to backends. Frontends are defined in the frontend section of the HAProxy configuration. Their definitions are composed of the following components:

a set of IP addresses and a port (e.g. 10.1.1.7:80, *:443, etc.)
ACLs
use_backend rules, which define which backends to use depending on which ACL conditions are matched, and/or a default_backend rule that handles every other case.

Append the following to /etc/haproxy/haproxy.cfg:

frontend haproxy_in
    bind *:80
    default_backend haproxy_in

backend haproxy_in
    mode http
    balance roundrobin
    server web01 172.31.6.124:80 check
    server web02 172.31.6.125:80 check

A backend is a set of servers that receives forwarded requests. Backends are defined in the backend section of the HAProxy configuration. In its most basic form, a backend can be defined by:

which load balance algorithm to use
a list of servers and ports

A backend can contain one or many servers in it, and adding more servers to our backend will increase our potential load capacity by spreading the load over multiple servers. Increase reliability is also achieved through this manner.

The balance roundrobin line specifies the load balancing algorithm, and the mode http specifies that layer 7 proxying will be used. Finally, the check option at the end of the server directives specifies that health checks should be performed on those backend servers.

Running HAProxy

Now we can run the HAProxy with a proper configuration:

$ sudo service haproxy reload
 * Reloading haproxy haproxy  cat: /var/run/haproxy.pid: No such file or directory
                                                                         [ OK ]

To check if this change is done properly execute the init script of HAProxy without any parameters. We should see the following.

$ service haproxy
Usage: /etc/init.d/haproxy {start|stop|reload|restart|status}

$ sudo service haproxy status
haproxy is running.

OK. It's running now!

HAProxy in action

Now, we can navigate into a browser with the ip (52.8.237.49) for HAProxy and get either the server #1 or server #2 as shown in the pictures below: