
This is one of the proposed solutions for the job assessment discussed in a previous post.

Using an Open Source solution, design a load balancer configuration that provides redundancy, supports multiple subnets, and handles 500-1000 Mbit of SYN/ACK/FIN packets. Explain the scalability of your design/configs.

The main problem that the load balancer design must solve in
web applications is session stickiness (or persistence). The load balancer
design must be created according to the session replication policy of the architecture.
In addition, the load balancer must be designed to allow upgrades and
maintenance of the servers.

Considering the architecture explained in Webserver architecture section, the stickiness restrictions are:

  • Session stickiness must be set for each site. A site failure means the loss of all user sessions created in that site.
  • Session stickiness must be set for each farm. A server failure is allowed (the session will be recovered from the session DB backend).
  • Session stickiness for the servers within a farm is optional (it could be set to take advantage of the OS disk cache).

The software that I propose is HAProxy (http://haproxy.1wt.eu/):

The load balancer design consists of two layers:

  • One primary HAProxy LB, which balances between the sites.
    Configured with session stickiness to the site using an inserted cookie, SITEID.
    See primary-haproxy.conf.
  • One site LB in each site, balancing the site’s farms.
    Configured with session stickiness to the farm using an inserted cookie, FARMID.
    See site1-haproxy.conf.

Extra comments:

  • Each layer can have several HAProxy instances, with the same configuration, set up with a failover solution or behind a Layer 4 load balancer (a keepalived sketch follows this list). See http://haproxy.1wt.eu/download/1.3/doc/architecture.txt (Section 2) for examples.
  • Additionally, an SSL frontend solution should be configured for SSL connections between the client
    and the primary load balancer. Plain HTTP can be used between balancers and servers.
    I will not describe this element.
  • The solutions described in the HAProxy architecture documentation, section “4 Soft-stop for application maintenance”, can be used.
  • With HAProxy 1.4 you can control the server weights dynamically.
    A monitoring system can check the farm/server health and tune the weights as needed (see the sketch after this list).
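
As an example of the failover option mentioned in the first point, two HAProxy instances can share a virtual IP with keepalived (VRRP). This is only a minimal sketch: the interface name, router id and priorities are assumptions, and the virtual IP is the one used in primary-haproxy.conf.

#
# keepalived.conf (sketch) on the active primary LB.
# The backup node would use state BACKUP and a lower priority.
#
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # healthy while an haproxy process is running
    interval 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        192.168.10.1              # the clusterized virtual IP the primary LB binds to
    }
    track_script {
        chk_haproxy
    }
}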
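
For the dynamic weight control of the last point, HAProxy 1.4 exposes a stats/admin socket that a monitoring system can drive. A minimal sketch, assuming "stats socket /var/run/haproxy.sock level admin" has been added to the global section and that the servers are declared with an initial weight of 100 (neither is in the configs below):

# Halve the weight of a degraded site, check it, and restore it later (requires socat)
echo "set weight primary_lb_1/site1 50"  | socat stdio /var/run/haproxy.sock
echo "get weight primary_lb_1/site1"     | socat stdio /var/run/haproxy.sock
echo "set weight primary_lb_1/site1 100" | socat stdio /var/run/haproxy.sock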

This solution scales well: you simply need to add more servers, farms and sites.
The load balancers themselves can scale horizontally, as noted above.

Primary configuration:

#
# Primary Load Balancer configuration: primary-haproxy.conf
#
global
	log 127.0.0.1	local0
	log 127.0.0.1	local1 notice
	#log loghost	local0 info
	maxconn 40000
	user haproxy
	group haproxy
	daemon
	#debug
	#quiet

defaults
	log	global
	mode	http
	option	httplog
	option	dontlognull
	retries	3
	option redispatch
	maxconn	2000
	contimeout	5000
	clitimeout	50000
	srvtimeout	50000

listen primary_lb_1
    # We insert cookies, add headers => http mode
    mode http

    #------------------------------------
    # Bind to all addresses.
    #bind 0.0.0.0:10001
    # Bind to a clusterized virtual ip
    bind 192.168.10.1:10001 transparent

    #------------------------------------
    # Cookie persistence for PHP sessions. Options
    #  - rewrite PHPSESSID: will add the server label to the session id
    #cookie	PHPSESSID rewrite indirect
    #  - insert a cookie with the identifier.
    #    Use postonly (session created in the login form) or nocache to avoid the response being cached
    cookie SITEID insert postonly

    # We need to know the client ip in the end servers.
    # Inserts X-Forwarded-For. Needs httpclose (no Keep-Alive).
    option forwardfor
    option httpclose

    # Roundrobin is ok for HTTP requests.
    balance	roundrobin

    # The backend sites
    # Several options are possible:
    #  inter 2000 downinter 500 rise 2 fall 5 weight 100
    server site1 192.168.11.1:10001 cookie site1 check
    server site2 192.168.11.1:10002 cookie site2 check
    # etc..

Site configuration:

#
# Site 1 load balancer configuration: site1-haproxy.conf
#
global
    log 127.0.0.1    local0
    log 127.0.0.1    local1 notice
    #log loghost    local0 info
    maxconn 40000
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet

defaults
    log    global
    mode    http
    option    httplog
    option    dontlognull
    retries    3
    option redispatch
    maxconn    2000
    contimeout    5000
    clitimeout    50000
    srvtimeout    50000

#------------------------------------
listen site1_lb_1
    grace 20000 # don't kill us until 20 seconds have elapsed

    # Bind to all addresses.
    #bind 0.0.0.0:10001
    # Bind to a clusterized virtual ip
    bind 192.168.11.1:10001 transparent

    # Persistence.
    # The webservers in the same farm share the session
    # with memcached. The whole site keeps them in a DB backend.
    mode http
    cookie FARMID insert postonly

    # Roundrobin is ok for HTTP requests.
    balance roundrobin

    # Farm 1 servers
    server site1ws1 192.168.21.1:80 cookie farm1 check
    server site1ws2 192.168.21.2:80 cookie farm1 check
    # etc...

    # Farm 2 servers
    server site1ws17 192.168.21.17:80 cookie farm2 check
    server site1ws18 192.168.21.18:80 cookie farm2 check
    server site1ws19 192.168.21.19:80 cookie farm2 check
    # etc..

This is one of the proposed solutions for the job assessment discussed in a previous post.

Note that this was my reply at that time; nowadays I would change it to include automatic provisioning and automated configuration based on Puppet, Chef… and other techniques, as well as something about rollbacks using LVM snapshots.

Question

Given a 500+ node webcluster at one customer, running one code base: design a Gentoo/PHP version control scheme. Propose solutions for OS upgrades of the servers. Propose a plan and execution for PHP upgrades. Please explain your choices.

Solution

Regarding the OS and its upgrades, I will assume:

  • There is a limited number of hardware configurations. I will call them hw-profiles.
  • There is a preproduction environment, with servers of each hardware configuration.

In that case:

  • Each upgrade must be properly tested in the preproduction environment.

  • The preproduction servers will pre-compile the Gentoo packages
    for each hw-profile. Distributed compiling can be set up.

  • There is a local Gentoo mirror and a pre-compiled package repository
    on the network, serving the binaries built for each hw-profile.

  • Each server will have its hw-profile repository configured and will install the binaries from it:

    # In /etc/portage/make.conf: point to the binary repository for this hw-profile
    PORTAGE_BINHOST="ftp://gentoo-repository/$hw-profile"
    # Prefer/fetch binary packages instead of compiling locally
    emerge --usepkg --getbinpkg <package>
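
On the build side, a minimal sketch of how a preproduction server could produce and publish the binary packages for its hw-profile (the repository host, paths and the HW_PROFILE variable are illustrative assumptions):

# On a preproduction builder for one hw-profile
echo 'FEATURES="buildpkg"' >> /etc/portage/make.conf   # build a binary package of everything emerged
emerge --update --deep world                           # binaries end up under $PKGDIR (/usr/portage/packages)

# Publish the binaries to the repository serving this hw-profile
rsync -a /usr/portage/packages/ repo-server:/srv/binpkgs/${HW_PROFILE}/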

The PHP upgrades can be distributed using rsync, into a different location for each version,
and activated by changing the Apache/nginx configuration (a sketch of this step follows).
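
A minimal sketch of that deployment step, assuming illustrative paths (/srv/www/app-1.2.3 and a "current" symlink) and a vhost whose document root points at the symlink:

# Push the new version next to the old one, without touching the running code
rsync -a --delete build-server:/releases/app-1.2.3/ /srv/www/app-1.2.3/

# Switch the active version and reload the web server
ln -sfn /srv/www/app-1.2.3 /srv/www/current
/etc/init.d/apache2 reload    # or an equivalent graceful reload of nginx/php-fpm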

To plan the upgrades (both OS and PHP) I will consider the architecture
explained previously in the Webserver architecture section, and the load balancing
solution described in Redundant load balancer design.

The upgrade requirements are:

  • HA: no loss of service due to maintenance or upgrades.
  • Each request with an associated session must reach a
    webapp version equal to or newer than the one that served the previous request.
    This is important to ensure application consistency
    (e.g. a user fills in a form that is only available in the latest version,
    or the session contains unexpected values…).

The upgrades can be divided into:

  • Non-disruptive OS upgrade: small OS software upgrades that are not related to
    the web service (e.g. man, findutils, tar…). The upgrade can be performed online.

  • Disruptive OS upgrade: OS software that implies restarting the service
    or the server (e.g. Apache, kernel, libc, SSL…):

    1. Only one member of each farm is upgraded at a time: first all members
      number 1, then all members number 2…
    2. The web service will be stopped during the upgrade.
      The other servers in the farm will keep serving without service disruption
      (see the sketch after this list for taking a server out of rotation).

    This method has a homogeneous and small performance impact (1 of 16 servers per farm, ≈6% of the servers down).

  • Application upgrade: clients must access equal or newer webapp versions:

    1. A complete farm must be stopped at the same time.
    2. The sessions stuck to this farm will be
      served by other farms in the same site (session stickiness to the site).
      Session data will be recovered from the DB backend.
    3. The memcached associated to this farm must be flushed (see the sketch after this list).
    4. Once upgraded, all servers in the farm are started and can serve new sessions.

    Load balancer stickiness ensures that the new sessions will only reach
    the upgraded farm, except:

    • if all the servers of the farm fail at the same time after the upgrade.
    • if the end user manipulates the cookies.

    In that case, control code can be added to the application to
    invalidate sessions coming from newer application versions. Something like this:

    if ($app_version < $_SESSION['app_version']) {
        session_destroy();
    } elseif ($app_version != $_SESSION['app_version']) {
        $_SESSION['app_version'] = $app_version;
    }
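
To take a single server out of rotation for a disruptive OS upgrade, the HAProxy admin socket can be used. A minimal sketch, assuming the backend/server names from site1-haproxy.conf and that "stats socket /var/run/haproxy.sock level admin" has been added to the global section (an assumption, it is not in the configs above):

# Drain one server from the site LB before its disruptive OS upgrade (requires socat)
echo "disable server site1_lb_1/site1ws1" | socat stdio /var/run/haproxy.sock
# ... stop the web service, upgrade, restart, verify, then put it back:
echo "enable server site1_lb_1/site1ws1" | socat stdio /var/run/haproxy.sock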
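
For step 3 of the application upgrade, the farm's memcached can be flushed with the standard flush_all command. A minimal sketch, with an illustrative host name:

# Invalidate every cached session of the farm before it serves the new application version
echo "flush_all" | nc memcached-farm1 11211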

To perform the upgrades, cluster management tools can be used,
like MCollective (a good fit if you are already using Puppet), Func or Fabric.