Tuesday, September 9, 2014

Zen Load Balancer 3.0.3 Performance and Security Customization Part 5

Now, let's move on to NIC bonding. Bonding is useful when one of our NICs dies: if that happens, we obviously want another one standing by to take over.

Many admins have a dedicated VLAN for cluster synchronization. Others simply connect the two nodes with a crossover cable. Either way, if one NIC goes down, all hell breaks loose. If it is the cluster synchronization NIC, each node thinks the other has gone down and both try to become master, causing havoc on the network. If it is any other NIC, the frontends and backends behind it appear to be down.

To guard against that, we employ NIC bonding. There are several bonding modes (the descriptions below come from the Linux kernel bonding documentation):
  • balance-rr or 0: Round-robin policy: Transmit packets in sequential order from the first available slave through the last.  This mode provides load balancing and fault tolerance.
  • active-backup or 1: Active-backup policy: Only one slave in the bond is active.  A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) to avoid confusing the switch.
    In bonding version 2.6.2 or later, when a failover occurs in active-backup mode, bonding will issue one or more gratuitous ARPs on the newly active slave. One gratuitous ARP is issued for the bonding master interface and each VLAN interface configured above it, provided that the interface has at least one IP address configured.  Gratuitous ARPs issued for VLAN interfaces are tagged with the appropriate VLAN id. This mode provides fault tolerance.
  • balance-xor or 2: XOR policy: Transmit based on the selected transmit hash policy.  The default policy is a simple [(source MAC address XOR'd with destination MAC address) modulo slave count].  Alternate transmit policies may be selected via the xmit_hash_policy option. This mode provides load balancing and fault tolerance.
  • broadcast or 3: Broadcast policy: transmits everything on all slave interfaces.  This mode provides fault tolerance.
  • 802.3ad or 4: IEEE 802.3ad Dynamic link aggregation.  Creates aggregation groups that share the same speed and duplex settings.  Utilizes all slaves in the active aggregator according to the 802.3ad specification. Slave selection for outgoing traffic is done according to the transmit hash policy, which may be changed from the default simple XOR policy via the xmit_hash_policy option. Note that not all transmit policies may be 802.3ad compliant, particularly in regards to the packet mis-ordering requirements of section 43.2.4 of the 802.3ad standard.  Differing peer implementations will have varying tolerances for noncompliance. Prerequisites:
    1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
    2. A switch that supports IEEE 802.3ad Dynamic link aggregation.
    3. Most switches will require some type of configuration to enable 802.3ad mode.
  • balance-tlb or 5: Adaptive transmit load balancing: channel bonding that does not require any special switch support.  The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave.  Incoming traffic is received by the current slave.  If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave. Prerequisites:
    1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
  • balance-alb or 6: Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPv4 traffic, and does not require any special switch support.  The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server. Receive traffic from connections created by the server is also balanced. When the local system sends an ARP Request the bonding driver copies and saves the peer's IP information from the ARP packet.  When the ARP Reply arrives from the peer, its hardware address is retrieved and the bonding driver initiates an ARP reply to this peer assigning it to one of the slaves in the bond. A problematic outcome of using ARP negotiation for balancing is that each time that an ARP request is broadcast it uses the hardware address of the bond.  Hence, peers learn the hardware address of the bond and the balancing of receive traffic collapses to the current slave.  This is handled by sending updates (ARP Replies) to all the peers with their individually assigned hardware address such that the traffic is redistributed.  Receive traffic is also redistributed when a new slave is added to the bond and when an inactive slave is re-activated.  The receive load is distributed sequentially (round robin) among the group of highest speed slaves in the bond. When a link is reconnected or a new slave joins the bond the receive traffic is redistributed among all active slaves in the bond by initiating ARP Replies with the selected MAC address to each of the clients. The updelay parameter must be set to a value equal to or greater than the switch's forwarding delay so that the ARP Replies sent to the peers will not be blocked by the switch. Prerequisites:
    1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
    2. Base driver support for setting the hardware address of a device while it is open. This is required so that there will always be one slave in the team using the bond hardware address (the curr_active_slave) while having a unique hardware address for each slave in the bond. If the curr_active_slave fails its hardware address is swapped with the new curr_active_slave that was chosen.
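
To make the balance-xor formula from the list above a bit more concrete, here is a small sketch of "(source MAC XOR destination MAC) modulo slave count". It assumes, purely for brevity, that only the last octet of each MAC address takes part in the hash; the helper name xor_slave_index is made up for this post and is not part of the bonding driver.

```shell
#!/bin/sh
# Illustrative only: hash on the last octet of each MAC address. This is a
# simplifying assumption; the driver's real hash may look at more of the frame.
xor_slave_index() {
    src_octet=${1##*:}      # last octet of the source MAC, e.g. "a3"
    dst_octet=${2##*:}      # last octet of the destination MAC, e.g. "fe"
    slave_count=$3          # number of slaves in the bond
    echo $(( (0x$src_octet ^ 0x$dst_octet) % slave_count ))
}

# Two MAC addresses taken from the output later in this post, two slaves:
xor_slave_index 00:19:b9:e4:12:a3 00:1f:29:57:cf:fe 2   # prints 1
```

The point is that a given pair of MAC addresses always lands on the same slave, which is why this mode balances across peers rather than across packets.
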
In this example we will use the active-backup mode, which is the safest choice. Many people prefer link aggregation, since an aggregation group increases the overall bandwidth of the resulting interface, but our goal here is failover rather than throughput.

Let's suppose we want to bond eth8 and eth3 to an interface with the IP 172.16.0.8/22, eth9 and eth4 to an interface with the IP 172.16.4.8/22, and eth0 and eth5 to an interface with the IP 172.16.8.8/23:

root@zen-lb:~# apt-get install ifenslave-2.6
root@zen-lb:~# vi /etc/network/interfaces
auto lo
iface lo inet loopback
auto bond0
iface bond0 inet static
    address 172.16.0.8
    netmask 255.255.252.0
    network 172.16.0.0
    gateway 172.16.0.1
    slaves eth8 eth3
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth8
auto bond1
iface bond1 inet static
    address 172.16.4.8
    netmask 255.255.252.0
    network 172.16.4.0
    slaves eth9 eth4
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth9
auto bond2
iface bond2 inet static
    address 172.16.8.8
    netmask 255.255.254.0
    network 172.16.8.0
    slaves eth0 eth5
    bond-mode active-backup
    bond-miimon 100
    bond-primary eth0
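
Since the prose uses /22 and /23 while the interfaces file uses dotted netmasks, it is easy to mistype one of them. The following is a minimal sketch for checking that the two agree; mask_to_prefix is a hypothetical helper name, and it does not verify that the mask bits are contiguous.

```shell
#!/bin/sh
# Convert a dotted netmask to a prefix length. The body runs in a subshell
# (note the parentheses) so that changing IFS does not leak to the caller.
mask_to_prefix() (
    IFS=.
    set -- $1               # split the mask into its four octets
    prefix=0
    for octet in "$@"; do
        case $octet in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0)   ;;
            *)   echo "invalid netmask octet: $octet" >&2; exit 1 ;;
        esac
    done
    echo "$prefix"
)

mask_to_prefix 255.255.252.0   # prints 22 (bond0 and bond1)
mask_to_prefix 255.255.254.0   # prints 23 (bond2)
```
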

bond-primary is the NIC that will be the primary (preferred active) device.
bond-miimon is how often, in milliseconds, the link state of each slave will be polled.
So, in our case, eth8 and eth3 will be polled every 100 ms; as long as eth8 is up it will serve our incoming and outgoing traffic, otherwise eth3 will take over.
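
Once the bonds are up, the values the driver actually uses can be read back from sysfs, under /sys/class/net/<bond>/bonding/. The sketch below just prints a few of those attributes; bond_settings is a hypothetical helper name, and its optional second argument (an alternative sysfs root) exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Print the mode, polling interval, primary and active slave of a bond.
# $1 = bond name, $2 = sysfs root (optional, defaults to /sys/class/net).
bond_settings() {
    bond=$1
    sysfs_root=${2:-/sys/class/net}
    for attr in mode miimon primary active_slave; do
        file="$sysfs_root/$bond/bonding/$attr"
        if [ -r "$file" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$file")"
        else
            printf '%s: (not available)\n' "$attr"
        fi
    done
}

# Usage on the load balancer:
#   bond_settings bond0
```
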

root@zen-lb:~# rm /usr/local/zenloadbalancer/config/if_eth*
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond0_conf
bond0::172.16.0.8:255.255.252.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond1_conf
bond1::172.16.4.8:255.255.252.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/if_bond2_conf
bond2::172.16.8.8:255.255.254.0:up::
root@zen-lb:~# vi /usr/local/zenloadbalancer/config/global.conf
.....
#System Default Gateway
$defaultgw="172.16.0.1";
#Interface Default Gateway
$defaultgwif="bond0";
.....
#Also change the ntp server
.....
$ntp="0.europe.pool.ntp.org";
.....
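
The three if_bondN_conf files above share one layout, so if you would rather not type them by hand, something like the following could generate the lines. This is only a sketch: zen_if_line is a made-up helper, the field layout is copied from the lines shown above, and the empty fields are left empty because this post does not say what they mean.

```shell
#!/bin/sh
# Hypothetical helper: print one Zen interface line in the layout used above
# (name, empty field, address, netmask, "up", two empty fields).
zen_if_line() {
    name=$1; address=$2; netmask=$3
    printf '%s::%s:%s:up::\n' "$name" "$address" "$netmask"
}

zen_if_line bond0 172.16.0.8 255.255.252.0
zen_if_line bond1 172.16.4.8 255.255.252.0
zen_if_line bond2 172.16.8.8 255.255.254.0
```

Redirect each line to its file, for example zen_if_line bond0 172.16.0.8 255.255.252.0 > /usr/local/zenloadbalancer/config/if_bond0_conf.
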

You might also want to change these particular ports on your switch to portfast. That way, you won't have to wait for the forward delay (and as far as these particular ports go, forward delay is useless anyway) and the transition will be seamless.

All right, let's see if it all works:

root@zen-lb:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth8
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth8
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:e4:12:a3

Slave Interface: eth3
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:57:cf:fe

root@zen-lb:~# cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth9
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth9
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:e4:12:a5

Slave Interface: eth4
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:0d:69:81

root@zen-lb:~# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:57:cf:fd

Slave Interface: eth5
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:1f:29:0d:69:80
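
With three bonds, reading the full output gets tedious. A short awk filter can boil each file down to the active slave and the link state of every slave. This is a minimal sketch that assumes the output format shown above; bond_summary is a hypothetical name.

```shell
#!/bin/sh
# Summarize a /proc/net/bonding/<bond> file: the active slave first, then one
# "slave=status" line per slave. $1 = file (defaults to bond0's status file).
bond_summary() {
    awk -F': ' '
        /^Currently Active Slave:/ { print "active=" $2 }
        /^Slave Interface:/        { slave = $2 }
        # Only MII Status lines that follow a "Slave Interface" line count;
        # the first one in the file describes the bond itself and is skipped.
        /^MII Status:/ && slave    { print slave "=" $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Usage on the load balancer:
#   for b in bond0 bond1 bond2; do bond_summary /proc/net/bonding/$b; done
```
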

And if you try to disconnect, or otherwise bring down, any of the primary slave interfaces, you'll see that the backup slave takes over almost instantly (provided you set those ports to portfast on your switch).
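
If you want to watch the takeover rather than take it on faith, a small polling loop is enough. The sketch below waits until the "Currently Active Slave" line names a different interface from the one you pass in; wait_for_failover is a hypothetical helper, and it polls once per second, which is far coarser than the 100 ms miimon interval.

```shell
#!/bin/sh
# Wait until the active slave differs from the one given, then print it.
# $1 = bonding status file, $2 = slave expected to fail, $3 = attempts (default 10).
wait_for_failover() {
    status_file=$1; old_slave=$2; tries=${3:-10}
    while [ "$tries" -gt 0 ]; do
        current=$(awk -F': ' '/^Currently Active Slave:/ { print $2 }' "$status_file")
        if [ -n "$current" ] && [ "$current" != "$old_slave" ]; then
            echo "$current"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1    # no other slave became active in time
}

# Usage on the load balancer (this takes eth8 down, so use a test window):
#   ip link set eth8 down
#   wait_for_failover /proc/net/bonding/bond0 eth8
```
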
