Mar 29, 2012

Network card aggregation on Linux RHEL 6.x (also called port bonding or channel bonding) means merging two or more network cards into a single virtual card (bond0 in this case). This is not exhaustive, just a mini how-to. For more detail see the Red Hat documentation. Things changed between RHEL 5.x and 6.x, so here is an updated version.

While I was applying this procedure and editing the configuration files, remote access to the server dropped, so I recommend that you do this from the local console!

First make sure /sbin/ifenslave is installed; if it is not, install it with “yum install iputils”. Next you will obviously need access to the server. It can be remote (if you intend to reboot), but keep direct console access ready, especially if you are feeling adventurous.

Configuration starts by adding the following lines to /etc/modprobe.d/bond0.conf (in previous versions it was /etc/modprobe.conf). This is necessary to define bond0 as the logical, or virtual, interface name.

/etc/modprobe.d/bond0.conf :

alias netdev-bond0 bonding
alias bond0 bonding

Note: Using the plain bond0 alias triggers a deprecation warning as of the latest 2.6.32.xx kernel, so use the recommended “netdev-bond0” instead. Red_Hat_Enterprise_Linux/6/html/Deployment_Guide is wrong on that matter since it suggests using alias bond0.
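
To see at a glance which alias form a modprobe file uses, a small shell check along these lines may help. It is only a sketch: the function name check_bond_alias is made up for this example, and it only inspects the text of the file, not what the kernel has actually loaded.

```shell
# check_bond_alias FILE [BOND]
# Hypothetical helper (not part of any package): reports which alias
# form FILE uses for BOND (default: bond0).
check_bond_alias() {
    file=$1
    bond=${2:-bond0}
    if grep -Eq "^[[:space:]]*alias[[:space:]]+netdev-${bond}[[:space:]]+bonding" "$file"; then
        echo "ok: netdev-${bond} alias found"
    elif grep -Eq "^[[:space:]]*alias[[:space:]]+${bond}[[:space:]]+bonding" "$file"; then
        echo "warning: only the deprecated '${bond}' alias found"
    else
        echo "missing: no bonding alias for ${bond}"
        return 1
    fi
}
```

Typical use would be: check_bond_alias /etc/modprobe.d/bond0.conf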

cd /etc/sysconfig/network-scripts/
vi ifcfg-bond0

/etc/sysconfig/network-scripts/ifcfg-bond0 :

DEVICE="bond0"
ONBOOT=yes
BOOTPROTO=dhcp
NAME="bond0"
BONDING_OPTS="mode=4 miimon=100"
  • Note 1: if eth0 was configured by DHCP, you may want bond0 to use the same Ethernet MAC address. You will then keep the same IP address.
  • Note 2: BONDING_OPTS mode=0 (balance-rr) means round-robin: packets are transmitted in sequential order from the first slave through the last. This mode provides load balancing (almost twice the bandwidth with two cards) and fault tolerance in case of a cable or card failure. It requires switch configuration, as opposed to mode=6 (balance-alb, adaptive load balancing), which does not. There are 7 modes (0 through 6). See the Red Hat documentation.
  • mode=4, used above, is IEEE 802.3ad (now IEEE 802.1AX) link aggregation with the Link Aggregation Control Protocol (LACP). It sets the same MAC address on all bonded interfaces and requires the switch ports to be configured for LACP.
  • Note 3: miimon=100 means the link state of each slave is checked through MII monitoring every 100 milliseconds (1/10 of a second). This is not ARP monitoring, which is a separate mechanism.
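
As a quick reference for the mode numbers mentioned in these notes, here is a small lookup sketch in shell. The function name bond_mode_name is made up for this example; the mode names and the switch requirements in the comments follow the kernel bonding documentation as I understand it, so double-check them against your kernel's bonding.txt.

```shell
# bond_mode_name N: print the bonding driver's name for numeric mode N.
# Switch requirements noted per mode (per the kernel bonding documentation):
bond_mode_name() {
    case "$1" in
        0) echo "balance-rr" ;;     # needs a static port group on the switch
        1) echo "active-backup" ;;  # no switch configuration needed
        2) echo "balance-xor" ;;    # needs a static port group on the switch
        3) echo "broadcast" ;;      # needs a static port group on the switch
        4) echo "802.3ad" ;;        # needs LACP enabled on the switch ports
        5) echo "balance-tlb" ;;    # no switch configuration needed
        6) echo "balance-alb" ;;    # no switch configuration needed
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

For example, bond_mode_name 4 prints 802.3ad, which is the mode used in the BONDING_OPTS line of this post.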

/etc/sysconfig/network-scripts/ifcfg-eth0 :

DEVICE="eth0"
NM_CONTROLLED="no"
ONBOOT=yes
HWADDR=AA:F3:FC:DA:74:AA
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME="System eth0"
UUID=5fb06bd0-0bb0-7ffb-45f1-00000000000000
SLAVE=yes
MASTER=bond0

/etc/sysconfig/network-scripts/ifcfg-eth1 :

DEVICE="eth1"
HWADDR="BB:F3:FC:DA:83:CC"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO=none
SLAVE=yes
MASTER=bond0
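
With more than two cards, writing the slave files by hand gets repetitive. The sketch below generates a minimal slave file with the same keys as the eth1 example above. The function name write_slave_ifcfg and its arguments are made up for this example, it leaves out HWADDR (add the real MAC address of each card yourself), and it overwrites any existing file, so try it in a scratch directory first.

```shell
# write_slave_ifcfg DEV [MASTER] [DIR]
# Hypothetical helper: writes DIR/ifcfg-DEV as a slave of MASTER.
# Defaults: MASTER=bond0, DIR=/etc/sysconfig/network-scripts.
# HWADDR is intentionally not written; add it by hand.
write_slave_ifcfg() {
    dev=$1
    master=${2:-bond0}
    dir=${3:-/etc/sysconfig/network-scripts}
    cat > "${dir}/ifcfg-${dev}" <<EOF
DEVICE="${dev}"
NM_CONTROLLED="no"
ONBOOT="yes"
BOOTPROTO=none
SLAVE=yes
MASTER=${master}
EOF
}
```

For example: write_slave_ifcfg eth1 bond0 /tmp/ifcfg-test, then review the result before copying it into place.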

You should now be able to reboot and see whether it works. Be careful not to lose access to the server if you are logged in remotely.

  • After a reboot, if you are using DHCP and the hostname is localhost.localdomain, something probably went wrong and DHCP is not working.
  • After a reboot, if you are not using DHCP and the hostname is localhost.localdomain, it may be because you did not add your actual hostname and IP address to the /etc/hosts file. Simply add a line like this to /etc/hosts:
192.168.100.1  myserver  myserver.mydomain.com
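
To check whether a given name is already listed in /etc/hosts before editing it, something like the following can be used. This is a sketch only: hosts_has_name is a made-up helper name, and it does a plain text match on the file without asking the resolver.

```shell
# hosts_has_name NAME [FILE]
# Hypothetical helper: succeeds if NAME appears as a hostname or alias
# (any field after the IP address) in FILE (default: /etc/hosts).
hosts_has_name() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[ \t]*#/ { next }              # skip comment lines
        {
            sub(/#.*/, "")               # drop trailing comments
            for (i = 2; i <= NF; i++)    # field 1 is the IP address
                if ($i == n) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

For example: hosts_has_name myserver || echo "myserver is not in /etc/hosts"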

After the reboot you can look for messages with the “dmesg” command or in the file /var/log/messages.
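
Both sources are noisy, so it can help to keep only the lines that mention bonding. A minimal sketch follows; the helper name bond_messages is made up for this example, and the exact wording of the driver messages varies between kernel versions.

```shell
# bond_messages [FILE...]
# Hypothetical helper: keep only lines mentioning "bond" (case-insensitive).
# Reads the given files, or standard input when no file is given.
bond_messages() {
    grep -i 'bond' "$@"
}
```

For example: dmesg | bond_messages, or bond_messages /var/log/messages.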

Here is the output of ifconfig :

% ifconfig -a
bond0     Link encap:Ethernet  HWaddr AA:F3:FC:DA:74:BB
          inet addr:192.168.108.30  Bcast:192.168.108.255  Mask:255.255.255.0
          inet6 addr: fe80::5ef3:fcff:feda:74a4/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:12169 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15745 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2794755 (2.6 MiB)  TX bytes:2601601 (2.4 MiB)

eth0      Link encap:Ethernet  HWaddr AA:F3:FC:DA:74:BB
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:9778 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10352 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2075894 (1.9 MiB)  TX bytes:1684361 (1.6 MiB)
          Interrupt:28 Memory:92000000-92012800

eth1      Link encap:Ethernet  HWaddr BB:F3:FC:DA:83:CC
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2559 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5709 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:750032 (732.4 KiB)  TX bytes:957292 (934.8 KiB)
          Interrupt:100 Memory:c0000000-c0012800
  • Note: I have seen an issue where eth1 had a different address. I changed its config to NM_CONTROLLED="no".
  • Note: You may also set up bonding manually. It won’t persist after a reboot, but it is very simple and may be useful to explore different modes:
ifconfig bond0 up
ifenslave bond0 eth0 eth1
ifconfig bond0 192.168.100.1 netmask 255.255.255.0 up
route add default gw 192.168.1.1 bond0

According to the RHEL documentation you can explore the modes (0, 1, 2, 3, 4, 5, 6) by modifying the values in files under /sys/class/net/. However, I was not able to get the expected result. The example given was:

echo balance-alb > /sys/class/net/bond0/bonding/mode
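
One possible explanation, which is an assumption and was not tested for this post: according to the kernel bonding documentation, the mode can only be changed while the bond interface is down, and bringing the bond down releases its slaves, so they have to be attached again afterwards. The sketch below only prints the command sequence so it can be reviewed first; bond_mode_plan is a made-up helper name, and the default interface names are the ones used in this post.

```shell
# bond_mode_plan MODE [BOND] [SLAVE...]
# Hypothetical helper: prints (does not run) the commands to switch BOND
# to MODE. Assumes the mode may only be written while the bond is down.
# Defaults: BOND=bond0, SLAVEs=eth0 eth1.
bond_mode_plan() {
    [ -n "${1:-}" ] || { echo "usage: bond_mode_plan MODE [BOND] [SLAVE...]" >&2; return 1; }
    mode=$1
    bond=${2:-bond0}
    if [ "$#" -ge 2 ]; then shift 2; else shift; fi
    [ "$#" -gt 0 ] || set -- eth0 eth1
    echo "ifconfig ${bond} down"                                # releases the slaves
    echo "echo ${mode} > /sys/class/net/${bond}/bonding/mode"   # change the mode
    echo "ifconfig ${bond} up"
    echo "ifenslave ${bond} $*"                                 # attach the slaves again
}
```

Run bond_mode_plan balance-alb to see the plan, and pipe it to sh as root from the local console once it looks right. Routes through bond0 may need to be added again after the interface has been down.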

Validate that the aggregation is working by checking the status. If there is no partner key or partner MAC address, assume the switch is not configured.

 

% cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 2
        Actor Key: 17
        Partner Key: 43
        Partner Mac Address: 00:d0:03:56:74:00

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: AA:f3:fc:da:67:BB
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: AA:f3:fc:da:85:CC
Aggregator ID: 1
Slave queue ID: 0
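
The partner check described above can be scripted. The sketch below reads the status file and reports whether an LACP partner was seen. The function names are made up for this example, and it assumes that a missing partner shows up as an absent or all-zero “Partner Mac Address”, which is what the driver typically prints when the switch does not answer.

```shell
# lacp_partner_mac [FILE]
# Hypothetical helper: print the first "Partner Mac Address" value
# found in FILE (default: /proc/net/bonding/bond0).
lacp_partner_mac() {
    file=${1:-/proc/net/bonding/bond0}
    awk '/Partner Mac Address:/ { print $NF; exit }' "$file"
}

# lacp_partner_check [FILE]
# Succeeds if a non-zero partner MAC is present, fails otherwise.
lacp_partner_check() {
    mac=$(lacp_partner_mac "$@")
    case "$mac" in
        ""|00:00:00:00:00:00)
            echo "no LACP partner (check the switch configuration)"
            return 1 ;;
        *)
            echo "LACP partner: $mac" ;;
    esac
}
```

Calling lacp_partner_check with no argument reads /proc/net/bonding/bond0.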

 

Hope this helps someone, anyone, even me for the next config 🙂
Rejean.

11 Responses to “Network card aggregation with Linux RHEL 6.x”

  1. Rejean says:

    I had some issues lately with this configuration. When adding options (for DNS and DOMAIN) to the /etc/sysconfig/network-scripts/ifcfg-* files, I have seen the system become unresponsive, to the point where a hard reboot was required. I can’t make sense of this. It was on SL61.

  2. Rejean says:

    Beware: While I was testing this config, as soon as I saved the first ifcfg-eth0 config file, it was reloaded and I lost all network access… I do not understand yet why that is. It should wait for the next reboot… I will post more as I know more.

  3. Rejean says:

    I had an issue with /etc/resolv.conf, which was constantly being erased by NetworkManager.
    I decided to disable NetworkManager and enable the older “network” service.
    In the config I also removed the interfaces from NetworkManager control:

    /etc/sysconfig/network-scripts/ifcfg-eth*
    NM_CONTROLLED="no"

    I even decided to stop using the NetworkManager service:
    chkconfig NetworkManager off
    chkconfig network on

    or by using the setup tool in the System services menu.

    But changing the config live will still cause the network interface to stop… Again, you do need to do this from the console.

  4. Nick says:

    Hi,
    Thanks for the post, this is what I have configured and it is working fine.
    I have added one VLAN on top of bond0 to test.
    It is also working fine.

    However, as soon as I disconnect either one port or the other, I lose network connectivity.

    Do you have any clue where the problem could be?

  5. Rejean says:

    Thank you Nick for the comment. I don’t have enough information to answer you; it could be so many things. Are you using NetworkManager or the network service? Well, good luck.

  6. Nick says:

    Hey Rejean,
    Thanks for your reply. No, I have disabled NetworkManager. I work in an institute where we use user-defined DHCP options, and NetworkManager just prevents using dhcp-exit-hooks to benefit from some DHCP options…
    whatever…

    So in fact, I have a server with a dual 10GbE NIC card (from Solarflare). The Solarflare drivers are up to date (latest release of June or July 2012…), and the bonding driver is the one from RHEL 6.3 (3.6.0). Behind that I have two Cisco Nexus 7010s. I have created a vPC (Virtual Port-Channel) so that it is seen as a standard port-channel between both Nexus switches and the server. The port-channel sets the ports in access mode, so no trunking, and a VLAN ID is assigned.

    I have created a bond interface, reconfigured my eth0/1 interfaces to be part of the bond, and gave it an IP address.
    After applying the config and restarting the whole thing, pinging any IP on the network works fine.

    Now of course, I wanted to make sure that redundancy works, so I unplugged one or the other of the two fibers. Boom… it stops pinging.

    Normally, I would expect pinging to continue. I would even accept losing a ping in the middle… but nope… it just stops…

    I did a cat /proc/net/bonding/bond0 and all parameters look pretty good. I tested several scenarios with hash calculation for layer2 only, and then layer3+4… nothing changes. The miimon parameters are as described in your post, and on several other websites I went through.

    I even tried to set up the port-channel as a trunk, with a single VLAN spanned down… and created a VLAN interface on top of bond0, but got the same results.

    Nexus-wise, the settings are exactly the same as for the other virtual port-channels.
    I even have a Checkpoint FW (which, by the way, runs an old version of RHEL 4 if my memory is good), and I compared the settings to reproduce what CP does when creating bonding + VLAN on top… exactly the same kind of output as in /proc/net/bonding/bond0…

    I spent all day yesterday trying to set this up for an NFS server, and I am so angry I could not get it working… I have new servers coming pretty soon, on which I will have to set up the same thing…

    Better get it done before they get here.

    Any help appreciated.

  7. Rejean says:

    Hi Nick. My guess would be to check the switch! It’s been quite straightforward on Linux for a while now. A few glitches now and then, but in your case it’s major. Check that the switch ports are configured for LACP. Both sides need to be!!

    It’s as if, when a port is disconnected, the switch loses its path to the NIC instead of forwarding to the second one. Do both NICs have the same MAC address?

    Good luck Nick, I know you speak French 😉

  8. Nick says:

    😀 well played 🙂
    For info, I had to stop, because I needed to put the server online.
    I will see about getting another machine with dual 10GbE to run a test.
    I am on leave next week, so no chance to test.
    More from 28/08 onwards!

    On the LACP side, the Nexus looked fine. Now that I have put the server back with a single link in access mode, I can’t see anything anymore.
    To be continued in the next episode 🙂

  9. Richard says:

    Thank you so much. This just saved me with our setup, an Intel CNA X520-DA2 card. I was getting connectivity, but my address was assigned to eth0 instead of bond0, and the link aggregation was not working. I turned off NetworkManager and turned on the old network service, added the line NM_CONTROLLED=no to both ifcfg-eth0 and ifcfg-eth1, restarted the network service, and bam, it was working.

  10. Eli says:

    Not meaning to nitpick, but your description of miimon is incorrect. ARP is not used in MII monitoring mode. You have the interval correct (the value is in milliseconds), but the monitoring mode tests the local interface using ioctls or netif_carrier_ok … (the driver reports whether the carrier is present, i.e. link status). In fact, ARP monitoring should not be used in conjunction with MII monitoring, as per bonding.txt.

  11. Rejean says:

    Thanks Eli for reporting this. I am not even sure which lines of the config you consider wrong. In any case, this is the config I use, and it is what I understood from the research I did back then. If you care to point out the line in error and what it should be, so we can all benefit from what you propose, I will check it out. Thank you.
