I upgraded a CHR cluster with three main objectives: reducing costs, improving network redundancy, and making the CHR instances easier to administer. As explained in previous posts, CHR can run on many popular hypervisors, and most users are having great success using Hyper-V Failover clusters or vSphere HA to provide highly available routers without depending on VRRP or other gateway redundancy protocols.
These virtual routers currently provide two main services besides routing for ISP customers. They act as PPPoE concentrators for FTTH users, and they provide traffic shaping and policing depending on the customer's service plan.
For this node, I will use a 32-core Dell R730 with 32 GB of RAM and 500 GB of RAID 10 storage. In future posts, new hosts will be added to the cluster.
This server comes with a 4 port Gigabit Ethernet NIC, which could be used without any issues with the ixgbe driver.
My first idea was to use two ports in an LACP bundle and the other two in separate port groups.
I had previous NetFlow analysis showing predictable traffic behavior: most of the bandwidth usage was going to and from a CDN peer of the ISP network. Customers had a mix of public addresses and private addresses from the Class B segment, and they were being moved to CG-NAT ranges. In other words, traffic from a specific set of addresses was going to and from another specific set of addresses.
Why not configure two port-channels instead of using separate port groups? I tested this, and due to the nature of the IP addressing on the customer side of the routers, none of the available LACP hashing modes achieved a decent distribution across both links of the port-channel.
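To illustrate why a small, fixed set of address pairs can defeat hash-based load balancing, here is a minimal sketch. It assumes a simplified source/destination IP hash (XOR of both addresses, modulo the number of links), which is not the exact algorithm used by ESXi or the switch. The addresses and per-flow rates are hypothetical and were chosen only to show the effect.

```python
import ipaddress
from collections import Counter


def link_for_flow(src: str, dst: str, n_links: int = 2) -> int:
    """Pick a member link with a simplified src/dst IP hash (an assumption,
    not the hash of any specific vendor): XOR both addresses, modulo links."""
    s = int(ipaddress.ip_address(src))
    d = int(ipaddress.ip_address(dst))
    return (s ^ d) % n_links


def load_per_link(flows, n_links: int = 2) -> dict:
    """Sum the bandwidth (Mbps) that the hash places on each member link."""
    load = Counter({link: 0 for link in range(n_links)})
    for src, dst, mbps in flows:
        load[link_for_flow(src, dst, n_links)] += mbps
    return dict(load)


# Hypothetical flows: (CG-NAT source, CDN destination, Mbps).
# A handful of address pairs carries almost all of the traffic.
flows = [
    ("100.64.0.2", "203.0.113.10", 1500),
    ("100.64.0.4", "203.0.113.10", 1200),
    ("100.64.0.6", "203.0.113.12", 900),
    ("100.64.0.3", "203.0.113.10", 300),
    ("100.64.0.5", "203.0.113.12", 300),
]

print(load_per_link(flows))  # {0: 3600, 1: 600}
```

With these made-up numbers, one link carries 3600 Mbps while the other carries 600 Mbps. The hash is computed per flow, so when a few heavy address pairs happen to land on the same link, adding a second member to the bundle does not help much.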
So, for the purposes of this cluster, I added an Intel X520 dual-port SFP+ card, providing 20 Gbps of connectivity to the CHR instances. Peak bandwidth usage was around 4200 Mbps, so this card leaves more than enough room for future growth.
By default, the Intel X520 only supports Intel-branded SFP+ modules, although this behavior can be changed by configuring the kernel module. However, for this particular scenario, where both ports will be connected to a top-of-rack Dell Force10 S4048-ON switch, I chose to use DAC cables to keep things simple.
The server runs ESXi 6.5 as its hypervisor. After booting, I noticed the new NICs were recognized as vmnic5 and vmnic6, but they were using the ixgbe driver and only establishing links at 1 Gbps.
I downloaded the ixgben driver, which is provided by VMware itself, and uploaded it to ESXi via SFTP.
For all my SFTP needs, my tool of choice is always the Bitvise SSH Client.
Once uploaded, I installed the offline bundle with the following command line.
[root@esxi] esxcli software vib install -d "/complete/path/to/the/driver/bundle"
Then I followed the KB article to disable the native ixgbe driver and use the new one. First, I placed the host in maintenance mode, and then I executed the following command to disable the driver.
[root@esxi] esxcli system module set --enabled=false --module=ixgbe
After a reboot, the new ixgben driver was loaded, and the NICs were establishing links at 10 Gbps.
I added the new NICs to the previously created virtual switches, checked the correct assignments of port groups, and then migrated the VMs to this host.