Mellanox OFED Installation

The Mellanox OFED (OpenFabrics Enterprise Distribution) is a software stack that enables RDMA and kernel-bypass applications on the adapter. Installing this package is required to make proper use of the hardware.

The Mellanox 5.1 OFED for CentOS 8.2: https://www.mellanox.com/downloads/ofed/MLNX_OFED-5.1-2.3.7.1/MLNX_OFED_LINUX-5.1-2.3.7.1-rhel8.2-x86_64.tgz

The Mellanox 4.9 OFED for CentOS 8.2: https://www.mellanox.com/downloads/ofed/MLNX_OFED-4.9-0.1.7.0/MLNX_OFED_LINUX-4.9-0.1.7.0-rhel8.2-x86_64.tgz

If you have an old adapter (ConnectX-3 Pro, ConnectX-3, Connect-IB) which requires the 4.x OFED but are using a newer distro such as CentOS 8.2, you will need to use the LTS versions.

The install script requires the following packages to work:

yum install perl-Term-ANSIColor tcsh tcl gcc-gfortran tk

To build kernel support, the following packages will also be needed:

yum install perl-File-Temp createrepo elfutils-libelf-devel rpm-build lsof python36 python36-devel kernel-devel-4.18.0-193.14.2.el8_2.x86_64 make gdb-headless gcc kernel-rpm-macros
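
With the prerequisites in place, a typical install sequence looks like the following (shown for the 5.1 tarball above; --add-kernel-support rebuilds the packages against your running kernel and is only needed when the prebuilt packages don't match it):

# tar xzf MLNX_OFED_LINUX-5.1-2.3.7.1-rhel8.2-x86_64.tgz
# cd MLNX_OFED_LINUX-5.1-2.3.7.1-rhel8.2-x86_64
# ./mlnxofedinstall --add-kernel-support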

After installing, run /etc/init.d/openibd restart to restart the driver. You will want to do this whenever you change any of the kernel module's settings in order for them to take effect.

Quick Usage

Here are some useful commands for working with InfiniBand.

ofed_info: Shows the OFED version that is installed.

# ofed_info -s
MLNX_OFED_LINUX-5.1-2.3.7.1:

ibcheckwidth: Checks all links on the fabric for valid link width (reports links that have trained down, e.g. to 1x).

ibdev2netdev: Maps adapter ports to network devices.

# ibdev2netdev -v
0000:05:00.0 mlx4_0 (MT26438 - MT1008X01087) FALCON QDR      fw 2.9.1000 port 1 (ACTIVE) ==> ib0 (Down)
0000:05:00.0 mlx4_0 (MT26438 - MT1008X01087) FALCON QDR      fw 2.9.1000 port 2 (DOWN  ) ==> enp5s0d1 (Down)

ibdiagnet: Shows diagnostic information about the network. Example commands:

# ibdiagnet --pc -P all=1 -get_phy_info --extended_speed all --pm_per_lane \
     --get_cable_info --cable_info_disconnected --pm_pause_time 600 -o /tmp/ibdiagnet_ibm
# ibdiagnet -P symbol_error_counter=1

ibnetdiscover: Discovers the network topology. Pipe stdout to /dev/null to look for any errors.

# ibnetdiscover > /dev/null

ibqueryerrors: Looks for any errors.

# ibqueryerrors -s PortXmitWait,LinkErrorRecoveryCounter,PortRcvSwitchRelayErrors,\
LinkDownedCounter,PortXmitDiscards,VL15Dropped,PortRcvErrors,PortRcvRemotePhysicalErrors

ibstat: Shows port statistics.
ibswitches: Shows switches on the network.
ibv_devinfo: Shows the InfiniBand device information.

IP over InfiniBand (IPoIB)

On Linux, the IPoIB driver is ib_ipoib. Like any other kernel module, its load-time options can be set in /etc/modprobe.d/ib_ipoib.conf.
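
To see which load-time parameters ib_ipoib accepts (including the queue size options used below), query the module:

# modinfo ib_ipoib | grep parm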

Issues with dropped packets

I've been beating my head against the wall trying to determine why the InfiniBand adapter on one particular server is dropping packets:

# cat /sys/class/net/ib0/statistics/{tx,rx}_dropped
4069
141287

The ring buffer sizes appear to be smaller than on other servers: servers that have no issues use 512/512 for RX/TX respectively, while this one is at 256/128.

# ethtool -g ib0
Ring parameters for ib0:
Pre-set maximums:
RX:		8192
RX Mini:	0
RX Jumbo:	0
TX:		8192
Current hardware settings:
RX:		256
RX Mini:	0
RX Jumbo:	0
TX:		128

Set the buffer sizes one time (not persistent across reboots):

# ethtool -G ib0 rx 8192
# ethtool -G ib0 tx 4096

Or, set them persistently in the kernel modprobe.d file:

# echo "options ib_ipoib recv_queue_size=8192 send_queue_size=4096" >> /etc/modprobe.d/ib_ipoib.conf
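
Since the modprobe.d options only apply when the module is loaded, restart the driver afterwards for the new queue sizes to take effect:

# /etc/init.d/openibd restart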

IPoIB Bonding

At CHGI, there is a GPFS filesystem whose storage nodes each use four InfiniBand links bonded as one interface. However, it's not working quite right: the bond randomly stops working.
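
For context, Linux bonding over IPoIB only supports active-backup mode, which matches the "making interface ibX the new active one" messages below. A minimal sketch of what such a bond definition looks like, assuming RHEL-style ifcfg files (the actual CHGI configuration may differ):

# /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical sketch)
DEVICE=bond0
TYPE=Bond
BONDING_OPTS="mode=active-backup miimon=100"
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-ib0 (one per slave, ib0 through ib3)
DEVICE=ib0
TYPE=InfiniBand
MASTER=bond0
SLAVE=yes
ONBOOT=yes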

dmesg shows this on the storage node:

ib2: ipoib_cm_handle_tx_wc: failed cm send event

Then, the bonded interface (bond0) stops responding to the quorum node. It can still reach the other storage nodes and certain other hosts via the bonded link.

The fix when this happens is to toggle ib2 down and up, which causes the bonded interface to fail over to another link as the active one. Once this happens, the bonded link can talk with the quorum node again. Here, you can see me toggling every InfiniBand link until I hit ib2, which causes the bond to make ib3 the active link.
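
The toggle itself is just bouncing the interface (the exact commands weren't captured at the time; this is the equivalent):

# ip link set ib2 down
# ip link set ib2 up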

[Mon Jul 27 10:25:09 2020] bonding: bond0: link status definitely down for interface ib0, disabling it         
[Mon Jul 27 10:25:11 2020] bonding: bond0: link status definitely up for interface ib0, 56000 Mbps full duplex.
[Mon Jul 27 10:25:15 2020] bonding: bond0: link status definitely down for interface ib1, disabling it         
[Mon Jul 27 10:25:17 2020] bonding: bond0: link status definitely up for interface ib1, 56000 Mbps full duplex.
[Mon Jul 27 10:25:24 2020] bonding: bond0: link status definitely down for interface ib2, disabling it         
[Mon Jul 27 10:25:24 2020] bonding: bond0: making interface ib3 the new active one.                            
[Mon Jul 27 10:25:26 2020] bonding: bond0: link status definitely up for interface ib2, 56000 Mbps full duplex.

The quorum node only has one InfiniBand link. When this occurred, pings from the storage node were able to reach the quorum node, and the quorum node did send replies, but they never reached the storage node. Most likely the switch is getting confused and sending the replies back on another interface (something to test if this happens again?).
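
One way to test that theory the next time it happens (a sketch; <quorum-ip> is a placeholder): capture ICMP on every slave interface on the storage node while pinging, and see whether the echo replies land on a link other than the active one:

# for i in ib0 ib1 ib2 ib3; do tcpdump -c 20 -ni $i icmp & done
# ping <quorum-ip>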

[root@essio1 ~]# dmesg -T | grep -B 3 "making interface ib"                                                     
[Sun Feb 16 15:12:57 2020] device bond0 left promiscuous mode                                                   
[Sun Feb 16 15:12:57 2020] device ib0 left promiscuous mode                                                     
[Sun Feb 16 18:02:12 2020] bonding: bond0: link status definitely down for interface ib0, disabling it          
[Sun Feb 16 18:02:12 2020] bonding: bond0: making interface ib1 the new active one.                             
--                                                                                                              
[Mon May  4 12:08:56 2020] NOHZ: local_softirq_pending 08                                                       
[Mon May  4 12:08:56 2020] bonding: bond0: link status definitely up for interface ib0, 56000 Mbps full duplex. 
[Mon May  4 12:09:08 2020] bonding: bond0: link status definitely down for interface ib1, disabling it          
[Mon May  4 12:09:08 2020] bonding: bond0: making interface ib2 the new active one.                             
--                                                                                                              
[Mon Jul 27 10:25:15 2020] bonding: bond0: link status definitely down for interface ib1, disabling it          
[Mon Jul 27 10:25:17 2020] bonding: bond0: link status definitely up for interface ib1, 56000 Mbps full duplex. 
[Mon Jul 27 10:25:24 2020] bonding: bond0: link status definitely down for interface ib2, disabling it          
[Mon Jul 27 10:25:24 2020] bonding: bond0: making interface ib3 the new active one.

The failover usually occurs after a "failed cm send event" error, but not always, as you can see by comparing the timestamps.
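
To line the failovers up against the errors chronologically, grep for both patterns at once:

# dmesg -T | grep -E "failed cm send|making interface"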

[root@essio1 ~]# dmesg -T | grep "failed cm send"                                                            
[Mon Feb 24 09:57:43 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=304 vend_err 81)
[Tue Apr 14 14:05:10 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=154 vend_err 81)
[Tue Apr 14 14:07:13 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=262 vend_err 81)
[Tue Apr 14 14:09:15 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=287 vend_err 81)
[Tue Apr 14 14:09:20 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=223 vend_err 81)
[Tue Apr 14 14:09:45 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=109 vend_err 81)
[Mon May  4 11:34:14 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=64 vend_err 81) 
[Mon May  4 15:50:20 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=225 vend_err 81)
[Mon May  4 15:51:07 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=407 vend_err 81)
[Tue Jun 16 23:41:55 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=154 vend_err 81)
[Tue Jun 16 23:41:55 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=353 vend_err 81)
[Wed Jun 17 11:04:36 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=325 vend_err 81)
[Wed Jun 17 11:05:50 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=181 vend_err 81)
[Wed Jun 17 11:08:48 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=157 vend_err 81)
[Wed Jun 17 11:16:19 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=166 vend_err 81)
[Wed Jun 17 11:16:42 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=323 vend_err 81)
[Mon Jun 22 11:13:27 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=158 vend_err 81)
[Mon Jun 22 11:14:55 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=238 vend_err 81)
[Mon Jun 22 11:18:28 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=255 vend_err 81)
[Wed Jun 24 16:57:30 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=167 vend_err 81)
[Wed Jun 24 17:03:09 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=131 vend_err 81)
[Mon Jul  6 10:49:46 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=15 vend_err 81) 
[Mon Jul  6 11:08:34 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=72 vend_err 81) 
[Tue Jul 14 13:52:32 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=122 vend_err 81)
[Sun Jul 26 06:20:52 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=269 vend_err 81)

See Also