==Commands==

Here are some useful commands for working with Infiniband.
  
{| class="wikitable"
! width="25%" |Command
! width="75%" |Description
|-
|{{code|ibcheckwidth}}||Checks all links on the fabric and reports any running at degraded (1x) width.
|-
|{{code|ibdiagnet}}||Shows diagnostic information about the network. Example commands:
{{highlight|lang=terminal|code=
# ibdiagnet --pc -P all=1 -get_phy_info --extended_speed all --pm_per_lane \
     --get_cable_info --cable_info_disconnected --pm_pause_time 600 -o /tmp/ibdiagnet_ibm
# ibdiagnet -P symbol_error_counter=1
}}
|-
|{{code|ibnetdiscover}}||Discovers the network topology. Pipe stdout to {{code|/dev/null}} to look for any errors.
{{highlight|lang=terminal|code=
# ibnetdiscover > /dev/null
}}
|-
|{{code|ibqueryerrors}}||Looks for any errors.
{{highlight|lang=terminal|code=
# ibqueryerrors -s PortXmitWait,LinkErrorRecoveryCounter,PortRcvSwitchRelayErrors,\
LinkDownedCounter,PortXmitDiscards,VL15Dropped,PortRcvErrors,PortRcvRemotePhysicalErrors
}}
|-
|{{code|ibstat}}||Shows port statistics.
|-
|{{code|ibswitches}}||Shows switches on the network.
|-
|{{code|ibv_devinfo}}||Shows the Infiniband device information.
|}
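
A quick fabric health check can chain several of the commands above together (just a sketch; adjust flags as needed):
{{highlight|lang=terminal|code=
# ibstat
# ibswitches
# ibnetdiscover > /dev/null
# ibqueryerrors
}}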
  
==IP over Infiniband (IPoIB)==

On Linux, the driver is {{code|ib_ipoib}}. Like all other kernel modules, its load-time options can be set in {{code|/etc/modprobe.d/ib_ipoib.conf}}.
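
For a quick check of what the module is currently running with, the queue sizes are exposed as module parameters under sysfs (a minimal sketch, assuming the {{code|recv_queue_size}} and {{code|send_queue_size}} parameters referenced below):
{{highlight|lang=terminal|code=
# cat /sys/module/ib_ipoib/parameters/recv_queue_size
# cat /sys/module/ib_ipoib/parameters/send_queue_size
}}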
  
===Issues with dropped packets===
 
I've been beating my head trying to determine why an infiniband adapter on one particular server is dropping packets:

{{highlight|lang=terminal|code=
# cat /sys/class/net/ib0/statistics/{tx,rx}_dropped
4069
141287
}}

The ring buffer sizes appear to be smaller than on other servers. Servers that have no issues have buffer sizes of 512/512 for rx/tx respectively.

{{highlight|lang=terminal|code=
# ethtool -g ib0
Ring parameters for ib0:
Pre-set maximums:
RX:		8192
RX Mini:	0
RX Jumbo:	0
TX:		8192
Current hardware settings:
RX:		256
RX Mini:	0
RX Jumbo:	0
TX:		128
}}

Set the buffer size one time:

{{highlight|lang=terminal|code=
# ethtool -G ib0 rx 8192
# ethtool -G ib0 tx 4096
}}

Or, set it persistently in the kernel modprobe.d file:

{{highlight|lang=terminal|code=
# echo "options ib_ipoib recv_queue_size=8192 send_queue_size=4096" >> /etc/modprobe.d/ib_ipoib.conf
}}
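
To confirm that the larger rings actually help, one option is to re-sample the same drop counters periodically after the change (a sketch; {{code|ib0}} assumed as above):
{{highlight|lang=terminal|code=
# while true; do cat /sys/class/net/ib0/statistics/{tx,rx}_dropped; sleep 300; done
}}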
  
==IPoIB Bonding==

At CHGI, there is a GPFS filesystem whose storage nodes each use four infiniband links bonded into a single interface. However, it's not working quite right: the bond randomly stops working.

dmesg shows this on the storage node:
{{Highlight
| code = ib2: ipoib_cm_handle_tx_wc: failed cm send event
| lang = text
}}
Then, the bonded ib0 interface stops responding to the quorum node. It can still reach the other storage nodes and certain other hosts via the bonded link.

The fix when this happens is to toggle ib2 down and up. This causes the bonded interface to use another link as the active one, and once that happens the bonded link can talk with the quorum node again. Here, you can see me toggling every infiniband link until I hit ib2, which causes the bond to use ib3 as the active link.
{{Highlight
| code = [Mon Jul 27 10:25:09 2020] bonding: bond0: link status definitely down for interface ib0, disabling it
[Mon Jul 27 10:25:11 2020] bonding: bond0: link status definitely up for interface ib0, 56000 Mbps full duplex.
[Mon Jul 27 10:25:15 2020] bonding: bond0: link status definitely down for interface ib1, disabling it
[Mon Jul 27 10:25:17 2020] bonding: bond0: link status definitely up for interface ib1, 56000 Mbps full duplex.
[Mon Jul 27 10:25:24 2020] bonding: bond0: link status definitely down for interface ib2, disabling it
[Mon Jul 27 10:25:24 2020] bonding: bond0: making interface ib3 the new active one.
[Mon Jul 27 10:25:26 2020] bonding: bond0: link status definitely up for interface ib2, 56000 Mbps full duplex.
| lang = text
}}
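The toggle itself is just a link down/up followed by a check of which slave the bond made active (a minimal sketch, assuming iproute2 and the {{code|bond0}}/{{code|ib2}} names above):
{{Highlight
| code = # ip link set ib2 down
# ip link set ib2 up
# grep "Currently Active Slave" /proc/net/bonding/bond0
| lang = terminal
}}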
The quorum node only has one infiniband link. When this occurred, pings from the storage node were able to reach the quorum node, and the quorum node does send a reply, but the reply never reaches the storage node. Most likely the switch is getting confused and is sending the reply back on another interface. ('''something to test if this happens again?''')
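If it does happen again, one way to test that theory is to capture ICMP on every slave while pinging the quorum node and see which interface, if any, the reply actually arrives on (a sketch; the quorum node address is a placeholder):
{{Highlight
| code = # for i in ib0 ib1 ib2 ib3; do tcpdump -ni $i icmp -c 10 -w /tmp/$i.pcap & done
# ping -c 5 QUORUM_NODE_IP
| lang = terminal
}}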
{{Highlight
| code = [root@essio1 ~]# dmesg -T {{!}} grep -B 3 "making interface ib"
[Sun Feb 16 15:12:57 2020] device bond0 left promiscuous mode
[Sun Feb 16 15:12:57 2020] device ib0 left promiscuous mode
[Sun Feb 16 18:02:12 2020] bonding: bond0: link status definitely down for interface ib0, disabling it
[Sun Feb 16 18:02:12 2020] bonding: bond0: making interface ib1 the new active one.
--
[Mon May  4 12:08:56 2020] NOHZ: local_softirq_pending 08
[Mon May  4 12:08:56 2020] bonding: bond0: link status definitely up for interface ib0, 56000 Mbps full duplex.
[Mon May  4 12:09:08 2020] bonding: bond0: link status definitely down for interface ib1, disabling it
[Mon May  4 12:09:08 2020] bonding: bond0: making interface ib2 the new active one.
--
[Mon Jul 27 10:25:15 2020] bonding: bond0: link status definitely down for interface ib1, disabling it
[Mon Jul 27 10:25:17 2020] bonding: bond0: link status definitely up for interface ib1, 56000 Mbps full duplex.
[Mon Jul 27 10:25:24 2020] bonding: bond0: link status definitely down for interface ib2, disabling it
[Mon Jul 27 10:25:24 2020] bonding: bond0: making interface ib3 the new active one.
| lang = terminal
}}
The bond failure always occurs after a failed cm send event error, but a failed cm send event does not always lead to a failure, as you can see from the timestamps.
{{Highlight
| code = [root@essio1 ~]# dmesg -T {{!}} grep "failed cm send"
[Mon Feb 24 09:57:43 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=304 vend_err 81)
[Tue Apr 14 14:05:10 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=154 vend_err 81)
[Tue Apr 14 14:07:13 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=262 vend_err 81)
[Tue Apr 14 14:09:15 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=287 vend_err 81)
[Tue Apr 14 14:09:20 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=223 vend_err 81)
[Tue Apr 14 14:09:45 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=109 vend_err 81)
[Mon May  4 11:34:14 2020] ib1: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=64 vend_err 81)
[Mon May  4 15:50:20 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=225 vend_err 81)
[Mon May  4 15:51:07 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=407 vend_err 81)
[Tue Jun 16 23:41:55 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=154 vend_err 81)
[Tue Jun 16 23:41:55 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=353 vend_err 81)
[Wed Jun 17 11:04:36 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=325 vend_err 81)
[Wed Jun 17 11:05:50 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=181 vend_err 81)
[Wed Jun 17 11:08:48 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=157 vend_err 81)
[Wed Jun 17 11:16:19 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=166 vend_err 81)
[Wed Jun 17 11:16:42 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=323 vend_err 81)
[Mon Jun 22 11:13:27 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=158 vend_err 81)
[Mon Jun 22 11:14:55 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=238 vend_err 81)
[Mon Jun 22 11:18:28 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=255 vend_err 81)
[Wed Jun 24 16:57:30 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=167 vend_err 81)
[Wed Jun 24 17:03:09 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=131 vend_err 81)
[Mon Jul  6 10:49:46 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=15 vend_err 81)
[Mon Jul  6 11:08:34 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=72 vend_err 81)
[Tue Jul 14 13:52:32 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=122 vend_err 81)
[Sun Jul 26 06:20:52 2020] ib2: ipoib_cm_handle_tx_wc: failed cm send event (status=12, wrid=269 vend_err 81)
| lang = terminal
}}
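To eyeball how often a failover actually follows one of these errors, the two greps above can be merged into one interleaved view (same commands as above, just combined):
{{Highlight
| code = # dmesg -T {{!}} grep -E "failed cm send{{!}}making interface ib"
| lang = terminal
}}
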
==See Also==

*https://ieeexplore.ieee.org/document/7092697
*IP over Infiniband info - https://www.kernel.org/doc/Documentation/infiniband/ipoib.txt
