GPFS


The General Parallel File System (GPFS), now called IBM Spectrum Scale, is IBM's proprietary clustered parallel filesystem.

Installation

GPFS is provided by the gpfs.* packages.

Command line usage

All GPFS management utilities are located under the /usr/lpp/mmfs/bin/ directory and commands typically start with mm.
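
Since these utilities are not on the default PATH, it can be convenient to add the directory to your shell session; a minimal sketch:

# export PATH=$PATH:/usr/lpp/mmfs/bin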

Commonly used commands:

mmccr flist
    Lists the files stored in the Cluster Configuration Repository (CCR).
mmccr lsnodes
    Shows all quorum nodes.
mmshutdown -a
    Stops GPFS on all nodes in the cluster.
mmstartup -a
    Starts GPFS on all nodes in the cluster.
mmmount [gpfs0|/fileset]
    Mounts a particular GPFS device or fileset after startup. Do not mount GPFS using the regular mount command as it probably won't work.
mmlsfs all
    Shows all file system attributes on the GPFS cluster.
mmlssnapshot gpfs1
mmlssnapshot gpfs1 -j chgi_data -d
    Shows all snapshots in a particular filesystem. You can also limit the output to a fileset with -j and show snapshot sizes with -d.
mmlsmount all_local
    Shows whether the file systems on this node are being mounted by other nodes in the cluster.
mmlsfileset gpfs0 [fileset]
    Shows all filesets on the file system gpfs0.
mmlspdisk all --not-ok
    Shows all failed disks.
mmhealth cluster show [node]
    Shows the cluster status. Does not exist before GPFS 4.2.3 (?).
mmedquota -j gpfs1:fileset
    Changes the quota of a fileset on the gpfs1 device.
mmcrfileset gpfs1 fileset --inode-space new --inode-limit 4194304:2097152
    Creates an independent fileset with its own inode space and inode limits. The two numbers are Limit:Allocated.
mmdiag --config
    Shows the GPFS system parameters.
mmdiag --waiters
    Shows processes that are blocking IO. Useful for troubleshooting the causes of long waiters.

The IBM Spectrum Scale RAID GUI utilities are at /usr/lpp/mmfs/gui/cli/. Certain actions taken on the web interface will trigger one of these executables.

Snapshots

Snapshot management is done using the following utilities:

mmlssnapshot gpfs0
    Lists all snapshots on the gpfs0 device.
mmdelsnapshot gpfs0 snapshot_name -j fileset
    Deletes the named snapshot of a particular fileset on the gpfs0 device.
mmcrsnapshot gpfs0 snapshot_name -j fileset
    Creates a snapshot of a particular fileset on the gpfs0 device with the provided snapshot name.
mmcrsnapshot gpfs0 snapshot_name
    Creates a global snapshot on the gpfs0 device with the provided snapshot name.

A global snapshot also includes a snapshot of every fileset.
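
For example, to take and then inspect a snapshot of a single fileset (the fileset and snapshot names below are placeholders):

# mmcrsnapshot gpfs0 daily-2020.06.01 -j home
# mmlssnapshot gpfs0 -j home -d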

Automated Snapshots

Automated snapshots should be created using the GPFS Spectrum Scale GUI tool or by creating snapshot rules associated with a fileset using the /usr/lpp/mmfs/gui/cli/mksnapassoc command. Documentation for mksnapassoc is available at https://www.ibm.com/support/knowledgecenter/en/ST5Q4U_1.6.0/com.ibm.storwize.v7000.unified.160.doc/manpages/mksnapassoc.html.

Once schedules have been created, you can list them to see when they were last executed.

# /usr/lpp/mmfs/gui/cli/lssnapassoc 
Cluster           Device Fileset      Rule  Last implemented Last update
chgi-psc.gpfs.net gpfs0  home         D7    3/4/20 1:00 AM   4/3/17 11:00 AM
chgi-psc.gpfs.net gpfs0  home         H12   3/4/20 3:20 PM   3/4/20 10:56 AM

Tasks

Create Filesystem

mmcrfs gpfs1 -F vdisk.cfg -j scatter -T /gpfs/gpfs1 -B 16M --metadata-block-size 1M

This creates a new GPFS filesystem named gpfs1, where vdisk.cfg contains the filesystem specifications:

%vdisk: vdiskName=gssio1_Data_16M_2p_1 rg=gssio1 da=DA1 blocksize=16M size=700T raidCode=8+2p diskUsage=dataOnly failureGroup=30  pool=data
%vdisk: vdiskName=gssio1_MetaData_16M_2p_1 rg=gssio1 da=DA1 blocksize=1M size=7T raidCode=3WayReplication diskUsage=metadataOnly failureGroup=30  pool=system
# %nsd:gssio1_Data_16M_2p_1:::dataOnly:30:gssio1_Data_16M_2p_1:data
gssio1_Data_16M_2p_1:::dataOnly:30::data
# %nsd:gssio1_MetaData_16M_2p_1:::metadataOnly:30:gssio1_MetaData_16M_2p_1:system
gssio1_MetaData_16M_2p_1:::metadataOnly:30::system
%vdisk: vdiskName=gssio2_Data_16M_2p_1 rg=gssio2 da=DA1 blocksize=16M size=700T raidCode=8+2p diskUsage=dataOnly failureGroup=30  pool=data
%vdisk: vdiskName=gssio2_MetaData_16M_2p_1 rg=gssio2 da=DA1 blocksize=1M size=7T raidCode=3WayReplication diskUsage=metadataOnly failureGroup=30  pool=system
# %nsd:gssio2_Data_16M_2p_1:::dataOnly:30:gssio2_Data_16M_2p_1:data
gssio2_Data_16M_2p_1:::dataOnly:30::data
# %nsd:gssio2_MetaData_16M_2p_1:::metadataOnly:30:gssio2_MetaData_16M_2p_1:system
gssio2_MetaData_16M_2p_1:::metadataOnly:30::system
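
Once mmcrfs completes, the new filesystem can be mounted and inspected with the commands from the table above; a sketch assuming the device name gpfs1:

# mmmount gpfs1
# mmlsfs gpfs1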

Node Management

Use mmlsnode to show all nodes that are part of a nodeset.

[root@tsm /]# mmlsnode
GPFS nodeset    Node list
-------------   -------------------------------------------------------
   chgi-psc     essio1-ib essio2-ib ems1-ib crick-ib powerlc001-ib powerlc002-ib transmart-ib tsm-ib node001 node002 node003 node004 node005 node006 node007 node008 node009 node010 node011 node012 node013 node014 node015 node016 node017 node018 node019 node020 node021 node022 node023 node024 node025 node026 node027 node028 node029 node030 node031 node032 node035 node033 node036 node034 theia galaxy-dev-ib ebg01-ib snyder-ib

Additional information for each node can be found when listing cluster information with mmlscluster:

[root@tsm /]# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         chgi-psc.gpfs.net
  GPFS cluster id:           2878855853427990921
  GPFS UID domain:           chgi-psc.gpfs.net
  Remote shell command:      sudo wrapper in use
  Remote file copy command:  sudo wrapper in use
  Repository type:           CCR

 Node  Daemon node name        IP address      Admin node name         Designation
-----------------------------------------------------------------------------------
   1   essio1-ib.gpfs.net      172.26.3.1      essio1-ib.gpfs.net      quorum-manager-perfmon
   2   essio2-ib.gpfs.net      172.26.3.2      essio2-ib.gpfs.net      quorum-manager-perfmon
   3   ems1-ib.gpfs.net        172.26.3.251    ems1-ib.gpfs.net        quorum-perfmon
   4   crick-ib.gpfs.net       172.26.4.252    crick-ib.gpfs.net       
   5   powerlc001-ib.gpfs.net  172.26.4.1      powerlc001-ib.gpfs.net  
   6   powerlc002-ib.gpfs.net  172.26.4.2      powerlc002-ib.gpfs.net  
   7   transmart-ib.gpfs.net   172.26.4.3      transmart-ib.gpfs.net   
   8   tsm-ib.gpfs.net         172.26.3.250    tsm-ib.gpfs.net         
   9   node001                 172.26.10.1     node001                 
  ...
  50   node034                 172.26.10.34    node034                 
  51   theia                   172.26.25.2     theia                   
  56   galaxy-dev-ib           172.26.10.116   galaxy-dev-ib           
  58   ebg01-ib                172.26.10.153   ebg01-ib                
  60   snyder-ib               172.26.10.105   snyder-ib

To add a new node:

# mmaddnode -N damona-ib
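
Assuming licensing and configuration are already in place, a sketch of bringing the new node into service and verifying it joined, using commands shown elsewhere on this page:

# mmstartup -N damona-ib
# mmlscluster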

To remove a node:

# mmdelnode -N damona-ib
Verifying GPFS is stopped on all affected nodes ...
mmdelnode: [W] Could not cleanup the following unreached nodes:
damona-ib
mmdelnode: Command successfully completed
mmdelnode: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.


Rebuild Shadow Database

The shadow database is a file used by mmbackup to quickly determine what files to back up. As per the documentation:

These databases shadow the inventory of objects in IBM Spectrum Protect so that only new changes will be backed up in the next incremental mmbackup. Failing to do so will needlessly back up some files additional times. The shadow database can also become out of date if mmbackup fails due to certain IBM Spectrum Protect server problems that prevent mmbackup from properly updating its shadow database after a backup. In these cases it is also required to issue the next mmbackup command with either the -q option or the --rebuild options.
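
Per that documentation, either option can be passed to mmbackup to force the shadow database to be reconciled; a hedged sketch:

# mmbackup gpfs0 -q          ## query the backup server and resync the shadow database
# mmbackup gpfs0 --rebuild   ## rebuild the shadow database (as attempted below)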

To rebuild the shadow database, I had to use a separate temp directory because the system temp was too small. For some reason, however, this process failed but still generated a 10 GB shadow file.

# mmbackup gpfs0 --rebuild -s /tmp2
--------------------------------------------------------
mmbackup: Shadow database rebuild of /gpfs begins at Wed Feb 19 09:35:36 MST 2020.
--------------------------------------------------------
Thu Feb 20 00:56:27 2020 mmbackup:Built query data file from TSM server: CHGI_TSM01 rc = 0
Thu Feb 20 00:56:28 2020 mmbackup:Scanning file system gpfs0
Thu Feb 20 04:55:32 2020 mmbackup:Reconstructing previous shadow file /gpfs/.mmbackupShadow.1.CHGI_TSM01.filesys from query data for CHGI_TSM01
mmbackup: tsbuhelper: parsePolicyShow: could not find mtime date
mmbackup: tsbuhelper: rebuildShadow: writing merged record fail rc = -1
mmbackup: tsbuhelper:rebuildShadow: Failed with rc=-1
Thu Feb 20 05:26:48 2020 mmbackup:/usr/lpp/mmfs/bin/tsbuhelper rebuildshadow /gpfs/.mmbackupQueryShadow.CHGI_TSM01.filesys /gpfs/.mmbackupCfg/prepFiles/list.mmbackup.1.CHGI_TSM01 /gpfs/.mmbackupShadow.1.CHGI_TSM01.filesys /tmp2 2>&1 returned rc=255.  Cannot rebuild shadow file.
Thu Feb 20 05:26:48 2020 mmbackup:Failed to reconstruct old shadow file.  Skipping CHGI_TSM01.
Thu Feb 20 05:26:48 2020 mmbackup:Done with shadow file database rebuilds
Thu Feb 20 05:26:48 2020 mmbackup:Incremental shadow database rebuild completely failed.
        TSM had 0 severe errors and returned 0. See the TSM log file for more information.
        0 files had errors,
 TSM exit status:  exit 12

----------------------------------------------------------
mmbackup: Shadow database rebuild of /gpfs completed with errors at Thu Feb 20 05:26:48 MST 2020.
----------------------------------------------------------
mmbackup: Command failed. Examine previous error messages to determine cause.

It was at 5:26 that the TSM server lost its membership in the GPFS cluster for some reason, which probably caused the failure...

User Management for IBM Spectrum Scale RAID

The main utilities for managing users are:

/usr/lpp/mmfs/gui/cli/chuser
    Changes a user's password or group membership.
/usr/lpp/mmfs/gui/cli/lsuser
    Lists all users.
/usr/lpp/mmfs/gui/cli/mkuser
    Creates a new user.
/usr/lpp/mmfs/gui/cli/rmuser
    Removes a user.

If you need to reset the password for the admin user, use the /usr/lpp/mmfs/gui/cli/chuser username -p password utility.

# /usr/lpp/mmfs/gui/cli/chuser admin -p 'a~new!password@'
EFSSG0020I The user admin has been successfully changed.
EFSSG1000I The command completed successfully.

There are other user* utilities that can be used to manage user group memberships (not something I need or care about right now).

Clear Stale Events on IBM Spectrum Scale RAID

To clear out old events, you will need to delete entries from the PostgreSQL server. On the server running the GUI, run:

# sudo -u postgres psql
psql (9.2.7)
Type "help" for help.

postgres=# select * FROM pg_stat_activity;

SELECT * FROM fscc.gss_state WHERE event_time <= '2019-12-12' ;
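
To actually clear the stale events, the corresponding delete would look something like this (a sketch, assuming the SELECT above returns the rows you want to remove):

postgres=# DELETE FROM fscc.gss_state WHERE event_time <= '2019-12-12';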

Fix Data Collection

If data collection is not happening or no data is being reported on IBM Spectrum Scale RAID, ensure that the pmsensors service is running on all GPFS nodes.

essio# systemctl restart pmsensors

If pmsensors isn't starting, or if the /opt/IBM/zimon/ZIMonSensors.cfg file is missing, ensure that the configuration file has the following contents. The colCandidates value should be set to the hostname of the GPFS management node for data collection.

colCandidates = "synergy-ib"
colRedundancy = 1
collectors = {
        host = "synergy-ib"
        port = "4739"
}
config = "/opt/IBM/zimon/ZIMonSensors.cfg"
ctdbstat = ""
daemonize = T
hostname = ""
ipfixinterface = "0.0.0.0"
logfile = "/var/log/zimon/ZIMonSensors.log"
loglevel = "info"
mmcmd = "/opt/IBM/zimon/MMCmdProxy"
mmdfcmd = "/opt/IBM/zimon/MMDFProxy"
mmpmon = "/opt/IBM/zimon/MmpmonSockProxy"
piddir = "/var/run"
release = "4.2.0-2"
sensors = {
        name = "CPU"
        period = 1
},
{
        name = "Load"
        period = 1
},
{
        name = "Memory"
        period = 1
},
{
        name = "Network"
        period = 1
},
{
        name = "Netstat"
        period = 0
},
{
        name = "Diskstat"
        period = 0
},
{
        name = "DiskFree"
        period = 600
},
{
        name = "GPFSDisk"
        period = 0
},
{
        name = "GPFSFilesystem"
        period = 1
},
{
        name = "GPFSNSDDisk"
        period = 1
        restrict = "nsdNodes"
},
{
        name = "GPFSPoolIO"
        period = 0
},
{
        name = "GPFSVFS"
        period = 1
},
{
        name = "GPFSIOC"
        period = 0
},
{
        name = "GPFSVIO"
        period = 0
},
{
        name = "GPFSPDDisk"
        period = 1
        restrict = "nsdNodes"
},
{
        name = "GPFSvFLUSH"
        period = 0
},
{
        name = "GPFSNode"
        period = 1
},
{
        name = "GPFSNodeAPI"
        period = 1
},
{
        name = "GPFSFilesystemAPI"
        period = 1
},
{
        name = "GPFSLROC"
        period = 0
},
{
        name = "GPFSCHMS"
        period = 0
},
{
        name = "GPFSAFM"
        period = 0
},
{
        name = "GPFSAFMFS"
        period = 0
},
{
        name = "GPFSAFMFSET"
        period = 0
},
{
        name = "GPFSRPCS"
        period = 0
},
{
        name = "GPFSFilesetQuota"
        period = 3600
},
{
        name = "GPFSDiskCap"
        period = 0
        restrict = "ems1-ib"
},
{
        name = "NFSIO"
        period = 0
        proxyCmd = "/opt/IBM/zimon/GaneshaProxy"
        restrict = "cesNodes"
        type = "Generic"
}
smbstat = ""

Troubleshooting

To troubleshoot issues with GPFS, you may want to take a look at the following log files:

mmfsd
    /var/adm/ras/mmfs.log.*
    /var/adm/ras/trace
Spectrum Scale GUI (mgtsrv)
    /var/log/cnlog/mgtsrv/*log

Also ensure that /var/lib/mmfs/gui/preferences.xml has the proper configuration and version number (see the web interface sections below).
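
To follow these logs while reproducing a problem (mmfs.log.latest is typically a symlink to the current GPFS log; the mgtsrv trace file is the one referenced later on this page):

# tail -f /var/adm/ras/mmfs.log.latest
# tail -f /var/log/cnlog/mgtsrv/mgtsrv-trace-log-0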

GPFS Cannot Start

First things first: check the network connectivity of your node. If the GPFS service fails, you get errors like Unexpected error from ccr fget mmsdrsfs. Return code: 149, or you cannot mount the GPFS filesystems, it could simply be a networking issue preventing things from working.

At CHGI, there is some InfiniBand weirdness. The storage nodes have four InfiniBand interfaces bonded together, and the bond sometimes gets into a state where a failover causes traffic to stop flowing between the storage node and the quorum node. The fix is to log in to each storage node that cannot talk with the quorum node and toggle each of the InfiniBand interfaces until the bonded link works again.

essio1# ifconfig ib0 down; ifconfig ib0 up
essio1# ifconfig ib1 down; ifconfig ib1 up
essio1# ifconfig ib2 down; ifconfig ib2 up
essio1# ifconfig ib3 down; ifconfig ib3 up

## on the quorum node, check using ping and restart gpfs.service if OK
ems1# ping essio1-ib
ems1# systemctl restart gpfs.service

Cluster Manager connection broke

The TSM server kept on getting disconnected every morning. Logs show the following:

Fri Feb 21 05:08:04.735 2020: [E] The TCP connection to IP address 172.26.3.2 essio2-ib.gpfs.net <c0n1> (socket 76) state is unexpected: ca_state=4 unacked=0 rto=210000
Fri Feb 21 05:08:04.749 2020: [I] tscCheckTcpConn: Sending debug data collection request to node 172.26.3.2 essio2-ib.gpfs.net
Fri Feb 21 05:08:04.750 2020: Sending request to collect TCP debug data to essio2-ib.gpfs.net localNode
Fri Feb 21 05:08:04.809 2020: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon.
Fri Feb 21 05:17:55.685 2020: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 172.26.3.2 essio2-ib.gpfs.net. Sending expel message.
Fri Feb 21 05:17:55.716 2020: Sending request to collect expel debug data to essio2-ib.gpfs.net localNode
Fri Feb 21 05:17:55.721 2020: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon.
Fri Feb 21 05:18:05.782 2020: [N] Request sent to 172.26.3.1 (essio1-ib.gpfs.net) to expel 172.26.3.2 (essio2-ib.gpfs.net) from cluster chgi-psc.gpfs.net
Fri Feb 21 05:18:05.783 2020: [N] This node will be expelled from cluster chgi-psc.gpfs.net due to expel msg from 172.26.3.250 (tsm-ib.gpfs.net)
Fri Feb 21 05:18:21.075 2020: [N] VERBS RDMA closed connection to 172.26.3.1 (essio1-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 1
Fri Feb 21 05:18:21.077 2020: [N] VERBS RDMA closed connection to 172.26.3.1 (essio1-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 3
Fri Feb 21 05:18:21.078 2020: [N] VERBS RDMA closed connection to 172.26.3.1 (essio1-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 2
Fri Feb 21 05:18:21.079 2020: [N] VERBS RDMA closed connection to 172.26.3.1 (essio1-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 11
Fri Feb 21 05:18:21.080 2020: [I] Cluster Manager connection broke. Probing cluster chgi-psc.gpfs.net
Fri Feb 21 05:18:21.112 2020: [N] VERBS RDMA closed connection to 172.26.3.2 (essio2-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 4
Fri Feb 21 05:18:21.114 2020: [N] VERBS RDMA closed connection to 172.26.3.2 (essio2-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 7
Fri Feb 21 05:18:21.115 2020: [N] VERBS RDMA closed connection to 172.26.3.2 (essio2-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 15
Fri Feb 21 05:18:21.116 2020: [N] VERBS RDMA closed connection to 172.26.3.2 (essio2-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 14
Fri Feb 21 05:18:21.582 2020: [E] Unable to contact any quorum nodes during cluster probe.
Fri Feb 21 05:18:21.583 2020: [E] Lost membership in cluster chgi-psc.gpfs.net. Unmounting file systems.
Fri Feb 21 05:18:22.307 2020: [I] Calling user exit script mmUnmountFs: event unmount, Async command /usr/lpp/mmfs/lib/mmsysmon/sendRasEventToMonitor.
Fri Feb 21 05:18:22.719 2020: [I] The quota client for file system gpfs0 has terminated.
Fri Feb 21 05:18:22 MST 2020: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor
Fri Feb 21 05:18:23.270 2020: [N] Connecting to 172.26.3.1 essio1-ib.gpfs.net <c0p0>
Fri Feb 21 05:18:33.317 2020: [N] Connecting to 172.26.3.2 essio2-ib.gpfs.net <c0p1>
Fri Feb 21 05:18:43.327 2020: [N] Connecting to 172.26.3.251 ems1-ib.gpfs.net <c0p2>

You can check the timeout times using mmfsadm dump cfgmgr:

# mmfsadm dump cfgmgr | grep failureDetectionTime
lease config: dynamic yes failureDetectionTime 35.0 usePR no recoveryWait 60 dmsTimeout 40
lease config: dynamic yes failureDetectionTime 0.0 usePR ? recoveryWait 60 dmsTimeout 40
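
If the lease timeout is too aggressive for the network, failureDetectionTime (shown in the dump above) can be raised with mmchconfig. A hedged sketch; the value is only an example, and changing this parameter typically requires GPFS to be down on the whole cluster:

# mmshutdown -a
# mmchconfig failureDetectionTime=60
# mmstartup -a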

Timed out waiting for a reply from node x

On the TSM server that keeps on getting expelled periodically, I see:

Mon Mar  2 07:36:23.289 2020: [N] sdrServ: Received expel data collection request from 172.26.3.2
Mon Mar  2 07:36:23.290 2020: [N] GPFS will attempt to collect debug data on this node.
Mon Mar  2 07:36:33.235 2020: [N] This node will be expelled from cluster chgi-psc.gpfs.net due to expel msg from 172.26.3.2 (essio2-ib.gpfs.net)
Mon Mar  2 07:36:33.671 2020: [N] sdrServ: Received expel data collection request from 172.26.3.1
Mon Mar  2 07:36:33.672 2020: [N] Debug data has not been collected. It was collected recently at 2020-03-02_07:36:23-0700.
Mon Mar  2 07:36:48.059 2020: [N] VERBS RDMA closed connection to 172.26.3.1 (essio1-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 2
Mon Mar  2 07:36:48.060 2020: [N] VERBS RDMA closed connection to 172.26.3.1 (essio1-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 3
Mon Mar  2 07:36:48.061 2020: [N] VERBS RDMA closed connection to 172.26.3.1 (essio1-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 1
Mon Mar  2 07:36:48.063 2020: [N] VERBS RDMA closed connection to 172.26.3.1 (essio1-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 11
Mon Mar  2 07:36:48.064 2020: [I] Cluster Manager connection broke. Probing cluster chgi-psc.gpfs.net
Mon Mar  2 07:36:48.079 2020: [N] VERBS RDMA closed connection to 172.26.3.2 (essio2-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 14
Mon Mar  2 07:36:48.080 2020: [N] VERBS RDMA closed connection to 172.26.3.2 (essio2-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 4
Mon Mar  2 07:36:48.081 2020: [N] VERBS RDMA closed connection to 172.26.3.2 (essio2-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 15
Mon Mar  2 07:36:48.082 2020: [N] VERBS RDMA closed connection to 172.26.3.2 (essio2-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 7
Mon Mar  2 07:36:48.565 2020: [E] Unable to contact any quorum nodes during cluster probe.
Mon Mar  2 07:36:48.566 2020: [E] Lost membership in cluster chgi-psc.gpfs.net. Unmounting file systems.
Mon Mar  2 07:36:48.784 2020: [I] Calling user exit script mmUnmountFs: event unmount, Async command /usr/lpp/mmfs/lib/mmsysmon/sendRasEventToMonitor.
Mon Mar  2 07:36:48.828 2020: [I] The quota client for file system gpfs0 has terminated.
Mon Mar  2 07:36:49 MST 2020: [I] sendRasEventToMonitor: Successfully send a filesystem event to monitor
Mon Mar  2 07:36:49.384 2020: [N] Connecting to 172.26.3.1 essio1-ib.gpfs.net <c0p0>
Mon Mar  2 07:36:59.396 2020: [N] Connecting to 172.26.3.2 essio2-ib.gpfs.net <c0p1>
Mon Mar  2 07:37:09.408 2020: [N] Connecting to 172.26.3.251 ems1-ib.gpfs.net <c0p2>
Mon Mar  2 07:37:31.443 2020: [I] Connected to 172.26.3.1 essio1-ib.gpfs.net <c0p0>
Mon Mar  2 07:37:36.456 2020: [I] Connected to 172.26.3.2 essio2-ib.gpfs.net <c0p1>
Mon Mar  2 07:37:36.468 2020: [I] Connected to 172.26.3.251 ems1-ib.gpfs.net <c0p2>
Mon Mar  2 07:38:23.972 2020: [I] Node 172.26.3.1 (essio1-ib.gpfs.net) is now the Group Leader.
Mon Mar  2 07:38:23.975 2020: [I] Calling user exit script mmClusterManagerRoleChange: event clusterManagerTakeOver, Async command /usr/lpp/mmfs/bin/mmsysmonc.
Mon Mar  2 07:44:48.684 2020: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 172.26.3.2 essio2-ib.gpfs.net. Sending expel message.
Mon Mar  2 07:44:48.685 2020: [N] Expel data is not collected on any node. It was collected recently at 2020-03-02_07:36:23-0700.
Mon Mar  2 07:44:48.686 2020: [N] Request sent to 172.26.3.1 (essio1-ib.gpfs.net) to expel 172.26.3.2 (essio2-ib.gpfs.net) from cluster chgi-psc.gpfs.net
Mon Mar  2 07:44:48.687 2020: [N] This node will be expelled from cluster chgi-psc.gpfs.net due to expel msg from 172.26.3.250 (tsm-ib.gpfs.net)
Mon Mar  2 07:45:03.036 2020: [I] Cluster Manager connection broke. Probing cluster chgi-psc.gpfs.net
Mon Mar  2 07:45:03.537 2020: [E] Unable to contact any quorum nodes during cluster probe.
Mon Mar  2 07:45:03.538 2020: [E] Lost membership in cluster chgi-psc.gpfs.net. Unmounting file systems.
Mon Mar  2 07:45:04.238 2020: Failed to open gpfs0.
Mon Mar  2 07:45:04.239 2020: File system unmounted due to loss of cluster membership.
Mon Mar  2 07:45:04.240 2020: [E] Failed to open gpfs0.
Mon Mar  2 07:45:04.241 2020: [E] Remount failed for device gpfs0: Stale file handle

while on the essio2 server:

Mon Mar  2 07:36:23.002 2020: [E] Timed out waiting for a reply from node 172.26.3.250 tsm-ib.gpfs.net
Mon Mar  2 07:36:23.003 2020: Sending request to collect expel debug data to tsm-ib.gpfs.net localNode
Mon Mar  2 07:36:23.004 2020: [I] Calling User Exit Script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon.
mmtrace: move /tmp/mmfs/lxtrace.trc.essio2-ib.recycle.cpu0 /tmp/mmfs/trcfile.200302.07.36.23.27103.expel.essio2-ib.recycle.cpu0
mmtrace: formatting /tmp/mmfs/trcfile.200302.07.36.23.27103.expel.essio2-ib.recycle to /tmp/mmfs/trcrpt.200302.07.36.23.27103.expel.essio2-ib.gz
Mon Mar  2 07:36:33.235 2020: [N] Request sent to 172.26.3.1 (essio1-ib.gpfs.net) to expel 172.26.3.250 (tsm-ib.gpfs.net) from cluster chgi-psc.gpfs.net
Mon Mar  2 07:36:48.060 2020: [D] Leave protocol detail info: LA: 4 LFLG: 15359118 LFLG delta: 4
Mon Mar  2 07:36:48.062 2020: [N] VERBS RDMA closed connection to 172.26.3.250 (tsm-ib.gpfs.net) on mlx5_0 port 1 fabnum 0 index 180
Mon Mar  2 07:36:48.067 2020: [N] VERBS RDMA closed connection to 172.26.3.250 (tsm-ib.gpfs.net) on mlx5_0 port 2 fabnum 0 index 181
Mon Mar  2 07:36:48.071 2020: [N] VERBS RDMA closed connection to 172.26.3.250 (tsm-ib.gpfs.net) on mlx5_1 port 1 fabnum 0 index 182
Mon Mar  2 07:36:48.076 2020: [N] VERBS RDMA closed connection to 172.26.3.250 (tsm-ib.gpfs.net) on mlx5_1 port 2 fabnum 0 index 183
Mon Mar  2 07:36:48.079 2020: [I] Recovering nodes: 172.26.3.250
Mon Mar  2 07:36:48.109 2020: [I] Recovery: gpfs0, delay 90 sec. for safe recovery.
Mon Mar  2 07:36:58.083 2020: [W] Snapshot quiesce of SG gpfs0 snap 7/7088 doing 'mmdelsnapshot home::hr-2020.03.02-08.30.38' timed out on node <c0n1>. Retrying if possible.
Mon Mar  2 07:37:08.109 2020: [W] Snapshot quiesce of SG gpfs0 snap 7/7088 doing 'mmdelsnapshot home::hr-2020.03.02-08.30.38' timed out on node <c0n1>. Retrying if possible.
Mon Mar  2 07:37:08.110 2020: [W] Snapshot quiesce of SG gpfs0 snap 7/7088 doing 'mmdelsnapshot home::hr-2020.03.02-08.30.38' timed out on node <c0n45>. Retrying if possible.
Mon Mar  2 07:37:09.408 2020: [N] The server side TLS handshake with node 172.26.3.250 was cancelled: connection reset by peer (return code 420).
Mon Mar  2 07:37:09.409 2020: [X] Connection from 172.26.3.250 tsm-ib.gpfs.net <c0n7> refused, authentication failed
Mon Mar  2 07:37:09.410 2020: [E] Killing connection from 172.26.3.250, err 703
Mon Mar  2 07:37:09.411 2020: Operation not permitted
Mon Mar  2 07:37:36.456 2020: [I] Accepted and connected to 172.26.3.250 tsm-ib.gpfs.net <c0n7>
Mon Mar  2 07:38:18.132 2020: [I] Log recovery for log group 43 in gpfs0 completed in 0.022005000s
Mon Mar  2 07:38:18.218 2020: [W] Snapshot quiesce of SG gpfs0 snap 7/7088 doing 'mmdelsnapshot home::hr-2020.03.02-08.30.38' timed out on node <c0n1>. Retrying if possible.
Mon Mar  2 07:38:18.219 2020: [W] Snapshot quiesce of SG gpfs0 snap 7/7088 doing 'mmdelsnapshot home::hr-2020.03.02-08.30.38' timed out on node <c0n42>. Retrying if possible.
Mon Mar  2 07:38:18.220 2020: [W] Snapshot quiesce of SG gpfs0 snap 7/7088 doing 'mmdelsnapshot home::hr-2020.03.02-08.30.38' timed out on node <c0n45>. Retrying if possible.
Mon Mar  2 07:38:18.680 2020: [I] Recovered 1 nodes for file system gpfs0.
Mon Mar  2 07:38:41.231 2020: [I] Command: mmlspool /dev/gpfs0 all -L -Y
Mon Mar  2 07:38:41.232 2020: [I] Command: successful mmlspool /dev/gpfs0 all -L -Y
Mon Mar  2 07:44:49.139 2020: [N] sdrServ: Received expel data collection request from 172.26.3.1
Mon Mar  2 07:44:49.140 2020: [N] GPFS will attempt to collect debug data on this node.
mmtrace: move /tmp/mmfs/lxtrace.trc.essio2-ib.recycle.cpu0 /tmp/mmfs/trcfile.200302.07.44.49.28778.expel.essio2-ib.recycle.cpu0
mmtrace: formatting /tmp/mmfs/trcfile.200302.07.44.49.28778.expel.essio2-ib.recycle to /tmp/mmfs/trcrpt.200302.07.44.49.28778.expel.essio2-ib.gz
Mon Mar  2 07:45:03.037 2020: [D] Leave protocol detail info: LA: 30 LFLG: 15359587 LFLG delta: 30
Mon Mar  2 07:45:03.038 2020: [I] Recovering nodes: 172.26.3.250
Mon Mar  2 07:45:03.077 2020: [I] Recovery: gpfs0, delay 64 sec. for safe recovery.
Mon Mar  2 07:45:19.817 2020: [N] The server side TLS handshake with node 172.26.3.250 was cancelled: connection reset by peer (return code 420).
Mon Mar  2 07:45:19.818 2020: [X] Connection from 172.26.3.250 tsm-ib.gpfs.net <c0n7> refused, authentication failed
Mon Mar  2 07:45:19.819 2020: [E] Killing connection from 172.26.3.250, err 703
Mon Mar  2 07:45:19.820 2020: Operation not permitted
Mon Mar  2 07:45:30.343 2020: [I] Accepted and connected to 172.26.3.250 tsm-ib.gpfs.net <c0n7>
Mon Mar  2 07:46:07.638 2020: [I] Recovered 1 nodes for file system gpfs0.

So, one of the storage cluster nodes couldn't talk to TSM ([E] Timed out waiting for a reply from node 172.26.3.250 tsm-ib.gpfs.net), which led to TSM being expelled. The cluster tries to re-establish the connection with TSM but it seems to be failing (The server side TLS handshake with node 172.26.3.250 was cancelled: connection reset by peer (return code 420).), while the TSM server says it has connected to both storage nodes.

Five minutes after this, TSM reports that its communication with the cluster has timed out (Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 172.26.3.2 essio2-ib.gpfs.net. Sending expel message.), which in turn results in the cluster expelling TSM.
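
Before blaming GPFS itself, it is worth confirming basic connectivity from the expelled node to the storage/quorum nodes. A sketch; mmdiag --network is an assumption not shown elsewhere on this page, and it reports the daemon's view of its TCP connections:

tsm# ping essio2-ib
tsm# mmdiag --waiters
tsm# mmdiag --network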

Spectrum Scale Web Interface Slow

Symptoms:

  • The Spectrum Scale Web UI is extremely slow to load, especially snapshots
  • Postgres pegs a core or two now and then:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                        
    16624 postgres  20   0  184896  46208  33600 R  99.7  0.2   1:22.49 postgres                                                                                                                                       
    16680 postgres  20   0  188352  50112  33088 R  99.7  0.2   0:52.04 postgres
    
  • Snapshots have not been taking place according to their schedules.

Looking at the Postgres server, it looks like it's running a complicated query on a view:

postgres=# select * FROM pg_stat_activity;
...

2020-02-24 18:22:08.600034-06 | f       | active              | SELECT l.*,l.acknowledged || translate(right(l.colour,1),'DWNY','1234') AS severity, m.text AS description,m.explanation, m.system_action, m.repair
_action, replace(replace(replace(m.text, '{0}', split_part(arguments, ',', 1)), '{1}', split_part(arguments, ',', 2)), '{2}', split_part(arguments, ',', 3)) as formatted_description FROM ( SELECT id,cluster_id,c
olour,message_code,arguments,event_time,entity_type,entity_id,entity_name,acknowledged,sensor,sensor_category FROM fscc.gss_state_log_view UNION ALL SELECT * FROM ( SELECT id,cluster_id,colour,message_code,argum
ents,event_time,ilog.entity_type,entity_id,entity_name,acknowledged,'',isens.sensor_category FROM fscc.gss_log AS ilog LEFT JOIN fscc.gss_sensor AS isens ON ilog.entity_type=isens.entity_type AND isens.sensor='D
EFAULT' ) AS i ) AS l JOIN fscc.messages AS m ON l.message_code=m.message_code AND m.locale=$1 WHERE (ACKNOWLEDGED = 'N') and ((COLOUR = 'RED') or (COLOUR = 'YELLOW'))  ORDER BY SEVERITY DESC OFFSET $2 LIMIT $3

Looking at a database dump of fscc, the largest table by far is gss_state_history, with at least 900,000 entries.

10120:COPY gss_state_history (state_id, cluster_id, entity_type, entity_id, sensor, previous_state, previous_info, current_state, current_info, event_time) FROM stdin;
912165:COPY gss_vdisk (vdisk_id, cluster_id, vd_name, da_id, rg_id, raidcode, capacity, remarks, track_size, checksum_g) FROM stdin;

The gss_state_log_view view is defined as:

SELECT (-1) * s.state_id AS id, s.cluster_id, ev.colour, ev.message_code, 
    pg_catalog.concat_ws(','::text, en.entity_name, ('"'::text

I cleared out the gss_state_history table.

postgres=# SELECT COUNT(*) FROM fscc.gss_state_history WHERE event_time >= '2018-01-01' AND event_time <= '2019-01-1' AND sensor = 'GPFS_STATE';
 count  
--------
 391441
(1 row)

postgres=# DELETE FROM fscc.gss_state_history WHERE event_time >= '2018-01-01' AND event_time <= '2019-01-1' AND sensor = 'GPFS_STATE';
DELETE 391441
postgres=# SELECT COUNT(*) FROM fscc.gss_state_history;
 count  
--------
 510596
(1 row)

postgres=# SELECT COUNT(*) FROM fscc.gss_state_history WHERE event_time >= '2018-01-01' AND event_time <= '2019-01-1';
 count 
-------
  3420
(1 row)

postgres=# SELECT COUNT(*) FROM fscc.gss_state_history WHERE event_time >= '2018-01-01' AND event_time <= '2020-01-1';
 count  
--------
 434680
(1 row)

postgres=# SELECT COUNT(*) FROM fscc.gss_state_history WHERE event_time >= '2018-01-01' AND event_time <= '2020-01-1' AND sensor = 'GPFS_STATE';
 count  
--------
 422628
(1 row)

postgres=# DELETE FROM fscc.gss_state_history WHERE event_time >= '2018-01-01' AND event_time <= '2020-01-1' AND sensor = 'GPFS_STATE';
DELETE 422628

Removing all those log entries didn't seem to help. The view is just really slow.

Listing all processes on the SQL server, there are a bunch of queries that are "idle in transaction". This suggests that a transaction is being held open and keeping the table locked. The Spectrum Scale logs show a number of exceptions, which could be causing a thread to open a transaction without closing it properly. An example of one such exception is given below.

2020-02-25 11:51:39.057-07:00 com.ibm.gss.util.log.EvoLoggerAdapter logException [tid=95]      ems1.gpfs.net IP Error processing NSD event for topology
  com.ibm.fscc.gss.api.GssApiException: java.lang.NullPointerException
	at com.ibm.gss.gui.events.converters.BaseNsdEventConverter.createUpdateNsdStateEvent(BaseNsdEventConverter.java:495)
	at com.ibm.gss.gui.events.converters.BaseNsdEventConverter.generateEventsFor(BaseNsdEventConverter.java:282)
	at com.ibm.gss.gui.events.converters.GssNsdEventConverter.generateEventsFor(GssNsdEventConverter.java:24)
	at com.ibm.gss.gui.events.converters.GssNsdEventConverter.generateEventsFor(GssNsdEventConverter.java:21)
	at com.ibm.gss.gui.events.BackendObserverManager.processEvent(BackendObserverManager.java:150)
	at com.ibm.fscc.observer.ObserverManager.run(ObserverManager.java:118)
	at java.lang.Thread.run(Thread.java:785)
Caused by: java.lang.NullPointerException
	at com.ibm.gss.gui.logic.BaseRPC.getCurrentClusterID(BaseRPC.java:45)
	at com.ibm.gss.gui.logic.MonitorRPC.getNSDStatus(MonitorRPC.java:1947)
	at com.ibm.gss.gui.events.converters.BaseNsdEventConverter.createUpdateNsdStateEvent(BaseNsdEventConverter.java:492)
	... 6 more
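
To see which sessions are stuck idle in transaction, pg_stat_activity can be filtered on its state column. A sketch; column names are per PostgreSQL 9.2:

postgres=# SELECT pid, state, query_start, query FROM pg_stat_activity WHERE state = 'idle in transaction';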

Solution?: It turns out the version of the /var/lib/mmfs/gui/preferences.xml file did not match what was stored in the Cluster Configuration Repository (CCR), which you can see by running mmccr flist. This caused the GPFS GUI Java application to constantly retry updating the file with incrementing version numbers. The trace file /var/log/cnlog/mgtsrv/mgtsrv-trace-log-0 shows the repeated mmccr fput calls:

2020-02-25T08:33:21 >AbstractCommand.buildOsCommandWrapper:183< FINER: making call to /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445172 'gui' '/var/lib/mmfs/gui/preferences.xml'
2020-02-25T08:33:21 >OsCommandWrapper.execute:295< FINER: EFSSA0037I Command /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445172 'gui' '/var/lib/mmfs/gui/preferences.xml'  was issued on host ems1.gpfs.net by user admin. clusterid=0
2020-02-25T08:33:21 >OsCommandWrapper.execute:341< FINER: Command /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445172 'gui' '/var/lib/mmfs/gui/preferences.xml'  finished with exit code 153
2020-02-25T08:33:21 >AbstractCommand.buildOsCommandWrapper:183< FINER: making call to /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445172 'gui' '/var/lib/mmfs/gui/preferences.xml'
2020-02-25T08:33:21 >CcrPreferences.flush:273< FINE: execution failed: /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445172 'gui' '/var/lib/mmfs/gui/preferences.xml'
2020-02-25T08:33:21 >CcrPreferences.flush:274< FINE: /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445172 'gui' '/var/lib/mmfs/gui/preferences.xml'
Command finished with exit code 153<No stdout> syserr:fput failed: Version mismatch on conditional put (err 805)

2020-02-25T08:33:21 >AbstractCommand.buildOsCommandWrapper:183< FINER: making call to /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445173 'gui' '/var/lib/mmfs/gui/preferences.xml'
2020-02-25T08:33:21 >OsCommandWrapper.execute:295< FINER: EFSSA0037I Command /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445173 'gui' '/var/lib/mmfs/gui/preferences.xml'  was issued on host ems1.gpfs.net by user admin. clusterid=0
2020-02-25T08:33:22 >OsCommandWrapper.execute:341< FINER: Command /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445173 'gui' '/var/lib/mmfs/gui/preferences.xml'  finished with exit code 153
2020-02-25T08:33:22 >AbstractCommand.buildOsCommandWrapper:183< FINER: making call to /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445173 'gui' '/var/lib/mmfs/gui/preferences.xml'
2020-02-25T08:33:22 >CcrPreferences.flush:273< FINE: execution failed: /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445173 'gui' '/var/lib/mmfs/gui/preferences.xml'
2020-02-25T08:33:22 >CcrPreferences.flush:274< FINE: /usr/lpp/mmfs/bin/mmccr 'fput' -c 4445173 'gui' '/var/lib/mmfs/gui/preferences.xml'
Command finished with exit code 153<No stdout> syserr:fput failed: Version mismatch on conditional put (err 805)

It appears to increment the version number 275 times per minute, which suggests that this has been going on non-stop for the past week.

The fix I tried was to stop the GPFS GUI service and edit /var/lib/mmfs/gui/preferences.xml so that the version number matches the one reported by mmccr flist, then restart the GPFS GUI service. Applying changes to a snapshot schedule should now succeed, and the trace log should no longer show mmccr errors.
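
In command form, the fix looks roughly like this (a sketch; the editor and the exact version number to set come from mmccr flist):

# systemctl stop gpfsgui
# mmccr flist                            ## note the current version of the 'gui' file
# vi /var/lib/mmfs/gui/preferences.xml   ## make the version number match
# systemctl start gpfsgui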

Scheduled snapshots not being created

Snapshot schedules defined in the GPFS Spectrum Scale web interface are not being executed for some or all filesets. Upon further investigation, it appears that a corrupt internal state within the GPFS GUI is preventing the scheduled snapshot tasks from completing properly.

The fix is to remove all invalid or mismatched references from /var/lib/mmfs/gui/preferences.xml and from the Postgres database used by the GPFS Spectrum Scale GUI. Do so by editing the preferences.xml file and then committing it with mmccr, e.g. /usr/lpp/mmfs/bin/mmccr 'fput' -c 4537411 'gui' '/var/lib/mmfs/gui/preferences.xml'.

Edit the database by entering psql with sudo -u postgres psql.

Fix Procedures

  1. First, stop the GPFS GUI (systemctl stop gpfsgui) and back up the /var/lib/mmfs/gui/preferences.xml file.
  2. Get the current preferences version number with mmccr flist. The name for preferences.xml is 'gui'.
  3. Obtain all snapshot IDs that are currently valid using mmlssnapshot. IDs are in the 2nd column.
    # mmlssnapshot
    D7-2020.04.28-07.00.22   10171     Valid   Tue Apr 28 01:00:23 2020  qlong
    D7-2020.04.28-07.00.22   10172     Valid   Tue Apr 28 01:00:27 2020  snyder_irida
    ...
    
  4. Get all snapshot associations.
    # /usr/lpp/mmfs/gui/cli/lssnapassoc
    Cluster           Device Fileset      Rule  Last implemented Last update
    chgi-psc.gpfs.net gpfs0  qlong        D7    N/A              3/4/20 1:09 PM
    chgi-psc.gpfs.net gpfs0  qlong        QH4   N/A              3/4/20 1:10 PM
    EFSSG1000I The command completed successfully.
    
  5. Edit /var/lib/mmfs/gui/preferences.xml and:
    1. Remove any <node> elements containing snapshot references that do not exist or have a 'STATE' that isn't 'active'.
    2. Remove any <node> elements containing rules that do not exist or have a 'STATE' that isn't 'active'.
    3. Ensure that the version number matches that of mmccr flist obtained above.
    • Here are some examples of invalid GPFS snapshots (either deleting or creating) that were removed from /var/lib/mmfs/gui/preferences.xml:
      <node name="gpfs0 !!!@@@!!!4195">
              <map>
                <entry key="RULENAME" value="D7"/>
                <entry key="STATE" value="deleting"/>
              </map>
            </node>
      
            <node name="gpfs0 !!!@@@!!!H12-2020.05.04-18.20.43_12">
              <map>
                <entry key="RULENAME" value="H12"/>
                <entry key="STATE" value="creating"/>
                <entry key="COMMENT" value="SCHEDULED"/>
              </map>
            </node>
      
            <node name="gpfs0 !!!@@@!!!D1-2017.04.12-05.30.16_6">
              <map>
                <entry key="RULENAME" value="D1"/>
                <entry key="STATE" value="creating"/>
                <entry key="COMMENT" value="SCHEDULED"/>
              </map>
            </node>
      
  6. Enter the PostgreSQL prompt and delete any snapshots that were removed from the step above. There should not be any snapshots with invalid states after this.
    # sudo -u postgres psql
    postgres=# select * FROM fscc.snapshot WHERE id = 'removed-ids-from-above';
    postgres=# DELETE FROM fscc.snapshot WHERE id = 'removed-ids-from-above';
    
    ## Look for any invalid snapshots remaining and ensure these are removed from the preferences.xml file, then delete the records.
    postgres=# select * FROM fscc.snapshot WHERE status = 'Invalid';
    postgres=# DELETE FROM fscc.snapshot WHERE status = 'Invalid';
    
    If you do have invalid snapshots, the table might look something like this:
    postgres=# select * FROM fscc.snapshot WHERE fileset_name = 'qlong';
    
          cluster_id      | devicename |            id             |                         directory |        status  |         created         | metadata |   data    |       last_update       | is_psnap |      full_snap_name      | filesetid | fileset_name 
    ----------------------+------------+---------------------------+-----------------------------------+----------------+-------------------------+----------+-----------+-------------------------+----------+--------------------------+-----------+--------------
     2878855853427990921  | gpfs0      | D7-2017.05.14-05.30.45_12 | D7-2017.05.14-05.30.45            | Invalid        | 2017-05-14 00:30:45.963 |        0 |         0 | 2017-05-14 00:30:45.963 | N        | D7-2017.05.14-05.30.45   |        12 | qlong
     2878855853427990921  | gpfs0      | 335                       | D-2017.12.15-03.15.24             | Valid          | 2017-12-15 04:12:22     |        0 |         0 | 2017-12-17 12:43:52.756 | N        | D-2017.12.15-03.15.24    |        12 | qlong
     2878855853427990921  | gpfs0      | 7157                      | @GMT-2020.03.04-17.27.39          | Valid          | 2020-03-04 10:27:42     |        0 |         0 | 2020-03-04 10:27:43.214 | N        | @GMT-2020.03.04-17.27.39 |        12 | qlong
     2878855853427990921  | gpfs0      | D7-2017.04.27-05.30.02_12 | D7-2017.04.27-05.30.02            | Invalid        | 2017-04-27 00:30:02.603 |        0 |         0 | 2017-04-27 00:30:02.603 | N        | D7-2017.04.27-05.30.02   |        12 | qlong
     2878855853427990921  | gpfs0      | D7-2017.05.13-05.30.44_12 | D7-2017.05.13-05.30.44            | Invalid        | 2017-05-13 00:30:44.094 |        0 |         0 | 2017-05-13 00:30:44.094 | N        | D7-2017.05.13-05.30.44   |        12 | qlong
     2878855853427990921  | gpfs0      | D7-2017.12.18-06.30.46_12 | D7-2017.12.18-06.30.46            | Invalid        | 2017-12-18 00:30:46.082 |        0 |         0 | 2017-12-18 00:30:46.082 | N        | D7-2017.12.18-06.30.46   |        12 | qlong
     2878855853427990921  | gpfs0      | 356                       | D-2017.12.18-03.10.44             | Valid          | 2017-12-18 04:06:09     |        0 |         0 | 2017-12-18 04:17:18.359 | N        | D-2017.12.18-03.10.44    |        12 | qlong
     2878855853427990921  | gpfs0      | 328                       | D-2017.12.14-03.11.09             | Valid          | 2017-12-14 04:06:48     |        0 |         0 | 2017-12-17 12:43:52.751 | N        | D-2017.12.14-03.11.09    |        12 | qlong
     2878855853427990921  | gpfs0      | 7100                      | @GMT-2020.03.02-20.13.21          | Valid          | 2020-03-02 13:13:28     |   854016 |    358016 | 2020-03-02 13:13:29.243 | N        | @GMT-2020.03.02-20.13.21 |        12 | qlong
     2878855853427990921  | gpfs0      | 6902                      | D-2020.02.23-23.05.10             | Valid          | 2020-02-23 23:05:15     |    78272 | 291551936 | 2020-02-24 18:43:15.949 | N        | D-2020.02.23-23.05.10    |        12 | qlong
     2878855853427990921  | gpfs0      | 6906                      | D-2020.02.24-15.14.14             | Valid          | 2020-02-24 15:14:20     |    58368 |      9280 | 2020-02-24 18:43:15.951 | N        | D-2020.02.24-15.14.14    |        12 | qlong
     2878855853427990921  | gpfs0      | 7105                      | @GMT-2020.03.02-23.34.16          | Valid          | 2020-03-02 16:59:01     | 33722848 | 207304672 | 2020-03-02 16:59:02.253 | N        | @GMT-2020.03.02-23.34.16 |        12 | qlong
     2878855853427990921  | gpfs0      | 7135                      | @GMT-2020.03.04-00.04.11          | Valid          | 2020-03-03 18:27:12     |        0 |         0 | 2020-03-03 18:27:12.204 | N        | @GMT-2020.03.04-00.04.11 |        12 | qlong
     2878855853427990921  | gpfs0      | 6928                      | D-2020.02.25-23.07.30             | Valid          | 2020-02-25 23:08:44     |   130048 |     86784 | 2020-02-25 23:27:42.31  | N        | D-2020.02.25-23.07.30    |        12 | qlong
     2878855853427990921  | gpfs0      | 6912                      | D-2020.02.24-23.06.20             | Valid          | 2020-02-24 23:06:25     |    84128 | 116962496 | 2020-02-24 23:18:33.318 | N        | D-2020.02.24-23.06.20    |        12 | qlong
     2878855853427990921  | gpfs0      | 321                       | D-2017.12.13-03.15.03             | Valid          | 2017-12-13 04:11:37     |        0 |         0 | 2017-12-17 12:43:52.747 | N        | D-2017.12.13-03.15.03    |        12 | qlong
     2878855853427990921  | gpfs0      | 349                       | D-2017.12.17-03.08.24             | Valid          | 2017-12-17 04:03:32     |        0 |         0 | 2017-12-17 12:43:52.764 | N        | D-2017.12.17-03.08.24    |        12 | qlong
     2878855853427990921  | gpfs0      | 342                       | D-2017.12.16-03.06.49             | Valid          | 2017-12-16 04:01:38     |        0 |         0 | 2017-12-17 12:43:52.761 | N        | D-2017.12.16-03.06.49    |        12 | qlong
    (18 rows)
    
  7. Exit the postgres prompt and commit the changed preferences.xml file. Replace the version number with the number returned from mmccr flist.
    ## Find the current file version number and then commit the file
    # mmccr flist
    # /usr/lpp/mmfs/bin/mmccr fput -c <version from flist> gui /var/lib/mmfs/gui/preferences.xml
    
  8. Restart the GPFS GUI (systemctl start gpfsgui).

Spectrum Scale shows failing fans

On one of the GPFS Spectrum Scale RAID instances, the UI complains of bad fans in both GPFS Storage Server (GSS) enclosures, both model GSS26. Interestingly, only the odd-numbered fans are reported as operational while the even-numbered fans are not. Physically, all the fans appear to be running and the system temperature is normal.

The hardware metric data is obtained from xCAT. This can be viewed by running:

# rvitals gssio2 all
...
gssio2: Fan 1A Tach: 5664 RPM
gssio2: Fan 1B Tach: N/A
gssio2: Fan 2A Tach: 5664 RPM
gssio2: Fan 2B Tach: N/A
gssio2: Fan 3A Tach: 5664 RPM
gssio2: Fan 3B Tach: N/A
gssio2: Fan 4A Tach: 5664 RPM
gssio2: Fan 4B Tach: N/A
gssio2: Fan 5A Tach: 5605 RPM
gssio2: Fan 5B Tach: N/A
gssio2: Fan 6A Tach: 5605 RPM
gssio2: Fan 6B Tach: N/A
...

It appears that the B fans aren't connected and this measurement is triggering a failure in the Spectrum Scale RAID interface.

IBM confirmed that this is a known issue and the fix is to update Spectrum Scale.

Cannot create Snapshots

When trying to create or delete a snapshot, I get an error about being unable to quiesce all nodes.

# time mmdelsnapshot gpfs0 tsm-2020-06-12                             
Invalidating snapshot files in tsm-2020-06-12...                                  
Unable to quiesce all nodes; some processes are busy or holding required resources.
Delete snapshot tsm-2020-06-12 complete, err = 78                                 
mmdelsnapshot: Command failed. Examine previous error messages to determine cause.

One of the member nodes appeared to have crashed or hung on the GPFS mount. After taking the offending node down with mmumount, snapshots could be deleted and created again.
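
A sketch of locating and evicting the offending node (the node name is a placeholder):

# mmdiag --waiters                ## run on each node to find the one holding things up
# mmlsmount gpfs0 -L              ## list which nodes still have the filesystem mounted
# mmumount gpfs0 -N badnode-ib    ## force the hung node off the filesystem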

Cannot create or list snapshots

[root@ems1 ~]# mmlssnapshot gpfs0
Unable to start tslssnapshot on 'gpfs0' because conflicting program tscrsnapshot is running. Waiting until it completes.

Run mmdiag --waiters on all nodes. There should be one node that shows a bunch of long waiters.

[root@essio1 ~]# mmdiag --waiters

=== mmdiag: waiters ===
0x3FFDC426F240 (  92392) waiting 15376.120518183 seconds, TSCrSnapshotCmdThread: on ThCond 0x3FFDE0003768 (0x3FFDE0003768) (MsgRecordCondvar), reason 'RPC wait' for sgmMsgSnapshotOps
0x3FFCDC06B4F0 ( 152995) waiting 14814.171937251 seconds, TSLsFilesetCmdThread: on ThCond 0x3FFED033ECD0 (0x3FFED033ECD0) (Perm2RunQueueCondvar), reason 'waiting for permission to run'
0x3FFD953F9F10 ( 152776) waiting 14533.730973987 seconds, TSSnapshotCmdThread: on ThCond 0x3FFD744A8A60 (0x3FFD744A8A60) (Perm2RunQueueCondvar), reason 'waiting for permission to run'
0x3FFD953F78B0 ( 152774) waiting 14076.411588844 seconds, PolicyCmdThread: on ThCond 0x3FFD6C436640 (0x3FFD6C436640) (Perm2RunQueueCondvar), reason 'waiting for permission to run'
0x3FFCDC03DBD0 ( 152956) waiting 13947.941451663 seconds, TSLsFilesetCmdThread: on ThCond 0x3FFE18409900 (0x3FFE18409900) (Perm2RunQueueCondvar), reason 'waiting for permission to run'
0x3FFD953C8C60 ( 152735) waiting 13914.121744943 seconds, TSLsFilesetCmdThread: on ThCond 0x3FFCC83755B0 (0x3FFCC83755B0) (Perm2RunQueueCondvar), reason 'waiting for permission to run'
0x3FFCDC02CF30 ( 152942) waiting 13014.100460180 seconds, TSLsFilesetCmdThread: on ThCond 0x3FFDE04F4080 (0x3FFDE04F4080) (Perm2RunQueueCondvar), reason 'waiting for permission to run'
0x3FFD9539A010 ( 152696) waiting 12114.041025231 seconds, TSLsFilesetCmdThread: on ThCond 0x3FFEC841CC20 (0x3FFEC841CC20) (Perm2RunQueueCondvar), reason 'waiting for permission to run'
... (it goes on for a bit, with all reason being waiting for permission to run)

You can try to turn off mmfsd (mmshutdown on this node) or reboot the node to 'fix' the long waiter.

[root@essio1 ~]# mmshutdown -N essio1-ib
Tue Dec 22 17:03:24 MST 2020: mmshutdown: Starting force unmount of GPFS file systems
Tue Dec 22 17:03:29 MST 2020: mmshutdown: Shutting down GPFS daemons
essio1-ib.gpfs.net:  Shutting down!
essio1-ib.gpfs.net:  'shutdown' command about to kill process 144377
essio1-ib.gpfs.net:  Master did not clean up; attempting cleanup now
essio1-ib.gpfs.net:  Tue Dec 22 17:04:31.176 2020: [N] mmfsd is shutting down.
essio1-ib.gpfs.net:  Tue Dec 22 17:04:31.177 2020: [N] Reason for shutdown: mmfsadm shutdown command timed out
essio1-ib.gpfs.net:  Tue Dec 22 17:04:31 MST 2020: mmcommon mmfsdown invoked.  Subsystem: mmfs Status: down
essio1-ib.gpfs.net:  Tue Dec 22 17:04:31 MST 2020: mmcommon: Unmounting file systems ...
essio1-ib.gpfs.net:  Unloading modules from /lib/modules/3.10.0-229.26.2.el7.ppc64/extra
essio1-ib.gpfs.net:  Unloading module mmfs26
essio1-ib.gpfs.net:  Unloading module mmfslinux
Tue Dec 22 17:04:41 MST 2020: mmshutdown: Finished
[root@essio1 ~]# mmstartup -N essio1-ib
Tue Dec 22 17:04:47 MST 2020: mmstartup: Starting GPFS ...

However, with the setup at CHGI, something caused the quorum node to reboot when this was executed, resulting in the whole GPFS filesystem going down until the system came back up. Logs show:

exportfs: Could not find '*:/gpfs/achri_data' to unexport.
exportfs: Could not find '*:/gpfs/hyperion_scratch' to unexport.
exportfs: Could not find '*:/gpfs/qlong' to unexport.
exportfs: Could not find '*:/gpfs/ebg_projects' to unexport.
exportfs: Could not find '*:/tiered/kkurek' to unexport.
exportfs: Could not find '*:/gpfs/admin/LSF' to unexport.
exportfs: Could not find '*:/tiered/ewang' to unexport.
exportfs: Could not find '*:/gpfs/vetmed_data' to unexport.
exportfs: Could not find '*:/gpfs/hyperion' to unexport.
exportfs: Could not find '*:/gpfs/ebg_gmb' to unexport.
exportfs: Could not find '*:/tiered/ewang_scratch' to unexport.
exportfs: Could not find '*:/gpfs/ebg_work' to unexport.
exportfs: Could not find '*:/gpfs/snyder_irida' to unexport.
exportfs: Could not find '*:/tiered/morph' to unexport.
exportfs: Could not find '*:/gpfs/achri_galaxy' to unexport.
exportfs: Could not find '*:/gpfs/gallo' to unexport.
exportfs: Could not find '*:/gpfs/ebg_data' to unexport.
exportfs: Could not find '*:/gpfs/snyder_work' to unexport.
exportfs: Could not find '*:/gpfs/common' to unexport.
exportfs: Could not find '*:/tiered/snyder_data' to unexport.
exportfs: Could not find '*:/gpfs/vetmed_stage' to unexport.
exportfs: Could not find '*:/gpfs/charb_data' to unexport.
exportfs: Could not find '*:/gpfs/home' to unexport.
exportfs: Could not find '*:/gpfs/ebg_web' to unexport.
exportfs: Could not find '*:/tiered/smorrissy' to unexport.
Tue Dec 22 17:06:16 MST 2020: mmnfspreunmount: CNFS will be shutdown on this node due to filesystem panic
Tue Dec 22 17:06:16 MST 2020: mmnfspreunmount: This node will be rebooted due to unrecoverable errors after a filesystem panic

The reboot appears to be triggered by /usr/lpp/mmfs/bin/mmnfspreunmount when a '$GPanic' code is passed to it. Most likely, because exportfs is mangled, when GPFS goes down, it brings the node down with it.

Remove node designations

I had a failed and irrecoverable node that was part of the quorum that I needed to remove from the cluster.

[root@essio2 ~]# mmlscluster
...
 Node  Daemon node name        IP address      Admin node name         Designation
-----------------------------------------------------------------------------------
   1   essio1-ib.gpfs.net      172.26.3.1      essio1-ib.gpfs.net      quorum-manager-perfmon
   2   essio2-ib.gpfs.net      172.26.3.2      essio2-ib.gpfs.net      quorum-manager-perfmon
   3   ems1-ib.gpfs.net        172.26.3.251    ems1-ib.gpfs.net        quorum-perfmon

To remove the designations, run:

# mmchnode --nomanager -N essio1-ib.gpfs.net
# mmchnode --nonquorum -N essio1-ib.gpfs.net
# mmchnode --noperfmon -N essio1-ib.gpfs.net

It should now look like this:

[root@essio2 ~]# mmlscluster
...
 Node  Daemon node name        IP address     Admin node name         Designation
----------------------------------------------------------------------------------
   1   essio1-ib.gpfs.net      172.26.3.1     essio1-ib.gpfs.net
   2   essio2-ib.gpfs.net      172.26.3.2     essio2-ib.gpfs.net      quorum-manager-perfmon
   3   ems1-ib.gpfs.net        172.26.3.251   ems1-ib.gpfs.net        quorum-perfmon
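
If the failed node is to be removed from the cluster entirely, it can then be dropped with mmdelnode as described under Node Management above; a sketch:

# mmdelnode -N essio1-ib.gpfs.net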

See Also