IBM Spectrum Protect
IBM Spectrum Protect is a backup solution by IBM. Prior to version 7.1.3, it was known as Tivoli Storage Manager, or TSM for short.
For training on Spectrum Protect, check out some of the free resources at https://www.ibm.com/training/search?query=Spectrum%20protect
Usage
The admin CLI is used to configure the TSM system. The TSM client CLI is used to list and restore files on the client system.
TSM Command Line Usage
The TSM command line utility can be invoked by running dsmc. The client interface shows only a subset of the parameters that you will see in the admin console. It appears to show only values applicable to the host that it is running on (i.e. filespaces, include/exclude options, etc.).
A nice guide: https://help.it.ox.ac.uk/hfs/help/dsmc
Here is what the TSM client interface looks like.
[root@tsm ~]# dsmc
IBM Tivoli Storage Manager
Command Line Backup-Archive Client Interface
Client Version 7, Release 1, Level 6.5
Client date/time: 02/04/2020 16:01:52
(c) Copyright by IBM Corporation and other(s) 1990, 2017. All Rights Reserved.
Node Name: TSM
Session established with server CHGI_TSM01: Linux/ppc64
Server Version 7, Release 1, Level 7.200
Server date/time: 02/04/2020 16:01:52 Last access: 02/04/2020 05:29:10
tsm>
List Include / Exclude Options
You can see all include/exclude rules with query inclexcl, which is similar to what is shown in the Admin Console when running query cloptset [name], but limited to this particular host.
tsm> q inclexcl
*** FILE INCLUDE/EXCLUDE ***
Mode Function Pattern (match from top down) Source File
---- --------- ------------------------------ -----------------
No exclude filespace statements defined.
Excl Directory /.../.TsmCacheDir TSM
Exclude All /gpfs/cbousman/.../* Server
Exclude All /.../.snapshots/.../* Server
Exclude All /tiered/vetmed_data/.../* Server
Exclude All /tiered/snyder_data/.../* Server
...
Exclude All /.../*.dbv Server
Include All /gpfs/forever/.../* Server
No DFS include/exclude statements defined.
Use the following wildcards in your include/exclude rules:
Symbol | Description |
---|---|
? | Matches exactly one character |
* | Matches any number of characters |
... | Matches any number of directories |
Other notes:
- You may also use character classes such as [a-zA-Z0-9]* in patterns.
- The characters * ? : [ ] must be escaped with a backslash inside []. E.g. [\:] for :.
- The dsm.sys rules are applied from last to first. All includes should be defined near the top and excludes near the bottom.
- Use exclude.fs to exclude entire filespaces.
- Use exclude.dir to exclude directories.
- Both exclude.fs and exclude.dir are applied before all other rules, regardless of order.
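To check which rule a path would hit before touching a live option set, the wildcard rules above can be approximated in a few lines of Python. This is a minimal sketch, not the client's actual matcher. The helper names tsm_pattern_to_regex and first_match are made up for illustration. It assumes that * and ? do not cross a directory separator, that /.../ spans zero or more directories, and that rules are evaluated in the top-down order printed by query inclexcl. Character classes and backslash escapes are not handled.

```python
import re

def tsm_pattern_to_regex(pattern):
    """Translate a TSM-style include/exclude pattern into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("/.../", i):
            # "/.../" spans zero or more whole directory levels.
            parts.append("/(?:[^/]+/)*")
            i += 5
        elif pattern[i] == "*":
            # "*" matches any run of characters inside one path component.
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            # "?" matches exactly one character inside a path component.
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def first_match(rules, path):
    """Return the first (mode, pattern) rule that matches path, or None."""
    for mode, pattern in rules:
        if tsm_pattern_to_regex(pattern).fullmatch(path):
            return mode, pattern
    return None

# A few rules in the order shown by "query inclexcl" (matched top-down).
RULES = [
    ("Exclude", "/.../.snapshots/.../*"),
    ("Exclude", "/.../*.dbv"),
    ("Include", "/gpfs/forever/.../*"),
]

print(first_match(RULES, "/gpfs/forever/data/run1.txt"))
print(first_match(RULES, "/tiered/x/.snapshots/D7/file"))
```

A path that matches no rule returns None here. Treat the result as a sanity check only and confirm against the real client.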
List Files
To list all files that are available for restore:
tsm> query backup /path/to/file
Size Backup Date Mgmt Class A/I File
---- ----------- ---------- --- ----
4,096 B 11/26/2019 06:43:38 INDEFINITE A /gpfs/home/lleung/.cache
4,096 B 11/26/2019 06:43:54 INDEFINITE A /gpfs/home/lleung/.config
4,096 B 11/26/2019 06:47:09 INDEFINITE A /gpfs/home/lleung/.dbus
4,096 B 11/26/2019 12:06:04 INDEFINITE A /gpfs/home/lleung/.gnupg
...
Additional options:
- -inactive to show previous versions of the files.
- -subdir=yes to show subdirectories.
Restoring Files
To restore a file, use the restore source [destination] command. If destination is omitted, TSM will restore over the original location. TSM restores the most recent (active) version of the file.
tsm> restore /gpfs/home/lleung/.zsh /gpfs/home/lleung/restore/
Restore function invoked.
Restoring 4,096 /gpfs/home/lleung/.zsh --> /gpfs/home/lleung/restore/.zsh [Done]
Restore processing finished.
Total number of objects restored: 1
Total number of objects failed: 0
Total number of bytes transferred: 0 B
Data transfer time: 0.00 sec
Network data transfer rate: 0.00 KB/sec
Aggregate data transfer rate: 0.00 KB/sec
Elapsed processing time: 00:00:03
Other options:
- -su=yes (short for -subdir=yes) is required when restoring an entire directory.
- -inactive -pick to restore an inactive file, with the option to pick which revision to restore.
Deleting Backups
I accidentally backed up a .snapshots directory containing filesystem snapshots. To delete this directory from backups (and to free up space on tape), I ran delete backup with the deltype=all option. This should recursively delete any files backed up under the provided path.
tsm> delete backup /tiered/smorrissy/.snapshots/D7-2020.12.08-08.00.27/* -deltype=all
ANS1899I ***** Examined 1,000 files *****
ANS1899I ***** Examined 2,000 files *****
... (many hours later...)
All backup objects in the specified directory and its subdirectories will be deleted. This command ignores the -subdir option. Do you wish to proceed? (Yes (Y)/No (N)) y
By default, this operation requires you to enter 'y' once it has finished collecting all the files it needs to delete. This is annoying, especially since an idle session will be disconnected. Use the noprompt option to delete without confirmation.
tsm> delete backup /tiered/smorrissy/.snapshots/D7-2020.12.08-08.00.27/* -deltype=all -noprompt
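Because the interactive session can time out during a long delete, it may be easier to run the command in batch mode, passing it as arguments to dsmc from a script or a tmux session. The sketch below only builds the command line. The helper name build_delete_backup_cmd is hypothetical, and it assumes that dsmc accepts the same delete backup arguments in batch mode as at the tsm> prompt.

```python
import shlex

def build_delete_backup_cmd(path, noprompt=True):
    """Build the argv for a batch-mode 'dsmc delete backup' call."""
    argv = ["dsmc", "delete", "backup", path, "-deltype=all"]
    if noprompt:
        # Skip the final Yes/No confirmation.
        argv.append("-noprompt")
    return argv

argv = build_delete_backup_cmd(
    "/tiered/smorrissy/.snapshots/D7-2020.12.08-08.00.27/*"
)
# shlex.join quotes the wildcard so that the shell does not expand it.
print(shlex.join(argv))
```

The list can be handed to subprocess.run(argv) as-is, or the printed line can be pasted into a shell.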
See: https://www.ibm.com/support/knowledgecenter/SSGSG7_7.1.2/com.ibm.itsm.client.doc/r_cmd_delbkup.html
Administrative Command Line Usage
The administrative TSM command line can be entered either from the web GUI via the 'Command Builder' or by running the dsmadmc utility. In both cases, you must provide the TSM credentials.
You will be greeted with a prompt similar to the one below.
Session established with server CHGI_TSM01: Linux/ppc64
Server Version 7, Release 1, Level 7.200
Server date/time: 11/19/2019 14:18:43 Last access: 11/12/2019 17:15:04
tsm: CHGI_TSM01>
Commands and parameters can be shortened so long as they are not ambiguous. query volume can therefore be shortened to q vol, for example.
Additionally, SQL queries can be entered at this prompt. They are executed against the underlying DB2 database.
Documentation: https://www.ibm.com/support/knowledgecenter/SSGSG7_7.1.3/srv.reference/r_cmdline_adclient_options.html
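For scripting, dsmadmc can also run a single command and exit, which makes SQL output easy to post-process. The following is a rough sketch: the helper names and the sample rows are invented for illustration, and it assumes the -dataonly=yes and -commadelimited client options behave as described in the documentation linked above.

```python
import csv
import io

def build_dsmadmc_cmd(admin_id, password, statement):
    """Build the argv for a one-shot dsmadmc call with comma-delimited output."""
    return [
        "dsmadmc",
        f"-id={admin_id}",
        f"-password={password}",
        "-dataonly=yes",     # suppress banners and column headers
        "-commadelimited",   # one comma-separated line per result row
        statement,
    ]

def parse_rows(output):
    """Parse comma-delimited dsmadmc output into a list of rows."""
    return [row for row in csv.reader(io.StringIO(output)) if row]

# Hypothetical output for: select volume_name, status from libvolumes
sample = "A00091L7,Private\nA00121L7,Scratch\n"
print(parse_rows(sample))
```

In practice the argv would go to subprocess.run(..., capture_output=True, text=True) and its stdout to parse_rows. Note that a password passed this way is visible in the process list.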
SQL
To show all file spaces and their respective storage pool location when backed up:
tsm: CHGI_TSM01> select node_name, filespace_name, physical_mb, stgpool_name FROM occupancy WHERE stgpool_name = 'BACKUPTAPE'
NODE_NAME: TSM
FILESPACE_NAME: /tiered
PHYSICAL_MB: 686244989.70
STGPOOL_NAME: BACKUPTAPE
NODE_NAME: TSM
FILESPACE_NAME: /gpfs
PHYSICAL_MB: 616886529.62
STGPOOL_NAME: BACKUPTAPE
Each entry listed here shows which storage pool a filespace's backups go to. In the example above, /tiered and /gpfs are stored on the BACKUPTAPE storage pool.
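The list-style output above is simple to turn into records for further processing. A small sketch, assuming each record is a run of KEY: value lines and that a repeated key marks the start of the next record (parse_records is a name chosen for this example):

```python
def parse_records(text):
    """Group 'KEY: value' lines into one dict per record."""
    records, current = [], {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key in current:
            # A repeated key means a new record has started.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records

SAMPLE = """\
NODE_NAME: TSM
FILESPACE_NAME: /tiered
PHYSICAL_MB: 686244989.70
STGPOOL_NAME: BACKUPTAPE
NODE_NAME: TSM
FILESPACE_NAME: /gpfs
PHYSICAL_MB: 616886529.62
STGPOOL_NAME: BACKUPTAPE
"""

records = parse_records(SAMPLE)
total_mb = sum(float(r["PHYSICAL_MB"]) for r in records)
print(f"{len(records)} filespaces, {total_mb:.2f} MB in total")
```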
Query
The query command is used to display information about TSM objects.
Documentation available at https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.9/srv.reference/r_cmd_query.html
Tapes
To show all tapes in use (those with the status 'Private'), along with their capacity and usage, run query volume. To show scratch tapes or the status of a particular tape, query the libvolumes table directly with SELECT * FROM libvolumes WHERE volume_name = 'xxx' OR status = 'Scratch'.
tsm: CHGI_TSM01> query volume
Volume Name Storage Pool Name Device Class Name Estimated Capacity Pct Util Volume Status
A00091L7 BACKUPTAPE LTO 127.339 G 98.291 Full
A00105L7 BACKUPTAPE LTO 13.642 T 1.331 Filling
A00106L7 BACKUPTAPE LTO 167.739 G 83.773 Full
A00107L7 BACKUPTAPE LTO 203.538 G 97.105 Full
To show the contents of a particular tape, run query content tapeid.
tsm: CHGI_TSM01> query content A00091L7
Node Name Type Filespace FSID Client's Name for File
Name
--------------- ---- ---------- ---- --------------------------------------
TSM Bkup /gpfs 10 /home/atyndall/.cache/mozilla/firefox/
td76tbw4.default/cache2/entries/929BC
F811537CE5A1B05BC367E7D5FCD9D1512C2
TSM Bkup /gpfs 10 /home/atyndall/.mozilla/firefox/td76tb
w4.default/.parentlock
TSM Bkup /gpfs 10 /home/atyndall/.mozilla/firefox/td76tb
w4.default/SiteSecurityServiceState.t
xt
To see all mounted tapes, run QUERY MOUNT. All drives with a mounted tape will be displayed.
Keep in mind that tapes that have been reclaimed will have a 'Scratch' status and won't show up in the query volume output.
Schedules
To show all configured backup schedules, run query schedule [format=detailed].
tsm: CHGI_TSM01> query schedule
Domain * Schedule Name Action Start Date/Time Duration Period Day
------------ - ---------------- ------ -------------------- -------- ------ ---
SERVERS DAILY_INCR Inc Bk 07/04/2017 18:00:00 1 H 1 D Any
SERVERS MMBACKUP CMD 07/04/2017 02:10:00 1 H 1 D Any
STANDARD DAILY_INCR Inc Bk 07/04/2017 18:00:00 1 H 1 D Any
Side note: The detailed view is the same as the TSM client's output when running query schedule.
Backup Filespace
To show all partitions that are available to be backed up, run query filespace. These are partitions that can potentially be backed up based on schedules and policies, but aren't necessarily backed up.
tsm: CHGI_TSM01> query filespace
Node Name Filespace N FSID Platform Filespace Is Filespa Capacity Pct U
ame Type ce Unicode til
?
--------------- ----------- ---- -------- --------- ---------- ----------- -----
CRICK.CHGI.UCAL / 1 LinuxPPC EXT4 No 879 GB 24.9
GARY.CA 64LE
TSM /tsminst1 1 LinuxPPC EXT4 No 30,109 MB 52.9
64
TSM / 2 LinuxPPC EXT4 No 50,268 MB 74.6
64
TSM /boot 3 LinuxPPC EXT4 No 468 MB 68.6
64
TSM /tsmdb0 4 LinuxPPC EXT4 No 196 GB 45.1
64
TSM /tsmdb1 5 LinuxPPC EXT4 No 196 GB 45.1
64
TSM /tsmdb2 6 LinuxPPC EXT4 No 196 GB 45.1
64
TSM /tsmdb3 7 LinuxPPC EXT4 No 196 GB 45.1
64
TSM /tsmlog 8 LinuxPPC EXT4 No 245 GB 57.0
64
TSM /tsmarchlog 9 LinuxPPC EXT4 No 492 GB 5.0
64
TSM /gpfs 10 LinuxPPC GPFS No 1,292 TB 77.9
64
TSM /tiered 14 LinuxPPC GPFS No 1,400 TB 92.8
64
Scripts
Custom scripts are sets of SQL queries and administrative commands that can be executed using the run command. A list of all scripts can be found with query script. To show the contents of a particular script, run query script name format=raw.
List a particular script:
tsm: CHGI_TSM01>query script tapes
Name Description Managing profile
--------------- -------------------------------------------------- --------------------
TAPES list information about tape usage
Commands of a particular script:
tsm: CHGI_TSM01>query script tapes format=raw
select stgpool_name, count(*) as "Tapes" from volumes -
where devclass_name = 'LTO' group by stgpool_name
select count(*) as "Total Storage Pool Tapes" from volumes -
where devclass_name = 'LTO'
select count(*) as "Total Stg. Pool Tapes in Lib." from volumes -
where devclass_name='LTO' and volume_name in (select volume_name from libvolumes where library_name = 'CHGI_TSM')
select status, count(*) as "Tapes" from libvolumes -
where library_name = 'CHGI_TSM' group by status order by status desc
select count (*) as "Total LTO tapes in library" from libvolumes -
where library_name = 'CHGI_TSM'
issue message i "The following tapes are outside the library"
q media * stg=* wherestate=mountablenotinlib
Invoke the script with run scriptname:
tsm: CHGI_TSM01>run tapes
STGPOOL_NAME Tapes
-------------------------------- ------------
BACKUPTAPE 273
Total Storage Pool Tapes
-------------------------
273
Total Stg. Pool Tapes in Lib.
------------------------------
272
STATUS Tapes
----------------- ------------
Scratch 82
Private 283
Total LTO tapes in library
---------------------------
365
ANR1496I The following tapes are outside the library
Volume N State Location Automated LibNa
ame me
-------- -------------------------- ------------------- ---------------
A00121L7 Mountable not in library ?
ANR1462I RUN: Command script TAPES completed successfully.
Event
Use query event to show recent schedule events. This is useful for checking whether backups are running successfully.
tsm: CHGI_TSM01>query event * * begindate=-1 enddate=today format=detailed
Policy Domain Name Schedule Name Node Name Scheduled Start Actual Start Completed Status Result Reason
------------------------------ ------------------------------ -------------------- -------------------- -------------------- -------------------- --------------- --------------- ---------------------
SERVERS MMBACKUP TSM 02/04/2020 02:10:00 02/04/2020 02:10:06 02/04/2020 05:29:10 Completed 0 All operations comple
ted successfully.
SERVERS DAILY_INCR TSM 02/04/2020 18:00:00 02/04/2020 18:00:07 02/04/2020 18:12:29 Completed 4 The operation complet
ed successfully, but
some files were not
processed.
SERVERS MMBACKUP TSM 02/05/2020 02:10:00 02/05/2020 02:10:08 02/05/2020 05:32:30 Completed 0 All operations comple
ted successfully.
SERVERS DAILY_INCR TSM 02/05/2020 18:00:00 Future
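The Result column is the client's return code. Codes 0 and 4 appear in the output above. As far as I know, 8 and 12 are the other documented values, but check the IBM documentation for the exact wording. The sketch below flags events that deserve a closer look. The function name and the event tuples are my own, transcribed by hand from the table above.

```python
RESULT_CODES = {
    0: "All operations completed successfully.",
    4: "The operation completed successfully, but some files were not processed.",
    8: "The operation completed with at least one warning message.",   # from memory
    12: "The operation completed with at least one error message.",    # from memory
}

def needs_attention(status, result):
    """Flag events that failed, were missed, or ended with a non-zero result."""
    if status == "Future":
        return False  # not due yet
    return status != "Completed" or result != 0

# (schedule name, status, result), copied from the query event output above.
EVENTS = [
    ("MMBACKUP", "Completed", 0),
    ("DAILY_INCR", "Completed", 4),
    ("MMBACKUP", "Completed", 0),
    ("DAILY_INCR", "Future", None),
]

flagged = [name for name, status, result in EVENTS if needs_attention(status, result)]
print(flagged)
```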
Process
query process shows all running tasks.
tsm: CHGI_TSM01>query process
Process Process Description Process Status
Number
-------- -------------------- -------------------------------------------------
360 MOVE MEDIA ANR8767I Number of volumes processed: 0. Volumes
sent to library CHGI_TSM for checkout: 1.
361 Space Reclamation Volume C00266L7 (storage pool BACKUPTAPE), Moved
Files: 0, Moved Bytes: 0 bytes, Deduplicated Byt
es: 0 bytes, Unreadable Files: 0, Unreadable Byt
es: 0 bytes. Current Physical File (bytes): 10,0
01 MB Current output volume(s): C00219L7.
Reclaim Stgpool
Use reclaim stgpool poolname to initiate a reclamation process. Tapes containing deleted files or backups that are out of retention can be consolidated (or defragged) onto another tape to minimize wasted tape space.
tsm: CHGI_TSM01>reclaim stgpool backuptape
ANR3638W Space reclamation skipped volume A00140L7 because the spanned volume A00138L7 in storage pool BACKUPTAPE is inaccessible.
ANR2110I RECLAIM STGPOOL started as process 361.
ANR4930I Reclamation process 361 started for primary storage pool BACKUPTAPE manually, threshold=60, duration=None.
ANS8003I Process number 361 started.
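The threshold=60 in the output above is the reclamation threshold: a volume becomes a candidate once its percentage of reclaimable space reaches that value. A simplified sketch of that selection follows. The volume names and numbers are made up, and skipping every unavailable volume is my own simplification of what the server actually does.

```python
def reclaim_candidates(volumes, threshold=60):
    """Return the names of volumes whose reclaimable space meets the threshold."""
    return [
        name
        for name, pct_reclaimable, access in volumes
        if pct_reclaimable >= threshold and access != "Unavailable"
    ]

# Hypothetical volumes: (name, Pct. Reclaimable Space, Access)
VOLUMES = [
    ("VOL001", 99.7, "Read/Write"),
    ("VOL002", 2.4, "Read/Write"),
    ("VOL003", 75.0, "Unavailable"),
]

print(reclaim_candidates(VOLUMES))
```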
Adding or Removing Tape Volumes
If you wish to remove a tape that contains data, you can either delete the contents or move the contents elsewhere.
To retain any data on a volume you wish to remove, use the MOVE DATA command:
tsm: CHGI_TSM01>update vol C00173L7 access=readonly
ANR2207I Volume C00173L7 updated.
tsm: CHGI_TSM01>move data C00173L7
ANR2232W This command will move all of the data stored on volume C00173L7 to other volumes within the same storage pool; the data will be inaccessible to users until the operation completes.
Do you wish to proceed? (Yes (Y)/No (N)) y
ANS8003I Process number 39 started.
To delete any data on a volume you wish to remove, pass in the discarddata=yes argument when deleting the volume.
tsm: CHGI_TSM01>delete vol C00251L7 discarddata=yes
ANR2221W This command will result in the deletion of all inventory references to the data on volume C00251L7, thereby rendering the data unrecoverable.
If the volume being deleted contains deduplicated data, the server invalidates all files that reside in the storage pool that are dependent upon the data stored on this volume. Files on other volumes might be marked as damaged and result in warning messages when that data is accessed.
Do you wish to proceed? (Yes (Y)/No (N)) y
ANR2222I Discard Data process started for volume C00251L7 (process ID 37).
ANS8003I Process number 37 started.
Once the volume has been deleted, you can move it to the IO port:
tsm: CHGI_TSM01>move media C00173L7 stgpool=backuptape remove=yes
ANR0609I MOVE MEDIA started as process 41.
Moving Tape
The move media tape-id command moves a tape cartridge in the tape library.
To physically move a tape cartridge to the IO slot, pass in remove=yes.
tsm: CHGI_TSM01>move media C00173L7 stgpool=backuptape remove=yes
ANR0609I MOVE MEDIA started as process 41.
Be aware that an invalid tape-id will not throw an error on execution. Check the "completed tasks" page on TSM to see the actual status.
See Also: https://www.ibm.com/support/knowledgecenter/SSGSG7_7.1.5/srv.reference/r_cmd_media_move.html
Library Audit
A full library audit can be triggered with the AUDIT LIBRARY library-name checklabel=barcode command.
However, this only works when the library isn't being used by any other process. If any drive is in use by TSM, the job will wait until its tape is dismounted.
On successful completion of the job, you should see activity similar to:
Feb 19, 2020, 1:43:16 PM ANR0984I Process 47 for AUDIT LIBRARY started in the BACKGROUND at 01:43:16 PM. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:43:16 PM ANR8457I AUDIT LIBRARY: Operation for library CHGI_TSM started as process 47. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:44:00 PM ANR8788W Unable to read the barcode of cartridge in slot-id 1258 in library CHGI_TSM; loading in drive to read label. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:44:01 PM ANR8788W Unable to read the barcode of cartridge in slot-id 1275 in library CHGI_TSM; loading in drive to read label. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:44:12 PM ANR8455E Volume C00234L7 could not be located during audit of library CHGI_TSM. Volume has been removed from the library inventory. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:44:12 PM ANR8455E Volume C00251L7 could not be located during audit of library CHGI_TSM. Volume has been removed from the library inventory. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:44:12 PM ANR8461I AUDIT LIBRARY process for library CHGI_TSM completed successfully. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:44:12 PM ANR0985I Process 47 for AUDIT LIBRARY running in the BACKGROUND completed with completion state SUCCESS at 01:44:12 PM. (SESSION: 7115, PROCESS: 47)
Missing volumes that are still in the tape library can be moved out and then re-imported.
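To get a list of the volumes the audit dropped from the inventory, the activity log can be scanned for ANR8455E messages. A small sketch, with the log lines copied from the output above (the function name is just for this example):

```python
import re

# ANR8455E reports a volume that the audit could not find in the library.
MISSING = re.compile(r"ANR8455E Volume (\S+) could not be located")

def missing_volumes(log_text):
    """Return the volume names reported as missing by AUDIT LIBRARY."""
    return MISSING.findall(log_text)

LOG = """\
Feb 19, 2020, 1:44:01 PM ANR8788W Unable to read the barcode of cartridge in slot-id 1275 in library CHGI_TSM; loading in drive to read label. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:44:12 PM ANR8455E Volume C00234L7 could not be located during audit of library CHGI_TSM. Volume has been removed from the library inventory. (SESSION: 7115, PROCESS: 47)
Feb 19, 2020, 1:44:12 PM ANR8455E Volume C00251L7 could not be located during audit of library CHGI_TSM. Volume has been removed from the library inventory. (SESSION: 7115, PROCESS: 47)
"""

print(missing_volumes(LOG))
```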
Tape Audit
A specific tape can be verified using the AUDIT VOLUME tape-id command.
tsm: CHGI_TSM01>audit vol c00011l7
ANR2310W This command will compare all inventory references to volume C00011L7 with the actual data stored on the volume and will report any discrepancies; the data will be inaccessible to users until the operat
ion completes.
Do you wish to proceed? (Yes (Y)/No (N)) y
tsm: CHGI_TSM01>query proc
Process Process Description Process Status
Number
-------- -------------------- -------------------------------------------------
52 Audit Volume (Inspec Volume C00011L7 (storage pool BACKUPTAPE), Files
t Only) Processed: 0, Damaged Files Found: 0, Partial Fi
les Skipped: 0. Current Physical File (bytes): 5
89,289,663 Waiting for mount of input volume C00
011L7 (1 seconds).
Option Set and Client Options
Backups can include or exclude specific files based on an Option Set and its associated Client Options.
Use the TSM Administrative CLI to define a new Option Set (define cloptset) and new Client Options (define clientopt optionset option value [seq = N]). Use the INCLEXCL option to define which files to include or exclude. The ordering of these rules can be changed by specifying a sequence number.
An example Option Set and its list of options.
tsm: CHGI_TSM01>query cloptset
Optionset Description Last Update by Managing profile Replica Option
(administrator) Set
------------------------- ------------------------- --------------- -------------------- ---------------
CRICK JOERG No
Option Sequence Use Option Option Value
number Set Value
(FORCE)
------------------------- -------- ---------- ------------------------------------------------------------
DOMAIN 0 No all-local -/gpfs
INCLEXCL 0 No Exclude /unix/
INCLEXCL 1 No Exclude /.../core
INCLEXCL 2 No Exclude.dir /unix/
INCLEXCL 3 No Include /gpfs/forever/.../* INDEFINITE
RESOURCEUTILIZATION 0 No 5
SCHEDMODE 0 No prompted
TXNBYTELIMIT 0 No 2G
The INCLEXCL directive defines which files to include or exclude. To exclude all files under any directory named 'temp' on Windows, we could use the following client option. Note the *:\ to match any drive, ...\ to match any number of directories, and * to match any file.
tsm: CHGI_TSM01> define clientopt WINDOWS inclexcl "Exclude *:\...\temp\...\*"
ANR2050I DEFINE CLIENTOPT: Option INCLEXCL defined in optionset WINDOWS.
To exclude a specific directory on Linux, we define a rule similar to the one above, but without a drive letter and with forward slashes instead. Similarly, .../ matches any number of directories and * matches any file.
tsm: CHGI_TSM01> define clientopt TSM_IX inclexcl "Exclude /gpfs/home/sbagheri/Human_genome_mapping/.../*"
ANR2050I DEFINE CLIENTOPT: Option INCLEXCL defined in optionset TSM_IX.
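When adding many rules, it can help to generate the define clientopt commands rather than type them. A minimal sketch that reproduces the two commands shown above follows. The helper name define_inclexcl is hypothetical and the list of accepted modes is only the ones used on this page.

```python
ALLOWED_MODES = ("Include", "Exclude", "Exclude.dir", "Exclude.fs")

def define_inclexcl(optionset, mode, pattern):
    """Build a 'define clientopt ... inclexcl' admin command string."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    if '"' in pattern:
        # The whole option value is wrapped in double quotes.
        raise ValueError("pattern must not contain a double quote")
    return f'define clientopt {optionset} inclexcl "{mode} {pattern}"'

print(define_inclexcl("WINDOWS", "Exclude", r"*:\...\temp\...\*"))
print(define_inclexcl(
    "TSM_IX", "Exclude", "/gpfs/home/sbagheri/Human_genome_mapping/.../*"
))
```

The resulting strings can be pasted at the tsm: CHGI_TSM01> prompt.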
To remove a rule, use DELETE clientopt name inclexcl seq=sequence-number.
tsm: CHGI_TSM01> query cloptset TSM_IX
Option Sequence Use Option Option Value
number Set Value
(FORCE)
------------------------- -------- ---------- ------------------------------------------------------------
...
INCLEXCL 58 No Exclude /gpfs/home/sbagheri/Human_genome_mapping/.../*'
...
tsm: CHGI_TSM01> delete clientopt TSM_IX INCLEXCL SEQ=58
ANR2053I DELETE CLIENTOPT: Option INCLEXCL, sequence number 58, has been deleted from optionset TSM_IX.
See Also:
- https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.9/srv.reference/r_cmd_cloptset_define.html
- https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.9/srv.reference/r_cmd_clientopt_define.html
- Inclexcl matching help - https://kb.wisc.edu/helpdesk/page.php?id=25684
TSM System
Automated Backups
At CHGI, there are two scheduled tasks defined:
- Execution of /usr/local/bin/mmbackup.ksh at 2:10AM, daily.
- An incremental backup at 6:00PM, daily.
It looks like /usr/local/bin/mmbackup.ksh runs mmbackup, which renders the client options into a policy file at /var/mmfs/mmbackup/.mmbackupRules.gpfs0 and then executes mmapplypolicy, which uses this policy file to do the actual backups. A pstree of the entire command execution looks like:
mmbackup.ksh(77050)───mmbackup(77054)───tsbackup33(77192)───sh(77314)───mmapplypolicy(77315)───tsapolicy(77387)─┬─tsapolicy(77394)─┬─{tsapolicy}(77396)
...
│ └─{tsapolicy}(77605)
├─{tsapolicy}(77395)
...
tsapolicy works out which files need to be backed up and writes them to a file list, which is then passed to dsmc selective -filelist=/gpfs/.mmbackupCfg/mmbackupChanged..... -servername=CHGI_TSM01 -verbose -subdir=no, which does the actual backups. It looks like file statuses are saved on the DB2 server, so near the end of the process, when file states need to be updated, the database server will be extremely busy.
I'm unsure if the incremental backup at 6PM actually does anything then, if backups are done daily using the script...
An example client option set and the generated backup policy is provided below.
tsm> query inclexcl
Session established with server CHGI_TSM01: Linux/ppc64
Server Version 7, Release 1, Level 7.200
Server date/time: 02/05/2020 11:19:28 Last access: 02/05/2020 05:32:30
*** FILE INCLUDE/EXCLUDE ***
Mode Function Pattern (match from top down) Source File
---- --------- ------------------------------ -----------------
No exclude filespace statements defined.
Excl Directory /.../.TsmCacheDir TSM
Exclude All /gpfs/cbousman/.../* Server
Exclude All /.../.snapshots/.../* Server
Exclude All /tiered/vetmed_data/.../* Server
Exclude All /tiered/snyder_data/.../* Server
Exclude All /tiered/mtgraovac/.../* Server
Exclude All /tiered/kkurek/.../* Server
Exclude All /tiered/danderson/.../* Server
Exclude All /tiered/achri_data/.../* Server
Exclude All /tiered/ewang/.../* Server
Exclude All /tiered/morph/.../* Server
Exclude All /tiered/ewang_scratch/.../* Server
Exclude All /gpfs/gallo/.../* Server
Exclude All /gpfs/charb_data/.../* Server
Exclude All /gpfs/vetmed_stage/.../* Server
Exclude All /gpfs/vetmed_data/.../* Server
Exclude All /gpfs/snyder_work/.../* Server
Exclude All /gpfs/ebg_work/.../* Server
Exclude All /gpfs/qlong/.../* Server
Exclude All /.../tmp/.../* Server
Exclude All /.../core Server
Exclude All /tsmarchlog/.../* Server
Exclude All /tsmlog/.../* Server
Exclude All /tsmdb*/.../* Server
Exclude All /.../*.dsm Server
Exclude All /.../*.dbv Server
Include All /gpfs/forever/.../* Server
No DFS include/exclude statements defined.
Database Information
TSM requires DB2. At CHGI, the database files are stored in /tsmdb0, /tsmdb1, /tsmdb2, /tsmdb3, and /tsmlog, each about 200GB in size. The database space can be listed in the TSM command line by running:
tsm: CHGI_TSM01>q db
Database Name Total Pages Usable Pages Used Pages Free Pages
-------------- ------------ ------------ ------------ ------------
TSMDB1 14,806,035 14,802,931 14,023,207 779,724
tsm: CHGI_TSM01> query dbspace
Location Total Space of Used Space on F Free Space(MB)
File System (MB ile System (MB)
)
------------------------------ --------------- --------------- ---------------
/tsmdb0 201,459.91 90,837.06 110,622.85
/tsmdb1 201,459.91 90,836.96 110,622.95
/tsmdb2 201,459.91 90,837.04 110,622.87
/tsmdb3 201,459.91 90,837.00 110,622.91
tsm: CHGI_TSM01>q log
Total Space(MB) Used Space(MB) Free Space(MB)
--------------- --------------- ---------------
131,072 1,119 129,953
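The outputs above give raw page and MB counts, so utilization has to be worked out by hand. A quick sketch of the arithmetic, using the figures from q db and q log (the helper names are my own):

```python
def to_number(text):
    """Convert a figure such as '14,023,207' to a float."""
    return float(text.replace(",", ""))

def pct_used(used, total):
    """Percentage of total that is used."""
    return 100.0 * to_number(used) / to_number(total)

# Used Pages / Usable Pages from "q db"
db_pct = pct_used("14,023,207", "14,802,931")
# Used Space(MB) / Total Space(MB) from "q log"
log_pct = pct_used("1,119", "131,072")

print(f"database: {db_pct:.2f}% used, log: {log_pct:.2f}% used")
```

On these numbers the database pages are roughly 95% used, while the log is under 1% used.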
Services
There are two services on the TSM server:
[root@tsm ~]# systemctl
To stop the TSM server, run:
[root@tsm ~]# systemctl stop dsmcad
[root@tsm ~]# systemctl stop tsminst1
It doesn't look like the DB2 server has its own systemd service.
Troubleshooting
To help troubleshoot any issues, check out the logs at:
/opt/tivoli/tsm/client/ba/bin/dsmerror.log
/opt/tivoli/tsm/client/ba/bin/dsminstr.log
/opt/tivoli/tsm/client/ba/bin/dsmsched.log
Logs for backups that are triggered with /usr/local/bin/mmbackup.ksh are stored in /var/log/tsm.
For tape library issues, you can also check the following logs, which contain the commands sent to the tape library.
/var/log/lin_tape.trace
/var/log/lin_tape.errorlog
Tape Issue
From the TSM web GUI, looking at events under the TSM server, I saw the following:
Feb 11, 2020, 12:28:28 AM ANR8944E Hardware or media error on drive DRIVE4 (/dev/IBMtape4) with volume C00013L7(OP=READ, Error Number= 5, CC=0, KEY=03, ASC=11, ASCQ=00, SENSE=F0.00.03.00.04.00.00.58.00.00.00.00.11.00.36.00.70.60.1F.47.04.01.43.30.30.30.31.33.4C.0A.00.00.83.40.CE.00.00.01.05.CE.80.08.60.4C.37.00.3D.BB.A6.00.01.90.CC.70.60.1F.47.70.60.1F.47.70.60.1F.47.70.60.00.01.15.60.03.E0.00.50.8C.00.B8.04.D1.00.00.3F.27.54.01.4D.48.31.52.58.31.55.57.44.36, Description=An undetermined error has occurred). Refer to the IBM Tivoli Storage Manager documentation on I/O error code descriptions. (PROCESS: 351)
Feb 11, 2020, 12:28:28 AM ANR8359E Media fault detected on LTO volume C00013L7 in drive DRIVE4 (/dev/IBMtape4) of library CHGI_TSM. (PROCESS: 351)
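The KEY, ASC and ASCQ fields in ANR8944E are standard SCSI sense data. In the SCSI tables, sense key 03 is MEDIUM ERROR and ASC/ASCQ 11/00 is UNRECOVERED READ ERROR, which agrees with the media fault reported in ANR8359E. A small decoding sketch is below. The lookup tables hold only a couple of entries and the function name is made up for this example.

```python
import re

# Partial lookup tables; the SCSI standard defines many more values.
SENSE_KEYS = {"03": "MEDIUM ERROR", "04": "HARDWARE ERROR"}
ASC_ASCQ = {("11", "00"): "UNRECOVERED READ ERROR"}

FIELDS = re.compile(r"KEY=([0-9A-F]{2}), ASC=([0-9A-F]{2}), ASCQ=([0-9A-F]{2})")

def decode_sense(message):
    """Return (sense key text, ASC/ASCQ text) from an ANR8944E message, or None."""
    match = FIELDS.search(message)
    if match is None:
        return None
    key, asc, ascq = match.groups()
    return (
        SENSE_KEYS.get(key, "unknown sense key"),
        ASC_ASCQ.get((asc, ascq), "unknown ASC/ASCQ"),
    )

# Excerpt of the ANR8944E message above; the SENSE bytes are left out.
MSG = (
    "ANR8944E Hardware or media error on drive DRIVE4 (/dev/IBMtape4) with "
    "volume C00013L7(OP=READ, Error Number= 5, CC=0, KEY=03, ASC=11, ASCQ=00"
)

print(decode_sense(MSG))
```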
The tape library also logged an error with this cartridge:
Type: Cartridge
Location: Frame 1, Column 4, Row 3
State:
Time: Mon, Feb 10, 2020 11:11:27 PM MST
User: Service
Description: Cartridge C00013L7 could not be read from because of faulty media or a faulty drive.
Error Code: 0005
The volume was automatically set to 'unavailable' by TSM:
tsm: CHGI_TSM01>query vol C00013L7 format=detailed
Volume Name: C00013L7
Storage Pool Name: BACKUPTAPE
Device Class Name: LTO
Estimated Capacity: 7.2 T
Scaled Capacity Applied:
Pct Util: 0.3
Volume Status: Full
Access: Unavailable
Pct. Reclaimable Space: 99.7
Scratch Volume?: Yes
In Error State?: No
Number of Writable Sides: 1
Number of Times Mounted: 66
Write Pass Number: 1
Approx. Date Last Written: 08/04/2019 02:53:45
Approx. Date Last Read: 02/11/2020 00:28:28
Date Became Pending:
Number of Write Errors: 0
Number of Read Errors: 3
Volume Location:
Volume is MVS Lanfree Capable : No
Last Update by (administrator):
Last Update Date/Time: 08/02/2019 12:36:02
Begin Reclaim Period:
End Reclaim Period:
Drive Encryption Key Manager: IBM Tivoli Storage Manager
Logical Block Protected: No
I set the access mode to readonly, then initiated another reclamation with reclaim stgpool backuptape. The tape library reported a few errors from the tape drive and the access state went back to "Unavailable" on TSM.
I moved the tape to the IO port with:
tsm: CHGI_TSM01>move media c00013l7 stgpool=backuptape remove=yes
ANR0609I MOVE MEDIA started as process 360.
It turns out that there are many other tape cartridges that are set as unavailable.
tsm: CHGI_TSM01>query vol access=unavailable
Session established with server CHGI_TSM01: Linux/ppc64
Server Version 7, Release 1, Level 7.200
Server date/time: 02/13/2020 10:33:06 Last access: 02/13/2020 10:27:40
Volume Name Storage Poo Device Cla Estimated Pct U Volume S
l Name ss Name Capacity til tatus
------------------------ ----------- ---------- --------- ----- --------
A00110L7 BACKUPTAPE LTO 136.5 G 0.0 Full
A00112L7 BACKUPTAPE LTO 129.4 G 0.3 Full
A00118L7 BACKUPTAPE LTO 104.8 G 0.0 Full
A00122L7 BACKUPTAPE LTO 161.9 G 1.0 Full
A00127L7 BACKUPTAPE LTO 13.6 T 0.0 Filling
A00130L7 BACKUPTAPE LTO 143.8 G 78.8 Full
A00137L7 BACKUPTAPE LTO 122.3 G 70.1 Full
A00138L7 BACKUPTAPE LTO 139.3 G 74.0 Full
C00013L7 BACKUPTAPE LTO 7.2 T 0.1 Full
C00017L7 BACKUPTAPE LTO 13.6 T 0.2 Filling
C00018L7 BACKUPTAPE LTO 13.6 T 9.4 Filling
C00022L7 BACKUPTAPE LTO 13.6 T 0.2 Filling
C00049L7 BACKUPTAPE LTO 12.8 T 17.9 Full
C00061L7 BACKUPTAPE LTO 9.7 T 26.7 Full
C00064L7 BACKUPTAPE LTO 13.6 T 0.1 Filling
C00156L7 BACKUPTAPE LTO 6.8 T 1.0 Full
C00158L7 BACKUPTAPE LTO 7.0 T 0.0 Full
C00168L7 BACKUPTAPE LTO 11.2 T 1.6 Full
C00173L7 BACKUPTAPE LTO 12.1 T 0.5 Full
C00175L7 BACKUPTAPE LTO 13.6 T 5.6 Filling
C00186L7 BACKUPTAPE LTO 8.7 T 13.6 Full
C00193L7 BACKUPTAPE LTO 13.6 T 4.5 Filling
C00251L7 BACKUPTAPE LTO 5.7 T 0.0 Full
C00264L7 BACKUPTAPE LTO 7.5 T 21.7 Full
You can see the number of read and write errors per cartridge by dumping a detailed list into a file.
tsm: CHGI_TSM01>query vol format=detailed > b
Output of command redirected to file 'b'
## View the file 'b'
Volume Name Storage Poo Device Cla Estimated Scaled C Pct U Volume S Access Pct. Reclai Scratch Vo In Error S Number o Number Write Approx. Da Approx. Da Date Becam Number Number Volume L Volume is Last Update by Last Updat Begin Recl End Reclai Drive Encryption Key M Logical Block Protected
l Name ss Name Capacity apacity til tatus mable Space lume? tate? f Writab of Time Pass N te Last Wr te Last Re e Pending of Wr of Re ocation MVS Lanfre (administrator) e Date/Tim aim Period m Period anager
Applied le Sides s Mount umber itten ad ite Er ad Err e Capable e
ed rors ors
------------------------ ----------- ---------- --------- -------- ----- -------- ----------- ----------- ---------- ---------- -------- ------- ------ ---------- ---------- ---------- ------ ------ -------- ---------- --------------- ---------- ---------- ---------- ---------------------- -----------------------
A00091L7 BACKUPTAPE LTO 127.3 G 98.3 Full Read/Write 3.0 Yes No 1 1 1 11/14/2019 11/14/2019 0 0 No 11/14/2019 IBM Tivoli Storage Man No
10:11:57 09:53:52 09:51:39 ager
A00105L7 BACKUPTAPE LTO 13.6 T 1.3 Filling Read-Only 0.1 Yes Yes 1 2 1 08/17/2019 08/14/2019 1 0 No 08/14/2019 IBM Tivoli Storage Man No
20:56:27 21:09:29 21:09:21 ager
A00106L7 BACKUPTAPE LTO 167.7 G 83.8 Full Read/Write 16.6 Yes No 1 3 1 06/01/2019 06/14/2019 0 0 No 06/01/2019 IBM Tivoli Storage Man No
04:12:36 19:18:53 03:47:22 ager
A00107L7 BACKUPTAPE LTO 203.5 G 97.1 Full Read/Write 3.3 Yes No 1 4 1 06/01/2019 08/14/2019 0 0 No 06/01/2019 IBM Tivoli Storage Man No
04:14:58 20:21:55 03:47:49 ager
A00108L7 BACKUPTAPE LTO 193.3 G 68.7 Full Read/Write 32.0 Yes No 1 16 1 06/01/2019 08/25/2019 0 0 No 06/01/2019 IBM Tivoli Storage Man No
Reclamation Issues
When triggering a reclamation, I got the following messages:
tsm: CHGI_TSM01>reclaim stgpool backuptape
ANR3638W Space reclamation skipped volume C00266L7 because the spanned volume C00285L7 in storage pool BACKUPTAPE is inaccessible.
ANR3638W Space reclamation skipped volume C00234L7 because the spanned volume C00231L7 in storage pool BACKUPTAPE is inaccessible.
ANR3638W Space reclamation skipped volume A00144L7 because the spanned volume A00140L7 in storage pool BACKUPTAPE is inaccessible.
ANR2110I RECLAIM STGPOOL started as process 359.
ANR4930I Reclamation process 359 started for primary storage pool BACKUPTAPE manually, threshold=60, duration=None.
ANS8003I Process number 359 started.
The tape's status is set to unavailable for some reason:
tsm: CHGI_TSM01>query vol c00285l7 format=detailed
Volume Name: C00285L7
Storage Pool Name: BACKUPTAPE
Device Class Name: LTO
Estimated Capacity: 13.6 T
Scaled Capacity Applied:
Pct Util: 2.3
Volume Status: Filling
Access: Unavailable
Pct. Reclaimable Space: 2.4
Scratch Volume?: Yes
In Error State?: Yes
Number of Writable Sides: 1
Number of Times Mounted: 1
Write Pass Number: 1
Approx. Date Last Written: 10/02/2019 06:29:24
Approx. Date Last Read: 10/01/2019 19:53:09
Date Became Pending:
Number of Write Errors: 1
Number of Read Errors: 0
Volume Location:
Volume is MVS Lanfree Capable : No
Last Update by (administrator):
Last Update Date/Time: 10/01/2019 19:49:01
Begin Reclaim Period:
End Reclaim Period:
Drive Encryption Key Manager: IBM Tivoli Storage Manager
Logical Block Protected: No
I think these tapes may be bad, or the drive had trouble with them (the volume above records one write error and no read errors). It may be worth retrying the reclamation after setting the access to read-only using UPDATE VOL tapeid access=readonly.
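When many volumes are skipped, picking the blocking tapes out of the reclaim output by hand gets tedious. The sketch below pulls the spanned volume names out of the ANR3638W lines and prints one UPDATE VOL command per tape for review. The function names are made up for illustration, and the message format is assumed to match the output shown above.

```python
import re

# Matches the ANR3638W warning as it appears in the reclaim output above.
# Group 1 is the skipped volume, group 2 is the inaccessible spanned volume.
ANR3638W = re.compile(
    r"ANR3638W Space reclamation skipped volume (\S+) "
    r"because the spanned volume (\S+) in storage pool \S+ is inaccessible"
)


def inaccessible_volumes(output):
    """Return the spanned volumes that blocked reclamation, in first-seen order."""
    seen = []
    for match in ANR3638W.finditer(output):
        spanned = match.group(2)
        if spanned not in seen:
            seen.append(spanned)
    return seen


def suggested_commands(output):
    """Build one admin command per blocking volume (to review, not to run blindly)."""
    return [f"update vol {vol} access=readonly" for vol in inaccessible_volumes(output)]


# Sample input copied from the reclaim output above.
sample = (
    "ANR3638W Space reclamation skipped volume C00266L7 because the spanned "
    "volume C00285L7 in storage pool BACKUPTAPE is inaccessible.\n"
    "ANR3638W Space reclamation skipped volume C00234L7 because the spanned "
    "volume C00231L7 in storage pool BACKUPTAPE is inaccessible.\n"
)

for command in suggested_commands(sample):
    print(command)
```

The printed commands are meant to be pasted into the admin console one at a time after checking each volume with query vol.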
Move Data Failure with ANR1880W
Mar 16, 2020, 1:46:58 PM ANR0984I Process 97 for MOVE DATA started in the BACKGROUND at 01:46:57 PM. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 1:46:58 PM ANR1140I Move data process started for volume C00222L7 (process ID 97). (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 1:46:58 PM ANR1176I Moving data for collocation set 1 of 1 on volume C00222L7. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:06:57 PM ANR0513I Process 97 opened output volume C00268L7. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:08:11 PM ANR8337I LTO volume C00174L7 mounted in drive DRIVE4 (/dev/IBMtape3). (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:08:12 PM ANR0512I Process 97 opened input volume C00174L7. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:09:42 PM ANR0515I Process 97 closed volume C00174L7. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:10:24 PM ANR8468I LTO volume C00174L7 dismounted from drive DRIVE4 (/dev/IBMtape3) in library CHGI_TSM. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:10:54 PM ANR8337I LTO volume C00222L7 mounted in drive DRIVE4 (/dev/IBMtape3). (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:10:55 PM ANR0512I Process 97 opened input volume C00222L7. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR1880W Server transaction was canceled because of a conflicting lock on table AF_SEGMENTS. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR0106E afrtrv.c(1277): Unexpected error 1014 fetching row in table "AF.Bitfiles". (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR1156W Move data process terminated for volume C00222L7 - internal server error detected. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR9999D Thread<177307> issued message 1156 from: (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR9999D Thread<177307> 0x00000010c142bc *UNKNOWN* (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR9999D Thread<177307> 0x00000010298768 *UNKNOWN* (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR9999D Thread<177307> 0x000000103bddc4 *UNKNOWN* (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR9999D Thread<177307> 0x00000010cd7b10 *UNKNOWN* (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR9999D Thread<177307> 0x003fffb7dfc93c *UNKNOWN* (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR9999D Thread<177307> 0x003fffb2c17a3c *UNKNOWN* (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR0515I Process 97 closed volume C00222L7. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR0515I Process 97 closed volume C00268L7. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR0986I Process 97 for MOVE DATA running in the BACKGROUND processed 1 items for a total of 10,487,247,280 bytes with a completion state of FAILURE at 05:11:03 PM. (SESSION: 52276, PROCESS: 97)
Mar 16, 2020, 5:11:03 PM ANR1893E Process 97 for MOVE DATA completed with a completion state of FAILURE. (SESSION: 52276, PROCESS: 97)
The relevant error here appears to be ANR1880W (Server transaction was canceled because of a conflicting lock on table AF_SEGMENTS). Try running the MOVE DATA command again.
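To tell lock-conflict failures apart from other MOVE DATA failures in a long activity log, something like the following sketch may help. It assumes log lines in the same format as the excerpt above (message text followed by "(SESSION: n, PROCESS: n)"). The function name is hypothetical.

```python
import re

# Process number as it appears at the end of each activity log line above.
PROCESS = re.compile(r"PROCESS: (\d+)\)")
# Volume named in the "move data process terminated" warning.
TERMINATED = re.compile(r"ANR1156W Move data process terminated for volume (\S+)")


def lock_conflict_failures(log_lines):
    """Map process number -> volume for MOVE DATA failures that follow an ANR1880W."""
    lock_seen = set()  # processes that already logged a lock conflict
    failures = {}
    for line in log_lines:
        proc = PROCESS.search(line)
        if not proc:
            continue
        pid = proc.group(1)
        if "ANR1880W" in line:
            lock_seen.add(pid)
        terminated = TERMINATED.search(line)
        if terminated and pid in lock_seen:
            failures[pid] = terminated.group(1)
    return failures


# Two lines copied from the log above.
log = [
    "Mar 16, 2020, 5:11:03 PM ANR1880W Server transaction was canceled because of "
    "a conflicting lock on table AF_SEGMENTS. (SESSION: 52276, PROCESS: 97)",
    "Mar 16, 2020, 5:11:03 PM ANR1156W Move data process terminated for volume "
    "C00222L7 - internal server error detected. (SESSION: 52276, PROCESS: 97)",
]

for pid, volume in lock_conflict_failures(log).items():
    print(f"process {pid}: retry with 'move data {volume}'")
```

Failures that are not preceded by ANR1880W for the same process are deliberately left out, since a plain retry may not be the right response for them.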
TSM Operations Center 404 Error /oc
If the Operations Center isn't loading at https://hostname:11090/oc, try restarting the GUI server.
# /etc/init.d/opscenter.rc restart
Stopping server guiServer.
Server guiServer stopped.
Starting server guiServer.
Server guiServer started with process ID 61013.
Backup of Snapshot Fails: New file list is missing or empty.
When using mmbackup with the -S option, I got this:
--------------------------------------------------------
mmbackup: Backup of /gpfs begins at Thu Jun 11 14:16:03 MDT 2020.
--------------------------------------------------------
DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/.snapshots/tsm-2020-06-11" -g /gpfs/.mmbackupCfg -N tsm-ib -S 'tsm-2020-06-11' --qos maintenance -P /var/mmfs/mmbackup/.mmbackupRules.gpfs0 -I prepare -f /gpfs/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5%
Thu Jun 11 14:16:07 2020 mmbackup:Scanning file system gpfs0
Thu Jun 11 15:58:55 2020 mmbackup:New list file /gpfs/.mmbackupCfg/prepFiles/list.mmbackup.1.CHGI_TSM01 is missing or empty.
Do the TSM include/exclude rules exclude all contents of /gpfs for TSM server CHGI_TSM01 ?
Thu Jun 11 15:58:55 2020 mmbackup:No changed or deleted files for gpfs0 since mmbackup was last invoked.
Thu Jun 11 15:58:55 2020 mmbackup:Incremental backup completely failed.
TSM had 0 severe errors and returned 0. See the TSM log file for more information.
0 files had errors,
TSM exit status: exit 12
----------------------------------------------------------
mmbackup: Backup of /gpfs completed with errors at Thu Jun 11 15:58:56 MDT 2020.
----------------------------------------------------------
mmbackup: Command failed. Examine previous error messages to determine cause.
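I noticed that the inclexcl rules also excluded all '.snapshots' paths. Since mmbackup -S scans the snapshot under /gpfs/.snapshots/tsm-2020-06-11 (see the mmapplypolicy line above), these rules would match every file, which would explain the empty list: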
INCLEXCL 56 No exclude /.../.snapshots/.../*
INCLEXCL 59 No Exclude.dir /.../.snapshots/*
INCLEXCL 60 No Exclude.dir /.../.snapshots
I removed all the rules listed above and re-ran the backups.
I then added the rules below as a precaution, but I don't think they do any good, since the policy prefixes the /gpfs/.snapshots/snapshot-name path to them.
INCLEXCL 58 No exclude.dir /gpfs/[a-z,A-Z,0-9,_]*/.snapshots/*
INCLEXCL 59 No exclude.dir /gpfs/[a-z,A-Z,0-9,_]*/.snapshots/.../*
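A quick way to spot such rules in a long option set is to filter the INCLEXCL lines for exclude rules that mention .snapshots. This is only a sketch: it assumes the column layout shown above (option name, sequence number, force flag, then function and pattern), and the function name is made up.

```python
def snapshot_excludes(rule_lines):
    """Return (sequence, function, pattern) for exclude rules that mention .snapshots."""
    hits = []
    for line in rule_lines:
        # Expected fields: INCLEXCL <seq> <force> <function> <pattern>
        fields = line.split(None, 4)
        if len(fields) < 5 or fields[0].upper() != "INCLEXCL":
            continue
        if not fields[1].isdigit():
            continue
        function, pattern = fields[3], fields[4].strip()
        if function.lower().startswith("exclude") and ".snapshots" in pattern:
            hits.append((int(fields[1]), function, pattern))
    return hits


# Rules copied from the listing above.
rules = [
    "INCLEXCL 56 No exclude /.../.snapshots/.../*",
    "INCLEXCL 59 No Exclude.dir /.../.snapshots/*",
    "INCLEXCL 60 No Exclude.dir /.../.snapshots",
]

for seq, function, pattern in snapshot_excludes(rules):
    print(f"rule {seq}: {function} {pattern}")
```

Any rule it prints is a candidate for review before running mmbackup against a snapshot. It does not evaluate the patterns the way TSM does.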
TSM cannot handle paths with special symbols
Backup jobs fail when filenames contain characters that TSM treats as wildcards, such as '?' and '*'. Double quotes cause problems as well.
Some events I saw for each invalid symbol:
Jun 12, 2020, 10:05:13 AM ANE4005E Error processing '/gpfs/home/xyz/index/index.html?C=D;O=A': file not found (SESSION: 13290)
Jun 12, 2020, 10:05:13 AM ANE4005E Error processing '/gpfs/home/xyz/index/index.html?C=S;O=D.1': file not found (SESSION: 13290)
Jun 12, 2020, 10:05:13 AM ANE4005E Error processing '/gpfs/home/xyz/index/index.html?C=N;O=D.1': file not found (SESSION: 13290)
Jun 12, 2020, 9:30:14 AM ANE4005E Error processing '/gpfs/home/xyz/assignments_corrected*': file not found (SESSION: 13222)
Jun 12, 2020, 9:18:20 AM ANE4005E Error processing '/gpfs/home/xyz/ORTHONOME/*.info': file not found (SESSION: 13199)
Jun 12, 2020, 9:49:14 AM ANE4901E The following object contains one or more unmatched quotation marks and cannot be processed: '"/gpfs/home/xyz/testdir/ ab/c"d/efg"'. (SESSION: 13263)
What kind of crap solution can't handle these filenames?
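To find the offending files before the backup trips over them, a directory walk that flags names containing these characters is enough. The sketch below is a workaround idea rather than anything TSM provides. The character set is taken from the errors above, and the function name is hypothetical.

```python
import os

# Characters seen in the failing paths above: the two wildcards and the double quote.
PROBLEM_CHARS = set('?*"')


def find_problem_paths(root):
    """Yield paths under root whose file or directory name contains a problem character."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if PROBLEM_CHARS & set(name):
                yield os.path.join(dirpath, name)
```

For example, iterating over find_problem_paths("/gpfs/home/xyz") would list candidates to rename or exclude. On a large GPFS file system a full walk can take a long time, so it is probably best run per user directory.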
See Also
- Overview of retention policies and human-readable definitions of terms in TSM - https://oit.duke.edu/about/policies/tsm-backup-retention-policies
- TSM overview - https://wiki.fysik.dtu.dk/it/TSM-server-configuration
Additionally, ADSM.org is an online community for TSM help.