Usage

The ltfs* binaries are typically installed at /opt/ibm/ltfsee/bin/ and are provided by the ltfs-*.rpm RPM packages.

You will most likely interface with LTFS using the ltfsee command.


LTFS Library Status

[root@ltfs ~]# ltfsee status
Ctrl Node     MD      MMM     Library
172.26.3.249  Active  Active  T_ARCH

Tapes

To show all tape drives:

[root@ltfs ~]# ltfsee info drives
Drive S/N   Status   Type  Role  Library  Address  Node ID  Tape      Node Group
0007807A4B  In use   LTO8   mrg  T_ARCH   260      4        T00031L8  G0        
0007807A0B  Mounted  LTO8   mrg  T_ARCH   261      4        -         G0        
000780765B  In use   LTO8   mrg  T_ARCH   262      4        T00054L8  G0


To show all tapes in the library, use ltfsee info tapes. A description of each tape status is available at https://www.ibm.com/support/knowledgecenter/ST9MBR_1.2.6/ltfs_ee_ltfsee_info_tapes.html.

[root@ltfs ~]# ltfsee info tapes
Tape ID   Status       Type  Capacity(GiB)  Used(GiB)  Free(GiB)  Reclaimable(GiB)  Pool     Library  Address  Drive       Appendable
T00000L8  Valid        L8            10907          0      10907                 0  TIERED1  T_ARCH   1161     -           yes       
T00001L8  Valid        L8            10907          0      10907                 0  TIERED1  T_ARCH   1160     -           yes       
T00003L8  Valid        L8            10907          0      10907                 0  TIERED1  T_ARCH   1158     -           yes       
T00004L8  Valid        L8            10907          0      10907                 0  TIERED1  T_ARCH   1157     -           yes

Tape Types

Tapes whose IDs end with L8 are LTO-8.

If drives are in use, you can list all jobs to see what is actually being done.

[root@ltfs ~]# ltfsee info jobs
Job Type         Status       Idle(sec)  Scan ID     Tape      Pool     Library  Node  File Name or inode
Reclaim(Source)  In-progress     573811  2944737025  T00031L8  TIERED1  T_ARCH      4  -                 
Reclaim(Target)  In-progress     573683  2944737281  T00054L8  TIERED1  T_ARCH      4  -                 
Validate         Unscheduled       4469  1602429441  T00054L8  TIERED1  T_ARCH      -  -

To add tapes to a particular pool, use ltfsee pool add -p pool -t tape ...:

[root@ltfs ~]# ltfsee pool add -p TIERED2 -t T00099L8
GLESL042I(00894): Adding tape T00099L8 to storage pool TIERED2.
Added tape T00099L8 to pool TIERED2 successfully.

To remove tapes from a particular pool, use ltfsee pool remove -p pool -t tape ...:

[root@ltfs ~]# ltfsee pool remove -p TIERED1 -t T00030L8 
GLESL043I(01134): Removing tape T00030L8 from storage pool TIERED1.
Removed tape T00030L8 from pool TIERED1 successfully.

TODO: When removing, how does that affect replicas?

If a tape is in a drive, you can move it back to its homeslot using ltfsee tape move homeslot -t tape -p pool:

# ltfsee tape move homeslot -t T00037L8 -p TIERED1
GLESL373I(00890): Moving tape T00037L8.
Tape T00037L8 is unmounted because it is inserted into the drive.
Tape T00037L8 is moved successfully.

Alternatively, use ieslot to move it to the IO port of the tape library for extraction.
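The syntax mirrors the homeslot move. Note that, as shown in the Remove or Reclaim Tape section below, this fails while the tape is still assigned to a pool and not offline:

# ltfsee tape move ieslot -t T00037L8 -p TIERED1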

LTFS Information

ltfsee info pools shows all pools.

Pool Name  Total(TiB)  Used(TiB)  Free(TiB)  Reclaimable(TiB)  Tapes  Type  Library  Node Group
TIERED1         644.3       84.6      559.7               0.0     61  LTO   T_ARCH   G0        
TIERED2         638.9       80.8      558.1               0.0     61  LTO   T_ARCH   G0


ltfsee info libraries shows all libraries with LTFS.

[root@ltfs ~]# ltfsee info libraries 
Library Name  Status  Model     Serial Number     Ctrl Node   
T_ARCH        Active  03584L32  0000078BA6130402  172.26.3.249


Show Tiered Files

When files are tiered to tape, the full path of the file is recorded in a database with the status 'migrated'. You can find which tape cartridge holds a file using ltfsee info files -f filepath. Files with status 'migrated' have been tiered off to tape; files that have not been migrated are 'resident' and still reside on spinning disk.

The example output below shows the tape ID @ library for each migrated file:

# ltfsee info files -f /tiered/ewang_scratch/xuexu/dbgap_UK_OTTO/SRR302*
Name: /tiered/ewang_scratch/xuexu/dbgap_UK_OTTO/SRR3021220_2.fastq
Tape id:-          Status: resident
Name: /tiered/ewang_scratch/xuexu/dbgap_UK_OTTO/SRR3021242_1.fastq
Tape id:T00042L8@T_ARCH:T00102L8@T_ARCH Status: migrated
Name: /tiered/ewang_scratch/xuexu/dbgap_UK_OTTO/SRR3021288_1.fastq
Tape id:T00055L8@T_ARCH:T00107L8@T_ARCH Status: migrated

Side Notes on Migrated Files

Migrated files leave behind a stub file that occupies no disk blocks, so du reports them as zero size. To count actual file sizes with du, use --apparent-size, e.g.:
[root@node001 xuexu]# du -sh dbgap_UK_OTTO
23T	dbgap_UK_OTTO

[root@node001 xuexu]# du -sh --apparent-size dbgap_UK_OTTO
35T	dbgap_UK_OTTO 

[root@node001 dbgap_UK_OTTO]# du SRR3021288_2.fastq
0	SRR3021288_2.fastq

[root@node001 dbgap_UK_OTTO]# du -sh --apparent-size SRR3021288_2.fastq
15G	SRR3021288_2.fastq

To find all files that are stored on a particular tape, the easiest way is to look at the .schema metadata files for the library stored in /tiered/.ltfsee/meta/$library-id/volume_cache/*.schema.

For example, files stored on tape T00037L8 can be found this way:

# cd /tiered/.ltfsee/meta/0000078BA6130402/volume_cache
# grep gpfs.path -A 1 T00037L8.schema | grep value
<value>/tiered/ewang/xuexu/TCGA_bam_germline_12_4/f02e8ee1-349f-42ff-9ed9-ac37435901fa/cdb2f568f3cbdff5a9cd2edd692613bb_gdc_realn.bam</value>
<value>/tiered/ewang/xuexu/TCGA_bam_germline_12_4/TCGA-A2-A0D2_1.fastq</value>
...
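To get a rough per-tape file count for every cartridge, the same pipeline can be wrapped in a loop over all the schema files. This is a quick sketch, not an official tool, and assumes the volume_cache layout shown above:

# cd /tiered/.ltfsee/meta/0000078BA6130402/volume_cache
# for s in *.schema; do echo "${s%.schema}: $(grep gpfs.path -A 1 "$s" | grep -c value) files"; done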

Movement Policies

File movements are defined and applied by mmapplypolicy. A scheduled job invoking mmapplypolicy daily can be used to tier files on a regular basis.

The primary policy lives on the LTFS server at /mmpolicies/mmpolicyLATEST.txt and is applied by a daily cronjob that invokes:

# mmapplypolicy /dev/gpfs1 -P /mmpolicies/mmpolicyLATEST.txt >/dev/null 2>&1
# mmapplypolicy /dev/gpfs0 -P /mmpolicies/mmpolicyLATEST.txt >/dev/null 2>&1

The policy file contains the rules that govern which files are moved off to tape. LTFS/GPFS-related files, specific filesystem logs, and Space Manager(?) files should not be migrated.

To avoid the excessive tape wear that loading and unloading cartridges for small amounts of data would cause, tiering can also be configured to run only when filesystem usage exceeds 90%, and to bring usage back down to 80%, as defined by the THRESHOLD values. Small files can also be excluded with the FILE_SIZE condition.

define(user_exclude_list,(PATH_NAME LIKE '/ibm/gpfs/.ltfsee/%' OR PATH_NAME LIKE'/ibm/gpfs/.SpaceMan/%' OR NAME LIKE 'dsmerror.log'))
define(user_include_list,(PATH_NAME LIKE '/tiered/%'))
define(is_premigrated,(MISC_ATTRIBUTES LIKE '%M%' AND MISC_ATTRIBUTES NOT LIKE'%V%'))
define(is_migrated,(MISC_ATTRIBUTES LIKE '%V%'))
define(is_resident,(NOT MISC_ATTRIBUTES LIKE '%M%'))

RULE 'DATA_POOL_PLACEMENT_RULE' SET POOL 'data'
RULE EXTERNAL POOL 'LTFSEE_FILES'
EXEC '/opt/ibm/ltfsee/bin/ltfsee'
OPTS '-p TIERED1 TIERED2'

RULE 'LTFSEE_FILES_RULE' MIGRATE FROM POOL 'data'
THRESHOLD(90,80)
TO POOL 'LTFSEE_FILES'
WHERE FILE_SIZE > 1048576 
AND (CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '525600' MINUTES )
AND (CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL '525600' MINUTES )
AND (is_resident OR is_premigrated)
AND NOT user_exclude_list
AND user_include_list
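
Before deploying policy changes, mmapplypolicy can evaluate the rules without moving anything. The -I test flag is a standard mmapplypolicy option, though you should verify the exact behavior against your GPFS/Spectrum Scale release:

# mmapplypolicy /dev/gpfs1 -P /mmpolicies/mmpolicyLATEST.txt -I test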


Troubleshooting

Logging

LTFS logs are located at /var/log/ltfsee.log.

Additional logs worth investigating:

  • /opt/tivoli/tsm/client/hsm/bin/dsmerror.log
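
Most actionable entries in ltfsee.log carry a GLES* message ID (severity suffix E for errors, W for warnings, I for informational), so a plain grep goes a long way when triaging:

# grep -E 'GLES[A-Z][0-9]+E' /var/log/ltfsee.log | tail -20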


The reclamation process failed

Two tape drives have tapes in them but don't appear to be doing anything. Looking at /var/log/ltfsee.log, we can see that the reclamation process failed. The T00037L8 tape was in a drive that stopped functioning, showing a '5' on its single-character display with the cartridge ejected.

Addendum: Bad Tape?
After a week, the same error occurred again with the same tape cartridge. It's most likely that the T00037L8 tape cartridge is faulty.
2020-01-20T10:23:05.514522-07:00 ltfs reclaim_target[10652]: GLESA112E(00590): The following command failed with (rc:256:1) : /bin/cp /ltfs/T00037L8/.LTFSEE_DATA/10991691794470275686-15679318773138264748-78339434-18551699-0 /ltfs/T00022L8/.LTFSEE_DATA 2>&1.
2020-01-20T10:23:14.925509-07:00 ltfs reclaim_target[10652]: GLESA112E(00590): The following command failed with (rc:256:1) : /bin/cp /ltfs/T00037L8/.LTFSEE_DATA/10991691794470275686-15679318773138264748-78339434-18551699-0 /ltfs/T00022L8/.LTFSEE_DATA 2>&1.
2020-01-20T10:23:14.925818-07:00 ltfs reclaim_target[10652]: GLESR035E(01182): The copy process from source to destination tape failed for the file 10991691794470275686-15679318773138264748-78339434-18551699-0.
2020-01-20T10:23:14.926141-07:00 ltfs reclaim_target[10652]: GLESR004E(01930): Processing file /tiered/ewang/xuexu/dbgap_tcga_germline/SRR3341182_SRR3341183_varscan.pileup failed: exiting the reclamation driver.
2020-01-20T10:23:14.926413-07:00 ltfs reclaim_target[10652]: GLESR026E(00158): The reclamation process failed (1932).#012           Have a look for previous messages.
2020-01-20T10:23:14.927489-07:00 ltfs mmm[2692]: GLESM221E(02136): Generic job with identifier REC_TGTT00022L8 failed.
2020-01-20T10:23:15.501824-07:00 ltfs mmm[2692]: GLESM223E(02024): Not all generic requests for session 1447695617 have been successful: 1 failed.
2020-01-20T10:23:15.502533-07:00 ltfs mmm[2692]: GLESM221E(02136): Generic job with identifier REC_SRCT00037L8 failed.
2020-01-20T10:23:16.130885-07:00 ltfs mmm[2692]: GLESM223E(02024): Not all generic requests for session 1442649089 have been successful: 1 failed.
2020-01-20T10:23:16.132052-07:00 ltfs ltfsee[14335]: GLESL082E(01590): Reclamation failed while reclaiming tape T00037L8 to target tape T00022L8.
2020-01-26T05:00:02.516999-07:00 ltfs ltfsee[23826]: GLESL668E(00979): Unable to get the state of tape T00030L8. Skip to reclaim. Consult the log files. (rc=1040)
2020-01-26T05:00:02.823888-07:00 ltfs ltfsee[23826]: GLESL682E(01028): Tape with ID: T00030L8 is an invalid state. Source tapes must be in state either "Valid LTFS" or "Warning".
2020-01-26T05:04:36.512781-07:00 ltfs reclaim_target[31377]: GLESR030E(00178): The reclamation process failed. (1655)#012           Have a look for previous messages.
2020-01-26T05:04:36.513142-07:00 ltfs mmm[2692]: GLESM221E(02136): Generic job with identifier REC_TGTT00029L8 failed.
2020-01-26T05:04:36.620168-07:00 ltfs mmm[2692]: GLESM223E(02024): Not all generic requests for session 2135234305 have been successful: 1 failed.

The two drives still marked as in use have been idling for the past week. Interestingly, the ltfsee log did not show any information for the T00031L8 and T00054L8 tapes.

[root@ltfs ~]# ltfsee info jobs
Job Type         Status       Idle(sec)  Scan ID     Tape      Pool     Library  Node  File Name or inode
Reclaim(Source)  In-progress     576566  2944737025  T00031L8  TIERED1  T_ARCH      4  -                 
Reclaim(Target)  In-progress     576438  2944737281  T00054L8  TIERED1  T_ARCH      4  -                 
Validate         Unscheduled       7224  1602429441  T00054L8  TIERED1  T_ARCH      -  -

The job cannot be stopped. Force an LTFS stop using ltfsee stop -f and wait; the tape drives should eventually eject the tapes. I got the following error when stopping, but the tapes were ejected and everything seemed to have stopped.

[root@ltfs ~]# ltfsee stop -f
Library name: T_ARCH, library serial: 0000078BA6130402, control node (ltfsee_md) IP address: 172.26.3.249.
Running stop command - sending request and waiting for the completion.
GLESL030E(00909): Unable to connect to the MMM service. Check whether the IBM Spectrum Archive EE has been started.
GLESL358E(00494): Error on processing tape T00054L8 (1).
GLESL661E(00104): IPC got failure result (result=1).
GLESL646E(00164): Unable to stop the IBM Spectrum Archive EE monitor daemon for library T_ARCH.

Cannot Start LTFS

[root@ltfs ~]# /opt/ibm/ltfsee/bin/ltfsee start
Library name: T_ARCH, library serial: 0000078BA6130402, control node (ltfsee_md) IP address: 172.26.3.249.
Running start command - sending request : T_ARCH.
Running start command - waiting for completion : T_ARCH.
...
GLESL657E(00191): Fail to start the IBM Spectrum Archive EE service (MMM) for library T_ARCH.
                  Use the 'ltfsee info nodes' command to see the error modules.
                  The monitor daemon will start the recovery sequence.
[root@ltfs ~]# ltfsee info nodes

Spectrum Archive EE service (MMM) for library T_ARCH fails to start or is not running on ltfs-ib.gpfs.net Node ID:4

Problem Detected:
Node ID  Error Modules
      4  MMM;

Looking at /var/log/ltfsee.log, we see:

2020-02-03T13:15:35.266926-07:00 ltfs mmm[31142]: GLESM709E(00369): Assign tape (T00099L8) command error: 172.26.3.249:7600 (4): Request Error (070E): [Cartridge.cc:61]: Cartridge add is failed: 7c05 LTFSI1079E The operation is not allowed.

It seemed to have started up by itself?

[root@ltfs ~]# /opt/ibm/ltfsee/bin/ltfsee start
Library name: T_ARCH, library serial: 0000078BA6130402, control node (ltfsee_md) IP address: 172.26.3.249.
GLESL519I(00344): The IBM Spectrum Archive EE service (ltfsee_md) for library T_ARCH is already running.

[root@ltfs ~]# ltfsee info nodes
Node ID  Status     Node IP       Drives  Ctrl Node    Library  Node Group  Host Name       
4        Available  172.26.3.249       3  yes(active)  T_ARCH   G0          ltfs-ib.gpfs.net

Not sure what happened there...

Invalid Tapes

[root@ltfs ~]# ltfsee info tapes
Tape ID   Status       Type  Capacity(GiB)  Used(GiB)  Free(GiB)  Reclaimable(GiB)  Pool     Library  Address  Drive       Appendable
T00030L8  Invalid      L8            10907      10895          0                22  TIERED1  T_ARCH   1131     -           no            
T00099L8  Invalid      L8            10907      10883          0                 0  TIERED2  T_ARCH   1031     -           no

According to IBM's documentation:

The Invalid status indicates that the cartridge is inconsistent with the LTFS format.

To check and repair this tape before you add it to a tape storage pool, use the ltfsee pool add command with the check option.

Re-adding the tape with the check option fails while it is still assigned to a pool, so it has to be removed from the pool first:

[root@ltfs ~]# ltfsee pool add -p TIERED2 -t T00099L8 -c
GLESL042I(00894): Adding tape T00099L8 to storage pool TIERED2.
GLESL346E(00949): No tapes are found in the unassigned list. T00099L8 is already assigned to another storage pool, is in an IE slot, or does not exist in the library.

[root@ltfs ~]# ltfsee pool remove -p TIERED2 -t T00099L8 
GLESL043I(01134): Removing tape T00099L8 from storage pool TIERED2.
Removed tape T00099L8 from pool TIERED2 successfully.

[root@ltfs ~]# ltfsee pool add -p TIERED2 -t T00099L8 -c
GLESL042I(00894): Adding tape T00099L8 to storage pool TIERED2.
Tape T00099L8 successfully checked. 
Added tape T00099L8 to pool TIERED2 successfully.

The checking step takes a very long time, upwards of an hour for this LTO8 cartridge. The tape drive status shows as 'locating' and ltfsee info jobs shows the recovery job idling. This appears to be normal, however, since it eventually finished.

[root@ltfs ~]# ltfsee info jobs
Job Type  Status     Idle(sec)  Scan ID     Tape      Pool     Library  Node  File Name or inode
Recovery  Scheduled        694  2181628673  T00099L8  TIERED2  T_ARCH      4  -


Error Tape Drives

One of the tape drives is in 'Error' state.

[root@ltfs ~]# ltfsee info drives
Drive S/N   Status  Type  Role  Library  Address  Node ID  Tape      Node Group
0007807A4B  In use  LTO8   mrg  T_ARCH   260      4        T00000L8  G0        
0007807A0B  Error   LTO8   mrg  T_ARCH   261      4        -         G0        
000780765B  In use  LTO8   mrg  T_ARCH   262      4        T00037L8  G0

The drive shows up as online to the tape library. No visible errors displayed on the tape drive itself.

I can't remove/re-add it:

[root@ltfs ~]# ltfsee drive remove -d 0007807A0B
GLESL132E(00247): Could not remove a drive 0007807A0B. Drive is not in mount or not mounted state. The tape drive status:2.

Stopping and starting LTFS EE seemed to clear this error.

[root@ltfs ~]# ltfsee stop
Library name: T_ARCH, library serial: 0000078BA6130402, control node (ltfsee_md) IP address: 172.26.3.249.
Running stop command - sending request and waiting for the completion.
...
Stopped the IBM Spectrum Archive EE services for library T_ARCH.
[root@ltfs ~]# ltfsee start
Library name: T_ARCH, library serial: 0000078BA6130402, control node (ltfsee_md) IP address: 172.26.3.249.
Running start command - sending request : T_ARCH.
Running start command - waiting for completion : T_ARCH.
.....................................
Started the IBM Spectrum Archive EE services for library T_ARCH with good status.
[root@ltfs ~]# ltfsee info drives
Drive S/N   Status       Type  Role  Library  Address  Node ID  Tape  Node Group
0007807A4B  Not mounted  LTO8   mrg  T_ARCH   260      4        -     G0        
0007807A0B  Not mounted  LTO8   mrg  T_ARCH   261      4        -     G0        
000780765B  Not mounted  LTO8   mrg  T_ARCH   262      4        -     G0

Interestingly, all tapes that were in drives were stowed away. I believe this is the behavior when shutting down LTFS.

Bad Tape Drive

This is more of a TS4500 issue with a bad tape drive. One of the drives randomly stopped working and the following happened.

/var/log/ltfsee.log showed:

2020-03-24T19:43:39.873263-06:00 ltfs ltfseecp[21628]: GLESG081E(00468): Migrating data of GPFS file /tiered/kkurek/sbarclay/speedseq_align/73.L005.realign.bam: write failed to tape T00086L8 and file /ltfs/T00086L8/.LTFSEE_DATA/10991691794470275686-15679318773138264748-228651564-67167434-0 (data length: 524288, rc: -1, errno: 5).
2020-03-24T19:43:39.873579-06:00 ltfs ltfseecp[21628]: GLESG506E(00803): Migration file (/tiered/kkurek/sbarclay/speedseq_align/73.L005.realign.bam) to tape T00086L8 failed (1091).
2020-03-24T19:43:39.873870-06:00 ltfs ltfseecp[21628]: GLESC003E(01158): Redundant copy for file /tiered/kkurek/sbarclay/speedseq_align/73.L005.realign.bam to tape T00086L8 failed.
2020-03-24T19:43:39.979980-06:00 ltfs mmm[28988]: GLESM110W(00210): Tape T00086L8 got critical.
2020-03-24T19:44:09.243830-06:00 ltfs mmm[28988]: GLESM709E(00442): Unmount T00086L8 command error: 172.26.3.249:7600 (4): Request Error (077E): [Cartridge.cc:146]: Cartridge unmount is failed: 578a LTFSI1086E This operation is not allowed on a cartridge with a critical error.
2020-03-24T19:44:09.244112-06:00 ltfs mmm[28988]: GLESM118E(00060): Unmount of tape T00086L8 failed (drive 000780765B). Check the state of tapes and drives.
2020-03-24T19:44:39.166261-06:00 ltfs mmm[28988]: GLESM709E(00442): Unmount T00086L8 command error: 172.26.3.249:7600 (4): Request Error (077E): [Cartridge.cc:146]: Cartridge unmount is failed: 5b10 LTFSI1086E This operation is not allowed on a cartridge with a critical error.
2020-03-24T19:44:39.166568-06:00 ltfs mmm[28988]: GLESM118E(00060): Unmount of tape T00086L8 failed (drive 000780765B). Check the state of tapes and drives.
2020-03-24T19:45:09.283571-06:00 ltfs mmm[28988]: GLESM709E(00442): Unmount T00086L8 command error: 172.26.3.249:7600 (4): Request Error (077E): [Cartridge.cc:146]: Cartridge unmount is failed: 5d71 LTFSI1086E This operation is not allowed on a cartridge with a critical error.
2020-03-24T19:45:09.283890-06:00 ltfs mmm[28988]: GLESM118E(00060): Unmount of tape T00086L8 failed (drive 000780765B). Check the state of tapes and drives.
...
2020-03-25T01:19:14.689053-06:00 ltfs ltfsee[14508]: GLESL062E(00577): Tape with ID: T00086L8 is in an invalid state. Tapes must be either in state "Valid LTFS" or "Unknown".
2020-03-25T01:19:14.695195-06:00 ltfs ltfsee[14501]: GLESL062E(00577): Tape with ID: T00086L8 is in an invalid state. Tapes must be either in state "Valid LTFS" or "Unknown".
2020-03-25T01:19:14.700692-06:00 ltfs ltfsee[14503]: GLESL062E(00577): Tape with ID: T00086L8 is in an invalid state. Tapes must be either in state "Valid LTFS" or "Unknown".
2020-03-25T01:19:14.706384-06:00 ltfs ltfsee[14504]: GLESL062E(00577): Tape with ID: T00086L8 is in an invalid state. Tapes must be either in state "Valid LTFS" or "Unknown".
[root@ltfs ~]# ltfsee info tapes 
Tape ID   Status       Type  Capacity(GiB)  Used(GiB)  Free(GiB)  Reclaimable(GiB)  Pool     Library  Address  Drive       Appendable                
T00086L8  Critical     L8            10907          0          0                 0  TIERED2  T_ARCH   262      000780765B  no

Tasks

Shutdown and Startup

If the tape library needs to go offline, turn off LTFS to ensure drives are not accessed.

# ltfsee stop

When completed, start up LTFS:

# ltfsee start
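
To confirm that the services came back cleanly, check the status afterwards (the same command shown in the Usage section; MD and MMM should both report Active):

# ltfsee status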

Remove or Reclaim Tape

If a tape needs to be removed from its pool or from the tape library, you must first reclaim it so that migrated or saved files on the tape are copied to another tape and retained.

To start a reclamation for a specific tape, use ltfsee reclaim -p pool -t tapeid:

[root@ltfs ~]# ltfsee reclaim -p TIERED1 -t T00037L8
Start reclaiming 1 tapes in the following list of tapes:
T00037L8 .
Files in tape T00037L8 are copied to tape T00000L8.

Once completed, you should be able to remove the tape from the pool and then move it to the IO slot.
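
Assuming the reclaim finished, the sequence is the same two commands that fail below when run prematurely: first remove the tape from its pool, then move it to the IE slot.

# ltfsee pool remove -p TIERED1 -t T00037L8
# ltfsee tape move ieslot -t T00037L8 -p TIERED1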

If you don't reclaim the contents first, you will get these error messages:

[root@ltfs ~]# ltfsee pool remove -p TIERED1 -t T00037L8
GLESL043I(01134): Removing tape T00037L8 from storage pool TIERED1.
GLESL357E(01223): Tape T00037L8 has migrated files or saved files. It has not been removed from the pool.

[root@ltfs ~]# ltfsee tape move ieslot -t T00037L8 -p TIERED1
GLESL170E(00472): Failed to move tape T00037L8 because tape is assigned to a pool and not offline.


Cronjobs

The following cronjobs are installed.

# Cron Job Used to Move Data Via policy to LTFS Tape at 1:00 AM
00 1 * * * /mmpolicies/movetogpfs1.sh 
# Cron Job Used to Move Data Via policy to LTFS Tape at 3:00 AM
00 3 * * * /mmpolicies/movetogpfs0.sh 
# Cron Job Used to Reconcile disk data with LTFS Tape data Pool TIERED1
00 20 * * * /opt/ibm/ltfsee/bin/ltfsee reconcile -p TIERED1 -l T_ARCH -g /tiered
# Cron Job Used to Reconcile disk data with LTFS Tape data Pool TIERED2
00 22 * * * /opt/ibm/ltfsee/bin/ltfsee reconcile -p TIERED2 -l T_ARCH -g /tiered
# Cron Job Used to Reclaim LTFS Tape data in Pool TIERED1
00 5 * * sun /opt/ibm/ltfsee/bin/ltfsee reclaim -p TIERED1 -l T_ARCH
# Cron Job Used to Reclaim LTFS Tape Data in Pool TIERED2
00 7 * * sun /opt/ibm/ltfsee/bin/ltfsee reclaim -p TIERED2 -l T_ARCH
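
The movetogpfs*.sh scripts are not reproduced on this page. Based on the Movement Policies section above, they presumably just wrap the daily mmapplypolicy invocation, along these lines (a hypothetical sketch, not the actual script):

#!/bin/bash
# Hypothetical contents of /mmpolicies/movetogpfs1.sh -- assumed from the
# Movement Policies section, not copied from the real script.
mmapplypolicy /dev/gpfs1 -P /mmpolicies/mmpolicyLATEST.txt >/dev/null 2>&1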
