LSF

From Leo's Notes
Last edited on 30 December 2021, at 05:15.

Platform Load Sharing Facility (or simply LSF) is a job scheduler and workload manager originally developed by Platform Computing and now maintained by IBM. It is similar to Slurm.

Usage

Tasks

Command                  Description
bjobs -u all -a          Shows all jobs of all users
bjobs -p -u all -a       Shows jobs of all users with their pending reasons
bjobs 101 102            Shows the jobs with job IDs 101 and 102
bsub                     Submits a batch job
bhosts                   Shows the status of all hosts
badmin                   LSF administration shell
bhist -la -u username    Shows previous jobs of the given user
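
As a small example of scripting around these commands, here is a sketch that counts jobs per status from bjobs output. The function name job_status_counts is made up for illustration, and it assumes the default bjobs layout where JOBID is the first column and STAT is the third.

```shell
#!/bin/bash

# job_status_counts: read `bjobs` output on stdin and print "STAT COUNT"
# for each job status, sorted by status name.
# Assumes JOBID is column 1 and STAT is column 3 (default bjobs layout).
job_status_counts() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 3 { n[$3]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

Typical use would be something like: bjobs -u all -a | job_status_counts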

Admin

Most administrative tasks are done with the lsadmin and badmin commands.

Startup

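Use lsfstartup to start LIM, RES, and sbatchd on all hosts in the cluster. The daemons can also be started individually.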

A Load Information Manager (LIM) daemon needs to be running on each server host. This daemon collects host load and configuration information and forwards it to the master LIM service on the master host.

A Remote Execution Server (RES) daemon needs to be running on each server host so that the host can accept remote execution requests.

To start LIM and RES on all hosts, run:

# lsadmin limstartup all
# lsadmin resstartup all

The slave batch daemon (sbatchd) must also be running on every host. Start it with:

# badmin hstartup all
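
To check that all three daemons actually came up on a host, something like the following sketch may help. The function name missing_lsf_daemons is hypothetical, and it assumes the daemons show up in the process list under the names lim, res, and sbatchd.

```shell
#!/bin/bash

# missing_lsf_daemons: read process names (one per line) on stdin and print
# each of lim, res, and sbatchd that does NOT appear in the list.
# Assumption: the daemons run under exactly these process names.
missing_lsf_daemons() {
    local procs daemon
    procs=$(cat)
    for daemon in lim res sbatchd; do
        grep -qx "$daemon" <<< "$procs" || echo "$daemon"
    done
}
```

On a server host this could be fed from the process table, for example: ps -e -o comm= | missing_lsf_daemons. No output means all three were found.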

Restart

Use lsfrestart to restart LIM, RES, sbatchd, and mbatchd on all hosts in the cluster.

If a host is stuck in a 'closed' state, you may need to restart RES and LIM on that host:

node01# lsadmin resrestart
node01# lsadmin limrestart
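
To find which hosts are closed in the first place, the bhosts output can be filtered. This is a rough sketch: the function name closed_hosts is invented here, and it assumes the usual bhosts layout with HOST_NAME in the first column and STATUS in the second.

```shell
#!/bin/bash

# closed_hosts: read `bhosts` output on stdin and print the name of every
# host whose STATUS starts with "closed" (closed, closed_Adm, closed_Full, ...).
# Assumes HOST_NAME is column 1 and STATUS is column 2.
closed_hosts() {
    awk '$1 != "HOST_NAME" && $2 ~ /^closed/ { print $1 }'
}
```

Run it as: bhosts | closed_hosts, then restart RES and LIM on each host it lists.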


You can also restart on all nodes with 'all':

## Restart RES and LIM on every host in the cluster.
# lsadmin resrestart all
# lsadmin limrestart all

Shutdown

Use lsfshutdown to shut down the LSF daemons on all hosts in the cluster, which also stops users from submitting jobs.

To shut down the daemons individually, stop sbatchd, RES, and LIM:

# badmin hshutdown all
# lsadmin resshutdown all
# lsadmin limshutdown all


Tasks

Extend Job Run Time

If a job was submitted with too little wall time (e.g. with bsub -W), you can check how much time it has left by listing jobs with bjobs -WL, -WF, or -WP:

# bjobs -u all -WL
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME   TIME_LEFT
916742  user001 RUN   interactiv compute001  node001     bash       Jan  5 22:58     -       
933794  user001 RUN   normal     compute001  63*node001  *_de_novo. Jan 13 21:28  59:47 X    
934759  user001 RUN   normal     compute001  56*node001  *test_pasa Jan 17 16:21     -       
936079  user001 RUN   normal     compute001  56*node001  *e_guided. Jan 21 07:03  237:22 E

The TIME_LEFT column shows the time left in hours and minutes, followed by a one-letter state. The state is one of:

  • E: The job has an estimated run time that has not been exceeded.
  • L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
  • X: The job has exceeded its estimated run time and the time displayed is the time remaining until the job reaches its hard run time limit.
  • -: A dash indicates that the job has no estimated run time and no run limit, or that it has exceeded its run time but does not have a hard limit and therefore runs until completion.

A job's run time limit can be adjusted using bmod -W HH:MM Job_ID or removed using bmod -Wn Job_ID.
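
To spot jobs that are about to hit their limit, the TIME_LEFT column can be filtered with a bit of awk. This is only a sketch: low_time_jobs is a made-up name, and it assumes TIME_LEFT is the last column and looks like "59:47 X", as in the sample output above.

```shell
#!/bin/bash

# low_time_jobs MINUTES: read `bjobs -WL` output on stdin and print
# "JOBID MINUTES_LEFT STATE" for every job with fewer than MINUTES left.
# Jobs showing "-" (no limit) and the header line are skipped because their
# last two fields do not look like "HH:MM STATE".
low_time_jobs() {
    awk -v limit="$1" '
        NF >= 3 && $(NF-1) ~ /^[0-9]+:[0-9]+$/ && $NF ~ /^[ELX]$/ {
            split($(NF-1), t, ":")
            mins = t[1] * 60 + t[2]
            if (mins < limit) print $1, mins, $NF
        }'
}
```

For example, bjobs -u all -WL | low_time_jobs 120 would list jobs with under two hours left, which can then be extended with bmod -W.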

Administration

Logs

LSF events and accounting logs are stored in /usr/share/lsf/work/hostname/logdir. Logs can grow quite large and can fill the system drive if left unchecked.

Delete old lsb.events.X files (the rotated event logs, where X is a number) to reclaim space.
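
A cautious way to do this is to list the candidates first and only delete after reviewing them. The sketch below assumes GNU find; the function name old_event_logs and the 30-day default are arbitrary choices for illustration.

```shell
#!/bin/bash

# old_event_logs LOGDIR [DAYS]: print rotated lsb.events.N files in LOGDIR
# that were last modified more than DAYS days ago (default 30).
# The live lsb.events file has no numeric suffix and is never matched.
old_event_logs() {
    local logdir=$1 days=${2:-30}
    find "$logdir" -maxdepth 1 -type f -name 'lsb.events.[0-9]*' \
        -mtime +"$days" | sort
}
```

Once the list looks right, it could be piped to xargs -r rm --. Keep in mind that bhist reads these older event files, so deleting them also removes that job history.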


Metrics to InfluxDB

Here's a quick and dirty script that dumps data into InfluxDB, capturing CPU allocation by user, status, queue, and node. This should allow graphing of node usage as CPU cores allocated per node and by user.

#!/bin/bash

# One timestamp (in nanoseconds) shared by every point written in this run
ts="$(date +%s)000000000"

bjobs -u all -a \
        | tail -n+2 \
        | awk '
# Pending jobs have no EXEC_HOST, which would shift the job name into $6
$3 == "PEND" || $3 == "PSUSP" { next }
{
        # EXEC_HOST is either "node" or "cores*node"
        n = split($6, z, "*");

        if (n < 2) {
                # No "*" means 1 core
                node = z[1];
                x[$2, $3, $4, node] += 1
        } else {
                # Number of cpus given
                node = z[2];
                x[$2, $3, $4, node] += z[1]
        }
}
END {
        for (i in x) {
                split(i, y, SUBSEP);
                print "lsf,username=" y[1] ",status=" y[2] ",queue=" y[3] ",node=" y[4] " value=" x[i]
        }
}' \
        | while IFS= read -r line ; do
                curl -X POST 'http://influxdb/write?db=lsf' --data-binary "$line $ts"
        done
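
The EXEC_HOST handling is the fiddly part of the script above, so here it is on its own as a sketch. The function name exec_host_cores is hypothetical. It also assumes that jobs spanning several hosts are shown as colon-separated entries such as 4*node1:2*node2, which the script above does not handle.

```shell
#!/bin/bash

# exec_host_cores VALUE: print "NODE CORES" for each entry of an EXEC_HOST
# value. "node001" counts as 1 core, "63*node001" as 63 cores.
# Assumption: multi-host values are colon-separated ("4*node1:2*node2").
exec_host_cores() {
    tr ':' '\n' <<< "$1" | awk -F'[*]' '
        NF == 2              { print $2, $1 }
        NF == 1 && $1 != ""  { print $1, 1 }'
}
```

For instance, exec_host_cores '63*node001' prints the node name followed by 63.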

Turn bhist output into metrics

Here's a rough, work-in-progress script that I hacked together to parse the output of bhist -b -d -w -n X -u all, saved to /tmp/jobs. The flags are: -b for brief, -d for only finished jobs, -w for wide format (which doesn't do what I think it does, since the output still seems bounded by a fixed column width), -n for the number of logs to parse, and -u all for all users.

#!/bin/bash
# Needs gawk: match() with an array argument and \s are GNU extensions.

# Join wrapped continuation lines (indented by 21 spaces), then parse the records
sed ':a;N;$!ba;s/\n                     //g' /tmp/jobs \
         |  gawk '
# Job <89017>, User <ahawley>, Project <default>, Command
# Job <87010>, User <ajaffer>, Project <default>, Interactive pseudo-terminal shell mode, Command <bash>
/^Job </ {
        match($0, /^Job <([^>]+)>/, arr)
        print "Job: " arr[1]
        match($0, /Job Name <([^>]*)>/, arr)
        print "Job Name: " arr[1]
        match($0, /User <([^>]*)>/, arr)
        print "Username: " arr[1]
}

# Time of submission
# Eg. Sun Jul 26 16:01:33: Submitted from host <synergy-ib>, to Queue <normal>;
/Submitted from/ {
        match($0, /^(.*): Submitted from/, arr)
        print "Submission Date: " arr[1]
}

# Sun Jul 26 16:01:34: Dispatched 8 Task(s) on Host(s) <8*node034>, Allocated 8 Slot(s) on Host(s) <8*node034>, Effective RES_REQ <select[type == local] order[r15s:pg] span[hosts=1] >;
/Dispatched / {
        match($0, /: Dispatched ([0-9]*) Task/, arr)
        print "CPUs: " arr[1]

        # Request for whole node
        match($0, /hosts=([0-9]*)]/, arr)
        hosts = arr[1]
        if (hosts == "") hosts = 0
        print "Hosts: " hosts

        # Explicit memory reqs. undef is entire host?
        match($0, /mem=([0-9.]*)]/, arr)
        mem = arr[1]
        if (mem == "") mem = 0
        print "Mem: " mem
}

# Mon Jul 27 06:15:22: Running with execution home </home/kmuirhead>, Execution CWD </home/kmuirhead/STI_Pathobiont_Project/scripts/ncbi_genome_assembly_proteins>, Execution Pid <17243>;
/ Running with / {
        # Get start time and cwd
        match($0, /^(.*): Running.*Execution CWD <([^>]*)>/, arr)
        starttime = arr[1]
        print "Started: " starttime

        cwd = arr[2]
        print "CWD: " cwd
}

# Mon Jul 27 06:15:58: Done successfully. The CPU time used is 18.1 seconds;
/: Done / {
        # Get complete time
        match($0, /^(.*): Done/, arr)
        endtime = arr[1]
        print "Completed: " endtime
}

# stats
#  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
#   0        0        258684   0        0        0        258684
/^\s*[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+\s+[0-9]+/ {
        # The trailing "\n" leaves a blank line between job records
        print "Pending: " $1 "   running: " $3 "   total: " $7 "\n"
}'
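
To turn the Started and Completed values into an actual run time, the timestamps need to become numbers. The sketch below assumes GNU date, and it takes the year as an explicit argument because bhist timestamps such as "Mon Jul 27 06:15:22" do not include one. The names bhist_epoch and elapsed_seconds are made up for this example.

```shell
#!/bin/bash

# bhist_epoch "Mon Jul 27 06:15:22" YEAR: print the timestamp as seconds
# since the epoch. The weekday is dropped and the caller supplies the year.
# Requires GNU date (for -d).
bhist_epoch() {
    local _dow mon day time
    read -r _dow mon day time <<< "$1"
    date -d "$mon $day $2 $time" +%s
}

# elapsed_seconds START END YEAR: seconds between two bhist timestamps
# that fall in the same year.
elapsed_seconds() {
    echo $(( $(bhist_epoch "$2" "$3") - $(bhist_epoch "$1" "$3") ))
}
```

Called as elapsed_seconds "Mon Jul 27 06:15:22" "Mon Jul 27 06:15:58" 2020, this gives the wall-clock run time in seconds. Jobs that run across a year boundary would need extra handling.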

See Also