View Details of Cluster Jobs for Resource Profiling and Reporting

Scientific Computing (SciC) Linux Clusters HPC users and lab groups can interactively review the status of their batch jobs and troubleshoot batch processing problems on the HPC cluster using RTM (Real Time Monitoring), a new capability of the IBM LSF scheduler. Real-time job status summaries and details can be generated for an individual user, job, or queue. By putting more job information within easy reach, RTM can help with troubleshooting, understanding job memory usage, and capacity planning.

Connecting to the LSF monitoring web-console

Visit the LSF monitoring web console from a computer that is connected to the Mass General Brigham computer network. You will be prompted for your user ID and password twice.

Running queries

Once in the application, verify that the active tab is "JobIQ". Your user ID should already appear in the "User" box in the upper left. If it does not, enter it in the box and select "Go" on the right side of the User Filter section (the section you are currently working in). Information on your running batch jobs will populate the screen immediately. NOTE: Depending on your browser, you may need to use the horizontal scroll bar to scroll far enough to the right to see "Go" and the other buttons.

The other settings in the User Filter section are drop-down selections with a limited set of choices and are self-explanatory. The "Timespan" selection gives you visibility into jobs from the last two hours up to one day. Because your query is for an individual user, "ALL" works fine for the other settings (Cluster, Queue, Severity, Limit).

Your batch jobs that fall within the "Timespan" window you selected will be presented on the left side of the screen with red or green status "lights". Clicking a red status light opens a detail window with more information on the jobs contained within that batch.
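
For readers who find it easier to think of the query as a filter, the short Python sketch below models how the User, Timespan, and Queue settings narrow the job list, and how the red status lights point to the jobs worth opening. It is only an illustration of the filtering logic and not RTM's actual implementation or API. The Job record, the filter_jobs function, the user IDs, the queue names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """Hypothetical job record; RTM's real data model is not shown in this article."""
    job_id: int
    user: str
    queue: str
    status: str          # "green" or "red", mirroring the status lights
    submitted: datetime


def filter_jobs(jobs, user, timespan, now, queue="ALL"):
    """Keep jobs for one user that fall inside the timespan window.

    queue="ALL" applies no queue restriction, like leaving the
    Queue drop-down at its "ALL" setting.
    """
    cutoff = now - timespan
    return [
        j for j in jobs
        if j.user == user
        and j.submitted >= cutoff
        and (queue == "ALL" or j.queue == queue)
    ]


# Made-up sample data for illustration only.
now = datetime(2024, 1, 1, 12, 0)
jobs = [
    Job(101, "abc12", "normal", "green", now - timedelta(minutes=30)),
    Job(102, "abc12", "big", "red", now - timedelta(hours=5)),
    Job(103, "xyz34", "normal", "red", now - timedelta(minutes=10)),
]

# Shortest Timespan setting (two hours) versus the longest (one day).
recent = filter_jobs(jobs, "abc12", timedelta(hours=2), now)
day = filter_jobs(jobs, "abc12", timedelta(days=1), now)

# Jobs with a red light are the ones to click for more detail.
needs_attention = [j.job_id for j in day if j.status == "red"]
```

In this made-up data set, widening the timespan from two hours to one day brings an older job with a red light into view, which is why a longer "Timespan" setting can be useful when troubleshooting.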

Note that there is a periodic screen refresh process.

Go to KB0027990 in the IS Service Desk
