All jobs should be submitted through the LSF scheduler. Below are some frequently asked questions about job submission.
Never forget to check whether your jobs have been submitted successfully.
If you are a new user on ERISone, or are not familiar with the LSF scheduler, it is always good to double-check that your job submission succeeded. After submitting your job script, run the command "bjobs". If the submission succeeded, your jobs are listed like this:
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
744692 ekm31 RUN interact eris1n3 4*cmu12 /bin/bash Sep 21 10:04
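The check described above can also be scripted. The following is a minimal sketch, not an official ERISone tool: the function name list_job_states is hypothetical, and it assumes the default bjobs column layout shown above (JOBID first, STAT third, QUEUE fourth). It reads bjobs-style text from standard input, so it can be tried on the sample output without access to the cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): print "JOBID STAT QUEUE" for each
# job in bjobs-style output read from stdin.
# Assumes the default column order: JOBID USER STAT QUEUE ...
# Returns non-zero when no job lines are found, which is a hint that the
# submission did not go through.
list_job_states() {
    awk 'NR > 1 && NF >= 4 { print $1, $3, $4; found = 1 }
         END { exit !found }'
}

# Demo on the sample output from this page.
list_job_states <<'EOF'
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
744692 ekm31 RUN interact eris1n3 4*cmu12 /bin/bash Sep 21 10:04
EOF
```

On the cluster, the same function would be used as "bjobs | list_job_states". Lines with fewer than four fields are skipped on purpose, so that continuation lines printed for multi-host jobs are not mistaken for jobs.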
In general, please check the following:
- Your job is listed. If there was an error in the job script file, the submitted job may not appear at all!
- Your job is listed in the correct QUEUE and with the number of processes you requested. FROM_HOST should be eris1n2 or eris1n3, and the number before the * under EXEC_HOST should be the number of processes the job uses. Depending on the server load, your job may be either pending (PEND) or running (RUN).
- It is also good to run "top" or "htop" after submitting a job. If you find your own jobs running on the login node, something went wrong with the submission. Terminate those processes with "kill -9 PID", where PID is the process ID shown by "top", and then check your submission.
- Never forget the "<" character between bsub and your script file name on the command line. It should look like:
bsub -q long < my_script.lsf
- Remember to reserve resources for your job. Refer to https://rc.partners.org/kb/article/2735
- If your job was submitted without a proper resource reservation, it might get stuck in the PSUSP, USUSP, or SSUSP state. This happens when the job requires more resources than the nodes have, in which case the system may automatically suspend the job to protect the nodes. We monitor such jobs, but if you find your jobs in PSUSP/USUSP/SSUSP, contact us at Scientific Computing.
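Several items in the checklist above (the process count under EXEC_HOST and the suspended states) can be read straight from the bjobs output. The sketch below is one possible way to do that, under stated assumptions: the function name flag_jobs is hypothetical, the column layout is the default one shown earlier, and the second job line in the demo is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF): for each job in bjobs-style output
# read from stdin, print "JOBID STAT SLOTS NOTE".
# Assumes the default column order, with EXEC_HOST as the sixth field.
flag_jobs() {
    awk 'NR > 1 && NF >= 4 {
        state = $3
        slots = "-"
        # Only running jobs are assumed to have an EXEC_HOST entry.
        # "4*cmu12" means 4 processes; a bare host name means 1.
        if (state == "RUN") {
            star = index($6, "*")
            slots = (star > 0) ? substr($6, 1, star - 1) : 1
        }
        # Mark the suspended states named in the checklist.
        note = (state ~ /^(PSUSP|USUSP|SSUSP)$/) ? "SUSPENDED" : "ok"
        print $1, state, slots, note
    }'
}

# Demo: the first job is the sample from this page; the second is invented
# to show how a suspended job would be reported.
flag_jobs <<'EOF'
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
744692 ekm31 RUN interact eris1n3 4*cmu12 /bin/bash Sep 21 10:04
744700 ekm31 SSUSP long eris1n3 cmu15 my_job Sep 21 10:10
EOF
```

On the cluster this would be run as "bjobs | flag_jobs". Any job reported as SUSPENDED is a reason to review the resource reservation and, if needed, to contact Scientific Computing.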