MATLAB Parallel toolbox configuration on ERISOne

Submitting MATLAB Setup   Jun/2015

How to submit MATLAB jobs to the ERISOne Linux cluster from your Windows MATLAB GUI environment

If you are using MATLAB and have large MATLAB simulation or modeling work that needs to take advantage of the ERISOne cluster, you can distribute your model to the cluster via your local MATLAB installation.   

We have 128 MATLAB DCS (Distributed Computing Server) licenses, which means up to 128 computing cores can be used concurrently on the cluster for MATLAB computing.  To distribute jobs from your local system to ERISOne, you need to complete some one-time configuration steps on your local machine.

This guide assumes you are using MATLAB version 2011b with the PCT (Parallel Computing Toolbox). Other versions of MATLAB may differ in the position or name of menus and fields, but the steps are essentially the same.  If you have questions about your system, or need the toolbox installed on your local machine, please contact Scientific Computing (hpcsupport@partners.org).

Then proceed to these three steps

 

 

Note:  If you have large amounts of data on your own Windows machine, you should pre-stage the data in the ERISOne storage environment first.  Otherwise, the program will try to upload your data into the ERISOne environment while the job is running, which could degrade system performance.  If you need help pre-staging your data to compute nodes, contact hpcsupport@partners.org.

 

  1. Set up and configure your MATLAB PCT environment

 

  1. Download the file erisone_pct_config.mat.  (NOTE:  When viewing this file in Windows Explorer, the .mat suffix may not appear.  However, it will be present when viewing with the dir command in a DOS Command Prompt window.)  Also download the MATLAB submission scripts dist_lsf_nonshared.zip and the example code newexample.zip, and unzip both.  The newexample.zip file contains two files named NewExample.m and myfunc.m.  The dist_lsf_nonshared.zip file contains the job-submission script files; these scripts come with the MATLAB installation package on HPCWIN3.  You can use WinRAR to extract the zip files.
  2. Start MATLAB on your workstation, go to File->Set Path, and add the extracted “nonshared” folder from dist_lsf_nonshared.zip (which contains your job-submission scripts) to your list of MATLAB paths.  Select the “Save” button. You may get a warning because this applies changes to the computer configuration; just click OK. Then you can close the Set Path window.

  1. Close your MATLAB program and restart it.  Select the "Parallel -> Manage Configurations" menu.  In the "Configurations Manager" window, select "File -> Import", browse to the location where you saved the erisone_pct_config.mat file and select it, then click "Open".  Right-click the new entry in the Configurations Manager window and select Properties.
  2. Under Properties, edit the following fields:
    • “Root folder of MATLAB installation for workers”: Do not change this unless you are sure you need to. This parameter points to the location of the MATLAB installation on the ERISOne server. Make sure it matches your version of MATLAB (2013, 2011, or other).
    • “Number of workers available to scheduler”: It should be 8.  The larger the number, the more robust the test, but first be sure there are sufficient available nodes on the cluster.
    • “Folder where job data is stored (Data Location)”: Change it to a directory on your local machine (the machine on which you started the MATLAB interface, for example your laptop) where MATLAB can store data and where you should put the NewExample.m and myfunc.m files.
    • “Function called when submitting parallel jobs”: Change it to point to the directory on the server (ERISOne) where you store the files extracted from dist_lsf_nonshared.zip. It should be something like: {@parallelSubmitFcn, 'erisone.partners.org', '/PHShome/ekm31/Matlab/nonshared'}
    • “Function called when submitting distributed jobs”: Change it to also point to that directory under your ERISOne home directory.  It should be something like: {@distributedSubmitFcn, 'erisone.partners.org', '/PHShome/ekm31/Matlab/nonshared'}
    • Leave all other fields as they are.
    • NOTE:  Be aware that MathWorks changed the function names in the 2013 version compared with 2011, so choose the settings files that correspond to your version.
  3. For testing purposes, you need to make a minor revision to one of the files extracted from dist_lsf_nonshared.zip. Edit the file getSubmitString.m and, in the last line, add the -q medium option so that jobs are submitted to the medium queue. Change the line from

submitString = sprintf('bsub -J %s -o %s %s %s', jobName, quotedLogFile, additionalSubmitArgs, quotedCommand);

to

submitString = sprintf('bsub -q medium -J %s -o %s %s %s', jobName, quotedLogFile, additionalSubmitArgs, quotedCommand);
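To see what the edited line produces, the sprintf call can be tried locally. This is only a sketch: the four values below are placeholders invented for illustration, whereas in the real script they are computed inside getSubmitString.m.

```matlab
% Placeholder values (assumptions for illustration only; the real values
% are built inside getSubmitString.m).
jobName              = 'Job1';
quotedLogFile        = '"Job1.log"';
additionalSubmitArgs = '-n 1';
quotedCommand        = '"worker.sh"';

% Same format string as the edited line; -q medium names the LSF queue.
submitString = sprintf('bsub -q medium -J %s -o %s %s %s', ...
    jobName, quotedLogFile, additionalSubmitArgs, quotedCommand);
disp(submitString)
```

The displayed string is the bsub command line that would be passed to LSF on the cluster.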

  1. Close Properties, choose the current configuration and click “Start Validation”.  The first time, the system may ask for your login ID and password for the ERISOne cluster.  Type the corresponding information.  (You may be asked for your login name and password again after stopping the local MATLAB application and restarting it.)

    The validation will test the “Find Resource”, “Distribute”, “Parallel” and “Matlabpool” stages. If the first three items succeed and only the last item fails, your configuration is successful.  (Note:  In the PCT configuration, the more workers you choose to test, the longer validation will take to complete.  The system might not have enough resources, in which case you can choose fewer workers to test.)

 

  1. Test with example codes for three types of jobs:
    batch distribution, Matlabpool distribution, and task distribution

 

(As mentioned earlier, be sure your MATLAB scripts, NewExample.m and myfunc.m, are in the folder selected for “Data Location” during the Properties setup described above.  You can use a different location, but that path must be added through File > Set Path as described in the setup step earlier.)

3.1 Testing batch single distribution jobs.

  1. In this example, there are two MATLAB scripts.  One is called “NewExample.m” and it is the main program; the other is called “myfunc.m” and it contains the function that the main program calls.  The program itself is 16 repetitions of a hypothesis test.  The hypothesis is that there is no difference in mean between two normal distributions: one is an array of 5,000,000 elements randomly generated from Norm(0,1), the other is an array of 5,000,000 elements randomly generated from Norm(0.5, 1).  The expected outcome is that in all 16 tests the hypothesis is "rejected" and "0" is returned.  The array L stores the results of the 16 tests.
  2. Start MATLAB and create the scheduler object (referred to as sched in this guide) at the MATLAB command prompt.  You should be able to see the scheduler resource being discovered.
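A minimal sketch of the scheduler lookup, assuming the R2011b findResource syntax. The configuration name 'erisone' is an assumption; use whatever name appears in your Configurations Manager after importing erisone_pct_config.mat.

```matlab
% Assumption: 'erisone' is the configuration name shown in the
% Configurations Manager; replace it with the name on your system.
configName = 'erisone';

% Look up the scheduler described by that configuration.
sched = findResource('scheduler', 'configuration', configName);

% Display its properties to confirm the settings were picked up.
get(sched)
```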

  1. Submit your program as a batch job to the cluster using the batch command.  It is important that the “FileDependencies” parameter lists all the function files that your main program uses.  MATLAB will then automatically upload all the dependent files to the cluster so the program can run successfully.
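The submission might look like the sketch below, assuming the R2011b batch syntax. The configuration name is an assumption; the script and dependency names come from the example files.

```matlab
% Assumption: 'erisone' is the name of the imported configuration.
configName = 'erisone';

% Run the script NewExample.m on the cluster. myfunc.m is listed as a
% file dependency so that it is copied to the cluster with the job.
job1 = batch('NewExample', ...
    'Configuration', configName, ...
    'FileDependencies', {'myfunc.m'});
```

If your main program calls more functions of your own, add each file to the FileDependencies cell array.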

  1. After you submit your job through the command "job1=batch( ...)", if you log in to the ERISOne cluster and type the "bjobs" command in your ssh terminal, you should be able to see that a job has been submitted (it might wait for a while depending on system load) and is in the "Pending" or "Running" status.  Before you can retrieve your results from the cluster, your client-side MATLAB needs to know when the job is finished.  The command "wait(job1)" will not return until MATLAB has retrieved the results to the local machine.  You will notice that a Job folder is created both in your ERISOne home directory and in the local MATLAB directory on your Windows machine that you selected as the “Data Location” in the configuration Properties.

  1. When the wait command returns, type "job1" again. You can also monitor the job status using the “Job Monitor” under the Parallel menu.

When the job is finished, type "load(job1)" to load all the results into the current MATLAB workspace (you should be able to see all the objects in the upper right corner of the MATLAB GUI). You can find more details in the workspace window.
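Put together, the retrieval commands described above can be sketched as follows. The whos call at the end is an addition for convenience, not part of the original instructions.

```matlab
% job1 is the object returned by the earlier job1 = batch(...) call.
wait(job1);    % blocks until the job has finished on the cluster
job1           % display the job object, including its state
load(job1);    % copy the script's variables into the local workspace
whos           % list the loaded variables (for example, the array L)
```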

3.2 Testing batch Matlabpool distribution into ERISOne cluster  

  1. To test Matlabpool, first create a parallel for loop by changing the "for i=1:16" in the NewExample.m code to "parfor i=1:16". 
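The sketch below shows what the parfor form of such a loop could look like. It is a hypothetical stand-in, not the shipped NewExample.m or myfunc.m: a two-sample z statistic replaces whatever test myfunc.m actually performs, and all names except L and final_time are assumptions.

```matlab
% Hypothetical stand-in for the example program (not the shipped files).
n = 5000000;                 % elements per sample
L = zeros(1, 16);            % one result per repetition

% z statistic for the difference in means of two equally sized samples
zstat = @(x, y) (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / numel(x));

tic;
parfor i = 1:16
    x = randn(n, 1);         % drawn from Norm(0, 1)
    y = 0.5 + randn(n, 1);   % drawn from Norm(0.5, 1)
    % 1 = hypothesis kept, 0 = hypothesis rejected (5% two-sided level)
    L(i) = double(abs(zstat(x, y)) < 1.96);
end
final_time = toc;            % elapsed time in seconds
disp(L)
```

Each iteration is independent of the others, which is what allows the loop to be split across workers.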

  1. Use the batch command to distribute the entire Matlabpool environment to the ERISOne cluster. The batch command in the example wraps the Matlabpool and requests 7 workers to run the parfor loop.  Notice that a total of 8 workers are allocated by the ERISOne cluster:

Log on to the ERISOne cluster and use "bjobs" to see the resource allocation.  This particular job is very small and should finish within a minute, so do not be surprised if no running jobs are listed when you enter bjobs. You can enter "bjobs -a" to include recently finished jobs and check the latest one.
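A sketch of such a submission, assuming the R2011b batch syntax and an assumed configuration name. The eighth worker is the one that runs the batch script itself, in addition to the 7 pool workers.

```matlab
% Assumption: 'erisone' is the name of the imported configuration.
configName = 'erisone';

% Run NewExample.m with a pool of 7 workers for its parfor loop.
job2 = batch('NewExample', ...
    'Configuration', configName, ...
    'Matlabpool', 7, ...
    'FileDependencies', {'myfunc.m'});
```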

  1. Now you need to wait and retrieve the results from the cluster to your Windows MATLAB client environment.  Use the same MATLAB commands as before: "wait(job2)", then "load(job2)".  You can then see the result objects in your workspace (e.g. the final_time object).

 

3.3  Testing the Job-Tasks distribution

In this test, we will use the commands "createJob" and "createTask".  Within a job you can create an array of tasks, and each task can run the same function with different input arguments.  In our example we will use a loop to create 16 tasks that run "myfunc.m".  You have to make sure that "FileDependencies" is set up correctly for the job object, because the ERISOne cluster needs to fetch your self-defined function(s) before it can run them.

  1. Create the job (still using the previously defined sched object)

  1. Set up the job file dependencies, then create the tasks by repeating the same command multiple times.  Then submit the job.  If you log on to the ERISOne cluster, after a short period you should see each task being dispatched as a single running job on the LSF system.
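The two steps above can be sketched as follows. The name job3 and the call signature of myfunc (one input, one output) are assumptions; adjust the createTask arguments to match what myfunc.m actually expects.

```matlab
% sched is the scheduler object created earlier with findResource.
job3 = createJob(sched);

% Make myfunc.m available to the workers on the cluster.
set(job3, 'FileDependencies', {'myfunc.m'});

% Assumption: myfunc takes one input and returns one output.
for k = 1:16
    createTask(job3, @myfunc, 1, {k});
end

% Send the job, with all 16 tasks, to the cluster.
submit(job3);
```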

 

  1. Collect the results using the command "waitForState()".  When it returns, use "getAllOutputArguments" to retrieve all the task results together.
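A short sketch of the retrieval, assuming the job object is named job3 (an assumed name for the job created with createJob):

```matlab
% Block until every task in the job has finished.
waitForState(job3, 'finished');

% Cell array of outputs, one row per task.
results = getAllOutputArguments(job3);
disp(results)
```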

This concludes the process of setting up your local system to submit MATLAB processes via your local MATLAB Windows GUI to the ERISOne.partners.org Linux cluster.  If you need further assistance with this method of running MATLAB or have problems understanding this documentation, please contact hpcsupport@partners.org   

