%'cluster_command': creates the command string for launching jobs in the cluster system 'oar'
% other cluster options 'pbs' and 'psmn' are available in cluster_command_pbs and
% cluster_command_psmn. The choice is made in the xml file for parameters: series.xml
%------------------------------------------------------------------------
% function cmd=cluster_command(ListProcessFile,ActionFullName,DirLog,NbProcess,NbCore,CPUTimeProcess)
%
%OUTPUT
% cmd=system command (char string) to launch jobs
%
%INPUT:
% ListProcessFile: name of the file containing the list of processes to perform
% ActionFullName: name given to the action (function activated by series)
% DirLog: name of the folder used to store the log files from calculations
% NbProcess: number of processes in the list; these processes are grouped by the system into jobs dispatched to NbCore cores
% NbCore: number of computer cores to which the processes are dispatched (at least 4 cores are always requested)
% CPUTimeProcess: estimated CPU time for an individual process (in min)

function cmd=cluster_command(ListProcessFile,ActionFullName,DirLog,NbProcess,NbCore,CPUTimeProcess)

filename_log=fullfile(DirLog,'job_list.stdout'); % file for output messages of the master oar process
filename_errors=fullfile(DirLog,'job_list.stderr'); % file for error messages of the master oar process
if NbProcess>=6
    bigiojob_string=['+{type = ' char(39) 'bigiojob' char(39) '}/licence=1'];% char(39) is the single quote; bigiojob limits UVmat parallel launches on the cluster to avoid saturation of disk access to data
else
    bigiojob_string='';
end

WallTimeMax=23;% absolute limit on computation time (in hours)
WallTimeTotal=min(WallTimeMax,4*CPUTimeProcess/60);% chosen limit on computation time (in hours), possibly smaller than the absolute limit to favor job priority in the system
WallTimeOneProcess=min(4*CPUTimeProcess+10,WallTimeTotal*60/2); % estimated max time of an individual process (in min), used for the checkpoint:
% if less than this time remains before walltime, the job is stopped and a new one can be launched (by the option 'idempotent')

% if NbCore==1
%     corestring='cpu=1/core=4'; % increases the allowed memory in case of single core job
% else
%     corestring=['{cluster=''calcul6''}/core=' num2str(max(NbCore,4))];% calcul6 faster
corestring=['core=' num2str(max(NbCore,4))];
% end
cmd=['oarsub -n UVmat_' ActionFullName ' '...
    '-t idempotent --checkpoint ' num2str(WallTimeOneProcess*60) ' '... % checkpoint delay converted from min to seconds
    '-l "' corestring bigiojob_string...
    ',walltime=' datestr(WallTimeTotal/24,13) '" '... % walltime converted from hours to the string HH:MM:SS
    '-E ' filename_errors ' '...
    '-O ' filename_log ' '...
    '"oar-parexec -s -f ' ListProcessFile ' '...
    '-l ' ListProcessFile '.log"'];
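
%------------------------------------------------------------------------
% EXAMPLE (illustrative sketch only: the paths and the action name below are
% hypothetical, chosen for the illustration, and are not taken from the UVMAT sources):
%   cmd=cluster_command('/data/run/job_list.txt','civ_series','/data/run/0_LOG',8,4,30);
% With these inputs the code above evaluates, step by step, to:
%   NbProcess=8>=6     -> the bigiojob string is appended to the resource request
%   WallTimeTotal      = min(23,4*30/60)     = 2 hours
%   WallTimeOneProcess = min(4*30+10,2*60/2) = 60 min, passed as --checkpoint 3600 (seconds)
%   corestring         = 'core=4'
% so that, on a Linux system, cmd is expected to read (a single line, wrapped here for display):
%   oarsub -n UVmat_civ_series -t idempotent --checkpoint 3600
%     -l "core=4+{type = 'bigiojob'}/licence=1,walltime=02:00:00"
%     -E /data/run/0_LOG/job_list.stderr -O /data/run/0_LOG/job_list.stdout
%     "oar-parexec -s -f /data/run/job_list.txt -l /data/run/job_list.txt.log"
% The string can then be passed to the shell, for instance with [status,result]=system(cmd);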