Changeset 89 for trunk/oarutils
Timestamp: Jun 5, 2012, 11:14:20 PM
Files: 1 edited
trunk/oarutils/oar-parexec
--- trunk/oarutils/oar-parexec (r88)
+++ trunk/oarutils/oar-parexec (r89)
@@ -589,5 +589,5 @@
 
 C<oar-parexec> is compatible with the OAR checkpointing.
-I lyou have 2000 small jobs that need 55h to be done on 6 cores,
+If you have 2000 small jobs that need 55h to be done on 6 cores,
 you can cut this in small parts.
 
@@ -631,4 +631,23 @@
 They will be launch with the same command line at the next OAR run.
 
+Example: if you have 50 small jobs that each need 72h to be done on 1 cores,
+you can cut this in 24h parts.
+
+For this example, we suppose that each long job loop need about 20min...
+So, we send a checkpoint 30min before the end of the process
+to let C<oar-parexec> suspend the jobs started.
+After being checkpointed, C<oar-parexec> do not start any new small job.
+
+ oarsub -t idempotent -n test \
+   -l /core=6,walltime=24:00:00 \
+   --checkpoint 1800 \
+   --transmit \
+   "oar-parexec -f ./subjob.list.txt -l ./subjob.list.log"
+
+After 23h30min, the OAR job will begin to stop launching new small job.
+When all running small job are suspend, it's exit.
+But as the OAR job is type C<idempotent>,
+OAR will re-submit it as long as all small job are not finished...
+
 =head1 SEE ALSO
 
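The "After 23h30min" figure in the added documentation follows from the walltime and the checkpoint delay. The arithmetic can be checked with a small shell sketch. `checkpoint_offset` is a hypothetical helper, not part of OAR or of `oar-parexec`. It assumes, as the added text describes, that `--checkpoint` gives a delay in seconds before the end of the walltime.

```shell
#!/bin/sh
# Hypothetical helper (not an OAR or oar-parexec command).
# Usage: checkpoint_offset HH:MM:SS DELAY_SECONDS
# Prints the elapsed time at which the checkpoint would be sent, assuming
# the checkpoint is sent DELAY_SECONDS before the walltime ends.
checkpoint_offset() {
    walltime=$1
    delay=$2

    # Split HH:MM:SS using parameter expansion only.
    h=${walltime%%:*}
    rest=${walltime#*:}
    m=${rest%%:*}
    s=${rest#*:}

    # Drop one leading zero so values such as "08" are not read as octal.
    total=$(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
    at=$(( total - delay ))

    printf '%02d:%02d:%02d\n' $(( at / 3600 )) $(( at % 3600 / 60 )) $(( at % 60 ))
}

# Values from the oarsub example: walltime=24:00:00, --checkpoint 1800
checkpoint_offset 24:00:00 1800   # prints 23:30:00
```

With the example's assumed 20min loop duration, a 1800s delay leaves each running small job about one loop plus a 10min margin to reach a point where it can be suspended.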