mpirun(1) man page (version 1.2.9)


NAME

       orterun,  mpirun,  mpiexec  -  Execute serial and parallel jobs in Open
       MPI.

       Note: mpirun, mpiexec, and orterun are  all  exact  synonyms  for  each
       other.   Using any of the names will result in exactly identical behav-
       ior.

SYNOPSIS

       Single Program Multiple Data (SPMD) Model:

       mpirun [ options ] <program> [ <args> ]

       Multiple Instruction Multiple Data (MIMD) Model:

       mpirun [ global_options ]
              [ local_options1 ] <program1> [ <args1> ] :
              [ local_options2 ] <program2> [ <args2> ] :
              ... :
              [ local_optionsN ] <programN> [ <argsN> ]

       Note that in both models, invoking mpirun via an absolute path name  is
       equivalent to specifying the --prefix option with a <dir> value equiva-
       lent to the directory where mpirun resides, minus  its  last  subdirec-
       tory.  For example:

           shell$ /usr/local/bin/mpirun ...

       is equivalent to

           shell$ mpirun --prefix /usr/local

QUICK SUMMARY

       If you are simply looking for how to run an MPI application, you proba-
       bly want to use a command line of the following form:

           shell$ mpirun [ -np X ] [ --hostfile <filename> ]  <program>

       This will run X copies of <program> in your current run-time environ-
       ment, scheduling (by default) in a round-robin fashion by CPU slot.
       If running under a supported resource manager, Open MPI's mpirun will
       usually automatically use the corresponding resource manager process
       starter, as opposed to rsh or ssh (which require the use of a host-
       file); otherwise, it will default to running all X copies on the
       localhost.  See the rest of this page for more details.
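
       For example, a minimal sketch (my_hosts and a.out are placeholder
       names for a hostfile and an executable):

           shell$ mpirun -np 4 --hostfile my_hosts a.out

       This launches 4 copies of a.out across the hosts listed in my_hosts.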

OPTIONS

       mpirun will send the name of the directory where it was invoked on  the
       local  node  to each of the remote nodes, and attempt to change to that
       directory.  See the "Current Working Directory" section below for  fur-
       ther details.

       <args>    Pass these run-time arguments to every new process.  These
                 must always be the last arguments to mpirun.  If an app con-
                 text file is used, <args> will be ignored.

       --app <appfile>
                 Provide an appfile, ignoring all other command line options.

       -bynode, --bynode
                 Allocate (map) the processes by node in a round-robin scheme.

       -byslot, --byslot
                 Allocate (map) the processes by slot in a round-robin scheme.
                 This is the default.

       -c <#>    Synonym for -np.

       -debug, --debug
                 Invoke    the    user-level   debugger   indicated   by   the
                 orte_base_user_debugger MCA parameter.

       -debugger, --debugger
                 Sequence of debuggers to search  for  when  --debug  is  used
                 (i.e.   a synonym for orte_base_user_debugger MCA parameter).

       -gmca, --gmca <key> <value>
                 Pass global MCA parameters that are applicable  to  all  con-
                 texts.  <key> is the parameter name; <value> is the parameter
                 value.
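
                  For example, the following sketch (prog1 and prog2 are
                  illustrative program names) applies a single BTL set-
                  ting to both halves of a MIMD job:

                      shell$ mpirun -gmca btl tcp,self -np 1 prog1 : -np 2 prog2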

       -h, --help
                 Display help for this command

       -H <host1,host2,...,hostN>
                 Synonym for -host.

       -host, --host <host1,host2,...,hostN>
                 List of hosts on which to invoke processes.

       -hostfile, --hostfile <hostfile>
                 Provide a hostfile to use.

       -machinefile, --machinefile <machinefile>
                 Synonym for -hostfile.

       -mca, --mca <key> <value>
                 Send arguments to various MCA modules.  See  the  "MCA"  sec-
                 tion, below.

       -n, --n <#>
                 Synonym for -np.

       -nolocal, --nolocal
                 Do not run any copies of the launched application on the same
                 node as orterun is running.  This option will override  list-
                 ing  the  localhost  with --host or any other host-specifying
                 mechanism.

       -nooversubscribe, --nooversubscribe
                 Do not oversubscribe any nodes; error (without starting any
                 processes) if the requested number of processes would cause
                 oversubscription.  This option implicitly sets "max_slots"
                 equal to the "slots" value for each node.

       -np <#>   Run this many copies of the program on the given nodes.
                 This option indicates that the specified file is an exe-
                 cutable program and not an application context.  If no value
                 is provided for the number of copies to execute (i.e., nei-
                 ther the "-np" option nor its synonyms are provided on the
                 command line), Open MPI will automatically execute a copy of
                 the program on each process slot (see below for a descrip-
                 tion of a "process slot").  This feature, however, can only
                 be used in the SPMD model and will return an error (without
                 beginning execution of the application) otherwise.

       -nw, --nw Launch the processes and do not wait  for  their  completion.
                 mpirun will complete as soon as successful launch occurs.

       -path, --path <path>
                 <path>  that will be used when attempting to locate requested
                 executables.

       --prefix <dir>
                 Prefix directory that will  be  used  to  set  the  PATH  and
                 LD_LIBRARY_PATH  on  the remote node before invoking Open MPI
                 or the target process.  See the "Remote  Execution"  section,
                 below.

       -q, --quiet
                 Suppress informative messages from orterun during application
                 execution.

       --tmpdir <dir>
                 Set the root for the session directory tree for mpirun  only.

       -tv, --tv Launch  processes  under  the TotalView debugger.  Deprecated
                 backwards compatibility flag. Synonym for --debug.

       --universe <username@hostname:universe_name>
                 For this application, set the universe name as:
                      username@hostname:universe_name

       -v, --verbose
                 Be verbose

       -V, --version
                 Print version number.  If no other arguments are given,  this
                 will also cause orterun to exit.

       -wd <dir> Synonym for -wdir.

       -wdir <dir>
                 Change  to the directory <dir> before the user's program exe-
                 cutes.  See the "Current Working Directory" section for notes
                 on relative paths.  Note: If the -wdir option appears both on
                 the command line and in an application context,  the  context
                 will take precedence over the command line.

       -x <env>  Export  the  specified  environment  variables  to the remote
                 nodes before executing  the  program.   Existing  environment
                 variables can be specified (see the Examples section, below),
                 or new variable names specified  with  corresponding  values.
                 The  parser  for  the -x option is not very sophisticated; it
                 does not even understand quoted values.  Users are advised to
                 set  variables  in the environment, and then use -x to export
                 (not define) them.
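
                  For example (FOO is an illustrative variable name):

                      shell$ export FOO=bar
                      shell$ mpirun -x FOO -np 2 a.out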

       The following options are useful for developers; they are not generally
       useful to most ORTE and/or MPI users:

       --debug-daemons-file
              Enable  debugging  of  any OpenRTE daemons used by this applica-
              tion, storing output in files.

       --no-daemonize
              Do not detach OpenRTE daemons used by this application.

DESCRIPTION

       One invocation of mpirun starts an MPI application running  under  Open
       MPI.  If the application is single program multiple data (SPMD), the
       application can be specified on the mpirun command line.

       If the application is multiple instruction multiple data (MIMD), com-
       prising multiple programs, the set of programs and arguments can be
       specified in one of two ways: Extended Command Line Arguments, and
       Application Context.

       An  application  context  describes  the MIMD program set including all
       arguments in a separate file.  This file essentially contains  multiple
       mpirun  command  lines,  less  the command name itself.  The ability to
       specify different options for different instantiations of a program  is
       another reason to use an application context.
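
       For example, a sketch of an application context (my_appfile, the
       host names, and the program names are all illustrative):

           shell$ cat my_appfile
           -np 1 --host a master
           -np 2 --host b,c slave
           shell$ mpirun --app my_appfile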

       Extended command line arguments allow for the description of the appli-
       cation layout on the command line using  colons  (:)  to  separate  the
       specification  of programs and arguments. Some options are globally set
       across all specified programs (e.g. --hostfile), while others are  spe-
       cific to a single program (e.g. -np).
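
       For example, the following sketch (master and slave are illustrative
       program names) applies --hostfile globally and -np per program:

           shell$ mpirun --hostfile my_hosts -np 1 master : -np 2 slave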

   Process Slots
       Open  MPI uses "slots" to represent a potential location for a process.
       Hence, a node with 2 slots means that 2 processes can  be  launched  on
       that  node.  For  performance, the community typically equates a "slot"
       with a physical CPU, thus ensuring that any process  assigned  to  that
       slot has a dedicated processor. This is not, however, a requirement for
       the operation of Open MPI.

       Slots can be specified in hostfiles after the hostname.  For example:

       host1.example.com slots=4
           Indicates that there are 4 process slots on host1.

       If no slots value is specified, then Open MPI will automatically assign
       a default value of "slots=1" to that host.

       When  running under resource managers (e.g., SLURM, Torque, etc.), Open
       MPI will obtain both the hostnames and the  number  of  slots  directly
       from the resource manager.  For example, if running under a SLURM job,
       Open MPI will automatically receive the hosts that SLURM has  allocated
       to  the  job as well as how many slots on each node that SLURM says are
       usable - in most high-performance environments, the slots  will  equate
       to the number of processors on the node.

       When  deciding  where  to launch processes, Open MPI will first fill up
       all available slots before  oversubscribing  (see  "Location  Nomencla-
       ture", below, for more details on the scheduling algorithms available).
        Unless told otherwise, Open MPI will arbitrarily oversubscribe nodes.
        For example, if the only node available is the localhost, Open MPI
        will run as many copies of the application on the localhost as were
        requested.

        Limits on oversubscription can also be expressed in the hostfile with
        the "max_slots" field.  For example:

        host2.example.com slots=4 max_slots=6
            Indicates that there are 4 process slots on host2.  Further, Open
            MPI is limited to launching a maximum of 6 processes on host2.

       host3.example.com slots=2 max_slots=2
           Indicates that there are 2 process slots on host3 and that no over-
           subscription  is allowed (similar to the --nooversubscribe option).

       host4.example.com max_slots=2
           Shorthand; same as listing "slots=2 max_slots=2".

       Note that Open MPI's support for resource managers does  not  currently
       set  the "max_slots" values for hosts.  If you wish to prevent oversub-
       scription in such scenarios, use the --nooversubscribe option.

       In scenarios where the user wishes to launch an application across  all
       available  slots  by  not providing a "-n" option on the mpirun command
       line, Open MPI will launch a process on each process slot for each host
       within  the  provided  environment. For example, if a hostfile has been
       provided, then Open MPI will spawn processes on each identified host up
       to  the "slots=x" limit if oversubscription is not allowed. If oversub-
       scription is allowed (the default), then Open MPI will spawn  processes
       on  each  host up to the "max_slots=y" limit if that value is provided.
       In all cases, the "-bynode" and "-byslot" mapping  directives  will  be
       enforced to ensure proper placement of process ranks.
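
        For example, given a hostfile listing two hosts with "slots=2" each
        (an illustrative setup), the following would launch four processes,
        two per host:

           shell$ mpirun --hostfile my_hosts a.out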

   Location Nomenclature
       As  described above, mpirun can specify arbitrary locations in the cur-
       rent Open MPI universe.  Locations can be specified either by CPU or by
       node.

       Note:  This  nomenclature  does not force Open MPI to bind processes to
       CPUs -- specifying a location "by CPU" is really a  convenience  mecha-
       nism for SMPs that ultimately maps down to a specific node.

       Specifying  locations by node will launch one copy of an executable per
       specified node.  Using the --bynode option tells Open MPI  to  use  all
       available  nodes.   Using the --byslot option tells Open MPI to use all
       slots on an available node before  allocating  resources  on  the  next
       available node.  For example:

       mpirun --bynode -np 4 a.out
            Runs one copy of the executable a.out on all available nodes in
            the Open MPI universe.  MPI_COMM_WORLD rank 0 will be on node0,
            rank 1 will be on node1, etc., regardless of how many slots are
            available on each of the nodes.

       mpirun --byslot -np 4 a.out
            Runs one copy of the executable a.out on each slot on a given
           node before running the executable on other available nodes.

   Specifying Hosts
        Hosts can be specified in a number of ways, the most common of which
        is in a 'hostfile' or 'machinefile'.  Suppose our hostfile contains
        the following information:

          shell$ cat my-hostfile
          node00 slots=2
          node01 slots=2

        Here we can include and exclude hosts from the set of hosts to run on.
       For example:

       mpirun -np 3 --host a a.out
              Runs three copies of the executable a.out on host a.

       mpirun -np 3 --host a,b,c a.out
              Runs one copy of the executable a.out on hosts a, b, and c.

       mpirun -np 3 --hostfile my-hostfile --host node00 a.out
              Runs three copies of the executable a.out on host node00.

       mpirun -np 3 --hostfile my-hostfile --host node10 a.out
              This will prompt an error since node10 is  not  in  my-hostfile;
              mpirun will abort.

        mpirun -np 1 --host a hostname : -np 2 --host b,c uptime
               Runs one copy of the executable hostname on host a, and one
               copy of the executable uptime on each of hosts b and c.

   No Local Launch
       Using the --nolocal option to orterun tells the system  to  not  launch
       any  of the application processes on the same node that orterun is run-
       ning.   While  orterun  typically  blocks  and  consumes   few   system
       resources,  this  option  can  be helpful for launching very large jobs
        where orterun may actually need to use noticeable amounts of memory
        and/or processing time.  --nolocal allows orterun to run without sharing
       the local node with the launched applications, and likewise allows  the
       launched applications to run unhindered by orterun's system usage.

       Note that --nolocal will override any other specification to launch the
       application on the local node.  It will disqualify the  localhost  from
       being capable of running any processes in the application.

       shell$ mpirun -np 1 --host localhost --nolocal hostname
              This  example  will  result in an error because orterun will not
              find anywhere to launch the application.

   No Oversubscription
       Using the --nooversubscribe option causes Open MPI  to  implicitly  set
       the  "max_slots"  value  to  be  the same as the "slots" value for each
       node.  This can  be  especially  helpful  when  running  jobs  under  a
       resource manager because Open MPI currently only sets the "slots" value
       for each node that it obtains from the resource manager.
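
        For example, given the my-hostfile shown in "Specifying Hosts",
        above (4 total slots), the following sketch would error out without
        starting any processes:

           shell$ mpirun -np 8 --hostfile my-hostfile --nooversubscribe a.out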

   Application Context or Executable Program?
        To distinguish the two different forms, mpirun looks on the command
        line for the --app option.  If it is specified, then the file named
        on the command line is assumed to be an application context.  If it
        is not specified, then the file is assumed to be an executable pro-
        gram.
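
        For example (my_appfile and my_program are placeholder names):

           shell$ mpirun --app my_appfile
           shell$ mpirun -np 2 my_program

        The first command treats my_appfile as an application context; the
        second treats my_program as an executable program.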

   Locating Files
       If  no relative or absolute path is specified for a file, Open MPI will
       look for files by searching the directories in the user's PATH environ-
       ment variable as defined on the source node(s).

        If a relative directory is specified, it must be relative to the ini-
        tial working directory determined by the specific starter used.  For
        example, when using the rsh or ssh starters, the initial directory is
        $HOME by default.

    Current Working Directory
        The -wdir mpirun option (and its synonym, -wd) allows the user to
        change to an arbitrary directory before the user's program executes.
        It can also be used in application context files to specify working
        directories on specific nodes and/or for specific applications.

        If the -wdir option appears both in a context file and on the command
        line, the context file directory will override the command line value.

       If the -wdir option is specified, Open MPI will attempt  to  change  to
       the  specified  directory  on  all  of the remote nodes. If this fails,
       mpirun will abort.

       If the -wdir option is not specified, Open MPI will send the  directory
       name  where  mpirun was invoked to each of the remote nodes. The remote
       nodes will try to change to that directory. If they are  unable  (e.g.,
        if the directory does not exist on that node), then Open MPI will use
       the default directory determined by the starter.

       All directory changing occurs before the user's program is invoked;  it
       does not wait until MPI_INIT is called.
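
        For example, the following sketch (/tmp/run1 is an illustrative
        directory) changes to /tmp/run1 on every node before a.out executes,
        aborting if the change fails:

           shell$ mpirun -np 2 -wdir /tmp/run1 a.out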

   Standard I/O
       Open  MPI  directs  UNIX  standard  input to /dev/null on all processes
       except the MPI_COMM_WORLD rank 0 process.  The  MPI_COMM_WORLD  rank  0
       process  inherits  standard  input  from  mpirun.   Note: The node that
       invoked  mpirun  need  not  be  the  same  as  the   node   where   the
       MPI_COMM_WORLD rank 0 process resides. Open MPI handles the redirection
       of mpirun's standard input to the rank 0 process.

       Open MPI directs UNIX standard output and error from  remote  nodes  to
       the node that invoked mpirun and prints it on the standard output/error
       of mpirun.  Local processes inherit the standard output/error of mpirun
       and transfer to it directly.

       Thus  it is possible to redirect standard I/O for Open MPI applications
       by using the typical shell redirection procedure on mpirun.

             shell$ mpirun -np 2 my_app < my_input > my_output

       Note that in this example only the MPI_COMM_WORLD rank 0  process  will
       receive  the stream from my_input on stdin.  The stdin on all the other
       nodes will be tied to /dev/null.  However, the stdout  from  all  nodes
       will be collected into the my_output file.

   Signal Propagation
        When orterun receives a SIGTERM or SIGINT, it will attempt to kill the
       entire job by sending all processes in the job  a  SIGTERM,  waiting  a
       small  number  of  seconds,  then  sending  all  processes in the job a
       SIGKILL.  SIGUSR1 and SIGUSR2 signals received by  orterun  are  propa-
       gated  to  all  processes  in the job.  Other signals are not currently
       propagated by orterun.
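
        For example, SIGUSR1 can be delivered to every process in the job by
        signaling mpirun itself (a sketch using shell job control):

           shell$ mpirun -np 4 a.out &
           shell$ kill -USR1 %1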

   Process Termination / Signal Handling
       During the run of an MPI  application,  if  any  rank  dies  abnormally
       (either exiting before invoking MPI_FINALIZE, or dying as the result of
       a signal), mpirun will print out an error message and kill the rest  of
       the MPI application.

       User  signal handlers should probably avoid trying to cleanup MPI state
       (Open MPI is, currently, neither  thread-safe  nor  async-signal-safe).
        For example, if a segmentation fault occurs in MPI_SEND (perhaps
        because a bad buffer was passed in) and a user signal handler is
        invoked, and if that handler attempts to invoke MPI_FINALIZE, Bad
        Things could happen since Open MPI was already "in" MPI when the
        error occurred.  Since mpirun will notice that the process died due
        to a signal, it is probably safest for the user to clean up only
        non-MPI state.

    Process Environment
        Processes in the MPI application inherit their environment from the
        Open RTE daemon upon the node on which they are running.  The environ-
       ment  is  typically  inherited from the user's shell.  On remote nodes,
       the exact environment is determined by the boot MCA module  used.   The
        rsh launch module, for example, uses rsh or ssh to launch the Open
       RTE daemon on remote nodes, and typically executes one or more  of  the
       user's  shell-setup  files  before launching the Open RTE daemon.  When
       running   dynamically   linked   applications   which    require    the
       LD_LIBRARY_PATH  environment  variable to be set, care must be taken to
       ensure that it is correctly set when booting Open MPI.

       See the "Remote Execution" section for more details.

   Remote Execution
       Open MPI requires that the PATH environment variable  be  set  to  find
       executables  on  remote nodes (this is typically only necessary in rsh-
       or ssh-based environments  --  batch/scheduled  environments  typically
       copy the current environment to the execution of remote jobs, so if the
       current environment has PATH and/or LD_LIBRARY_PATH set  properly,  the
       remote nodes will also have it set properly).  If Open MPI was compiled
       with shared library support, it may  also  be  necessary  to  have  the
       LD_LIBRARY_PATH environment variable set on remote nodes as well (espe-
       cially to find the shared libraries required to run user  MPI  applica-
       tions).

       However,  it  is not always desirable or possible to edit shell startup
       files to set PATH and/or LD_LIBRARY_PATH.  The --prefix option is  pro-
       vided for some simple configurations where this is not possible.

       The  --prefix option takes a single argument: the base directory on the
       remote node where Open MPI is installed.  Open MPI will use this direc-
       tory  to  set  the remote PATH and LD_LIBRARY_PATH before executing any
       Open MPI or user applications.  This allows running Open MPI jobs with-
        out having pre-configured the PATH and LD_LIBRARY_PATH on the remote
       nodes.

       Open MPI adds the basename of the current node's "bindir"  (the  direc-
       tory where Open MPI's executables are installed) to the prefix and uses
       that to set the PATH on the remote node.  Similarly, Open MPI adds  the
       basename of the current node's "libdir" (the directory where Open MPI's
       libraries are installed) to  the  prefix  and  uses  that  to  set  the
       LD_LIBRARY_PATH on the remote node.  For example:

       Local bindir:  /local/node/directory/bin

       Local libdir:  /local/node/directory/lib64

       If the following command line is used:

           shell$ mpirun --prefix /remote/node/directory

       Open   MPI  will  add  "/remote/node/directory/bin"  to  the  PATH  and
       "/remote/node/directory/lib64" to the D_LIBRARY_PATH on the remote node
       before attempting to execute anything.

       Note that --prefix can be set on a per-context basis, allowing for dif-
       ferent values for different nodes.
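
        For example, a sketch with purely illustrative installation paths:

           shell$ mpirun --host a --prefix /opt/ompi-a prog1 : \
                         --host b --prefix /opt/ompi-b prog2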

        Note that executing mpirun via an absolute pathname is equivalent to
        specifying --prefix without the last subdirectory in the absolute path-
        name to mpirun.  For example:

           shell$ /usr/local/bin/mpirun ...

       is equivalent to

           shell$ mpirun --prefix /usr/local

   Exported Environment Variables
       All environment variables that are named in the form OMPI_* will  auto-
       matically  be  exported to new processes on the local and remote nodes.
       The -x option to mpirun can be  used  to  export  specific  environment
       variables  to  the  new  processes.   While the syntax of the -x option
       allows the definition of new variables, note that the parser  for  this
       option  is  currently  not very sophisticated - it does not even under-
       stand quoted values.  Users are advised to set variables in  the  envi-
       ronment and use -x to export them; not to define them.
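
        For example (OMPI_APP_FLAG and MY_SETTING are hypothetical variable
        names):

           shell$ export OMPI_APP_FLAG=1    # exported automatically (OMPI_*)
           shell$ export MY_SETTING=42
           shell$ mpirun -x MY_SETTING -np 2 a.out    # exported via -x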

   MCA (Modular Component Architecture)
       The  -mca  switch  allows the passing of parameters to various MCA mod-
       ules.  MCA modules have direct impact  on  MPI  programs  because  they
       allow  tunable parameters to be set at run time (such as which BTL com-
       munication device driver to use, what parameters to pass to  that  BTL,
       etc.).

       The  -mca  switch  takes  two  arguments: <key> and <value>.  The <key>
       argument generally specifies which MCA module will receive  the  value.
       For example, the <key> "btl" is used to select which BTL to be used for
       transporting MPI messages.  The <value> argument is the value  that  is
       passed.  For example:

        mpirun -mca btl tcp,self -np 1 foo
            Tells Open MPI to use the "tcp" and "self" BTLs, and to run a sin-
            gle copy of "foo" on an allocated node.

        mpirun -mca btl self -np 1 foo
            Tells Open MPI to use the "self" BTL, and to run a single copy of
            "foo" on an allocated node.

       The  -mca  switch can be used multiple times to specify different <key>
       and/or <value> arguments.  If the same <key>  is  specified  more  than
       once, the <value>s are concatenated with a comma (",") separating them.

       Note: The -mca switch is simply  a  shortcut  for  setting  environment
       variables.   The same effect may be accomplished by setting correspond-
       ing environment variables before running mpirun.  The form of the envi-
        ronment variables that Open MPI sets is:

              OMPI_MCA_<key>=<value>

       Note  that  the  -mca  switch  overrides any previously set environment
        variables.  Also note that unknown <key> arguments are still set as
        environment variables -- they are not checked (by mpirun) for correct-
       ness.  Illegal or  incorrect  <value>  arguments  may  or  may  not  be
       reported -- it depends on the specific MCA module.
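
        For example, the following two sketches should be equivalent:

           shell$ mpirun -mca btl tcp,self -np 1 foo

           shell$ export OMPI_MCA_btl=tcp,self
           shell$ mpirun -np 1 foo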

EXAMPLES

        Be sure to also see the examples in the "Location Nomenclature" sec-
        tion, above.

       mpirun -np 4 -mca btl ib,tcp,self prog1
            Run 4 copies of prog1 using the "ib", "tcp", and "self" BTLs for
           the transport of MPI messages.
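
        mpirun -np 4 --hostfile my-hostfile --bynode prog1
            Run 4 copies of prog1, mapped round-robin by node across the
            hosts listed in my-hostfile (an illustrative hostfile; see
            "Specifying Hosts", above).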

RETURN VALUE

       mpirun returns 0 if all ranks started  by  mpirun  exit  after  calling
       MPI_FINALIZE.   A  non-zero  value  is  returned  if  an internal error
       occurred in  mpirun,  or  one  or  more  ranks  exited  before  calling
       MPI_FINALIZE.  If an internal error occurred in mpirun, the correspond-
       ing error code is returned.  In the event that one or more  ranks  exit
        before calling MPI_FINALIZE, the exit status of the first rank that
        mpirun notices died before calling MPI_FINALIZE will be returned.
        Note that, in general, this will be the first rank that died, but
        this is not guaranteed.

       However, note that if the -nw switch is used,  the  return  value  from
       mpirun does not indicate the exit status of the ranks.
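
        For example, the exit status can be checked from the shell (a.out is
        a placeholder):

           shell$ mpirun -np 2 a.out
           shell$ echo $?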

Open MPI                          March 2006                         MPIRUN(1)
