Subject: Re: [OMPI users] Spawn and OpenFabrics
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-06-05 10:14:45
On Jun 2, 2009, at 3:26 PM, Allen Barnett wrote:
> > Does OMPI say that it has IBV fork support?
> > ompi_info --param btl openib --parsable | grep have_fork_support
>
> My RHEL4 system reports:
>
> MCA btl: parameter "btl_openib_want_fork_support" (current value: "-1")
> MCA btl: information "btl_openib_have_fork_support" (value: "1")
>
> as does the build installed on the Altix system.
>
Ok, good. Note, however, that OMPI indicating that it has support
simply means that the installed verbs library has support for it. It
does *not* mean that the underlying kernel supports it.
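If you want to double-check the kernel side, a quick standalone probe
(my own sketch, not part of OMPI) is to call ibv_fork_init() from
libibverbs before touching any other verbs resources and look at its
return code:

/* fork_check.c -- probe kernel-level verbs fork support.
 * Build: gcc fork_check.c -o fork_check -libverbs */
#include <stdio.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    /* Must run before any other verbs calls.  Returns 0 on success,
     * or an errno value (e.g., ENOSYS on kernels without
     * madvise(MADV_DONTFORK) support) on failure. */
    int rc = ibv_fork_init();
    if (0 == rc) {
        printf("kernel-level verbs fork support: available\n");
    } else {
        printf("kernel-level verbs fork support: unavailable (%s)\n",
               strerror(rc));
    }
    return rc;
}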
> > Be sure to also see http://www.open-mpi.org/faq/?category=openfabrics#ofa-fork
>
> We're using OMPI 1.2.8.
>
Good.
> > > Also, would MPI_COMM_SPAWN suffer from the same difficulties?
> >
> > It shouldn't; we proxy the launch of new commands off to mpirun /
> > OMPI's run-time system. Specifically: the new process(es) are not
> > POSIX children of the process(es) that called MPI_COMM_SPAWN.
>
> Is a program started with MPI_COMM_SPAWN required to call MPI_INIT?
>
Yes. OMPI v1.3 has an extension (a specific MPI_Info key) to indicate
that the spawned program is not an MPI application, but I do not
believe that extension existed back in the 1.2 series.
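For reference, here's roughly what that looks like under v1.3 -- with
the caveat that I'm citing the info key name ("ompi_non_mpi") from
memory, so check the MPI_Comm_spawn man page on your install;
"./partitioner" is just a stand-in for your program:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm child;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* OMPI-specific extension: tell the runtime that the spawned
     * program will never call MPI_INIT (key name from memory). */
    MPI_Info_set(info, "ompi_non_mpi", "true");

    /* "./partitioner" is a placeholder for your non-MPI executable.
     * Since the child never connects back, don't expect to use the
     * resulting intercommunicator for communication. */
    MPI_Comm_spawn("./partitioner", MPI_ARGV_NULL, 1, info,
                   0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}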
> I
> guess what I'm asking is if I will have to make my partitioner an
> OpenMPI program as well?
>
If you use MPI_COMM_SPAWN with the 1.2 series, yes.
Another less attractive but functional solution would be to do what I
did for the new "command" notifier due in the OMPI v1.5 series
("notifier" = subsystem to notify external agents when OMPI detects
something wrong, e.g., write to the syslog, send an email, write to a
sysadmin mysql db, etc.; "command" = plugin that simply forks and runs
whatever command you want). During MPI_INIT, the command notifier pre-
forks a dummy process. This dummy process then waits for commands via
a pipe. When the parent (the MPI process itself) wants to fork a
child, it sends the argv to exec down the pipe and has the dummy
process actually do the fork and exec.
Proxying all the fork requests through a secondary process like this
avoids all the problems with registered memory in the child process.
This is icky, but it is an unfortunate necessity for OS-bypass/
registration-based networks like OpenFabrics.
In your case, you'd want to pre-fork before calling MPI_INIT. But the
rest of the technique is pretty much the same.
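Here's a rough sketch of that shape (names like spawn_helper() and
run_command() are mine, not OMPI API; error handling and completion
notification back to the parent are omitted):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <mpi.h>

static int helper_fd = -1;   /* write end of the command pipe */

/* Fork the dummy helper process.  Must be called BEFORE MPI_Init so
 * that the helper inherits no registered memory. */
static void spawn_helper(void)
{
    int fds[2];
    pid_t pid;

    if (0 != pipe(fds)) { perror("pipe"); exit(1); }
    pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }

    if (0 == pid) {                     /* the dummy helper */
        char cmd[1024];
        FILE *in;
        close(fds[1]);
        in = fdopen(fds[0], "r");
        /* One shell command per line; fork+exec each one. */
        while (NULL != in && NULL != fgets(cmd, sizeof(cmd), in)) {
            pid_t kid;
            cmd[strcspn(cmd, "\n")] = '\0';
            kid = fork();
            if (0 == kid) {
                execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
                _exit(127);             /* exec failed */
            }
            waitpid(kid, NULL, 0);
        }
        _exit(0);
    }

    close(fds[0]);                      /* parent keeps the write end */
    helper_fd = fds[1];
}

/* Ask the helper to fork+exec a command; safe after MPI_Init. */
static void run_command(const char *cmd)
{
    write(helper_fd, cmd, strlen(cmd));
    write(helper_fd, "\n", 1);
}

int main(int argc, char **argv)
{
    spawn_helper();                     /* pre-fork first... */
    MPI_Init(&argc, &argv);             /* ...then init MPI */

    run_command("./my_partitioner input.mesh");  /* placeholder */

    MPI_Finalize();
    close(helper_fd);
    return 0;
}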
Have a look at the code in this tree if it helps:
https://svn.open-mpi.org/trac/ompi/browser/trunk/orte/mca/notifier/command
-- 
Jeff Squyres
Cisco Systems