From: smairal_at_[hidden]
Date: 2007-05-30 16:04:07


I use a shared-memory system, and for my MPI algorithm I set the
IP addresses of all the nodes to 127.0.0.1 in some_hostfile, then
execute the program with "mpirun --machinefile some_hostfile -np 4
prog-name". I believe the sm btl is switched on by default. Will this
help in such a case? I am not sure, but you might just give it a try
if you haven't tried it already, Bill.
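
For concreteness, the hostfile I have in mind is nothing more than the
loopback address (the file name and slot count below are only placeholders):

    # some_hostfile: every "node" is the local machine
    127.0.0.1 slots=4

and then:

    mpirun --machinefile some_hostfile -np 4 prog-name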

-Sarang.

Quoting Brian Barrett <bbarrett_at_[hidden]>:

> Bill -
>
> This is a known issue in all released versions of Open MPI. I have a
> patch that will hopefully fix this issue in 1.2.3. It's currently
> waiting on people on the Open MPI team to verify that I didn't do
> something stupid.
>
> Brian
>
> On May 29, 2007, at 9:59 PM, Bill Saphir wrote:
>
> >
> > George,
> >
> > This is one of the things I tried; setting the oob interface did
> > not work, and it failed with the error message below.
> >
> > Also, per this thread:
> > http://www.open-mpi.org/community/lists/users/2007/05/3319.php
> > I believe it is oob_tcp_include, not oob_tcp_if_include. The latter
> > is silently ignored in 1.2, as far as I can tell.
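> > (One way to double-check which parameter names a given build actually
> > registers is ompi_info; something like
> >     ompi_info --param oob tcp | grep include
> > should list whatever oob_tcp parameters that version knows about.)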
> >
> > Interestingly, telling the MPI layer to use lo0 (or to not use tcp
> > at all) works fine.
> > But when I try to do the same for the OOB layer, it complains. The
> > full error is:
> >
> > [mymac.local:07001] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
> > [mymac.local:07001] [0,0,0] ORTE_ERROR_LOG: Error in file oob_tcp.c at line 1196
> >
> > mpirun actually hangs at this point and no processes are spawned. I
> > have to ^C to stop it.
> > I see this behavior on both Mac OS and on Linux with 1.2.2.
> >
> > Bill
> >
> >
> > George Bosilca wrote:
> >> There are 2 sets of sockets: one for the oob layer and one for the
> >> MPI layer (at least if TCP support is enabled). Therefore, in order
> >> to achieve what you're looking for, you should add to the command line
> >> "--mca oob_tcp_if_include lo0 --mca btl_tcp_if_include lo0".
> >> On May 29, 2007, at 3:58 PM, Bill Saphir wrote:
> >>
> >
> > ----- original message below ---
> >
> >> We have run into the following problem:
> >>
> >> - start up Open MPI application on a laptop
> >> - disconnect from network
> >> - application hangs
> >>
> >> I believe that the problem is that all sockets created by Open MPI
> >> are bound to the external network interface.
> >> For example, when I start up a 2 process MPI job on my Mac (no
> >> hosts specified), I get the following tcp
> >> connections. 192.168.5.2 is an address on my LAN.
> >>
> >> tcp4  0  0  192.168.5.2.49459  192.168.5.2.49463  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49463  192.168.5.2.49459  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49456  192.168.5.2.49462  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49462  192.168.5.2.49456  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49456  192.168.5.2.49460  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49460  192.168.5.2.49456  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49456  192.168.5.2.49458  ESTABLISHED
> >> tcp4  0  0  192.168.5.2.49458  192.168.5.2.49456  ESTABLISHED
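> >> (For reference, that is the sort of listing netstat gives; something
> >> like "netstat -an -p tcp | grep ESTABLISHED" on the Mac, with the
> >> exact flags differing on Linux.)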
> >>
> >> Since this application is confined to a single machine, I would
> >> like it to use 127.0.0.1,
> >> which will remain available as the laptop moves around. I am
> >> unable to force it to bind
> >> sockets to this address, however.
> >>
> >> Some of the things I've tried are:
> >> - explicitly setting the hostname to 127.0.0.1 (--host 127.0.0.1)
> >> - turning off the tcp btl (--mca btl ^tcp) and other variations
> >> (--mca btl self,sm)
> >> - using --mca oob_tcp_include lo0
> >>
> >> The first two have no effect. The last one results in an error
> >> message of:
> >> [myhost.locall:05830] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
> >>
> >> Is there any way to force Open MPI to bind all sockets to 127.0.0.1?
> >>
> >> As a side question -- I'm curious what all of these tcp
> >> connections are used for. As I increase the number
> >> of processes, it looks like there are 4 sockets created per MPI
> >> process, without using the tcp btl.
> >> Perhaps stdin/out/err + control?
> >>
> >> Bill
> >>
> >>