From: Bill Saphir (bsaphir_at_[hidden])
Date: 2007-05-29 23:59:55
George,
This is one of the things I tried, and setting the oob interface did not work; it fails with the error message below.
Also, per this thread:
http://www.open-mpi.org/community/lists/users/2007/05/3319.php
I believe it is oob_tcp_include, not oob_tcp_if_include. The latter is silently ignored in 1.2, as far as I can tell.
Interestingly, telling the MPI layer to use lo0 (or not to use tcp at all) works fine. But when I try to do the same for the OOB layer, it complains. The full error is:
[mymac.local:07001] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
[mymac.local:07001] [0,0,0] ORTE_ERROR_LOG: Error in file oob_tcp.c at line 1196
mpirun actually hangs at this point and no processes are spawned. I have to ^C to stop it.
I see this behavior on both Mac OS and on Linux with 1.2.2.
Bill
George Bosilca wrote:
> There are 2 sets of sockets: one for the oob layer and one for the MPI layer (at least if TCP support is enabled). Therefore, to achieve what you're looking for, you should add to the command line:
> "--mca oob_tcp_if_include lo0 --mca btl_tcp_if_include lo0".
> On May 29, 2007, at 3:58 PM, Bill Saphir wrote:
>
----- original message below ---
> We have run into the following problem:
>
> - start up Open MPI application on a laptop
> - disconnect from network
> - application hangs
>
> I believe the problem is that all sockets created by Open MPI are bound to the external network interface.
> For example, when I start a 2-process MPI job on my Mac (no hosts specified), I get the following TCP connections (192.168.5.2 is an address on my LAN):
>
> tcp4 0 0 192.168.5.2.49459 192.168.5.2.49463 ESTABLISHED
> tcp4 0 0 192.168.5.2.49463 192.168.5.2.49459 ESTABLISHED
> tcp4 0 0 192.168.5.2.49456 192.168.5.2.49462 ESTABLISHED
> tcp4 0 0 192.168.5.2.49462 192.168.5.2.49456 ESTABLISHED
> tcp4 0 0 192.168.5.2.49456 192.168.5.2.49460 ESTABLISHED
> tcp4 0 0 192.168.5.2.49460 192.168.5.2.49456 ESTABLISHED
> tcp4 0 0 192.168.5.2.49456 192.168.5.2.49458 ESTABLISHED
> tcp4 0 0 192.168.5.2.49458 192.168.5.2.49456 ESTABLISHED
>
> Since this application is confined to a single machine, I would like it to use 127.0.0.1, which will remain available as the laptop moves around. However, I am unable to force it to bind sockets to this address.
>
> Some of the things I've tried are:
> - explicitly setting the hostname to 127.0.0.1 (--host 127.0.0.1)
> - turning off the tcp btl (--mca btl ^tcp) and other variations (--mca btl self,sm)
> - using --mca oob_tcp_include lo0
>
> The first two have no effect. The last one results in this error message:
> [myhost.locall:05830] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
>
> Is there any way to force Open MPI to bind all sockets to 127.0.0.1?
>
> As a side question, I'm curious what all of these TCP connections are used for. As I increase the number of processes, it looks like 4 sockets are created per MPI process, even without the tcp btl.
> Perhaps stdin/out/err + control?
>
> Bill
>
>