From: Brian Barrett (bbarrett_at_[hidden])
Date: 2007-05-30 10:14:04


Bill -

This is a known issue in all released versions of Open MPI. I have a
patch that hopefully will fix this issue in 1.2.3. It's currently
waiting on people in the Open MPI team to verify I didn't do
something stupid.

Brian
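
For readers following the thread, the two MCA parameters discussed in the
quoted messages below can be assembled into an mpirun command line as in
this minimal sketch. The parameter names come from the thread itself
(oob_tcp_include per Bill's reading of 1.2, btl_tcp_if_include per George's
suggestion). The helper name build_mpirun_command and the application path
./my_app are hypothetical. On 1.2.2 the OOB setting is reported below to
fail with the mca_oob_tcp_init error, so this only shows the intended
invocation and is not a working fix for that release.

```python
import shlex


def build_mpirun_command(app, nprocs, interface="lo0",
                         oob_param="oob_tcp_include"):
    """Return an argv list restricting OOB and TCP BTL traffic to one interface.

    Assumptions: `interface` is the loopback device name (lo0 on Mac OS,
    usually lo on Linux), and `oob_param` is the OOB parameter name the
    installed Open MPI release actually honors.
    """
    return [
        "mpirun",
        "--mca", oob_param, interface,             # out-of-band (OOB) sockets
        "--mca", "btl_tcp_if_include", interface,  # MPI-layer TCP sockets
        "-np", str(nprocs),
        app,
    ]


if __name__ == "__main__":
    # Print the command rather than running it; ./my_app is a placeholder.
    print(shlex.join(build_mpirun_command("./my_app", 2)))
```

Passing the parameter name as an argument keeps the sketch usable with
either spelling mentioned in the thread.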

On May 29, 2007, at 9:59 PM, Bill Saphir wrote:

>
> George,
>
> This is one of the things I tried, and setting the oob interface
> did not work; it failed with the error message below.
>
> Also, per this thread:
> http://www.open-mpi.org/community/lists/users/2007/05/3319.php
> I believe it is oob_tcp_include, not oob_tcp_if_include. The latter
> is silently ignored in 1.2, as far as I can tell.
>
> Interestingly, telling the MPI layer to use lo0 (or to not use tcp
> at all) works fine.
> But when I try to do the same for the OOB layer, it complains. The
> full error is:
>
> [mymac.local:07001] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
> [mymac.local:07001] [0,0,0] ORTE_ERROR_LOG: Error in file oob_tcp.c at line 1196
>
> mpirun actually hangs at this point and no processes are spawned. I
> have to ^C to stop it.
> I see this behavior on both Mac OS and on Linux with 1.2.2.
>
> Bill
>
>
> George Bosilca wrote:
>> There are 2 sets of sockets: one for the oob layer and one for the
>> MPI layer (at least if TCP support is enabled). Therefore, in order
>> to achieve what you're looking for you should add to the command line
>> "--mca oob_tcp_if_include lo0 --mca btl_tcp_if_include lo0".
>> On May 29, 2007, at 3:58 PM, Bill Saphir wrote:
>>
>
> ----- original message below ---
>
>> We have run into the following problem:
>>
>> - start up Open MPI application on a laptop
>> - disconnect from network
>> - application hangs
>>
>> I believe that the problem is that all sockets created by Open MPI
>> are bound to the external network interface.
>> For example, when I start up a 2 process MPI job on my Mac (no
>> hosts specified), I get the following tcp
>> connections. 192.168.5.2 is an address on my LAN.
>>
>> tcp4 0 0 192.168.5.2.49459 192.168.5.2.49463 ESTABLISHED
>> tcp4 0 0 192.168.5.2.49463 192.168.5.2.49459 ESTABLISHED
>> tcp4 0 0 192.168.5.2.49456 192.168.5.2.49462 ESTABLISHED
>> tcp4 0 0 192.168.5.2.49462 192.168.5.2.49456 ESTABLISHED
>> tcp4 0 0 192.168.5.2.49456 192.168.5.2.49460 ESTABLISHED
>> tcp4 0 0 192.168.5.2.49460 192.168.5.2.49456 ESTABLISHED
>> tcp4 0 0 192.168.5.2.49456 192.168.5.2.49458 ESTABLISHED
>> tcp4 0 0 192.168.5.2.49458 192.168.5.2.49456 ESTABLISHED
>>
>> Since this application is confined to a single machine, I would
>> like it to use 127.0.0.1,
>> which will remain available as the laptop moves around. I am
>> unable to force it to bind
>> sockets to this address, however.
>>
>> Some of the things I've tried are:
>> - explicitly setting the hostname to 127.0.0.1 (--host 127.0.0.1)
>> - turning off the tcp btl (--mca btl ^tcp) and other variations
>> (--mca btl self,sm)
>> - using --mca oob_tcp_include lo0
>>
>> The first two have no effect. The last one results in an error
>> message of:
>> [myhost.locall:05830] [0,0,0] mca_oob_tcp_init: invalid address '' returned for selected oob interfaces.
>>
>> Is there any way to force Open MPI to bind all sockets to 127.0.0.1?
>>
>> As a side question -- I'm curious what all of these tcp
>> connections are used for. As I increase the number
>> of processes, it looks like there are 4 sockets created per MPI
>> process, without using the tcp btl.
>> Perhaps stdin/out/err + control?
>>
>> Bill
>>
>>
>
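
The netstat listing in the original message can be summarized with a short
script. This is a sketch only: it hard-codes the eight lines from the
message, assumes the Mac OS netstat layout in which the port follows the
last dot of each endpoint, and uses hypothetical helper names
(split_endpoint, summarize). It shows how the sockets are paired, not what
each one is used for.

```python
from collections import Counter

# The eight ESTABLISHED entries quoted in the original message.
NETSTAT = """\
tcp4 0 0 192.168.5.2.49459 192.168.5.2.49463 ESTABLISHED
tcp4 0 0 192.168.5.2.49463 192.168.5.2.49459 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49462 ESTABLISHED
tcp4 0 0 192.168.5.2.49462 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49460 ESTABLISHED
tcp4 0 0 192.168.5.2.49460 192.168.5.2.49456 ESTABLISHED
tcp4 0 0 192.168.5.2.49456 192.168.5.2.49458 ESTABLISHED
tcp4 0 0 192.168.5.2.49458 192.168.5.2.49456 ESTABLISHED
"""


def split_endpoint(endpoint):
    """Split 'a.b.c.d.port' into ('a.b.c.d', port)."""
    addr, _, port = endpoint.rpartition(".")
    return addr, int(port)


def summarize(text):
    """Return (unique connections, addresses seen, connections per port)."""
    connections = set()
    addresses = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip anything that is not a TCP entry
        local = split_endpoint(fields[3])
        remote = split_endpoint(fields[4])
        addresses.update((local[0], remote[0]))
        # Both ends of a local connection are listed; store each pair once.
        connections.add(frozenset((local, remote)))
    per_port = Counter(port for conn in connections for _, port in conn)
    return connections, addresses, per_port


if __name__ == "__main__":
    conns, addrs, per_port = summarize(NETSTAT)
    print(len(conns), "connections on", sorted(addrs))
    print(per_port.most_common())
```

On this data the eight lines collapse to four connections, all on
192.168.5.2 and none on 127.0.0.1, and port 49456 is an endpoint of three
of them.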