From: Brian Barrett (bbarrett_at_[hidden])
Date: 2007-05-28 17:11:20


On May 22, 2007, at 7:52 PM, Tom Clune wrote:

>> For example, if it is ppp0, try:
>>
>> mpirun -np 1 -mca oob_tcp_exclude ppp0 uptime
>
> This seems to at least produce a bit of output before hanging:
>
> LM000953070:~ tlclune$ mpirun -np 1 -mca oob_tcp_exclude ppp0 uptime
> [153.sub-70-211-6.myvzw.com:07562] [0,0,0] mca_oob_tcp_init:
> invalid address '' returned for selected oob interfaces.
> [153.sub-70-211-6.myvzw.com:07562] [0,0,0] ORTE_ERROR_LOG: Error in
> file oob_tcp.c at line 1216

Tom -

I managed to track this down a bit. We try to use the ppp0 interface
(the cell phone device) for network connectivity, as it's the only
non-localhost interface up at the time. Unfortunately, we can't
actually route messages over that address, so Open MPI hangs. The
problem is made worse by a bug in Open MPI that I'm still trying to
track down. When you tell Open MPI not to use a device (like ppp0),
it should just use whatever other devices are available. In your
case, that would be localhost, which is what you're using when you
don't have any network connectivity at all. Instead, excluding the
device appears to cause Open MPI to segfault or hang. I'm looking
into exactly why this is happening and should have a fix in the next
day or so.
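
To make the intended behavior concrete, here is a rough sketch in
Python. It is not Open MPI's code; the function name and the
interface names (lo0 for loopback, ppp0 for the cell phone device)
are assumptions made up for illustration. It only shows the idea that
excluding a device should leave the remaining devices available
rather than producing an empty address.

```python
def select_interfaces(available, excluded):
    """Return the interfaces left after dropping the excluded ones.

    Hypothetical helper for illustration only; not Open MPI code.
    """
    # Keep every interface that is not on the exclude list, preserving order.
    remaining = [name for name in available if name not in excluded]
    if not remaining:
        # Fail with a clear error instead of continuing with no address.
        raise RuntimeError("no usable interfaces after applying exclude list")
    return remaining


# Assumed interface names: lo0 is loopback, ppp0 is the cell phone device.
print(select_interfaces(["lo0", "ppp0"], ["ppp0"]))
```

With ppp0 excluded, only the loopback interface is left, which is the
same situation as running with no network connectivity at all.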

Brian

-- 
   Brian W. Barrett
   Open MPI Team, CCS-1
   Los Alamos National Laboratory