Subject: Re: [OMPI users] MPI over ethernet non default-adapter - Need Help/Advice
From: Gus Correa (gus_at_[hidden])
Date: 2009-06-23 15:42:24


Hi Andreas:

You can either exclude eth0 from, or include only eth1 in, the Open MPI
TCP BTL (byte transfer layer).
To do that, add these flags to your mpiexec command line:

        -mca btl tcp,sm,self
        -mca btl_tcp_if_exclude lo,eth0

or (use one or the other, not both; the two parameters are mutually exclusive)

        -mca btl tcp,sm,self
        -mca btl_tcp_if_include eth1
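
For instance, a complete command line for the second variant might look
like the sketch below. The process count, the hosts file name, and the
executable name (./xhpl) are assumptions on my part; substitute your own.

```shell
#!/bin/sh
# Assumed values; adjust to your setup.
NP=2
HOSTFILE=hosts      # assumed file name; lists 10.42.0.21 and 10.42.0.22
PROG=./xhpl         # assumed name of the HPL executable

# Restrict the TCP BTL to eth1; sm and self cover on-node and
# process-to-itself communication.
CMD="mpiexec -np $NP -hostfile $HOSTFILE -mca btl tcp,sm,self -mca btl_tcp_if_include eth1 $PROG"

# Printed for inspection; paste it into your shell to launch the job.
echo "$CMD"
```

The exclude variant is the same line with
"-mca btl_tcp_if_exclude lo,eth0" in place of the include flag.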

See this FAQ for more info:
http://www.open-mpi.org/faq/?category=tcp#tcp-selection

(BTW, the OpenMPI FAQs are a great resource!)

You can keep your default hosts file with the eth0 addresses
(10.42.0.21, 10.42.0.22).
At least it works fine this way for me here,
and diverts all the MPI traffic to the eth1 subnet.
Changing the hosts/machines file would be needed in MPICH2,
not in OpenMPI, as far as I know.
(Here we also use the eth0 network for login, control, and I/O,
which I suppose is what you want to do.
We run both OpenMPI and MPICH2.)

Of course, your 10.0.1.0 network must be working correctly (and be
separate from the 10.42.0.0 net).
You can check this with the usual tools (ping, etc.).
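
A quick check from node 1 could look like the sketch below. It assumes
the Linux (iputils) ping, whose -I option selects the outgoing
interface; the address is node 2's eth1 address from your message.

```shell
#!/bin/sh
# Run on node 1. Interface name and peer address are taken from this thread.
IFACE=eth1
PEER=10.0.1.22      # node 2's eth1 address

# -c 3 stops after three probes; -I sends them out through $IFACE
# (assumes the Linux iputils ping).
PING_CMD="ping -c 3 -I $IFACE $PEER"

# Printed for inspection; paste it into a shell on node 1 to run it.
echo "$PING_CMD"
```

If the probes get no reply, sort out the eth1 link before trying MPI again.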

I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Andreas Hoelzlwimmer wrote:
> Hello,
>
> I’m using Open-MPI on a small Cluster of RHEL5.3-Nodes, current
> MPI-Version. I now have a requirement to run MPI over a certain
> adapter, in the current case the “eth1” interface of my system. The
> adapter I want to use for MPI is not the default adapter (eth0), which
> all the rest of the traffic has to go over, but I cannot make MPI use
> the other adapter and therefore a different IP address.
>
> The exact problem, shown on 2 nodes:
> Node 1:
> eth0: 10.42.0.21
> eth1: 10.0.1.21
>
> Node 2:
> eth0: 10.42.0.22
> eth1: 10.0.1.22
>
> For testing purposes, I linked the eth1 adapters of both machines
> together directly and access the machines remotely via eth0. If I now
> try to run an MPI-Program (in this case the MPI-Benchmark HPL) with a
> hosts file that specifies 10.0.1.21 and 10.0.1.22 as hosts, it gets
> quite problematic. The “netstat -a” command shows me that it uses the
> 10.42.* addresses for the connection. The --debug-daemons flag tells me
> that MPI initializes both nodes, but after that it runs forever and does
> not terminate. In addition to that, apart from initial traffic of a
> couple of packets, it does not send any network traffic over either of
> the network adapters.
>
> Please tell me if any of you have encountered such a problem or setup and
> can tell me how to fix it. I tried modifying routing tables and playing
> around with subnetting, but I wasn’t able to get a successful connection. If
> you need more information on that, please tell me. Please note that I’m
> quite new to Open-MPI, so it might possibly be something about Open-MPI
> I just haven’t discovered yet.
>
>
> Best regards,
> Andreas Hoelzlwimmer