Subject: Re: [OMPI users] mpirun fails on the host
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-06-18 21:12:20
Add --debug-devel to your command line and you'll get a bunch of diagnostic
info. Did you configure with --enable-debug? If so, additional debug output
can be obtained; I can let you know how to get it, if necessary.
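
As a rough sketch of the first suggestion, the snippet below builds the
failing command from the quoted report with --debug-devel added. The node
name (n06) and test program (hostname) come from that report; the helper
name debug_cmd is made up for illustration. It only prints the command, so
you can review it and run it by hand.

```shell
#!/bin/sh
# Sketch only: n06 and "hostname" are taken from the quoted report;
# debug_cmd is a hypothetical helper, not part of Open MPI.
NODE=n06

# Build the failing mpirun command with developer-level debug output enabled.
debug_cmd() {
    echo "mpirun --debug-devel -np 1 --host $1 hostname"
}

# Print the command for review; run the printed line by hand on the host.
debug_cmd "$NODE"

# To see whether the install was configured with --enable-debug, ompi_info
# reports a debug-support line (assumption: the label matches this pattern
# in the 1.2.x output):
#   ompi_info | grep -i "debug support"
```

If the run still hangs, the output printed before the hang is the part worth
posting back to the list.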
Ralph
On Thu, Jun 18, 2009 at 3:49 PM, Honest Guvnor
<honestguvnor_at_[hidden]> wrote:
> Open MPI 1.2.7, Ethernet, CentOS 5.3 i386 fresh install on host and nodes.
>
> Despite ssh and pdsh working, mpirun hangs when launching a program
> from the host to a node:
>
> [cluster_at_hankel ~]$ ssh n06 hostname
> n06
> [cluster_at_hankel ~]$ pdsh -w n06 hostname
> n06: n06
> [cluster_at_hankel ~]$ mpirun -np 1 --host n06 hostname
> [HANGS]
>
> However, mpirun works fine in reverse:
>
> [cluster_at_n06 ~]$ mpirun -np 1 --host hankel date
> Thu Jun 18 22:53:27 CEST 2009
>
> and from node to node. Paths to bin and lib seem OK:
>
> [cluster_at_hankel ~]$ printenv PATH
> /usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/lib/openmpi/1.2.7-gcc/bin:/home/cluster/bin
> [cluster_at_hankel ~]$ printenv LD_LIBRARY_PATH
> :/usr/lib/openmpi/1.2.7-gcc/lib
> [cluster_at_hankel ~]$ ssh n06 printenv PATH
> /usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/lib/openmpi/1.2.7-gcc/bin
> [cluster_at_hankel ~]$ ssh n06 printenv LD_LIBRARY_PATH
> :/usr/lib/openmpi/1.2.7-gcc/lib
>
> We are new to Open MPI. We checked a few MCA parameters and turned on a
> diagnostic flag or two, but without coming up with much. The nodes do
> not have access to the host's external network, and we half convinced
> ourselves this was the problem because of mentions in the output with
> the -d flag, but:
>
> [cluster_at_hankel ~]$ mpirun --mca btl tcp,self --mca btl_tcp_if_exclude lo,eth0 --mca oob_tcp_if_exclude lo,eth0 -np 1 --host n06 hostname
> [STILL HANGS]
>
> where eth0 is the external network.
>
> Suggestions gratefully received on how we can get Open MPI to report
> what has failed, or on where to poke and prod further.