Subject: Re: [OMPI users] mpirun fails on the host
From: Honest Guvnor (honestguvnor_at_[hidden])
Date: 2009-06-19 04:15:30
On Fri, Jun 19, 2009 at 3:12 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> Add --debug-devel to your cmd line and you'll get a bunch of diagnostic
> info. Did you configure --enable-debug? If so, then additional debug can be
> obtained - can let you know how to get it, if necessary.
Yes, we had run with the -d flag, and it was that output that prompted us
to look for a way to stop Open MPI from using the external network. I am
not sure what most of the messages mean, but we still see quite a few
references to hankel.fred.com, which the nodes cannot reach.
Here is the output (external IP addresses and domain changed):
[cluster_at_hankel ~]$ mpirun --debug-devel --mca btl tcp,self \
    --mca btl_tcp_if_exclude lo,eth0 --mca oob_tcp_if_exclude lo,eth0 \
    -np 1 --host n06 hostname
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] connect_uni: connection not allowed
[hankel.fred.com:26997] [0,0,0] setting up session dir with
[hankel.fred.com:26997] universe default-universe-26997
[hankel.fred.com:26997] user cluster
[hankel.fred.com:26997] host hankel.fred.com
[hankel.fred.com:26997] jobid 0
[hankel.fred.com:26997] procid 0
[hankel.fred.com:26997] procdir: /tmp/openmpi-sessions-cluster_at_[hidden]_0/default-universe-26997/0/0
[hankel.fred.com:26997] jobdir: /tmp/openmpi-sessions-cluster_at_[hidden]_0/default-universe-26997/0
[hankel.fred.com:26997] unidir: /tmp/openmpi-sessions-cluster_at_[hidden]_0/default-universe-26997
[hankel.fred.com:26997] top: openmpi-sessions-cluster_at_[hidden]_0
[hankel.fred.com:26997] tmp: /tmp
[hankel.fred.com:26997] [0,0,0] contact_file /tmp/openmpi-sessions-cluster_at_[hidden]_0/default-universe-26997/universe-setup.txt
[hankel.fred.com:26997] [0,0,0] wrote setup file
[hankel.fred.com:26997] pls:rsh: local csh: 0, local sh: 1
[hankel.fred.com:26997] pls:rsh: assuming same remote shell as local shell
[hankel.fred.com:26997] pls:rsh: remote csh: 0, remote sh: 1
[hankel.fred.com:26997] pls:rsh: final template argv:
[hankel.fred.com:26997] pls:rsh: /usr/bin/ssh <template> orted --debug
--bootproxy 1 --name <template> --num_procs 2 --vpid_start 0
--nodename <template> --universe cluster_at_[hidden]:default-universe-26997
--nsreplica "0.0.0;tcp://192.168.0.99:54116"
--gprreplica "0.0.0;tcp://192.168.0.99:54116"
[hankel.fred.com:26997] pls:rsh: launching on node n06
[hankel.fred.com:26997] pls:rsh: n06 is a REMOTE node
[hankel.fred.com:26997] pls:rsh: executing: (//usr/bin/ssh) /usr/bin/ssh n06
PATH=/usr/lib/openmpi/1.2.7-gcc/bin:$PATH ; export PATH ;
LD_LIBRARY_PATH=/usr/lib/openmpi/1.2.7-gcc/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
/usr/lib/openmpi/1.2.7-gcc/bin/orted --debug --bootproxy 1
--name 0.0.1 --num_procs 2 --vpid_start 0 --nodename n06
--universe cluster_at_[hidden]:default-universe-26997
--nsreplica "0.0.0;tcp://192.168.0.99:54116"
--gprreplica "0.0.0;tcp://192.168.0.99:54116"
[HOSTNAME=hankel.fred.com TERM=xterm-color SHELL=/bin/bash HISTSIZE=1000
SSH_CLIENT=130.149.86.77 50506 22 SSH_TTY=/dev/pts/12 USER=cluster
LD_LIBRARY_PATH=:/usr/lib/openmpi/1.2.7-gcc/lib
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
MAIL=/var/spool/mail/cluster
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/lib/openmpi/1.2.7-gcc/bin:/home/cluster/bin
INPUTRC=/etc/inputrc PWD=/home/cluster LANG=en_US.UTF-8
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass SHLVL=1
HOME=/home/cluster LOGNAME=cluster CVS_RSH=ssh
SSH_CONNECTION=222.222.222.222 50506 111.111.111.111 22
LESSOPEN=|/usr/bin/lesspipe.sh %s G_BROKEN_FILENAMES=1
_=/usr/lib/openmpi/1.2.7-gcc/bin/mpirun OMPI_MCA_orte_debug=1
OMPI_MCA_btl=tcp,self OMPI_MCA_btl_tcp_if_exclude=lo,eth0
OMPI_MCA_oob_tcp_if_exclude=lo,eth0 OMPI_MCA_seed=0]
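Most of the hankel.fred.com strings above are the `[hostname:pid]` prefix that Open MPI puts on each debug line. The address the orted on n06 is actually handed is the `tcp://` contact URI in the `--nsreplica`/`--gprreplica` arguments. The following is a minimal Python sketch for checking a saved debug log for contact URIs outside the private address ranges. It assumes that the internal cluster network uses private addressing (such as 192.168.0.0/16) and that any URIs the mailer wrapped across lines have been rejoined. The helper names `contact_addresses` and `non_private` are hypothetical and not part of Open MPI.

```python
import ipaddress
import re

# Matches IPv4 contact URIs such as tcp://192.168.0.99:54116 in ORTE debug output.
URI_RE = re.compile(r"tcp://(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def contact_addresses(log_text):
    """Return distinct (ip, port) pairs from tcp:// URIs, in order of first appearance."""
    found = []
    for ip, port in URI_RE.findall(log_text):
        try:
            ipaddress.ip_address(ip)  # skip strings like 999.1.1.1 that are not valid IPv4
        except ValueError:
            continue
        pair = (ip, int(port))
        if pair not in found:
            found.append(pair)
    return found


def non_private(pairs):
    """Return the pairs whose address is outside the private ranges (e.g. 192.168.0.0/16)."""
    return [(ip, port) for ip, port in pairs if not ipaddress.ip_address(ip).is_private]


if __name__ == "__main__":
    # Excerpt of the orted arguments from the debug output above.
    excerpt = ('--nsreplica "0.0.0;tcp://192.168.0.99:54116" '
               '--gprreplica "0.0.0;tcp://192.168.0.99:54116"')
    pairs = contact_addresses(excerpt)
    print(pairs)               # [('192.168.0.99', 54116)]
    print(non_private(pairs))  # []
```

When run against the full log, an empty second list would suggest that the remote daemons are only being given internal addresses.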