From: Ole Holm Nielsen (Ole.H.Nielsen_at_[hidden])
Date: 2007-05-01 10:54:06


We have built OpenMPI 1.2.1 with support for Torque 2.1.8 and its
Task Manager interface. We use the PGI 6.2-4 compiler and the
--with-tm option as described in
http://www.open-mpi.org/faq/?category=building#build-rte-tm
for building an OpenMPI RPM on a Pentium-4 machine running CentOS 4.4
(RHEL4U4 clone). The TM interface seems to be available as it should:

# ompi_info | grep tm
               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
                  MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
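
Note that a plain "grep tm" also matches the unrelated ptmalloc2 line. A
stricter check could look like the sketch below. It assumes ompi_info prints
component lines in the format shown above; the function name
check_tm_components is hypothetical and not part of Open MPI.

```shell
#!/bin/sh
# Read ompi_info-style output on stdin and report whether the tm
# component is listed for both the ras and pls frameworks.
# Assumption: component lines look like "MCA ras: tm (MCA v1.0, ...)".
check_tm_components() {
    awk '
        /MCA ras: tm / { ras = 1 }
        /MCA pls: tm / { pls = 1 }
        END {
            if (ras && pls) { print "tm: ras and pls present"; exit 0 }
            print "tm: missing component"
            exit 1
        }'
}

# Typical use on a machine with Open MPI installed:
#   ompi_info | check_tm_components
```

The function exits non-zero when either component is missing, so it can be
used as a guard at the top of a job script.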

When we submit a Torque batch job running the example code in
openmpi-1.2.1/examples/hello_c.c we get this error message:

/usr/local/openmpi-1.2.1-pgi/bin/mpirun -np 2 -machinefile $PBS_NODEFILE hello_c
[u126.dcsc.fysik.dtu.dk:11981] pls:tm: failed to poll for a spawned proc, return status = 17002
[u126.dcsc.fysik.dtu.dk:11981] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
[u126.dcsc.fysik.dtu.dk:11981] mpirun: spawn failed with errno=-11

When we run the same code in an interactive (non-Torque) shell, it works
correctly:

# /usr/local/openmpi-1.2.1-pgi/bin/mpirun -np 2 -machinefile hostfile hello_c
Hello, world, I am 0 of 2
Hello, world, I am 1 of 2

To check that the Torque TM interface itself is working correctly, we also ran
this test within the Torque batch job using the Torque pbsdsh command:

pbsdsh hostname
u126.dcsc.fysik.dtu.dk
u113.dcsc.fysik.dtu.dk
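
To compare the hosts that pbsdsh reaches with the hosts Torque actually
allocated, the node file can be summarised inside the same batch job. This is
only a sketch: the helper name summarise_nodefile is hypothetical, and it
assumes the node file has one host name per line, one line per allocated slot.

```shell
#!/bin/sh
# Summarise a Torque node file: one output line per host with the number
# of slots (lines) allocated on it.
# Assumption: the file given as $1 holds one host name per line.
summarise_nodefile() {
    sort "$1" | uniq -c | awk '{ print $2 ": " $1 " slot(s)" }'
}

# Inside a batch job, where Torque sets PBS_NODEFILE:
#   summarise_nodefile "$PBS_NODEFILE"
#   pbsdsh hostname
```

If the two lists agree, the allocation and the TM spawn path on the Torque
side are consistent, which narrows the problem down to the mpirun launch.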

So something appears to be broken between Torque 2.1.8 and OpenMPI 1.2.1
with respect to the TM interface, whereas each one on its own seems to work
correctly. Can anyone suggest a solution to this problem?

I wonder if this problem may be related to this list thread:
http://www.open-mpi.org/community/lists/users/2007/04/3028.php

Details of configuration:
-------------------------

We use the buildrpm.sh script from
http://www.open-mpi.org/software/ompi/v1.2/srpm.php
and change the following options in the script:

prefix="/usr/local/openmpi-1.2.1-pgi"

configure_options="--with-tm=/usr/local FC=pgf90 F77=pgf90 CC=pgcc CXX=pgCC
CFLAGS=-Msignextend CXXFLAGS=-Msignextend --with-wrapper-cflags=-Msignextend
--with-wrapper-cxxflags=-Msignextend FFLAGS=-Msignextend
FCFLAGS=-Msignextend --with-wrapper-fflags=-Msignextend
--with-wrapper-fcflags=-Msignextend"
rpmbuild_options=${rpmbuild_options}" --define 'install_in_opt 0' --define
'install_shell_scripts 1' --define 'install_modulefile 0'"
rpmbuild_options=${rpmbuild_options}" --define '_prefix ${prefix}'"

build_single=yes

-- 
Ole Holm Nielsen
Department of Physics, Technical University of Denmark