From: Steven Truong (midair77_at_[hidden])
Date: 2007-05-08 22:18:51


Hi, all. I am new to Open MPI, and after the initial setup I tried to
run my application but got the following errors:

[node07.my.com:16673] *** An error occurred in MPI_Comm_rank
[node07.my.com:16673] *** on communicator MPI_COMM_WORLD
[node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16674] *** An error occurred in MPI_Comm_rank
[node07.my.com:16674] *** on communicator MPI_COMM_WORLD
[node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16675] *** An error occurred in MPI_Comm_rank
[node07.my.com:16675] *** on communicator MPI_COMM_WORLD
[node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16676] *** An error occurred in MPI_Comm_rank
[node07.my.com:16676] *** on communicator MPI_COMM_WORLD
[node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 2 with PID 16675 on node node07 exited
on signal 60 (Real-time signal 26).

Here is the output of ompi_info:

 /usr/local/openmpi-1.2.1/bin/ompi_info
                Open MPI: 1.2.1
   Open MPI SVN revision: r14481
                Open RTE: 1.2.1
   Open RTE SVN revision: r14481
                    OPAL: 1.2.1
       OPAL SVN revision: r14481
                  Prefix: /usr/local/openmpi-1.2.1
 Configured architecture: x86_64-unknown-linux-gnu
           Configured by: root
           Configured on: Mon May 7 18:32:56 PDT 2007
          Configure host: neptune.nanostellar.com
                Built by: root
                Built on: Mon May 7 18:40:28 PDT 2007
              Built host: neptune.my.com
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
      Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.1)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.1)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.1)

As you can see, I used GNU gcc and g++ together with the Intel Fortran
compiler to build Open MPI, and I am not sure whether there are any
special flags I need to pass. My configure line was:
./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6 \
    --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default \
    --enable-mpi-f90
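
For completeness, here is how I think the compilers could be pinned
down explicitly at configure time. This is only a sketch on my part;
CC/CXX/F77/FC are the standard configure variables, and the ifort path
is the one reported by ompi_info above:

    ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6 \
        --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default \
        --enable-mpi-f90 \
        CC=gcc CXX=g++ \
        F77=/opt/intel/fce/9.1.043/bin/ifort \
        FC=/opt/intel/fce/9.1.043/bin/ifort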

After getting mpif90, I compiled my application (VASP) with this new
parallel wrapper compiler, but then I could not run it through PBS.
My submission script is shown below.
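
Before the script itself, here is a quick way to check what the
wrapper actually invokes (assuming the --showme option of the Open MPI
wrapper compilers, which should report the underlying compiler and
flags):

    # Full command line that mpif90 would run
    /usr/local/openmpi-1.2.1/bin/mpif90 --showme
    # Compile-only and link-only flags
    /usr/local/openmpi-1.2.1/bin/mpif90 --showme:compile
    /usr/local/openmpi-1.2.1/bin/mpif90 --showme:link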

#PBS -N Pt.CO.bridge.25ML
### Set the number of nodes that will be used. Ensure
### that the number "nodes" matches with the need of your job
### DO NOT MODIFY THE FOLLOWING LINE FOR SINGLE-PROCESSOR JOBS!
#PBS -l nodes=node07:ppn=4
#PBS -l walltime=96:00:00
##PBS -M asit_at_[hidden]
#PBS -m abe
# Count the processor slots allocated by Torque (one line per slot in the nodefile)
export NPROCS=`wc -l $PBS_NODEFILE | gawk '{print $1}'`
echo $NPROCS
echo The master node of this job is `hostname`
echo The working directory is `echo $PBS_O_WORKDIR`
echo The node file is $PBS_NODEFILE
echo This job runs on the following $NPROCS nodes:
echo `cat $PBS_NODEFILE`
echo "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
echo
echo command to EXE:
echo
echo
cd $PBS_O_WORKDIR

echo "cachesize=4000 mpiblock=500 npar=4 procgroup=4 mkl ompi"

date
/usr/local/openmpi-1.2.1/bin/mpiexec -mca mpi_paffinity_alone 1 \
    -np $NPROCS /home/struong/bin/vaspmpi_mkl_ompi > "$PBS_JOBID".out
date
------------
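
In case it is useful, this is the kind of minimal interactive test I
could run on the same node, outside of Torque (node07 is taken from
the error output above; this is just a sanity check, not part of the
real job):

    # Launch a non-MPI program first to confirm the launcher itself works
    /usr/local/openmpi-1.2.1/bin/mpiexec -np 4 -host node07 hostname

    # Then the actual binary, to see whether the MPI_Comm_rank error
    # also appears outside of the PBS/Torque environment
    /usr/local/openmpi-1.2.1/bin/mpiexec -np 4 -host node07 \
        /home/struong/bin/vaspmpi_mkl_ompi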

My environment is CentOS 4.4 x86_64, Intel Xeon, Torque, Maui.

Could somebody here tell me what I missed or did incorrectly?

Thank you very much.