From: Steven Truong (midair77_at_[hidden])
Date: 2007-05-09 14:44:30
Hi, Jeff. Thank you very much for looking into this issue. I am
afraid that I cannot give you the application/package because it is
commercial software. I believe that a lot of people are using this
VASP software package: http://cms.mpi.univie.ac.at/vasp/.
My current environment uses MPICH 1.2.7p1. However, a new set of
dual-core machines has posed a new set of challenges, so I am
looking into replacing MPICH with Open MPI on these machines.
Could Mr. Radican, who wrote that he was able to run VASP with
Open MPI, provide more detail on how he configured Open MPI, how he
compiled and ran the VASP job, and anything else relating to this issue?
Thank you very much for all your help.
Steven.
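
One thing that may be worth checking when moving a Fortran code from MPICH to Open MPI (this is an assumption, not something confirmed in this thread) is whether an mpif.h from the old MPI is still being picked up at compile time. The Fortran values behind handles such as MPI_COMM_WORLD are implementation-specific, so a header from a different MPI could plausibly lead to an "invalid communicator" error. A minimal sketch that lists every mpif.h under a source tree; the function name and the example path are hypothetical:

```shell
#!/bin/sh
# List every mpif.h found under a source tree (hypothetical helper).
# Any copy that did not come from the MPI installation you are
# building against deserves a closer look.
find_mpif_headers() {
    src_dir=$1
    find "$src_dir" -type f -name 'mpif.h' -print
}

# Example usage (illustrative path only):
#   find_mpif_headers /path/to/application/source
```

Open MPI's wrapper compilers also accept -showme (for example, mpif90 -showme), which prints the underlying compiler command line and so shows which include directories are really in use.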
On 5/9/07, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Can you send a simple test that reproduces these errors?
>
> I.e., if there's a single, simple package that you can send
> instructions on how to build, it would be most helpful if we could
> reproduce the error (and therefore figure out how to fix it).
>
> Thanks!
>
>
> On May 9, 2007, at 2:19 PM, Steven Truong wrote:
>
> > Oh, no. I tried with ACML and had the same set of errors.
> >
> > Steven.
> >
> > On 5/9/07, Steven Truong <midair77_at_[hidden]> wrote:
> >> Hi, Kevin and all. I tried with the following:
> >>
> >> ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
> >> --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default
> >> --enable-mpi-f90 --with-threads=posix --enable-static
> >>
> >> and added the mpi.o rule to my VASP makefile, but I still got the
> >> error.
> >>
> >> I forgot to mention that our environment has Intel MKL 9.0 or 8.1 and
> >> my machines are dual-processor, dual-core Xeon 5130.
> >>
> >> Well, I am going to try ACML too.
> >>
> >> Attached is my makefile for VASP and I am not sure if I missed
> >> anything again.
> >>
> >> Thank you very much for all your help.
> >>
> >> On 5/9/07, Steven Truong <midair77_at_[hidden]> wrote:
> >>> Thanks, Kevin and Brook, for replying to my question. I am going
> >>> to try out what Kevin suggested.
> >>>
> >>> Steven.
> >>>
> >>> On 5/9/07, Kevin Radican <radicak_at_[hidden]> wrote:
> >>>> Hi,
> >>>>
> >>>> We use VASP 4.6 in parallel with Open MPI 1.1.2 without any
> >>>> problems on x86_64 with openSUSE, compiled with gcc and Intel
> >>>> Fortran, and we use Torque PBS.
> >>>>
> >>>> I used a standard configure to build Open MPI, something like:
> >>>>
> >>>> ./configure --prefix=/usr/local --enable-static --with-threads
> >>>> --with-tm=/usr/local --with-libnuma
> >>>>
> >>>> I used the ACML math LAPACK libs and built BLACS and ScaLAPACK
> >>>> with them too.
> >>>>
> >>>> I attached my VASP makefile. I might have added
> >>>>
> >>>> mpi.o : mpi.F
> >>>> 	$(CPP)
> >>>> 	$(FC) -FR -lowercase -O0 -c $*$(SUFFIX)
> >>>>
> >>>> to the end of the makefile. It doesn't look like it is in the
> >>>> example makefiles they give, but I compiled this a while ago.
> >>>>
> >>>> Hope this helps.
> >>>>
> >>>> Cheers,
> >>>> Kevin
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
> >>>>> Hi, all. I am new to Open MPI, and after the initial setup I
> >>>>> tried to run my app but got the following errors:
> >>>>>
> >>>>> [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
> >>>>> [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
> >>>>> [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
> >>>>> [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>>> [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
> >>>>> [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
> >>>>> [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
> >>>>> [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>>> [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
> >>>>> [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
> >>>>> [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
> >>>>> [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>>> [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
> >>>>> [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
> >>>>> [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
> >>>>> [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>>> mpiexec noticed that job rank 2 with PID 16675 on node node07
> >>>>> exited on signal 60 (Real-time signal 26).
> >>>>>
> >>>>> /usr/local/openmpi-1.2.1/bin/ompi_info
> >>>>> Open MPI: 1.2.1
> >>>>> Open MPI SVN revision: r14481
> >>>>> Open RTE: 1.2.1
> >>>>> Open RTE SVN revision: r14481
> >>>>> OPAL: 1.2.1
> >>>>> OPAL SVN revision: r14481
> >>>>> Prefix: /usr/local/openmpi-1.2.1
> >>>>> Configured architecture: x86_64-unknown-linux-gnu
> >>>>> Configured by: root
> >>>>> Configured on: Mon May 7 18:32:56 PDT 2007
> >>>>> Configure host: neptune.nanostellar.com
> >>>>> Built by: root
> >>>>> Built on: Mon May 7 18:40:28 PDT 2007
> >>>>> Built host: neptune.my.com
> >>>>> C bindings: yes
> >>>>> C++ bindings: yes
> >>>>> Fortran77 bindings: yes (all)
> >>>>> Fortran90 bindings: yes
> >>>>> Fortran90 bindings size: small
> >>>>> C compiler: gcc
> >>>>> C compiler absolute: /usr/bin/gcc
> >>>>> C++ compiler: g++
> >>>>> C++ compiler absolute: /usr/bin/g++
> >>>>> Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
> >>>>> Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
> >>>>> Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
> >>>>> Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
> >>>>> C profiling: yes
> >>>>> C++ profiling: yes
> >>>>> Fortran77 profiling: yes
> >>>>> Fortran90 profiling: yes
> >>>>> C++ exceptions: no
> >>>>> Thread support: posix (mpi: no, progress: no)
> >>>>> Internal debug support: no
> >>>>> MPI parameter check: runtime
> >>>>> Memory profiling support: no
> >>>>> Memory debugging support: no
> >>>>> libltdl support: yes
> >>>>> Heterogeneous support: yes
> >>>>> mpirun default --prefix: yes
> >>>>> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> >>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> >>>>> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
> >>>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
> >>>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> >>>>> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
> >>>>> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
> >>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> >>>>> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
> >>>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
> >>>>> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
> >>>>> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.1)
> >>>>>
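
Since the build was configured with --with-tm, one quick sanity check is to confirm that the "tm" component appears under the ras and pls frameworks in the ompi_info listing, as it does above. The following is only a sketch: the function name is hypothetical, and it assumes the "MCA <framework>: <component> (...)" line layout shown in this output.

```shell
#!/bin/sh
# Read `ompi_info` output on stdin and succeed if the "tm" component
# is listed for the given MCA framework (for example, ras or pls).
# Assumes lines shaped like: "MCA ras: tm (MCA v1.0, ...)".
has_tm_component() {
    framework=$1
    grep -Eq "MCA[[:space:]]+${framework}:[[:space:]]+tm([[:space:]]|\$)"
}

# Example usage:
#   ompi_info | has_tm_component ras && echo "ras: tm is present"
#   ompi_info | has_tm_component pls && echo "pls: tm is present"
```

If either check fails, the Torque integration was probably not compiled in, and mpiexec would have to be told about the nodes some other way.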
> >>>>> As you can see, I used GNU gcc and g++ with the Intel Fortran
> >>>>> Compiler to compile Open MPI, and I am not sure if there are any
> >>>>> special flags that I need to have.
> >>>>> ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
> >>>>> --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default
> >>>>> --enable-mpi-f90
> >>>>>
> >>>>> After getting mpif90, I compiled my application (VASP) with
> >>>>> this new parallel compiler, but then I could not run it through
> >>>>> PBS.
> >>>>>
> >>>>> #PBS -N Pt.CO.bridge.25ML
> >>>>> ### Set the number of nodes that will be used. Ensure
> >>>>> ### that the number "nodes" matches with the need of your job
> >>>>> ### DO NOT MODIFY THE FOLLOWING LINE FOR SINGLE-PROCESSOR JOBS!
> >>>>> #PBS -l nodes=node07:ppn=4
> >>>>> #PBS -l walltime=96:00:00
> >>>>> ##PBS -M asit_at_[hidden]
> >>>>> #PBS -m abe
> >>>>> export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'`
> >>>>> echo $NPROCS
> >>>>> echo The master node of this job is `hostname`
> >>>>> echo The working directory is `echo $PBS_O_WORKDIR`
> >>>>> echo The node file is $PBS_NODEFILE
> >>>>> echo This job runs on the following $NPROCS nodes:
> >>>>> echo `cat $PBS_NODEFILE`
> >>>>> echo "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
> >>>>> echo
> >>>>> echo command to EXE:
> >>>>> echo
> >>>>> echo
> >>>>> cd $PBS_O_WORKDIR
> >>>>>
> >>>>> echo "cachesize=4000 mpiblock=500 npar=4 procgroup=4 mkl ompi"
> >>>>>
> >>>>> date
> >>>>> /usr/local/openmpi-1.2.1/bin/mpiexec -mca mpi_paffinity_alone 1 \
> >>>>>     -np $NPROCS /home/struong/bin/vaspmpi_mkl_ompi >"$PBS_JOBID".out
> >>>>> date
> >>>>> ------------
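
The job script above derives NPROCS from the length of $PBS_NODEFILE, which Torque fills with one line per allocated processor slot. A slightly more defensive way to get the same number, plus the count of distinct hosts, might look like the sketch below; the helper names are hypothetical.

```shell
#!/bin/sh
# Count MPI ranks: one per line of the node file.
count_ranks() {
    wc -l < "$1" | tr -d '[:space:]'
}

# Count distinct hosts named in the node file.
count_hosts() {
    sort -u "$1" | wc -l | tr -d '[:space:]'
}

# Example usage inside a PBS job script:
#   NPROCS=$(count_ranks "$PBS_NODEFILE")
#   echo "Running $NPROCS ranks on $(count_hosts "$PBS_NODEFILE") host(s)"
```

Reading the file through a redirect avoids having to strip the file name from the wc output, so no gawk step is needed. With a TM-enabled Open MPI build, mpiexec should also be able to pick up the allocation from Torque directly, in which case -np may be unnecessary.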
> >>>>>
> >>>>> My environment is CentOS 4.4 x86_64, Intel Xeon, Torque, Maui.
> >>>>>
> >>>>> Could somebody here tell me what I missed or did incorrectly?
> >>>>>
> >>>>> Thank you very much.
> >>>>> _______________________________________________
> >>>>> users mailing list
> >>>>> users_at_[hidden]
> >>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> Cisco Systems