From: Steven Truong (midair77_at_[hidden])
Date: 2007-05-09 17:11:21


Thank you very much, Jeff, for your efforts and help.

On 5/9/07, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> I have mailed the VASP maintainer asking for a copy of the code.
> Let's see what happens.
>
> On May 9, 2007, at 2:44 PM, Steven Truong wrote:
>
> > Hi, Jeff. Thank you very much for looking into this issue. I am
> > afraid that I cannot give you the application/package because it is
> > commercial software. I believe that a lot of people are using the
> > VASP software package: http://cms.mpi.univie.ac.at/vasp/.
> >
> > My current environment uses MPICH 1.2.7p1; however, a new set of
> > dual-core machines has posed a new set of challenges, so I am
> > looking into replacing MPICH with Open MPI on these machines.
> >
> > Could Mr. Radican, who wrote that he was able to run VASP with
> > Open MPI, provide more detail on how he configured Open MPI,
> > how he compiled and ran VASP jobs, and anything else relating to
> > this issue?
> >
> > Thank you very much for all your help.
> > Steven.
> >
> > On 5/9/07, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> >> Can you send a simple test that reproduces these errors?
> >>
> >> I.e., if there's a single, simple package that you can send
> >> instructions on how to build, it would be most helpful if we could
> >> reproduce the error (and therefore figure out how to fix it).
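> >>
> >> (For what it's worth, the test need not be VASP itself. An untested
> >> sketch of a minimal MPI_Comm_rank check, assuming the Open MPI
> >> wrappers are first in the PATH and using placeholder file names,
> >> would be something like:
> >>
> >> cat > comm_rank_test.f90 <<'EOF'
> >> program comm_rank_test
> >>   ! minimal check that MPI_COMM_WORLD is usable with this build
> >>   implicit none
> >>   include 'mpif.h'
> >>   integer :: ierr, rank
> >>   call MPI_INIT(ierr)
> >>   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
> >>   print *, 'hello from rank', rank
> >>   call MPI_FINALIZE(ierr)
> >> end program comm_rank_test
> >> EOF
> >> mpif90 -o comm_rank_test comm_rank_test.f90   # compile with the Open MPI wrapper
> >> mpiexec -np 4 ./comm_rank_test                # run on 4 local slots
> >>
> >> If that runs cleanly, the problem is more likely in how VASP is being
> >> built than in the Open MPI installation itself.)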
> >>
> >> Thanks!
> >>
> >>
> >> On May 9, 2007, at 2:19 PM, Steven Truong wrote:
> >>
> >>> Oh, no. I tried with ACML and had the same set of errors.
> >>>
> >>> Steven.
> >>>
> >>> On 5/9/07, Steven Truong <midair77_at_[hidden]> wrote:
> >>>> Hi, Kevin and all. I tried with the following:
> >>>>
> >>>> ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
> >>>> --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default
> >>>> --enable-mpi-f90 --with-threads=posix --enable-static
> >>>>
> >>>> and added the mpi.o rule to my VASP makefile, but I still got errors.
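> >>>>
> >>>> Since the cluster also still has MPICH 1.2.7p1 installed, I want to
> >>>> double-check that VASP is really being built and launched with the
> >>>> Open MPI wrappers rather than a leftover MPICH one. A rough check I
> >>>> plan to run (untested, paths as above):
> >>>>
> >>>> which mpif90 mpiexec   # both should resolve to /usr/local/openmpi-1.2.1/bin
> >>>> /usr/local/openmpi-1.2.1/bin/mpif90 -showme   # shows the underlying ifort command and link flags
> >>>> /usr/local/openmpi-1.2.1/bin/ompi_info | grep -i fortran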
> >>>>
> >>>> I forgot to mention that our environment has Intel MKL 9.0 or
> >>>> 8.1, and my machines are dual-processor, dual-core Xeon 5130s.
> >>>>
> >>>> Well, I am going to try ACML too.
> >>>>
> >>>> Attached is my makefile for VASP; I am not sure whether I have
> >>>> missed anything again.
> >>>>
> >>>> Thank you very much for all your help.
> >>>>
> >>>> On 5/9/07, Steven Truong <midair77_at_[hidden]> wrote:
> >>>>> Thanks, Kevin and Brook, for replying to my question. I am going
> >>>>> to try out what Kevin suggested.
> >>>>>
> >>>>> Steven.
> >>>>>
> >>>>> On 5/9/07, Kevin Radican <radicak_at_[hidden]> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> We use VASP 4.6 in parallel with Open MPI 1.1.2 without any
> >>>>>> problems on x86_64 with openSUSE, compiled with gcc and Intel
> >>>>>> Fortran, and we use Torque PBS.
> >>>>>>
> >>>>>> I used a standard configure to build Open MPI, something like:
> >>>>>>
> >>>>>> ./configure --prefix=/usr/local --enable-static --with-threads
> >>>>>> --with-tm=/usr/local --with-libnuma
> >>>>>>
> >>>>>> I used the ACML LAPACK libraries and built BLACS and ScaLAPACK
> >>>>>> with them too.
> >>>>>>
> >>>>>> I attached my VASP makefile. I might have added
> >>>>>>
> >>>>>> mpi.o : mpi.F
> >>>>>>         $(CPP)
> >>>>>>         $(FC) -FR -lowercase -O0 -c $*$(SUFFIX)
> >>>>>>
> >>>>>> to the end of the makefile (the two command lines must be indented
> >>>>>> with a tab). It doesn't look like it is in the example makefiles
> >>>>>> they provide, but I compiled this a while ago.
> >>>>>>
> >>>>>> Hope this helps.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Kevin
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, 2007-05-08 at 19:18 -0700, Steven Truong wrote:
> >>>>>>> Hi, all. I am new to Open MPI, and after the initial setup I
> >>>>>>> tried to run my application but got the following errors:
> >>>>>>>
> >>>>>>> [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
> >>>>>>> [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
> >>>>>>> [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
> >>>>>>> [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>>>>> [node07.my.com:16674] *** An error occurred in MPI_Comm_rank
> >>>>>>> [node07.my.com:16674] *** on communicator MPI_COMM_WORLD
> >>>>>>> [node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
> >>>>>>> [node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>>>>> [node07.my.com:16675] *** An error occurred in MPI_Comm_rank
> >>>>>>> [node07.my.com:16675] *** on communicator MPI_COMM_WORLD
> >>>>>>> [node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
> >>>>>>> [node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>>>>> [node07.my.com:16676] *** An error occurred in MPI_Comm_rank
> >>>>>>> [node07.my.com:16676] *** on communicator MPI_COMM_WORLD
> >>>>>>> [node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
> >>>>>>> [node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
> >>>>>>> mpiexec noticed that job rank 2 with PID 16675 on node node07 exited on signal 60 (Real-time signal 26).
> >>>>>>>
> >>>>>>> /usr/local/openmpi-1.2.1/bin/ompi_info
> >>>>>>> Open MPI: 1.2.1
> >>>>>>> Open MPI SVN revision: r14481
> >>>>>>> Open RTE: 1.2.1
> >>>>>>> Open RTE SVN revision: r14481
> >>>>>>> OPAL: 1.2.1
> >>>>>>> OPAL SVN revision: r14481
> >>>>>>> Prefix: /usr/local/openmpi-1.2.1
> >>>>>>> Configured architecture: x86_64-unknown-linux-gnu
> >>>>>>> Configured by: root
> >>>>>>> Configured on: Mon May 7 18:32:56 PDT 2007
> >>>>>>> Configure host: neptune.nanostellar.com
> >>>>>>> Built by: root
> >>>>>>> Built on: Mon May 7 18:40:28 PDT 2007
> >>>>>>> Built host: neptune.my.com
> >>>>>>> C bindings: yes
> >>>>>>> C++ bindings: yes
> >>>>>>> Fortran77 bindings: yes (all)
> >>>>>>> Fortran90 bindings: yes
> >>>>>>> Fortran90 bindings size: small
> >>>>>>> C compiler: gcc
> >>>>>>> C compiler absolute: /usr/bin/gcc
> >>>>>>> C++ compiler: g++
> >>>>>>> C++ compiler absolute: /usr/bin/g++
> >>>>>>> Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
> >>>>>>> Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
> >>>>>>> Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
> >>>>>>> Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
> >>>>>>> C profiling: yes
> >>>>>>> C++ profiling: yes
> >>>>>>> Fortran77 profiling: yes
> >>>>>>> Fortran90 profiling: yes
> >>>>>>> C++ exceptions: no
> >>>>>>> Thread support: posix (mpi: no, progress: no)
> >>>>>>> Internal debug support: no
> >>>>>>> MPI parameter check: runtime
> >>>>>>> Memory profiling support: no
> >>>>>>> Memory debugging support: no
> >>>>>>> libltdl support: yes
> >>>>>>> Heterogeneous support: yes
> >>>>>>> mpirun default --prefix: yes
> >>>>>>> MCA backtrace: execinfo (MCA v1.0, API v1.0,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA memory: ptmalloc2 (MCA v1.0, API v1.0,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA paffinity: linux (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA maffinity: first_use (MCA v1.0, API v1.0,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA maffinity: libnuma (MCA v1.0, API v1.0,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA timer: linux (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA installdirs: env (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA installdirs: config (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA allocator: basic (MCA v1.0, API v1.0, Component
> >>>>>>> v1.0)
> >>>>>>> MCA allocator: bucket (MCA v1.0, API v1.0, Component
> >>>>>>> v1.0)
> >>>>>>> MCA coll: basic (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA coll: self (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA coll: sm (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA coll: tuned (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA io: romio (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA mpool: rdma (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA mpool: sm (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA pml: cm (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA pml: ob1 (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA bml: r2 (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA rcache: vma (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA btl: self (MCA v1.0, API v1.0.1, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA btl: sm (MCA v1.0, API v1.0.1, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA btl: tcp (MCA v1.0, API v1.0.1, Component
> >>>>>>> v1.0)
> >>>>>>> MCA topo: unity (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA osc: pt2pt (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA errmgr: hnp (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA errmgr: orted (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA errmgr: proxy (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA gpr: null (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA gpr: proxy (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA gpr: replica (MCA v1.0, API v1.0,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA iof: proxy (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA iof: svc (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA ns: proxy (MCA v1.0, API v2.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA ns: replica (MCA v1.0, API v2.0,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA oob: tcp (MCA v1.0, API v1.0, Component
> >>>>>>> v1.0)
> >>>>>>> MCA ras: dash_host (MCA v1.0, API v1.3,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA ras: gridengine (MCA v1.0, API v1.3,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA ras: localhost (MCA v1.0, API v1.3,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA ras: slurm (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA ras: tm (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA rds: hostfile (MCA v1.0, API v1.3,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA rds: proxy (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA rds: resfile (MCA v1.0, API v1.3,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA rmaps: round_robin (MCA v1.0, API v1.3,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA rmgr: proxy (MCA v1.0, API v2.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA rmgr: urm (MCA v1.0, API v2.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA rml: oob (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA pls: gridengine (MCA v1.0, API v1.3,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA pls: proxy (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA pls: rsh (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA pls: slurm (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA pls: tm (MCA v1.0, API v1.3, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA sds: env (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA sds: pipe (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA sds: seed (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>> MCA sds: singleton (MCA v1.0, API v1.0,
> >>>>>>> Component v1.2.1)
> >>>>>>> MCA sds: slurm (MCA v1.0, API v1.0, Component
> >>>>>>> v1.2.1)
> >>>>>>>
> >>>>>>> As you can see, I used GNU gcc and g++ with the Intel Fortran
> >>>>>>> compiler to build Open MPI, and I am not sure whether there are
> >>>>>>> any special flags that I need to have.
> >>>>>>> ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6
> >>>>>>> --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default
> >>>>>>> --enable-mpi-f90
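> >>>>>>>
> >>>>>>> In case it matters, I am not sure whether the compilers need to be
> >>>>>>> passed to configure explicitly. My understanding is that it would
> >>>>>>> look roughly like this (untested) variant of my configure line:
> >>>>>>>
> >>>>>>> CC=gcc CXX=g++ F77=ifort FC=ifort \
> >>>>>>>     ./configure --prefix=/usr/local/openmpi-1.2.1 --disable-ipv6 \
> >>>>>>>     --with-tm=/usr/local/pbs --enable-mpirun-prefix-by-default \
> >>>>>>>     --enable-mpi-f90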
> >>>>>>>
> >>>>>>> After getting mpif90, I compiled my application (VASP) with this
> >>>>>>> new parallel compiler wrapper, but then I could not run it
> >>>>>>> through PBS.
> >>>>>>>
> >>>>>>> #PBS -N Pt.CO.bridge.25ML
> >>>>>>> ### Set the number of nodes that will be used. Ensure
> >>>>>>> ### that the number "nodes" matches with the need of your job
> >>>>>>> ### DO NOT MODIFY THE FOLLOWING LINE FOR SINGLE-PROCESSOR JOBS!
> >>>>>>> #PBS -l nodes=node07:ppn=4
> >>>>>>> #PBS -l walltime=96:00:00
> >>>>>>> ##PBS -M asit_at_[hidden]
> >>>>>>> #PBS -m abe
> >>>>>>> export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'`
> >>>>>>> echo $NPROCS
> >>>>>>> echo The master node of this job is `hostname`
> >>>>>>> echo The working directory is `echo $PBS_O_WORKDIR`
> >>>>>>> echo The node file is $PBS_NODEFILE
> >>>>>>> echo This job runs on the following $NPROCS nodes:
> >>>>>>> echo `cat $PBS_NODEFILE`
> >>>>>>> echo "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-"
> >>>>>>> echo
> >>>>>>> echo command to EXE:
> >>>>>>> echo
> >>>>>>> echo
> >>>>>>> cd $PBS_O_WORKDIR
> >>>>>>>
> >>>>>>> echo "cachesize=4000 mpiblock=500 npar=4 procgroup=4 mkl ompi"
> >>>>>>>
> >>>>>>> date
> >>>>>>> /usr/local/openmpi-1.2.1/bin/mpiexec -mca mpi_paffinity_alone 1 \
> >>>>>>>     -np $NPROCS /home/struong/bin/vaspmpi_mkl_ompi > "$PBS_JOBID".out
> >>>>>>> date
> >>>>>>> ------------
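> >>>>>>>
> >>>>>>> Side note, and I may be wrong here: since Open MPI was configured
> >>>>>>> with --with-tm, I believe mpiexec can take the host list and slot
> >>>>>>> count directly from Torque, so the -np $NPROCS part may be
> >>>>>>> unnecessary, roughly:
> >>>>>>>
> >>>>>>> cd $PBS_O_WORKDIR
> >>>>>>> # untested: let the tm support size the job from the PBS allocation
> >>>>>>> /usr/local/openmpi-1.2.1/bin/mpiexec -mca mpi_paffinity_alone 1 \
> >>>>>>>     /home/struong/bin/vaspmpi_mkl_ompi > "$PBS_JOBID".out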
> >>>>>>>
> >>>>>>> My environment is CentOS 4.4 x86_64, Intel Xeon, Torque, Maui.
> >>>>>>>
> >>>>>>> Could somebody here tell me what I missed or did incorrectly?
> >>>>>>>
> >>>>>>> Thank you very much.
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >>
> >>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>