From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-05-18 20:21:39


Hah! Your reply came in seconds after I replied.

Your questions made me notice that we're missing a FAQ entry for the
"ssh:rsh" explanation, though, so I'll add an entry for that. Thanks.
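
Until that entry exists, here is a rough sketch of how I understand a value like "ssh : rsh" to be read: a colon-separated list of launch agents, tried in order, with the first one found in PATH being used. The pick_agent function below is a hypothetical helper written for illustration, not anything shipped with Open MPI, and it assumes each entry is a bare program name with no arguments.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print the first program in a
# colon-separated agent list that can be found in PATH.
# Assumption: each entry is a bare program name without arguments.
pick_agent() {
    # Split the list on ':' and strip the spaces around each entry.
    for candidate in $(printf '%s' "$1" | tr ':' '\n' | tr -d ' '); do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing in the list was found in PATH.
    return 1
}

# With the value from the ompi_info output quoted below, ssh is preferred
# and rsh is only the fallback.
pick_agent "ssh : rsh" || echo "no launch agent found" >&2
```

Like any other MCA parameter, pls_rsh_agent can be overridden with --mca on the mpirun command line if the default order is not what your cluster needs.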

On May 18, 2007, at 5:15 PM, Steven Truong wrote:

> Hi, Jeff. Ok. After reading through the FAQ, I modified .bashrc to
> set PATH and LD_LIBRARY_PATH and now I could execute:
>
> [struong_at_neptune ~]$ ssh node07 which orted
> /usr/local/openmpi-1.2.1/bin/orted
> [struong_at_neptune ~]$ /usr/local/openmpi-1.2.1/bin/mpirun --host node07 hostname
> node07.nanostellar.com
>
> Thank you.
> Steven.
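
For anyone else hitting this, a minimal sketch of the kind of .bashrc change described above. The install prefix is taken from the transcript; the OMPI_PREFIX variable name and the layout of the file are assumptions, so adapt them to your own startup files. The point is only the ordering: the exports come before anything that is restricted to interactive shells, so a non-interactive "ssh node07 ..." still sees them.

```shell
# ~/.bashrc (sketch, not a complete file)
# Assumed install prefix, copied from the transcript above.
OMPI_PREFIX=/usr/local/openmpi-1.2.1

# Set these first, so non-interactive remote shells pick them up too.
PATH=$OMPI_PREFIX/bin:$PATH
LD_LIBRARY_PATH=$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export PATH LD_LIBRARY_PATH

# Keep interactive-only settings inside this branch; "$-" contains "i"
# only when the shell is interactive.
case $- in
    *i*)
        # Placeholder for prompts, aliases, and similar settings.
        :
        ;;
esac
```

The check from the transcript, "ssh node07 which orted", is a quick way to confirm that the remote node now finds orted.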
>
>
>
> On 5/18/07, Steven Truong <midair77_at_[hidden]> wrote:
>> Hi, Jeff. Thanks so very much for all your help so far. I decided
>> that I needed to go back and check whether openmpi even works for
>> simple cases, so here I am.
>>
>> So my shell might have exited when it detected that I ran
>> non-interactively. But then again, how does this parameter
>> MCA pls: parameter "pls_rsh_agent" (current value: "ssh :rsh")
>>
>> affect my outcome? How should I set PATH and LD_LIBRARY_PATH in my
>> Torque job files so that they match those in .bash_profile?
>>
>> Could you give me some tips here?
>>
>> Below is my current bash shell's settings.
>>
>> Thanks,
>> Steven.
>>
>> [struong_at_neptune ~]$ echo $SHELL
>> /bin/bash
>> [struong_at_neptune ~]$ cat .bash_profile | grep -v ^#
>>
>> if [ -f ~/.bashrc ]; then
>> . ~/.bashrc
>> fi
>>
>> umask 027
>> PATH=/opt/intel/fce/9.1.043/bin:/usr/local/openmpi-1.2.1/bin:/opt/c3-4:/opt/bin:/usr/local/torque/bin:/usr/local/torque/sbin:/usr/local/maui/bin:/usr/local/maui/sbin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/usr/local/rrdtool-1.2.12/bin:~/bin
>> BASH_ENV=$HOME/.bashrc
>> FC=/opt/intel/fce/9.1.043/bin/ifort
>> F90=$FC
>> F77=$FC
>> F77_GETARGDECL=" "
>> LD_LIBRARY_PATH=/usr/local/openmpi-1.2.1/lib
>> RSHCOMMAND=/usr/bin/ssh
>> PBS_DEFAULT="neptune"
>> PBSLOGLEVEL=7
>> BUILD_DIR=/tmp/rrdbuil
>> INSTALL_DIR=/usr/local/rrdtool-1.2.12
>> source /usr/local/ecce/scripts/runtime_setup.sh
>> export F77 USERNAME BASH_ENV PATH RSHCOMMAND FC F90 PBS_DEFAULT BUILD_DIR INSTALL_DIR LD_LIBRARY_PATH
>>
>> [struong_at_neptune ~]$ ssh node07 which orted
>> which: no orted in (/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin)
>>
>> [struong_at_neptune ~]$ /usr/local/openmpi-1.2.1/bin/mpirun --host node07 node07 hostname
>> --------------------------------------------------------------------------
>> Failed to find the following executable:
>>
>> Host: node07.nanostellar.com
>> Executable: node07
>>
>> Cannot continue.
>> --------------------------------------------------------------------------
>>
>>
>> On 5/18/07, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>> On May 18, 2007, at 4:38 PM, Steven Truong wrote:
>>>
>>>> [struong_at_neptune 4cpu4npar10nsim]$ mpirun --mca btl tcp,self -np 1 --host node07 hostname
>>>> bash: orted: command not found
>>>
>>> As you noted later in your mail, this is the key problem: orted is
>>> not found on the remote node.
>>>
>>> Notice that you are currently using the rsh launcher, not the Torque
>>> launcher (presumably because you are not inside a Torque job). What
>>> you want to check is:
>>>
>>> rsh node07 which orted
>>>
>>> (or use ssh -- whatever is correct for your cluster)
>>>
>>> I suspect that orted will not be found, and that you'll need to
>>> modify your shell startup files to set PATH / LD_LIBRARY_PATH
>>> properly. Note that some shell startup files will exit early if they
>>> detect that they are running on a non-interactive login. See
>>> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>>> for more details.
>>>
>>> Alternatively, you can simply use the absolute pathname to mpirun,
>>> which Open MPI will interpret to mean that you want OMPI to set the
>>> PATH/LD_LIBRARY_PATH on the remote node for you. Something like this:
>>>
>>> /usr/local/openmpi-1.2.1/bin/mpirun --host node07 hostname
>>>
>>> (note that the "btl" MCA parameter is only relevant for MPI
>>> executables)
>>>
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>

-- 
Jeff Squyres
Cisco Systems