Subject: Re: [OMPI users] OpenMPI and SGE
From: Rolf Vandevaart (Rolf.Vandevaart_at_[hidden])
Date: 2009-06-23 13:41:47
Ray Muno wrote:
> Rolf Vandevaart wrote:
>
>> Ray Muno wrote:
>>
>>> Ray Muno wrote:
>>>
>>>
>>>> We are running a cluster using Rocks 5.0 and OpenMPI 1.2 (primarily).
>>>> Scheduling is done through SGE. MPI communication is over InfiniBand.
>>>>
>>> We also have OpenMPI 1.3 installed and receive similar errors.
>>>
>> This does sound like a problem with SGE. By default, we use qrsh to
>> start the jobs on all the remote nodes. I believe that is the command
>> that is failing. There are two things you can try, depending on your
>> version of Open MPI. With version 1.2, you can get more information
>> with:
>>
>> --mca pls_gridengine_verbose 1
>>
>>
> This did not look like it gave me any more info.
>
>
>> With Open MPI 1.3.2 and later, the verbose flag will not help. Instead,
>> you can disable the use of qrsh and fall back to rsh/ssh to start the
>> remote jobs:
>>
>> --mca plm_rsh_disable_qrsh 1
>>
>>
>
> That gives me:
>
> PMGR_COLLECTIVE ERROR: unitialized MPI task: Missing required
> environment variable: MPIRUN_RANK
> PMGR_COLLECTIVE ERROR: PMGR_COLLECTIVE ERROR: unitialized MPI task:
> Missing required environment variable: MPIRUN_RANK
>
I do not recognize these errors as part of Open MPI. A Google search
suggests they might be coming from MVAPICH. Is there a chance we are
using Open MPI to launch the jobs (via Open MPI mpirun), but the
application we are actually launching is linked against MVAPICH?
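
One quick way to check (the executable name below is just a placeholder,
not something from your setup) is to look at which mpirun is first in
your PATH and which MPI library the application binary is actually
linked against, with something like:

  # Confirm which mpirun is being picked up and what it reports
  which mpirun
  mpirun --version

  # List the MPI shared libraries the application pulls in;
  # replace ./your_app with the real executable name
  ldd ./your_app | grep -i -E 'mpi|mvapich'

If the ldd output shows MVAPICH libraries rather than Open MPI's libmpi,
that would explain where these PMGR_COLLECTIVE errors are coming from.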
--
=========================
rolf.vandevaart_at_[hidden]
781-442-3043
=========================