Subject: Re: [OMPI users] Problem getting OpenMPI to run
From: Jeff Layton (laytonjb_at_[hidden])
Date: 2009-06-02 06:50:20


Joe,

You are correct, this is a ROCKS cluster. I didn't use the --with-sge option when building (I tend to stay more generic, but I should have done that).

I'm not sure of the OFED release; I don't admin this cluster, and the owners are picky about upgrades (they tend to break Lustre).

BTW - the problem was solved. There was a configuration error for the specific queue. It was found and fixed and things seem to be running normally.

Thanks for the help, and I'm sorry for disturbing everyone. I wasn't familiar enough with the error messages to tell if it was OpenMPI or SGE.

TIA!

Jeff

________________________________
From: Joe Landman <landman_at_[hidden]>
To: Open MPI Users <users_at_[hidden]>
Sent: Monday, June 1, 2009 3:34:40 PM
Subject: Re: [OMPI users] Problem getting OpenMPI to run

Jeff Layton wrote:
> Jeff Squyres wrote:
>> On Jun 1, 2009, at 2:04 PM, Jeff Layton wrote:
>>
>>> error: executing task of job 3084 failed: execution daemon on host
>>> "compute-2-2.local" didn't accept task
>>>
>>
>> This looks like an error message from the resource manager/scheduler -- not from OMPI (i.e., OMPI tried to launch a process on a node and the launch failed because something rejected it).
>>
>> Which one are you using?

When you built Open-MPI, did you use the

    --with-sge

switch? Or if this is an OFED release, is it possible that this wasn't specified?
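
One rough way to check is to look for a gridengine component in the output of ompi_info. The snippet below is only a sketch: has_gridengine is a hypothetical helper name, the install prefix in the commented build lines is an assumption, and it inspects whichever ompi_info happens to be first on your PATH (which may not be the build your job actually uses).

```shell
# Hypothetical helper: succeeds if the ompi_info-style text on stdin
# mentions a gridengine component.
has_gridengine() {
    grep -q 'gridengine'
}

# Rebuilding with SGE support would look roughly like this
# (the prefix is an assumption; adjust it for your site):
#   ./configure --with-sge --prefix="$HOME/openmpi"
#   make all install

# Report on whichever Open MPI install is first on PATH, if any.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_gridengine; then
        echo "SGE support: yes"
    else
        echo "SGE support: no"
    fi
else
    echo "ompi_info not found on PATH"
fi
```

If no gridengine component shows up, the build most likely was not configured with SGE support.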

FWIW, this looks like a Rocks compute node ("compute-2-2.local" gives that away). The OFED Rolls in Rocks have had a few issues in the past with how they were built, so you may be running into that. If you didn't build it yourself, I'd suggest at least giving that a try.

Alternatively, OFED-1.4 is pretty good. It has a later version of Open-MPI than 1.3.x.

Joe

>
> SGE
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics,
email: landman_at_[hidden]
web : http://scalableinformatics.com
      http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615