From: Pak Lui (Pak.Lui_at_[hidden])
Date: 2007-05-21 10:00:41
Hi Götz,
I have used SSH instead of rsh before, but not with Kerberos
authentication. I can see that you already ran qrsh -inherit via ssh
before the mpirun line and verified that the connection works.
I believe the "Permission denied, please try again." message is coming
from the ssh daemon (sshd) on geminide2 and geminide7, which is rejecting
the connections from geminide8. That in turn prevents orted from
launching on those two nodes.
Can you enable debug output for sshd (with either -d or -ddd) in the SGE
cluster configuration, using qconf -mconf, to see why sshd sometimes
blocks the ssh connection? You may get tons of output, but it should show
you why the permission is denied. It could be a setting in sshd_config
or something else we don't know about yet.
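As a rough sketch of what I mean, the relevant entries in the
configuration that qconf -mconf opens might look something like the
fragment below. I am assuming here that your ssh integration uses the
rsh_command and rsh_daemon parameters and starts sshd with -i; the binary
paths are guesses as well, so keep whatever your cluster already has and
only add the debug flag. The lines starting with # are my annotations,
not something to paste into the configuration.

```
# Assumed layout; only the trailing -ddd on rsh_daemon is the actual change.
# Paths and the -i flag are assumptions about your setup.
rsh_command   /usr/bin/ssh -tt -o GSSAPIDelegateCredentials=no
rsh_daemon    /usr/sbin/sshd -i -ddd
```

You can check the result afterwards with qconf -sconf. I am not certain
where the debug output ends up when execd starts the daemon, so if
nothing shows up, raising LogLevel to DEBUG3 in sshd_config on geminide2
and geminide7 may be another way to get the same information.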
Götz Waschk wrote:
> Hello everyone,
>
> I have trouble with the Gridengine integration of openmpi. When I run
> a job with only 4 processes, it runs fine. With more processes, mpirun
> sometimes fails to connect to the remote nodes, the qrsh calls fail.
>
> I'll attach a job script and the error output. As you can see from the
> 'for' loop, I can connect to all nodes just fine, it is the qrsh
> executed by mpirun that fails. Qrsh was configured to run ssh with
> Kerberos authentication (ssh -tt -o GSSAPIDelegateCredentials=no).
>
> My versions are openmpi 1.2.2, SGE 6.0u9, RHEL5. Any idea where the
> problem could be?
>
> Regards, Götz Waschk
-- - Pak Lui pak.lui_at_[hidden]