Subject: Re: [OMPI users] Using rsh instead of ssh during ompi-restart
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-06-11 12:42:18


The problem is that you misspelled the MCA parameter name: it uses
underscores, not hyphens. It should be:

-mca plm_rsh_agent rsh
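
If it helps to catch this kind of typo before launching, here is a minimal sketch of a shell helper that rewrites the name following each -mca flag to use underscores. The function name fix_mca_names is made up for illustration and is not part of Open MPI. It only prints the corrected command line for review, and it assumes that none of the arguments contain spaces.

```shell
# Hypothetical helper (not part of Open MPI): rewrites the parameter name that
# follows each -mca/--mca flag so it uses underscores instead of hyphens.
fix_mca_names() {
    prev=""
    out=""
    for arg in "$@"; do
        case "$prev" in
            # Only the word right after the flag is a parameter name.
            -mca|--mca) arg=$(printf '%s' "$arg" | tr '-' '_') ;;
        esac
        out="${out:+$out }$arg"
        prev="$arg"
    done
    printf '%s\n' "$out"
}

# Print the corrected form of the command from the original message.
fix_mca_names /common/openmpi-1.3.2/ompi-restart -mca plm-rsh-agent rsh \
    -verbose -hostfile hfile ompi_global_snapshot_25229.ckpt
```

Only the word immediately after the flag is touched, so hyphens in paths and in the parameter value are left alone.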

On Jun 11, 2009, at 10:34 AM, Gleb Crazy Sage Igumnov wrote:

> Hello. I have the following problem: I'm trying to restart a parallel job
> on our cluster using the following command line:
> /common/openmpi-1.3.2/ompi-restart -mca plm-rsh-agent rsh -verbose
> -hostfile hfile ompi_global_snapshot_25229.ckpt
>
> Despite using this MCA option, I got the following error message:
>
> --------------------------------------------------------------------------
> [umu2:26112] Checking for the existence of (/home/s0032/ompi_global_snapshot_25229.ckpt)
> [umu2:26112] Restarting from file (ompi_global_snapshot_25229.ckpt)
> [umu2:26112] Exec in self
> ssh: connect to host umu3 port 22: Connection refused
> --------------------------------------------------------------------------
> A daemon (pid 26113) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> --------------------------------------------------------------------------
>
> What can I do to make ompi-restart use rsh instead of ssh?
>
>
> --
> With best regards,
> Gleb "Crazy Sage" Igumnov mailto:crazy.sage_at_[hidden]
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users