Subject: [OMPI users] Problems with Open MPI/BLCR checkpoint/restart routine.
From: Gleb Igumnov (crazy.sage_at_[hidden])
Date: 2009-06-10 06:38:53


Hello. I've got the following problem. I ran an MPI program and successfully
checkpointed it with BLCR.
But now, when I try to restart it using "ompi-restart -v
ompi_global_snapshot_7190.ckpt", I get the following message:

[umu2:07572] Checking for the existence of (/root/ompi_global_snapshot_7190.ckpt)
[umu2:07572] Restarting from file (ompi_global_snapshot_7190.ckpt)
[umu2:07572] Exec in self
--------------------------------------------------------------------------
Error: Unable to obtain the proper restart command to restart from the
       checkpoint file (ompi_global_snapshot_7190.ckpt). Returned -1.

--------------------------------------------------------------------------

Both Open MPI and BLCR are installed in a shared NFS directory, and the BLCR
directories are included in the PATH and LD_LIBRARY_PATH variables on the
restart node.
Open MPI was initially configured with the options
--with-ft=cr --enable-ft-thread --enable-mpi-thread
--with-blcr=/path/to/blcr

The program was run with -am ft-enable-cr.
What could cause this problem?
--------------------------------------------
With best regards
Gleb "Crazy Sage" Igumnov
mailto:crazy.sage_at_[hidden]