Subject: Re: [OMPI users] Problems with Open MPI/BLCR checkpoint/restart routine.
From: pat.o'bryant_at_[hidden]
Date: 2009-06-11 07:24:24
Gleb,
I am trying to use BLCR as well. Which versions of Open MPI, OFED, and BLCR
are you using? I can get a serial checkpoint/restart to work, but not the
parallel case. I built my system using OFED 1.3.1, Open MPI 1.3.1, and BLCR
0.8.1-1. I also used the same BLCR configuration options for Open MPI that you did.
Thanks,
Pat
J.W. (Pat) O'Bryant,Jr.
Business Line Infrastructure
Technical Systems, HPC

From: Gleb "Crazy Sage" Igumnov <crazy.sage_at_gmail.com>
Sent by: users-bounces@open-mpi.org
Date: 06/10/09 12:06 PM
To: Gleb Igumnov <crazy.sage_at_[hidden]>
cc: users_at_[hidden]
Subject: Re: [OMPI users] Problems with Open MPI/BLCR checkpoint/restart routine.
Please respond to: Gleb "Crazy Sage" Igumnov <crazy.sage_at_gmail.com>; Open MPI Users <users_at_open-mpi.org>

Fixed this: not all of the paths were set in the environment variables. Sorry.
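
For anyone hitting the same error, here is a rough sketch of a check that can be run on the restart node before calling ompi-restart. It is only an illustration, not part of Open MPI or BLCR. It assumes that the restart needs the ompi-restart and cr_restart commands to be reachable through PATH and a BLCR library whose file name starts with libcr.so to be reachable through LD_LIBRARY_PATH. The helper names (missing_commands, library_visible, report) are made up for this example.

```python
import os
import shutil

# Assumption: these are the commands a restart needs on the node.
REQUIRED_COMMANDS = ("ompi-restart", "cr_restart")
# Assumption: the BLCR runtime library file name starts with this prefix.
REQUIRED_LIBRARY_PREFIX = "libcr.so"


def missing_commands(commands, path):
    """Return the commands that cannot be found in the given PATH string."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


def library_visible(prefix, ld_library_path):
    """Return True if a directory in ld_library_path holds a file starting with prefix."""
    for directory in ld_library_path.split(os.pathsep):
        if directory and os.path.isdir(directory):
            if any(name.startswith(prefix) for name in os.listdir(directory)):
                return True
    return False


def report(environ=os.environ):
    """Return a list of human-readable problems found in the environment."""
    path = environ.get("PATH", "")
    ld_path = environ.get("LD_LIBRARY_PATH", "")
    problems = [
        f"{cmd} not found in PATH"
        for cmd in missing_commands(REQUIRED_COMMANDS, path)
    ]
    if not library_visible(REQUIRED_LIBRARY_PREFIX, ld_path):
        problems.append(f"{REQUIRED_LIBRARY_PREFIX}* not found in LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    for line in report() or ["PATH and LD_LIBRARY_PATH look complete"]:
        print(line)
```

Running it in the same shell (or through the same remote login) that will launch ompi-restart matters, since a non-interactive shell may not pick up the same variables as an interactive one.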
> Hello. I've got the following problem. I ran an MPI program and successfully
> checkpointed it with BLCR.
> But now, when I try to restart it using ompi-restart -v
> ompi_global_snapshot_7190.ckpt, I get the following message:
> [umu2:07572] Checking for the existence of
> (/root/ompi_global_snapshot_7190.ckpt)
> [umu2:07572] Restarting from file (ompi_global_snapshot_7190.ckpt)
> [umu2:07572] Exec in self
> --------------------------------------------------------------------------
> Error: Unable to obtain the proper restart command to restart from the
> checkpoint file (ompi_global_snapshot_7190.ckpt). Returned -1.
> --------------------------------------------------------------------------
> Both Open MPI and BLCR are installed in a shared NFS directory, and the BLCR
> directories are included in the PATH and LD_LIBRARY_PATH variables on the
> restart node.
> Open MPI was initially configured with the options
> --with-ft=cr --enable-ft-thread --enable-mpi-thread
> --with-blcr=/path/to/blcr
> The program was run with -am ft-enable-cr.
> What could cause this problem?
> --------------------------------------------
> With best regards
> Gleb "Crazy Sage" Igumnov
> mailto:crazy.sage_at_[hidden]
--
With best regards,
Gleb "Crazy Sage" Igumnov
mailto:crazy.sage_at_[hidden]
_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users