Subject: Re: [OMPI users] Checkpointing automatically at regular intervals
From: Kritiraj Sajadah (ksajadah_at_[hidden])
Date: 2009-06-30 11:37:34


Dear Josh,
I am sure it would be a useful feature: anyone using Open MPI to checkpoint an application will not want to sit and trigger each checkpoint manually, which is a real pain if it is a long-running application.
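
In the meantime, the polling-script approach suggested further down the thread could be sketched roughly as below. This is only a sketch under a few assumptions: the job was started with checkpoint/restart support enabled, `ompi-checkpoint` accepts the PID of `mpirun` (as in the quoted script), and the `checkpoint_loop` function and the `CHECKPOINT_CMD` override are hypothetical names of mine, not part of Open MPI.

```shell
#!/bin/sh
# Sketch: checkpoint a running mpirun at a fixed interval until it exits.
# CHECKPOINT_CMD is a hypothetical hook (not an Open MPI setting) that lets
# the loop be tried out with a stand-in command.
CHECKPOINT_CMD="${CHECKPOINT_CMD:-ompi-checkpoint}"

checkpoint_loop() {
    pid="$1"        # PID of the mpirun process to checkpoint
    interval="$2"   # seconds between checkpoints, e.g. 300 for 5 minutes

    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
        # Re-check after sleeping so a job that just ended is not checkpointed.
        kill -0 "$pid" 2>/dev/null || break
        # Report a failed checkpoint but keep looping.
        $CHECKPOINT_CMD "$pid" || echo "checkpoint of $pid failed" >&2
    done
}
```

With a single mpirun on the node, something like `checkpoint_loop "$(pidof mpirun)" 300` would then request a checkpoint every 5 minutes. Passing the PID explicitly avoids checkpointing the wrong job when several mpirun processes are running.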

I would imagine an automatic restart from the last checkpoint in case of failure would also be interesting.

Many thanks.

Regards,

Kritiraj

--- On Tue, 6/30/09, Josh Hursey <jjhursey_at_[hidden]> wrote:

> From: Josh Hursey <jjhursey_at_[hidden]>
> Subject: Re: [OMPI users] Checkpointing automatically at regular intervals
> To: "Open MPI Users" <users_at_[hidden]>
> Date: Tuesday, June 30, 2009, 3:00 PM
> Currently, there is no mechanism to
> checkpoint every X minutes in Open MPI.
>
> As mentioned below, you can use a script to initiate the
> checkpoint every X minutes. Alternatively, it should not be
> too difficult to add such a feature to Open MPI. If enough
> people are interested, I can file a feature request to add
> it in a future release.
>
> Josh
>
> On Jun 30, 2009, at 9:34 AM, Mohamed Slim bouguerra wrote:
>
> > Hi,
> > I think that you can write a simple script such as:
> >
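> > # Loop for as long as an mpirun process exists
> > while pgrep mpirun > /dev/null
> > do
> >     ompi-checkpoint "$(pidof mpirun)"
> >     sleep 5   # seconds between checkpoints; use 300 for every 5 minutes
> > done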
> >
> > On 30 June 09 at 14:29, Kritiraj Sajadah wrote:
> >
> >>
> >> Dear All,
> >> I can manually checkpoint an MPI application using Open MPI and BLCR.
> >> However, I now want to checkpoint my application automatically every
> >> 5 minutes. Is there a way in Open MPI to ensure automatic checkpointing
> >> without user intervention while the application is running?
> >>
> >> Thank you
> >>
> >> Regards,
> >> Kritiraj
> >>
> >>
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>