« Return to documentation listing
NAME
Open RTE MCA Snapshot Coordination (SnapC) Framework - Overview of Open
RTE's SnapC framework, and selected modules. Open MPI 1.3.4
DESCRIPTION
Open RTE can coordinate the generation of a global snapshot of a paral-
lel job from many distributed local snapshots. The components in this
framework determine how to: Initiate the checkpoint of the parallel
application, gather together the many distributed local snapshots, and
provide the user with a global snapshot handle reference that can be
used to restart the parallel application.
GENERAL PROCESS REQUIREMENTS
In order for a process to use the Open RTE SnapC components it must
adhear to a few programmatic requirements.
First, the program must call ORTE_INIT early in its execution. This
should only be called once, and it is not possible to checkpoint the
process without it first having called this function.
The program must call ORTE_FINALIZE before termination.
A user may initiate a checkpoint of a parallel application by using the
orte-checkpoint(1) and orte-restart(1) commands.
AVAILABLE COMPONENTS
Open RTE ships with one SnapC component: full.
The following MCA parameters apply to all components:
snapc_base_verbose
Set the verbosity level for all components. Default is 0, or silent
except on error.
snapc_base_global_snapshot_dir
The directory to store the checkpoint snapshots. Default is /tmp.
full SnapC Component
The full component gathers together the local snapshots to the disk
local to the Head Node Process (HNP) before completing the checkpoint
of the process. This component does not currently support replicated
HNPs, or timer based gathering of local snapshot data. This is a
3-tiered hierarchy of coordinators.
The full component has the following MCA parameters:
snapc_full_priority
The component's priority to use when selecting the most appropriate
component for a run.
snapc_full_verbose
Set the verbosity level for this component. Default is 0, or silent
except on error.
none SnapC Component
The none component simply selects no SnapC component. All of the SnapC
function calls return immediately with ORTE_SUCCESS.
1.3.4 Nov 11, 2009 ORTE_SNAPC(7)
« Return to documentation listing
|