WARNING This documentation is a work in progress and does not reflect the full FTI plugin potential.
The FTI plugin enables one to perform checkpoints using FTI. FTI plugin does not support the full FTI feature set but offers a simple declarative interface to access a large subset of it.
FTI plugin uses a MPI_Comm
datatype and MPI_COMM_WORLD
descriptor, which are defined in mpi
plugin. For now it must be loaded alongside FTI plugin (as shown below).
FTI initialization in FTI plugin is based on several rules:
init_on
is specified, FTI plugin will initialize FTI on that exact event and expects the communicator and file name to be correctly available or fail otherwise,init_on
is not specified and the communicator is available on plugin initialization (e.g. when using MPI_COMM_WORLD
predefine from mpi
plugin), FTI plugin will try to initialize FTI (on plugin initialization) and expects file name to be correctly available or fail otherwise,init_on
is not specified and the communicator is not available on plugin initialization, FTI plugin will try to initialize FTI on communicators descriptor exposure and expects file name to be correctly available or fail otherwise.FTI plugin loads a predefined descriptor named FTI_COMM_WORLD
(FTI_COMM_WORLD_F
for Fortran) after a successfull initialization of FTI. This descriptor is treated as metadata. Its content can be accessed using PDI_access
.
At its root, the FTI configuration is made of several nodes: checkpoint
, communicator
, config_file
, dataset
, init_on
, recover_on
, recover_var
, send_file
, snapshot_on
and status
.
A checkpoint
specifies a list of checkpoints to execute. It is specified by a key/value map that may contain following keys: L1_on
, L2_on
, L3_on
, L4_on
. Each of the keys specifies events names that trigger a specified checkpoint. Value of this key can be either string or a list of strings:
L1_on
: executes FTI L1 checkpoint on specified PDI events,L2_on
: executes FTI L2 checkpoint on specified PDI events,L3_on
: executes FTI L3 checkpoint on specified PDI events,L4_on
: executes FTI L4 checkpoint on specified PDI events.A communicator
is a string, which specifies name of descriptor containing a MPI communicator to use for FTI initialization. Additionally, FTI plugin can write FTI_COMM_WORLD
into this descriptor, if it is shared with write access right. It defaults to MPI_COMM_WORLD
, which is a predefined descriptor from mpi
plugin.
A config_file
is a string that can contain $-expressions. Specifies the name of the FTI configuration file.
A dataset
is a key/value map, which keys are integers that specify datasets ids in FTI and values are names of protected descriptors. Alternatively, the map may contain a map with keys name
and size
, where name
is a name of protected descriptor and size
is a descriptor where FTI plugin will write size of the protected descriptor when shared with write access right.
A init_on
is a string or a list of strings that specifies the names of the PDI events that executes recovery of FTI.
A recover_on
is a string or a list of strings that specifies the names of the PDI events that executes recovery of FTI.
A recover_var
is a list of maps with keys:
on_event
var
on_event
defines on which events the variable recovery should be done (single event or list of events)
var
defines IDs of variables to recover (single ID or list of IDs)
A send_file
is a key/value map or a list of key/value maps describing source and destination paths of files to send, event or list of events to trigger on and optionally name of descriptor, where will be stored status information when exposed. Example:
A snapshot_on
is a string or a list of strings that specifies the names of the PDI events that executes FTI_Snapshot.
A status
is a string that specifies the name of a descriptor to which FTI will write FTI_Status code when shared.