The reconfigurable design of ARMOR processes benefits not only the ARMOR
processes themselves but also non-ARMOR applications. Because the
software-implemented fault tolerance (SIFT) environment can host a wide
variety of fault tolerance mechanisms, it is easy to customize the
environment for a particular application’s set of dependability
requirements, as the following examples show:
1. JPL/REE Applications Manager.
The ARMOR-based SIFT environment has been used to protect spaceborne scientific
MPI applications as part of the Remote Exploration and Experimentation (REE)
project at the Jet Propulsion Laboratory. ARMOR processes detect crash and
hang failures in both the application and the ARMORs themselves, as well as
node failures; application hangs are detected through progress indicators
sent by the application. The REE configuration of the SIFT environment has been
experimentally evaluated through error injections to stress the error
detection and recovery mechanisms of the ARMOR processes and to determine the
overhead of the SIFT environment as seen by the application.
2. Wireless Telephone Network Controller.
A database server for a wireless telephone network controller has been
outfitted with elements that provide a data auditing framework for its
in-memory database tables. In addition to the data auditing checks embedded
into the database server, process-level detection and recovery provided by
the external ARMORs are used to tolerate failures in the controller application.
3. High Availability Framework for Wireless Client-Server Applications.
Standard socket function calls have been
overridden to invoke TCP proxy elements incorporated either within the
application process or in a local ARMOR process. These proxy elements shield
the application from the occasional disconnections expected on a lossy
wireless medium. In addition to transparent recovery of the
application’s TCP connections, the ARMOR-based SIFT environment provides
the baseline suite of error detection and recovery services to tolerate
failures in the application processes.
4. Telecommunications Middleware.
Existing middleware processes for a telecommunications application have been extended
with elements to implement server failover policies. This
application requires two fault tolerance mechanisms: (1) a
mechanism to ensure that the backup node has access to all data written to
the primary node’s local disk, and (2) a mechanism for migrating the IP
address of the primary node to the backup node to provide client
transparency. Both of these requirements are satisfied through elements that
plug into either the middleware processes or external ARMOR processes in the
SIFT environment.
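
The hang detection through progress indicators mentioned in the first example
can be illustrated with a minimal sketch. It is not the ARMOR implementation:
the class ProgressMonitor, its methods report and hung, the timeout value, and
the application identifiers are all hypothetical names chosen for
illustration. The sketch assumes that each application periodically sends a
monotonically increasing counter and that an application is suspected of
hanging once the counter has not advanced within a fixed timeout.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProgressMonitor:
    """Hypothetical hang detector driven by application progress indicators."""

    timeout_s: float                              # assumed hang threshold
    clock: Callable[[], float] = time.monotonic   # injectable for testing
    last_seen: Dict[str, float] = field(default_factory=dict)
    last_counter: Dict[str, int] = field(default_factory=dict)

    def report(self, app_id: str, counter: int) -> None:
        """Record an indicator; only an advancing counter counts as progress."""
        if counter > self.last_counter.get(app_id, -1):
            self.last_counter[app_id] = counter
            self.last_seen[app_id] = self.clock()

    def hung(self) -> List[str]:
        """Return applications whose last progress is older than the timeout."""
        now = self.clock()
        return sorted(
            app for app, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Usage with a simulated clock so the example is deterministic.
fake_now = [0.0]
monitor = ProgressMonitor(timeout_s=5.0, clock=lambda: fake_now[0])
monitor.report("mpi_rank_0", 1)
monitor.report("mpi_rank_1", 1)
fake_now[0] = 4.0
monitor.report("mpi_rank_0", 2)   # rank 0 keeps making progress
fake_now[0] = 8.0
print(monitor.hung())             # rank 1 has been silent for 8 s
```

Requiring the counter to advance, rather than accepting any message as a sign
of life, lets the monitor also flag an application that is still sending
indicators but is stuck at the same point in its computation. What happens
after a hang is flagged (for example, restarting the process) is outside the
scope of this sketch.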