Abstract:
The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed
applications within a wide-area network, using a leader-follower replication strategy. LLFT
provides application-transparent replication, with strong replica consistency, for applications that
involve multiple interacting processes or threads. The LLFT Messaging Protocol provides reliable,
totally ordered message delivery by employing a group multicast, where the message ordering is
determined by the primary replica in the destination group. The Leader-Determined Membership
Protocol provides reconfiguration and recovery when a replica becomes faulty and when a replica
joins or leaves a group, where the membership of the group is determined by the primary replica.
LLFT can operate in the common industrial case where there is a primary replica and one or more
backup replicas. The LLFT system achieves low latency message delivery during normal operation
and low latency reconfiguration and recovery when a fault occurs.
As in other fault tolerance systems, the replicas of a process form a process group. One
replica in the group is the primary, and the other replicas are the backups. The primary multicasts
messages to a destination group over a virtual connection. The primary in the destination group
orders the messages, performs the operations, produces ordering information for non-deterministic
operations, and supplies ordering information to its backups. Thus, the backups can perform the
same operations in the same order and obtain the same results as the primary. If the primary fails,
a new primary is chosen deterministically and the new primary determines the membership of the
group. LLFT operates within the usual asynchronous model, but with timing-based fault detectors.
The assumptions of eventual reliable communication and sufficient replication enable LLFT to
maintain a single consistent infinite computation, despite crash, timing, and partitioning faults.
The proposed LLFT system is applied to the Furniture Ordering Management System.
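
The leader-determined ordering and deterministic failover described above can be illustrated
with a minimal sketch. It is not the LLFT implementation: the names Replica, choose_primary,
and deliver are hypothetical, and the rule "lowest replica id among live replicas becomes
primary" is an assumption made only for illustration, since the abstract states that the choice
is deterministic but does not give the rule. Multicast, fault detection, and membership changes
are reduced to in-memory calls.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    replica_id: int
    alive: bool = True
    log: list = field(default_factory=list)  # operations applied, in order

    def order(self, message):
        """As primary: assign the next sequence number to a message."""
        return len(self.log), message

    def apply(self, seq, message):
        """Apply an operation only if it is the next one in the total order."""
        if seq != len(self.log):
            raise ValueError(
                f"out-of-order delivery: got {seq}, expected {len(self.log)}"
            )
        self.log.append(message)


def choose_primary(group):
    """Hypothetical deterministic rule: lowest id among live replicas."""
    live = [r for r in group if r.alive]
    if not live:
        raise RuntimeError("no live replica left in the group")
    return min(live, key=lambda r: r.replica_id)


def deliver(group, message):
    """The primary orders the message; each live replica applies it in that order."""
    primary = choose_primary(group)
    seq, msg = primary.order(message)
    for replica in group:
        if replica.alive:
            replica.apply(seq, msg)
    return primary.replica_id, seq


if __name__ == "__main__":
    group = [Replica(1), Replica(2), Replica(3)]
    print(deliver(group, "place_order"))   # ordered by replica 1
    group[0].alive = False                 # simulate a crash of the primary
    print(deliver(group, "cancel_order"))  # ordered by the new primary
    print(group[1].log == group[2].log)    # surviving replicas agree
```

Because every replica applies operations strictly in the sequence chosen by the primary, the
surviving backups hold the same log when the primary fails, so the newly chosen primary can
continue the ordering from where the old one stopped.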