Note: This article discusses architectural patterns for liveness detection. Actual medical device implementations require rigorous verification and regulatory approval. This is not medical device design guidance.
In safety-critical systems, knowing that something is still running can be as important as knowing what it is doing. A pacemaker must verify the heart is beating. A flight controller must confirm sensor subsystems remain responsive. A distributed system must detect node failures before they cascade.
This fundamental problem—proving existence in time—appears deceptively simple. Yet naive implementations routinely fail in ways that compromise safety. The solution lies in treating liveness detection not as a timing hack, but as a formal state machine with provable properties.
The Liveness Problem
Consider a subsystem that should emit periodic signals. How do we know it’s still alive?
The naive approach: check if we received a signal “recently.” But this raises immediate questions. How recent is recent? What happens during the first few seconds before we have enough data? What if signals arrive slightly late due to scheduling jitter? What if a single spurious signal masks an underlying failure?
Each question reveals a hidden assumption. Each assumption represents a potential failure mode in production.
Between observations, existence is uncertain. A liveness detector must distinguish between "confirmed alive," "confirmed dead," and "insufficient evidence to determine."
This three-valued logic is not a limitation—it’s intellectual honesty. Systems that collapse uncertainty into binary alive/dead decisions are implicitly making assumptions about timing, reliability, and failure modes that may not hold.
Modelling Heartbeats as State Transitions
Rather than tracking timestamps, we can model liveness detection as a finite state machine (FSM). The key insight: state encodes what we know, and transitions encode how evidence changes knowledge.
Define three states:
- INIT: Insufficient observations to determine liveness
- OK: Evidence supports the system is alive
- FAULT: Evidence indicates the system has stopped
And two inputs:
- BEAT: A heartbeat signal was received
- TIMEOUT: Sufficient time passed without a signal
The state transition function becomes:
δ(INIT, BEAT) → OK // First evidence of life
δ(INIT, TIMEOUT) → INIT // Can't fault what we haven't confirmed
δ(OK, BEAT) → OK // Continued evidence
δ(OK, TIMEOUT) → FAULT // Silence after confirmed activity
δ(FAULT, BEAT) → FAULT // Dead systems stay dead (sticky fault)
δ(FAULT, TIMEOUT)→ FAULT // Fault is terminal without resetThis FSM exhibits several important properties by construction.
Property 1: No False Positives During Learning
The INIT state provides a learning period where the detector cannot fault. This is not a grace period based on wall-clock time—it’s a formal requirement that we observe at least one heartbeat before we can claim silence is meaningful.
INV-1: (state == FAULT) → (n_beats > 0)A system cannot be declared dead if we never confirmed it was alive. This invariant is enforced by the transition function: there is no path from INIT to FAULT.
Property 2: Sticky Faults
Once a fault is detected, the FSM remains in FAULT regardless of subsequent inputs. This is a deliberate design choice for safety-critical applications.
INV-2: (state == FAULT at time t) → (state == FAULT at time t+1)Why make faults sticky? In safety-critical contexts, a detected failure should trigger investigation, not be silently recovered. A pacemaker that detects cardiac arrest should not quietly resume normal operation if the heart starts beating again—it should alert and require explicit clinical intervention.
The only exit from FAULT is an explicit reset operation, which represents a deliberate human decision that the fault has been addressed.
Property 3: Bounded Memory and Time
The FSM requires exactly two pieces of state: the current state (3 possible values) and a beat counter. No timestamps. No buffers. No history.
Memory: O(1)
Time per step: O(1)This is not merely efficient—it’s a safety property. Unbounded memory means unbounded failure modes. A buffer overflow in a heartbeat detector could be catastrophic in an implantable device.
The Mathematics of Timing
But wait—how do we generate TIMEOUT inputs without tracking time?
The answer is to separate concerns. The FSM handles state transitions. A separate tick mechanism handles timing. On each tick (a regular polling interval), we check: has a beat arrived since the last tick? If not, and enough ticks have passed, we generate a TIMEOUT input.
tick_step:
if beat_received_this_tick:
fsm_step(BEAT)
ticks_since_beat = 0
else:
ticks_since_beat += 1
if ticks_since_beat >= max_ticks:
fsm_step(TIMEOUT)The max_ticks parameter encodes the timing tolerance. But notice: the FSM itself knows nothing about time. It only knows about the sequence of BEAT and TIMEOUT events. This separation means the FSM’s correctness properties are independent of the specific timing configuration.
From Theory to Code
The translation from mathematical model to C is direct. The FSM becomes a struct:
typedef struct {
hb_state_t state; // INIT, OK, FAULT
uint32_t n_beats; // Observation counter
uint32_t ticks; // Ticks since last beat
uint8_t fault_flag; // Sticky fault indicator
uint8_t in_step; // Reentrancy guard
} hb_fsm_t;The transition function becomes a switch statement that directly transcribes the mathematical δ:
hb_result_t hb_step(hb_fsm_t *m, hb_input_t input) {
// Reentrancy guard
if (m->in_step) {
m->fault_flag = 1;
m->state = HB_FAULT;
return (hb_result_t){ .state = HB_FAULT };
}
m->in_step = 1;
// State transitions
switch (m->state) {
case HB_INIT:
if (input == HB_BEAT) {
m->state = HB_OK;
m->n_beats++;
}
// TIMEOUT in INIT: stay in INIT
break;
case HB_OK:
if (input == HB_BEAT) {
m->n_beats++;
m->ticks = 0;
} else { // TIMEOUT
m->state = HB_FAULT;
m->fault_flag = 1;
}
break;
case HB_FAULT:
// Sticky: no transitions out
break;
}
m->in_step = 0;
return (hb_result_t){ .state = m->state };
}This is not code that implements a specification—it is the specification, transcribed into a different notation. Every line traces to a mathematical statement. Every invariant can be checked after every step.
Testing Contracts, Not Coverage
Traditional testing asks: “Did I execute every line?” Contract testing asks: “Did I verify every guarantee?”
For the heartbeat FSM, we define explicit contracts:
- CONTRACT-1 (No False Early Fault): The FSM cannot transition to FAULT without first visiting OK
- CONTRACT-2 (Sticky Fault): Once in FAULT, no input sequence exits FAULT
- CONTRACT-3 (Liveness Detection): A sequence of BEAT inputs keeps the FSM in OK
- CONTRACT-4 (Fault Detection): TIMEOUT after OK transitions to FAULT
Each contract has a corresponding test that verifies the guarantee holds. Fuzz testing with 100,000 random input sequences confirms invariants hold under arbitrary conditions—not just the happy paths a developer imagined.
Composition: Where Heartbeats Meet Baselines
The real power of this approach emerges in composition. The heartbeat FSM outputs inter-arrival times—the duration between successive beats. These durations become inputs to a statistical baseline detector (explored in Statistics as State Transitions).
Event → Heartbeat FSM → Δt → Baseline FSM → Anomaly?Each FSM maintains its own contracts. The composition inherits guarantees from both: the heartbeat FSM provides liveness detection; the baseline FSM provides anomaly detection on timing patterns. Together, they answer: “Is the system alive, and is it behaving normally?”
This modular composition is only possible because each component is closed (no hidden dependencies), total (handles all inputs), and deterministic (reproducible behaviour). These properties, established mathematically and enforced in code, enable systems that support certification, audit, and verification processes.
Implications for Safety-Critical Design
The heartbeat pattern illustrates a broader principle: safety-critical systems benefit from modelling behaviour as explicit state machines with formal properties.
This approach supports regulatory frameworks like IEC 62304 for medical devices, where evidence of correctness is a certification requirement. A well-documented FSM with proven invariants provides exactly the kind of auditable evidence that regulators expect for Class C software.
More importantly, it provides engineering confidence. When a pacemaker’s liveness detector is a three-state FSM with five invariants, all verified by contract tests, the failure modes are enumerable. When it’s a tangle of timestamps and heuristics, the failure modes are limited only by imagination.
Conclusion
Liveness detection—proving that something exists in time—is a foundational problem in safety-critical systems. By modelling heartbeat detection as a deterministic finite state machine, we transform an intuitive timing check into a formal system with provable properties.
The key insights are structural: separate timing generation from state logic; use three-valued states that acknowledge uncertainty; make faults sticky to require explicit recovery; verify contracts rather than coverage.
This approach is explored in detail in the c-from-scratch educational project, where the Pulse module implements a complete heartbeat detector with full contract testing. The mathematical rigour transfers: what you prove on paper, you transcribe into code, and you verify through tests.
As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context. But for systems where knowing something is still there is as critical as knowing what it’s doing, the state machine approach offers a path from intuition to proof.