Formal Methods

Heartbeats and State Machines: Proving Existence in Time

How deterministic finite state machines can model liveness detection with verifiable properties

Published
January 08, 2026 17:40
Reading Time
10 min
Heartbeat State Machine Diagram

Note: This article discusses architectural patterns for liveness detection. Actual medical device implementations require rigorous verification and regulatory approval. This is not medical device design guidance.

In safety-critical systems, knowing that something is still running can be as important as knowing what it is doing. A pacemaker must verify the heart is beating. A flight controller must confirm sensor subsystems remain responsive. A distributed system must detect node failures before they cascade.

This fundamental problem—proving existence in time—appears deceptively simple. Yet naive implementations routinely fail in ways that compromise safety. The solution lies in treating liveness detection not as a timing hack, but as a formal state machine with provable properties.

The Liveness Problem

Consider a subsystem that should emit periodic signals. How do we know it’s still alive?

The naive approach: check if we received a signal “recently.” But this raises immediate questions. How recent is recent? What happens during the first few seconds before we have enough data? What if signals arrive slightly late due to scheduling jitter? What if a single spurious signal masks an underlying failure?

Each question reveals a hidden assumption. Each assumption represents a potential failure mode in production.

Design Property: Existence Uncertainty

Between observations, existence is uncertain. A liveness detector must distinguish between "confirmed alive," "confirmed dead," and "insufficient evidence to determine."

This three-valued logic is not a limitation—it’s intellectual honesty. Systems that collapse uncertainty into binary alive/dead decisions are implicitly making assumptions about timing, reliability, and failure modes that may not hold.

Modelling Heartbeats as State Transitions

Rather than tracking timestamps, we can model liveness detection as a finite state machine (FSM). The key insight: state encodes what we know, and transitions encode how evidence changes knowledge.

Define three states:

  • INIT: Insufficient observations to determine liveness
  • OK: Evidence supports the system is alive
  • FAULT: Evidence indicates the system has stopped

And two inputs:

  • BEAT: A heartbeat signal was received
  • TIMEOUT: Sufficient time passed without a signal

The state transition function becomes:

δ(INIT, BEAT)    → OK      // First evidence of life
δ(INIT, TIMEOUT) → INIT    // Can't fault what we haven't confirmed
δ(OK, BEAT)      → OK      // Continued evidence
δ(OK, TIMEOUT)   → FAULT   // Silence after confirmed activity
δ(FAULT, BEAT)   → FAULT   // Dead systems stay dead (sticky fault)
δ(FAULT, TIMEOUT)→ FAULT   // Fault is terminal without reset

This FSM exhibits several important properties by construction.

Property 1: No False Positives During Learning

The INIT state provides a learning period where the detector cannot fault. This is not a grace period based on wall-clock time—it’s a formal requirement that we observe at least one heartbeat before we can claim silence is meaningful.

INV-1: (state == FAULT) → (n_beats > 0)

A system cannot be declared dead if we never confirmed it was alive. This invariant is enforced by the transition function: there is no path from INIT to FAULT.

Property 2: Sticky Faults

Once a fault is detected, the FSM remains in FAULT regardless of subsequent inputs. This is a deliberate design choice for safety-critical applications.

INV-2: (state == FAULT at time t) → (state == FAULT at time t+1)

Why make faults sticky? In safety-critical contexts, a detected failure should trigger investigation, not be silently recovered. A pacemaker that detects cardiac arrest should not quietly resume normal operation if the heart starts beating again—it should alert and require explicit clinical intervention.

The only exit from FAULT is an explicit reset operation, which represents a deliberate human decision that the fault has been addressed.

Property 3: Bounded Memory and Time

The FSM requires exactly two pieces of state: the current state (3 possible values) and a beat counter. No timestamps. No buffers. No history.

Memory: O(1)
Time per step: O(1)

This is not merely efficient—it’s a safety property. Unbounded memory means unbounded failure modes. A buffer overflow in a heartbeat detector could be catastrophic in an implantable device.

The Mathematics of Timing

But wait—how do we generate TIMEOUT inputs without tracking time?

The answer is to separate concerns. The FSM handles state transitions. A separate tick mechanism handles timing. On each tick (a regular polling interval), we check: has a beat arrived since the last tick? If not, and enough ticks have passed, we generate a TIMEOUT input.

tick_step:
    if beat_received_this_tick:
        fsm_step(BEAT)
        ticks_since_beat = 0
    else:
        ticks_since_beat += 1
        if ticks_since_beat >= max_ticks:
            fsm_step(TIMEOUT)

The max_ticks parameter encodes the timing tolerance. But notice: the FSM itself knows nothing about time. It only knows about the sequence of BEAT and TIMEOUT events. This separation means the FSM’s correctness properties are independent of the specific timing configuration.

From Theory to Code

The translation from mathematical model to C is direct. The FSM becomes a struct:

typedef struct {
    hb_state_t state;      // INIT, OK, FAULT
    uint32_t   n_beats;    // Observation counter
    uint32_t   ticks;      // Ticks since last beat
    uint8_t    fault_flag; // Sticky fault indicator
    uint8_t    in_step;    // Reentrancy guard
} hb_fsm_t;

The transition function becomes a switch statement that directly transcribes the mathematical δ:

hb_result_t hb_step(hb_fsm_t *m, hb_input_t input) {
    // Reentrancy guard
    if (m->in_step) {
        m->fault_flag = 1;
        m->state = HB_FAULT;
        return (hb_result_t){ .state = HB_FAULT };
    }
    m->in_step = 1;
    
    // State transitions
    switch (m->state) {
        case HB_INIT:
            if (input == HB_BEAT) {
                m->state = HB_OK;
                m->n_beats++;
            }
            // TIMEOUT in INIT: stay in INIT
            break;
            
        case HB_OK:
            if (input == HB_BEAT) {
                m->n_beats++;
                m->ticks = 0;
            } else { // TIMEOUT
                m->state = HB_FAULT;
                m->fault_flag = 1;
            }
            break;
            
        case HB_FAULT:
            // Sticky: no transitions out
            break;
    }
    
    m->in_step = 0;
    return (hb_result_t){ .state = m->state };
}

This is not code that implements a specification—it is the specification, transcribed into a different notation. Every line traces to a mathematical statement. Every invariant can be checked after every step.

Testing Contracts, Not Coverage

Traditional testing asks: “Did I execute every line?” Contract testing asks: “Did I verify every guarantee?”

For the heartbeat FSM, we define explicit contracts:

  • CONTRACT-1 (No False Early Fault): The FSM cannot transition to FAULT without first visiting OK
  • CONTRACT-2 (Sticky Fault): Once in FAULT, no input sequence exits FAULT
  • CONTRACT-3 (Liveness Detection): A sequence of BEAT inputs keeps the FSM in OK
  • CONTRACT-4 (Fault Detection): TIMEOUT after OK transitions to FAULT

Each contract has a corresponding test that verifies the guarantee holds. Fuzz testing with 100,000 random input sequences confirms invariants hold under arbitrary conditions—not just the happy paths a developer imagined.

Composition: Where Heartbeats Meet Baselines

The real power of this approach emerges in composition. The heartbeat FSM outputs inter-arrival times—the duration between successive beats. These durations become inputs to a statistical baseline detector (explored in Statistics as State Transitions).

Event → Heartbeat FSM → Δt → Baseline FSM → Anomaly?

Each FSM maintains its own contracts. The composition inherits guarantees from both: the heartbeat FSM provides liveness detection; the baseline FSM provides anomaly detection on timing patterns. Together, they answer: “Is the system alive, and is it behaving normally?”

This modular composition is only possible because each component is closed (no hidden dependencies), total (handles all inputs), and deterministic (reproducible behaviour). These properties, established mathematically and enforced in code, enable systems that support certification, audit, and verification processes.

Implications for Safety-Critical Design

The heartbeat pattern illustrates a broader principle: safety-critical systems benefit from modelling behaviour as explicit state machines with formal properties.

This approach supports regulatory frameworks like IEC 62304 for medical devices, where evidence of correctness is a certification requirement. A well-documented FSM with proven invariants provides exactly the kind of auditable evidence that regulators expect for Class C software.

More importantly, it provides engineering confidence. When a pacemaker’s liveness detector is a three-state FSM with five invariants, all verified by contract tests, the failure modes are enumerable. When it’s a tangle of timestamps and heuristics, the failure modes are limited only by imagination.

Conclusion

Liveness detection—proving that something exists in time—is a foundational problem in safety-critical systems. By modelling heartbeat detection as a deterministic finite state machine, we transform an intuitive timing check into a formal system with provable properties.

The key insights are structural: separate timing generation from state logic; use three-valued states that acknowledge uncertainty; make faults sticky to require explicit recovery; verify contracts rather than coverage.

This approach is explored in detail in the c-from-scratch educational project, where the Pulse module implements a complete heartbeat detector with full contract testing. The mathematical rigour transfers: what you prove on paper, you transcribe into code, and you verify through tests.

As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context. But for systems where knowing something is still there is as critical as knowing what it’s doing, the state machine approach offers a path from intuition to proof.

About the Author

William Murray is a Regenerative Systems Architect with 30 years of UNIX infrastructure experience, specializing in deterministic computing for safety-critical systems. Based in the Scottish Highlands, he operates SpeyTech and maintains several open-source projects including C-Sentinel and c-from-scratch.

Discuss This Perspective

For technical discussions or acquisition inquiries, contact SpeyTech directly.

Get in touch
← Back to Insights