Formal Methods

Statistics as State Transitions

How finite state machines can provide a deterministic foundation for anomaly detection

Published: January 7, 2026 21:00

Statistics and state machines rarely appear in the same sentence. Statistics belongs to the world of probability distributions, confidence intervals, and hypothesis testing. State machines belong to the world of formal verification, protocol design, and hardware controllers.

But what if statistical monitoring could be reframed as state transitions? What if anomaly detection wasn’t a probabilistic judgement call, but a deterministic function from input to state?

This reframing isn’t just theoretical elegance. It’s the foundation for building monitoring systems that can be verified, tested, and certified—properties that matter when the system being monitored is safety-critical.

The Problem with Statistical Monitoring

Traditional anomaly detection treats statistics as an oracle. You feed it data, it returns a judgement: normal or abnormal, with perhaps a confidence score attached. The internal state is opaque. The decision boundary is learned, not specified. The behaviour under edge cases is undefined.

This works adequately for many applications. For safety-critical systems, it creates fundamental problems:

Non-reproducibility: Run the same detector twice on the same data, and you may get different results—especially if the detector uses adaptive thresholds or online learning with floating-point accumulation errors.

Unbounded state: Many statistical methods require unbounded history. A sliding window of N observations requires O(N) memory. Store-everything approaches grow without limit.

Opaque transitions: When the system moves from “normal” to “alert”, what exactly triggered the transition? In a learned model, the answer may be “a complex combination of 47 features weighted by coefficients from last Tuesday’s training run.”

Unverifiable contracts: What are the formal guarantees? Under what conditions will the detector definitely fire? Under what conditions will it definitely not fire? For most statistical systems, these questions have no precise answers.

The State Machine Reframing

A finite state machine (FSM) has three components: a finite set of states, a transition function, and an initial state. Given the current state and an input, the transition function deterministically produces the next state.

The insight is that statistical monitoring can be cast in exactly this form:

State:      S = (μ, σ², n, q)
Input:      x ∈ ℝ (a single observation)
Transition: S' = f(S, x)

Where:

  • μ is the running mean (exponentially weighted)
  • σ² is the running variance (exponentially weighted)
  • n is the observation count
  • q is the discrete state: LEARNING, STABLE, or DEVIATION

The transition function f is completely deterministic. Given the same state S and input x, it produces exactly the same next state S'. No randomness. No hidden variables. No dependence on history beyond what’s encoded in the state.
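
As a concrete sketch, the state tuple maps directly onto a small C struct. The names below are illustrative rather than taken from any particular implementation:

#include <stddef.h>

/* Discrete state q */
typedef enum {
    STATE_LEARNING,
    STATE_STABLE,
    STATE_DEVIATION
} baseline_q_t;

/* Full FSM state S = (mu, sigma^2, n, q) */
typedef struct {
    double       mu;        /* running mean (exponentially weighted)     */
    double       variance;  /* running variance (exponentially weighted) */
    size_t       n;         /* observation count                         */
    baseline_q_t q;         /* discrete state                            */
} baseline_state_t;

/* Initial state: nothing learned yet, so we start in LEARNING. */
static baseline_state_t baseline_init(void) {
    baseline_state_t s = { 0.0, 0.0, 0, STATE_LEARNING };
    return s;
}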

Design Property: Closure

The state Sₜ₊₁ = f(Sₜ, xₜ) is fully determined by the previous state and current observation. No external history is required.

This property—closure—is what makes the system a valid state machine rather than a statistical approximation wearing FSM clothing.

The EMA Foundation

The key to achieving closure is the Exponential Moving Average (EMA). Unlike a Simple Moving Average (SMA), which requires a buffer of the N most recent values, the EMA computes the new mean from just the previous mean and the current observation:

μₜ = α·xₜ + (1 - α)·μₜ₋₁

This single equation delivers several properties simultaneously:

O(1) memory: Store one value (μ), not N values.

O(1) computation: One multiply, one add—not a sum over N elements.

Spike resistance: A single outlier of magnitude M can shift the mean by at most α·M. With α = 0.1, even a massive spike has bounded impact.

Recency weighting: Recent observations influence the mean more than ancient ones, which is usually what anomaly detection requires.

The same exponential structure extends to variance:

σₜ² = α·(xₜ - μₜ₋₁)² + (1 - α)·σₜ₋₁²

Note the careful ordering: the deviation uses the previous mean, not the updated mean. This prevents the current observation from influencing its own anomaly score.
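
In C, the combined update is a handful of lines. This is a minimal sketch building on the struct above, with ALPHA as an illustrative default; note that the deviation is taken against the stored mean before either statistic is overwritten:

#define ALPHA 0.1  /* smoothing factor; illustrative default */

/* One EMA step: advances the continuous part of the state.
 * The deviation uses the previous mean, so the variance update
 * must happen before mu is overwritten. */
static void baseline_ema_update(baseline_state_t *s, double x) {
    double dev = x - s->mu;  /* deviation from mu[t-1] */
    s->variance = ALPHA * dev * dev + (1.0 - ALPHA) * s->variance;
    s->mu       = ALPHA * x + (1.0 - ALPHA) * s->mu;
    s->n++;
}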

State Transitions as Contracts

With mean and variance tracked, anomaly detection becomes a z-score calculation:

z = |xₜ - μₜ₋₁| / σₜ₋₁

The z-score answers: “How many standard deviations from expected is this observation?” A z-score of 3 indicates an observation that would occur roughly 0.3% of the time under a normal distribution.
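
For a standard normal variable, that figure follows directly from the two-sided tail probability:

P(|Z| > 3) = 2·(1 − Φ(3)) ≈ 2·(1 − 0.99865) ≈ 0.0027

where Φ is the standard normal CDF.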

The state machine’s discrete state q transitions based on this z-score:

LEARNING → STABLE     when n ≥ min_samples and σ² > ε
STABLE → DEVIATION    when z > threshold
DEVIATION → STABLE    when z ≤ threshold
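
In C, these rules fold into a single deterministic step. A sketch building on the functions above; THRESHOLD, MIN_SAMPLES, and EPSILON are illustrative parameters:

#include <math.h>

#define EPSILON     1e-9  /* variance below this is treated as zero */
#define THRESHOLD   3.0   /* z-score trigger; illustrative          */
#define MIN_SAMPLES 30    /* learning period; illustrative          */

/* Deterministic transition S' = f(S, x), including the discrete state q. */
static void baseline_step(baseline_state_t *s, double x) {
    /* z-score against the previous mean and variance */
    double z = 0.0;
    if (s->variance > EPSILON)
        z = fabs(x - s->mu) / sqrt(s->variance);

    baseline_ema_update(s, x);  /* advance mu, variance, n */

    switch (s->q) {
    case STATE_LEARNING:
        if (s->n >= MIN_SAMPLES && s->variance > EPSILON)
            s->q = STATE_STABLE;
        break;
    case STATE_STABLE:
        if (z > THRESHOLD)
            s->q = STATE_DEVIATION;
        break;
    case STATE_DEVIATION:
        if (z <= THRESHOLD)
            s->q = STATE_STABLE;
        break;
    }
}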

These transitions are not heuristics. They’re contracts that can be tested:

CONTRACT-1 (Convergence): For stationary input, μₜ converges to the true mean E[X].

CONTRACT-2 (Sensitivity): Deviations from the mean can be detected within O(1/α) observations.

CONTRACT-3 (Stability): The false positive rate is bounded by P(|Z| > threshold) for the underlying distribution.

CONTRACT-4 (Spike Resistance): A single outlier M shifts the mean by at most α·M.

Each contract is a theorem about the system’s behaviour. Each can be verified through testing or formal proof.
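
CONTRACT-4, for instance, reduces to an executable check. A minimal sketch reusing the functions above, with M read as the outlier’s deviation from the prior mean:

#include <assert.h>

/* CONTRACT-4: a single outlier shifts the mean by at most ALPHA * M. */
static void test_spike_resistance(void) {
    baseline_state_t s = baseline_init();
    for (int i = 0; i < 100; i++)
        baseline_step(&s, 10.0);   /* settle near mu = 10 */

    double mu_before = s.mu;
    double M = 1000.0 - mu_before; /* outlier deviation   */
    baseline_step(&s, 1000.0);     /* inject the spike    */

    assert(fabs(s.mu - mu_before) <= ALPHA * M + 1e-9);
}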

Why This Matters for Safety-Critical Systems

The reframing from “statistical monitoring” to “state machine with contracts” has practical implications:

Certification evidence: Safety standards like DO-178C and IEC 62304 require evidence of correct behaviour. A state machine with proven contracts provides exactly that evidence. A machine learning model with “good accuracy on the test set” does not.

Reproducible debugging: When an anomaly is detected (or missed), the entire state history can be reconstructed. Given the initial state and the input sequence, the final state is mathematically determined.

Bounded resource usage: O(1) memory and O(1) computation per step means the monitor can run on constrained embedded systems—exactly where safety-critical monitoring is often needed.

Compositional reasoning: State machines compose. A heartbeat monitor (detecting existence in time) can feed its inter-arrival times to a baseline monitor (detecting normality in value). The composition is itself a state machine with well-defined properties.

The Implementation Reality

Theory is necessary but not sufficient. Real implementations must handle cases that pure mathematics ignores:

Division by zero: The z-score calculation divides by σ. When variance is zero (or numerically indistinguishable from zero), this is undefined. Real implementations need an explicit guard:

if (variance <= EPSILON) {
    z = 0.0;  // Cannot compute meaningful z-score
} else {
    z = fabs(x - mu) / sqrt(variance);
}

Floating-point accumulation: Even deterministic calculations can accumulate error over millions of observations. The implementation must either bound execution length or use numerical techniques that limit drift.

State initialisation: What’s “normal” before we’ve seen any data? The LEARNING state exists precisely to handle this—the system must accumulate sufficient evidence before making judgements.

These aren’t bugs to work around. They’re the reality of finite machines computing with finite precision. A proper implementation addresses them explicitly rather than hoping they don’t occur.

From Monitoring to Composition

The real power of the state machine framing emerges in composition. Consider a pipeline:

events → Pulse → Δt → Baseline → deviation?

Pulse (from Module 1 of c-from-scratch) detects heartbeat events and outputs inter-arrival times. Baseline monitors those times for anomalies. The result: timing anomaly detection without manually configured thresholds.

Both components are closed, total, deterministic FSMs. Their composition is also a closed, total, deterministic FSM. The properties proved for each component extend to the composition.

This is modular verification: prove components correct, compose them, and know the composition inherits correctness. It’s the same principle that makes well-designed software systems tractable.
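
A sketch of that pipeline in C, with a deliberately simplified Pulse stage. The real c-from-scratch interfaces will differ; pulse_step here is hypothetical:

/* Hypothetical Pulse stage: emits inter-arrival times between events. */
typedef struct {
    double last_t;
    int    seen_first;
} pulse_state_t;

/* Returns 1 and writes the inter-arrival time for every event after
 * the first; returns 0 for the very first event. */
static int pulse_step(pulse_state_t *p, double t, double *dt_out) {
    int ready = p->seen_first;
    if (ready)
        *dt_out = t - p->last_t;
    p->last_t = t;
    p->seen_first = 1;
    return ready;
}

/* Composition: events -> Pulse -> dt -> Baseline. Still a closed,
 * deterministic FSM; deviations surface through b->q. */
static void pipeline_step(pulse_state_t *p, baseline_state_t *b, double t) {
    double dt;
    if (pulse_step(p, t, &dt))
        baseline_step(b, dt);
}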


As with any architectural approach, suitability depends on system requirements and monitoring context. The FSM framing assumes that anomaly detection can be expressed as state transitions over scalar observations—an assumption that holds for many monitoring scenarios but should be validated for specific use cases.


About the Author

William Murray is a Regenerative Systems Architect with 30 years of UNIX infrastructure experience, specializing in deterministic computing for safety-critical systems. Based in the Scottish Highlands, he operates SpeyTech and maintains several open-source projects including C-Sentinel and c-from-scratch.

Discuss This Perspective

For technical discussions or acquisition inquiries, contact SpeyTech directly.
