Systems Architecture

Why EMA Beats SMA for Safety-Critical Monitoring

How exponential moving averages provide the mathematical properties that safety-critical systems require

Published: January 7, 2026 · Reading time: 8 min

[Figure: EMA vs SMA Spike Response Comparison]

When building anomaly detection for safety-critical systems, a seemingly simple question arises: how do you compute a running average? The naive answer—a Simple Moving Average (SMA)—is what most engineers reach for first. It’s intuitive, well-understood, and appears in every statistics textbook.

But for systems where determinism, bounded memory, and mathematical closure matter, SMA has fundamental problems that make it unsuitable. The Exponential Moving Average (EMA) isn’t just an alternative—it’s the only form that satisfies the contracts safety-critical systems require.

This insight emerged from developing the Baseline module for c-from-scratch, an open-source educational project teaching deterministic C programming. The mathematical analysis has broader implications for anyone building monitoring systems in constrained environments.

The Hidden Assumptions of SMA

A Simple Moving Average over N samples is defined as:

μₜ = (1/N) · Σ xₖ   for k = t-N+1 to t

This looks clean on paper. In practice, it requires:

  • A buffer of N previous values — memory that grows with window size
  • Knowledge of buffer fill state — behaviour differs during startup
  • Shifting operations — oldest value must be removed as new value enters

These aren’t implementation details. They’re fundamental properties that violate the contracts deterministic systems need.
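To make those requirements concrete, here is a minimal sketch of the state an SMA has to carry. The names `sma_t`, `sma_update` and `SMA_WINDOW` are hypothetical and chosen for illustration; a ring buffer with a running sum is used here in place of literal shifting, which is one common way to implement the same window.

```c
#include <stddef.h>

#define SMA_WINDOW 10  /* hypothetical window size N */

/* Everything an SMA must carry between samples. */
typedef struct {
    double buf[SMA_WINDOW]; /* the last N samples */
    size_t head;            /* slot to overwrite next (the oldest once full) */
    size_t filled;          /* valid samples so far; below N during startup */
    double sum;             /* running sum of the valid samples */
} sma_t;

/* Push one sample and return the current average. */
double sma_update(sma_t *s, double x) {
    if (s->filled == SMA_WINDOW) {
        s->sum -= s->buf[s->head];  /* evict the oldest sample */
    } else {
        s->filled++;                /* startup: the divisor is still growing */
    }
    s->buf[s->head] = x;
    s->sum += x;
    s->head = (s->head + 1) % SMA_WINDOW;
    return s->sum / (double)s->filled;
}
```

Starting from a zero-initialised `sma_t`, the struct needs N doubles plus two indices and a sum, and the startup branch changes the divisor until the window is full. A running sum can also accumulate floating-point drift over long runs, which is one more thing a verifier would have to reason about.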

Five Properties That Matter

For safety-critical monitoring, we need a system that is:

  1. Closed — State at time t depends only on state at t-1 and current input
  2. Bounded — Memory usage is O(1), independent of observation count
  3. Deterministic — Same inputs always produce same outputs
  4. Spike-resistant — No single input can corrupt the baseline
  5. Recoverable — System can resume from any saved state

Let’s evaluate both approaches against these properties.

Simple Moving Average
  • Closure: ❌ Depends on N past samples
  • Memory: ❌ Requires buffer of size N
  • Determinism: ❌ Behaviour varies with buffer fill
  • Spike resistance: ❌ One outlier corrupts N steps
  • C-friendly: ❌ Requires shifting buffers

Exponential Moving Average
  • Closure: ✓ Depends only on μₜ₋₁ and xₜ
  • Memory: ✓ O(1) — single value
  • Determinism: ✓ Pure recurrence relation
  • Spike resistance: ✓ Bounded influence per input
  • C-friendly: ✓ One multiply + one add

SMA fails on every property that matters. This isn’t a matter of preference—it’s a matter of mathematical suitability.

The EMA Recurrence

The Exponential Moving Average is defined as:

μₜ = α·xₜ + (1 − α)·μₜ₋₁

Where 0 < α < 1 is the smoothing factor.

This single equation delivers all five properties:

Closure: The new mean depends only on the previous mean and current observation. No history buffer required.

Bounded memory: We store exactly one value (μₜ₋₁). Whether we’ve seen 10 observations or 10 million, memory usage is identical.

Deterministic: Given the same initial state and input sequence, the output is mathematically determined. No buffer fill states, no edge cases.

Spike resistance: A single outlier M shifts the mean by exactly α·(M − μₜ₋₁), so the size of the shift never exceeds α·|M − μₜ₋₁|. This is a hard bound, not a probabilistic claim.

Recoverability: Save μₜ and you can resume. No buffer contents to serialise.
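Closure and recoverability can be sketched in a few lines. The function names `ema_update` and `ema_run` are hypothetical, and seeding the mean with a starting value is an assumption, since the recurrence itself does not say how μ₀ is chosen.

```c
/* One EMA step: the new mean from the previous mean and one sample. */
double ema_update(double prev_mean, double x, double alpha) {
    return alpha * x + (1.0 - alpha) * prev_mean;
}

/* Fold n samples into a starting mean. The mean is the entire state. */
double ema_run(double mean, const double *xs, int n, double alpha) {
    for (int i = 0; i < n; i++) {
        mean = ema_update(mean, xs[i], alpha);
    }
    return mean;
}
```

Running six samples in one call, or running three, saving the returned mean, and resuming with the remaining three, performs the same sequence of operations and so arrives at the same mean. The saved checkpoint is a single `double`.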

Why Spike Resistance Matters

Consider a monitoring system tracking CPU utilisation. Normal values hover around 50%. Suddenly, a measurement glitch reports 1000%.

With SMA (N=10): The spike enters the window at weight 1/N, lifting the average from 50 to 145, and it keeps that full weight for all 10 steps it remains in the window. Your baseline is corrupted, and you’ll see false anomalies (or miss real ones) until the spike ages out.

With EMA (α=0.2): The spike shifts your mean by 0.2 × (1000 − 50) = 190. Significant, but bounded. And the influence decays exponentially: each later step retains 80% of the previous shift, so by step 5 the residual is 77.8, about 8% of the original 950-point deviation.

Impact after step 1: 190.0 (spike just occurred)
Impact after step 2: 152.0 (80% of previous)
Impact after step 3: 121.6
Impact after step 4:  97.3
Impact after step 5:  77.8

This bounded influence is a safety guarantee that SMA cannot provide.
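The decay table above can be reproduced with a short sketch. Both function names are hypothetical, and both assume that every sample before and after the spike sits exactly at the baseline and that the SMA window is already full.

```c
/* Residual shift in an EMA mean k steps after a spike, where k = 1 is the
 * step on which the spike arrives. */
double ema_spike_residual(double baseline, double spike, double alpha, int k) {
    double shift = alpha * (spike - baseline);  /* shift at step 1 */
    for (int i = 1; i < k; i++) {
        shift *= (1.0 - alpha);                 /* each step keeps (1 - alpha) */
    }
    return shift;
}

/* Residual shift in an n-sample SMA k steps after the same spike:
 * constant while the spike is in the window, then zero. */
double sma_spike_residual(double baseline, double spike, int n, int k) {
    return (k >= 1 && k <= n) ? (spike - baseline) / (double)n : 0.0;
}
```

With baseline 50, spike 1000 and α = 0.2, `ema_spike_residual` gives 190 at step 1 and about 77.8 at step 5. `sma_spike_residual` with n = 10 stays at 95 through step 10 and drops to 0 at step 11. The EMA's first-step shift is set entirely by α, so a smaller α trades responsiveness for a smaller initial shift.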

The Effective Window Interpretation

A common objection: “But I need a specific window size for my use case.”

EMA provides equivalent behaviour. For a smoothing factor α, the effective window size is approximately 2/α. An EMA with α=0.1 behaves similarly to a 20-sample SMA—but without the buffer.

α value   Effective window   Use case
0.5       ~4 samples         Fast response
0.2       ~10 samples        Balanced
0.1       ~20 samples        Smooth baseline
0.05      ~40 samples        Slow-moving systems

The key insight: EMA is not an approximation of SMA. It’s a different model with better properties.
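If you are starting from a window size rather than a smoothing factor, the 2/α rule of thumb can be inverted. The helper names below are hypothetical. Note that another widely used convention is α = 2/(N + 1), which gives nearly the same answer for large N.

```c
/* Approximate effective window, in samples, for a smoothing factor,
 * using the 2/alpha rule of thumb. Assumes 0 < alpha < 1. */
double ema_effective_window(double alpha) {
    return 2.0 / alpha;
}

/* Inverse of the same rule: a smoothing factor for a target window.
 * Assumes window > 2 so that the result stays below 1. */
double ema_alpha_for_window(double window) {
    return 2.0 / window;
}
```

For example, a target window of 10 samples maps to α = 0.2, matching the "Balanced" row.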

Variance Tracking

Mean alone is insufficient for anomaly detection. We also need variance to answer: “How unusual is this observation?”

The same exponential structure applies:

deviationₜ = xₜ − μₜ₋₁
σₜ² = α·deviationₜ² + (1 − α)·σₜ₋₁²

Critical detail: deviation uses the previous mean (μₜ₋₁), not the updated mean. This prevents the current observation from influencing its own anomaly score.

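With mean and variance, we can compute a z-score. Like the deviation, it is scored against the state from before the update: because σₜ² ≥ α·deviationₜ², dividing by the updated σₜ would cap the score at 1/√α (about 2.24 for α = 0.2), and a threshold of 3 could never fire.

zₜ = |xₜ − μₜ₋₁| / σₜ₋₁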

This answers: “How many standard deviations from normal?” A z-score of 3 means the observation is statistically rare (roughly 0.3% probability for normally distributed data).

The Complete State Machine

Combining these elements, our minimal statistical state is:

Sₜ = (μₜ, σₜ², nₜ, qₜ)

Where:

  • μₜ = exponentially-weighted mean
  • σₜ² = exponentially-weighted variance
  • nₜ = observation count
  • qₜ = FSM state ∈ {LEARNING, STABLE, DEVIATION}

This is the smallest state that enables quantified anomaly detection. Anything less is insufficient. Anything that grows with the number of observations violates boundedness.

Design Property: Closure

The state Sₜ = f(Sₜ₋₁, xₜ) is fully determined by the previous state and current observation. No external history is required.

Implementation in Pure C

The EMA update translates directly to C:

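#include <math.h>    // sqrt, fabs
#include <stdint.h>  // uint32_t, UINT32_MAX

// Illustrative tuning values (not from the project); choose them per signal.
#define MIN_SAMPLES 20u   // samples to observe before leaving LEARNING
#define EPSILON     1e-9  // variance floor that guards the division
#define THRESHOLD   3.0   // z-score above which a sample is a DEVIATION

typedef enum { LEARNING, STABLE, DEVIATION } baseline_state_t;

typedef struct {
    double mean;
    double variance;
    uint32_t count;
    baseline_state_t state;
} baseline_t;

void baseline_update(baseline_t *b, double x, double alpha) {
    // Capture pre-update statistics so x cannot influence its own score
    double deviation = x - b->mean;
    double prev_variance = b->variance;

    b->mean = alpha * x + (1.0 - alpha) * b->mean;
    b->variance = alpha * (deviation * deviation) +
                  (1.0 - alpha) * prev_variance;
    if (b->count < UINT32_MAX) {
        b->count++;  // saturate instead of wrapping back below MIN_SAMPLES
    }

    // State transition logic: the state is left unchanged until enough
    // samples have been seen and the previous variance is usable
    if (b->count >= MIN_SAMPLES && prev_variance > EPSILON) {
        double sigma = sqrt(prev_variance);
        double z = fabs(deviation) / sigma;
        b->state = (z > THRESHOLD) ? DEVIATION : STABLE;
    }
}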

No allocations. No buffers. No dependencies beyond the standard math library. Just arithmetic that a microcontroller can execute in microseconds.
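To see the state machine move through its three states, here is a self-contained usage sketch. Everything prefixed `demo_` or `DEMO_` is hypothetical, including the tuning values and the choice to seed the mean with a starting value. It is not the c-from-scratch API. As a variation, it compares squared quantities, which is equivalent to testing z > threshold but avoids the square root and the division.

```c
#include <stdint.h>

typedef enum { DEMO_LEARNING, DEMO_STABLE, DEMO_DEVIATION } demo_state_t;

typedef struct {
    double mean;
    double variance;
    uint32_t count;
    demo_state_t state;
} demo_baseline_t;

/* Hypothetical tuning values for the demonstration. */
#define DEMO_ALPHA       0.1
#define DEMO_MIN_SAMPLES 5u
#define DEMO_EPSILON     1e-9
#define DEMO_THRESHOLD   3.0

/* Seed the mean with a starting value and begin in LEARNING. */
void demo_init(demo_baseline_t *b, double seed_mean) {
    b->mean = seed_mean;
    b->variance = 0.0;
    b->count = 0;
    b->state = DEMO_LEARNING;
}

/* Feed one sample; returns the state after the update. */
demo_state_t demo_step(demo_baseline_t *b, double x) {
    double deviation = x - b->mean;      /* against the previous mean */
    double prev_variance = b->variance;  /* against the previous variance */

    b->mean = DEMO_ALPHA * x + (1.0 - DEMO_ALPHA) * b->mean;
    b->variance = DEMO_ALPHA * deviation * deviation
                + (1.0 - DEMO_ALPHA) * prev_variance;
    if (b->count < UINT32_MAX) {
        b->count++;
    }

    if (b->count >= DEMO_MIN_SAMPLES && prev_variance > DEMO_EPSILON) {
        /* z > T  is equivalent to  deviation^2 > T^2 * variance */
        double limit = DEMO_THRESHOLD * DEMO_THRESHOLD * prev_variance;
        b->state = (deviation * deviation > limit) ? DEMO_DEVIATION
                                                   : DEMO_STABLE;
    }
    return b->state;
}
```

Seeded at 50 and fed 51, 49, 51, 49, the baseline stays in LEARNING. A fifth sample of 50 moves it to STABLE, a following 51 keeps it there, and a sample of 80 moves it to DEVIATION.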

Practical Applications

This pattern applies wherever you need anomaly detection with deterministic properties:

Embedded systems: Heartbeat monitoring, sensor validation, watchdog triggers. Memory constraints make SMA impractical; EMA fits in registers.

DevOps/SRE: Latency monitoring, error rate tracking, capacity planning. The bounded spike influence prevents alert storms from single bad measurements.

Financial systems: Transaction monitoring, fraud detection. Deterministic behaviour supports audit requirements.

Medical devices: Vital sign monitoring, dosage tracking. The mathematical properties support regulatory evidence for certification.

Conclusion

Simple Moving Average is a pedagogical tool—useful for explaining concepts, unsuitable for production systems where determinism matters.

Exponential Moving Average provides the mathematical properties that safety-critical monitoring requires: closure, bounded memory, deterministic behaviour, spike resistance, and recoverability. These aren’t incremental improvements; they’re categorical differences that determine whether a system can be certified, verified, and trusted.

The implementation is trivial. The mathematics is elegant. The properties are exactly what we need.

For the complete derivation and working C implementation, see c-from-scratch Module 2: Baseline on GitHub.

Key Takeaway

EMA isn't an optimisation of SMA—it's the only form that satisfies the contracts safety-critical systems require: closure, bounded memory, determinism, spike resistance, and recoverability.

As with any architectural approach, suitability depends on system requirements and the specific monitoring context. EMA assumes that recent observations are more relevant than distant ones—an assumption that holds for most anomaly detection scenarios but should be validated for your use case.

About the Author

William Murray is a Regenerative Systems Architect with 30 years of UNIX infrastructure experience, specializing in deterministic computing for safety-critical systems. Based in the Scottish Highlands, he operates SpeyTech and maintains several open-source projects including C-Sentinel and c-from-scratch.
