Why EMA Beats SMA for Safety-Critical Monitoring

When building anomaly detection for safety-critical systems, a seemingly simple question arises: how do you compute a running average? The naive answer—a Simple Moving Average (SMA)—is what most engineers reach for first. It’s intuitive, well-understood, and appears in every statistics textbook.

But for systems where determinism, bounded memory, and mathematical closure matter, SMA has fundamental problems that make it unsuitable. The Exponential Moving Average (EMA) isn’t just an alternative—it’s the only form that satisfies the contracts safety-critical systems require.

This insight emerged from developing the Baseline module for c-from-scratch, an open-source educational project teaching deterministic C programming. The mathematical analysis has broader implications for anyone building monitoring systems in constrained environments.

The Hidden Assumptions of SMA

A Simple Moving Average over N samples is defined as:

μₜ = (1/N) · Σ xₖ   for k = t-N+1 to t

This looks clean on paper. In practice, it requires:

A buffer of N previous values — memory that grows with window size
Knowledge of buffer fill state — behaviour differs during startup
Shifting operations — oldest value must be removed as new value enters

These aren’t implementation details. They’re fundamental properties that violate the contracts deterministic systems need.
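To make those three requirements concrete, here is a minimal C sketch of a fixed-window SMA. It assumes a ring buffer (a circular write index instead of literal shifting) and uses hypothetical names `SMA_N`, `sma_t`, and `sma_update`. The N-element array, the fill counter, and the eviction step correspond directly to the buffer, fill-state, and removal costs listed above.

```c
#include <stddef.h>

/* Hypothetical fixed-window SMA; SMA_N is an illustrative window size. */
#define SMA_N 10

typedef struct {
    double buf[SMA_N];  /* last N samples: memory grows with N */
    double sum;         /* running sum of the samples currently in buf */
    size_t head;        /* next slot to overwrite (holds the oldest sample once full) */
    size_t filled;      /* how many slots hold real data */
} sma_t;

/* Push x and return the current average. */
double sma_update(sma_t *s, double x) {
    if (s->filled == SMA_N) {
        s->sum -= s->buf[s->head];   /* evict the oldest sample */
    } else {
        s->filled++;                 /* still in the startup phase */
    }
    s->buf[s->head] = x;
    s->sum += x;
    s->head = (s->head + 1) % SMA_N;
    return s->sum / (double)s->filled;  /* divisor depends on fill state */
}
```

Until `filled` reaches `SMA_N`, the divisor is smaller than N, which is the startup behaviour described above.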

Five Properties That Matter

For safety-critical monitoring, we need a system that is:

Closed — State at time t depends only on state at t-1 and current input
Bounded — Memory usage is O(1), independent of observation count
Deterministic — Same inputs always produce same outputs
Spike-resistant — No single input can corrupt the baseline
Recoverable — System can resume from any saved state

Let’s evaluate both approaches against these properties.

Simple Moving Average

Closure: ❌ Depends on N past samples
Memory: ❌ Requires buffer of size N
Determinism: ❌ Behaviour varies with buffer fill
Spike resistance: ❌ One outlier corrupts N steps
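C-friendly: ❌ Requires shifting buffers
Recoverability: ❌ Must save all N buffered samples and the fill state

Exponential Moving Average

Closure: ✓ Depends only on μₜ₋₁ and xₜ
Memory: ✓ O(1) — single value
Determinism: ✓ Pure recurrence relation
Spike resistance: ✓ Bounded influence per input
C-friendly: ✓ μ += α·(xₜ − μ): one subtract, one multiply, one add
Recoverability: ✓ Save a single value and resume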

SMA fails on every property that matters. This isn’t a matter of preference—it’s a matter of mathematical suitability.

The EMA Recurrence

The Exponential Moving Average is defined as:

μₜ = α·xₜ + (1 − α)·μₜ₋₁

Where 0 < α < 1 is the smoothing factor.

This single equation delivers all five properties:

Closure: The new mean depends only on the previous mean and current observation. No history buffer required.

Bounded memory: We store exactly one value (μₜ₋₁). Whether we’ve seen 10 observations or 10 million, memory usage is identical.

Deterministic: Given the same initial state and input sequence, the output is mathematically determined. No buffer fill states, no edge cases.

Spike resistance: A single outlier M can shift the mean by at most α·|M - μₜ₋₁|. This is a hard bound, not a probabilistic claim.

Recoverability: Save μₜ and you can resume. No buffer contents to serialise.
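As a sketch of closure and recoverability, the recurrence can be folded over observations in plain C. `ema_run` is a hypothetical helper. Because the entire state is the one double `mu`, processing a sequence in two halves with a saved checkpoint gives the same result as processing it in one pass (bit-identical on the same platform and compiler settings).

```c
#include <stddef.h>

/* Fold n observations into a running EMA. The whole state is the single
   double `mu`, so checkpointing means saving that one value. */
double ema_run(double mu, const double *xs, size_t n, double alpha) {
    for (size_t i = 0; i < n; i++) {
        /* mu_t = alpha * x_t + (1 - alpha) * mu_{t-1} */
        mu = alpha * xs[i] + (1.0 - alpha) * mu;
    }
    return mu;
}
```

A checkpoint is simply `double saved = ema_run(mu0, xs, k, alpha);`, and resuming is `ema_run(saved, xs + k, n - k, alpha)`.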

Why Spike Resistance Matters

Consider a monitoring system tracking CPU utilisation. Normal values hover around 50%. Suddenly, a measurement glitch reports 1000%.

With SMA (N=10): The spike enters the window with weight 1/10, shifting the average by (1000 - 50)/10 = 95 percentage points and holding it there for the next 10 steps. Your baseline is corrupted, and you’ll see false anomalies (or miss real ones) until the spike ages out.

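With EMA (α=0.2): The spike shifts your mean by 0.2 × (1000 - 50) = 190 percentage points. Significant, but bounded. And the influence decays exponentially: assuming readings return to 50%, each step keeps 80% of the previous impact, so by step 5 the residual shift is about 41% of the initial 190, and the spike’s own weight in the mean has fallen to α(1 − α)⁴ ≈ 8%.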

Impact after step 1: 190.0 (spike just occurred)
Impact after step 2: 152.0 (80% of previous)
Impact after step 3: 121.6
Impact after step 4:  97.3
Impact after step 5:  77.8

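This bounded, geometrically decaying influence is the guarantee we want. SMA’s influence is bounded too, but it stays at full weight for N steps and then vanishes in a single step, so the baseline jumps twice: once when the spike enters the window and again when it leaves.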
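The impact figures above can be reproduced with a short C sketch, assuming every reading after the spike returns to exactly the 50% baseline so that only the spike moves the mean. `ema_spike_impact` and `sma_spike_impact` are hypothetical helpers returning the shift in the mean k steps after the spike (k = 1 is the step on which the spike arrives).

```c
/* Shift in the EMA mean k steps after a single spike, assuming all other
   readings equal the baseline mu. */
double ema_spike_impact(double spike, double mu, double alpha, int k) {
    double impact = alpha * (spike - mu);   /* step 1: alpha * (M - mu) */
    for (int i = 1; i < k; i++) {
        impact *= (1.0 - alpha);            /* geometric decay thereafter */
    }
    return impact;
}

/* Shift in an n-sample SMA under the same assumption: full 1/n weight
   while the spike is inside the window, zero once it ages out. */
double sma_spike_impact(double spike, double mu, int n, int k) {
    return (k <= n) ? (spike - mu) / n : 0.0;
}
```

With M = 1000, μ = 50 and α = 0.2, the EMA helper yields 190.0 at k = 1 and about 77.8 at k = 5, matching the list above. The SMA helper (N = 10) yields 95.0 for k = 1 through 10 and 0.0 afterwards: a flat plateau that ends in a single step.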

The Effective Window Interpretation

A common objection: “But I need a specific window size for my use case.”

EMA provides comparable behaviour. For a smoothing factor α, the effective window size is N = 2/α − 1, or approximately 2/α. An EMA with α=0.1 behaves similarly to a 20-sample SMA, but without the buffer.

α value	Effective window	Use case
0.5	~4 samples	Fast response
0.2	~10 samples	Balanced
0.1	~20 samples	Smooth baseline
0.05	~40 samples	Slow-moving systems
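The window relation comes from matching the average age of the samples each filter uses: an N-sample SMA’s samples are (N − 1)/2 steps old on average and an EMA’s are (1 − α)/α, which are equal when α = 2/(N + 1). A small C sketch, with hypothetical helper names:

```c
/* Convert between an SMA window N and an EMA smoothing factor alpha by
   matching average sample age: (1 - alpha) / alpha == (N - 1) / 2. */
double alpha_for_window(double n) {
    return 2.0 / (n + 1.0);
}

double window_for_alpha(double alpha) {
    return 2.0 / alpha - 1.0;   /* approximately 2 / alpha for small alpha */
}
```

For α = 0.1 this gives N = 19, which the table rounds to ~20; the 2/α rule of thumb overestimates by one sample.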

The key insight: EMA is not an approximation of SMA. It’s a different model with better properties.

Variance Tracking

Mean alone is insufficient for anomaly detection. We also need variance to answer: “How unusual is this observation?”

The same exponential structure applies:

deviationₜ = xₜ − μₜ₋₁
σₜ² = α·deviationₜ² + (1 − α)·σₜ₋₁²

Critical detail: deviation uses the previous mean (μₜ₋₁), not the updated mean. This prevents the current observation from influencing its own anomaly score.
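A minimal sketch of the paired update, assuming a hypothetical `ew_stats_t` that holds only μ and σ². The deviation is captured before the mean is overwritten, which is exactly the critical detail above.

```c
/* Hypothetical exponentially weighted mean/variance pair. */
typedef struct {
    double mean;      /* mu */
    double variance;  /* sigma^2 */
} ew_stats_t;

void ew_stats_update(ew_stats_t *s, double x, double alpha) {
    double deviation = x - s->mean;               /* x_t - mu_{t-1}, taken first */
    s->mean = alpha * x + (1.0 - alpha) * s->mean;
    s->variance = alpha * deviation * deviation
                + (1.0 - alpha) * s->variance;    /* sigma_t^2 */
}
```

Starting from μ = σ² = 0 with α = 0.5, feeding 2 and then 0 leaves μ = 0.5 and σ² = 1.5.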

With mean and variance, we can compute a z-score:

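zₜ = |xₜ − μₜ₋₁| / σₜ₋₁

As with the deviation, the denominator uses the pre-update value, so the observation being scored cannot inflate the spread it is measured against.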

This answers: “How many standard deviations from normal?” A z-score of 3 means the observation is statistically rare (roughly 0.3% probability for normally distributed data).

The Complete State Machine

Combining these elements, our minimal statistical state is:

Sₜ = (μₜ, σₜ², nₜ, qₜ)

Where:

μₜ = exponentially-weighted mean
σₜ² = exponentially-weighted variance
nₜ = observation count
qₜ = FSM state ∈ {LEARNING, STABLE, DEVIATION}

This is the smallest state that enables quantified anomaly detection. Anything less is insufficient; anything that grows with the observation count, such as a history buffer, would violate boundedness.
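A sketch of how the FSM component qₜ might be driven from nₜ and the z-score. `fsm_next`, `MIN_SAMPLES`, and `THRESHOLD` are hypothetical, and the values 10 and 3.0 are illustrative rather than taken from the Baseline module.

```c
#include <stdint.h>

/* Hypothetical constants; the values are illustrative only. */
#define MIN_SAMPLES 10u
#define THRESHOLD   3.0

typedef enum { LEARNING, STABLE, DEVIATION } fsm_state_t;

/* Next FSM state q_t from the observation count n_t and the z-score. */
fsm_state_t fsm_next(uint32_t count, double z) {
    if (count < MIN_SAMPLES) {
        return LEARNING;              /* too little history to trust sigma */
    }
    return (z > THRESHOLD) ? DEVIATION : STABLE;
}
```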

Design Property: Closure

The state Sₜ = f(Sₜ₋₁, xₜ) is fully determined by the previous state and current observation. No external history is required.

Implementation in Pure C

The EMA update translates directly to C:

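#include <math.h>
#include <stdint.h>

/* Illustrative tuning values (assumptions; the module may use others). */
#define MIN_SAMPLES 10u
#define EPSILON     1e-12
#define THRESHOLD   3.0

typedef enum { LEARNING, STABLE, DEVIATION } baseline_state_t;

typedef struct {
    double mean;              /* mu: exponentially weighted mean */
    double variance;          /* sigma^2: exponentially weighted variance */
    uint32_t count;           /* n: observations seen */
    baseline_state_t state;   /* q: FSM state; initialise to LEARNING */
} baseline_t;

void baseline_update(baseline_t *b, double x, double alpha) {
    /* Score against the PREVIOUS mean and variance, before updating. */
    double deviation = x - b->mean;
    int scored = (b->count >= MIN_SAMPLES && b->variance > EPSILON);
    double z = scored ? fabs(deviation) / sqrt(b->variance) : 0.0;

    /* EMA updates of mean and variance */
    b->mean = alpha * x + (1.0 - alpha) * b->mean;
    b->variance = alpha * (deviation * deviation) +
                  (1.0 - alpha) * b->variance;
    if (b->count < UINT32_MAX) {
        b->count++;           /* saturate instead of wrapping to 0 */
    }

    /* State transition logic */
    if (scored) {
        b->state = (z > THRESHOLD) ? DEVIATION : STABLE;
    }
}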

No allocations. No buffers. No dependencies beyond the C math library. Just a few floating-point operations per sample, which a microcontroller can execute in microseconds.

Practical Applications

This pattern applies wherever you need anomaly detection with deterministic properties:

Embedded systems: Heartbeat monitoring, sensor validation, watchdog triggers. Memory constraints make SMA impractical; EMA fits in registers.

DevOps/SRE: Latency monitoring, error rate tracking, capacity planning. The bounded spike influence prevents alert storms from single bad measurements.

Financial systems: Transaction monitoring, fraud detection. Deterministic behaviour supports audit requirements.

Medical devices: Vital sign monitoring, dosage tracking. The mathematical properties support regulatory evidence for certification.

Conclusion

Simple Moving Average is a pedagogical tool—useful for explaining concepts, unsuitable for production systems where determinism matters.

Exponential Moving Average provides the mathematical properties that safety-critical monitoring requires: closure, bounded memory, deterministic behaviour, spike resistance, and recoverability. These aren’t incremental improvements; they’re categorical differences that determine whether a system can be certified, verified, and trusted.

The implementation is trivial. The mathematics is elegant. The properties are exactly what we need.

For the complete derivation and working C implementation, see c-from-scratch Module 2: Baseline on GitHub.

Key Takeaway

EMA isn't an optimisation of SMA—it's the only form that satisfies the contracts safety-critical systems require: closure, bounded memory, determinism, spike resistance, and recoverability.

As with any architectural approach, suitability depends on system requirements and the specific monitoring context. EMA assumes that recent observations are more relevant than distant ones—an assumption that holds for most anomaly detection scenarios but should be validated for your use case.
