AI Architecture

Why Floating Point Is Dangerous: The Case for Deterministic AI in C

When 'mostly reproducible' isn't good enough for systems that matter

Published: January 14, 2026, 20:00
Reading time: 14 min

The Problem Nobody Talks About

Consider a medical device company developing an ML model to detect cardiac abnormalities. The model passes all tests—95% accuracy, low latency, no memory leaks. It runs perfectly in validation.

During safety review, they discover something troubling: given identical ECG data, the model sometimes classifies the signal as normal and sometimes as abnormal. The same input. Different outputs. They can’t reproduce specific predictions reliably.

The model isn’t wrong. It’s non-deterministic.

This scenario—or variations of it—represents a common challenge in safety-critical ML development. An FDA submission gets delayed while teams debug reproducibility issues. An automotive supplier discovers their perception model behaves differently across hardware platforms. A financial system produces inconsistent risk scores for the same customer data.

The symptoms differ, but the root cause is the same: standard ML infrastructure was built for research, not for systems where reproducibility is a regulatory requirement.

Modern ML infrastructure was built for research, not for systems where “mostly reproducible” isn’t good enough.

The Three Sources of Non-Determinism

ML models fail to reproduce for three reasons. Two are well-known. The third is the one that actually causes production failures.

1. Floating Point Arithmetic (The Obvious One)

Everyone knows floating point math is imprecise. 0.1 + 0.2 ≠ 0.3 in binary. Rounding errors accumulate. Operations aren’t associative: (a + b) + c ≠ a + (b + c).

What’s less obvious: compiler optimization changes the results.

// Without optimization
float sum = 0.0f;
for (int i = 0; i < 1000000; i++) {
    sum += values[i];
}
// sum = 50000123.45

// With -O3 -ffast-math (SIMD vectorization reassociates the sum)
float sum = 0.0f;
for (int i = 0; i < 1000000; i++) {
    sum += values[i];
}
// sum = 50000119.23

Same code. Same input. Different result. The difference? The compiler reordered operations to use SIMD instructions. Both results are “correct” according to floating point semantics.
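
Reassociation alone is enough to cause this. A minimal, self-contained illustration of the effect (nothing model-specific, just the two association orders written out by hand):

#include <stdio.h>

int main(void) {
    float a = 1e8f, b = -1e8f, c = 1.0f;

    float left  = (a + b) + c;  // the large terms cancel first, c survives: 1.0
    float right = a + (b + c);  // c is absorbed into -1e8 and rounded away: 0.0

    printf("(a + b) + c = %f\n", left);
    printf("a + (b + c) = %f\n", right);
    return 0;
}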

In a neural network with millions of operations, these differences compound. Run the same model on:

  • Development laptop (AVX2)
  • Production server (AVX-512)
  • Edge device (NEON)

You get three different outputs. All mathematically valid. None identical.

2. Hash Table Iteration Order (The Subtle One)

C++ unordered_map, Python sets, and Python dictionaries before CPython 3.7 all use hash tables with no guaranteed iteration order. (CPython dicts have preserved insertion order since 3.7; sets and most hash containers in other languages do not.) Iteration order depends on:

  • Hash function implementation (including per-process hash randomization such as PYTHONHASHSEED)
  • Memory addresses (affected by ASLR)
  • Insertion order
  • Load factor

A neural network training script that iterates over a hash-ordered feature collection will process features in different orders across runs. If those features are used in aggregations or fed to recurrent layers, outputs diverge.

# Training run 1
features = {"age", "bp", "glucose"}   # a set: order is hash-determined
for name in features:
    # Process feature
    # Iteration order: age, bp, glucose

# Training run 2 (different per-process hash seed)
features = {"age", "bp", "glucose"}
for name in features:
    # Process feature
    # Iteration order: glucose, age, bp

Gradient descent sees features in different orders. Model weights converge to different local minima. Both models achieve similar validation accuracy. Neither reproduces the other’s predictions exactly.

3. Memory Allocation Order (The Production Killer)

This is the one that breaks deployed systems.

Memory allocators like malloc() return addresses that depend on:

  • Prior allocations
  • Fragmentation
  • Thread scheduling
  • Operating system decisions

If your code accidentally depends on memory addresses—even indirectly—behavior changes.

Indirect dependencies you might not realize you have:

Pointer comparison:

if (ptr_a < ptr_b) {
    // Branch taken depends on allocation order
}

Address-based hashing:

size_t hash = (size_t)ptr ^ some_value;
// Hash changes with address, affecting data structure order

Uninitialized memory:

struct feature {
    char  id;      // 1 byte
    float value;   // 4 bytes; the compiler inserts 3 padding bytes before it
};
// memcmp() or a hash over the raw struct bytes sees that uninitialized padding

Cache alignment:

// Allocated at 64-byte boundary: fast
// Allocated at boundary + 32: cache misses, different timing

In production, these create Heisenbugs: failures that disappear when you add logging, change build flags, or restart the process. The model works fine until memory layout shifts. Then predictions change. You can’t reproduce it because your debugger changes the allocation pattern.

Why This Breaks Certification

Safety-critical systems require evidence that the system behaves correctly. This means:

Reproducibility: Given input X, output must be Y. Always. On any hardware. Forever.

Traceability: Every decision must trace to a requirement. “The model sometimes outputs Z” isn’t traceable.

Verification: You must prove the system handles all inputs correctly. Can’t prove correctness if behavior varies.

Standards like DO-178C (aerospace), IEC 62304 (medical devices), and ISO 26262 (automotive) all require deterministic behavior. Non-determinism isn’t just inconvenient—it’s non-compliant.

When the FAA auditor asks “prove this model produces the same output in flight as it did during testing,” the answer can’t be “well, usually it does.”

The Cost of Non-Determinism

In development: Debugging takes 3-10× longer when you can’t reproduce failures.

In testing: Flaky tests undermine confidence. Teams ignore failures because “it works when I run it.”

In production: Silent failures accumulate. The system slowly drifts from validated behavior. By the time you notice, you can’t trace back to when it started or why.

In litigation: “We couldn’t reproduce the failure” doesn’t hold up in court when someone was harmed.

In acquisition due diligence: Non-deterministic systems are uninsurable risks. Acquirers discount valuations or walk away.

The Deterministic Alternative

Fixed-point arithmetic eliminates floating point non-determinism. Operations are exact. Results are identical across platforms.

// Fixed-point: Q16.16 format (16 bits integer, 16 bits fractional)
typedef int32_t fixed_t;

#define FIXED_ONE (1 << 16)

fixed_t fixed_mul(fixed_t a, fixed_t b) {
    return (int32_t)(((int64_t)a * b) >> 16);
}

// Example: 2.5 * 3.75
fixed_t a = (2 << 16) | (1 << 15);  // 2.5
fixed_t b = (3 << 16) | (3 << 14);  // 3.75
fixed_t result = fixed_mul(a, b);    // 9.375, exactly

// Same result on x86, ARM, RISC-V, today, tomorrow, forever

No rounding errors. No platform differences. No compiler optimization surprises.
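
The weights themselves still start life as floats; they are quantized once, offline, during model conversion. A minimal sketch of the conversion helpers, reusing fixed_t and FIXED_ONE from above (the names are illustrative, not from any particular library):

// Offline conversion between double and Q16.16.
// Runtime inference never touches floating point.
static inline fixed_t fixed_from_double(double x) {
    return (fixed_t)(x * FIXED_ONE + (x >= 0 ? 0.5 : -0.5));  // round to nearest
}

static inline double fixed_to_double(fixed_t x) {
    return (double)x / FIXED_ONE;
}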

Trade-offs:

  • Range is limited (Q16.16 can represent -32768 to +32767.99998)
  • Precision is fixed (not “floating”)
  • Some operations are slower (though often faster than software floating point)

When this matters: When correctness beats convenience. When “good enough” isn’t good enough.
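
The limited range is typically handled with saturating arithmetic rather than silent wraparound, so out-of-range values fail in a defined, reproducible way. A minimal sketch, again reusing the fixed_t definition above:

#include <stdint.h>

// Saturating add: clamp to the representable range instead of wrapping.
fixed_t fixed_add_sat(fixed_t a, fixed_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (fixed_t)sum;
}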

Deterministic data structures eliminate hash table non-determinism. A sorted array or B-tree has defined iteration order.

// Deterministic feature storage
typedef struct {
    const char* name;
    float value;
} feature_t;

// Sort by name for deterministic iteration
int compare_features(const void* a, const void* b) {
    const feature_t* fa = (const feature_t*)a;
    const feature_t* fb = (const feature_t*)b;
    return strcmp(fa->name, fb->name);
}

void process_features(feature_t* features, size_t count) {
    qsort(features, count, sizeof(feature_t), compare_features);
    
    // Now iteration order is deterministic
    for (size_t i = 0; i < count; i++) {
        // Process features[i] in alphabetical order
    }
}

Explicit memory management eliminates allocation non-determinism. Pre-allocate everything. No malloc() after initialization.

// Bounded, deterministic memory pool
typedef struct {
    uint8_t pool[POOL_SIZE];
    size_t used;
} memory_pool_t;

void* pool_alloc(memory_pool_t* pool, size_t size) {
    if (pool->used + size > POOL_SIZE) {
        return NULL;  // Explicit failure
    }
    
    void* ptr = pool->pool + pool->used;
    pool->used += size;
    return ptr;
}

// Allocation addresses are now deterministic
// Same allocation sequence = same addresses = same behavior
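
The inference sketch in the next section resets this pool between calls. Under the layout above, a reset is a single assignment; a minimal version:

// Returning the pool to its initial state means every inference call
// replays the exact same allocation sequence.
void pool_reset(memory_pool_t* pool) {
    pool->used = 0;
}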

What This Looks Like in Practice

A deterministic neural network inference engine:

// Model state: fixed size, pre-allocated
typedef struct {
    fixed_t weights[MAX_WEIGHTS];
    fixed_t activations[MAX_ACTIVATIONS];
    memory_pool_t pool;
    uint32_t layer_count;
} model_t;

// Initialize with bounded resources
model_t* model_init(const uint8_t* model_data, size_t model_size) {
    model_t* m = calloc(1, sizeof(model_t));
    if (!m) return NULL;
    
    // Load weights into fixed-point format
    load_weights(m, model_data, model_size);
    
    // Pre-allocate all activation memory
    pool_init(&m->pool, sizeof(m->activations));
    
    return m;
}

// Inference: deterministic, bounded
void model_predict(model_t* m, const fixed_t* input, fixed_t* output) {
    // Reset pool to initial state (deterministic allocation)
    pool_reset(&m->pool);
    
    // Forward pass: exact fixed-point ops
    for (uint32_t i = 0; i < m->layer_count; i++) {
        layer_forward(m, i, input);
    }
    
    // Copy result (output_offset and output_size come from the loaded model's layer description)
    memcpy(output, m->activations + output_offset, output_size);
}

Properties guaranteed:

  • Same input → same output (bit-for-bit identical)
  • Bounded memory (no dynamic allocation)
  • Bounded time (no data-dependent loops with unbounded iteration)
  • No undefined behavior (every operation is specified)

This is certifiable. This is reproducible. This is what safety-critical systems require.
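
It also changes what a regression test looks like: with bit-exact behavior, you compare against golden vectors recorded during validation instead of checking tolerance bands. A sketch of that idea, building on the engine above (verify_against_golden and its buffer sizing are illustrative, not part of any existing API):

#include <string.h>
#include <stdio.h>

// Golden-vector check: regression testing reduces to a byte-for-byte
// comparison against outputs recorded during validation.
int verify_against_golden(model_t* m, const fixed_t* input,
                          const fixed_t* golden, size_t output_count) {
    fixed_t output[MAX_ACTIVATIONS];

    model_predict(m, input, output);

    // Any difference, even a single bit, is a reportable deviation.
    if (memcmp(output, golden, output_count * sizeof(fixed_t)) != 0) {
        fprintf(stderr, "output diverged from validated golden vector\n");
        return -1;
    }
    return 0;
}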

The Performance Question

“But fixed-point is slower than floating point!”

Sometimes yes. Often no. Depends on the target.

On CPUs with FPUs: Floating point might be faster (hardware acceleration).

On microcontrollers without FPUs: Fixed-point is dramatically faster (no software floating point emulation).

On embedded DSPs: Fixed-point is native (that’s what they’re designed for).

For cache-sensitive workloads: Fixed-point uses less memory (int32_t is the same size as float, but fixed-point models typically quantize to int16_t or int8_t, halving or quartering the footprint).

More importantly: determinism enables optimizations floating point can’t do.

When behavior is guaranteed, you can:

  • Pre-compute lookup tables
  • Prove loop bounds for unrolling
  • Guarantee cache behavior
  • Enable aggressive optimizations that would be unsafe with floating point

The real performance win isn’t raw FLOPS. It’s predictability. When you know exactly how long every operation takes, you can schedule with confidence. No worst-case margin. No jitter. Just deterministic timing.
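
As a concrete example of the lookup-table point above: a fixed-point activation function reduces to a bounded table index. A sketch reusing fixed_t and FIXED_ONE from earlier (the table size and function names are illustrative; in a certified build the table would be generated offline and checked in as constant data):

#include <stdint.h>
#include <math.h>

#define SIGMOID_TABLE_SIZE 256

static fixed_t sigmoid_table[SIGMOID_TABLE_SIZE];

// Build the table once at startup. In production the table would be
// generated offline so the shipped binary never calls libm at all.
void sigmoid_table_init(void) {
    for (int i = 0; i < SIGMOID_TABLE_SIZE; i++) {
        double x = -8.0 + 16.0 * i / SIGMOID_TABLE_SIZE;   // index -> [-8, +8)
        sigmoid_table[i] = (fixed_t)(FIXED_ONE / (1.0 + exp(-x)));
    }
}

// Constant-time lookup: same index, same output, same latency, everywhere.
fixed_t sigmoid_fixed(fixed_t x) {
    const fixed_t lo = -(8 << 16);      // clamp input to [-8, +8) in Q16.16
    const fixed_t hi =  (8 << 16) - 1;
    if (x < lo) x = lo;
    if (x > hi) x = hi;
    // The clamped range spans 2^20 Q16.16 steps, so shifting by 20 after
    // scaling by the table size yields an index in [0, SIGMOID_TABLE_SIZE).
    int idx = (int)(((int64_t)(x - lo) * SIGMOID_TABLE_SIZE) >> 20);
    return sigmoid_table[idx];
}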

When Floating Point Is Fine

Most ML doesn’t need this. Research models, recommendation systems, content classification—non-determinism is acceptable. The cost of determinism exceeds the benefit.

Use floating point when:

  • Approximate results are acceptable
  • You’re not certifying for safety
  • Development speed matters more than reproducibility
  • You need the full dynamic range
  • You’re not debugging production issues

Consider determinism when:

  • Debugging is expensive (hard to reproduce issues)
  • Compliance requires proof of correctness
  • Systems are long-lived (must behave identically for years)
  • Liability is significant (people can be harmed)
  • Trust matters (financial, medical, legal decisions)

The Path Forward

Writing deterministic ML infrastructure in C isn’t exotic. It’s how embedded systems have done signal processing for decades. Digital filters, FFTs, control loops—all deterministic, all fixed-point, all certifiable.
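
The core pattern is the same multiply-accumulate a deterministic dense layer needs. A minimal fixed-point FIR filter, for comparison (sketch only, reusing the Q16.16 fixed_t from earlier):

#include <stddef.h>
#include <stdint.h>

// Classic fixed-point FIR: multiply-accumulate over constant coefficients,
// with a 64-bit accumulator to avoid intermediate overflow.
fixed_t fir_filter(const fixed_t* coeffs, const fixed_t* samples, size_t taps) {
    int64_t acc = 0;
    for (size_t i = 0; i < taps; i++) {
        acc += (int64_t)coeffs[i] * samples[i];
    }
    return (fixed_t)(acc >> 16);   // rescale back to Q16.16
}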

The techniques exist. The standards exist. What’s missing is applying them to modern ML.

The opportunity: Build inference engines that guarantee reproducibility. Create tools that enable certification. Solve the problems that prevent ML from being deployed in systems that matter.

The challenge: Convenience vs. correctness. Floating point is easier. Determinism requires discipline.

The reality: For systems where failure has consequences, “easier” isn’t enough.

Autonomous vehicles, medical devices, industrial control systems, financial infrastructure—these aren’t research projects. They’re production systems where “mostly reproducible” is unacceptable.

The industry needs deterministic ML infrastructure built by people who understand both the mathematics and the constraints. People who’ve shipped safety-critical systems. People who know the difference between “it works” and “it provably works.”

If you’re building ML for systems that matter, floating point isn’t just dangerous—it’s a liability you can’t afford.

Building deterministic ML infrastructure? I’m working on open-source tools to solve exactly these problems. Get in touch if you’re facing these challenges in production systems.

About the Author

William Murray is a Regenerative Systems Architect with 30 years of UNIX infrastructure experience, specializing in deterministic computing for safety-critical systems. Based in the Scottish Highlands, he operates SpeyTech and maintains several open-source projects including C-Sentinel and c-from-scratch.

Let's Discuss Your AI Infrastructure

Available for UK-based consulting on production ML systems and infrastructure architecture.

Get in touch