
A Complete Deterministic ML Pipeline for Safety-Critical Systems

From training data to deployed inference — bit-identical, auditable, certifiable

Published January 19, 2026 · 12 min read
[Figure: The complete deterministic ML pipeline — certifiable-data, certifiable-training, certifiable-quant, certifiable-deploy, and certifiable-inference — with test counts and core principles]

Production ML systems fail for boring reasons. Not exotic model architecture problems — infrastructure problems. Non-deterministic data loading. Floating-point variance across hardware. Training runs that can’t be reproduced. Deployed models that don’t match what was tested.

For most applications, these are annoyances. For safety-critical systems — medical devices, autonomous vehicles, aerospace — they’re disqualifying. You can’t certify software that behaves differently each time it runs.

This article describes a complete ML pipeline designed from first principles for deterministic execution and safety-critical certification. Five open-source projects, 771 tests, and a single invariant: same input → same output, always.

The Problem with Standard ML Pipelines

A typical ML pipeline has non-determinism at every stage:

Data loading: Shuffle order depends on PRNG state. Multi-threaded loaders return batches in unpredictable order. Floating-point normalisation varies by platform.

Training: Gradient reduction order affects floating-point accumulation. Dropout masks aren’t reproducible. Optimizer state depends on execution order.

Quantization: Error bounds unknown. Calibration statistics non-reproducible. No proof that quantized model preserves behaviour.

Deployment: No binding between certified model and deployed artifact. No verification that weights weren’t corrupted in transit.

Inference: Different results on different hardware. Memory allocation timing affects cache state. No audit trail linking prediction to model version.

Each of these is a solvable problem. But solving them requires designing the entire pipeline around determinism, not bolting it on afterward.

The Certifiable Pipeline

The solution is five projects that share common principles:

certifiable-data → certifiable-training → certifiable-quant → certifiable-deploy → certifiable-inference

Each project is pure C99, uses zero dynamic allocation, and produces bit-identical output across platforms.

certifiable-data

Problem: Standard data loaders introduce non-determinism through random shuffling, floating-point normalisation, and multi-threaded batch assembly.

Solution: Data loading as a pure function. Given dataset D, seed s, epoch e, and batch index t, the output is deterministic:

B_t = Pipeline(D, s, e, t)

Key components:

  • Feistel shuffling: Cryptographic permutation that maps any index to its shuffled position in O(1) time. No sequential state, no execution-order dependence.
  • Fixed-point normalisation: Q16.16 arithmetic with precomputed inverse standard deviation. No floating-point division.
  • Merkle provenance: Every epoch produces a cryptographic commitment linking dataset hash, configuration, and shuffle seed.

The same (dataset, seed, epoch, batch) produces the same samples, in the same order, with the same normalisation, forever.

Related reading: The ML Non-Determinism Problem, Bit-Perfect Reproducibility

certifiable-training

Problem: Training loops accumulate floating-point rounding errors non-deterministically. Gradient reduction order affects results. There’s no audit trail proving what happened during training.

Solution: Fixed-point gradients, deterministic reduction topology, and Merkle-chained step verification.

Key components:

  • Q8.24 gradients: Higher precision for gradient accumulation, explicit rounding at each operation.
  • Compensated summation: Neumaier algorithm tracks rounding error, making large reductions accurate.
  • Fixed reduction tree: Binary tree topology eliminates execution-order dependence.
  • Merkle chain: Every training step produces a hash linking previous state, gradients, and new weights. Any step is independently verifiable.

The training loop becomes a deterministic state machine. Given initial weights and training data, the final weights are reproducible bit-for-bit.

Related reading: From Proofs to Code: Mathematical Transcription in C, Cryptographic Execution Tracing

certifiable-quant

Problem: Standard quantizers (TensorFlow Lite, ONNX) are black boxes. Error bounds unknown. No proof that quantized model preserves behaviour. Calibration non-reproducible.

Solution: Provable FP32→Q16.16 quantization with formal error certificates.

Key components:

  • Theoretical analysis: Compute error bounds before quantization. Overflow proofs, range propagation, Lipschitz constants.
  • Empirical calibration: Collect activation statistics with coverage metrics and degenerate detection.
  • Verified conversion: Quantize with explicit error tracking, verify against theoretical bounds.
  • Cryptographic certificate: Merkle root linking analysis, calibration, and verification digests.

The output is a certificate proving the quantized model is within ε of the original, auditable forever.

Related reading: Fixed-Point Neural Networks: The Math Behind Q16.16, Closure, Totality, and the Algebra of Safe Systems

certifiable-deploy

Problem: How do you prove the deployed model matches what was certified? How do you verify weights weren’t tampered with? How do you bind artifacts to specific hardware?

Solution: Canonical bundle format with cryptographic attestation and target binding.

Key components:

  • CBF v1 format: Deterministic container with no ambient metadata. Same content = same bytes = same hash.
  • Merkle attestation: 4-leaf tree binding manifest, weights, certificates, and inference artifacts.
  • JCS manifest: RFC 8785 canonical JSON for deterministic serialization.
  • Target binding: Lock bundles to specific platforms (arch-vendor-device-abi).
  • Runtime loader: Fail-closed state machine. Inference API unreachable without completing verification chain.

The loader implements “Execution ⇒ Verification”: no inference without cryptographic proof that the loaded model matches the certified model.

Related reading: DO-178C Level A Certification, The Real Cost of Dynamic Memory

certifiable-inference

Problem: Different hardware produces different results. Memory allocation affects timing. No audit trail linking predictions to model versions.

Solution: Integer-only forward pass with static allocation and deterministic execution paths.

Key components:

  • Fixed-point arithmetic: Q16.16 for all weights and activations. No floating-point in the critical path.
  • Static buffers: All memory provided by caller. No malloc in inference loop.
  • Bounded execution: Fixed iteration counts, no data-dependent branching, WCET-analysable.

The same input produces the same output on x86, ARM, and RISC-V. Bit-identical, every time.

Related reading: WCET for Neural Network Inference, IEC 62304 Class C Requirements

The Three Theorems

Every project in the certifiable family satisfies three properties:

| Theorem | Statement | Implication |
| --- | --- | --- |
| Bit Identity | F_A(s) = F_B(s) for any DVM-compliant platforms A, B | Cross-platform reproducibility |
| Bounded Error | Error saturates, doesn’t accumulate | Predictable behaviour |
| Auditability | Any operation verifiable in O(1) time | Scalable verification |

These aren’t aspirations — they’re tested properties. 771 tests across 39 test suites verify these invariants hold.

What This Enables

Reproducible Research

Training runs that produce identical results, always. No “worked on my machine” failures. No unexplained variance between runs.

Certification Evidence

DO-178C Level A requires complete requirements traceability. IEC 62304 Class C requires validated transformations. ISO 26262 ASIL-D requires provable behaviour. This pipeline provides the evidence these standards demand.

Incident Investigation

When something goes wrong, you can replay exact inputs through exact model versions and get exact outputs. No “we can’t reproduce the failure” mysteries.

Audit Trails

Cryptographic proof linking every prediction to: the deployed bundle, the quantization certificate, the training Merkle chain, and the data provenance. Years later, you can prove exactly how that prediction was made.

Test Coverage

The pipeline is backed by 771 tests across 39 test suites:

| Project | Tests | Suites |
| --- | --- | --- |
| certifiable-data | 133 | 8 |
| certifiable-training | 223 | 10 |
| certifiable-quant | 150 | 7 |
| certifiable-deploy | 201 | 7 |
| certifiable-inference | 64 | 7 |
| **Total** | **771** | **39** |

These aren’t smoke tests. They verify bit-identical output across platforms, correct handling of edge cases, and conformance to formal specifications.

Getting Started

Each project builds with CMake:

git clone https://github.com/williamofai/certifiable-{name}
cd certifiable-{name}
mkdir build && cd build
cmake ..
make
make test

Documentation includes mathematical foundations (CT-MATH-001), data structure specifications (CT-STRUCT-001), and Software Requirements Specifications with full traceability.

When You Don’t Need This

This level of rigour has costs: more complex implementation, restricted arithmetic, verbose documentation. If you’re building a recommendation system or a chatbot, standard frameworks are fine.

You need this when:

  • Regulatory standards require reproducible behaviour (DO-178C, IEC 62304, ISO 26262)
  • Failures have safety implications
  • You need to prove what happened, not just log it
  • Cross-platform consistency matters



Context: This pipeline is designed for safety-critical applications with specific regulatory requirements. For most ML applications, standard frameworks provide better tooling, larger ecosystems, and faster development. Evaluate the trade-offs in your specific context.

The goal isn’t complexity for its own sake. It’s providing the evidence that safety-critical certification demands. When a regulator asks “prove this model behaves deterministically,” you need more than good intentions — you need 771 tests and cryptographic audit trails.

All projects are open source under GPL-3.0, with commercial licensing available for proprietary safety-critical systems.

View Projects on GitHub · Technical Documentation

About the Author

William Murray is a Regenerative Systems Architect with 30 years of UNIX infrastructure experience, specializing in deterministic computing for safety-critical systems. Based in the Scottish Highlands, he operates SpeyTech and maintains several open-source projects including C-Sentinel and c-from-scratch.

Let's Discuss Your AI Infrastructure

Available for UK-based consulting on production ML systems and infrastructure architecture.
