There’s a question that blocks AI adoption in safety-critical systems:
“Can you prove the model running on deployed hardware is exactly the same as what you tested?”
Not “similar”. Not “statistically equivalent”. The same — bit for bit, hash for hash, across different platforms, compilers, and architectures.
With TensorFlow Lite, PyTorch, or ONNX Runtime, the answer is no. Floating-point arithmetic varies by platform. Hash table iteration order depends on memory allocation. Thread scheduling is inherently non-deterministic.
For most applications, that doesn’t matter. For aerospace, medical devices, and autonomous vehicles — where certification requires evidence, not assumptions — it’s a fundamental barrier.
The certifiable-* ecosystem removes that barrier.
## Eight Projects, One Pipeline
The ecosystem consists of eight interconnected projects, each handling one stage of the ML pipeline:
| Stage | Project | Purpose | Commitment |
|---|---|---|---|
| 0 | certifiable-data | Data pipeline | Merkle root of batches |
| 1 | certifiable-training | Model training | Gradient chain hash |
| 2 | certifiable-quant | Quantization | Error certificate |
| 3 | certifiable-deploy | Deployment packaging | Attestation tree |
| 4 | certifiable-inference | Forward pass | Predictions hash |
| 5 | certifiable-monitor | Runtime monitoring | Ledger digest |
| 6 | certifiable-verify | Verification | Report hash |
| — | certifiable-harness | End-to-end orchestration | Golden reference |
Every stage produces a cryptographic commitment. Every commitment chains to the next. Break any link, and verification fails.
## The Core Problem: Non-Determinism
Traditional ML frameworks aren’t designed for determinism. They’re optimised for flexibility and performance:
- Floating-point variance: The same model produces different outputs on different CPUs due to FMA (fused multiply-add) availability, SIMD instruction selection, and compiler optimisations.
- Memory allocation: Python dictionaries, hash maps, and sets iterate in order determined by memory layout — which varies between runs.
- Threading: Parallel operations complete in unpredictable order. Reduce operations accumulate floating-point errors differently depending on execution timing.
- Dynamic allocation: malloc() returns different addresses, affecting pointer-based data structures and timing.
For consumer applications, these differences are invisible. For certification, they’re disqualifying.
## The Solution: Determinism by Design
The certifiable-* ecosystem takes a different approach:
### Fixed-Point Arithmetic (Q16.16)
Every calculation uses 32-bit fixed-point representation:
- 16 bits for the integer part
- 16 bits for the fractional part
- Range: -32768.0 to +32767.99998
No floating-point operations anywhere in the pipeline. Same inputs produce same outputs on any platform that implements integer arithmetic correctly — which is all of them.
```c
/* Q16.16 multiplication with overflow detection */
int32_t q16_mul(int32_t a, int32_t b, q16_fault_t *fault) {
    int64_t result = (int64_t)a * (int64_t)b;
    result >>= 16;
    if (result > Q16_MAX || result < Q16_MIN) {
        fault->overflow = 1;
        return (result > 0) ? Q16_MAX : Q16_MIN;
    }
    return (int32_t)result;
}
```

### Static Allocation
No malloc(). All buffers declared at compile time or allocated by the caller:
```c
/* Caller provides the buffer */
void ci_forward(const ci_model_t *model,
                const int32_t *input,
                int32_t *output,      /* Caller-allocated */
                int32_t *workspace,   /* Caller-allocated */
                ci_fault_t *fault);
```

No heap fragmentation. No allocation failures. Bounded memory usage provable at compile time.
### Deterministic Algorithms
- Sorting: Merge sort (stable, O(n log n) worst case)
- Shuffling: Feistel network with cycle-walking (deterministic given seed)
- Hashing: SHA-256 throughout
- Reduction: Ordered accumulation (no parallel reduce)
Every algorithm chosen for determinism first, performance second.
## Cryptographic Provenance
Each stage produces a 32-byte SHA-256 commitment that includes:
- The stage’s own output
- The previous stage’s commitment
This creates an unbroken chain from training data to deployed inference:
```
M_data   = MerkleRoot(batch_hashes)
H_train  = SHA256(M_data || gradient_chain)
H_cert   = SHA256(H_train || quantization_certificate)
R_attest = SHA256(H_cert || bundle_files)
H_pred   = SHA256(R_attest || predictions)
L_n      = SHA256(H_pred || ledger_entries)
H_report = SHA256(L_n || verification_results)
```

Modify any input, and every downstream commitment changes. The chain is tamper-evident by construction.
## The Harness: Proving Bit-Identity
certifiable-harness orchestrates all seven stages and compares results against a golden reference:
```text
$ ./certifiable-harness --golden reference.golden --output result.json
═══════════════════════════════════════════════════════════════
  Certifiable Harness v1.0.0
  Platform: x86_64
═══════════════════════════════════════════════════════════════
  [0] data       ✓ (OK, 4 µs)
  [1] training   ✓ (OK, 3 µs)
  [2] quant      ✓ (OK, 3 µs)
  [3] deploy     ✓ (OK, 3 µs)
  [4] inference  ✓ (OK, 3 µs)
  [5] monitor    ✓ (OK, 4 µs)
  [6] verify     ✓ (OK, 8 µs)
  Status: ALL STAGES PASSED ✓
  Bit-identical: YES ✓
═══════════════════════════════════════════════════════════════
```

The golden reference is a 368-byte binary containing commitments from all seven stages. Run the harness on any platform — if the hashes match, you have mathematical proof of identical execution.
## Verified Cross-Platform
The harness has been tested on:
| Platform | OS | Compiler | Result |
|---|---|---|---|
| x86_64 | Linux (Ubuntu) | GCC 12.2.0 | ✓ Bit-identical |
| x86_64 | macOS 11.7 | Apple Clang | ✓ Bit-identical |
Different operating systems. Different compilers. Same hashes.
## What’s Implemented
| Project | Tests | Key Features |
|---|---|---|
| certifiable-data | 142 | CSV parsing, Merkle trees, deterministic shuffle |
| certifiable-training | 10 suites | Gradient descent, weight updates, chain hashing |
| certifiable-quant | 134 | FP32→Q16.16, error bounds, certificates |
| certifiable-deploy | 147 | Bundle format, manifest, attestation |
| certifiable-inference | 8 suites | Conv2D, pooling, dense layers, activations |
| certifiable-monitor | 253 | Drift detection, ledger, policy enforcement |
| certifiable-verify | 10 suites | Binding verification, report generation |
| certifiable-harness | 4 suites | Orchestration, golden comparison |
Total: 700+ tests across 8 projects.
## Documentation for Certification
Each project includes formal documentation designed for regulatory review:
- MATH-001 — Mathematical specification (definitions, algorithms, proofs)
- STRUCT-001 — Data structure specification (types, layouts, invariants)
- SRS-xxx — Software requirements (traceable, testable requirements)
certifiable-harness alone has 81 traceable requirements across 4 SRS documents.
## Compliance Context
The ecosystem is designed to support certification under:
| Standard | Domain | Key Requirements |
|---|---|---|
| DO-178C Level A | Aerospace | MC/DC coverage, traceability, determinism |
| IEC 62304 Class C | Medical devices | Risk management, verification, documentation |
| ISO 26262 ASIL-D | Automotive | Fault tolerance, diagnostic coverage |
| ISO 21448 (SOTIF) | Automotive AI | Behaviour verification, edge cases |
| UL 4600 | Autonomous systems | Safety case, operational design domain |
Deterministic execution simplifies verification. If the same inputs always produce the same outputs, testing becomes meaningful. If you can prove cross-platform identity, deployment becomes traceable.
## The Trade-Offs
This approach has costs:
- Performance: Fixed-point is slower than optimised floating-point on modern GPUs. The ecosystem is designed for edge deployment where determinism matters more than throughput.
- Precision: Q16.16 has less dynamic range than FP32. For safety-critical applications, bounded precision with known error bounds is often preferable to unbounded precision with unknown variance.
- Complexity: Eight projects is more infrastructure than dropping in TensorFlow Lite. The question is whether that infrastructure is justified by the assurance it provides.
- Ecosystem: No pre-trained models, no model zoo, no community of contributors (yet). You’re building from scratch.
For consumer applications, these costs aren’t justified. For systems where certification is mandatory and determinism is required, the alternative is often “don’t use ML at all.”
## Getting Started
Clone any project and run the tests:
```shell
git clone https://github.com/williamofai/certifiable-inference.git
cd certifiable-inference
mkdir build && cd build
cmake ..
make
ctest --output-on-failure
```

For end-to-end verification:
```shell
git clone https://github.com/williamofai/certifiable-harness.git
cd certifiable-harness
mkdir build && cd build
cmake ..
make
# Generate golden reference
./certifiable-harness --generate-golden --output result.json
# Verify (should show Bit-identical: YES)
./certifiable-harness --golden result.json.golden --output verify.json
```

## What This Enables
When a regulator asks “how do you know the deployed model is the same as what you tested?”, the answer changes:
Before: “We have a careful deployment process.”
After: “Here’s a 368-byte golden reference. Run it on the deployed hardware. If the seven SHA-256 hashes match, the execution is mathematically identical. If they don’t, I can tell you exactly which stage diverged.”
That’s a different kind of answer.
## Repositories
| Project | URL |
|---|---|
| certifiable-data | https://github.com/williamofai/certifiable-data |
| certifiable-training | https://github.com/williamofai/certifiable-training |
| certifiable-quant | https://github.com/williamofai/certifiable-quant |
| certifiable-deploy | https://github.com/williamofai/certifiable-deploy |
| certifiable-inference | https://github.com/williamofai/certifiable-inference |
| certifiable-monitor | https://github.com/williamofai/certifiable-monitor |
| certifiable-verify | https://github.com/williamofai/certifiable-verify |
| certifiable-harness | https://github.com/williamofai/certifiable-harness |
All projects are GPL-3.0 licensed. Commercial licensing available for organisations requiring proprietary deployment.
The certifiable-* ecosystem represents one approach to deterministic ML. As with any architectural choice, suitability depends on system requirements, risk classification, and regulatory context. The goal isn’t to replace general-purpose ML frameworks — it’s to enable ML in domains where those frameworks can’t currently go.
UK Patent Application GB2521625.0 — Murray Deterministic Computing Platform