How do you prove that your ML pipeline produces identical results on different hardware?
Not “similar” results. Not “statistically equivalent” results. Identical — bit for bit, hash for hash.
I ran the certifiable-harness on Linux with GCC and macOS with Clang. Different operating system. Different compiler. Seven pipeline stages.
Same SHA-256 hashes. Every stage. Every time.
certifiable-harness orchestrates all seven stages of the certifiable-* pipeline, captures cryptographic commitments, and compares them against a golden reference.
The Problem
Traditional ML frameworks don’t even try for cross-platform determinism. Floating-point rounding varies by CPU. Memory allocation affects hash table iteration order. Thread scheduling is inherently non-deterministic.
For most applications, that’s fine. A 0.001% difference in model output doesn’t matter.
For safety-critical systems, it’s a fundamental problem. If you can’t prove that the model running on deployed hardware is exactly the same as what you tested, you can’t certify it.
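The floating-point problem is easy to demonstrate. The two helpers below (illustrative names, not part of any library) add the same three terms with different groupings; IEEE 754 rounding makes the results differ bit for bit, which is exactly what happens when threading or vectorization reorders a sum:

```c
#include <assert.h>

/* IEEE 754 addition is not associative: regrouping the same terms
   changes the rounding, and therefore the bits. Any change in
   summation order (thread count, SIMD width, compiler flags) can do
   the same to a model's output, and hence to its hash. */
static double sum_left(double a, double b, double c)  { return (a + b) + c; }
static double sum_right(double a, double b, double c) { return a + (b + c); }
```

With `a = 0.1`, `b = 0.2`, `c = 0.3`, the two groupings already disagree in the last bit of the mantissa.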
The certifiable-* ecosystem solves this through fixed-point arithmetic, static allocation, and deterministic algorithms throughout. But how do you prove it works?
You run the harness.
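The certifiable-* internals aren't reproduced here, but the fixed-point idea can be sketched. A hypothetical Q16.16 type (names and format mine, not taken from the project) does everything in integer arithmetic, so every conforming compiler on every CPU produces the same bits:

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point value: 16 integer bits, 16
   fractional bits. All operations are integer arithmetic, so results
   are bit-identical across FPUs, compilers, and optimization levels. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)

static q16_16 q16_from_int(int32_t i) { return (q16_16)(i << 16); }

static q16_16 q16_mul(q16_16 a, q16_16 b) {
    /* Widen to 64 bits, then shift back down. Truncation is
       deterministic; arithmetic right shift on signed values is
       assumed (true on all mainstream compilers). */
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}
```

The trade-off is range and precision, which is why the real pipeline pairs fixed-point arithmetic with static allocation and deterministic algorithms rather than relying on it alone.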
How It Works
$ ./certifiable-harness --golden result.json.golden --output verify.json
═══════════════════════════════════════════════════════════════
Certifiable Harness v1.0.0
Platform: x86_64
═══════════════════════════════════════════════════════════════
[0] data ✓ (OK, 4 µs)
[1] training ✓ (OK, 3 µs)
[2] quant ✓ (OK, 3 µs)
[3] deploy ✓ (OK, 3 µs)
[4] inference ✓ (OK, 3 µs)
[5] monitor ✓ (OK, 4 µs)
[6] verify ✓ (OK, 8 µs)
Status: ALL STAGES PASSED ✓
Bit-identical: YES ✓
═══════════════════════════════════════════════════════════════

If any stage produces a different hash, the harness tells you exactly which one diverged. No more “it works on my machine” — either the hashes match or they don’t.
The Golden Reference
The harness generates a 368-byte golden reference file containing:
| Offset | Size | Field |
|---|---|---|
| 0x00 | 4 | Magic (“CHGR”) |
| 0x04 | 4 | Version |
| 0x08 | 32 | Platform string |
| 0x28 | 8 | Timestamp |
| 0x30 | 32 | Config hash |
| 0x50 | 32 | Harness hash |
| 0x70 | 224 | Stage commitments (7 × 32) |
| 0x150 | 32 | File hash |
The file hash covers bytes 0x00–0x14F, enabling tamper detection. If anyone modifies the golden reference, verification fails.
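The layout above is simple enough to validate in a few lines. This sketch (my own helper names, using the offsets from the table; the SHA-256 comparison against the 32 bytes at 0x150 is elided) shows a structural check on a golden-reference buffer:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets and sizes from the golden-reference layout table.
   A full verifier would also SHA-256 bytes 0x00-0x14F and compare
   against the file hash stored at 0x150. */
enum {
    GOLDEN_SIZE   = 368,
    OFF_MAGIC     = 0x00,
    OFF_STAGES    = 0x70,
    OFF_FILE_HASH = 0x150,
    STAGE_COUNT   = 7,
    HASH_LEN      = 32
};

/* Minimal structural sanity check: right size, right magic. */
static int golden_check_magic(const uint8_t *buf, size_t len) {
    return len == GOLDEN_SIZE && memcmp(buf + OFF_MAGIC, "CHGR", 4) == 0;
}

/* Pointer to stage i's 32-byte commitment, or NULL if out of range. */
static const uint8_t *golden_stage(const uint8_t *buf, int i) {
    if (i < 0 || i >= STAGE_COUNT) return NULL;
    return buf + OFF_STAGES + (size_t)i * HASH_LEN;
}
```

Note that the seven stage commitments occupy 0x70 through 0x14F, so the file hash at 0x150 covers every one of them.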
Generating a Golden Reference
./certifiable-harness --generate-golden --output result.json

This produces:
- result.json — Human-readable report
- result.json.golden — 368-byte binary for cross-platform comparison
Verifying Against Golden
Copy the golden file to another platform and run:
./certifiable-harness --golden result.json.golden --output their_result.json

If the hashes match: Bit-identical: YES ✓
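In CI this becomes a one-line gate. Assuming the harness exits non-zero when any stage diverges (conventional for CLI tools, but not confirmed by the output above), a verification step might look like:

```shell
# Hypothetical CI step: fail the build if any stage hash diverges
# from the checked-in golden reference.
./certifiable-harness --golden result.json.golden --output ci_result.json \
  || { echo "Determinism regression: stage hashes diverged"; exit 1; }
```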
Verified Platforms
| Platform | OS | Compiler | Result |
|---|---|---|---|
| x86_64 | Linux (Ubuntu) | GCC 12.2.0 | ✓ Bit-identical |
| x86_64 | macOS 11.7 | Apple Clang | ✓ Bit-identical |
| aarch64 | — | — | Pending |
| riscv64 | — | — | Pending |
The Linux and macOS results were generated on different machines, different operating systems, different compilers. Same hashes.
Seven Pipeline Stages
Each stage corresponds to a certifiable-* project:
| Stage | Project | Commitment |
|---|---|---|
| 0 | certifiable-data | Merkle root of batches |
| 1 | certifiable-training | Training chain hash |
| 2 | certifiable-quant | Quantization certificate |
| 3 | certifiable-deploy | Attestation root |
| 4 | certifiable-inference | Predictions hash |
| 5 | certifiable-monitor | Ledger digest |
| 6 | certifiable-verify | Report hash |
The harness runs them in sequence, passing context between stages. Each stage’s commitment includes the previous stage’s commitment, forming an unbroken cryptographic chain.
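The chaining rule is the important part: commit[i] = H(commit[i-1] || output[i]), so altering any stage perturbs every later commitment. The sketch below illustrates the structure with FNV-1a standing in for SHA-256 (the harness uses SHA-256; FNV-1a is used here only so the example is self-contained, and `chain_commit` is my own name):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit FNV-1a hash, a stand-in for SHA-256. */
static uint64_t fnv1a(uint64_t h, const uint8_t *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* commit[i] = H(commit[i-1] || output[i]): each stage's commitment
   absorbs the previous one, so tampering with any stage changes
   every commitment downstream of it. */
static uint64_t chain_commit(uint64_t prev, const uint8_t *out, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */
    h = fnv1a(h, (const uint8_t *)&prev, sizeof prev);
    return fnv1a(h, out, n);
}
```

Because the chain is deterministic, two platforms that produce the same stage outputs must produce the same seven commitments, and a single flipped bit in stage 0 changes the stage 6 commitment.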
Test Coverage
| Component | Tests | Description |
|---|---|---|
| Harness | 4 | Orchestration, config, platform detection |
| Golden | 3 | Load, save, compare, integrity |
| Stages | 4 | Stage wrappers, dependency management |
| Report | 2 | JSON generation, console output |
4 test suites, all passing. 81 traceable requirements across 4 SRS documents.
Part of a Complete Pipeline
certifiable-harness orchestrates the entire certifiable-* ecosystem:
data → training → quant → deploy → inference → monitor → verify
↑ ↓
└──────────────── certifiable-harness ────────────────────┘

It’s the proving ground — the place where cross-platform determinism is verified, not assumed.
When You Need This
Hardware Vendors: If you’re building AI accelerators — RISC-V, custom silicon, FPGAs — certifiable-harness lets you prove your hardware produces bit-identical results to reference implementations.
Certification Bodies: The harness produces machine-verifiable evidence. No manual inspection required. Run the harness, check the hashes, file the report.
Safety-Critical Deployments: When a regulator asks “how do you know the deployed model is the same as what you tested?”, you have a 368-byte answer.
Getting Started
git clone https://github.com/williamofai/certifiable-harness.git
cd certifiable-harness
mkdir build && cd build
cmake ..
make
ctest --output-on-failure
# Generate golden reference
./certifiable-harness --generate-golden --output result.json
# Verify (should show Bit-identical: YES)
./certifiable-harness --golden result.json.golden --output verify.json

Documentation
The repository includes formal documentation suitable for certification evidence:
- CH-MATH-001.md — Mathematical specification (18KB)
- CH-STRUCT-001.md — Data structure specification
- SRS-HARNESS — Harness orchestration (16 requirements)
- SRS-GOLDEN — Golden reference (23 requirements)
- SRS-STAGES — Stage wrappers (25 requirements)
- SRS-REPORT — Report generation (17 requirements)
License
Dual licensed under GPLv3 (open source) and commercial terms for proprietary safety-critical systems. The implementation builds on the Murray Deterministic Computing Platform (UK Patent GB2521625.0).
For teams building safety-critical ML systems, certifiable-harness provides the proving ground that certification demands. As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context.