Deterministic ML GPL-3.0

certifiable-harness

End-to-end test harness for deterministic ML — because 'it works on my machine' isn't certifiable

Published: January 19, 2026 22:00
Reading time: 6 min
[Figure: certifiable-harness cross-platform bit-identity verification, showing Linux and macOS producing identical SHA-256 hashes]

How do you prove that your ML pipeline produces identical results on different hardware?

Not “similar” results. Not “statistically equivalent” results. Identical — bit for bit, hash for hash.

I ran the certifiable-harness on Linux with GCC and macOS with Clang. Different operating system. Different compiler. Seven pipeline stages.

Same SHA-256 hashes. Every stage. Every time.

certifiable-harness orchestrates all seven stages of the certifiable-* pipeline, captures cryptographic commitments, and compares them against a golden reference.

The Problem

Mainstream ML frameworks generally don’t guarantee cross-platform bit-identity. Floating-point results vary with the CPU, the compiler, and the order of operations. Memory allocation affects hash table iteration order. Thread scheduling is inherently non-deterministic.

For most applications, that’s fine. A 0.001% difference in model output doesn’t matter.

For safety-critical systems, it’s a fundamental problem. If you can’t prove that the model running on deployed hardware is exactly the same as what you tested, you can’t certify it.

The certifiable-* ecosystem solves this through fixed-point arithmetic, static allocation, and deterministic algorithms throughout. But how do you prove it works?

You run the harness.
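To make the fixed-point argument concrete, here is a minimal sketch in Python. It assumes a Q16.16 format purely for illustration; the source does not say which fixed-point format the certifiable-* projects use, and the helper names `to_fixed` and `fixed_mul` are hypothetical. It shows that floating-point addition depends on evaluation order, while integer (fixed-point) addition does not.

```python
# Illustrative only: Q16.16 is an assumed format, not necessarily
# the one used by the certifiable-* projects.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a float to Q16.16 by rounding to the nearest representable value."""
    return round(x * ONE)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the right shift discards the extra fraction bits."""
    return (a * b) >> FRAC_BITS


# Floating-point addition is not associative, so evaluation order changes the bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Integer addition is associative, so any evaluation order gives the same bits.
a, b, c = to_fixed(0.1), to_fixed(0.2), to_fixed(0.3)
print((a + b) + c == a + (b + c))  # True
```

Python integers never overflow, so this sketch sidesteps a detail that matters in C: the intermediate product in `fixed_mul` needs a wider type than its operands.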

How It Works

$ ./certifiable-harness --golden result.json.golden --output verify.json

═══════════════════════════════════════════════════════════════
  Certifiable Harness v1.0.0
  Platform: x86_64
═══════════════════════════════════════════════════════════════

  [0] data  (OK, 4 µs)
  [1] training  (OK, 3 µs)
  [2] quant  (OK, 3 µs)
  [3] deploy  (OK, 3 µs)
  [4] inference  (OK, 3 µs)
  [5] monitor  (OK, 4 µs)
  [6] verify  (OK, 8 µs)

  Status: ALL STAGES PASSED
  Bit-identical: YES
═══════════════════════════════════════════════════════════════

If any stage produces a different hash, the harness tells you exactly which one diverged. No more “it works on my machine” — either the hashes match or they don’t.
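The comparison itself is simple. The sketch below is not the harness's code; it is a hypothetical Python illustration of comparing two lists of 32-byte stage hashes and reporting the first stage that differs. The function name `first_divergence` is an assumption, and the stage names are taken from the console output above.

```python
import hashlib
from typing import Optional, Sequence

# Stage names as printed in the harness console output.
STAGES = ["data", "training", "quant", "deploy", "inference", "monitor", "verify"]


def first_divergence(expected: Sequence[bytes], actual: Sequence[bytes]) -> Optional[int]:
    """Return the index of the first stage whose hash differs, or None if all match."""
    if len(expected) != len(actual):
        raise ValueError("stage count mismatch")
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index
    return None


# Synthetic hashes for demonstration; real values come from the pipeline stages.
golden = [hashlib.sha256(name.encode()).digest() for name in STAGES]
observed = list(golden)
observed[2] = hashlib.sha256(b"different quant output").digest()

stage = first_divergence(golden, observed)
print("Bit-identical: YES" if stage is None else f"Diverged at [{stage}] {STAGES[stage]}")
```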

The Golden Reference

The harness generates a 368-byte golden reference file containing:

Offset  Size (bytes)  Field
0x00    4             Magic (“CHGR”)
0x04    4             Version
0x08    32            Platform string
0x28    8             Timestamp
0x30    32            Config hash
0x50    32            Harness hash
0x70    224           Stage commitments (7 × 32)
0x150   32            File hash

The file hash covers bytes 0x00–0x14F, enabling tamper detection. If anyone modifies the golden reference, verification fails.
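The layout above is enough to sketch a reader. The Python below is a hedged illustration, not the harness's parser. It assumes little-endian integers, a NUL-padded ASCII platform string, an unsigned 32-bit version, an unsigned 64-bit timestamp, and SHA-256 for the file hash; none of those details are confirmed by the layout table, and the function names are hypothetical.

```python
import hashlib
import struct

GOLDEN_SIZE = 368
HASHED_LEN = 0x150  # the file hash covers bytes 0x00-0x14F

# Field order and sizes follow the layout table. "<" means little-endian
# with no padding, which is an assumption.
LAYOUT = struct.Struct("<4sI32sQ32s32s224s32s")


def parse_golden(blob: bytes) -> dict:
    """Parse a golden reference and check its integrity hash."""
    if len(blob) != GOLDEN_SIZE:
        raise ValueError(f"expected {GOLDEN_SIZE} bytes, got {len(blob)}")
    magic, version, platform, timestamp, config, harness, stages, file_hash = LAYOUT.unpack(blob)
    if magic != b"CHGR":
        raise ValueError("bad magic")
    # Assumed: the file hash is SHA-256 over the preceding 0x150 bytes.
    if hashlib.sha256(blob[:HASHED_LEN]).digest() != file_hash:
        raise ValueError("file hash mismatch")
    return {
        "version": version,
        "platform": platform.rstrip(b"\0").decode("ascii"),
        "timestamp": timestamp,
        "config_hash": config,
        "harness_hash": harness,
        "stages": [stages[i:i + 32] for i in range(0, 224, 32)],
    }


def example_blob() -> bytes:
    """Build a synthetic golden file purely for demonstration."""
    body = struct.pack("<4sI32sQ32s32s224s", b"CHGR", 1, b"x86_64", 0,
                       bytes(32), bytes(32), bytes(224))
    return body + hashlib.sha256(body).digest()


golden = parse_golden(example_blob())
print(golden["platform"], len(golden["stages"]))
```

Flipping any byte in the first 0x150 bytes of the synthetic blob makes `parse_golden` raise `ValueError`, which is the failure mode the file hash is there to catch.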

Generating a Golden Reference

./certifiable-harness --generate-golden --output result.json

This produces:

  • result.json — Human-readable report
  • result.json.golden — 368-byte binary for cross-platform comparison

Verifying Against Golden

Copy the golden file to another platform and run:

./certifiable-harness --golden result.json.golden --output their_result.json

If the hashes match: Bit-identical: YES ✓

Verified Platforms

Platform  OS              Compiler     Result
x86_64    Linux (Ubuntu)  GCC 12.2.0   ✓ Bit-identical
x86_64    macOS 11.7      Apple Clang  ✓ Bit-identical
aarch64   -               -            Pending
riscv64   -               -            Pending

The Linux and macOS results were generated on different machines, with different operating systems and different compilers. Same hashes.

Seven Pipeline Stages

Each stage corresponds to a certifiable-* project:

Stage  Project                 Commitment
0      certifiable-data        Merkle root of batches
1      certifiable-training    Training chain hash
2      certifiable-quant       Quantization certificate
3      certifiable-deploy      Attestation root
4      certifiable-inference   Predictions hash
5      certifiable-monitor     Ledger digest
6      certifiable-verify      Report hash

The harness runs them in sequence, passing context between stages. Each stage’s commitment includes the previous stage’s commitment, forming an unbroken cryptographic chain.
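One common way to build such a chain is to hash each stage's output together with the previous commitment. The Python sketch below assumes exactly that construction, c_i = SHA-256(c_{i-1} || output_i), starting from 32 zero bytes. The real certifiable-* commitment format may differ; the function name and the starting value are hypothetical.

```python
import hashlib
from typing import List, Sequence


def chain_commitments(stage_outputs: Sequence[bytes]) -> List[bytes]:
    """Compute c_i = SHA-256(c_{i-1} || output_i), starting from 32 zero bytes (assumed)."""
    previous = bytes(32)
    commitments = []
    for output in stage_outputs:
        previous = hashlib.sha256(previous + output).digest()
        commitments.append(previous)
    return commitments


# Synthetic stage outputs for demonstration only.
outputs = [f"stage-{i}-output".encode() for i in range(7)]
baseline = chain_commitments(outputs)

# Change only stage 2's output and recompute the chain.
altered_outputs = list(outputs)
altered_outputs[2] = b"stage-2-output-with-one-bit-off"
altered = chain_commitments(altered_outputs)

changed = [i for i in range(7) if baseline[i] != altered[i]]
print(changed)  # [2, 3, 4, 5, 6]
```

Under this construction a change in one stage alters that stage's commitment and every later one, while earlier commitments stay the same. That is why the first mismatching stage is the one to investigate.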

Test Coverage

Component  Tests  Description
Harness    4      Orchestration, config, platform detection
Golden     3      Load, save, compare, integrity
Stages     4      Stage wrappers, dependency management
Report     2      JSON generation, console output

4 test suites, all passing. 81 traceable requirements across 4 SRS documents.

Part of a Complete Pipeline

certifiable-harness orchestrates the entire certifiable-* ecosystem:

data → training → quant → deploy → inference → monitor → verify
  ↑                                                          ↓
  └──────────────── certifiable-harness ────────────────────┘

It’s the proving ground — the place where cross-platform determinism is verified, not assumed.

When You Need This

Hardware Vendors: If you’re building AI accelerators — RISC-V, custom silicon, FPGAs — certifiable-harness lets you prove your hardware produces bit-identical results to reference implementations.

Certification Bodies: The harness produces machine-verifiable evidence. No manual inspection required. Run the harness, check the hashes, file the report.

Safety-Critical Deployments: When a regulator asks “how do you know the deployed model is the same as what you tested?”, you have a 368-byte answer.

Getting Started

git clone https://github.com/williamofai/certifiable-harness.git
cd certifiable-harness
mkdir build && cd build
cmake ..
make
ctest --output-on-failure

# Generate golden reference
./certifiable-harness --generate-golden --output result.json

# Verify (should show Bit-identical: YES)
./certifiable-harness --golden result.json.golden --output verify.json

Documentation

The repository includes formal documentation suitable for certification evidence:

  • CH-MATH-001.md — Mathematical specification (18 KB)
  • CH-STRUCT-001.md — Data structure specification
  • SRS-HARNESS — Harness orchestration (16 requirements)
  • SRS-GOLDEN — Golden reference (23 requirements)
  • SRS-STAGES — Stage wrappers (25 requirements)
  • SRS-REPORT — Report generation (17 requirements)

License

Dual licensed under GPLv3 (open source) and commercial terms for proprietary safety-critical systems. The implementation builds on the Murray Deterministic Computing Platform (UK Patent GB2521625.0).


For teams building safety-critical ML systems, certifiable-harness provides the proving ground that certification demands. As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context.

About the Author

William Murray is a Regenerative Systems Architect with 30 years of UNIX infrastructure experience, specializing in deterministic computing for safety-critical systems. Based in the Scottish Highlands, he operates SpeyTech and maintains several open-source projects including C-Sentinel and c-from-scratch.

Questions or Contributions?

Open an issue on GitHub or get in touch directly.
