How do you prove that your ML pipeline produces identical results on different hardware?
Not “similar” results. Not “statistically equivalent” results. Identical — bit for bit, hash for hash.
I ran the certifiable-harness on Linux with GCC and macOS with Clang. Different operating system. Different compiler. Seven pipeline stages.
Same SHA-256 hashes. Every stage. Every time.
certifiable-harness orchestrates all seven stages of the certifiable-* pipeline, captures cryptographic commitments, and compares them against a golden reference.
The Problem
Traditional ML frameworks don’t even try for cross-platform determinism. Floating-point rounding varies by CPU. Memory allocation affects hash table iteration order. Thread scheduling is inherently non-deterministic.
For most applications, that’s fine. A 0.001% difference in model output doesn’t matter.
For safety-critical systems, it’s a fundamental problem. If you can’t prove that the model running on deployed hardware is exactly the same as what you tested, you can’t certify it.
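The floating-point problem is easy to demonstrate. The two helpers below (illustrative names, not part of any library) add the same three terms with different groupings; IEEE 754 rounding makes the results differ bit for bit, which is exactly what happens when threading or vectorization reorders a sum:

```c
#include <assert.h>

/* IEEE 754 addition is not associative: regrouping the same terms
   changes the rounding, and therefore the bits. Any change in
   summation order (thread count, SIMD width, compiler flags) can do
   the same to a model's output, and hence to its hash. */
static double sum_left(double a, double b, double c)  { return (a + b) + c; }
static double sum_right(double a, double b, double c) { return a + (b + c); }
```

With `a = 0.1`, `b = 0.2`, `c = 0.3`, the two groupings already disagree in the last bit of the mantissa.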
The certifiable-* ecosystem solves this through fixed-point arithmetic, static allocation, and deterministic algorithms throughout. But how do you prove it works?
You run the harness.
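The certifiable-* internals aren't reproduced here, but the fixed-point idea can be sketched. A hypothetical Q16.16 type (names and format mine, not taken from the project) does everything in integer arithmetic, so every conforming compiler on every CPU produces the same bits:

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point value: 16 integer bits, 16
   fractional bits. All operations are integer arithmetic, so results
   are bit-identical across FPUs, compilers, and optimization levels. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)

static q16_16 q16_from_int(int32_t i) { return (q16_16)(i << 16); }

static q16_16 q16_mul(q16_16 a, q16_16 b) {
    /* Widen to 64 bits, then shift back down. Truncation is
       deterministic; arithmetic right shift on signed values is
       assumed (true on all mainstream compilers). */
    return (q16_16)(((int64_t)a * (int64_t)b) >> 16);
}
```

The trade-off is range and precision, which is why the real pipeline pairs fixed-point arithmetic with static allocation and deterministic algorithms rather than relying on it alone.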
How It Works
$ ./certifiable-harness --golden result.json.golden --output verify.json
═══════════════════════════════════════════════════════════════
Certifiable Harness v1.0.0
Platform: x86_64
═══════════════════════════════════════════════════════════════
[0] data ✓ (OK, 4 µs)
[1] training ✓ (OK, 3 µs)
[2] quant ✓ (OK, 3 µs)
[3] deploy ✓ (OK, 3 µs)
[4] inference ✓ (OK, 3 µs)
[5] monitor ✓ (OK, 4 µs)
[6] verify ✓ (OK, 8 µs)
Status: ALL STAGES PASSED ✓
Bit-identical: YES ✓
═══════════════════════════════════════════════════════════════

If any stage produces a different hash, the harness tells you exactly which one diverged. No more “it works on my machine” — either the hashes match or they don’t.
The Golden Reference
The harness generates a 368-byte golden reference file containing:
| Offset | Size | Field |
|---|---|---|
| 0x00 | 4 | Magic (“CHGR”) |
| 0x04 | 4 | Version |
| 0x08 | 32 | Platform string |
| 0x28 | 8 | Timestamp |
| 0x30 | 32 | Config hash |
| 0x50 | 32 | Harness hash |
| 0x70 | 224 | Stage commitments (7 × 32) |
| 0x150 | 32 | File hash |
The file hash covers bytes 0x00–0x14F, enabling tamper detection. If anyone modifies the golden reference, verification fails.
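The layout above is simple enough to validate in a few lines. This sketch (my own helper names, using the offsets from the table; the SHA-256 comparison against the 32 bytes at 0x150 is elided) shows a structural check on a golden-reference buffer:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets and sizes from the golden-reference layout table.
   A full verifier would also SHA-256 bytes 0x00-0x14F and compare
   against the file hash stored at 0x150. */
enum {
    GOLDEN_SIZE   = 368,
    OFF_MAGIC     = 0x00,
    OFF_STAGES    = 0x70,
    OFF_FILE_HASH = 0x150,
    STAGE_COUNT   = 7,
    HASH_LEN      = 32
};

/* Minimal structural sanity check: right size, right magic. */
static int golden_check_magic(const uint8_t *buf, size_t len) {
    return len == GOLDEN_SIZE && memcmp(buf + OFF_MAGIC, "CHGR", 4) == 0;
}

/* Pointer to stage i's 32-byte commitment, or NULL if out of range. */
static const uint8_t *golden_stage(const uint8_t *buf, int i) {
    if (i < 0 || i >= STAGE_COUNT) return NULL;
    return buf + OFF_STAGES + (size_t)i * HASH_LEN;
}
```

Note that the seven stage commitments occupy 0x70 through 0x14F, so the file hash at 0x150 covers every one of them.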
Generating a Golden Reference
./certifiable-harness --generate-golden --output result.json

This produces:
- result.json — Human-readable report
- result.json.golden — 368-byte binary for cross-platform comparison
Verifying Against Golden
Copy the golden file to another platform and run:
./certifiable-harness --golden result.json.golden --output their_result.json

If the hashes match: Bit-identical: YES ✓
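In CI this becomes a one-line gate. Assuming the harness exits non-zero when any stage diverges (conventional for CLI tools, but not confirmed by the output above), a verification step might look like:

```shell
# Hypothetical CI step: fail the build if any stage hash diverges
# from the checked-in golden reference.
./certifiable-harness --golden result.json.golden --output ci_result.json \
  || { echo "Determinism regression: stage hashes diverged"; exit 1; }
```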
Verified Platforms
| Platform | OS | Compiler | Result |
|---|---|---|---|
| x86_64 | Linux (Ubuntu) | GCC 12.2.0 | ✓ Bit-identical |
| x86_64 | macOS 11.7 | Apple Clang | ✓ Bit-identical |
| aarch64 | — | — | Pending |
| riscv64 | — | — | Pending |
The Linux and macOS results were generated on different machines, different operating systems, different compilers. Same hashes.
Seven Pipeline Stages
Each stage corresponds to a certifiable-* project:
| Stage | Project | Commitment |
|---|---|---|
| 0 | certifiable-data | Merkle root of batches |
| 1 | certifiable-training | Training chain hash |
| 2 | certifiable-quant | Quantization certificate |
| 3 | certifiable-deploy | Attestation root |
| 4 | certifiable-inference | Predictions hash |
| 5 | certifiable-monitor | Ledger digest |
| 6 | certifiable-verify | Report hash |
The harness runs them in sequence, passing context between stages. Each stage’s commitment includes the previous stage’s commitment, forming an unbroken cryptographic chain.
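The chaining rule is the important part: commit[i] = H(commit[i-1] || output[i]), so altering any stage perturbs every later commitment. The sketch below illustrates the structure with FNV-1a standing in for SHA-256 (the harness uses SHA-256; FNV-1a is used here only so the example is self-contained, and `chain_commit` is my own name):

```c
#include <stddef.h>
#include <stdint.h>

/* Toy 64-bit FNV-1a hash, a stand-in for SHA-256. */
static uint64_t fnv1a(uint64_t h, const uint8_t *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* commit[i] = H(commit[i-1] || output[i]): each stage's commitment
   absorbs the previous one, so tampering with any stage changes
   every commitment downstream of it. */
static uint64_t chain_commit(uint64_t prev, const uint8_t *out, size_t n) {
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */
    h = fnv1a(h, (const uint8_t *)&prev, sizeof prev);
    return fnv1a(h, out, n);
}
```

Because the chain is deterministic, two platforms that produce the same stage outputs must produce the same seven commitments, and a single flipped bit in stage 0 changes the stage 6 commitment.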
Test Coverage
| Component | Tests | Description |
|---|---|---|
| Harness | 4 | Orchestration, config, platform detection |
| Golden | 3 | Load, save, compare, integrity |
| Stages | 4 | Stage wrappers, dependency management |
| Report | 2 | JSON generation, console output |
4 test suites, all passing. 81 traceable requirements across 4 SRS documents.
Part of a Complete Pipeline
certifiable-harness orchestrates the entire certifiable-* ecosystem:
data → training → quant → deploy → inference → monitor → verify
↑ ↓
└──────────────── certifiable-harness ────────────────────┘

It’s the proving ground — the place where cross-platform determinism is verified, not assumed.
When You Need This
Hardware Vendors: If you’re building AI accelerators — RISC-V, custom silicon, FPGAs — certifiable-harness lets you prove your hardware produces bit-identical results to reference implementations.
Certification Bodies: The harness produces machine-verifiable evidence. No manual inspection required. Run the harness, check the hashes, file the report.
Safety-Critical Deployments: When a regulator asks “how do you know the deployed model is the same as what you tested?”, you have a 368-byte answer.
Getting Started
git clone https://github.com/williamofai/certifiable-harness.git
cd certifiable-harness
mkdir build && cd build
cmake ..
make
ctest --output-on-failure
# Generate golden reference
./certifiable-harness --generate-golden --output result.json
# Verify (should show Bit-identical: YES)
./certifiable-harness --golden result.json.golden --output verify.json

Documentation
The repository includes formal documentation suitable for certification evidence:
- CH-MATH-001.md — Mathematical specification (18KB)
- CH-STRUCT-001.md — Data structure specification
- SRS-HARNESS — Harness orchestration (16 requirements)
- SRS-GOLDEN — Golden reference (23 requirements)
- SRS-STAGES — Stage wrappers (25 requirements)
- SRS-REPORT — Report generation (17 requirements)
License
Dual licensed under GPLv3 (open source) and commercial terms for proprietary safety-critical systems. The implementation builds on the Murray Deterministic Computing Platform (UK Patent GB2521625.0).
For teams building safety-critical ML systems, certifiable-harness provides the proving ground that certification demands. As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context.