Training machine learning models is inherently non-deterministic. Floating-point operations vary across platforms. Parallel reductions produce different results each run. Data shuffling depends on random number generators that aren’t truly random.
For safety-critical systems, you cannot certify what you cannot reproduce.
certifiable-training redefines training as a deterministic state evolution. Training becomes a pure function: θ_T = T^(T)(θ_0, D, seed). Given the same initial weights, data, and seed, you get the same final model — bit for bit, every time, on every platform.
The Four Mechanisms
1. Fixed-Point Arithmetic
Q16.16 for weights, Q8.24 for gradients, Q32.32 for accumulators. Same math, same result, every platform.
| Format | Use Case | Range | Precision |
|---|---|---|---|
| Q16.16 | Weights, activations | ±32768 | 1.5×10⁻⁵ |
| Q8.24 | Gradients | ±128 | 5.9×10⁻⁸ |
| Q32.32 | Accumulators | ±2³¹ | 2.3×10⁻¹⁰ |
The higher precision for gradients prevents small updates from being rounded away. The wide accumulators prevent overflow during summation.
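As a rough illustration of how such a format behaves, a Q16.16 multiply widens to 64 bits and shifts back down, so every platform truncates identically. The helper names below are hypothetical sketches, not the library's actual API:

```c
#include <stdint.h>

typedef int32_t fixed_t;              /* Q16.16: 16 integer bits, 16 fraction bits */
#define CT_FRAC_BITS 16
#define CT_ONE       (1 << CT_FRAC_BITS)

/* Convert double to Q16.16 (illustration only; a certified build
   would avoid floating point entirely on the hot path). */
static inline fixed_t ct_from_double(double x) {
    return (fixed_t)(x * CT_ONE);
}

/* Q16.16 multiply: widen to 64 bits, then shift back down.
   The truncation is bit-identical on every conforming platform. */
static inline fixed_t ct_mul(fixed_t a, fixed_t b) {
    return (fixed_t)(((int64_t)a * (int64_t)b) >> CT_FRAC_BITS);
}
```

The same widen-then-narrow pattern extends to Q8.24 and Q32.32 by changing the shift amount.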
2. Deterministic Reduction
Parallel gradient reduction is a major source of non-determinism. The order of floating-point additions affects the result. Different thread scheduling produces different sums.
certifiable-training uses fixed tree topology with Neumaier compensated summation:
```c
// Fixed reduction tree — same topology every time
//
//         [sum]
//        /     \
//    [a+b]     [c+d]
//    /   \     /   \
//   a     b   c     d
```

The tree topology is determined at compile time, not runtime. Combined with compensated arithmetic, the reduction is deterministic regardless of hardware parallelism.
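Neumaier's algorithm carries a running compensation term that captures the low-order bits each addition would otherwise discard. A minimal floating-point sketch of the algorithm (the function name is illustrative; the library applies the same idea inside its fixed-topology tree):

```c
#include <math.h>
#include <stddef.h>

/* Neumaier compensated summation: 'c' accumulates the low-order
   error lost by each addition and is folded back in at the end. */
static double neumaier_sum(const double *x, size_t n) {
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = s + x[i];
        if (fabs(s) >= fabs(x[i]))
            c += (s - t) + x[i];   /* low-order bits of x[i] were lost */
        else
            c += (x[i] - t) + s;   /* low-order bits of s were lost */
        s = t;
    }
    return s + c;
}
```

On the input {1e16, 1.0, -1e16} a naive left-to-right sum returns 0.0; the compensated sum recovers 1.0.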
3. Reproducible “Randomness”
Data shuffling and dropout require random numbers. But standard PRNGs maintain internal state that varies with execution order.
We use counter-based PRNG: PRNG(seed, op_id, step) → deterministic bits. The output depends only on the inputs, not on hidden state. Same seed produces same sequence.
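A counter-based generator can be sketched as a pure mixing function over its inputs. This example uses the well-known SplitMix64 finalizer as the mixer; the project's actual generator is not specified here, so treat this as an illustration of the statelessness, not the real algorithm:

```c
#include <stdint.h>

/* SplitMix64 finalizer: a standard invertible 64-bit integer mixer. */
static uint64_t mix64(uint64_t z) {
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Counter-based PRNG: output depends only on (seed, op_id, step).
   No hidden state, so any thread can evaluate any draw independently
   of execution order. */
static uint64_t ct_prng(uint64_t seed, uint64_t op_id, uint64_t step) {
    return mix64(mix64(mix64(seed) ^ op_id) ^ step);
}
```

Because each stage is a bijection, distinct (seed, op_id, step) triples that differ in the last-mixed component are guaranteed to produce distinct outputs.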
For data shuffling, we use a Cycle-Walking Feistel network that provides true bijection for any dataset size:
```
π: [0, N-1] → [0, N-1]   (one-to-one and onto)
```

This isn’t “shuffle and hope” — it’s a cryptographic permutation with proven properties.
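The cycle-walking construction can be sketched as follows: a Feistel network permutes the smallest power-of-two domain covering [0, N), and any output that lands outside [0, N) is simply re-encrypted until it falls inside. The round function, round count, and key handling below are illustrative stand-ins, not the project's actual cipher:

```c
#include <stdint.h>

/* Hypothetical round function: any deterministic mixer works. */
static uint32_t feistel_round(uint32_t half, uint32_t key, uint32_t round) {
    uint32_t z = half ^ key ^ (round * 0x9E3779B9u);
    z *= 0x85EBCA6Bu;  z ^= z >> 13;  z *= 0xC2B2AE35u;
    return z ^ (z >> 16);
}

/* Balanced Feistel network over 2w bits (a bijection on [0, 2^(2w))). */
static uint64_t feistel(uint64_t x, uint32_t w, uint32_t key) {
    uint64_t mask = (1ULL << w) - 1;
    uint64_t L = x >> w, R = x & mask;
    for (uint32_t r = 0; r < 4; r++) {           /* 4 rounds */
        uint64_t tmp = R;
        R = L ^ (feistel_round((uint32_t)R, key, r) & mask);
        L = tmp;
    }
    return (L << w) | R;
}

/* Cycle-walking: re-apply the cipher until the output lands in [0, n).
   Terminates because i itself lies on its own cycle, and restricting a
   permutation this way is still a bijection on [0, n). */
static uint64_t ct_permute(uint64_t i, uint64_t n, uint32_t key) {
    uint32_t w = 1;
    while ((1ULL << (2 * w)) < n) w++;           /* smallest 2w-bit domain */
    uint64_t y = i;
    do { y = feistel(y, w, key); } while (y >= n);
    return y;
}
```

Every index maps to exactly one index, so no element of the dataset is ever duplicated or dropped by the shuffle.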
4. Merkle Audit Trail
Every training step produces a cryptographic commitment:
```
h_t = SHA256(h_{t-1} || H(θ_t) || H(B_t) || t)
```

Where:

- `h_{t-1}` is the previous hash (chain link)
- `H(θ_t)` is the hash of the current weights
- `H(B_t)` is the hash of the current batch
- `t` is the step number
Any step can be independently verified. If you claim “model X was produced by training Y,” the Merkle chain proves it — or proves you’re lying.
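The chaining structure itself fits in a few lines. SHA-256 is too long to inline here, so this sketch substitutes FNV-1a purely to show how each step's hash binds the previous hash, the weights, the batch, and the step number; the real chain uses SHA-256, and these function names are hypothetical:

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a: a placeholder for SHA-256 used only to show the chaining
   structure. Never use FNV for real cryptographic attestation. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t h) {
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 0x100000001B3ULL; }
    return h;
}

/* h_t = H(h_{t-1} || H(weights) || H(batch) || t) */
static uint64_t merkle_step(uint64_t h_prev,
                            const int32_t *weights, size_t n_w,
                            const uint32_t *batch, size_t n_b,
                            uint64_t t) {
    const uint64_t basis = 0xCBF29CE484222325ULL;   /* FNV offset basis */
    uint64_t hw  = fnv1a(weights, n_w * sizeof *weights, basis);
    uint64_t hb  = fnv1a(batch, n_b * sizeof *batch, basis);
    uint64_t out = fnv1a(&h_prev, sizeof h_prev, basis);
    out = fnv1a(&hw, sizeof hw, out);
    out = fnv1a(&hb, sizeof hb, out);
    out = fnv1a(&t, sizeof t, out);
    return out;
}
```

Replaying a step with the logged inputs and comparing against the logged hash is the verification: any change to the weights, the batch, or the step number produces a different commitment.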
What’s Implemented
All core modules complete — 10/10 test suites passing:
| Module | Description |
|---|---|
| DVM Primitives | Fixed-point arithmetic with fault detection |
| Counter-based PRNG | Deterministic pseudo-random generation |
| Compensated Summation | Neumaier algorithm for precision |
| Reduction Tree | Fixed-topology parallel reduction |
| Forward Pass | Q16.16 activations (ReLU, sigmoid, tanh) |
| Backward Pass | Q8.24 gradient computation |
| Optimizers | SGD, Momentum, Adam |
| Merkle Chain | SHA256 audit trail with checkpoints |
| Data Permutation | Cycle-Walking Feistel bijection |
| Bit Identity | Cross-platform reproducibility tests |
The Fault Model
Every arithmetic operation can overflow, underflow, or divide by zero. Traditional code either ignores these (undefined behaviour) or throws exceptions (non-deterministic control flow).
certifiable-training uses sticky fault flags:
```c
typedef struct {
    uint32_t overflow  : 1;  // Saturated high
    uint32_t underflow : 1;  // Saturated low
    uint32_t div_zero  : 1;  // Division by zero
    uint32_t domain    : 1;  // Invalid input
    uint32_t precision : 1;  // Precision loss
} ct_fault_flags_t;
```

Operations saturate rather than overflow. Faults are recorded but execution continues deterministically. If any fault occurs during a training step, the Merkle chain is invalidated — you know something went wrong, and you know exactly when.
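A saturating add with sticky flags might look like the following sketch (a simplified two-flag struct with hypothetical names, not the library's full fault model):

```c
#include <stdint.h>

typedef int32_t fixed_t;
#define CT_FIXED_MAX INT32_MAX
#define CT_FIXED_MIN INT32_MIN

typedef struct {
    uint32_t overflow  : 1;
    uint32_t underflow : 1;
} flags_t;

/* Saturating add: on overflow the result clamps and the sticky flag
   is set, so control flow never branches on the fault itself and
   execution stays deterministic. */
static fixed_t sat_add(fixed_t a, fixed_t b, flags_t *f) {
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide > CT_FIXED_MAX) { f->overflow = 1;  return CT_FIXED_MAX; }
    if (wide < CT_FIXED_MIN) { f->underflow = 1; return CT_FIXED_MIN; }
    return (fixed_t)wide;
}
```

Because the flags are only ever set, never cleared, a single check at the end of a step answers "did anything go wrong anywhere?" without per-operation branching.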
Usage Example
```c
#include "ct_types.h"
#include "forward.h"
#include "backward.h"
#include "optimizer.h"
#include "merkle.h"

// All buffers pre-allocated
fixed_t weights[784 * 128];
grad_t gradients[784 * 128];
ct_fault_flags_t faults = {0};

// Initialize Merkle chain
ct_merkle_ctx_t merkle;
ct_merkle_init(&merkle, &weights_tensor, config, config_size, seed);

// Training step
ct_forward_linear(&layer, input, output, &faults);
ct_backward_linear(&layer, grad_in, &faults);
ct_sgd_step(&sgd, weights, gradients, size, &faults);

// Commit to audit trail
ct_merkle_step(&merkle, &weights_tensor, indices, batch_size,
               &step_record, &faults);

if (ct_has_fault(&faults)) {
    // Training step invalid — chain not extended
}
```

Why It Matters
Regulatory Compliance
IEC 62304 (medical devices) requires traceable, reproducible software. ISO 26262 (automotive) demands provable behaviour. DO-178C (aerospace) requires complete requirements traceability.
“We trained a neural network and it works” satisfies none of these. “Here is the cryptographic proof that this model was produced by this exact training process” is a foundation for certification.
Incident Investigation
When an autonomous vehicle makes a bad decision, investigators need to understand why. With Merkle-chained training, you can prove exactly what training data and process produced the model. You can replay any training step and verify it matches the logged hash.
Model Provenance
In an era of model theft and supply chain attacks, proving where a model came from matters. The Merkle chain provides cryptographic attestation of model lineage.
The Certifiable Pipeline
certifiable-training is part of a complete deterministic ML pipeline:
| Project | Purpose |
|---|---|
| certifiable-data | Deterministic data loading, shuffling, augmentation |
| certifiable-training | Deterministic training with Merkle audit |
| certifiable-inference | Deterministic inference |
The chain is complete: deterministic data → deterministic training → deterministic inference. End-to-end reproducibility.
Getting Started
```
git clone https://github.com/williamofai/certifiable-training
cd certifiable-training
mkdir build && cd build
cmake ..
make
make test
```

Expected output:

```
100% tests passed, 0 tests failed out of 10
Total Test time (real) = 0.04 sec
```

Documentation
- CT-MATH-001.md — Mathematical foundations
- CT-STRUCT-001.md — Data structure specifications
- docs/requirements/ — SRS documents with full traceability
Training as a pure function. Merkle-chained proof. GPL-3.0 licensed.