Git is an excellent version control system. It tracks source code changes, enables collaboration, and provides a complete history of who changed what and when.
But Git only tracks source. It says nothing about:
- Whether the same source produces the same binary
- What training data produced a given model
- Whether the deployed model matches the trained model
- What the system’s state was at any historical point
For safety-critical systems governed by DO-178C, IEC 62304, or ISO 26262, these gaps are disqualifying. Certification requires traceability from requirements through implementation to deployed artefacts. “Trust me, I committed it” isn’t evidence.
What Certification Actually Requires
DO-178C (Aerospace)
DO-178C Level A software - the kind that can crash an aircraft if it fails - requires:
Configuration Management: Every artefact (source, object code, executable, test data) must be identified, controlled, and traceable.
Traceability: Requirements trace to design, design traces to code, code traces to tests, tests trace to results. Bidirectionally.
Reproducibility: Given the same inputs (source, compiler, configuration), the build process must produce the same outputs. Always.
Data Integrity: Evidence that artefacts haven’t been modified since verification.
Git provides some of this - commit hashes identify source versions, history shows changes. But Git doesn’t verify that yesterday’s build produces today’s binary. Git doesn’t track which dataset trained which model. Git doesn’t attest that the deployed binary matches the verified one.
IEC 62304 (Medical Devices)
Class C medical device software - software whose failure could cause death - has similar requirements:
Software Configuration Management: Identify and control all software items, including documentation, source code, and executables.
Traceability Matrix: Map between requirements, design, implementation, and verification.
Problem Resolution: Track every issue from discovery through resolution, with evidence.
Again, Git handles source. But a neural network model isn’t source code. Training data isn’t source code. The relationship between a dataset, a training run, and a resulting model isn’t captured in commits.
ISO 26262 (Automotive)
ASIL-D software - the highest automotive safety integrity level - requires:
Configuration Management: Control of all work products, including code, documentation, and calibration data.
Change Management: Every change must be authorised, implemented, verified, and recorded.
Baseline Management: Consistent sets of items that form a release must be identified and controlled.
For autonomous vehicles using machine learning, the “calibration data” includes training datasets and model weights. These aren’t text files that diff cleanly in Git.
The ML Pipeline Problem
Machine learning makes version control harder because the artefacts are different:
Training Data
A dataset might be:
- 100GB of images
- Continuously updated from production
- Preprocessed through multiple stages
- Augmented with synthetic variations
Git isn’t designed for this. Git LFS helps with large files but doesn’t provide cryptographic integrity verification. DVC tracks data versions but doesn’t create certification-grade audit trails.
What certification needs: Proof that the exact data used to train model version X was dataset version Y, with cryptographic verification that neither has been modified.
Training Process
A training run involves:
- Random initialisation (or does it?)
- Stochastic gradient descent with shuffled batches
- Hardware-dependent floating-point behaviour
- Non-deterministic GPU operations
- Hyperparameter choices that may not be recorded
Two training runs with “the same” configuration often produce different models. Which one was deployed? Which one was verified? Can you prove it?
What certification needs: A complete, reproducible record of every training step, with cryptographic proof that the final model is the product of that exact process.
Model Artefacts
A trained model is:
- Weights (potentially gigabytes of floating-point numbers)
- Architecture definition
- Preprocessing configuration
- Postprocessing logic
- Runtime dependencies
These aren’t source code. They’re binary artefacts that must be tracked, versioned, and verified with the same rigour as source.
What certification needs: Cryptographic attestation that the deployed model is bit-identical to the verified model, with a chain of custody from training through deployment.
Merkle Chains: Cryptographic Audit Trails
The certifiable-* ecosystem addresses these requirements using Merkle chains - the same cryptographic structure that secures blockchains, but applied to ML pipeline provenance.
How Merkle Chains Work
A Merkle chain is a sequence of cryptographic hashes where each hash includes the previous hash:
H₀ = SHA256(initial_state)
H₁ = SHA256(H₀ || step_1_data)
H₂ = SHA256(H₁ || step_2_data)
...
Hₙ = SHA256(Hₙ₋₁ || step_n_data)
This creates a tamper-evident chain:
- Changing any step invalidates all subsequent hashes
- The final hash commits to the entire history
- Verification requires only the final hash and the step data
Applied to ML Pipelines
The certifiable-data project creates Merkle chains for data pipelines:
// Each data batch gets a hash that chains to previous batches
typedef struct {
uint8_t prev_hash[32]; // Previous batch hash
uint8_t data_hash[32]; // This batch's content hash
uint64_t batch_index; // Monotonic batch number
uint64_t timestamp; // Processing timestamp
uint8_t chain_hash[32]; // H(prev_hash || data_hash || metadata)
} data_provenance_t;
When training begins, the initial data state is hashed. Each preprocessing step - normalisation, augmentation, shuffling - extends the chain. The final chain hash commits to the entire data preparation process.
Training Chains
The certifiable-training project extends this to training:
// Each training step extends the Merkle chain
typedef struct {
uint8_t prev_hash[32]; // Previous step hash
uint8_t weights_hash[32]; // Current weights hash
uint8_t gradients_hash[32];// This step's gradients
uint32_t epoch;
uint32_t batch;
uint8_t chain_hash[32]; // Commits to entire training history
} training_step_t;
Every gradient update is recorded. The chain hash after training commits to:
- The initial weights
- Every batch of training data (via data chain)
- Every gradient computation
- Every weight update
- The final model
This isn’t logging - it’s cryptographic proof. Given the chain, anyone can verify that the final model is the exact product of that training process.
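Verification is a replay: walk the logged step records, recompute each link, and compare the result with the published head. A minimal sketch, again with a toy 64-bit hash in place of SHA-256 and a hypothetical `step_t` record type standing in for the project's actual structures:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Toy 64-bit FNV-1a hash standing in for SHA-256 (illustration only). */
static uint64_t mix(uint64_t h, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hypothetical per-step record: what the trainer logged. */
typedef struct {
    uint64_t weights_hash;
    uint64_t gradients_hash;
} step_t;

/* Recompute H_i = H(H_{i-1} || step_i) over the whole log and compare
   with the published head. One altered record makes the heads diverge. */
static bool chain_verify(uint64_t initial, const step_t *steps, size_t n,
                         uint64_t published_head)
{
    uint64_t h = initial;
    for (size_t i = 0; i < n; i++) {
        h = mix(h, &steps[i].weights_hash, sizeof(uint64_t));
        h = mix(h, &steps[i].gradients_hash, sizeof(uint64_t));
    }
    return h == published_head;
}
```

The verifier needs no trust in the trainer: only the step log, the initial hash, and the published head.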
Deployment Attestation
The certifiable-deploy project packages models with cryptographic attestation:
// Deployment bundle with attestation
typedef struct {
uint8_t model_hash[32]; // Hash of model weights
uint8_t training_chain[32]; // Final training chain hash
uint8_t data_chain[32]; // Final data chain hash
uint8_t source_commit[20]; // Git commit of training code
uint8_t config_hash[32]; // Hash of all configuration
uint8_t bundle_signature[64]; // Ed25519 signature over all above
} deployment_attestation_t;
The deployment bundle includes:
- The model itself
- Cryptographic proof of its provenance
- Signature from an authorised build system
Verification checks that:
- The model hash matches the bundle contents
- The training chain is valid
- The data chain is valid
- The signature is valid
If any check fails, the model is rejected. There’s no way to deploy a model without valid provenance.
Reproducible Builds
Cryptographic hashes prove that artefacts haven’t changed. But certification also requires reproducibility - the ability to recreate artefacts from source.
The Reproducibility Problem
Traditional builds are not reproducible:
$ gcc -O2 main.c -o program
$ sha256sum program
a1b2c3d4...
$ gcc -O2 main.c -o program # Same command, same source
$ sha256sum program
e5f6a7b8... # Different hash!
The difference comes from:
- Timestamps embedded in binaries
- Non-deterministic compiler optimisations
- File ordering in archives
- Environment-dependent paths
For certification, this is unacceptable. If you can’t reproduce the binary, you can’t verify it.
Deterministic Compilation
The certifiable-* projects enforce deterministic builds:
# Reproducible build flags
CFLAGS += -fno-guess-branch-probability # Deterministic optimisation
CFLAGS += -frandom-seed=fixed # Fixed randomisation seed
CFLAGS += -D__DATE__="\"Jan 01 2026\"" # Fixed date
CFLAGS += -D__TIME__="\"00:00:00\"" # Fixed time
# Reproducible archive creation
AR_FLAGS = Dcr # Deterministic mode, no timestamps
With these flags, the same source always produces the same binary. The binary hash becomes a verifiable identifier.
Deterministic Training
Neural network training is notoriously non-reproducible. The certifiable-training approach eliminates non-determinism:
Fixed-point arithmetic: No floating-point means no hardware-dependent rounding. Q16.16 operations produce identical results on any platform.
Deterministic shuffling: Feistel-based shuffling with fixed seeds, as described in The Feistel Shuffle, provides reproducible data ordering.
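The idea behind a Feistel shuffle is that a keyed Feistel network is a bijection, so it permutes indices without any table. The following is a toy sketch of the structure (the round function and key schedule here are illustrative choices, not the project's):

```c
#include <stdint.h>
#include <assert.h>

/* Round function: any deterministic mix works; this one is a toy choice. */
static uint16_t feistel_f(uint16_t r, uint32_t key)
{
    uint32_t x = (uint32_t)r * 0x9E3779B1u ^ key;
    x ^= x >> 15;
    return (uint16_t)x;
}

/* 4-round balanced Feistel network: a keyed bijection on 32-bit indices. */
static uint32_t feistel_permute(uint32_t v, const uint32_t keys[4])
{
    uint16_t l = (uint16_t)(v >> 16), r = (uint16_t)v;
    for (int i = 0; i < 4; i++) {
        uint16_t nl = r;
        uint16_t nr = l ^ feistel_f(r, keys[i]);
        l = nl;
        r = nr;
    }
    return ((uint32_t)l << 16) | r;
}

/* Running the rounds backwards recovers the input: proof of bijectivity. */
static uint32_t feistel_invert(uint32_t v, const uint32_t keys[4])
{
    uint16_t l = (uint16_t)(v >> 16), r = (uint16_t)v;
    for (int i = 3; i >= 0; i--) {
        uint16_t pr = l;
        uint16_t pl = r ^ feistel_f(l, keys[i]);
        l = pl;
        r = pr;
    }
    return ((uint32_t)l << 16) | r;
}
```

Because the network is invertible regardless of the round function, distinct indices always map to distinct positions, and a fixed key (derived from a recorded seed) yields the same shuffle on every run.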
Deterministic reduction: Gradient accumulation uses fixed ordering, eliminating parallel reduction non-determinism.
Explicit random seeds: All randomness comes from seeded PRNGs with recorded seeds.
The result: given the same data, configuration, and seeds, training produces bit-identical models on any platform.
The Evidence Package
Certification auditors don’t just want hashes - they want a complete evidence package that demonstrates compliance.
Requirements Traceability Matrix
REQ-001: System shall detect anomalies within 100ms
→ DESIGN-001: Use EMA-based baseline with O(1) update
→ CODE: baseline.c:update_monitoring()
→ TEST: test_baseline.c:test_detection_latency()
→ RESULT: PASS (avg 2.3ms, max 8.1ms)
Git tracks the code. The Merkle chain tracks the test results. Together they provide bidirectional traceability.
Configuration Baseline
A baseline captures the complete state at a release point:
{
"baseline_id": "v1.2.0",
"timestamp": "2026-01-27T10:00:00Z",
"components": {
"source": {
"repository": "speytech/certifiable-inference",
"commit": "abc123...",
"commit_hash": "sha256:def456..."
},
"training_data": {
"dataset": "sensor-readings-v3",
"chain_hash": "sha256:789abc..."
},
"model": {
"architecture": "cnn-small",
"weights_hash": "sha256:fed987...",
"training_chain": "sha256:654321..."
},
"binary": {
"platform": "arm64-linux",
"hash": "sha256:aabbcc..."
}
},
"attestation": {
"signer": "build-system-key-001",
"signature": "ed25519:..."
}
}
This baseline is a complete, verifiable snapshot. Given this document, auditors can:
- Retrieve the exact source code
- Verify the training data provenance
- Verify the model provenance
- Reproduce the binary
- Confirm nothing has been modified
Change Control Records
Every change to a baselined system requires documentation:
{
"change_id": "CR-2026-001",
"baseline_from": "v1.2.0",
"baseline_to": "v1.2.1",
"description": "Update anomaly threshold from 3.0 to 2.5 sigma",
"justification": "Field data shows 3.0 sigma misses 12% of anomalies",
"impact_analysis": {
"code_changes": ["config.h:ANOMALY_THRESHOLD"],
"test_changes": ["test_baseline.c:test_threshold_values"],
"documentation_changes": ["user-manual.md:section-4.2"]
},
"verification": {
"tests_passed": true,
"regression_clear": true,
"review_approved": "2026-01-26"
},
"approval": {
"approver": "quality-manager",
"date": "2026-01-27",
"signature": "ed25519:..."
}
}
This isn’t bureaucracy - it’s evidence that changes are controlled, reviewed, and verified. The cryptographic signatures make it tamper-evident.
Practical Implementation
Directory Structure
A certifiable project includes evidence alongside code:
project/
├── src/ # Source code (Git tracked)
├── include/ # Headers (Git tracked)
├── tests/ # Test code (Git tracked)
├── evidence/ # Certification evidence
│ ├── requirements/ # Requirements documents
│ ├── design/ # Design documents
│ ├── traceability/ # RTM and matrices
│ ├── test-results/ # Test execution records
│ └── baselines/ # Release baselines
├── chains/ # Merkle chain data
│ ├── data/ # Data provenance chains
│ ├── training/ # Training step chains
│ └── builds/ # Build attestations
└── releases/ # Signed release packages
Automated Chain Generation
The Merkle chains are generated automatically during pipeline execution:
// During data preprocessing
void process_batch(batch_t *batch, chain_t *chain) {
// Process the data
normalize(batch);
augment(batch);
// Extend the chain
uint8_t batch_hash[32];
sha256(batch->data, batch->size, batch_hash);
chain_extend(chain, batch_hash);
}
// During training
void training_step(model_t *model, batch_t *batch, chain_t *chain) {
// Compute gradients
gradient_t grads = compute_gradients(model, batch);
// Update weights
apply_gradients(model, &grads);
// Extend chain with step record
step_record_t record = {
.weights_hash = hash_weights(model),
.gradients_hash = hash_gradients(&grads),
.batch_index = batch->index
};
chain_extend(chain, &record);
}
The chain generation is part of normal operation, not a separate audit step.
Verification Tools
The certifiable-verify project provides verification tools:
# Verify a deployment bundle
$ certify-verify bundle model-v1.2.0.cbf
Verifying bundle: model-v1.2.0.cbf
Model hash: OK (matches embedded hash)
Training chain: OK (2,847 steps verified)
Data chain: OK (1,203 batches verified)
Signature: OK (signed by build-key-001)
Bundle verification: PASSED
# Reproduce and verify a build
$ certify-rebuild --baseline v1.2.0 --verify
Rebuilding from baseline v1.2.0
Fetching source: abc123...
Fetching config: def456...
Building: arm64-linux
Binary hash: sha256:aabbcc...
Expected: sha256:aabbcc...
Reproduction: MATCHED
These tools enable auditors to verify claims independently.
Integration with Git
The Merkle chain approach complements Git rather than replacing it:
Git tracks: Source code, configuration files, documentation, test code
Chains track: Data provenance, training history, build artefacts, deployment attestation
Together they provide: Complete traceability from requirements through deployment
The connection points are explicit:
{
"training_chain_final": "sha256:abc123...",
"source_commit": "git:def456...",
"link_attestation": "This training chain was produced by code at commit def456..."
}
Conclusion
Git is necessary but not sufficient for safety-critical systems. Source control tracks code changes, but certification requires tracking everything: data, training, builds, and deployments.
Merkle chains provide the cryptographic foundation:
- Tamper-evident history that proves integrity
- Verifiable provenance from data to deployment
- Reproducible builds that auditors can check
The certifiable-* ecosystem implements these patterns for ML pipelines, as detailed in Merkle Chains for ML Audit Trails and Cryptographic Execution Tracing.
The overhead is real - generating chains, storing evidence, maintaining baselines all require effort. But for systems where certification is required, this overhead is unavoidable. The choice is between systematic evidence generation and ad-hoc documentation that auditors will reject.
For systems that don’t require formal certification, the same principles provide value: reproducible builds catch “works on my machine” problems, provenance chains enable debugging, and attestation prevents accidental deployment of wrong artefacts.
Git tracks what you wrote. Merkle chains prove what you built.