Git is an excellent version control system. It tracks source code changes, enables collaboration, and provides a complete history of who changed what and when.
But Git only tracks source. It says nothing about:
- Whether the same source produces the same binary
- What training data produced a given model
- Whether the deployed model matches the trained model
- What the system’s state was at any historical point
For safety-critical systems governed by DO-178C, IEC 62304, or ISO 26262, these gaps are disqualifying. Certification requires traceability from requirements through implementation to deployed artefacts. “Trust me, I committed it” isn’t evidence.
What Certification Actually Requires
DO-178C (Aerospace)
DO-178C Level A software - the kind that can crash an aircraft if it fails - requires:
Configuration Management: Every artefact (source, object code, executable, test data) must be identified, controlled, and traceable.
Traceability: Requirements trace to design, design traces to code, code traces to tests, tests trace to results. Bidirectionally.
Reproducibility: Given the same inputs (source, compiler, configuration), the build process must produce the same outputs. Always.
Data Integrity: Evidence that artefacts haven’t been modified since verification.
Git provides some of this - commit hashes identify source versions, history shows changes. But Git doesn’t verify that yesterday’s build produces today’s binary. Git doesn’t track which dataset trained which model. Git doesn’t attest that the deployed binary matches the verified one.
IEC 62304 (Medical Devices)
Class C medical device software - software whose failure could cause death - has similar requirements:
Software Configuration Management: Identify and control all software items, including documentation, source code, and executables.
Traceability Matrix: Map between requirements, design, implementation, and verification.
Problem Resolution: Track every issue from discovery through resolution, with evidence.
Again, Git handles source. But a neural network model isn’t source code. Training data isn’t source code. The relationship between a dataset, a training run, and a resulting model isn’t captured in commits.
ISO 26262 (Automotive)
ASIL-D software - the highest automotive safety integrity level - requires:
Configuration Management: Control of all work products, including code, documentation, and calibration data.
Change Management: Every change must be authorised, implemented, verified, and recorded.
Baseline Management: Consistent sets of items that form a release must be identified and controlled.
For autonomous vehicles using machine learning, the “calibration data” includes training datasets and model weights. These aren’t text files that diff cleanly in Git.
The ML Pipeline Problem
Machine learning makes version control harder because the artefacts are different:
Training Data
A dataset might be:
- 100GB of images
- Continuously updated from production
- Preprocessed through multiple stages
- Augmented with synthetic variations
Git isn’t designed for this. Git LFS helps with large files but doesn’t provide cryptographic integrity verification. DVC tracks data versions but doesn’t create certification-grade audit trails.
What certification needs: Proof that the exact data used to train model version X was dataset version Y, with cryptographic verification that neither has been modified.
Training Process
A training run involves:
- Random initialisation (or does it?)
- Stochastic gradient descent with shuffled batches
- Hardware-dependent floating-point behaviour
- Non-deterministic GPU operations
- Hyperparameter choices that may not be recorded
Two training runs with “the same” configuration often produce different models. Which one was deployed? Which one was verified? Can you prove it?
What certification needs: A complete, reproducible record of every training step, with cryptographic proof that the final model is the product of that exact process.
Model Artefacts
A trained model is:
- Weights (potentially gigabytes of floating-point numbers)
- Architecture definition
- Preprocessing configuration
- Postprocessing logic
- Runtime dependencies
These aren’t source code. They’re binary artefacts that must be tracked, versioned, and verified with the same rigour as source.
What certification needs: Cryptographic attestation that the deployed model is bit-identical to the verified model, with a chain of custody from training through deployment.
Merkle Chains: Cryptographic Audit Trails
The certifiable-* ecosystem addresses these requirements using Merkle chains - the same cryptographic structure that secures blockchains, but applied to ML pipeline provenance.
How Merkle Chains Work
A Merkle chain is a sequence of cryptographic hashes where each hash includes the previous hash:
H₀ = SHA256(initial_state)
H₁ = SHA256(H₀ || step_1_data)
H₂ = SHA256(H₁ || step_2_data)
...
Hₙ = SHA256(Hₙ₋₁ || step_n_data)
This creates a tamper-evident chain:
- Changing any step invalidates all subsequent hashes
- The final hash commits to the entire history
- Verification requires only the final hash and the step data
Applied to ML Pipelines
The certifiable-data project creates Merkle chains for data pipelines:
// Each data batch gets a hash that chains to previous batches
typedef struct {
uint8_t prev_hash[32]; // Previous batch hash
uint8_t data_hash[32]; // This batch's content hash
uint64_t batch_index; // Monotonic batch number
uint64_t timestamp; // Processing timestamp
uint8_t chain_hash[32]; // H(prev_hash || data_hash || metadata)
} data_provenance_t;
When training begins, the initial data state is hashed. Each preprocessing step - normalisation, augmentation, shuffling - extends the chain. The final chain hash commits to the entire data preparation process.
Training Chains
The certifiable-training project extends this to training:
// Each training step extends the Merkle chain
typedef struct {
uint8_t prev_hash[32]; // Previous step hash
uint8_t weights_hash[32]; // Current weights hash
uint8_t gradients_hash[32];// This step's gradients
uint32_t epoch;
uint32_t batch;
uint8_t chain_hash[32]; // Commits to entire training history
} training_step_t;
Every gradient update is recorded. The chain hash after training commits to:
- The initial weights
- Every batch of training data (via data chain)
- Every gradient computation
- Every weight update
- The final model
This isn’t logging - it’s cryptographic proof. Given the chain, anyone can verify that the final model is the exact product of that training process.
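Verification is a replay: walk the logged step records, recompute each link, and compare the result with the published head. A minimal sketch, again with a toy 64-bit hash in place of SHA-256 and a hypothetical `step_t` record type standing in for the project's actual structures:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Toy 64-bit FNV-1a hash standing in for SHA-256 (illustration only). */
static uint64_t mix(uint64_t h, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hypothetical per-step record: what the trainer logged. */
typedef struct {
    uint64_t weights_hash;
    uint64_t gradients_hash;
} step_t;

/* Recompute H_i = H(H_{i-1} || step_i) over the whole log and compare
   with the published head. One altered record makes the heads diverge. */
static bool chain_verify(uint64_t initial, const step_t *steps, size_t n,
                         uint64_t published_head)
{
    uint64_t h = initial;
    for (size_t i = 0; i < n; i++) {
        h = mix(h, &steps[i].weights_hash, sizeof(uint64_t));
        h = mix(h, &steps[i].gradients_hash, sizeof(uint64_t));
    }
    return h == published_head;
}
```

The verifier needs no trust in the trainer: only the step log, the initial hash, and the published head.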
Deployment Attestation
The certifiable-deploy project packages models with cryptographic attestation:
// Deployment bundle with attestation
typedef struct {
uint8_t model_hash[32]; // Hash of model weights
uint8_t training_chain[32]; // Final training chain hash
uint8_t data_chain[32]; // Final data chain hash
uint8_t source_commit[20]; // Git commit of training code
uint8_t config_hash[32]; // Hash of all configuration
uint8_t bundle_signature[64]; // Ed25519 signature over all above
} deployment_attestation_t;
The deployment bundle includes:
- The model itself
- Cryptographic proof of its provenance
- Signature from an authorised build system
Verification checks that:
- The model hash matches the bundle contents
- The training chain is valid
- The data chain is valid
- The signature is valid
If any check fails, the model is rejected. There’s no way to deploy a model without valid provenance.
Reproducible Builds
Cryptographic hashes prove that artefacts haven’t changed. But certification also requires reproducibility - the ability to recreate artefacts from source.
The Reproducibility Problem
Traditional builds are not reproducible:
$ gcc -O2 main.c -o program
$ sha256sum program
a1b2c3d4...
$ gcc -O2 main.c -o program # Same command, same source
$ sha256sum program
e5f6a7b8... # Different hash!
The difference comes from:
- Timestamps embedded in binaries
- Non-deterministic compiler optimisations
- File ordering in archives
- Environment-dependent paths
For certification, this is unacceptable. If you can’t reproduce the binary, you can’t verify it.
Deterministic Compilation
The certifiable-* projects enforce deterministic builds:
# Reproducible build flags
CFLAGS += -fno-guess-branch-probability # Deterministic optimisation
CFLAGS += -frandom-seed=fixed # Fixed randomisation seed
CFLAGS += -D__DATE__="\"Jan 01 2026\"" # Fixed date
CFLAGS += -D__TIME__="\"00:00:00\"" # Fixed time
# Reproducible archive creation
AR_FLAGS = Dcr # Deterministic mode, no timestamps
With these flags, the same source always produces the same binary. The binary hash becomes a verifiable identifier.
Deterministic Training
Neural network training is notoriously non-reproducible. The certifiable-training approach eliminates non-determinism:
Fixed-point arithmetic: No floating-point means no hardware-dependent rounding. Q16.16 operations produce identical results on any platform.
Deterministic shuffling: Feistel-based shuffling with fixed seeds, as described in The Feistel Shuffle, provides reproducible data ordering.
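The idea behind a Feistel shuffle is that a keyed Feistel network is a bijection, so it permutes indices without any table. The following is a toy sketch of the structure (the round function and key schedule here are illustrative choices, not the project's):

```c
#include <stdint.h>
#include <assert.h>

/* Round function: any deterministic mix works; this one is a toy choice. */
static uint16_t feistel_f(uint16_t r, uint32_t key)
{
    uint32_t x = (uint32_t)r * 0x9E3779B1u ^ key;
    x ^= x >> 15;
    return (uint16_t)x;
}

/* 4-round balanced Feistel network: a keyed bijection on 32-bit indices. */
static uint32_t feistel_permute(uint32_t v, const uint32_t keys[4])
{
    uint16_t l = (uint16_t)(v >> 16), r = (uint16_t)v;
    for (int i = 0; i < 4; i++) {
        uint16_t nl = r;
        uint16_t nr = l ^ feistel_f(r, keys[i]);
        l = nl;
        r = nr;
    }
    return ((uint32_t)l << 16) | r;
}

/* Running the rounds backwards recovers the input: proof of bijectivity. */
static uint32_t feistel_invert(uint32_t v, const uint32_t keys[4])
{
    uint16_t l = (uint16_t)(v >> 16), r = (uint16_t)v;
    for (int i = 3; i >= 0; i--) {
        uint16_t pr = l;
        uint16_t pl = r ^ feistel_f(l, keys[i]);
        l = pl;
        r = pr;
    }
    return ((uint32_t)l << 16) | r;
}
```

Because the network is invertible regardless of the round function, distinct indices always map to distinct positions, and a fixed key (derived from a recorded seed) yields the same shuffle on every run.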
Deterministic reduction: Gradient accumulation uses fixed ordering, eliminating parallel reduction non-determinism.
Explicit random seeds: All randomness comes from seeded PRNGs with recorded seeds.
The result: given the same data, configuration, and seeds, training produces bit-identical models on any platform.
The Evidence Package
Certification auditors don’t just want hashes - they want a complete evidence package that demonstrates compliance.
Requirements Traceability Matrix
REQ-001: System shall detect anomalies within 100ms
→ DESIGN-001: Use EMA-based baseline with O(1) update
→ CODE: baseline.c:update_monitoring()
→ TEST: test_baseline.c:test_detection_latency()
→ RESULT: PASS (avg 2.3ms, max 8.1ms)
Git tracks the code. The Merkle chain tracks the test results. Together they provide bidirectional traceability.
Configuration Baseline
A baseline captures the complete state at a release point:
{
"baseline_id": "v1.2.0",
"timestamp": "2026-01-27T10:00:00Z",
"components": {
"source": {
"repository": "speytech/certifiable-inference",
"commit": "abc123...",
"commit_hash": "sha256:def456..."
},
"training_data": {
"dataset": "sensor-readings-v3",
"chain_hash": "sha256:789abc..."
},
"model": {
"architecture": "cnn-small",
"weights_hash": "sha256:fed987...",
"training_chain": "sha256:654321..."
},
"binary": {
"platform": "arm64-linux",
"hash": "sha256:aabbcc..."
}
},
"attestation": {
"signer": "build-system-key-001",
"signature": "ed25519:..."
}
}
This baseline is a complete, verifiable snapshot. Given this document, auditors can:
- Retrieve the exact source code
- Verify the training data provenance
- Verify the model provenance
- Reproduce the binary
- Confirm nothing has been modified
Change Control Records
Every change to a baselined system requires documentation:
{
"change_id": "CR-2026-001",
"baseline_from": "v1.2.0",
"baseline_to": "v1.2.1",
"description": "Update anomaly threshold from 3.0 to 2.5 sigma",
"justification": "Field data shows 3.0 sigma misses 12% of anomalies",
"impact_analysis": {
"code_changes": ["config.h:ANOMALY_THRESHOLD"],
"test_changes": ["test_baseline.c:test_threshold_values"],
"documentation_changes": ["user-manual.md:section-4.2"]
},
"verification": {
"tests_passed": true,
"regression_clear": true,
"review_approved": "2026-01-26"
},
"approval": {
"approver": "quality-manager",
"date": "2026-01-27",
"signature": "ed25519:..."
}
}
This isn’t bureaucracy - it’s evidence that changes are controlled, reviewed, and verified. The cryptographic signatures make it tamper-evident.
Practical Implementation
Directory Structure
A certifiable project includes evidence alongside code:
project/
├── src/ # Source code (Git tracked)
├── include/ # Headers (Git tracked)
├── tests/ # Test code (Git tracked)
├── evidence/ # Certification evidence
│ ├── requirements/ # Requirements documents
│ ├── design/ # Design documents
│ ├── traceability/ # RTM and matrices
│ ├── test-results/ # Test execution records
│ └── baselines/ # Release baselines
├── chains/ # Merkle chain data
│ ├── data/ # Data provenance chains
│ ├── training/ # Training step chains
│ └── builds/ # Build attestations
└── releases/ # Signed release packages
Automated Chain Generation
The Merkle chains are generated automatically during pipeline execution:
// During data preprocessing
void process_batch(batch_t *batch, chain_t *chain) {
// Process the data
normalize(batch);
augment(batch);
// Extend the chain
uint8_t batch_hash[32];
sha256(batch->data, batch->size, batch_hash);
chain_extend(chain, batch_hash);
}
// During training
void training_step(model_t *model, batch_t *batch, chain_t *chain) {
// Compute gradients
gradient_t grads = compute_gradients(model, batch);
// Update weights
apply_gradients(model, &grads);
// Extend chain with step record
step_record_t record = {
.weights_hash = hash_weights(model),
.gradients_hash = hash_gradients(&grads),
.batch_index = batch->index
};
chain_extend(chain, &record);
}
The chain generation is part of normal operation, not a separate audit step.
Verification Tools
The certifiable-verify project provides verification tools:
# Verify a deployment bundle
$ certify-verify bundle model-v1.2.0.cbf
Verifying bundle: model-v1.2.0.cbf
Model hash: OK (matches embedded hash)
Training chain: OK (2,847 steps verified)
Data chain: OK (1,203 batches verified)
Signature: OK (signed by build-key-001)
Bundle verification: PASSED
# Reproduce and verify a build
$ certify-rebuild --baseline v1.2.0 --verify
Rebuilding from baseline v1.2.0
Fetching source: abc123...
Fetching config: def456...
Building: arm64-linux
Binary hash: sha256:aabbcc...
Expected: sha256:aabbcc...
Reproduction: MATCHED
These tools enable auditors to verify claims independently.
Integration with Git
The Merkle chain approach complements Git rather than replacing it:
Git tracks: Source code, configuration files, documentation, test code
Chains track: Data provenance, training history, build artefacts, deployment attestation
Together they provide: Complete traceability from requirements through deployment
The connection points are explicit:
{
"training_chain_final": "sha256:abc123...",
"source_commit": "git:def456...",
"link_attestation": "This training chain was produced by code at commit def456..."
}
Conclusion
Git is necessary but not sufficient for safety-critical systems. Source control tracks code changes, but certification requires tracking everything: data, training, builds, and deployments.
Merkle chains provide the cryptographic foundation:
- Tamper-evident history that proves integrity
- Verifiable provenance from data to deployment
- Reproducible builds that auditors can check
The certifiable-* ecosystem implements these patterns for ML pipelines, as detailed in Merkle Chains for ML Audit Trails and Cryptographic Execution Tracing.
The overhead is real - generating chains, storing evidence, maintaining baselines all require effort. But for systems where certification is required, this overhead is unavoidable. The choice is between systematic evidence generation and ad-hoc documentation that auditors will reject.
For systems that don’t require formal certification, the same principles provide value: reproducible builds catch “works on my machine” problems, provenance chains enable debugging, and attestation prevents accidental deployment of wrong artefacts.
Git tracks what you wrote. Merkle chains prove what you built.