Deploying an ML model isn’t the end of the story. In safety-critical systems, you need to know — with cryptographic certainty — when that model is operating outside its certified envelope. Standard monitoring tools use floating-point statistics, produce non-deterministic results, and leave no verifiable audit trail.
certifiable-monitor changes that.
The Problem
When an ML model runs in production, things drift:
- Input distributions shift — Real-world data diverges from training
- Activations exceed bounds — Internal values go where they shouldn’t
- Output patterns change — Predictions no longer match expectations
- Faults accumulate silently — Overflow and saturation events go unnoticed
Current monitoring approaches have fundamental problems for certification:
Non-deterministic metrics. Floating-point drift calculations produce different results on different platforms. How do you validate something that changes?
No audit trail. When an incident occurs, there’s no cryptographic proof of what the monitor observed. It’s your word against the logs.
Ambiguous reactions. “Log a warning” isn’t a deterministic specification. What action, exactly, should the system take when TV exceeds 0.15?
For DO-178C Level A, IEC 62304 Class C, or ISO 26262 ASIL-D certification, “the model drifted so we logged it” isn’t acceptable evidence.
The Solution
certifiable-monitor provides deterministic runtime monitoring through three core mechanisms:
1. Fixed-Point Drift Detection
All statistical metrics computed in fixed-point arithmetic:
Total Variation (TV) — The safest detector, no logarithms required:
TV(p, q) = (1/2) Σ_b |p_b - q_b|

Output in Q0.32. Zero means identical distributions. UINT32_MAX means completely disjoint.
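As a sketch of what such a detector can look like in fixed point: the function name, the histogram representation, and the saturation behavior below are illustrative assumptions, not the library's actual API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative TV distance over two histograms whose bins are
 * probabilities in Q0.32 (uint32_t, where UINT32_MAX ~ 1.0).
 * Implements TV(p, q) = (1/2) * sum_b |p_b - q_b| with saturation.
 * Integer-only, so the result is bit-identical on every platform. */
uint32_t tv_q32(const uint32_t *p, const uint32_t *q, size_t bins)
{
    uint64_t acc = 0;
    for (size_t b = 0; b < bins; b++) {
        /* absolute difference without signed overflow */
        uint64_t diff = (p[b] > q[b]) ? (uint64_t)p[b] - q[b]
                                      : (uint64_t)q[b] - p[b];
        acc += diff;           /* fits in 64 bits for any realistic bin count */
    }
    acc >>= 1;                 /* the 1/2 factor */
    return acc > UINT32_MAX ? UINT32_MAX : (uint32_t)acc;  /* saturate to Q0.32 */
}
```

Identical histograms return 0; fully disjoint ones return UINT32_MAX, matching the range described above.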
Jensen-Shannon Divergence (JSD) — Symmetric divergence measure:
JSD(p, q) = (1/2) KL(p ∥ m) + (1/2) KL(q ∥ m), where m = (p + q)/2

Uses a 512-entry LUT for log2 computation. No floating point. Bit-identical on x86, ARM, and RISC-V.
Population Stability Index (PSI) — Directional sensitivity:
PSI(p, q) = Σ_b (p_b - q_b) ln(p_b / q_b)

Epsilon smoothing prevents log(0). Policy defines operational thresholds.
Same inputs produce the same drift scores. Every time. Every platform.
2. Cryptographic Audit Ledger
Every monitoring event is logged to a SHA-256 hash chain:
L_0 = SHA256("CM:LEDGER:GENESIS:v1" ∥ R ∥ H_P)
L_t = SHA256("CM:LEDGER:v1" ∥ L_{t-1} ∥ e_t)

The genesis block binds to the deployment bundle root R and the policy hash H_P. Every subsequent entry chains to the previous digest.
Tampering is detectable. Truncation is detectable. Reordering is detectable. Post-incident analysis can replay the entire monitoring history with cryptographic verification.
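The chaining structure can be sketched in a few lines. To keep the example self-contained, SHA-256 is replaced with a toy 64-bit FNV-1a stand-in, and all names are illustrative; only the chain construction and the replay-based verification mirror the scheme above.

```c
#include <stdint.h>
#include <string.h>

/* Toy 64-bit FNV-1a digest. Stand-in for SHA-256: it illustrates the
 * chaining structure only and offers no cryptographic security. */
static uint64_t toy_hash(const void *data, size_t len, uint64_t seed)
{
    const uint8_t *p = data;
    uint64_t h = seed ^ 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

typedef struct { uint64_t link; } ledger_t;

/* Genesis: bind the chain to bundle root R and policy hash H_P. */
void ledger_genesis(ledger_t *l, uint64_t R, uint64_t H_P)
{
    uint64_t seed[2] = { R, H_P };
    l->link = toy_hash(seed, sizeof seed, 0);  /* real impl adds a domain tag */
}

/* Append event e_t: L_t = H(L_{t-1} || e_t). */
void ledger_append(ledger_t *l, const void *event, size_t len)
{
    l->link = toy_hash(event, len, l->link);
}

/* Offline verification: replay all events from genesis and compare
 * the recomputed head against the expected digest. */
int ledger_verify(uint64_t R, uint64_t H_P,
                  const char **events, size_t n, uint64_t expect)
{
    ledger_t l;
    ledger_genesis(&l, R, H_P);
    for (size_t i = 0; i < n; i++)
        ledger_append(&l, events[i], strlen(events[i]));
    return l.link == expect;
}
```

A modified event, a dropped tail entry, or a reordered pair all change the replayed head digest, which is how tampering, truncation, and reordering become detectable.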
3. Deterministic Health FSM
A state machine with formally defined transitions:
UNINIT → INIT → ENABLED → ALARM → DEGRADED → STOPPED

Fault budgets define thresholds. Violations trigger transitions. Once stopped, only manual intervention restarts the monitor. No ambiguity about system state.
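A minimal sketch of such a machine follows. The state names match the chain above, but the single fault counter and the numeric budgets are simplifying assumptions; the real transition semantics are defined in CM-ARCH-MATH-001.

```c
/* Sketch of a monotone health FSM with fault budgets. */
typedef enum { UNINIT, INIT, ENABLED, ALARM, DEGRADED, STOPPED } state_t;

typedef struct {
    state_t  state;
    unsigned faults;                          /* violations seen so far */
    unsigned alarm_at, degrade_at, stop_at;   /* fault budgets */
} health_t;

void health_init(health_t *h, unsigned alarm_at,
                 unsigned degrade_at, unsigned stop_at)
{
    h->state = INIT;
    h->faults = 0;
    h->alarm_at = alarm_at;
    h->degrade_at = degrade_at;
    h->stop_at = stop_at;
}

void health_enable(health_t *h)
{
    if (h->state == INIT) h->state = ENABLED;
}

/* Each violation consumes fault budget; transitions only move forward.
 * STOPPED is absorbing: no event moves the machine out of it. */
void health_report(health_t *h)
{
    if (h->state == UNINIT || h->state == INIT || h->state == STOPPED)
        return;
    h->faults++;
    if      (h->faults >= h->stop_at)    h->state = STOPPED;
    else if (h->faults >= h->degrade_at) h->state = DEGRADED;
    else if (h->faults >= h->alarm_at)   h->state = ALARM;
}
```

Because the transition function is a pure function of the current state and the fault counter, the machine's behavior is fully reproducible from the ledger's event history.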
What’s Implemented
| Module | Purpose | Tests |
|---|---|---|
| DVM Primitives | Saturating arithmetic, LUT log2 | 33 |
| Audit Ledger | SHA-256 hash chain | 18 |
| Drift Detectors | TV, JSD, PSI computation | 20 |
| Policy Parser | COE JSON parsing, JCS hash | 25 |
| Input Monitor | Feature envelope checking | 22 |
| Activation Monitor | Layer bounds checking | 24 |
| Output Monitor | Output envelope checking | 19 |
| Health FSM | Monitor state machine | 19 |
| Reaction Handler | Violation → action mapping | 14 |
| Ledger Verification | Offline chain verification | 32 |
| Bit-Identity | Cross-platform determinism | 27 |
Every module traces to formal specifications in CM-MATH-001, CM-STRUCT-001, and the SRS documents.
Usage Example
#include "policy.h"
#include "input.h"
#include "health.h"
#include "ledger.h"
#include "react.h"
cm_fault_flags_t faults = {0};
// Load policy and initialize ledger
cm_policy_t policy;
cm_policy_parse(policy_json, policy_len, &policy, &faults);
cm_ledger_ctx_t ledger;
cm_ledger_init(&ledger);
cm_ledger_genesis(&ledger, policy.bundle_root, policy.policy_hash, &faults);
// Initialize monitors
cm_input_ctx_t input_mon;
cm_input_init(&input_mon, &policy.input);
cm_health_ctx_t health;
cm_health_init(&health, &policy.fault_budget);
cm_health_enable(&health);
// Per-inference: check input envelope
cm_input_result_t result;
cm_input_check(&input_mon, input_vector, num_features, &result, &faults);
if (result.violations > 0) {
// Log to cryptographic ledger
uint8_t L_out[32];
cm_ledger_append_violation(&ledger, window_id, CM_VIOL_INPUT_RANGE,
result.first_violation_idx,
result.first_violation_value,
result.first_violation_bound,
L_out, &faults);
// Get policy-defined reaction
cm_reaction_t action = cm_policy_get_reaction(&policy, CM_VIOL_INPUT_RANGE);
// Update health state
cm_health_report_violation(&health, CM_VIOL_INPUT_RANGE);
}
// Check if system should halt
if (cm_health_get_state(&health) == CM_HEALTH_STOPPED) {
// Emergency stop — do not proceed with inference
}

All buffers are statically allocated. No malloc. Deterministic execution path.
The Pipeline
certifiable-monitor completes the deterministic ML ecosystem:
certifiable-data → certifiable-training → certifiable-quant → certifiable-deploy → certifiable-inference
↓
certifiable-monitor
↓
Audit Ledger

The monitor receives:
- From certifiable-deploy: Bundle attestation root and policy hash
- From certifiable-inference: Input vectors, activation values, output vectors, fault flags
- From policy: Thresholds, envelopes, reaction mappings
Six interlocking projects. One coherent vision: deterministic ML from data to monitored production.
Why This Matters
Medical Devices
IEC 62304 Class C requires traceable, reproducible software. When a diagnostic AI flags an anomaly, the response must be deterministic. The audit trail must be verifiable.
Autonomous Vehicles
ISO 26262 ASIL-D demands provable behavior under all conditions. Input drift detection with cryptographic proof isn’t optional — it’s the difference between “we think the model was stable” and “here’s the hash chain proving it.”
Aerospace
DO-178C Level A requires complete requirements traceability. Every drift metric traces to CM-MATH-001. Every state transition traces to CM-ARCH-MATH-001. Every test traces to an SRS requirement.
This is the monitoring layer that makes ML certification possible.
Getting Started
git clone https://github.com/williamofai/certifiable-monitor
cd certifiable-monitor
mkdir build && cd build
cmake ..
make
make test-all   # 253 tests

Expected output:

100% tests passed, 0 tests failed out of 11

Documentation
The implementation traces to formal specifications:
- CM-MATH-001 — Mathematical foundations (drift metrics, ledger hashing, log2 LUT)
- CM-STRUCT-001 — Data structure specifications
- CM-ARCH-MATH-001 — Architecture-level math (health FSM, window semantics)
- SRS-001 through SRS-008 — Module requirements with full traceability
Every function documents its traceability reference. Every test validates a specification clause.
The Trade-Off
Deterministic monitoring isn’t free. Fixed-point arithmetic requires careful scaling. Hash chain updates add overhead. Static allocation means pre-sized buffers.
For systems where “it probably works” is acceptable, standard monitoring tools are simpler.
For systems where lives depend on the answer — where regulators demand proof, where post-incident analysis requires cryptographic verification, where “the model drifted” needs to be a traceable, reproducible, auditable event — certifiable-monitor provides the foundation.
As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context.
Built by SpeyTech in the Scottish Highlands. 30 years of UNIX systems engineering applied to making ML safe enough to certify.