Deploying an ML model isn’t the end of the story. In safety-critical systems, you need to know — with cryptographic certainty — when that model is operating outside its certified envelope. Standard monitoring tools use floating-point statistics, produce non-deterministic results, and leave no verifiable audit trail.
certifiable-monitor changes that.
The Problem
When an ML model runs in production, things drift:
- Input distributions shift — Real-world data diverges from training
- Activations exceed bounds — Internal values go where they shouldn’t
- Output patterns change — Predictions no longer match expectations
- Faults accumulate silently — Overflow and saturation events go unnoticed
Current monitoring approaches have fundamental problems for certification:
Non-deterministic metrics. Floating-point drift calculations produce different results on different platforms. How do you validate something that changes?
No audit trail. When an incident occurs, there’s no cryptographic proof of what the monitor observed. It’s your word against the logs.
Ambiguous reactions. “Log a warning” isn’t a deterministic specification. What action, exactly, should the system take when TV exceeds 0.15?
For DO-178C Level A, IEC 62304 Class C, or ISO 26262 ASIL-D certification, “the model drifted so we logged it” isn’t acceptable evidence.
The Solution
certifiable-monitor provides deterministic runtime monitoring through three core mechanisms:
1. Fixed-Point Drift Detection
All statistical metrics computed in fixed-point arithmetic:
Total Variation (TV) — The safest detector, no logarithms required:
TV(p, q) = (1/2) Σ_b |p_b - q_b|

Output in Q0.32. Zero means identical distributions. UINT32_MAX means completely disjoint.
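As a sketch of what such a detector can look like in fixed point: the function name, the histogram representation, and the saturation behavior below are illustrative assumptions, not the library's actual API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative TV distance over two histograms whose bins are
 * probabilities in Q0.32 (uint32_t, where UINT32_MAX ~ 1.0).
 * Implements TV(p, q) = (1/2) * sum_b |p_b - q_b| with saturation.
 * Integer-only, so the result is bit-identical on every platform. */
uint32_t tv_q32(const uint32_t *p, const uint32_t *q, size_t bins)
{
    uint64_t acc = 0;
    for (size_t b = 0; b < bins; b++) {
        /* absolute difference without signed overflow */
        uint64_t diff = (p[b] > q[b]) ? (uint64_t)p[b] - q[b]
                                      : (uint64_t)q[b] - p[b];
        acc += diff;           /* fits in 64 bits for any realistic bin count */
    }
    acc >>= 1;                 /* the 1/2 factor */
    return acc > UINT32_MAX ? UINT32_MAX : (uint32_t)acc;  /* saturate to Q0.32 */
}
```

Identical histograms return 0; fully disjoint ones return UINT32_MAX, matching the range described above.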
Jensen-Shannon Divergence (JSD) — Symmetric divergence measure:
JSD(p, q) = (1/2) KL(p ∥ m) + (1/2) KL(q ∥ m), where m = (p + q)/2

Uses a 512-entry LUT for log2 computation. No floating point. Bit-identical on x86, ARM, and RISC-V.
Population Stability Index (PSI) — Directional sensitivity:
PSI(p, q) = Σ_b (p_b - q_b) ln(p_b / q_b)

Epsilon smoothing prevents log(0). Policy defines operational thresholds.
Same inputs produce the same drift scores. Every time. Every platform.
2. Cryptographic Audit Ledger
Every monitoring event is logged to a SHA-256 hash chain:
L_0 = SHA256("CM:LEDGER:GENESIS:v1" ∥ R ∥ H_P)
L_t = SHA256("CM:LEDGER:v1" ∥ L_{t-1} ∥ e_t)

The genesis block binds to the deployment bundle root R and the policy hash H_P. Every subsequent entry chains to the previous digest.
Tampering is detectable. Truncation is detectable. Reordering is detectable. Post-incident analysis can replay the entire monitoring history with cryptographic verification.
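The chaining structure can be sketched in a few lines. To keep the example self-contained, SHA-256 is replaced with a toy 64-bit FNV-1a stand-in, and all names are illustrative; only the chain construction and the replay-based verification mirror the scheme above.

```c
#include <stdint.h>
#include <string.h>

/* Toy 64-bit FNV-1a digest. Stand-in for SHA-256: it illustrates the
 * chaining structure only and offers no cryptographic security. */
static uint64_t toy_hash(const void *data, size_t len, uint64_t seed)
{
    const uint8_t *p = data;
    uint64_t h = seed ^ 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

typedef struct { uint64_t link; } ledger_t;

/* Genesis: bind the chain to bundle root R and policy hash H_P. */
void ledger_genesis(ledger_t *l, uint64_t R, uint64_t H_P)
{
    uint64_t seed[2] = { R, H_P };
    l->link = toy_hash(seed, sizeof seed, 0);  /* real impl adds a domain tag */
}

/* Append event e_t: L_t = H(L_{t-1} || e_t). */
void ledger_append(ledger_t *l, const void *event, size_t len)
{
    l->link = toy_hash(event, len, l->link);
}

/* Offline verification: replay all events from genesis and compare
 * the recomputed head against the expected digest. */
int ledger_verify(uint64_t R, uint64_t H_P,
                  const char **events, size_t n, uint64_t expect)
{
    ledger_t l;
    ledger_genesis(&l, R, H_P);
    for (size_t i = 0; i < n; i++)
        ledger_append(&l, events[i], strlen(events[i]));
    return l.link == expect;
}
```

A modified event, a dropped tail entry, or a reordered pair all change the replayed head digest, which is how tampering, truncation, and reordering become detectable.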
3. Deterministic Health FSM
A state machine with formally defined transitions:
UNINIT → INIT → ENABLED → ALARM → DEGRADED → STOPPED

Fault budgets define thresholds. Violations trigger transitions. Once stopped, only manual intervention restarts the monitor. No ambiguity about system state.
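A minimal sketch of such a machine follows. The state names match the chain above, but the single fault counter and the numeric budgets are simplifying assumptions; the real transition semantics are defined in CM-ARCH-MATH-001.

```c
/* Sketch of a monotone health FSM with fault budgets. */
typedef enum { UNINIT, INIT, ENABLED, ALARM, DEGRADED, STOPPED } state_t;

typedef struct {
    state_t  state;
    unsigned faults;                          /* violations seen so far */
    unsigned alarm_at, degrade_at, stop_at;   /* fault budgets */
} health_t;

void health_init(health_t *h, unsigned alarm_at,
                 unsigned degrade_at, unsigned stop_at)
{
    h->state = INIT;
    h->faults = 0;
    h->alarm_at = alarm_at;
    h->degrade_at = degrade_at;
    h->stop_at = stop_at;
}

void health_enable(health_t *h)
{
    if (h->state == INIT) h->state = ENABLED;
}

/* Each violation consumes fault budget; transitions only move forward.
 * STOPPED is absorbing: no event moves the machine out of it. */
void health_report(health_t *h)
{
    if (h->state == UNINIT || h->state == INIT || h->state == STOPPED)
        return;
    h->faults++;
    if      (h->faults >= h->stop_at)    h->state = STOPPED;
    else if (h->faults >= h->degrade_at) h->state = DEGRADED;
    else if (h->faults >= h->alarm_at)   h->state = ALARM;
}
```

Because the transition function is a pure function of the current state and the fault counter, the machine's behavior is fully reproducible from the ledger's event history.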
What’s Implemented
| Module | Purpose | Tests |
|---|---|---|
| DVM Primitives | Saturating arithmetic, LUT log2 | 33 |
| Audit Ledger | SHA-256 hash chain | 18 |
| Drift Detectors | TV, JSD, PSI computation | 20 |
| Policy Parser | COE JSON parsing, JCS hash | 25 |
| Input Monitor | Feature envelope checking | 22 |
| Activation Monitor | Layer bounds checking | 24 |
| Output Monitor | Output envelope checking | 19 |
| Health FSM | Monitor state machine | 19 |
| Reaction Handler | Violation → action mapping | 14 |
| Ledger Verification | Offline chain verification | 32 |
| Bit-Identity | Cross-platform determinism | 27 |
Every module traces to formal specifications in CM-MATH-001, CM-STRUCT-001, and the SRS documents.
Usage Example
#include "policy.h"
#include "input.h"
#include "health.h"
#include "ledger.h"
#include "react.h"
cm_fault_flags_t faults = {0};
// Load policy and initialize ledger
cm_policy_t policy;
cm_policy_parse(policy_json, policy_len, &policy, &faults);
cm_ledger_ctx_t ledger;
cm_ledger_init(&ledger);
cm_ledger_genesis(&ledger, policy.bundle_root, policy.policy_hash, &faults);
// Initialize monitors
cm_input_ctx_t input_mon;
cm_input_init(&input_mon, &policy.input);
cm_health_ctx_t health;
cm_health_init(&health, &policy.fault_budget);
cm_health_enable(&health);
// Per-inference: check input envelope
cm_input_result_t result;
cm_input_check(&input_mon, input_vector, num_features, &result, &faults);
if (result.violations > 0) {
// Log to cryptographic ledger
uint8_t L_out[32];
cm_ledger_append_violation(&ledger, window_id, CM_VIOL_INPUT_RANGE,
result.first_violation_idx,
result.first_violation_value,
result.first_violation_bound,
L_out, &faults);
// Get policy-defined reaction
cm_reaction_t action = cm_policy_get_reaction(&policy, CM_VIOL_INPUT_RANGE);
// Update health state
cm_health_report_violation(&health, CM_VIOL_INPUT_RANGE);
}
// Check if system should halt
if (cm_health_get_state(&health) == CM_HEALTH_STOPPED) {
// Emergency stop — do not proceed with inference
}

All buffers are statically allocated. No malloc. Deterministic execution path.
The Pipeline
certifiable-monitor completes the deterministic ML ecosystem:
certifiable-data → certifiable-training → certifiable-quant → certifiable-deploy → certifiable-inference
↓
certifiable-monitor
↓
Audit Ledger

The monitor receives:
- From certifiable-deploy: Bundle attestation root and policy hash
- From certifiable-inference: Input vectors, activation values, output vectors, fault flags
- From policy: Thresholds, envelopes, reaction mappings
Six interlocking projects. One coherent vision: deterministic ML from data to monitored production.
Why This Matters
Medical Devices
IEC 62304 Class C requires traceable, reproducible software. When a diagnostic AI flags an anomaly, the response must be deterministic. The audit trail must be verifiable.
Autonomous Vehicles
ISO 26262 ASIL-D demands provable behavior under all conditions. Input drift detection with cryptographic proof isn’t optional — it’s the difference between “we think the model was stable” and “here’s the hash chain proving it.”
Aerospace
DO-178C Level A requires complete requirements traceability. Every drift metric traces to CM-MATH-001. Every state transition traces to CM-ARCH-MATH-001. Every test traces to an SRS requirement.
This is the monitoring layer that makes ML certification possible.
Getting Started
git clone https://github.com/williamofai/certifiable-monitor
cd certifiable-monitor
mkdir build && cd build
cmake ..
make
make test-all   # 253 tests

Expected output:

100% tests passed, 0 tests failed out of 11

Documentation
The implementation traces to formal specifications:
- CM-MATH-001 — Mathematical foundations (drift metrics, ledger hashing, log2 LUT)
- CM-STRUCT-001 — Data structure specifications
- CM-ARCH-MATH-001 — Architecture-level math (health FSM, window semantics)
- SRS-001 through SRS-008 — Module requirements with full traceability
Every function documents its traceability reference. Every test validates a specification clause.
The Trade-Off
Deterministic monitoring isn’t free. Fixed-point arithmetic requires careful scaling. Hash chain updates add overhead. Static allocation means pre-sized buffers.
For systems where “it probably works” is acceptable, standard monitoring tools are simpler.
For systems where lives depend on the answer — where regulators demand proof, where post-incident analysis requires cryptographic verification, where “the model drifted” needs to be a traceable, reproducible, auditable event — certifiable-monitor provides the foundation.
As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context.
Built by SpeyTech in the Scottish Highlands. 30 years of UNIX systems engineering applied to making ML safe enough to certify.