AI architecture is not model design — it is systems engineering. This section documents production-grade machine learning architectures shaped by over 30 years of UNIX infrastructure experience, with a focus on reliability, observability, deterministic behaviour, and long-term operability in real production environments.
Incident Reconstruction: Why 'It Worked Yesterday' Isn't Evidence
How bit-perfect replay, execution tracing, and sealed audit logs transform incident response from guesswork to forensics

Version Control for Deterministic Systems: Git Isn't Enough
How Merkle chains, cryptographic attestation, and reproducible builds satisfy certification evidence requirements
10 min read

Testing ML Systems: Beyond Unit Tests and Accuracy Metrics
A practical testing strategy for production machine learning
4 min read

Cost Engineering for ML Infrastructure: What Actually Matters
Where the money goes and what to optimise first
6 min read

State Management in ML Services: Beyond Stateless Inference
Architectural patterns for ML systems that need to remember
8 min read

Graceful Degradation in ML Systems: When Your Model Can't Answer
Fallback strategies for production inference that fails gracefully instead of failing loudly
7 min read

The Observability Blind Spot: What ML Metrics Don't Tell You
Why accuracy looks fine while your production system burns
10 min read

The Certifiable-* Ecosystem: Eight Projects, One Deterministic ML Pipeline
From training data to deployed inference — bit-identical, auditable, certifiable
8 min read

A Complete Deterministic ML Pipeline for Safety-Critical Systems
From training data to deployed inference — bit-identical, auditable, certifiable
10 min read

WCET Analysis for Neural Network Inference
How to prove worst-case execution time for convolution, matrix multiply, and pooling operations
10 min read

10 of 17 articles