AI Architecture

Production ML systems from 30 years of UNIX infrastructure experience

AI architecture is not model design; it is systems engineering. This section documents production-grade machine learning architectures shaped by that infrastructure experience, with a focus on reliability, observability, deterministic behaviour, and long-term operability in real production environments.

Latest AI Architecture

Incident Reconstruction: Why 'It Worked Yesterday' Isn't Evidence

How bit-perfect replay, execution tracing, and sealed audit logs transform incident response from guesswork to forensics

AI Architecture January 26, 2026 21:15

Version Control for Deterministic Systems: Git Isn't Enough

How Merkle chains, cryptographic attestation, and reproducible builds satisfy certification evidence requirements

10 min read

AI Architecture January 24, 2026 00:30

Testing ML Systems: Beyond Unit Tests and Accuracy Metrics

A practical testing strategy for production machine learning

AI Architecture January 24, 2026 00:05

Cost Engineering for ML Infrastructure: What Actually Matters

Where the money goes and what to optimise first

AI Architecture January 23, 2026 22:00

State Management in ML Services: Beyond Stateless Inference

Architectural patterns for ML systems that need to remember

AI Architecture January 23, 2026 21:14

Graceful Degradation in ML Systems: When Your Model Can't Answer

Fallback strategies for production inference that fails gracefully instead of failing loudly

AI Architecture January 23, 2026 18:00

The Observability Blind Spot: What ML Metrics Don't Tell You

Why accuracy looks fine while your production system burns

10 min read

AI Architecture January 19, 2026 23:00

The Certifiable-* Ecosystem: Eight Projects, One Deterministic ML Pipeline

From training data to deployed inference — bit-identical, auditable, certifiable

AI Architecture January 19, 2026 00:15

A Complete Deterministic ML Pipeline for Safety-Critical Systems

From training data to deployed inference — bit-identical, auditable, certifiable

10 min read

AI Architecture January 15, 2026 22:31

WCET Analysis for Neural Network Inference

How to prove worst-case execution time for convolution, matrix multiply, and pooling operations

10 min read

AI Architecture January 15, 2026 20:45

Why TensorFlow Lite Faces Challenges in DO-178C Certification

Understanding the architectural properties that complicate aerospace certification for mobile inference frameworks

12 min read

AI Architecture January 14, 2026 20:00

Why Floating Point Is Dangerous: The Case for Deterministic AI in C

When 'mostly reproducible' isn't good enough for systems that matter

AI Architecture January 13, 2026 21:40

Debugging Model Behavior in Production

When the model works in staging but fails in prod, here's how to find out why

AI Architecture January 13, 2026 21:15

When You Don't Need a Feature Store

Most teams solve a problem they don't have yet

AI Architecture January 13, 2026 20:35

Model Serving Architecture Patterns

Understanding latency, throughput, and the trade-offs between them

AI Architecture January 13, 2026 19:51

Production AI Systems: What 30 Years of UNIX Taught Me

The infrastructure principles that kept systems running still apply to ML

AI Architecture January 13, 2026 19:15

The Observability Gap in ML Systems

Why your model serving cluster fails at 3AM and you can't figure out why
