AI Architecture

Incident Reconstruction: Why 'It Worked Yesterday' Isn't Evidence

How bit-perfect replay, execution tracing, and sealed audit logs transform incident response from guesswork to forensics

Published
January 26, 2026 22:00
Figure: Comparison showing how non-deterministic systems create investigation challenges while deterministic systems enable complete reconstruction

An autonomous vehicle strikes a pedestrian. A medical device delivers the wrong dose. A trading algorithm loses millions in seconds.

In each case, investigators need to answer: What exactly happened? What was the system’s state? What inputs did it receive? What decision did it make, and why?

For non-deterministic systems, these questions often have no definitive answers. The system cannot be replayed. The state cannot be reconstructed. The best that investigators can offer is “we think this probably happened.”

For deterministic systems with proper audit trails, every question has a precise answer, cryptographically verified and legally defensible.

The Investigation Problem

What Investigators Need

Post-incident analysis requires reconstructing the exact sequence of events:

  1. Input reconstruction: What data entered the system?
  2. State reconstruction: What was the system’s internal state at each moment?
  3. Decision reconstruction: What logic produced the output?
  4. Counterfactual analysis: Would different inputs have produced different outputs?

For safety-critical systems, these aren’t academic questions. Regulators demand answers. Liability depends on them. Future safety improvements require understanding what went wrong.

Why Non-Determinism Defeats Investigation

Consider a neural network that misclassified a stop sign:

# Somewhere in production
output = model.predict(input_image)
# output = "speed limit 35" instead of "stop sign"

To investigate, you need to answer: Why did the model produce this output?

Problem 1: The model may not be reproducible. If training used stochastic elements, you can’t prove the deployed model matches any specific training run.

Problem 2: Inference may not be reproducible. If the model uses floating-point arithmetic, results can depend on the hardware and on the order of operations (fused multiply-add, parallel reductions), so the same input might produce different outputs on different machines.

Problem 3: State may not be reconstructible. If the model has internal state (RNN, attention cache), and that state wasn’t logged, you can’t know what the model “saw.”

Problem 4: Inputs may be incomplete. If preprocessing was non-deterministic (random cropping, noise injection), you can’t reconstruct what the model actually received.

The investigation stalls at “we can’t reproduce the failure.” This is unacceptable for safety-critical systems.
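Problem 2 is easy to demonstrate in isolation. The sketch below is not from any production system; the function names are illustrative. It sums the same three floats in two different orders, which is roughly what happens when a parallel reduction is scheduled differently on different hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Sum left to right: ((v[0] + v[1]) + v[2]) + ... */
float sum_forward(const float *v, size_t n) {
    assert(v != NULL || n == 0);
    volatile float acc = 0.0f;  /* volatile forces rounding to float at every step */
    for (size_t i = 0; i < n; i++) {
        acc = acc + v[i];
    }
    return acc;
}

/* Sum right to left: v[0] + (v[1] + (v[2] + ...)) */
float sum_backward(const float *v, size_t n) {
    assert(v != NULL || n == 0);
    volatile float acc = 0.0f;
    for (size_t i = n; i > 0; i--) {
        acc = acc + v[i - 1];
    }
    return acc;
}
```

For the values {1e8f, -1e8f, 1.0f}, sum_forward returns 1.0 and sum_backward returns 0.0, because 1.0 is smaller than the spacing between adjacent floats near 1e8 and is rounded away. If a classifier's decision sits near a threshold, a difference like this can be enough to flip it.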

Deterministic Systems Enable Forensics

A deterministic system with proper instrumentation provides complete reconstruction:

Bit-Perfect Replay

If the system is deterministic, you can replay the exact computation:

// Recorded incident data
incident_record_t incident = load_incident("2026-01-27-14-23-45");

// Reconstruct state at incident time
system_t system;
system_init(&system, &incident.config);

// Replay inputs up to incident
for (int i = 0; i < incident.input_count; i++) {
    system_update(&system, &incident.inputs[i], incident.timestamps[i]);
}

// System is now in exact incident state
// Every decision can be examined
printf("State at incident: %s\n", state_to_string(system_status(&system)));
printf("Decision: %s\n", decision_to_string(system.last_decision));
printf("Contributing factors:\n");
dump_decision_factors(&system);

This isn’t simulation - it’s exact reconstruction. The replayed state matches the incident state bit-for-bit because the system is deterministic.
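The bit-for-bit claim can be checked mechanically. The following is a minimal sketch using a hypothetical toy system (toy_state_t, toy_update and replay_matches are invented for illustration, not part of any real API): the update is a pure integer function of state and input, so replaying the recorded inputs must reproduce the recorded state exactly.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy system: integer-only state, no clocks, no randomness. */
typedef struct {
    uint32_t alert_count;
    int32_t  filtered;        /* integer low-pass filter output */
    uint64_t last_timestamp;
} toy_state_t;

typedef struct {
    uint64_t timestamp;
    int32_t  sample;
} toy_input_t;

void toy_init(toy_state_t *s) {
    assert(s != NULL);
    memset(s, 0, sizeof *s);  /* zero every byte so memcmp is meaningful */
}

/* Pure function of (state, input): the property replay depends on. */
void toy_update(toy_state_t *s, const toy_input_t *in) {
    s->filtered += (in->sample - s->filtered) / 4;
    if (s->filtered > 100) {
        s->alert_count++;
    }
    s->last_timestamp = in->timestamp;
}

/* Replays the recorded inputs from a fresh state and reports whether the
 * result equals the state recorded at incident time, byte for byte. */
int replay_matches(const toy_input_t *inputs, size_t count,
                   const toy_state_t *recorded) {
    toy_state_t s;
    toy_init(&s);
    for (size_t i = 0; i < count; i++) {
        toy_update(&s, &inputs[i]);
    }
    return memcmp(&s, recorded, sizeof s) == 0;
}
```

If replay_matches ever returns 0 for an untampered record, something outside the logged inputs is influencing the system, and that hidden input is itself a finding.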

Sealed Audit Trails

Reconstruction requires trusting the recorded data. Cryptographic sealing provides that trust:

// Each logged entry includes a chain hash
typedef struct {
    uint64_t timestamp;
    input_t input;
    state_t state_after;
    decision_t decision;
    uint8_t prev_hash[32];    // Previous entry's hash
    uint8_t entry_hash[32];   // H(timestamp || input || state_after || decision || prev_hash)
} audit_entry_t;

The chain structure means:

  • Modifying any entry invalidates all subsequent hashes
  • Inserting or deleting entries breaks the chain
  • The final hash commits to the entire history

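Investigators can verify that logs haven’t been tampered with, provided the final hash is stored or signed somewhere an attacker cannot rewrite it; otherwise the whole chain could simply be regenerated. That verification is what lets courts trust the evidence.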
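The chain properties listed above can be exercised with a small verifier. This is a sketch under loud assumptions: the entry is simplified to a single payload word, and FNV-1a stands in for the hash so the example needs no external library. FNV-1a is not cryptographic; a real audit log would use something like SHA-256.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* FNV-1a, 64-bit. NOT cryptographic: a stand-in for SHA-256. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len) {
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical, simplified entry: one payload word instead of
 * input, state and decision. */
typedef struct {
    uint64_t timestamp;
    uint64_t payload;
    uint64_t prev_hash;
    uint64_t entry_hash;  /* H(timestamp || payload || prev_hash) */
} chain_entry_t;

static uint64_t entry_digest(const chain_entry_t *e) {
    uint64_t h = FNV_OFFSET;
    h = fnv1a(h, &e->timestamp, sizeof e->timestamp);
    h = fnv1a(h, &e->payload, sizeof e->payload);
    h = fnv1a(h, &e->prev_hash, sizeof e->prev_hash);
    return h;
}

/* Writes entry i and links it to entry i-1 (prev_hash is 0 for the first). */
void chain_append(chain_entry_t *entries, size_t i,
                  uint64_t timestamp, uint64_t payload) {
    assert(entries != NULL);
    entries[i].timestamp = timestamp;
    entries[i].payload = payload;
    entries[i].prev_hash = (i == 0) ? 0 : entries[i - 1].entry_hash;
    entries[i].entry_hash = entry_digest(&entries[i]);
}

/* Returns the index of the first bad entry, or count if the chain verifies. */
size_t chain_verify(const chain_entry_t *entries, size_t count) {
    uint64_t expected_prev = 0;
    for (size_t i = 0; i < count; i++) {
        if (entries[i].prev_hash != expected_prev) return i;
        if (entries[i].entry_hash != entry_digest(&entries[i])) return i;
        expected_prev = entries[i].entry_hash;
    }
    return count;
}
```

Editing a payload makes chain_verify stop at the edited entry, and deleting an entry makes it stop at the first entry whose prev_hash no longer matches its new predecessor.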

For implementation details, see Cryptographic Execution Tracing.

Decision Traceability

Deterministic systems can log not just what decision was made, but why:

typedef struct {
    decision_t decision;
    
    // Inputs that influenced this decision
    float confidence_score;
    int contributing_features[MAX_FEATURES];
    float feature_weights[MAX_FEATURES];
    
    // State factors
    state_t system_state;
    int consecutive_alerts;
    uint64_t time_since_last_event;
    
    // Threshold comparisons
    float threshold_used;
    bool threshold_exceeded;
} decision_trace_t;

When an incident occurs, investigators don’t just see “system decided X.” They see:

  • The inputs that contributed to the decision
  • The weights assigned to each input
  • The thresholds that were or weren’t crossed
  • The system state that influenced the decision

This transforms “why did it do that?” from unanswerable to precisely documented.

Case Study: Autonomous Vehicle

Consider an autonomous vehicle incident. The vehicle failed to brake for an obstacle.

Non-Deterministic Investigation

Available evidence:

  • Dash cam video showing the obstacle
  • Vehicle telemetry showing speed, steering, brake status
  • High-level logs: “Perception module detected object. Planning module generated trajectory.”

Investigation challenges:

  • Can’t replay the exact perception model state
  • Don’t know what the model “saw” in the camera feed
  • Can’t verify the planning module’s decision logic
  • Different test runs produce different results

Conclusion: “The perception module may have misclassified the obstacle, but we cannot definitively determine the cause.”

Deterministic Investigation

Available evidence:

  • Complete input stream (camera frames, lidar points, radar returns)
  • Cryptographically sealed state snapshots
  • Decision traces for every planning cycle
  • Merkle chain proving log integrity

Investigation process:

# Load incident record
$ av-forensics load incident-2026-01-27-14-23-45

Loading incident record...
  Input frames: 1,247
  State snapshots: 1,247
  Chain verification: PASSED (no tampering detected)

# Replay to incident moment
$ av-forensics replay --to "14:23:45.127"

Replaying...
  Frame 1,189: Object detected at 47m
  Frame 1,190: Object classified as "debris" (confidence 0.73)
  Frame 1,191: Planning: maintain speed (debris, no evasion needed)
  Frame 1,192: Object reclassified as "pedestrian" (confidence 0.81)
  Frame 1,193: Planning: emergency brake initiated
  Frame 1,194: Collision

# Examine decision at frame 1,191
$ av-forensics decision-trace --frame 1191

Decision: MAINTAIN_SPEED
  Object class: debris (0.73)
  Distance: 42m
  Time to collision: 2.1s
  Threshold for evasion: class != debris OR confidence < 0.6
  
  Contributing features:
    - Shape signature: 0.31 (debris-like)
    - Motion pattern: 0.22 (stationary)
    - Size estimate: 0.18 (small)
    - Radar cross-section: 0.15 (low)
    
  Note: Pedestrian features (silhouette, limb motion) not detected until frame 1,192

Conclusion: “The perception module misclassified a pedestrian as debris based on the object’s stationary pose and low radar cross-section. Classification corrected at frame 1,192, but remaining braking distance (38m) was insufficient at current speed (65 km/h). Root cause: perception model training data underrepresented stationary pedestrians. Recommendation: augment training data with stationary pedestrian scenarios.”

The deterministic system provides a definitive answer with actionable recommendations.
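Because the decision rule is recorded in the trace, investigators can also ask counterfactual questions. The sketch below encodes the evasion rule exactly as printed in the frame 1,191 trace; the type and function names are hypothetical, and the alternative threshold of 0.75 used in the note afterwards is an arbitrary illustration, not a recommendation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { CLASS_DEBRIS, CLASS_PEDESTRIAN, CLASS_VEHICLE } object_class_t;
typedef enum { MAINTAIN_SPEED, EVADE } plan_t;

typedef struct {
    object_class_t object_class;
    float confidence;
} perception_t;

/* The rule as printed in the decision trace:
 * evade when class != debris OR confidence < threshold. */
plan_t plan(const perception_t *p, float evasion_threshold) {
    assert(p != NULL);
    bool evade = (p->object_class != CLASS_DEBRIS) ||
                 (p->confidence < evasion_threshold);
    return evade ? EVADE : MAINTAIN_SPEED;
}

/* Counterfactual: does changing only the threshold change the plan? */
bool threshold_changes_plan(const perception_t *p,
                            float actual, float alternative) {
    return plan(p, actual) != plan(p, alternative);
}
```

With the frame 1,191 perception (debris, confidence 0.73), plan returns MAINTAIN_SPEED at the recorded threshold of 0.6 and EVADE at a threshold of 0.75. Whether a stricter threshold is acceptable depends on how many false evasions it would cause, which the same replay infrastructure can measure across other recorded drives.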

Implementation Patterns

Input Logging

Every input to the system must be logged with sufficient precision for replay:

typedef struct {
    uint64_t timestamp_ns;      // Nanosecond precision
    uint32_t sequence_number;    // Monotonic sequence
    input_type_t type;           // Input type identifier
    uint32_t payload_size;
    uint8_t payload[];           // Variable-length input data
} input_record_t;

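void log_input(logger_t *log, const input_t *input, uint64_t timestamp) {
    // allocate_record is assumed to reserve the record header plus
    // the requested number of payload bytes
    input_record_t *record = allocate_record(log, sizeof(input_t));
    if (record == NULL) {
        return;  // No log space left; a real system must surface this failure
    }
    
    record->timestamp_ns = timestamp;
    record->sequence_number = log->next_sequence++;
    record->type = input->type;
    record->payload_size = sizeof(input_t);
    memcpy(record->payload, input, sizeof(input_t));
    
    // Extend Merkle chain
    extend_chain(log, record);
}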
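One detail worth considering for replay on a different machine: copying a struct with memcpy also copies the host's byte order and any padding the compiler inserted. A minimal sketch of an explicit little-endian encoding for the record header follows. The function names, the 20-byte layout, and the use of a plain uint32_t for the type field are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define RECORD_HEADER_SIZE 20u  /* 8 + 4 + 4 + 4 bytes, no padding */

static void put_u32_le(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++) out[i] = (uint8_t)(v >> (8 * i));
}

static void put_u64_le(uint8_t *out, uint64_t v) {
    for (int i = 0; i < 8; i++) out[i] = (uint8_t)(v >> (8 * i));
}

/* Encodes header then payload into buf with a fixed byte layout.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t encode_record(uint8_t *buf, size_t buf_size,
                     uint64_t timestamp_ns, uint32_t sequence_number,
                     uint32_t type,
                     const uint8_t *payload, uint32_t payload_size) {
    assert(buf != NULL);
    assert(payload != NULL || payload_size == 0);
    if (buf_size < RECORD_HEADER_SIZE ||
        payload_size > buf_size - RECORD_HEADER_SIZE) {
        return 0;
    }
    put_u64_le(buf, timestamp_ns);
    put_u32_le(buf + 8, sequence_number);
    put_u32_le(buf + 12, type);
    put_u32_le(buf + 16, payload_size);
    if (payload_size > 0) {
        memcpy(buf + RECORD_HEADER_SIZE, payload, payload_size);
    }
    return (size_t)RECORD_HEADER_SIZE + payload_size;
}
```

The encoded bytes are then identical on every platform, which also makes them a stable input to the chain hash.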

State Snapshots

Periodic state snapshots enable reconstruction without replaying from the beginning:

typedef struct {
    uint64_t timestamp_ns;
    uint32_t snapshot_id;
    uint32_t input_sequence;     // Last input processed
    state_t full_state;          // Complete system state
    uint8_t state_hash[32];      // Hash of serialised state
    uint8_t chain_hash[32];      // Current Merkle chain position
} state_snapshot_t;

void take_snapshot(system_t *sys, logger_t *log) {
    state_snapshot_t snap;
    
    // Zero the whole struct, padding included, so the bytes written
    // to the log do not contain stack garbage
    memset(&snap, 0, sizeof snap);
    
    snap.timestamp_ns = get_timestamp();
    snap.snapshot_id = log->next_snapshot++;
    snap.input_sequence = log->last_sequence;
    snap.full_state = sys->state;
    
    // Hash the state for integrity verification
    hash_state(&sys->state, snap.state_hash);
    
    // Record current chain position
    memcpy(snap.chain_hash, log->current_chain_hash, sizeof snap.chain_hash);
    
    write_snapshot(log, &snap);
}

To replay to a specific point:

  1. Load the nearest preceding snapshot
  2. Replay inputs from that snapshot’s sequence number
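Step 1 amounts to a search over the snapshot index. Here is a minimal sketch, assuming the snapshots are indexed in ascending order of input_sequence; snapshot_index_t and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical index entry: which input each snapshot was taken after. */
typedef struct {
    uint32_t snapshot_id;
    uint32_t input_sequence;  /* last input processed before the snapshot */
} snapshot_index_t;

/* Binary search over an index sorted by input_sequence.
 * Returns the position of the latest snapshot with
 * input_sequence <= target_sequence, or -1 if none precedes the target. */
long find_preceding_snapshot(const snapshot_index_t *idx, size_t count,
                             uint32_t target_sequence) {
    assert(idx != NULL || count == 0);
    size_t lo = 0, hi = count;  /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].input_sequence <= target_sequence) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (long)lo - 1;
}

/* Number of inputs to replay after restoring the chosen snapshot. */
uint32_t inputs_to_replay(const snapshot_index_t *snap,
                          uint32_t target_sequence) {
    assert(snap != NULL && snap->input_sequence <= target_sequence);
    return target_sequence - snap->input_sequence;
}
```

With snapshots taken after inputs 0, 500 and 1000, reaching input 1191 means restoring the third snapshot and replaying 191 inputs rather than all 1191. Snapshot spacing is a trade-off between storage and replay time.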

Decision Traces

Every significant decision should be traced:

typedef struct {
    uint64_t timestamp_ns;
    uint32_t decision_id;
    decision_type_t type;
    decision_value_t value;
    
    // What inputs influenced this decision
    uint32_t input_count;
    input_reference_t inputs[MAX_INPUTS];
    
    // What state influenced this decision
    state_summary_t state_summary;
    
    // The decision logic (which rules fired, which thresholds crossed)
    uint32_t rule_count;
    rule_evaluation_t rules[MAX_RULES];
    
    // Hash for integrity
    uint8_t trace_hash[32];
} decision_trace_t;

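void trace_decision(logger_t *log, decision_context_t *ctx) {
    decision_trace_t trace;
    
    // Zero everything first: unused array slots and padding would otherwise
    // hold stack garbage and make the trace hash differ between runs
    memset(&trace, 0, sizeof trace);
    
    trace.timestamp_ns = get_timestamp();
    trace.decision_id = log->next_decision++;
    trace.type = ctx->decision_type;
    trace.value = ctx->decision_value;
    
    // Record what influenced this decision (clamped to the array size)
    trace.input_count = ctx->input_count < MAX_INPUTS ? ctx->input_count : MAX_INPUTS;
    for (uint32_t i = 0; i < trace.input_count; i++) {
        trace.inputs[i] = ctx->inputs[i];
    }
    
    // Record relevant state
    summarise_state(ctx->system_state, &trace.state_summary);
    
    // Record which rules evaluated to what (clamped to the array size)
    trace.rule_count = ctx->rule_count < MAX_RULES ? ctx->rule_count : MAX_RULES;
    for (uint32_t i = 0; i < trace.rule_count; i++) {
        trace.rules[i] = ctx->rule_evaluations[i];
    }
    
    // Hash and chain (trace_hash itself is still zero at this point)
    hash_trace(&trace, trace.trace_hash);
    write_trace(log, &trace);
}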

Legal and Regulatory Implications

Deterministic reconstruction isn’t just technically useful - it has legal weight.

Admissibility

For evidence to be admissible in court, it must generally be shown to be:

  • Authentic: Proven to be what it claims to be
  • Reliable: Generated by a process known to be accurate
  • Complete: Not selectively edited

The exact tests vary by jurisdiction, but cryptographically sealed audit trails support all three:

  • Chain hashes show that entries have not been altered since they were sealed
  • Deterministic replay shows that the recorded inputs reproduce the recorded behaviour
  • Merkle verification shows that no entries were removed or inserted

Liability Determination

When an incident occurs, liability often depends on proving what the system knew and when:

Without determinism: “The system may have detected the obstacle, but we can’t prove it.”

With determinism: “The system detected the obstacle at timestamp T₁ and classified it as debris. The classification changed to pedestrian at timestamp T₂, 1.3 seconds before impact.”

The difference matters enormously in litigation.

Regulatory Compliance

None of the major safety standards mandates replay by name, but each asks for evidence that deterministic reconstruction makes easier to produce:

DO-178C (aerospace): Requires that software behaviour be verified against its requirements. Deterministic replay makes that verification repeatable.

IEC 62304 (medical devices): Requires traceability from requirements through implementation to testing. Deterministic audit trails extend traceability to runtime.

ISO 26262 (automotive): Requires a safety case backed by evidence. Deterministic reconstruction provides evidence of what happened and why.

For more on medical device implications, see Reproducibility and Post-Incident Analysis in Implantable Cardiac Devices.

Production Debugging

The same infrastructure that enables incident investigation also improves everyday debugging.

Reproducing Production Issues

The classic debugging nightmare: “It fails in production but works in development.”

With deterministic systems:

# Export production incident
$ export-incident --id 12345 --output incident.dat

# Import to development environment  
$ import-incident incident.dat

# Replay with debugging
$ replay-debug --breakpoint "decision.confidence < 0.5"

Replaying incident 12345...
Hit breakpoint at frame 847:
  decision.confidence = 0.47
  
(debug) print decision
{
  type: CLASSIFY,
  value: "unknown",
  confidence: 0.47,
  inputs: [frame_847, state_846]
}

(debug) print inputs[0].features
{
  edge_count: 23,
  color_histogram: [0.2, 0.3, ...],
  texture_signature: 0x7a3f...
}

The production failure is now reproducible in development. Debug at leisure. Fix with confidence.

Root Cause Analysis

Deterministic systems enable systematic root cause analysis:

# Find all incidents with similar characteristics
$ incident-search --decision-type CLASSIFY --confidence-below 0.5

Found 17 incidents:
  12345: frame 847, confidence 0.47
  12298: frame 1203, confidence 0.39
  12156: frame 512, confidence 0.44
  ...

# Analyse common factors
$ incident-correlate --incidents 12345,12298,12156,...

Common factors:
  - 94% occurred during low-light conditions
  - 88% involved partially occluded objects
  - 76% had high background complexity
  
Recommended investigation: perception model performance in low-light, high-complexity scenarios

This transforms debugging from “fix the bug you found” to “find all bugs of this type.”
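The correlation step itself is simple once each incident record carries its conditions. The sketch below assumes each incident is tagged with a bitmask of hypothetical condition flags; the flag names and condition_rate are invented for illustration and are not the output format of any real tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical condition flags attached to each incident record. */
enum {
    COND_LOW_LIGHT       = 1u << 0,
    COND_OCCLUDED        = 1u << 1,
    COND_HIGH_COMPLEXITY = 1u << 2
};

/* Percentage (0 to 100) of incidents whose flags include `condition`. */
double condition_rate(const unsigned *incident_flags, size_t count,
                      unsigned condition) {
    assert(incident_flags != NULL && count > 0);
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        if ((incident_flags[i] & condition) == condition) {
            hits++;
        }
    }
    return 100.0 * (double)hits / (double)count;
}
```

A high rate only suggests where to look. Comparing it with the rate of the same condition across all recorded operation, not just incidents, is what separates a real factor from a common background condition.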

Conclusion

Incident investigation in non-deterministic systems is fundamentally limited. You can hypothesise what happened, but you cannot prove it. You can guess at causes, but you cannot verify them. You can propose fixes, but you cannot confirm they address the actual problem.

Deterministic systems with proper audit infrastructure transform investigation from guesswork to forensics:

  • Bit-perfect replay reconstructs the exact computation
  • Sealed audit trails prove logs haven’t been tampered with
  • Decision traces explain not just what happened, but why

The infrastructure investment is significant: input logging, state snapshots, decision tracing, Merkle chains. But for safety-critical systems, this investment is essential. When an autonomous vehicle crashes or a medical device malfunctions, “we don’t know what happened” is not an acceptable answer.

The certifiable-* ecosystem implements these patterns. Debugging Model Behavior in Production covers practical debugging strategies. Bit-Perfect Reproducibility explains the underlying determinism requirements.

“It worked yesterday” is not evidence. Cryptographically sealed replay logs are evidence. Build systems that can prove what happened.

About the Author

William Murray is a Regenerative Systems Architect with 30 years of UNIX infrastructure experience, specialising in deterministic computing for safety-critical systems. Based in the Scottish Highlands, he operates SpeyTech and maintains several open-source projects including C-Sentinel and c-from-scratch.
