Note: The following discussion references publicly reported incidents to illustrate investigation challenges. These are complex events with multiple contributing factors; the intent is to explore how execution traceability relates to post-incident analysis, not to assign causation.
When safety-critical software systems experience failures, investigators often face a fundamental challenge: reconstructing exactly what the software did in the moments leading up to an incident.
Flight data recorders capture sensor readings and control surface positions. System logs record events and state changes. But the internal software state—variable values, conditional branches taken, the exact sequence of function calls—may not be fully captured. This can lead to extended investigation timelines as engineers attempt to reconstruct probable execution paths through simulation and analysis.
This is an evidentiary challenge that affects many domains: when software-controlled systems fail in safety-critical contexts, the ability to provide strong evidence of execution history can significantly influence investigation timelines, liability discussions, and regulatory outcomes.
Investigation Challenges in Safety-Critical Failures
Several well-documented cases illustrate how execution uncertainty can extend investigation complexity:
Aviation incidents: Major aviation investigations have highlighted the challenge of reconstructing software decision sequences from flight data alone. When internal software state isn’t fully captured, investigators may rely on simulation and code analysis to develop probable execution scenarios, a process that can extend timelines and leave some questions unresolved.
Spacecraft anomalies: The NASA Mars Climate Orbiter loss in 1999 involved a units conversion issue. Post-failure analysis required reconstructing the exact sequence of calculations that led to the trajectory error. Without complete execution traces, investigators relied on software inspection and simulation to develop their findings.
Medical device investigations: Software-related medical device issues can be challenging to investigate when failures cannot be reliably reproduced. This has contributed to situations where broader product actions are taken when specific root causes are difficult to isolate.
In cases like these, limited execution traceability has been associated with:
- Extended investigation timelines
- Increased uncertainty in root cause determination
- Broader regulatory responses when specific failures are difficult to isolate
- Prolonged liability discussions when technical evidence is ambiguous
Limitations of Traditional Logging
Conventional system logs can have limitations for post-incident analysis:
1. Selective Capture
Traditional logging captures selected events—function calls, state changes, error conditions—but typically not complete execution state. The gaps between logged events can leave room for uncertainty about intermediate computations.
A typical system log might record:
[timestamp] Sensor reading: 74.5°
[timestamp] Command issued: -2.5°
But what happened between these events? Which conditional branches were taken? What were the intermediate calculations? Without complete state capture, reconstruction may require inference.
2. Integrity Considerations
Log files are typically mutable. In adversarial contexts or when system corruption occurs, questions about log integrity can complicate analysis.
3. Timing Dependencies
In systems with race conditions or timing dependencies, similar log entries might correspond to different execution paths. Traditional logs may not capture the scheduler state, interrupt timing, or memory ordering that influenced which path was actually taken.
4. Storage Trade-offs
High-fidelity logging of complete execution state would generate substantial data volumes. Practical systems log selectively, but every omission is a potential gap in the evidentiary record.
Cryptographic Hash Chains: Strengthening Execution Evidence
Deterministic execution combined with cryptographic hash chaining offers an approach to address some of these limitations. The core concept: deterministic execution + hash chaining = tamper-evident execution traces.
How It Works
At each tick boundary, a cryptographic hash is computed from the complete system state:
State(t₀) → hash₀ = SHA-256(State(t₀))
State(t₁) → hash₁ = SHA-256(State(t₁) || hash₀)
State(t₂) → hash₂ = SHA-256(State(t₂) || hash₁)
...
State(tₙ) → hashₙ = SHA-256(State(tₙ) || hashₙ₋₁)
Each hash is computed from:
- Current state: Memory, registers, program counter
- Previous hash: Cryptographically linking to prior history
- Input events: Any external inputs received at this tick
The resulting hash chain has an important property: modifying any state in the chain changes its hash and every hash after it, so tampering becomes detectable when the chain is recomputed during verification.
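The construction above can be sketched in a few lines of Python. This is a minimal illustration, not MDCK's implementation: the names `seal_tick` and `build_chain`, the use of byte strings as state snapshots, and the length-prefixed field encoding are all assumptions made for the example. Input events are hashed alongside the state, as the list above describes.

```python
import hashlib

def _field(data: bytes) -> bytes:
    """Length-prefix a field so that field boundaries are unambiguous."""
    return len(data).to_bytes(8, "big") + data

def seal_tick(state: bytes, inputs: bytes, prev_hash: bytes) -> bytes:
    """Hash one tick from its state, its input events, and the previous hash."""
    h = hashlib.sha256()
    h.update(_field(state))
    h.update(_field(inputs))
    h.update(prev_hash)  # empty for the first tick, 32 bytes afterwards
    return h.digest()

def build_chain(ticks):
    """ticks: iterable of (state, inputs) byte pairs. Returns one digest per tick."""
    chain = []
    prev = b""
    for state, inputs in ticks:
        prev = seal_tick(state, inputs, prev)
        chain.append(prev)
    return chain
```

Changing the state at one tick leaves earlier digests untouched but alters that tick's digest and every later one, which is the property verification relies on.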
Cryptographic Assurance
The strength of this approach relies on the collision resistance and second-preimage resistance of SHA-256. Under widely accepted cryptographic assumptions, finding a different state that produces the same hash is computationally infeasible with current technology.
This provides strong evidence of execution integrity, though like all cryptographic systems, it operates within the bounds of current threat models and implementation correctness.
MDCK: Deterministic Cryptography Kernel
MDCP’s MDCK (Murray Deterministic Cryptography Kernel) implements cryptographic sealing with several design considerations:
Bounded Computation
Cryptographic operations are designed to complete within deterministic time bounds. MDCK uses constant-time implementations of SHA-256 whose cycle count depends on input length but not on input contents, preserving determinism while providing cryptographic capabilities.
Incremental Hashing
Rather than hashing the entire system state at each tick, MDCK uses incremental approaches:
- State delta compression: Only changed memory regions are hashed
- Merkle tree structure: Large states are organized as Merkle trees, allowing partial updates
- Rolling hashes: Efficient recomputation when small portions of state change
This can reduce the computational overhead of cryptographic sealing, making it more practical for embedded systems.
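To illustrate the Merkle tree idea, the sketch below rehashes only the path from a changed region to the root. It is an assumption-laden toy, not MDCK's data structure: the `MerkleState` class, the equal-sized regions, the power-of-two region count, and the leaf/node prefix bytes are all hypothetical choices made for the example.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleState:
    """Merkle tree over memory regions (region count must be a power of two)."""

    def __init__(self, regions):
        n = len(regions)
        if n == 0 or n & (n - 1):
            raise ValueError("region count must be a power of two")
        self.n = n
        # Heap layout: node i has children 2i and 2i+1; leaves occupy [n, 2n).
        self.nodes = [b""] * (2 * n)
        for i, region in enumerate(regions):
            self.nodes[n + i] = _h(b"\x00" + region)  # leaf prefix
        for i in range(n - 1, 0, -1):
            self.nodes[i] = _h(b"\x01" + self.nodes[2 * i] + self.nodes[2 * i + 1])

    @property
    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, index: int, region: bytes) -> int:
        """Replace one region and rehash only its path to the root.

        Returns the number of hashes computed.
        """
        if not 0 <= index < self.n:
            raise IndexError("region index out of range")
        i = self.n + index
        self.nodes[i] = _h(b"\x00" + region)
        count = 1
        i //= 2
        while i >= 1:
            self.nodes[i] = _h(b"\x01" + self.nodes[2 * i] + self.nodes[2 * i + 1])
            count += 1
            i //= 2
        return count
```

With eight regions, a single-region update computes four hashes (one leaf plus three interior nodes), whereas rebuilding the whole tree computes fifteen. The root after an update equals the root of a tree built from scratch over the same regions.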
Storage Optimisation
Complete execution traces can be large. MDCK provides configurable retention:
- Safety-critical sections: Full trace retention with intermediate states
- Non-critical sections: Periodic checkpoints with hash chains between checkpoints
- Post-incident: Automatic extraction and archival of relevant execution windows
Given a cryptographically sealed execution trace (State₀, hash₀), ..., (Stateₙ, hashₙ), modification to any Stateᵢ is designed to be detectable by recomputing the hash chain and comparing with the stored hashₙ, under standard cryptographic assumptions.
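The recompute-and-compare check can be sketched as follows. The function names, the in-memory `(state, stored_hash)` trace format, and the idea of a `trusted_final` digest kept outside the trace (for example, reported to a separate recorder) are assumptions for illustration. The hashing follows the State || previous-hash formula given earlier.

```python
import hashlib
import hmac

def chain_hash(state: bytes, prev_hash: bytes) -> bytes:
    """hash_i = SHA-256(State_i || hash_{i-1}); prev_hash is empty for i = 0."""
    return hashlib.sha256(state + prev_hash).digest()

def verify_trace(trace, trusted_final: bytes):
    """trace: list of (state, stored_hash) pairs.

    Returns (ok, first_bad_index). first_bad_index is the first tick whose
    recomputed hash differs from its stored hash, or None if every stored
    hash is internally consistent.
    """
    prev = b""
    for i, (state, stored) in enumerate(trace):
        prev = chain_hash(state, prev)
        if not hmac.compare_digest(prev, stored):
            return False, i
    # The chain is self-consistent; it must also end at the trusted digest.
    return hmac.compare_digest(prev, trusted_final), None
```

The final comparison matters: someone who alters a state and then recomputes every later hash produces a self-consistent chain, and only the mismatch against the independently held final digest reveals the change.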
Potential Evidentiary Implications
Cryptographically sealed execution traces may influence post-incident analysis in several ways:
Evidentiary Considerations
Cryptographic evidence is increasingly recognised in various legal and regulatory contexts. The specific admissibility and weight given to such evidence depend on jurisdiction, chain-of-custody documentation, and other factors. However, tamper-evident execution traces may provide stronger evidentiary foundations than traditional logs in some contexts.
Analysis Efficiency
With cryptographically sealed traces:
- Reconstruction may be simplified: Execution traces can provide direct evidence rather than requiring simulation-based reconstruction
- Ambiguity may be reduced: State transitions are documented rather than inferred
- Focus can shift: Technical analysis can focus on understanding behaviour rather than establishing what happened
Risk Management Considerations
Insurance and regulatory bodies may view systems with stronger execution traceability favourably, though specific implications depend on context:
With traditional logging:
- Investigation may require extensive reconstruction
- Ambiguity can extend liability discussions
- Root cause isolation can be challenging
- Broader product actions may be needed when issues are hard to isolate
With cryptographic execution tracing:
- Execution history can be verified directly
- Stronger evidence may support faster resolution
- Specific failure modes may be more readily identifiable
- Targeted responses may be possible when root cause is clear
Note: Legal, insurance, and regulatory outcomes depend on many factors beyond execution traceability. The implications discussed here are illustrative of potential benefits rather than guaranteed outcomes.
Illustrative Applications
Aerospace
Scenario: Engine control software detects an anomaly and reduces thrust. Was this correct behaviour given sensor inputs?
With traditional logging: Investigation may require simulation and analysis to reconstruct the decision process, with potential for differing expert interpretations.
With cryptographic execution tracing: Hash chain can verify that given specific sensor inputs, the control law computed the recorded output. The execution path is documented rather than reconstructed.
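A replay check of this kind might look like the sketch below. Everything in it is hypothetical: `control_law` is a toy integer proportional controller standing in for real engine control logic, and the trace format and function names are invented for the example. Integer arithmetic is used so that replay gives identical results on any machine.

```python
import hashlib

def control_law(sensor_milli: int, setpoint_milli: int = 70_000) -> int:
    """Toy stand-in: half the error, clamped to +/-5000 (thousandths of a unit)."""
    error = setpoint_milli - sensor_milli
    return max(-5_000, min(5_000, error // 2))

def seal(sensor: int, command: int, prev_hash: bytes) -> bytes:
    """Hash one (input, output) record together with the previous hash."""
    record = f"{sensor},{command}".encode()
    return hashlib.sha256(record + prev_hash).digest()

def record_run(sensors):
    """Produce (sensor, command, hash) tuples as the live system would."""
    trace, prev = [], b""
    for sensor in sensors:
        command = control_law(sensor)
        prev = seal(sensor, command, prev)
        trace.append((sensor, command, prev))
    return trace

def replay_matches(trace) -> bool:
    """Re-run the control law on recorded inputs; compare outputs and hashes."""
    prev = b""
    for sensor, command, stored in trace:
        if control_law(sensor) != command:
            return False  # recorded output is not what the code computes
        prev = seal(sensor, command, prev)
        if prev != stored:
            return False  # record does not match the sealed chain
    return True
```

An investigator holding the trace and the same software build could call `replay_matches(trace)`: a `True` result indicates the recorded commands are what the control law produces for the recorded inputs, and that the records match the sealed chain.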
Medical Devices
Scenario: A medical device delivers an unexpected output. Was this a configuration error, a hardware fault, or software behaviour?
With traditional logging: Device logs may show commands but not the complete decision process. Broader product actions may be taken when the specific root cause is uncertain.
With cryptographic execution tracing: Execution trace can show sensor readings, algorithm computations, and output commands, potentially enabling more targeted root cause identification.
Autonomous Systems
Scenario: An autonomous system fails to respond as expected. Which subsystem was responsible?
With traditional logging: Multiple failure modes might explain the outcome, requiring extensive analysis to narrow possibilities.
With cryptographic execution tracing: Hash chain can document the state of each subsystem, potentially enabling faster isolation of the specific failure point.
Performance Considerations
The overhead of cryptographic sealing can be modest with appropriate implementation:
Computational cost: SHA-256 hashing at 1000 ticks/second adds approximately 2-3% CPU overhead on modern embedded processors (measured in internal testing on ARM Cortex-A53; actual overhead varies by hardware and configuration).
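One way to read the overhead figure is as a per-tick time budget: the tick period multiplied by the overhead fraction. The helper below is only that arithmetic; the function name is hypothetical and it says nothing about how long hashing takes on any particular hardware.

```python
def hashing_budget_us(ticks_per_second: int, overhead_fraction: float) -> float:
    """Microseconds per tick available for hashing at a given CPU overhead."""
    tick_period_us = 1_000_000 / ticks_per_second
    return tick_period_us * overhead_fraction

low = hashing_budget_us(1000, 0.02)   # 2% of a 1000 µs tick
high = hashing_budget_us(1000, 0.03)  # 3% of a 1000 µs tick
```

At 1000 ticks/second each tick lasts 1000 µs, so a 2-3% overhead corresponds to roughly 20 to 30 µs of hashing per tick.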
Storage cost: With delta compression and Merkle trees, execution traces can be compressed significantly. Specific storage requirements depend on system complexity and retention policy.
Determinism preservation: Constant-time cryptographic implementations are designed to avoid introducing timing variability that could affect deterministic replay.
Conclusion
As software increasingly controls safety-critical systems (aircraft, medical devices, autonomous vehicles, industrial control systems), the ability to provide strong evidence of what the software did becomes more valuable.
Cryptographic execution tracing represents one architectural approach that can strengthen post-incident analysis:
- From reconstruction to verification: Execution traces can be checked rather than reconstructed
- From inference to evidence: State transitions are documented rather than inferred
- From ambiguity to clarity: Specific execution paths can be identified rather than debated
This approach does not eliminate all uncertainty, nor is it appropriate for every system. The value depends on the risk profile, regulatory context, and investigation requirements of specific applications.
For safety-critical systems where post-incident clarity matters, deterministic execution with cryptographic sealing offers an architectural approach that can materially strengthen evidentiary support and enable more efficient investigation processes.
As with any architectural approach, suitability depends on system requirements, risk classification, and the specific regulatory and legal context in which the system operates.