
Why TensorFlow Lite Faces Challenges in DO-178C Certification

Understanding the architectural properties that complicate aerospace certification for mobile inference frameworks

Published: January 15, 2026, 20:45 · Reading time: 14 min
[Figure: DO-178C verification requirements mapped against typical inference framework architecture]

Note: This article examines general architectural patterns in mobile inference frameworks and how they interact with aerospace certification requirements. It is not a comprehensive certification analysis, nor does it represent the position of any certification authority. Actual certification decisions depend on the specific system, intended use, and assessor interpretation. TensorFlow Lite is used as a representative example; similar considerations may apply to other inference frameworks.

Aerospace software certification under DO-178C imposes rigorous requirements on how software is designed, verified, and documented. These requirements evolved over decades of aviation safety experience and are mandatory for software whose failure could affect flight safety.

Mobile inference frameworks like TensorFlow Lite were designed with different priorities: flexibility, performance across diverse hardware, and ease of use for researchers and app developers. These are legitimate engineering goals that have made deep learning accessible to millions of applications.

However, the architectural choices that enable flexibility and broad hardware support can create challenges when the same frameworks are considered for aerospace applications. Understanding these challenges helps teams make informed decisions about whether to adapt existing frameworks, build custom inference engines, or pursue hybrid approaches.

This article examines specific architectural properties common in mobile inference frameworks and explains why they can complicate DO-178C certification efforts, particularly at the higher Design Assurance Levels (DAL A and DAL B).

DO-178C Requirements Overview

DO-178C establishes objectives for software development based on the severity of potential failures. Design Assurance Level A applies when software failure could cause or contribute to catastrophic failure conditions. Level B applies to hazardous conditions. Levels C, D, and E apply to progressively less severe scenarios.

At DAL A, the standard requires:

Requirements traceability. Every software requirement must trace to system requirements, and every line of code must trace to a software requirement. The purpose of each code element must be documented and justified.

Verification coverage. Testing must achieve structural coverage; at DAL A this means modified condition/decision coverage (MC/DC), in which every decision outcome is exercised and each condition is shown to independently affect its decision's outcome (see the example below). Dead code is prohibited.

Deterministic behaviour. While DO-178C does not explicitly use the word “deterministic,” its objectives effectively require predictable, reproducible behaviour. Evidence must demonstrate that software performs its intended functions under all operational conditions.

Configuration control. Every artefact affecting the software must be under configuration control with documented history.

These objectives are achievable but require significant discipline in design and documentation.
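
To make the MC/DC objective concrete, consider a single decision with three conditions. The sketch below is purely illustrative (the function and condition names are invented for the example); it shows one minimal set of four tests in which each condition is demonstrated to independently affect the outcome:

// Decision under test: three conditions, one boolean outcome
static int fault_response_enabled(int sensor_valid, int limit_exceeded, int manual_override)
{
    return sensor_valid && (limit_exceeded || manual_override);
}

// One minimal MC/DC test set: 4 tests for 3 conditions
//   Test  sensor_valid  limit_exceeded  manual_override  outcome
//   T1         1              1                0            1
//   T2         1              0                0            0   (vs T1: only limit_exceeded changed, outcome flipped)
//   T3         0              1                0            0   (vs T1: only sensor_valid changed, outcome flipped)
//   T4         1              0                1            1   (vs T2: only manual_override changed, outcome flipped)

Achieving this for a handful of hand-written decisions is routine; achieving it across hundreds of thousands of lines of framework code is where the cost accumulates.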

Architectural Properties of Mobile Inference Frameworks

Mobile inference frameworks share common architectural patterns that optimise for their primary use cases. The following analysis uses publicly documented behaviour; specific implementation details may vary across versions.

Dynamic Memory Allocation

Mobile frameworks typically allocate memory dynamically during model loading and inference:

// Typical pattern in inference frameworks (here, the TensorFlow Lite C++ API)
interpreter->AllocateTensors();                     // buffers sized and allocated at runtime
TfLiteTensor* tensor = interpreter->tensor(index);
// Tensor data may be allocated lazily or resized if input shapes change

Dynamic allocation creates verification challenges at high DALs:

Variable timing. Allocation time depends on heap state, making worst-case execution time (WCET) difficult to bound. DAL A systems typically require timing analysis with demonstrable bounds.

Fragmentation risk. Long-running systems may experience heap fragmentation, causing allocation failures that are difficult to reproduce during testing.

Coverage complexity. Allocation failure paths must be tested. With many allocation points, achieving structural coverage of all failure paths can be challenging.

The CAST-21 position paper from certification authorities addresses dynamic memory, noting that demonstrating compliance requires specific measures beyond typical development practices.
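
For contrast, the pattern typically adopted at high DALs replaces the heap with a statically sized arena claimed once at initialisation. The sketch below is a generic illustration of that pattern rather than code from any particular framework; the arena size and names are assumptions:

#include <stddef.h>
#include <stdint.h>

// Fixed-size arena reserved at link time; capacity chosen from the model's
// worst-case needs (the figure here is an assumption for the example)
#define ARENA_SIZE (64u * 1024u)
static uint8_t g_arena[ARENA_SIZE];
static size_t g_arena_used = 0;

// Bump allocator: deterministic time, no fragmentation, and exhaustion shows up
// during integration testing rather than unpredictably in service.
static void *arena_alloc(size_t bytes)
{
    size_t aligned = (bytes + 7u) & ~(size_t)7u;   /* 8-byte alignment */
    if (aligned > ARENA_SIZE - g_arena_used) {
        return NULL;                               /* single, easily covered failure path */
    }
    void *p = &g_arena[g_arena_used];
    g_arena_used += aligned;
    return p;
}

Because every allocation goes through one bounded path, timing analysis is straightforward and the single failure branch is easy to cover.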

Hardware Abstraction Layers

Frameworks abstract hardware differences to support multiple platforms:

// Framework selects implementation based on hardware
if (cpu_supports_neon()) {
    neon_conv2d(input, kernel, output);
} else if (cpu_supports_sse()) {
    sse_conv2d(input, kernel, output);
} else {
    reference_conv2d(input, kernel, output);
}

This flexibility creates certification considerations:

Multiple code paths. Each hardware backend is effectively a different implementation. Complete verification requires testing each path the deployed system might execute.

Platform dependence. Behaviour may vary across platforms in subtle ways. Demonstrating equivalence requires evidence that all backends produce acceptable results.

Conditional complexity. Runtime hardware detection adds branches that must be covered and justified.

For certification, the target hardware configuration is typically fixed, so unused backends could potentially be excluded. However, this requires modifying the framework build or demonstrating that excluded code is truly unreachable.
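
One way to do that, sketched below, is to pin the backend at compile time so that only a single kernel implementation reaches the certified binary. The macro and function names are illustrative, not taken from any real framework build:

// Kernel implementations (one per supported backend), as in the snippet above
void neon_conv2d(const float *input, const float *kernel, float *output);
void reference_conv2d(const float *input, const float *kernel, float *output);

// Backend fixed at build time for the known target hardware: the unused
// implementation is never compiled in, so there is no runtime detection
// branch to cover and no unreachable code to justify.
#if defined(TARGET_HAS_NEON)
#define conv2d_impl neon_conv2d
#else
#define conv2d_impl reference_conv2d
#endif

void run_conv2d(const float *input, const float *kernel, float *output)
{
    conv2d_impl(input, kernel, output);   /* exactly one path in the delivered binary */
}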

Floating-Point Computation

Neural network inference frameworks predominantly use floating-point arithmetic:

// Standard floating-point convolution
for (int i = 0; i < output_size; i++) {
    float sum = 0.0f;
    for (int j = 0; j < kernel_size; j++) {
        sum += input[i + j] * kernel[j];  // FP multiply-accumulate
    }
    output[i] = sum;
}

Floating-point creates reproducibility challenges:

Platform variance. Different processors implement floating-point with different intermediate precision, FMA availability, and rounding behaviour. The same computation may produce slightly different results on different hardware.

Compiler effects. Optimisation flags can change operation ordering, affecting results due to floating-point non-associativity.

Verification complexity. If results vary across platforms, test cases cannot use exact expected values. Tolerance-based testing requires justification that the tolerance is acceptable for the application.

For safety-critical applications, some teams choose fixed-point arithmetic to eliminate floating-point variance. This requires model quantisation and validation of accuracy loss.
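
As an illustration of the fixed-point route, the sketch below rewrites the multiply-accumulate loop above in Q15 arithmetic; the format, rounding, and saturation choices are assumptions made for the example:

#include <stdint.h>

// Q15 fixed-point convolution: every conforming platform computes the same result,
// so expected test outputs can be exact rather than tolerance-based.
void conv1d_q15(const int16_t *input, const int16_t *kernel,
                int16_t *output, int output_size, int kernel_size)
{
    for (int i = 0; i < output_size; i++) {
        int64_t acc = 0;                                  /* wide accumulator avoids overflow */
        for (int j = 0; j < kernel_size; j++) {
            acc += (int32_t)input[i + j] * kernel[j];     /* Q15 x Q15 -> Q30 */
        }
        acc = (acc + (1 << 14)) >> 15;                    /* round and rescale to Q15 (arithmetic shift assumed) */
        if (acc >  32767) acc =  32767;                   /* saturate instead of wrapping */
        if (acc < -32768) acc = -32768;
        output[i] = (int16_t)acc;
    }
}

Because integer arithmetic of this kind is reproducible across platforms and optimisation levels, the same test vectors pass on the host, the target, and any future hardware revision.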

Third-Party Dependencies

Frameworks depend on external libraries for math operations, threading, and platform services:

TensorFlow Lite dependencies (representative):
├── Eigen (linear algebra)
├── FlatBuffers (serialization)
├── gemmlowp (quantized matrix multiply)
├── ruy (matrix multiply)
├── XNNPACK (neural network operators)
└── platform-specific libraries

Dependencies affect certification in several ways:

Verification scope. All code that executes in the certified system must be verified to an appropriate level. Third-party libraries require the same evidence as first-party code.

Change management. Updates to dependencies require re-verification. Frameworks with frequent releases and many dependencies create configuration management challenges.

Traceability. Requirements for third-party code may not exist or may not meet aerospace documentation standards.

Some certification approaches use “previously developed software” (PDS) provisions for stable, well-characterised libraries. This requires evidence of the library’s service history and suitability for the application.

Code Size and Complexity

Mobile inference frameworks optimise for feature coverage rather than minimal footprint:

Component sizes (representative, version-dependent):
- TensorFlow Lite core: ~1MB compiled
- Operator kernels: ~2-5MB depending on included ops
- Dependencies: variable, potentially several MB

Large codebases affect certification economics:

Verification cost. Testing and documentation effort scales with code size. Structural coverage of millions of lines of code is expensive.

Dead code. Features unused by the target application may remain in the binary unless explicitly excluded. DO-178C requires dead code to be removed, and code intentionally retained but never executed must be justified and managed as deactivated code.

Review burden. Code reviews for certification require understanding each component’s purpose and behaviour.

Custom inference engines for specific models can achieve much smaller footprints by including only required operations. This reduces verification scope but requires custom development.
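
A sketch of what that looks like in practice: the network is reduced to a fixed table of the few operations it actually uses, so every line in the binary is reachable and traceable to a requirement. The types and function names below are invented for illustration:

// Operator implementations actually needed by the deployed model; nothing else is linked.
typedef enum { OP_CONV2D, OP_RELU, OP_FULLY_CONNECTED } op_kind_t;

void conv2d(const void *params, const float *in, float *out);
void relu(const void *params, const float *in, float *out);
void fully_connected(const void *params, const float *in, float *out);

typedef struct {
    op_kind_t   kind;
    const void *params;    /* operator parameters fixed at build time from the trained model */
} layer_t;

// Dispatch is a fully enumerable switch: a handful of tests covers it completely.
// No default case: the enum is exhaustive, so there is no unreachable branch to justify.
void run_layer(const layer_t *layer, const float *in, float *out)
{
    switch (layer->kind) {
    case OP_CONV2D:          conv2d(layer->params, in, out);          break;
    case OP_RELU:            relu(layer->params, in, out);            break;
    case OP_FULLY_CONNECTED: fully_connected(layer->params, in, out); break;
    }
}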

Quantifying the Challenge

To illustrate the scale, consider a hypothetical certification effort for a small neural network using a mobile inference framework:

Aspect                       Typical Framework     Custom Engine
Binary size                  2-5 MB                20-100 KB
Source lines (approx.)       100K-500K             2K-10K
External dependencies        5-15                  0-2
Hardware backends            3-10                  1
Memory allocation            Dynamic               Static
Structural coverage scope    All executed code     All code

These numbers are illustrative; actual figures depend on specific configurations. The key observation is that verification effort correlates with code complexity and variability.

Approaches Teams Have Taken

Organisations pursuing aerospace AI have adopted various strategies:

Framework Modification

Some teams fork existing frameworks and modify them for certification:

  • Remove unused operators and backends
  • Replace dynamic allocation with static buffers
  • Add traceability documentation
  • Isolate and justify third-party dependencies

This approach preserves compatibility with trained models but requires substantial engineering effort and ongoing maintenance as upstream frameworks evolve.

Custom Implementation

Other teams build inference engines from scratch:

  • Include only required operations
  • Design for verification from the start
  • Eliminate unnecessary variability
  • Achieve small, auditable codebases

This approach reduces verification scope but requires implementing and validating each neural network operation.

Hybrid Approaches

Some teams use frameworks during development and custom engines for deployment:

  • Train models using standard frameworks (TensorFlow, PyTorch)
  • Export weights and architecture
  • Implement inference in a certifiable custom engine
  • Validate equivalence between framework and custom outputs

This preserves the benefits of mature training infrastructure while deploying a minimal, certifiable inference engine.
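
The equivalence-validation step usually amounts to replaying reference vectors captured from the training framework through the custom engine and checking agreement within a justified tolerance. A minimal sketch of that comparison, assuming the reference outputs have already been exported as arrays:

#include <math.h>
#include <stdio.h>

// Compare custom-engine outputs against framework reference outputs.
// The tolerance itself must be justified against the application's accuracy requirement.
int outputs_equivalent(const float *reference, const float *candidate,
                       int count, float tolerance)
{
    for (int i = 0; i < count; i++) {
        if (fabsf(reference[i] - candidate[i]) > tolerance) {
            printf("mismatch at %d: ref=%f got=%f\n", i, reference[i], candidate[i]);
            return 0;
        }
    }
    return 1;
}

The same harness can run on the development host and on the target during integration testing, so the equivalence evidence carries across verification environments.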

Formal Methods

Emerging approaches apply formal verification to neural network implementations:

  • Prove properties of fixed-point arithmetic implementations
  • Verify absence of runtime errors through static analysis
  • Demonstrate bounded execution time through formal timing analysis

These techniques are still maturing, but they show promise for reducing the testing burden.
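
To give a flavour of what this looks like, the fragment below shows the style of contract annotation consumed by C static analysers such as Frama-C (ACSL); the function and bounds are illustrative assumptions, not part of any certified codebase:

#include <stdint.h>

/*@ requires n > 0 && n <= 1024;
  @ requires \valid_read(values + (0 .. n-1));
  @ assigns \nothing;
  @ ensures \forall integer k; 0 <= k < n ==> \result >= values[k];
  @*/
int16_t max_activation(const int16_t *values, int n)
{
    int16_t max = values[0];
    int i;
    /*@ loop invariant 1 <= i <= n;
      @ loop invariant \forall integer k; 0 <= k < i ==> max >= values[k];
      @ loop assigns i, max;
      @ loop variant n - i;
      @*/
    for (i = 1; i < n; i++) {
        if (values[i] > max) {
            max = values[i];
        }
    }
    return max;
}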

What Certification Authorities Consider

Certification authorities and their designees (the FAA and its DERs in the US, EASA and approved design organisations in Europe) evaluate each case individually. Factors they may consider include:

Safety analysis. How do inference failures affect system safety? What mitigations exist? Lower-criticality applications may accept more framework complexity.

Operational constraints. Is the system continuous or bounded in operation? Short mission times may reduce concerns about fragmentation and memory exhaustion.

Verification evidence. What testing has been performed? How comprehensive is the coverage? Strong verification evidence can sometimes offset architectural complexity.

Service history. Has this framework been used successfully in similar applications? Service history can support qualification arguments.

Development assurance. Was the framework developed to aerospace standards? Most general-purpose frameworks were not, which affects what evidence is available.

The outcome depends on the specific system, its safety role, and the persuasiveness of the certification argument.

Implications for Project Planning

Teams considering neural network inference for aerospace applications should:

Assess early. Certification implications should inform architecture decisions at project start, not be discovered during verification.

Engage authorities. Early engagement with certification authorities (through certification plans and issue papers) can clarify expectations before major investment.

Consider alternatives. Custom inference engines may have higher initial development cost but lower verification cost. The trade-off depends on model complexity and reuse expectations.

Budget realistically. Certifying complex software at high DALs is expensive regardless of approach. Underestimating verification effort is a common project failure mode.

Plan for maintenance. Aerospace software lifecycles span decades. Architectures that simplify updates and re-verification reduce lifetime cost.

Conclusion

Mobile inference frameworks like TensorFlow Lite represent excellent engineering for their intended applications. Their architectural choices—dynamic allocation, hardware abstraction, floating-point computation, and extensive dependencies—enable broad compatibility and ease of use.

These same properties can complicate certification under DO-178C, particularly at higher Design Assurance Levels. The challenges are not insurmountable, but they require significant effort to address: modifying frameworks, creating extensive verification evidence, or building custom implementations.

Teams pursuing aerospace AI should understand these challenges early and choose approaches that align with their certification strategy. For some applications, adapting existing frameworks may be cost-effective. For others, purpose-built inference engines may reduce overall certification effort.

The aerospace industry continues to develop best practices for certifiable AI. As experience accumulates and tools mature, clearer patterns will emerge. In the meantime, understanding the interaction between framework architecture and certification requirements helps teams make informed decisions.

As with any certification effort, success depends on early planning, realistic assessment, and close coordination with certification authorities. The challenges are significant but not unprecedented—aerospace has successfully certified complex software before, and neural network inference will eventually follow established paths to certification.


For an example of an inference architecture designed with certification in mind, see certifiable-inference, which demonstrates fixed-point arithmetic, static allocation, and minimal dependencies. A live simulator shows the approach in action.

About the Author

William Murray is a Regenerative Systems Architect with 30 years of UNIX infrastructure experience, specializing in deterministic computing for safety-critical systems. Based in the Scottish Highlands, he operates SpeyTech and maintains several open-source projects including C-Sentinel and c-from-scratch.

Let's Discuss Your AI Infrastructure

Available for UK-based consulting on production ML systems and infrastructure architecture.
