Most engineers learn floating-point arithmetic and never question it. IEEE 754 is convenient, widely supported, and “good enough” for most applications.
Until it isn’t.
When you need deterministic results — the same output for the same input, every time, on every platform — floating-point becomes a liability. When certification bodies ask you to prove your arithmetic is bounded, floating-point makes that proof difficult. When accumulated rounding errors cause your control system to drift, floating-point is the culprit.
Fixed-point arithmetic solves these problems. But most engineers never learned it properly.
## The Problem with Floating-Point
Floating-point arithmetic has three fundamental issues for safety-critical systems:
### 1. Non-Determinism Across Platforms
The same floating-point code can produce different results on different hardware:
```c
// This may give different answers on x87 vs SSE vs ARM
float result = a * b + c * d;
```

The x87 FPU uses 80-bit extended precision internally. SSE uses 64-bit. ARM has its own quirks. Compiler flags like `-ffast-math` change behaviour. The “same” computation isn’t the same at all.
### 2. Accumulation Drift
Small rounding errors compound over time:
```c
float sum = 0.0f;
for (int i = 0; i < 8640000; i++) { // 24 hours at 100 Hz
    sum += 0.01f;
}
// Expected: 86400.0
// Actual:   ~87296.4 (error: ~1%)
```

In a control system running for hours or days, this drift can cause real problems. The Patriot missile failure in 1991 was caused by exactly this kind of accumulated error — a 0.000000095-second truncation error per 0.1-second clock tick that, over 100 hours, compounded into a 573-metre targeting error. Twenty-eight soldiers died.
### 3. Certification Challenges
Safety standards like DO-178C (aerospace), IEC 62304 (medical devices), and ISO 26262 (automotive) require you to prove bounds on your computations. With floating-point, proving worst-case behaviour is complex. The IEEE 754 standard alone runs to dozens of pages of edge cases.
## The Fixed-Point Solution
Fixed-point arithmetic uses integers with an implicit scale factor. A Q16.16 number, for example, uses 32 bits: 16 for the integer part, 16 for the fractional part. The scale factor is 2^16 = 65536.
```c
// Q16.16: 16 integer bits, 16 fractional bits
typedef int32_t q16_16_t;
#define Q16_16_SCALE 65536

// Convert 3.14159 to Q16.16
q16_16_t pi = (q16_16_t)(3.14159 * Q16_16_SCALE); // = 205887
// The value 205887 represents 205887/65536 = 3.14158630...
```

This representation gives you:
- Determinism — integer arithmetic is identical on every platform
- Bounded precision — you know exactly how precise your numbers are (1/65536 ≈ 0.000015 for Q16.16)
- Bounded range — you know exactly what values you can represent (−32768.0 to +32767.99998 for Q16.16)
- No special hardware — every CPU handles integers identically
The trade-off is that you must choose your format carefully. Range and precision are in tension — more bits for the integer part means fewer bits for the fractional part.
## Course Structure
Fixed-Point Fundamentals teaches this material systematically, from motivation through practical application:
| Lesson | Topic | What You’ll Learn |
|---|---|---|
| 00 | Why Not Float? | Real failures, platform divergence, accumulation drift |
| 01 | The Model | Q notation, implicit scaling, range vs precision trade-off |
| 02 | Arithmetic | The widening pattern: widen → compute → narrow |
| 03 | Safety | Saturation logic, sticky fault flags, avoiding undefined behaviour |
| 04 | Rounding | Truncation vs RNE, eliminating statistical bias |
| 05 | Conversion | Format rescaling, precision loss analysis |
| 06 | Patterns | Accumulators, lookup tables, mixed-precision PID |
| 07 | Strategy | Decision frameworks, format selection, certification bridge |
Each lesson includes working C99 code you can compile and run immediately.
## The Core Pattern: Widen → Compute → Narrow
The most important technique in fixed-point arithmetic is the widening pattern. When you multiply two Q16.16 numbers, the intermediate result needs 64 bits to avoid overflow:
```c
q16_16_t q16_mul(q16_16_t a, q16_16_t b) {
    // Widen to 64 bits for the intermediate
    int64_t wide = (int64_t)a * (int64_t)b;
    // The product has 32 fractional bits (16 + 16).
    // Add half an output LSB, then shift right by 16 to get
    // back to Q16.16 (round-half-up; Lesson 04 refines this to RNE)
    wide += (int64_t)1 << 15;
    return (q16_16_t)(wide >> 16);
}
```

Without widening, the multiplication would overflow silently. With widening, you have room for the full result before narrowing back to your target format.
This pattern — widen, compute, narrow — appears everywhere in fixed-point code. Master it and you’ve mastered half of fixed-point arithmetic.
## Rounding: Why It Matters More Than You Think
When you narrow a result, you lose precision. How you handle that loss matters:
Truncation (round toward zero) introduces systematic bias. If you truncate repeatedly, errors accumulate in one direction.
Round-half-up (school rounding) has the same problem — it biases toward positive infinity.
Round-to-nearest-even (banker’s rounding) is statistically unbiased. When the value is exactly halfway between two representable numbers, it rounds to the nearest even number. Over many operations, the positive and negative roundings cancel out.
```c
// Round-to-nearest-even for Q16.16 division
q16_16_t q16_div_rne(q16_16_t a, q16_16_t b) {
    int64_t wide = (int64_t)a << 16;
    int64_t quotient  = wide / b;  // truncates toward zero in C99
    int64_t remainder = wide % b;  // takes the sign of the dividend
    // Compare |remainder| against |b|/2 so negative operands round
    // correctly too; on an exact tie, round to the even quotient
    int64_t abs_rem = remainder < 0 ? -remainder : remainder;
    int64_t abs_b   = b < 0 ? -(int64_t)b : (int64_t)b;
    if (2 * abs_rem > abs_b ||
        (2 * abs_rem == abs_b && (quotient & 1))) {
        quotient += ((a < 0) != (b < 0)) ? -1 : 1; // away from zero
    }
    return (q16_16_t)quotient;
}
```

Lesson 04 demonstrates this with a 1-million-operation test. Truncation drifts to zero. Round-half-up drifts positive. RNE stays centred.
## Overflow: The Silent Killer
In C, signed integer overflow is undefined behaviour. The compiler is free to assume it never happens, which can lead to surprising optimisations that break your code.
Fixed-point code must handle overflow explicitly:
```c
typedef struct {
    q16_16_t value;
    uint8_t  flags;  // Sticky fault flags
} q16_16_result_t;

#define FAULT_OVERFLOW  0x01
#define FAULT_UNDERFLOW 0x02
#define FAULT_SATURATED 0x04

q16_16_result_t q16_add_safe(q16_16_t a, q16_16_t b, uint8_t *flags) {
    int64_t wide = (int64_t)a + (int64_t)b;
    q16_16_result_t result;
    if (wide > INT32_MAX) {
        result.value = INT32_MAX;   // Saturate high
        *flags |= FAULT_OVERFLOW | FAULT_SATURATED;
    } else if (wide < INT32_MIN) {
        result.value = INT32_MIN;   // Saturate low
        *flags |= FAULT_UNDERFLOW | FAULT_SATURATED;
    } else {
        result.value = (q16_16_t)wide;
    }
    result.flags = *flags;  // Carry the accumulated flags with the result
    return result;
}
```

The sticky fault flags pattern is crucial for safety-critical systems. You don’t check for overflow on every operation (too expensive). Instead, you clear the flags at the start of a computation pipeline, let them accumulate, and check once at the end. If any overflow occurred, you know about it.
Lesson 03 demonstrates this with a PID controller that experiences integral windup. Without saturation, the output wraps negative and the controller violently reverses — potentially causing physical damage in a real system.
## Practical Patterns
Lesson 06 brings everything together with patterns you’ll use in real systems:
### Mixed-Precision PID Controller
Different parts of a PID controller have different requirements:
```c
typedef struct {
    q8_24_t  kp, ki, kd;   // Coefficients: high precision, small range
    q32_32_t integral;     // State: wide range for accumulation
    q16_16_t last_error;   // State: general purpose
} pid_controller_t;

q16_16_t pid_update(pid_controller_t *pid, q16_16_t error, q16_16_t dt) {
    // Proportional term
    q32_32_t p_term = q_mul_q8_24_q16_16(pid->kp, error);

    // Integral term (accumulate in Q32.32 to prevent overflow)
    pid->integral = q32_add(pid->integral,
                            q_mul_q8_24_q16_16(pid->ki, q_mul(error, dt)));

    // Derivative term
    q16_16_t derivative = q_div(q_sub(error, pid->last_error), dt);
    q32_32_t d_term = q_mul_q8_24_q16_16(pid->kd, derivative);
    pid->last_error = error;

    // Sum and convert to output format
    q32_32_t output = q32_add(q32_add(p_term, pid->integral), d_term);
    return q32_to_q16(output);  // Saturate if needed
}
```

Coefficients use Q8.24 (small values, high precision). The integral accumulator uses Q32.32 (wide range, prevents overflow). Inputs and outputs use Q16.16 (general-purpose interface).
### Sine Lookup Table with Linear Interpolation
When you can’t afford the cycles for CORDIC or polynomial approximation:
```c
// 256-entry quarter-wave table (257 entries so index + 1 is always valid)
static const q16_16_t sine_table[257] = {
    0x00000000, 0x00000648, 0x00000C8F, /* ... */
};

q16_16_t q16_sin(q16_16_t angle) {
    // Reduce to [0, 2π)
    angle = angle & 0x0000FFFF;  // Assuming 2π = 0x10000

    // Determine quadrant, table index, and interpolation fraction
    uint32_t quadrant = (angle >> 14) & 0x3;
    uint32_t index    = (angle >> 6) & 0xFF;
    uint32_t frac     = angle & 0x3F;

    // Lookup with linear interpolation
    q16_16_t y0 = sine_table[index];
    q16_16_t y1 = sine_table[index + 1];
    q16_16_t result = y0 + (((y1 - y0) * frac) >> 6);

    // Apply quadrant symmetry
    // ...
    return result;
}
```

A 257-entry table (about 1 KB) gives you better than 16-bit precision with simple linear interpolation. No floating-point transcendentals required.
## The Certification Bridge
This course teaches the fundamentals with standalone, copy-paste-friendly code under the MIT license.
For production safety-critical systems, the certifiable-inference project provides:
| This Course | certifiable-* Ecosystem |
|---|---|
| Teaching implementations | Production implementations |
| Standalone examples | Ecosystem integration |
| MIT license | GPL + CLA for IP protection |
| “Here’s how it works” | “Here’s proof it works” |
The certifiable-* ecosystem adds Merkle audit trails, cross-platform bit-identity verification, and documentation templates aligned with DO-178C, IEC 62304, and ISO 26262. If you’re building systems that need certification, that’s where you go after learning the fundamentals here.
## Getting Started
```bash
git clone https://github.com/SpeyTech/fixed-point-fundamentals.git
cd fixed-point-fundamentals
make
```

Each lesson is self-contained. Start with Lesson 00 to understand why floating-point fails, or jump to the topic you need.
Prerequisites: C programming (comfortable with integers and bit operations), basic arithmetic, a C compiler. No external dependencies.
## What You’ll Build
By the end of this course, you’ll be able to:
- Implement fixed-point arithmetic in strict C99
- Choose appropriate Q formats for your signal characteristics
- Handle overflow, underflow, and rounding correctly
- Build production-grade control systems with bounded, deterministic behaviour
- Understand the path from teaching implementations to certified production code
## Reference Materials
The course includes formal specifications following the same methodology used in aerospace and medical device development:
- FPF-MATH-001 — Mathematical closure architecture
- FPF-STRUCT-001 — Data structure specification
Plus quick-reference materials:
- Q Formats — Common formats and their properties
- Common Pitfalls — Mistakes to avoid
- Further Reading — Where to go next
## Related Reading
- Fixed-Point Neural Networks: The Math Behind Q16.16
- Round-to-Nearest-Even: The Rounding Mode That Makes Determinism Possible
- Why Floating Point Is Dangerous
- Bit-Perfect Reproducibility: Why It Matters and How to Prove It
Prove first, code second. MIT licensed.
As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context.