Shuffling data is fundamental to ML training. Without shuffling, models learn the order of examples, not their content. Batch composition matters. Epoch ordering matters.
But shuffling typically relies on random number generators. Different seeds, different shuffles. Different platforms, potentially different shuffles even with the same seed. For deterministic ML, that’s a problem.
The Feistel shuffle solves this: a deterministic permutation that looks random but is completely reproducible.
The Problem with Random Shuffling
Consider the standard approach: Fisher-Yates shuffle with a PRNG.
import random
random.seed(42)
random.shuffle(data)

This is “deterministic” in the sense that the same seed produces the same shuffle — on the same platform, with the same library version, using the same PRNG implementation.
Change any of those, and guarantees disappear. Python’s random module has changed algorithms between versions. NumPy’s shuffle differs from Python’s. C’s rand() is notoriously platform-dependent.
For certifiable systems, “it works on my machine with my library version” isn’t acceptable.
Enter the Feistel Network
A Feistel network is a cryptographic construction that transforms a block of data through multiple rounds. It’s the basis of DES, Blowfish, and other block ciphers. The key property: it’s invertible. Every input maps to exactly one output, and you can reverse the process.
Applied to shuffling, we use the Feistel network as a permutation generator. Given an index from 0 to N-1, the network produces a permuted index — also from 0 to N-1. Every index maps to a unique output. The permutation is determined entirely by the seed, with no external randomness.
The Cycle-Walking Extension
There’s a catch. Feistel networks operate on fixed-size blocks — typically powers of two. If you have 1000 training samples, you need a permutation of exactly 1000 elements, not 1024.
Cycle-walking handles this. When the Feistel output falls outside your range, you apply the Feistel function again, “walking” until you land in the valid range.
uint32_t ct_feistel_permute(const ct_feistel_t *ctx, uint32_t x)
{
    uint32_t result = x;
    /* Walk until we land in [0, n) */
    do {
        result = feistel_round(ctx, result);
    } while (result >= ctx->n);
    return result;
}

This always terminates: feistel_round is a bijection on the full power-of-two domain, so repeated application traces a cycle that must eventually return to the starting index, which is itself in [0, N). The output is always a valid permutation of [0, N).
The Implementation
Here’s how certifiable-data implements the Feistel permutation:
typedef struct {
    uint32_t n;          /* Domain size */
    uint32_t half_bits;  /* Bits per half */
    uint32_t mask;       /* Mask for half */
    uint64_t keys[4];    /* Round keys */
} ct_feistel_t;
void ct_feistel_init(ct_feistel_t *ctx, uint32_t n, uint64_t seed)
{
    ctx->n = n;
    /* Find smallest power of 2 >= n */
    uint32_t bits = 1;
    while ((1u << bits) < n) bits++;
    ctx->half_bits = (bits + 1) / 2;
    ctx->mask = (1u << ctx->half_bits) - 1;
    /* Derive round keys from seed */
    ctx->keys[0] = seed ^ 0x243F6A8885A308D3ULL;
    ctx->keys[1] = seed ^ 0x13198A2E03707344ULL;
    ctx->keys[2] = seed ^ 0xA4093822299F31D0ULL;
    ctx->keys[3] = seed ^ 0x082EFA98EC4E6C89ULL;
}

The round keys are derived from the seed using constants (first digits of pi, for those curious). Same seed → same keys → same permutation.
The core Feistel round:
static uint32_t round_function(uint32_t x, uint64_t key, uint32_t mask);

static uint32_t feistel_round(const ct_feistel_t *ctx, uint32_t x)
{
    uint32_t left = x >> ctx->half_bits;
    uint32_t right = x & ctx->mask;
    for (int round = 0; round < 4; round++) {
        uint32_t f = round_function(right, ctx->keys[round], ctx->mask);
        uint32_t new_right = left ^ f;
        left = right;
        right = new_right;
    }
    return (left << ctx->half_bits) | right;
}

static uint32_t round_function(uint32_t x, uint64_t key, uint32_t mask)
{
    uint64_t mixed = x * 0x9E3779B97F4A7C15ULL;
    mixed ^= key;
    mixed ^= mixed >> 32;
    return (uint32_t)(mixed & mask);
}

Four rounds is sufficient for good diffusion. The round function uses multiplication and XOR — fast, portable, deterministic.
Why This Matters for ML
Reproducible Epochs
Each epoch shuffles the data differently. With Feistel, you parameterise by epoch:
ct_feistel_init(&shuffle, num_samples, seed + epoch);

Epoch 0 with seed 42 always produces the same permutation. Epoch 1 produces a different permutation, but equally deterministic. Resume training at epoch 50, and batches are identical to the original run.
Batch Construction
Given the permutation, batch construction is trivial:
for (uint32_t i = 0; i < batch_size; i++) {
    uint32_t global_idx = batch_start + i;
    uint32_t sample_idx = ct_feistel_permute(&shuffle, global_idx);
    batch[i] = dataset[sample_idx];
}

No random sampling. No reservoir algorithms. Just index transformation.
Audit Trails
The Merkle chain commits batch composition. With Feistel shuffling, you can reconstruct any batch from (seed, epoch, batch_index). Verification doesn’t require storing the actual indices — just the parameters that generated them.
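To make this concrete, here is a hedged sketch of such a reconstruction. reconstruct_batch_indices is an illustrative helper, not part of certifiable-data's API; it assumes the seed + epoch convention shown above and reproduces the article's Feistel code so the snippet stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Context and functions as defined in the article. */
typedef struct {
    uint32_t n;
    uint32_t half_bits;
    uint32_t mask;
    uint64_t keys[4];
} ct_feistel_t;

static uint32_t round_function(uint32_t x, uint64_t key, uint32_t mask)
{
    uint64_t mixed = x * 0x9E3779B97F4A7C15ULL;
    mixed ^= key;
    mixed ^= mixed >> 32;
    return (uint32_t)(mixed & mask);
}

static uint32_t feistel_round(const ct_feistel_t *ctx, uint32_t x)
{
    uint32_t left = x >> ctx->half_bits;
    uint32_t right = x & ctx->mask;
    for (int round = 0; round < 4; round++) {
        uint32_t f = round_function(right, ctx->keys[round], ctx->mask);
        uint32_t new_right = left ^ f;
        left = right;
        right = new_right;
    }
    return (left << ctx->half_bits) | right;
}

void ct_feistel_init(ct_feistel_t *ctx, uint32_t n, uint64_t seed)
{
    ctx->n = n;
    uint32_t bits = 1;
    while ((1u << bits) < n) bits++;
    ctx->half_bits = (bits + 1) / 2;
    ctx->mask = (1u << ctx->half_bits) - 1;
    ctx->keys[0] = seed ^ 0x243F6A8885A308D3ULL;
    ctx->keys[1] = seed ^ 0x13198A2E03707344ULL;
    ctx->keys[2] = seed ^ 0xA4093822299F31D0ULL;
    ctx->keys[3] = seed ^ 0x082EFA98EC4E6C89ULL;
}

uint32_t ct_feistel_permute(const ct_feistel_t *ctx, uint32_t x)
{
    uint32_t result = x;
    do {
        result = feistel_round(ctx, result);
    } while (result >= ctx->n);
    return result;
}

/* Hypothetical helper: recover the sample indices of one batch purely
 * from the parameters committed in the audit trail. The caller ensures
 * the batch's global index range lies within [0, num_samples). */
void reconstruct_batch_indices(uint32_t num_samples, uint64_t seed,
                               uint32_t epoch, uint32_t batch_index,
                               uint32_t batch_size, uint32_t *out)
{
    ct_feistel_t shuffle;
    ct_feistel_init(&shuffle, num_samples, seed + epoch); /* per-epoch permutation */
    for (uint32_t i = 0; i < batch_size; i++)
        out[i] = ct_feistel_permute(&shuffle, batch_index * batch_size + i);
}
```

Calling this twice with the same parameters must yield the same indices, which is exactly the property a verifier relies on.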
Properties
Bijection: Every input maps to exactly one output. No collisions, no missing values.
Determinism: Same (seed, domain_size) always produces the same permutation.
Uniformity: The permutation is statistically uniform — no obvious patterns.
Efficiency: O(1) per index lookup (amortised, accounting for cycle-walking).
Invertibility: Given the output, you can recover the input. Useful for debugging.
The Inverse
Sometimes you need the reverse mapping: “sample 42 appeared at what position in the shuffled order?”
uint32_t ct_feistel_invert(const ct_feistel_t *ctx, uint32_t y)
{
    uint32_t result = y;
    do {
        result = feistel_inverse_round(ctx, result);
    } while (result >= ctx->n);
    return result;
}

The inverse applies the same round keys in reverse order, with the left/right swap undone at each step. Inverting a Feistel network reuses the forward structure almost unchanged, one of its elegant properties.
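The inverse round itself isn't shown above; a minimal sketch of how feistel_inverse_round could look, reproducing the forward definitions so the snippet compiles on its own (the name matches the call in ct_feistel_invert, but this is an illustration, not certifiable-data's exact code):

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t n;
    uint32_t half_bits;
    uint32_t mask;
    uint64_t keys[4];
} ct_feistel_t;

static uint32_t round_function(uint32_t x, uint64_t key, uint32_t mask)
{
    uint64_t mixed = x * 0x9E3779B97F4A7C15ULL;
    mixed ^= key;
    mixed ^= mixed >> 32;
    return (uint32_t)(mixed & mask);
}

static uint32_t feistel_round(const ct_feistel_t *ctx, uint32_t x)
{
    uint32_t left = x >> ctx->half_bits;
    uint32_t right = x & ctx->mask;
    for (int round = 0; round < 4; round++) {
        uint32_t f = round_function(right, ctx->keys[round], ctx->mask);
        uint32_t new_right = left ^ f;
        left = right;
        right = new_right;
    }
    return (left << ctx->half_bits) | right;
}

/* Exactly reverses feistel_round on the 2*half_bits-bit domain:
 * keys in reverse order, swap undone at each step. */
static uint32_t feistel_inverse_round(const ct_feistel_t *ctx, uint32_t y)
{
    uint32_t left = y >> ctx->half_bits;
    uint32_t right = y & ctx->mask;
    for (int round = 3; round >= 0; round--) {
        /* Forward did: (L, R) -> (R, L ^ F(R, k)).
         * Undo: previous R = current L,
         *       previous L = current R ^ F(current L, k). */
        uint32_t f = round_function(left, ctx->keys[round], ctx->mask);
        uint32_t new_left = right ^ f;
        right = left;
        left = new_left;
    }
    return (left << ctx->half_bits) | right;
}
```

Because each out-of-range intermediate value stays out of range when walked backwards, ct_feistel_invert's cycle-walk retraces ct_feistel_permute's walk exactly, so the round-trip recovers the original index.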
Comparison to Alternatives
Fisher-Yates
- Pro: Simple, well-understood
- Con: Requires storing the full permutation or generating sequentially
- Con: PRNG-dependent
Knuth Shuffle
Same as Fisher-Yates (they’re equivalent algorithms).
Format-Preserving Encryption (FPE)
- Pro: Cryptographically strong
- Con: Typically requires AES, which is overkill for shuffling
- Con: More complex implementation
Feistel
- Pro: Self-contained, no external dependencies
- Pro: Simple to implement correctly
- Pro: Invertible by construction
- Pro: Works for any domain size (with cycle-walking)
- Con: Slightly slower than direct random shuffle
For safety-critical ML, the determinism and simplicity outweigh the minor performance cost.
Test Vectors
For any Feistel implementation to be compliant, these test cases must pass:
/* Domain size 100, seed 0 */
ct_feistel_init(&ctx, 100, 0);
assert(ct_feistel_permute(&ctx, 0) == 73);
assert(ct_feistel_permute(&ctx, 1) == 24);
assert(ct_feistel_permute(&ctx, 50) == 91);
assert(ct_feistel_permute(&ctx, 99) == 8);

/* Bijection check */
bool seen[100] = {false};
for (uint32_t i = 0; i < 100; i++) {
    uint32_t j = ct_feistel_permute(&ctx, i);
    assert(j < 100);
    assert(!seen[j]);
    seen[j] = true;
}

If your implementation produces different values, it’s not compatible. Cross-platform bit-identity requires exact agreement.
Practical Considerations
Domain Size Changes
If your dataset size changes (new data added, samples removed), the permutation changes entirely. For incremental training, you may want to fix the domain size and handle growth separately.
Large Datasets
The cycle-walking can require multiple iterations when the domain isn't a power of two. Treating the Feistel output as roughly uniform over its power-of-two domain, each application lands in [0, n) with probability n divided by the domain size, so the expected number of iterations is the domain-to-n ratio: under 2 when the bit width splits evenly between the two halves, and up to about 4 when half_bits rounds up (as for n = 2^k + 1 with even k). For billion-element datasets, this is negligible.
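The expected walk length can be computed directly from the initialisation logic above. A small sketch, assuming the round output is roughly uniform over its domain (expected_walk_iterations is an illustrative helper, not part of certifiable-data):

```c
#include <assert.h>
#include <stdint.h>

/* Expected cycle-walk iterations for a domain of size n. The Feistel
 * network permutes 2^(2*half_bits) values, and each application lands
 * in [0, n) with probability n / domain, so the expected walk length
 * is domain / n. */
double expected_walk_iterations(uint32_t n)
{
    uint32_t bits = 1;
    while ((1u << bits) < n) bits++;      /* smallest 2^bits >= n */
    uint32_t half_bits = (bits + 1) / 2;  /* as in ct_feistel_init */
    uint32_t domain = 1u << (2 * half_bits);
    return (double)domain / (double)n;
}
```

For n = 1024 the walk always succeeds on the first try (ratio 1.0); for n = 1000 the expected cost is about 1.02 iterations; for n = 1025 the bit width rounds up and the expected cost rises to roughly 4.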
Parallelisation
Each index lookup is independent. You can parallelise batch construction trivially — each thread computes its own indices without synchronisation.
Conclusion
The Feistel shuffle replaces “shuffle with random seed and hope for the best” with “deterministic permutation from first principles.” Same seed, same dataset size, same shuffle — guaranteed, across platforms, across time.
For ML training that must be reproducible, this isn’t a minor detail. It’s a foundational requirement. When you claim “we can reproduce this training run,” the data ordering is part of that claim.
The certifiable-* ecosystem uses Feistel shuffling throughout certifiable-data. Combined with the Merkle chain, every batch is cryptographically committed and independently reconstructible.
As with any architectural approach, suitability depends on system requirements, risk classification, and regulatory context. For deterministic ML pipelines, Feistel shuffling provides the mathematical foundation for reproducible data ordering.
Explore the implementation in certifiable-data or see the permutation tests for the complete test vector suite.