The Pattern
A team builds their first production ML model. It works. Then someone asks: “Should we use a feature store?”
The question implies the answer. Feature stores are standard MLOps infrastructure. Every mature ML organization has one. The vendors say so. The conference talks recommend them. Not having a feature store feels like technical debt.
So the team spends three months evaluating Feast, Tecton, and Databricks Feature Store. Another two months integrating the chosen solution. Another month debugging why features aren’t matching between training and serving.
Six months later, they’re serving predictions from a feature store that recomputes features on every request - exactly what they were doing before, but with more complexity and latency.
This pattern repeats constantly. Feature stores solve real problems. But most teams don’t have those problems yet.
What Feature Stores Actually Solve
Feature stores solve three specific problems:
Problem 1: Training-Serving Skew
When training uses different feature computation logic than serving. The model trains on sum(purchases_last_30_days) but serves with sum(purchases_last_month) - different results, model breaks.
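A hedged sketch of how the two definitions can drift - the function names and data shapes here are hypothetical, not code from any real pipeline:

# Hypothetical illustration of training-serving skew: both helpers are supposed
# to compute the "same" feature but disagree near month boundaries.
from datetime import timedelta

def purchases_last_30_days(purchases, now):
    # Training pipeline: rolling 30-day window.
    cutoff = now - timedelta(days=30)
    return sum(p["amount"] for p in purchases if p["date"] >= cutoff)

def purchases_last_month(purchases, now):
    # Serving path: "last month" read as the current calendar month.
    return sum(
        p["amount"]
        for p in purchases
        if p["date"].year == now.year and p["date"].month == now.month
    )

# Early in June, a purchase from May 20th falls inside the rolling window but
# outside the calendar month, so serving feeds the model a feature value it
# never saw during training.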
Problem 2: Feature Recomputation
When multiple models need the same features. Computing user_lifetime_value independently for each model wastes resources.
Problem 3: Point-in-Time Correctness
When training needs historical feature values. For a prediction made on 2024-06-15, what was user_tier on that date? Naive joins use current values, introducing data leakage.
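To make the leakage concrete, here is a minimal sketch using pandas.merge_asof; the users, tiers, and timestamps are made up for illustration:

# Point-in-time correctness in miniature: the naive join leaks future data,
# the as-of join uses only what was known at prediction time.
import pandas as pd

predictions = pd.DataFrame({
    "user_id": [1, 1],
    "timestamp": pd.to_datetime(["2024-03-01", "2024-06-15"]),
})

tier_history = pd.DataFrame({
    "user_id": [1, 1],
    "timestamp": pd.to_datetime(["2024-01-01", "2024-05-01"]),
    "user_tier": ["basic", "premium"],
})

# Leaky: attaches the user's *current* tier ("premium") to every training row,
# including the March prediction, when the user was still "basic".
leaky = predictions.merge(
    tier_history.sort_values("timestamp").groupby("user_id").tail(1),
    on="user_id",
    suffixes=("", "_feature"),
)

# Correct: for each prediction, take the most recent tier known at that time.
correct = pd.merge_asof(
    predictions.sort_values("timestamp"),
    tier_history.sort_values("timestamp"),
    on="timestamp",
    by="user_id",
    direction="backward",
)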
These are real problems. If you have them, feature stores help. But you might not have them yet.
When You Don’t Need a Feature Store
You Have One Model
If you have a single model, training-serving skew is easy to avoid without infrastructure:
# features.py - single source of truth
from datetime import date
from statistics import mean

def compute_features(user_data, transaction_data):
    """Used by both training and serving"""
    return {
        'total_purchases': len(transaction_data),
        'avg_purchase_value': mean([t.amount for t in transaction_data]),
        'days_since_last_purchase': (date.today() - max([t.date for t in transaction_data])).days,
        # ... more features
    }

# training.py
features = compute_features(user_data, transactions)
model.train(features, labels)

# serving.py
features = compute_features(user_data, transactions)
prediction = model.predict(features)

This works. It’s simple. It’s maintainable. Skew is impossible - same code path for both.
Why this works: With one model, feature logic fits in one file. No coordination needed. No shared infrastructure required.
When this breaks: When you have 10 models and each reimplements compute_features() slightly differently. Now you have skew risk and maintenance burden.
Your Features Are Request-Scoped
If features only use data in the request, there’s nothing to store:
# Request contains everything needed
@app.post("/predict")
def predict(request: PredictRequest):
    features = {
        'transaction_amount': request.amount,
        'merchant_category': request.merchant_category,
        'is_international': request.country != 'US',
        'hour_of_day': datetime.now().hour,
    }
    return model.predict(features)

Why this works: No historical data needed. No precomputation needed. A feature store would add latency without benefit.
When this breaks: When you need user_average_transaction_amount or merchant_fraud_rate - data not in the request. Now you need storage.
You Can Tolerate Batch Predictions
If predictions can be computed overnight and cached, feature stores are overkill:
# Nightly batch job
def compute_all_predictions():
    users = load_all_users()
    for user in users:
        features = compute_features(user)
        prediction = model.predict(features)
        cache.set(f"prediction:{user.id}", prediction)

# Serving just reads cache
@app.get("/prediction/{user_id}")
def get_prediction(user_id: str):
    return cache.get(f"prediction:{user_id}")

Why this works: Features computed once per day. Predictions cached. Serving is just cache lookup. No online feature computation needed.
When this breaks: When predictions need to be real-time based on latest data. Now you need online features.
Your Training Data Is Small
If your training dataset is modest (millions of rows, not billions), point-in-time correctness is just a SQL query:
# Training with point-in-time correctness
training_data = db.query("""
    SELECT
        u.user_id,
        e.timestamp,
        u.created_at,
        COUNT(t.id) as num_transactions,
        AVG(t.amount) as avg_transaction
    FROM events e
    JOIN users u ON e.user_id = u.user_id
    LEFT JOIN transactions t ON t.user_id = u.user_id
        AND t.timestamp < e.timestamp -- Point-in-time correctness
    WHERE e.label IS NOT NULL
    GROUP BY u.user_id, e.timestamp, u.created_at
""")

Why this works: The database handles the point-in-time join. No feature store materialization needed. The query is fast enough for typical training-set sizes.
When this breaks: When you have billions of training examples and complex feature joins. Now the SQL query takes hours. Feature store precomputation becomes necessary.
What to Use Instead
If you don’t need a feature store, use simpler alternatives:
Alternative 1: Shared Feature Functions
# features/user_features.py
def compute_user_features(user_id: str, as_of: datetime = None):
    """Compute user features for training or serving

    Args:
        user_id: User identifier
        as_of: Timestamp for point-in-time correctness (training).
            If None, uses current time (serving).
    """
    as_of = as_of or datetime.now()
    transactions = db.query(
        "SELECT * FROM transactions WHERE user_id = ? AND timestamp < ?",
        user_id, as_of
    )
    return {
        'num_transactions': len(transactions),
        'total_spent': sum(t.amount for t in transactions),
        'avg_transaction': mean([t.amount for t in transactions]),
        'days_since_last': (as_of - max([t.timestamp for t in transactions])).days
    }

# Training uses as_of for point-in-time correctness
train_features = [
    compute_user_features(ex.user_id, as_of=ex.timestamp)
    for ex in training_examples
]

# Serving uses current time
serve_features = compute_user_features(request.user_id)

Advantages:
- Training-serving skew impossible (same code)
- Point-in-time correctness handled
- No new infrastructure
- Easy to debug (just Python)
Disadvantages:
- Repeated computation (no caching across models)
- Slow for many models or large-scale training
Alternative 2: Cached Aggregations
# Precompute expensive features, cache results
class FeatureCache:
    def __init__(self, cache_ttl_seconds=300):
        self.cache = {}
        self.ttl = cache_ttl_seconds

    def get_user_features(self, user_id: str):
        cache_key = f"user_features:{user_id}"

        # Check cache
        if cache_key in self.cache:
            cached_value, timestamp = self.cache[cache_key]
            if time.time() - timestamp < self.ttl:
                return cached_value

        # Compute and cache
        features = self._compute_user_features(user_id)
        self.cache[cache_key] = (features, time.time())
        return features

    def _compute_user_features(self, user_id):
        # Expensive computation here
        return compute_features(user_id)

# Use in serving
feature_cache = FeatureCache(cache_ttl_seconds=300)

@app.post("/predict")
def predict(request: PredictRequest):
    features = feature_cache.get_user_features(request.user_id)
    return model.predict(features)

Advantages:
- Fast serving (cache hits avoid computation)
- No infrastructure beyond Redis/Memcached (a Redis-backed variant is sketched below)
- TTL controls freshness
- Works for multiple models
Disadvantages:
- Cache invalidation complexity
- No point-in-time correctness for training
- Need to handle cache misses
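If the cache has to be shared across processes or across models rather than living inside one Python process, the same pattern maps onto Redis in a few lines. A minimal sketch using redis-py - the host, key prefix, and TTL are assumptions, and compute_user_features is the shared function from Alternative 1:

# Same TTL-cache pattern, backed by Redis so every serving process and model
# reads from one shared cache. Connection details are placeholders.
import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def get_user_features(user_id: str, ttl_seconds: int = 300) -> dict:
    cache_key = f"user_features:{user_id}"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    features = compute_user_features(user_id)  # shared function from Alternative 1
    # SETEX writes the value with an expiry, so stale entries age out on their own.
    r.setex(cache_key, ttl_seconds, json.dumps(features))
    return features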
Alternative 3: Materialized Views
# Database-native feature materialization
db.execute("""
    CREATE MATERIALIZED VIEW user_features AS
    SELECT
        user_id,
        COUNT(*) as num_transactions,
        SUM(amount) as total_spent,
        AVG(amount) as avg_transaction,
        MAX(timestamp) as last_transaction_date
    FROM transactions
    GROUP BY user_id
""")

# Refresh periodically (e.g., hourly)
db.execute("REFRESH MATERIALIZED VIEW user_features")

# Training queries the view
train_features = db.query("""
    SELECT u.*, f.*
    FROM training_examples u
    JOIN user_features f ON u.user_id = f.user_id
""")

# Serving queries the view
serve_features = db.query(
    "SELECT * FROM user_features WHERE user_id = ?",
    user_id
)

Advantages:
- Database-native (no new systems)
- Fast reads (precomputed)
- SQL-based (familiar tools)
- Works for moderate scale
Disadvantages:
- Refresh lag (data staleness; see the refresh sketch below)
- Less flexible than code
- Doesn’t scale to billions of features
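The refresh schedule is what bounds that staleness. A minimal sketch of the periodic refresh job, reusing the same hypothetical db helper as the examples above; REFRESH ... CONCURRENTLY is PostgreSQL-specific and requires a unique index on the view:

# Hourly refresh loop: feature staleness is bounded by REFRESH_INTERVAL_SECONDS.
import time

REFRESH_INTERVAL_SECONDS = 3600

def refresh_user_features_forever():
    while True:
        # CONCURRENTLY lets readers keep querying the view during the refresh.
        db.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY user_features")
        time.sleep(REFRESH_INTERVAL_SECONDS)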
When You Actually Need a Feature Store
You need a feature store when:
1. Multiple teams, many models
When 5 teams are building 20 models and all need user_lifetime_value. Reimplementing it 20 times creates skew risk and maintenance burden.
2. Real-time features at scale
When you need sub-100ms serving with features computed from terabytes of data. Materialized views and caches don’t scale to this.
3. Complex point-in-time correctness
When training requires accurate historical feature values across dozens of feature types with different update frequencies.
4. Feature reuse is proven valuable
When you measure that 80% of features are shared across models - not when you hope they might be shared someday. A rough way to measure this is sketched below.
5. Feature computation is expensive
When computing features costs more than storing them. For example, complex aggregations over streaming data.
At this point, feature store infrastructure pays for its complexity. Before this point, it’s premature optimization.
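On point 4, here is a rough, hypothetical way to check whether reuse is real rather than hoped for - the registry dict is an assumption standing in for wherever each model’s feature list actually lives:

# Measure how much feature overlap actually exists across models.
from itertools import combinations

model_features = {
    "churn_model": {"user_lifetime_value", "days_since_last_purchase", "num_transactions"},
    "fraud_model": {"user_lifetime_value", "num_transactions", "avg_transaction"},
    "ltv_model": {"user_lifetime_value", "avg_transaction", "days_since_last_purchase"},
}

def shared_feature_ratio(features_by_model):
    """Fraction of distinct features consumed by more than one model."""
    all_features = set().union(*features_by_model.values())
    shared = {
        f
        for a, b in combinations(features_by_model.values(), 2)
        for f in a & b
    }
    return len(shared) / len(all_features)

print(shared_feature_ratio(model_features))  # 1.0 in this toy example

# If this ratio stays low across your real models, the deduplication benefit of
# a feature store is mostly theoretical.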
The Migration Path
If you start simple and later need a feature store, migration is straightforward:
Phase 1: Shared functions (current state)
def compute_features(user_id):
    # Compute on demand
    return features

Phase 2: Add caching
def compute_features(user_id):
    cached = cache.get(f"features:{user_id}")
    if cached:
        return cached
    features = _compute(user_id)
    cache.set(f"features:{user_id}", features, ttl=300)
    return features

Phase 3: Separate computation from serving
# Background job precomputes features
def precompute_features():
    for user_id in active_users():
        features = compute_features(user_id)
        feature_store.write(user_id, features)

# Serving reads precomputed features
def get_features(user_id):
    return feature_store.read(user_id)

Phase 4: Add feature store
# Now using Feast/Tecton/etc
features = feature_store.get_online_features(
    entity_rows=[{"user_id": user_id}],
    features=["user_lifetime_value", "transaction_count"]
)

Each phase works independently. You only move to the next phase when the current phase’s limitations become painful.
The Unsexy Truth
Feature stores solve real problems. But those problems appear at scale, not at the start.
Most teams building their first few models don’t have:
- Dozens of models competing for feature computation resources
- Terabytes of feature data requiring specialized storage
- Complex point-in-time correctness requirements across teams
What they have:
- One or two models
- Features that fit in a database
- Team small enough to coordinate in Slack
For these teams, a feature store is complexity without benefit. Shared functions and basic caching solve the same problems with less infrastructure.
Build the simple thing first. Add complexity when you have evidence you need it. You’ll know when that time comes - your team will be spending more time working around the limitations of simple approaches than they would spend adopting a feature store.
Until then, skip it.
Related Reading
For more on infrastructure decisions in production ML:
- Production AI Systems: What 30 Years of UNIX Taught Me - Principles for avoiding premature complexity
- Model Serving Architecture Patterns - When to choose simple vs complex serving architectures