Feature Stores

2026-04-22 sohilladhani

You train a model using yesterday’s data. You serve it using today’s data. The feature computation logic is slightly different between the two. The model degrades silently and you spend a week figuring out why.

The Training-Serving Skew Problem

ML models are trained on offline batches: historical data, features computed via Spark jobs, labels aggregated over time. At serving time, features are computed online: live data, lower latency budget, different code path.
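To make the skew concrete, here is a minimal sketch of one feature computed by two code paths. Everything in it is hypothetical: the function names, the "average purchase" feature, and the choice of missing-value handling are assumptions made for illustration, not taken from any real pipeline.

```python
from statistics import mean


def avg_purchase_offline(amounts):
    # Hypothetical batch path: a missing amount is counted as 0.0.
    return mean(a if a is not None else 0.0 for a in amounts)


def avg_purchase_online(amounts):
    # Hypothetical serving path: a missing amount is dropped instead.
    present = [a for a in amounts if a is not None]
    return mean(present) if present else 0.0


def skew(amounts):
    # Absolute gap between the two paths for the same raw input.
    return abs(avg_purchase_offline(amounts) - avg_purchase_online(amounts))


history = [10.0, None, 20.0]
offline_value = avg_purchase_offline(history)
online_value = avg_purchase_online(history)
print(offline_value, online_value, skew(history))
```

Both functions look reasonable in isolation, yet the model would be trained on one value and served another for the same user. Comparing the two paths on identical raw inputs, as `skew` does here, is one simple way to surface that kind of drift.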

[Read more]

Embedding Vectors and ANN Search

2026-04-21 sohilladhani

#distributed-systems #machine-learning #recommendations #system-design

“Find the 10 most similar items to this one” sounds simple. With millions of items represented as 256-dimensional vectors, exact search is too slow to be useful in production.

What Embeddings Are

An ML model maps an item (a product, a document, a user’s history) to a dense numeric vector. The geometry of that vector space encodes semantic similarity: similar items land close together. You train the model on interaction data and the embeddings learn to represent “things that users treat similarly.”
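For reference, the exact search that ANN methods approximate can be sketched in a few lines. This is a brute-force baseline under assumed inputs: the toy 3-dimensional vectors, the item names, and the use of cosine similarity as the distance are all illustrative choices, not details from the post.

```python
import heapq
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_top_k(query_id, embeddings, k):
    # Scores the query against every other item: O(N * d) work per query.
    query = embeddings[query_id]
    scored = (
        (cosine(query, vec), item_id)
        for item_id, vec in embeddings.items()
        if item_id != query_id
    )
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Toy embeddings; real ones would be learned and much higher-dimensional.
embeddings = {
    "couch": [0.9, 0.1, 0.0],
    "armchair": [0.8, 0.2, 0.1],
    "lamp": [0.3, 0.7, 0.1],
    "novel": [0.0, 0.1, 0.9],
}
neighbors = exact_top_k("couch", embeddings, k=2)
print(neighbors)
```

The loop touches every stored vector for every query, which is the cost that becomes a problem at millions of items and 256 dimensions.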

[Read more]

Collaborative Filtering

2026-04-20 sohilladhani

#distributed-systems #machine-learning #recommendations #system-design

You don’t know what a user wants. But you know what people like them have wanted. That’s the intuition behind collaborative filtering.

The Two Approaches

User-based CF finds users similar to you, then recommends what they liked. Item-based CF finds items similar to what you’ve already liked. Item-based is generally more stable because user behavior shifts rapidly (you might buy a couch once), while item similarity changes slowly (a couch is similar to other furniture regardless of who buys it).
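A rough sketch of the item-based approach, assuming binary "liked" signals and cosine similarity between items based on which users liked them. The user names, items, and scoring rule (sum of similarities to items already liked) are hypothetical simplifications chosen for illustration.

```python
import math
from collections import defaultdict


def item_similarities(likes):
    # likes: dict of user -> set of liked items.
    # Returns (item_a, item_b) -> cosine similarity over binary user vectors.
    users_by_item = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            users_by_item[item].add(user)
    sims = {}
    for a, users_a in users_by_item.items():
        for b, users_b in users_by_item.items():
            if a != b:
                overlap = len(users_a & users_b)
                sims[(a, b)] = overlap / math.sqrt(len(users_a) * len(users_b))
    return sims


def recommend(user, likes, k):
    # Score each unseen item by its similarity to items the user already liked.
    seen = likes[user]
    scores = defaultdict(float)
    for (a, b), sim in item_similarities(likes).items():
        if a in seen and b not in seen and sim > 0:
            scores[b] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


likes = {
    "ana": {"couch", "armchair"},
    "ben": {"couch", "armchair", "lamp"},
    "cy": {"novel", "lamp"},
    "dee": {"couch"},
}
recs = recommend("dee", likes, k=2)
print(recs)
```

Because the similarities depend only on item co-occurrence, they could be computed once offline and reused across users, which fits the point that item similarity changes slowly.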

[Read more]

Posts for: #Machine-Learning
