I used to think that as soon as data hit the cache, the performance battle was over. In my head, RAM was the ultimate speed limit. But while building a metadata engine here in Hyderabad, I hit a wall that proved me wrong.

We designed the system to support two access patterns. First, fetching a specific record by its ID. Second, filtering across thousands of records by a specific attribute. Initially, we stored everything as standard Objects. Whether in the Java Heap or Redis, a record was a single serialized blob.

In database terms, this is Row-Based storage. It reminded me of the trade-offs I learned about LSM Trees vs B-Trees: the shape of your data dictates your performance.

The Backpack Problem#

The Object-based approach worked perfectly for ID lookups. You give a key. You get the object. Fast.

But when consumers tried to filter by attributes, performance tanked. Finding all records with Status="Active" was painful.

To check that one field, the system had to fetch the entire object. It is exactly like asking 5,000 students if they have a red pencil, but forcing them to hand you their entire backpack. You have to unzip every single one. You dig past their lunch and gym clothes. Finally, you find that one pencil. We were wasting massive CPU cycles deserializing “junk” data we didn’t need.
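A minimal sketch of the blob approach makes the cost concrete. The `Metadata` record and field names here are illustrative, not our actual schema; the point is that the filter path must deserialize every full object just to read one field:

```java
import java.io.*;
import java.util.*;

public class BlobStore {
    // Hypothetical record: every attribute travels together, like the backpack.
    record Metadata(String id, String status, String owner, String payload)
            implements Serializable {}

    private final Map<String, byte[]> store = new HashMap<>();

    void put(Metadata m) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(m);
        }
        store.put(m.id(), bos.toByteArray()); // one serialized blob per record
    }

    // ID lookup: one key, one blob. Fast.
    Metadata getById(String id) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(store.get(id)))) {
            return (Metadata) ois.readObject();
        }
    }

    // Attribute filter: every blob is fully deserialized to check one field.
    List<String> idsWithStatus(String status) throws IOException, ClassNotFoundException {
        List<String> hits = new ArrayList<>();
        for (byte[] blob : store.values()) {
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(blob))) {
                Metadata m = (Metadata) ois.readObject(); // the "unzip the backpack" step
                if (m.status().equals(status)) hits.add(m.id());
            }
        }
        return hits;
    }
}
```

The `getById` path touches exactly one blob; `idsWithStatus` pays the full deserialization tax on all of them.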

The Choice: Objects vs. Vectors#

We realized we couldn’t force one storage shape to serve two opposing access patterns. Just like CQRS separates reads from writes, we decided to separate our memory layouts.

Consumers could now configure their storage mode based on their primary access pattern:

  1. Object Mode for full retrieval (The Backpack).
  2. Vector Mode for heavy filtering (The Pencil Tray).

For the Vector Mode, we implemented Apache Arrow. This brings Columnar storage concepts into memory. Instead of storing [Object A: {Tag1, Tag2}], we stored the attributes in contiguous vectors: [Tag1 Vector: Value A, Value B...].

```mermaid
graph TD
  A[Consumer Request] --> B{Storage Mode Choice}
  B --> C[Object Mode]
  B --> D[Vector Mode]
  C --> E[Fetch Full Object] --> F[Deserialize Blob] --> G[Check Attribute]
  D --> H[Fetch Attribute Vector] --> I[Direct Memory Scan]
  G --> J[High CPU Overhead]
  I --> K[13x Faster Execution]
  style A fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style B fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style C fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style D fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style E fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style F fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style G fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style H fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style I fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style J fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
  style K fill:#000000,stroke:#00ff00,stroke-width:2px,color:#fff
```

This alignment meant the filtering queries could scan a contiguous block of memory. No jumping pointers. No touching the “rest” of the record.
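The columnar idea can be sketched with plain parallel lists. This is only a model of the concept: Apache Arrow goes further, packing each column's values into contiguous off-heap buffers, but the access pattern is the same, and the names here are illustrative:

```java
import java.util.*;

public class VectorStore {
    // One column per attribute, stored side by side rather than per record.
    private final List<String> ids = new ArrayList<>();
    private final List<String> statuses = new ArrayList<>();

    void append(String id, String status) {
        ids.add(id);
        statuses.add(status);
    }

    // Filtering walks only the status column; other attributes are never touched.
    List<String> idsWithStatus(String status) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < statuses.size(); i++) {
            if (statuses.get(i).equals(status)) hits.add(ids.get(i));
        }
        return hits;
    }
}
```

No deserialization, no unrelated fields in the scan loop: the query reads only the bytes it actually needs.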

Physics Wins#

The impact was immediate. Workloads that switched to Vector Mode saw a 13x improvement in query execution speed.

It was a humbling lesson. “In-memory” isn’t a silver bullet. If your data layout doesn’t match your access pattern, you are just burning CPU cycles faster. At the Staff level, you stop looking at where the data lives. You start looking at how it is laid out.

Have you ever had to redesign a data structure because the cache was surprisingly slow?