Storage Tiering
Most of your data is accessed once and then never again. Storing it on fast, expensive storage forever is just burning money.
Hot, Warm, Cold
The canonical model is three tiers based on access frequency. Hot storage (SSD-backed, high IOPS) handles recent data that’s accessed constantly. Warm storage (standard HDD or S3 Standard-IA) holds data accessed occasionally. Cold storage (archival, like Glacier) holds data that might never be touched again but legally must be retained.
The trade-off is retrieval latency versus cost. Hot data returns in milliseconds. Cold data retrieval can take minutes to hours. You design your product around which tier each data type belongs to.
Lifecycle Policies
The tedious part is moving data between tiers. Manual migration doesn’t scale. Lifecycle policies automate it: after 30 days without access, move to warm. After 180 days, move to cold. After 7 years, delete. Define the policy once and forget about it.
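The rules above are just a threshold lookup on age since last access. Here’s a minimal sketch of that policy as a function; the thresholds mirror the example (30 days, 180 days, 7 years) and aren’t tied to any particular cloud provider’s API.

```python
# Toy lifecycle policy: map days since last access to a tier action.
# Thresholds are the illustrative ones from the text, not a real
# provider's defaults.

def tier_for(days_since_access: int) -> str:
    """Return the tier (or action) for data of a given age."""
    if days_since_access >= 7 * 365:
        return "delete"   # past retention window
    if days_since_access >= 180:
        return "cold"     # archival storage
    if days_since_access >= 30:
        return "warm"     # infrequent-access storage
    return "hot"          # fast storage for recent data
```

In a real system this decision runs inside the storage service (e.g. an S3 lifecycle rule), not in your application code; the point is that the whole policy compresses into a few comparisons you define once.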
At Salesforce
We had audit logs that compliance required us to retain for 7 years. All of it was sitting on the same storage tier regardless of age, which meant we were paying hot-storage prices for 6-year-old records nobody was ever going to query unless a regulator asked.
I helped set up tiered retention: logs under 90 days stayed on fast storage for operational queries, 90 days to 2 years moved to cheaper compressed storage, and older than 2 years got archived. The cost reduction was around 40% on that data set. Retrieval from the archive tier took a few minutes, which was fine because auditors don’t need instant results.
What I’m Learning
The same pattern shows up in databases too: an in-memory buffer pool for hot pages, SSD for warm data, spinning disk for cold. MySQL’s InnoDB buffer pool is just the hot tier of a storage hierarchy most people don’t think about explicitly.
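A toy way to see a cache as the hot tier: a fixed-capacity LRU in front of a slower backing store. The class name, capacity, and the dict standing in for disk are all invented for illustration; this is the general pattern, not InnoDB’s actual implementation.

```python
from collections import OrderedDict

# Toy hot tier: a small LRU cache over a slow backing store.
# The dict plays the role of disk; evictions demote pages back
# to the cold tier implicitly (we just drop them here).

class HotTier:
    def __init__(self, backing: dict, capacity: int = 2):
        self.backing = backing       # "disk": cold source of truth
        self.cache = OrderedDict()   # "buffer pool": hot, recently used
        self.capacity = capacity
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)        # hit: refresh recency
            return self.cache[key]
        self.misses += 1                       # miss: slow "disk" read
        value = self.backing[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return value
```

Access frequency decides what stays hot, exactly as in the storage tiers above; the only differences are the timescale (microseconds instead of days) and that promotion and eviction happen per read instead of via a scheduled policy.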
Have you had to argue for tiered storage in a cost review, or did it just happen organically?