Why Tiered Storage for Apache Kafka is a BIG THING…

Kai Waehner
11 min read · Feb 24, 2024

Apache Kafka added Tiered Storage to separate compute and storage. The capability enables more scalable, reliable, and cost-efficient enterprise architectures. This blog post explores the architecture, use cases, and benefits of Tiered Storage, plus a case study of storing petabytes of data in the Kafka commit log. The final section discusses why Tiered Storage does NOT replace other databases and how Apache Iceberg might change future Kafka architectures even more.
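To make "separating compute and storage" more concrete, here is a toy Python model of a single partition log with two tiers: a small, fast local tier on the broker and a large remote tier that stands in for object storage. It is a minimal sketch under stated assumptions, not Kafka's implementation (Kafka Tiered Storage, specified in KIP-405, is far more involved). The names `TieredLog`, `Segment`, `segment_size`, and `local_retention_s` are hypothetical, and a plain dict substitutes for a real object store. The core idea it shows: closed segments whose newest record is older than the local retention window are offloaded to the remote tier, while reads are still served transparently from whichever tier holds the requested offset.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A chunk of consecutive records starting at base_offset (hypothetical model)."""
    base_offset: int
    records: list[str] = field(default_factory=list)
    max_ts: float = 0.0  # timestamp of the newest record in this segment


@dataclass
class TieredLog:
    """Toy two-tier log: local (hot) segments plus a remote (cold) store.

    Assumption: the dict `remote` stands in for object storage such as S3.
    """
    segment_size: int
    local_retention_s: float
    local: list[Segment] = field(default_factory=list)
    remote: dict[int, Segment] = field(default_factory=dict)
    next_offset: int = 0

    def append(self, record: str, now: float) -> int:
        # Roll a new active segment when there is none or the current one is full.
        if not self.local or len(self.local[-1].records) >= self.segment_size:
            self.local.append(Segment(base_offset=self.next_offset))
        active = self.local[-1]
        active.records.append(record)
        active.max_ts = now
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def tier(self, now: float) -> None:
        # Offload closed segments whose newest record is older than the local
        # retention window; the active (last) segment always stays local.
        active = self.local[-1] if self.local else None
        kept = []
        for seg in self.local:
            if seg is not active and now - seg.max_ts >= self.local_retention_s:
                self.remote[seg.base_offset] = seg
            else:
                kept.append(seg)
        self.local = kept

    def read(self, offset: int) -> tuple[str, str]:
        # Return (tier_name, record); the caller does not need to know the tier.
        for tier_name, segments in (("local", self.local), ("remote", self.remote.values())):
            for seg in segments:
                if seg.base_offset <= offset < seg.base_offset + len(seg.records):
                    return tier_name, seg.records[offset - seg.base_offset]
        raise KeyError(offset)


log = TieredLog(segment_size=2, local_retention_s=60)
for i in range(5):
    log.append(f"event-{i}", now=float(i))
log.tier(now=100.0)
print(log.read(0))  # ('remote', 'event-0')
print(log.read(4))  # ('local', 'event-4')
```

In this sketch the broker-side "compute" only needs to keep the active segment locally, while old data lives in the cheaper, independently scalable remote tier and remains readable by offset.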

(Originally posted on Kai Waehner’s blog: “Why Tiered Storage for Apache Kafka is a BIG THING…”. Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.)

Compute vs. Storage vs. Tiered Storage

Let’s define the terms compute, storage, and tiered storage so that we share a common understanding before exploring them in the context of the data streaming platform Apache Kafka.

Compute and Storage

Compute and storage are two fundamental components of any computing system, and they serve different purposes in information processing.

Compute refers to the processing power and capability of a computer system to perform tasks, execute instructions, and carry out computations. The compute component includes the CPU (Central Processing Unit)…


Kai Waehner

Technology Evangelist (www.kai-waehner.de): Big Data Analytics, Data Streaming, Apache Kafka, Middleware, Microservices. LinkedIn: linkedin.com/in/kaiwaehner