Kafka Design Overview

Apache Kafka® is an open-source distributed streaming system used for stream processing, real-time data pipelines, and data integration at scale.

Kafka is designed to act as a unified platform for handling all the real-time data feeds a large company might have. To accomplish this goal, a broad set of use cases was considered and the following requirements were established. Kafka:

Must have high throughput to support high-volume event streams such as real-time log aggregation.
Must deal gracefully with large data backlogs to support periodic data loads from offline systems.
Must provide low-latency delivery for more traditional messaging use cases.

A further goal for Kafka is to support partitioned, distributed, real-time processing of these feeds to create new, derived feeds. This motivated Kafka’s partitioning and consumer model.

Finally, in cases where the stream is fed into other data systems for serving, it was important that the system guarantee fault tolerance in case of machine failures.

Supporting these uses led to a design with a number of unique elements, making Kafka more like a database log than a traditional messaging system. These elements are outlined in this section.
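The log-like model described above can be illustrated with a minimal in-memory sketch. This is not Kafka's implementation or client API: the `PartitionedLog` and `Consumer` classes are hypothetical names introduced here, and the CRC32-based partitioner is an assumed stand-in for whatever partitioning strategy a real producer uses. The sketch only shows the general idea that records are appended to partitions, that records with the same key land in the same partition, and that each consumer tracks its own offset while the data stays in the log.

```python
from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class PartitionedLog:
    """Toy stand-in for a topic: one append-only list per partition."""
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, key: str, value: str) -> tuple:
        # Assumed partitioner: hash the key so that records with equal keys
        # share a partition and keep their relative order.
        partition = crc32(key.encode("utf-8")) % self.num_partitions
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition: int, offset: int, max_records: int = 10) -> list:
        # Reading never removes records; the caller chooses where to start.
        return self.partitions[partition][offset:offset + max_records]


@dataclass
class Consumer:
    """Pulls from the log and tracks its own position per partition."""
    log: PartitionedLog
    positions: dict = field(default_factory=dict)

    def poll(self, partition: int, max_records: int = 10) -> list:
        start = self.positions.get(partition, 0)
        records = self.log.read(partition, start, max_records)
        # Advance only this consumer's position; the log itself is unchanged.
        self.positions[partition] = start + len(records)
        return records
```

For example, after appending two records with the key `"user-1"`, two independent `Consumer` instances polling that partition each receive both records in append order, because reading advances only the consumer's own offset and does not remove anything from the log.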

Ready to get started?

Sign up for Confluent Cloud, the fully managed, cloud-native service for Apache Kafka®, and get started for free using the Cloud quick start.
Download Confluent Platform, the self-managed, enterprise-grade distribution of Apache Kafka, and get started using the Confluent Platform quick start.

Topics in this section

The topics in this section are an edited version of the design documentation on the Kafka site, and outline some elements of Kafka design.

Kafka and the File System - Describes how Kafka uses the file system to maintain performance at scale.
Designed for Efficiency - Describes how Kafka avoids byte-copying and uses batching and compression to optimize efficiency.
Producer Design - Provides an in-depth view of how producers balance load and batch the messages they send to brokers.
Consumer Design - Explains why consumers pull from the broker, and how consumer position is tracked with offsets.
Kafka Message Delivery Guarantees - Describes the semantic guarantees Kafka provides between the broker and producers and consumers, and how Kafka supports exactly-once delivery semantics.
Kafka Replication and Committed Messages - Describes how replication and leader election enable the message guarantees provided by Kafka.
Kafka Log Compaction - Describes how compaction enables Kafka to maintain state, and how compaction is configured.
Kafka Quotas - Describes how and why to use client quotas in Kafka.

Learn more

To learn how Kafka transactions provide you with accurate, repeatable results from chains of many stream processors or microservices, connected via event streams, see Building Systems Using Transactions in Apache Kafka.
To learn how Kafka architecture has been simplified by the introduction of Apache Kafka Raft Metadata mode (KRaft), see KRaft: Apache Kafka without ZooKeeper.
To learn how serverless infrastructure is built and how to apply these learnings to your own projects, see Cloud-Native Apache Kafka: Designing Cloud Systems for Speed and Scale.

Note

This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.