Kafka FAQ

This topic answers common questions about Kafka, including what it is, how it works, and how you can get started.

What is Kafka?

Apache Kafka® is a distributed event streaming platform that handles high-throughput, real-time data feeds. You can use Kafka as the central nervous system for your data infrastructure to build scalable, fault-tolerant applications that process streams of events in real time. Kafka provides durable storage, high performance, and horizontal scalability, making it ideal for use cases ranging from real-time analytics to event-driven architectures.

What is Kafka used for?

You can use Kafka for a wide variety of use cases across industries:

  • Real-time data pipelines: Build data pipelines that move data between systems in real time

  • Event-driven architectures: Create applications that respond to events as they happen

  • Stream processing: Process and transform data streams in real time

  • Activity tracking: Track user activity, system metrics, and application logs

  • Messaging: Use Kafka as a high-throughput messaging system for microservices

  • Data integration: Integrate data from multiple sources and make it available to multiple consumers

  • Log aggregation: Collect and aggregate logs from multiple services

For more details, see the Use cases section in the introduction.

How does Kafka work?

Kafka runs as a distributed system of brokers that store streams of events in topics. The system works as follows:

  • Producers write events to topics

  • Topics organize events into categories and can be split into partitions for parallel processing

  • Brokers store and serve the data, with replication for fault tolerance

  • Consumers read events from topics, maintaining their own position (offset) in each partition

Kafka uses a log-based storage model where events are appended to topics and can be read multiple times. This design provides high throughput, durability, and the ability to replay historical data.
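To make this concrete, here is a minimal sketch using the Apache Kafka Java client that writes one event and reads it back. The broker address (localhost:9092), topic name (events), and consumer group id (quick-tour) are placeholders for illustration:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickTour {
    public static void main(String[] args) {
        // Producer: append one event to the "events" topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // placeholder address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("events", "user-42", "logged-in"));
        }

        // Consumer: read events back, tracking its own offset per partition.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "quick-tour");          // consumer group for offset tracking
        consumerProps.put("auto.offset.reset", "earliest");   // start from the beginning of the log
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("events"));
            // A single poll for brevity; a real consumer polls in a loop.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}
```

Because the events remain in the log after being read, any number of consumer groups can consume the same topic independently, each at its own offset.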

For more details, see the Terminology section in the introduction.

What are the benefits of using Kafka?

Kafka offers the following key benefits:

  • High throughput: Handles millions of events per second

  • Scalability: Scales horizontally by adding more brokers

  • Durability: Stores events durably and retains them for long periods

  • Fault tolerance: Replication ensures data availability even if brokers fail

  • Low latency: Designed for real-time processing with minimal delay

  • Decoupling: Enables systems to communicate without direct dependencies

  • Replayability: Enables consumers to reprocess historical data by resetting their offset

  • Multi-producer, multi-consumer: Supports multiple applications producing to and consuming from the same topics

How is Kafka different from traditional messaging systems?

Kafka differs from traditional messaging systems in the following ways:

  • Durability: Stores events durably and doesn’t delete them after consumption, unlike many messaging systems

  • Replayability: Enables consumers to read the same events multiple times by resetting their offset, as the sketch after this list shows

  • Throughput: Designed for high-throughput scenarios with millions of messages per second

  • Partitioning: Supports partitioning topics across multiple brokers for parallel processing

  • Log-based: Uses an append-only log structure rather than a queue-based model

  • Scalability: Scales horizontally by adding more brokers to the cluster
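For example, because events aren't deleted after consumption, a consumer can rewind and re-read a topic at any time. A minimal sketch with the Java client, where the broker address, topic name (events), and group id are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "replay-demo");             // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));
            consumer.poll(Duration.ofSeconds(1));            // first poll joins the group and gets partition assignments
            consumer.seekToBeginning(consumer.assignment()); // rewind every assigned partition to the start of the log
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            System.out.println("replayed " + records.count() + " events"); // re-reads previously consumed events
        }
    }
}
```

In a queue-based system, those events would typically be gone after the first consumer acknowledged them.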

What are Kafka topics and partitions?

A topic is a category or feed name to which events are published. Topics are Kafka’s fundamental unit of organization. A partition is a division of a topic that allows Kafka to parallelize processing and scale horizontally.

  • Topics can have one or more partitions

  • Each partition is an ordered, immutable sequence of events

  • Partitions enable parallel processing across multiple consumers

  • Events with the same key are written to the same partition, which preserves their order (see the sketch after this list)
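The following sketch illustrates key-based partitioning with the Java producer, whose default partitioner hashes the record key to pick a partition. The topic name (orders), the key (order-17), and the broker address are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class KeyedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String event : new String[] {"added-to-cart", "checked-out", "paid"}) {
                // Same key -> same partition, so these three events keep their relative order.
                RecordMetadata meta =
                        producer.send(new ProducerRecord<>("orders", "order-17", event)).get();
                System.out.printf("value=%s -> partition %d, offset %d%n",
                        event, meta.partition(), meta.offset());
            }
        }
    }
}
```

All three records land in the same partition with increasing offsets; records with a different key may land in a different partition.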

What is a Kafka broker?

A broker is a server in the Kafka cluster that stores event streams. A Kafka cluster consists of multiple brokers working together:

  • Each broker can handle hundreds of thousands of reads and writes per second

  • Brokers store topics and their partitions

  • Every broker in a cluster is also a bootstrap server, so you can connect to any broker to access the entire cluster (see the sketch after this list)

  • Brokers replicate data across the cluster for fault tolerance
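For instance, a client pointed at a single broker can discover and describe the whole cluster. A minimal sketch using the Java AdminClient, assuming a reachable broker at the placeholder address broker1:9092:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.Node;

public class ClusterInfo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // One reachable broker is enough; the client discovers the rest of the cluster from it.
        props.put("bootstrap.servers", "broker1:9092"); // placeholder address
        try (AdminClient admin = AdminClient.create(props)) {
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.printf("broker id=%d host=%s:%d%n", node.id(), node.host(), node.port());
            }
        }
    }
}
```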

What are Kafka producers and consumers?

Producers are client applications that write (publish) events to Kafka topics. Consumers are client applications that read (subscribe to) events from topics.

  • Producers control how events are assigned to partitions, such as by key or round-robin

  • Consumers maintain their own offset (position) in each partition they read from, as the sketch after this list shows

  • Consumers can read from multiple partitions and topics

  • Multiple producers and consumers can work with the same topics simultaneously
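The following sketch shows a consumer that assigns itself a single partition and then inspects its own position after a fetch. The topic name (events) and broker address are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "offset-demo");             // placeholder group id
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        TopicPartition tp = new TopicPartition("events", 0); // read partition 0 directly
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));          // manual assignment, no group rebalancing
            consumer.poll(Duration.ofSeconds(5));  // fetch a batch of records
            // The consumer, not the broker, tracks where it is in each partition.
            System.out.println("current position in events-0: " + consumer.position(tp));
        }
    }
}
```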

What is Kafka replication?

Replication is the process of maintaining multiple copies of data across different brokers in a Kafka cluster. Replication provides:

  • Fault tolerance: If a broker fails, data is still available from replicas

  • High availability: The cluster can continue operating even with broker failures

  • Data durability: Multiple copies ensure data isn’t lost

A common production setting uses a replication factor of 3, which means there are three copies of your data across different brokers.
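For example, you could create such a topic with the Java AdminClient. The topic name (orders), partition count, and broker address below are placeholders, and the cluster needs at least three brokers to support a replication factor of 3:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder address
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, each stored on 3 different brokers (replication factor 3).
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(orders)).all().get(); // blocks until the topic is created
        }
    }
}
```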

What are Kafka Connect and Kafka Streams?

Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, and file systems. Kafka Connect provides:

  • Pre-built connectors for common systems

  • A framework for building custom connectors

  • Source connectors that import data into Kafka and sink connectors that export data from Kafka

Kafka Streams is a client library for building real-time stream processing applications. Kafka Streams enables you to:

  • Process and transform data streams in real time, as the sketch after this list shows

  • Perform aggregations, joins, and windowing operations

  • Build stateful applications that maintain state across events
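As a minimal sketch of the Kafka Streams API, the following application continuously reads from one topic, uppercases each value, and writes the results to another topic. The application id, broker address, and topic names are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");     // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Describe the topology: read from "events", transform each value, write to "events-uppercased".
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("events");
        source.mapValues(value -> value.toUpperCase())
              .to("events-uppercased");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Aggregations, joins, and windowing follow the same pattern: you describe a topology with StreamsBuilder, and Kafka Streams runs it continuously against the input topics.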

How do I get started with Kafka?

You can get started with Kafka in several ways:

  • Confluent Cloud (fully managed service): Sign up for Confluent Cloud and get $400 in free credits to get started with Kafka. No infrastructure management is required. See the Confluent Cloud Quick Start for step-by-step instructions.

  • Confluent Platform (self-managed): Download Confluent Platform for local development or production deployment. Confluent Platform includes Kafka plus additional tools and features. See Installation Overview for installation instructions.

  • Open-source Apache Kafka: Download Apache Kafka® from kafka.apache.org. The open-source distribution is suitable for development and learning.

For tutorials and sample code, see the Clients section of the Confluent documentation.

Next steps

  • Learn the basics: See Get Started with Kafka for a hands-on tutorial.

  • Explore client libraries: Find documentation, tutorials, and sample code for creating Kafka producer and consumer clients in several languages in the Clients section.

  • Dive deeper: Watch a series of videos that introduce Kafka and the concepts in this topic in Kafka 101. For a deep-dive into the design decisions and features of Kafka, see Kafka Design Overview.