Kafka Internals Explained Simply
Kafka looks simple from the API, but understanding its internal write and read paths is what lets you tune throughput, durability, and latency. This deep dive breaks down the storage layout, replication flow, and metadata handling so you can reason about tradeoffs and configure clients intentionally.
Prerequisites
- Running Kafka cluster (KRaft or ZooKeeper-based)
- Java 17+ with Spring Boot 3.x
- Basic knowledge of partitions and consumer groups
Log structure and segment files
Kafka persists data in append-only logs. Each partition is stored as a directory with multiple segment files. A segment has two index files that map offsets to positions in the log file.
- Log segment: the actual record bytes in order
- Offset index: sparse index of offsets to byte positions
- Time index: timestamp to byte positions for time-based lookups
Segments roll based on size or time. Compaction and retention are applied at the segment level, which is why log.segment.bytes and log.roll.ms matter when you design retention policies.
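The sparse offset index can be pictured as a sorted map from offset to byte position: a lookup finds the nearest indexed entry at or before the target offset, then the broker scans the log forward from that position. A toy sketch of that lookup (not Kafka's actual implementation, which uses a memory-mapped binary index file):

```java
import java.util.TreeMap;

// Toy sparse offset index: maps a record offset to the byte position
// of that record in the segment file. Kafka indexes only every few KB
// of appended data (log.index.interval.bytes), so a lookup lands on
// the nearest earlier entry and scans forward from there.
class SparseOffsetIndex {
    private final TreeMap<Long, Long> index = new TreeMap<>();

    void add(long offset, long bytePosition) {
        index.put(offset, bytePosition);
    }

    // Byte position to start scanning from for `offset`,
    // or 0 if the offset precedes every indexed entry.
    long lookup(long offset) {
        var entry = index.floorEntry(offset);
        return entry == null ? 0L : entry.getValue();
    }
}
```

The same floor-lookup-then-scan idea applies to the time index, with timestamps as keys instead of offsets.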
The write path in detail
- Producer sends a batch to the leader partition.
- Leader appends to its local log and updates the in-memory index.
- Replicas in the in-sync replica (ISR) set fetch the new data.
- Leader responds based on acks and min.insync.replicas.
This means acks=all combined with min.insync.replicas=2 gives strong durability but can reject writes if the ISR shrinks. It also means the producer batch size and linger time directly affect log append efficiency.
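As a sketch, the durability-oriented producer settings discussed above look like this as plain client properties (the broker address is a placeholder; keys are standard Kafka producer config names):

```java
import java.util.Properties;

// Producer settings that trade a little latency for durability and
// batching efficiency on the leader's append path.
class DurableProducerProps {
    static Properties build() {
        var props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("acks", "all");                // wait for the full ISR
        props.setProperty("enable.idempotence", "true"); // no duplicate appends on retry
        props.setProperty("linger.ms", "5");             // small delay to grow batches
        props.setProperty("batch.size", "32768");        // larger, cheaper log appends
        return props;
    }
}
```

Note that min.insync.replicas is a broker/topic setting, not a producer one; the producer only chooses how many acknowledgments to wait for via acks.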
Replication, ISR, and high watermark
Kafka maintains a high watermark (HW) per partition. Consumers can only read up to the HW to guarantee they see replicated data. The leader advances the HW when all replicas in the ISR have persisted the records.
- If a follower falls behind, it can be removed from the ISR.
- When a follower catches up, it re-enters the ISR, allowing the HW to advance.
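The HW rule above can be modeled as taking the minimum log-end offset (LEO) across the ISR: everything below that minimum has been persisted by every in-sync replica and is therefore safe to expose. A deliberately simplified toy model (real brokers track richer per-replica fetch state):

```java
import java.util.Collection;

// Toy model of high-watermark advancement: the HW is the minimum
// log-end offset across the current ISR, so a consumer never reads
// a record that some in-sync replica has not yet persisted.
class HighWatermark {
    static long compute(Collection<Long> isrLogEndOffsets) {
        return isrLogEndOffsets.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(0L);
    }
}
```

This also shows why a lagging follower holds the HW back until it is either caught up or dropped from the ISR.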
The read path and consumer groups
Consumers fetch records in batches. The group coordinator assigns partitions and stores offsets. Rebalances are triggered when membership changes or partitions are added.
Important internal behaviors:
- Fetch requests are optimized around fetch.min.bytes and fetch.max.wait.ms to avoid small IO operations.
- Offsets are stored in the internal __consumer_offsets topic, so their durability depends on your replication settings.
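The interplay between fetch.min.bytes and fetch.max.wait.ms can be sketched as a simple broker-side decision: respond as soon as enough bytes have accumulated, or once the wait budget is exhausted, whichever comes first. A toy model (hypothetical helper, not actual broker code):

```java
// Toy model of the broker-side fetch response decision: a fetch
// response is sent once fetch.min.bytes of data is available, or
// once fetch.max.wait.ms has elapsed, whichever happens first.
class FetchDecision {
    static boolean shouldRespond(long bytesAvailable, long waitedMs,
                                 long fetchMinBytes, long fetchMaxWaitMs) {
        return bytesAvailable >= fetchMinBytes || waitedMs >= fetchMaxWaitMs;
    }
}
```

Raising fetch.min.bytes reduces request rate at the cost of up to fetch.max.wait.ms of added latency on quiet partitions.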
Controller, metadata, and KRaft
Kafka uses a controller to coordinate partition leadership. In KRaft mode, the metadata log replaces ZooKeeper. The controller writes metadata changes (topic creation, ACLs, leader changes) to the metadata log so that all brokers can replay the state.
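The metadata log behaves like an event-sourced state machine: every broker applies records in log order and arrives at the same cluster view. A toy replay sketch (the record shape here is invented for illustration; KRaft's real records are Kafka-internal protocol types):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy replay of a metadata log: brokers rebuild state by applying
// ordered records; later records for the same key win.
class MetadataReplay {
    record MetadataRecord(String topic, int leaderBroker) {}

    static Map<String, Integer> replay(List<MetadataRecord> log) {
        var leaders = new HashMap<String, Integer>();
        for (var r : log) {
            leaders.put(r.topic(), r.leaderBroker()); // later records win
        }
        return leaders;
    }
}
```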
Spring Boot configuration example
Use explicit producer and consumer settings so you can align durability with performance requirements.
```yaml
spring:
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      acks: all
      retries: 5
      properties:
        enable.idempotence: true
        linger.ms: 5
        batch.size: 32768
    consumer:
      group-id: inventory-service
      auto-offset-reset: earliest
      enable-auto-commit: false
      properties:
        fetch.min.bytes: 1048576
        fetch.max.wait.ms: 500
```
```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Configuration
class KafkaConfig {

    @Bean
    ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        var factory = new ConcurrentKafkaListenerContainerFactory<String, String>();
        factory.setConsumerFactory(consumerFactory);
        // Commit offsets only when the listener acknowledges explicitly,
        // matching enable-auto-commit: false above.
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        return factory;
    }
}
```
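With manual ack mode in place, a listener acknowledges explicitly after processing, so a failure leaves the offset uncommitted and the record is redelivered. A sketch (the topic name and handler are illustrative):

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
class InventoryListener {

    // The offset is committed only after acknowledge(); an exception
    // thrown before it leaves the offset uncommitted for redelivery.
    @KafkaListener(topics = "inventory-events", groupId = "inventory-service")
    void onMessage(String value, Acknowledgment ack) {
        // process(value); // illustrative business logic
        ack.acknowledge();
    }
}
```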
Operational signals to watch
- UnderReplicatedPartitions should be zero in steady state.
- RequestHandlerAvgIdlePercent dropping indicates brokers are saturated.
- Replication lag and fetcher throttling warn about disk or network bottlenecks.
Things to remember
- Segment size influences retention granularity and compaction efficiency.
- The high watermark is the boundary for consumer visibility and durability.
- ISR size is the foundation of strong delivery guarantees.
- Tune producer batching and consumer fetch settings together for predictable latency.