Apache Kafka is a scalable, fault-tolerant, distributed
publish-subscribe messaging system that receives data from
various sources and makes it available in real time for analysis.
Its high availability and resilience to node failures make it a
very good choice for collecting data from many different sources
in the real world.
Kafka follows the publish-subscribe messaging model, in which
message Producers are the publishers and message Consumers are
the subscribers. Messages are stored and organized by topic.
· KAFKA ARCHITECTURE:
· CORE APIs
1. Kafka Producer API
2. Kafka Consumer API
3. Kafka Streams API: lets an application act as a stream
processor that consumes an input stream from one or more topics,
transforms it, and produces an output stream to other topics
(see the sketch after this list).
4. Kafka Connector API: used to build reusable producers and
consumers (connectors) that link Kafka topics to external data
systems.
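As a rough illustration of the Streams API, here is a minimal Java sketch that consumes an input stream from one topic, upper-cases each value, and produces the result to another topic. The topic names, application id, and broker address are placeholders chosen for this example, not values from the original text.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical application id and broker address.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume an input stream from "input-topic"...
        KStream<String, String> input = builder.stream("input-topic");
        // ...transform each record, and produce the result to "output-topic".
        input.mapValues(value -> value.toUpperCase())
             .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // Close the topology cleanly on shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

The topology does exactly the consume-transform-produce step described in item 3; the Streams library takes care of partition assignment and fault tolerance.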
KAFKA COMPONENTS:
· TOPIC:
A topic is a named category to which related messages are
published. Topics can be partitioned and replicated, and it is
these two properties that make Kafka scalable and fault-tolerant.
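To make partitioning and replication concrete, the following sketch uses the Kafka AdminClient to create a topic with three partitions, each replicated on two brokers. The topic name, counts, and broker address are assumptions made for illustration only.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // A topic with 3 partitions, each replicated on 2 brokers.
            NewTopic topic = new NewTopic("example-topic", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}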
· PRODUCER:
It publishes messages to Kafka topics.
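A minimal producer sketch using the Java client, assuming a local broker and an example topic name (both placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one message to the "example-topic" topic.
            producer.send(new ProducerRecord<>("example-topic", "key-1", "hello kafka"));
            producer.flush();
        }
    }
}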
· CONSUMER:
Consumers read the published messages by pulling data from the
brokers.
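And the matching consumer side, again with placeholder broker, group, and topic names: it subscribes to a topic and polls the brokers for new records.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");            // assumed group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("example-topic"));
            while (true) {
                // Pull the next batch of messages from the brokers.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}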
· BROKER:
Brokers are the servers that store the messages. A single broker
may host zero or more partitions of a given topic.
LOG ANATOMY
Each partition is backed by a log. A data source appends
messages to a log, and consumers can read at any time from the
logs they select.
DATA LOG:
Messages are retained for a configurable amount of time, and
consumers can read them at their own pace. If the configuration
says to keep messages for 30 hours and a consumer is down for
longer than that, it will miss the messages that have since been
deleted. But if a consumer is down for only 60 minutes or so, it
can resume reading from its last known offset.
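Retention is a per-topic setting (retention.ms). As an illustration of the 30-hour example above, this sketch uses the AdminClient to set retention.ms to 108,000,000 ms (30 hours) on a hypothetical existing topic; the topic name and broker address are assumptions.

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Keep messages on "example-topic" for 30 hours:
            // 30 h * 3600 s * 1000 ms = 108,000,000 ms.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "example-topic");
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "108000000"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                    Collections.singletonMap(topic, Collections.singletonList(setRetention));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}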
PARTITION:
Every broker holds partitions, and each partition is either the
leader or a replica for its topic. The leader handles all reads
and writes for that partition; if the leader fails, one of the
replicas takes its place as the new leader.
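To see leadership in practice, a small AdminClient sketch can describe a topic and print which broker leads each partition and where its replicas live (topic name and broker address are again placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class LeaderSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("example-topic"))
                    .all().get()
                    .get("example-topic");
            for (TopicPartitionInfo partition : description.partitions()) {
                // The leader broker handles reads and writes; replicas stand by for failover.
                System.out.printf("partition=%d leader=%s replicas=%s%n",
                        partition.partition(), partition.leader(), partition.replicas());
            }
        }
    }
}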
KAFKA USE CASES
· MESSAGING
Kafka can replace traditional messaging systems because of its
high throughput, scalability, fault tolerance, and built-in
partitioning.
· METRICS
Kafka can be used for operational monitoring data, as it
collects statistics from different sources and aggregates them
into a centralized feed.
FEATURES:
· SCALABILITY
Kafka can be scaled at any time along all four dimensions:
producers, consumers, processors, and connectors.
· HIGH-VOLUME
Kafka can handle huge volumes of data streams with ease.
· DATA TRANSFORMATIONS
Using the Streams API, Kafka can derive new data streams from
the streams that producers publish.
· FAULT TOLERANCE
Kafka can handle the failure of individual nodes, including
partition leaders, within a cluster.
· RELIABILITY
Because of features like fault tolerance, replication, and
partitioning, Kafka is a very reliable messaging system.
· DURABILITY
Since Kafka uses a distributed commit log, it is highly durable.
· PERFORMANCE
Kafka maintains high throughput even when terabytes of messages
are stored, so its performance does not degrade.
· ZERO DOWNTIME
Kafka guarantees zero downtime and zero data loss.
· REPLICATION
Messages are replicated internally across brokers, which makes
Kafka fault-tolerant and prevents data loss.
