Apache Kafka is a scalable, fault-tolerant, distributed
publish-subscribe messaging system that receives data from
various sources and makes it available in real time for analysis.
Its high availability and resilience to node failures make it a
very good choice for collecting data from many different sources
in the real world.
Kafka follows the publish-subscribe messaging model, in which
message Producers are the publishers and message Consumers are
the subscribers. Messages are stored and organized by topic.
· KAFKA ARCHITECTURE:
· CORE APIs
1. Kafka Producer API
2. Kafka Consumer API
3. Kafka Streams API: lets an application act as a stream
processor that consumes an input stream from one or more topics,
transforms it, and produces an output stream to other topics
(see the sketch after this list).
4. Kafka Connector API: used to build reusable producers and
consumers (connectors) that link Kafka topics to external data
systems.
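As a rough illustration of the Streams API, here is a minimal Java sketch that consumes an input stream from one topic, upper-cases each value, and produces the result to another topic. The topic names, application id, and broker address are placeholders chosen for this example, not values from the original text.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical application id and broker address.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume an input stream from "input-topic"...
        KStream<String, String> input = builder.stream("input-topic");
        // ...transform each record, and produce the result to "output-topic".
        input.mapValues(value -> value.toUpperCase())
             .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // Close the topology cleanly on shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

The topology does exactly the consume-transform-produce step described in item 3; the Streams library takes care of partition assignment and fault tolerance.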
KAFKA COMPONENTS:
· TOPIC:
A topic is a named category to which related messages are
published. Topics can be partitioned and replicated, and it is
these two properties that make Kafka scalable and fault-tolerant.
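To make partitioning and replication concrete, the following sketch uses the Kafka AdminClient to create a topic with three partitions, each replicated on two brokers. The topic name, counts, and broker address are assumptions made for illustration only.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // A topic with 3 partitions, each replicated on 2 brokers.
            NewTopic topic = new NewTopic("example-topic", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}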
· PRODUCER:
It publishes messages to Kafka topics.
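A minimal producer sketch using the Java client, assuming a local broker and an example topic name (both placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one message to the "example-topic" topic.
            producer.send(new ProducerRecord<>("example-topic", "key-1", "hello kafka"));
            producer.flush();
        }
    }
}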
· CONSUMER:
Consumers read the published messages by pulling data from the
brokers.
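And the matching consumer side, again with placeholder broker, group, and topic names: it subscribes to a topic and polls the brokers for new records.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");            // assumed group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("example-topic"));
            while (true) {
                // Pull the next batch of messages from the brokers.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}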
· BROKER:
Brokers are the servers that store the messages. A single broker
may host zero or more partitions of a given topic.
LOG ANATOMY
Each partition is backed by a log. A data source appends
messages to a log, and consumers can read at any time from the
logs they select.
DATA LOG:
Messages are retained for a configurable amount of time, and
consumers can read them at their own pace. If the configuration
says to keep messages for 30 hours and a consumer is down for
longer than that, it will miss the messages that have since been
deleted. But if a consumer is down for only 60 minutes or so, it
can resume reading from its last known offset.
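Retention is a per-topic setting (retention.ms). As an illustration of the 30-hour example above, this sketch uses the AdminClient to set retention.ms to 108,000,000 ms (30 hours) on a hypothetical existing topic; the topic name and broker address are assumptions.

import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Keep messages on "example-topic" for 30 hours:
            // 30 h * 3600 s * 1000 ms = 108,000,000 ms.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "example-topic");
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", "108000000"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> updates =
                    Collections.singletonMap(topic, Collections.singletonList(setRetention));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}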
PARTITION:
Every broker holds partitions, and each partition is either the
leader or a replica for its topic. The leader handles all reads
and writes for that partition; if the leader fails, one of the
replicas takes its place as the new leader.
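To see leadership in practice, a small AdminClient sketch can describe a topic and print which broker leads each partition and where its replicas live (topic name and broker address are again placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class LeaderSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("example-topic"))
                    .all().get()
                    .get("example-topic");
            for (TopicPartitionInfo partition : description.partitions()) {
                // The leader broker handles reads and writes; replicas stand by for failover.
                System.out.printf("partition=%d leader=%s replicas=%s%n",
                        partition.partition(), partition.leader(), partition.replicas());
            }
        }
    }
}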
KAFKA USE CASES
· MESSAGING
Kafka can replace traditional messaging systems because of its
high throughput, scalability, fault tolerance, and built-in
partitioning.
· METRICS
Kafka can be used for operational monitoring data, as it
collects statistics from different sources and aggregates them
into a centralized feed.
FEATURES:
· SCALABILITY
Kafka can be scaled at any time along all four dimensions:
producers, consumers, processors, and connectors.
· HIGH-VOLUME
Kafka can handle huge volumes of data streams with ease.
· DATA TRANSFORMATIONS
Using the Streams API, Kafka can derive new data streams from
the streams that producers publish.
· FAULT TOLERANCE
Kafka can handle the failure of individual nodes, including
partition leaders, within a cluster.
· RELIABILITY
Because of features like fault tolerance, replication, and
partitioning, Kafka is a very reliable messaging system.
· DURABILITY
Since Kafka uses a distributed commit log, it is highly durable.
· PERFORMANCE
Kafka maintains high throughput even when terabytes of messages
are stored, so its performance does not degrade.
· ZERO DOWNTIME
Kafka guarantees zero downtime and zero data loss.
· REPLICATION
Messages are replicated internally across brokers, which makes
Kafka fault-tolerant and prevents data loss.
