Introduction to Kafka Fundamentals

Aug 5, 2024

Kafka Series - Part 1: Introduction to Kafka

Overview

  • This is the first part of a Kafka series that progresses from beginner to advanced level.
  • Goals of the tutorial:
    • Understand what Kafka is.
    • Discover Kafka's origins.
    • Learn why we need Kafka.
    • Gain a high-level overview of how Kafka works.

What is Kafka?

  • Definition: Apache Kafka is an open-source distributed event streaming platform.
  • Key Terms:
    • Event Streaming:
      • Involves two main tasks:
        1. Creating Real-Time Streams:
          • Example: In a payments app such as Paytm, each action a user performs (e.g., a transaction) generates an event that is sent to the Kafka server, forming a continuous stream.
        2. Processing Real-Time Streams:
          • Applications read data from Kafka and process it (e.g., monitoring transaction limits).
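The two tasks above can be sketched in plain Python, with no Kafka involved. A generator stands in for the stream of payment events that a Paytm-style app would send, and a processing loop handles each event as it arrives, flagging any that exceed a limit. The event fields, the `TRANSACTION_LIMIT` value, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List

# Hypothetical per-transaction limit used by the monitoring step.
TRANSACTION_LIMIT = 10_000


@dataclass
class PaymentEvent:
    user_id: str
    amount: int  # illustrative amount, e.g. in rupees


def create_stream(actions: List[tuple]) -> Iterator[PaymentEvent]:
    """Task 1: turn user actions into a stream of events, one per action."""
    for user_id, amount in actions:
        yield PaymentEvent(user_id, amount)


def process_stream(events: Iterator[PaymentEvent]) -> List[PaymentEvent]:
    """Task 2: read events one at a time and flag those over the limit."""
    flagged = []
    for event in events:
        if event.amount > TRANSACTION_LIMIT:
            flagged.append(event)
    return flagged


actions = [("alice", 500), ("bob", 25_000), ("carol", 9_999), ("alice", 12_000)]
flagged = process_stream(create_stream(actions))
for event in flagged:
    print(f"Limit exceeded: {event.user_id} paid {event.amount}")
```

In a real deployment, a producer would write these events to Kafka and a separate consumer application would read them back; here the two steps are chained directly to keep the sketch self-contained.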

Distributed Nature of Kafka

  • Distributed System:
    • Runs on multiple machines (nodes), which can be spread across different regions, to share the load and avoid downtime.
    • Example: Three Kafka servers (brokers) in different regions keep the service running even if one of them goes down.
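A toy model shows why running several servers helps: each message is copied to every live server, so reads still succeed after one server goes down. This is a deliberately simplified sketch, not Kafka's actual replication protocol (which replicates topic partitions across brokers and elects a new leader when one fails). The `Server` and `Cluster` classes and the region names are hypothetical.

```python
from typing import List, Optional


class Server:
    """A hypothetical message server running in one region."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.up = True
        self.messages: List[str] = []


class Cluster:
    """Copies every message to each live server and reads from any live one."""

    def __init__(self, regions: List[str]) -> None:
        self.servers = [Server(r) for r in regions]

    def write(self, message: str) -> None:
        for server in self.servers:
            if server.up:
                server.messages.append(message)

    def read_all(self) -> Optional[List[str]]:
        # Serve from the first server that is still up.
        for server in self.servers:
            if server.up:
                return list(server.messages)
        return None  # every server is down


cluster = Cluster(["mumbai", "singapore", "frankfurt"])
cluster.write("payment-1")
cluster.write("payment-2")
cluster.servers[0].up = False  # the Mumbai server fails
print(cluster.read_all())      # ['payment-1', 'payment-2'], served by Singapore
```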

Origins of Kafka

  • Developed at LinkedIn and open-sourced in early 2011.
  • Now a top-level project of the Apache Software Foundation.

Why Do We Need Kafka?

  • Need for Message Management:
    • Example Scenario:
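      • A postman cannot deliver a parcel if the recipient is not home, so the delivery fails and the parcel never reaches its destination.
      • Solution: Install a letterbox that holds deliveries until the recipient is available to collect them.
    • Similarly, Kafka acts as a middleman between applications: it stores messages so that a receiving application that is temporarily unavailable can read them later instead of losing them.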
  • Complex Scenarios:
    • With multiple applications needing to communicate, managing direct connections can be complex and lead to issues:
      • Data Format: Different applications might use various data formats.
      • Connection Types: Each connection may use a different protocol (HTTP, TCP, etc.).
      • Number of Connections: Example: if each of 4 applications connects directly to each of 5 services, that is 4 × 5 = 20 connections, which is difficult to manage.
    • Kafka's Role:
      • Centralizes communication: applications send messages to Kafka instead of to each other, which removes the need for many direct connections.
      • Example: Instead of 20 connections, only 9 are needed when using Kafka (4 + 5, one per application or service).
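The connection counts above follow from simple arithmetic. With direct links, every application connects to every service (apps × services links). With Kafka in the middle, each application and each service needs only its own link to Kafka (apps + services links). A short sketch, with hypothetical function names:

```python
def direct_connections(apps: int, services: int) -> int:
    """Point-to-point: every application talks to every service."""
    return apps * services


def kafka_connections(apps: int, services: int) -> int:
    """Hub model: each application and each service connects only to Kafka."""
    return apps + services


print(direct_connections(4, 5))  # 20
print(kafka_connections(4, 5))   # 9
```

The gap widens as systems grow: 20 applications and 30 services would need 600 direct connections, but only 50 through Kafka.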

How Does Kafka Work?

  • Pub/Sub Model:
    • Key components:
      • Publisher (called a producer in Kafka): Sends messages to Kafka.
      • Message Broker (the Kafka server itself): Stores messages.
      • Subscriber (called a consumer in Kafka): Listens for and retrieves messages from Kafka.
    • This tutorial provides just a high-level overview; further sessions will delve into Kafka's architecture and components.
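The three roles can be mimicked with a small in-memory broker. This is an illustrative sketch only: the `Broker` class and its `publish`/`poll` methods are hypothetical and are not Kafka's client API. It does mirror one real Kafka idea, though. The broker keeps messages in an append-only log, and each subscriber tracks its own position (an offset) in that log, so a subscriber that was offline can catch up later, much like collecting letters from the letterbox.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Broker:
    """Hypothetical in-memory broker: one append-only log per topic."""

    def __init__(self) -> None:
        self.logs: Dict[str, List[str]] = defaultdict(list)
        # Next offset each (subscriber, topic) pair will read from.
        self.offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> None:
        """Publisher side: append the message to the topic's log."""
        self.logs[topic].append(message)

    def poll(self, subscriber: str, topic: str) -> List[str]:
        """Subscriber side: return messages this subscriber has not yet read."""
        key = (subscriber, topic)
        start = self.offsets[key]
        new_messages = self.logs[topic][start:]
        self.offsets[key] = len(self.logs[topic])
        return new_messages


broker = Broker()
broker.publish("payments", "alice paid 500")
broker.publish("payments", "bob paid 25000")

# The fraud checker was not polling while these were published; nothing was lost.
print(broker.poll("fraud-checker", "payments"))  # ['alice paid 500', 'bob paid 25000']

broker.publish("payments", "carol paid 9999")
print(broker.poll("fraud-checker", "payments"))  # ['carol paid 9999']
print(broker.poll("analytics", "payments"))      # all three messages
```

Because the broker keeps the log rather than deleting a message once it is read, a second subscriber such as `analytics` can independently read the same messages from the beginning.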

Conclusion

  • This video introduced what Kafka is, why it is needed, and, at a high level, how it handles real-time event streams.
  • Subsequent tutorials will explore Kafka architecture and components in more detail.