Hello everyone, and welcome to this introductory lecture on Apache Kafka. Apache Kafka is a middleware for asynchronous event-based communication, and in this lecture I will discuss what event-based communication is and why it is widely adopted to build large-scale distributed systems. Event-based communication enables an architectural style for building distributed systems that are called event-based systems. In this architecture, components only interact by exchanging messages or events.
We'll use the two terms almost interchangeably. Components can assume two roles. They can be publishers, or producers, that generate event notifications, messages. For example, a security camera may generate an event notification every time it observes a person entering a given room. Consumers, or subscribers, subscribe to the events that they are interested in.
They consume only the events they are interested in. For example, a security dashboard may subscribe to all security-related events. Of course, components may have both roles at the same time.
For example, the security dashboard may also generate security alerts in case it thinks that something suspicious is going on, based on the events it consumed. Producers and consumers in this architectural style need not know each other, because the communication is mediated by an external service, a middleware service, in our case Kafka.
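The decoupled roles described above can be sketched with a minimal in-memory broker. This is purely illustrative, not real Kafka code; the `Broker` class, topic name, and event fields are all assumptions made for the example.

```python
# Minimal in-memory sketch of publish/subscribe: producers and consumers
# only know the broker, never each other.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of this topic.
        for callback in self.subscribers[topic]:
            callback(event)

broker = Broker()
alerts = []
# The dashboard subscribes to security events...
broker.subscribe("security", alerts.append)
# ...and the camera publishes without knowing who consumes.
broker.publish("security", {"type": "person-entered", "room": 42})
print(alerts)  # [{'type': 'person-entered', 'room': 42}]
```

Note that the camera's `publish` call would be identical whether zero, one, or ten dashboards are subscribed.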
So what are the benefits of the event-based paradigm? Why is it so widely adopted in practice? Here are three reasons. First, space decoupling.
Producers and consumers, as I already said, need not know each other. That means you can dynamically add new components to your system without restarting or reconfiguring the whole system or any other module. For example, you can add new sensors to your network and they can start producing new events, putting new events into Kafka, into the middleware service, and nothing else needs to be changed.
Second, you have synchronization decoupling. Producers are not blocked while producing events, and consumers are notified asynchronously. That means that producers need not wait for consumers to consume the messages. They can just publish events whenever they are ready, without waiting for other components. This promotes scalability by removing an explicit dependency between the producers and the consumers.
They are not synchronized with each other. Third, there is time decoupling. If the event-based middleware can store events, can persist events, and this is the case for Kafka, producers and consumers need not be connected at the same time. So the consumer can be down while the producer continues to push new events into the middleware, and when the consumer comes back online, it contacts the middleware, reads the messages, and catches up with the remaining ones. Producers and consumers need not be alive at the same point in time. So why do we focus on Kafka?
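Time decoupling can be sketched with a toy persisted log where the consumer tracks its own read offset. This is an in-memory illustration of the idea, not Kafka's actual API; the function names are invented for the sketch.

```python
# Sketch of time decoupling: the middleware persists events in a log, and
# the consumer remembers its own offset, so it can catch up after downtime.
log = []  # the "persistent" topic log (in memory here, on disk in Kafka)

def produce(event):
    log.append(event)

def consume(offset):
    """Return all events from `offset` onward, plus the new offset."""
    return log[offset:], len(log)

offset = 0
produce("order-1")
produce("order-2")
# The consumer was offline while both events arrived; it now catches up.
events, offset = consume(offset)
print(events)   # ['order-1', 'order-2']
produce("order-3")
events, offset = consume(offset)
print(events)   # ['order-3']
```

The key point is that the producer never waited for the consumer: the log absorbed the events in between.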
Because it is basically the standard today for event-based communication. It's probably the most widely adopted platform for event-based communication today. If you read from their website, they claim that 80% of all Fortune 100 companies trust and use Kafka.
And here are some big names. LinkedIn, Netflix, New York Times, Zalando, Spotify, Twitter, and so on. So if you're interested in some use cases, you can refer to these web pages to find some more examples where Kafka is used. Just one slide to mention the history.
Kafka was originally developed at LinkedIn to handle the logs of their distributed applications. And that's what Kafka does: storing logs of events.
It was open-sourced in 2011, it became an Apache project in 2012, and in 2014 a group of developers who had originally worked on Kafka at LinkedIn created a new company called Confluent, which develops Kafka, an open-source project, and offers a commercial platform that builds on Kafka, providing additional services. Since last year, Confluent has been listed on NASDAQ.
So that is a clear example of a big success story. Now, a quick overview of Kafka. Kafka offers so-called topic-based communication, meaning that messages, events, are put into Kafka queues, or logs, and each queue of events represents a logical topic. So the communication is topic-based.
Consumers subscribe to topics, and producers publish messages to a given topic. Topics are persistent: they are stored on durable storage, typically on disk, and they are replicated for fault tolerance. So you can rely on the persistence of topics. Consumers can read events whenever they want.
Typically they consume events only once, but they can go back in time and re-consume events multiple times. That is useful in the case of failure: if a component fails, it can resume, read previous messages, and go back to the state it was in before the failure. Kafka also promotes scalability.
Topics are partitioned to improve access performance and scalability. We will see some specific implementation details that enable this scalability and level of performance. So let me focus on one case study to motivate the benefits of event-based communication in more detail.
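As a rough sketch of partitioning: events are mapped to partitions by key, so events with the same key stay in order within one partition while different partitions can be served in parallel. Kafka's real partitioner and consumer protocol are more involved; this only shows the key-to-partition idea, and all names here are invented for the example.

```python
# Sketch of topic partitioning: a deterministic key -> partition mapping
# preserves per-key ordering while allowing parallel access to partitions.
NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def publish(key, event):
    p = hash(key) % NUM_PARTITIONS  # same key always maps to same partition
    partitions[p].append(event)
    return p

p1 = publish("customer-7", "order-A")
p2 = publish("customer-7", "order-B")
assert p1 == p2  # same key, same partition: order-A precedes order-B there
```

This is why choosing a good partitioning key matters: it decides which events keep a mutual order and how evenly load spreads across partitions.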
The use case comes from a system built using a microservices architecture, which is a very popular way to build applications as a set of independent services. By independent, I mean that they do not share state: each service has its own local state, its local database, and inter-service communication takes place either through remote procedure calls, remote service invocations, or through messages, events.
So I will use this example to show the advantages of using events, event-based communication, instead of remote procedure calls. So in these applications, in microservices architecture, clients interact typically with a front-end, for instance, using a browser in the case of a web application, and the front-end dispatches requests to the back-end services. Here we focus on these services, so on the back-end, on the microservices.
In this example, we have an orders service, the one here on top, that processes orders from customers. Orders are sent to the shipping service, which queries the customer service to obtain the address of the customer and ship the product. Now, let's assume that we are using remote procedure calls, remote service invocations. The orders service invokes a functionality on the shipping service that changes its state by adding a new order. So every time there is a new order, the orders service invokes a functionality on the shipping service.
That's the semantics of this arrow. This functionality registers a new order in the shipping service: it changes the state by adding a new order. The shipping service, instead, only reads from the customer service; read operations are also called queries. So the first interaction is a change of state, also called a command, and the second is a read operation, a query. Now we will see how commands and queries are translated into the event-based communication approach. Let's focus first on the interaction between the orders service and the shipping service.
So, the command. If we want to use Kafka and the event-based paradigm, we can transform the command into an event: instead of invoking the command on the shipping service, we register a new order event on some topic. Let's call this the order topic. So this is the picture: we now have Kafka, with a topic named order stored in it, acting as a mediator between the two components, the order service and the shipping service.
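This command-to-event translation can be sketched with an in-memory stand-in for the order topic. The `place_order` helper, the topic name, and the event fields are illustrative assumptions, not real Kafka calls.

```python
# Sketch of turning a command into an event: the orders service appends an
# event to an "orders" topic instead of invoking the shipping service.
topics = {"orders": []}

def place_order(order_id, product):
    # The orders service knows only the topic, not who will consume it.
    topics["orders"].append({"order_id": order_id, "product": product})

place_order(1, "book")
place_order(2, "lamp")

# The shipping service reads the same log independently, at its own pace.
shipped = [event["order_id"] for event in topics["orders"]]
print(shipped)  # [1, 2]
```

Any other service, say a statistics service, could iterate over the same list without the orders service changing at all.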
So now the two services are decoupled: there is a third-party component that mediates the communication between them. And this brings several advantages. The orders service does not need to know which services are consuming order events.
We can change the shipping service without the orders service even noticing. We can take it down for some time and restart it, for example for maintenance, and the orders service can continue to work without noticing. Also, we can dynamically add new services that consume order events without any change in the rest of the infrastructure. For example, we can create a new service that computes statistics about orders, something that may become useful for our company later on, and we can add it dynamically without changing any other component. The new service will simply read from the order log, from the order topic, as shipping is already doing. The two services are also decoupled in time, so shipping might be unavailable for a while, maybe because of maintenance or the rollout of a new version.
And that's not a problem: when shipping comes back online, it processes orders from the log, from the topic, and the orders service can continue without noticing the temporary unavailability of shipping. Okay, that was how to translate a command into the event-based paradigm.
Now, let's focus on the second type of interaction, the query. A query is a synchronous request-response interaction. So the shipping service makes a request and waits for a response from the customer service.
Using synchronous interactions between components makes the two components tightly coupled with each other and may negatively affect latency. If the customer service is not responsive, the shipping service also becomes slow, or it may even become unavailable. So synchronous interaction may negatively affect performance.
Using an event-based paradigm, we can solve this problem by reversing the interaction between the services: the shipping service does not synchronously query the customer service, but instead observes notifications of state changes from the customer service.
Let's see a picture. The idea is that we replicate the address of customers, which is the information the shipping service needs, inside the local database of the shipping service. When customers change their details, the customer service places a notification of the change into a new topic. Let's call this the customer updates topic. So when there is a change of address for a given customer, this information becomes an event and is pushed by the customer service into this topic.
Every interested service can read customer updates. In particular, the shipping service can read customer updates and update its local view of the customer.
Now, the shipping service is totally independent; the services are fully decoupled. There is no synchronous interaction between the shipping service and the customer service. The shipping service uses its local view of customer addresses and updates it asynchronously by reading from the customer updates log. So the temporary unavailability of a service is no longer a problem, and this promotes the evolution of the system: you can add, remove, and change services without the other services even noticing.
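The reversed query can be sketched as follows: the shipping service maintains a local replica of addresses and updates it by applying events from the customer updates topic. The topic contents, field names, and `apply_updates` helper are illustrative assumptions.

```python
# Sketch of the reversed query: instead of querying the customer service
# synchronously, shipping replays "customer-updates" events into a local view.
customer_updates = [
    {"customer": "alice", "address": "1 Main St"},
    {"customer": "alice", "address": "9 Oak Ave"},  # Alice moved
]

local_addresses = {}  # the shipping service's local replica

def apply_updates(events):
    for e in events:
        local_addresses[e["customer"]] = e["address"]  # last write wins

apply_updates(customer_updates)
# Shipping now answers address lookups locally, with no remote call.
print(local_addresses["alice"])  # 9 Oak Ave
```

Note how the lookup itself is just a local dictionary access: the remote dependency has moved off the critical path and into the asynchronous update.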
The problem, of course, is that asynchronous notification may create temporary inconsistencies in the data: the shipping service may observe a change of the customer's address only later. So the above example introduces two key concepts.
The translation of commands and queries into the event-based paradigm leads to these two key concepts. The first is event sourcing, meaning that events become the core elements, the center of the system. We are making events the source of truth: not the state, but the events.
So the state can be reconstructed starting from the events, from the observation of events. We are somehow turning the database inside out: instead of relying on the current state of the system, stored in the database of individual services, we rely on the events that change the state, and we can always go back to a given state by reading the events from the log. What we assume is that there is an intermediary, Kafka, that stores the events for us, so that the services can fully rely on events. The second principle is Command Query Responsibility Segregation, or CQRS. It means that we separate the write path, the commands, from the read path, the queries, and we connect them through an asynchronous communication channel. So the queries can be performed on the local state of a component that is asynchronously notified by events.
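Event sourcing can be illustrated with a tiny example in which the current state is just a replay, a fold, over the event log. The bank-account events are, of course, an invented illustration; the point is only that any state, past or present, is recoverable from the log.

```python
# Sketch of event sourcing: the event log is the source of truth; state is
# reconstructed by replaying the events in order.
events = [
    ("deposit", 100),
    ("withdraw", 30),
    ("deposit", 50),
]

def replay(events):
    balance = 0
    for kind, amount in events:
        balance += amount if kind == "deposit" else -amount
    return balance

print(replay(events))      # 120: current state rebuilt from the full log
print(replay(events[:2]))  # 70: any earlier state is recoverable too
```

Replaying a prefix of the log is exactly the failure-recovery scenario mentioned earlier: a component can rebuild the state it had at any point by re-reading events up to that point.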