Transcript for:
Introduction to Kafka Fundamentals

Hi everyone, welcome to Java Techie. As I announced before, I am going to start a Kafka series from beginner level to advanced level, and this is the part 1 video of the series. The agenda of this tutorial is to understand what Kafka is, where Kafka comes from, why we need Kafka, and how it works, with a high-level overview. Okay, so without any further delay, let's get started.

So let's start with what Kafka is. If you open the official page of Kafka, you will find this definition: Apache Kafka is an open-source distributed event streaming platform. What does it mean? Let's break down these words to understand them in a better way. When I say event streaming, it points to two different tasks: create the real-time stream and process the real-time stream. Let me explain these two with an example.

I hope everyone in this digital world uses Paytm. If anyone does not know Paytm, it is a UPI payment application. Let's assume I am using the Paytm application to make a payment, or I am booking a flight ticket, or I am just booking a movie ticket, because Paytm provides features to do any type of transaction. Now when I do any transaction, that event will go to the Kafka server. But I am not the only Paytm user doing a transaction at this time; since people use this across the globe, the Kafka server receives millions or billions of events each minute, each second, or even each millisecond. So sending this stream of continuous data from Paytm to the Kafka server is called creating a real-time event stream, or generating a real-time stream of data.

Now once the Kafka server receives the data, it needs to be processed, right? So the Paytm team or Paytm developers created a client application which will read the data from the Kafka server and do some processing. For example, let's say Paytm wants to restrict users to a maximum of 10 transactions per day, I mean a user can only do 10 transactions per day using Paytm. If a user exceeds the limit, the client application wants to send a notification to that user. In such a scenario, my client application needs to continuously fetch the data and do the validation to check the transaction count for a specific user every second or millisecond, right? So continuously listening to the Kafka messages and processing them is called processing a real-time event stream. If you combine these two terms, this gives you the answer to what event streaming is: in simple words, continuously sending messages or events to the Kafka server, and reading and processing them, is called real-time event streaming.

Now let's move to the next word, distributed. In the microservices world, distributed means distributing work across multiple computers in different nodes or regions to balance the load and to avoid downtime. Similarly, since Kafka is a distributed event streaming platform, we can also distribute our Kafka server. If you observe here, we have three Kafka servers running in different regions to perform event streaming operations. In case any server goes down, another server will pick up the traffic to avoid application downtime. Hope you understand what Kafka is and why it is called a distributed event streaming platform.

Now let's move to the next point: where does Kafka come from? Kafka was originally developed at LinkedIn and was subsequently open sourced in early 2011, and this Kafka software now comes under the Apache Software Foundation.
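Before we move on, to make the idea of "creating a real-time event stream" a bit more concrete, here is a minimal sketch of a Kafka producer in Java. This is not the actual Paytm code, just an illustration: the topic name payment-transactions, the broker address localhost:9092, and the JSON payload are assumptions I made up for this example.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PaymentEventProducer {

    public static void main(String[] args) {
        // Basic producer configuration; localhost:9092 is an assumed broker address
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each payment or booking becomes one event sent to an assumed topic "payment-transactions"
            ProducerRecord<String, String> event = new ProducerRecord<>(
                    "payment-transactions", "user-123", "{\"amount\":250,\"type\":\"MOVIE_TICKET\"}");

            // Send asynchronously; the callback tells us where the event landed
            producer.send(event, (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("Event written to partition " + metadata.partition()
                            + " at offset " + metadata.offset());
                } else {
                    exception.printStackTrace();
                }
            });
            producer.flush();
        }
    }
}
```

A client application on the other side, like the consumer described above, would keep polling this topic and run its validation logic (for example the 10-transactions-per-day check) on each event.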
Okay, now let's move to the next point: why do we need Kafka? Let me walk you through an example to demonstrate this particular point. Let's say there is a parcel in my name, so the postman comes to my door to deliver it. Unfortunately, I was not at home because I had gone on vacation, so he returned. The next time he came, I was again not at home. He tried two or three attempts to deliver the parcel, and every time I was not at home. After some days he might forget about the parcel, or he returns it to the main office. In this case I lost the data; I will not be able to receive the information the postman brought to my door. That parcel could contain some important information or money-related stuff, but I missed it because I was not available during the period when the postman came. This could be a huge loss for me, isn't it?

How can I overcome this? No worries, I am very smart: I have installed a letterbox near my door. So the next time the postman brings a parcel for me and finds that I am not at home, he can simply put that parcel into my letterbox. Whenever I am back home, whether a day later or a week later, I can go to my letterbox and collect the parcel or the messages the postman dropped there. In that case I will not lose the data; the data will stay in my letterbox until I pick it up. Here the letterbox acts as a middleman between the postman and me. How cool is this, isn't it?

Let's try to relate this understanding to a real-time example. Let's say I have two applications, application 1 and application 2. Application 1 wants to send data to application 2, but if application 2 is not available to receive the data, then again it will lose the data, just like me, which might impact the business of application 2. To overcome this communication failure, we might need to install something similar to the letterbox between these two applications, right? That is where Kafka comes into the picture. You can add a messaging system between application 1 and application 2, and that messaging system can be Kafka, RabbitMQ, or Redis, but we are going to focus on the Kafka messaging system. In case application 2 is not available, it can collect the messages from Kafka whenever it comes online. Kafka again acts as a letterbox between application 1 and application 2; that is how it won't lose the data. Hope now you understand why we need Kafka or a messaging system.
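To relate the letterbox idea to code, here is a rough sketch of what application 2 reading its pending messages from Kafka could look like in Java. The topic name app1-events, the consumer group application-2, and the broker address are assumptions for illustration only; the point is that messages stay in the topic (within its retention period) until the consumer comes online and polls them.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class Application2Consumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "application-2");             // assumed consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");          // start from the oldest retained messages

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("app1-events"));      // assumed topic written by application 1
            while (true) {
                // Poll the broker; anything application 1 dropped off while we were offline is still here
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("Received: " + record.value());
                }
            }
        }
    }
}
```

Because Kafka retains messages on disk, application 2 can be down for a while and still pick up everything application 1 produced in the meantime, much like the letterbox holding the parcels until I come back.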
Now let's understand the need for Kafka with a more complex scenario. Let's say you have four applications that want to produce different types of data to a database server. This looks simple; what is the problem here? Nothing as of now, but in the future your applications can grow, and you might have any number of services communicating with each other. In that situation it is really tough to manage that many connections between the services, and there could be a lot of challenges. What could the possible challenges be? Data format, connection type, and the number of connections.

When I say data format, the front-end app may want to produce a different type of data for each application: maybe the front end wants to send one payload structure to the database server and a different payload structure to the security system. Similarly, Hadoop might also want to send a different type of data, so there could be complexity in handling the data formats or schemas.

Next is connection type. There could be different types of connections; it could be an HTTP connection, a TCP connection, or a JDBC connection, so the connection types become complex to maintain across multiple services.

And the third is the number of connections. If you observe carefully, the front end connects to five different destination services, Hadoop connects to five, the database connects to five, and the chat server also connects to five different services, so if you count the connections here, the total will be 20. There are four applications on the left side and five on the right side, so just to let these nine applications communicate with each other, we need to manage 20 connections. In an enterprise application this is really a bottleneck situation, with all of these challenges to handle.

Then how do we overcome it? That is where a messaging system like Kafka comes into the picture. If you observe this particular diagram carefully, now the front end, Hadoop, the database, and the chat server, whatever data type they want to send, whatever schema structure they want to send, or whichever type of connection they want to make, simply send the payload or information to this particular Kafka server, our messaging system. Now whichever data the database server needs, it will go to the Kafka server and get it directly from there. Similarly, let's say the security system wants to fetch some data produced by either Hadoop or the database; it can simply go there, check whether the data it is looking for is there or not, and if it is there, it can get that data and work with it. In this approach we are maintaining something centralized, similar to the letterbox: the front end, Hadoop, the database, and the chat server drop the messages, and the other five downstream services go to that Kafka server and pick up the messages as per their need. In this approach we are also reducing the connection count. Can you observe the total number of connections here? One, two, three, four on one side and one, two, three, four, five on the other: nine connections in total. In the previous approach we found 20 connections, but when we centralize using Kafka, we reduce the connection count as well. So this is the advantage of using a messaging system like Kafka. I hope you understand why we need the Kafka server as a messaging system.

Now let's move to the next point: how does it work? I will just give a high-level overview to understand how Kafka or a messaging system works in real time. It specifically works on the pub-sub model. When I say pub-sub, it stands for the publisher-subscriber model. Usually there are three components: the publisher, the subscriber, and the messaging system, or you can say the message broker. The publisher is the one who publishes the events or messages to the messaging system like Kafka, and the message goes and sits in the message broker. The subscriber then goes to that message broker and asks for the messages, or simply listens to that message broker to get them. This is just a 50,000-foot high-level overview of the pub-sub model; don't worry, we will understand in detail how the message gets processed inside the broker and which components help to process the message in my coming sessions.
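To show just the publisher-broker-subscriber relationship in isolation, here is a toy in-memory sketch of the pub-sub model. This is not Kafka's API, only a simplified model I am using for explanation; the class and method names are made up for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Toy message broker: keeps a list of subscribers per topic and hands each published message to them
class Broker {
    private final Map<String, List<Consumer<String>>> subscribersByTopic = new ConcurrentHashMap<>();

    // A subscriber registers interest in a topic with the broker
    void subscribe(String topic, Consumer<String> subscriber) {
        subscribersByTopic.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(subscriber);
    }

    // A publisher sends a message to the broker, never directly to the subscribers
    void publish(String topic, String message) {
        subscribersByTopic.getOrDefault(topic, List.of()).forEach(s -> s.accept(message));
    }
}

public class PubSubDemo {
    public static void main(String[] args) {
        Broker broker = new Broker();
        broker.subscribe("payments", msg -> System.out.println("Subscriber got: " + msg));
        broker.publish("payments", "txn-001 completed");   // publisher only knows the broker and the topic
    }
}
```

Real Kafka differs in important ways, for example the broker persists messages to disk and consumers pull them by polling rather than being pushed to, but the decoupling between publisher and subscriber is the same idea.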
Okay, so this is just a heads-up tutorial to get basic information about Kafka and its need in real time. In the next tutorial we will understand the Kafka architecture and each and every one of its components. That's all about this particular video, guys. Thanks for watching; I will meet you soon with a new concept.