Hi everyone. This is GKCS. This
is a video on designing WhatsApp. It's a chat based application, so
once you know how to design WhatsApp, you will be able to design any chat
based application to a large extent. The special things about WhatsApp are
that they have group messaging and they have these read receipts. So those are the two key features that
people look for in a normal system design interview. But there are also other features that we'll
be talking about, including the features that we should probably
not take up during an interview, so that we basically choose what
we are doing and can actually finish in the hour that we have. Now, amongst all the features that you
can ask your interviewer about (would you like this? would you like that?), you should probably start simple and
start with things that you already know, because I've noticed that the interviewer usually says yes to the first
feature that you ask for. So one of the things
I'm comfortable with is group messaging. WhatsApp has groups, and at most
200 people can join a group, so group messaging is something
that I understand to a good extent. Image sharing is another good
question to ask: are images going to be shared in these messages?
And the almost obvious answer is yes, we will allow image sharing
or video sharing. Another good question, and this is
something you'll know about if you have used WhatsApp, is sent,
delivered, and read receipts. So you have those tick marks coming in
based on what stage the message is at. The final two things are not critical
to an application in terms of features, but they're nice to think
about in an engineering way. The first one is: is the
person online? And if they're not, then when was the last time that they
were seen on the chat? And the second thing is, are the chats
temporary or are they permanent? So if you have a look at Snapchat, or
even if you have a look at WhatsApp, in a way, they're much more temporary than a lot
of the office messaging applications. The reason for this is that
you want a lot of privacy. You want to give the user
a lot of power. Also, it actually saves a lot of storage
space if you think about the chats being stored in the user's applications only. But if there is any sort of compliance
that you need or if there is any official communication, then you want that
message to be stored somewhere forever. So that's another thing that we'll be
asking. WhatsApp gives you, so to speak, only temporary chats: if you delete the app and if
your friend also deletes the app, those chat messages are lost forever.
So one thing I'd like to say is that image sharing has already
been taken up on this channel. If you want to have a look at how this
is done, have a look at the Tinder video. It explains how images can be stored,
retrieved, et cetera, et cetera, in a sensible engineering way. So you're left with four
features for this video, and the first one we'll be
picking up is group messaging. Before we get to group messaging, we need to first talk about how
one person sends a message to another person. So that is one-to-one
chat, and that is our requirement: one-to-one chat. Alright, this is what we are coming to,
okay, let's take this step by step. A lot of the things that I'll be
discussing in this are there in the system design playlist. So have a look at that. When you're
looking for things like load balancing, when you're looking for
things like message queues, I'll be using those
things as abstractions, as structures to meet all the
features that we have talked about. If you want any detail, then
you can always go there. Single point of failure is also something
pretty important in the WhatsApp architecture. So have a look
at those. Now let's start. You have the application
installed in your cell phone. You connect to WhatsApp on the cloud. The place that you're connecting
to is called a gateway. The reason for this is that you'll be
using an external protocol when you're talking to WhatsApp, but WhatsApp might be talking in a
different language with its internal services. The main reason is that
you don't need that much security. You don't need those big headers that
HTTP provides you when you're talking internally, because a lot of the security
mechanisms are taken care of on the gateway itself. So once you
do connect to the gateway, let's assume that you're actually
sending a message to person B. So you are person A and
you're sending to person B. Person A connects to the gateway. The gateway actually needs to
send it to person B somehow. You could store the information
as to which users are connected to which box in the gateway
itself. In that case, you would need some sort of user-to-box mapping. Okay? The gateway
service, which is a microservice itself, needs to store the information that
this user ID is currently connected to box number two. So if these
are boxes number one, two and three, then there needs to be information saying
that B is connected to two and A is connected to one. When you have this kind of information
being stored on the boxes themselves, it's going to be an expensive thing.
Why is it expensive? Because maintaining a connection, a TCP
connection, itself takes some memory. What you want to do is
increase the maximum number of connections that you can hold on a single
box, and you don't want that memory to be wasted by keeping information
about who is connected to which box. The second thing is that this information is
being duplicated on all three servers. Either it's being duplicated,
there's some caching mechanism, or there is some database which
is actually handling this. This is transient information, so there's going to be a lot of updates
going on over here and this is not nice. There's a lot of coupling
that I can see in this system. So what you want to do is you
want to keep a dumb connection. This TCP connection should be dumb in
the sense that it just takes information, gives information, it doesn't know
what it's doing. Apart from that, the one you want to be asking
when it comes to information on who is connected to which box is
a microservice in itself, and this microservice can be
the sessions microservice. What does a sessions microservice store?
Well, who's connected to which box. Just that. The information that we were
storing over here, which was being handled by the gateway, has been decoupled from the
system and moved to the sessions microservice. You can see that there are multiple
servers to avoid a single point of failure. Okay? So when user
A is sending some message, it actually makes a send message request
with the user ID for B. When the
gateway gets this message, it's pretty dumb. It
doesn't know what to do, it just sends it to the
session service, okay? This session service is indirectly
a router. When it gets this request
to send a message to user B, what it does is figure
out where user B exists, which box user B is connected to, and then it routes the message, basically
sending it to gateway two, which sends it on to user B. Now what's happened is A
has sent a message to B. Interesting. How can A send a message to B if the
server is what sends this final bit, where gateway two is sending a message to
B? This can't be done using HTTP. HTTP is a client-to-server
protocol. So the client sends requests, the server gives responses. So you cannot send a message
from the server to the client. You can only send requests from
client to server. There are many ways to get around this using HTTP itself. One
of them is long polling, in which case what happens is every minute
or so B can ask, hey, are there any new messages for me? And then the gateway or the
sessions management service, whichever one you like,
can send it the message. Of course this is not real time.
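
To make the latency cost of polling concrete, here is a minimal Python sketch. It models the simple periodic polling described above (strictly speaking, long polling holds each request open until a message arrives, which narrows the gap but still runs over request and response). The `Mailbox` class, the `poll_delay` helper and the 60-second interval are illustrative assumptions, not anything WhatsApp is known to use.

```python
from collections import defaultdict


class Mailbox:
    """Server-side store of messages waiting for each user (hypothetical)."""

    def __init__(self):
        self.pending = defaultdict(list)

    def store(self, user_id, message):
        self.pending[user_id].append(message)

    def poll(self, user_id):
        # The client asks "any new messages for me?"; hand them over and clear them.
        return self.pending.pop(user_id, [])


def poll_delay(arrival_s, interval_s):
    """Seconds a message waits if the client polls at 0, interval, 2*interval, ..."""
    return (-arrival_s) % interval_s


mailbox = Mailbox()
mailbox.store("B", "hello from A")
print(mailbox.poll("B"))   # first poll picks up the waiting message
print(mailbox.poll("B"))   # nothing new on the next poll
print(poll_delay(1, 60))   # wait for a message that arrives 1s after a poll
```

With a one-minute interval, a message that just misses a poll sits on the server for almost the whole minute, which is the gap that a push-capable connection removes.
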
And you do want something real time, especially for chat applications, where it's very important
to have the real-time thing. So HTTP is not something that we
can use, and we need another protocol over TCP. And the thing that we are looking
for really is WebSockets. WebSockets are super nice when
it comes to chat applications, the main reason being that they
allow peer-to-peer communication. So A sends to B, B sends to A, and there are no
client or server semantics over here. So with that, what happens is the server
can literally send a message to client B. Okay, so we are happy, B got the message. What now? Well, B got the message.
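
The flow so far (a dumb gateway that only holds connections, and a session service that owns the user-to-box mapping and routes messages) can be sketched in Python as follows. This is a sketch under assumptions, not WhatsApp's implementation: the `Gateway` and `SessionService` classes and their method names are hypothetical, and an in-memory list stands in for the open WebSocket connections.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """A 'dumb' gateway: it holds connections and forwards payloads, nothing more."""
    box_id: int
    pushed: list = field(default_factory=list)  # stands in for open WebSockets

    def push(self, user_id, payload):
        # A real gateway would write the payload to the user's WebSocket here.
        self.pushed.append((user_id, payload))


class SessionService:
    """Owns the user -> box mapping and routes messages between gateways."""

    def __init__(self, gateways):
        self.gateways = {g.box_id: g for g in gateways}
        self.user_to_box = {}

    def connect(self, user_id, box_id):
        self.user_to_box[user_id] = box_id

    def send_message(self, sender, recipient, text):
        box_id = self.user_to_box.get(recipient)
        if box_id is None:
            return False  # recipient not connected; a real system would store and retry
        self.gateways[box_id].push(recipient, f"{sender}: {text}")
        return True


gateway_one, gateway_two = Gateway(1), Gateway(2)
sessions = SessionService([gateway_one, gateway_two])
sessions.connect("A", 1)
sessions.connect("B", 2)
sessions.send_message("A", "B", "hello")
print(gateway_two.pushed)
```

The gateways never look at the mapping themselves, so their memory goes to connections only, which is the decoupling argued for above.
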
So that means it has been delivered. At this point, user A should be notified
that the message has been delivered. There's one step that I missed out on. When the message actually gets to the
gateway and gets to the session service, what the session service can do is send a
parallel response to gateway one saying, okay, I got the message, and now it's going to
be sent to user B when that's possible. The message is stored in, let's say, a separate
database for the chat. And because it's stored in the
database, it's safe, it's persistent, and the service will keep retrying the
message till user B gets it. So A is guaranteed that B
is going to get the message, and A should get the sent receipt. So the session service just gives a response saying, okay, I got the message, and gateway one is now
going to pass that on to user A. So sent is taken care of when this
entire flow is completed. When B gets the message for the first time,
how do we give a delivery receipt? Once you send the message to B
and B has actually got the message, it should respond. I mean, it should again go to gateway
two and say, got the message. That's an acknowledgement sent back over the TCP connection. When
gateway two gets this acknowledgement, it sends it on to the session
service saying, hey, this message was received.
The acknowledgement is going to
contain a "to" and a "from" field. So the session service
can reason: okay, the message has been received by the
person in the "to" field, which is B, so the person who sent
the message, the "from", which is A, should get a delivery receipt. And so sessions finds out again where
A exists, which is box number one, and sends a delivery receipt.
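
The receipt path above can be sketched in a few lines of Python. Everything named here (`Status`, `receipt_for`, `apply_receipt`, the dictionary fields) is a hypothetical illustration of the idea that the acknowledgement carries the original "from" and "to" fields and that the receipt is routed back to "from"; it is not a known WhatsApp message format.

```python
from enum import IntEnum


class Status(IntEnum):
    SENT = 1       # the server has stored the message
    DELIVERED = 2  # the recipient's device acknowledged it
    READ = 3       # the recipient opened the chat


def receipt_for(ack, status):
    """Build the receipt that goes back to the original sender.

    `ack` echoes the original message's `from` and `to` fields, so the
    receipt is addressed to `from` (A), not to the user who acknowledged (B).
    """
    return {
        "deliver_to": ack["from"],
        "message_id": ack["message_id"],
        "status": status,
    }


def apply_receipt(statuses, receipt):
    """Record a receipt on the sender's side; never move a status backwards."""
    message_id = receipt["message_id"]
    current = statuses.get(message_id, Status.SENT)
    statuses[message_id] = max(current, receipt["status"])
    return statuses[message_id]


ack = {"message_id": "m1", "from": "A", "to": "B"}
statuses = {"m1": Status.SENT}
apply_receipt(statuses, receipt_for(ack, Status.READ))
apply_receipt(statuses, receipt_for(ack, Status.DELIVERED))  # late receipt, ignored
print(statuses["m1"].name)
```

Keeping the status monotonic is a design choice of this sketch: receipts can arrive out of order when they are retried, and a late "delivered" should not hide a "read" that was already shown.
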
A gets a delivery receipt. Okay? And of course you can think
about how read is going to work. The moment a person opens the
application and opens this chat tab, they send a message saying read, and
the exact same flow takes care of read receipts also. Alright, so that's a lot to digest. If you like, you can go through
this a little more. This is the very first feature,
sending messages and basically delivering receipts to the sender. Okay? The second feature we are
talking about is quite simple. It's about last seen, or is
the person online right now. At scale, I mean at huge scale,
when there are millions of users, everything gets complicated. But one of the principal architectural
things that we can do over here is this. Simply put, B just wants to know when A
was online the last time. This information has to be stored somewhere. What
the server could do is ask A, but that would be stupid. So instead,
A is not even in the picture now, and the only messages which will be sent
and received are between B and the server. So B asks the server,
when was A online last? There needs to be some information
in some table saying that this user was last online at this time. So there is some entry over here with a
particular timestamp. The only question which remains is how
this row, the last seen timestamp for a particular
user, this key-value pair, is maintained. Whenever user A does an activity, basically sending a message or reading
a message, any kind of request to the server, it should be logged as an activity
and the current timestamp should be persisted in this table. In that way, we can say that whenever A did
anything, they were definitely online, which means that the last seen timestamp
needs to be updated based on this, and B can be told whether A is online or not. One of the key features over here is
that if A was online three seconds ago, then B shouldn't be told that they
were last seen three seconds ago. Instead, the tag shown should be online. They probably just haven't done any
activity in the last three seconds. You can set this threshold to anything
that you like, maybe 10 seconds, maybe 15 seconds. But the important thing is they're either
online or they were last seen at least, let's say, 20 seconds ago. The last seen tag is a little tricky
to update even after taking in all activities. So what I'll be doing is, whenever
a user sends a request to the gateway, I'll be having
a microservice, which is
the last seen microservice, pick it up. And what this will be doing is
user activity tracking. Anytime there's an activity, the user
definitely sends a message to the gateway. When they send a message to the gateway, I'm going to say that they were last
seen at this point. Now interestingly, there might be some requests which
are not being sent by the user, but by the application
itself. For example, when you
poll for certain messages, maybe you're offline,
you're not using the app, but you want your application to
notify you whenever there's a message. So for example, a delivery receipt,
that's not an activity by me. So the request should be smart, in the
sense that the client should be smart and say that this is a user activity and
this is something that the application itself is doing. So there are two types of
messages being sent by the client. One type is user activities, and the other one is, let's say,
system-generated or app messages, app requests. This can be a
flag in the request itself. If it's an app request, don't
send it to the last seen service. If it's a user activity, send
it to the last seen service, and it'll go and update the
last seen timestamp for this user. And in that way, what can happen is user B can
tell whether the user is online, or at least that they were last seen at this
timestamp, by querying this service. So feature three is also done. Alright, so we are very close to actually completing this chat messaging
application. As you can see, it's a pretty complicated architecture,
but we'll get to everything one by one. One of the things that I'd like
to skip over, so to speak, is the load balancer, because we
have already talked about this, so I won't be talking about how
the load balancer balances the load across the system. There's one interesting thing which
we have not talked about in the series, which is service discovery
or heartbeat maintenance. That will be taken up in a separate
video, but it's pretty interesting. You can have a look at some blogs, I'll probably post them
in the description below. The authentication service is another
thing that I'll be talking about later, the main reason being that it's quite simple, but it's something worth talking
about as a basic principle. So that'll also be taken up
later. As you can see, these four services are things which
are not really specific to WhatsApp, so to speak: the profile service is
a very generic service, and so are the image service, sending emails and sending SMSs. Okay, then what is core to a chat
application? Sending messages. Now you can see that there are five
users that are drawn over here. The red guys are in one group, the
green guys are in the other group. So whenever a user from the
red box sends a message, it should go to all other red boxes. And
this is the feature of group messaging. So this red user is
connected to gateway one, while we have the other red
users connected to gateway two. So let us assume that we send a
group message through this user. The problem here is that if the session
service stores all the information for all groups, let us say that the red group has these three
users and they're connected to these three boxes, it's too complicated for
the session service to handle. I mean, it's something that you can decouple.
So that's what we have done. We have decoupled the information about
who is in which group into a group service. Now, the session service, when it gets a message from a red
user, is going to ask the group service, who are the other
members in this group? The group service can then
respond saying, 10 members with these user IDs exist in this group. Now the session service looks
through its own database. Usually this information is going
to be cached as much as possible, but it can figure out where
these users are connected through its database.
For those 10 users, it has a mapping from
user ID to connection, and that connection tells you which
box, which gateway, the user is on. So with this information, it can then route the messages to
each of these users one by one. What if the group has too
many members? Too bad. WhatsApp actually gives
you a maximum limit of 200. There are a lot of chat applications
which try to cap that at 500 or 600, the main reason being that you'll otherwise be
fanning out the request too much. If you've seen the Instagram design video, what happens there is that when a
celebrity actually posts something, it's effectively sending messages
to sometimes millions of people, and that's not practical. So you have to either batch process them
or you have to wait for these guys to pull them. In a chat application, because
you want the messages to be real time as much as possible, you can't
really have too much of a pull mechanism. Instead, what you do is you limit
the number of people in a group. 200 is a slightly reasonable
number compared to millions. Yeah, it's a very reasonable number. So what
we are going to be doing is limiting the number of
users in a group to some number X, and we are going to be assuming that
the session service can handle sending these messages over WebSockets to
the relevant users. Okay? Now let's get into the details
of this mechanism. I mean, we have the bare bones of
how it's going to work, but the details are important. The first thing that I would note in this
architecture is that because a lot of users are going to be connecting to my gateways, these gateways are going
to be starved of memory. That's the reason why we have
separated out the session service. That's one good way to
reduce memory footprint. The second thing you can do is about
parsing the message, right? Maybe the message is sent over
HTTP, it's a JSON message, so on and so forth. You don't really want
to parse the message, convert it into an object, do some smart things on it, find out whether it has been authenticated
or not, on the gateway itself. All those responsibilities, as
many responsibilities as you can, you want to push away from the
gateways, because those hold WebSockets. Those are expensive. Those are
actual users connected to your box. So I would send an unparsed message to
the session service or to anyone I am sending it to. One smart way to actually send
an unparsed message to any service that you want is
to have this unparsed message go through a parser microservice. You don't really need too many
servers, just two are enough. So I'll just call it the parser/unparser microservice. What it's going to be doing is
taking the unparsed message and converting
it to a sensible message. So if your internal
protocol, instead of HTTP or something raw over TCP,
is something like Thrift, which is used by Facebook
internally, so I would say Thrift, then you can parse the message
over here itself, right? What is the advantage? Let
me just, again, reiterate: you get an unparsed message over here. You send the unparsed message forward, and there's no work that you're
doing on the gateway itself. This unparsed message will be converted
to a sensible programming language object by this parser/unparser, alright? And that will then route it
to the right place. Okay? So that's one way to reduce
the memory footprint. What are the other concerns or key
areas that we should focus on? The group ID to user ID mapping. This is a
one-to-many mapping, right? One group can have many user IDs, and
to reduce a lot of the duplication in information that you have, we go for
something called consistent hashing. You should have a look at that. Consistent hashing helps you reduce
the memory footprint across servers by delegating only some
information to some boxes. Okay? Have a look at the video in case
you're not sure what this is. Consistent hashing is going to allow
you to actually route the request to the right box. What should it be routed on? The group ID. If you have the
request routed on the group ID, then that box can tell you, for this group, who are the users belonging
to this group. Alright? That takes care of the
routing mechanism we have. Now, in case the group service fails at any time, you send the message to the box and it fails.
What do you do? You can retry, but you can only retry if you know
what request you needed to send next. So one of the mechanisms
for this is message queues. Yeah, we have discussed this in the playlist
so I won't be getting into too much detail, but message queues are nice in the sense
that once you give a message to the message queue, it ensures that the message will be
sent, maybe now, maybe 10 seconds later, maybe 15 seconds later. Those
are configurable options, and so is how many times
you're going to retry. All of this is configurable
in the message queue. If the message queue fails to send
the message even after, say, five retries, it can tell you that it has failed. You propagate the failure all the
way to the client, saying, no, I couldn't send this group
message. Okay, that's also fine. But the client needs to be told whether it has
failed or gone through. Interestingly, when the group service gets this message,
it can send a response that, yes, I got the message. Sessions then
sends a response to the gateway, and the user who sent the original
message gets a sent tick mark. Group receipts, when it
comes to delivered or seen, are pretty expensive, the main reason being
that everyone needs to say, yeah, I got the message, I got the message, and then finally it has
to come back to this guy. So we won't be getting into
that. Many chat applications actually don't even
have that, so it's fine. The final few interesting things when
it comes to chat messaging, or group messaging especially, is
that you need idempotency. There's an entire video I made on
retries and idempotency, again taking the Tinder messaging example, so you can have a look at that
for the technical details. This architecture is actually very
resilient, and as a chat system it's going to do pretty well. There are some tips and tricks over here
that you can get to know only if you have worked on messaging systems. So I'll
give you a few examples. For example, I was just reading this blog saying that
Facebook Messenger deprioritizes messages in case there's a
huge event. Let's say it's New Year's, or let's say some festival
like Diwali in India: there's going to be a lot of messages. Everyone's going to be wishing each
other Happy Diwali, Happy New Year, and that's going to be putting
a lot of load on the system. So all the principles of rate limiting
come in here, where you don't take messages which are not very important, or sometimes you just drop messages.
Instead of dropping, I mean, the best thing to do is
to deprioritize messages. Things like last seen can be ignored.
The entire feature can be ignored. Has this message been
delivered? Has it been received?
Those are not as important as actually
sending the message to the user. The first thing, the server getting the message and
sending the acknowledgement, that's
all the user needs to know. Okay? That's more important than seeing whether
the person has read the message or not. So by deprioritizing unimportant messages, you're actually keeping the system
healthy, and you are performing okay instead of not performing at
all. So do check out the course. It's really useful when you are
designing systems like these. Of course, this takes care of the last
requirement that we had, which was to send group messages. Yeah, that is requirement number one
taken care of in the end. Alright, thank you so much for listening. Thank you so much for going
through this system design. If you have any doubts or suggestions,
you can leave them in the comments below. If you liked the video, then hit the like button, and if
you want notifications for further videos, then hit the subscribe button
and I'll see you next time. Oh, and I'll be posting a poll, so vote
for what you want to see next time. See ya.