System Design Mock Interview: Designing Spotify

[Music]

Hello there, and welcome to this system design mock interview. Today we want to show you a really high-quality, best-in-class system design answer so that you can use it to prepare for your interviews and know exactly what standard you should be aiming for. With us to give the answers, the candidate today, I have a former engineering manager at Google. It's Mark. Mark, how are you doing?

I'm doing great, Tom, thanks for having me. I'm looking forward to doing this.

Thanks for coming on, it's great to have you. Before we get started, I just want to quickly give people an idea of your background as an engineering manager.

Yeah, I was an engineering manager at Google for 13 years and worked on several large-scale systems with lots of great engineers, so I hope I've learned a little bit from them. It'll be interesting to see me on the other side of the fence, being the interview candidate here, so I'm looking forward to it.

I'm sure you'll do a great job. Mark is also a coach on our platform, so if you need some expert feedback ahead of your interviews at Google, Facebook, or any tech company, then do check him out on our platform. Okay, if you're ready, Mark, let's crack straight into it. The question I want to ask you today is: how would you design Spotify?

Okay, so Spotify is probably one of the most popular music streaming services out there, I guess. Let me think about that just for a moment, because there are a lot of aspects of Spotify here. Maybe I'll just take some notes here so that I can think through this, if that's okay, and can I make some assumptions? I'll put my notes right in here, and then if it's okay I'll use this for notes as well as the design diagrams and things like that later on.

Yeah, sounds good.

So let's talk about just some
different aspects of Spotify. For Spotify I can think of things, I mean, obviously there's songs, music, that's the main thing. You have playlists, there's users, there's artists, and I guess Spotify even has things like podcasts that you can listen to. So there's a lot to it. In terms of the system design for today, can we constrain this a little bit, just so that we can actually get through a design? What would be good to focus on here?

Yeah, let's constrain it. Let's limit ourselves to finding and playing music.

Okay, so use cases: finding and playing music. I'm just going to write it down verbatim here.

Great, I think that's enough to talk about.

I think so, yeah. Okay, good. So let me try to think a little bit more about the use cases. My typical use case as a user, if I'm signed up to Spotify and I have an app on a phone, is that I'm going to browse some music, maybe I'll search for a particular song, but the most core thing is that I play a song and it's coming back through my phone, maybe my car, headphones, wherever I might be, to listen to it.

With big, broad questions like this, you need to constrain the problem and make it solvable in the space of an hour, and Mark did this well, establishing the use cases we wanted to focus on. If you're asked to design an already existing system, in this case Spotify, share what you know about it with the interviewer. They can correct you or fill you in if you have any knowledge gaps.

So if I think about metrics here, let's think about size. Let's do some quick drilling down into some numbers here, and pardon me while I get this going. So, in numbers, tell me about how many users you're thinking about here for this design.

Yeah,
let's say a billion users.

Okay, one billion users. What about number of songs?

Yeah, I think I saw somewhere that Spotify has about 80 million songs, but let's say we need to have capacity for 100 million on there.

Okay, all right, so 100 million songs, a billion users, and we're going to focus on the song piece of it. Let me do a little bit of math here, and I'll just double-check with you to make sure that this makes sense. From what I know, a typical MP3 for a typical song is about five megabytes. I think that depends on the encoding of the song, how long the song is, and things like that, but if that's okay we can just make that assumption. Is that fair?

Yeah, that sounds good to me.

I mean, I think you can encode these things at really low quality, like 96 kilobits per second, all the way up to 320 kilobits, but right in the middle, at 128 or 160, it's maybe about that. All right, so let's assume that, and let me just extrapolate from this. If you've got five megabytes per song and 100 million songs, I'm trying to do my math here: 100 million times a thousand would be 100 billion, times a million would be 100 trillion, and times five would be 500 trillion bytes. So total audio is something like 500 terabytes, which is the same thing as half a petabyte of data. I'm making that assumption. Then, depending on how you replicate this data, because you typically want to have multiple copies of these songs so they don't get lost and in case some replica is down, maybe you do, let's say, oops, sorry, my keyboard here, 3x replication. So that would mean one and a half petabytes of raw audio data.
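The audio-storage arithmetic above can be checked with a short sketch. It uses only the figures stated in the interview (100 million songs, about 5 MB per song, 3x replication) and assumes decimal units; the function names are illustrative, and the duration helper assumes a constant bitrate, which is a simplification.

```python
# Back-of-the-envelope audio storage estimate using the figures from the interview.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 TB = 10**12 bytes).

SONGS = 100_000_000          # capacity target agreed with the interviewer
BYTES_PER_SONG = 5 * 10**6   # Mark's assumed average MP3 size
REPLICATION_FACTOR = 3       # Mark's assumed number of copies


def audio_storage_bytes(songs: int, bytes_per_song: int, replicas: int = 1) -> int:
    """Total bytes needed to hold every song `replicas` times."""
    return songs * bytes_per_song * replicas


def playing_time_seconds(size_bytes: int, bitrate_kbps: int) -> float:
    """Approximate duration of a constant-bitrate file of the given size."""
    return size_bytes * 8 / (bitrate_kbps * 1000)


one_copy = audio_storage_bytes(SONGS, BYTES_PER_SONG)
replicated = audio_storage_bytes(SONGS, BYTES_PER_SONG, REPLICATION_FACTOR)

print(f"one copy:   {one_copy / 10**12:.0f} TB")    # 500 TB
print(f"replicated: {replicated / 10**15:.1f} PB")  # 1.5 PB
print(f"5 MB at 160 kbps: {playing_time_seconds(BYTES_PER_SONG, 160):.0f} s")  # 250 s
```

At 160 kbps a 5 MB file works out to roughly four minutes of audio, which suggests the 5 MB per song assumption is in a plausible range for mid-quality encoding.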
And then each song in terms of the metadata, meaning the song title, the artist, and all these things: the metadata is probably not very big. It might be, I don't know, 100 bytes of metadata per song or something like that. So you might wind up with, okay, let me do the math: that's 10 billion bytes, so that's only 10 gigabytes of song metadata. It's not really very much, and even if you say it's a kilobyte per song, then that's 100 gigabytes. Let me just double-check my math here: 100 million times 100 bytes, yeah, I think that would be 10 gigabytes. So even if we rounded it up dramatically and said 100 gigabytes, that's just not a ton of data. Now, in terms of user data, again maybe you have a kilobyte, oops, a kilobyte per user of metadata, I guess it's all metadata, times a billion users, so that would be a terabyte of data, approximately. So this is just me doing some quick calculations to get a rough back-of-the-envelope idea of how big these things are. Does that make sense so far?

Yeah, that makes sense so far.

Before starting your high-level design, as Mark did, get some metrics to help you identify any higher-level decisions that might be influenced by the scale of the system. Mark asked the right questions about the number of users and number of songs and made some rough calculations without getting bogged down in the numbers. What Mark didn't calculate was the traffic: how many songs are streamed per day, or even per second. That would have helped him figure out the scaling of the web servers, how many he'd need, and how much bandwidth he'd need. However, since this didn't impact his high-level design, it was okay not to go into the details of scaling at this point.

Okay, so let me think about some basic components here, and I'm going to draw
them, but I'll get into them in a little more detail later. So, some basic components that I would see. Let me start with something that is my best attempt at a mobile phone here. So you've got your Spotify app, and let me make this smaller so it fits into the thing. You've got your app, which is on your phone typically, and I'm going to assume here that we're talking about the phone app because that's very common. The phone is going to be talking to an application server, a Spotify application server somewhere, or web server I guess. So let me just put over here that there's a Spotify web server, and there's not going to be just one, there are going to be lots of them, so assume that that's a lot. And then of course, just because this is a standard thing that you do when you have applications in the cloud, and this is I guess part of the cloud, you've got a load balancer. So let's assume that we've got an arrow here: the Spotify app is talking through the load balancer, ultimately, to these web servers. Those are some of the key components here. Let me think about this a little bit further.

Yeah, I think that's good so far.

Okay, and then of course one of the most important things here, let me see, I know that there's a shape here, here we go, that's a good shape. I'm just going to call this the database. So there's a database, and the web servers are going to be talking to the database, reading and writing information to and from the database. So those are the very high-level components.

In system design interviews, good dual-bandwidth communication is important. That's to say, you need to
communicate well via both drawing and speaking. You should practice this thoroughly before your interview. Mark did this well: he started drawing the components quite early on in his answer, and this allowed the interviewer to follow along easily. It's okay to have simpler and fewer components at this point in the interview. You can then break them out later in your answer, as Mark did when he split his database in two.

I think this is obviously oversimplified, and I'm not going to redesign the existing full Spotify application as it exists, but I do think I can get a little more detailed here. I'm thinking about the data again, because the data is important. I'm thinking about the metadata, the user data, and the audio data, and they're very different types of data. So I'm going to split this database here into two things, and I'll try to explain them just a little bit. Come on, there we go. So I'm going to split this up and say that there's a song audio database and then there's going to be a metadata database. So this is users, songs, what else, ultimately there will probably be artists and things like that, but we're focusing on the songs, on the music. This is also a database, of course: song audio database and metadata. So this Spotify web server is actually going to talk to the, oops, the metadata database and the audio database. It's actually going to be talking to two databases here.

Yeah, that's interesting. Can you go into a bit more detail about why you'd split them into two?

Yeah, I'm kind of just making that assumption. Okay, let me try to answer this in terms of the technologies that I might use. I
might use something like Amazon S3 here, and I might use Amazon RDS, so now let me explain what I mean by that. The actual MP3 files I think of as immutable data. They're just blobs of data, they're files, roughly five megabytes on average, and they're not going to change. They're being stored and streamed, essentially. A blob store, which is what S3 is, lends itself really well to that. It could be something like Google Drive, or there are other technologies that are document-like blob storage systems, but Amazon S3 lends itself really well to that, and it scales greatly: you can just add more and more, and it scales linearly, which is great. Then you can connect it up to be able to stream the data and things like that. So that type of data, and the access patterns for it, which are really mostly read, since you're just streaming this data and never going back and forth and writing to it, would be why I would want to put the songs in that kind of a database.

Now, S3 works really well for that, but the metadata is the users and their information and the songs and their information. Maybe they get updated, maybe you're searching across them and having to do queries to find songs for a particular genre or artist, or you're trying to update your users. For example, if I'm playing a song and I want to remember where I left off so that when I continue playing it, it remembers that, I'm going to be modifying the data quite a bit, going back and forth to that database and doing these queries. You can't really do queries over S3, but you can over a relational database. MySQL would be another option. I just chose RDS here, Amazon's Relational Database Service, because I'm going with the AWS family. But yeah, so let me try
and just write something here, a little bit more, in terms of songs. You would have things like a song ID, you would have maybe a song URL that you'd use for sharing, you would have an artist, a genre, maybe there's a link to the album cover, and maybe there's the link to the audio, I guess. So this is too big to fit. This is the type of data that would be stored in, let me draw this in here again, in that database. And then the raw, I mean I almost don't need to write this here, the song MP3 would be stored in this database down here. So I think about the access patterns, the types of things that you want to do, and the size of the data. The size of the data here, I think we said, was going to be half a petabyte for one replica, so S3 lends itself well to that. The data here is going to be maybe in the terabyte range, something like that, and it's going to have lots of queries over it, maybe some updates, and things like that. So that's why I would separate these out: because I think the access patterns, the size of the data, and the type of queries that you need to be able to do over it would be very different. Does that make sense? Does that answer it?

Yeah, it does, thank you. I think that makes sense.

At this point in your answer, when you've laid out the main components, it's good to start identifying some technologies, as Mark did here, starting with the database. His design made sense, breaking up the databases into audio data and metadata, but he could have explained why he was doing this up front rather than waiting for the interviewer to ask why. In general, try to be upfront about your decision making.

Yeah, feel free to continue.

Okay, so let's say we have these two databases, and I'm just trying to think now about the use cases we talked about, so finding and playing music. So for finding music, I think
the Spotify app would need to do a find-music request. The user probably is either typing in an artist's name or maybe selecting a bunch of filters. I'm actually not that familiar with how possible that is, but let's say I'm searching for music for an artist with a particular genre. Ultimately, somehow I'm filling in this query in my app, either by clicking on some things or typing some things or a combination, and then that request to find music is going to be sent through the load balancer, which picks a web server. Then that web server is going to issue a translated query to the relational database to find a bunch of songs, and return that list of songs back up to the app with whatever metadata there is. Let's say it finds, I don't know, 100 songs that match my request for Korean pop music or something like that. So I now get back 100 songs and I can look at those. And by the way, for that query where I'm searching for music, I don't touch the MP3s at all. I don't need to go to that database at all. I just need to go to the metadata database and do this query, and because it's a relational database, that query can happen pretty efficiently. So I return that information back and then I display it to the user. Does that make sense in terms of finding music?

Yeah, that makes sense, that's good.

Okay, so now I have a list of songs, and maybe one of the ones in that list is, oh yeah, that's what I was looking for. So I click on that song to play it, I want to hear the song. That's the second part of the use case that we talked about, which is playing music. Now, to play music, this gets a little interesting and
I think I'm going to wind up maybe changing what I'm thinking about here a little bit, but let me just talk through it. I click on the play button, or I click on the song to play it, and that translates into a request from the app to the web server to start playing the song, and it has an ID in it, like I mentioned. Now the web server, based on that song ID, maybe has to go to this database here to look up the link, what did I call it, oh, an audio link. Maybe MP3 link is a better term for it, I don't know. So that MP3 link or audio link would be returned, and now the Spotify web server has to go to the database where the actual audio is stored and fetch that audio. Now, you could imagine streaming that five megabytes from this database, so chunk by chunk it comes back and chunk by chunk it goes up to the Spotify app. In order to do this you need something like a WebSocket connection, a long-standing connection between the application and this web server, so that you can send the data back in chunks. But I'm not sure whether I would need to do that, because it's only five megabytes, which is possibly small enough to just fit in memory: read it from S3 into memory and then have that particular web server chunk it back from there. I think that might be better because it would eliminate the possible lag from the database, so you don't start streaming the audio until you have it in memory. That might be something that I would consider doing. Does that make sense in terms of the playing? You start playing, and obviously you can control it, you can pause playing and so on. So you start playing, the web server gets the request to play it, maybe gets the information about
where to go over here to get the, oops, the S3 audio storage, reads that back, it's only five megabytes so it should be almost instantaneous, and then it starts streaming that back to the application. Does that make sense?

Yeah, that makes sense, please carry on. All good so far.

Okay, so this almost sounds too good to be true, or too easy. I think it's got to be more complicated than that. One thing I'm just realizing here is that out of these, what did we say, 100 million songs, there's probably a lot of stuff that is, I'm going to use the term indie artists, stuff that not very many people listen to or that isn't very popular. On the flip side of that, I'm thinking about what could go wrong here. Well, BTS, which is again a Korean pop band, let's say they release a new song and everybody wants to listen to it: oh, have you heard the latest song, et cetera. You've got all of these web servers all fetching that same song, and let's say it's released and in the next minute you've got, I don't know, you said a billion users, okay, let's say that 10 million users all request the song at the same time because it's just been released and they're following on social media or something. In terms of bottlenecks, you could possibly overload that bucket, excuse me, that particular song file, however it's stored in AWS, and it could be stored different ways, I'm using S3 here. You could overload S3 there, and you could also be loading up all these web servers with the same song, streaming the same thing back, so it's a lot of bandwidth. So a better thing to do, and a common thing to do for
stuff like this, is to use what's called a CDN, a content delivery network, which is a cache, so this helps reduce the amount of load on the back ends. Let me draw something here. What shape am I going to pick? I'll just pick randomly something like this, maybe over here. So I'm going to say this is a CDN, which is a song audio cache. And by the way, this CDN, a content delivery network, is usually very close to users in terms of number of hops and network connections. Let me draw another arrow here, and an arrow down to here. What would happen is that the first time this newly released BTS single is requested, the web server would read it and stream it like normal. But we probably need to make sure that these Spotify web servers are keeping track of which songs are being requested and which ones are hot. They probably have a heat map of the most recently requested songs. In that heat map, as they see, oh, I've now seen this song requested for the third or fifth time in a minute or something like that, at that point what a web server might do, instead of just copying it back to itself, is actually load it into the CDN. I'm going to have to draw another arrow, I think, because these things probably have to talk to the CDN. By the way, I'm not a super expert in this area in terms of how this works, but there's some connection between this edge caching, this content delivery network, which is just there for caching, and these Spotify web servers. So these web servers are going to somehow notify the CDN: hey, you should be pulling this latest BTS song, pull that from the
MP3 audio storage. So the CDN would load that up, and this now has to work in conjunction with the application. That means that when I request the song, I want to play this latest BTS song and I don't have it on my phone yet because it's just been released, then before it talks to the web server, the application might actually go and check in this content delivery network to see if it's there, and if it is there, then it just reads it from there. The other possibility, and again this is technology that I'm not super familiar with, is that it goes and asks the web server just like it normally would, and the web server sends back a redirect saying, hey, I don't have it, but you should go over here to this content delivery network, this edge cache, to fetch the song. It's much closer to you, you'll get better performance, and you won't be loading me up so much. So I've got a lot of arrows here, but the standard flow for a song the first time around is getting the metadata and then reading the audio data. The web server is now keeping track and realizing, oh, you know what, I'm getting multiple requests for this same song, so I'm going to tell the CDN to go and fetch this song. It's going to fetch it, and so now the application can be redirected, for example, to read that audio from there. So that's a lot of talking, but I was realizing that caching seems like a very important point in this design and would help with the bottleneck of hot songs, if you will. Does that make sense?

Absolutely, yeah. By the way, that's a really good answer.

Okay, and again in the AWS family here, obviously if I were a Googler I might not be using this, but there's a technology called CloudFront, and CloudFront is basically a content
delivery network. There's a bunch of other ones, gosh, flask, is that what it is? I can't remember what the names of these technologies are, but there's a bunch of other technologies that are very similar in nature that allow you to do this. So that would help with the bottlenecks. In terms of the caching here, and I'm stepping backwards a little bit, this is part of the reason why I think it would be better, when the web server is getting the song for the first time, for it to read the entire MP3 into its memory. Then, if it is getting multiple requests for a particular song, it doesn't have to go to the database; it can just serve it from its memory. That's also a form of caching, of course, so it's got a local cache, essentially, of these songs. You could imagine having a cache that's shared across these web servers, so that could be another optimization. But my point is that if we have a cache in memory in these web servers that stores the songs, then we're offloading the database a little bit. If we then have the cache here at the edge of the network, that's what it's called, edge caching, then we're offloading the web servers to a large extent, because this is optimized for streaming and so on. And if we go one step further, if I were designing it, I would design the application, because these are on smartphones, to store the songs that are played frequently by the specific user in local storage. It's another form of caching. So if it finds the song in its local cache, essentially a local store here on the phone, then you wouldn't even need to go to the network at all, right? I'll even draw it like this, so it's like a little cache, a little local storage here of
songs. So there's multi-layer caching, I guess is what I'm saying, multiple levels of caching, ultimately with the goal of providing the best user experience, making sure it's super fast, and then obviously offloading the system to allow it to scale and limit costs. So that's probably enough on caching, and I think I've gone a little bit deep there.

Be ready for the interviewer to ask you about particular components or different aspects of your system, and to talk in detail about them. It's also good to talk through the flow of a particular use case. This helps make sure the interviewer is clear on it, but also helps you test your design as you talk through it, as well as helping you identify any potential bottlenecks, as it did with Mark here. Mark explained the purpose of the caching and also where caches might exist in the system. Caching wasn't limited to the CDN and the app, but also included the web server. He could have talked about caching the metadata as well as the song data.

We've done caching, that's all good. Now I wanted to ask you about load balancing. How would you think about load balancing for this particular app?

Yeah, load balancing, it's a little bit hand-wavy here. The load balancer's job here, really, is to make sure that these web servers are not overloaded, so that they're providing good service to the end users. Overloaded can mean many things. Often it's in terms of CPU: you're getting lots of requests coming in, and load balancing these requests across the servers so that there's roughly the same number of requests on each one, assuming they're all equal, would probably also result in the CPU utilization being about equal. I think for this application I would think about load balancing, I might think about it a
little bit differently, because we're streaming data. So instead of using CPU as my load-balancing metric to figure out how to distribute the load, I might be looking at network bandwidth. If a particular web server is not doing a lot from a CPU perspective but is I/O bound or network bound, meaning it's hitting the limit of its network connection, then I wouldn't want the load balancer to send it more traffic, because it's just going to bog down and you're going to get skips and things like that. So I might make this load balancer aware of multiple metrics, one of them being network, maybe memory for caching, although you can probably limit that, but maybe not CPU for this purpose. It might be outstanding requests or current streams or something like that. There might be some other metrics, but I would want to make this load balancer maybe a little bit smarter than just a typical round-robin load-balancing scheme. I'm not sure if that's getting at what you're asking.

Yeah, I think that answers my question.

Load balancing is not one-size-fits-all. Mark mentioned looking at different metrics as a way to load balance across different servers, which was a clever approach. Mark was very open about the fact that load balancing isn't an area he's an expert on, and this is okay. It's fine to be honest when you're asked about things that are at the limits of your knowledge.

Okay, cool. Are you done with your design, or is there anything else that you want to add?

Yeah, let me think about that, give me just a moment. Again, I've way oversimplified this, but I guess I would also think about this at a global scale. Spotify is a global app, and so, replication: I didn't really talk about replication other than to do the math to say, hey, maybe 3x replication.
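Stepping back to the load-balancing answer above, the idea of routing on network headroom instead of CPU or round robin might look something like the following. This is a minimal sketch, not how any real load balancer is configured: `ServerStats`, `pick_server`, and the 0.9 utilization cutoff are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ServerStats:
    name: str
    bandwidth_used_mbps: float      # current outbound traffic
    bandwidth_capacity_mbps: float  # limit of the server's network link
    active_streams: int             # songs currently being streamed


def utilization(server: ServerStats) -> float:
    """Fraction of the server's network link that is in use."""
    return server.bandwidth_used_mbps / server.bandwidth_capacity_mbps


def pick_server(servers: Sequence[ServerStats], max_utilization: float = 0.9) -> ServerStats:
    """Choose the server with the most network headroom.

    Servers at or above `max_utilization` are skipped so a network-bound
    server is not sent more streams. Ties go to fewer active streams.
    """
    eligible = [s for s in servers if utilization(s) < max_utilization]
    if not eligible:
        raise RuntimeError("all servers are network bound")
    return min(eligible, key=lambda s: (utilization(s), s.active_streams))


servers = [
    ServerStats("web-1", bandwidth_used_mbps=950, bandwidth_capacity_mbps=1000, active_streams=40),
    ServerStats("web-2", bandwidth_used_mbps=400, bandwidth_capacity_mbps=1000, active_streams=90),
    ServerStats("web-3", bandwidth_used_mbps=600, bandwidth_capacity_mbps=1000, active_streams=20),
]
print(pick_server(servers).name)  # web-2
```

Note that `web-2` is chosen even though it has the most active streams, because the metric here is link utilization. A production balancer would likely combine several signals, as Mark suggests.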
and replication, right, why do we do this? Well, we do it to make sure that we have data available when there's an outage. So, you know, I could do this, right, to indicate three replicas, but that's a little naive, and the same thing for the metadata, of course. But replicating the data is not just for availability and downtime; you would also want to place those replicas closer to where the users are. So from a global perspective, you might have music that is more local: maybe European punk rock is listened to more in Europe, and maybe BTS, because it's Korean pop, is more popular in Korea or in Asia. So you might want to have the replicas of those songs, and maybe even the metadata, be represented more locally so that you can get to it faster; you don't have to cross an ocean to get to the data. So a geo-aware strategy of data placement, and possibly replication strategy, might be a refinement that I would make to this design to make it a little bit more performant, effective, et cetera. So yeah, does that make sense?
Yeah, I think that's an interesting point to add.
It's a good idea to wrap up your answer by referring back to the requirements laid out at the beginning of the interview and confirming that your design meets them. If you have time, it's always nice to think big and take a quick look at the problem from a different dimension that could make it more complex. Mark did this with his geolocation idea. He didn't go into detail, but another interviewer might have explored this further, and it could have opened up a whole new aspect of the design. Overall, an excellent answer from Mark.
Yeah, great job Mark, I think that was a really good approach. How did you feel, looking back now the interview is over? Let's say you've walked out the
interview room and breathed a big sigh of relief. How are you feeling that it's gone?
I feel pretty good about it. It's definitely different being on the other side of things, and I'm sure I missed things. If people are watching this and wind up scheduling a session with me, I'm happy to take the feedback on things that I missed. But yeah, it's fun to do this.
Cool, good stuff. Well, thanks very much, and thanks everyone for watching.
Hello, I really hope you found that useful. If you did, you can like and subscribe, and why not come visit us at igotanoffer.com? There you can find more videos, useful frameworks and question guides, all completely free, and you can also book expert feedback one-to-one with our coaches from Google, Meta, Amazon, etc. Thank you, and good luck with your interview. [Music] All right. [Music]
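
Mark's load balancing answer (weigh network bandwidth and current streams rather than CPU, and stop sending traffic to a server that is close to saturating its network link) can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the `ServerStats` fields, the weights, and the 0.9 cutoff are hypothetical and were not given in the interview, and a real load balancer would collect these metrics from health checks rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    network_util: float   # fraction of network bandwidth in use, 0.0 to 1.0
    active_streams: int   # audio streams currently being served
    max_streams: int      # hypothetical per-server stream capacity
    cpu_util: float       # tracked, but deliberately given no weight below


# Hypothetical tuning knobs, not values from the interview.
NETWORK_WEIGHT = 0.7
STREAMS_WEIGHT = 0.3
NETWORK_CUTOFF = 0.9  # skip servers this close to network saturation


def load_score(s: ServerStats) -> float:
    """Lower is better. CPU is ignored because streaming is network bound."""
    stream_fraction = s.active_streams / s.max_streams
    return NETWORK_WEIGHT * s.network_util + STREAMS_WEIGHT * stream_fraction


def pick_server(servers: list[ServerStats]) -> ServerStats:
    """Choose the least-loaded server that still has network headroom."""
    candidates = [s for s in servers if s.network_util < NETWORK_CUTOFF]
    if not candidates:
        candidates = servers  # every server is saturated; fall back to all
    return min(candidates, key=load_score)


servers = [
    ServerStats("web-1", network_util=0.95, active_streams=100, max_streams=1000, cpu_util=0.10),
    ServerStats("web-2", network_util=0.40, active_streams=600, max_streams=1000, cpu_util=0.80),
    ServerStats("web-3", network_util=0.60, active_streams=200, max_streams=1000, cpu_util=0.20),
]
chosen = pick_server(servers)
print(chosen.name)
```

In this made-up data, a CPU-only balancer would favor `web-1`, which has the lowest CPU but a nearly saturated network link. The sketch excludes it and picks `web-2` instead, even though `web-2` has the highest CPU usage.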

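
The geo-aware placement idea from the end of the interview (keep replicas of a song in the regions where it is played most, so listeners do not have to cross an ocean to reach it) could look roughly like the sketch below. The region names, listen counts, and latency figures are invented for illustration, and `place_replicas` and `nearest_replica` are hypothetical helpers, not part of any real storage API.

```python
# Hypothetical round-trip latencies between regions, in milliseconds.
LATENCY_MS = {
    ("asia", "europe"): 180,
    ("asia", "north_america"): 150,
    ("europe", "north_america"): 90,
}


def latency(a: str, b: str) -> int:
    """Symmetric latency lookup; a region reaches itself in 0 ms."""
    if a == b:
        return 0
    return LATENCY_MS[tuple(sorted((a, b)))]


def place_replicas(listens_by_region: dict[str, int], replicas: int = 3) -> list[str]:
    """Return the regions that should hold a copy, busiest region first."""
    ranked = sorted(listens_by_region.items(), key=lambda kv: (-kv[1], kv[0]))
    return [region for region, _ in ranked[:replicas]]


def nearest_replica(user_region: str, replica_regions: list[str]) -> str:
    """Pick the replica region with the lowest latency from the user."""
    return min(replica_regions, key=lambda r: latency(user_region, r))


# Made-up listen counts for a Korean pop single and a European punk track.
kpop_listens = {"asia": 9000, "europe": 1500, "north_america": 3000}
punk_listens = {"asia": 300, "europe": 7000, "north_america": 2000}

kpop_replicas = place_replicas(kpop_listens, replicas=2)
punk_replicas = place_replicas(punk_listens, replicas=2)
print(kpop_replicas, punk_replicas)
print(nearest_replica("europe", kpop_replicas))
```

With two replicas per song, the Korean pop single lands in Asia and North America while the punk track lands in Europe and North America. A European listener asking for the Korean pop single is served from North America, the closer of the two copies in this invented latency table.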
[Music] hello there and welcome to this system 
design mock interview today we want to show you a   really high quality best-in-class system design 
answer so that you can use it to prepare for   your interviews and use it to know exactly what 
standard you should be aiming for with us they   give us the answers the candidate today I have 
a former engineering manager at Google it&#39;s Mark   Mark how are you doing. I&#39;m doing great Tom thanks 
for having me I&#39;m looking forward to doing this   oh thanks for thanks for coming on it&#39;s 
great to have you before we get started   just want to quickly give people an idea of 
your background as a as an engineering manager   yeah I was an engineering manager at Google for 
13 years and worked on several large-scale systems   with lots of great engineers and so I I hope 
I&#39;ve learned a little bit from them and it&#39;ll   be interesting to see me on the other side of the 
fence of uh of uh doing the being the interview   candidate here so I&#39;m looking forward to it I&#39;m 
sure you do a great job yeah and Mark also is a   coach on our platform so if you need some expert 
feedback ahead of your interviewer Google Facebook   anywhere any tech company then do check him out 
on our platform okay uh if you&#39;re ready Mark   let&#39;s crack straight into it the question I want 
to ask you today is how would you design Spotify okay uh so yeah so uh Spotify the probably one 
of the most popular music streaming services   out there I guess uh so let&#39;s let me let me 
think about that just for a moment [Music] um because there&#39;s lots there&#39;s a lot of aspects 
of Spotify here let me see if I can actually maybe   I&#39;ll just take some notes here so that I can kind 
of I think I think through this if that&#39;s okay and   and can I get some uh make some assumptions so 
I have my notes I&#39;ll just put this right in here   and then if it&#39;s okay I&#39;ll just use this for 
notes as well as sort of the design diagrams   and things like that later on yeah sounds good 
so yeah so let&#39;s yeah so let&#39;s talk about just   some different sort of Spotify okay so I for 
Spotify I can I can think of things I mean   obviously there&#39;s songs music that&#39;s the main 
thing uh there&#39;s you know you have playlists   I mean there&#39;s users there&#39;s uh artists uh I 
guess Spotify even has things like podcasts   that you know you can listen to and things like 
that so there&#39;s a lot to it and so in terms of the   system design for today uh is there is there like 
can we can we constrain this a little bit just   so that we can actually get through a design what 
would you what would we would be good to focus on   here yeah let&#39;s constrain it let&#39;s limit ourselves 
to let&#39;s look at finding and playing music   okay okay so let me uh so use cases finding uh 
and playing music I&#39;m just gonna write it down   like verbatim here great yeah I think enough 
to talk about I think so yeah okay good so uh   if I if I think about that let me let me try 
and ask a little bit or think a little bit about   the use cases a little bit more so my typical use 
case then as a user if I&#39;m signed up to Spotify   and I have an app on a phone uh I am going to 
browse some music maybe I&#39;ll find a particular   search for a particular song but but then the 
most core thing is that I play a song and it&#39;s   you know coming back through my phone maybe my 
car or whatever wherever I might be headphones uh   to listen to it with big broad questions like this 
you need to constrain the problem make it solvable   in the space of an hour and Mark did this well 
establishing the use cases we wanted to focus on   if you&#39;re asked to design an already existing 
system in this case Spotify share what you   know about it with the interviewer they can 
correct you or fill you in if you have any   knowledge gaps so if I think about other 
if I think about metrics here let&#39;s think   about size let&#39;s let&#39;s do some quick quick 
drilling down into into some numbers here   and pardon me while I get this going here so 
in numbers so tell me about how many users   you&#39;re thinking about here for for this 
design yeah let&#39;s save a billion users okay one billion users okay what about number 
of songs yeah I think I saw I think I saw it   somewhere that Spotify has about 80 million 
songs but let&#39;s say we need to have capacity   for 100 million on there okay all right so 100 
million songs a billion users and you know we&#39;re   we&#39;re going to focus on the song piece of it so 
but let me do a little bit of uh of math here   or a little bit of thinking and you can I can 
I&#39;ll just you know double check with you to   make sure that this makes sense but for from what 
I know think about a typical mp3 audio is going   to be about for a typical song is is about five 
megabytes and I mean I think that depends on oh   the encoding of the song how long the song is and 
things like that but uh if that if that if that&#39;s   okay we can just kind of make that assumption 
is that fair yeah that sounds good to me [Music] yeah I mean I think you can encode 
these things and you know really low   quality like 96 kilobits per second all 
the way up to 320 kilobits but right in   the middle 12860 is about maybe about 
that so all right so let&#39;s assume that   and so then let&#39;s do let me just extrapolate from 
this so if you&#39;ve got five megabytes per song   you&#39;ve got 100 million songs so I think uh that 
translates to I&#39;m trying to do my math here so   100 million would be uh going to a thousand times 
that would be a 100 billion and then 100 trillion   would be a million times times five would be 
500 trillion so we&#39;re talking about uh total   audio is something like 500 terabytes which is I 
guess it&#39;s the same thing is half a petabyte of   data and so that&#39;s I&#39;m making that assumption and 
so then depending on how you replicate this data   because you typically want to have these songs in 
multiple you know have multiple copies of these   things so they don&#39;t get lost Etc and if some 
replica is down so you know maybe you do let&#39;s   say oops sorry my keyboard here 3x replication so 
that would mean like one and a half petabytes of   of raw audio data and then and then each song 
in terms of the metadata meaning the you know   the the song title and things like that and 
and artists and all these things the metadata   is probably not very big it might be I don&#39;t know 
you know 100 bytes per song or something like that   per song of metadata and so you might wind up with 
okay so let me do the you have it and that&#39;s uh uh   billion 10 billion so that&#39;s only 10 gigabytes 
of of song metadata it&#39;s not really very much   and even if you say it&#39;s it&#39;s a kilobyte then 
that&#39;s 100 gigabytes so it&#39;s not very much and   let me just double check my math here 100 
million times 100 really 100 bytes yeah I   think that would be 10 gigabytes so even if we 
said if we round it up dramatically and said   100 gigabytes that&#39;s just not a ton of data 
now in terms of users uh user data you might   have you know again like a maybe you have a 
kilobyte oops kilobyte per user metadata I   guess it&#39;s all metadata and so uh that might be 
[Music] times a billion users so that would be a   terabyte of data approximately so just to kind 
of so this is just me doing some quick metrics   quick calculations to get a rough back of 
the envelope of how big these things are   uh so does that make sense so far does 
that seem yeah that will make sense so far   before starting your high level design as marketed 
get some metrics to help you identify any higher   level decisions that might be influenced by the 
scale of the system Mark asked the right questions   about the number of users and number of songs 
I made some rough calculations without getting   bogged down in the numbers what Mark didn&#39;t 
calculate was the traffic how many songs are   streamed per day or per second even that would 
have helped him figure out the scaling of the   web servers how many he&#39;d need and how much 
bandwidth he&#39;d need however since this didn&#39;t   impact his high level design it was okay not 
to go into the details of scaling at this point   okay so let me let me maybe think about 
some basic components here and I&#39;m going   to draw them but I I&#39;ll get into them 
maybe a little bit more in detail later okay so some basic components that I would see so let 
me let me start with something that is my my best   attempt at a mobile phone here so you cut your 
Spotify app and make this smaller so it fits into   the thing so you&#39;ve got your app which is on your 
phone typically and I&#39;m gonna assume here that   we&#39;re talking about the phone app because that&#39;s 
very common uh and so you&#39;ve got your your phone   the phone is going to be talking to uh an 
application server a Spotify application   server somewhere or web server I 
guess and so let&#39;s draw some let   me just put a like over here just say 
there&#39;s a Spotify web server uh and uh   and there&#39;s not going to just be one there&#39;s going 
to be lots of them and assume that that&#39;s a lot   and then of course just because this is a standard 
thing that you do when you have applications uh   you know in the cloud here this is I guess part of 
the cloud you&#39;ve got a load balancer and so let me   draw let&#39;s assume that we&#39;ve got an arrow here 
that&#39;s talking so Spotify app&#39;s talking through   the load balancer ultimately to these to these web 
servers and so those are the those are some some   sort of key components here let me think about 
this a little bit further yeah I think that&#39;s   good so far okay and then of course the most 
important thing or one of the most important   things here let me see if I can find a I know that 
there&#39;s a shape here here we go that&#39;s a that&#39;s a   good shape I&#39;m just going to say call this uh the 
database and let me make this a little bit more   uh let&#39;s see that&#39;s what I wanted okay so there&#39;s 
database and so the web servers are going to be   talking to the database and getting stuff from 
basically reading information writing information   to and from the data reading and writing 
information to and from the database   so those are kind of the very high level 
components in system design interviews good dual   bandwidth communication is important that&#39;s to say 
you need to communicate well via both drawing and   speaking you should practice this thoroughly 
before your interview Mark did this well he   started drawing the components quite early on in 
his answer and this allowed the interviewer to   follow along easily it&#39;s okay to have simpler and 
fewer components at this point in the interview   you can then break it out later in your answer as 
Mark did when he splits his database in two and   I probably am thinking about this I think this 
is obviously oversimplified and I&#39;m not going to   redesign Spotify uh you know the existing full 
application as exists but I do think I can get   a little bit more detailed here by I think I&#39;m 
thinking about the data again because it&#39;s it&#39;s   the data is important so I&#39;m thinking about the 
metadata and the user data and the audio data and they&#39;re different very different types of data 
so I&#39;m going to actually I&#39;m going to split this   database here into two things and I&#39;ll explain 
that or I&#39;ll try to explain them just a little   bit come on there we go make this just a little 
bit more uh so I&#39;m going to split this up and say   that I&#39;m gonna there&#39;s a 
like a song audio database and then there&#39;s going to be like a 
metadata database so this is users uh songs   uh what else I mean are ultimately they&#39;re 
probably be artists and Etc things like that   but we&#39;re focusing on the songs I think in the 
in the music and so on this is also a database   of course song audio database and metadata and 
so I think let me make this a little bit more   uh this Spotify web server is actually going to 
talk to this to The Meta oops the metadata and   uh and the audio database it&#39;s actually going 
to be talking to two databases here yeah that&#39;s   interesting can you can you can you go into a bit 
more detail about why you&#39;d split them into two yeah yeah I&#39;m kind of making that uh just assuming 
that or making that yeah so okay let me let me   to try and I wanted I want to 
answer let me try and think of   this in terms of the technologies that 
I might use I might use something like   Amazon S3 here and I might use Amazon RDS and 
so now let me explain what I mean by that so   the type of data that the actual MP3 files those 
I think of as immutable data they&#39;re just Blobs of   data they&#39;re files they&#39;re five megabyte roughly 
on you know on average files and they&#39;re not going   to change they&#39;re just being streamed they&#39;re 
being you know stored and streamed essentially   and so a blob database which is what S3 is lends 
itself really well towards that and it could be   something like Google drive or there&#39;s other 
you know technologies that are sort of just   document like blob storage uh systems but Amazon 
S3 lends itself really well to that and it scales   uh greatly you can just add more and more and more 
and more and it scales uh linearly which is great   and and so and then you can connect it up to to 
be able to stream the data and things like that so   that type of data and the access patterns for that 
which is really mostly read you&#39;re just streaming   this data you&#39;re never going back and forth and 
writing to it that would be why I would want to   put the songs in that kind of a database now S3 
is great for that works really well but the data   the metadata which is the the users and their 
information and the songs and their information   and uh maybe they get updated maybe you&#39;re 
searching across them and having to do queries   to find songs for a particular genre or artists or 
things like that or you&#39;re trying to update your   users like if I&#39;m playing us playing a song and 
I want to remember where I left off so that when   I continue playing it it remembers that I&#39;m going 
to be modifying the data quite a bit or going back   and forth between that database and doing these 
queries and you can&#39;t really do queries over S3   but you can over like a relational database so 
MySQL would be another option uh I just chose   RDS here really Amazon&#39;s relational database as 
a as a because I&#39;m going with the AWS family here   but yeah so uh let me let me try and try and just 
write something here a little bit more in terms of   in terms of songs you would have things 
like a song ID you would have a song Maybe   maybe a URL like that you&#39;d use for sharing 
uh you would have an artist a genre maybe   there&#39;s a link to album cover and maybe 
there&#39;s the link to the to the audio I   guess so this is too too big to fit so this 
is the type of data that would be stored in   uh let me draw this in here again like in 
that in that database and then just the Raw I mean I almost don&#39;t need to 
write this here so song MP3 would be stored in this database down here so yeah 
so I think that the access patterns the types of   things that you want to do the size of the data 
because the size of the data here I think we said   was going to be well half a petabyte in one in 
one for one replica so S3 lends itself well to   that the data here is going to be you know in 
the you know maybe terabyte range uh something   like that and and it&#39;s going to have lots of 
queries over it maybe some updates and things   like that so that&#39;s why I would separate these 
out is because I think the access patterns the   size of the data and the type of uh the type 
of queries that you need to be able to do over   would be very different does that make sense yes 
does that answer yeah yeah it does thank you yeah   I think that makes sense at this point of your 
answer when you&#39;ve laid out the main components   it&#39;s good to start identifying some technologies 
as Mark did here starting with the database his   design made sense breaking up the databases 
into audio data and metadata but he could have   explained why he was doing this up front rather 
than waiting for the interviewer to ask why in   general try to be upfront about your decision 
making yeah feel free to feel free to continue   so okay so let&#39;s say we have these two databases 
and I&#39;m just trying to think about now the actual   the two the use case sorry the use cases we 
talked about so finding and playing music [Music] so for finding music I think uh the 
the Spotify app you know would need   to uh request a do it do a fine music and so 
the user probably you know either is typing   in like an artist&#39;s name or maybe they&#39;re 
selecting a bunch of filters I actually I&#39;m   not that familiar with that&#39;s how possible that 
is but but let&#39;s say I&#39;m searching for music uh   for an artist with a particular genre and so 
ultimately somehow I&#39;m filling in this query   in my app either by clicking on some things or 
typing some things or a combination and then that   request to find music is going to be sent to this 
web server going through the load balancer to pick   up web server it&#39;s going to go there and then that 
web server is going to do a query issue a query uh   translated query to to the relational database to 
find a bunch of songs and return a list of songs   and return that back up to the app with whatever 
metadata there there is and let&#39;s say that I found   it it finds I don&#39;t know 100 
songs that match my uh request for   uh uh Korean pop music or something like that 
and so I I now get back 100 songs and I can look   at those and now once I&#39;ve gotten that list of 
songs and by the way for for that query where I&#39;m   searching for music I don&#39;t touch the mp3s at all 
I don&#39;t need to go to that database at all I just   need to go to uh to the metadata database and do 
this query and because it&#39;s a relational database   that that query can happen pretty efficiently 
Etc so now I&#39;m I return that information back   and that and then I display that to the 
user does that does that seem does that   make sense in terms of a finding music 
yeah yeah that makes sense that&#39;s good um okay and so now I have a list of songs and 
maybe I maybe there&#39;s like the one of the ones   in that list is like oh yeah that&#39;s what I was 
looking for and so I click on that uh song to   play it I want to I want to hear that hear the 
song so that&#39;s that second sort of part of the   use case that we talked about which is playing 
music so now to play music now this gets a little   interesting and this is I think I&#39;m going to wind 
up maybe changing what I&#39;m thinking about here a   little bit but but let me let me just talk through 
it so I I click on the play button or I click on   the song to play it and that translates into a 
request from the app to the web server to start   playing a song playing the song and it has an ID 
like I mentioned as like an ID in it and so now   the the web server based on that song 
information ID maybe has to go to this   database here to look up the link uh what did 
I call it oh an audio link maybe that&#39;s an Mp3   link I don&#39;t know something like that that&#39;s a 
better term for it I don&#39;t know so the Mp3 link   and so that Mp3 link or audio link would be 
returned and now the Spotify web server has to   go to the database where the actual audio is 
stored in Fetch that fetch that database now   you could imagine streaming that five 
megabytes from this database so chunk by   chunk it comes back and chunk by chunk it goes 
up to the Spotify app in order to do this you   need like a websocket connection so you need a 
kind of a long-standing connection between the   application and this web server so that you can 
chunk that you can send the data back in chunks   but I&#39;m not sure whether I would need to do that 
or whether because it&#39;s only five megabytes that&#39;s   possibly small enough to just fit in memory you 
know read from the uh from S3 and fill fit into   memory and then have that particular web server 
chunk it back from there I think that might be   better because it would eliminate the possible lag 
between the database so you don&#39;t start streaming   the audio until you have it in memory that that 
might be something that I I would consider doing   so now let me think about so that that would 
be does that make sense in terms of like the   the playing like you&#39;re you start playing and 
obviously you can control it you can pause playing   and so on but does that make sense so you you you 
start playing the web server gets the request to   play it maybe gets the information about where 
to go over here to get the the S oops the S3 uh   storage audio storage reads that back it&#39;s only 
five megabytes that should be should not take   it should be almost instantaneous and then it 
starts streaming that back to the application   does that make sense yeah that makes sense uh 
please please carry on yeah so all good so far   Okay so this almost sounds too good to be 
true or sounds too easy I I think it&#39;s got   to be more complicated than that okay so 
one thing I&#39;m just realizing here is uh   probably out of these what did 
we say we said 100 million songs   there&#39;s there&#39;s probably a lot of stuff that is 
uh I&#39;m going to use the term Indie artists or   something like that stuff that few people uh not 
very many people listen to or isn&#39;t very popular   and on the flip side of that I&#39;m thinking 
about like what could go wrong here well if   uh like BTS which is a again a Korean pop pop band 
uh if they let&#39;s say they release a new song and   it&#39;s like hey everybody wants to listen to this 
is super oh have you heard the latest song Etc if   you&#39;ve got all of these web servers all fetching 
that same thing uh that same song and let&#39;s say   you&#39;ve got you know requests like it&#39;s released 
and in the next minute you&#39;ve got I don&#39;t know   uh how do we you said a billion users okay let&#39;s 
say that uh 10 million users all request the song   all at the same time or something like that 
because it&#39;s just been released and they&#39;re   following you know social media is something you 
could easily overload in terms of bottlenecks you   could Pro you could overload possibly uh that 
bucket or that bucket excuse me that particular   uh song uh that file or whatever however it&#39;s 
stored in AWS and it could be stored different   ways I&#39;m using it S3 here you could overload uh 
AWS or S3 there and you could also possibly be   you know like streaming like loading up all these 
these web servers with the same song streaming the   same thing back so it&#39;s a lot of bandwidth Etc so 
a better thing to do and a common thing to do for   stuff like this is to to use what&#39;s called a CDN 
content delivery Network which is like a it&#39;s a   cache so this this helps reduce the amount of load 
on on back ends so let me draw something here let   me just I am going to pick uh what shape am I 
going to pick I&#39;ll just pick randomly something   like this so maybe over here so I&#39;m going to say 
here this is a CDN which is an song audio cache   this is typing in and so what would happen so and 
by the way the Technologies here just to this this   this CDN a Content delivery network is usually 
very very close in terms of number of hops and   network connections to users so what would happen 
is let me draw another arrow here and let me draw   uh an arrow down to here so what what would happen 
here is that the first time that this song is this   new BTS newly released single is is requested the 
web server would read it and stream it like normal   but probably we need to make sure that 
these Spotify web servers are somewhat uh   they&#39;re keeping track of things and they&#39;re 
keeping track of you know which songs are being   requested which ones are hot they probably have a 
heat map the most recently requested songs and so   in that heat map as they see oh this song 
is now ever I&#39;ve seen this requested the   fifth third time in you know in 
a minute or something like that   at that point in time what it might do is it 
might actually uh instead of just streaming   it back to itself or copying it back to itself 
excuse me it might actually load that into the   CDN and so I am trying to I&#39;m going to have to 
draw another arrow I think in order to do that   because these things probably have to talk to the 
CDN and by the way I&#39;m not a super expert on this   area in terms of how this works but there&#39;s some 
connection between this Edge caching this content   delivery Network which is just there for caching 
and these Spotify web servers so these web servers   are going to somehow notify the CDN hey you 
should be pulling this song this BTS latest song   pull that from uh the the MP3 the audio storage so 
the CDN you know would load that up and this now   has to work in conjunction with the application 
so that means that the application when it when   the person decides when I request the song I want 
to play this latest BTS song and I haven&#39;t had it   on my phone yet it&#39;s just been released before 
I go and I talk to the web server I might even   go and check in this in the application might 
actually go and check in this content delivery   Network to see is it there it might also be so 
and if it is there then I just read it from there   the other option of the possibility and again this 
is technology that I&#39;m sort of not super familiar   with is that it goes and asks the web server 
just like it normally would and the web server   sends back a redirect saying hey I don&#39;t have 
it but you should go over here to this content   delivery Network this this Edge cache to to to 
fetch the song it&#39;s much closer to you you&#39;ll   get better performance and you won&#39;t be loading 
me up so much so I&#39;ve got a lot of arrows here   but uh the the the standard flow for a song 
first time around is you know coming around   getting the metadata going around to reading the 
audio data the web server is now keeping track   and realizing oh you know what this is getting 
I&#39;m getting multiple requests for this same song   now I&#39;m going to tell the CDN go and fetch 
this song it&#39;s going to fetch it and so now   the application can I can it can be redirected for 
example to read that audio from there so that&#39;s a   lot of talking but I was realizing that caching 
here seems like it&#39;s a very important Point   in in this in this design and would help with 
the bottleneck of of hot songs if you will   does that is that absolutely yeah by 
the way that&#39;s a really good answer   okay and there&#39;s again in the AWS family here I&#39;m 
going to use the family uh you know obviously if I   were a googlers you know I might be not be using 
this but there&#39;s a technology called cloudfront   and cloudfront is uh basically a Content 
delivery Network there&#39;s a bunch of other ones   uh gosh flask is that what it is I can&#39;t remember 
what the names of these of these Technologies are   but there&#39;s a bunch of other technologies that are 
very similar in nature that allow you to do this   so that would help with the bottlenecks and uh in 
terms of the caching here, you could also... and, just stepping backwards a little bit, this is part of the reason why I think it would be better initially, when the web server is getting the song for the first time, for it to read the entire MP3 into its memory. Because then, if it is getting multiple requests for a particular song, it doesn't have to go to the database; it can just feed it from its memory. That's also a form of caching, of course, so it's got a local cache, essentially, of these songs. You could imagine having a cache that's shared across these web servers, so that could be another optimization. But my point is that if we have a cache in memory in these web servers that stores the songs, then we're offloading the database a little bit. If we then have the cache here at the edge of the network (that's what it's called, edge caching), then we're offloading the web servers to a large extent, because this is optimized for streaming and so on. And I think if we go one step further... actually, I'm sure,
and if I were designing it, I would design the application, because these are on smartphones, to store the songs that are played frequently by the specific user locally, in local storage. So it's another form of caching. Then, if it finds the song in its local cache, essentially the local store here on the phone, you wouldn't even need to go to the network at all, right? So you could actually... I'll even draw it like this: it's like a little cache, a little local storage of songs here. So there's multi-layer caching, I guess, is what I'm saying: multiple levels of caching, ultimately with the goal of providing the best user experience, making sure it's super fast, and then obviously offloading the system to allow it to scale, limit costs, etc. So that's probably enough on caching, and I think I've gone deep a little bit there.

Be ready for the interviewer to ask you about
particular components or different aspects of your system, and to talk about them in detail. It's also good to talk through the flow of a particular use case. This helps make sure the interviewer is clear on it, but it also helps you test your design as you talk through it, as well as helping you identify any potential bottlenecks, as it did with Mark here. Mark explained the purpose of the caching and also where caches might exist in the system: caching wasn't limited to the CDN and the app but also included the web server. He could have talked about caching the metadata as well as the song data.

We've done caching, that's all good. Now I wanted to ask you about load balancing. How would you think about load balancing for this particular app?

Yeah, magic load balancing. It's, yeah, a little
bit hand-wavy here. When I think of load balancing... the load balancer's job here, really, is to make sure that these web servers are not overloaded, so that they're providing good service to the end users. And overloaded can mean many things. Often it's in terms of CPU: you're getting lots of requests coming in, and just load balancing these requests across the servers so that there's roughly the same number of requests on each one, assuming they're all equal, would probably also result in the CPU utilization being about equal. For this application, I might think about load balancing a little bit differently, because we're streaming data. So instead of using CPU as my load-balancing metric to figure out how to distribute the load, I might be looking at, possibly, network bandwidth. Because if a particular web server
is not doing a lot from a CPU perspective but is I/O bound or network bound, meaning it's hitting the limit of its network connection, then I wouldn't want the load balancer to send it more traffic, because it's just going to bog down and you're going to get skips and things like that. So I might make this load balancer aware of multiple metrics, one of them being network, maybe memory for caching, although you can probably limit that, but maybe not CPU for this purpose. It might be outstanding requests or current streams or something like that. There might be some other metrics, but I would want to make this load balancer maybe a little bit smarter than just a typical round-robin load balancing scheme. So I'm not sure if that's getting at what you're asking.

Yeah, yeah, I think that
answers my question.

Load balancing is not one-size-fits-all. Mark mentioned looking at different metrics as a way to load balance across different servers, which was a clever approach. Mark was very open about the fact that load balancing isn't an area he's an expert on, and this is okay: it's fine to be honest when you're asked about things that are at the limits of your knowledge.

Okay, cool, yeah.
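
Mark's idea of a load balancer that looks at network bandwidth and current streams rather than CPU or plain round robin could be sketched roughly as follows. This is only an illustration: the `ServerStats` fields, the weights, and the 90% network threshold are all assumptions made up for the example, not anything stated in the interview.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    """Hypothetical per-server metrics reported to the load balancer."""
    name: str
    network_utilization: float  # fraction of link capacity in use, 0.0 to 1.0
    active_streams: int         # songs currently being streamed
    max_streams: int            # assumed per-server stream capacity


# Assumed tuning values, chosen only for illustration.
NETWORK_WEIGHT = 0.7
STREAMS_WEIGHT = 0.3
NETWORK_LIMIT = 0.9  # treat a server at or above this as network bound


def load_score(server: ServerStats) -> float:
    """Combine the metrics into one score; lower means less loaded."""
    stream_load = server.active_streams / server.max_streams
    return (NETWORK_WEIGHT * server.network_utilization
            + STREAMS_WEIGHT * stream_load)


def pick_server(servers: list[ServerStats]) -> ServerStats:
    """Pick the least-loaded server, avoiding network-bound ones if possible."""
    healthy = [s for s in servers if s.network_utilization < NETWORK_LIMIT]
    candidates = healthy or servers  # fall back if every server is saturated
    return min(candidates, key=load_score)


servers = [
    ServerStats("web-1", network_utilization=0.95, active_streams=100, max_streams=1000),
    ServerStats("web-2", network_utilization=0.40, active_streams=600, max_streams=1000),
    ServerStats("web-3", network_utilization=0.50, active_streams=200, max_streams=1000),
]
print(pick_server(servers).name)
```

Here `web-1` has the fewest streams but is skipped because its link is nearly full, which is the "network bound" case Mark wanted the load balancer to avoid sending more traffic to.
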
All right, are you done with your design, or is there anything else that you want to add?

Yeah, let me think about that. Give me just a moment. I mean, again, I've way oversimplified this, but I guess if I were to also think about this at a global scale... I mean, Spotify is a global app. So, replication: I didn't really talk about replication other than to do the math and say, hey, maybe 3x replication. And replication, right, why do we do this? Well, we do it to make sure that we have data available when there's an outage. So, you know, I might do this, right, to indicate three replicas, but that's a little naive. And the same thing for the metadata, of course. But replicating the data is not just for availability and downtime; you would also want to place those replicas closer to where the users are. And so, from a
global perspective, you might have music that is more local. Maybe European punk rock is listened to more in Europe, and maybe BTS, because it's Korean pop, is more popular in Korea or in Asia. So you might want to have the replicas of those songs, and maybe even the metadata, be represented more locally, so that you can get to it faster: you don't have to cross an ocean to reach the data. So kind of a geo-aware strategy for data placement, and possibly replication. That might be a refinement that I would make to this design to make it just a little bit more performant, effective, etc. So yeah, does that make sense?

Yeah, yeah, I think that's just an interesting point to add.

It's a good idea
to wrap up your answer by referring back to the requirements laid out at the beginning of the interview and confirming that your design meets them. If you have time, it's always nice to think big and take a quick look at the problem from a different dimension that could make it more complex. Mark did this with his geolocation idea. He didn't go into detail, but another interviewer might have explored this further, and it could have opened up a whole new aspect of the design. Overall, an excellent answer from Mark.

Yeah, great job, Mark. I think that was a really good approach, and
yeah, how did you feel? Looking back now that the interview is over, let's say you've walked out of the interview room and breathed a big sigh of relief. How are you feeling it's gone?

I feel pretty good about it. Yeah, it's definitely different being on the other side of things. I'm sure I missed things, and if people are watching this and wind up scheduling a session with me, I'm happy to take the feedback on some things that I missed. I'm sure I've missed things, but yeah, it's fun to do this, though.

Cool, good stuff. Well, yeah, thanks very
much, and thanks, everyone, for watching.

Hello, I really hope you found that useful. If you did, you can like and subscribe, and why not come visit us at igotanoffer.com? There you can find more videos, useful frameworks, and question guides, all completely free, and you can also book expert feedback one-to-one with our coaches from Google, Meta, Amazon, etc. Thank you, and good luck with your interview. [Music] All right. [Music]
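
As a companion to the caching discussion in the interview, the multi-level lookup Mark described (local storage on the phone, then the CDN edge, then the web server's memory, and only then the song store) might be sketched like this. It is a minimal illustration under stated assumptions: `CacheTier`, `fetch_song`, and the tier names are hypothetical, each tier is just an in-memory dict, and filling every tier on a miss is a simplification.

```python
from typing import Callable, Optional


class CacheTier:
    """One cache level, modelled here as a plain in-memory dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, bytes] = {}

    def get(self, song_id: str) -> Optional[bytes]:
        return self.store.get(song_id)

    def put(self, song_id: str, audio: bytes) -> None:
        self.store[song_id] = audio


def fetch_song(
    song_id: str,
    tiers: list[CacheTier],
    load_from_storage: Callable[[str], bytes],
) -> tuple[bytes, str]:
    """Return (audio, source), checking the tiers nearest the user first."""
    for index, tier in enumerate(tiers):
        audio = tier.get(song_id)
        if audio is not None:
            # Hit: copy the song into the closer tiers that missed.
            for closer in tiers[:index]:
                closer.put(song_id, audio)
            return audio, tier.name
    # Miss everywhere: read from the backing store and fill every tier.
    audio = load_from_storage(song_id)
    for tier in tiers:
        tier.put(song_id, audio)
    return audio, "storage"


# Tiers ordered from closest to the listener to furthest away.
tiers = [CacheTier("phone"), CacheTier("edge"), CacheTier("web_server")]
storage = {"song-1": b"mp3-bytes"}  # stand-in for the song store

first = fetch_song("song-1", tiers, storage.__getitem__)
second = fetch_song("song-1", tiers, storage.__getitem__)
print(first[1], second[1])
```

The first request falls through to storage and the second is served from the phone tier without touching the network. A real cache at any of these levels would also need a size limit and an eviction policy, which this sketch leaves out.
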
