Transcript for:
Designing Instagram's Ranking Model

Interviewer: Design a ranking model for the Instagram feed. Okay, thank you so much for being here today, Vikram. Can you quickly introduce yourself to our viewers?

Candidate: Definitely. My most recent experience has been as a machine learning engineer at Meta, specializing in ranking and recommendation systems. Before that I was a software engineer and machine learning engineer for quite a while, across a wide variety of technologies and industries.

Interviewer: Amazing — it sounds like you have the perfect background for the question we're going to talk about today, so let's get right into it. Our question today is: design a ranking model for the Instagram feed.

Candidate: Great, thank you, Angie. I'm going to quickly write the question out so I have something to look at and use as a reference as we go along: design an Instagram feed ranking model. Before we jump into the actual design, what I typically want to do is run through who the actors are, what the system is expected to do, what the business requirements are, and how some of those requirements translate into machine learning models — which is where the thrust of our conversation will be. From what I understand — and please feel free to correct me or refocus our attention on what's important — we're talking about a media platform like Instagram, where you go to the home page and see a set of posts. At the very top I see suggested posts from friends I'm connected to on the platform, and after that there are suggested posts ranked by an algorithm. If I understand correctly, we're concerned with that particular ranking: which posts from people I'm not connected to would be of interest to me and would likely lead to higher engagement — me liking the post, commenting on it, and so on. Am I on the right page about what you intend for us to design?

Interviewer: Yeah, exactly. We're focusing on suggested posts — those might come from content creators the user isn't necessarily following, or from people they're only tangentially connected to.

Candidate: Great. And from a business perspective, I assume the business is interested in improving my engagement as the viewer with this content, so that in some shape or form it contributes to sessions or daily active users on the platform. Am I right about that?

Interviewer: Absolutely. I think it would be interesting to talk about metrics both at a high level and at an individual level that you'd like to measure — improving DAU or sessions, and engagement for the viewer.

Candidate: All right, so that lays out the metrics of interest to the organization. One thing I want to mention: if we're looking to employ a machine learning model to move a metric like that, what we'd actually be working on is at the level of an individual, as opposed to the gross metric, which is aggregated over the entire community or user base.
Candidate: So if we're focusing on the machine learning model, what we'd be designing is an objective function that is in some way aligned with the overarching metric the organization or team cares about. We'll assume we've teased out an objective the model can optimize that contributes to the actual metric of interest to the business — the ML objective is aligned with the business motive, which is typically something like DAU or sessions. For an Instagram user specifically, the ML objective would be something along the lines of viewing the content and engaging with it — liking, commenting, and so on. Those are the activities the machinery we build will act on. This is just a quick rundown of the context for the functional requirements.

Interviewer: A quick clarification question: when you say the ML objective is aligned with the business metric, are you saying you're directly optimizing those metrics, or optimizing something that should be correlated with them?

Candidate: The objective is correlated with the eventual business metric, like DAU.

Interviewer: So you're not directly optimizing DAU or number of sessions, for example.

Candidate: Exactly — we wouldn't directly attempt to optimize DAU, because that's a daily active user count, a gross aggregate. We're looking at an individual-level objective we can act on for a particular person, such that it contributes to DAU. Typically the way this is done is that data scientists study the problem and find individual activities or actions that are contributive to the overall metric of interest.

Interviewer: Okay, that makes sense.

Candidate: The other thing I want to pay attention to, since we're trying to get to machinery that can predict and contribute to this overall metric, is the non-functional side — the scale of the system. It should be scalable, and it should be available. To make that concrete, some estimations help. For a globally present social platform like this the numbers can be quite large: the daily active user count is possibly in the range of 500 million or so for Instagram, perhaps even more. I'm going to pause there and see whether you have any questions — I'm just enumerating the non-functional items and estimations that come to mind.

Interviewer: That makes sense to me so far, and I assume we'll get to the functional requirements in more detail later.

Candidate: Let me go back and see what more needs to be added on the functional side.
Candidate: On the functional side, I want to make sure we get the specific objective of the machinery right. We want to build an ML model that improves individual engagement — that's the core of our functional requirement — and here engagement could be viewing the post, liking it, or commenting on it. The non-functional requirements, as stated, are that the system be scalable and available, and the estimation was there to support our understanding of the non-functional side. Is there anything more we need to establish before moving on to what's involved in building out the full ML pipeline — training and so on — or should I proceed?

Interviewer: I'm curious, for the non-functional requirements — or maybe you'd put this under functional — whether you'd think about incorporating any sort of tooling for this ranking system.

Candidate: That's a good point, thank you for bringing it up. Yes, we typically want analytics tooling, particularly around debuggability and monitoring, and more generally MLOps-related alerts and warnings — for example, operational alerts when certain thresholds are exceeded, such as the number of poor features, or when feature coverage drops below an acceptable level. So we definitely need MLOps tooling. What we're probably not interested in, at least based on the problem description so far, is how to monetize something like this. A lot of social media is monetized through ads embedded into the feed, like the Instagram feed, but we're not directly looking at that — it could become a requirement should we choose to address it at some point.

Interviewer: I'd agree that's out of scope for now.

Candidate: And in terms of analytics, a few things that may be of interest: who the most popular content creators are, how they're distributed geographically, what the trending posts are — things an Instagram administrator or a data scientist would care about. Let's not get bogged down in that, but those are the kinds of things that are typically of interest.

Interviewer: Perfect, this looks like a great place to start. I'm curious where you'd go next, now that you've gathered the basic requirements.
Candidate: What I want to look at more deeply now is what features are available and how we build the machinery to accomplish this task. But before I do that, let me quickly lay out my thinking on a feed ranking system like this. Typically such a system involves at least three phases — we're essentially building a pipeline, not a single model. The pipeline has three stages: a candidate generation stage, a ranking stage, and finally a post-processing stage. This is how things are typically done in industry right now. For each stage, the input is the output of the previous stage: for candidate generation, certain machinery has to have been constructed and be ready for us; the ranking stage consumes the output of candidate generation; and finally we apply post-processing, which typically covers things like fairness, diversity, and freshness of content. Before I go through the data and the categories of features we'd look at, is there anything you'd like to discuss or any comments you want to make?

Interviewer: Just one quick question. For the post-processing stage you mentioned fairness and diversity — overall, is post-processing meant to change the list of candidates you've ranked, or are you adding things in? What does it consist of exactly?

Candidate: Good question. Post-processing typically consists of reranking or reshuffling the candidates we've already found. For example, if I have 10 candidate posts shown on my feed and the 10th candidate is something that's needed for the listing to be more fair in some sense, we might boost it up to position three. Of course I'm assuming 10 candidates just for illustration; typically candidate generation produces around 1,000 candidates, the ranking system ranks them, and then we post-process some of them in place to give them more or less visibility. So the short answer is: we're not generating new candidates at post-processing, we're reshuffling their order or presence.
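To make the three-stage funnel just described concrete, here is a minimal Python sketch of the data flow only. The stage implementations are passed in as callables — retrieve_candidates, score, and apply_business_rules are assumed, illustrative names rather than a real Instagram API — and in production each stage would be its own service.

```python
from typing import Callable

def rank_feed(
    viewer_id: str,
    retrieve_candidates: Callable[[str, int], list[str]],    # stage 1: viewer -> ~1,000 candidate post ids
    score: Callable[[str, str], float],                       # stage 2: (viewer, post) -> predicted engagement
    apply_business_rules: Callable[[list[str]], list[str]],   # stage 3: reshuffle for fairness/diversity/freshness
    num_candidates: int = 1000,
    feed_size: int = 25,
) -> list[str]:
    # Candidate generation: narrow the full post corpus down to ~1,000 items.
    candidates = retrieve_candidates(viewer_id, num_candidates)
    # Ranking: score every candidate and sort by predicted engagement, descending.
    ranked = sorted(candidates, key=lambda post_id: score(viewer_id, post_id), reverse=True)
    # Post-processing: reshuffle in place (no new candidates) and cut to the feed size.
    return apply_business_rules(ranked)[:feed_size]
```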
Interviewer: Okay, cool, sounds good. So I'd love to hear more about this data pipeline then.

Candidate: In general we want to gather features for the viewer — the Instagram viewer — and for the Instagram post. We also want features describing the viewer's interaction history with posts and content, and additionally the viewer's interaction history within their social network — the connections they have with other viewers, meaning not necessarily content creators but the people they're connected to on the network. Those are the overarching categories of features. Within each category, what's especially useful are aggregated and delayed features. An example of an aggregated feature would be my engagement over the past 7 days — say, the number of likes in the past 7 days. I picked seven out of thin air; you can conceivably compute this over varying windows: one day, two days, five days, seven days, 28 days, and so on. We also want delayed features, which capture historical interactions: what was my interaction or like count 7 days back, 14 days back, and so on. Together these hopefully capture quite a bit of richness about the viewer and the trajectory they've gone through.

Interviewer: Yeah, and I like the idea of the aggregated features because they essentially smooth interactions over time.

Candidate: Absolutely. Then, similarly for the post, we can have features along the same lines. The post itself may not directly contain many details, although it could; we're also interested in the post creator's features — how well connected they are, the number of people who follow their content, and so on. A post can typically be parsed into video, audio, and text, and we can gather embeddings for each of those portions, which is helpful as well. And then, paralleling what we said for the viewer, there's the historical engagement of viewers against this post: the number of engagements or likes it has accrued in the last three days, four days, and so on — aggregated and delayed features for the post itself.
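As an aside, here is a minimal pandas sketch of what the aggregated and delayed (lagged) features described above might look like for a single viewer. The schema — one row per day with a daily like count — and the numbers are illustrative assumptions, not a real Instagram feature store.

```python
import pandas as pd

# One row per day for a single viewer; the daily like counts are made up.
daily = pd.DataFrame(
    {"likes": [3, 0, 1, 4, 2, 0, 5, 1, 0, 2]},
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)

# Aggregated features: rolling engagement sums over a few window sizes.
for window in (1, 7, 28):
    daily[f"likes_last_{window}d"] = daily["likes"].rolling(f"{window}D").sum()

# Delayed (lagged) features: the same signal as it stood 7 and 14 days back.
# Rows are daily here, so shifting N rows is the same as looking N days back.
for lag in (7, 14):
    daily[f"likes_{lag}d_ago"] = daily["likes"].shift(lag)
```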
Interviewer: Can you give me slightly more detail about what these embeddings are? Do they come from some type of pre-trained model, or are they collected from other sources of data?

Candidate: Embeddings for the video, audio, and text portions can be collected from pre-trained models — running the posts through a pre-trained model gives us the embeddings. Typically they're vectors of a size we've chosen; they could be in the hundreds depending on what you pick — vectors of 768 floating-point numbers are common — but the sizes depend very much on the architecture and size of the models used to create them. Does that answer your question?

Interviewer: Sure. So the plan is to get these from pre-trained models?

Candidate: That's one source of embeddings. We could also generate embeddings within the machinery we build ourselves, as a consequence of the architecture we choose — I'll get to that in a bit when I describe the two-tower model I'm thinking of using for this problem.

Interviewer: Okay, sure.

Candidate: So that's gathering features, which are one part of the data. What else happens? When a person interacts with a post, that's when we generate labels, and labels are a delayed signal: an impression happens today on my Instagram feed, but I may choose to react to it a day later, maybe a week later. So the labels — whether I liked it, forwarded it, commented on it — are only available after the fact, and that's another part of the data that has to be captured; I'll separate it out from the features. These can be binary labels: zero for no interaction, one for a positive interaction, meaning I actually engaged with the content through that particular action. What's typically difficult is gathering the zero label — how do we know that someone hasn't reacted to a post? The only way is to impose a threshold on ourselves: a window up to which we gather positive labels, after which we discount any further interactions and assign a zero. If we want to look at this a bit more formally, what we're gathering per impression is a tuple (x, a, r, p), where x is the vector of features, a is the action that was recommended, r is the reward, and p is a probability. We may not immediately see why we need this tuple at the data-gathering stage, but it becomes evident once we look at how the model is deployed and served in real life. The reward is closely tied to the action taken by the viewer: if I engage, it's a one, otherwise a zero. The probability p has to do with serving time: once we've built the machinery, we can record the probability with which it recommended this particular item — the predicted likelihood of this feature vector resulting in an engagement. I'll skip the details of that for now so we can move on to what candidate generation, ranking, and post-processing mean for serving, as well as for model training and model building, if that's okay.
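For concreteness, here is a small Python sketch of what logging that (x, a, r, p) tuple and assigning the delayed binary label might look like. The field names and the 72-hour label window are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

LABEL_WINDOW_HOURS = 72  # assumed cutoff; an impression with no reaction by then is labeled 0

@dataclass
class LoggedImpression:
    features: dict      # x: viewer, post, and interaction features at impression time
    action: int         # a: id of the post that was recommended / shown
    reward: int         # r: 1 if the viewer engaged within the window, else 0
    propensity: float   # p: model's probability of recommending this post, logged at serving time

def delayed_label(engaged: bool, hours_since_impression: float) -> int | None:
    """Positive as soon as an engagement arrives; negative only once the window closes."""
    if engaged:
        return 1
    if hours_since_impression >= LABEL_WINDOW_HOURS:
        return 0
    return None  # still inside the window, so the label is not yet known
```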
Interviewer: Yeah, of course. So let's talk about the model architecture then.

Candidate: Okay. What we need to do at serving time is generate candidates, and for a recommendation problem like this the way to do that is approximate nearest neighbor search. To compute approximate nearest neighbors we need embeddings for viewers and posts — again, I'm talking about the candidate generation phase of the pipeline. So to compute the top 1,000 candidate posts for the current viewer, we use an approximate nearest neighbor algorithm, and to do that we first need to generate embeddings — this is where the earlier embedding conversation ties back in. Once we've identified those 1,000 candidate posts for this particular viewer, we can rank them by the probability of engagement they would entail. The task now is to build machinery that lets us learn the embeddings, and there are two techniques commonly used: one is collaborative filtering, the second is a two-tower network. I'm going to spend a few minutes on each, and I'll tell you which one I'd personally prefer and why. Let me open a new board so the text and diagrams aren't mixed together.

For the collaborative filtering technique, what we're essentially constructing is a matrix: post items along the columns and viewers along the rows. For example, one viewer might be John, another Amy; one post item might be post #7, another post #9. What we capture in this matrix is a one or a zero for each viewer-post pair.

Interviewer: That indicates a signal of whether there was engagement?

Candidate: Exactly — it indicates whether or not there was engagement. This is part of my attempt to elaborate on the collaborative filtering way of learning an embedding. As you can imagine, this is already a sparse matrix, because the number of engagements is tiny compared to the number of available posts: I might engage with a handful at best on a daily basis, or even accrued over time. We can also extend the matrix with side information. To the right of each viewer's row we append viewer features — across John's row, for example, John's age, his location encoded as numbers we can use, whatever fundamental features of the viewer we have, or historical aggregated features such as the number of posts viewed or liked in the past week. Below each post's column we append post features — for post #7, say, the number of likes in the past week, maybe 100, and the number of comments, maybe 20, and so on. So there is a long string of post features appended to each column, and a long list of viewer features appended to each row.
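Here is a tiny Python sketch of that interaction matrix as a sparse structure — a SciPy CSR matrix built from (viewer, post, engaged) triples. The indices and values are toy data chosen only to illustrate why the matrix stays so sparse.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Observed (viewer, post, engaged) triples; the ids and values are toy data.
viewer_idx = np.array([0, 0, 1, 2])   # e.g. John = 0, Amy = 1, ...
post_idx = np.array([7, 9, 7, 3])     # e.g. post #7, post #9, ...
engaged = np.array([1, 0, 1, 1])      # 1 = engaged, 0 = shown but no engagement

# Pairs that were never shown are simply absent, which is why the matrix stays
# extremely sparse at the scale of a real feed.
A = csr_matrix((engaged, (viewer_idx, post_idx)), shape=(3, 10))
```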
Candidate: Now, call this matrix A. It can be decomposed into a user matrix U and an item matrix V, with A approximated by U · Vᵀ — that's the essential nature of collaborative filtering: we approximate the matrix A by factorizing it into two matrices of smaller dimension. Looking at it closely, A has dimension M × N, where M is the number of users and N is the number of posts we've gathered to construct the table, while U has dimension M × D and V has dimension N × D, so Vᵀ is D × N. What we've effectively done with this way of looking at the problem is find a low-dimensional, D-dimensional representation of users and of post items, and in the act of doing that we've approximated the interaction matrix. That's the idea behind collaborative filtering; in practice you fit it with regularizers and so on. I'm going to stop here before I jump into the two-tower network and how we could learn embeddings from that — do you have any questions or corrections?

Interviewer: Not for now. In summary, it seems the intuition is that you're trying to approximate all of the interactions via the similarity between the viewer's D-dimensional factor and the post's D-dimensional factor, right?

Candidate: That's correct.
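To make the factorization A ≈ U Vᵀ concrete, here is a minimal NumPy sketch that fits the two factor matrices with plain SGD and L2 regularization over the observed entries. A real system would use alternating least squares or a dedicated library; the data here is random toy data and the hyperparameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
num_viewers, num_posts, d = 100, 500, 16
U = 0.1 * rng.standard_normal((num_viewers, d))   # viewer factors, M x D
V = 0.1 * rng.standard_normal((num_posts, d))     # post-item factors, N x D

# Observed interactions as (viewer index, post index, engaged 0/1); random toy data.
observed = [(rng.integers(num_viewers), rng.integers(num_posts), rng.integers(2))
            for _ in range(5000)]

lr, reg = 0.05, 0.01
for _ in range(20):                        # a few passes over the observed entries
    for i, j, label in observed:
        u_i, v_j = U[i].copy(), V[j].copy()
        err = label - u_i @ v_j            # residual on this entry of A ~= U V^T
        U[i] += lr * (err * v_j - reg * u_i)   # SGD step with L2 shrinkage
        V[j] += lr * (err * u_i - reg * v_j)
```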
Candidate: Now let's quickly talk about the two-tower network, which is generally preferred in industry at the moment — especially given that collaborative filtering requires some sort of alternating-least-squares mechanism for the matrix factorization, and with very large data that may not always be feasible or appropriate. In a two-tower network we essentially have two towers of a neural net. Let me quickly draw the outline: there's a viewer tower and a post-item tower. Into the viewer tower we feed the viewer's features — the sparse and the dense features of the viewer — and into the post-item tower we feed the post-item features. What comes out of each tower is an output we restrict to size D, so there are D output units on both the viewer side and the post-item side. Finally we take the dot product of those two outputs and pass it through a sigmoid, which gives a probability-like number between zero and one. That's the essential idea behind a two-tower network: feed in a set of viewer features and a set of post features, run them through two independent towers, generate D-dimensional outputs, take the dot product, apply a sigmoid, and the output is a number between zero and one. We compare that against the label — a zero or a one — using binary cross-entropy as the loss function, and adjust the weights in these networks according to the loss we encounter. Progressively the network learns better and better weights so as to approximate the observed labels, and in having done that we've learned two independent towers that can each be used on their own to produce the embedding for a new user or a new post. To train this, as I said, we'd use a binary cross-entropy loss and any standard machinery for training a deep network — stochastic gradient descent, or any of the optimizers such as Adam. That gives us embeddings as an outcome: when we pass in an individual user, we get an embedding out of the viewer tower.

Interviewer: Okay, that makes a lot of sense, and I like that the high-level intuition of this approach is very similar to the last one: we're coming up with D-dimensional vectors, or embeddings, representing the viewer and the item, and learning the similarity between the two.

Candidate: Exactly.
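As a sketch only — the layer sizes, feature dimensions, and training-step wrapper are placeholder assumptions, not Instagram's actual setup — here is what that two-tower model might look like in PyTorch, with the sigmoid folded into BCEWithLogitsLoss for numerical stability.

```python
import torch
import torch.nn as nn

D = 64  # shared embedding dimension for both towers

def make_tower(in_dim: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, D))

viewer_tower = make_tower(in_dim=120)   # flattened viewer sparse + dense features
post_tower = make_tower(in_dim=200)     # flattened post / creator features

params = list(viewer_tower.parameters()) + list(post_tower.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()        # sigmoid folded into the loss for stability

def training_step(viewer_x: torch.Tensor, post_x: torch.Tensor, labels: torch.Tensor) -> float:
    # Dot product of the two D-dimensional embeddings is the logit for "will engage".
    logits = (viewer_tower(viewer_x) * post_tower(post_x)).sum(dim=-1)
    loss = loss_fn(logits, labels.float())   # labels are the 0/1 engagement labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# At serving time each tower is used on its own: viewer_tower(viewer_x) gives the
# viewer embedding that feeds approximate nearest-neighbour candidate retrieval.
```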
Candidate: The next thing I want to talk about, in the interest of time, is how we train these models. I'll set collaborative filtering aside for now. For the two-tower model, what we need to be careful about is having adequate samples of both positive and negative labels over viewer/post-item interactions. We can use several techniques for that: if we have, say, 500 million positive examples, we want to do negative sampling and fetch a comparable number of negative samples — meaning no-interaction samples — so that our machinery learns embeddings that predict engagement likelihood, as opposed to the inherent bias in the data. That's a key step in preparing the data for training.

Interviewer: When you say negative samples, do you mean simply posts that haven't been interacted with by the user yet, or are you referring specifically to posts that were presented on the user's feed but that the user didn't interact with?

Candidate: Posts that did not get a positive interaction — no likes, no views, no whatever engagement metric we're using.

Interviewer: Is that conditioned on the post having shown up on the user's feed before?

Candidate: Yes, these are all conditioned on the post having been shown on the user's feed, because that's what we want to estimate: is the person likely to engage, given that the post was shown?

Interviewer: Yeah, that makes sense.

Candidate: We know machinery like this has reached good maturity in training when we look not only at the accuracy metrics but at the AUC — remember, this is now a classification problem, since we're predicting a zero-or-one label, so we can plot the area under the ROC curve and use it as a gauge of how well the model predicts the labels. Typically something like 0.8 is used as a gating number for whether the model is good enough; I've seen this vary across groups and teams anywhere from 0.6 to 0.8, but it definitely has to be greater than 0.5, because 0.5 is the random line — like flipping a coin. Having done this, let's go back to the pipeline. How do we generate candidates now? It's simple: given a particular user — say a new user coming onto the system — we run them through the viewer tower of the two-tower network and get an embedding for that user. Then we do approximate nearest neighbors — a nearest-neighbor algorithm much like KNN, k-nearest neighbors — over the post-item embeddings and pick the 1,000 nearest neighbors; those 1,000 neighboring post items form our 1,000 candidates. Then we run the ranking: we go back to the machinery we built, take the current viewer and each of the 1,000 candidates one at a time, and produce a prediction for each — the higher the prediction, the more likely this person is to engage. We use that number to sort the candidates in descending order, and that forms our ranking. Then we go into post-processing, where we can apply hand-edited rules saying that certain types of content, certain viewers, or certain posts have to be promoted higher or lower within the result set.
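Here is a compact Python sketch of that serving path: embed the viewer, retrieve the nearest post embeddings, sort by score, then apply one hand-written boost rule. Brute-force dot-product search stands in for a real ANN index (FAISS, ScaNN, or similar), and post_embeddings and boost_ids are illustrative assumptions rather than real data.

```python
import numpy as np

def serve_feed(viewer_emb: np.ndarray, post_embeddings: np.ndarray,
               boost_ids: set[int], k: int = 1000, feed_size: int = 25) -> list[int]:
    # Candidate generation: brute-force dot-product search here stands in for a
    # real ANN index; take the k posts scoring highest against the viewer embedding.
    scores = post_embeddings @ viewer_emb            # shape (num_posts,)
    candidates = np.argsort(-scores)[:k]

    # Ranking: sort the candidates by predicted engagement, descending. In
    # production the ranker would re-score them with the full model and richer features.
    ranked = sorted(candidates.tolist(), key=lambda pid: -scores[pid])

    # Post-processing: reshuffle in place, e.g. pull posts flagged for
    # fairness/diversity toward the front, without adding any new candidates.
    boosted = [pid for pid in ranked if pid in boost_ids]
    rest = [pid for pid in ranked if pid not in boost_ids]
    return (boosted + rest)[:feed_size]
```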
Interviewer: Okay, so now that you have this final ranking, how do you feel about moving on to evaluation of that ranking?

Candidate: There are a few things to do. One, as I mentioned, is looking at metrics like the AUC curve during training itself. While preparing the data we also set aside validation data, and we check how the model performs on that validation set on the same metrics — AUC, ROC curves, and so on. That's one gate before we actually unleash the model. The next step is to run an A/B test on a limited, controlled fraction of users in the real production system: a portion of controlled users is shown the existing system, and a similarly sized portion is shown the new model built on the candidate generation, ranking, and post-processing pipeline. We then look at the outcomes for the users who went through the new treatment to see whether they engage more than the users who did not — that forms the basis of our A/B test. In terms of metrics, we of course look at the engagement numbers themselves, but this is also where we tie back to the business objective: do the people who went through this new mechanism yield higher daily activity? If we're looking at DAU, for example, do they yield higher daily active usage? If so, this is a good model to promote to production; otherwise we don't. Beyond DAU, we also want to look at safeguard metrics in our system. For example, a safeguard metric might be whether we see a higher number of reports — there are posts that people report, or posts that cause a viewer to block the content creator, and things of that nature. If those numbers are elevated or increasing, there's possibly a skew or bias in this model, and it may not be a good candidate to promote to production. Does that answer your question?

Interviewer: Yeah, totally. Okay, I think this is a great place to pause, because I thought this was a really fantastic solution. I'd love to hear from you: if you were the interviewer, what do you think went well, and what would you have wanted to change or add?

Candidate: As an interviewer I would have loved to see a more rapid pace at which we arrived here. The second thing is the actual serving pipeline — what we would do in real life to serve this model out to production, and how we could do some form of online learning so the model keeps learning from the newer data that comes in, because non-stationarity of user interactions is a real problem in this industry. Overall I think we covered the basics of the functional requirements; there are non-functional items we couldn't get to, things like how to ensure scale and how to distribute the computation — the learning, and perhaps even the serving — across multiple servers.

Interviewer: And that's already a very difficult part of deploying a model, right? A lot of times when we think about machine learning we're thinking just about model training and evaluation, and we haven't even gotten to some of the more complicated parts of engineering a large-scale machine learning system. But in general I was very impressed by how thorough and well thought out your solution was. I really appreciated that you thought through so many details: how after the ranking there's extra post-processing where you might adjust the ranking to account for fairness or diversity; features that aren't just from the current point in time but are time-smoothed or lagged; and different types of evaluation metrics beyond just accuracy — whenever you have a possibly class-imbalanced system it's great to check things like AUC-ROC, and also safeguard metrics, which are important to think about in a real-world scenario. One more thing I might add, if we had time, would be the cold-start problem: if a user has just joined Instagram and you don't know anything about their preferences or what kinds of posts they might engage with, how do you decide how to generate and rank candidates for them? What do you think?

Candidate: Those are definitely valid points. Specifically on cold start, what's typically done is to supply those viewers with posts that are popular and see how they react to them — that's what I've typically seen done in the industry.
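A minimal sketch, under assumed data structures, of that popularity fallback: recent_engagement_counts (post id to engagement count over a recent window) and the per-viewer history lookup are hypothetical, and the personalized path is left as a stub.

```python
def candidates_for(viewer_id: str,
                   interaction_history: dict[str, list[int]],
                   recent_engagement_counts: dict[int, int],
                   k: int = 1000) -> list[int]:
    if not interaction_history.get(viewer_id):
        # Cold start: nothing logged for this viewer yet, so serve the posts with
        # the most engagement over a recent window and start observing reactions.
        return sorted(recent_engagement_counts,
                      key=recent_engagement_counts.get, reverse=True)[:k]
    # Warm path: the personalized two-tower + ANN retrieval described earlier
    # would go here; an empty list keeps this sketch self-contained.
    return []
```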
Interviewer: Yeah, that's a great way to do it. Okay, thank you so much for teaching us so much today about ranking. I think we've learned a lot from this interview, and hopefully we'll have some more upcoming content on ranking and machine learning. Thank you for being here, Vikram, and thanks everybody for watching.