Transcript for:
Improving Restaurant Rankings for Deliveroo

Hey everyone, thanks for coming. You had a choice of two other rooms and you chose this one, so I really appreciate that. And thanks to the organizers for the opportunity to speak; it's genuinely a privilege.

I'm Johnny, a data scientist at Deliveroo, and I work in the merchandising algorithms team. Today I'll be talking about what we do, and if you've looked at the title you'll have guessed it's something to do with ranking restaurants. In a little more detail: I'm going to introduce Deliveroo, for those of you who don't really know what it is, and explain why we started this merchandising algorithms team. Then I want to spend some time on our approach to ranking: how it works, how we framed the problem, and the modelling strategies we're using. Then I want to talk a bit about the tools, because we don't just build models, we also have to deploy and serve these recommendations; so how have we done that, which tools have we chosen, and what are the processes? And finally I want to finish with a section on the lessons we've learned. We've deployed a few models, run a few experiments, and made some mistakes, and I want you all to learn from them. That's basically what I want to go through.

So, an introduction. For those who don't know, you can think of Deliveroo as a platform that connects consumers to the local restaurants around them, via an app where you can look for restaurants and order, and then we deliver: we have a fleet of riders, and a rider will go and pick up the order. For example, say you want to order from Wagamama. Wagamama doesn't have its own delivery fleet, so normally you'd have to go to the restaurant, order, and eat there, but Deliveroo lets you order and enjoy the food from the comfort of your own home. You go to the app, scroll through, find Wagamama, click order, and we'll send a rider off to the restaurant to pick up your food and deliver it to your door. That's the idea. You can probably tell from this slide that it's a three-sided marketplace: we have thousands upon thousands of restaurants and riders, and even more consumers, in over 300 cities across 14 countries. It's a big operation, a huge network.

If I'm a consumer and I want to order from the app, this is what I might be presented with: on the left is what you might see in the app, and on the right is the web. Basically you've got a list of restaurants, and our hypothesis is that, depending on the order of the restaurants you see, a consumer might behave differently: you may or may not order depending on what you see in that list. So, enter the merchandising algorithms team. We started with an initial goal: why don't we try to present the restaurants that are most relevant to the consumer first? We believe that's what's best for that consumer, and they might order more. So that's an introduction to Deliveroo, the team, and what we're trying to do.

Now I want to spend some time talking about our approach to ranking these restaurants. I've already mentioned the objective: a user opens the app and there's a set of local restaurants. Given that list,
how can we rank them optimally? Optimal can mean many different things; in our case we want it to mean relevant to the user. So the question is how we quantify this. It's all very well having an objective, but can I look at some numbers that help me understand whether we're actually achieving it, whether we're showing the most relevant restaurants to users?

Two of the metrics we look at are order volume (are we getting more orders in total over a period of time?) and session-level conversion. You can think of session-level conversion as: of the total number of sessions, the total number of times someone opens the app, what proportion actually result in the consumer ordering? The idea is that if we increase these two metrics, order volume and session-level conversion, that's likely because we're improving the relevance of the lists we're ranking. So this is what we want to improve.

So how can we frame the ranking problem such that we can move those metrics? Here's how we might look at a session. In a session you have a list of restaurants, and that can be hundreds of restaurants: if you're in an area with densely packed restaurants, in a big city like London, you're going to see hundreds of them. And this "converted" column is basically: did you order or not? Where there's a zero, the user didn't order from that restaurant; where there's a one, the user did. It's essentially a target variable. That's just one session, but a different user in a different location can open the app as well, and that's a second session. For our training data we take thousands upon thousands of these sessions, and that's what forms our target variable.

We decided to frame this ranking problem in a classification framework, with a pointwise approach. Why can we frame it as classification? For each restaurant in that list, we ask: what is the probability that the user will purchase from that restaurant? That's independent of any other restaurant in the list; we just look at that one restaurant, and that's what I mean by pointwise. And it's classification because, as we saw, the target variable is either 0 or 1: did you buy from it or not? We can then run a model (I'll talk about that in a bit), it outputs a score, some probability, and we just rank in descending order, and that's the ranking that's most relevant to the user. And because we're looking at this in a pointwise classification fashion, we can use a logarithmic loss function, just as we would in any other standard classification problem. This is how we're now able to train models to tackle our ranking problem, but in a classification framework.
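To make the pointwise framing concrete, here is a minimal sketch of that training-and-ranking loop, assuming scikit-learn, with made-up file and feature names; it's an illustration of the idea, not our actual pipeline:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per (session, restaurant) pair;
# "converted" is 1 if the user ordered from that restaurant, else 0.
train = pd.read_csv("sessions_train.csv")
features = ["popularity", "eta_minutes", "rating"]  # illustrative names

# Pointwise: each restaurant is scored independently of the rest of its list.
model = LogisticRegression()  # fit by minimizing logarithmic loss
model.fit(train[features], train["converted"])

# At serve time, score every restaurant in the user's list and
# rank in descending order of predicted purchase probability.
candidates = pd.read_csv("session_candidates.csv")
candidates["score"] = model.predict_proba(candidates[features])[:, 1]
ranked = candidates.sort_values("score", ascending=False)
```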
So I've discussed the target variable and how we might train models, but we need some features to go into them. We might want to look at all the different factors that could encourage a user to purchase from a restaurant. It might be the popularity of the restaurant, since users may purchase more from more popular restaurants; or the estimated time of arrival, because perhaps a user really needs food within 20 minutes, so a 45-minute wait isn't suitable for them; and perhaps they look at the restaurant rating. There are loads of different factors, and we can stick them into some box, call it F, run them through it, and it outputs some probability score, and then we rank in descending order. F can be as complicated as we want: a simple heuristic, or a crazy dense neural network, or whatever.

But at Deliveroo our approach is to start simple and then iterate and improve, so what we initially began with was just a simple heuristic. F was not a machine learning model; it was just a mixture of the popularity rank and the estimated-time-of-arrival rank. Why would we do this? It's simple, it's intuitive, and we can interpret the results much more easily. But also, we have to serve these results: we have to build an entire infrastructure to serve these rankings, and with a whole bunch of moving parts there's a lot that can go wrong. We wanted to minimize the amount that could go wrong, and not have to concentrate on the complexity of the model while we built out the infrastructure. So we started with a simple heuristic that was good enough to get going, focused on getting an end-to-end pipeline and the infrastructure sorted, and then started moving on to more complex models: first logistic regression models, and now we're also looking at more complex neural network architectures, which I'll go over a bit later.
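To give a flavour of what that kind of starting point can look like, here is a sketch of the idea, not our exact formula; the column names and the blending weight `w` are hypothetical:

```python
import pandas as pd

def heuristic_rank(restaurants: pd.DataFrame, w: float = 0.5) -> pd.DataFrame:
    """Blend popularity rank and ETA rank; lower combined score ranks higher."""
    pop_rank = restaurants["popularity"].rank(ascending=False)  # most popular first
    eta_rank = restaurants["eta_minutes"].rank(ascending=True)  # fastest first
    return (restaurants
            .assign(score=w * pop_rank + (1 - w) * eta_rank)
            .sort_values("score"))
```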
So where have we got to? We've discussed the target variable, the features, and potentially the models we'll use, and we saw that we can take the scores, rank in descending order, and Bob's your uncle, we have a list. But is that list any good? We have to evaluate it. Now, we framed this as a classification problem, so in theory we could use evaluation metrics like accuracy or precision, but when we think about the user experience, they're not looking at a single restaurant, they're looking at an entire list. So we decided it would make more sense to use an information retrieval metric for ranking, which takes the list itself into account. That's one reason we look at information retrieval metrics.

The other reason: remember I said at the beginning that we wanted to increase the online metrics, session-level conversion and order volume. It's very difficult, practically impossible, to go back through the training data and say: this user didn't order in a session, but if my model rewrites that list, now they're going to order. You can't predict that. So offline, when we're training ranking models, we can't evaluate on those online metrics; we need offline metrics that act as proxies for the online ones. These metrics are just proxies: if we improve them, we believe we'll increase the online metrics. There are a few different ones and I won't go through them all, but the one we decided to go with is mean reciprocal rank, because it's fairly simple and intuitive, and again we believed it would be a suitable proxy for improving the online metrics.

How would we calculate mean reciprocal rank, for those who don't have much experience with it? Suppose we have five sessions. On the far left is our first session: it has five restaurants, and the user converts, that is, orders, from the restaurant in the fourth position, so the reciprocal rank is one over four. In the second session they order from the restaurant in the third position, so the reciprocal rank is one over three. We do that for all the other sessions we train on, and the mean reciprocal rank is then just the mean of those numbers, which here comes out at roughly 0.27. That's how we calculate mean reciprocal rank over all the sessions in our training data, and we compute it on our test data as well. Note, importantly, that mean reciprocal rank has a maximum score of one, which corresponds to permuting the list in such a way that the restaurant the user ordered from is always placed in the top position. The idea is that if we can write an algorithm that ranks the restaurant the user converts from at the top of the list, then it's likely we're doing a good job with the rest of the list. We'll come back to this assumption in the lessons learned, but that's exactly what we've been doing from this point on.
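The calculation itself is tiny. A sketch, where the first two positions (4 and 3) are the ones from the slide and the remaining three are placeholders chosen to land near the roughly 0.27 in the example:

```python
def mean_reciprocal_rank(converted_positions):
    """converted_positions: the 1-based position of the ordered
    restaurant in each session's list."""
    return sum(1.0 / p for p in converted_positions) / len(converted_positions)

print(mean_reciprocal_rank([4, 3, 2, 5, 10]))  # ~0.277
```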
So I've covered the features, the target variable, and the evaluation. How does this actually work in our workflow? Like many of you who may be working on machine learning models, we need to build train and test sets: we've got a bunch of SQL that queries our data warehouse and builds them. Then we have a bit of Python to validate that data: how many nulls are there, should there be nulls, do we need to impute, what duplicates are there, things like that. Then we get into training, and notice that we train on this validated data. We train loads of different models; the difference between model one and model two can be just the removal of one feature, or it can be a completely different model, a heuristic versus a logistic regression. Each of those models produces its own permutation of the list, and with a different permutation I can calculate a mean reciprocal rank for each one. Then, as I said before, we choose the one that gives us the highest score (I'll come back to that), and that's the best model. Once we've chosen it, we can put it all into production: we use continuous integration with CircleCI, which runs a whole bunch of unit tests for us and pushes Docker images up to Amazon S3, and then we have an in-house machine learning pipeline called Canoe which runs this job periodically, each day.

At that point we've built some models, we've evaluated them offline, and we've productionized them. Great, we're in a good position. But the thing is, we have not checked that any of this moves our online metrics, the things we believe achieve our objective. To do that we need to run it online: we need an A/B test. What we do is user-level randomization, so each user gets a version of the model: 50% get the current version, and the other 50% get the newly proposed model. We run the A/B test, look at those online metrics, see which model actually gives us an increase, and choose that one. And this is an iterative process: we started with a simple heuristic, which might have been a good start; then we try a slightly more complex model, and that might win; but that's still not an optimal model, so we build an even more complex and better ranking algorithm. It's a process we go through over and over, running multiple A/B tests to keep improving our rankings.
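As an aside on the user-level randomization: a common way to implement it, not necessarily what Deliveroo's platform does, is to bucket users deterministically by hashing their ID, so the same user always sees the same variant:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "ranking-v2") -> str:
    """Deterministic 50/50 split; the experiment name salts the hash
    so different experiments get independent splits."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < 50 else "control"
```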
So that's the whole pipeline of what we do and our approach to the problem. At the moment we're in the process of building slightly more complex models and doing more feature engineering. As I said, we've been using a lot of logistic regression; we're currently working on wide-and-deep neural networks as more complex models, but also on feature engineering: new features, but also embeddings of features. Can we get better representations of the features we're already using to improve our results? We still believe there's a lot more we can do to improve the rankings we show.

That concludes the approach to the ranking problem, but as I said, we need an infrastructure that enables us to do all this. I've covered a lot of the data science work, but what about the infrastructure that supports it? We're in a situation where we need to serve features that are sometimes aggregated and available daily, or in advance; we can write SQL to get things like past order volumes and past conversion scores. But there are some features we don't know until serve time: we can't know the estimated time of arrival before the user actually starts a session and opens the app. These are the things we have to consider when building the serving infrastructure.

We considered several different ways to productionize our models. One thing we've done historically at Deliveroo, and one option we considered: our team writes the models and does all the analysis in Python; we write a model, we choose it because it gives us the best mean reciprocal rank score, and then we sit down with an engineer and rewrite that model in the production language, which can then serve it directly. That's what we've done in the past, but as you can probably imagine, it's slow, it's time-consuming, and it's prone to errors if we make a mistake copying the model over. It also limits the complexity of the models we can use, because we'd have to implement everything, tree-based models or neural networks, from scratch in Go if there wasn't a library for it.

So we thought: what's another method we could use to productionize? What if we didn't rewrite the model in the production language, but instead wrote the model in Python, containerized it, and released it as a separate microservice? The production service that renders the homepage would call out to this Python microservice, maybe over some HTTP interface, and the microservice would rank the restaurants and return them: here's a ranked list of restaurants. That's cool, but remember that you don't want to wait a long time when you open the app, and we were worried about potential latency problems from making a lot of requests over the network. Also, at busy times with a lot of requests we might need to scale containers up and down, and figuring out what to scale and when was going to be difficult.

So we went with option three, which is to integrate a serialized version of the model into the production language. What do I mean by that? We can serialize Python objects: when you save a Python object you usually use pickle, and you can read it back and use the variable again. The problem is that pickle can only be read from Python, and our production language is Go. So what if we serialized in a format that can be read from different languages? We figured, since that's the case, let's use a framework that enables it, and that is the main reason we chose to implement our models in TensorFlow: it allows us to serialize models in a format we can read from other production languages.

There are other benefits to TensorFlow too. One, it's fairly mature: it has a good community and good documentation, so we could be confident there'd be a good level of support when we got stuck. It includes a whole range of models: you can implement linear models, trees through the contrib modules, but also deep neural nets, without much of an API change. Because we're more familiar with linear models, we were very interested in something that could serve linear models but also enable us to easily go more complex. And it has what's called the Estimator API. I want to explain that, because I wasn't familiar with it until we started looking into TensorFlow: the idea is an interface that lets you write models in a similar fashion to how you would in scikit-learn.

I don't usually like putting code up on screen, I promise, but I don't need you to focus on the code, just on what the pieces are. This is an example of how the TensorFlow Estimator API might work. The first thing you do is define how the data flows into the model: if I have, say, a CSV or a pandas DataFrame, I need to say I want a thousand observations in each batch, and I want to shuffle the batches to randomize the data. We write a function for that; I haven't put it on the slide because I wanted everything to fit on screen. Then we need to create feature columns, because TensorFlow needs to know whether a feature is numeric, since it might want to scale it, or categorical, since it might want to do one-hot encodings or embeddings.
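Filling in the slide I'm describing, this is roughly how the flow looks with the TF 1.x-era Estimator API; the feature names, vocabulary, and file paths are made up:

```python
import pandas as pd
import tensorflow as tf  # sketched against the TF 1.x Estimator API

train_df = pd.read_csv("sessions_train.csv")  # hypothetical dataset

# 1) How data flows into the model: batches of 1000, shuffled.
input_fn = tf.estimator.inputs.pandas_input_fn(
    x=train_df[["popularity", "eta_minutes", "cuisine"]],
    y=train_df["converted"],
    batch_size=1000, shuffle=True, num_epochs=10)

# 2) Feature columns: tell TensorFlow which features are numeric and
#    which are categorical (for one-hot encodings or embeddings).
feature_columns = [
    tf.feature_column.numeric_column("popularity"),
    tf.feature_column.numeric_column("eta_minutes"),
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "cuisine", ["italian", "japanese", "indian"])),
]

# 3) A scikit-learn-like classifier; .train() is analogous to .fit().
classifier = tf.estimator.LinearClassifier(feature_columns=feature_columns)
classifier.train(input_fn=input_fn)

# Swapping in a deep model is one line plus a few extra arguments:
# classifier = tf.estimator.DNNClassifier(
#     feature_columns=feature_columns, hidden_units=[64, 32])

# 4) Serialize to a SavedModel, the cross-language format a Go service
#    can load.
serving_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    tf.feature_column.make_parse_example_spec(feature_columns))
classifier.export_savedmodel("exported_models/ranker", serving_fn)
```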
The last two bits are what make the Estimator API familiar to us. If I want to use a logistic regression classifier, I just use tf.estimator.LinearClassifier, in a similar fashion to scikit-learn, and to train it I call classifier.train, which is analogous to classifier.fit in scikit-learn. Importantly, if you've used low-level TensorFlow before, you'll remember creating placeholder variables and running multiple sessions; it's just a headache, and you don't need that in your life. This lets us not worry about any of that: we don't have to create those variables, and we can train models in a simple fashion. And if I want to switch to a deep neural net, in the Estimator API I can change LinearClassifier to DNNClassifier, for example, and just add a few more arguments. So now we can train more complex models in a very simple way.

The other important thing to remember is that we need to write the inference in Go. Are there any Go writers in the audience? OK, a few. I've tried here; I don't write Go, but this is roughly how it would be done. [Laughter] You import the TensorFlow module in Go, and you can write a getModel function that just downloads the serialized model that comes out. Importantly, you then call model.Session.Run, so it's a similar interface in another language. It takes as its first input the feed dict (for anyone familiar with TensorFlow, that's where we put the input features), and the next argument is the fetches: the node in the TensorFlow graph we want, in our case the probabilities from the logistic regression classifier, which is the output node. And then you've got your inference results. It's great. So that's how we read from the serialized model in Go.

At the start of this section I talked about the different approaches to productionizing the model and the different considerations. One thing we're looking into is how something like Amazon SageMaker can help us improve that flow, because there's still a bit of friction, still a few hoops to jump through, and hopefully its tools can help with that. Additionally, even if SageMaker doesn't change things on the production side, it helps with parallelizing model training and with hyperparameter optimization, so we're looking at using it for several different parts of our workflow. We're at really early stages of that, but AWS has been very helpful so far.

Cool. So we've looked at the approach to modelling and ranking, discussed how we train and evaluate, and covered the tools and how we built the infrastructure. We've deployed several models, run several A/B tests, and made several mistakes, so I want to devote the last section of this talk to some of the lessons we've learned; if there's anything you take away, I hope these help you in your own projects going forward.

The first lesson: even though you productionize your model, you serialize it and everything works in your training environment, you also have to serve it, and when you serve it there may be bugs
in the data science code or in the engineering code, so you can actually get different results, different ranks, in production than in training, and you won't know it. When we started, we sometimes didn't check this at first, and we'd see different ranks, different results, and wonder what was going on. Something simple we can do is check the train-serve skew: check that production actually matches what you expect in your training environment. I'm showing one here that's perfect: we check the ranks in production versus offline (I don't write bugs in code, so this is perfect), and if they all land on y = x we have a perfect match; our production ranks agree with training. But if we see scatter here, that means something's gone wrong; something's different.

Then we also have to check the features. For example, if I go on my phone and we serve a ranked list, I can see an estimated time of arrival of, say, 20 minutes; that's what the user sees. The next day that goes into the warehouse, and I can write SQL to generate a dataset, so now my SQL gets the estimated time of arrival for that same user and session, and it might not be 20 minutes. So we want to check that when we run our SQL for the training data, it actually matches the inputs that went in in production. Here you can see four features that did not match what we saw in production, and the proportion of times they didn't match. Looking at these things can really help find bugs. Sometimes it may not be in the code you've written (more often it's in my code), but sometimes it can be in the engineering code as well, and this helps find where those problems might be so we can fix them. Doing this every time, and setting it up so it's actually part of the pipeline, is very time-efficient; I'll come back to that later.
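A minimal sketch of that feature-parity check; the table and column names are made up:

```python
import pandas as pd

# Features logged at serve time vs the same features rebuilt later
# by the training SQL, joined on the session/restaurant keys.
served = pd.read_csv("serve_time_features.csv")
rebuilt = pd.read_csv("warehouse_features.csv")
joined = served.merge(rebuilt, on=["session_id", "restaurant_id"],
                      suffixes=("_prod", "_train"))

for feat in ["eta_minutes", "popularity", "rating", "distance_km"]:
    mismatch = (joined[f"{feat}_prod"] != joined[f"{feat}_train"]).mean()
    print(f"{feat}: {mismatch:.1%} of rows differ between serving and training")
```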
Second lesson: log and monitor everything. If there's a number, log it; if you're not sure, log it; if you collect a number and you don't think you need it, log it. I promise you, it's just so important. Here's the example I'll use to explain it. We have to write SQL to get the session data so we can train models, and that runs every day. Over the last few months Deliveroo has been revamping the backend of the home page, which meant the tracking changed, and we weren't logging this. So for a week we were missing data (the SQL columns were matching, but they were all nulls), and we weren't really looking at it. What we then saw, at the point where the training data period was completely messed up, was an error saying our model didn't train that day. It turned out this had actually been a problem for quite a while and we just weren't seeing it. And all you've got is your logs: we had to start from the training script, where all we saw was that the model didn't train, but the actual problem was way back, a few scripts down, where the SQL wasn't collecting the data. It wasn't even a change we'd made; it was a backend change by the engineers. We didn't change any of our code, but things went wrong, and if we had logged and monitored everything we would have caught it much earlier. So if there are numbers you're not sure you need, just get them down; it can save you a whole bunch of time.

And here's one I said I'd come back to. I mentioned that we train many models, calculate mean reciprocal ranks, and say the best model is the one with the highest score. That's actually not always true, and it can be very misleading. In fact, there were times when we ran an A/B test where we really had increased the mean reciprocal rank, but none of our online metrics actually changed. That led us to ask: what are the limitations of mean reciprocal rank? Well, it only assesses the single restaurant that was converted in each session; it says nothing about the rest of the restaurants in that list. We had made the assumption that if we could put that one restaurant at the top, we were doing well with the other restaurants, and that assumption fell through. So we figured we needed to look at several different metrics; there's no single metric we rely on any more.

One of the things we now do is, again, look at scatter plots: what are the ranks of the current algorithm versus the ranks of a newly proposed algorithm? If they lie on the line y = x and are completely identical, then when you release the A/B test you're not going to see any changes, because the ranks are identical. We can see how much scatter is in these plots, and we can quantify it with Spearman's rank correlation coefficient: how different the two models' ranks are. So now we can start to determine, in advance, what differences we might expect to see in an A/B test.
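For instance, computing that agreement, assuming SciPy, with toy rank positions for one candidate list:

```python
from scipy.stats import spearmanr

# Positions each algorithm assigns to the same six restaurants.
current_rank = [1, 2, 3, 4, 5, 6]
proposed_rank = [2, 1, 3, 6, 4, 5]

rho, _ = spearmanr(current_rank, proposed_rank)
print(f"Spearman's rho = {rho:.2f}")
# rho = 1.0 means the ranks are identical, so an A/B test between the
# two models could not show any difference; lower rho means more scatter.
```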
The other thing we find incredibly useful is actually looking at the surface you're changing. We're changing the ranks, and what the user sees, their experience, is a ranked list of restaurants; if you only look at a global metric, you're not looking at what the customer experiences. So some of the engineers on our team built this ranking insights visualizer and released it to all employees. It's not very clear on the slide, but you can switch between the different models we have in production, look at what ranks you'd expect, and put in different locations and postcodes to see where the ranks make sense. This is incredibly important, because if I look at it and see breakfast restaurants at the top of the list at dinner time, maybe I haven't accounted for time properly in the model. Looking at things like this can help us really understand the changes we need to make to improve our recommendations, our rankings, so it's something we use as a sanity check.

And finally: if you've not read Google's Rules of Machine Learning online, go and read them; and if you've read them before, read them again. I find it incredibly useful, and not just me but other members of the team too, to go back to them two or three months down the line, because some of the rules make so much more sense after you've seen the issues first-hand, and you think: if I'd just done that earlier, I would have caught the problem. So go back and read that.

In summary, the lessons learned. Check the differences between your training and production environments; it will ultimately speed things up, and it lets us confirm we're making the impact we want, because if our models work accurately they'll move the metrics we care about. Log and monitor all the things; it's so useful. Don't rely only on global metrics: global metrics are fantastic (for regression use RMSE, use accuracy, whatever you want), but don't rely on them alone; look at some of the predictions themselves and make sure they make sense. And finally, read Google's Rules of Machine Learning.

So, to wrap up and rehash everything I've said. As a team we had an initial aim to improve restaurant ranking: to improve the relevance of the restaurants to consumers. We built an infrastructure to do that, built a bunch of models, and along the way we made mistakes and learned a huge amount from them, and there's still more to learn. I'm sure there will be more pitfalls; maybe next year I'll come back and talk about a few more, who knows. And we've got work to do, because we're trying to expand what we're doing. That was just the ranking of the restaurant list, but there are other things we want to do: ultimately we want to algorithmically design and create the best experience for the consumer. We might want to touch different surfaces, so we need to look at search results, filters, and the horizontal scrolling carousels you might see, like "order again" or discounts. We want to start building algorithms that affect the whole experience.

And that's not all we do; you can probably tell I'm getting to the part where I promote Deliveroo. There are other algorithms teams. We need to work out how fees should be set: we've got a network that's different in different places, and people order from places that are further away, so should fees change? There's logistics: I mentioned at the start that we have a three-sided marketplace with a ton of restaurants, riders, and consumers, and they're everywhere. How can we make sure that network runs efficiently? There are a lot of complicated problems there. And we run a whole bunch of A/B tests and experiments, and we're not finished, we want to run a whole bunch more, so we want to standardize that process, and we're building an experimentation platform. So come and help us do this; obviously, we're hiring, join us. Our stand is at the entrance to the big room at the back; I'll be there after this talk until the end of the day, and there will be at least one data scientist and a recruiter there all day today and tomorrow as well, so come and see us. And finally, thank you for listening. [Applause]

[Audience question] Epic talk. I've also done some recommenders, and I've also noticed that there are these proxies and they don't necessarily correlate with real life. Have you found any proxies that do?

So, not perfect ones, actually. I've been trying to read quite
a few different recommendation articles and papers to see if there are good offline metrics that better represent the online ones, and I haven't seen any. One of the things we've done in our team, though we just need to run a few more experiments, is to try to correlate offline metrics with the results we get in experiments. Because we run experiments, we do have online data points; the trouble is that each experiment on its own is one data point, so we need to run quite a few to build up that correlation. We can calculate things like online mean reciprocal rank and online precision at K. But yeah, the short answer is we haven't found one yet; I think the work we'll do on correlating those will hopefully help, but it's not perfect. No hard questions, please!

[Audience question] Thanks for the great talk. I'm glad you said that MRR has limitations, and I'm really happy you had that on your slides. I have a comment and a question. The comment: MRR is hard to interpret, because, for example, the difference between putting the relevant restaurant at rank one versus rank two is exactly the same as the difference between putting it at rank two versus rank infinity. So I wonder if you see interesting patterns in the data similar to what I described. And also, if you were going to use precision at K, for example, what value of K would you use in your case?

All right, first question, on mean reciprocal rank. As you can imagine, with it being a reciprocal, if you place the restaurant at position one you get a score of one; at position two it's now a half; at position three it's 0.33. So although we're still moving the restaurant by the same amount, the differences between the scores change; it's not perfect. How do we account for this? Personally, and this is a thing, I think evaluating these recommendation and ranking systems is quite subjective, I like to think of mean reciprocal rank as a threshold. If you can get a mean reciprocal rank of 0.2, that means on average you're placing the converted restaurant around the fifth position, so you're doing fairly well there. But once you get past that point, once I start getting mean reciprocal ranks of more than 0.2, I start to think there's not much difference between the ranks, and what's more important is looking at the ranks individually: looking at the scatter plots and assessing whether it actually looks like the model has learned something, because now I can see that at breakfast time I'm seeing a whole bunch of breakfast restaurants. It's not perfect, and we've not settled on one good way to rank that.

Precision at K: this came out of the work where we correlated the online metrics with the offline ones. We found that thirty seemed best for the data points we had, but I don't think we had enough data points to say that firmly, so nobody quote me saying thirty is the best; that's just what it looked like. For me, intuitively, I think thirty would be too much; I'd actually want to set it lower, because when I think about a session, how many restaurants do you scroll through before you decide whether a list is any good? Do you need to go through all thirty? Do users do that? I actually don't know.
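For reference, since precision at K came up: a minimal sketch of how it would be computed per session, with a hypothetical helper and argument names:

```python
def precision_at_k(ranked_ids, converted_ids, k=30):
    """Fraction of the top-k results the user actually ordered from;
    with at most one order per session this is either 0 or 1/k."""
    top_k = ranked_ids[:k]
    return sum(1 for r in top_k if r in converted_ids) / k
```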
We're implementing scroll tracking to start to see what users actually see, but yeah, it's hard to know.

[Audience question] You mentioned it's a marketplace of three parties, but your goal is very focused on one. Do you have any thoughts on how you would extend the work you've done so that it might also benefit the riders, or new restaurants, et cetera?

Yes. I'll try to be relatively quick because I know we're running out of time. It does only focus on one, and the change we'd have to make is in the objective function. I read a paper very recently by Spotify called "Towards a Fair Marketplace"; read it if you're doing any ranking problems. They look at relevance to the user, fairness to the artists (for us that would be fairness to the restaurants around us), and what satisfaction means for the user, and whether you can balance fairness and relevance to get the best satisfaction. I think we can change the objective function so that we show users restaurants of a wider variety, but we're still early days and haven't got to that point yet. [Applause]