Transcript for:
Data-Driven Product Development and Experimentation at Deliveroo

[Music]

Welcome, everyone. I'm Jonathan, and I head up the product analytics teams that work with delivery, which is everything related to riders within Deliveroo, and restaurants, which is restaurants. Jamie here is working with me, and he's going to introduce what we're going to talk about tonight.

Yep. So my name is Jamie and I'm a data scientist on the rider payments team. So what are we going to talk about? First of all, Jonathan is going to take us through who Deliveroo are, which hopefully everybody is at least familiar with, and has hopefully even ordered with. He's also going to go through the birth of Deliveroo and the amazing growth that we've experienced to get to where we are now. Then I'm going to talk about some details of a particular experiment design which we use quite frequently here at Deliveroo, and which deals with some of the challenges associated with small sample sizes and high variance, for example. And then Jonathan is going to go through how we encourage the business to let us run all of these experiments and then actually act on them: how do we gain their faith and get everybody on board to let us do our job?

Thank you. So, show of hands: how many in here have ever ordered with Deliveroo? Quite a few people. If you haven't tried, you should really try it; it's really good, actually. So let me give you a brief background on the recent history of the delivery space. Do you remember, a decade or two ago, you'd have to go down to your local pizzeria, you'd have to pick up a menu, and then you'd have to make a call at a later stage to get your pizza? It was all a hassle and it was not very convenient, right? So that was how delivery started, but it's changed a lot since then. After that came the online marketplaces, so restaurants would be able to put up their menu on a web page and people would be able to
order. They would be able to get more orders and things were getting more convenient, but there was still nothing provided as a service to restaurants in terms of delivering the food. They would still have to take care of that themselves: someone from the pizzeria, or whatever restaurant it was, had to do the delivery themselves.

Our founder and CEO, Will, had a few problems with this. One was the restaurants that he saw on these marketplaces: he didn't recognize any of them. There were no well-known brands; they were just random restaurants, and he wasn't particularly interested in ordering from them. A lot of the orders that he made took more than an hour to arrive from the point that he ordered, and that's quite a long time if you're hungry; you want your food directly, right? And also it was hard to know when you were going to get the food. You'd basically have to put your life on hold for a little while, waiting for the delivery, and that's not a really good experience either. You want to know approximately when you're going to get your food.

So he took all those thoughts and started Deliveroo, and then came the third wave of delivery, as we call it. That was bringing in well-known brands that everyone loves and wants to eat from, and making sure that you can deliver that food in a quick manner: less than an hour, and way less than an hour. We have some orders that take as little as six minutes, and that's an amazing experience if you're lucky enough to have that. And you would have a prediction, a sense of when you would get the food, so it would be much easier for you: you can order your food and you know approximately when you're going to get it.

That was five years ago, when Will started the company, and we've grown immensely since then. We're now in twelve markets, we've had six investment rounds and raised quite a lot
of money, and we're in more than 150 cities. 200? Yeah, very close to 200. So we've grown immensely, and there are a lot of things happening.

At the core of the company is really data and data science, and we have three different legs of our data science organization. There's product analytics, which Jamie and I represent, so we're going to show you some examples of what we do within that space soon. There's algorithms, an example of which at Deliveroo would be Frank, our dispatcher model, which assigns a rider to an order. Anthony is going to talk about that in a little bit and give you an overview of what's going on in that space. And then we have BI, business intelligence, which is developing our thinking and our structure around how we report metrics and how different metrics relate to each other.

The reason that we have so much focus on data is that the whole organization is really hungry for data-driven decisions. Everyone wants to make a great decision, and we fundamentally believe that a data-driven decision is a really good decision. When it comes to product, that really means evidence-based product development. And what does that mean? Is it just a string of fancy words? In practice, that means experiments.

And what do I mean by experiments? How many here have worked, or are working, with A/B tests or experiments or testing in general? A lot of people, quite a few. There are some that haven't, and that's okay; I'm going to go through an example just to give you a sense of what the concept is about. The purpose of experimentation is to measure the effect on behaviour of something new that is introduced in the experience of the user. So imagine that we have a population of users; we can pretend that it's all the riders that Deliveroo is working with. When we do an experiment, we split this population, this group of riders, into two different groups: one control and one variant, in the simplest of examples. And for simplicity,
let's imagine the groups are the same size, 50% in each. To begin with they are having the same experience: they're using the same rider application, they're following the same delivery process, everything is the same. Then at a given point in time we introduce something new to the variant group, we let the experiment run for a given time, and we measure the specific metrics that we've decided on beforehand for the variant group and for the control group. Then we compare the metrics for those two groups, and the difference between them is the effect of this new feature.

One thing that is challenging with this in general is that there is an element of randomness, so you can never be completely sure that what you're measuring is exactly true. One way to reduce the uncertainty that comes from the randomness is to have a big sample size, and in this case, with riders, we have 35,000 riders all across the globe. So we're in a good position to run experiments here, even if we drill down to specific geographies.

But wait a second, Jonathan, wait right there, because on our side of the business, which is the rider side, we often don't deal with, for example, rider-level or order-level metrics. Instead we think about the network and network-level effects, and I can illustrate that with an example. Think about this restaurant, which is nicely displayed as a little pentagon in the middle of the page, and let's imagine that this circle around it depicts its delivery radius. So at the moment the restaurant can deliver 10 minutes in any direction. Obviously in reality the rules that govern how far a restaurant can deliver are much more complicated, but this will do for the purposes of illustration. This is great for customer 1, because customer 1 can order from this restaurant and can probably expect the food in a relatively short
duration. But perhaps, for whatever reason, as Deliveroo we think that maybe this restaurant should be able to deliver further, and so we let it deliver 12 minutes in total in any direction. That's potentially a good thing, but it's actually not as clear as it might appear on the surface whether that is net good or net bad. That's because there are obviously lots of stakeholders involved in determining the outcome in this business, and we can examine a few of those now.

Starting with perhaps the more clear-cut winners in this: number one is the restaurant, because the restaurant is probably going to get more orders. They can address a larger proportion of the population of Deliveroo users, and so it's probably net a good thing for those guys. It's probably also good for customer 2, because customer 2 couldn't previously order from this restaurant, so at least they have the option to now. So again, probably a reasonably clear winner.

However, there are other groups where it's difficult to say. For customer 1, maybe it's fine, maybe it's just business as usual. But it's possible that if the riders are now busy delivering to customer 2, then customer 1 is going to wait longer to get their food, and so maybe that's not a great thing for customer 1. And then if we think about the rider, on one hand it's a good thing, because the rider is probably going to get more orders from this greater catchment area, and if we bear in mind that riders tend to be paid a drop fee per order, which is a flat fee, then that could be a good thing. But that assumes that they have spare capacity. If they don't have spare capacity, and instead of delivering to customer 1 they're now delivering to customer 2, they've now got to go further to earn the same money. So again it's
difficult to say whether that is going to be a good thing or a bad thing for a rider. And then there's Deliveroo, and for us, again, maybe it's a great thing: more orders is obviously a good thing for Deliveroo. But to mitigate against the deterioration in customer service levels which we referenced for customer 1, maybe we need to put more riders on the road, which obviously represents a cost to Deliveroo. So there are lots of things here which are not totally clear, and there's a clear case for experimentation when we make a change like this.

Because all of these things are not really isolated to a specific rider or a specific order, or a specific customer or restaurant for that matter, we need to perform some level of aggregation, and a usual aggregation level that we use is the zone level. Here we have the zones across London. Zones are a Deliveroo-specific thing, but they're very useful in lots of cases when we're looking at rider-level changes, because when a rider signs up to work with us they can assign themselves to a specific zone, which means that changes within that zone, at least from a rider-level perspective, can often be relatively isolated to the zone level. Sometimes we will aggregate to city, but often a zone is a reasonable level of aggregation.

Because of this aggregation, conventional A/B tests, as Jonathan was describing, are often unsuitable. There are a lot of reasons for that, but two of the main ones are these. One, the sample sizes tend to be small. If we're making a change that affects London, for example, and I'm looking at the zones, we've got fewer than 100 zones across London, and obviously London's a huge city. A hundred data points is actually a really small sample size to be performing any kind of experiment, an A/B test, on. And two, there tends to be high variance within the data points, so within each of
these zones in this case. The high variance makes it harder to detect what has happened as a result of the change that we've made rather than just some randomness.

So this is when we use one of the experiment designs which we use quite frequently at Deliveroo, called block design. There's a nice little quote here, which is "block what you can and randomize what you cannot". I actually don't know exactly who to attribute it to. It's quite a famous quote; I read it in a fantastic textbook by George Box, but I think he was quoting it. So I'm not sure, but whoever said it first, I can illustrate what they mean and the idea behind block design.

To go into that, I'll first talk about nuisance variables, which are integral to what we mean by blocking something. Nuisance variables are exactly what they sound like: they're variables which may well be correlated with the metrics on which we're trying to observe a change, but they're actually of no interest to the experimenter. I can illustrate with a couple of examples. One, to go back to the zones which we saw for London: there's a significant difference between, for example, a zone around here in central London, which is incredibly dense, with lots of customers, lots of riders and lots of restaurants, and a zone further out. What that means is that if we're looking at a metric like average order duration, which is how long on average it takes from when a customer says "I want my food" to when they actually receive the food, the average order duration will be quite low in an area like this. However, if we go to a zone further out on the outskirts of London, or anywhere really, it can often be the case that the average order durations are much higher. So there's lots of variance there in those data points. We also experience significant seasonality throughout the year, but also throughout
the week. As you may imagine, weekends are way, way more popular for us than weekdays. These are all factors which we want to block out, to control for, essentially. So here we go: block design to the rescue.

I'm going to illustrate block design with an example. If we go back to the example we talked about before, with a single restaurant delivering a little bit further, let's think instead about applying something more general and more difficult to reason about, because possibly you can model something like a single restaurant. Let's say instead we were extending the delivery radius for all restaurants of a certain cuisine type in London. Let's say we hypothesize that, because perhaps sushi travels better than other foods, maybe we should extend the radius for all Japanese restaurants across London and track certain things. Things we might be interested in are: does that change the average conversion? Do more people buy from Deliveroo because they can see more Japanese restaurants? Do we experience an increase in refunds? Perhaps it actually doesn't travel better and we're having to do many more refunds now because these delivery distances have increased. What about riders? Are they able to deliver fewer orders because they've got to go further? Maybe that's not a great thing for the riders. There are lots of things here which, particularly at this kind of scale, when we're talking about huge changes across a whole city or even more than that, are basically impossible to think about without experimentation.

So now let's talk about the technical details of how the block design process works and, in summary, the logic behind it. First of all, if we think about all of our zones, let's say that there are a hundred zones in London, then we will randomly split them into group A and, sorry,
let's say group 1 and group 2, and that will assign the sequence of treatments that are applied to them. In this example here we've got a two-week experiment, so it starts on a Monday and ends on a Sunday, and we can see that zone 3 and zone 2 are in group 1. What this means is that on the first day of the experiment, which is the Monday, they will experience the B treatment, which in this example is to say that the Japanese restaurants will have a larger radius than they would usually. Then that will alternate each day, so on the Tuesday these zones go back to what they have now, the normal-size delivery radius, and that will continue, so it's a B-A-B-A sequence: the treatment is on, the treatment is off. The opposite is true for those in group 2, which go in an A-B-A-B sequence.

What we can see here is that in this two-week design, each zone has each day of the week, a Monday, a Tuesday, a Wednesday, a Thursday, a Friday and so on, exactly once under each condition. It alternates to keep day effects out as well. From there the process becomes simple in some ways, because for each zone we just take the mean of the metric over all of the B days, when the treatment was applied, and the mean over all of the A days as well, when it's in its control condition, and we subtract the A mean from the B mean to give an effect per zone. Then we simply perform a one-sample t-test on all of these effects and test for significance. And hey presto, this actually quite neatly controls for the intra-week seasonality which we talked about before, because each zone has an observation under each condition for every day of the week, and also for the inter-zone factors as well, because each zone is only compared with itself, meaning that we can actually deal with small sample sizes
and get reasonable significance on certain metrics with sample sizes as low as even 50. I'm not going to run on like I did.

Thanks, Jamie. So Jamie has described a really good method for handling situations where we have very few samples. It's important for us, when we're doing these types of experiments, that the whole organization is on board, so that we can really go ahead and do them and measure our changes in a good way. To make sure that everyone is on board, we employ a specific, somewhat rigorous process for how we go through experiments. I'm going to go through it quickly to give you a short overview.

The first part is design. The purpose of the design phase is to decide three things. One is which experiment method we're going to use: it could be an A/B test, or it could be the block design that Jamie just described. Second is where we're going to run the experiment. And third is what metrics we're going to use, which depends on what we believe the change is going to impact.

When we have that design, we usually run a simulation, usually a Monte Carlo simulation. That's to understand how good our accuracy and sensitivity are going to be on the metrics that we've decided on, and it also determines how long we're going to run the experiment for. Sometimes we will need to go back to the design phase and change something there in order to get good enough accuracy in a timely enough manner.

Once we're happy with the simulation, we start running the experiment for the time that we've decided, usually somewhere between two and six weeks. Once that's done, we analyze the experiment: we try to estimate the effect of the change. Then we take that analysis and turn it into insights and a recommendation.

So to summarize, the goal for us here is to help guide business
decisions, and the problem that we often have when it comes to the delivery network is that we have very small sample sizes, and we solve that with a clever experiment design.

Just to finish up, I'd like to reiterate that at the core of Deliveroo is data science. It's an important part of our organization right now, and it's going to be even more important in the future, so we're bringing in a lot of people. If you are as excited as Jamie and I are about this, then you should definitely come up and talk to us. There are a lot of people here from Deliveroo today, so you should definitely talk to Shane, Mike over there, Hugo also. There are a lot of people that are really keen to talk to you, so feel free to grab us when we're having food and things together.

[Music]
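
As an illustration of the block design and the simulation step described in the talk, here is a minimal Python sketch. It is not Deliveroo's actual implementation, and everything in it is an assumption made for illustration: the function names, the 14-day layout with day 0 as the first Monday, the made-up metric (a zone baseline plus a weekend bump plus noise), and the hard-coded critical value. It randomly splits zones into two groups, alternates B-A-B-A and A-B-A-B by day, computes a per-zone effect as the mean over B days minus the mean over A days, forms the one-sample t statistic on those effects, and uses a small Monte Carlo loop to estimate how often a given true effect would be detected.

```python
import random
import statistics
from math import sqrt

DAYS = 14  # assumed two-week experiment; day 0 is the first Monday


def assign_groups(zones, rng):
    """Randomly split zones into group 1 and group 2 (sizes differ by at most one)."""
    shuffled = list(zones)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {zone: (1 if i < half else 2) for i, zone in enumerate(shuffled)}


def treatment_on(group, day):
    """Group 1 runs B, A, B, A, ...; group 2 runs A, B, A, B, ... (B = treatment)."""
    return day % 2 == (0 if group == 1 else 1)


def zone_effects(observations, groups):
    """Per zone: mean of the metric on B days minus its mean on A days."""
    effects = []
    for zone, series in observations.items():
        b_days = [x for day, x in enumerate(series) if treatment_on(groups[zone], day)]
        a_days = [x for day, x in enumerate(series) if not treatment_on(groups[zone], day)]
        effects.append(statistics.mean(b_days) - statistics.mean(a_days))
    return effects


def one_sample_t(effects):
    """t statistic and degrees of freedom for H0: the mean zone effect is zero."""
    n = len(effects)
    standard_error = statistics.stdev(effects) / sqrt(n)
    return statistics.mean(effects) / standard_error, n - 1


def simulate(zones, groups, true_effect, rng):
    """Hypothetical metric: zone baseline + weekend bump + treatment effect + noise."""
    observations = {}
    for zone in zones:
        baseline = rng.uniform(20.0, 40.0)  # nuisance variable: zones differ a lot
        series = []
        for day in range(DAYS):
            weekend = 3.0 if day % 7 >= 5 else 0.0  # nuisance variable: weekly seasonality
            effect = true_effect if treatment_on(groups[zone], day) else 0.0
            series.append(baseline + weekend + effect + rng.gauss(0.0, 1.0))
        observations[zone] = series
    return observations


def estimated_power(n_zones, true_effect, runs=200, t_crit=2.01, seed=1):
    """Monte Carlo share of simulated experiments with |t| above t_crit.

    t_crit=2.01 is roughly the two-sided 5% critical value for 49 degrees of
    freedom, so it matches n_zones=50; other sizes need their own value.
    """
    rng = random.Random(seed)
    zones = [f"zone_{i}" for i in range(n_zones)]
    hits = 0
    for _ in range(runs):
        groups = assign_groups(zones, rng)
        observations = simulate(zones, groups, true_effect, rng)
        t_stat, _df = one_sample_t(zone_effects(observations, groups))
        hits += abs(t_stat) > t_crit
    return hits / runs
```

Calling `estimated_power(50, true_effect=0.5)` gives the share of simulated 50-zone experiments in which the effect is flagged. Because a two-week alternating schedule gives every zone each weekday once under A and once under B, the zone baseline and the weekend bump cancel inside each per-zone difference, which is why the design can work with so few zones. If SciPy is available, `scipy.stats.ttest_1samp(effects, 0.0)` could replace the hand-rolled statistic and would also return a p-value.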