[Music] hello and welcome to this lecture we are going to do something slightly different instead of just a regular lecture where we do just computations and derivations we are going to do some you know computer simulation of probabilistic experiments now this is a bit different and also maybe a bit challenging to some of you if you have never seen a computer program before or something like that but hopefully you've seen some computational thinking and you have some experience of looking at you know steps of numerical computation there and you're also doing a python course on the side so hopefully these two together will help you get over this and we are not going to use any major python structure or anything like that we're going to just use some simple thing and most of the code i will i will give you i mean you have to just look at the code and learn how to run it and we've also been talking about activities in this class and this will sort of be part of your activity and you'll see it it's easy enough to do i'll expect some very simple modifications maybe from you and your instructors will help you in this as well okay so let's get started so like i said one can define an experiment an experiment is some sort of an action isn't it you you do something and some outcome results and one can presumably do that in a computer right so you can write a computer program that will simulate the experiment with you know maybe some realistic sort of condition so this is now possible and what i will give you are simple python routines for simulating experiments and interestingly from the simulations using something called monte carlo simulation is a very standard technique today you can verify or you can actually by simulation find the probability of an event okay so you have a experiment and you define an event and you do a computation to find what that probability of that event is turns out you can simulate it and verify whether these two agree okay so that's something of of very great interest to us right so you toss a coin we've always been saying fair coin is 0.5 so maybe you should just have a way of verifying that actually has a meaning in real life and that's something we will see later on even in theory in this class in practice also it is true okay so so this is some this monte carlo simulation is very important in practice a lot of you know quick experiments can be done like this for complicated experiments you can actually do simulations and check whether your probabilities of events work out okay or not okay so all this code will be available to you as a python notebook in something called google collab and these are things that are very important for you to learn in this program right knowing a good python notebook you know environment where you can go in and do some work is extremely important for a data science program today if you get out with any sort of learning at all you will need to be able to express or write some python notebooks to explain your work and this has become part and parcel of everything today not just data science in even in other areas this is very very useful so it's a good skill to have and as part of this course you can take some first steps towards it and as you go further in this program you will become more comfortable with using python in a notebook in google collab or any other notebook environment that you like okay so with that introduction let's jump into this python notebook i've created a notebook for you in google collab and i'll just show that to you and it'll also be shared to you you'll be you'll have it available and this is basically for simulating these experiments so let's see that so this is my google colab python notebook and you just have to go to this url that you see here collab.google.research.google.com and this will be like a drive file for you okay so on your so you can use the online degree login that you have and go into collab dot research and you can create any notebook that you want and this is a notebook that i have created like that okay so any notebook will have cells okay what are called cells and you can see the first cell is highlighted i can go to the second cell i can go to the third cell i can go to cell after cell after cell everything is a cell these blocks that you have here this is a cell and every cell can have either code or text okay so you can see here code text and you can keep adding cells whenever you go here you see plus code plus text right you can either add a text here or you can add a code here so for instance like you know in the very beginning if i want some you know text here i can put it's in some language which is called markdown you can also learn that later if you like this is so let me just call this probability in python so what i type here is in something called markdown and it's good for you to learn markdown as well so you can see some sort of a you know title came here and you can see this is probability in python so this is a way to type text okay so you can see this is also a text column i can edit it if i want to if you do not want to i can go back here and then these are code okay this is code what you see like this our code this slightly different background and foreground that is code this is all python code you can see python code and text gets mixed up together you write some text describe something and then you write python code for doing that okay so that's called a notebook okay so notebooks are very nice any work that you do you can see clearly that you know you can describe it in a notebook very conveniently you write the text and then you write code the code can be in many other languages as well so python is just convenient in in most cases people use python but you can also have other languages okay so now so google collab there is this connect so collab will connect to some virtual machine on the background when you run the first command it will do that connection on your own you can see it's allocating and then it's connecting and once it's connected it will run this first piece of code okay so it's connected it's given me some ram and disk etc and it does run this okay so so python works with something called packages you will learn these things later on modules or packages and i've imported an important module called numpy numpy as np so now this is this you need not really know the meaning of this but let us go on and see how i am doing this so first i have a function for generating a uniform outcome ok supposing you have n outcomes in the sample space and you want to select m outcomes independently one after the other uniformly okay you need a function for that and i am defining that function so you can run this so your machine knows that this uniform n m is something that will generate uniformly there are n outcomes in the in the sample space it will generate m outcomes for you independently okay so supposing here is an example let's see an example here example is always good uh supposing you want to toss a coin right so there are two outcomes i will denote the first outcome by heads number one outcome number one is heads outcome number two is tails right one and two so if i do if i could toss the coin once i could toss it ten times i could toss it hundred times so this is a command which prints one toss ok uniform of 2 comma 1. so now 2 is what 2 tells me that there are 2 outcomes in the sample space the first outcome is number 1 second outcome is number 2. that is what this 2 tells me and i want one experiment to be done okay experiment to be done once what about this command here uniform 2 comma 10 again 2 outcomes are there and i want to do the experiment 10 times so toss the coin 10 times and tell me what happened here this toss the coin once again 2 so that determines that it's a coin right so my outcomes are only 2 in number 1 and 2 so and i want to toss it 100 times so let us just see if i run this what happens so it tossed it first time the first toss resulted in tails the next 10 tosses resulted in you know so many ones and so many twos and then the hundred tosses resulted in so many ones and so many twos once again remember one is my code for heads two is my code for tails no it's easier to use one and two and things like that and computer programs as opposed to h and t it's not very hard to do the translation but you know this is easy enough so let me just run it once again now i can run multiple times you can keep running and you see the outcome changes right when i tossed once the outcome changed here when it was 10 times the outcome changed it keeps changing i can keep running every time i run i will get different random outcomes okay so this is what i meant by saying i am simulating an experiment the tossing of a coin is getting simulated here inside this python language in some way okay and you're getting the output here okay hopefully that was clear if you you'll have this notebook with you you can sort of hit this button here and keep getting multiple iterations going uh the next thing i want to do is throw a die we've been doing throwing a die calculations for so long right so here's throw a die you could throw once or you could throw 10 times or you could throw 100 times exactly similar to before now notice all i have changed here is uniform 6 comma 1 okay so in a die i know there are 6 possible outcomes all of them are uniform so all i need to do is make uniform 6 comma 1 and i have my die right so it's as good as that so it's very simple i can throw the die once 10 times 100 times and off you go there you go you get the answer and here it's the number is the same as the outcome so i don't need to do any translation so you see the outcomes here you know one two three two one six six that's possible right so you can run it different times you run you get different outcomes ok so that's throwing a die all right so simple experiments you you know how to do it ah it's uniform outcomes is very easy okay now the crucial thing we will learn is something called this monte carlo technique this is estimating probability of events by simulation i've given here a brief description of what it is okay so the basic idea is is this estimate okay so put equality here it's probably not equality so maybe i should say approximate here just to get that right okay so you can see here ah so so if you have an event a and you repeat the experiment n times independently you keep on repeating it and simply count the number of times the event a occurred which which i could call as n sub a okay and then if you take this ratio n sub a by n you can show using various theory and also in practice you can sort of observe that this is a good approximation for probability of that event a okay so this is the basis of monte carlo okay so this equation right here repeat the experiment n times independently and count the number of times the event that you want occurred n a then n a by n is roughly the probability of that event a ok so one can show also that as n becomes larger and larger and larger this becomes a better and better and better approximation and this is the method that's very useful okay so we're going to just do very basic simulation here the simulation i'm going to do we'll evaluate probability of a coin toss okay so we know it is 0.5 right i am expecting my uniform to be uniform so it is 0.5 so notice what i am doing here i am counting so this is where i count the number of heads ok number of heads is 0 and for i in range though this is basically like a repetitive repeating of a loop thousand times every time i repeat this loop i generate a coin toss and check if it is equal to heads okay so that is the part which checks if the coin toss is equal to head okay so maybe i can write that down here okay so this is repeat coin toss 8000 times whatever i choose to do right ok so then i i count the number of heads and divide by thousand i get a probability estimate by monte carlo so let us run this 0.492 0.513 point four eight point four nine two it's sort of close to point five so you mean you won't get the exact point five uh what the theory tells you is as you keep going to thousand ten thousand hundred thousand and all that this will become closer and closer and closer to point five of course in computer simulations there are so many other things to consider but anyway 0.5 is sort of what you have in your head in theory and you got that in practice using monte carlo so now this is a very very nice result to have so let's let's once again do probability of a die showing a number maybe you thought coin toss is too easy so we can modify the monte carlo simulation so what i would do for any monte carlo is copy this paste it and make it number instead of number of heads so that it is neutral enough right number equal to zero variable for storing the event occurrence okay and then my repeat how many how many hour times i want to repeat and i do my experiment uniform of six comma one is it equal to one ok and then i do number equals number plus one and finally i print this right so i expect this number to be one by six so point one five seven point one six is pretty good okay so maybe instead of just the number being one maybe i want to find the probability that the number is either one or two okay so what you can do what you can do is you can you can store the value of the die okay and then simply check if die equal to equal to 1 or die equal to equal to 3 then no you increment ok or ok is that okay so what what will be ah this event here di equal to equal to one or di equal to equal to three the event a is a equals one comma three isn't it this is the event a k so probability of a should be one by three you should get something close to point three three lo and behold you got point three three three i mean who more exactly i mean that was just a fluke i think you do it once again you get something slightly different point three to one point three one point three five five you know slightly around point three three that's what you will keep getting okay so you see you can make these small modifications right so so you first do the experiment okay so you always have this for monte carlo okay ah this is repetitions then you do the experiment first okay don't do anything more clever just do the experiment and then event that's it so that's monte carlo for you you can copy paste this to anything else you like right variable for storing repetitions do the experiment write have some logic for counting whether the event happened or not and then increment the number right so this is the way to do it right so if you can change it so i am claiming that if you change this to point ten thousand you will get something smaller it will take a little longer to run okay of course i have to divide by the ten thousand divided by thousand you will get something wrong so you see you're getting much closer numbers much more consistent right so point three three point three three is not changing you're getting something much closer to point three three three etcetera etcetera so these are things you can verify okay so hopefully you saw this so this is the essence of monte carlo okay so you can always copy this and paste it somewhere and then make the change that you want change your experiment change your even condition okay you got your monte carlo going just keep doing it and you have monte carlo experiments okay so this is the basic now let's consider slightly more complicated experiments i am going to consider three different experiments which actually very classic probability experiments we didn't consider the theory for it so i have put a small description of the theory given your background and probability now you will understand this the first classic problem we look at is something called the birthday problem okay so the birthday problem is a very interesting problem most people you know have this give a wrong answer to this question and as it turns out it's very surprising that the answer is very nice simple answer okay so let's just see what it is okay so you have a group of n persons okay the birthday problem basically asks what is the chance that some two have the same birthday so at least there are n persons a lot of people what is the probability that two of them some two of them share the birth okay so they have the same birthday all right so now if you think of uh you know distribution and probability etcetera if you just pick a random person it's not unreasonable to think that their birthday would be uniformly distributed from 1 to 365. i don't know in different cultures people get married only at certain times and you know they have kids only at certain times so all these things can affect your uniform distribution but generally i think it's a reasonable approximation and uh can you assume two people's birthdays are independent yeah of course i mean if you pick them random enough you know far enough apart i guess it's not a bad assumption so these are reasonable assumptions you can make so you know most people imagine that you should have at least 100 people before you start seeing common birthdays right because uniform independent assumption is probably reasonable so only when you have a collection of hundred people do you have even a chance of having common birthdays right so this is something that something people naturally assume but you know it turns out if you only have 23 persons of this sort uniform distributed birthdays and independent then you have more than a 50 chance of two people sharing a birthday just with 23 people okay does it sound believable it seems a little surprising but it's actually true you can calculate this probability so for that reason let's say event a is that some two have same birthday right so that's event a and then let us say event a complement is no two have same birthday right so in fact in this problem it turns out a complement is the easier thing to calculate right what is a complement once again how do you calculate probabilities of events you just write them out one after the other and then you know use the and rule and the or rule and combine and do it there's no other way to do it okay so a complement is birthday one on any date b1 okay for the first person the first birthday can be on any date b1 i mean there's no collision if there's only one guy right there's no problem and birthday two is on any date other than b1 okay so the first person had a birthday on p1 the second person has a birthday anything anything is allowed other than b one what about for the third person any date other than b one b two and so on and so on and so on till the last person birthday n on any day other than b one b two to b n minus one okay so what's the probability for the first one but they won on any date so that's just one right 365 by 365 everything is okay with you the next one is any date other than the first guy so that will be 1 minus 1 by 365 then 1 minus 2 by 365 so on till 1 minus n minus 1 by 365 these are all like condition condition condition you can multiply all of them together you get probability of a complement is that okay hopefully convince yourself of this this is the probability that you will get okay so looks like a pretty long expression even for n equals 10 there's lots of multiplication to do and maybe you can multiply you know but i'm going to be a bit lazy i am going to write a computer program to do this and verify both by calculation and by monte carlo simulation that the probability comes out to be correct okay so so here i am taking 10 percents n equals 10 and there is some logic here first is the logic for calculating the first number i print is the probability from the expression okay so why is this expression that i mean you can check that out i do 1 minus product of 1 minus you know something that is n by 1 by 365 right so this is like a short python code and numpy for doing this expression okay this is the same expression here let's compute it okay and then i have my monte carlo remember repetition experiment event okay repetition experiment and how i do this is a little bit more complicated uh i don't want to go into uh details of the python code believe me it's true this is the birthday problem and if you simulate you see you get answers which are very very close okay so the theoretical calculation that we did and the monte carlo experiment that we are doing give you the same value okay so this number by thousand and the theoretical calculation give you roughly the same value you can change this number for instance if you make this 23 you will get my promised 50 okay so you see the theoretical calculation is 50 percent a monte carlo sure enough gives you 0.523 you can repeat it if you like so in fact you may say 50 is still a luck but even if the number goes to you know something like 40 you literally you know guaranteed a same birthday look at that that's 90 percent okay if you have a class of 40 people two people are going to share birthdays with 90 percent probability nearly okay you would be really unlucky not to have a uh common birthday and even in a class of 9 40 if you go even to 60 i think quickly this number will go off very close to 100. look at that that's 99 percent in 60 people you're virtually guaranteed right unless you're really really unlucky and really everybody sort of came from very dependent population it's very very likely that you would have two people sharing a birthday for 60. so this is just telling you how monte carlo sort of agrees with the calculation that's done here okay all right so that's birthday problem and the next problem is something called the monty hall problem okay so let us look at that next i have three or four more problems i won't probably spend too much time on everything i'll let you read it up on your own but this monte hall problem is monty hall problem is very very classic you can read it this is like a game show there are three doors okay and there's one car and two goats okay there's a car behind one of the doors there are two goats behind the two other doors and you don't see the what's behind the door of course i mean you're a contestant you just see the three doors and you have to pick a door at random okay whatever you open you get right for instance you know if you got a goat you know if you like goat that's a that's a great gift for you but you know most people would like the car isn't it so if you are lucky get the car so it looks like if nothing else happens the probability of getting a car is one-third isn't it right so one-third is probability that you'll get a car two-thirds is the probability that you get a goat just luck of the draw you try and go off now this contest is a little bit different so you make a choice okay the the host will not show you what's behind the door okay he'll open another door right which has a goat behind it okay so right you pick the door there's something behind the door you don't know what it is could be car could be good i don't know but the host is going to show you the door open the door which has a goat behind it okay so definitely whatever you pick there will be some door with the goat behind it right where there are two goats and the host is going to open the door and show you here's a goat okay now you have the choice of switching all right you could either stick to your original door or you can switch okay so the question is what should you do okay that is the monte hall problem should you switch or should you stick to your original door okay so most people who don't do a detailed calculation will say it doesn't matter okay most people will say it's one third this way or that way you know either switch or you don't switch it shouldn't matter okay but the fact that the person you know opened and showed your code apparently ends up having a significant impact on what you should do and the surprising thing is if you choose to switch your probability of winning goes to two by three okay you can do a simple setup it's not that complicated but lot of people have been confused by this many people don't believe this okay so it makes sense to do an experiment and verify this okay so that's what i have done here i put here a little bit of a case where i know i am not going to go into details here you can see i am doing a car location and fixing the go to one location go to location then contestants original choice is again uniform and then i am checking if it is a goat or not and then finally i am counting if the other closed door is equal to car location so this is this is the case of success for switch right so if you you can check this and you'll get 0.669 sure enough simulation bears out you can see that you know even if you didn't believe the theory by theory you can show this there's a wiki page and i've given a link here to a wiki page you can go there and show you why it's why the two thirds is justified there's lots of good reasons for it and you know if you don't believe it if you like computer simulations more here you go you can simulate you see the answers two by three okay so that's the monty hall problem it's a nice problem uh next is uh something called the polias earn scheme i'm going to skip this here again now you'll see that it works out okay and the last experiment i've done is something called a simple random walk or gambler's ruin okay so it's a slightly more complicated experiment than maybe what you've used to so let's walk through this real quick a gambler has k units of money k rupees if you like k dollars if you like dollars whatever unit k units of money and is going to a casino uh to play don't ask me where is the casino in india goa has lots of casinos if you know if you didn't know anyway so this is this is the game he does if he has any money at all if he has at least one unit of money uh he's going to toss a coin okay and if the coin results in heads the casino will pay him one unit you can imagine the casino has infinite amount of money it's not going to go bankrupt it can keep on paying him as much as he wants if he gets the tails then he loses his one unit he has to give us one unit to the casino okay now of course if he loses all money he's run out of cash is bankrupt and he stops right and if he gets n units of money okay he will also stop so n is like his goal once he gets n units of money he feels rich enough he thinks he can go buy something with that money so he stops okay so this is called a gambler's ruin problem in fact it's a special case of the problem it's called a simple random walk with two absorbing barriers there's a barrier at zero there's a barrier at n right he starts at k he'll keep drifting okay he'll lose some money gain some money moving around here there if he ever hit zero he stops that's an absorbing barrier as it's called and if he ever hits n also he stops this is also an absorbing barrier so there are two barriers and the person keeps drifting around you can imagine right so there's a toss of coin going on and he keeps on moving and the interesting probability to calculate is this probability of bankruptcy okay probability of bankruptcy is interesting right i want to find out if i start with k units of money and if i play this game what is the probability that i'll go bankrupt right now you can say small p is the probability of heads okay so i'm allowing the coin to be biased okay so most casino games it's not going to be a coin toss which is fair right most casino games are loaded in favor of the casino it might be slight loading but even then it is loaded in favor of the casino right so say p is the probability of heads and q is equal to one minus p this is the probability of tails isn't it so you can show i'm not showing the details actually the calculations are briefly shown below not in great detail a probability of bankruptcy if your coin is fair is 1 minus k by n okay if it is not fair then there is a more complicated formula involving q by p okay q by p power k minus q by p power n divided by one minus q by p power okay this is the formula and how do you derive it there is a wiki page which has some definition derivations but actually it is very simple derivation so very complicated let's say xk is the probability of bankruptcy starting with k units okay the main idea is to condition on the first toss okay you have a condition on the first toss okay now x k is probability of bankruptcy given first toss is head times p plus probability of bankruptcy given first tosses tail times q right so first task could be the heads or tails right you either have heads and go bankrupt or tails and go bankrupt right so then you do the conditioning it is just the and or combination right so you do the conditioning and you write it out now notice what is this probability of bankruptcy given first source is head now from the first house is get head you now have k plus 1 other than the fact that you have k plus 1 everything else remains the same so what you have is just x k plus 1 right so this x k is like this nice little function which takes you from k to the probability of bankruptcy so you just put x k plus 1 here same way with probability of bankruptcy given first source is tail when the first source is tail you lost one money you went to k minus 1 you have x k minus 1. now how do you solve these kind of recursions you know you need boundary conditions ah boundary conditions is x 0 is equal to 1 right if you got to 0 then you are definitely bankrupt already so x 0 is 1 and x n is 0 right if you got to n then you stopped okay so x n is 0. so ah so so if you solve this solve this equation you will get this okay so i'm not going to show you how the equation is solved you can take a look at it and but if you plug it in you'll see this is a valid solution okay both for the case p equals q equals half and p not equal to q all right so so there is an expression for probability of bankruptcy all right so now we have done a i have done a monte carlo simulation here so put k equals 5 n equals 10 ah and i am printing 1 minus k by n and i am repeating thousand times i start with k 5 and then if k is 0 greater than 0 or less than n right so this is where i will toss a coin and if it's heads my k goes up if it's tails my k goes down and that's how i keep repeating it right so if k is 0 when it ends i increase the number or if k is n when it ends i won't increase the number right so this is how ah this works this is where it is so some of you who are paying really close attention will know what if it never hits 0 or n okay you may ask right what if it keeps on walking in the middle and never hits 0 or n it turns out that's not possible you can prove that in this case it later hit 0 or hit n okay so given the way it works it's a more complicated proof but you can prove that but you can see that you know 1 minus k by n and this monte carlo probability agree right so i printed first 1 minus k by n i got 0.5 this number also sort of agrees okay so this even this so this sort of probably gives you a hint on how for complicated events also you can do monte carlo monte carlo so easy to do you don't need any recursive equation etcetera so in real real life lot of applications you'll have a very complicated sequence of things happening and to compute the probability you just have to write a monte carlo simulate it you know and for simulating the only thing we did was uniform isn't it we just wanted a uniform toss okay you got it great now what about uh the biased coin right what if p is not half right can you do simulations so i put here something that will simulate a biased coin for you okay so we need a method for tossing a biased coin right so far we are only uniform outcome so put here biased of p comma m uh you put in some value for p it will toss a bias coin for you m times and will give you the answer all right so that's what biased of p comma m does let's run it get it into your compute into our computer and then you can run it so p is 0.25 i'm printing p and then i'm repeating it thousand times the biased coin and counting the number of heads uh sure enough you know 0.25 results both in theory and in practice so biased coin is also something you have once you have biased coin you can repeat the same gamblers ruin experiment with the biased coin 0.45 and you can sprint the theoretical value and the value that you got from monte carlo simulation and you can see sure enough its it agrees a lot okay so even if the probability of winning goes down to 0.45 right which is only a 5 percent reduction in that one experiment notice the casino is you know likely to win or you are likely to go to bankruptcy 75 percent probability so let's say we push it to 0.4 let's see what happens that's already 89 percent 88 percent probability right 0.4 if you go to 0.35 i think this should be really really close to 100 you very little chance of getting out of life so 1.35 is a good enough 15 advantage is good enough literally to wipe you out of all your money in a casing okay so once again uh take a look at this python sheet it's it's it's maybe not every line in the program you will understand but eventually as you progress in this program and the bsc do the other courses learn python you'll see that this is a very useful resource for you to have and what i have for you beyond this is an activity okay what is this activity so you go through and understand how the various you know just to how to set parameters you do not have to rewrite the code just how to set parameters and like i told you take a template and repeat the monte carlo okay and then what i want you to do is to pick any experiment that interested you right so we had so many experiments for which we did calculations of probability pick something very simple if if you think it's too complicated to do any changes in the python program you know you can you know just throw a die and compute some probability of some event right so you can do something so easy if you like okay it's up to you let me not tell you what you want to do you could try to do binomial you can do so many things right so whatever you choose you can try to do right and you modify the monte carlo to do repeat the same thing so you compute a probability of that event by you know theory in theory right print that value and then repeat that same experiment in monte carlo and compute or simulate by verify by simulation whether the probability value matches ok do that for any experiment of your choice and take your own version of the notebook add that and the bottom and then put it out on your google sites page now keep everything private shared only within the organization don't keep it public and show it in google sites page okay so do that do create a collab notebook as well right so you create a collab notebook you know copy all these things in add your own little python monte carlo experiment right and verify that the theory matches with the practice of the computer simulation okay so that's your activity hopefully you enjoy it hopefully you've seen a quick recap of all that you learned in stats one so all of this the theory at least is something you've already learned in stats one maybe some of it was a bit different but mostly it's recap so week one is almost entirely recap from week two onwards we'll push ahead with something new and something maybe you've seen a little bit of the random variables and all that before so we'll start with all of that and keep proceeding as we go into week two thank you very much