... data on Golden Retriever weights, and we plotted it on a number line. We then took that number line and attempted to build a histogram on it. A histogram, for those of you who don't know, is a type of chart that breaks data up into equal-sized bins and counts the number of items in each bin, as reflected in the height of each bar. Notice that if we have too many bins, we don't really get a meaningful shape. This is because we do not have an infinite amount of data. However, if we make the bin sizes just right, so that each bin captures enough data for our sample size, we start to see a meaningful shape. We see this bell-shaped curve that we call the normal distribution.

Here is the formula for the normal distribution. As you can see, we have mu, which is the mean; sigma, which is the standard deviation; pi, a constant which is about 3.14; and e, Euler's number, which is about 2.718. This leaves us with only the x variable as our input on the x-axis, and the y-axis represents the likelihood. We call this particular view of the normal distribution the probability density function, also called the PDF.

Mu, the mean, is a parameter that reflects the center of the bell curve and its peak. In this case it is 64.53. Sigma, the standard deviation, is going to be 3.05. The wider you make sigma, the wider the bell curve is going to be. As you move mu, the mean, around, you'll see that it just shifts the entire bell curve left and right. And of course sigma is measured relative to the mean, so one standard deviation is 3.05 units away from the mean on either side.

Let's look at a few more properties. The area under the entire curve is 1.0. Both tails go on to infinity, forever approaching the x-axis but never reaching it, yet the total area is still 1.0. If I try to look up the likelihood for a single x value, like 69.1, that is not terribly meaningful in itself, because on a continuous curve the probability of landing on any one exact value is effectively zero. A more productive question is finding the area for a given x range. So what is the area between 69 and 70? That's about 0.03. That means
there is a three percent probability of falling in that range. There is an 11 percent probability of falling between 68 and 71, and between 64 and 77 is going to be a 57 percent probability.

But how do we calculate this area? Well, we have the probability density function, the PDF. We're going to need another form of the normal distribution, called the CDF. What we will do is project all of the area captured from negative infinity all the way up to a given x value, and we will plot that resulting area. We will call that the cumulative distribution function, the CDF.

Let's say I want to find the area all the way up to x being less than or equal to 70. I can look up that value on the CDF: the given x value is the input, and then I trace it back to the y-axis on the CDF, and that gives me a value of 0.964. So that is the area up to x equals 70.

What if I wanted to find the area between 65 and 70, though? Well, let's capture the area up to 70, and then let's capture the area up to 65, which again is another lookup on the CDF. We will find that it is 0.561. Then we subtract that from the area up to 70, and that gives us the resulting area for the middle range between 65 and 70.
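The lookups described so far can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the video: it assumes the quoted mean of 64.53 and standard deviation of 3.05, writes the density formula out by hand as a hypothetical helper `normal_pdf`, and uses the standard library's `statistics.NormalDist` for the CDF. The names `MU`, `SIGMA`, and `area_between` are also made up for this example.

```python
import math
from statistics import NormalDist

# Parameters quoted in the video for the Golden Retriever weights.
MU = 64.53
SIGMA = 3.05


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the bell curve at x, written out from the density formula."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


# NormalDist provides the same density plus the CDF.
dist = NormalDist(mu=MU, sigma=SIGMA)


def area_between(lo: float, hi: float) -> float:
    """Probability of falling between lo and hi: CDF(hi) minus CDF(lo)."""
    return dist.cdf(hi) - dist.cdf(lo)


# Area from negative infinity up to a given x value (a CDF lookup).
area_up_to_70 = dist.cdf(70)
area_up_to_65 = dist.cdf(65)

# Area for the middle range, found by subtracting the two lookups.
area_65_to_70 = area_between(65, 70)

print(f"PDF height at 69.1:   {normal_pdf(69.1, MU, SIGMA):.4f}")
print(f"Area up to 70:        {area_up_to_70:.3f}")
print(f"Area up to 65:        {area_up_to_65:.3f}")
print(f"Area between 65, 70:  {area_65_to_70:.3f}")
print(f"Area between 68, 71:  {area_between(68, 71):.3f}")
print(f"Area between 64, 77:  {area_between(64, 77):.3f}")
```

The printed areas may differ from the narrated figures in the last digit, since the narration subtracts values that were already rounded.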
We find that area to be 0.403. That is the area between 65 and 70, which is the probability of falling in that range.

I'm going to show you one more representation of the normal distribution: the inverse CDF, also called the PPF, the percent point function. I am inverting the axes and therefore tipping the S-shaped curve on its side. This allows me to do inverted lookups. Why is this useful? Let's say I want to find the x value that gives me a given area of 0.75. We don't know what that x value is, though. This is where the inverse CDF, the PPF, can be useful. I put that probability into the inverse CDF, it returns the corresponding x value, and then I know what x value yields that area of 0.75. So in the end it really is just an inverted lookup of the CDF, swapping the axes.

Thank you very much for watching. I hope you got a lot out of this. There are other topics related to the normal distribution that we have not gotten a chance to cover, including the central limit theorem, hypothesis testing, and confidence intervals. If you want to learn more about these topics, please subscribe to this channel and I will try to get to them as soon as I can. I also cover them in my book, Essential Math for Data Science. Do also check out my other book, Getting Started with SQL. All right, I will see you all next time. Thank you.
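The inverted lookup described above, going from an area of 0.75 back to an x value, can be sketched like this. It is only an illustration under the same assumed parameters (mean 64.53, standard deviation 3.05), using `inv_cdf` from the standard library's `statistics.NormalDist` as the inverse CDF; the variable names are made up for this example.

```python
from statistics import NormalDist

# Same parameters quoted in the video for the Golden Retriever weights.
dist = NormalDist(mu=64.53, sigma=3.05)

# Inverse CDF (PPF): give it an area, get back the x value that yields it.
target_area = 0.75
x_at_75 = dist.inv_cdf(target_area)

# Feeding that x value back into the CDF should recover the original area.
round_trip = dist.cdf(x_at_75)

print(f"x value with 75% of the area to its left: {x_at_75:.2f}")
print(f"CDF at that x value: {round_trip:.4f}")
```

The round trip through `cdf` is a quick way to confirm that the two functions are the same curve with the axes swapped.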