Essential Math Skills for Data Professionals

How much math do you need to learn to become a data professional? Hello, my name is Sumit Shukla, and in this video I'll help you understand the topics you need to be aware of before you can become a data professional. But before we start, don't forget to check out Scaler's free masterclass by leading industry experts on the Scaler events page; the link is in the description box. In this video we'll talk about the various topics, and I'll also take some examples that will make your journey in the world of maths easier. Before we start, don't forget to subscribe to Scaler's YouTube channel. Let's get started.

Let's now understand how much math you need to become a data professional. Please note that I'm not using the term data scientist, data analyst, business analyst, or machine learning engineer, because every data professional needs a certain knowledge of math. In this video I'll try to explain how much math you need, and which topics and subtopics you should prepare, so that you can be a very good data professional.

First of all, let's understand the various categories of data professional. The very basic starting point is the data analyst: if I want to be a data professional, that is where I should start. A data analyst can then become a business analyst. Let's say I'm working as a data analyst at Amazon. Once I gain knowledge of the business, I can start interacting with the business heads, business managers, and stakeholders to apply the logic and the learnings from the data to the business. That is what business analysts do: they extract insights from the data and try to resolve the business problem. On the other hand, data analysts work less on the
analysis side and more on interacting with the data itself, so they are the first point of contact for all data requirements. Let's say I want to fetch my top 10 customers. The data analyst in your company is the best person to return those top 10 customers, but if a business decision has to be taken on top of that, you reach out to the business analyst.

The next step on this ladder is the data scientist. A data scientist is the person who tries to use the data to come up with complex algorithms and models that can start predicting. As a business analyst you resolve problems by taking decisions based on the insights you have derived from the data, but data scientists try to forecast: to predict what can happen, how we can enhance the performance of a particular business, and various other things. So adding more complexity to your current work is what makes you a data scientist.

Please remember that with each step you climb on this ladder, you are adding more and more business knowledge to what you already know. The more business knowledge you have, and the more familiar you get with problem solving, the stronger a data professional you become.

So we have the data analyst, the business analyst, and the data scientist, and then there are many other categories or professions: MLOps engineer, data engineer, ML manager, and nowadays you can also find the position of AI manager, along with several other positions in the market. But according to me, these are the three basic positions with which you can enter the data world.

Now, coming to the actual question for which you are watching this video: how much math do you need to become a data professional? Math is very important. To those of you who think that you don't need to learn math, that you
can become a data scientist or a data analyst just by learning Python, SQL, and some machine learning algorithms: my friend, that is not going to happen. Even to learn a machine learning algorithm you need maths, because every machine learning algorithm is a mathematical model. It runs on numbers, because data is all numbers, and wherever there are numbers there is math. So you need math.

Let's try to understand how much math, and which topics, you need to learn. I'll help you with a flowchart. When we talk about maths, there are four important topics: first statistics, second linear algebra, third calculus, and fourth discrete mathematics. If you're hearing these terms for the very first time, don't worry. We will first look at the whole pipeline, and then I'll take one example from each so that you can see this is not rocket science. It is something you already know; you just have to brush up your basic maths to understand these big terms.

Statistics is further divided into subcategories: descriptive statistics, inferential statistics, hypothesis testing, regression, and finally time series. These are the subtopics a data professional has to be clear about when it comes to statistics. Descriptive statistics, the most basic kind, which most of you probably already know, is further divided into two subcategories: measures of central tendency and measures of dispersion. We will take examples from both, but first let's complete this tree so that you have a clear idea of how much math you need from all the subtopics.

Now let's focus on linear algebra. Linear algebra starts
with the concept of matrices and vectors, followed by matrix operations; when we talk about matrices and vectors, it is really important to be clear about how to operate on a matrix. This is followed by linear equations and, finally, optimization.

Next comes calculus. Under calculus we have topics like differentiation and integration. You must have learned calculus in your 11th or 12th class, but it is also a very important topic in machine learning. We use a great deal of calculus when it comes to neural networks. Say you want to use neural networks to detect the content of an image, or to recognize a hand gesture: if I do something like this with my hand, what does it mean? We use image processing and computer vision for this, and these applications use a lot of calculus. Within calculus we need to be clear about differentiation as well as partial differentiation, and also about the optimization techniques that come under calculus.

Finally, there is discrete mathematics. It contains perhaps the most heavily used topics when it comes to playing with numbers, so if you want to play with numbers, discrete mathematics is something you need to be clear and confident about. It consists of topics like combinatorics, which is permutations and combinations, followed by graph theory, probability theory, and finally set theory.

That is everything. To those of you who think you can become a data scientist just by mugging things up, or just by having very good coding knowledge and knowing various algorithms and how
to apply those algorithms in Python: this is the reality. This is how much math you need to be aware of to become a very strong data professional.

Now let's try to understand each component. When I talk about descriptive statistics, what exactly is it? When I talk about a matrix, what is that? When I talk about differentiation, what is that? I'll take an example from each category so that you get a sense of what you will be learning in each topic.

Let's start with measures of central tendency, which are part of descriptive statistics. You already know about them, but let me give you the basic understanding so that you can start your journey in descriptive statistics and enjoy it.

Measures of central tendency help you find the center point of the data. They are basically the mean, the median, and the mode. The mean is also known as the average. Let's try to understand it. Say I have data on a marketing budget, or rather, let's make the example more realistic and easier to understand. Let's say the x-axis is years of experience and the y-axis is salary. First let's define the axes: years of experience runs 0, 1, 2, 3, 4, and so on, and salary runs 100, 200, 300, 400, 500. Now let's create some data points: one for zero years of experience, one for one year, one for two, then three, then four. This is the data I have. If I try to find a middle point, a single value that represents this complete data, that value is known as the mean. How do we calculate the mean, or average? It is
given as the sum of all the observations divided by the total number of observations:

mean = (sum of all observations) / (number of observations)

Here my observations are, let's say, 150, then another one below it at 110, then 250, again 250, and then a point at 400. Let's add them up and divide by the count, which is 5:

mean = (150 + 110 + 250 + 250 + 400) / 5 = 1,160 / 5 = 232

If I mark this value on my graph, 232 is a little less than 250, so it sits somewhere around here, and I'll draw a line for it. This is my average line. You can see that it is basically the center point of the data, with approximately half the data points above it and half below it. The average defines the center of the data, and that is why it is one of the measures of central tendency.

But there is one problem with the average: it is influenced by outliers. What is an outlier? Let's say I have one employee with 5 years of experience whose salary is way up here, let's say 2,000. With relatively few years of experience this person still has a salary of 2,000, while other people have salaries in the range of 100 to 500, with a maximum of 400. What happens if I calculate the average now? Previously the sum was 1,160; I add 2,000 to it and divide by 6:

mean = (150 + 110 + 250 + 250 + 400 + 2,000) / 6 = 3,160 / 6 ≈ 526.67

You can see that just because of this one number, which we call an outlier, the average salary has shifted to somewhere up here, and this is not the center point. This is your average line with the outlier, and it is not the
center of the data. That is the problem with the average: as a measure of central tendency, it is influenced by outliers. How do we get rid of that problem? That is why we have the median.

Let's calculate the median with the same data. The median is the central point of the data, and it stays at the center whether or not you have outliers. When I say center: if this is 0 and this is 100, the center is 50, so 50% of the data lies below this point and the remaining 50% lies above it. That is what the median is.

To find the median for any given data, step one is to sort the data in ascending order. I get 110, 150, 250, 250, 400, 2,000. Then I take the middle value. Here the middle falls here, because below this point I have three data points, which is 50% of my data, and above it I have three data points, which is the other 50%. But how do we find this middle value? If the data has an odd number of data points, I directly get the middle value. If it has an even number, as here, the median is simply the average of the two middle values:

median = (250 + 250) / 2 = 250

So you can see that even though this data has an outlier, the value 2,000, my median is still 250, which is near the center. This is my median line.

Now, what is the mode? We have understood the mean and the median. The mode is the observation with the highest frequency in your data, and highest frequency means the largest number of occurrences. So if I look at my data, and I should add
2,000 to it as well, the value 250 has occurred two times, while the other values, 150, 110, 400, and 2,000, have each occurred only once. So my mode is 250 in this case.

So all these measures of central tendency help you find the central point of the data. The mean is a metric that is influenced by outliers, while the median is not. Another advantage of the median is that it gives you a point with 50% of the data below it and 50% of the data above it.

Now let's try to understand measures of dispersion. Here my objective is to find out how far the data points are from the central point. Say I have a number line, this is the center of the data, and my data points are very far away over here. The data points are far from the center, so this data has high dispersion. In the other case, this is my center and the data points are concentrated around it; this kind of data is said to have low dispersion. The metrics that help us calculate this are known as measures of dispersion. There are various such metrics, but the two basic ones are variance and standard deviation.

Let's understand them by performing a calculation in Excel. Here I have my Excel data with years of experience and salary. First let's find the average, which is nothing but the mean, using the AVERAGE formula. I could find the average salary, but let's use years of experience because that is simpler to understand. So for the average years of experience I type =AVERAGE and
I'll select my data range. This gives my average, which is 5.31. That basically means the central point of the data is about five: some employees are below this and some are above it. Maybe we have some employees with one year of experience and some with 15, but this is the center. Similarly, let's find the median with =MEDIAN on the same data range, and that is 4.7. And let's find the mode with =MODE; my mode is 3.2.

You can see that these three values are not equal to each other. One thing we can infer is that the data is probably skewed or has some outliers, which is why the mean, median, and mode do not match. If a dataset has mean = median = mode, the data is balanced, meaning the center point has exactly 50% of the data below it and 50% above it. That is what you see in a perfectly symmetric distribution such as the normal distribution, which is difficult to find in reality.

But our objective here is to understand variance, so let's come to the point. We will find the distance of each point from the average, that is, how far a particular employee is from the center. The average, or middle point, is 5.3, this particular employee has 1.1 years of experience, and the difference is 4.2. Let's calculate that difference. I'll add a column called diff, set it equal to the average minus the value, fix the reference to the average cell, and press Enter. This gives 4.2. Let's apply the formula to all the rows.

Here you can see that I have some positive values and some negative values. If I want to find out how far, on average, a value is above or below the mean, I need to take the average of this column, right? But if I take the average, the positive values will cancel out the negative values, and that is a problem. The easiest solution is to take the square of these values. I'll call the new column diff-square and take the square of each value, so this value to
the power of two, and I'll extend the formula to all the rows. Now we take the average of all these squared values with =AVERAGE, and the result is 7.7. But this cannot be read directly as a distance, because remember, we took the average of squared values, so this number is also in squared units. How can we get an answer that tells us how far, on average, a data point is above or below the central point? I'll take the square root of this value with =SQRT, and that is about 2.7. This means that, on average, a data point is about 2.7 units above or below the mean, and that is the standard deviation of the data. The higher this number, the more spread out the data is; the lower this number, the more the data points are concentrated around the mean.

Remember, the number we derived before taking the square root, the average of the squared differences, is known as the variance. Variance is hard to interpret because it is in squared units, and that is why we use the standard deviation.

We also have direct formulas to calculate these numbers in Excel. For variance we have VAR, and it gives almost the same answer, 8.5. There is a small difference, and it is because of degrees of freedom: VAR divides by n − 1 instead of n. Please do not worry about this concept for now; it is again a mathematical topic, so I'll delete this.

Let's summarize what standard deviation and variance are. The standard deviation helps you understand how far, on average, the values are below or above the mean, and that is 2.79 in this case.

Now let's try to understand the second category, which is linear algebra. The most important topic under linear algebra is the matrix. I hope everyone knows that the data we work with is two-dimensional: we have rows and columns, and the best way to represent that data is with a matrix. Now, when you have a matrix, you
have to perform a variety of operations, such as multiplying matrices, transposing a matrix, finding the inverse of a matrix, and so on. Let's try to understand some basic matrix operations.

A matrix is a 2D structure defined by rows and columns. Here these are my rows, row 1, row 2, and row 3, these are my columns, column 1, column 2, and column 3, and these are my elements. A matrix can have a variety of shapes. The shape of a matrix is given by the number of rows followed by the number of columns, so this matrix has a shape of 3 × 3 because it has three rows and three columns.

When you have two matrices, there are certain rules to keep in mind before performing matrix multiplication. The first rule is that the number of columns of the first matrix must equal the number of rows of the second matrix. And when you multiply two matrices, the output has as many rows as the first matrix and as many columns as the second matrix.

Let's take a general example. Say I have a matrix A with the rows 1 2 3 and 4 5 6, and a matrix B that is a column with the values 2 and 2. Can we perform matrix multiplication? The answer is no. These two matrices cannot be multiplied, because A has three columns while B has only two rows. So let's add one more value and make B the column 2, 2, 2. B now has three rows and one column, which matches the three columns of A, so we can perform matrix multiplication.

How does the multiplication happen? Remember, rows are multiplied by columns. I take the first row of A and multiply it with the entire column of B. So my first value is 1 × 2 + 2 × 2 + 3 × 2, and my second value is 4 × 2 + 5 × 2 + 6 × 2. That works out to 2 + 4 + 6 for the first value and 8 + 10 + 12 for the second, so
here the total is 12 and here the total is 30. This is your answer. Now look at the shape of the resultant matrix after multiplication: the number of rows equals that of the first matrix, which has two rows, and the number of columns equals that of the second matrix, which has one column. So the shape is 2 × 1, and this is my answer. I hope you are now clear about what exactly a matrix is and how matrix multiplication happens. Again, we are not covering everything in this video; I'm trying to give you the essence of each topic that you have to learn in detail for the mathematics required of a data professional. So let's move to the next topic.

Now let's try to understand differentiation and partial differentiation. I'll give you an idea of how to perform differentiation. Let's say I have a function f(x) = x². Why do I call this a function? Think of it as a machine that takes x as input and returns x² as output. This is the input to the function, this is the output of the function, and that is why we call it a function of x.

If I have to differentiate this function, I write d/dx of f(x), which here is d/dx of x². There is a general formula for differentiation:

d/dx (x^n) = n · x^(n − 1)

Applying it gives 2 · x^(2 − 1), which is 2x. So the derivative of the function x² is 2x. In machine learning we call this the slope: when you differentiate a function, you get its slope. We are not going to talk more right now about what the slope is, what machine learning is, or which algorithm I'm referring to, but this is what differentiation is, and you have to learn this topic in detail.

One more topic, and I'm just putting the name here: the chain rule, which
basically comes in when the function is not a single simple term like this one. Let's say I have a function f(x, y) = x + y + 2x². How can we differentiate this? Rules such as the sum rule, partial differentiation, and the chain rule (for functions nested inside other functions) help you differentiate expressions like this, and they are really important for anyone trying to enter machine learning. So this is a topic you should be clear about when it comes to differentiation.

Let's now proceed with the last topic, which is combinations and permutations. This is a very important topic when it comes to sampling. Combinations and permutations are something we apply in real life as well, but in machine learning, in data science, and even in general data analysis they are really useful for finding out how many iterations you need, how many times you have to perform a certain task, or how many variations are possible in the given data.

Let's take an example to understand the difference between combinations and permutations, starting with combinations. Combinations are about the number of ways to select data points from a given dataset. Let's understand this with a simple example. Say I have a population of 1,000 people and I want to sample 100 people. How many combinations are possible? Which 100 people should I select, and in how many possible ways? Let's calculate it. The formula for combinations is:

nCr = n! / (r! × (n − r)!)

Here n is the total number of observations, which is 1,000, and r is the size of the selection you are making, which is 100. Applying the formula gives 1,000! / (100! × 900!), since 1,000 − 100 is 900. To expand a factorial, you multiply all the whole numbers down to 1. For example, 5 factorial is 5 × 4 × 3 × 2 × 1: 5 × 4 is
20, then 60, and then 120. Similarly, if I evaluate 1,000! / (100! × 900!), I get roughly 6.4 × 10^139. That many combinations are possible: that is the number of ways you can select 100 people from a group of 1,000 people. It is a good way to understand the scale we are going to work at, or how much effort a certain task will take. So combinations are of great use in data science.

Let's try to understand permutations as well. With combinations we count the ways of selecting; with permutations we count the ways of arranging. Let's take a general example. Say I have five books, A, B, C, D, and E. In how many ways can I arrange these books on the shelf? If I keep A first, followed by B, C, D, E, that is one permutation. If I place B first, followed by A, C, D, E, that is permutation number two. If I keep C first, followed by A, B, D, E, that is permutation number three. So how many such permutations are possible? For this we have the formula:

nPr = n! / (n − r)!

Here n is the number of objects, which is five, and r is the number of places, which is again five in this case. So we get 5! / (5 − 5)! = 5! / 0!, and since 0! is 1, this is just 5!, which is 5 × 4 × 3 × 2 × 1 = 120. So there are 120 possible ways to arrange these books on your bookshelf. That is what a permutation is.

So I hope you are now clear about how much math you need to prepare if you are looking to enter the field of data science or become a data professional. Again, in this video I have tried to help you understand the importance of each branch of mathematics. We have not covered everything in detail, but I hope I was able to give you a starting point for each of the
topics.

Welcome to the first module of the Statistics and Probability full course. In this video we'll talk about the introduction to statistics and about descriptive statistics, including measures of central tendency and measures of variation. So let's get started.

Let's start by talking about statistics. Statistics is broadly classified into three major categories, so let's discuss them. I'll be writing all these notes as I go, at my normal pace, so if you feel like increasing the playback speed of this video, you can do that.

The first category is descriptive statistics, then we have inferential statistics, and finally hypothesis testing. We'll look at all three in detail, but let's start by understanding a little about each of these categories: what is descriptive, what is inferential, and what is hypothesis testing.

First let's take some random data. Say we have the data of a few students: their student ID, name, marks, age, and the city they belong to. For the columns I'll write S ID, Name, Marks, Age, and City, and let's fill the table with some random data. The IDs are 01, 02, 03, and so on up to, let's say, 1,000, since we have about a thousand students. The names are, let's say, Sumit, Ravi, Sanju, and Anita. The marks are 100, 95, 82, and maybe 60. The ages are 29, 28, 25, and 30. For the city: I am from Kanpur, Ravi is maybe from Bangalore, Sanju from Chennai, and finally Anita from, let's say, Jamshedpur.

Now, given this data, we have to understand it. Suppose your manager, or let's say the principal of a school, asks you to summarize this
data. You will not be able to provide the information for each student, right? You will not go to your principal or the head of the school and explain that we have Sumit, Ravi, Sanju, and Anita, that Sumit's marks are 100, Ravi's are 95, Sanju's are 82, and so on. It would take a lot of time, it is not feasible, and the head of the school does not have that much time to listen to all of it. What he or she is looking for is some information that can describe the complete class. The metrics we can use to consolidate, or better describe, the complete class are what descriptive statistics is about.

For example, I can say that this class has, let's say, a thousand students. This is a statistic that describes the class, and hence a descriptive statistic. Let's say the average marks of this class is 85. Again, this one number describes the complete class. We will talk more about what an average is, but this number speaks for the complete class, and hence it is a descriptive statistic. Similarly, let's say the average age is 25, and then we have the maximum marks and the minimum marks. All these numbers, which describe the data in such a way that we need just the number to explain all the data points, are known as descriptive statistics. It's very simple to understand, right?

Descriptive statistics can broadly be classified into two kinds of measures, so I'll change my pen color and write them here. One is measures of central tendency, and the other is measures of variance, or you can say measures of variation, which I'll simply write as V. We will talk in detail about what the various measures of central tendency are and what the measures of variation are.

So, what we have learned
regarding descriptive statistics is that these are the metrics that help us describe the complete data. We do not need to look into each and every row. Remember, in cross-sectional data, which has rows and columns, each row is one piece of information (one observation). Using descriptive statistics, we do not have to go through every row; we can just use these metrics to describe the complete data. Then we have inferential statistics. To understand it, I'll quickly change the color and take an example. Let's say there is a company, Scaler. Scaler wants to know, on average, how many hours a student studies in 12th class. If I look at this problem overall, there are millions of students in India, and even if I limit the scope of the study to India alone, a lot of students are in 12th class right now. It is impossible for any company to perform a survey at such a large scale. So the question, written simply, is: what is the average number of hours a student studies in 12th class? How do we approach finding this answer? Let's say this is my complete population: all the students who are in 12th class this year. There will be a great many of them; if I have to pick a number, let's say there are 100 million students.
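To make the idea of a population average concrete, here is a minimal Python sketch. It assumes a made-up, scaled-down population of study hours (100,000 values drawn uniformly between 0 and 10, not real survey data) and compares the average over everyone with the average over a small random subset. The names `population_hours`, `sample_hours`, and `average` are illustrative only.

```python
import random

# Hypothetical, scaled-down stand-in for the population of 12th-class students.
# The population size and the distribution of hours are made-up assumptions.
rng = random.Random(42)
population_hours = [rng.uniform(0, 10) for _ in range(100_000)]


def average(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


# Average over the whole population: needs every single observation.
population_average = average(population_hours)

# Average over a random subset: needs only 1,000 of the observations.
sample_hours = rng.sample(population_hours, 1_000)
sample_average = average(sample_hours)

print(f"population average: {population_average:.2f}")
print(f"subset average:     {sample_average:.2f}")
```

In this toy setup the two averages land close to each other even though the subset is only 1% of the population, which is the intuition the rest of this example builds on.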
I'm just throwing out a number, so 100 million students. To get this statistic, the average number of hours, I have to reach out to each and every student. So let's say Scaler performs the survey: they reach out to every student and ask whether that student studies for 1 hour, 2 hours, 3 hours, 4 hours, or n hours. The Scaler team then collects the data, so we have the data for student number 1, student number 2, student number 3, and so on up to student number 100 million. Once we have this data, we take the average, and that is the output I'm looking for. But if we look at this procedure practically, it is close to impossible for any company or individual to reach out to this many people, or observations, or entities, whatever you call them. Even if it were possible, it would require a lot of time and a lot of money. So the practical problems are, number one, a lot of time investment, and similarly a lot of cost investment, and no company has that much to spare. What is the practical solution? It is very simple: we take a few students, which we call the sample. We will talk about how sampling is performed when we discuss inferential statistics. For now, let's say we have a simple random sample; I'll explain the term simple random sample properly later, so don't worry. Let's say I have taken some random students: 100 students from Uttar Pradesh, 100 from, let's say,
Andhra Pradesh, 100 from Tamil Nadu, 100 from the Delhi NCR region, and so on. Like this we have created a sample, and let's say this sample has around one lakh (100,000) students. Performing a survey on one lakh people seems quite practical, right? So let's say that after performing the survey we have the data with us: S1, S2, and so on up to S one lakh. Now we can take the average, and this is known as the sample average. I'll write mu, a Greek letter commonly used to represent an average, with an s for sample, and in brackets, sample average. Inferential statistics is a branch of statistics that deals with approximating the population average, or more generally a population parameter. Remember one thing: if I am talking about the average of the population, it is a population parameter; if I am talking about the average of a sample, it is a sample statistic. So there is a difference. When I say statistic, I'm talking about a sample, because it is usually impractical to measure anything on the whole population: we are not able to reach all the observations in it. If I ask you to find the average height of Indian men, it is impossible for any individual to reach each and every Indian male. That is my population, and if I could get the average height of all Indian males, that number would be the population parameter. Coming back to our point, inferential statistics is a branch of statistics that helps us perform approximations: using the sample statistic, I try to approximate the population parameter. Let me quickly write it here: using a sample statistic, we try to approximate the population parameter, and that is basically known
as inferential statistics. There are procedures and a few formulas for this, which we will talk about later, but this is the brief idea behind why we need inferential statistics. The last segment is hypothesis testing. As the name suggests, we are trying to test something: a claim or a statement. Let's take a very general example. You would have watched the Dettol advertisement, which says that Dettol soap kills 99.9% of germs. Now let's say you are suspicious about it: you want to test whether this claim made by the company has statistical significance or not. That is where hypothesis testing comes into the picture: we perform a test to check whether a particular claim or statement can be statistically supported. So these are the three major segments that we are going to discuss. Before we jump into any of them, we have to understand what kind of data we are working with, so we'll briefly talk about how to explain a dataset, by which I mean the various types of columns, or variables, we get in data. Let's quickly create some data again by extending the example from the last slide. We had student ID, name, marks, age, and city; then we can add grades, subject, and finally gender. Let's fill this table with a few data points. So let's say we
have student IDs 01, 02, 03, and 04, and names Sumit, Amit, and so on. Marks are, let's say, 89, 98, 95, and 62. Ages are 29, 28, 25, and 30. Cities are KNP, BNG, RNC, and TATA. Grades are A, B, C, and B. For subjects, let's use S1 for subject number one, so S1, S2, S1, S3. Gender is male, male, male, female. This is my dummy data. Now, are you able to observe any distinction among the kinds of variables in this data? Remember, these columns are also known as variables, so I may use the terms variables and columns interchangeably, and as I told you, each row is one piece of information. If I observe carefully, marks and age are numerical columns. I'm calling them numerical because the data these columns hold are numbers that lie between a minimum and a maximum: marks can be between 0 and 100, and age, let's say, between 25 and 35. City, grades, subject, and gender are all categorical columns: city has Kanpur, Bangalore, Ranchi, and Tata, which are all categories; grades A, B, C, D are categories; subjects S1, S2, S3 are categories; gender male and female are categories. Similarly, name is a categorical column, because each name can be treated as a category. The student ID is an identity column; it is neither categorical nor numerical. To better describe these columns, there are formal names for numerical and categorical columns. Variables are formally divided into qualitative and quantitative. Qualitative data means those columns that consist of non-numerical data.
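The split of the dummy table into numerical, categorical, and identity columns can be sketched in a few lines of Python. This is only an illustration under some assumptions: the table is stored as a list of dictionaries, the names "Student 3" and "Student 4" are placeholders for rows whose names are not clear in the video, and the identity column is marked by hand because code cannot tell an ID from a category on its own.

```python
# Dummy student table from the example. "Student 3" and "Student 4" are
# placeholder names, not values taken from the video.
students = [
    {"s_id": "01", "name": "Sumit", "marks": 89, "age": 29, "city": "KNP",
     "grade": "A", "subject": "S1", "gender": "male"},
    {"s_id": "02", "name": "Amit", "marks": 98, "age": 28, "city": "BNG",
     "grade": "B", "subject": "S2", "gender": "male"},
    {"s_id": "03", "name": "Student 3", "marks": 95, "age": 25, "city": "RNC",
     "grade": "C", "subject": "S1", "gender": "male"},
    {"s_id": "04", "name": "Student 4", "marks": 62, "age": 30, "city": "TATA",
     "grade": "B", "subject": "S3", "gender": "female"},
]

# Assumption: identity columns are listed by hand.
IDENTITY_COLUMNS = {"s_id"}


def column_kind(rows, column):
    """Label a column as identity, numerical, or categorical."""
    if column in IDENTITY_COLUMNS:
        return "identity"
    values = [row[column] for row in rows]
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


kinds = {column: column_kind(students, column) for column in students[0]}
for column, kind in kinds.items():
    print(f"{column:8} -> {kind}")
```

Here marks and age come out as numerical, the student ID as identity, and every other column as categorical, matching the discussion above.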
Quantitative data means those columns that consist of numerical data. Within qualitative there is a further distinction: nominal and ordinal. Nominal columns consist of categories that do not come with any kind of order among them. Take city as an example: say the column has the categories Kanpur, Delhi, Mumbai, and Kolkata. I cannot say that Kanpur is better than Delhi or Mumbai is better than Kolkata, because these are four distinct categories with no order among them. So this is a nominal categorical variable, or nominal qualitative variable. I'll write down the definition: categories that do not follow an order among them are known as nominal data. Ordinal data, as you would have guessed, are those categories that do follow an order among them. Grades are the best example: take A, B, C, D. We know that A is better than B, B is better than C, and C is better than D, so there is an order among these categories, and that makes it ordinal data. We can also talk about class: first class, second class, third class, fourth class, so there is an order. Or take educational qualification: high school, then intermediate, then graduation, then postgraduation. These four categories definitely have an order among them, so these are known as ordinal qualitative variables. Now quantitative: as the name suggests, these are the variables that can be
quantified. Quantitative means we can extract some quantity out of it, for example a minimum or a maximum, and we can perform addition and subtraction. We can apply mathematical operations on these columns, and the result will still make sense, because they are quantitative. Quantitative variables are further classified as discrete and continuous. Discrete variables are those quantitative variables whose values cannot be decimals; they can only take integer or natural-number values. For example, the number of students in a class: a class can have two students, three students, 100 students, or 200 students, but a class cannot have a fractional number of students. The number of citizens in a country is, again, always a natural number, which is countable and finite, so I'll quickly write that in brackets: countable and finite. Continuous variables are those which are uncountable and infinite. Let's take an example and it will make better sense. Take income. The income of a person can lie between a minimum and a maximum value; for example purposes, say the minimum income is 100 and the maximum is 1 million, and let's say these are in dollars. Between these two numbers there are infinitely many valid values. Can a person earn 100.15? Yes, that is a valid income. Can a person earn, say, 1,265.778? I'm taking more decimal places just to make you understand that this is also a possible value. So between 100 and 1 million I have endless possibilities, and each
possibility, or each observation, can exist. Similarly we have age. Let's say the minimum age is 27 and the maximum is 35. Can I have a person aged 27 years, 5 months, 6 days, 4 hours, 5 seconds, 6 milliseconds, and 7 nanoseconds? Yes, it is practically possible. At every nanosecond there can be an observation on this scale, and that is why age is also a continuous quantitative variable. If I go back to my example, both marks and age are continuous quantitative. Grades are qualitative and ordinal. Subjects are qualitative and nominal. Gender is qualitative nominal, because I cannot say male is better than female or female is better than male; these are just two categories. City is again qualitative nominal, so I'll just write Q hyphen N. Name is again qualitative nominal. And as I have already told you, student ID is an identity column; it is neither qualitative nor quantitative. This is how we describe our data in terms of the various types of variables. Now we are going to talk in detail about the measures of central tendency and the measures of variation. With this topic we begin the first branch of our discussion, which is descriptive statistics. As we discussed previously, descriptive statistics is all about describing the data in one number, or in a few statistics, in such a way that they describe the complete information. If I go back to my first slide, I had the data of some students, their name, marks, age, and city. If the principal of this school asks the teacher, tell me how your class looks, the teacher will not take the principal through each and every
piece of information, or every detail of every student. If I am one of the teachers, I would not tell my principal that I have a class with Sumit, whose age is 29 and whose marks are 100, and then another student named Ravi, who is from Bangalore and is 28 years old. As you can see, it would take a lot of time, and the principal would not get a complete picture of how my class looks. That is the reason we have these descriptive measures. As we discussed before, descriptive statistics measures are broadly classified into two buckets: measures of central tendency and measures of variation (you can also say measures of variance). Let's now deep dive into these two topics. What are measures of central tendency? If you read the name carefully, we are talking about something regarding the center: the center of the data, that is, of all my observations. Before we can compute any of these measures we need some data, so let's create some hypothetical data for a class. Say we have the ages of some 10 to 15 people, and let me throw some numbers on the screen: 24, 23, 26, 21, 26, 27, 28, 28, 26, 32, 26, 29, and finally 28. This is the data I have. Please remember that each data point here describes a particular student, so we have students number 1, 2, 3, and so on up to 13; in total there are 13 students in our class. Now, how can I explain the complete data in one number? Well, we would
try to extract the number that lies in the center of the data. Assume this pen is the center of the data: I know that below this point I have a few observations, or students, and above this point I have a few students, and that gives me a better idea of how my data looks. That is what a measure of central tendency is. The very first measure of central tendency is the mean, also known as the average, and many of you would have used it already. I hope you already know the formula; if you don't, it is the sum of all the observations divided by the number of observations. Mathematically, we can write it with sigma, a Greek letter used for summation: the summation over i of S_i, where S_i is the age of student number i, divided by n. If I expand this, it looks like (S1 + S2 + S3 + ... + S13) / n, and since we have 13 students, we divide by 13. I have already done the calculation to save some time, but I would suggest all of you keep a pen and paper with you while watching this video so that you can fully understand the concept, so please pause the video and calculate it yourself. The answer is 26.46. This is the average age. What does the average mean? Assume that I have a number line. On this number line, the point 26.46 lies around the center of the data. It might be exactly at the center or it might not, but it will be around the center. So assume that this is the particular data point which is
nothing but the average. The average can also be written as X with a bar over it, read as X bar; here we can call it S bar, because we have named this list S. So this number is my S bar. What can we understand from it? Below this point I have a few observations, and above this point I have a few observations. You already know that the class has 13 people, so we can conclude that, in the class of 13 students, a few students are above the age of 26 (I'm just taking 26) and a few are below it, and this 26 describes the center of my data. Now, it might be exactly the center or it might not. Why? Let's try to understand. Say I intentionally add one observation to this data: 40, or let's make it 50. So I have intentionally added a student whose ID is 14 and whose age is 50. Even in the original data of 13 students, you can see that almost all the students have ages in the 20s, from 21 up to 29, but there is one student whose age is 32. Such observations are known as outliers. Let me remove the extra data point I just added and write the ages in ascending order, which will also help us derive another measure of central tendency. So I'll write age equal to: the minimum number here is 21, so 21 first; we don't have 22, so next is 23; then we have one 24; we don't have 25; then we have 26 four times, so 26, 26, 26, 26; then one 27, and only one 27; then we
have 28, 28, 28, and finally one 29 and 32. All I have done is rewrite the ages of all 13 students in ascending order, where the first is the minimum age and the last is the maximum age. If you observe carefully, 32 appears to be very different from the majority of the data. This observation is known as an outlier. The formal definition: an outlier is an observation in the data that is very different from the majority of the data points. Let me write it down. Here the majority of data points are in the 20s, from 21 to 29, but suddenly there is a 32, and that is the outlier. Why are outliers problematic when we talk about descriptive statistics? To see that, let's look at the median. The median is another measure of central tendency; we have already talked about the mean. The median is exactly the center of the data: when I say exact center, I mean that 50% of the observations lie above it and 50% lie below it. How do we find the median? There is a simple formula: if the number of observations n is odd, the median is the observation at position (n + 1) / 2 in the sorted data. Here we have an odd number of observations, 13, so (13 + 1) / 2 is 14 / 2, which is 7. So my 7th observation is the middle point of the data. Let me quickly mark it: 1, 2, 3, 4, 5, 6, 7, this 26 is my center, and I'll write M to show that it is the middle point. If you count below this middle point, 1, 2, 3, 4, 5, 6, I have six observations, and above it, 1, 2, 3, 4, 5, 6, I'm
having six observations. So the median describes the exact center point of the data: this value M lies exactly in the center, in such a way that 50% of the observations are above M and 50% are below M. I cannot say the same thing for the average, and you can see why: the average is calculated by considering all the data. The formula is the sum over all the data points divided by the number of data points, and because of influential observations, which are known as outliers, my average can shift towards the outlier. As I have told you, the mean and the median both describe the central point of my data. But why is there a difference? Why is my average 26.46 while my median is 26? It is because of this outlier. Where is the outlier? It is on the higher end, so my mean has also shifted towards the higher end, towards the outlier. Please remember this; I'll write it down as a note: the average gets impacted by the presence of outliers and shifts towards the outlier. Assume this is my number line: if my outlier is on the lower side, my average will shift towards the lower side, and if my outlier is on the higher side, my average will shift towards the higher side. We have already seen this in our example: my median was 26, but my average came out to be
26.46, because my average has shifted towards the outlier. Then we have one more measure of central tendency, the mode, which is very simple to understand: the mode is the observation with the highest occurrence in the data. Let me write the definition here: observation with maximum frequency. If you observe this data carefully, the observation 26 has appeared four times, so the frequency is highest for 26, and hence the mode of this data is 26. There might be cases where you do not have a single mode. For example, say there are four students with the age 26 and also four students with the age 28. In that case there is a tie, and you will simply write no mode (such data can also be called bimodal) and use the mean and median to describe the center of the data. So I hope you are able to understand the measures of central tendency. As the name says, all three measures, mean, median, and mode, describe the center of the data. If your data does not come with outliers, either on the higher side or on the lower side, these three measures will fall very close to each other, so your mean, median, and mode will be approximately equal. Here, with the outlier 32, my average is 26.46; without it I would have got a value even closer to 26. My median is 26 and my mode is 26, so in this case I can say that my data has a center of 26. Now, if I go to my principal and if I
tell my principal that the average age of my class is 26, with 13 students, it gives the principal a rough idea that we have a center point of 26, with a few students above this point and a few below it. If I tell my principal that the median is 26 with 13 students in total, then the principal can do a little bit of math and understand that, in sorted order, there are six students on either side of the middle one. So I hope you are clear on measures of central tendency. But there is a problem: I am still not clear how the values vary. I know that the center is 26, but how much variation is there in the data? Are the ages of the students very varied? When I say varied, I mean the values are very far away from the center. Let me give you a very simple example. Say we have two number lines, and in both the center is 26. On the number line on the left of my screen, the minimum is 15 and the maximum is 30; on the other, the minimum is 20 and the maximum is, let's take, 29. In the first, the values are very far from the center, so there is high variation, while in the second the values are not so far from the center, so the variation is low. Measures of variation try to quantify this: given a particular dataset, how much are the values scattered around the center? This is what we are going to learn in measures of variation, which is our next topic. I gave you a very short example in my last slide, but now let's take a more insightful example that will clarify what exactly measures of variation are. Let's take two subjects, and I'm just plotting these two subjects
to give you a visual picture of what I'm trying to explain. Assume that the x-axis represents the student ID for both plots and the y-axis represents the marks. Let's say the average marks are represented by this red dotted line; for example, you can assume that the average is 50 for both subjects. I'm not naming any subjects, just calling them subject A and subject B. For subject A, the student marks are scattered like this: this point represents the marks of student number 1, this one student number 2, then student number 3, and so on. So this is the scatter plot for subject A. For subject B, this point represents the marks of student number 1, this one student number 2, then 3, 4, 5, 6, 7, 8, 9, and 10. So in both A and B we have 10 students. But for subject A you can see that all the students have scored marks very close to the average: if I draw a dotted band around the average line, most of the students fall within this band. For subject B you can see that there is a lot of scatter: there are data points that are very far away from the average line, and also some that are close to it. In subject A, all the data points are closely scattered around the average line. So can I say that for subject A the variation around the mean is low? The red dotted line is my mean, so M represents the mean, or the
average, line. So for subject A the variation of the marks around the mean is low, and for subject B the variation of the marks around the average line is high. Measures of variation try to quantify this. How can I say this is low and that is high? We need a number, some quantity that can describe this scatter, or variation. So measures of variation are those metrics that describe the variation of the data around the center, or around the average line. Let me write that down: metrics that describe the variation of the data around the center of the data. What measures do we have? I think we have exactly four, so let me write them down: range, variance, standard deviation, and finally the coefficient of variation. Let's talk about each of them. To do that we need data, so let's create some random data: the marks of students in one particular subject. I'm intentionally drawing this table a bit large, because we will be calculating a few intermediate quantities to arrive at the value we are looking for. Here we have the student ID, which runs from 1 to 7, and then the marks, which I will write as XM, meaning marks of students. Let's say the marks are 27, 52, 48, 33, 39, 65, and 82. Now, if I have to calculate the range, the range is nothing but the maximum minus the minimum. It gives you a very rough idea of the range within which we are working, or, if we are talking
about the marks, what the minimum mark scored is and what the maximum mark scored is. The range is nothing but the difference between the two, max minus min. My max is 82 and my minimum is 27, so the range is 82 - 27 = 55.

The second measure is variance, which is a better measure of the variation in the data. What is the problem with the range? The range is influenced by outliers. Let's say there is one student in the class who has scored 100, while most of the students have scored close to 33, say 33, 35, 39 or 40. In that case the range would be very high, and it would give you the misleading idea that the students' marks are spread over a wide range, which is wrong. That is the reason we don't rely on the range to understand how the data is scattered around the center. So let me write it down: the range is a metric that gets influenced by the presence of outliers.

Now, how do we arrive at the variance? We first find the center, because, as I told you, variance is all about measuring the variation of the data from the center point, which is the mean or the average. If you want, you can pause this video and quickly calculate the average, because we have already learned how: take the sum of all the observations and divide by 7, because 7 is the total number of students in the class. The average of these marks comes out to be about 49.43. For simplicity, and to keep all the calculations clean, let's take this number as 49 and assume that the center is 49.

Now that we have the center, let me quickly draw the same graph that we saw on the last slide. This axis is the student ID, these are my marks, and the center is at 49, somewhere near the middle. Let's mark all the numbers. 27 lies somewhere around here, so this is student number one. Then we have 52, student number two. Then 48, which is very close to the average, then 33 somewhere down here, then 39, then 65, above the average line, and finally 82, again above the average line. These are my seven data points.

Now we will calculate the distance of each data point from the center, which is the average. When I say distance, I mean this gap: this is d1, this is d2, and similarly this one is d6 and this one is d7. The distance is the data point minus the average, x_m minus x_m bar. I hope you remember that if I put a bar on a variable, it becomes the average, so x_m bar is my average marks, which is 49. So d1 is 27 - 49, and then we have 52 - 49, 48 - 49, 33 - 49, 39 - 49, 65 - 49 and finally 82 - 49.

Now, if I ask you what the average distance from the center is, that basically means how much, on average, a data point deviates from the center. To find an average I have to take the sum of all these values, right? The problem is that a few of the values are above the average, which results in positive values, and a few are below the average, which results in negative values. If I simply take the sum of all these values, the positive
will cancel out the negative, and we would get a very wrong value. Because the positive and negative distances balance each other around the center, the sum will come out close to zero, which would suggest that there is no variation, and that is wrong.

To overcome this problem, I will take the distance squared. Squaring converts both the positive and the negative deviations into positive numbers, so we get rid of the signs. Let me quickly write down what we are left with, because I have already calculated it. The distances are -22, +3, -1, -16, -10, +16 and +33. Here you can see that I have -16 and +16; if I simply take the sum over all the d values, this -16 will cancel the +16, and I will lose the contribution of these two numbers. Similarly, I have -22 and +33, -1 and +3. Some values are positive, some are negative, and we cannot simply take the sum.

So we take the distance squared. 22 squared is 484, and the rest are 9, 1, 256, 100, again 256, and 1089. Now we will take the average of the squared distances.

You can ask me, "Sumit, why have you taken the square? Why haven't you taken the absolute value?" I could simply have taken -22 as 22 and -16 as 16. Please remember that when we work with statistics we also use a lot of calculus. I cannot show you right now where calculus comes in, but as we proceed you will see that statistics depends heavily on it. When we talk about functions, a squared term is a polynomial and can be differentiated everywhere, while the absolute value has a sharp corner at zero, so your
calculus does not work as cleanly with the absolute value, and that is the reason we will not be using the absolute term and will use the squared term instead.

So we have to take the average of d squared, and you already know the formula: the summation of d squared divided by n. If I take the sum of all the observations in the d squared column and divide by 7, I get not the average distance but the average of d squared. This is 2195 / 7, which is about 313.57. This value is known as the variance.

The problem with variance is that you practically cannot make sense of it, because it is in squared units: it is the average of d squared. Variance is also written as sigma squared, where sigma is the Greek letter. So my sigma will be the square root of 313.57, which gives us about 17.71. Sigma squared is my variance, while sigma is my standard deviation.

The standard deviation tells you roughly how far, on average, the data points deviate from the center. Here, on average, the data points are about 17.71 marks away from the mean. If this value is very low, it basically means the values are closely concentrated around the center. For example, if I go back to the two subjects, for subject B you will get a high value of sigma, while for subject A you will get a low value of sigma. So if I call them sigma A and sigma B, then sigma A would be less than sigma B: the standard deviation in
subject A would be less than the standard deviation in subject B. When I say the standard deviation is low, that basically means most of the observations are scattered very close to the mean; when I say the standard deviation is high, most of the observations are scattered far away from the mean. This gives you a better idea of what your data looks like.

If you want to practice, recall that we have already calculated the mean, median and mode of the age variable, where we had 13 students and their respective ages. For practice, you can quickly calculate the standard deviation of that data and then try to interpret it, as if you had to explain to the principal, or to any other person, what this class looks like. We have already seen that the median of this class is 26, which basically means 50% of the students are below the age of 26 and 50% are above it. Now suppose you get a standard deviation of, let's say, 5 years. I'm just throwing out a value; I have not calculated it, and I'm leaving that task to you. If the standard deviation of this data is 5 years, that basically means that, on average, the students are about 5 years away from the central point, which is 26.

Let me quickly draw it. This is my student ID, this is the age, and this is my center, which is 26. With a standard deviation of 5 years, 26 + 5 gives you 31 and 26 - 5 gives you 21, so you can say that most of the students in your data are between 21 and 31.

So using the measures of central tendency we get the central point, and using the measures of variation we get how the data is
scattered around the central point. But there is a problem. Let's say I have two subjects: maths, with a standard deviation of, let's say, 15 marks, and another subject, science, with a standard deviation of, let's say, 20. Can I compare these two standard deviations? Can I say this one is high and this one is low? Well, we cannot do that using the standard deviation alone. Please remember this very important note: the standard deviation talks about the variation within an individual series, and we cannot use it to compare the variation among series.

So how can we compare the variation among series? For this we have the next measure, which is the coefficient of variation.

Let's take one example with two series. Series X has three data points, 1, 2 and 3. Series Y again has three data points, 101, 102 and 103. The average of X comes out to be 2, and the average of Y comes out to be 102. The standard deviation of X is about 0.82, and the standard deviation of Y is also about 0.82. Both X and Y have the same standard deviation, but can I say that they have the same variation? If you observe closely, you will find that this is not true. In series X you can
see that the third element, which is 3, is three times the first, while in series Y I cannot say that the third element is three times the first. So X and Y do not have the same variation. How can we compare them, and how can I say that X has more variation and Y has less, when the standard deviation gives us the same value, about 0.82, for both?

For this we have the coefficient of variation, which is nothing but the ratio of the standard deviation to the mean of the data. The coefficient of variation for series X would be 0.82 / 2, and for Y it would be 0.82 / 102, so this would be about 0.41 and this would be about 0.008. Series X has a coefficient of variation of 0.41, while series Y has a coefficient of variation of 0.008, and I have already told you that series X has more variation than series Y, because in X the third element is three times the first, while I cannot say the same thing for series Y.

So when we have to compare the variation among two or more series, we look at the coefficient of variation. A higher value represents higher variation, so series X has higher variation when compared with series Y. Using the coefficient of variation we can compare the variation among series, and we should not use the standard deviation for that. I hope this makes sense.

Now, here I have taken one simple example in which we will use all the measures, the measures of central tendency and the measures of variation, to understand how these three products are performing in terms of sales. Please remember that these are numbers of units: for product number one, 929 units were sold in month one; similarly, for product two, 1240 units were sold in month one, and so on.
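Before working through the sales table, the four measures of variation discussed so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the population formulas (dividing by n, as in the marks example), it reuses the rounded center of 49 for the marks, and the helper names `mean`, `population_variance` and `coefficient_of_variation` are made up for this sketch rather than taken from any library.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

def population_variance(values, center=None):
    """Average squared distance from the center (the mean unless one is given)."""
    c = mean(values) if center is None else center
    return sum((x - c) ** 2 for x in values) / len(values)

def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return sqrt(population_variance(values)) / mean(values)

# Marks of the seven students from the worked example.
marks = [27, 52, 48, 33, 39, 65, 82]

value_range = max(marks) - min(marks)          # 82 - 27
variance_49 = population_variance(marks, 49)   # uses the rounded center 49
sigma_49 = sqrt(variance_49)                   # standard deviation

# Two series with the same standard deviation but different relative variation.
series_x = [1, 2, 3]
series_y = [101, 102, 103]
cv_x = coefficient_of_variation(series_x)
cv_y = coefficient_of_variation(series_y)

print(f"range={value_range}, variance={variance_49:.2f}, sigma={sigma_49:.2f}")
print(f"CV of X={cv_x:.3f}, CV of Y={cv_y:.3f}")
```

Dividing by n - 1 instead of n (the sample formulas) would change the variance and standard deviation slightly, but series X would still show a far larger coefficient of variation than series Y.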
So we can quickly find the average, median and mode for each of them. I type =AVERAGE over this particular column, and I can simply extend this formula to all the other cells. Similarly, the median will be =MEDIAN over all these cells, and I can extend this to the other cells too. Then we have the mode, so =MODE. Here, if you observe carefully, I do not have any mode. As I told you previously, if no observation is repeated more than once, so that none has the highest frequency in the data, we will not have a mode. So we only have the average and the median, and I'll simply delete the mode row.

Now, if you compare the average and the median, you can see that for the sales of product number two they are quite close to each other, and the same can be seen for product number three. But there is a difference, I would not say vast, but quite a noticeable difference, between the average and the median for product number one.

Let's look at the measures of variation as well. We will calculate the standard deviation with =STDEV, and I can simply extend this to the other columns. Then we will calculate the coefficient of variation, which is given by the standard deviation divided by the average.

Now these are the numbers we have to read carefully and make sense of. What can we conclude about these three products? I will start with product number two. You can see that its standard deviation is very small and its coefficient of variation is also very small: here the coefficient of variation is 0.23, here it is 0.22, but here it is 0.04, which basically means that among these three series the lowest variation is in the sales of product number two. Also, if you look carefully, its median and average are very close to each other, and its average is the highest among the three products. So what can I conclude about product number two? It is very stable in the market, because it does not vary much: approximately the same sales in month one, month two, month three and so on. It is also consistently high, and that is the reason its average is the highest, at 1262.5 units.

For product number three a similar behavior can be seen, but on the lower side. Its average is 556, which basically means that, compared with products one and two, the sales of product number three are the lowest. The coefficients of variation of product three and product one are approximately the same, which basically means they are both varying at about the same rate. So you can think of it this way: product number three sits at the lowest level and varies around it, product number one sits at the second lowest level and varies around that, and product number two is at the highest.

If I color code them, I would mark product number three as red, product one as yellow and product two as green. Product two is green because it is the most stable product in the market; product three is red because its sales are on the lower side; and product one is yellow because its sales lie between the highest and the lowest, that is, between product two and product three. You can see that using these measures we can draw better conclusions about
the data and explain it better, and that is the power of descriptive statistics.

So in this descriptive statistics segment we have talked about the various types of variables, qualitative and quantitative. Then we discussed the measures of central tendency: mean, median and mode, and we compared them. Then we talked about the measures of variation, where we looked at range, variance, standard deviation and coefficient of variation. Finally, we brought all our learning together in this table, where we compared the sales of three products to conclude which product is doing better and which is not. This is the application of descriptive statistics.

There is one small topic related to outliers that I'll quickly explain. It is not so common in the descriptive world, but it is quite important to understand. I have told you about outliers: in the series where we had the ages, 32 was considered an outlier. What is an outlier? An outlier is an observation that is very different from the majority of the data. Now there is one particular topic known as the leverage point, or leverage data point. What is that?

Let's take one very small example. Let's say I have this data: here we have the day of the month, and here the price of a particular stock. On day one the stock price was somewhere around here, then here, then here, and so on, and one day it jumped up to here, then came back to its normal position. If I give you only this data, all of you will say that this point is an outlier, which is true: this is an
outlier. But as soon as I give you one additional piece of information, your answer will change. Let's say this is day number 25, so the 25th of March 2023, and let's say this is the stock price of the company Reliance. Suppose that on the 25th of March 2023 Reliance held a conference in which they announced a major change in their policy or their product, due to which the stock price suddenly rose to this point. Now this is not an outlier; this is a leverage data point, because it has been leveraged by this event.

So there is a difference between an outlier and a leverage data point. If I go back to the age example, I cannot say that point is a leverage data point, because the age of that person has not been impacted by any external factor; it is just an outlier. But in this case, where we are considering the price of a particular stock, let's say Reliance, this data point is a leverage data point, because the jump is due to a particular external factor, which is the announcement. I hope you are clear up to this point.

Now we will move on to the next discussion, which is probability. We started with descriptive statistics, and we have to talk
about inferential statistics and hypothesis testing, but before we can move on to those, we first have to understand probability. You must have learned about it in your 10th standard or in your college days, but today we will go through the most important topics within probability, the ones that will help you understand inferential statistics and hypothesis testing.

Most of you are familiar with the term probability. Before we talk about it, we have to understand what a random experiment is. A random experiment is an experiment in which you are not certain about the outcome. For example, if I toss a coin, I am not certain whether I will get a head or a tail. If you are certain about the outcome, then you don't need probability: if I already know that the coin is going to land heads, I don't need probability, because probability is all about finding the chance of a particular event. So let me note it down: a random experiment is an experiment for which I am not certain about the outcome.

Another example is the roll of a die. If I roll a die, I know that I may get 1, 2, 3, 4, 5 or 6. But am I certain that I will get a one? No. Am I certain that I will get a two? No. I may get any outcome from 1 to 6, so rolling a die is a random experiment, because I am not certain what my outcome is going to be.

Similarly, take my birthday. You are not certain about my birth date. But there is a difference if I talk about my mother: she is very much certain about my birth date. So for my mother, my birth date is not a random experiment, but
for all of you listening, my birth date is a random experiment, because it can be anything: any year, any month, any day. So sometimes the uncertainty in a random experiment comes from a lack of information. If you know me, if you are my friend, then you might be quite certain that my date of birth is in this year, this month and on this date. So my birth date is not a random experiment for my mother, but for all of you it is. These are all examples of random experiments: the roll of a die, the toss of a coin, and so on.

Now, what is a sample space? As I was saying, when you roll a die you may get any outcome from 1 to 6. The set of all the possible outcomes of an experiment is known as the sample space. If I take the roll of a die, I may get 1, 2, 3, 4, 5 or 6, so this is my sample space for rolling a die. For the toss of a coin, if the coin is a fair coin with a head on one side and a tail on the other, I may get either a head or a tail, so this is my sample space for the toss of a coin. If I talk about my birth date, let me limit the sample space to 1996: if you know that the birth year is 1996, then the sample space will be all the months and, within each month, all the days, so your sample space will have quite a large number of outcomes. That is what a sample space is.

Now, what is probability? The probability of a particular event X is basically how many times that event has
occurred in your sample space, that is, the number of outcomes in the sample space that belong to the event, divided by the total size of the sample space, the total number of outcomes.

Please remember one thing. This is the accurate definition, the number of times a particular event occurs divided by the total size of the sample space, but there may be various ways of writing a sample space. Let's quickly understand the difference, because this is the major mistake most students commit while calculating probability.

In this sample space for the die, all the outcomes are equally likely. What is the meaning of equally likely? If you already know, you can pause the video and comment on it, but equally likely basically means that whichever outcome I choose, it has an equal chance of occurring. For example, what is the probability that I get the number one if I roll a die? It is simply 1/6. What is the probability that I get a two? Again 1/6, and the same for three, four, five and six. So all the outcomes in this sample space are equally likely, because each of them has a probability of 1/6. Similarly, the sample space for the coin is equally likely: the probability of getting a head is 1/2 and the probability of getting a tail is again 1/2.

But now let's take an example where I'm tossing two coins. My drawing is really good, so I'll use my skill here: these are my two coins. What are the possible outcomes? I might get head-head, head-tail, tail-head or tail-tail. Now, what is the probability that I get head and head, that is, that both
coins land with the head face up? This is 1/4, because I have four outcomes in total and only one of them is head-head. So the probability of head-head is 1/4, and likewise this is 1/4, this is 1/4 and this is 1/4. This sample space is also equally likely.

Can I write the sample space in another way? Yes, we can. Let's say my sample space is now the number of heads, so I can have zero heads, one head or two heads. This is also a sample space, but there is a difference: this sample space is not equally likely. Remember that here each number represents the number of heads. The probability of getting zero heads is 1/4. The probability of getting one head is this plus this, because it can be either head-tail or tail-head, so 1/4 + 1/4, which is 1/2. And the probability of getting two heads is again 1/4. So here you can see that the outcomes are not equally likely. Whenever you are calculating a probability, you have to check whether your sample space is equally likely; if it is not, you have to calculate your probability carefully. I hope this is making sense.

So I hope every one of you is clear on the ideas of random experiment and sample space. Before we get to the rules of probability, there is one more term, known as an event. What is an event? Any possible outcome, or group of outcomes, of a random experiment, that is, of a sample space, is known as an event. For example, if my experiment is the toss of a coin, I may get a head or a tail; getting a head is an event, and getting a tail is an event. Similarly, if I'm tossing two coins, I may get head-head, head-tail, tail-head or tail-tail; in this case getting two heads is an event, getting one head is an event and getting no head is an event. So an event is nothing but a particular outcome or
group of outcomes out of a sample space. This is my event.

So what have we talked about so far? Number one, the random experiment. Number two, the sample space and the two ways in which we can write it: one is the equally likely sample space, the other is not equally likely. The third topic was the event.

Now let's talk about the rules of probability. When you use probability as a tool to calculate the chance of a particular event, there are some rules that we have to keep in mind. They are also known as the axioms of probability; I'll simply write them as the rules of probability.

The very first rule is that, given a sample space, the probabilities of all the outcomes always add up to one: the total probability is always equal to one. Let's try to understand it. Let's say I'm given the sample space for the roll of a die, so I may get 1, 2, 3, 4, 5 or 6. We know that the probability of getting a 1 is 1/6, the probability of getting a 2 is again 1/6, the probability of a 3 is again 1/6, and the same for 4, 5 and 6. Now, if I take the sum P(1) + P(2) + P(3) + P(4) + P(5) + P(6) and do the quick math, it is equal to 1. That is my first rule: the summation of P(i), where i runs over my sample space, is always equal to one.

Let me write it down formally, so that you have all these notes; whenever you prepare for an interview, reading through them will make it much easier to clear your statistics round. The sum of all the probabilities
over the sample space is always equal to one. This is my first rule.

Now let's understand the second rule, which is known as the rule of sum, or the addition rule of probability. It says that the probability of A union B is equal to the probability of A plus the probability of B, if A and B are disjoint events. So there is a condition: A and B must be disjoint. Don't worry, I'm not speaking German; this is very simple to understand.

Let's take one simple example. Let's say I'm tossing two coins. When you toss two coins you may get two heads, two tails, a head then a tail, or a tail then a head. Now consider the event that both coins land with different outcomes: if one is a head the other is a tail, and if one is a tail the other is a head. We already know that there are only two such possibilities, so my event is head-tail or tail-head. Let's call this event A. We have another event in which both coins land with the same outcome: if coin one is a head, coin two is also a head, and if coin one is a tail, coin two is also a tail. That is head-head or tail-tail, and this is my event B.

Now, if I take the union of these two sets, then according to the rule its probability is the sum of the individual probabilities of the events. What is the probability of event A? If you have understood, you can answer this in the comment section. There are two outcomes that match my condition, head-tail and tail-head, and there are four outcomes in total in my sample space, so the probability is 2/4, which is simply 1/2. Similarly, the
the probability of B is also 2/4, which is 1/2. Now, what is the union of these two events? A ∪ B means I combine all the outcomes of A and B. Since they have no common element, my union is simply {head-tail, tail-head, head-head, tail-tail}. If I ask you for P(A ∪ B), you will simply say it is 4/4, which is 1. And if I add P(A) + P(B), that is 1/2 + 1/2, which is also 1. Here event A and event B are disjoint. Why am I calling them disjoint? Because they do not have any common outcome: in event A I have head-tail and tail-head, in event B I have head-head and tail-tail. So when two events A and B are disjoint, the probability of the union is always equal to the sum of the individual probabilities, and this is what we were trying to show. Let me quickly highlight this: that was my first rule, and this is my second rule, which applies only when the two events are disjoint. Simple. Now let's understand the third rule, which is closely associated with the second. We showed that P(A ∪ B) = P(A) + P(B) if A and B are disjoint. If they are not disjoint, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Again you may be thinking this is German, but it is not. To understand it in a better way, we will take the example of playing cards.
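
Before the card example, the first two rules can be checked on the two-coin experiment by simply listing the outcomes. This is only a minimal Python sketch, assuming an equally likely sample space; the names `sample_space`, `prob`, `A` and `B` are illustrative choices, not anything defined in the video.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins: HH, HT, TH, TT (assumed equally likely).
sample_space = set(product("HT", repeat=2))

def prob(event):
    """Probability of an event, i.e. a subset of the equally likely sample space."""
    return Fraction(len(event), len(sample_space))

# Event A: the two coins show different faces. Event B: they show the same face.
A = {o for o in sample_space if o[0] != o[1]}
B = {o for o in sample_space if o[0] == o[1]}

# Rule 1: the probabilities of all single outcomes add up to one.
total = sum(prob({o}) for o in sample_space)

# Rule 2: A and B share no outcome, so P(A or B) should equal P(A) + P(B).
disjoint = A.isdisjoint(B)
p_union = prob(A | B)
p_sum = prob(A) + prob(B)

print(total, disjoint, p_union, p_sum)
```

Using `Fraction` keeps the arithmetic exact, so the comparison between the union and the sum is not affected by floating-point rounding.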
Playing cards: I hope most of you are familiar with them, but if you are not, don't worry. Let's write down the complete details: how many cards there are, what the suits are, and so on. A deck of playing cards comes with 52 cards in four suits: club, spade, heart and diamond. Club and spade are the two black suits, while heart and diamond are the red suits. Each suit has 13 cards: club 13, spade 13, heart 13 and diamond 13. Which 13? Each suit starts with the ace, then 2, 3, 4, 5, 6, 7, 8, 9, 10, and then three cards known as the jack, queen and king. This is the complete structure of a deck of playing cards. Now let's calculate the probability of some events. What is the probability of getting a jack? A jack can come from any of the suits: the jack of clubs, which I'll write as CJ, the jack of spades, the jack of hearts or the jack of diamonds. These are the four favorable outcomes, while the total number of outcomes in my sample space is 52, because each card is an outcome. So the probability of getting a jack is 4/52. What is the probability of getting a black card? Actually, let's simplify the question so that I can demonstrate the rule we are trying to prove: what is the probability of getting a heart? We know that there are 13 hearts in total.
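
The counts quoted for the deck (52 cards, 4 jacks, 13 hearts) can be reproduced by building the deck explicitly. This is a rough sketch only; the suit and rank labels in `SUITS` and `RANKS` are hypothetical spellings chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels for the 4 suits and 13 ranks described above.
SUITS = ["club", "spade", "heart", "diamond"]
RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]

# Every (rank, suit) pair is one card, i.e. one outcome of the sample space.
deck = list(product(RANKS, SUITS))

jacks = [card for card in deck if card[0] == "J"]
hearts = [card for card in deck if card[1] == "heart"]

# Favorable outcomes divided by total outcomes.
# Fraction stores 4/52 in lowest terms, so it displays as 1/13.
p_jack = Fraction(len(jacks), len(deck))

print(len(deck), len(jacks), len(hearts), p_jack)
```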
There are 13 hearts in total because heart is itself a suit, and there are 52 cards in total, so the probability of getting a heart is 13/52. What is the probability of getting a jack and a heart? Most of you will have calculated the answer already, but before we jump to it, let's note down all the outcomes that match this condition, so that we can check whether we are going in the right direction. Jack and heart means the card needs to be a jack and it also needs to be a heart. If you look closely, only one card satisfies both conditions: the jack of hearts. So the probability is 1/52. Now what is the probability of jack or heart? 'Or' means the union of the two events. Before we calculate this probability, let's write down all the outcomes. The hearts are: heart of ace, heart of one, heart of two, heart of three, 4, 5, 6, 7, 8, 9, 10, then heart of jack, heart of queen and heart of king. These are all the cards that belong to the suit of hearts. Now let's write down all the jacks: the jack of hearts, HJ; the jack of clubs, CJ; the jack of spades, SJ; and then DJ. Let's call the hearts event A and the jacks event B. A ∪ B means I have to create a set which has all the outcomes from A and all the outcomes from B, but if an outcome appears twice, I pick it only once. If I combine them, you will find that there is one outcome common to these two sets. Let me quickly circle it: the heart of jack. This card is present in both sets, and if I have to combine these two
sets, I have to count this particular card only once. So the combined list is: heart of ace, heart of one, heart of two, heart of three, 4, 5, 6, 7, 8, 9, 10, then heart of jack, which I have now counted once, then heart of queen and heart of king, and then the three remaining cards CJ, SJ and DJ. How many outcomes do I have here? Counting 1, 2, 3 and so on, I get 17. But there is a mistake: there is no H1. After the ace the cards start from H2, so I should not have written it. With that corrected, my count is 16, and the probability of jack or heart is simply 16/52. Now let's go with the formula. The probability of getting a heart is 13/52, counting the 13 hearts, and the probability of getting a jack is 4/52. If I add them without subtracting the common outcome, I will be counting the jack of hearts twice, which is wrong, because logically I have only one such card. That is the reason why, when combining them, I write P(A) + P(B) minus the probability of the common outcomes of these two sets, P(A ∩ B). Now, P(A ∩ B) can only be zero when the two sets are disjoint, and then I am back to rule number two: if A and B are disjoint, P(A ∩ B) is zero. When A and B are joint, which means they have a common outcome, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 13/52 + 4/52 − 1/52, because I have one card in common, as calculated above. That is 17 − 1 = 16 over 52, so 16/52. This is my third rule of probability, and I hope every one of you is clear with it. Now let's look at one final rule, which says that P(A) plus the probability of the complement of A is always equal to 1. Let's try to
understand that. The fourth and final rule is P(A) + P(A complement) = 1. What does this mean? We have already seen it, but let's go through it again. Take the random experiment where I'm tossing two coins. I hope all of you can now build the sample space: head-head, head-tail, tail-head and tail-tail. This is an equally likely sample space, because each outcome has the same probability. Now let's say event A represents both coins landing on the same outcome. What is the complement of this? The complement is the opposite of the statement: both coins do not land on the same outcome. What are the outcomes in event A? Head-head and tail-tail. The complement of this is head-tail and tail-head. The probability of A is 1/2 and the probability of A complement is also 1/2, so P(A) + P(A complement) is equal to 1. This is a very important rule, because sometimes you need the probability of an event and you only know the probability of its complement; then you can simply use this rule to arrive at the probability you are looking for. I hope all of you are now clear with all four rules of probability. The first is that the sum of all the probabilities over the sample space is always equal to one. The second is that P(A ∪ B) = P(A) + P(B) if A and B are disjoint. If they are not
disjoint, then you have to subtract P(A ∩ B). And the final one is that P(A) + P(A complement) is always equal to one. Now let's try to understand the various types of events. Two events can be dependent or independent. These terms may be familiar to you, but let's understand them with an example. First we will look at independent events. When I say events, I'm talking about two or more events. So let's take two events. I'm tossing a coin; again, excuse my drawing skills. Let's say this is a fair coin, and my experiment is to toss it two times. When I toss it for the first time, let's say heads appears, so I'll write down that the outcome of the first toss is a head. Now I toss the same coin a second time. Remember, the coin is the same. My question to all of you is: is it more likely that we get a tail on the second toss since the first outcome was a head? All of you would say no. Why? Because on the next toss, head and tail are again equally likely. When I toss the coin for the first time, I may get either a head or a tail, and the probabilities of head and tail are 0.5 and 0.5, which is 1/2 and 1/2. If I toss the same coin a second time, the probabilities of head and tail are again 1/2 and 1/2, which is 0.5 and 0.5. What I'm trying to say is that these two tosses are independent of each other: the probability in the second toss is not influenced by the outcome of the first toss. If my outcome in the first toss is a head, the second toss can still be a head or a tail. So first of all, the answer to this question is no; I'll write a big NO over here. And the rule is: the outcome of the first event does not affect the outcome or the probability of the second event, and hence they are independent events.

In probability theory there is a topic known as the gambler's fallacy. Let's say the game is to toss a coin 10 times, and for the first five tosses the gambler has got all tails: tail, tail, tail, tail and tail. The gambler's fallacy is that he will now think the next five tosses should all be heads: head, head, head, head and head. Why? Because, he reasons, the proportions of head and tail should always come out as 0.5 and 0.5. No, that's wrong. The observed proportions approach 0.5 and 0.5 only when you toss the coin a very large number of times; that is known as the law of large numbers. If I keep tossing the coin, say 10,000 times, you will find that approximately 5,000 times you obtained a head and approximately 5,000 times a tail. But if you toss the coin two times and expect that you will definitely get a head the first time and a tail the second time, that is wrong; that is the gambler's fallacy. I hope you will not fall into the gambler's fallacy, and I hope you are clear about independent events. When two events are independent, P(A and B) is simply P(A) multiplied by P(B). For example, the probability of getting a head in the first toss and a head in the second toss can be written as the probability of a head in the first toss multiplied by the probability of a head in the second toss, which is 1/2 × 1/2 = 1/4. I hope you are clear.

Let me quickly show you a demonstration that only when you toss a coin a very large number of times do the observed proportions of head and tail come close to 0.5 and 0.5. There is a very cool website, seeing-theory.brown.edu, which is an open-source website. It shows the true probabilities: the probability of a head is 0.5 and of a tail is 0.5. Now I can flip the coin. I flip it once and obtain a head, so my observed probability of a head is 1/1. I flip it once more and obtain a tail, so it is 1/2 and 1/2. One more flip gives a head, then again a head, then a tail, a tail, and a head. Now if I flip it 100 times, out of 107 flips heads came up 57 times and tails came up 50 times. Let me do another hundred: now you can see that the observed probability of a tail is higher than that of a head. To reach about a thousand flips, I have to hit this button eight more times: 1, 2, 3, 4, 5, 6, 7 and 8. Now just look at this: 0.49 and 0.51, very close to 0.5 and 0.5. So this is the law of large numbers: if I perform the experiment a very large number of times, my observed probability will be close to the true probability, which is 0.5 and 0.5. I hope this makes sense to all of you.

So now let's talk about dependent events. We will take a very simple example. Let's say I have a bag, and this bag has two orange balls and some white balls: one, two, three white balls. So in total I have five balls, of which two are orange and three are white. Let me write it down: two orange and three white. The experiment is: we take a ball out of the bag and note down its color, then we take one more ball out of the bag and note down the color of that ball too. Now please carefully
remember that this experiment is without replacement. Let me quickly demonstrate it. Let's say this is my bag, with two orange balls and three white balls. What is the experiment? I take a ball out of the bag; this one is orange, so I note down the color, orange, and keep the ball outside. Then I again take a ball out of the bag; this one is white, so I note down the color and keep that ball outside too. This is my experiment, and I hope everyone is clear. Now my question: we have to calculate the probability that the first ball is orange and the second ball is white. Let's solve this problem step by step. When I take out the first ball, the bag has three white and two orange balls. What is the probability of getting an orange ball? It is simply 2/5, because I have two orange balls and five balls in total. Now I have taken this ball out of the bag, so the bag is left with one orange and three white balls. What is the probability that my second ball is white? There is a condition given to us: this symbol, the vertical bar, means 'given that'. Given that the first ball was orange, what is the probability that my second ball is white? It is 3/4, because we have three white balls and four balls in total. Now, what is the probability of getting the first ball as orange and
the second ball as white? This is simply the probability of the first ball being orange multiplied by the probability of the second ball being white given that the first ball was orange: 2/5 × 3/4 = 6/20, which is 3/10. So the probability of getting the first ball as orange and the second ball as white is 3/10. If you look at the experiment carefully, the probability for the second ball depends on the outcome of the first trial. So here is a note: the probability for the second ball is dependent on the outcome of the first trial; 'trial' is the correct word here rather than 'toss'. These are dependent events. When two events A and B are dependent on each other, P(A and B) is given by P(A) multiplied by P(B | A), the probability of B given that A has already happened. This is the rule for dependent events, and I hope this example makes sense to all of you.

Let's take one quick example to understand the dependent case better. I want all of you to take out pen and paper, pause the video, try to understand the logic and solve it yourself. The question is: a bag has three black and two red balls. What is the probability of getting red-red, which means red in the first draw and red in the second draw? I hope you have solved it already, so let's work through it logically. Initially the bag has two red and three black balls, so the probability of the first ball being red is 2/5. Now this ball is out of the bag: I have taken it out and kept it outside.
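
The orange-then-white result from the first bag can be cross-checked by listing every ordered pair of draws without replacement and comparing the count with the multiplication rule. This is a minimal sketch, assuming every ball is equally likely to be drawn; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# The bag from the first example: two orange balls and three white balls.
bag = ["orange", "orange", "white", "white", "white"]

# Every ordered way to draw two different balls (no replacement).
# Working with index positions keeps balls of the same color distinct.
draws = list(permutations(range(len(bag)), 2))

# Keep the draws where the first ball is orange and the second is white.
favorable = [(i, j) for i, j in draws
             if bag[i] == "orange" and bag[j] == "white"]
p_enumerated = Fraction(len(favorable), len(draws))

# Multiplication rule for dependent events: P(A and B) = P(A) * P(B | A).
p_first_orange = Fraction(2, 5)        # 2 orange out of 5 balls
p_white_given_orange = Fraction(3, 4)  # 3 white out of the 4 balls left
p_rule = p_first_orange * p_white_given_orange

print(p_enumerated, p_rule)
```

Changing the contents of `bag` and the two colors in the filter gives the same kind of check for any other two-draw question.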
The bag is now left with one red and three black balls. What is the probability of getting the second ball as red, given that the first ball taken out was red? This comes out to be 1/4, because I am left with one red ball and four balls in total. So what is the probability of red-red, which means red in the first trial and red in the second? It is the probability of the first ball being red multiplied by the probability of the second ball being red given that the first ball taken out was red: 2/5 × 1/4 = 2/20, which is 1/10. I hope you were able to solve this question.

Now let's talk about random variables. A random variable is something that quantifies the outcome of an experiment. Let's try to understand that better. Say I'm tossing two coins, so my experiment is the toss of two coins. The sample space for this is head-head, head-tail, tail-head and tail-tail. How can I quantify the outcome? Quantification means converting the outcome into a number, or mapping each outcome to a number. Let me denote the random variable, RV for short, by the letter X. My X can be the number of heads in a toss of two coins. This random variable maps the outcomes of the random process, the toss of two coins, to the numbers 2, 1 and 0. How? Head-head corresponds to two heads, so it maps to 2. Head-tail and tail-head both map to 1, because in both of these outcomes I get one head. Tail-tail maps to 0, because there is no head. So 2, 1, 0: this is my random variable X. Similarly, I can have a random variable Y for the number of tails in a toss of two coins, and Y will also take the values 2,
1 and 0: I can have two tails, one tail or zero tails. I can also have a random variable where the numbers are assigned by a rule of my own. For example, let's say there is a game which says: if you get two heads, I will give you 10 rupees, so you win 10 rupees; for any other outcome, you have to give me 5 rupees, so you pay 5 rupees. In this case I'm interested in the amount I can win, so X, my random variable, represents the prize amount. It can be 10 rupees, which I write as +10 because it is a gain for me, or −5, where the minus sign represents a loss. So what have I done here? Of the four outcomes of my experiment, head-head is mapped to +10 and the other three are all mapped to −5. This is also a valid random variable.

Random variables are essentially divided into two categories: a random variable can be either discrete or continuous. What are discrete random variables? They are variables whose outcomes are quantified as countable values, typically integers. For example, the number of heads in a toss of two coins can be zero, one or two. But if I ask how much time you will take to finish watching this video, it can be any value in a continuous range: you can take 1 hour, 1 hour 5 minutes, or 1 hour 5 minutes 50 seconds. There are endless possibilities. Let me go to the next page, and first of all we'll take an example of a discrete random variable. For a discrete random variable, the outcomes are mapped to integer values which are countable; we can simply count them. For example, the number of heads in a
toss of two coins. Or let's take one more example: there is a person trying to shoot a particular target. Here is the person and here is the target. Let's say this person tries 10 times. What can my random variable be? My random variable X can be the number of times the person hits the bull's-eye. If you observe closely, this random variable can only take integer values: the person can hit the bull's-eye all 10 times, or 9 times, 8 times, 7, 6, 5, 4, 3, 2, 1 or 0 times, where 0 means the person was not able to hit the bull's-eye even once. This is my discrete random variable. Can the random variable X that you see on your screen take any value other than the possibilities I have written? Can it take 1.7? No, it cannot take 1.7. Similarly, it cannot take 10.5 or 9.11. It can only take this limited set of values, and that is why it is known as a discrete random variable.

Let's talk about continuous random variables. A continuous random variable can take any value within a range, so there are endlessly many possible outcomes. Let's take an example first: the time an employee takes to commute from home to office. This is my random variable X. Let's say we have defined a range: the person takes a minimum of 10 minutes and a maximum of 1 hour; to keep the scale in minutes, that is 10 minutes to 60 minutes. Between this minimum and maximum, I can have endlessly many values, endless,
right? There cannot be a particular point that I can pinpoint. I cannot say the person would take exactly 10 minutes, 5 seconds, 3 milliseconds and 2 nanoseconds; there are endless possibilities. This kind of random variable is known as a continuous random variable, so here X represents my continuous random variable. Discrete random variables have certain properties and distributions associated with them, and similarly for continuous random variables. The reason we are talking about this is that we want to generalize them. I will talk more about how we can build a distribution out of them, but before that, let's briefly talk about a very important topic known as the expected value. So I'm taking one topic out of random variables, and then we will jump to distributions.

The expected value is basically the average value of the random variable. Let me take one example: the number of heads in a toss of two coins. This is my random variable X, and it can take the values 2, 1 and 0: I can get two heads, one head or zero heads. I have already shown you this; please remember it. Now what is the expected value of X? The expected value is given as a summation over i of x_i multiplied by its probability; formally, E[X] = Σ x_i · P(X = x_i). What does this mean? Let me quickly create a table with the values of the random variable X and the probability P(X = x_i). The random variable can take the values 2, 1 and 0. The value 2 comes from the outcome head-head, 1 comes from the outcomes head-tail and tail-head, and
0 comes from the outcome tail-tail. So the probability of getting 2 is 1/4, the probability of getting 1 is 2/4, which is 1/2, and the probability of getting 0 is 1/4. These are the probabilities. The expected value of X is each value of the random variable multiplied by the probability of that value, summed up: 2 × 1/4 + 1 × 1/2 + 0 × 1/4. That gives 1/2 + 1/2 + 0, so the answer is 1. The expected value is 1. What does this mean? It means that in the long run, if I keep performing this experiment, tossing two coins again and again, then on average the experiment results in one head. This is the mean of the random variable. If this is not making sense, let's take an example in a game format, and then you will be able to connect with it.

I have the same experiment, the toss of two coins. I may get head-head, head-tail, tail-head or tail-tail. My random variable X is now the amount I can win. Let's say two people are playing a game, and the condition of the game is: if the person tossing a coin two times gets heads on both, the person wins 10 rupees; for any other combination, the person has to pay 5 rupees. So the random variable X is +10 or −5. Now I'll quickly create a table with the values x_i and the probabilities P(X = x_i). The person can win 10 rupees, so +10; the person can lose 5 rupees, so −5. What is the probability of winning 10 rupees? The +10 can only come when the person has obtained both heads, which is 1/4. The probability of losing 5 rupees is the probability of getting any of the other outcomes: head-tail, tail-head and
tail-tail, and this is 3/4. Now I'll calculate the expected value of X, which is 10 × 1/4 + (−5) × 3/4. For easy calculation let's not cancel the fours, because both denominators are 4: I get 10/4 − 15/4, which is −5/4, or −1.25. What is the meaning of this value? It means that in the long run, if you play this game many times with your friend, on average you end up losing 1.25 rupees per game. So this is a loss for you. What was my random variable? The amount I can win, and its expected value is negative, −1.25. That is what you should check: when you are playing a game, please make sure the expected value turns out positive for you. You will have heard the phrase 'the house always wins'. It means that whenever you go to a casino or any gambling house, they win in the long run. When many people play the same game, some of them will win while most of them will lose, so in the long run it is a profit for the casino. The house always wins because the expected value is positive for them and negative for you, and all the games are designed in such a way that the expected value is favorable to them.

So now let's talk about distributions. What is a distribution? I hope you already know that a random variable can be of two types, either discrete or continuous. So let's say I have a discrete random
variable, and first of all we are talking about distributions for discrete random variables. I will take an example for both types, and then we will talk in detail about the variety of distributions we have. Distributions are very important, so please listen to this topic very carefully. When we talk about a discrete random variable, we know that the outcome is discrete; it can be an integer. Let's take the experiment of tossing two coins. I have used this experiment many times, so you already know the outcome can be head-head, head-tail, tail-head, or tail-tail. The random variable X can represent the number of heads, and it can be two heads, one head, or zero heads. Now let's plot this distribution. On the x-axis I have the random variable, which can take the values 2, 1, or 0, and on the y-axis I have the count. Assume you perform this experiment of tossing two coins many times, and you note down the number of times you obtained two heads, one head, and no head. So I'm creating a table. In trial 1 you obtained head-head, so I write 2. In trial 2 you obtained head-tail, so 1. In trial 3, tail-head, so again 1. In trial 4, tail-tail, so 0. In trial 5, tail-head, so 1. In trial 6, tail-head, again 1. In trial 7, head-tail, so 1. In trial 8, head-head, so 2. In trial 9, tail-tail, so 0. And in trial 10, tail-head, again 1. This is my outcome, the random variable X. Now I will just count and create a plot. How many times have I obtained two heads? One time, two times. On the count axis I'll assume that this is one, this is two, this is
three, and so on. Two times I obtained head-head, so I'll create a bar of height two over here. How many times did I obtain one head? Six times, so that bar goes up to six. And how many times did I obtain no head? Two times. This is known as the distribution of a discrete random variable, and we have a very good name for it: the PMF, the probability mass function. (Strictly, these bars are counts; dividing each count by the number of trials, 10, turns them into observed probabilities of 0.2, 0.6, and 0.2.)

Now please remember this. If I'm tossing just one coin, we have already created this probability mass function; I hope you remember I showed you the Seeing Theory website. There you saw that the true probability of getting a head or a tail when you toss one coin is half and half, 1/2 and 1/2. So the PMF of the true probability for one coin is 0.5 for head and 0.5 for tail. But when you actually perform the experiment, this can change. Let's say I'm getting more heads and fewer tails: this is 0.7 and this is 0.3 when you toss one coin 10 times, so seven times you obtain a head and three times a tail. Quite possible, right? This one is known as the observed distribution, and this one is known as the true distribution.

Similarly, for the two-coin example, what is the true probability of getting two heads? It is 1/4. The probability of getting one head is 1/2, and the probability of getting zero heads is 1/4. How? Because I have mapped the outcomes: each of the four outcomes has probability 1/4, so the probability of one head is 1/4 + 1/4, which is 1/2, and the other two are 1/4 each. So the true PMF looks like this: 0.25, 0.5, and again 0.25. This is the PMF of my random variable X with the true probabilities, and this is for a discrete random variable.

Now let's talk about the continuous random variable and how its distribution looks. With a continuous random variable we are not limited to discrete or integer values; the variable can take any value within a range. For example, let's say the random variable is the time an employee takes to commute between home and office. Let's say we have defined the minimum and the maximum: at minimum the person takes 10 minutes, at maximum 60 minutes. I can have endless possibilities between these two numbers. Let's say I have taken myself as the employee, and every day I note down the number of minutes I took. So this is my observed set for X: on the first day I took 11 minutes, on the second day 32, then 23, 59, 43, 29, 33, 48, 57, and 12. That is ten values, so for 10 days I
observed the number of minutes I took. Now, can I treat each of these values as a separate outcome of the random variable, the way I did for the coins? No, because this is a continuous random variable. My question to all of you is: are these the only possible outcomes I can get from this experiment? No. If I perform the experiment for another 10 days I can get more values. Let's say some day I took 23 minutes and 2 seconds, and some other day 11 minutes and 5 seconds. There are endless possibilities, so I cannot take each entry as its own bar. Then what is the solution? I go with ranges. Let's define the ranges: 10 to 15, 15 to 20, 20 to 25, 25 to 30, 30 to 35, 35 to 40, 40 to 45, 45 to 50, 50 to 55, and finally 55 to 60. Now I write down the number of times I observed a value in each range. Between 10 and 15 I have 11 and 12, so I write two. Between 15 and 20 I don't have a value, so zero. From 20 to 25 I have one value (23), so one. From 25 to 30 I have one value (29), so one. From 30 to 35 I have 32 and 33, so two. From 35 to 40, zero. From 40 to 45, one value (43). From 45 to 50, one value (48). From 50 to 55 I don't see a value, so zero. And from 55 to 60 I have 57 and 59, so two. Now when I plot this, I plot it against the ranges. The variable X is again the time taken by an employee to commute between home and office, and on the y-axis I have the same kind of count, but on the x-axis I now have the ranges, starting from 10, 15, 20, 25, 30, 35, 40, 45, and so on till 60, and
then I will have these bars, which are known as a histogram. This is not a bar chart; this is a histogram. For 10 to 15 the bar has height two, for 15 to 20 it is zero, for 20 to 25 and 25 to 30 it is one and one, then two, then zero, then one and one, and so on. This shape is basically what a probability density function, a PDF, describes. (Strictly, the histogram of observed counts is an estimate; the PDF is the smooth curve it approaches as we collect more data and make the ranges narrower.) Now why am I calling it a density? Because if I ask you for the probability that a person takes between 10 and 15 minutes, I'm not talking about one point, I'm talking about a range. That is the reason the distribution here is described by a density. And please remember: the probability of a single point in a continuous random variable is always zero. If I ask you for the probability that a person takes exactly 10 minutes, that is zero. The reason is that we are measuring a quantity, time, that can take endlessly many values between 10 and 60 minutes. If the total probability of 1 is spread over infinitely many possible values, the share that falls on any one exact value, such as exactly 10.000... minutes, is zero. Only a range of values gets a probability greater than zero, and that is why we call it a probability density function: we are not talking about the probability at one point, we are talking about the probability over a range.

Now, within discrete
random variables and continuous random variables, we have some standard distributions that are very important in the field of statistics and probability, and that is what we are going to learn next. We have talked about discrete and continuous random variables, and we have seen very simple examples of how to create distributions for both. Now we will talk about the standard distributions that come under each, and we will start with discrete. Within discrete random variables we have two very important distributions: one is known as the Bernoulli distribution and the second as the binomial distribution.

We will first talk about the Bernoulli experiment. A Bernoulli experiment is an experiment in which, number one, the outcome is always binary: head or tail, win or loss, one or zero. Number two, the number of trials is only one, so we perform the experiment just one time. For example, let's say I have a coin and I toss it one time. Can I consider this a Bernoulli experiment? Yes, because when I toss the coin I get either a head or a tail, so the outcome is binary. This experiment is a Bernoulli trial. Why am I calling it a trial? Because I am performing the experiment one time, and I can get either a head or a tail.
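A Bernoulli trial can be sketched in a few lines of Python. This is a minimal illustration and not something shown in the video: the function names `bernoulli_trial` and `observed_pmf`, the seed, and the repeat counts are all assumptions chosen for the example. It repeats a single fair-coin trial many times so the observed shares can be compared with the true probabilities of 0.5 and 0.5.

```python
import random


def bernoulli_trial(p, rng):
    """Run one Bernoulli trial: return 1 (success) with probability p, else 0."""
    return 1 if rng.random() < p else 0


def observed_pmf(p, n_repeats, seed=0):
    """Repeat a single Bernoulli trial n_repeats times and return observed shares.

    The seed is only there to make the illustration reproducible.
    """
    rng = random.Random(seed)
    outcomes = [bernoulli_trial(p, rng) for _ in range(n_repeats)]
    share_of_ones = sum(outcomes) / n_repeats
    return {0: 1 - share_of_ones, 1: share_of_ones}


# True PMF of a fair coin (1 = head, 0 = tail).
true_pmf = {0: 0.5, 1: 0.5}
print("true:", true_pmf)

# Observed PMFs: a short run can be far from the truth, a long run is close.
print("observed, 10 tosses:", observed_pmf(0.5, 10))
print("observed, 100000 tosses:", observed_pmf(0.5, 100_000))
```

With only 10 repeats the observed shares can land well away from 0.5, as in the 0.7 and 0.3 example earlier; with many repeats they settle close to the true values.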
The Bernoulli distribution looks very simple. We have already seen it, but I'm drawing it again. Let X represent the number of heads when you toss a coin one time: X is 1 if you obtain a head and 0 if you obtain a tail. This is my Bernoulli distribution. Please remember, there is nothing difficult about the Bernoulli distribution: it is just an experiment with a binary outcome and one trial.

Now, binomial is very closely associated with Bernoulli. When I perform the Bernoulli experiment n times, we get a binomial experiment. The outcome of each trial in a binomial experiment is again binary. Assume you have a coin and you toss it 10 times: since I'm tossing it 10 times it becomes binomial, but the outcome of each trial is still either head or tail. So we have n Bernoulli experiments, and when I combine them it becomes binomial.

Now let's understand how to calculate the probability for a binomial experiment. Let's say I have a bag with three orange balls and two white balls. My random variable X is the number of orange balls I may get out of this
bag if the experiment is performed three times. If you're not able to understand it, let's walk through the example. The experiment is: take a ball out of the bag, note down its color, and put the ball back into the bag. I will do this three times. Is it a Bernoulli trial when I take a ball out of the bag one time? Yes, because I either get an orange ball or I do not. The first time I obtained an orange ball; that is my first Bernoulli trial, and I placed the ball back in the bag. I did the experiment again, obtained an orange ball again, noted the color, and put the ball back. I did it a third time, it came out orange, I noted the color and put the ball back. So what is happening? Each time I took a ball out of the bag it was a Bernoulli trial, and I have performed the Bernoulli trial three times. Let me note down the experiment: take a ball out of the bag, note down the color of the ball, put the ball back into the bag, and perform this three times.

Now that we have understood the experiment, let's say I'm interested in the probability that I get exactly two orange balls in three trials. What are the possible combinations in which you can get two orange balls? Let's note them down for trials one, two, and three. In the first combination you obtain orange in the first two trials and then a white ball: orange, orange, white. In the second you obtain orange, then white, then orange. In the third you obtain white
followed by orange and orange. These are the three possible combinations. To recap, I am taking a ball out of the bag, noting the color, putting the ball back, and doing this three times, and my objective is the probability of two orange balls in three trials. If you observe carefully, each trial is independent of the others, because every time I take a ball out, the bag still has all five balls. So what is the probability of orange, orange, white? It is simply the probability of orange multiplied by the probability of orange multiplied by the probability of white. Similarly, for orange, white, orange it is P(orange) × P(white) × P(orange), and for white, orange, orange it is P(white) × P(orange) × P(orange). The probability of orange is 3/5 and the probability of white is 2/5. So the first is 3/5 × 3/5 × 2/5, the second is 3/5 × 2/5 × 3/5, and the third is 2/5 × 3/5 × 3/5. If you observe carefully, I can write each of them as (3/5)² × 2/5. Now what is the probability of getting two orange balls in three trials? I have three possible combinations, so it can be this or this or this. When I have an "or", that basically means I add them: P(orange, orange, white) + P(orange, white, orange) + P(white, orange, orange). We have already calculated each of these.
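The three combinations and their sum can be cross-checked by brute force. The sketch below is an illustration only, and the names `p_color` and `prob_exact_orange` are made up for this example: it lists every possible sequence of three draws with replacement, keeps those with exactly two orange balls, and adds up their probabilities using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Bag from the example: 3 orange (O) and 2 white (W) balls, drawn with replacement.
p_color = {"O": Fraction(3, 5), "W": Fraction(2, 5)}


def prob_exact_orange(n_trials, n_orange):
    """Sum the probabilities of all draw sequences with exactly n_orange oranges."""
    total = Fraction(0)
    for seq in product("OW", repeat=n_trials):
        if seq.count("O") == n_orange:
            prob = Fraction(1)
            for color in seq:
                # Trials are independent, so the probabilities multiply.
                prob *= p_color[color]
            total += prob
    return total


# The sequences with exactly two orange balls in three trials.
matching = ["".join(s) for s in product("OW", repeat=3) if s.count("O") == 2]
print(matching)                 # ['OOW', 'OWO', 'WOO']
print(prob_exact_orange(3, 2))  # 54/125
```

The result 54/125 is the same as 3 × (3/5)² × 2/5, which is 0.432.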
So, bringing it over here, it is (3/5)² × 2/5 + (3/5)² × 2/5 + (3/5)² × 2/5. All three terms are the same, so we can simply write 3 × (3/5)² × 2/5. This equation can be generalized for any binomial experiment. The generalized equation is P(X = x) = nCx × p^x × q^(n - x), and it gives you the probability for any binomial experiment. Here n is the number of Bernoulli trials, x is the number of successes in n trials, p is the probability of success, and q is the probability of failure. Let's map this equation onto the experiment where I calculated 3 × (3/5)² × 2/5. In the last experiment we performed the Bernoulli trial three times, so n is 3. What is x? If I consider getting an orange ball a success, I was interested in two orange balls, so x is 2. p, the probability of success, is the probability of orange, 3/5. q, the probability of failure, can also be written as 1 - p, which is 2/5. Putting everything into the formula gives 3C2 × (3/5)² × (2/5)^(3 - 2), and 3 - 2 is 1. The term nCx expands to n! / (x! × (n - x)!), so 3C2 is 3! / (2! × 1!), which is (3 × 2!) / (2! × 1!); the 2! cancels and we are left with 3. So we get 3 × (3/5)² × 2/5, which is 54/125 or 0.432, and you can see this matches what we did manually. So using this formula you can
find the probability of x successes in n Bernoulli trials, which together make a binomial experiment. Let's take one more quick example. Let's say there is a person trying to shoot a target 10 times, so n is 10. Does this experiment consist of Bernoulli trials? Yes, because each shot either hits or misses, so the outcome is binary; performed once, it is a Bernoulli trial, and performed 10 times it becomes a binomial experiment. Let's say the probability that he makes a hit is 0.4, so there is a 40% chance that any given shot hits the target, and the probability of a miss is 0.6. Now I'm interested in the probability that he hits the target exactly five times, that is, five successes in 10 Bernoulli trials, so x is 5. I apply the formula nCx × p^x × q^(n - x): 10C5 × 0.4^5 × 0.6^5. You can solve this and you get about 0.20; this is the binomial probability that a person shooting 10 times makes exactly five hits.

Now let's calculate the probability for every value of the random variable. In the case of hitting the target, my random variable X is the number of times the person makes a hit. The person can make a hit all 10 times, nine times, eight times, seven times, six times, or
maybe zero times. These are all the values the random variable can take. If I calculate the probability against each value, the probability that he hits the target exactly 10 times, nine times, eight times, and so on, and plot it with X on one axis and P(X = x_i) on the other, this becomes the PMF of the binomial experiment. I hope everyone is clear about this. Let's look at a quick Excel example to show you how this PMF looks. Here I have an Excel worksheet. This column is my X, which can be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10. What is the meaning of zero? It means that out of 10 attempts the person did not make a single hit; the number of successful hits was zero. What is the meaning of one? It means that out of 10 shots he hit the target only once. That is what my X is. Now I'm interested in calculating P(X = x_i), so let's do that. We will use the combination function, which in Excel is COMBIN. The total number of trials is 10 and x is this cell, so that gives nCx. This is multiplied by the probability of success to the power of x, so 0.4 to the power of x, which I put in brackets so that nothing goes wrong, and that is multiplied by the failure probability q, so 0.6 to the power of n - x, which is 10 - x. I close the bracket, and I think everything is fine. Let's extend this formula down, and I have the probabilities. Now let's plot it: I'll go to Insert and choose a chart, maybe a column
chart. The chart comes out wrong at first. Let me try it one more time; otherwise we have to do it manually. It is still not coming correctly, so let's select only the probability column and insert the chart again. Because we haven't selected the X column, the axis labels start at one, so assume the first bar is zero, the next is one, and so on. Now you can see that the probability that the person makes four hits is the highest, and you can see it in the calculated values as well. As we move above four it becomes lower and lower, and as we go below four it also becomes lower and lower.

Using the probability mass function of any discrete random variable we can answer a lot of questions. Suppose I ask: what is the probability that the person makes at least five successful hits? At least five means I'm interested in P(X ≥ 5), so I simply sum the probabilities for 5 through 10, which comes to about 0.37. So there is roughly a 37% chance that the person hits at least five times. What is the probability that the person makes at most five hits? At most means less than or equal to five, so I sum the probabilities for 0 through 5, which is about 0.83. So the person makes at most five hits about 83% of the time. Using the PMF of various discrete random variables we can answer questions like these, and that is the reason these distributions are created. But there is one distribution that I would say is very important in the entire
field of statistics and machine learning: the normal distribution, which comes under continuous random variables. We have completed our discussion of discrete random variables, where we talked about the Bernoulli experiment and distribution and then the binomial experiment and distribution. Now let's talk about continuous random variables, and within that the most important distribution is the normal distribution. If you are targeting a data science job and have to attend an interview in this field, this is one topic you will almost certainly be asked about. It can come up in any form, but it will be around this topic, so please listen carefully.

For continuous random variables we have a distribution known as the normal distribution, and it looks like a bell-shaped curve, something like this. What is the meaning of this? If I assume the x-axis represents income and the y-axis represents the count or the probability, then within this middle range I have the highest occurrence of observations, so the maximum number of people are in the mid income range. This is known as the center of my distribution. What I have drawn is a perfectly symmetrical normal distribution, but real data comes in a variety of shapes: you might get a very symmetrical shape or you might not. So, in order to understand whether your data follows a normal distribution or not, we have something known as the standard normal distribution. Why do we have a standard normal distribution? It will help you to compare a few properties in your
original data. So let's first understand what a standard normal distribution is and what its properties are, and then we will compare our data with it to generate some insights. Here I have a standard normal distribution. Its advantage is that it is perfectly symmetrical. This is my center, and the center is the mean point; the mean, the median, and the mode all fall at the center. I have 50% of the data below it and 50% above it: half on the left side, half on the right side. The numbers you see on the x-axis are known as the z-scale. Don't worry, I will explain what the z-scale is and how we got these numbers; for now just try to understand what a standard normal distribution is. So, number one, the mean, the median, and the mode all lie at the center. Number two, it is perfectly symmetrical, with 50% of the data on the left and 50% on the right. Number three, according to the standard normal distribution property, about 68% of the data lies within ±1 standard deviation of the mean, about 95% lies within ±2 standard deviations, and about 99.7% lies within ±3 standard deviations. What does that mean? This point is one standard deviation below the mean and this point is one standard deviation above, and between these two points I have 68% of the data. Between these two points I have 95% of the data, and from this point till this point I have 99.7% of the data. If you're not able to understand how I obtained the z-scale, or what I mean by 68, 95, and 99.7, let's try to understand it with a real example.
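The 68, 95, and 99.7 figures can be checked numerically. This is a small sketch that relies on the standard identity P(-k ≤ Z ≤ k) = erf(k / √2) for a standard normal variable Z; the helper name `prob_within` is hypothetical and only used for this illustration.

```python
import math


def prob_within(k):
    """P(-k <= Z <= k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {prob_within(k):.4f}")
# within ±1 standard deviations: 0.6827
# within ±2 standard deviations: 0.9545
# within ±3 standard deviations: 0.9973
```

So the 68%, 95%, and 99.7% quoted above are rounded versions of these values.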
Let's try to understand the z-scale. For this we have to jump into the topic of scaling. Don't worry: we were talking about normal distributions, we have covered the standard normal distribution and looked at some of its properties, but one confusion remains, namely what the z-scale is, how we obtained these numbers, and what one standard deviation below and one standard deviation above mean. So we will pause that topic for now and cover this small topic, scaling and the z-scale. Once we are clear on it, we will tie everything together. Stay with me for another 10 minutes and you will be clear on everything.

So what is scaling? The problem is this. Let's say I have two variables, age and income, and a person has an age of 27 and an income in the thousands of dollars, something like that. Can I compare these two values? Can I say this one is a very low value and this one is a very high value? No, because they have different scales: age is measured in years while income is measured in dollars. But what if I want to compare them? Then we have to bring them onto a common scale, and that is what scaling is: the process of converting the data onto one common scale on which I can compare values. There are two common approaches. One is min-max scaling, also known as normalization. The second is standardization, or z-scaling. In min-max scaling I convert the complete data to a scale of 0 to 1, where the minimum becomes 0 and the maximum becomes 1. The formula is very simple: (x - min) / (max - min). I'll
show you the example in Excel; for now just look at the formula, where x is the value we want to transform. In z-scaling the formula is z = (x - μ) / σ, where x is the value we want to convert, μ is the mean of the variable (the column), and σ is the standard deviation of the variable. Now let's apply these formulas to actual data in an Excel file to make sense of them, and then you will be very clear about how we obtained the numbers on the x-axis of the standard normal curve. That is the major question, right? We have the standard normal distribution topic parked; we are trying to understand scaling, and the main question we are answering is how the z-scale came into the picture. Once we are clear on this, the standard normal distribution will make more sense to you.

So let's do this in Excel with a data example. Here you can see we have data with age, BMI, and charges, which are the insurance charges. I cannot say that this value is very low, this value is in the middle, and this value is very high. Why? Because age is in years, BMI is in kg per meter squared, and charges are in dollars. All three are on different scales, so I cannot compare them. Let's try min-max scaling. For that I need the minimum and maximum values, so let me quickly insert some rows for minimum and maximum. I calculate the minimum of the age column and get 18, and then the maximum of the age column, which is 64. I'll call the scaled column age_normal.
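The two formulas above can be sketched in Python before moving to the spreadsheet. This is an illustration under stated assumptions: the `ages` list is hypothetical sample data (only the minimum of 18 and the maximum of 64 come from the video), the function names are made up, and the z-score here uses the population standard deviation, whereas a spreadsheet function may use the sample version (dividing by n - 1), which gives slightly different numbers.

```python
import statistics


def min_max_scale(values):
    """Min-max scaling: (x - min) / (max - min), mapping the data onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def z_scale(values):
    """Z-scaling: (x - mean) / standard deviation.

    Assumption: population standard deviation (statistics.pstdev).
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical ages; only the min (18) and max (64) match the video's data.
ages = [18, 27, 41, 64]

print(min_max_scale(ages))  # smallest age maps to 0.0, largest to 1.0
print(z_scale(ages))        # values below the mean are negative, above are positive
```

After min-max scaling, values near 0 are low and values near 1 are high for that column; after z-scaling, each value says how many standard deviations it sits below or above the column mean.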
I'll do it for BMI, as BMI_normal, and for charges, as charges_normal. Let me shift these columns so that I can simply extend the formula instead of writing it again and again. If I extend the formula, I get the minimum of age, BMI and charges, and likewise the maximum of age, BMI and charges, because the three columns sit one after the other.

Now let's apply the formula: (x - min) / (max - min), where x is the value I want to convert. I will not freeze the column, so that it does not stay fixed when I extend the formula sideways; I'll fix only the rows with a dollar sign, $4 and $3. Let's press Enter, apply it to the entire column, extend it to BMI and charges, and extend it to all the rows.

Now all three columns are on a scale where the minimum is 0 and the maximum is 1. Picture a number line from 0 to 1 whose midpoint is 0.5. Any value close to 0 is very low, any value close to 1 is very high, and any value around 0.5 is roughly a middle value. The scale is clean, and age, BMI and charges are all on this normalized scale.

Let's compare values. For this person the age is 0, which means this person has the lowest age in my data, the minimum age. The BMI is 0.47, which is around the middle; as I told you, 0.5 is the midpoint and 0.47 is near 0.5. So BMI is around
average, while the charges are 0.009, which is very close to 0. So what can I conclude? This is a very young person with average BMI and very low charges.

Let's take one more person. Here the age is 0.91, which is close to 1, the BMI is 0.26, and the charges are 0.4, which is close to the middle value. So this person has a very high age, a below-average BMI, and charges around average. Let's look for someone who could be an outlier, so that it makes some real sense. This person has 0.5 as age, so middle-aged, and a below-average BMI, but the charges are very high. I hope you can now read values on a normalized scale: the minimum is 0, the maximum is 1, and the formula is (x - min) / (max - min).

In Z scaling, which is standardization, we need the average and the standard deviation. First, these will be my new columns: age-Z, BMI-Z and charges-Z. To apply the formula we first calculate the mean, so =AVERAGE of the age column, and I extend this formula to the next two cells. Then I need the standard deviation, so I use the standard deviation function on the age column and extend it to the next cells as well. Now we apply the formula: x minus the mean, with the row frozen, divided by the standard deviation, again with the row frozen, and press Enter. Now let's extend it to all the rows. I don't know why it is not filling down, let me try once more. It is still not happening, so I
have to do it manually, and then I will extend the formula to the next columns as well.

Now, in Z scaling my middle point is 0. On one side I have -1, -2 and -3, and on the other side +1, +2 and +3. What are these values? They are standard deviations, and the center is my mean. Standardization, or Z scaling, converts your data so that it has a mean of 0 and a standard deviation of 1. If you get a value close to 0, you are around average. If you go to the positive side, you are above average, and the value tells you how many standard deviations above. Say the value is +1.2: you are 1.2 standard deviations above the average. If the value is -3, you are three standard deviations below the average. So Z scaling converts your complete data to a scale where the center, the mean, becomes 0 and the standard deviation becomes one unit on either side. For min-max scaling it was 0 to 1 with 0.5 in the middle.

Let's quickly compare and read the Z scale. This person has a BMI of 0.5, which is fairly close to 0, the middle. The age is -1.5, which means the person is 1.5 standard deviations below the mean. The charges are -0.9, so 0.9 standard deviations below the mean. So this person has a normal BMI, because the value is close to 0, and very low age and low charges, because both values are on the negative side. With standardization, since I have positive and negative signs and the middle is 0, interpretation becomes very easy: if the value is positive I know it is above average, and if the value is negative I know it is
below average, and if the value is close to 0 I know it is near the average. That is the advantage of standardization.

This Z scale is the one you find in the standard normal distribution. If I go back to the standard normal distribution, this is the Z scale: the middle is 0, then one standard deviation below the mean, one standard deviation above the mean, and so on. According to the properties of the standard normal distribution, 68% of the data lies within plus or minus one standard deviation of the mean, 95% lies within plus or minus two standard deviations, and 99.7% lies within plus or minus three standard deviations. Let's try to prove it quickly.

First we have to check which column among age, BMI and charges follows a normal distribution. I already know that BMI follows a normal distribution, which means that when I plot it, it resembles a bell-shaped curve that is symmetrical, with half the data below the center and half above. Since I have converted the BMI column to the Z scale, just see the magic. I have selected the BMI column, and in Google Sheets we go to Insert, then Chart, and select the histogram. If you look closely, this point, around 0.03, is the center; below it I have negative values and above it positive values. The data is spread from about two standard deviations below the mean to about four standard deviations above the mean. So it is not perfectly symmetric: there is less data on this side and more spread on the positive side. That is one thing we can see from the plot.
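As a rough illustration of the two scalings described so far, the sketch below applies min-max scaling and Z scaling to a short list of ages in Python. The ages are made up (only the minimum of 18 and the maximum of 64 come from the video), the function names are hypothetical, and using the sample standard deviation is an assumption, since the transcript does not say which spreadsheet function was used.

```python
import statistics


def min_max_scale(values):
    """Rescale values to the 0-1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def z_scale(values):
    """Standardize values: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(x - mean) / sd for x in values]


# Hypothetical ages; only the minimum 18 and maximum 64 come from the video.
ages = [18, 19, 28, 33, 46, 52, 60, 64]

ages_normal = min_max_scale(ages)  # smallest age maps to 0, largest to 1
ages_z = z_scale(ages)             # mean becomes 0, standard deviation 1

for age, n, z in zip(ages, ages_normal, ages_z):
    print(age, round(n, 2), round(z, 2))
```

Reading the output works the same way as in the spreadsheet: on the min-max column, values near 0 are low and values near 1 are high; on the Z column, the sign says which side of the average a value is on and the size says how many standard deviations away it is.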
This is a normal distribution, but it is not standard normal. To compare this distribution with something, we need a standard. If I have to judge my performance in an exam, I need some standard score: say the maximum, the best, is 100. Whenever we compare something we need a standard. Similarly, to compare the distribution of a continuous random variable that appears to be normal, I need a standard normal distribution, and this is where it comes into the picture.

The standard normal distribution says that your mean, median and mode lie in the center. If they do not, that is, if your mean, mode and median are not equal, your distribution is not a standard normal distribution; it will be either left skewed or right skewed. These are two terms I am introducing now and will explain as we move forward. For now we are looking at the properties of the normal distribution, and I have already written the three properties here: 68% of the data lies within plus or minus one standard deviation of the mean, 95% within plus or minus two standard deviations, and 99.7% within plus or minus three standard deviations.

Now we are going to quickly prove it. I'll close this chart and copy the data into a new sheet. We will calculate the mean, median and mode, and for simplicity let's not use the Z scale; let's use the original scale so that it is easier to follow. So I replace this column with the original BMI data. The average BMI is =AVERAGE of this column, the median is =MEDIAN of this column, and the mode is =MODE of this column. You can see that the mean is 30.6, the median is 30.4 and the mode is 32.3.
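The 68%, 95% and 99.7% check that the speaker sets up on the BMI column can be sketched in Python as follows. This is only an illustration: the BMI data itself is not available here, so the code uses synthetic, normally distributed values with a mean of 30.6 and a standard deviation of 6.09 (the figures quoted in the video), and `share_within` is a hypothetical helper name.

```python
import random
import statistics


def share_within(values, k):
    """Fraction of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - k * sd, mean + k * sd
    count = sum(1 for x in values if lower <= x <= upper)
    return count / len(values)


# Synthetic stand-in for the BMI column (assumption: exactly normal data).
rng = random.Random(0)
bmi_like = [rng.gauss(30.6, 6.09) for _ in range(20000)]

for k in (1, 2, 3):
    # Expect shares near 0.68, 0.95 and 0.997 for normal data.
    print(k, round(share_within(bmi_like, k), 4))
```

The function mirrors the spreadsheet steps: compute the lower and upper limits, count the values between them, and divide by the total.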
These values are close to each other but not exactly the same. So my first impression of this distribution is that it is not a standard normal distribution, but it is something close to it. How close? Let's find out.

First we will prove the first property, that 68% of the data falls within plus or minus one standard deviation. Let's find the standard deviation with =STDEV of this column: it is 6.09. Let me create a small table here with a row for 68%, a lower limit, an upper limit, a count, a total and a percentage. I'll explain everything, so don't get overwhelmed. We take the mean and subtract one standard deviation, so the lower limit is 24.5. Then the mean plus one standard deviation gives the upper limit, 36.7. Now I count how many values fall in this range, because my objective is the percentage. I apply a filter by condition, "is between", with lower value 24.5 and upper value 36.7. If you look carefully, the count is 900, so 900 values fall in this range. For the total I remove the filter by choosing None, and the total is 1,338. The percentage is the count divided by the total, which is 0.672, or 67.2% as a percentage. The standard value is 68% and my data gives 67.2%. This is the first proof of
the first property, which says that 68% of the data lies within plus or minus one standard deviation. I am getting 67.2% because the data is not exactly normal.

The second property is that 95% of the data lies within plus or minus two standard deviations. Let's compute the limits: the lower limit is the mean minus 2 times the standard deviation, and the upper limit is the mean plus 2 times the standard deviation. My condition is again "is between", with lower limit 18.4 and upper limit 42.8. I click OK and count: 1,281 data points. Let me write that down, 1,281 out of a total of 1,338, and extend the percentage formula to the cell below: 95.74%. According to the standard normal distribution, around 95% of the values lie within plus or minus two standard deviations, and here I get 95.74%.

The final one is that 99.7% of the data falls within plus or minus three standard deviations. The limits are the mean minus 3 times the standard deviation and the mean plus 3 times the standard deviation, so 12.3 to 48.9. I click OK and count: 1,334, and the total is 1,338. I extend the formula to the next row and I get 99.7%, which is exactly the same. So this is the proof of the three properties: 68% of the data lies within plus or minus one standard deviation, 95% within plus or minus two, and 99.7% within plus or minus three.

Now suppose I am given a particular value. How can I find the probability up to that point? If you are not able to follow my question, don't worry, let me explain what I am asking. This is my standard normal distribution, and this is my middle point, 0, which means the mean is 0. What is the probability that a data point will lie
below this point? It is 0.5, and above this point it is also 0.5. As you already know, the probability that a data point lies anywhere in the complete range is 1. So the probability that it lies below the mean is 50% and above the mean is 50%, if it is a standard normal distribution.

To find the probability up to any point, measured from the left, we have something known as a Z table. The Z table is the standard normal distribution table, and it gives the cumulative probability from the lowest point up to a given point. If I ask you for the probability that a data point lies below this point, where the Z value is, say, -1, we can use the Z table to find this probability, which is the area under the curve.

Now let's see one example. A set of biology marks in a class is normally distributed with a mean of 70 and a standard deviation of 6 points. Let X represent the score of a randomly selected student. What is the probability that the marks of this student are between 64 and 79? This question is simple to solve; we just need the properties we have learned. Let's draw the normal distribution first. The mean is 70 and the standard deviation is 6, so one standard deviation below gives 64 and one standard deviation above gives 76; two standard deviations below gives 58 and two above gives 82. I need the probability between 64, which is this point, and 79, which lies between 76 and 82.
From 76, if I go half a standard deviation above, I reach 79. So this is the area I am looking for; let me highlight it. It looks like I am coloring a book, but I hope it makes sense.

Going by the properties we have learned, the probability of a data point between 64 and 76 is 68%. Now look at the wider middle portion: the probability of a data point between 58 and 82 is 95%. If I subtract the 64-to-76 portion from it, I am left with two pieces, one on each side, and by symmetry they are exactly the same. Let's call each of them c. Then 0.68 + 2c = 0.95, so 2c = 0.95 - 0.68 and c = (0.95 - 0.68) / 2 = 0.135. I am interested in roughly half of the right-hand piece, from 76 to 79. So my final answer for the probability of a data point between 64 and 79 is 0.68 + 0.135 / 2 = 0.68 + 0.0675 = 0.7475. So there is roughly a 74.75% chance that a randomly selected student in the class has marks between 64 and 79. Taking half of c is only an approximation, since the curve is higher near 76 than near 82; the Z table gives about 77.5% for this range. To solve this question we used just the property of the normal distribution that 68% of the data lies within plus or minus one standard deviation, 95% within plus or minus two and 99.7% within plus or minus three. I hope this is making sense.

Now let's take one example that will use the Z table.
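The biology-marks calculation can be reproduced in a few lines of Python. The sketch below is an illustration, not part of the video: it computes the rule-of-thumb figure exactly as worked out by hand, and then, for comparison, the exact area under a normal curve with mean 70 and standard deviation 6 using `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

marks = NormalDist(mu=70, sigma=6)

# Rule-of-thumb route from the video: 68% for 64..76, plus half of the
# 76..82 slice, where that slice is c = (0.95 - 0.68) / 2.
c = (0.95 - 0.68) / 2
rule_of_thumb = 0.68 + c / 2

# Exact route: area under the normal curve between 64 and 79.
exact = marks.cdf(79) - marks.cdf(64)

print(round(rule_of_thumb, 4), round(exact, 4))
```

The two numbers differ by a few percentage points because splitting the 76-to-82 slice in half treats the curve as flat there, while more of its area actually sits near 76.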
To solve this one we need the Z table. The question says: GMAT scores are roughly normally distributed with a mean of 527 and a standard deviation of 112. What is the probability that an individual scores above 500?

In the previous question we could get the answer because the numbers were given in terms of standard deviations. The standard deviation was 6 units, and the minimum and maximum of the range were chosen so that I could use elementary math to get to the answer. Here we cannot do that: with a cut-off of 500 and a standard deviation of 112, it is really difficult to get it done with logic and elementary math. In this case Z tables are very useful.

Let's solve it. I know that GMAT scores are normally distributed with a mean of 527 and a standard deviation of 112. First I find how many standard deviations above or below the mean this value lies, using the standardization formula. The corresponding Z value is (x - mean) / σ, where x is 500, the value to be scaled: (500 - 527) / 112, which is about -0.24. On the Z scale, the mean is at 0 and -0.24 is just below it. Now, what is the probability that the person scores above 500? If this point is -0.24, I am interested in the area above it. How do we find it? The Z table gives me the area under the curve from the far left all the way up to this point. Remember, the area you get from the Z table is always measured from the left. So once I have the area, which is the
probability from the far left all the way up to this point, I subtract it from 1 and get my required answer.

Let's check the Z table. Here we have it, and if you read the heading, this is the standard normal distribution table, where each value represents the area to the left of the Z score. I am interested in the value for -0.24. We first read the row and then the column: the row is -0.2 and the column is 0.04, and the two intersect here, at 0.40517. So P(X ≤ 500) is 0.40517, and I am interested in the other side, which is 1 - P(X ≤ 500) = 1 - 0.40517, so the answer is about 0.59. There is a 59% chance.

Let's say there is one more question in the same example: what minimum marks should an individual score in order to be in the top 5%? This is a very interesting question, and if you can understand it, you have understood the complete normal distribution. Top 5% means that 95% of the data is below that point. So if I draw the normal distribution, this is the point I want to find, and below it I have 95% of my data. I know Z = (x - μ) / σ, and I need the x for which the Z value matches this point. So what is the Z value at 95%? This time I look up the area in the table to find the Z value: 0.9505 is here, in row 1.6 and column 0.05, so my Z value is +1.65. Then 1.65 = (x - 527) / 112. Again elementary math: 1.65 × 112 = x - 527, so x = 1.65 × 112 + 527, which comes to about 711.8.
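Both Z-table lookups in this example can be cross-checked in Python, where `statistics.NormalDist` plays the role of the printed table: `cdf` returns the area to the left of a Z value, and `inv_cdf` goes from an area back to a Z value. This is a sketch for checking the arithmetic, and the variable names are illustrative.

```python
from statistics import NormalDist

MEAN, SD = 527, 112
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Question 1: probability of scoring above 500.
z = (500 - MEAN) / SD                     # about -0.24
p_above_500 = 1 - standard_normal.cdf(z)  # 1 minus the area to the left

# Question 2: minimum score needed to be in the top 5%.
z_top5 = standard_normal.inv_cdf(0.95)    # about 1.645; the table's nearest entry is 1.65
cutoff_top5 = MEAN + z_top5 * SD

print(round(z, 2), round(p_above_500, 4))
print(round(z_top5, 3), round(cutoff_top5, 1))
```

The cut-off computed here is slightly lower than the hand calculation because the printed table only offers 1.65 as the nearest Z value for an area of 0.95, while `inv_cdf` returns the unrounded value.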
That means the person has to score a minimum of about 712 to be in the top 5%, with 95% of the observations below that point. This is the application of the Z table: it helps you go from a Z value to the area under the curve, which is essentially the probability of having a data point below that Z value, and from an area back to a Z value.

We have learned a lot about the standard normal distribution. Now we are going to learn its last property, the analysis of skewness. We know that a standard normal distribution is a bell-shaped curve that is symmetrical, which means 50% of the data is below the mean and 50% is above the mean. But this is a standard, and in reality it is very difficult, close to impossible, to find a continuous random variable that is exactly standard normal. So we use the standard normal distribution as a benchmark, and in reality you may get a distribution that is left skewed, right skewed or has no skewness. Let's look at all three types.

This one is known as right skewness. Why am I calling it right skew? Because the tail is on the right side. In the case of right skewness, the tail is there because of the presence of outliers on the higher side, while most of my data is on the lower side. So here I get the mode first, followed by the median, and then the mean. I hope you remember from descriptive statistics that if I have outliers on the higher side, my mean shifts towards the higher end, and that is why you can see that the mean has shifted towards the outliers.
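The ordering described for a right-skewed distribution (mode first, then median, then mean) can be seen on a tiny made-up list of incomes with one large outlier. The sketch below is illustrative only: the numbers are invented, `describe_skew` is a hypothetical helper, and comparing the mean with the median is a rough rule of thumb rather than a formal skewness measure.

```python
import statistics


def describe_skew(values):
    """Return mean, median, mode and a rough skew direction."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    if mean > median:
        direction = "right"   # mean pulled up by high outliers
    elif mean < median:
        direction = "left"    # mean pulled down by low outliers
    else:
        direction = "none"
    return mean, median, mode, direction


# Invented incomes: most values are low, one outlier is very high.
incomes = [20, 22, 22, 25, 27, 30, 150]
# Invented symmetric data for comparison.
symmetric = [1, 2, 3, 3, 3, 4, 5]

print(describe_skew(incomes))
print(describe_skew(symmetric))
```

In the income list the single value of 150 drags the mean well above the median, while the mode stays among the low values, which is the pattern the speaker draws on the right-skewed curve.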
Below this point, the median, I have 50% of my data, and above it I have the other 50%. This is known as right skewness.

I can also get a distribution that is left skewed, something like this. Here again the tail is because of the presence of outliers, this time on the lower side. This is the mode, the median is somewhere around here, and finally the mean, so the mean has again shifted towards the outliers. Here most of my data is on the higher side. If I take the x-axis to be income, the right-skewed graph says that most people have low income, while the left-skewed distribution says that most people have high income, but because of a few people with low or very low income the distribution has been skewed, and we obtain a left-skewed graph.

In the case of right skewness, mean > median > mode. In left skewness, mode > median > mean. And in the case of no skewness, the mean is approximately equal to the median, which is approximately equal to the mode. If you get something like this, you can say your distribution is close to a normal distribution.

Please remember that the properties we have learned, 68%, 95% and 99.7%, also known as the 1-2-3 rule, apply best when your distribution is close to a normal distribution. If your distribution has left or right skewness, those properties will not be exactly true: sometimes the numbers will be close, and sometimes they will be very different. So whenever you apply the property, make sure that your distribution is close to normal. The example of BMI that we have seen
was close to a normal distribution. Let me go back to the same sheet. Here we have the data, and you can see that the mean, median and mode were very close to each other. So this distribution was close to normal, not perfectly normal, and that is the reason these properties held true. If the distribution had not been normal, these properties would not have held. I hope this makes sense.

We have covered a lot of topics. The most important one so far was the normal distribution, including how to read a Z table and how to use it to find probability values. We have also learned about descriptive statistics and a good deal of probability: random variables, continuous variables, expected value. Now we are going to start the second very important topic in our statistics and probability framework, which is inferential statistics. I gave you the essence of what inferential statistics is and why we need it when we were discussing descriptive statistics, but now we are going to understand it completely and use the power of inferential statistics to make estimations.

To understand inferential statistics we will take one example. Say you work for a marketing company and your manager wants you to find the average internet recharge made by Indians, and I am talking about the whole of India, in the last year. Just look at the scope of the problem: we are talking about the whole of India, and your task is to find the average internet recharge that was done by all Indians
in the last year. It is a big, challenging problem. Let's assume this circle represents the complete Indian population. To find the answer, what will you do? Say this figure represents you. You would reach out to each and every individual and collect the data: the recharge done by person number one is R1, then R2, then R3, and so on up to, say, R of 150 crore. First of all, it is near to impossible for you to collect data of this size and to perform a survey on such a large scale.

What is a survey? Say I have a group of people and I want to know whether they prefer tea or coffee. I perform a quick survey: I reach out to each and every individual, ask the question, collect the data, and finally perform the estimation. Estimation means we calculate the average, minimum, maximum or whatever we want to calculate. This is the process of a survey. Now, if I have to do the same thing on a population of more than 150 crore people, don't you think that for an individual like you it is near to impossible? I would say it is near to impossible even for me, or for a government organization. Even if a government organization could do it, it would take a lot of time and a lot of investment, meaning a lot of money and a lot of people.

So what is the better solution? We have already discussed it. The solution is that I take a small
sample, which we call a random sample. I will explain what a random sample is in a moment. We take random people: say I have taken one lakh random people from the population of more than 150 crore. As an individual, it is quite possible for you to survey one lakh people. So you perform the survey and collect the data, R1, R2, up to R one lakh, and then you take the average. This is my sample average, which is known as a sample statistic. Using the sample statistic, the sample mean, we try to approximate the population parameter, the population mean. This is what is known as inferential statistics.

Before we talk about how to estimate the population parameter, let's quickly understand sampling and random sampling. Sampling is the procedure of collecting a few observations from a bigger set, the population. I have a population, and I want to collect a smaller group of observations, which is known as a sample. The process is known as sampling. One very popular sampling technique is random sampling, and "what is random sampling?" is a very important interview question. Assume I have a population with the numbers 1, 2, 3 and so on up to 20. So I have a population of 20 people, and their roll numbers are written here. If I have to select a random sample of five people, I could select any five, but according to random sampling the probability of selecting any observation is equal to 1/n. So the probability
of selecting 1 is 1/20, the probability of selecting 2 is 1/20, and it is the same for any number: 1/n. In simple random sampling there is no bias. We randomly select an observation from the population, and every observation has the same probability, 1/n, of being selected into the sample. So my sample here can be 5, 9, 2, 7 and 10. Why have I written 5, why 9? I have just randomly selected these five numbers. This is simple random sampling, and that is what I was referring to earlier.

Now let's talk about how to perform estimation. For estimation we use a particular result known as the central limit theorem. What does this theorem say? Suppose I have a population, say the Indian population, and I take one simple random sample of one lakh people. When I calculate the average recharge amount for sample number one, it comes out to be, say, 250 rupees. I take another sample, sample number two, of another one lakh people, and the average comes out to be, say, 220 rupees. I take one more, sample number three, again with one lakh random people, and the average comes out to be 210 rupees. If I keep doing this, taking enough simple random samples from my population, and plot the sample averages in a histogram, I get a normal distribution. The average of this normal distribution is my population average, and its standard deviation is given by σ / √n. So the first property of the central limit theorem is that the average of all the sample averages will be approximately equal to the population
average. The second property is that the standard deviation of the sampling distribution is sigma / √n. These are the two properties of the central limit theorem, and using them we can estimate the population range.

Let's see how, with this question: a simple random sample of 50 adult women is obtained, so n = 50, and when their red blood cell count was measured, the sample average came out to be 4.63. The population standard deviation is 0.54.

Let me go back for a moment, because I did not specify that sigma represents the population standard deviation and n represents the number of observations in the sample; in the recharge example, n would be one lakh. The central limit theorem says that if I take multiple samples from my population (sample 1, 2, 3 and so on) and take the average of all the sample averages, it will be close to the population mean. And if I plot all these sample means in a histogram, it will closely resemble a normal distribution. This distribution is known as the sampling distribution, because we obtained it from the sample means. So, once more: the average of all the sample means will be close to the population mean, and the standard deviation of the sampling distribution is sigma / √n, where sigma is the population standard deviation and n is the number of observations in each sample. This property is very important and very useful for estimating the population range.
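
If you would like to see the two properties of the central limit theorem for yourself, here is a minimal sketch using only Python's standard library. Everything specific in it is an assumption made for illustration: the population of recharge amounts is randomly generated (it is not real data), it is deliberately skewed rather than normal, and the helper name `sampling_distribution`, the sample size of 100 and the 2,000 repetitions are arbitrary choices. `random.sample` draws without replacement, so every observation has the same chance of being picked, which is simple random sampling.

```python
import random
import statistics


def sampling_distribution(population, sample_size, num_samples, seed=0):
    """Return the means of `num_samples` simple random samples."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical, skewed population of recharge amounts (rupees).
rng = random.Random(42)
population = [rng.expovariate(1 / 250) for _ in range(100_000)]

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 100  # observations per sample
means = sampling_distribution(population, sample_size=n, num_samples=2_000)

# Property 1: the average of the sample means is close to the population mean.
print("population mean      :", round(mu, 2))
print("mean of sample means :", round(statistics.fmean(means), 2))

# Property 2: the spread of the sample means is close to sigma / sqrt(n).
print("sigma / sqrt(n)      :", round(sigma / n ** 0.5, 2))
print("std of sample means  :", round(statistics.stdev(means), 2))
```

The two pairs of printed numbers should be close to each other, even though the population itself is far from normal. Plotting `means` as a histogram would give the bell-shaped sampling distribution described above.
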
Sigma, the population standard deviation, is 0.54, and the sample mean is 4.63. Now I have to construct a 95% confidence interval estimate. We already know that the sampling distribution follows a normal distribution. Let's say this is its center, the sample mean, and I want to estimate a lower value and an upper value such that 95% of my data lies between the two. Don't worry, I'm not rushing; I'll explain everything. But this is my problem statement: find the lower value and the upper value such that, between them, I have 95% confidence. We are using the word confidence, but for now you can think of it as needing a lower and an upper value with 95% of the data between them.

We actually know this already: from the mean, if I go two standard deviations below and two standard deviations above, I get about 95% of my data; that was the second property of the normal distribution. But it is not exactly two standard deviations. Assume this is my standard normal distribution, with mean zero, and I have to find the z-values here and here, Z1 and Z2, such that the area between Z1 and Z2 is 0.95. If the area between Z1 and Z2 is 0.95, what are these two shaded areas in the tails? Together they must be 5%, so this one is 0.025 and this one is 0.025. How? The total area is 1, and between Z1 and Z2 the area is 0.95, so subtracting leaves 0.05. Since I have two shaded regions, I divide by two and am left with 0.025: half here and half here. Now I have to find out
the z-value here. From the far left up to this point, the area under the curve is 0.025, so let's use the z-table. I have to look on the negative side: 0.025 is here, in the row for -1.9 and the column for 0.06. You can see it is not exactly two; it is -1.96. So the point Z1 is -1.96. Similarly, to find Z2 I have to go from the far left all the way up to here, so the area is 0.95 for the middle region plus 0.025, which is 0.975. Looking up 0.975, it is in the row for +1.9 and the column for 0.06, so we have +1.96. Now we have the lower and the upper z-value.

From the sample mean, we estimate a range for the population parameter. Why a range? If my manager asks me to find the average recharge made by Indians and I use a sample mean, I am not 100% sure. I cannot tell my manager that the average is, let's say, 280 rupees, because I have surveyed just one sample. So I report a range: "Sir, I don't know the exact value, but I am 90% confident, or 95% confident, that the population mean lies somewhere between these two values." That is what we are trying to estimate.

Going back to the question: to estimate the range of the population parameter, we add to and subtract from the sample mean z × sigma / √n. What is z here? From the sample mean I move 1.96 standard deviations below and 1.96 standard deviations above, where the standard deviation is that of the sampling distribution. Essentially, from the sample mean we move below and above to get the lower value and the upper value. To get 95% confidence, I go 1.96 standard deviations below the mean and 1.96 standard deviations above it, and that gives me the population range. So the mean is 4.63, plus or minus 1.96 times the standard deviation of the sampling distribution, which
is 0.54 / √50. If I solve this, 4.63 plus or minus the margin, the margin comes out to be about 0.15, and the range is (4.48, 4.78). What does this mean? It means we can be 95% confident that the actual population mean lies somewhere between 4.48 and 4.78. The question asked us to construct a 95% confidence interval estimate for the mean red blood cell count of adults. Here 50 adult women were sampled and their sample mean was 4.63; for the complete population of adult women, this range says that, with 95% confidence, the mean red blood cell count lies between 4.48 and 4.78, in the question's per-microliter units. That is what this range is.

Let's look at one more question, and it will clear all your doubts. The question says: you want to rent an unfurnished one-bedroom apartment in New York City. The mean monthly rent of a random sample of 60 apartments advertised on some website came out to be $1,000. So you were trying to estimate the average one-bedroom rent in New York City: you searched some website, sampled 60 apartments in the city, so n = 60, and the average rent in this sample came out to be $1,000. You assumed the population standard deviation to be $200; this is just an assumption, but it is the population value. Now we have to construct a 95% confidence interval. For 95% we have already calculated the z-values, -1.96 and +1.96, so we just apply the formula: mean ± z × sigma / √n. The mean is 1,000, plus or minus 1.96 × 200 / √60, and if I quickly solve this it comes out to (949, 1050), which basically means we can be 95% confident
that the actual average rent of a one-bedroom apartment in New York City is somewhere between $949 and $1,050. This is the application of inferential statistics. I hope you are able to connect all the dots and understand how we arrived at this point.

This concludes our discussion of inferential statistics. So far we have talked about descriptive statistics and probability, and we have looked at how to use inferential statistics to approximate the population parameter given the sample statistic. Now we are going to start with hypothesis testing. I have already given you a brief idea of what hypothesis testing is: when we started this video, we briefly discussed all three major buckets, descriptive statistics, inferential statistics and hypothesis testing. In this section we will have a detailed conversation about what exactly hypothesis testing is and how it can be useful in data science and data analysis.

To understand hypothesis testing, we will take one example. Let's say there is a court. This person is the judge sitting in the court, we have two lawyers, and we have the person who has been charged with some crime. One lawyer is arguing in favor of the accused person and the other is arguing against. Now, if I ask you what thought process the judge will have regarding the person who has been charged, you can easily say that the judge will have two assumptions, or two hypotheses, or, in plain English, two thought processes: either the person is guilty or the person is
innocent, right? That is what the judge will have in mind: either the person is guilty or the person is innocent. The judge will decide after listening to both lawyers, the one arguing in favor of the charged person and the one arguing against, and only then come to a conclusion. This is basically the framework of hypothesis testing: we have two thought processes, two assumptions, and we try to reach a conclusion about them, to "prove" one of them. This is a very raw definition and not 100% correct, but I am putting it this way to create a base for what hypothesis testing is. So keep this example in mind, and whenever somebody asks you what hypothesis testing is, picture a courtroom: a judge listening to both lawyers, one in favor and one against, and an accused person who is either guilty or innocent. These are the two hypotheses in the judge's mind, and after listening to the whole argument between the two lawyers the judge comes to a conclusion. That conclusion is what we are trying to reach.

The complete hypothesis testing framework has two stages. Number one is to formulate the hypotheses: given a problem statement, we have to formulate the hypotheses. Number two is the test: we perform the test to reject, or fail to reject, one of the hypotheses. Please don't worry about the terms "reject" and "fail to reject" for now; just remember that, given a problem statement, we first formulate the hypotheses. So let's
say your manager has given you a problem statement. You first formulate the hypotheses, then you perform the test, and based on the test result you draw your conclusions. That is the simple process of hypothesis testing: formulation, then testing, and finally the conclusion.

Now let's understand the formulation of hypotheses. As I just told you, the judge has two thought processes in mind: either the person is guilty or the person is innocent. There are formal names for these two: one is the null hypothesis and the other is the alternate hypothesis.

What is the null hypothesis? Before this person was charged with a crime and taken to court, this person was innocent. The null hypothesis is about the initial state of the case we are trying to test. The initial state of this person is "innocent": before being charged, before anything happened, the person was innocent. So my null hypothesis, which is also written as H0 ("H-naught"), is: the person is innocent.

The alternate hypothesis is about what new thing has happened, what new thing we are trying to prove. The new thing here is that this person has been charged with a crime. So the alternate is the new state, the claim we are trying to test. The alternate hypothesis is written as Ha, and in this case it is: the person is guilty. This is how we formulate the hypotheses. Please
remember: the null hypothesis represents the initial state of the case we are trying to test, and the alternate hypothesis is the new finding, the new thing that we are trying to prove or test.

Let's take a few examples to understand how to formulate hypotheses from a problem statement. The courtroom was not really a problem statement, just a plain example; now we will look at problem statements with a sample mean, a population mean and a standard deviation. But before the business examples, we need one small topic that is really important when we conclude. Whenever we conclude a hypothesis test, we say either that we reject the null or that we fail to reject the null. We never say that we accept the null or that we accept the alternate. This is something I should mark in red, because it is wrong; we never say it. And there is a reason.

The first thing to understand is why we only reject, or fail to reject, the null. Why the null and not the alternate? Because the null hypothesis is something we know. We are sure that this person was innocent before being charged with the crime that somebody is now trying to prove in court. So the reason we conclude on the null hypothesis is that the null is the statement we are certain about: it is the initial state, the person really was innocent to begin with, and we are 100%
certain about it.

Now, why don't we say "accept the null"? If I say I accept the null, I am saying I am 100% sure the person is innocent, and I am completely ignoring the possibility that the person is guilty. But the result might simply be due to a lack of evidence: let's say the lawyer arguing against the charged person was not able to present enough evidence before the court, and so could not prove that the person is guilty. So when I say I accept the null, I am claiming certainty that this person is innocent and ignoring the possibility of guilt, which is not right. We have to keep some uncertainty, and that uncertainty is the lack of evidence. When I say I fail to reject the null, I am not saying I accept the null; I am saying I am not 100% sure, and it might be because of the lack of evidence.

And why don't we say "accept the alternate"? Accepting the alternate would mean I am 100% sure that the person is guilty. Again, there might be a lack of evidence: perhaps the lawyer arguing in favor of this person was not able to collect the evidence to prove that the person is innocent. So when we say we accept the alternate, we are again ignoring the uncertainty. To make sure our conclusion carries that uncertainty, and allows for the other possibility because of a lack of evidence, we only say that we either reject the null or fail to reject the null. I hope this makes sense. If it doesn't, just remember one thing: we only say "reject the null" or "fail to reject the null". We only use the null hypothesis because we are 100% sure that
the null was true; the null happened in reality. This person was innocent to begin with, and that is the reason we use the null. The second reason we do not say "accept the null" or "accept the alternate" is that there might be a lack of evidence, due to which I was not able to prove the statement. So, to include the uncertainty in my conclusion, we only reject or fail to reject the null.

Now, given a particular problem statement, how do we formulate the null and alternate hypotheses? This question states that a restaurant owner installed a new automatic drink machine. The machine is designed to dispense 530 mL of liquid on the medium size setting. What is the initial state of this machine? The machine was designed to dispense 530 mL; let me highlight this as the initial state. The owner suspects that the machine may be dispensing too much in medium drinks, and decides to take a sample of 30 medium drinks to see if the average amount is significantly greater than 530 mL. So what does the owner want to test? Whether the average amount is greater than 530 mL. This is the new finding, so my alternate hypothesis is that the average amount is greater than 530 mL, and my null hypothesis is that the average amount is less than or equal to 530 mL.

Why am I using "less than or equal to"? The question says the machine is designed to dispense 530 mL, so I could have written the null as "the average is equal to 530 mL". But remember what we said about continuous random variables and their distributions: the probability at a single point is zero. Let's say this line is 530 mL. "Greater than 530 mL" is the region above the line. Now, if I use
the alternate as "greater than 530 mL" and the null as "equal to 530 mL", I will never be considering the lower region, and that would not help us arrive at a conclusion. For example, if I get a value lying below 530 mL and I have not considered this area, I cannot conclude anything. That is why, to make sure I am covering the complete distribution, we use "less than or equal to": we get the lower part as well, and together the two hypotheses cover the complete area under the curve.

There are also rules for formulating hypotheses. Rule number one: the null hypothesis always has a "less than or equal to", "greater than or equal to" or "equal to" sign, while the alternate always has "less than", "greater than" or "not equal to". Rule number two: the null and the alternate are 100% opposite to each other. If my alternate is "less than", the null is "greater than or equal to"; if the alternate is "greater than", the null is "less than or equal to"; if the alternate is "not equal to", the null is "equal to". Rule number three: always start by formulating the alternate. If you remember these three rules, you can formulate the hypotheses for any given problem.

Let's look at a second problem. A city had an unemployment rate of 7%. This is my initial state, something that is given to us and that everyone in the city knows. The mayor pledged to lower this figure and supported programs to decrease unemployment. A group of citizens wants to test if the unemployment rate has actually decreased.
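
The three formulation rules given above are mechanical enough to write down as a small Python sketch. This is only an illustration: the function name `formulate`, the lookup table `NULL_SIGN_FOR` and the way the hypotheses are printed are my own choices, not a standard API.

```python
# Rules 1 and 2: the alternate uses <, > or !=, and the null takes the
# exact opposite sign, which always includes equality.
NULL_SIGN_FOR = {"<": ">=", ">": "<=", "!=": "="}


def formulate(alternate_sign, benchmark, unit=""):
    """Rule 3: start from the alternate, then derive the null from it."""
    if alternate_sign not in NULL_SIGN_FOR:
        raise ValueError("the alternate must use one of <, >, !=")
    null_sign = NULL_SIGN_FOR[alternate_sign]
    h0 = f"H0: mean {null_sign} {benchmark}{unit}"
    ha = f"Ha: mean {alternate_sign} {benchmark}{unit}"
    return h0, ha


# Drink machine: the owner suspects the average is greater than 530 mL.
h0, ha = formulate(">", 530, " mL")
print(h0)  # H0: mean <= 530 mL
print(ha)  # Ha: mean > 530 mL
```

The only decision you have to make yourself is the sign of the alternate, which comes from the "new finding" in the problem statement; the null then follows automatically.
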
Now, the new finding is that the unemployment rate has decreased: it has gone below 7%. So I start by writing my alternate hypothesis, which is that the rate is less than 7%. The opposite of this, the rate is greater than or equal to 7%, is my null hypothesis.

Let's take one more example. eHealthInsurance claimed that in 2011 the average monthly premium paid for individual health coverage was $183. This is the initial statement made by the company, so it is my initial claim or initial state. Suppose you are suspicious that the average, the mean cost, is actually higher. The new thing you want to test is that the average is higher than this number. So my alternate is that the average is greater than $183, and the null is that the average is less than or equal to $183. Simple.

One more example: we want to test whether the mean GPA of students in American colleges is different from 2.0. This is what we want to test; in this question we are not given an initial state. So we know the alternate, the average is not equal to 2.0, and the null will be that the average is equal to 2.0.

Next: in an issue of U.S. News & World Report, an article on school standards stated that about half of the students in France, Germany and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of US students take advanced placement exams and 4.4% pass. Test if the percentage of US students who take advanced placement exams is more than 6.6%. So there is a report which already states that 6.6% of US students take advanced placement exams, and you want to test whether this number is more than 6.6%. The alternate, what you are interested in testing, is that the percentage is greater than 6.6%, while the null is that it is less than or equal to 6.6%.

Okay, last question. A survey was conducted to get an estimate of the proportion of smokers among graduate students. The report says 38% of them are smokers. So there is a report
which says 38% of graduate students are smokers. This is the initial state; it is already given in some report. But now Chatterjee doubts the result and thinks that the actual proportion is less than this. So there is a benchmark, 38%, and there is a new finding: Chatterjee thinks the actual proportion is less than 38%. So what is the alternate? That the proportion is less than 38%, which is what Chatterjee thinks, while the null, following the report, is that it is greater than or equal to 38%. These are my null and alternate hypotheses. I hope you are now clear on the first stage of hypothesis testing: how to formulate the null and alternate hypotheses given a business statement.

Now let's understand the various types of test we have in hypothesis testing. There are two kinds of test: the first is the z-test and the second is the t-test. Within the z-test we have two methods. One is confidence interval estimation, also known as the z-value method. The other is the significance test, also known as the p-value method. So within the z-test we have the z-value method and the p-value method. Within the t-test, we have three types of test, depending on the problem statement: the one-sample t-test, the paired-sample t-test and the independent-sample t-test.

Now, when to use the z-test and when to use the t-test? I will first talk about when to use the z-test, and when we start the t-test I will talk more about when to use that. So,
when do we use the z-test? You have to check two things. Number one: the number of observations in the sample should be more than 30. Number two: the population standard deviation is known. Please remember this, because the difference between the z-test and the t-test is a very common and very important interview question. I am only talking about the z-test for now; when we start the t-test, I will show you the rules for when to use it. One more point: if I do not know the population standard deviation and only know the sample standard deviation, but my sample size is more than 30, then the z-test is still applicable.

Now let's jump directly to a problem statement, formulate the hypotheses, apply the z-test and conclude. In this question, a sample of 40 new milk packets had an average milk content of 92.67. Within a milk packet you will find some quantity of milk and some quantity of water. So we are told that a sample of 40 new milk packets had an average milk content of 92.67 with a standard deviation of 1.79; this is the sample standard deviation. Since my sample size is more than 30, the z-test is applicable here. The question continues: use a 0.05 significance level to determine if there is sufficient evidence to support the claim that the new batch has milk content different from 92.84.
For now, the complete question may sound like German to you. Don't worry about it; let's decode this problem. Assume you work as a scientist in the quality test department of this milk packet factory, and let's say the factory produces one lakh milk packets on a daily basis. As a scientist, it is impossible for you to test every milk packet. So what is the correct procedure? We have already learned it: we take a sample. Let's say that every day you take a sample of 40 milk packets.

I have drawn a table on my screen: one column is the day of the month and the other is the average milk quantity. On day one, you sampled 40 random milk packets out of the population of one lakh packets produced that day, and after testing all 40, the average milk content came out to be, let's say, 92.5. On another day the average came out to be 92.7, on another day 92.8, on another day 92.27, and you have been doing this for the last month. I'll just put dots in the table; on day 30 the average came out to be 92.66. When you looked at the average over the complete month, the monthly average was 92.84. This is what has been given to us. Then, on another day in another month, you tested 40 new milk packets and the average came out to be 92.67. So I
hope this complete picture makes sense. You have been testing every day, from day 1 to day 30, for the last month, and the average was 92.84, which, let's say, is above the quality cut-off. But on one particular day, after testing 40 new milk packets, the average came out to be 92.67. Now you want to perform a test to check whether this new sample mean, 92.67, is statistically the same as 92.84 or not. If the two numbers are statistically the same, we will approve this batch of 40 new milk packets. If they are statistically different, we will reject the complete lot, because the milk content is below the acceptable level. That is the complete scenario.

The first thing we do is formulate the null and alternate hypotheses. My alternate hypothesis is that the average is not equal to 92.84. Why 92.84? Because that is my previously established value, my initial value. And the null is that the average is equal to 92.84.

Now please remember one thing: whenever we have a "not equal to" sign, the test will be a two-tail test. Let's quickly understand this. If I have a "not equal to" sign in my alternate hypothesis, then my rejection region (we will talk about what rejection and acceptance mean, don't worry) is on both sides. If I have a "less than" sign in my alternate hypothesis, then my rejection region is on the left side. Just look at the arrow: the sign points to the left, so my rejection region is on
the left. If I have a "greater than" sign in my alternate hypothesis, then my rejection region is on the right side. Again, look at the arrow: it points to the right, so the rejection region is on the right. These are known as the two-tail test, the left-tail test and the right-tail test. Since I have a "not equal to" sign in my alternate, this particular test is a two-tail test, and my rejection region is on both ends; I'll highlight the two regions.

What we are going to do now is find these two cut-offs, the lower value and the upper value. I hope you remember that we have already done this with confidence interval estimation in inferential statistics, where we found the lower limit and the upper limit. Here we take the average, which in this case we call the population average, plus or minus z × sigma / √n.

What is my z-value going to be? For that we use the significance level. The significance level, also known as alpha, is my area of rejection, which is 0.05: out of the complete area of 1, 0.05 is the area of rejection. Since my rejection region is on both ends, I have to divide the significance level by two, and 0.05 / 2 is 0.025. So the area of this tail is 0.025 and the area of that tail is 0.025. What are the z-values at these points? We already know, but let's quickly look up 0.025 in the table: the value is -1.96. Similarly, on the positive side it is +1.96. Using -1.96 and +1.96, we find the lower value and the upper value: 92.84 plus or minus 1.96 times sigma, which is the standard
deviation is 1.9 79 / by root of 40 so we will get 92. 286 comma 93. 394 so here here we will have 92. 286 here we will have 93. 394 now this is the range in which which has been given to us or which we have derived using the population mean so now if my sample mean this is my sample mean if if my sample mean lies between or maybe in the acceptance range this is my acceptance range acceptance range so if my sample mean lies in the acceptance range we say that we fail to reject null so 9 2.67 will lie somewhere between this over here since it is lying in the acceptance range so if sample mean lies in acceptance range we conclude that we fail to reject null otherwise if it is in the rejection region if sample mean so I forgot to write mean fals or lies in the rejection region we reject null we reject null so as of now what conclusion we will derive the conclusion will be we fail to reject null why because the sample mean is within the acceptance region with within the boundary which we have derived using the population mean this is also known as the hypothe mean the reason hypothe mean is because we hypothetically call it as a population mean but it is not the true population mean so if you are getting confused just be very clear this is a sample mean 9 2.67 92.8 4 is my population mean and using the population mean we have derived a range if my sample mean fall within this range we fail to reject number if it is outside the range we reject null I hope this question makes sense if it is not let's take one more example now let's try to solve this particular problem and I would advise all of you to pause this video and try to solve this problem by yourself because we have already seen one example you have to just follow the steps and try to solve this problem and try to pause the video and just try to solve it by yourself I assume that you guys would have solved it by yourself if not let's try to solve it the question says in a certain Community a claim is made that 
the average income of employed individuals is $35,500. Again, there is an initial statement which has already been established, that the average income is 35,500. So this is my initial statement. A group of citizens suspects that this value is incorrect and gathers a random sample of 140 employed individuals in the hope of showing that 35,500 is not the correct average. So what did the group of citizens do? They collected a sample of 140 people and obtained a sample mean of 34,325, with a population standard deviation of $4,200. Note that this is the population standard deviation, not the sample standard deviation. Alpha, the area of rejection, is given as 0.10, which is 10%.

Let's start with the formulation of the hypotheses. My alternate hypothesis is that the average is not equal to 35,500, because 35,500 is the initial value and the group of citizens wants to prove that it is not the correct average. The null will be the opposite of this: the mean is equal to 35,500.

What is my second stage? I will find the lower critical value and the upper critical value. Here is my normal distribution. Since I have a not-equal-to sign in my alternate hypothesis, it is a two-tail test, so my area of rejection is on both ends: this is an area of rejection, and this is also an area of rejection. We are given an alpha of 0.10, and since the rejection region is on both ends, alpha is divided by two: 0.10 divided by 2 is 0.05. So this area is 0.05, and this one is also 0.05. What is the Z value over here? Again we will use the Z table and look for 0.05. We can take this value or this one, -1.64 or -1.65; you can take either. I'll go with -1.64. So this is
-1.64 and this is +1.64. Now I have the Z value and all the other values, so let's calculate the range: mean plus or minus Z times sigma divided by root of n. My mean is 35,500, plus or minus 1.64 times the standard deviation, 4,200, divided by root of 140. So the range is 34,917.86 to 36,082.14: the lower value is about 34,917 and the upper value is about 36,082.

Now where is my sample mean lying? It is 34,325, which is below the lower value of 34,917, so it lies over here, in the left rejection region. Since it is falling in the rejection region, we conclude that we reject null, which basically means that, based on this sample, the claim that the average is 35,500 does not hold. Now, I'm only saying that we reject null based on this test. The result might still be because of the sample, for example because the sample collection process was biased, but as of now, using the test result, we reject null. I hope this makes sense.

Now here we have one more example. We have seen two examples where we used a two-tail test, where the rejection region was on both ends. Now let's take an example of a one-tail test, either left or right. The question says: suppose the scores on the SAT (the SAT is one of the examinations) form a normal distribution with mean 500 and standard deviation 100. A school counselor has developed a special course designed to boost SAT scores. A random sample of 40 students is selected, and after they have gone through the course and attempted the SAT examination, the sample average came out to be 544. The question already states that the SAT score has a normal distribution with mean 500 and standard deviation 100, so these are my population parameters. Now, does the course boost the score? My alternate hypothesis is that the average is greater than 500.
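The two-tail Z test steps used in the milk and income examples above can be sketched in Python. This is a minimal sketch, not something shown in the video: the function names are made up for illustration, the critical Z value comes from Python's statistics.NormalDist instead of a printed Z table (so for alpha = 0.10 it is about 1.645 rather than the rounded 1.64), and the milk sample size n = 49 is an assumption, chosen because it reproduces the limits 92.286 and 93.394 quoted in that example.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_range(mu0, sigma, n, alpha):
    """Acceptance limits for the sample mean under H0: mean == mu0.

    mu0   : hypothesized (initial) population mean
    sigma : population standard deviation
    n     : sample size
    alpha : total area of rejection, split equally between both tails
    """
    # Critical Z value with area alpha/2 in each tail
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / sqrt(n)
    return mu0 - margin, mu0 + margin


def decide(sample_mean, lower, upper):
    """Compare the sample mean with the acceptance range."""
    if lower <= sample_mean <= upper:
        return "fail to reject null"
    return "reject null"


# Milk example (n = 49 is an assumption, see the note above)
milk_lower, milk_upper = two_tailed_z_range(mu0=92.84, sigma=1.979, n=49, alpha=0.05)
print(round(milk_lower, 3), round(milk_upper, 3), decide(92.67, milk_lower, milk_upper))

# Income example
inc_lower, inc_upper = two_tailed_z_range(mu0=35500, sigma=4200, n=140, alpha=0.10)
print(round(inc_lower, 2), round(inc_upper, 2), decide(34325, inc_lower, inc_upper))
```

With these inputs the milk sample mean of 92.67 falls inside its range, so the sketch reports fail to reject null, while the income sample mean of 34,325 falls below the lower limit of roughly 34,916, so it reports reject null.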
That is because, if the course boosts the score, the average should be greater than the population average, which is 500. The null is that it is less than or equal to 500. Greater than 500 basically means the arrow is pointing to the right side, so it is a right-tail test and my rejection region will be on the right end. This is my rejection region.

Again we have to find the Z value over here. The area of rejection, alpha, is given as 1%, which is 0.01. So the area from this point all the way to the end is 0.01, and the area from the far left all the way up to this point will be 0.99. Now we will use the Z table to find the Z value at an area of 0.99. We can go with either this value or this value, 2.32 or 2.33; I'll go with 2.32. So the Z value is 2.32, and it is positive.

Now let's apply the formula: mean plus Z times sigma divided by root of n. Remember, the lower value will now be minus infinity; we just have to find the upper value. The mean is 500, plus 2.32 times sigma, the standard deviation, which is 100, divided by root of 40. This will come out to be about 36.7, so the total is about 536.7. So this value over here is 536.7. My sample mean, 544, is greater than 536.7, so it lies in the rejection region, and we conclude that we reject null. If I reject null, that basically means the course does increase the SAT score.

Now please remember, this statement is based on this particular sample. I'm not saying the course will always increase the SAT score; based on this sample, the conclusion is that it does. But it might be because of some uncertainty. It might be that, when we were selecting the students, we only selected the intelligent students with higher IQ, and that is the reason we obtained an average greater than the population average. That is the reason we say we reject
null; we are not saying we accept alternate. We will not say that. I hope this is now making more sense.

So now let's talk about the T test. We have already seen the Z test in a lot of detail and solved three problems. Before we jump into solving a few problems using the T test, let's first understand when to use it. Let's write down the conditions for the T test. Condition number one: the number of observations in the sample is less than 30. This is my first check. The second check is that the population standard deviation is unknown. The third check is that the sample has been taken from a population which is known to be normally distributed.

What does it mean that the sample has been taken from a population which is known to be normally distributed? Let's take the example of income, and suppose I know that income in India is normally distributed. Now if I take a sample of five people, this sample will satisfy the T test conditions: I know it has been taken from a population which is normally distributed, the number of observations in the sample is less than 30, and the population standard deviation is unknown. I don't know the standard deviation of income in the Indian population, because the population is really large. So before we apply the T test, we have to check these conditions.

Now, there are three types of T test, depending on the problem statement, so let's briefly understand which kind of problem statement fits which kind of T test. The three types are the one sample T test, the paired sample T
test, and finally the independent sample T test. In the one sample T test, we compare the mean of a single group with a known mean. If this is not making sense, don't worry; I'll explain it better when we solve a problem. In the paired sample T test, we compare the mean of the same group at two different times. In the independent sample T test, we compare the means of two different groups. Again, if it is not making sense, don't worry; we will take an example of each one of them, and then it will make more sense.

Let's start with the one sample T test. A lawn service advertises that they can maintain your lawn at an average cost of $35 per month. This is their claimed value, the initial value that they have given. Assume the cost to be normally distributed. This is a very important sentence: we have been told that the population is normally distributed. A random sample of 18 DS customers shows an average of 32.50 and a sample standard deviation of 8.10.

First of all, let's check whether this problem satisfies the conditions of the T test. The first condition is that the number of observations in the sample should be less than 30. Here my n is 18, so yes, the first condition is satisfied. The second condition is that the population standard deviation is unknown. Yes, we do not know the population standard deviation; we only know the sample standard deviation. The third condition is that the sample has been taken from a population which is normally distributed, and this sample has been taken from a population where the cost is normally distributed. So all three conditions are satisfied.

Let's see the procedure. Number one, we have to formulate the hypotheses. My alternate hypothesis is that the average cost is less than 35; this is what we want to test. The null will be that the average is greater than or equal to 35. So these are my null and alternate hypotheses. Since I have a less-than sign in my alternate, it is a lower-tail test.

Now we will find the T estimate. The formula is similar to the Z value: X minus mu, divided by sigma over root of n. I hope you remember the Z value formula from when we talked about Z scaling: it was X minus mean, divided by sigma. Here I divide sigma by root of n because this distribution is a sampling distribution, and the standard deviation of the sampling distribution is sigma divided by root of n, according to the central limit theorem. What is X? X is my sample mean, which is 32.50. Mu is my population mean, which is 35. And in place of sigma we use the sample standard deviation, 8.10, divided by root of 18. You can quickly calculate it; I have already calculated it, and it comes to about -1.30. So the T estimate is -1.30.

Now we will find T critical, which comes from the T table. The T table requires alpha and the degrees of freedom. I haven't written alpha, so let's assume alpha to be 0.05. The degrees of freedom are given as n minus 1. But what are degrees of freedom? Let's understand. Let's say there are three people, A, B and C, and this is Sumit. I'm asking A, B and C to each choose one number in such a way that the total is 10. Let's write down A, B, C and the total; we already know that the total in all cases will be 10. Let's say A has chosen five and B has chosen three;
then C is bound to choose two. Let's say A has chosen five and B has chosen five; then C is bound to choose zero. Let's say A has chosen one and B has chosen one; then C is bound to choose eight. Here you can see that the third data point in my summation is not free to choose a value when we already know the total. There are only two values, A and B, which are free to take any value of their choice, but C is always bound to choose whatever is left. So degrees of freedom tell you how many observations in your data are free to choose a value.

Here, since we already know the sample average, the degrees of freedom will always be n minus 1, because there will always be one value which is bound to take whatever makes the average come out to what we are given. So here the degrees of freedom are n minus 1; n is 18, so 18 minus 1 is 17.

Now we will use the T table with these two parameters to find the T critical value. This is a one-tail test with a significance of 0.05 and 17 degrees of freedom, so the table value is 1.74, and since it is a lower-tail test, T critical is -1.74. First of all we will mark T critical at -1.74. Now where will the T estimate, -1.30, lie? It will lie over here. My rejection region is the region below the T critical value, and my T estimate is to the right of T critical, on the other side of the boundary. So this is the boundary, T critical, and this is my T estimate. Since the T estimate is on the acceptance side, we say that we fail to reject null, which basically means we stay with the statement that the average is greater than or equal to 35. That basically means those people who
had a doubt regarding this DS service do not get evidence of a lower cost: based on this sample, we cannot say that the average cost this service charges per month is less than 35. As far as this test goes, it is 35 or more. I hope this is making sense.

This was the one sample T test. The reason it is called a one sample T test is that we have one group and a known mean, which is 35, and we are comparing the sample mean with the known mean.

Now, the paired sample T test. This is an example of the paired sample T test, where we compare the average of one group at two different times. Here you can see that we have a before result and an after result. Let's solve this problem to understand it better. The problem statement says that a study was conducted to determine the effectiveness of a weight loss program. The table shows the before and after weights of 10 individuals. Was the program effective in reducing weight? We have to test at a significance of 5%.

If we think about this question logically, the program will be called effective when the difference, which is after minus before, is negative. So let's calculate the differences. I have already calculated them, so I'll just fill in the table. The first one is 169 minus 185, which is -16, and the rest are -5, -13, -1, -31, +3, -28, -22, +5 and -23. If the difference is negative, the person achieved a weight loss; if it is positive, the person gained weight. If I look at individual number six, the person had a weight of 168 and it increased to 171, so the person gained three units of weight, while this person, whose weight was 218, came down to 195, so this individual achieved a loss of 23 units. Now I will call this program
effective if the average difference is below zero; below zero means negative. So my alternate hypothesis will be that the average difference is less than zero. Initially I will consider that the program is not effective, so the null, the initial statement, will be that the average difference is greater than or equal to zero. Assume that this is a number line of differences D: if my average difference is zero or positive, then my program is not effective; if it is negative, then my program is effective. So these are my null and alternate hypotheses.

Now, this is my sample, so I need the sample mean, the average of the differences; I'll just write mu D. I have already calculated it; you can use Excel for this quick calculation. The average difference is -13.1 and the standard deviation of the sample is 13.02. These are my sample statistics. Now I will calculate the T estimate, which is X minus mean, divided by sigma over root of n. My X is -13.1. The population mean is zero, because we have assumed that there is no change, no gain and no loss. Sigma is 13.02, divided by root of n, and n is 10. This comes out to be -3.18.

Now, using alpha, which is 5%, so 0.05, and the degrees of freedom, which are n minus 1, so 10 minus 1, which is 9, I will find T critical using the T table. Let's quickly do this. This is a one-tail test, the significance level is 0.05, and the degrees of freedom are 9, so here we have the value 1.833. Since it is a lower-tail test, because I have a less-than sign, T critical will be -1.833. Now where will -3.18 lie? It will lie over here, to the left of T critical; this is the T estimate. Since it is lying in the rejection region, we conclude that we reject null, which basically means yes, based on this sample, the program was found to be effective.

Here you can see that the reason we call this a paired sample T test is that we had the same group at two different times: the same set of people, before and after. So in the paired sample test we are comparing their averages, or the average difference.

Now we have the independent sample T test, which basically means we have two different groups, group A and group B, and we have to compare them. Let's see this example. If you read the question, it says: let's consider an example where we want to compare the effectiveness of two different study techniques. 30 random students are selected and divided into two groups; group A was subjected to technique A and group B was subjected to technique B. Please remember, the total number of students is 30 and we have divided them into two equal groups, so group A consists of 15 students and group B consists of 15 different students. These two groups do not have any association between them; they are two different groups. Now I want to test whether technique A and technique B are the same or different. So my alternate hypothesis will be that the average of A is not equal to the average of B, and my null hypothesis will be
that the average of A is equal to the average of B. So in my null hypothesis I'm initially assuming that these two techniques are the same, whereas what I want to prove is that they are different. Those are my null and alternate hypotheses.

We will follow the same procedure, but there is a small difference in the T estimate. The formula for the T estimate is X1 bar minus X2 bar, minus the quantity mu1 minus mu2, all divided by the square root of sigma1 squared over n1 plus sigma2 squared over n2. In my table I'm already given the two sample means, so let's fill in the formula. X1 bar minus X2 bar is 84.7 minus 78.23. Then mu1 minus mu2: my null hypothesis is that the two population averages are the same, so mu1 minus mu2 is zero. Then the square root of sigma1 squared, so 7.87 squared, over n1, which is 15, plus 2.61 squared over 15. If I solve this, we get 3.02. This is my T estimate.

Now this is a two-tail test, because I have a not-equal-to sign in my alternate hypothesis. We are not given alpha, so let's assume alpha to be 0.05. So this portion will be 0.025 and this portion will be 0.025, and both are my rejection regions. Now I want to find T critical, which is this point. Again we need alpha and the degrees of freedom. In the case of the independent sample T test, the degrees of freedom are given as n1 plus n2 minus 2, so 15 plus 15 minus 2, which is 28.

Now let's use the T table. Since I have divided alpha by two, each tail has an area of 0.025, so I can either use the one-tail column at 0.025 or the two-tail column at an alpha of 0.05; both give the same value. To avoid any confusion, let's use the two-tail column at an alpha of 0.05. So two-tail test, alpha 0.05, and degrees of freedom 28: the value is 2.048. So my T critical is -2.048 on this side and +2.048 on this side, because the distribution is symmetric. Where will 3.02 lie? It will lie somewhere over here, in the right rejection region; this is the T estimate. Since it is lying in the rejection region, we say that we reject null, and when we reject null, it basically means the two are not the same; they are different. So based on this particular sample, we conclude that these two techniques are different from each other. This is the independent sample T test.

Now we have completed our discussion on hypothesis testing. We started with how to formulate the hypotheses, then we talked about two kinds of test, the Z test and the T test. Within the Z test we looked at two methods, the Z value method and the P value method. Within the T test we looked at the various types of test, based on the type of problem statement: the one sample T test, the paired sample T test and the independent sample T test. And we solved a variety of problems to make our understanding more concrete.

Now we are going to talk about a topic known as error in hypothesis testing. As I said initially, there is always some uncertainty that my test goes wrong. Since this is a test which we are performing on a sample, we can never be 100% sure that the result my test is providing, or the conclusion that we made from the
test, is always going to be 100% true. Let's take one example. In this particular example, the one with the two study techniques, we rejected null. What if this was actually not true? Based on these two samples of 15 students each, we concluded that we reject null. But what if in reality, if I took a much bigger sample, this is not true and both techniques happen to be the same? That is known as an error in hypothesis testing. So how can we avoid this error? Before we can avoid it, we have to be very clear about what kinds of error we can encounter while performing hypothesis testing. Let's try to understand that.

I have written down one problem statement, but before we talk about it, let's summarize the various errors we can encounter. We can summarize the types of error in a two-by-two matrix: this side is the reality, and this side is the hypothesis test result. There can be four scenarios. In reality, the null hypothesis H0 was either true or false. In the test, we either rejected H0 or failed to reject H0. When H0 was false and we rejected H0, this is a good thing; this is a correct conclusion, so I'll write "correct" over here. When H0 was true and we failed to reject H0, this is also correct. Now where is the error? The error is in the other two cases. Take the case when H0 was true in reality. Let's say that in reality the program is not effective, but based on the sample we concluded that the program is effective. So this is a wrong estimation, a wrong result.
So this is known as the type one error, and its probability is represented by alpha. So alpha, which was the area of rejection in all the questions that we have solved, is basically the probability of committing a type one error. And this other case is known as the type two error: when H0 was false in reality, so let's say in reality the program is effective, and we fail to reject H0, so we concluded that the program is not effective. That's a type two error. If this is confusing you a lot, don't worry; let's look at some examples.

Let's look at this example. A city had an unemployment rate of 9%. So what is my null hypothesis? My null hypothesis is that the city has an unemployment rate of 9%. A group of citizens thinks that the unemployment rate is different from 9%. So the alternate is that the rate is not equal to 9%, and the null is that the rate is equal to 9%. These are my null and alternate hypotheses.

Now, what will be the case if I commit a type one error? Type one error means H0 is true in reality but we reject H0. What is the meaning of this? In reality the unemployment rate was 9%, but we concluded that it is not 9%; we rejected H0. What is the type two error in this case? The type two error will be that in reality the unemployment rate was not 9%, but we failed to reject null, which means we concluded that it was 9%. As of now I'm not talking about which one is good and which one is bad; I'm just talking about how to identify the type one and type two errors. You just have to keep this matrix in your head so that you can formulate the type one error and the type two error.

Let's take one more example. A city planned to construct a new parking area. They plan to
survey a sample to see if there is strong evidence that the proportion of people interested in a new parking area is more than 40%; if it is, the government will consider building the new parking area. So let's say the government has taken a sample of different individuals from the population, and if more than 40% of the people in this sample are interested in the new parking area, then the government will propose a new parking area in the city. So what are the null and alternate hypotheses? My alternate hypothesis is that the proportion is greater than 40%, because this is what we are trying to prove. The null is that the proportion is less than or equal to 40%. So my initial assumption is that no more than 40% of people are interested, and I'm trying to prove that more than 40% of people are interested.

Now what is my type one error? Type one error is basically: reject H0 when H0 is actually true. We reject H0, which basically means we concluded that more than 40% of people in the population are interested in the new parking area, while in reality 40% or fewer were interested. So this is my type one error. What is the type two error? Type two error is basically: we fail to reject H0 when H0 was actually false. We failed to reject H0, so we concluded that 40% or fewer people are interested in the new parking area, while in reality H0 was false, so in reality more than 40% of people were interested.

Now, let's say you are a government body and you are performing this test. Which particular error is more problematic for you as a government body? With the type two error, you can see that it is problematic for
us, because we have concluded that 40% or fewer people are interested in the new parking area, which basically means we will not build it. So the consequence of the type two error would be that the government will choose not to build the new parking area when it was actually required. And what is the major consequence for the government? Well, people will not be happy with the government.

What is the consequence of the type one error? The consequence of the type one error is that the government will end up building the new parking area when it is not required. And what consequence will the government face? It will be investing an unnecessary amount from its budget in building this new parking area, which will impact the budget.

Now, as a government body you need to decide which particular error you want to reduce. Since we are doing hypothesis testing, since we are doing estimation, we cannot be free from error; there will be some kind of error. But you have to decide which error you are ready to bear, or which error is less problematic for you. So if you are a government body, you have to decide whether you are ready to face the impact on your budget or whether you are ready to face people being unhappy with you. Well, if I were the government, I would not want to take the type two error on me, because I want my people to be happy. So I would do something with the budget; I would maybe cancel some ongoing projects or some new projects, but I don't want my people to be unhappy with the government.

Let's take one more example to understand how we can adjust alpha. We know the alpha value, so we can tune it: we
can increase or decrease the alpha value. Increasing it raises the chances of a type one error and reduces the chances of a type two error. So you can imagine a number line: if I want the type one error to be high, I can increase the value of alpha, and it will eventually reduce the chances of a type two error. It is like a number line; you can shift it.

Let's see one last example. An employee at a hotel performs a daily water tank quality test to check whether the water quality is good enough for people to bathe in and to use in the toilets. Based on the test result, the water supply is stopped for 3 or 4 hours for the cleaning process. If the test results are positive, the water supply is not stopped. Test at an alpha of 0.05.

In this question we are provided with an alpha of 0.05. What is the alternate hypothesis? You can pause the video, think about it, and then maybe comment in the chat section.

Now let's try to understand this question. I hope you have already guessed what the null and the alternate hypotheses are going to be. My null hypothesis would be that the water quality is fine, or good. My alternate would be that the water quality is not good, right?

Now, what is the type one error? Let's write it down: we reject H0 when H0 was actually true in reality. What does this mean? We rejected that the water quality is fine when H0 was actually true. So what will happen? We will close the water supply when it was not required, right?

What is the type two error? We fail to reject H0 when H0 was actually false. What will be the consequence of this? We failed to reject H0, which basically means we concluded that the water quality is good when H0 was actually false,
so that basically means we will not close the water supply. Let's not write "we"; let's write: the water supply was not stopped when the water quality was bad, or not good.

Now, which one is more problematic? If you are a hotel owner, which one is more problematic for you? If I go with the type one error, we have to close the water supply when it was not required, which basically means the people who clean the tank will perform the cleaning process, and there will be a stoppage of the water supply for 3 to 4 hours. Maybe this is not a big problem, because you can pick a window in the afternoon, stop the water supply for 3 to 4 hours, and it will not impact the service much. So when you have to stop the water supply, you also have to consider that it may impact the service, but it is not going to harm you on a very large scale, because you can choose a window in the afternoon when most people are either out of the hotel or involved in some other activities and would not be using the bathroom, right?

What is the consequence of the type two error? The water supply was not stopped when the water quality was bad. Now this is a more problematic condition. Let's say somebody with sensitive skin is staying in your hotel. If you do not clean the water and that person bathes in the contaminated water, the person can contract some kind of disease, which can eventually land your hotel in big trouble, right? So the type two error is definitely very problematic for you.

Now, we have already learned that alpha is the probability of committing a type one error. So if I increase the value of alpha, there are more chances that I will commit a type one error, but it will reduce the chances of a type
two error. Right now my alpha value is 0.05. If I increase this to 0.10, what happens? I increase the probability of a type one error, so it will reduce the probability of a type two error, and this is what you should do.

So as a data scientist performing hypothesis testing, it is very important to decide the value of alpha carefully, and you can only decide the correct value of alpha by understanding the consequences of each type of error. In this particular condition, if you are the data scientist working for this hotel chain, you would set the value of alpha on the higher side so that the probability of committing a type two error is lowered. It is fine if I commit a type one error because, as we have already seen, the type one error is manageable but the type two error is not. I hope this makes sense.

So this was all about this particular video. I hope this module was informative and really useful for all of you. If you liked this video, share it with your friends, give it a big thumbs up, and don't forget to subscribe to Scaler's YouTube channel.
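
The alpha trade-off from the hotel example can be sketched in a few lines of Python. This is only an illustration under assumptions the video does not state: it treats the water test as a one-sided z-test with a known standard deviation, and the function name `type_two_error_rate` and all the numbers (effect, sigma, n) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error_rate(alpha, effect, sigma, n):
    """Probability of a type two error (beta) for a one-sided z-test.

    Assumed setup (not from the video): H0 says the mean equals mu0,
    H1 says the mean is larger. `effect` is the hypothetical true
    shift (mu1 - mu0), `sigma` the known standard deviation, `n` the
    sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 above this z value
    shift = effect / (sigma / sqrt(n))       # true shift in standard errors
    # Beta = P(test statistic stays below z_crit | H1 is true)
    return std_normal.cdf(z_crit - shift)


# Hypothetical numbers: compare a stricter and a looser alpha.
for alpha in (0.05, 0.10):
    beta = type_two_error_rate(alpha, effect=1.0, sigma=2.0, n=16)
    print(f"alpha={alpha:.2f} -> beta={beta:.3f}")
```

With the sample size and effect held fixed, raising alpha from 0.05 to 0.10 lowers the rejection threshold, so beta shrinks. That is the same shift along the number line described for the hotel case.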
