So we're going to continue with the example that I started to write down last time, when I realized that I had copied down the wrong formula. We're back on the comprehensive guide to inferential statistics. Just to let you know where that is: the first thing that pops up is inferential statistics, and we're down at the second one, which I really need to title differently. In my head it made sense, because it was under the same topic as before. Anyway.

Testing the average weight of apples. We have the sample data set as it's given, and I corrected the code that is in here, so we'll be able to literally copy and paste a bunch of stuff and work through the example. We have our data set, so we'll bring this up here; let me drag that over like so. Those values still do match. Same assumption as last time: our hypothesis is that the average weight of apples is 150 g, the same thing we originally posted. We have our significance level, and now we come down here, and now I have the correct formulas written in.

Excel does not have a direct one-sample t-test function, and that was the reason this was such a big deal: you end up having to write it as a formula. I realized that as I started to write it, because most of the time I don't use Excel; I use various programming languages. In my head I was like, oh yeah, Excel definitely does it, because all the other stuff does. So I literally just typed what I would do in the programming languages, and that's not what works in Excel. I realized that as I started to get the answer, because then I remembered Excel doesn't have it. What it does have instead is this, which we can copy and put in. So we're going to copy this, then
we're going to pop on over and paste it, and you'll notice that it computed quite nicely. If we double-click on it, we will note (and this is what I commented on last time) that we have one of two options. I know I'm talking about apple weights, so just to make things work I'm going to go like that and paste that in, and now it's perfectly correct. If I double-click on this... okay, I still have to adjust that: that needs to be A10, and that needs to be A1. There we go. Now double-click on it and it's all good, and I'll drag this over so you can actually see what's happening. There we go. So that's the quantity that it should be.

Now, after we find the t statistic, the next thing we have to do is find its p-value. We reviewed what a p-value was last time: the p-value is the probability of getting a value at least as extreme as the one we just found, assuming the null hypothesis is true. So we found this thing called a t statistic, and that t statistic is giving us a level, and now we want to know how probable it is that we reach that level.

We're going to copy this, and I will tell you it's not going to work at first. So we're going to say "p-value" and then copy that down. Now I'm going to double-click here. The reason it's not currently working is that this is a stand-in where I need to actually type the stuff in: I've written the formula generically, and now we're going to fill it in. In here there's a little absolute value part where I'm supposed to write what my t value, my t stat, is, so I'm going to click on that, because that's the actual cell. Then, in the part where it says degrees of freedom, I'm also going to change it. Right back over here we're given the instruction that our degrees of freedom are the number of data values that we have minus one. Well, for us, the number of data values that we have minus one is nine, and
now if I click Enter, I correctly and successfully get a number. This value is 0.52. Then I come to here and compare the p-value, that's this thing right here, with our alpha value. If our p-value is less than 0.05, we reject the null hypothesis. So is this right here less than 0.05? The answer is no, and that means we do not reject the null hypothesis. Let's go back up and check what our null hypothesis was. This was our null hypothesis. So what is the conclusion that we make for this problem? The conclusion is that, based upon what we tested, the data are consistent with the average weight of an apple at this farm being 150 g. That's it; that was the end result.

So let's go through the procedure. You do each of these steps. This step was coming up with what we were going to test. This step is pretty much going to be the same all the time; we're not really going to change that. This step is a formula, and the big thing you have to do if you change to a different example is make sure that the values you're using in here match up with the data values for your example. For the example we were doing, the data values were A1 through A10, so that's what goes into that formula. After that, you plug this in, where in the middle it uses the t value that we computed and the number of data values we have minus one, which is what we call degrees of freedom. Then you just compare the answer you get from this to 0.05, and that makes the determination. We're going to see more examples of this; there's a whole section that has a couple of different examples, but this was our first one.

Now let's come down to confidence intervals. Confidence intervals are very similar, and similarly I have it so that there are copyable Excel formulas. A confidence interval provides a range of values that
is likely to contain the population parameter. Using the apple weights data set, which is the same one we were just using, calculate a 95% confidence interval for the mean weight of apples. What that means is that we want a range of values, with a left value and a right value, where we'll be 95% sure that the average weight of an apple at this farm is in between these two numbers. So let's find out what the two numbers are.

The first thing we have to compute is what's called a critical value, and we can just paste that on in. Now we double-click it and make sure that everything is correct. There are two things that we enter. This quantity is our alpha value, which we agreed is going to be basically always 0.05. The next number is the number of data values we have minus one; again, it's that thing we call the degrees of freedom. What does it mean? Unimportant; just know that you take the number of data values you have and subtract one. We have 10 data values, which means our degrees of freedom has to be nine. That's it. So we go ahead and click Enter, and that's our critical value.

Now we're going to compute our lower bound and our upper bound. For the lower bound, we copy this and paste it, and then we double-click on it and observe the places where it's referencing our data values, A1 through A10. That's our data, that's not going to change, that's our data, that's our data. Click Enter. Then we select this stuff right here, like so, paste it in, done.

We now have two numbers. We have this one number that's the lower bound; that's like the left side. Then we have this other number that's an upper bound; that's like the right side. And here's what we say: we are 95% confident that the average weight of an apple at this farm will
fall between these two numbers. That makes sense conceptually, and you'll notice that when we look at the values we tested, yeah, that matches up. That seems pretty reasonable: the average apple should fall in this range. Not every apple, the average apple. And we see an interpretation, which is right here.

Okay, so now we come to a new type of test. Those two things were continuations of what we did last time. The new type of test checks whether there's a difference in the average between two groups, and that's what we refer to as an independent two-sample t-test. It similarly has copyable Excel stuff that we're going to see, so we're going to run through the example.

For this we're going to create two data sets, and I'm just going to put them in two columns without labels, because the label doesn't really matter: and 148, and 151, and 149. Cool. And then our next column will be 155, 158 (if I typed that correctly), 160, 159, 156. Awesome. Now, the big thing about this is that these values and these values match up, in that there are the same number of values in each of the two columns. This one is from Farm A and this one is from Farm B, and the description we're given is that we're going to compare the apples from two different farms. We want to see if there's a difference in the kind of apples that these two farms produce. Sounds reasonable, right?

So we will have a null hypothesis; this is now back to hypothesis testing. Our null hypothesis is going to be that the average weights of apples in both farms are equal. Our alternative hypothesis is that the average weights of apples in the two farms are different. At the end of the day we will end up with a p-value, and that p-value is going to give us a number, which
we will then check to decide whether we retain (it's called retain) or reject our null hypothesis.

Now, when we go into Excel, let's go in here and click on T.TEST. This is a pretty helpful thing to do: when you're going to use something, click on the help section, because the Excel help is actually very good at explaining things. In particular, right here, this is telling us what it is: it returns the probability associated with a Student's t-Test. Cool. In other words, it's going to give us a p-value; a p-value is a probability value. So it's going to compute a p-value for us, which is quite nice. So there is every cause for us to proceed with copying the formula, and we can just go ahead and paste it in anywhere. If I double-click on it, I can make sure it is referencing the correct stuff, and it is: it has correctly highlighted the two portions of data that I would like to use. Go ahead and click Enter.

I have this value, and what I do now is check it against 0.05. Is this less than 0.05? Yes, it is. Anytime my p-value is less than that number right here, I get to reject the null hypothesis. Which means I come up here and I say this is rejected. If this is rejected, then what do we believe? We believe the alternative, which means in this particular example that the average weights of apples in the two farms are different. Questions?

I'm going to come back to here in just a second, but I'm going to show you there's a whole section right here that has more examples of tests, so that you can run through them. We will highlight this probably on Monday, but it has a number of more examples of how to go about things for z-tests, etc. So let's go back to our inferential statistics, which was right where we were, scroll down to the second part, and continue. Okay, so we just did the independent two
sample t-test; let's go down to here: the paired t-test. When we did that first example, we just had a collection of five values from the one farm and five values from the other farm. But it's not like the apples actually matched up. It just so happened that we had five apples from the one group and five apples from the other group, but the apples were not directly comparable. In other words, I am not saying this apple matches with that apple.

Now we're going to do something that is sort of related. Let me create a new worksheet here. We're once more going to have two simple data sets: 140, 142, 145, 150, 155, and then 150, 148, 149, 155, 160. And as it says, because there are instructions here, there is a before and there is an after.

Let's suppose that we are talking about apples on some particular tree. We identify five apples on the tree. We don't remove them, but we're able to weigh them, because you can weigh an apple while it's still on the tree. Then we're going to test out some new fertilizer for the tree. The apples have pretty much stayed the same weight all summer, or whatever, and then we apply this new fertilizer to the tree and weigh the apples again after having applied it. This is an example scenario.

That leaves us with our data set before and our data set after, and what we're going to try to see is whether there is a difference between the before and after measurements. Now, we know that the numbers are different; that's not what we mean. We're not asking whether the numbers are different. They are different. What we're asking instead is: are we able to conclude that they're so different that it can't just be because of chance? That's
what the meaning of this test is. For example, at the moment it's sunny; tomorrow it could be raining, right? Well, is that due to chance, or is there a reason for it? That's what we're testing when we do this kind of test. For instance, if you were to have rain, then rain, then rain, and it were to rain solidly for two weeks, that's probably not going to be by chance, right? It's probably that there's something like a tropical storm or a hurricane somewhere off the coast, and that's the reason why we're getting two weeks of rain. That's really the only reason that ever happens. So that's the kind of thing we're talking about: are things different only in a way that chance could explain, or are they so different that there must be a reason for it? That's what this test is testing for: are things so different that there must be some hidden reason why?

So the null hypothesis is that there is no difference between the before and after measurements, and the alternative is that there is a difference between the before and after measurements.

How do we perform the test? You'll notice that the part we're about to copy looks very similar to the part that we copied before. It's almost the exact formula. I was not expecting you to have that memorized; I'm just commenting and pretending as if you noticed it. The only thing that's different is that, for the thing we're now about to do, instead of there being a two there, when we scroll down there's a one. That's because in Excel, when you change the two to a one, it changes what you're comparing. When there's a one there, it uses the formula where each value is matched up on purpose, and that's like a before-and-after test. When there's a two there, it means that it doesn't matter how the values match. So when we did our previous example, it was just five apples from
the one farm, five apples from the other farm, and it doesn't matter how they compare to each other. Now it's as if it's the exact same five apples, and there's something that happens beforehand and something that happens afterwards.

So now let's copy in the formula that we just found. Let's double-click and make sure it's using the correct data. It's correct. Press Enter, and now we have this value. So now we ask: how does the number that we just computed compare to 0.05? Is it bigger or smaller? It's smaller. And the way that we make our decision is: if our p-value, which is what we just found, is smaller, then we reject the null hypothesis. So which conclusion do we make? Is there a difference between the before and after, or is there no difference? We go up here and we say we've got to reject this, which means this one is not true. If this one is not true, then that one is true. So, yes difference or no difference? Yes difference.

Do you start to see how this is very routine? Once you start to understand the pattern of what you're supposed to do for these problems, it doesn't really matter what the question's asking. Once you have it written up, you're just going to do the same steps each time. Yes, correct: if it's less than 0.05, you reject the null and you use the alternative.

Now, how do you make the null versus the alternative? The null is always going to be something like "there is no difference," and the alternative is always going to be something like "there is a difference." And when we go up here, for things like when we were testing the average, the null for an average test is going to be that it's equal to some number, and the alternative is going to be that it's not equal to that number.

Now, we did a little bit of stuff with the chi-square test last time, and we're going to do more on
that on Monday, because I added a section to make lives easier for everybody: a section that we can just copy and paste into Excel, and it's going to do all the formulas for us, which is quite nice. We're going to come back to this, if not today then on Monday, and we'll see how, in a very similar way, it'll automatically do all of our answers for us.

But for the time being, let's take a quick pause on lecture topics and go to Project One, because we want to make sure that we know it exists and understand what it's asking of us. So the following is our first project, Project One, as it says, written by me, and it's a couple of different real-world business scenarios. I built it into the assignment, and hopefully everybody can find it and see it. I think the details show up, but this is supposed to be done as a group assignment. How does that show up on Canvas? Does it tell you that it's supposed to be a group assignment? Okay, so it doesn't mention that it's a group assignment. Leave student view. Okay, you found it. Cool.

Let's see if I can find that same thing as if I'm a student. People, Groups. Cool, so these are the available groups. I made basically 10 groups, and you all can choose which group you're in. It's self-assigned, so I'm not allowed to join any of the groups, sadly. I think I did set a cap; I don't remember how many people I capped it at. But you can put yourselves into groups.

Let's go back to the assignment. For each question I include a little section that's a hint section. For instance, if I click on this, it gives an overview of how you should think about the problem in terms of statistics. So up to this point, this is like
the real-world writing of the question, and it doesn't tell you directly "do this" or "do that." The hint section is what tells you which of the things we've done in this class you should be using on that problem.

For instance, question one: you are working as an analyst for a retail company that wants to assess the sales performance of its stores. The following table shows the sales revenue, in thousands of dollars, of 12 stores for the first quarter. Now, I'm not going to do this question for you, but I'm going to show you something about how to answer it that might be helpful. For every question, I made it so that you can copy the table and paste it into Excel, so you do not need to type it out by hand, which would lead to errors. I now have this correctly as a table in Excel, with the two columns.

You're then given your tasks. Task one: analyze the store revenue data to calculate relevant statistical measures. That would be things like the mean, median, mode, etc. We have formulas for all of that. In the off chance you forget what the formulas are, for starters we have this little thing: if I click on Central Tendencies and Distributions, I can see that in each one of these questions I'm given what the formula is, for instance the formula for the mean, the median, and so on. Does that make sense as a way to approach this question?

Then you're asked to visualize the distribution. I did that very quickly one time. If you don't know how to visualize in Excel, Monday is a perfect day to ask me how to do that. Sound good?

Then interpretation: what do the statistical measures and the graph tell you about the stores' sales performance? That's where we had interpretation things. Going back here, we had how to interpret this stuff. In particular, we had very similar examples when we came down to the following examples, like this one; talking about these values is exactly in line with
what we're talking about.

Then we come down to employee satisfaction; that's question number two. The HR department of a corporation conducted a survey to measure employee satisfaction on a scale from 1 to 10. They surveyed a sample of 15 employees and obtained the following results. You could copy this into Excel, or you could just type it out; there are only 15 values. Analyze the data to draw conclusions: construct a statistical estimate for the average satisfaction score that management can use for decision-making. A statistical estimate: maybe it's not clear to you what you're supposed to do. So you click on this, and it says construct a confidence interval. Okay, so that means you don't have to wonder what you're supposed to do for this problem: you're supposed to make a confidence interval. That's where we go back to right here, to our inferential statistics. If I click on it and scroll right down, you'll see that all you have to do is take the confidence interval section here, copy and paste it, and change it to be about the actual data values you're using. For instance, because there are 15 values, it won't be A10; it will be A15.

The other thing I'm going to suggest is that, as you do the problems, you create the little worksheets down here (Problem 1, Problem 5, etc.) so that you have an Excel sheet for each question, because then you can go back and reference it. In fact, most of the formulas will work in Google Sheets if you wanted to use Google Sheets instead, but I would recommend Excel; it's nice for you to become familiar with Excel. And then interpretation, etc.

A manufacturing company claims that the defect rate of its new production line is no more than
5%. You collected the following data from a sample of 200 products: defective products, non-defective products. Cool, and together those two numbers add up to 200. Test the company's claim about the defect rate using appropriate statistical methods. So we click here and it says a z-test for proportions, and then we go, what's a z-test for proportions? Well, that's what we're doing on Monday. If we go down to here, you will find that there is in fact a z-test for proportions that we're going to go over. It has nice little copyable stuff that you can just copy and paste, and it gives you the answer, and we will find out what that is as we do the example on Monday. So that takes care of question number three.

Question number four: a marketing company wants to analyze the relationship between customer age and their purchase decisions. When we click here, we go, okay, how would we approach that question? This is going to be a chi-square test. We already did one example, but again, on Monday we'll see one more example of how to do this, and we'll see how we can just copy and paste the formulas that I put in for Excel to answer a question that will be similar to this. It won't be exact, but it'll be similar. I'll also tell you that you have an additional resource made for you: if I click on Chi-Square Independence right here and scroll down, in the event that what you're doing happens to be a 2-by-2 table (let's suppose that it's 14, 7, 5, and 6) or a 3-by-3 table, I made something that will just automatically give you the result, and you're welcome to use that. If it's not a 2-by-2 table or a 3-by-3 table, then you actually have to do stuff.

Problem five, comparing store profit margins: similarly, you can click on it. The means of two independent groups: a two-sample t-test. Where would I figure out
how to do a two-sample t-test? I go here and I say, that sounds like something we did in inferential statistics: an independent two-sample t-test. So all I'm going to need to do is copy this formula and make sure that it references the correct data. We feeling okay so far?

Then the last problem is also something we'll see on Monday: a paired t-test. We'll do one more review of this, but a paired t-test we also did today. It's right down here; it's this one we did. So there's only one question where we have to see an example for the first time on Monday, which is question number three. Every other question we've seen well enough that you can get started on it. So the remaining time, which is approximately seven minutes, I'm going to leave for you to figure out what groups you'd like to work in. Go for it. Introduce yourself to people if you've not yet.

Yes. So if you go up to here, at the very top: what you're submitting is a Word file or a PDF. You work in Excel; you submit a Word document. What that means is that you can have a Word file where you'll say, like, "Problem 1," and then I would suggest you copy stuff like this. Each problem has a problem description: copy the problem description. Do you need to copy the table? You can copy the table; that's fun, and it makes for a really nice presentation. Then you can just copy these tasks and write your answers afterwards, so at the end you'll have it all in a Word document, and you just write in the answers that you found from Excel. Yes. Not for this one, no; no Excel file needed, just the Word document.

Form your groups if you've not yet. You kind of knew this was coming, so I was kind of hoping that you had introduced yourself to somebody e
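
As a cross-check on the paired t-test from this session, here is a minimal Python sketch that reruns the before-and-after fertilizer example using only the standard library. The data values are the ones read out in the session. Everything else is an assumption for illustration: the function names (`t_pdf`, `two_sided_p`, `one_sample_t_test`, `paired_t_test`) are made up here, the p-value is approximated by numerically integrating the t density rather than by whatever Excel does internally, and the comparison assumes the Excel formula in question was `T.TEST` with two tails and type 1 (paired). The one-sample helper follows the same steps as the apple-weight example (t statistic, degrees of freedom, two-sided p-value), but it is not run on that data because the ten apple weights are not reproduced in this transcript.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the density between -|t| and |t|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    # Simpson's rule on [0, |t|]; steps must be even.
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def one_sample_t_test(values, mu0):
    """t statistic, degrees of freedom, and two-sided p-value against mean mu0."""
    n = len(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.mean(values) - mu0) / std_err
    df = n - 1  # number of data values minus one
    return t_stat, df, two_sided_p(t_stat, df)


def paired_t_test(before, after):
    """A paired test is a one-sample test on the differences against zero."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t_test(diffs, 0.0)


# Before/after apple weights as read out in the session.
before = [140, 142, 145, 150, 155]
after = [150, 148, 149, 155, 160]

t_stat, df, p_value = paired_t_test(before, after)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
print("reject the null" if p_value < 0.05 else "do not reject the null")
```

Run on the session's data, this gives a t statistic of about 5.72 with 4 degrees of freedom and a p-value of roughly 0.005, which is below 0.05 and so agrees with the decision reached in Excel to reject the null hypothesis of no difference.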