Transcript for:
Statistics Exam Review and Problem Solving

In this video, I'll be going over a practice test I used to hand out when I was teaching at university. I'll be going over every single question on that test. I cannot show you the test because that's a copyright infringement, and that's always fun to deal with. But we're going to be going over every single question.

I have tweaked the numbers so that it's not word for word.

The first question is: Suppose that the mean and standard deviation of the number of gallons of milk sold per day at a local supermarket are 194.35 and 22.74 respectively, and that sales are approximately normally distributed. Calculate the proportion of days that the supermarket will sell below 175.32 gallons of milk.

So from the question we know what the population mean and population standard deviation are, and that we're using a normal approximation for the x that we were given. The x that we were given was 175.32. The point of this question is to immediately let you know that you're going to go into the z distribution.

And you know that from it saying normally distributed. From there it's just a matter of plug and chug: you're doing a z-score calculation.

So we have z equals x minus mu over sigma, and we have everything we need. We have 175.32 as x, minus 194.35 as mu, and after that subtraction we divide by 22.74. That gives you the z-score, which is negative 0.83685. Now in most cases you're going to want to just round to negative 0.84, so that when you look the z-score up in the z-table you know exactly where to go. With that rounding, the proportion from the z-table is 0.2005. If you're able to use a z calculator, whether in your calculator or because the teacher gives you access to online resources in some capacity, the more precise value is 0.20134.

The second question states: In a recent survey of 148 graduates, 122 students said that parking was too limited on campus. What is the estimate of the population proportion?

What is the standard error of this estimate? So this is a pretty straightforward question. You're doing two different types of calculations. The first one is to do the population proportion estimate, which is x over n.

We are given both of those numbers. n is 148, x is 122. Then, you calculate it. 122 over 148 to give you 0.824.

The next thing we're asked to do is a standard error calculation, which is the square root of p times 1 minus p over n. Using what we previously calculated, we get the square root of 0.824 times 1 minus 0.824 divided by 148, which gives us 0.0313.
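As a rough sketch of those two calculations in Python (the variable names are just for illustration), keeping full precision until the final rounding:

```python
import math

# Survey counts taken from the question
x = 122  # graduates who said parking was too limited
n = 148  # graduates surveyed

# Point estimate of the population proportion: x over n
p_hat = x / n

# Standard error of the estimate: sqrt(p * (1 - p) / n)
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Round only when reporting the answer
print(round(p_hat, 3))
print(round(se, 4))
```

The unrounded `p_hat` is carried into the standard error, which is the same habit the rounding advice here is getting at.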

Now it's really important to note that in the calculations where I got these answers I did not round. Ideally that would be the same for you. Don't round unless you're giving the answer.

When you're doing the calculation, you want to keep everything as accurate as possible.

Fill in the blanks. After conducting research around the country, General Motors Company determined that the probability of a randomly selected GM vehicle on the road being driven by a male is 0.5421. If there are 4,213 GM vehicles in your town, around (blank) cars, give or take (blank), will be driven by males. Assume each car represents an independent trial.

Now, the phrase "around (blank) cars, give or take (blank), will be driven by males" means you're looking for a mean and then you're looking for the standard error of that mean. Some classes will say it's a standard deviation.

It's actually the standard error, and we're going to be using the exact same formula as we did previously, which is the standard error estimate of the square root of p times 1 minus p over n. The question already gives us an n, which is 4213, and the proportion, which is 0.5421. Now it's just a matter of calculating the standard error, which you get as the square root of 0.5421 times 1 minus 0.5421 over 4213 to give you 0.0077, and that's rounded. Make sure that when you're doing these calculations you don't round them. Try to keep them as accurate as possible so that you're as close as possible to the estimate that your teacher is looking for or, if it's multiple choice, to what will be on the multiple choice.

The next step is to take those estimates and multiply them by n, which is 4213. So for the mean you multiply 0.5421 times 4213 to give you 2283.8673, and for the standard error you take the unrounded 0.0077 (about 0.007676) times 4213 to give you 32.3386.

The next question is: The average car gets around 23.86 miles per gallon with a standard deviation of 1.134. Suppose all manufacturers vow to increase fuel economy over the next two years. If successful, 5.49 miles per gallon is added to every observation in the data set.

What is the new mean? A few things are going on here in the question, but the important thing is to understand the property of the mean: what happens when you add 5.49 miles per gallon to every observation?

So I'm going to go through a few examples just to show what happens. Here I have 3 plus 3 divided by 2. It equals 3. The next is 4 plus 4 divided by 2. It equals 4. The next is 5 plus 5 divided by 2. That equals 5. Now you can look at this as the first one is the base, the next one is plus 1 to every observation, and then the next is plus 2. You can see that adding the same amount to every observation will increase the mean by that much. And the graph shows this: as you add the same number to every observation, it moves the mean over to the right, as long as it's a positive number.
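The same property can be checked on a small data set. This is only a sketch: the four mileage values below are made up for illustration, and only the 5.49 shift comes from the question.

```python
from statistics import mean, pstdev

# Hypothetical miles-per-gallon observations (not from the exam question)
mpg = [22.1, 23.9, 24.5, 25.0]

shift = 5.49  # amount added to every observation, from the question
shifted = [value + shift for value in mpg]

# The mean moves by exactly the shift...
mean_change = mean(shifted) - mean(mpg)

# ...while the spread of the data is unchanged
sd_before = pstdev(mpg)
sd_after = pstdev(shifted)

print(round(mean_change, 2))
print(round(sd_before, 4), round(sd_after, 4))
```

Adding a constant slides every observation the same distance, so the mean slides with them and the standard deviation stays where it was.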

The standard deviation is just something to distract you. The mean is simply 23.86 plus 5.49, which gives you 29.35.

When rolling a six-sided die 195 times, what is the probability of rolling a 3 greater than 50 times?

Alright, so this is a question that deals with what's called binomial approximation, where you take something that has a binomial distribution and approximate it with the normal distribution by way of a z-score. So for a z-score, we need x, mu, and sigma. Mu is the mean, sigma is the standard deviation.

We are given the x. The x is 50. So how do you get the mu? The mu is the expected value or n times p. So you take 195 times the probability.

So what's the probability of a 3? One in six, because it's a six-sided die and 3 is 1 of the 6 faces. So it's 195 times 1 over 6. What do you get? You get 32.5.

So that is our mu. Now, what about sigma, the standard deviation? Well, the standard deviation is the square root of n times p times 1 minus p. 1 minus P is sometimes referred to as Q, but for those that don't use that notation, 1 minus P. So the standard deviation would be the square root of 195 times 1 over 6 times 5 over 6, which gets you 5.204.

And that is rounded. Alright, so now that you've got everything in the formula, let's calculate the z-score. So we have 50 minus 32.5 divided by 5.204. What does that give you?

It gives you a z-score of 3.36. Now in the z-table you get a probability of 0.9996, but because that covers everything to the left of that number, it's going to give you the wrong answer to this question, because we're looking for everything greater than, not less than. So we have to do one additional calculation, and that is simply 1 minus 0.9996, which gives you 0.0004. And so that is the probability of rolling a 3 greater than 50 times when rolling a six-sided die 195 times.
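The steps above can be sketched in Python. This mirrors the walkthrough as given (no continuity correction), and it uses the standard library's `NormalDist` in place of the z-table, so the last digits can differ slightly from the table value.

```python
import math
from statistics import NormalDist

n = 195    # number of rolls
p = 1 / 6  # probability of rolling a 3 on one roll
x = 50     # threshold from the question

mu = n * p                           # expected number of 3s
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the count

z = (x - mu) / sigma

# The table (or cdf) gives the area to the left; "greater than" needs the right tail
upper_tail = 1 - NormalDist().cdf(z)

print(mu, round(sigma, 3), round(z, 2))
print(round(upper_tail, 4))
```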

Suppose that the average and standard deviation of the fine for a moving violation on a particular highway are 125.25 and 13.59 respectively. Calculate an interval that is symmetric around the mean such that it contains approximately 68% of fines.

Assume that the fine amount has a normal distribution. So this is very obviously an empirical rule application question because it's giving you a lot of things at once. It's saying calculate approximately 68%, normal distribution assumption, and then it gives you the mean and standard deviation.

It's not asking for a confidence interval per se. It is giving you the interval percentage that they're looking for. This ends up being a very easy calculation because you're only looking for the first standard deviation difference around the mean, which gives you 34% on one side and 34% on the other side.

And so we have the mean at 125.25 and the standard deviation at 13.59. The upper limit calculation is 125.25 plus 1 times 13.59, and the lower limit is 125.25 minus 1 times 13.59. The reason why I have the 1 times there is that if they were asking for two standard deviations, say 95 percent, you just switch that to a 2. If they were asking for 99.7 percent, you switch that to a 3. That's the formula for figuring this out, and it's very simple. In interval notation, the final answer is 111.66 to 138.84.
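Here is a minimal sketch of that interval formula, with a check of how much of a normal distribution the one-standard-deviation interval actually covers. The helper name `empirical_interval` is made up for this example.

```python
from statistics import NormalDist

mean_fine = 125.25
sd_fine = 13.59

def empirical_interval(k):
    """Interval of k standard deviations on each side of the mean."""
    return (mean_fine - k * sd_fine, mean_fine + k * sd_fine)

low, high = empirical_interval(1)  # switch to 2 or 3 for the wider intervals

# Share of a normal distribution that falls inside the interval
fines = NormalDist(mean_fine, sd_fine)
coverage = fines.cdf(high) - fines.cdf(low)

print(round(low, 2), round(high, 2))
print(round(coverage, 4))
```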

The five number summary for a data set is given in the table below. Using this information, calculate the interquartile range or IQR. So in the table we have min, q1, median, q3, and max.

The interquartile range formula is q3 minus q1. So the interquartile range for this data set is 12.85 minus 8.96. So the answer is 3.89.

I know this is a quick one, but questions like this come up on full-term exams, so I'm just helping you be prepared.

Suppose the average number of hours a person spends on video games is 157.65 hours per month. If the distribution of the hours is left skewed, such as the plot below, will the median be larger, smaller, or approximately equal to 157.65?

The important term to notice here is left skewed. Because it's saying left skewed it already gives away what it's supposed to be. It's a property of the distribution.

The median here is 176.57, so the median is larger than the average. That's typically the case for left skewed distributions. For right skewed distributions the median is typically less than the average, and when the distribution is symmetric, they're approximately the same.
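A quick way to see this is with a tiny left skewed data set. The numbers below are invented for illustration and are not the data behind the plot in the question.

```python
from statistics import mean, median

# Hypothetical monthly gaming hours: most values are high, one low value forms the left tail
hours = [20, 150, 170, 175, 180, 185, 190]

avg = mean(hours)
mid = median(hours)

# The low value in the tail drags the mean down but barely affects the median
print(round(avg, 2), mid)
```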

Imagine that your exam for your statistics class has 50 questions. Each question has four multiple choice options, giving you a probability of 0.25 of getting each question right just by guessing. Assuming that you guess on all questions, what is the probability that you get between 9 and 11 questions correct, inclusive, on your exam?

This is another z-score binomial approximation but it deals with the continuity correction. The reason why I know it's a continuity correction is because it says inclusively. This is to distinguish between other binomial calculations and probability questions given the binomial aspects of this question.

So we're looking at the z-score equals x minus np over the square root of np 1 minus p and the continuity correction is going to be x plus or minus 0.5. Now when you do plus or minus 0.5, remember at the lower end, because you want to include 9, you need to subtract 0.5 from 9. And because you want to include 11, you need to add 0.5 to 11. Let's get into the calculation. We have probability of 9 less than or equal to r, so r is standing for right, less than or equal to 11, will equal, for the continuity correction, the probability of 8.5 less than or equal to r less than or equal to 11.5.

And so we then calculate the mu, or the n times p part of this calculation: 50 times 0.25 gives us 12.5. That's how many questions we're expected to get right just by guessing. The standard deviation is the square root of n times p times 1 minus p, which is the square root of 50 times 0.25 times 0.75, which gives us 3.06. Now we do two different z-score calculations. The first one is 8.5 minus 12.5 over 3.06, which gives you negative 1.306.

The next is 11.5 minus 12.5 over 3.06 to give you negative 0.327. After you've got all of that (there are a lot of minor calculations that happen with this one), you have approximately the probability of negative 1.306 less than z (z, not r) less than negative 0.327, which equals the probability of z less than negative 0.327 minus the probability of z less than negative 1.306, which gives you 0.372 minus 0.096, which gives you 0.276.
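The continuity-corrected calculation can be sketched as follows. The exact binomial sum at the end is an extra comparison added for this example, not part of the walkthrough, and it only shows how close the normal approximation lands.

```python
import math
from statistics import NormalDist

n = 50     # questions on the exam
p = 0.25   # chance of guessing one question right

mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Continuity correction: widen the interval by 0.5 on each side to include 9 and 11
z_low = (9 - 0.5 - mu) / sigma
z_high = (11 + 0.5 - mu) / sigma

std_normal = NormalDist()
approx = std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Exact binomial probability of 9, 10, or 11 correct, for comparison
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9, 12))

print(round(z_low, 3), round(z_high, 3))
print(round(approx, 3), round(exact, 3))
```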

Now there are a few things I want to add here. The first is that I did round. The second is that I did use a binomial CDF calculation, which you can do in Excel or on a calculator. If you don't know how to do that, just use the z-table.

Suppose that in your country the typical adult male is 64.58 inches tall with a standard deviation of 4.357.

You take a random sample of 28 adult males. What is the probability that the mean height of the sample is less than 65.28? The first aspect you want to discern is are we talking about T-score or Z-score?

What will make people hesitate is the fact that it says 28 adult males, and a lot of people give this rule where 30 is the benchmark for whether you use the t or the z distribution. But there's an additional condition that needs to be there, which is that the standard deviation is not known for the population. In this case we do know what the population standard deviation is, and because we know that, the t distribution is not appropriate: the t distribution, with its degrees of freedom, is trying to account for the fact that you don't know the standard deviation. Now let's go into the calculation, because we know what we're doing now. We're doing a z-score calculation.

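We have the equation here: because the question is about the mean of a sample, z equals the sample mean minus mu, divided by the standard deviation over the square root of n. So in this case we have z equals 65.28 minus 64.58, divided by 4.357 over the square root of 28, which gives us a z-score of 0.85. You get a probability of 0.8023.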

Now that's rounded, whether you use something like an Excel normal distribution function or the z-table.

A sample proportion is calculated from a sample size of 324. How large of a sample would be needed to decrease the standard error by a factor of 5?

The first thing we need to do is understand the equation for the standard error of proportions, which is the square root of p times 1 minus p divided by n. All of that is underneath the square root. The numerator actually doesn't matter here, because what we care about is the sample size. So we can make this a more concise formula for us to try and solve.

What we're trying to solve is the square root of 1 over n giving us 1 over 5. What number needs to be n for you to get 1 over 5 after going through the square root? Put a different way, the square root of what equals 5? I think most of us would understand that that's 25, because 5 times 5 is 25, and that's just the reverse of the square root. Now that we've got 25 as the factor for n, what do we do? We simply multiply 324 by 25, which gives us 8,100, and the answer is 8,100.
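This can be checked numerically. The sketch below uses p = 0.25, and the function name `se_proportion` is just a label chosen for this example.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.25     # any proportion strictly between 0 and 1 gives the same ratio
n_old = 324
factor = 5

# The standard error shrinks with the square root of n, so n grows by factor squared
n_new = n_old * factor**2

ratio = se_proportion(p, n_old) / se_proportion(p, n_new)

print(n_new)
print(round(ratio, 6))
```

Because p times 1 minus p appears in both standard errors, it cancels in the ratio, which is why the numerator does not matter here.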

Below that answer I have an example, with p equal to 0.25, to validate what I've just shown.

Suppose that a healthy human body has an average body temperature of 96.07 with a standard deviation of 1.764. According to the central limit theorem, what happens to the histogram of averages as n increases?

With the central limit theorem, it is generally understood that as n increases, the distribution of the averages looks more like a normal distribution, regardless of what the original distribution is. Now there are very rare cases where the original distribution pans out, but those are exceptionally rare. And this is one of those questions where you typically have multiple choice.
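A small simulation can illustrate the idea. This is only a sketch under assumptions not in the question: the original distribution is taken to be exponential (strongly right skewed), and skewness near zero is used as a rough stand-in for "looks more like the normal curve".

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Population skewness: the average cubed z-score."""
    n = len(values)
    m = sum(values) / n
    s = (sum((v - m) ** 2 for v in values) / n) ** 0.5
    return sum(((v - m) / s) ** 3 for v in values) / n

def sample_means(n, reps=2000):
    """Means of `reps` samples of size n drawn from an exponential distribution."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

# Skewness of the averages for a few sample sizes
skews = {n: skewness(sample_means(n)) for n in (1, 5, 50)}

for n, s in skews.items():
    print(n, round(s, 2))
```

As n grows, the skewness of the averages shrinks toward zero, which is what a histogram of averages moving toward the normal curve would look like in numbers.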

We're going to walk through the five potential answers to see why each one is or isn't true, and basically what the correct answer is and why. The first one that we're going to go over is E. E is that increasing n will not influence what happens. Well, clearly that's not the case, because basically the first line in the central limit theorem is "as n increases," so n definitely has a part to play.

D is that it depends on the mean and standard deviation. The reality is it doesn't, as long as the standard deviation is above zero. This answer specifically says mean and standard deviation, and because it only really depends on the standard deviation being greater than zero, it doesn't apply.

C is the histogram will look like the original distribution. As I stated, this is something that happens very rarely. Generally speaking, that's not the case. For b, the histogram will look less like the normal curve.

That is the complete opposite of what the central limit theorem states. And A states the histogram will look more like the normal curve. So that's the answer. What I have below is a graph showing that as n increases, the more it looks like the normal distribution.

An energy drink company has observed that daily sales are normally distributed with an average of 6,783,456 drinks sold with a standard deviation of 6,445.769.

What is the probability that on a given day below 6,765,958 drinks are sold? This is a different way of asking a question that we've seen before, which is: what's the probability, based off the z-score, of this happening? And the reason why we know that is because it says it's normally distributed and it gives you the setup for the z-score, which is an x, the average, and then the standard deviation, as you can see here in the formula image below.

We have all that information, so let's just calculate it. z equals 6,765,958 minus 6,783,456 over 6,445.769, and that will give you negative 2.71. Now using the z-table you're able to get the probability of 0.0034, and that's the answer.
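As a sketch of the same lookup without a printed table, the standard library's `NormalDist` can stand in for the z-table. Rounding z to two decimals first reproduces the table value, while the unrounded z gives a slightly more precise figure.

```python
from statistics import NormalDist

mu = 6_783_456      # average daily sales
sigma = 6_445.769   # standard deviation of daily sales
x = 6_765_958       # sales level from the question

z = (x - mu) / sigma

# What a z-table gives after rounding z to two decimals
p_table = NormalDist().cdf(round(z, 2))

# Same probability without rounding the z-score first
p_exact = NormalDist(mu, sigma).cdf(x)

print(round(z, 2))
print(round(p_table, 4), round(p_exact, 4))
```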

For each of the following variables, identify the type of variable as either categorical or numerical. The first one is employee at a store: hourly worker, manager, or director.

The second is income tax brackets: below 9,000, 9,000 to 45,000, 45,000 to 90,000. Now for the first one, hourly worker and so on, you're just given labels.

You're not given any numbers, no digits at all. That would be categorical. The second one, the income tax bracket, it gives you a range. A range by itself is actually not numerical.

And I'm going to show you why that is. Here's a table looking at three citizens within a country. You see how much they make, as well as what tax bracket they belong to. For the first one we have $8,953.

That's a numeric value, it's a specific number, but it fits in the category of below $9,000. Now the second is $12,354. That again is a specific number, and it goes in the category of the range $9,000 to $45,000. And the third is $67,253, and the category it belongs to is the range of $45,000 to $90,000.

We get from this that each variable is a categorical variable. If you want to know more about the difference between numerical and categorical, I do have a video devoted to that difference. I will link it in the description as well as in the corner here.
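The table can be sketched as a small function that turns a numeric income into a bracket label. The function name, the handling of incomes that sit exactly on a boundary, and the final catch-all label are assumptions made for this example.

```python
def tax_bracket(income):
    """Map a numeric income to a categorical tax bracket label."""
    if income < 9_000:
        return "below 9,000"
    if income < 45_000:
        return "9,000 to 45,000"
    if income < 90_000:
        return "45,000 to 90,000"
    return "90,000 and above"  # hypothetical label, not one of the brackets in the question

# The three incomes from the table
incomes = [8_953, 12_354, 67_253]
brackets = [tax_bracket(income) for income in incomes]

for income, bracket in zip(incomes, brackets):
    print(income, "->", bracket)
```

The income going in is a number you could average, but what comes out is a label, which is why the bracket variable is categorical.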

Which best describes the shape of the histogram? So looking at the graph, we see that a lot of it is centered to the right and then it kind of trails off to the left. The answer is left skewed, but if you don't quite understand why it's left skewed, think of it like this.

When it comes to skewness, the trailing part, or the tail, is what determines how it's skewed. So if it goes to the left like that, it's skewed left. If it goes to the right like that, it's skewed right. If it looks like a bell curve, like the normal distribution, it's symmetric. And to address D: if there are two modes, as in two peaks, say the one there is on the right side plus another one on the left side, you would say it's bimodal.

Suppose that the mean and standard deviation of scores for a physics exam are 87.54 and 4.23 respectively. Assuming the distribution is approximately normal, 93.07 percent of students scored less than what score? We know from how this is set up that we're going to be dealing with the z-score, and the reason why we know that is because we're given a mean and standard deviation, we're told that the distribution is approximately normal, and we're given a percentage.

The last part is the thing that trips up a lot of people, because typically you're given the x, as we've seen in other questions. So what do you do with this? You work backwards and then you solve for x in the formula. So I'm going to be using a z-table to find what the z-score is.
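Working backwards from a percentage to a score can also be sketched in code. This assumes Python's standard-library `NormalDist`, whose `inv_cdf` method plays the role of reading the z-table in reverse, so the result can differ in the last digits from a table lookup.

```python
from statistics import NormalDist

mu = 87.54          # mean exam score
sigma = 4.23        # standard deviation of exam scores
percentile = 0.9307

# Inverse lookup: the score below which 93.07% of students fall
scores = NormalDist(mu, sigma)
cutoff = scores.inv_cdf(percentile)

# The z-score that cutoff corresponds to
z = (cutoff - mu) / sigma

print(round(z, 2))
print(round(cutoff, 2))
```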

When you look for 0.9307 in the z-table, you find 1.48 is the closest. You get 1.48 and then the 1.48 equals the formula for the z-score, x minus 87.54 over 4.23. And then you just algebraically solve for x. So you multiply both sides by 4.23 to give you 6.2604 equals x minus 87.54.

And then you add 87.54 to both sides to give you 93.8004 equals x.

Suppose 43.54% of small businesses experience audits in the first three years of being open. An investigator takes a random sample of 437 businesses that have been open for three years or less. What is the probability that less than 37.45% of the businesses experienced audits? This is a one proportion problem.

The reason why I know that is because when it sets out a designation, which in this case is three years or less, all of the sample is within that designation, whereas in the binomial approximation we did earlier, we were looking for one value compared to all other values. An important thing to note is that it says less than 0.3745, so how you calculate from the z-score at the very end will be determined by this. Just remember that when you're doing it. The first thing we should know is the z-score formula, which is the p that you're asked to approximate minus the established p, over the square root of the established p times 1 minus the established p over n. That would look like z equals 0.3745 minus 0.4354 over the square root of 0.4354 times 1 minus 0.4354 over 437, which will give us a z-score of negative 2.57. When you look it up in the z-table you get 0.0051.
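The one proportion calculation can be sketched like this, with `p0` standing for the established proportion and `p_hat` for the proportion being asked about (both names are just labels for this example).

```python
import math
from statistics import NormalDist

p0 = 0.4354     # established proportion of businesses audited
p_hat = 0.3745  # sample proportion asked about
n = 437         # businesses sampled

# Standard error uses the established proportion, not the sample one
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# "Less than" means the area to the left of z
prob = NormalDist().cdf(z)

print(round(z, 2))
print(round(prob, 4))
```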

The average score of PGA golfers is 69.25 with a standard deviation of 2.347. A random sample of 54 golfers is taken. What is the probability that the sample mean is between 69.25 and 69.55? Now to see what we need to do we need to take stock of what we are given.

The first thing we're given is the mean, standard deviation, a sample size, and then we're told to take the difference between two numbers. Because of the first three things, we know that we're doing a z-score calculation for the sample distribution because we are given a sample. If we're not given a sample, you just do a regular z-score.

It's x minus mu, which is the average, divided by sigma, the standard deviation, which is also divided by the square root of n. When you're doing this calculation, I recommend that you separate the calculations for yourself. You do the numerator first, you do the denominator second, or vice versa.

The important thing is that you just do it separately because if you're not used to doing convoluted calculations like this you're very prone to error. Just something to keep in mind. The first calculation we have is 69.55 minus 69.25 over 2.347 divided by the square root of 54 which gives us a z-score of 0.94.

And from the z-table we have a probability of 0.8264. The second is 69.25 minus 69.25 over 2.347 divided by the square root of 54 which gives us 0. And from the z table we get a probability of 0.5. So because we're asked to look in between these two numbers all we have to do is do simple subtraction.
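As a sketch of the sample-mean calculation, the denominator (the standard deviation over the square root of n) is computed separately first, as recommended. Since this uses unrounded z-scores, the result can differ from the table-based answer in the last digit or two.

```python
import math
from statistics import NormalDist

mu = 69.25     # average score of PGA golfers
sigma = 2.347  # standard deviation of scores
n = 54         # golfers sampled

# Denominator first: standard error of the sample mean
se = sigma / math.sqrt(n)

z_upper = (69.55 - mu) / se
z_lower = (69.25 - mu) / se

# Area between the two z-scores: bigger minus smaller
std_normal = NormalDist()
prob = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(round(z_upper, 2), round(z_lower, 2))
print(round(prob, 4))
```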

Obviously you want to do bigger minus smaller, because you can't have a negative probability. We have 0.8264 minus 0.5 to give us 0.3264, and that's the answer.

If you have any questions or comments, please leave them in the comment section. I hope this series has been helpful. Thank you for watching, and stay nerdy, my friends.