All right, so let's tackle everything you need to know for AP Statistics unit one. Starting off with the difference between categorical and quantitative data. Quantitative data deals with numbers. So think quantity: heights, class size, population size, anything you can put a number on. And categorical data is names and labels, so think eye color or hair color; you can't put a number on that. So we're going to branch off here and talk about categorical data, which you want to represent with a two-way table. You've probably seen these before, but basically you have two variables on either side, and the table shows you the intersections between certain categories. So you see here we have math students. Sure, we have 10 math students, but then we also have the internal and external categories that split up that number of math students. So we have three math students who are internal, seven who are external, and the same setup for English. The important thing you need to understand is a couple of vocab terms that the AP exam or your teachers might use on your midterm. The first is marginal relative frequency. That's the percentage of the data in a single row or column compared to the total. And if you look at this little chart here, that's going to be B over D or C over D: we have our column total, which is just C, so C over D, and then our row total is B over D. The next thing we're going to look at is joint relative frequency. That's the percentage of data in a single cell compared to the total. "Cell" here, which might be a little confusing, just means a single intersection between two categories, the A in the chart, so that's A over D. And if we look at our two-way table with actual values, that's something like math and internal, which would be 3 over 23. Or maybe we want to look at external English: you look down English to external, so that's 9 over 23. Okay. So next we're going to talk about conditional relative frequencies. That's the percentage of data in a single category when you're given a specific group. So now we're looking at A over B or A over C: just the cell A, divided by the total of its group, the row or column. In our two-way table, that looks like, let's say we're given that the student is in math. So our total is 10. And then we want to know what percentage of those are internal. Well, we have three, so that would be 3 over 10 as an example. Okay. So now let's go on to quantitative data. For quantitative data, you want to know how to describe it. A big part of AP Stats is just interpreting, because a lot of the calculations are pretty straightforward: you have a reference sheet, you can use your calculator, all that. But you need to know how to interpret. So for quantitative data, use the acronym SOCS, plus context. The S stands for shape: is it symmetric, is it skewed, how many peaks, is it unimodal or bimodal, all that. The O is outliers: any data points that sit far outside the rest of the data. The C is center: look at your mean or your median. And the second S is spread: that's your range, your standard deviation, your IQR.
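To make those three relative frequencies concrete, here's a minimal Python sketch using the numbers from the table above. One assumption: the English internal count of 4 is inferred so everything adds up to the grand total of 23.

```python
# Two-way table from the example: rows = subject, columns = internal/external
table = {
    "math":    {"internal": 3, "external": 7},
    "english": {"internal": 4, "external": 9},   # internal count inferred
}

grand_total = sum(sum(row.values()) for row in table.values())  # 23

# Marginal relative frequency: one row or column total over the grand total
math_total = sum(table["math"].values())                        # 10
print("marginal (math):", math_total / grand_total)             # 10/23

# Joint relative frequency: one cell over the grand total
print("joint (math & internal):", table["math"]["internal"] / grand_total)  # 3/23

# Conditional relative frequency: one cell over its own group's total
print("conditional (internal | math):", table["math"]["internal"] / math_total)  # 3/10
```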
And then here's a little tip for when you are describing the data: utilize descriptive language, like "strongly" or "roughly symmetric," and also use comparative language, to maximize the number of points you can get on the exam. Okay, now let's go on to some basic terms you must know. You've probably seen these before. The mean is just the sum of all the values divided by the number of values, so the average value. Standard deviation is a measure of variation, and here's the important part: when you're describing it in context, you want to say "the [value, in context] typically varies by [standard deviation] from the mean of [mean]." That sounds kind of weird, so let's look at the example: "The IQ of Prepworks subscribers typically varies by 5 IQ points from the mean of 169 IQ points." The IQ of Prepworks subscribers is the value in context (always put it in context), 5 IQ points is the standard deviation, and 169 IQ points is the mean. The median is just the 50th percentile. That's where, if you have data, you organize it from least to greatest and then look at the value in the middle; you've probably heard of that. And range is just a value, not an interval: the max minus the min of your data set. Okay, so the last thing for exploring data in this part is how to make box plots. You need to know the five-number summary. The first number in the summary is your minimum; that's just the smallest value you have. Then you have the 25th percentile, or Q1. That sits between your minimum and your median; it's essentially the median of the lower half of the data, between your minimum and the 50th percentile, if that makes sense. Then the next number is your median, the 50th percentile. Then Q3, your 75th percentile. And then your maximum value. Okay. The next thing is the IQR. The IQR is just Q3 minus Q1. That's important because you also need to be able to identify specific outliers and their characteristics, those being low-end outliers or high-end outliers. A low-end outlier, like the name suggests, is just a super low outlier: any value less than Q1 minus 1.5 times the IQR. And the same thing for high-end outliers, a super high outlier: any value greater than Q3 plus 1.5 times the IQR. So make sure you know those two equations. And here we just have a visual diagram, because box plots are cool to draw. All right. So, to round off AP Stats unit one, we just have a couple more terms and then we'll get into normal distributions. The first is the officially defined percentile. We did touch on it, but how I think you should think about it is that it is the percentage of values that are less than or equal to a specific value. And then we also have cumulative relative frequency. That shows the cumulative percentages from each interval up through all the data. And here we have a visual. You can see that where we have data, the graph climbs, but where we don't have any data it's just a flat plateau, and when we get data again, the line goes up, because it's pretty much a running total of the relative frequencies. And relative frequency is just the chance of something occurring: the count of occurrences over the total.
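Here's a quick sketch of the five-number summary, the IQR, and the 1.5-times-IQR fences on a made-up data set. One caveat: Python's statistics.quantiles can interpolate Q1 and Q3 slightly differently than your calculator does.

```python
import statistics

data = sorted([2, 4, 4, 5, 7, 8, 9, 11, 12, 13, 30])  # made-up data set

# Five-number summary: min, Q1, median, Q3, max
q1, median, q3 = statistics.quantiles(data, n=4)  # quartile method may differ from your calculator
print("five-number summary:", (min(data), q1, median, q3, max(data)))

iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # anything below this is a low-end outlier
high_fence = q3 + 1.5 * iqr   # anything above this is a high-end outlier
outliers = [x for x in data if x < low_fence or x > high_fence]
print("IQR:", iqr, "fences:", (low_fence, high_fence), "outliers:", outliers)
```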
Okay, so now let's talk about z-scores. Z-scores tie back to the idea of standard deviations: a z-score is simply the number of standard deviations a value is away from the mean. And this is the official equation for it: whatever value you're finding the z-score for, minus the mean, over the standard deviation, so z = (value − mean) / standard deviation. All right, so now we're going to talk about what happens when you transform data. Let's say you have a data set. What if you added a certain constant, like you added five to every single value, or maybe you multiplied every value by 20? What would happen to the shape, the variability, and the center? Well, here is a summary of what would happen. If you add or subtract the same amount to all the data values, the shape and variability will always stay the same; the center, so your mean or median, will move up or down by that amount. Now, if you multiply or divide, that's a different story. The shape will still stay the same, but now your center and your variability will both be multiplied or divided by that amount. All right, so now let's talk about density curves and the normal distribution. A density curve is on or above the horizontal axis, has a total area of one, and shows a probability distribution. Note that a normal distribution is a type of density curve. A uniform density curve is quite rare, though you might see questions on it, and you can see pretty clearly that it has a total area of one. But the more common density curve you'll see is the normal distribution. You'll 100% see this, and you've probably seen it before. You need to know the standard deviation and the 68-95-99.7 rule: 68% of the values are within one standard deviation of the mean, 95% of the values are within two standard deviations of the mean, and 99.7% of the values are within three standard deviations of the mean. To solve normal distribution problems, you're just going to use your calculator; that is the simplest way, so know your calculator commands and go study those. normalpdf gives the height of the density curve at a specific value (it's mostly used for graphing, not for finding probabilities). normalcdf gives the probability that the normally distributed variable falls within a set interval. And then we have inverse normal (invNorm), which pretty much does the reverse of these calculations: it finds the value that corresponds to a given percentile, which on your calculator might be denoted as "area." All right, so the final thing we're going to talk about is super niche, but I'm still going to include it because it's on the CED: the normal probability plot. What it does is plot the actual values versus the theoretical z-values. These theoretical z-values are the z-values you would get if the actual data points were normally distributed. So it pretty much shows you how well the data fits a normal distribution, and that's something you might see. Basically, all you need to know is that if the plot is roughly linear (you might just be given the graph), the data is roughly normally distributed; if it's not linear, the data is roughly not normally distributed. And yeah, that does it for everything you need to know for AP Statistics unit one.
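Here's a quick sketch connecting the z-score formula and those calculator commands to runnable code. It assumes scipy is available; the mean of 169 and SD of 5 echo the earlier IQ example, and the other numbers are made up.

```python
from scipy.stats import norm

mu, sigma = 169, 5  # mean and SD borrowed from the IQ example

# z-score: number of SDs a value sits from the mean, z = (x - mu) / sigma
x = 179
print("z-score:", (x - mu) / sigma)  # 2.0

# normalcdf equivalent: P(164 <= X <= 174), i.e. within 1 SD (~0.68, matching 68-95-99.7)
print("P(164 <= X <= 174):", norm.cdf(174, mu, sigma) - norm.cdf(164, mu, sigma))

# invNorm equivalent: the value whose area to the left is 0.90 (the 90th percentile)
print("90th percentile:", norm.ppf(0.90, mu, sigma))
```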
So let's cover the AP Stats unit 2 content, everything you need to know. So, correlation: how exactly do you describe scatter plots? Put it in context, then hit direction (is it positive or negative?), state any outliers, the form (is it linear or nonlinear?), and then the strength of that form. Is it strongly linear? Is it weakly linear? All that mumbo jumbo. Now, tying directly into correlation and describing scatter plots is your r value. Super important. Your r value is your correlation coefficient, and it can range from negative one to one. The closer it is to negative one or one, the stronger the linear correlation. So negative one is a perfect negative linear correlation, and positive one would be a perfect positive linear correlation. Here are some more examples: negative 0.97 would be considered a strong negative linear correlation, zero would be no correlation, and something like 0.021 would be a very weak linear correlation. A couple of things that do not affect your correlation coefficient: changing your units (so if you, I don't know, multiplied everything by 100, that does not change your r value), and switching the x and y axes, which also does not change your r value. Now let's talk about how outliers affect your r value. Any outliers that are within the pattern of the data will strengthen r; any outliers outside the pattern of the data will weaken r. What is the pattern of the data? Well, I drew a nice diagram to represent it. It's basically the general trend that your points are following. So this is our general trend. If I had an outlier, like this star I'm drawing out here, that would be within the pattern of the data and strengthen r. But if we had something down here, outside the pattern of the data, that would weaken r. Also remember: correlation does not equal causation. Super important. You've probably heard it before, so just keep that in the back of your mind. All right, so now let's talk about regression lines. Regression lines are essentially just your best-fit line. If you have a scatter plot with points, you put your best-fit line on there, and it helps you estimate values for points you aren't given. What this looks like as an equation is y-hat. You use y-hat instead of y because it is your predicted value, not your actual y-value. So it's y-hat = a + bx; you've seen that form before. a is your constant, b is your slope. Something you probably haven't seen before is the residual. Your residual is just the degree of error of a regression line's prediction: actual minus predicted. So let's say my actual value is five and the predicted value is 10. Then you get negative five units as the residual. That's a negative residual, and a negative residual means you overestimated, right? You overestimated by five in that case. A positive residual means you underestimated the actual value. All right. So the important thing with regression lines is that, yes, you are going to be calculating and extrapolating some values and all that, but I'd say the big part is interpreting the regression line with proper language. This is super important and is probably what's going to be tested on the AP, and maybe on your midterm. All right, so here's an example. I stole this methodology from my AP Stats teacher, so hopefully it's not copyright infringement. But it basically shows you one way you could use language to interpret the slope, interpret the y-intercept, interpret the residuals, all that: talking about things like the explanatory variable (that's just the independent variable) and your response variable (your dependent variable), and using specific, correct language.
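Here's a small numeric sketch of those ideas on made-up data: it computes r, confirms that changing units leaves r alone, and finds one residual from the fitted line.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up explanatory values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # made-up response values

r = np.corrcoef(x, y)[0, 1]
print("r:", r)                                                    # near +1: strong positive linear
print("r after changing units:", np.corrcoef(100 * x, y)[0, 1])   # unchanged

# Fit y-hat = a + b*x, then compute one residual: actual - predicted
b, a = np.polyfit(x, y, 1)                 # polyfit returns slope first, then intercept
y_hat = a + b * x[2]
print("residual at x=3:", y[2] - y_hat)    # positive => the line underestimated
```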
And now let's talk about the least-squares regression line. Basically, you'll get a big data set, you're going to plug it into your graphing calculator, and then you can use a calculator function to produce the least-squares regression line: the line that minimizes the sum of the squared residuals. You don't need to go deep on what that means; it's pretty much the best-fit line. The key things you need to know about it (and you can also find these values with your calculator, by the way): the s value is roughly the average distance that the actual values fall from the LSRL; it's the standard deviation of the residuals. And your r-squared value is the coefficient of determination: the percent of the variation in your response variable that can be explained by the explanatory variable. And for an optimal least-squares regression line, you want as low an s value as possible and as high an r-squared value as possible. Also, sometimes on the AP you're going to see a computer printout. It's kind of weird, I know, and you might not have done this in class, so this is something to keep note of, these computer printouts. This is a nice graphic; it shows you where the y-intercept, the slope, the s value, and the r-squared value are, yada yada yada. You'll learn the other entries, like the t and p columns, in later units, but basically just memorize where these values are, so that if a printout shows up, you aren't caught not knowing what to do. Okay, so let's talk about outliers. How do outliers affect your least-squares regression line? An outlier outside the pattern of the data is going to weaken your correlation. Now, here's the thing. If an outlier is added far away from the mean of x, horizontally, it has leverage: it pulls the slope toward itself and changes the y-intercept in the opposite direction. If an outlier is added at the mean of x (on that vertical line) but above the mean of y, your slope stays about the same and your y-intercept increases. And if it is added below the mean of y on that vertical line, then your slope again stays about the same and your y-intercept decreases instead of increasing. So the last thing we're going to cover now is the residual plot. It pretty much just plots the residual values versus the explanatory, or independent, variable. If there's a clear pattern, that means a linear function is unlikely to be the best fit for the data. If there's no clear pattern, that means a linear function is likely to be a good fit for the data. And here is an example of no clear pattern: you can see it's pretty much random dots, so in this case a linear function would be likely. Something like a clear pattern would be, I'm just going to draw something random: say this was the line and the residuals curve like this, a U-shaped parabola, whatever. That means it would be unlikely for a linear function to be the best fit for that data. So that does it for all the content you need to know for AP Stats unit 2.
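To wrap up unit 2, here's a sketch that computes the least-squares line, s, and r-squared by hand on made-up data, mirroring what a computer printout reports.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # made-up data
y = np.array([5.1, 8.9, 13.2, 16.8, 21.1])

b, a = np.polyfit(x, y, 1)        # least-squares slope and intercept
residuals = y - (a + b * x)       # actual minus predicted

# s: roughly the average distance of actual values from the line
n = len(x)
s = np.sqrt(np.sum(residuals**2) / (n - 2))   # regression uses n - 2 in the denominator

# r-squared: fraction of the variation in y explained by the linear model
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y - y.mean())**2)
r_squared = 1 - ss_res / ss_tot

print(f"y-hat = {a:.3f} + {b:.3f}x, s = {s:.3f}, r^2 = {r_squared:.3f}")
```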
All right, so today we're going to cover all the content you need to know for AP Statistics unit 3. The first thing we're going to talk about is sampling methods. The most common one you're going to see is the simple random sample, or SRS. It's a randomly selected subset of the population, where all members of the population have an equal chance of being selected. And here's the general process you would typically use when doing a simple random sample. One, you're going to define the population and label the individuals. What that actually looks like in practice is assigning numbers to people or putting their names on slips of paper. Two, you have to randomize. That's why it's called a random sample. You can do randomization through a random number generator, putting people's names in a hat, whatever. And three is just selecting the members of your simple random sample. The next sampling process we're going to talk about is the stratified random sample, where you split the population into groups, or strata, based on shared traits. These are homogeneous groups: each group shares a certain trait. So let's say I have Prepworks viewers. I split you guys into Prepworks subscribers and Prepworks non-subscribers. And then what you're going to do from there is randomly select samples from each group. So if I had 10 Prepworks subscribers and 10 non-subscribers, I'd randomly select, say, two people from each group, so everyone in a group has an equal chance of being selected. Here's another example: you divide a school by grade level and then randomly pick students from each grade. Now, the confusing part with the stratified random sample is that there's a very close second to it: the cluster sample. It sounds very similar. You're splitting the population into groups, or clusters, and then randomly selecting ENTIRE clusters. I made that bold for a reason, I made that capital letters for a reason: entire clusters to sample. An example of that is splitting a city into neighborhoods and surveying everyone in a few randomly chosen neighborhoods. Okay, so here's a diagram to make sure that's depicted nicely. For a stratified random sample, I split the population into red people and blue people, and then within the red people I do a simple random sample, and within the blue people I do a simple random sample. But for cluster sampling, instead of my groups having homogeneous traits (in the stratified random sample, either you were red or you were blue), in cluster sampling I want each cluster to be heterogeneous, with equal representation essentially, so each cluster has some red people, some blue people, yada yada yada. And then when I pick who to survey, I'm picking entire clusters: I pick a random cluster and then I sample everybody in that cluster. Okay. The next sampling technique we're going to talk about is the systematic random sample. That's where you select individuals at regular, set intervals, starting at a random point. So, for example, if I sat outside my school and gave a survey to every fifth person who walked in, that's a systematic random sample. All right, so now we're going to talk about some bad sampling methods. A convenience sample is where you basically choose people who are easy to reach, easy to access. An example of that is surveying people at a nearby mall only because it's close to you. And the other bad sampling method (I mean, there are tons, but these are the most common) is voluntary response sampling.
So, you allow people to choose to participate. Let's say you put up an online poll. Say you have a TV show or something and you announce, "Hey, you should answer this online poll about said issue." Well, people who feel strongly about that issue are more likely to respond. You're giving them the choice, right? So that introduces bias, and that's why it's a bad sampling method. Here are a couple more shortcomings. Number one is undercoverage: when some groups are left out or underrepresented in the sample. For example, if you send a survey only to people with internet access, then people who don't have internet access can't give their opinion, and therefore that's undercoverage. Nonresponse is when your selected individuals don't or can't respond. So, for example, if you call somebody and ask, "Hey, can you do the survey?" yada yada yada, and they say no, that's nonresponse, a shortcoming as well. Next up, we have response bias: when people give false or misleading answers. This is like the equivalent of lying. Now, sometimes it is just lying, but it doesn't have to be them explicitly trying to lie; it can just be that their answers are prone to some sort of bias. Let's say you were reviewing our YouTube channel, but you're a friend of mine, so you know me in person. Well, then your answers are going to carry response bias, because you know me in person, and yada yada yada. Okay, I think you get the point. Next up is wording of the question. If you poorly phrase the question, or it's a biased question, it influences the answers. So if I ask you, "Do you want to subscribe to Prepworks?", that might be a fine question on its own. But if I say, "Do you want to subscribe to Prepworks to receive $1 million?", there's a little bribe at the end, right? The wording of the question there is poorly phrased, it's biased, and it influences your answer. Okay, so now let's talk about observational studies and experiments. An observational study is different from an experiment because in an observational study you only observe and collect the data without influencing the subjects. So, for example, if you just sit in your car and watch other people in their cars to see how many people use seat belts, you can't do anything about them, right? You're simply sitting there and observing variables of interest. An observational study is the direct opposite of an experiment. Why? Because in an experiment, you are manipulating variables or applying treatments to observe and measure the effects on the subjects. And when you conduct an experiment, you want to follow four principles, and when you're answering questions, make sure each of these principles is clear in your explanations for full credit. Comparison: you want to compare two or more groups to see the difference in the treatments. Random assignment: randomly assigning subjects to groups (or randomly assigning treatments) to reduce bias. Control: you have to keep all other variables constant; you've probably heard this before. Replication: make sure you have enough subjects, so don't run an experiment with two people; use a sufficient, respectable number. A couple more key vocab terms. Factor: a factor is the explanatory variable, or independent variable.
It's just the term used when there are multiple independent variables. A level is just a specific value or category of a factor. So sunlight could be a factor, and then low sunlight or high sunlight would be levels. Confounding is when another variable affects the results, like if you don't properly have a control group and it's hard to determine the true cause of your results or the effect of your treatment. A placebo is a fake treatment that participants may still react favorably to. Single blind is when the subjects don't know which group they're in (so which treatment they're being assigned), but the researchers do. Double blind is when neither the subjects nor the researchers know who gets what treatment or who's assigned to which group. So we're going to finish off with the randomized block design. A randomized block design is a specific type of experimental design where subjects are divided into blocks, or groups, based on specific characteristics. So it's kind of like the stratified random sample, and within each block treatments are randomly assigned. A block is just a group of experimental units with the same characteristic. So here we have our study population. Let's say our study population is the Prepworks viewers watching this video right now. We're going to split them into two blocks. Block one is Prepworks subscribers, because they all have the same characteristic: they're all subscribed to the channel. And then the other block is non-subscribers. They have the same characteristic too: they're not subscribed to our channel. And then from here, to actually conduct the experiment with the randomized block design, to properly do it, we need to assign treatments. I don't know what experiment we're running, but we need to randomly assign the treatments within each block, and we can do that with, I don't know, a random number generator, names in a hat, whatever. All right, so the last experimental design type we're going to talk about is the matched pairs design. This is a specific type of experimental design where subjects are paired based on specific characteristics, and within each pair the treatments are randomly assigned, or one subject in each pair could be a control. An example of this: let's say you're doing an experiment and you want to eliminate bias based on gender. Then you can pair a male with a female and have one of them be the control, or just randomly assign the treatments between them. So yeah, that was a lot of vocab and not a lot of crunching numbers and stats. I know it's weird; there's a lot of interpretation here. Make sure you really know these vocab terms, because they serve as the basis for interpreting, designing your own experiments, and judging whether something is a bad sampling method and all that. But yeah, that does it for all the content you need to know for AP Statistics unit 3.
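As a wrap-up for the sampling methods, here's a tiny sketch contrasting how SRS, stratified, and cluster selection actually pick people. The population, group names, and sizes are all made up.

```python
import random

random.seed(1)  # fixed seed so the example is reproducible
population = [f"student{i}" for i in range(1, 21)]          # 20 labeled individuals
strata = {"subscribers": population[:10], "non_subscribers": population[10:]}
clusters = [population[i:i + 5] for i in range(0, 20, 5)]   # four clusters of five

# Simple random sample: every member has an equal chance
print("SRS:", random.sample(population, 4))

# Stratified: take an SRS *within* each homogeneous group
print("stratified:", [random.sample(group, 2) for group in strata.values()])

# Cluster: randomly pick whole clusters, then survey EVERYONE in them
print("cluster:", random.sample(clusters, 1))
```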
Next, we're going to cover everything you need to know for AP Statistics unit 4, which covers probability and random variables. I mean, do you see this? There's a lot of content. So, let's just get right into it. So, what is probability? Well, it's the chance that something happens, and it's written as a number from 0 to 1. If it's zero, it's impossible; one means a 100% chance of happening. It's unpredictable in the short term but predictable in the long term, because you have to observe many, many trials of the same chance process before the proportion actually starts to approach the true probability value. So, let's say my chance of making a free throw is 50%. If I shot 10 shots, it's likely that I make five, but I might make four, or I might make six. But if I shot a thousand free throws, it's much more likely, because of the many, many more trials, that my proportion actually comes out close to 0.5. A simulation is just a model used to mimic real-world events to estimate probabilities. You can use a four-step process to conduct a simulation: define the problem with a question, describe how to use a chance process (like random numbers) to model the situation, actually perform the simulation, and then, based on your results, estimate the probability. Now, we're going to talk about probability rules. To talk about probability rules, we first need to get a couple of definitions out of the way. The first one is mutually exclusive: two events are mutually exclusive when they have no overlap and cannot occur at the same time. You can see here, in this nice diagram, you have event A and event B. They're mutually exclusive because there's no overlap; they cannot occur at the same time. But if they are not mutually exclusive, they can occur at the same time. That's why there's overlap in the Venn diagram. The next one is independence. Two events are independent if the outcome of one does not affect the outcome of the other. So, for example, if I flip a coin and I roll a die, the result of one does not change the result of the other. Whether I flip heads or tails doesn't change what I'm going to get, or the probability of what I'm going to get, on the die. So, probability rules. Let's say I have two possible events, with probabilities P(A) and P(B). How would I find the probability of either one occurring, so either A happening or B happening? If the two events are not mutually exclusive, meaning both could happen at the same time, I add the probabilities and then subtract the overlap: P(A or B) = P(A) + P(B) − P(A and B). If they are mutually exclusive, meaning they cannot happen at the same time, I simply add their probabilities. And these equations are on the reference sheet; the top one is the not-mutually-exclusive one, so just make sure you remember that. So now, what if I wanted to find the probability of both events A and B occurring at the same time? Well, if they are not independent, you multiply the probability of A by the probability of B given A: P(A and B) = P(A) × P(B | A). If they are independent, you simply multiply them: P(A) × P(B). These equations are also given on the reference table. But now let's talk about the complement rule. That's cool and all, but what is the probability that A does not occur? Well, the probability of A not occurring is just one minus the probability of A: P(not A) = 1 − P(A). That's given as the complement rule on your reference table as well. Actually, I'm not sure if this one's on your reference sheet, so make sure you check that one.
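Here's a minimal sketch of those rules with made-up probabilities.

```python
# Made-up probabilities to show the rules in action
p_a, p_b, p_both = 0.40, 0.30, 0.12   # P(A), P(B), P(A and B)

# General addition rule (A, B not mutually exclusive): subtract the overlap
print("P(A or B):", p_a + p_b - p_both)                  # 0.58

# If A and B WERE mutually exclusive, the overlap is 0, so just add
print("P(A or B), mutually exclusive case:", p_a + p_b)

# Independence check: A and B are independent exactly when P(A and B) = P(A) * P(B)
print("independent?", abs(p_both - p_a * p_b) < 1e-9)    # 0.12 == 0.40 * 0.30 -> True

# Complement rule: P(not A) = 1 - P(A)
print("P(not A):", 1 - p_a)
```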
Um, but we're going to move on to visualizing probability in three ways. There's the Venn diagram, which obviously shows the chance of something happening: event A, event B, and their overlap. You have the two-way table; you've probably seen this back in unit one, just displaying the data. And you also have the probability tree. The thing with this is, I do have an example for this, but I'd say visualizing the data really comes down to personal interpretation. So if you are a little confused, pause the video here; I run through an example as I scroll down that explains how to interpret the data. If you're comfortable with the two-way table and the Venn diagrams and understand how we're getting these probabilities, great. If you're trying to brute-force it, it's very difficult and very hard to visualize. But I'm telling you, just tie it back to the probability rules. My entire explanation over here is just using the probability rules and tying them back into what I see on the Venn diagram and the two-way table. And then if you want an easier visual, look at the probability tree as well. So let's move on to random variables. We have four types: discrete, continuous, binomial, and geometric. Your discrete random variable is a random variable that takes on specific, countable values. Think of the number of heads you get in three flips, or the number of cars sold by a dealer in a day. Now, a continuous random variable takes on any value within a range or interval. So it's not just 1, 2, 3; it's continuous, everything in between counts as well. Think about the heights of students in a class: it's not just that you can be one inch or two inches (obviously that's super short), you can be anything in between. Also, the time it takes to finish a race: it's not just finishing in one second or two seconds, you can be at 1.111 seconds. You get the point. Here is another diagram just to show the difference between discrete and continuous variables. All right. So, if you want to find the mean and standard deviation of a discrete random variable, you can use one-variable statistics on your calculator. And a common calculation you will probably have to show for a discrete random variable is calculating the mean, also called the expected value or weighted average. It is the average value over many, many repetitions of the same chance process for that discrete random variable. Here's an example. X basically describes the value of the discrete random variable, and P(X) is just the probability of each of those values. And so what you do is quite literally multiply across, 1 × 0.1, 2 × 0.3, all the way to the end, and then add everything up; that is your mean / weighted average / expected value. So now, obviously, it can't be stats without talking about how you transform probability distributions. So here's a summary. If you add or subtract the same constant c to each value in the data set: the shape is unchanged, the center increases or decreases by c, and the variability remains unchanged. If you multiply or divide each value in the data set by the same constant c: your shape stays the same, the center is multiplied or divided by c, and the variability is multiplied or divided by c.
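Here's a quick sketch of that expected-value calculation (the first two rows echo the 1 × 0.1 and 2 × 0.3 example; the rest of the distribution is made up), plus a check of the transformation rules.

```python
import math

# Made-up probability distribution for a discrete random variable X
values = [1, 2, 3, 4]
probs  = [0.1, 0.3, 0.4, 0.2]   # must sum to 1

# Mean (expected value): multiply each value by its probability, then add
mean = sum(x * p for x, p in zip(values, probs))

# Standard deviation: square root of the probability-weighted squared deviations
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
sd = math.sqrt(variance)
print("E(X) =", mean, " SD(X) =", round(sd, 3))   # E(X) = 2.7

# Transformation rules: if Y = 2X + 5, the mean becomes 2*mean + 5, the SD becomes 2*SD
print("E(2X+5) =", 2 * mean + 5, " SD(2X+5) =", round(2 * sd, 3))
```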
So, here are a couple more equations you need to know for when you are adding or subtracting random variables, assuming X and Y are independent. The means just add or subtract: the mean of X + Y is the mean of X plus the mean of Y, and the mean of X − Y is the mean of X minus the mean of Y. But the variances always add, even for X − Y: the variance of X ± Y is the variance of X plus the variance of Y, and you take the square root of that to get the standard deviation. So make sure you have these equations down. All right, so now we're going to move on to binomial random variables, and I'm going to tell you, this is all you need: the acronym BINS. First off, is it Binary? There has to be a success and there has to be a failure. Are the trials Independent? Is there a fixed Number of trials? And is there a Set probability of success for each trial? So make sure, for questions, you address each part of BINS explicitly and show that the events satisfy the binomial setting, to prove it's actually a binomial random variable. For the mean and standard deviation, these are on your reference sheet: the mean is np, and the standard deviation is the square root of np(1 − p), where n equals the number of trials and p equals the probability of success. And I'd say you can always lean on the calculator for these problems as well if you're asked about binomial random variables. So here are the calculator commands. binompdf(n, p, x), where n is the number of trials, p is the probability of success, and x is the number of successes, calculates the probability of exactly x successes in n trials. binomcdf(n, p, with a lower and upper bound) is the same idea, except it's the cumulative probability of having that many or fewer (or more) successes in n trials; it really depends on the parameters you put in for the lower and upper bound. So yeah, now we're going to move on to geometric random variables. These are a type of discrete random variable that models the number of trials required to achieve the first success. It's highlighted, it's bolded: the chance to achieve the FIRST success in a series of independent trials with two possible outcomes, success or failure. So, very similar to binomial random variables. An example of this: you're flipping a coin and you're counting the number of flips it takes to get the first heads. That is an example of a geometric random variable. Here are the criteria. The trials have to be independent. Each trial has to have the same probability of success, denoted by p. And the random variable counts the number of trials until the first success occurs, like we talked about in the definition. Now, an easy way to tell geometric random variables from binomial random variables is the following: binomial random variables have a fixed number of trials; geometric random variables do not have a fixed number of trials. So look out for the keyword "until." More calculator commands: geometpdf and geometcdf. So yeah, if there's one thing you learn today, it's to learn your calculator commands, because they make your life pretty darn easy. And then for the mean and standard deviation of geometric random variables: the mean is 1/p, one over the probability of success, and the standard deviation is sqrt(1 − p) / p, the square root of one minus the probability of success, all divided by the probability of success. So that does it for all the content you need to know for unit 4, probability and random variables.
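To wrap up unit 4, here's a sketch of scipy equivalents for those calculator commands; the coin-flip numbers are made up.

```python
from scipy.stats import binom, geom

n, p = 10, 0.5   # made-up: 10 coin flips, P(heads) = 0.5

# binompdf(n, p, x): probability of exactly x successes
print("P(X = 4):", binom.pmf(4, n, p))

# binomcdf(n, p, x): probability of x or fewer successes
print("P(X <= 4):", binom.cdf(4, n, p))

# Binomial mean and SD: mu = np, sigma = sqrt(np(1-p))
print("binomial mean, sd:", binom.mean(n, p), binom.std(n, p))

# geometpdf(p, x): probability the FIRST success comes on trial x
print("P(first heads on flip 3):", geom.pmf(3, p))   # (1-p)^2 * p

# Geometric mean 1/p and SD sqrt(1-p)/p
print("geometric mean, sd:", geom.mean(p), geom.std(p))
```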
All right, so today we're going to go through a quick review of unit five of AP Stats: sampling distributions. This one really provides the basis for the rest of the curriculum. So yeah, let's get right into it. First, we need to differentiate between a statistic and a parameter. A statistic is just any number that describes something about a sample of data. And the opposite of that is the parameter, which is any number that describes something about an entire population of data. The most common ones are the mean and the proportion, which we'll dive into more later. But there are other ones; you can see there's slope and such, and there's a unit on chi-square as well. But what we need to know right now is: what exactly is a sampling distribution? Well, it is the probability distribution of a statistic that is obtained through repeated sampling of a specific population. So, let's say your population has a mean of 10. What I'm going to do is conduct simulations: take repeated samples from that population, find the mean of each of those samples, and each of those sample means counts as one data point that I put on a new distribution. I do this simulation many times, and then ultimately I find the mean of this new sampling distribution. And theoretically it should line up, because the sample mean is an unbiased estimator, which is when the mean of the statistic's sampling distribution is equal to the parameter. Like we said before, the mean is an example of an unbiased estimator, because if the population mean is 10, and we do the repeated sampling and take the means of those samples, those should also average out to 10 in the long run. Okay, so now we're going to dive into more specifics, starting with proportions. Like we said before, here's, simplified, what we're actually doing: we take repeated samples of the same size from a population, then find the proportion of each sample, and then plot that information on a distribution. Okay, so we're going to dive into a couple of key components. We're not going to do any full significance tests or confidence intervals in this unit, but this is stuff you need to know and understand in order to do those in later units. So, finding the mean and standard deviation: basically a bunch of formulas, and these are on your reference sheet, so I'm not going to go too much into them and waste your time. We know that increasing the sample size will decrease your variability, because if you're increasing the sample size, it's more accurate, so you get less variability. Now, we're going to talk about a couple of assumptions and conditions: basically, in order to conduct a significance test, these are things that have to be true. First, you need random sampling (or random assignment). Then you need to satisfy the 10% condition, which says each sample you take has to be at most 10% of the population, because you want to be able to treat the trials as independent even though we sample without replacement. That is written as n ≤ 0.10N, where the entire population size is denoted as big N. The last one is the large counts condition.
So basically, what you need is for the number of successes and the number of failures to both be at least 10. The formula for it: the size of your sample times the proportion, np, represents the number of successes, and n × (1 − p) represents the number of failures, so you need np ≥ 10 and n(1 − p) ≥ 10. Now, it's also important to note that if your sampling distribution is approximately normal, which it should be, because remember we have to satisfy the large counts condition, then we can find the z-score, which is the number of standard deviations our statistic is away from the mean. In that case, once we get the z-score, what we can do is solve for the probability of something, because a lot of times these questions are going to ask: what is the probability that the sample proportion is less than or equal to (or greater than) some percent? In that case, we can calculate the z-score either using the formula, as you can see shown here, or you can just plug it into your calculator and it does all the math for you. And to solve for the actual probability, guess what, you're also going to use your calculator, using normalcdf, or you can use table A. All right, so now, jumping over, very similar: we're going to be talking about means to finish off the unit. You're going to be taking repeated samples just like with proportions, but instead of taking the proportion of each sample, you're taking the mean of each sample and putting that onto a distribution. You have some more formulas for the mean and standard deviation of the sampling distribution there; these are also on your reference table. Increasing the sample size, same thing here, is going to decrease variability. And then you also need to meet pretty much the same conditions: random sampling or assignment, the 10% condition, and a normality condition. Now, I do want to note that random assignment is usually only for experiments; if you're taking samples, it's usually just random sampling. And then there's the formula for the z-score. Again, you can just use your calculator. Honestly, the most important thing in AP Stats is knowing how to use your calculator, because that will carry you a lot. Finding the probability is the same as finding the probability for proportions. And then another thing we want to introduce here is the central limit theorem. Your sampling distribution of the mean is going to be approximately normal, even if the population distribution isn't, if it meets the central limit theorem. What is the central limit theorem? Basically, your sample size just has to be at least 30. So, that is pretty much the entirety of unit 5.
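To see unit 5 in action, here's a simulation sketch of the sampling distribution of p-hat; the population proportion, sample size, and repetition count are all made-up choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.30, 100, 10_000   # made-up population proportion, sample size, repetitions

# Simulate the sampling distribution: many samples, one proportion (p-hat) from each
p_hats = rng.binomial(n, p, size=reps) / n

# Unbiased estimator: mean of the p-hats should sit near p
print("simulated mean:", p_hats.mean(), "vs p =", p)

# SD should match the formula sqrt(p(1-p)/n); bigger n shrinks this (less variability)
print("simulated SD:", p_hats.std(), "vs formula:", np.sqrt(p * (1 - p) / n))

# Large counts holds here (np = 30, n(1-p) = 70, both >= 10),
# so this distribution should look approximately normal.
```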
Hey, today we're going through all the content you need to know for unit six of AP Stats, which is inference for categorical data, more specifically proportions. Okay, so the first thing to touch on is: what is a point estimate? It's just a value, more specifically a statistic, that estimates your population parameter. So let's say I have a confidence interval from 3 to 5; then your point estimate would be right in between, at four. It's just a number that estimates the parameter, because you're not going to be exact, but we're going to be using a confidence interval to estimate the actual population parameter. Now, a key thing to know is the difference between a confidence interval and a confidence level. Your confidence interval is the range of values that estimates your population parameter; that's pretty obvious. But your confidence level is that actual percentage, like 90% or 95%, and it describes how often the method captures the true parameter: if you took many, many samples and built an interval from each one, that percentage of the intervals would capture the parameter. And something to note here is the actual interpretation, because what you need is this first part, the confidence interval interpretation; that is your conclusion, and your conclusion needs to conclude in context. So you're going to say this: "We are ___ percent confident that the interval from ___ to ___ captures the true [parameter of interest, in context]." You need to have that down for the actual confidence interval. But then, a lot of times you're going to see an FRQ give a follow-up to that question dealing with the confidence level and asking you to interpret that, so just have this in mind as well. And now, a couple more caveats before we move on to the actual confidence intervals. If we increase the confidence level (level, not interval), that's going to increase our margin of error and make the interval wider. If we increase the sample size, we're getting more accurate (we've talked about this in sampling distributions as well), and that decreases our margin of error, which makes the interval narrower. And also, this is really important, and I bet you're going to see it on a multiple choice: bias does not affect the margin of error. This is going to be one of those questions about what affects the margin of error, and one of the answer choices is going to be bias. It does not affect the margin of error. Okay. So, the acronym you need to know is PANIC, which stands for Parameter of interest, then your Assumptions and conditions, the Name of the procedure, state the Interval, and then finally just Conclude. Okay, so for the parameter, you're basically saying what you're actually building the interval about. So if you have one sample, it's just "the true proportion of whatever." And if you have two samples, then state "the true proportion of whatever, group one" and "the true proportion of whatever, group two": the same thing, except you're doing it for two. The procedure is pretty simple. If it's one sample, you're doing a one-sample z-interval for p. If it's two samples, it's a two-sample z-interval for p1 − p2, because you're trying to find that difference, to see if there is a difference in proportions. Conditions: we covered these in sampling distributions as well, so make sure you really have those down: your random sample, your independence based on that 10% condition, then your large counts. And everything is the same for two samples, except now you have to check random sampling for both of them, you have to check the 10% condition for both of them, and, it's very annoying, but you've got to do the large counts condition for both of them. So yeah, that's just something to keep in mind. Now, to calculate the actual interval, you can use your calculator, which is highly recommended, or you can calculate it by hand with this formula: the point estimate plus or minus the margin of error, which is the critical value z-star times the standard error. And finally, conclude. That's the last part: just conclude in context. You can see again this interpretation part; this ties back to our confidence interval and how to interpret it. Really have that down, and do lots of practice problems to hammer it in.
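Here's what that point-estimate-plus-or-minus-margin-of-error calculation looks like as a sketch, with a made-up sample of 120 successes out of 200; scipy's norm.ppf supplies the z-star your calculator would.

```python
import numpy as np
from scipy.stats import norm

# Made-up sample: 120 successes out of 200
x, n = 120, 200
p_hat = x / n

conf_level = 0.95
z_star = norm.ppf(1 - (1 - conf_level) / 2)   # critical value, ~1.96 for 95%

# Interval = point estimate +/- z* times the standard error
se = np.sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * se
print(f"{conf_level:.0%} CI for p: ({p_hat - margin:.4f}, {p_hat + margin:.4f})")
# Interpretation: we are 95% confident this interval captures the true proportion.
```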
Um, another note is that if your two-sample confidence interval contains the value zero within it, then you cannot conclude a difference in the population proportions. The reason I included this is that in follow-up questions, you're going to do your test and get an interval like negative 1 to 3.2, and then the follow-up is going to ask, "Can you conclude a difference?" or something like that, and it's going to include zero. So then you've got to remember that you cannot conclude a difference, because the interval includes zero; there could simply be no difference, right? Okay. So that is confidence intervals. But now, for proportions, you also have significance tests. For significance tests, you're going to use the PHANTOMS acronym. So, your Parameter of interest, just the same as for confidence intervals, but now you're going to have Hypotheses. You're going to have H0, which is your null hypothesis: the claimed value for the population. And you're also going to have your alternative hypothesis, Ha, which basically says the claimed value for the population of interest is not correct. So your null is what they claim, and your alternative hypothesis is what you're trying to show: you're trying to prove the alternative is right if there's significance. Okay, it'll make more sense when I get to the actual test itself. And then you have your Assumptions and conditions, just like normal; it varies a little bit. The Name of the test. Then calculate the actual Test statistic; the test statistic is a value used to evaluate the test itself. Now, usually you can just use your calculator: if you use the significance test command, it'll give you the test statistic itself. Don't calculate it by hand with a formula; just don't do that. Then Obtain your p-value. Your p-value is what determines whether you are going to reject your null hypothesis or fail to reject your null hypothesis. Okay? So your p-value is super important. Usually you're going to be dealing with an alpha value of 0.05; so if the p-value is below 0.05, the result is statistically significant and you would reject H0, and if it's above that, you cannot do that. Remember all that. And then you're going to Make a decision: you're going to reject or fail to reject H0. And the final step is just to State your conclusion in context. When you state your conclusion in context, that's really important; that's when you're touching on Ha, your alternative hypothesis. When you make your decision, you don't have to state H0 in context, but when you do the latter part for Ha, you do have to say that in context. Okay. All right. Let's get into the actual test itself. A lot of the same things. For a one-sample test, you're just talking about p in your H0 and Ha. And for a two-sample test, there's another way to say it: instead of saying p1 is equal to p2, you can also say p1 minus p2 equals zero. Okay, so basically, there's no difference. The name of your procedure: it's a one-sample z-test for p, or if you're doing two samples, it's a two-sample z-test for p1 − p2. Your conditions: random sample like always, then your 10% condition, and then your large counts condition, very similar to the confidence interval. For two samples, it's the exact same thing, except now you've got to make sure the conditions work for both samples.
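Here's what the test-statistic and p-value steps look like in code, on a made-up one-sided test of H0: p = 0.5 versus Ha: p > 0.5.

```python
import math
from scipy.stats import norm

# Made-up one-sample test: H0: p = 0.5, Ha: p > 0.5, with 60 successes in 100 trials
p0, x, n = 0.5, 60, 100
p_hat = x / n

# Test statistic uses the standard deviation under the NULL hypothesis
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - norm.cdf(z)        # one-sided (greater-than) p-value

alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```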
Cool. All right, finally, just make a conclusion. This is very important, actually, so I'll make sure you can see this entire thing; pause the video so you can actually read this interpretation. This is basically what you need to have in the bank for interpreting. Interpreting is super important: having the right wording, and making sure you have the right phrasing as well, to get all those points. So there we have it for the one-sample and then the two-sample test. Cool. Now, the final topics in this thing we need to cover are type I and type II error, and then power. Okay. When you're talking about significance tests, usually they won't explicitly make you discuss type I and type II error or power; these are extra concepts that come up in follow-up questions about significance tests. So it's just good to know them, because they might show up. A type I error, I like to think about as a false positive: that's where the null hypothesis is actually true, but we reject it and conclude Ha. Something else to know is that your alpha value is equal to the probability of a type I error. Okay? So if you increase the alpha value, the probability of making a type I error is also going to increase. And then you have the other type of error, which is pretty much the opposite. Instead of a false positive, a type II error is a false negative. That's where your null hypothesis is actually false, but we fail to reject it, and so we cannot conclude Ha. Okay. All right. The last thing here is power. So what exactly is power? Power is the probability that our null hypothesis will be rejected when the alternative is true: the chance of making the right decision when H0 is false because Ha is true. The equation for this can also be written as one minus the probability of making a type II error. And then, how do you increase power? There are three main ways. You can increase the alpha value, but that has the drawback of increasing the chance of a type I error, like we talked about before. You can increase the sample size. Or you can increase the distance between your null value and your alternative value. If you do that, the test is going to be more sensitive, and it's going to detect the difference more easily if there is one. So if you had two values like two and three, close together, it'd be harder to tell them apart, versus having two and 100 as your null and alternative values respectively. So that does it for all the content you need to know for unit 6. All right, so today we're covering all the content you need to know for unit 7: inference for quantitative data, means, for AP Stats. If you understood unit six, which is basically the same thing but for proportions, you're going to be set for this unit; there are just two things that change. First, instead of using a critical z-value, our critical value is t-star. Okay, so your t-star value you can calculate with your calculator, but to do that you also need to understand degrees of freedom, which is a new concept and variable; you're going to be using this in the chi-square stuff as well. So, your degrees of freedom is the number of values in your data that can vary freely without affecting the others. Okay. It's a really weird definition.
You don't really need to understand what that means; instead, you need to understand how to calculate it. Very simple here for confidence intervals and significance tests of means: your number of degrees of freedom is n minus one. So your sample size minus one is just your degrees of freedom; that's all you need to know for that part. Okay. So now let's dive into the actual confidence interval and then the significance test. It's a very, very similar procedure to proportions. We're going to define our parameters. So instead of "the true proportion of something in context," now it is "the true mean of something." And if it's two samples, then you're going to do that for both of the samples. Then you're going to name the procedure: either it is a one-sample t-interval for the mean, or a two-sample t-interval for a difference in means. Instead of a z-interval, we're doing a t-interval, right? Because we're using t-star instead. Then your assumptions and conditions. This is where we bring back that stuff from unit five, sampling distributions. Remember how we talked about the central limit theorem? Well, now we're talking about that again. So obviously we have random sampling (or random assignment), and that is the same. And then you're going to have that 10% condition; that's the same. And then our third condition here is kind of weird, so I'll explain it. Conditions one and two have to be satisfied as stated, but for number three there are three different ways we can show that our distribution is approximately normal. Either our population is literally stated to be normal; or we can say it's approximately normal by the central limit theorem, where our sample size is at least 30 (n ≥ 30); or our data doesn't show any strong skewness or outliers, in which case we can also say it's approximately normal. So it just needs to satisfy one of those three to meet that third condition. It is the same thing for a two-sample interval, except, guess what, you do it for both of those samples. Kind of tedious. Okay. So the next part is the interpretation. When you're interpreting, this is basically your conclusion. So remember, we talked about how to interpret confidence intervals; that's different from interpreting confidence levels, so make sure you really have this down. I'm sure if you do this a couple of times, you're going to have that interpretation down for confidence intervals, but now it is the true mean, or the true difference in means. So make sure you have that down. Okay, cool. Well, this part is going to go pretty quick, because significance tests are pretty simple for means. We're going to write our hypotheses and define our parameters just like those for proportions: you've got your null, then you've got your alternative, and then you define your parameter, the true mean of whatever. And then the same thing for two samples. Name the procedure: it's a one-sample t-test for a mean, or a two-sample t-test for a difference in means. Next, let's move on to assumptions and conditions: random sample like before, the 10% condition like before, and then the same deal of having to satisfy one of those three ways of showing it's approximately normal. Same thing for a two-sample test: you've got to check both samples.
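Before the conclusion step, here's a sketch of both one-sample t procedures on a made-up data set, with df = n − 1 as described above.

```python
import numpy as np
from scipy.stats import t

data = np.array([12.1, 13.4, 11.8, 12.9, 13.7, 12.5, 13.0, 12.2])  # made-up sample
n = len(data)
x_bar, s = data.mean(), data.std(ddof=1)   # ddof=1 gives the sample standard deviation
df = n - 1                                 # degrees of freedom for means

# One-sample t-interval: x-bar +/- t* . s / sqrt(n)
t_star = t.ppf(0.975, df)                  # critical value for 95% confidence
margin = t_star * s / np.sqrt(n)
print(f"95% CI for mu: ({x_bar - margin:.3f}, {x_bar + margin:.3f})")

# One-sample t-test of H0: mu = 12 vs Ha: mu != 12
mu0 = 12
t_stat = (x_bar - mu0) / (s / np.sqrt(n))
p_value = 2 * (1 - t.cdf(abs(t_stat), df))  # two-sided p-value
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```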
And then the last thing here is our interpretation, or conclusion. Again, really have this down. Okay? Do lots of practice problems so you really get in the flow. If your p value is less than 0.05 (well, assuming your alpha value is 0.05, which it usually is), then it's statistically significant. If it's above that, it's not statistically significant. So make sure you reject or fail to reject Ho accordingly, and then conclude or cannot conclude HA. So that does it for all the content you need to know for AP Stats unit 7.

All right. So we're moving on to unit 8 of AP Stats, which has to do with inference for categorical data: chi-square tests. Okay. This, as you could probably guess, is very similar to significance tests for proportions and means. I mean, the rest of AP Stats is pretty similar, so if you get one down, you can pretty much get the others down too. There are some nuances you have to know with chi-square, but this is only like 2 to 5% of your AP exam, so it's not really that important. You're probably not going to do a full chi-square test on your FRQs; it's more likely to show up as multiple choice.

Okay, so the first type of chi-square test is a chi-square test for goodness of fit. The key question I like to ask myself is: does a sample's observed distribution differ significantly from the expected distribution? So that's the jargon version, but let's think about it in terms of something simpler, like M&M's, right? Let's say whatever company owns M&M's claims that it's like 10% red M&M's, 13% brown M&M's, etc. Then I order a bunch of M&M's, open them up, and check the distribution. I count them up, right? I'm trying to see if my sample, the percentage of each color of M&M, matches what the company claims. That is an example of where you would use a chi-square test for goodness of fit.

Okay. Now something that actually differs (whoa, surprise) between chi-square tests and the significance tests for, say, proportions or means, is that you actually don't have a parameter of interest, and you also do not have a sample statistic. Instead, we're going to jump straight into the hypotheses: your null and your alternative hypothesis. Then your assumptions and conditions: a random sample, the 10% condition, and your large counts condition, except now instead of the counts having to be above 10, the expected counts have to be equal to or above 5. I don't know why that is; I mean, whoever created the AP Stats curriculum or whoever invented statistics, you can ask them. The name of the test is a chi-square test for goodness of fit. Then your test statistic. Something cool to know about this is that a higher test statistic means a higher discrepancy from the expected distribution, which makes sense, right? Because it's the observed count minus the expected count, squared, divided by the expected count, summed over every category. So if an observed value is way off, say I'm expecting a value of like 10 but I get like 1,000, then my difference will be super large, and that gets squared. So a higher test statistic means there's a higher discrepancy. And in terms of M&M's, that would mean the M&M's people are probably not telling the truth, right? There's a huge discrepancy.
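Here's a short sketch of a goodness-of-fit test in Python. The claimed color percentages and the counts below are made up for illustration (they're not the company's real figures); the point is just to watch the observed-versus-expected machinery work.

```python
# Chi-square goodness of fit on hypothetical M&M's counts.
from scipy import stats

claimed = [0.13, 0.13, 0.14, 0.20, 0.16, 0.24]   # claimed color proportions (sum to 1)
observed = [38, 32, 45, 70, 41, 74]              # counts from a hypothetical bag of 300
n = sum(observed)
expected = [p * n for p in claimed]              # expected counts; each should be >= 5

# test statistic: sum of (observed - expected)^2 / expected over all categories
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                           # number of categories minus 1
p_value = stats.chi2.sf(chi2, df)
print(chi2, df, p_value)

print(stats.chisquare(observed, f_exp=expected)) # scipy's built-in gives the same answer
```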
Now we're back to degrees of freedom. Instead of our sample size minus 1, it's now just the number of categories we have minus 1. In terms of M&M's, each color is a category, right? So if I have, I don't know, seven colors of M&M's, then it'd be 7 minus 1; degrees of freedom is 6. Then we're going to obtain our p value. You can use Table C or you can just use your calculator, pretty simple. Then you're going to make a decision: are you going to reject or fail to reject Ho? And then you're going to conclude in context. Remember, when you're concluding in context, you want to refer back to HA and state HA explicitly in context.

Our next test is a chi-square test for homogeneity. It's kind of hard to say; I always said it as "homogyny," but that's not the same thing. And then our next one is a chi-square test for association, or independence. You'll notice this part is pretty short, and that's because between these two tests there are only a couple of differences: how you identify them, the name, and your hypotheses. So let's start off with the chi-square test for homogeneity. The key question here is: do two or more populations share the same distribution for a single categorical variable? Okay, so this is very important: you have one variable of interest. How you're going to identify this test is that you'll have two or more samples with only one variable. In case I haven't made it clear enough: one variable of interest. So for your hypotheses, your null is that there is no difference in that one variable between your two or more groups. Your alternative is that there is a difference in that one variable between your groups. The name is chi-square test for homogeneity, and so on.

Now we need to dive into the other pieces, because everything else is pretty much the same between these two tests. So like before, we're going to go into our assumptions and conditions, right? You've got your random sample or random assignment, your 10% condition, and your large counts condition with expected counts equal to or greater than 5. And then this part is a little different: for your degrees of freedom, it's now the number of rows minus 1, times the number of columns minus 1. That's because you have a two-way table, right? And when you're dealing with that, it changes your degrees of freedom. Your expected counts formula also changes. Now, you can plug this into your calculator and use matrices to get it done, or you can just use the formula; either way works, there's really no better choice in this case. It's your row total times your column total, all divided by your table total. Your test statistic is the same as goodness of fit. Again, don't manually calculate it; it's just good to recognize the formula in case it shows up on a multiple choice question, which is unlikely because this unit itself is unlikely to show up much. And then for your conclusion: reject or fail to reject Ho just like before, conclude or cannot conclude HA, and just adjust the context based on each respective test. Right? So when you're writing the context part, make sure you're going back to your null and your alternative hypotheses.
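Here's a tiny sketch of that shared two-way-table machinery. The counts are made up; the same code covers homogeneity (rows are separate samples) and independence (rows are levels of a second variable from one sample), since the math is identical.

```python
# Chi-square on a hypothetical 2x3 two-way table.
import numpy as np
from scipy import stats

table = np.array([[30, 20, 10],    # row 1: first group (or first level of variable 1)
                  [25, 35, 30]])   # row 2: second group (or second level of variable 1)

chi2, p_value, df, expected = stats.chi2_contingency(table)
print(df)         # (rows - 1) * (columns - 1) = (2 - 1) * (3 - 1) = 2
print(expected)   # each cell: row total * column total / table total
print(chi2, p_value)
```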
So to cap off chi-square, we're just going to cover the remaining details for the chi-square test for association, or independence. The key question here is: is there an association between two categorical variables? So now we have only one sample, but with two variables measured on it. Our null here would be: there is no association between our first variable in context and our second variable in context in our one sample. And our alternative just changes one word: there is an association between those variables. The name is chi-square test for association/independence, and everything else, guess what, is the exact same thing. So that does it for everything you need to know for unit 8 of AP Stats, everyone.

So we are on to the last unit of AP Statistics, which deals with inference for slope. If you couldn't tell, it's very similar to the last couple of units, except now it's for slope, right? You've got your confidence interval and your significance test, plus just a couple more nuances in between. We've got a new symbol here, that B-looking symbol, which is beta. That's the slope of the population least squares regression line. So instead of trying to estimate the true population proportion or true population mean, now it's the true population slope. And we're going to do this using sample data, just like before. The big question we're trying to answer is: is there a linear relationship? As for how the data collection generally goes, you could do this with experiments, but the most common way is just a random sample. So you've obtained your random sample of data points, they're plotted for you, and you need to differentiate between your explanatory variable (your independent variable) and your response variable (your dependent variable). This makes more sense with an example. Let's say we have study time on the x-axis and test scores on the y-axis. If I increase study time, in my mind, I think that results in a higher test score, right? So the plot would be upward sloping. That's what I think is going to happen; we don't know if it's true. So what I would do is, if I have a student population of 500 people, I would take maybe 10 of those students (keeping the sample under 10% of the population, since we also want independence, as we'll see later), then measure each student's test score and their study time, and plot those data points. And if we draw a least squares regression line through them and it looks relatively linear, we can estimate the slope of that line and describe it as, say, a strong positive linear relationship.

But more specifically, you can't just randomly throw out numbers like that and eyeball the points. You need to actually do a detailed procedure using a p value or a confidence interval. So let's dive into it. For the confidence interval, it's PANIC, same idea as before. P, your parameter of interest, in this case is beta, the true slope of the population least squares regression line. Then just name your two variables in context, the ones you're trying to see if there's a linear relationship between.
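Here's a minimal sketch of that setup with hypothetical study-time data: x is the explanatory variable, y is the response, and the fitted slope b (along with its standard error) is what all the inference below runs on.

```python
# Fitting a least squares line to hypothetical data.
import numpy as np
from scipy import stats

study_hours = np.array([1, 2, 2, 3, 4, 5, 5, 6, 7, 8])            # explanatory (x)
test_scores = np.array([62, 68, 65, 71, 74, 80, 77, 83, 86, 90])  # response (y)

result = stats.linregress(study_hours, test_scores)
print(result.slope)       # b, the sample estimate of the true slope beta
print(result.intercept)   # a, the estimated intercept
print(result.stderr)      # standard error of the slope, used in the interval and test
```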
Now for the assumptions and conditions. This is why you usually won't be asked to do the full procedure on the AP exam: there are five of them to check. You need to check the residual plot or scatterplot to see if the data is approximately linear. Check for independence: the 10% condition, or independent observations if it's an experiment. Then you've got a dot plot or boxplot of the residuals; you don't want any strong skewness or outliers there. Along with that, you want equal standard deviation (or variance), meaning the residual plot does not show increasing or decreasing spread. So here's what we want to see, right? We have sample data that looks relatively linear, and in the residuals you see no clear patterns, nothing increasing or decreasing. But if we have data that is not linear whatsoever, say a W-looking shape, well, our residuals are going to reflect that. Our residuals aren't random; there's a decreasing-then-increasing pattern, and that's not what we want. In that case, we wouldn't be able to do our confidence interval or significance test. The last condition is just that you need a random sample, or random assignment if it's an experiment.

Let's move on to the name of the test: a t interval for the slope of a population least squares regression line at blank percent confidence. Usually that confidence level is given to you. Then, to calculate the actual interval, you can either use your calculator or just calculate it manually: it's the estimated slope from your sample, b, plus or minus your critical t-star value times the standard error of the slope. That standard error is usually given to you directly in the problem, but you can also calculate it with the formula on your reference table. That gives you your interval. And keep a note: degrees of freedom is n minus 2, in case you use your calculator to get that interval. Then, finally, your conclusion, same as before: we are blank percent confident that the interval from blah to blah captures the slope of the true least squares regression line relating your two variables in context. Always remember to do it in context, because AP graders love to see that.

Let's move on to the last thing in AP Statistics: the significance test for slope. We'll be using PHANTOMS just like before. Your parameters and your assumptions and conditions are the exact same as for the confidence interval, so if you need a refresher on that, just go back in the video. Now your hypotheses, because you are doing a significance test: your null is usually beta equals 0, because you start under the assumption that there is no linear relationship. But this can change; say, for example, a scientist is trying to disprove a model that already shows a linear relationship, then your beta value would be set equal to something like 1. The alternatives would be greater than 0, less than 0, or, if it's a two-sided test, not equal to 0. The name of this is a t test for the population slope of the least squares regression line. And then for your test statistic, it's your estimated slope from the sample minus your hypothesized population slope, all divided by the standard error of the slope. Keep in mind that beta value in the statistic is your null value, because, remember, just like before, when we conduct significance tests we run everything under the assumption that the null is right. Based on that, we get a p value: how likely is it, if the null were true, that we'd get a result as extreme or more extreme than this one? And based on that, we can reject Ho and conclude HA, or not.
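Here's a sketch of both calculations on the same hypothetical study-time data from before, so you can see the interval (b plus or minus t-star times SE) and the test (t equals b minus 0, divided by SE) side by side, both with n minus 2 degrees of freedom.

```python
# Slope inference on the same hypothetical data as the earlier sketch.
import numpy as np
from scipy import stats

study_hours = np.array([1, 2, 2, 3, 4, 5, 5, 6, 7, 8])
test_scores = np.array([62, 68, 65, 71, 74, 80, 77, 83, 86, 90])
result = stats.linregress(study_hours, test_scores)

n = len(study_hours)
df = n - 2                              # degrees of freedom for slope inference
b, se = result.slope, result.stderr

# 95% t interval for the true slope beta
t_star = stats.t.ppf(0.975, df)
print(f"95% CI for beta: ({b - t_star * se:.3f}, {b + t_star * se:.3f})")

# significance test of Ho: beta = 0
t = (b - 0) / se
p_value = 2 * stats.t.sf(abs(t), df)    # two-sided; this is the tcdf step on a calculator
print(t, p_value)                       # p_value matches result.pvalue
```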
So, to actually obtain the p value, it's kind of interesting for slopes. You actually need the raw data if you want to use the calculator command, which is the t test for slope. But you should probably know how to calculate it manually: get your t value from the equation above, find your degrees of freedom, which is sample size minus 2, and then just use tcdf. And if you want a note about how to identify whether to put your t value as the lower or the upper bound, pause the video and get this in. All right, for the conclusion: since the p value of blank is less than (or greater than) our alpha value of blank (usually that's given as well, or if it's not given, just assume it's 0.05), we can reject (or fail to reject) Ho and conclude (or cannot conclude) HA: that a linear relationship exists in the population between your two variables. Guess what? In context. That's right, always say everything in context. Okay? And that's it. That is the entirety of AP Statistics. So go crush that AP exam, subscribe for good luck, and thank you guys for watching the video.