Transcript for:
Understanding Inferential Statistics Basics

If you're new to quantitative analysis, one of the many intimidating terms that you're bound to hear being thrown around is inferential statistics. In this video, we're going to break inferential stats down using plain language and loads of examples. Let's do it.

All right, so here's a quick overview of how we'll approach this lesson. First, we'll explain what exactly inferential stats are in plain language. Then we'll look at how inferential stats compare to descriptive statistics.

And with those foundations laid down, we'll look at some of the most common inferential tests, including t-tests, ANOVA, chi-square, correlation, and regression. In addition to this video, you'll also definitely want to check out our free statistics cheat sheet, which will help you fast track your statistical analysis. You can grab a copy of the cheat sheet for free using the link in the description. Also, it's worth mentioning that if you are completely new to statistics and aren't yet familiar with basic terminology such as means and medians and variance, it's a good idea to first watch our descriptive statistics video so that you can get full benefit from this lesson.

Again, you can find the link to that in the description. So first things first, what are inferential statistics? At the simplest level, inferential stats, or just inferentials for short, allow you to test whether the patterns that you observe in a data set (a sample) are likely to be present in the broader population, or whether those patterns are just a product of chance.

That probably sounds rather conceptual, so let's look at a practical example. Let's say that you surveyed 100 people in a specific city about their favorite type of food.

And let's say there were five options to choose from. This group of 100 people would be your sample. When reviewing the data, you might find that 40 people selected the pizza option, in other words, 40% of the sample.

You could then use inferential statistics to test whether that preference for pizza is likely to reflect preferences across the entire city (in other words, the population), or whether it's just down to chance. More specifically, you'd use a chi-square test, which we'll talk about a little bit later, so be sure to stick around for that. In statistics speak, this question of "is it real, or is it just chance?" is the question of statistical significance. I won't go down that rabbit hole in this video, but it's useful to understand that this ability to assess statistical significance means that inferential statistics can be used to test hypotheses, and in some cases they can even be used to make predictions.
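To make the pizza example a bit more concrete, here's a minimal sketch of a chi-square goodness-of-fit test using only Python's standard library. The transcript only gives the pizza count, so the other four counts (15 each) and the option names are hypothetical, and the null hypothesis assumed here is that all five options are equally popular across the city. The closed-form p-value used below only works for an even number of degrees of freedom, which five categories happen to give; in practice you'd more likely call a library routine such as `scipy.stats.chisquare`.

```python
import math


def chi_square_goodness_of_fit(observed, expected):
    """Pearson's chi-square goodness-of-fit test.

    Returns (statistic, degrees_of_freedom, p_value). The p-value uses the
    closed-form chi-square survival function for an even number of degrees
    of freedom, so this sketch only supports an odd number of categories.
    """
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    if df % 2 != 0:
        raise ValueError("this sketch needs an even number of degrees of freedom")
    half = stat / 2
    # For df = 2k: P(X > stat) = exp(-stat/2) * sum_{i=0}^{k-1} (stat/2)^i / i!
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, df, p


# Hypothetical survey of 100 people: 40 chose pizza, the other 60 split
# evenly across four made-up remaining options.
observed = {"pizza": 40, "option B": 15, "option C": 15, "option D": 15, "option E": 15}
n = sum(observed.values())
# Null hypothesis (an assumption): every option is equally popular, 20 each.
expected = [n / len(observed)] * len(observed)

stat, df, p = chi_square_goodness_of_fit(list(observed.values()), expected)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.2g}")
```

With these made-up numbers the statistic is 25 on 4 degrees of freedom, and the p-value lands far below the conventional 0.05 threshold, so a pizza preference this strong would be unlikely to arise by chance alone if all five options were equally popular.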

Now, it's worth mentioning that all of this assumes that your sample is relatively representative of the population, which of course hinges on your sampling strategy. If you're not familiar with samples and sampling strategies, you can check out our straightforward explainer videos which cover these topics. As always, the links are in the description. At this point, you might be asking yourself, but how is this all different from descriptive statistics?

Well, at the simplest level, descriptive statistics summarize and organize the data that you already have, in other words, your sample. Inferentials, on the other hand, allow you to use your sample data to assess whether the patterns contained within it are indeed likely to be present in the broader population and potentially to make predictions about that population.

Again, let's look at an example to make this all a little bit more tangible. Let's imagine that you're undertaking a survey to assess customer satisfaction at a local restaurant. Let's assume that your sample comprises a mix of both men and women.

If you just wanted to know the average level of customer satisfaction for men and for women within the sample, you could achieve this with descriptive statistics. Specifically, you would calculate the mean for each group. However, if you wanted to compare those two means (in other words, the average male rating and the average female rating) to see whether there's a statistically significant difference in satisfaction levels between men and women in the broader population, you'd need to use inferential statistics. Specifically, you'd use a t-test, which we'll look at in a moment.
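Here's roughly what the descriptive half of that restaurant example might look like in Python. It's a minimal sketch with hypothetical 1 to 5 satisfaction ratings (the transcript doesn't give any data). Notice that it only summarizes the sample; deciding whether the gap between the two means is likely to hold in the broader population is the t-test's job.

```python
import statistics

# Hypothetical 1-5 satisfaction ratings from the restaurant survey sample.
ratings = {
    "men": [4, 3, 5, 4, 4, 3],
    "women": [5, 4, 4, 5, 3, 5],
}

# Descriptive statistics: the mean rating for each group in the sample.
summary = {group: statistics.mean(scores) for group, scores in ratings.items()}

for group, mean in summary.items():
    print(f"{group}: mean = {mean:.2f}, n = {len(ratings[group])}")
```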

So simply put, descriptive statistics describe your sample (and the clue, of course, is in the name there), while inferential stats help you understand whether the patterns that you see in your sample are likely to be present within the broader population. Remember, if you're not yet 100% comfy with descriptive statistics, it's a good idea to watch our explainer video which covers exactly that.

And as always, you can find the link to that in the description. Alright, so now that we've covered the basics, let's take a look at some of the most common statistical tests within the inferential realm. Specifically, we're going to look at t-tests, ANOVA, chi-square, correlation, and regression. If you'd like us to cover any other statistical tests, please let us know in the comments.

So let's start by looking at the humble t-test. Simply put, a t-test allows you to compare the means (that's the averages) of two different groups to see if they are genuinely different, or if that difference is just a product of chance, perhaps because of a few outliers or because of very high variance in one of the groups. In other words, a t-test allows you to assess whether the difference between those two means is statistically significant.

As an example, you might use a t-test to see if there's a statistically significant difference between the average exam scores of two math classes taught by different teachers. Similarly, you could use a t-test to assess the difference in average plant height when using two different fertilizers.

Now, it's worth noting here that there are a few different types of t-tests. In the examples that I just mentioned, we'd be using an independent t-test which compares the means of two different groups. If, on the other hand, you wanted to compare the mean of one group at different points in time, you'd use what's called a paired t-test.
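As a rough sketch of both flavors, the code below implements Student's independent t-test (which assumes the two groups have equal variances) and a paired t-test, which is simply a one-sample t-test on each subject's before-and-after difference. All the scores are hypothetical. The p-value helper is a standard closed-form series for the t distribution with whole-number degrees of freedom, included only so the example runs without third-party packages; in practice you'd typically reach for `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel`.

```python
import math
import statistics


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t distribution with integer df.

    Uses the classic closed-form finite series for P(|T| < t), which avoids
    needing a special-function library.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    if df == 1:
        a = 2 * theta / math.pi
    elif df % 2 == 0:
        term = total = 1.0
        for k in range(2, df - 1, 2):  # series terms up to cos^(df-2)
            term *= (k - 1) / k * c2
            total += term
        a = s * total
    else:
        term = total = 1.0
        for k in range(3, df - 1, 2):  # series terms up to cos^(df-3)
            term *= (k - 1) / k * c2
            total += term
        a = 2 / math.pi * (theta + s * c * total)
    return 1 - a  # P(|T| >= |t|)


def independent_t_test(a, b):
    """Student's independent-samples t-test using a pooled variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, t_two_sided_p(t, df)


def paired_t_test(before, after):
    """Paired t-test: a one-sample t-test on the within-subject differences."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1, t_two_sided_p(t, n - 1)


# Hypothetical exam scores for two math classes taught by different teachers.
class_a = [72, 85, 78, 90, 66, 81, 75, 88]
class_b = [68, 74, 70, 79, 65, 72, 71, 77]
t_ind, df_ind, p_ind = independent_t_test(class_a, class_b)
print(f"independent: t = {t_ind:.2f}, df = {df_ind}, p = {p_ind:.3f}")

# Hypothetical scores for the same six students at two points in time.
before = [12, 15, 11, 18, 14, 16]
after = [14, 17, 12, 21, 15, 19]
t_pair, df_pair, p_pair = paired_t_test(before, after)
print(f"paired: t = {t_pair:.2f}, df = {df_pair}, p = {p_pair:.3f}")
```

If you'd rather not assume equal variances, Welch's version of the independent t-test is a common default; `scipy.stats.ttest_ind(a, b, equal_var=False)` computes it.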

It's also important to understand that each of these t-tests has its own set of assumptions and requirements, as do all of the tests that we'll discuss here, but we'll save assumptions for another video. All right, next up, let's look at another inferential test called an ANOVA.

Now, while a t-test (the one that we just looked at) compares the means of two groups, an ANOVA can compare the means of more than two groups at once. Again, this helps you assess whether the differences in the means are statistically significant or just a product of chance. Let's look at an example of ANOVA in action. Sticking with the student theme, if you wanted to assess whether students' test scores vary based on the type of school that they attend (let's say public, private, or home school, so that's three groups), you could use ANOVA to compare the average standardized test scores of these groups. Similarly, you could use ANOVA to compare the average sales of a product across multiple stores or multiple locations of a retailer.
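Here's a minimal one-way ANOVA sketch for the three-school example, again in standard-library Python with hypothetical test scores. It splits the total variation into between-group and within-group parts and compares them using the F statistic. Because there are exactly three groups, the numerator has 2 degrees of freedom, and for that case the F distribution's tail probability has a simple closed form, (1 + 2F/d2)^(-d2/2); the sketch deliberately refuses any other number of groups. A library call such as `scipy.stats.f_oneway` handles the general case.

```python
import statistics


def one_way_anova(*groups):
    """One-way ANOVA returning (F, (df_between, df_within), p_value).

    The closed-form p-value below is only valid for exactly three groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    if df_between != 2:
        raise ValueError("this sketch's p-value only holds for exactly three groups")
    # For an F(2, d2) variable: P(F > f) = (1 + 2f / d2) ** (-d2 / 2)
    p = (1 + 2 * f / df_within) ** (-df_within / 2)
    return f, (df_between, df_within), p


# Hypothetical standardized test scores by school type.
public = [68, 72, 75, 70, 74]
private = [80, 78, 85, 82, 79]
home = [73, 77, 71, 76, 74]
f, dfs, p = one_way_anova(public, private, home)
print(f"F{dfs} = {f:.2f}, p = {p:.2g}")
```

Keep in mind that a significant F only tells you that at least one group mean differs; working out which groups differ usually calls for a follow-up (post hoc) comparison.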

It's worth pointing out that in these examples we're specifically referring to what's called a one-way ANOVA. But as always, there are multiple types of ANOVA for different applications, and of course you need to select the right one, or your results will be pretty meaningless. Thankfully, our free statistics cheat sheet, which I mentioned a little bit earlier, makes this task really quick and easy.

So be sure to grab a copy of the cheat sheet for free using the link in the description. Next up, let's look at the chi-square test. So while the t-test and ANOVA test for differences in the means of groups, the chi-square test allows you to see if there's a difference in the proportions of various categories.

In statistics speak, the chi-square test allows you to assess whether there's a statistically significant relationship between two categorical variables, as opposed to numerical ones. As an example, you could use a chi-square test to check if there's a link between gender (for example, male or female) and preference for a certain category of vehicle (for example, sedans or SUVs). Similarly, you could use a chi-square test to see if there's a relationship between the type of breakfast that people eat (let's say cereal, toast, or fruit) and their university major (let's say business, math, or engineering).

As you can see, all of these are categorical variables and so chi-square allows you to assess whether there is a statistically significant relationship between them. If you're not yet familiar with different types of variables, for example, categorical and numerical, then be sure to check out our Variables 101 explainer video. And as always, you can find that link in the description.
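For the breakfast-and-major example, a chi-square test of independence compares each observed count with the count you'd expect if the two variables were unrelated (row total times column total, divided by the grand total). The sketch below uses a hypothetical 3-by-3 table of counts. Since (3 - 1) x (3 - 1) = 4 degrees of freedom is even, it can use a simple closed-form p-value that only holds for even degrees of freedom. The 2-by-2 gender-and-vehicle example has 1 degree of freedom, which this sketch rejects; a library routine such as `scipy.stats.chi2_contingency` covers that case too.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees_of_freedom, p_value). The p-value uses the
    closed-form survival function, which only holds for even degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    if df % 2 != 0:
        raise ValueError("this sketch needs an even number of degrees of freedom")
    half = stat / 2
    # For df = 2k: P(X > stat) = exp(-stat/2) * sum_{i=0}^{k-1} (stat/2)^i / i!
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    return stat, df, p


# Hypothetical counts: rows = breakfast (cereal, toast, fruit),
# columns = major (business, math, engineering).
counts = [
    [20, 10, 10],
    [10, 20, 10],
    [10, 10, 20],
]
stat, df, p = chi_square_independence(counts)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```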

All right, so now let's kick things up a notch and talk about correlation analysis. Correlation analysis looks at the relationship between two numerical variables, for example, someone's height and weight, to assess whether they move together in some way. In other words, whether an increase in the value of one variable is likely to be accompanied by an increase or a decrease in the value of another variable. In statistics speak, correlation assesses whether there's a statistically significant relationship between two numerical variables. For example, you might find a positive correlation between the number of hours that students spend studying and their average exam scores.

Similarly, a correlation analysis may reveal a negative relationship between the amount of time spent watching TV and physical fitness levels. When you run a correlation analysis, you'll be presented with a correlation coefficient, which is also known as an r value. This will be a number between negative 1 and positive 1. A value close to positive 1 means that the two variables reliably move together in the same direction: as one goes up, the other one goes up.

On the flip side, a correlation coefficient close to negative 1 means that the two variables move in opposite directions. In other words, as one goes up, the other one tends to go down. A value close to 0 suggests little or no linear relationship between the two.
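Here's a minimal sketch of Pearson's r, the most widely used correlation coefficient, along with the usual significance test for it: t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. The study-hours data are hypothetical, and the t-distribution helper is a standard closed-form series included only so the code runs on the standard library. `scipy.stats.pearsonr` does the same job, and Python 3.10+ offers `statistics.correlation` for the coefficient alone.

```python
import math
import statistics


def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    if df == 1:
        a = 2 * theta / math.pi
    elif df % 2 == 0:
        term = total = 1.0
        for k in range(2, df - 1, 2):
            term *= (k - 1) / k * c2
            total += term
        a = s * total
    else:
        term = total = 1.0
        for k in range(3, df - 1, 2):
            term *= (k - 1) / k * c2
            total += term
        a = 2 / math.pi * (theta + s * c * total)
    return 1 - a


def correlation_test(x, y):
    """Pearson's r for paired samples x and y, plus its two-sided p-value."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    df = len(x) - 2
    if 1 - r * r <= 0:  # perfect correlation: the t statistic is infinite
        return r, 0.0
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return r, t_two_sided_p(t, df)


# Hypothetical data: weekly study hours and average exam scores for 8 students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 58, 67, 70, 74, 79]
r, p = correlation_test(hours, scores)
print(f"r = {r:.3f}, p = {p:.2g}")
```

A strong, significant r like this one still says nothing about whether studying causes the higher scores.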

Now, it's really important for me to highlight here that while correlation analysis can help you understand how two variables are related (in other words, how they move together), it doesn't prove that one variable causes the other. Correlation is not causation. So be careful not to assume that one variable causes another when you're looking at correlation data.

All right, last but not least, let's look at regression analysis. Now, while correlation allows you to see whether there's a relationship between two numerical variables, regression takes it a step further by allowing you to make predictions about the value of one variable, called the dependent variable, based on the value of another variable or set of other variables, called the independent variables. Let's look at an example to make this a little more tangible.

You could potentially use regression analysis to predict the price of a house based on the number of bedrooms it has, its location, and when it was built (in other words, its age). Regression analysis of housing data in the area would give you a regression equation that allows you to plug in those factors (the independent variables) to estimate a specific house price (the dependent variable).

In the same way, you could use regression analysis to predict a person's weight based on their height, their age, and their daily calorie intake. Now, it's worth pointing out that in these examples we've specifically been talking about multiple regression, as there are multiple independent variables.
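The housing example can be sketched as ordinary least squares multiple regression, solved through the normal equations (X^T X) b = X^T y in plain Python. Everything here is hypothetical: the data are made up and deliberately noise-free so you can see the fit recover the equation price = 50,000 + 30,000 x bedrooms - 1,000 x age + 40,000 x central, and location is reduced to a single made-up 0/1 "central" flag (a so-called dummy variable). Real data would never fit exactly, and beyond a toy example a routine such as `numpy.linalg.lstsq` or the OLS model in statsmodels is more numerically robust than solving the normal equations by hand.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple_regression(features, target):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(row) for row in features]  # leading 1.0 = intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Plug independent-variable values into the fitted regression equation."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Hypothetical, noise-free data: (bedrooms, age in years, central location 0/1).
houses = [(2, 10, 0), (3, 5, 1), (4, 20, 0), (3, 15, 1), (5, 8, 0), (2, 30, 1), (4, 2, 1)]
prices = [100_000, 175_000, 150_000, 165_000, 192_000, 120_000, 208_000]

coefs = fit_multiple_regression(houses, prices)
print("intercept, bedrooms, age, central:", [round(c, 2) for c in coefs])
print("predicted price:", round(predict(coefs, (3, 12, 1))))
```

With these numbers, a hypothetical three-bedroom, 12-year-old central house comes out at an estimated 168,000, which is just 50,000 + 90,000 - 12,000 + 40,000 from the equation above.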

While multiple regression is certainly a popular form of regression analysis, there are many others. So as always, be sure to do your research before selecting a specific statistical test, or grab our cheat sheet to simplify this process. Now, as with correlation, it's important to keep in mind that regression analysis alone doesn't prove causation. While it can help you make predictions, it can't prove that a change in one variable causes a change in another variable.

If you want to establish causality, you'll typically need a very specific research design that allows you to control for all, or at least most, of the relevant variables. You can check out our research design explainer video if you're keen to learn more about that. If you got value from this video, please do hit the like and subscribe buttons to help more students find this content.

If you'd like hands-on help with your research project and with your quant analysis, do check out our private coaching service where we hold your hand throughout the research process step by step. You can learn more about that and book a free consultation over at gradcoach.com. Until next time.

Good luck.