Introduction to Statistics and Its Applications

Hi, I'm Adriene Hill, and this is Crash Course Statistics. Welcome to a world of probabilities, paradoxes, and p-values. There will be games, and thought experiments, and coin flipping.

A lot of coin flipping. Statisticians love to talk about coin flipping. By the time we finish the course, you'll know why we use statistics, and how, and what questions you ought to be asking when you run across statistics in the world. Which is all the time.

Statistics can help you make a guess whether or not you're going to be accepted to Harvard. Marketers use them to sell us gold lamé pants. Netflix uses stats to predict what show we might want to watch next. You use statistics when you look at the weather forecast and decide what to wear, dress or jeans.

Policymakers use them to decide whether or not to invest in more early childhood education, whether or not to spend more on mental health services. Statistics is all about making sense of data and figuring out how to put that information to use. Today we're going to answer the question, what is statistics?

The legend says that during a late 1920s English tea at Cambridge, a woman claimed that a cup of tea with milk added last tasted different than the tea where the milk was added first. The brilliant minds of the day immediately began to think of ways to test her claim. They organized 8 cups of tea in all sorts of patterns to see if she really could tell the difference between the milk-first and the tea-first cups.

But even after they had seen her guesses, how could they really decide? Because she'd get about half the cups right just by randomly guessing either milk first or tea first, and even if she really could tell the difference, it's completely possible that she would miss a cup or two.
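To put rough numbers on "lucky guesser," here is a minimal sketch of how often pure guessing would produce a good score. It assumes each of the 8 cups is an independent 50/50 guess, which is a simplification and not necessarily how the tea test was actually set up or analyzed. The function name `chance_of_at_least` is made up for this illustration.

```python
from math import comb


def chance_of_at_least(k: int, n: int = 8, p: float = 0.5) -> float:
    """Probability of getting at least k of n cups right by guessing alone.

    Assumption: every cup is an independent guess that is right with
    probability p (a binomial model).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# How surprising would 6, 7, or 8 correct cups be if she were only guessing?
for k in (6, 7, 8):
    print(f"at least {k} of 8 correct by luck: {chance_of_at_least(k):.4f}")
```

Under this assumption, a perfect score by luck alone is rare, while six out of eight is not especially unusual, which is why a cup or two either way makes the decision hard.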

So how could you tell if this woman was actually a tea savant? What's the line between lucky tea guesser and tea supertaster? As fate would have it, future super statistician and part-time potato scientist, Ronald A. Fisher, was in attendance.

During his lifetime, Fisher began work that set the stage for a large portion of statistics, which is the focus of this series. These statistics can help us make decisions in uncertain situations, tea taste tests and beyond. Fisher's insights into experimental design helped turn statistics into its own scientific discipline.

And although Fisher didn't publish results of the tea test, the story has it that the woman sorted all the tea cups correctly, just in case you are curious.

At this point, it's worth mentioning that there are two related but separate meanings of the word statistics. We can refer to the field of statistics, which is the study and practice of collecting and analyzing data, and we can talk about statistics as in facts about, or summaries of, data. To answer the question, what is statistics, we should first ask the question, what can statistics do?

Let's say you wake up at your desk after a long evening studying for finals with a cheeseburger wrapper stuck to your face. And you wonder, why do I eat this stuff? Is fast food controlling my life? But then you tell yourself, no, it's just super convenient.

But you're worried. You're thinking about how great it is that McDonald's serves breakfast all day, right now. But maybe that's normal, right?

I mean, finals are this week. So you google the question, fast food consumption, and you find the results of a fast food survey. The first thing you might do is start asking questions that interest you.

For example, you could ask, Why do people eat fast food? Do people eat more fast food on the weekend than on weekdays? Does eating fast food stress me out?

Now that we have some interesting questions, we need to ask ourselves an even more important one. Can questions like these be answered by statistics? Like I mentioned earlier, statistics are tools for us to use, but they can't do all the heavy lifting.

To answer the question about why people eat fast food, you can ask them to fill out a questionnaire. But you can't know whether their answers truly represent what they're thinking. Maybe they answer dishonestly because they don't want to admit that they scarf McDonald's because they're too tired to cook dinner, or because they're ashamed to admit they think Del Taco is delicious, or because none of the given answers represented their reasons, or they may not really know why they eat fast food.

Armed with the results of the survey, you could tell that the most common reason that people reported eating fast food was convenience, or that the average number of meals they eat out in a week is five. But you're not truly measuring why people eat so much fast food. You're measuring what we call a proxy, something that is related to what we want to measure, but isn't exactly what we want to measure.

To answer whether people eat more fast food on the weekends, or whether eating it more than twice a week increases stress, we'd not only need to know how much people are eating fast food, which our questionnaire asked, but also which days they eat it. And we'd need an additional measure of stress. You can use statistics to give a good answer about whether you're going through the drive-thru more on the weekend, but even the question of whether eating fast food is associated with higher levels of stress is hard to answer directly.

What is stress? And how can we measure it? And are people eating fast food because they're stressed?

Or does eating all those calories make them stressed? It's often the case that some of the most interesting questions are the ones that can't be answered by statistics, like why people eat fast food. Instead, we find questions that we can answer, like whether people who eat fast food often work more than 80 hours a week.

The tools we use to answer these questions are statistics, plural. And there are two main types, descriptive and inferential. Descriptive statistics describe what the data show. Descriptive statistics usually include things like where the middle of the data is, what statisticians call measures of central tendency, and measures of how spread out the data are.

They take huge amounts of information that may not make much intuitive sense to us, and compress and summarize them to hopefully give us more useful information. Let's go to the Thought Bubble. You've been working for two years in a local waffle factory.

Day in and day out, you create the golden browniest, tastiest frozen waffles ever created. The holes are perfectly spaced, screaming for syrup. And now you want a raise. You deserve a raise. No one can make a waffle as well as you can.

But how much do you ask for? An extra thousand dollars? An extra five thousand dollars?

You know you're valuable, but have no idea what other waffle makers get paid. So you dig around online and find there's an entire subreddit devoted to waffle makers, and someone, username waffleleaks, has posted a spreadsheet of waffle makers' salaries. Now with a quick glance at this huge list of numbers, you can see whether the woman who works at a similar job at the rival frozen waffle company makes more than you.

You can see how much more you're making than the new guy, who's just now learning to mix batter. But you still don't know much about the paychecks of your waffle company as a whole, or the industry, 'cause it turns out there are thousands of waffle makers out there, and all you see is a list with data points, not patterns that can help you learn more about how much you might be able to convince the boss to pay you. Here is where descriptive statistics come in. You could calculate the average salary at your company, as well as how spread out everyone's salaries are around that average.

You'd be able to see whether the CEO's paychecks are relatively close to the entry-level batter makers', or incredibly far away, and how your salary compares to both of their salaries. You could calculate the average salary of everyone in the industry with your job title, and see the high and low end of that pay. And then, armed with those descriptive statistics, you could confidently walk into the waffle boss's office and demand to be paid for your talents.
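As a rough sketch of what those descriptive statistics might look like, the snippet below summarizes a small list of salaries. The numbers are invented for illustration only (the waffleleaks spreadsheet is not reproduced here), and the helper name `describe` is hypothetical.

```python
import statistics

# Hypothetical annual salaries in dollars, invented for this example.
# The last value stands in for a CEO-sized paycheck.
salaries = [31000, 33500, 34000, 36000, 38500, 41000, 45000, 250000]


def describe(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),      # the average
        "median": statistics.median(values),  # the middle value
        "stdev": statistics.stdev(values),    # sample standard deviation (spread)
        "low": min(values),                   # low end of the pay range
        "high": max(values),                  # high end of the pay range
    }


summary = describe(salaries)
for name, value in summary.items():
    print(f"{name:>6}: {value:,.0f}")
```

With made-up data like this, one very large paycheck pulls the average well above what most people on the list earn, which is one reason to look at the spread and the high and low ends along with the average.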

Thanks, Thought Bubble! While descriptive statistics can be great, they only tell us the basics. Inferential statistics allow us to make inferences. Clever namers, those statisticians.

Inferential statistics allow us to make conclusions that extend beyond the data we have in hand. Imagine you have a candy barrel full of saltwater taffy, some pink, some white, some yellow. If you wanted to know how many of each color you have, you could count them, one by one by one. That'd give you a set of descriptive statistics. But who has time for all that candy counting?

Or you could grab a giant handful of taffy and count just those you've pulled out, which would be using descriptive statistics. If your candy was, in fact, mixed pretty evenly throughout the barrel, and you got a big enough handful, you could use inferential statistics on that sample to estimate the content of the entire taffy stash. We ask inferential statistics to do all sorts of much more complicated work for us. Inferential statistics let us test an idea or hypothesis, like answering whether people in the US under the age of 30 eat more fast food than people over 30. We don't survey every person to answer that question. Let's say someone tells you that their new brain vitamin, Smartivite, improves your IQ.

Do you rush out and buy it? What if they told you that the average IQ increase for Group A, 20 people who took Smartivite for a month, was 2 IQ points, and the average IQ increase for Group B, 20 people who took nothing, was 1 IQ point? How about now?

Still not sure? It's a pretty small difference, right? Inferential statistics give you the ability to test how likely it is that the two populations we sampled actually have different IQ increases. However, it's up to you as an individual to decide whether that's convincing or not. And don't be alarmed if the bar you set isn't the same in every situation.
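One way to make this kind of test concrete is a permutation test, sketched below. This is just one possible approach, not necessarily the one the series uses. It asks how often a gap at least as large as the observed one would show up if the group labels were shuffled at random, in other words, if the vitamin made no difference. Only the two group averages come from the example above; the individual IQ changes, the spread of 5 points, and the names `make_group` and `permutation_p_value` are all assumptions made up for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def make_group(mean, n=20, spread=5.0):
    """Invent n IQ changes whose average is exactly `mean` (hypothetical data)."""
    raw = [rng.gauss(0, spread) for _ in range(n)]
    offset = mean - statistics.fmean(raw)
    return [x + offset for x in raw]


def permutation_p_value(a, b, trials=10_000):
    """Share of random relabelings whose mean gap is at least the observed gap."""
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = a + b  # new list, so a and b are left untouched
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        gap = statistics.fmean(pooled[: len(a)]) - statistics.fmean(pooled[len(a):])
        if gap >= observed:
            hits += 1
    return hits / trials


group_a = make_group(2.0)  # took the vitamin, average increase of 2 points
group_b = make_group(1.0)  # took nothing, average increase of 1 point
p = permutation_p_value(group_a, group_b)
print(f"share of shuffles with a gap of at least 1 point: {p:.3f}")
```

The result is a proportion, not a verdict. A large share means a 1-point gap is easy to get by chance alone; a small share means it is harder to explain away. Where to draw the line is still the judgment call described above.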

It's entirely okay to have different standards for the questions, does my cat like Fancy Feast more than Meow Mix, versus does this drug cure lung cancer? It might take more evidence to convince you to take a new, supposedly cancer-curing drug than to switch cat food brands. It should take more evidence to convince you to take a new, supposedly cancer-curing drug than to switch cat food brands.

With inferential tests, there will always be some degree of uncertainty, since they can only tell you how likely something is or is not. Your job is to take that information and use it to make a decision despite that uncertainty. If statistics were a superhero, its bat call would be uncertainty, and its tagline would be, when you don't know for sure, but doing nothing isn't an option. Statistics are tools.

Statistics help us make sense of the vast amount of information in the world. Just like our eyes and ears filter out unnecessary stimuli just to give us the best, most useful stuff, statistics help us filter the loads of data that come at us every day. Descriptive statistics make the data we get more digestible, even though we lose information about individual data points. Inferential statistics can help us make decisions about data when there's uncertainty, like whether Smartivite will actually increase your IQ.

But statistics can't do all the work. They're here to help us reason, not to reason for us. They can help us see through uncertainty, but they don't get rid of that uncertainty. To push our tool analogy a step further, statistics, like chainsaws, are pretty useless, even dangerous, without understanding how they work. We need to know how to use them, and how not to use them.

As we'll see in later episodes, statistics done poorly can lead us to some pretty silly conclusions. And chainsawing done poorly leads to about 36,000 injuries in the US each year, 81% of which are lacerations. Did you know that almost no one dies because of chainsaw injuries? Once in a while, but it's very rare. 95% of the people who are hurt by chainsaws are male.

This does not necessarily tell us that males are significantly worse chainsawers. Statistics can help us plan a vacation to Bali in December. They can help us optimize our chances of winning our fantasy football league.

They can help us budget our meal card at college. Statistics can help us decide whether that additional insurance the guy at Best Buy is trying to sell us on our new blender is actually worth it. Statistics can also help us decide whether or not to go ahead with an invasive heart surgery.

Statistics can help NGOs optimize the amount of food aid they send to refugee camps. They can help policymakers decide if they should spend more or less money on helping students pay back their school loans, and can help you decide how much money you should be comfortable borrowing for college in the first place. There's a lot statistics can help us with, but some things statistics can't do.

Thinking statistically means knowing the difference. So when your brother says he used statistics to prove that your mom loves him more, you can rest easy, knowing the only question he answered is whether she gives him slightly more ice cream each night. And you've got data suggesting she gives you extra sprinkles. Thanks for watching, I'll see you next time.

Crash Course Statistics is filmed in the Chad and Stacey Emigholz Studio in Indianapolis, Indiana, and it's made with the help of all these nice people. Our animation team is Thought Cafe. If you'd like to keep Crash Course free for everyone, forever, you can support the series at Patreon, a crowdfunding platform that allows you to support the content you love.

Thank you to all our patrons for your continued support. Crash Course is a production of Complexly. If you like content designed to get you thinking, check out some of our other channels at Complexly.com. Thanks for watching.
