Transcript for:
Statistics Overview and Concepts

Take a second and memorize everything on this desk. Got it? Don't forget the corners. Okay, now look away. Now back to me. Where did the tuna casserole go? Oh, who knows? But no matter what, most of us probably only remember seven things, or maybe only four. Our brains are amazing organs full of astounding capabilities, but in some ways they can be surprisingly limited. For starters, they can barely keep track of half a dozen items at the same time. But don't take it personally. It's normal not to remember big quantities of things, unless you put in weeks of work to memorize 3,141 digits of pi and still forget them anyway. And that's a big deal because, surprise surprise, lots of important stuff we need to make sense of and understand comes in large numbers. Like, is this two-bedroom apartment in South Bend really a better deal than the one-bedroom in El Paso? If a restaurant has thousands of one- and five-star reviews, is that the same as lots of three-star reviews? How do we know the planet is warming? Can we make sweeping statements about the billions of people we share the planet with? And even for things we know, how do we know we know them? You know, statistics can lead us to a lot of meaningful conclusions. It shows us a whole bunch of ways to get data and transform that data into meaningful insights about whatever it is we care about. Hi, I'm Sabrina Cruz, and this is Study Hall: Real World Statistics. [Music]

You run into statistics all the time. It's the mix of math and science that helps us make sense of data, which is pretty much all the information swirling around us. That involves collecting, analyzing, interpreting, and presenting data in order to arrive at a conclusion. When we do statistics, we collect and analyze data related to a question. Then we interpret and present it using tests, charts, and graphs to find the answers we're looking for. And again, we humans love to categorize and interpret things.
So it's probably not surprising that we've been doing this for thousands of years. Take ancient Egypt. Thousands of years ago, these early data nerds needed to oversee tax collection. To figure out how much residents would be forking over, the Egyptians systematically recorded how big plots of land were and who owned them. Using that data, they figured out how much each resident would pay. Fast forward many centuries, and things got way spicier. Important pillars of statistics as we know it today took shape in the mid-17th century, ushered in by prominent mathematicians like Pierre de Fermat and Blaise Pascal. Both were curious about games of chance and gambling, and their work laid the foundations of probability theory. Basically, they showed that math could help you figure out your odds of winning or losing, which is a pretty big deal if you're a gambler. In the following centuries, many mathematicians and scientists contributed ideas to statistics. Then came computers. These beautiful data machines have changed our whole world and made doing statistics so much easier. Before them, you'd be doing all of this by hand, unless you got familiar with one of these bad boys. But the invention of computers is itself rooted in statistics. In 1888, the US Census Office was struggling to finish compiling statistics on the country's population, then a very large 50 million people, before the next census was due to start. That spurred the development of what was basically a new kind of computer: the tabulator, a number-crunching machine specifically designed for statistical calculations. Today, the link between statistics and computers is still going strong. These devices generate mountains of data on everything from what your neighborhood looks like from space to what sequence of online ads nudges someone into buying more tuna casseroles. Wait a second.
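As a tiny illustration of the kind of odds question that probability answers, take this game: what is the chance of rolling at least one six in four rolls of a fair die? The game is our own example and does not come from the episode. The Python sketch below finds the exact answer by counting every equally likely outcome, with no estimation involved.

```python
from fractions import Fraction
from itertools import product

def prob_at_least_one_six(num_rolls: int) -> Fraction:
    """Exact chance of at least one six in num_rolls rolls of a fair six-sided die.

    Lists every equally likely sequence of rolls and counts the winning ones.
    """
    outcomes = list(product(range(1, 7), repeat=num_rolls))  # all 6**num_rolls sequences
    wins = sum(1 for rolls in outcomes if 6 in rolls)         # sequences containing a six
    return Fraction(wins, len(outcomes))

p = prob_at_least_one_six(4)
print(f"{p} ~ {float(p):.3f}")  # prints 671/1296 ~ 0.518
```

Of the 6^4 = 1,296 possible sequences, 671 contain a six, so the bet wins slightly more than half the time (about 51.8%). The complement rule gives the same answer: 1 - (5/6)^4.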
The only practical way to analyze the enormous quantities of data we're making every day is with computers themselves. And that symbiosis has led to new ways of doing statistics and of learning from data on a fundamental level. Learning from data is how your smartphone camera automatically recognizes and focuses on faces. Creepy, cool, effective. So for thousands of years, we've been doing statistics. Exactly what that looks like might have evolved, but we're still trying to make sense of the world by discovering and sifting through information. If you're trying to get a handle on the transformative technologies of our age, statistics is a pretty good place to start. But it's far from the only reason to study statistics. Statistics is the bread and butter of how we learn things about the world, from the really big questions to the really small ones. So forget transformative. Sometimes you just want to know basic stuff. Take sleep therapist Itana. Based on her observations of the sleepy, phone-obsessed teenagers at her local high school, she ponders the question: does the amount of time teenagers in her town spend looking at screens affect their sleep quality? She finds research suggesting that screen time might interfere with the body's internal clock, and that teenagers spend a lot of time looking at screens. So Itana constructs a hypothesis, or a proposed explanation for an observation. She speculates that more daily screen time reduces teenagers' sleep quality. Specifically, she bets that teenagers who spend more than 4 hours a day looking at screens have lower sleep efficiency, meaning they spend significantly less of their time in bed at night actually asleep. Itana's hypothesis is about a population. In statistics, the population is the entire collection of things we want to know more about, be it people, animals, stars, or jelly beans. In Itana's case, the population is every teenager in her town.
Her hypothesis makes a prediction about a parameter, or a descriptive measure or characteristic of the population. In this case, it's a measure of sleep efficiency. Itana wants to know the difference in average sleep efficiency between teenagers who get more than 4 hours of daily screen time and those who get less. Her hypothesis predicts that the value of this parameter will be negative. In other words, the sleep efficiency of the first group is lower than that of the second. Estimating this parameter means studying real teenagers. The problem is that she doesn't have the time or resources to recruit all of the thousands of teenagers in town into a study. So instead, she uses a sample. A sample is a smaller group that we take from a population to study in a statistical analysis, hoping to learn something about the population as a whole. Typically, we'd need a pretty big sample to make a statement about a lot of teenagers, but for the sake of our example, she recruits 68 teenagers from the high school into the study. From her sample, she won't be able to measure the parameter, which would require the whole population, but she can calculate a statistic. A statistic is the same kind of measure as a parameter, except that it's calculated from just the sample to estimate the true parameter of the population. And measuring this statistic means collecting data about her sample. Specifically, data are the raw facts, figures, or information we gather, and they can be pretty much anything: numbers, words, categories, images, you name it. But the data Itana needs to gather are the ones related to her variables. A variable is a feature we measure because it varies across our data. If it didn't vary or change between members of the population, it wouldn't be worth studying. Judgy, I know, but true. Itana's variables are the total daily screen time each teenager self-reports and each teenager's sleep efficiency.
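Here is a small Python simulation that makes the difference between a parameter and a statistic concrete. Everything in it is an assumption for illustration: a hypothetical town of 5,000 teenagers, screen time drawn uniformly between 0 and 9 hours, and a made-up linear link between screen time and sleep efficiency plus random noise. None of it is Itana's data or method. We invent the entire population, so we can compute the parameter directly and compare it with the statistic from a random sample of 68.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so every run draws the same simulated town and sample

def make_teen():
    """One hypothetical teenager: (daily screen hours, sleep efficiency in %).

    ASSUMPTION: screen time is uniform on 0-9 hours and efficiency follows a
    made-up rule, 90 - 2.5 * hours plus Gaussian noise, clipped to 0-100.
    """
    hours = random.uniform(0, 9)
    efficiency = 90 - 2.5 * hours + random.gauss(0, 5)
    return hours, max(0.0, min(100.0, efficiency))

def efficiency_gap(group):
    """Mean efficiency of teens with >4 h of screen time minus teens with <=4 h."""
    heavy = [eff for hours, eff in group if hours > 4]
    light = [eff for hours, eff in group if hours <= 4]
    return mean(heavy) - mean(light)

# Population: every teenager in the hypothetical town of 5,000.
population = [make_teen() for _ in range(5000)]
parameter = efficiency_gap(population)   # only computable because we simulated everyone

# Sample: 68 teens drawn at random without replacement, as a real study might do.
sample = random.sample(population, 68)
statistic = efficiency_gap(sample)       # our estimate of the parameter

print(f"parameter (all 5,000 teens): {parameter:.1f}")
print(f"statistic (sample of 68):    {statistic:.1f}")
```

A different seed draws a different sample, which will usually give a slightly different statistic. That sample-to-sample wobble is why it matters how confident we can be in a conclusion drawn from just 68 people.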
She measures sleep efficiency by giving the teenagers in her sample wristwatch-like sensors that track restlessness during the night. The sensors produce a number between 0 and 100: the percentage of time in bed spent sleeping. After collecting data over the course of 2 weeks from the sensors and surveys, Itana analyzes it with techniques we'll pick up in upcoming episodes and finds that her statistic is -15. In other words, the teenagers with more than 4 hours of screen time a day had a sleep efficiency 15 percentage points lower than the group that didn't look at screens as much. Maybe I should make some lifestyle changes. Or not. Based on her calculations, Itana concludes that screen time is probably affecting the sleep of the teenage population in her town. And although we didn't dive into her precise methods, Itana's screen-addled sleep story illustrates the core dynamics of research and statistics: framing hypotheses about the world in terms of a parameter, which we estimate using data on our variables. As simple as it seems, that process helped Itana learn about what might be affecting local teenagers' sleep quality. But there are still plenty of questions ahead for our intrepid sleep sleuth. After all, how confident can she be in her conclusion? And how can she be sure that the data from her sample, the 68 students she recruited into the study, really tell her about all the teenagers in her town? These sorts of questions are crucial for every field of science and engineering, and for plenty of other disciplines. Even outside of research, statistics are used everywhere, from governments to NGOs, businesses, and more. These organizations use statistics to make sense of the world around them, what's going on inside their own operations, and how to respond. If industry prices for something exotic like Kraft American Singles are shooting up, maybe I'll have to cut back on my tuna budget, which means, oh my god, my casserole budget.
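The Python sketch below shows the arithmetic behind sleep efficiency and a group-difference statistic like Itana's. The six teens and their minutes are invented. We chose them so that the result mirrors the -15 in the story, but they are not Itana's data. A real analysis would average many nights per teen and use far more teens. The function name `sleep_efficiency` is our own.

```python
from statistics import mean

def sleep_efficiency(minutes_asleep: float, minutes_in_bed: float) -> float:
    """Percentage (0-100) of time in bed actually spent asleep."""
    if minutes_in_bed <= 0:
        raise ValueError("minutes_in_bed must be positive")
    return 100 * minutes_asleep / minutes_in_bed

# Invented example nights: (daily screen hours, minutes asleep, minutes in bed)
teens = [
    (6.0, 336, 480),  # 70% efficiency
    (5.5, 360, 480),  # 75%
    (7.0, 312, 480),  # 65%
    (2.0, 408, 480),  # 85%
    (3.0, 432, 480),  # 90%
    (1.5, 384, 480),  # 80%
]

# Split at the hypothesis threshold: more than 4 hours of screen time vs. the rest.
heavy = [sleep_efficiency(asleep, in_bed) for hours, asleep, in_bed in teens if hours > 4]
light = [sleep_efficiency(asleep, in_bed) for hours, asleep, in_bed in teens if hours <= 4]

# The statistic: difference in mean efficiency (heavy minus light), in percentage points.
gap = mean(heavy) - mean(light)
print(f"heavy: {mean(heavy):.1f}%, light: {mean(light):.1f}%, gap: {gap:.1f} percentage points")
# prints heavy: 70.0%, light: 85.0%, gap: -15.0 percentage points
```

Sleep efficiency is already a percentage, so a gap of -15 means 15 percentage points. It does not mean a 15% relative drop: 70 is about 18% below 85.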
For all these reasons, there are lots of opportunities and growing demand for people with statistics backgrounds in just about every field. But there's an arguably even more important reason to learn statistics. The stories used to describe our world, define agendas, and convince people what to do are often driven by statistics. When news anchors talk about GDP growth, inflation, and recessions, they use statistics to paint a picture of the economy. In public health, the effectiveness of drugs and vaccines, the risk of a new pathogen, and the cost of treatments are also presented as statistics. Whether it's deciding on infrastructure projects, judging whether organizations are discriminatory, or even just deciding who the GOAT is in sports, statistics are used to build narratives, and those narratives can lead to problems. Without a decent knowledge of statistics, you might be pushed into accepting biased framing or making a decision without all the relevant information. And as we'll explore throughout this series, statistics isn't a rigid procedure that always arrives at perfect truth. The people doing the statistics make choices, and those choices could be driven by agendas other than honest, open-minded truth-seeking. By teaching you what questions to ask and how to ask them, the study of statistics helps you find gaps in an argument, develop new perspectives, and get closer to the truth. Whether it's getting a better understanding of climate change or figuring out how far the Study Hall team will take this tuna casserole bit during this course based on what they did with lasagna in Intro to Programming, statistics is your toolkit for knowing pretty much everything. Our brains love statistics, and the more we master this field, the more we know about what we know. If you're enjoying this series and are interested in taking the full Study Hall Real World Statistics course and earning college credit from ASU, check out gostudyhall.com or click on the button to learn more.
And if you want to help us out, give this video a like, comment your average screen time if you dare, and smash that subscribe button. Hurt your phone, damage your screen. Thank you for watching. See you next time.