Probability and Statistical Inference Overview

The majority of the course can be split into two parts. The first part considers probability: we start from a population for which we have a model and whose characteristics we understand, and from that population-level model we infer the chances of observing a sample of some kind. Later in the course we'll be mostly concerned with statistical inference. Here all we have access to is a sample from the population, and provided that sample was collected in such a way that it possesses certain properties, we can infer characteristics of the population itself. We can estimate these characteristics, quantify the uncertainty in those estimates, and give regions of plausible values for these attributes of the population. We can also use statistical inference to answer statistical questions about that population of interest. So let's get an overview of what we'll be going into throughout the course. When working in probability, we start from a population, which represents the entire collection of individuals or objects that will be considered in our study. We then condense that information into a probabilistic model that describes the behavior of some attributes of the population with some kind of smooth functional relationship. That relationship describes the uncertainty in the individuals or subjects being studied in the population. We use this population-level model to infer the chances of something occurring. Basically, we'll compute probabilities of observing a sample with certain characteristics. A sample is just a subset of the entire population: a small selection of individuals or objects taken from the entire collection. So if we have our population model, we can describe the chances of observing samples with certain characteristics.
And understanding these population-level models will allow us to make inferences later on about unknown populations. We will use the language of probability to describe our uncertainty. For example, if we did have some knowledge of the population, we could describe the chances of something occurring. The State Department reported at some point in time that 36% of all Americans have a passport. It says all Americans, so they did some kind of large-scale study, essentially approaching a census. So we're assuming that 36% of all Americans have a passport. This is what we call a parameter: it describes some attribute of that population of Americans. If we know the population, we may want to answer probability questions about taking samples from it. Suppose we take a sample of 20 Americans at random and we want to know the probability that exactly 10 of them have a passport. We take a sample of size 20, which we typically write as n = 20, the number of observations. The event of interest is that half of them have a passport and the other half do not. To answer this question, we would use the way the sample was conducted, together with a probabilistic model and the known parameter. Later we'll see probabilistic functional forms that allow us to answer this question. Basically, we figure out all the ways of selecting 10 of the 20 people in the sample such that those 10 have a passport and the remaining 10 do not. And we'll be able to use the fact that we sampled in such a way that no individual had any influence on any other, and it will tell us that there was approximately an 8% chance of this occurring.
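As a rough sketch of the passport calculation just described, here is one way to compute it. This assumes the binomial model, which fits the description of counting the ways to choose 10 of the 20 people when individuals do not influence each other; the lecture itself only says the functional forms come later. The function name `binomial_pmf` is made up for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p (binomial model, assumed here)."""
    # comb(n, k) counts the ways to pick which k of the n people hold a passport.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 20    # sample size
p = 0.36  # population parameter: share of Americans with a passport
k = 10    # number of passport holders we ask about

prob = binomial_pmf(k, n, p)
print(f"P(exactly {k} of {n} hold a passport) = {prob:.4f}")
```

The printed value is about 0.078, which matches the roughly 8% chance quoted above.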
So it wasn't a large chance, mostly because only 36% of all Americans had a passport at the time this large-scale study was done. The other situation is where we'll be doing statistical inference. We will not be starting from the population. The population will be unknown. We will not even have an exact population-level model. We may have some idea of the model, but we may not know the parameters, the things that control how the model behaves. Instead, we take a sample from the large population and use it to infer characteristics of the population, provide estimates, and answer statistical questions. So when we go from the sample to the population, we're performing statistical inference, not probability. However, we still use the language of probability to quantify the uncertainty. We use the knowledge of certain properties in probability to make these statistical inferences. You can think of it as going from the sample, through the abstraction of the probabilistic model, to infer the unknowns in the large-scale population. We assume very little about the population, other than that we've conducted our sample in such a way that certain properties hold and that the population has certain reasonable characteristics. We won't assume too much, and we'll use the information from the sample to infer these unknown characteristics of the population. That will be the majority of the course: 80% of the course will be statistical inference, but we first need to start from the tools of probability to gain some understanding of how to quantify uncertainty. These types of problems can be simplified into the following form. Suppose we interview 20 Americans at random throughout the US, and in the sample we find that 8 of the 20 hold a passport. Then we ask: what can we conclude about the percentage of all Americans who hold a passport? Suppose we don't know about that large-scale study that said 36% of all Americans hold a passport.
All we have access to right now is the sample, and we want to infer the characteristics of the population. Specifically, what can we conclude about the percentage of all Americans who hold a passport just from this sample of 20 Americans? We can make a best-guess estimate. We could say 8 out of 20 is a reasonable guess, since we have no additional information. We say roughly 40% of all Americans probably hold a passport, but we know that we're just using a sample of 20 to arrive at this. So there's a lot of uncertainty here, and we need a way to quantify that uncertainty. It's obviously not going to be exactly 0.4; it's going to be somewhere between some number and some other number, creating some kind of error bounds around our best guess. We'll use these ideas to estimate parameters of the population using just our sample. But we need to make sure that our studies are conducted in such a way that the sample possesses certain characteristics. If the data we observe suggest that certain assumptions are met in the population itself, we can use certain types of statistical inference procedures; otherwise, if these assumptions do not hold, we may have to use other tools. But the main idea is that we start from a sample, get a best-guess estimate of a population-level parameter, and quantify the uncertainty around that guess. We may also want to do other things: for instance, we'll have some statement that someone believes to be true, and we try to disprove that statement using the data we've obtained from our sample. That'll be the main idea for the course going forward. Just always come back to it. Are you starting from the population and trying to compute a probability, where you have a probabilistic model and want the chances of a sample with certain characteristics occurring? Or do you only have the sample, and you're trying to go backwards and infer something about the population?
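To make the "best guess plus error bounds" idea concrete, here is a minimal sketch for the 8-out-of-20 sample. The lecture does not say which procedure produces the bounds, so this assumes a normal-approximation (Wald) interval at a 95% level. Both the method and the level are choices made for illustration, and with only 20 observations the approximation is rough. The function name `wald_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float = 0.95):
    """Point estimate and normal-approximation (Wald) interval for a proportion.

    Assumed method for illustration only; other procedures give other bounds.
    """
    p_hat = successes / n  # best guess: the sample proportion
    # Standard normal quantile for the chosen two-sided confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # z times the estimated standard error
    # Clip to [0, 1] because a proportion cannot fall outside that range.
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


p_hat, lower, upper = wald_interval(successes=8, n=20)
print(f"estimate = {p_hat:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval runs from roughly 0.19 to 0.61, which shows how much uncertainty a sample of only 20 leaves around the 0.4 estimate.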
Going from the sample to the population will be the primary case, but early in the semester we start from the other direction because we need to know how to do probability first. All right. So always come back to these ideas throughout the semester. Try to understand when you're doing statistical inference, when you're computing probabilities, and what the implications of each scenario are.
