Hi everyone! Welcome to our week that includes sampling and populations. This week, in addition to assignment one, which is just showing me you have downloaded SPSS, you will have a discussion board to complete. Do your readings and watch the videos.
Email, call, or text me with questions. There are several data sets we are using in this course. Your text walks you through learning how to do statistical analysis using what is known as the General Social Survey, or GSS, which samples the U.S. population to gather information about people's attitudes and some demographics. I will talk more about the GSS in just a bit. I expect that you will be practicing the methods you learn from the text using the GSS as we go.
But for your assignments in this course, I choose to have you learn about California because I think learning is more fun and interesting when we get to study something that is real and applicable to us. So the data set you will use for your graded assignments in this class is from the U.S. Census about California specifically. It is known as the American Community Survey, or ACS. The ACS is a survey the Census performs every year, and it has information about the entire U.S. population that is also divided into files for each state. Thus, we get to study California.
And even more interesting, if you ask me, we are studying the gender pay gap in California using actual data about real people, you and me. Pay close attention to what I say next. I have only included those people in our California file who work full-time year-round. This means, as you will notice soon in one of our assignments, that there are far fewer women than men in our data file.
The reason? Because far fewer women work full-time year-round than their male counterparts do. The ACS California file we use is based on data over a five-year period. There are one-year files, but I like to work with the five-year files because I think they better incorporate trends.
Currently, we are using the 2014-2018 file, but I update this file every few years so future classes may be hearing this and be using a later file. The ACS file you will use for your assignments is under Course Materials. You will need to download it and open it in SPSS for use in your graded assignments. And you will need to download the GSS demo and excerpt files for use in your text practice. I showed you in another video that the GSS files are on the companion website for your text.
If you are having trouble finding the companion website, I suggest first checking the syllabus, where I have placed a link to it, or searching online for the Adventures in Social Research companion website. So, the GSS datasets are online at the companion website, and the ACS dataset for California is under Course Materials.
Per usual, if you have any questions, email, call, or text me. When we analyze data, we are often analyzing samples of a target population. Samples are necessary because, for example, it would be quite difficult for us to ask every single person in the United States, the population, a question. In fact, this is only done every 10 years by the U.S. Census because it is costly, time-consuming, and still has error.
The General Social Survey, or GSS, takes its sample from a list provided by the U.S. Census and the U.S. Postal Service.
Presumably, this list allows for a sample to be taken from a list that represents every person in the US population. As your text will tell you, samples are most representative when every person or unit that you want to know about could possibly be included in the sample and has equal chances of being included. It's like putting everyone's name in a hat and drawing a certain number of those people's names out. Randomly sampling a population is the gold standard for any sample. There are many types of random samples that can also focus in on one group or another to help with reducing error about a smaller specific group of people.
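If it helps to see the names-in-a-hat idea concretely, here is a minimal Python sketch of a basic random sample. It is not part of the course materials or SPSS. The population list, the function name, and the sample size are all made up for illustration.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size units without replacement.

    Every unit in the population list has an equal chance of being drawn,
    like pulling names out of a hat.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical list standing in for "everyone's name in the hat".
population = [f"person_{i}" for i in range(1, 1001)]

# Draw 50 names; the seed only makes the draw repeatable.
sample = draw_simple_random_sample(population, sample_size=50, seed=1)
print(len(sample), "people drawn, no one drawn twice:", len(set(sample)) == len(sample))
```

The key point is that the draw only works if the list contains everyone you want to know about. Anyone missing from the list has no chance of being included.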
In this class we are learning about basic random samples. Just know there are different types of random samples. For social scientists, populations refer to an entire group of people you want to study.
For GSS and ACS, that is the entire U.S. population, but since we cannot survey the entire U.S. population every year, ACS and GSS take a sample of that population, and for our assignments we use a sample specific to California from the ACS. If you were studying how immigrants are faring in the U.S., and you wanted to conclude something about immigrants in your study results, then you would want to randomly sample from a list of all U.S. immigrants. Or, if you wanted to study and make conclusions about police officer attitudes towards people, then you would want a random sample from a list of all police officers. Samples almost always deviate from the population to a larger or smaller degree. The point is to have as low a sampling error as possible.
That is why we use random sampling, which ensures that every person you want to study has an equal chance of being included in the sample. Any good survey will report its sampling methods and its error. Error is a number that is produced from any sample as it relates to the population. The error number is used to estimate how far statistical estimates may deviate from the actual population number. More on that later and in your text.
In this class, as I have said or written before, our focus is less on the math involved in calculating error for any statistic we use and more on understanding that there is math happening for any statistic, knowing which statistic to use, knowing how to use tools such as SPSS to calculate statistics correctly, and knowing how to interpret the output. In this slide, we are going to have a look at what sampling error is by viewing the difference between a population mean and a sample mean. The mean, which we will go into in more depth later, is simply an average of a bunch of numbers.
Look at this visual. For an entire population, say California or immigrants or police officers or the entire U.S., there will be a population mean. Population just refers to the entire group of people being studied. But because we only survey a sample of those people we want to study, there is also a sample mean.
For example, take age. Let's say the mean age of people in California who work full-time is 40, and in our sample the mean age is 45. Our sample is off by five years. That's it. Of course, we cannot know the true population mean most of the time, but in our case, using the decennial census data (decennial means occurring every 10 years), we have a good number to represent the actual population mean.
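The age example can be written out as a tiny Python sketch. The two means are the ones from the example. The function name is made up for illustration, and nothing here is something you need for SPSS.

```python
def sampling_error(sample_mean, population_mean):
    """Return how far the sample mean lands from the population mean."""
    return sample_mean - population_mean


population_mean_age = 40  # mean age from the example
sample_mean_age = 45      # mean age in our sample from the example

error = sampling_error(sample_mean_age, population_mean_age)
print("Our sample is off by", error, "years")
```

A positive result means the sample mean is above the population mean, and a negative result would mean it is below.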
There are two things to take from this. One, the larger the sample size, to a point, the smaller the sampling error. I say to a point because people who use stats all the time also use something called a power analysis to decide how large a sample needs to be, because at some point increasing sample size does not reduce sampling error much. Thus, power here refers to roughly how many people must be sampled before there is little additional benefit to sampling more people. Some say a sample should be about 10% of the population.
Others say less or more. Using a power analysis can be more precise, but you do not have to know power analyses in this course. I just want you to know what they are and what they are used for.
They are very advanced. Two, sampling error is important for other statistical concepts: standard error of the mean, confidence intervals, and margins of error.
There is a video this week about exactly these things. Look under the reading and videos folder for this week. You do not have to know the math for these concepts. However, some of you may be interested in the math and a deeper understanding of these stats. If you are really interested in learning the math, I suggest taking a statistics course or looking up Khan Academy.
The issue with those resources is that they often do not explain how the statistics are applied in a way that makes sense for social science. But if all you take from our look at sampling error is that samples differ to one degree or another from the population, and that there are things you can do, such as increasing the sample size, to reduce sampling error, then that is okay too.
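For anyone curious, the idea that a bigger sample reduces sampling error, but only to a point, can be tried out in a short Python simulation. This is only a sketch under stated assumptions: the population is 100,000 made-up ages, not ACS data, and the sample sizes and number of repeats are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up population of 100,000 ages between 18 and 70 (NOT real ACS data).
population = [rng.randint(18, 70) for _ in range(100_000)]
population_mean = statistics.mean(population)


def average_sampling_error(sample_size, repeats=200):
    """Average distance between sample mean and population mean over many random samples."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.mean(sample) - population_mean))
    return statistics.mean(errors)


# Each sample size is four times the one before it.
results = {n: average_sampling_error(n) for n in (25, 100, 400, 1600)}
for n, err in results.items():
    print(f"sample size {n:>5}: average sampling error {err:.2f} years")
```

In a run like this, the error keeps shrinking as the sample grows, but each additional jump in sample size buys a smaller improvement than the one before it.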