Overview of Sociology Research Methods

Welcome, everybody. This is Dr. Alvarez, and this is Lecture 1, Research Methods Review, for Sociology 303, Statistics. So today we're going to go through a couple of big issues. This is going to be a very important lecture, so I would take good notes for this lecture. We're going to talk about understanding the research process. We're going to introduce the idea of what a variable is. That section, section two, the introduction of variables, is perhaps one of the most important concepts we're going to be talking about throughout this entire semester. A lot of the things we do later on in the course are built upon that. So I would strongly encourage you to take good notes and to ask questions via email about that section if you have any misunderstandings about it. Under understanding variables in research, we will also talk about causality, which is another important thing, which will be on some of the exams and certainly on the final exam. And then we'll talk about conceptualizing variables. And then we'll look at other models, excuse me, other methods of doing social research. So a lot to cover. So let's just jump right on in.
Thinking about research, understanding the research process. Generally speaking, we talk about three different purposes of research. Different purposes. It's not like I said porpoises. There are not three different porpoises of research, porpoises or dolphins. It's three different purposes of research. The first is exploration. That's just familiarizing yourself with a new topic or phenomenon. You know, if something new comes about, no one has done any research on it. And so what would you do? You would go out and try to understand, hey, what is this new thing that is occurring? What does it mean? What does it look like? When I did my dissertation, I did my dissertation at UCLA on payday borrowing. Payday loans are a very expensive form of credit. And not a lot of people had done research on that.
And so my study began as an exploratory study, trying to just detail, hey, what's going on? What are these payday loans? How do they work? Who are the people who are actually taking them out, right? And so I started off with an exploratory research project, right? But later on, I undertook a descriptive research project where I described a process, right? And there I described the process by which people found and took out these loans. I found out where the lenders are located and how people came to the decision that they were going to take out a payday loan. So I undertook in that dissertation two different purposes. I both explored a new phenomenon, payday lending, and then I described that phenomenon as well. I described the process by which people took out a payday loan.
However, the real key purpose of research that is privileged, that really people are seeking, is explanation. Because explanation is really where you're able to demonstrate your ability to understand why a phenomenon occurs, right? Because it details how and why something takes place. And so what did I do? In my dissertation, I also tried to explain why it was that certain people would go and take out very expensive forms of credit. Why wouldn't they use, for instance, a credit card? Why wouldn't they, for instance, borrow money from friends or family before they took out very, very expensive payday loans, right? So I moved up to trying to explain that phenomenon. So we privilege explanation over exploration and description, but usually in order to get to the explanation, we have to go through exploration and description, if you understand me, right? And so explanation is really the key thing here, and it's the thing within social science research, and actually all forms of research. Whether it's social science research or the natural sciences, we care about explaining it, right? So here you can think about gravity, right? We explore how gravity works.
And then we try to describe how gravity works, and we can do that, right? We can describe the acceleration due to gravity near the Earth's surface, right? About 9.8 meters per second squared, right? Unfortunately, we don't have an explanation for why gravity occurs, right? We know that it has something to do with mass. We know that essentially anything that has a mass exerts gravity, right? We can say that the force of gravity is related to the mass of the objects and the distance between the objects, right? But we still don't really have an explanation of why gravity works, right? So even in the natural sciences, explanation is the key purpose of research. And it's the same thing in the social sciences. We really, really do privilege explanation. And that's the problem, though. Explanation is very, very difficult.
So when we undertake research, typically we undertake it with one of two different processes, right? We can either take on a deductive process or what's called an inductive process. And you can see those processes, right? You see deduction on the right-hand side. You see induction on the left-hand side. And so when we talk about deduction, what we really mean is starting off with a theory, using that theory to generate a hypothesis or set of hypotheses, and then observing the world in some structured way. In other words, that means using some type of research method. It means either we interview people in a structured way, or we watch people, we live in their neighborhoods, we live in their communities in a structured way. That's called an ethnography. Or we conduct an experiment in a structured way, right? Or we conduct a survey. All of those are different forms of structured observation. It's another way of saying a rigorous collection of data, right? Once we have that data, we move to analyzing that data, right? We then make sense of those results.
And then based on those results, we come to some sort of conclusions or explanations. And we try to see if our data and our data analysis confirm, or in other words support, our hypotheses, or whether our hypotheses are not supported, right? We call that a deductive approach. We move from this big idea of theory, we narrow it down to our hypotheses, we test those hypotheses using structured observation, using data, and then we refine or reject our theories and hypotheses based upon what our data analysis shows, right? This is typically what you get taught as the scientific method in your high schools or in some basic science classes, right? So this is why we're talking about this as part of our research methods review.
Now, that is not the only way to conduct science, particularly in the social sciences. You can also do it from an inductive approach, using induction. How does this differ? Normally, inductive approaches mean that you go out into the world and you watch what goes on. And from that observation, you generate a theory about what you think is going on. And then you have that theory in your mind, and you go back out into the world and you observe more of the world. And then you refine your theory. Typically, inductive research processes are iterative, meaning that you go out into the world, you observe the world, you go home, you refine and analyze that data, and you generate a theory from that data. You go back out into the world, you observe it some more, you refine that theory, and you do that over and over and over again until you finally get to a point where you really feel like you have a very strong theory that is very much rooted in the data that you have been observing, right? The world that you have been observing.
Deductive approaches usually run through the process of theory, hypotheses, data collection, analysis, interpretation of results, and then support or non-support for your hypotheses and theory. It usually runs through that process and then it stops. And then somebody else also does a version of that, and somebody else does another version of that. And through the collection of all of those efforts, science advances, right? And so deductive approaches, generally those are quantitative statistical approaches. You go through once, you end your study, and somebody else does it again and replicates it or doesn't replicate it.
Inductive approaches normally last longer and they are iterative. One researcher goes into the field, observes the field, comes back, revises their theory, goes out into the field, revises the theory. You can think here of somebody like, let's say, Jane Goodall, right, who went out and observed chimpanzees and learned about chimpanzees and watched what they were doing and spent time with them. And then she came back and she refined her theories about it. Then she went back out into the field. She observed more. And through time and through an iterative process, she learned more and more about chimpanzee behavior. She generated more and more ideas and theories about chimpanzee behavior, right? And so that would be an inductive approach, right? Oftentimes, inductive approaches are qualitative approaches. That means that you're watching and you're learning and you're writing field notes. In the social sciences, that means you're interviewing people, you're spending time in their neighborhood, you're watching what's going on. And then you come back to your house and you write some notes, and then you go back into the world and you observe their neighborhoods. And so, again, the point that I'm trying to make is that there are these fundamentally different structures of research, a different process, right?
We have deductive research that goes down from theory to hypotheses, testing those hypotheses with data, and then confirming or not confirming our hypotheses or theories, versus induction, which observes the world, generates a theory from that, and then goes through an iterative process to refine that, right? We can summarize this very easily by saying induction means moving from data, often qualitative observations of the world, to build theory. It goes from data to theory. And it does so, and I would really write this down if I were you, through an iterative process, right? Whereas deduction uses theory to generate a set of hypotheses that can be tested using data. It goes from theory to data, where we use theory to generate hypotheses, right? Then we collect data, analyze the data, and see if there is support for our hypotheses, right?
For our purposes, statistics, and generally this is the case whether you're doing surveys to get statistics or you're doing experiments to get statistics, are largely deductive. You start off with some theory, a complex explanation of the relationship between variables. An example of this would be human capital theory. If you've never heard of this before, human capital theory essentially says that the inequality in things like income that we see in the world is a byproduct of the different types of capital that people bring to the labor market. Well, what do they mean by capital? They mean the skills they bring, the level of education that they bring, the talents that they have. The more skill, the more talent, the more education you have, the higher your level of income.
And so it says that some people earn more and some people earn less based upon the amount of skills, the amount of talent, and the amount of education that they bring to the labor market, right? From that theory, we can generate a hypothesis. A hypothesis is a tentative answer to a research problem, right? We can generate from human capital theory a very specific hypothesis: those with more education should earn more money, right? Then we want to get some data in order to test this hypothesis. We can compare earnings of those with different levels of education, right? In other words, we can look at people with different levels of education and see their income, and based upon that, come to a conclusion about whether or not our hypothesis is confirmed, right? And we're going to do this in the next slide.
So this is the median annual earnings by education level for full-time, year-round workers who are older than 25, in 2011. And so what do you see on the bottom? What are these? I'm asking you questions like I can hear you respond to this. These are levels of education. Did not graduate high school, high school graduate, some college, associate's degree, bachelor's degree, professional degree. Now, then we can look at the bars themselves and actually the numbers. What does that indicate? It indicates that as education is increasing, what's happening to income? It's increasing, right? So there does seem to be support for the hypothesis that as education increases, so too does income. So there seems to be some support for human capital theory, right? Here we looked at every level of education that we could find, right? And let's be more specific. What happened? This data comes from some census data. It's not a true census. The Census Bureau did a sample. And what did they do?
They went out and they asked people, a lot of people, what level of education they had. And the response categories, the answers that they could provide, are did not graduate high school, high school graduate, some college, no degree, associate's degree, bachelor's degree, professional degree, right? They also asked them another question. What was that other question, you think? It was, how much money do you earn, right? And then they put those two variables, those two ways of measuring the world, together in order to look at the relationship between education and income. Does that make sense to everybody? Now, we could look at every single level of education and look at the level of income associated with it. But the other thing that we could do is look at two specific levels, right? So we could compare high school graduates to those with a bachelor's degree, right? And what will we see there? The exact same thing. High school graduates, their median income is $28,659. Bachelor's degree, their median income is $49,648. So even if we just looked at two particular categories of education, you would still see that there is some evidence for our hypothesis that as education increases, so too does income, right? Which is why on the bottom of this slide it says our conclusion was get a college degree, right? Let's continue.
So I used this word variable a moment ago. What does that refer to? Well, variables are how we measure the social world, right? So what is a variable? This is the actual definition. It is a property of people or objects that takes on two or more values, right? It is a measurement of the social world. So for instance, do you own a car? Yes or no. Are we measuring something about the social world? Yes. We're measuring whether or not you have a car. That's a measurement of the social world, right?
We could ask additional questions after that. What kind of car? How much is that car worth? How much did you pay for that car? Et cetera, et cetera, right? Let's look at another example. How long is your commute to school? And we could ask you how many minutes specifically, and you have to tell me the number of minutes, right? Are we measuring something about the social world? Yes, we are. We're measuring the number of minutes it takes you to leave your home and to get to campus, right? That's a measure of the social world. Do you strongly agree, agree, disagree, or strongly disagree that parking should be easier? Again, are we measuring something about the social world? Yes, we are. We're measuring your attitudes, how you feel about parking on campus, right? A variable measures something about the social world, right? In our previous example, we measured the education level of individuals. We also measured their income, right? Those are things that we want to actually measure.
Now, normally in this class, we're going to be using survey data, which means that we go out and we ask lots of people a whole bunch of questions, right? And each one of those questions you should think about as a different variable, right? It exists as a response. So I could ask you, for instance, how excited are you for this class, right? Very excited. Excited. Not so excited. Not excited at all, right? Sorry, that was my Eeyore voice. And that would measure, again, something about the social world, right? And it would be in a long survey that you would answer about this class, right? And so every question on a survey is trying to measure something about the world. Now, I keep saying something about the social world. What do I mean by that?
When I say social world, normally what I mean is something about an individual, something about their background, something about what they believe, something about what they do, something about how they think about the world, right? Normally in psychology, sociology, political science, communication, and economics, we're asking something about people, right? Or something that people do. Or about businesses, right? So we could ask you, how long did you watch TV yesterday? Or how often did you use your phone? And all of those things are variables that we use to measure the social world.
That said, we do differentiate different types of variables. And that's what we're going to spend the next almost 10 slides talking about. We call this levels of measurement. So, what type of measurement are you taking of the world? And then we also talk about the type of variable. So every variable has a level of measurement and every variable has a type. Your job in this class is to be able to look at a question on a survey and to be able to determine what the level of measurement of that variable is and what type of variable it is. And you will get questions like that on your exams. So pay very close attention to what we're going to be talking about. But in addition, much of what we do later in this class is predicated on your ability to determine the level of measurement and the type of variable that you're dealing with. And if you can't do that, that means you'll do the wrong type of statistical analysis and then you'll get that wrong. So it is very important that you understand levels of measurement and types of variable.
Let's get started. The most basic form of measurement is a nominal level of measurement. This is a question that really just asks respondents to tell you what category they fall into. And those categories have no intrinsic relationship to each other. They're just the names of categories.
And you might be like, well, what the hell are you talking about right now? Well, let's look at some examples, right? So here's the definition. Response categories have no intrinsic relationship. Values attached to the categories do not imply real values. What does that mean? One, we could ask somebody what their gender identification is. And this is straight from the GSS. The GSS is old school. It does not provide additional gender identity options. It just provides male and female. In other words, it conflates gender and sex. However, if I ask you, how do you identify in terms of your gender, male or female? Is there any intrinsic relationship between male and female? Is male greater than female? Is female greater than male? No, they're just two categories that you could fall into, and you tell me which one you fall into, right? There's no intrinsic value. You can't rank them. You can't say one is more than the other. Does that make sense? They're just named categories.
Now, when you go into SPSS and you see this survey question in your data set, it records people's responses not by writing in that you said male and this other person said female. It doesn't record that as female and male. It says, if you said male, I'll record you as a one. And if you said female, I'll record that as a two, right? Now, should we take that to say that females are twice males? No, that one and that two are just placeholders. A one tells you male, a two tells you female. That doesn't mean females are twice males, right? They're just a placeholder for what your response is. So you cannot confuse the numbers that you see in SPSS with an actual value, right? That's the second sentence. Values attached to the categories do not imply real values.
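To make the point about placeholder codes concrete, here is a minimal Python sketch. The 1 = male, 2 = female coding follows what was just described; the list of responses is made up for illustration and is not GSS data, and the names `SEX_LABELS`, `responses`, and `mean_of_codes` are hypothetical.

```python
from collections import Counter

# Coding scheme described above: the numbers are labels, not quantities.
SEX_LABELS = {1: "male", 2: "female"}

# Hypothetical coded responses (invented for illustration, not GSS data).
responses = [1, 2, 2, 1, 2]

# Meaningful for a nominal variable: count how many people fall in each category.
counts = Counter(SEX_LABELS[code] for code in responses)
print(counts["male"], counts["female"])

# Not meaningful: arithmetic on the codes. This number depends entirely on
# which placeholder codes were chosen, so it describes the coding scheme
# rather than the respondents.
mean_of_codes = sum(responses) / len(responses)
print(mean_of_codes)
```

The only thing worth reporting for a nominal variable here is the count in each named category. The "average" of the codes can be computed, but it would change if the same answers were stored with different placeholder numbers.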
Let's look at a couple more examples. Left-handed or right-handed, right? If you're left-handed, does that make you greater than somebody who's right-handed? If you're right-handed, does that make you greater than somebody who's left-handed? No, they're just categories that you fall into. It is a nominal category. What does nominal mean? It means name, right? The categories are just names of categories, and you tell the researcher which category you fall into. Again, SPSS may record left-handed people as zero and right-handed people as one. Does that mean that left-handed people don't count for nothing? I'm left-handed, by the way. Well, I'm actually ambidextrous. I play basketball with my left hand. And yes, I got that J for you. Excuse me, sorry. And a one means right-handed. Does that mean right-handed people actually count for something and left-handed people don't? No, that's not what that means. It's just a placeholder in our data set for which response you gave.
Let's keep going. How about race, ethnicity, and nationality, right? In the GSS, the race variable is actually a very bad question, but here's what it says. Maybe write this down as I say it. Which race do you identify with? One, white. Two, black. Three, other. Now, is there any intrinsic ranking of those categories? Is white greater than black, or black greater than white, or black greater than other? No, they're just named categories that the respondent is putting themselves into, right? Does that make sense? Now, I mean, unless of course, excuse me, you're a racist or a white supremacist. In that case, that's a whole other question, but still, at the level of statistics, that variable is a nominal level variable. If I ask you, what country were you born in? Now list out all the countries in the world. Is there a necessary relationship between all those countries? No, there's not. There's absolutely not.
They're just named categories that you identify yourself with and put yourself into. This is what we mean by the nominal level of measurement. Nominal variables contain the least amount of information. Why do I say that? Because all we know is that you put yourself into a category that fits a particular name, right? Whether it's male, right-handed, left-handed, that you were born in the United States or that you were born in Mexico or Costa Rica or the Dominican Republic or whatever it is, right? And so nominal level variables contain the least amount of information for us.
This is different than ordinal variables. Ordinal variables ask questions where the responses can be ranked. But we normally don't know the distance between the categories, right? What do I mean by that? Let's take a look at some examples. So a perfect example of this in the GSS is class identification. The responses to the class identification variable are, do you identify as working class, middle class, or upper class? Now, can we rank your responses here? Now we can rank the responses, right? We know that working class is a lower class than middle class. And middle class is a lower class than upper class, right? Does that make sense? So now, not only do we have information about which category you go into, we can also rank you versus other people. So one person identifies as working class, another person identifies as middle class. We can actually arrange you in order. And now, the numbers, they don't actually mean anything, but they do go in order. You know, one is less than two, two is less than three. Do you see that? Let's take another example. How about strength of approval, right? Strongly approve, approve, neither approve nor disapprove, disapprove, and strongly disapprove, right?
So if I ask you a question, parking at Cal State Philatelan should be easier, and you say strongly approve and another person in this class says disapprove, not only do we know you're in different categories, we actually know that one of you has more approval of trying to make parking easier, whereas the other one has much less approval. They actually disapprove of making parking easier, right? Ordinal variables allow you to order the responses, right? They contain more information. Not only are there names associated with the categories, but those categories can be ranked in some way. And so we can't look at a variable just by the way that it's asked and figure out the level of measurement. We must be able to see the response categories, right? You need to see how the responses are structured in order to determine what the level of measurement is. And again, if I were you, I would be writing all of this down, because this is going to be some of the most important stuff we cover in this class.
Now there's one more level of measurement that we're going to deal with in this class, and that's an interval ratio variable. It's actually two different levels of measurement, interval and ratio. We're going to combine them together into interval ratio. Interval ratio variables are essentially any numerical variable. Not only can we rank people, but we can also tell the exact distance between those rankings. And you might be like, what the hell are you talking about? But if I ask the class, how many years of education does each of you have, right? And then I look at that data, and some of you have 14 years of education. Others of you have 15 years of education or 16 years of education. Do I know that those who have 16 years of education have more years of education than those who have 14? Yes, I do.
And do I know exactly how much more they have? I do. It's two additional years. Does that make sense? So now we actually know the distance between the rankings, right? And just so that we're clear, interval ratio variables are not categories. They're not going to provide categories. They're numeric, right? They're going to provide numbers. So for instance, if I ask you, how many dollars of income did you earn last year, right? And I say, tell me the actual number of dollars. You know, I earned, I don't know, $35,000 last year, right? And obviously I'm making the number up. By the way, you do know that the income of every professor and every person who works at Cal State Philatelan is publicly available, right? So there are websites where you can go and look that information up, right? Anyway, that's just a side point. Sorry, I don't mean to go off on a tangent. Dollars of income. I'm asking you about a specific number. Can you rank that number? Yes, you can. Do you know the distance between those things? Yes, you do. Therefore, whenever you see an actual number, that is an interval ratio level of measurement, right? If I ask you, how many minutes did you spend watching TV yesterday? And you say, oh, I watched two hours of TV. That's 120 minutes, right? That's an interval ratio level of measurement. Interval ratio levels of measurement have the most information. Ordinal variables have the middle amount. And then nominal variables have the least amount of information because they're just named categories. Does that make sense? It's very important that you understand this.
Now, keep something in mind here. Let's imagine that I ask you for dollars of income. I said, how many dollars of income did you earn last year? But instead of asking you for the specific number, I said, was it between zero and 5,000? Was it between 5,000 and 15,000?
Was it between 15,000 and 35,000? Was it between 35,000 and 50,000? Was it between 50,000 and 100,000? What if I gave you that version of the variable? What level of measurement would that be? I'm asking this as if I can hear your response. I just want you to think about it. I mean, maybe write that down. Would the person answering that question write down a specific number? Or would they choose from an ordered set of categories? And those categories just happen to contain numbers, right? So imagine I asked that question in the survey I sent out to the class, right? One person says they earned zero to $5,000 last year, right? So they're in category one. And another said they were in category four, that they earned 35,000 to $50,000. Do I know the difference in income between those two people? Or do I just know that they're in two different groups and one earned more than the other? It's the second one, right? That version of asking about dollars of income, where I have categories, is an ordinal variable, because you're just asking the respondent to put themselves into a category that can be ranked, but you don't know the distance between those two categories. If you ask them for the specific dollar amount they earn, that is an interval ratio level of measurement. Clear? Okay, let's keep going.
Let's turn to types of variable. Again, you want to pay attention to the responses that the question provides. If a question only allows for two different possible answers, it is a dichotomous question. If those responses are coded as 0 and 1 in SPSS, we call it a dummy variable. And I'm going to show you an example of that in just one second. So let's look at an example. Did you vote in the last presidential election? Why is this dichotomous? There are only two answers, right? No and yes. Do you see that?
And again, if I was putting this information into SPSS, if you yourself said no, would I write in no? No, I would put in a one and say that stands in for no. If somebody else said yes, I would put in a two. So is this a dummy variable? No, it's not, because it's not coded zero and one. If I put in a no response in SPSS as a zero and a yes as a one, then it would be a dummy variable. You may be asking yourself, why the hell does that matter at all? Why is he making this distinction? It will become clear to you later on in this course why I'm making this distinction for you. For right now, your job is just to know that a dichotomous variable is any question that only has two responses. It can be yes, no. It can be approve, disapprove, right? So if I ask you, do you approve of the job that Trump is doing as president currently, and it was either approve or disapprove, that would be a dichotomous variable, right? If you saw that approve was coded as one and disapprove was coded as zero in SPSS, you would then call that a dummy variable. And just to be clear, and I think you should write this point down, all dummy variables are dichotomous, right? Because they are questions that have only two responses. But not all dichotomous variables are dummies, right? Because the only dichotomous variables that are dummy variables are the ones that are coded as 0 and 1 instead of whatever else you want to do.
Now, where do those numbers come from? When you the researcher are collecting the data, you get to determine how you put that information into your data collection software. You get to make that decision. You're going to hear me say this all the time in this class: researchers get to make the decision about how they do many of the things they do in statistics, right? You just have to have an explanation or justification for why you did it. So to repeat, the first type of variable we're talking about is dichotomous.
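The dichotomous versus dummy distinction can be sketched in a few lines of Python. This is only an illustration, not how SPSS works internally: the function names `is_dichotomous` and `is_dummy` and the sample answers are hypothetical, and the sketch checks the codes that actually appear in the data, which assumes both answers were given by at least one respondent.

```python
def is_dichotomous(codes):
    """A variable is dichotomous if it has exactly two distinct responses."""
    return len(set(codes)) == 2


def is_dummy(codes):
    """A dummy variable is a dichotomous variable coded as 0 and 1."""
    return set(codes) == {0, 1}


# Hypothetical "Did you vote?" answers coded 1 = no, 2 = yes.
voted_1_2 = [1, 2, 2, 1, 2]

# The same answers recoded as 0 = no, 1 = yes.
voted_0_1 = [code - 1 for code in voted_1_2]

# Coded 1/2: dichotomous, but not a dummy.
print(is_dichotomous(voted_1_2), is_dummy(voted_1_2))

# Coded 0/1: dichotomous and a dummy.
print(is_dichotomous(voted_0_1), is_dummy(voted_0_1))
```

Both lists hold exactly the same answers. Only the researcher's choice of codes decides whether the variable counts as a dummy.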
It is any question that has only two responses. If those two responses are coded as 0 and 1 in SPSS, you would consider it not just a dichotomous variable, but a dummy variable. Okay, let's keep going. We also have categorical variables: any variable that has categories as its responses, typically nominal and ordinal variables. I just want to be clear about this. If you can identify that a variable is nominal or ordinal, it will be a categorical variable. That just says that the question has a response that is broken down into categories. So we talked about the race variable, right? We said that the responses to it are one, white; two, black; and three, other. Are those categories that people fall into? Yes, they are. So it's a categorical variable. We also said that those are categories that cannot be ranked. So we can actually say that race is a nominal categorical variable. You put the two things together. In this class, on the exams, I will give you an actual question from a survey, and you will need to be able to identify the level of measurement of that variable and the type of variable that it is. So how about highest degree achieved? Do you remember that from our example earlier on, where we had less than a high school degree, high school degree, some college, associate's degree, college degree, and then professional degree? Are those responses categories? They are, aren't they? So that would be a categorical variable. What level of measurement would it be? Well, can we order those responses? Does someone with less than a high school degree have less education than someone with a high school degree? Yes. Does someone with a high school degree have less education than someone with a college degree? Yes. So it is ordinal: an ordinal categorical variable. So the second type of variable that we deal with is categorical. Any nominal and ordinal variable will be a categorical variable by definition.
And if I were you, I would write that sentence down and put it in my notes and have it ready for an exam. Okay. The next two types fall under the heading of numeric variables. What level of measurement do we typically associate with numbers that are actually numbers? It's interval ratio, right? So interval ratio variables are numeric, and they can be numeric discrete, which is what this is: a numeric variable whose units cannot be subdivided. Often, but not always, this means the numbers must be whole numbers. Can you think of an example of this? What place did you finish in the race? Can you come in 1.5 place? You cannot. Can you come in 3.5 place or 3.1 place or 3.8 place? You cannot. You need to come in first place or second place or third place. So if you come in first place, that's a number that can't be subdivided. How many children do you have? Is that a number? It is a real number. I actually have zero kids, but say I ask, how many kids do you have, and you say, I have two kids. Is it possible for you to have 2.2 kids, 2.3 kids? Not unless you are a terrible, terrible person doing terrible things to your children. No, I'm just joking. No, it's not possible. It must be a whole number. You cannot subdivide children. How many people live in your home? Does somebody only halfway live in your home? Either somebody lives in your home or they don't. You cannot subdivide. Is it a number? Yes. So in other words, interval ratio variables are always, by definition, numeric variables. They are questions that produce numbers as their responses. Those numbers sometimes cannot be subdivided, and we call those discrete. You have to be able to identify when an interval ratio variable cannot be subdivided, and then you will label that discrete.
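Before moving on, the dichotomous-versus-dummy distinction from the voting question can be sketched in a few lines. This is only an illustration in Python rather than SPSS; the answers, the two coding schemes, and the helper names `is_dichotomous` and `is_dummy` are all made up for this sketch and are not from the class survey.

```python
# Hypothetical answers to "Did you vote in the last presidential election?"
answers = ["no", "yes", "yes", "no", "yes"]

# Coding scheme A: no = 1, yes = 2 (two responses, but not coded 0/1).
coding_a = {"no": 1, "yes": 2}
# Coding scheme B: no = 0, yes = 1 (two responses, coded 0/1).
coding_b = {"no": 0, "yes": 1}


def is_dichotomous(coding):
    """A question is dichotomous if it allows exactly two responses."""
    return len(coding) == 2


def is_dummy(coding):
    """A dummy variable is a dichotomous variable coded as 0 and 1."""
    return is_dichotomous(coding) and set(coding.values()) == {0, 1}


# The same answers, entered under each coding scheme.
vote_a = [coding_a[a] for a in answers]
vote_b = [coding_b[a] for a in answers]

print("Scheme A:", vote_a, "dichotomous:", is_dichotomous(coding_a), "dummy:", is_dummy(coding_a))
print("Scheme B:", vote_b, "dichotomous:", is_dichotomous(coding_b), "dummy:", is_dummy(coding_b))
```

Both schemes describe the same two-response question, so both are dichotomous; only scheme B, with its 0 and 1 codes, counts as a dummy. The choice between them is the researcher's, as long as it can be justified.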
So if one version of it cannot be subdivided, what do you think the other version of it is? I'm actually missing a slide here, and I apologize. The other version of this, and please write this down, is continuous. I'm going to add a new slide right now, and I'll update the posted version. You get to watch me do this. Uh oh, delete slide. Types of variables. I'm going to say this one is continuous: a numeric variable whose units can be subdivided. For example, height, weight. Can we subdivide height and weight? Can we subdivide time? We can, right? We can break down hours into minutes. You can have half a minute, half a second, a quarter of an hour. You can have half an inch, a quarter of an inch. So to be clear, numeric variables, in other words interval ratio variables, can be either discrete or continuous. How tall are you? (Oops, I have never done this before.) Feet, inches, et cetera; pounds, or kilograms if you're that type of person; hours, minutes, seconds. Do those always have to be provided as whole numbers? They do not. If I asked you how long you were on Twitter yesterday and you said half an hour, that makes perfect sense. So it can be subdivided. So your job is to be able to look at a variable and figure out if it produces a number, meaning it's numeric, and then ask: can it be subdivided or can it not be subdivided? You're going to have to be able to do that. Not only do we have to be able to identify the level of measurement of a variable, we also have to be able to identify the type of variable that we use. You need to know both of those things, and if you have any questions about them, please, please email me. However, in this class, we normally don't care about a variable by itself.
So what do I mean by that? If I ask the class about education, and I want to understand the distribution of education, that's perfectly fine and understandable, and we will do some of that in this course. But normally, we care about the relationship between two variables. Does that make sense? We might care about the relationship between education and income, for example. So normally, what we think about is not just understanding the distribution of one variable, but rather understanding the relationship between two variables. And so we have a language for talking about those two variables, and also a language for talking about the relationship between them. That's what we're going to talk about now: how variables get used in research. So let's go to that. You may have heard the terms independent and dependent variables before. An independent variable is usually the variable that we think of as the cause, as something that produces an effect. We usually refer to it as X. Then we also have a dependent variable. That's the variable we look at to see if there is an effect. We refer to that as Y. Normally, the thing we're most interested in, the thing that we care about, is Y, the dependent variable. For instance, in my dissertation study, I was really interested in why people take out a payday loan. So the dependent variable that I looked at was, did you take out a payday loan? Then I thought of other variables that might have an impact on that dependent variable. One of the independent variables I looked at was savings. I was interested in understanding the relationship between savings and taking out a payday loan. Does that make sense? I thought that whether or not you would take out a payday loan would be dependent upon the level of savings that you have. Make sense?
Let's look at another example. We talked about human capital theory just a little while ago, right? We had a hypothesis that as education increases, income would also increase. What are the dependent and independent variables there? Well, there we want to see if increasing education produces a rise in income. We think that income is dependent upon education. So education is my independent variable, and income or earnings is my dependent variable. And so oftentimes what we want to know is, what is the relationship between my independent variable and my dependent variable? More specifically, the thing that we really, really care about is determining if the relationship between my independent variable and my dependent variable is causal. If I were you, I would write this down and put a star next to it, an exclamation point. So what do I mean by this? What we would like to be able to say is that it is in fact increasing education that actually causes income to increase. Why do we care about that? Because if we can identify that there's a causal relationship between the two variables I'm interested in, between my independent variable and my dependent variable, what have I done? I've gone a fair way towards explaining why income is what it is. And we said early on in this lecture that explanation is one of the most important purposes of research; really, it is the most important one. So identifying causal relationships is one of the key parts of explaining things in the social world. So we privilege causality over just about everything else. Does that make sense to everybody? We want to actually be able to identify causal relationships. How do we know when we have identified a causal relationship between two variables?
Well, three things have to hold. There has to be a correlation between the two variables. There must be the appropriate time order between the two variables. And we must have non-spuriousness between the two variables. Again, if I were you, I would write these down so that you have them for easy review later on. Let's go through each one of these together. Correlation. We must establish that there is actually a relationship between X and Y. X and Y have to vary together. They have to co-vary. When one goes up, the other variable has to either go up or go down in a patterned way. So we have to see if there's a relationship between the two variables. That's what we were doing with education and income, wasn't it? We looked at whether, as education went up, income went up. We looked to see if there was a correlation between those two, and we did, in fact, see that. Now, here's the key question, and you probably have heard this before. Does correlation equal causation? It does not. Correlation does not equal causation. That's actually what we're talking about here. Correlation is just one part of identifying causation. We need additional things to be able to identify that: we need non-spuriousness and we need time order. We have to have correlation in order to get causation, but correlation by itself is not enough. In other words, for those of you comfortable with necessary and sufficient conditions, correlation is a necessary but not sufficient condition for causation. We must have correlation to have causation. However, correlation by itself is not enough to prove causation.
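To make "vary together in a patterned way" concrete, here is a minimal sketch of one common way to put a number on it, Pearson's correlation coefficient. The lecture has not named a specific measure at this point, so treating Pearson's r as the measure is an assumption of this sketch, and the education and income figures below are invented for illustration, not real data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariation of x and y, scaled to lie between -1 and 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How much x and y move together around their means.
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable moves on its own.
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)


# Made-up respondents: years of education (X) and income in $1,000s (Y).
education_years = [10, 12, 12, 14, 16, 16, 18, 20]
income_thousands = [22, 30, 28, 41, 52, 48, 65, 80]

r = pearson_r(education_years, income_thousands)
print(f"r = {r:.3f}")  # close to +1: income rises with education in this toy data
```

A value near +1 or -1 says the two variables move together in a patterned way; a value near 0 says they do not. Even a very large r only satisfies the first of the three criteria, so by itself it says nothing about time order or spuriousness.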
Much of what we do in this course is coming up with different ways of measuring whether or not a correlation exists between two variables. We're going to be dealing with that throughout this entire course. If a correlation exists between these two variables, how do we figure that out? Much of statistical analysis is determining if there actually is a correlation between two things, because it turns out that can be really quite complicated. Later on we'll be able to determine if there is a relationship between them, and when we get near the end of the course we'll get some very powerful tools to determine whether a correlation exists. Causation is still a deeper and broader problem, and we'll deal with that later on. But let's continue with our criteria. So correlation is the first. The second is time order. We must establish that the phenomenon we identify as the cause occurs before the other phenomenon. Cause must come before effect. So let's think back to our education and income slide. What if we found out that people's income was rising prior to their education increasing? How would it be possible then for education to be the explanation for why their income increased? You see what I'm saying? So the change in your independent variable has to come before the change in your dependent variable. The cause must come before the effect. Our final criterion is non-spuriousness. This is the most complicated one. It says we must establish that the relationship between the two variables, X and Y, is not caused by some other third variable. Time order is one real threat to identifying causality, but it's really the criterion of non-spuriousness.
That's the major hurdle for survey research. Well, it's one of the major hurdles; I would argue it is the most important problem in identifying causal relationships in survey research. But first, what do we mean by non-spuriousness? This graph is all well and good, but what do we mean? Let's look at an example. Imagine a researcher takes a survey of 10,000 respondents ages 3 to 21. She discovers a very strong correlation between shoe size, her independent variable X, and score on a vocabulary quiz, her dependent variable Y. She determines that scores seem to rise with shoe size. In other words, as X, shoe size, increases, score on the vocabulary quiz, the dependent variable, also seems to rise. Well, why do you think that is? Do you think that shoe size is causing people to have a better vocabulary? Are people with larger shoe sizes just smarter than people with smaller shoe sizes? Or is there something else that might be causing this effect? Does some other variable explain the relationship between shoe size and score on a vocabulary quiz? I suggest to you that there is. Can you think of it? What about age? Is age another variable that may be operating here? Yeah. What happens as people age? They grow. What happens to their shoe size as they grow? Their shoe size increases, doesn't it? What else happens as people get older? Their brains develop, they learn new things. As they get older, they get wiser and smarter, so they can answer more questions on a vocabulary quiz. So what's going on here? There is a third variable at play, age, that explains the relationship between shoe size and vocabulary score. It is not shoe size that is the main issue here. It's actually age. And so what would we do? We would add age to the analysis, right?
And then we would try to see if there still exists a relationship between shoe size and vocabulary scores. Well, what would that look like? If we looked at just people who were 11 years old, or just 18-year-olds if you prefer, would we have any belief or prediction that shoe size will be related to vocabulary score? No. There's no reason for us to believe that shoe size has anything to do with your vocabulary. Nothing. All of that was the outcome of this third variable, age. So we would describe the relationship between shoe size and score on the vocabulary quiz as spurious. It is a spurious relationship. And therefore we would conclude that there is no causal relationship between shoe size and vocabulary score. There's another very famous version of this. Some of you may have heard of it before, which is that if you go and look at the data, you can actually see that as temperatures rise... excuse me, I just gave away the answer to this. As ice cream consumption increases, so do homicides. And this is a correlation we can actually measure in the social world. Does this mean that we think that people who eat large amounts of ice cream are more likely to go on a homicidal killing spree? Is that what we really believe? No. Well, what's going on here? When do people consume more ice cream? When it's hot outside, or warmer outside. And when it's warmer outside, what do you do? You spend more time outside. You walk to the store, you go out, you go to the park. There's just more time spent outdoors. More time spent outdoors means more potential for conflict. More potential for conflict means more homicides. So what would we say?
That the correlation we see between ice cream consumption and homicide rates is spurious. It's not ice cream consumption that produces increased levels of homicide. It's that the increased temperatures outside mean that there's increased contact between people. That increased contact means a greater risk of conflict, and therefore you get more homicides. The relationship that we observe in the social world between ice cream consumption and homicides is spurious. In order to identify a true causal relationship between X and Y, we have to be able to say that there is a correlation, that the change in the independent variable comes before the change in the dependent variable, and that the relationship between the independent variable and the dependent variable is not spurious: there's no third variable that explains the correlation we see between them. In the social sciences, we are typically very interested in this. We privilege identifying causal relationships. It's often our number one goal. Is it our only goal? No, it is not. There are other goals that we might care about. We might also care about how well we can understand that relationship, because just because we identify a causal relationship doesn't mean that we can fully explain what's going on there. It takes us a certain way towards it, but it doesn't mean that we have a deep understanding of why, for instance, education causes income to rise, or why social contact leads to greater chances of homicide. Sometimes we may privilege a deeper understanding of what's going on over identifying causal relationships. So we sometimes want deeper understandings. Sometimes we want to understand what's going on.
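Going back to the shoe size example for a moment, the idea of "adding age to the analysis" can be sketched with a tiny constructed data set. Everything below is an assumption made for illustration: the numbers are built by hand so that age drives both shoe size and vocabulary score while the two have no link of their own, and Pearson's r is used as the correlation measure, which the lecture has not specified.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Four made-up respondents per age, ages 3 to 21. Within an age, the shoe
# offsets and vocabulary offsets are arranged so they do not move together.
shoe_offsets = [-1, -1, 1, 1]
vocab_offsets = [-1, 1, -1, 1]

respondents = []
for age in range(3, 22):
    for s, v in zip(shoe_offsets, vocab_offsets):
        respondents.append({
            "age": age,
            "shoe_size": 0.5 * age + s,      # age drives shoe size
            "vocab_score": 4 * age + 3 * v,  # age drives vocabulary score
        })

# Everyone pooled together: shoe size and vocabulary look strongly related.
r_all = pearson_r([p["shoe_size"] for p in respondents],
                  [p["vocab_score"] for p in respondents])

# Holding age constant: look only at the 11-year-olds.
eleven = [p for p in respondents if p["age"] == 11]
r_age_11 = pearson_r([p["shoe_size"] for p in eleven],
                     [p["vocab_score"] for p in eleven])

print(f"all ages:        r = {r_all:.2f}")
print(f"11-year-olds:    r = {r_age_11:.2f}")
```

In this toy data the pooled correlation is strong, but once age is held constant it disappears, which is the pattern we would label spurious. Real survey data will never be this clean; the sketch only shows the logic of checking a relationship within levels of a third variable.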
Sometimes what we want to know is whether the thing we see is generalizable: does it happen in every time and place? So does increased social contact lead to higher rates of homicide just in the United States? Or does that hold in, say, Sweden or Western Europe, where there are lower rates of, say, gun ownership? Do higher levels of education cause greater levels of income in every country in the world? Or is it only the United States, where we privilege education? Sometimes we care about generalizability more than we care about causality. We're going to talk about that in just one second. But so far, we have laid out what a variable is, the levels of measurement of variables, and the types of variables. We also laid out how those variables get used in social research to identify relationships that we care about: whether there's a correlation between them or, more importantly, a causal relationship between them. But we also have to do more work in how we think about and use variables in our research. We call that process conceptualization. In other words, we actually have to conceptualize and think about the variables that we're using and what they mean. Let me give you a brief example of this before we continue and take a more in-depth look. Earlier in this lecture, we talked about human capital theory, and we tested it, looking at two variables, education and income. Well, really, we just used a variable that measures education and a variable that measures income, and we used those to get at a much broader and more meaningful set of relationships, didn't we? We actually used education as a measure of the amount of human capital that somebody has. We conceptualized education as not just a measure of what level of education you have; rather, it stood in for a concept that we call human capital, right?
And so when we do research, not only do we have to identify the two variables we want to use in an analysis, we also have to say what those variables mean. That is the process of conceptualization. You'll also hear it talked about as operationalization: how do we take an idea and operationalize it? It's the same thing we did for income. We just had a survey question that asked, how much money do you earn? But we actually used that as a stand-in for this idea of how much money you earn in the labor market. And so there's all of this work that has to get done to actually say what education and income mean. We will use the terms conceptualization and operationalization to deal with that issue. So let's talk more about this. What do we mean by conceptualization? It is the process by which we define the general things that we are interested in. Here, I'm going to use a different example than human capital theory, just so that you have a different idea. Let's say, for instance, we are interested in the concept of political engagement. Well, we have to start off by defining what we mean by that idea. What do we mean by it? How involved is someone in political activity? That is the concept that we're going to want to investigate. What do we have to do to actually investigate it? Well, the first thing is we have to define our concept. In this case, we have identified political engagement as our concept, and we have defined it as how involved someone is in political activity. Then we have to identify dimensions of that concept. Those are the different aspects of a single concept that help us measure the concept in its totality. In other words, if I can use a shorthand, what the hell do you mean by political engagement?
That can mean a bunch of different things, right? More specifically, what do you mean by political activity in your definition of political engagement? So here are some examples. Well, voting. Is voting a form of political activity? Yes, it is, and that encapsulates voting behavior. Information. We may include political information gathering as a form of political activity. To keep yourself informed, you might have to go out and learn things about what's going on in politics. Contact. Have you ever contacted your political representative about something? Is that a form of political activity? Sure it is. Elections. Have you ever worked on an election? Is that another form of political activity? It is. And so what have I just done? I have defined political engagement as my concept and I've given a definition of it. Then I identified different dimensions that I think help me show what I mean by political activity and political engagement. I identified four dimensions: voting, information, contact, and elections. Now, what do I need to do? I need to actually construct or identify indicators of those dimensions of my concept. What is an indicator? It is a variable that indicates the presence of the concept we have chosen to study. This is where measurement happens. What did we say a variable is? A variable is a way of measuring the social world. In other words, variables are indicators. So let's say I'm interested in the dimension of voting. How would I measure whether or not somebody voted? What question would I ask? Well, I could ask, do you vote? Does that make sense? If I ask somebody, do you vote, have I measured something about the social world? I have.
I've measured whether or not that respondent engages in one form of political activity, voting, and we defined voting as one form of political engagement. Now, is that a really good way of measuring whether or not somebody is involved in voting? I don't think so. We can make it a little bit better. How about, did you vote in the last election? Is that a little bit better? Yeah, because it gives us a little bit more information. By the way, what would you say would be the response categories for this? Probably yes or no. So what type of variable would it be? It would be a dichotomous variable. How about this: did you vote in the last presidential election? Is this an even better version of the question? Yeah, because it specifies which specific type of voting they engaged in. And really, we always want to make sure that we're asking the right questions, to make sure we're measuring the thing that we really care about. We want to make sure we're measuring the world in the way that we expect to be measuring it. And we want to have different indicators for different dimensions. Here, I want to take a second and point out something that I think is really important for you to think about, okay? This is a difference between types of variables when we try to measure the social world, particularly when it comes to individuals. Sometimes we measure attitudes and sometimes we measure behaviors. What's the difference between those two? What's an attitude, do you think? An attitude measures how people think or feel. So what's an example of this with respect to political engagement? Do you think it's important to be informed about policy before voting? And you can imagine different responses to this. Yes or no. Or, it's very important to be informed. Slightly important.
Not important at all. Right. You can imagine different responses to this. Right. Do you think it's important to be informed about policy before voting? But is this measuring an attitude or behavior? It's measuring an attitude, right? It's just asking what they think about the world, not what they do in the world, right? And you have to be clear about that. Well, what would be the version of a question that actually gets at behavior, right? What do we even mean by behavior? Behaviors measure what people actually do. Did you read about the issues before you voted in the last midterm election? Do you see the difference between those two things? One thing measures attitudes, the other thing measures behaviors. You always have to be clear with yourself about which one of those things you actually care about. Measuring attitudes is not measuring behaviors, right? You'd also be very surprised about how often behaviors don't match up with attitudes. And so sometimes it's important to measure both attitudes and behaviors, right? And so, for instance, asking, do you think it's important to vote? And somebody says yes, does that mean that they actually voted? No, it does not mean that, right? Asking people if they voted, does that actually answer the question about whether or not they think it's important to vote? It does not. You would want to have two separate questions for those things. So it's very important for you always when you're thinking about the relationship between variables, when you're thinking about what a variable means, right, which is what we're doing right now. You should really think about, well, is the thing you're looking at, is it an attitude or is it a behavior? And how should you make sense about that, right? So let's return to conceptualization for a second, right? And I want to make a broader point. Thinking about research. Concepts are abstract, right? They're a broad definition of the concept that you are interested in. Political engagement. 
I think of political engagement as how involved someone is in political activity. Dimensions are a little bit less abstract, because you're identifying the specific dimensions that make up the concept that you are really interested in. Then indicators are very concrete. Why? Because they actually measure the presence of that dimension of your concept. Here is where the rubber hits the road. How am I going to measure their voting behavior? How am I going to measure their information gathering behavior? How am I going to measure whether or not they contact representatives? How am I going to measure whether or not they work on elections? Indicators are the variables you use to measure the dimensions of your concept. And so then what do you end up with? You end up with something that looks like this. Here is my very broad concept, political engagement, that I have identified as having these different dimensions: voting, information, contact, and elections. And then for each one of my dimensions, I have at least one indicator that measures the presence of that dimension. Did you vote in the last election? If they say no, is that respondent engaged in that specific dimension of political engagement? No. If you ask them, do you follow any political blogs, and they say yes, are they involved in the information gathering portion of political engagement? They are. The number of minutes spent watching C-SPAN last week? If they say 100 minutes, are they engaged in the information gathering portion of political engagement? They are. If they say they gave feedback to their congressman on pending legislation, are they involved in that dimension of political engagement? They are. If they say they have never worked on an election campaign, are they involved in the elections dimension of political engagement?
They are not. And what could we do here? For every respondent that we ask these questions, we could add up the different forms of political behavior that they are engaged in and say, well, some people are very high on political engagement and some people are very low on political engagement. Some people just vote and they don't do this other stuff. They don't contact representatives, they don't work on elections, they have very little information about what's going on in politics, but they vote. You've heard of low information voters before, right? And there are other people who are high information voters. You see what I'm saying? We also have highly engaged people and lowly engaged people. Well, who are those people? Low engagement people are those who have low information, who don't contact representatives, who don't work on elections. What about the high engagement people? Well, they have high levels of information, they contact representatives, and they work on elections. And so we can look at high levels of political engagement and low levels of political engagement, and we can try to identify independent variables that produce low levels or high levels of political engagement. Does that make sense to everybody? And so we want to do careful conceptualization when we do our research. Sometimes, though, instead of using all of these variables, we may just choose one. We may just choose, did you vote in the last election? And we say, you know what? That's going to be our most important type of political engagement. And so I'm just going to use voting as a stand-in for all political engagement. Now, would I say that's a very smart analytical strategy? I would say no, but sometimes that's what you do.
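The "add up the different forms of political behavior" idea can be sketched as a simple additive index. This is only one possible way to do it: the respondents, the yes/no indicator names, and the cutoff of 3 for "high" engagement are all hypothetical choices made for this sketch, not something specified in the lecture.

```python
# One yes/no (1/0) indicator per dimension: voting, information, contact, elections.
INDICATORS = ["voted_last_election", "follows_political_blogs",
              "contacted_representative", "worked_on_campaign"]

# Made-up respondents for illustration.
respondents = [
    {"id": "A", "voted_last_election": 1, "follows_political_blogs": 0,
     "contacted_representative": 0, "worked_on_campaign": 0},
    {"id": "B", "voted_last_election": 1, "follows_political_blogs": 1,
     "contacted_representative": 1, "worked_on_campaign": 1},
    {"id": "C", "voted_last_election": 0, "follows_political_blogs": 1,
     "contacted_representative": 0, "worked_on_campaign": 0},
]


def engagement_score(person):
    """Count how many of the four dimensions the respondent is engaged in."""
    return sum(person[name] for name in INDICATORS)


def engagement_level(score, cutoff=3):
    """Label a score 'high' or 'low'; the cutoff is an arbitrary assumption."""
    return "high" if score >= cutoff else "low"


for person in respondents:
    score = engagement_score(person)
    print(person["id"], score, engagement_level(score))
```

Respondent A is the person who votes and does nothing else, so the index puts them low even though the single voting question would count them as engaged. That gap is the cost of using one indicator as a stand-in for the whole concept.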
Sometimes that's what you do because you don't have the data that you want, right? And so then what would the analysis look like, right? It would look something like this. My dependent variable would be, did you vote in the last election, right? And then I would select a bunch of independent variables that I thought had an impact on this dependent variable. My dependent variable is dependent upon these variables that I've identified as important for the thing I'm interested in. Now, keep in mind, and I would write this down if I were you, that we would really want to identify, you know, not just one dependent variable. We would actually put them all together. And so instead of just voting, we would say the dependent variable would be low versus high political engagement, and that would be one variable, right? And we would say that these independent variables have an impact on whether or not you're low or high. But sometimes we don't have information that is that good. We don't have data that's good. And so we use one variable to stand in for that, right? And so what would we do? We would identify independent variables that we think have an important effect on our dependent variable. And to be even clearer than that, what we would say is we would hope that each one of these independent variables has a causal relationship with our dependent variable, right? Actually causes it, right? So here, belief in limited government. Is that an attitude variable or a behavior variable? It's an attitude variable, right? What we would say is, you know, we might have a hypothesis that those who believe that limited government is the best form of government are less likely to have voted in the last election, right? And that would be our hypothesis. And we would collect some data and then analyze that data and see if there is support for that, right? Number of hours spent online last week. 
We may believe that those who spent more hours online last week are more likely to have voted in the last election. And we want to collect data to test that, right? Does that make sense to everybody? So in other words, when we do research, we don't just merely identify variables. We have to go through a process of actually making the variables that we identify in our research meaningful in some way. It's not just education and income. It's not just did you vote in the last election, right? Did you vote in the last election has some meaning and is associated with a broader concept. And so I started out thinking about this class, particularly in the intro, as a class in storytelling with numbers, right? Identifying the concept that your variables are associated with is part of telling a story about your statistical analysis, right? And we're going to do a lot more of that as we go through this course. So let's finish up with two last points, right? Two last points in our research methods review, which is comparing statistics, and specifically survey statistics, which is what we're doing here, with other methods, right? And I'm going to make this fast just because this lecture has gone on longer than I would have liked. Here, I have three other methods. Experiments, right, for those who are psychology majors, are familiar to you. If you're in a different type of social science, if you're a comm major, if you're a business major, if you're, you know, essentially anything other than psychology, you may not have spent that much time with experiments. That's when you have a control group and an experimental group, and you look at the differences between the control group and the experimental group, right? Then we have ethnographies. An ethnography is when we spend a lot of time in a particular neighborhood, right, or a specific social world. 
And an important ethnography in my life has been by a professor who used to be at Cal State Fullerton, in the sociology department, but who's no longer there. He went back to a neighborhood that he grew up in. And I think, if I'm not mistaken, it was in Brooklyn, New York. And he looked into the lives of these people he knew when he was growing up who robbed drug dealers. They had been drug dealers themselves and decided that they no longer wanted to be drug dealers. And instead, what they wanted to do to earn money was to rob drug dealers. And I encourage you to read this book if you ever get a chance. I mean, you can, you know, download portions of it through the library. The name of the book is The Stickup Kids. It's by a professor named Randol Contreras. A really great book. And again, it's about how drug dealers become stick-up kids, who become robbers of other drug dealers, right? That's an ethnography. And he spent a year and a half in his old neighborhood, spending time with them, watching what they do, interviewing them, trying to understand why they decided, you know, not to go to college and finish their college degrees, but instead to become people who rob drug dealers, right? We call those ethnographies. And then finally, we have, you know, a survey, right? Each one of those methods has a different strength and a different weakness, right? Experiments are the gold standard of causality. Notice here, I've listed out causality as very good, right? Experiments are very good at determining if the relationship between variables is a causal relationship, right? However, does that mean that experiments are good at everything else that we care about in the sciences? No, right? I said earlier on that we care about things like generalizability. Are experiments very good at generalizability? No, they're only okay. Well, why is that? 
Well, because experiments take place within experimental settings, don't they, right? And do we expect the way that people behave in experimental settings in a laboratory to be exactly the way that they behave in the real world? No, right? So there are limitations on the generalizability of the results of experiments. And then finally, we have in-depth understanding. Do we always fully understand why people make the decisions that they do in experimental settings? No. And in fact, experiments are not very good on their own at providing an in-depth understanding of why there is a causal relationship between your independent variable and dependent variable. So it's not so good for that, right? Well, what about ethnographies? Are ethnographies good at causality? No, actually, they're not very good. Why? Well, they're not always good at actually identifying correlations. They're not always good at identifying time order, right? They're not always good at identifying spurious relationships. And as a matter of fact, ethnographies very rarely use a variable-oriented way of thinking about the world, right? They're not very good at that, right? Are they good at generalizability? Nah, they're not that good at that either, right? What do I mean? Well, do we think the same things that made these Brooklyn people become stick-up kids are the same things that would lead people to become stick-up kids in, say, Boise, Idaho, or, say, Stockton, California, or, say, Orlando, Florida? No, we don't, right? So it's not very good at generalizability. Well, what is it good at? It's excellent at in-depth understanding. Well, why is that? Well, because it adopts an inductive approach. It spends a lot of time with those people, right? And so there are about 12 people in his study that he spends a year and a half with. And so he knows those people very, very well. He knows their social environments very, very well, right? Does that make sense? So let me ask you this question. 
Do you think you could set up an experiment that would help you understand why people become stick-up kids, why they decide to become robbers of drug dealers? No, right? That's a very difficult proposition, right? Will the experiment get at that? No, it's very difficult to do, right? Let's back up. Ethnographies are very good at getting an in-depth understanding, but they're not very good at generalizability, and they're not very good at causality. Well, what about surveys? Are surveys good? And much of what we do in this class is predicated on survey data analysis. The GSS data that we use in this class is a giant survey of about 4,000 people. Is it good at causality? No, it's only okay. It's very good at identifying correlations, and that's really useful for us, and that's what we're going to do with it, so we can do correlations, that's great. Is it good at time order? No, not really, actually, right? Because it collects data at one point in time, right? So what we use in this class is the GSS 2014, right? That means in 2013 and 2014, people were asked about their education, their income, and their race and their age and all that good stuff, right? Do we know when their education changed? Do we know when their income changed? No, we just know about their education today, right? This second, right? Do we know about what they believed in 2012? Do we know about what they believed in 2010? Do we know about what they believed in 1999? We don't, right? We have no information about that at all, right? So it's not good at time order. Well, what about non-spuriousness? Well, it is kind of good at that because we have lots of variables. The GSS has about 1,000 variables in it. So we can actually put a lot of variables in our model to try to control for spuriousness, right? To try to account for these additional third variables that might be influencing our analysis, right? So it helps us with that. 
But if we don't have a variable that gets at that possibility of spurious effects, then I got nothing for you. So really spuriousness is a real problem for our survey data, as is time order. So surveys are only okay at causality. What they are very good at is generalizability. Well, why is that? Anybody know? Well, you can't answer. Sorry. Well, it's because we use very, very, very careful methods of generating our samples. We use very, very, very careful methods of selecting the people who are going to be in our survey. And as a matter of fact, one of our lectures in this class is going to be about the importance of sampling. And so in fact, for many of the correlations and relationships we identify in our data, we will have very sophisticated tools to determine if they are generalizable to a much broader population outside of our sample. So for instance, we have a GSS sample of 4,000 people, and we will be very comfortable generalizing our analysis of that sample to the population at large. Finally, can we get in-depth understanding from our surveys? Not really, right? And again, you can ask yourself, do I think that I can write survey questions that will get at why somebody decides to become a stick-up kid, decides to stop being a drug dealer, and decides to become someone who robs drug dealers? Do you think people would even answer that question on a survey? No, they really wouldn't. So what's the point that I'm trying to make to you here? I'm trying to make two points to you here, and I would write those down. There are strengths and weaknesses to the methods that you decide to use. And you should choose the method that allows you to answer the question that you are trying to answer, right? However, in this class, we're going to be focusing on one specific method, surveys, right? Surveys are only okay at causality, they are very good at generalizability, and they're really not so good at in-depth understanding. And you should understand why that is, right? 
And so if you were really determined to always focus on causality and focus less on generalizability and focus less on in-depth understanding, which is what psychologists do, you would focus on experiments. If you were willing to give up some of your causal claims and give up generalizability, you would do an ethnography, so you get an in-depth understanding of the phenomenon that you are trying to understand, right? There are different methods and there are different ways of going about answering research questions. I strongly encourage you to not think that statistics is the only way to do research. There are other ways to do research. All right, you all. This has gone on for long enough. Thank you for your patience. I hope that that was interesting to you and you learned something. Really, please make sure you take good notes on this. This is one of the most important lectures in the entire class. And please do email me with any questions that you may have. Okay? All right, you all. Have a good one.

We're going to talk about understanding the research process. We're going to introduce the idea of what a variable is. That section, section two, the introduction of variables, is perhaps one of the most important concepts we're going to be talking about throughout this entire semester. A lot of the things we do later on in the course are built upon that.

So I would strongly encourage you to take good notes, to ask questions via email about that section if you have any misunderstandings about it. So understanding variables in research, we're also going to talk about... And there we will talk about causality, which is also another important thing, which will be on some of the exams and certainly on the final exam. And then we'll talk about conceptualizing variables.

And then we'll look at other models, excuse me, other methods of doing social research. So a lot to cover. So let's just jump right on in. Thinking about research, understanding the research process. Generally speaking, we talk about three different purposes of research, different purposes. It's not like I said porpoises. There are not three different porpoises of research, porpoises or dolphins. It's three different purposes of research.

The first is exploration. That's just familiarizing yourself with a new topic or phenomenon. And, you know, if something new comes about, no one has done any research on it.

And so what would you do? You will go out and try to understand, hey, what is this new thing that is occurring. What does it mean?

What does it look like? When I did my dissertation, I did my dissertation at UCLA on payday borrowing. Payday loans are a very expensive form of credit. And not a lot of people had done research on that.

And so my study began as an exploratory study, trying to just detail, hey, what's going on? What are these payday loans?

How do they work? Who are the people who are actually taking them out, right? And so I started off with an exploratory research project, right?

But later on, I undertook a descriptive research project where I described a process, right? And there I described the process by which people found and took out these loans. I found out where the lenders are located and how people came to the decision that they were going to take out a payday loan. So I undertook in that dissertation two different purposes. I both explored a new phenomenon, payday lending, and then I described that phenomenon as well.

I described the process by which people took out a payday loan. However, the real key purpose of research that is privileged, that really people are seeking, is explanation.

Because explanation is really where you're able to try to demonstrate your ability to understand why a phenomenon occurs, right? Because it details how and why something takes place. And so what did I do? In my dissertation, I also tried to explain why it was that certain people would go and take out very expensive forms of credit. Why wouldn't they use, for instance, a credit card?

Why wouldn't they, say, for instance, borrow money from friends or family before they took out very, very expensive payday loans, right? So I moved up to trying to explain that phenomenon. So we privilege explanation over exploration and description, but usually in order to get to the explanation, we have to go through exploration and description, if you understand me, right?

And so explanation is really the key thing here, and it's the thing within social science research, and actually all forms of research. Whether it's social science research or the natural sciences, we care about explaining it, right? So here you can think about gravity, right? We explore how gravity works. And then we try to describe how gravity works, and we can do that, right?

We can describe the force of acceleration due to gravity, right? 9.8 meters per second squared, right? Unfortunately, we don't have an explanation for why gravity occurs, right?

We know that it has something to do with mass. We know that essentially anything that has a mass exerts gravity, right? We can say that, you know, the force of gravity is related to the mass of the objects and the distance between the objects, right?

But we still don't really have an explanation of why gravity works, right? So even in the natural sciences, explanation is the key purpose of research.

And it's the same thing in the social sciences. We really, really do privilege explanation. Explanation is very, very difficult. And that's the problem, though. Explanation is very, very difficult.

So when we undertake research, typically we undertake it with one of two different processes, right? We can either take on a deductive process or what's called an inductive process.

And you can see those processes, right? You see deduction on the right-hand side. You see induction on the left-hand side. And so when we talk about deduction, what we really mean is starting off with a theory.

Using that theory to generate a hypothesis or set of hypotheses, observing the world in some structured way. In other words, that means using some type of research method. It means either we interview people in a structured way, we watch people, we live in their neighborhoods, we live in their communities in a structured way. That's called an ethnography. We conduct an experiment in a structured way, right?

Or we conduct a survey. All of those are different forms of structured observation. It's another way of saying a rigorous collection of data, right?

Once we have that data, we move to analyzing that data, right? We analyze that data. We then make sense of those results. And then based on those results, we come to some sort of conclusions of explanations.

And we try to see if our data and our data analysis confirm, or in other words, support our hypotheses, or whether or not our hypotheses are not supported, right? We call that a deductive approach.

We move from this big idea of theory, we narrow it down to our hypotheses, we test those hypotheses using structured observation, using data, and then we refine our theory or reject our theories and/or hypotheses based upon what our data analysis shows, right? This is typically what you get taught as the scientific method in your high schools or in some basic science classes, right? So this is why we're talking about this as part of our research methods review. Now, that is not the only way to conduct science, particularly in the social sciences.

You can also do it from an inductive approach, using induction. How does this differ? Normally, an inductive approach means that you go out into the world and you watch what goes on.

And from that observation, you generate a theory about what you think is going on. And then you use that theory and you have that theory in your mind. And then you go back out into the world and you observe more of the world.

And then you refine your theory. And then you go back out into the world, you observe more data, and then you refine your theory. Typically, inductive research processes are iterative, meaning that you go out into the world, you observe the world, you go home, you refine and analyze that data, and you generate a theory from that data.

You go back out into the world, you observe it some more, you refine that theory, and you do that over and over and over again until you finally get to a point where you really feel like you have a very strong theory that is very much rooted in the data that you have been observing, right? The world that you have been observing.

Deductive approaches usually run through the process of theory, hypotheses, data collection, analysis, interpretation of results, and then support or non-support for your hypotheses and theory. It usually runs through that process and then it stops. And then somebody else also does a version of that and somebody else does another version of that. And through the collection of all of those efforts, we, you know, science advances, right? And so deductive approaches, generally those are quantitative statistical approaches.

You go through once, you end your study, somebody else does it again and replicates it or doesn't replicate it. Inductive approaches normally last longer and they are iterative. One researcher goes into the field, observes the field, comes back, revises their theory, goes out into the field, revises the theory. You can think here of somebody like, let's say, Jane Goodall, right, who went out and observed chimpanzees and learned about chimpanzees.

She watched what they were doing and spent time with them. And then she came back and she refined her theories about it. Then she went back out into the field.

She observed more. And through time and through an iterative process, she learned more and more and more about chimpanzee behavior. She generated more and more ideas and theories about chimpanzee behavior, right?

And so that would be an inductive approach, right? Oftentimes, inductive approaches are qualitative approaches. That means that you're watching and you're learning and you're writing field notes in the social sciences. That means you're interviewing people, you're spending time in their neighborhood, you're watching what's going on. And then you come back to your house and you write some notes and then you go back into the world and you observe their neighborhoods.

And so, again, the point that I'm trying to make is that there are these fundamentally different structures of research, a different process, right? We have deductive research that goes down from theory and hypotheses, testing those hypotheses with data, and then confirming or not confirming our hypotheses or theories, versus induction, which observes the world, generates a theory from that, and then goes through an iterative process to refine that, right? We can summarize this very easily by saying induction means moving from data, often qualitative observations of the world, to build theory.

And it goes from data to theory. And it does so, and I would really write this down if I were you, through an iterative process. It goes through an iterative process, right? Whereas deduction uses theory to generate a set of hypotheses that can be tested using data. It goes from theory to data, where we use theory to generate hypotheses, right?

Then we collect data, analyze the data, and sort of see if there is support for our hypotheses, right? For our purposes, statistics, and generally this is the case whether you're doing surveys to get statistics or doing experiments to get statistics, are largely deductive.

You start off with some theory, a complex explanation of relationship between variables. An example of this would be human capital theory. If you don't know about human capital theory, if you've never heard of this before, human capital theory essentially says that the inequality in things like income that we see in the world is a byproduct of the different type of capital that people bring to the labor market.

Well, what do they mean by capital? They mean the skills they bring, the level of education that they bring, the talents that they have. The more skill, the more talent, the more education you have, the higher your level of income.

And so it says that some people earn more, some people earn less based upon the amount of skills, the amount of talent, and the amount of education that they bring, right, to the labor market. From that theory, we can generate a hypothesis. A hypothesis is a tentative answer to a research problem, right? We can generate from human capital theory a very specific hypothesis.

Those with more education should earn more money, right? Then we want to get some data, right, in order to test this hypothesis. We can compare earnings of those with different levels of education, right? In other words, we can look at people with different levels of education and see their income. And based upon that, come to a conclusion about whether or not our hypothesis is confirmed or not, right?

And we're going to do this in the next slide, right? We're going to do this in the next slide. So this is the median annual earnings by education level for full-time year-round workers who are older than 25 in 2011. And so what do you see on the bottom? What are these?

These are, I'm asking you questions like I can hear you respond to this. These are levels of education. Did not graduate high school, high school graduate, some college, associate's degree, bachelor's degree, professional degree.

Now, then we can look at the bars themselves and actually the number. What does that indicate? It indicates that as education is increasing, what's happening to income?

It's increasing, right? It's increasing, right? So there does seem to be support for the hypothesis that as education increases, so too does income.

So there seems to be some support for human capital theory, right? That as education increases, so too will income, right? Here we looked at every level of education that we could find, right?

And let's be more specific. What happened? This data comes from some census data.

It's not a true census. The Census Bureau did a sample. And what do they do?

They went out and they asked people, a lot of people, what level of education they had. And the response categories, the answers that they could provide, are did not graduate high school, high school graduate, some college, no degree, associate's degree, bachelor's degree, professional degree, right? They also asked them another question.

What was that other question you think? It was, how much money do you earn, right? And then they put those two variables, right, those two ways of measuring the world together in order to look at the relationship between education and income. Does that make sense to everybody?

Now, we could look at every single level of education and look at the level of income associated with it. But the other thing that we could do is look at two specific levels, right?

So we could compare high school graduates to those with a bachelor's degree, right? And what will we see there? The exact same thing.

High school graduates, their median income, right, is $28,659. Bachelor's degree, their median income is $49,648. So even if we just looked at two particular points of this, two particular categories of education, you would still see that there is some evidence for our hypothesis that as education increases, so too does income, right? Which is why on this bottom slide, it says our conclusion was get a college degree, right?
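If you want to see that two-category comparison as arithmetic, here is a small sketch in Python. The two dollar figures are the ones from the slide; the variable names are just ones I made up for this example, and a gap between two medians shows support for the hypothesis, not proof of causality.

```python
# Median annual earnings from the slide (full-time, year-round workers, 2011).
median_earnings = {
    "high school graduate": 28659,
    "bachelor's degree": 49648,
}

hs = median_earnings["high school graduate"]
ba = median_earnings["bachelor's degree"]

# Difference and ratio between the two education categories.
gap = ba - hs
ratio = ba / hs

# The hypothesis predicts that more education goes with higher earnings.
hypothesis_supported = ba > hs

print(f"Gap: ${gap:,}")
print(f"Ratio: {ratio:.2f}")
print(f"Supported: {hypothesis_supported}")
```

So the bachelor's degree group earns about $21,000 more at the median, which is consistent with what human capital theory predicts.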

Let's continue. So I use this word variable, right? Variable, a moment ago.

What does that really do? What does that refer to? Well, variables are how we measure the social world, right?

It's how we measure the social world. So what is a variable? This is the actual definition. It is a property of people or objects that takes on two or more values, right?

It is a measurement of the social world. So for instance, do you own a car? Yes or no. Are we measuring something about the social world?

Yes. We're measuring whether or not you have a car. That's a measurement of the social world, right? We could ask additional questions after that.

What kind of car? How much is that car worth? How much did you pay for that car? Et cetera, et cetera, et cetera, right? Let's look at another example.

How long is your commute to school? And we could ask you how many minutes specifically, and you have to tell me the number of minutes, right? Are we measuring something about the social world? Yes, we are. We're measuring the number of minutes it takes you to leave your home and to get to campus, right?

That's a measure of the social world. Do you strongly agree, agree, disagree, or strongly disagree that parking should be easier? Again, are we measuring something about the social world?

Yes, we are. We're measuring your attitudes, how you feel about parking on campus, right? And whether or not you strongly agree, agree, disagree, or strongly disagree that parking should be easier, right? A variable measures something about the social world, right? In our previous example, we measured the education level of individuals.

We also measured their income, right? Those are things that we want to actually measure. Now, normally in this class, we're going to be using survey data, which means that we go out and we ask lots of people a whole bunch of questions, right? And each one of those questions you should think about is a different variable, right?

It exists as a response. So I could ask you, for instance, how excited are you for this class, right? Very excited. Excited. Not so excited.

Not excited at all, right? Sorry, that was my Eeyore voice. And that would measure, again, something about the social world, right?

And it would be in a long survey that you would answer about this class, right? And so every question on a survey is trying to measure something about the world. Now, I keep saying something about the social world. What do I mean by that?

When I say social world, normally what I mean is something about an individual, something about their background, something about what they believe, something about what they do, something about how they think about the world, right? Normally in psychology and sociology, political science, communication, economics, we're normally asking something about people, right? Or something that people do.

Or about businesses, right? So we could ask you, how long did you watch TV yesterday? Or how often did you use your phone? And all of those things are variables that we use to measure the social world. That said, we do differentiate different types of variables.

And that's what we're going to spend the next sort of almost 10 slides talking about. We call this levels of measurement. So, what type of measurement are you taking of the world?

And then we also call it the type of variable. So, every variable has a level of measurement and every variable has a type. Your job in this class is to be able to look at a question on a survey and to be able to determine what the level of measurement of that variable is and what the type of variable it is. And you will get questions like that on your exams.

So pay very close attention to what we're going to be talking about. But in addition, much of what we do later in this class is determined by or predicated on your ability to determine the level of measurement and the type of variable that you're dealing with. And if you can't do that, that means you'll do the wrong type of statistical analysis and then you'll get that wrong.

So it is very important that you understand level of measurement and types of variable. Let's get started. The most basic form of measurement is a nominal level of measurement.

This is a question that really just asks respondents to tell them what category do you fall into. And those categories have no intrinsic relationship to each other. They're just the names of a category. And you might be like, well, what the hell are you talking about right now?

Well, let's look at some examples, right? So here's the definition. Response categories have no intrinsic relationship. Values attached to the categories do not imply real values. What does that mean?

One, we could ask somebody how, you know, what their gender identification is. And this is straight from the GSS. The GSS is old school. It does not actually allow for, does not provide additional gender identity options. It just provides male and female.

In other words, it conflates, you know, gender and sex. However, if I ask you, how do you identify in terms of your gender, male or female?

Is there any intrinsic relationship between male and female? Is male greater than female? Is female greater than male? No, they're just two categories that you could fall into and you tell me which one you fall into, right? Does that make sense?

There's no intrinsic value. You can't rank them. You can't say one is more than the other.

Does that make sense? They're just named categories. Now, when you go into SPSS and you see this survey question in your data set, it records people's responses, not by writing in that you said male and this other person said female, or that you said female and somebody else said male. It doesn't record that as female and male. It says, if you said male, I'll record you as a one.

And if you said female, I'll record that as a two, right? Now, should we confuse that to say that females are twice males? No, that one and that two are just placeholders. A one tells you male, a two tells you female.

That doesn't mean females are twice males, right? They're just a placeholder for what your response is. So you cannot confuse the numbers that you see in SPSS with actually pertaining to an actual value, right? That's the second sentence.

Values attached to the categories do not imply real values. Let's look at a couple more examples. Left-handed or right-handed, right? If you're left-handed, does that make you greater than somebody who's right-handed?

If you're right-handed, does that make you greater than somebody who's left-handed? No, they're just categories that you fall into. It is a nominal category. What does nominal mean? It means name, right?

The categories are just names of categories and you tell the researcher which category you fall into. Again, SPSS may record left-handed people as zero and right-handed people as one.

Does that mean that left-handed people don't count for nothing? I'm left-handed, by the way. Well, I'm actually ambidextrous.

I play basketball with my left hand. And yes, I got that J for you. Excuse me, sorry. And a one means right-handed. Does that mean right-handed people actually count for something and left-handed people don't?

No, that's not what that means. It's just a placeholder in our data set, a placeholder for our data set about which response that you gave. Let's keep going. How about race, ethnicity, and nationality, right? In the GSS, the race variable is actually a very bad question, but here's what it says.

Maybe write this down as I say it. Now, which race do you identify with? One, white. Two, black. Three, other. Now, is there any intrinsic ranking of those categories? Is white greater than black or black greater than white or black greater than other?

No, they're just named categories that the respondent is putting themselves into, right? Does that make sense, right? Now, I mean, unless, of course, excuse me, you're a racist or a white supremacist. In this case, that's a whole other question, but still at the level of statistics, that variable is a nominal level variable. If I ask you, what country were you born in?

Now list out all the countries in the world. Is there a necessary relationship between all those countries? No, there's not.

There's absolutely not. They're just named categories that you identify yourself with and put yourself into.

This is what we mean by the nominal level of measurement. Nominal variables contain the least amount of information. Why do I say that?

Because all we know is that you identify or you put yourself into a category that fits a particular name, right? Whether it's male, right-handed, left-handed, that you're born in the United States or that you were born in Mexico or Costa Rica or... the Dominican Republic or whatever it is, right?
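To make the placeholder idea concrete, here is a minimal Python sketch. It is not SPSS and not the GSS file: the codebook and the five responses are made up for illustration. It shows that counting people per category is meaningful for a nominal variable, while doing arithmetic on the codes is not.

```python
from collections import Counter

# Hypothetical codebook: the numbers are labels, not quantities.
GENDER_LABELS = {1: "male", 2: "female"}

# Made-up responses as they might be stored in a data file.
responses = [1, 2, 2, 1, 2]

# Meaningful for a nominal variable: how many respondents are in each category.
counts = Counter(GENDER_LABELS[code] for code in responses)
print(counts)

# Not meaningful: arithmetic on the codes. This "average" describes nobody;
# it only reflects how the categories happened to be numbered.
misleading_mean = sum(responses) / len(responses)
print(misleading_mean)
```

If the same categories had been coded 0 and 1 instead, the counts would be identical and the "average" would change, which is another way of seeing that the codes carry no real value.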

And so nominal level variables contain the least amount of information for us. This is different than ordinal variables. Ordinal variables ask questions where the responses can be ranked. The responses can be ranked. But we normally don't know the distance between the categories.

We don't know how far the distance is, right? What do I mean by that? Let's take a look at some examples. So a perfect example of this in the GSS is class identification.

The responses to the class identification variable are, do you identify as working class, middle class, or upper class? Now, can we rank your responses here? Now we can rank the response, right?

We know that working class is a lower class than middle class. And middle class is a lower class than upper class, right? Does that make sense?

So now, not only do we have information about which category that you go into, we can also rank you versus other people. So one person identifies as working class, another person identifies as middle class. We can actually arrange you in order. Does that make sense, right?

And now, the numbers, they don't actually mean anything, but they do go in order. You know, one is less than two, two is less than three. Do you see that? Let's take another example.

How about strength of approval, right? Strongly approve, approve, neither approve nor disapprove, disapprove and strongly disapprove, right? So if I ask you a question, parking at Cal State Philatelan should be easier and you say strongly approve and another person in this class says disapprove.

Not only do we know you're in different categories, we actually know that one of you has more approval of trying to make parking easier, whereas the other one has much less approval. They actually have disapproval of actually making parking easier, right? Ordinal variables allow you to order the responses, right?

It contains more information. Not only are there names associated with the categories, but those categories can be ranked in some way. And so we cannot figure out the level of measurement just by looking at the way a variable is asked, right? We must be able to see the response categories, right?

These are the response categories. You need that to see how the responses are structured in order to determine what the level of measurement is. And again, if I were you, I would be writing all of this down, because this is going to be some of the most important stuff we cover in this class. Now there's one more level of measurement that we're going to deal with in this class, and that's an interval ratio variable. It's actually two different levels of measurement, interval and ratio. We're going to combine them together into interval ratio.

Interval ratio variables are essentially any numerical variable. Not only can we rank people, but we can also tell the exact distance between those rankings. And you might be like, what the hell are you talking about? But if I ask you, how many years of, if I ask the class, how many years of education do each of you all have, right? And then I look at that data and some of you all have 14 years of education.

Others of you all have 15 years of education or 16 years of education. Do I know that those who have 16 years of education have more years of education than those who have 14? Yes, I do.

And do I know exactly how much more they have? I do. It's two additional years. Does that make sense? So now we actually know the distance between the rankings, right? And just so that we're clear, interval ratio variables are not categorical.

They're not going to provide categories. They're numeric, right? They're going to provide numbers.

So for instance, if I ask you, how many dollars of income did you earn last year, right? And I say, tell me the actual number of dollars. I, you know, I earned, I don't know, $35,000 last year, right? And I'm obviously I'm making the number up. By the way, you do know that every professor and every person who works at Cal State Philatelan, that their income is publicly available, right?

So there's websites where you can go and look that information up, right? Anyway, that's just a side point. Sorry, I don't mean to go off on a tangent.

Dollars of income. I'm asking you dollars, right? I'm asking you about a specific number.

Can you rank that number? Yes, you can. Do you know the distance between those things?

Yes, you do. Therefore, whenever you see an actual physical number, that is an interval ratio level of measurement, right? If I ask you, how many minutes did you spend watching TV yesterday? And you say, oh, I watched two hours of TV.

That's 120 minutes, right? That's an interval ratio level of measurement. Interval ratio levels of measurement have the most amount of information. Ordinal variables have the sort of the middle amount.

And then nominal variables have the least amount of information because they're just named categories. Does that make sense? It's very important that you understand this. It's very important that you understand this. Now, keep something in mind here.

Let's imagine that I ask you for dollars of income. I said, how many dollars of income did you earn last year? But instead of asking you for the specific number, I said, was it between zero and 5,000? Was it between 5,000 and 15,000? Was it between 15,000 and 35?

Was it between 35 and 50? Was it between 50,000 and 100,000? What if I gave you that version of the variable? What level of measurement would that be?

Am I, I'm asking this as if I can hear your response. I just want you to think about it. I mean, maybe write that down.

Would the person answering that question write down a specific number? Or would they choose from an ordered set of categories? And those categories just happen to contain numbers, right? So imagine that I ask that question in the survey I sent out to the class, right? One person says they earned zero to $5,000 last year, right? So they're in category one. And another said they were in category four, that they earned 35 to $50,000.

Do I know the difference in income between those two people? Or do I just know that they're in two different groups and one earned more than the other? It's the second one, right?

That version of asking questions about dollars of income, where I have categories, is an ordinal variable, because you're asking the respondent to put themselves into a category that can be ranked, but you don't know the distance between those categories. If you ask them for the specific dollar amount they earn, that is an interval ratio level of measurement. Clear? Okay, let's keep going. Let's turn to types of variable.
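The income example can be sketched in a few lines of Python. The bracket boundaries come from the spoken example. The function name, the two respondents, and the choice to put a boundary value such as exactly 5,000 in the lower bracket are assumptions made only for this illustration.

```python
# Bracket codes from the spoken example:
# 1: 0-5,000   2: 5,000-15,000   3: 15,000-35,000
# 4: 35,000-50,000   5: 50,000-100,000
BRACKET_UPPER_BOUNDS = [5_000, 15_000, 35_000, 50_000, 100_000]

def income_bracket(dollars):
    """Return the 1-based bracket code for an income in dollars (hypothetical helper)."""
    for code, upper in enumerate(BRACKET_UPPER_BOUNDS, start=1):
        if dollars <= upper:
            return code
    raise ValueError("income above the highest bracket in this sketch")

# Two made-up respondents.
income_a, income_b = 3_000, 48_000

# Interval ratio version: we can rank them and we know the exact distance.
exact_gap = income_b - income_a
print(exact_gap)

# Ordinal version: only the category and the ranking survive.
code_a, code_b = income_bracket(income_a), income_bracket(income_b)
print(code_a, code_b, code_b > code_a)

# Someone earning 36,000 gets the same code as respondent B, so the codes
# alone cannot recover the dollar gap between two respondents.
same_bracket = income_bracket(36_000) == code_b
print(same_bracket)
```

Going from dollars to brackets throws information away, and there is no function that goes back from the bracket code to the exact dollar amount.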

Types of variable. Again, you want to pay attention to the responses that the question provides. If a question only allows for two different possible answers, it is a dichotomous question. If those answers are coded as 0 and 1 in SPSS, we call it a dummy variable.

And I'm going to show you an example of that in just one second. So let's look at an example. Did you vote in the last presidential election? Why is this dichotomous? There are only two answers, right?

No and yes. No and yes. Do you see that? And again, if I was putting this information into SPSS, if you yourself said no, would I write in no? No, I would put in a one and say that stands in for no.

If somebody else said yes, I would put in a two. So is this a dummy variable? No, it's not because it's not coded zero and one.

If I put in a no response in SPSS as a zero and a yes as a one, then it would be a dummy variable. You may be asking yourself, why the hell does that matter at all? Why is he making this distinction? It will become clear to you later on in this course why I'm making this distinction for you. For right now, your job is just to know that a dichotomous variable is any question that only has two responses.

Only has two responses. It can be yes, no. It can be approve, disapprove, right? So if I ask you, do you approve of the job that Trump is doing as president currently, and it was either approve or disapprove, that would be a dichotomous variable, right? If you saw that approve was coded as one and disapprove was coded as zero in SPSS, you would then call that a dummy variable.

And just to be clear, and I think you should write this point down, all dummy variables are dichotomous, right? All dummy variables are dichotomous, right? Because there are questions that have only two responses.

But not all dichotomous variables are dummies, right? Because the only dichotomous variables that are dummy variables are the ones that are coded as 0 and 1 instead of whatever else you want to do.

Now, where those numbers come from, when you the researcher are collecting the data, you get to determine how you put that information into your data collection software. You get to make that decision. You're gonna hear me say this all the time in this class, is that researchers get to make the decision about how they do many of the things they do in statistics, right? You just have to have an explanation or justification for why you did it.

So to repeat, the first type of variable we're talking about is dichotomous. It is any question that has only two responses. If those two responses are coded as 0 and 1 in SPSS, you would consider it not just a dichotomous variable, but a dummy variable. Okay, let's keep going.
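The dichotomous versus dummy distinction can be written as two small checks. This is a Python sketch rather than anything SPSS does: the function names and the voting data are made up, and the checks look at the codes that actually appear in the data, which assumes both answers were given by at least one respondent.

```python
def is_dichotomous(codes):
    """True if the variable takes exactly two distinct values."""
    return len(set(codes)) == 2

def is_dummy(codes):
    """True if the variable is dichotomous and coded exactly 0 and 1."""
    return set(codes) == {0, 1}

# Made-up voting responses coded 1 = no, 2 = yes, as in the spoken example.
voted_1_2 = [1, 2, 2, 1, 2]

# Recode to 0 = no, 1 = yes. The mapping is the researcher's decision.
RECODE = {1: 0, 2: 1}
voted_0_1 = [RECODE[code] for code in voted_1_2]

print(is_dichotomous(voted_1_2), is_dummy(voted_1_2))
print(is_dichotomous(voted_0_1), is_dummy(voted_0_1))
```

Both versions are dichotomous, but only the recoded one is a dummy, which matches the rule that all dummies are dichotomous but not all dichotomous variables are dummies.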

We also have categorical variables. Any variable that has categories as its responses, typically nominal and ordinal variables. I just want to be clear about this. If you can identify that a variable is nominal or ordinal, it will be a categorical variable. It just says that the question has a response that is broken down into categories, right?

So we talked about the race variable, right? We say that the responses to it are one white, two black, and three other, right? Are those categories that people fall into?

Yes, they are, right? So it's a categorical variable. We also said that those are just categories that cannot be ranked, right?

So we can actually say that with race, it is a nominal categorical variable. You put the two things together. In this class, on the exams, I will give you an actual question from a survey, and you will need to be able to identify the level of measurement of that variable and the type of variable that it is, right?

So how about highest degree achieved, right? Do you remember that from our example earlier on where we had less than a high school degree, high school degree, some college, associate's degree, college degree, and then professional degree? Are those answers, those responses categories?

They are, aren't they? Right? So that would be a categorical variable. What level of measurement would it be? Well, can we order those responses?

Does less than a high school degree have less education than someone with a high school degree? Yes. Does someone with a high school degree have less education than someone with a college degree?

Yes. Right? So the second type of variable that we deal with is categorical.

Categorical. Any nominal and ordinal variable will be a categorical variable by definition. And if I were you, I would write that sentence down and put it in my notes and have it ready for an exam.

Okay. The next two types fall under the heading of numeric variables, right? What level of measurement do we typically associate with numbers that are actually numbers?

It's interval ratio, right? So interval ratio variables are numeric, and they can be either numeric discrete, which is what this is, right? A numeric variable whose units cannot be subdivided.

Often, but not always, this means the numbers must be whole numbers, right? Can you think of an example of this? What place did you finish in the race?

Can you come in 1.5 place? You cannot. Can you come in 3.5 place or 3.1 place or 3.8 place?

You cannot, right? You need to come in first place or second place or third place, right? So, but if you come in first place, right, that's a number that can't be subdivided. How many children do you have, right?

Is that a number? It is a real number. I have two, I actually have zero kids, but... How many kids do you have? I have two kids.

Is it possible for you to have 2.2 kids, 2.3 kids? Not unless you are a terrible, terrible person doing terrible things to your children. No, I'm just joking.

No, it's not possible. It must be a whole number. You cannot subdivide children.

How many people live in your home? Does somebody only halfway live in your home? Either somebody lives in your home or they don't live in your home, right? You cannot subdivide. Is it a number?

Yes, right? So in other words, interval ratio variables are always, by definition, numeric variables. They are questions that produce numbers as their responses, right?

Those numbers can sometimes not be subdivided. We call those discrete, right? And you have to be able to identify when an interval ratio variable cannot be subdivided, and then you will label that discrete. So if one version of it cannot be subdivided, what do you think the other version of it is? It is, I'm actually missing a slide here and I apologize.

The other version of this is, and please write this down, is continuous. I'm going to write it here right now and I'll update this. We're going to add a new slide right now. You get to watch me do this. Uh oh, delete slide.

You get to watch me do this. Types of variables. I'm going to say this one is continuous.

A numeric variable whose units can be subdivided. E.g., why would it, that's weird, height, weight. Can we subdivide height and weight?

Can we subdivide time? We can, right? We can break down hours into minutes. You can have half a minute, half a second, a quarter of an hour, right? You can have half an inch, a quarter of an inch, right?

So to be clear, numeric variables, in other words, interval ratio variables, can either be discrete or continuous, discrete or continuous, right? How tall are you?

Oops, I have never done this before. Feet, inches, etc., pounds or kilograms if you're that type of person. Hours, minutes, seconds. Do those always have to be provided as whole numbers?

They do not. If I asked you how long were you on Twitter yesterday and you said half an hour, right? That makes sense, right? That makes perfect sense. You can say that, right?

So it can be subdivided. So your job is to be able to look at a variable and figure out if it produces a number, right? Meaning it's numeric. Can it be subdivided or can it not be subdivided?

You're going to have to be able to do that. So if you have any questions about that, please email me. Not only do we have to be able to identify the level of measurement of a variable, we also have to be able to identify the type of variable that we use. You need to know both of those things, and if you have any questions about them, please email me. Please, please email me.
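As a study aid, the two labels every variable gets can be laid out as a small lookup. This is a Python sketch: the class name, field names, and helper are made up, and the examples are the ones used in this lecture. Dichotomous and dummy variables are left out to keep it short.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariableInfo:
    question: str
    level: str  # "nominal", "ordinal", or "interval ratio"
    kind: str   # "categorical", "numeric discrete", or "numeric continuous"

EXAMPLES = [
    VariableInfo("Which race do you identify with?", "nominal", "categorical"),
    VariableInfo("Class identification (working, middle, upper)", "ordinal", "categorical"),
    VariableInfo("How many children do you have?", "interval ratio", "numeric discrete"),
    VariableInfo("How long were you on Twitter yesterday?", "interval ratio", "numeric continuous"),
]

def type_is_consistent(info):
    """Rule from the lecture: nominal/ordinal -> categorical, interval ratio -> numeric."""
    if info.level in ("nominal", "ordinal"):
        return info.kind == "categorical"
    return info.kind.startswith("numeric")

for info in EXAMPLES:
    print(f"{info.level:>14} | {info.kind:<18} | {info.question}")
```

The helper only checks that the two labels agree with each other. Deciding which labels apply still means looking at the response categories of the actual survey question.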

However, in this class, we normally don't care about a variable by itself. So what do I mean by that? If I ask the class about education, and I want to understand the distribution of education, that's perfectly fine and understandable, and we will do some of that in this course.

But normally, we care about the relationship between two variables. Does that make sense? Like we might care about the relationship between education and income, right?

And so normally, what we think about is not just understanding the distribution of one variable, but rather understanding the relationship between two variables. Does that make sense? And so we have a language for understanding those two variables, but also a language for understanding the relationship between them. And that's what we're going to talk about now, how variables get used in research. So let's go to that.

You may have heard the terms independent and dependent variables before, right? An independent variable is usually the variable that we think of as the cause, you know, as something that produces an effect, right? We usually think of this as the cause.

Cause. And we usually refer to it as X, right? Then we also have a dependent variable. And that's the variable we look at to see if there is an effect, right? We refer to that as Y.

Normally, the thing we're most interested in, the thing that we care about is Y, the dependent variable, right? For instance, in my dissertation study, I was really interested in why people take out a payday loan. And so my dependent variable that I looked at was, did you take out a payday loan?

That was my dependent variable. And so then I thought of other variables that I thought might have an impact on my dependent variable. So one of the independent variables I looked at was savings. And so I was interested in understanding the relationship between savings and taking out a payday loan.

Does that make sense? I thought that whether or not you would take out a payday loan would be dependent upon the level of savings that you have. Make sense?

Make sense? Let's look at another example. We talked about human capital theory just a little while ago, right?

We had a hypothesis that as education increases, income would also increase, right? What are the dependent and independent variables there?

Well, there we want to see if increasing education produces a rise in income. We think that income is dependent upon education, right? So education is my independent variable and income or earnings is my dependent variable, right?

And so oftentimes what we want to know is what is the relationship between my independent variable and my dependent variable. More specifically, the thing that we really, really care about is determining if the relationship between my independent variable and my dependent variable is causal. It's causal. And I would write this down. If I were you, I would put a star next to it, an exclamation point.

I would write this down. So what do I mean by this? What I mean is what we would like to be able to say is that it is in fact rising education, increasing education, that actually causes income to increase, right?

That actually causes income to increase, right? Why do we care about that? Because in many ways, if we can identify that there's a causal relationship between the two variables I'm interested in, between my independent variable and my dependent variable, what have I done?

I've gone a fair way towards explaining why income is what it is, right? And we said early on in this lecture that explanation is one of the most important, it is the most important purpose of research.

So identifying causal relationships is one of the key portions of explaining, explaining things in the social world. So we privilege causality over just about everything else. Does that make sense to everybody?

We want to actually be able to identify causal relationships. How do we know when we have identified a causal relationship between two variables? Well, three things have to hold. There has to be a correlation between the two variables. There must be the appropriate time order between the two variables, and we must have non-spuriousness between the two variables.

Again, if I were you, I would write these down so that you have them for easy review later on. Let's go through each one of these together. Correlation. We must establish that there is actually a relationship between x and y.

We must establish that there is a relationship between x and y. X and Y have to vary together. They have to co-vary.

That when one goes up, the other one, the other variable has to either go up or go down in a patterned way. Patterned way, right? So we have to see if there's a relationship between the two variables, right? That's what we were doing with education and income, wasn't it? We looked at as education went up, right?

As education went up. Did income go up, right? And we looked to see if there was a correlation between those two.

And we did, in fact, see that, right? Now, here's the key question, right? Here's the key question. And you probably have heard this before. Does correlation equal causation?

It does not, right? It does not. Correlation does not equal causation.

That's actually what we're talking about here, right? Correlation is just one part of identifying causation.

We need additional things to be able to identify that, right? We need non-spuriousness and we need time order, right? However, we have to have correlation in order to get causation, but correlation by itself is not enough.

In other words, for those of you comfortable with necessary and sufficient clauses, right, or conditions, correlation is a necessary but not sufficient condition for causation, right? We must have correlation to have causation. However, correlation by itself is not enough to prove causation. Much of what we do in this course is coming up with different ways of measuring whether or not a correlation exists between two variables.

We're going to be dealing with that throughout this entire course. What do we know? If a correlation exists between these two variables, how do we figure that out?
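One common way to put a number on whether two numeric variables vary together is the Pearson correlation coefficient. The Python sketch below is only an illustration: the lecture has not said which measures this course will use, and the education and income figures are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # How much X and Y move together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)

# Made-up data for illustration only: years of education and income in dollars.
education = [12, 12, 14, 16, 16, 18]
income = [28_000, 31_000, 35_000, 47_000, 52_000, 60_000]

r = pearson_r(education, income)
print(round(r, 2))
```

A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear pattern. Even a value close to 1 says nothing yet about time order or non-spuriousness.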

Much of statistical analysis is determining if there actually is a correlation between two things, because it turns out that that can be really quite complicated, right? And later on we'll be able to determine if there is a relationship between them, and when we get near the end of the course we'll get some very powerful tools to determine if there exists a correlation. Causation is still a deeper and broader problem, and we'll deal with that later on. But let's continue with our criteria. So correlation is the first. The second is time order.

We must establish that one phenomenon that we identify as the cause occurs before the other phenomenon. Cause must come before effect, right? So let's think back to our education and income slide, right? What if we found out that people's income was rising prior to their education increasing, right? How would it be possible then for education to be the explanation for why their income increased? You see what I'm saying? So the change in your independent variable has to come before the change in your dependent variable, right?

The cause must come before the effect. Cause must come before the effect, right? Our final criterion is non-spuriousness.

This is the most complicated one. It says, we must establish that the relationship between the two variables, X and Y, is not caused by some other third variable. Time order is one real threat to identifying causality, but it's really the criterion of non-spuriousness that's the major hurdle for survey research. Well, one of the major hurdles. I would argue it is probably the most important problem in identifying causal relationships in survey research. But first, what do we mean by non-spuriousness? This graph is all well and good.

What do we mean? Let's look at an example. So imagine a researcher takes a survey of 10,000 respondents ages 3 to 21. She discovers a very strong correlation between shoe size, her independent variable, and score on a vocabulary quiz, Y.

And she determines that scores seem to rise with shoe size. In other words, as X increases, shoe size, score on a vocabulary quiz, the dependent variable, also seems to rise, right? Also seems to rise.

Well, why do you think that is? Can you think of, do you think that shoe size is causing, is causing, you know, people to have better vocabulary? Are just people with larger shoe sizes smarter than people with smaller shoe sizes?

Is there something else that might be causing this effect? Does some other variable explain the relationship between shoe size and score on a vocabulary quiz? I suggest to you that there is.

Can you think of it? What about... age, right? Is age another variable that may be operating here?

Yeah. What happens as people age? They grow, right? What happens to their shoe size as they grow? Their shoe size increases, isn't it?

Doesn't it? Right? What also happens as people get older, right?

Their brains develop, they learn new things, right? And so as they get older, they get wiser and smarter so they can answer more questions on a vocabulary quiz, right? So what's going on here? There is a third variable at play here, age, that explains the relationship between shoe size and vocabulary score.

It is not shoe size that is the main issue here. It's actually age, right? And so what would we do? We would... add age to the analysis, right?

And then try to see if there still exists a relationship between shoe size and vocabulary scores. Well, what would that look like? If we looked at just people who were 11 years old, right? Or who were 18 years old, if you prefer, and we just looked at 18 year olds, would we have any belief or prediction that shoe size will be related to vocabulary score? No, right? There's no reason for us to believe that shoe size has anything to do with your vocabulary, right? Nothing.

All of that was from the outcome of this other third variable, age, right? So we would actually describe the relationship between shoe size and score on the vocabulary quiz as spurious. It is a spurious relationship.
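The shoe size story can be played out with toy data. In the Python sketch below, every number is made up: age drives both shoe size and vocabulary score, and the leftover variation in each is constructed so that the two are unrelated once age is held fixed. It is a simplified illustration of adding age to the analysis, not a real survey.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)

# Four made-up respondents per age; their individual quirks in shoe size
# and vocabulary are deliberately unrelated to each other.
QUIRKS = [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]
respondents = [
    {"age": age, "shoe": 0.5 * age + shoe_quirk, "vocab": 2 * age + vocab_quirk}
    for age in range(3, 22)  # ages 3 through 21
    for shoe_quirk, vocab_quirk in QUIRKS
]

def shoe_vocab_r(rows):
    """Correlation between shoe size and vocabulary score for the given rows."""
    return pearson_r([p["shoe"] for p in rows], [p["vocab"] for p in rows])

overall = shoe_vocab_r(respondents)
only_18 = shoe_vocab_r([p for p in respondents if p["age"] == 18])

print(round(overall, 2))  # strong positive correlation across all ages
print(only_18)            # no correlation once we look at a single age
```

Across all ages the correlation is strong, but among 18 year olds alone it disappears, which is the pattern we would expect when a third variable is producing the relationship.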

And so therefore, we would conclude that there is no causal relationship between shoe size and vocabulary score, right? There's another very famous version of this. Some of you may have heard of this before, which is that if you go and you look at the data, you can actually see that as temperatures rise, rather, excuse me, I just gave away the answer to this, but as ice cream consumption increases, so do homicides. As ice cream consumption increases, so do homicides.

And this is a correlation we can actually measure in the social world. Does this mean that we think that people who eat large amounts of ice cream are more likely to go on a homicidal killing spree? Is that what we really believe?

No. Well, what's going on here? Well, when do people consume more ice cream? When it's hot outside, right? And when it's hot outside, or warmer outside, what do you do? You spend more time outside, right?

You walk to the store, you go out, you go to the park. There's just more time spent outdoors. More time spent outdoors means more potential for conflict.

More potential for conflict means more homicides, right? So what would we say? That the correlation that we see between ice cream consumption and homicide rates is spurious. It's not ice cream consumption that produces, you know, increased levels of homicides.

It's that the increased temperatures outside means that there's increased contact between people. Those increased contacts means greater risk of conflict. Therefore, you get more homicides, right?

The relationship that we observe in the social world between ice cream consumption and homicides is spurious, right? In order to identify a true causal relationship, we have to be able to say that there is a correlation between X and Y, that the change in the independent variable comes before the change in the dependent variable, and that there's no spurious relationship between the independent variable and the dependent variable.

There's no other third variable that explains the correlation we see between the independent variable and the dependent variable, right? We typically, in the social sciences, we are very interested in it. We are privileging of identifying causal relationships.

It's often our number one goal. Is it our only goal? No, it is not our only goal.

There are other goals that we might care about. We might also care about how well we can understand that relationship. Because just because we identify a causal relationship doesn't mean that we actually can explain fully what's going on there. It takes us a certain way towards it, right? But it doesn't mean that we fully understand it and have a deep understanding of why, for instance, education causes income to rise, right? Or why social contact leads to greater chances of homicides, right? Sometimes we may privilege a deeper understanding of what's going on versus identifying causal relationships, right? So we sometimes want deeper understandings.

Sometimes we want to understand if the thing we see is generalizable. Does it happen in every time and space, right? So does increased social contact lead to higher rates of homicide just in the United States?

Or does that hold in, say, I don't know, Sweden or Western Europe where there are lower rates of, say, gun ownership, right? Do higher levels of education cause greater levels of income in every country in the world? Or is it only the United States where we privilege education, right?

Sometimes we care about generalizability more than we care about causality, right? And so we're going to talk about that in just one second. But so far, we have laid out what a variable is and the type of or the levels of measurement of variables and the type of variables.

We also laid out how those variables get used in social research to identify relationships that we care about, right? You know, if there's a correlation between them, or, more importantly, a causal relationship between them.

But we also have to do more work in how we think about and use variables in our research. And we call that process conceptualization. In other words, we actually have to conceptualize and think about, right, the variables that we're using and what they mean.

Let me give you a brief example of this before we continue and take a more in-depth look at this. Earlier in this lecture, we talked about human capital theory, right? And we tested it, right? Looking at two variables, education and income, right?

Well, really, we just used a variable that measures education and a variable that measures income. And we used that to actually get at a much broader and more meaningful set of relationships, didn't we? We actually used education as a measure for the amount of human capital that somebody has, right? We conceptualized education, right, as not just a measure of what level of education you have, but rather it stood in for a concept that we call human capital, right?

And so when we do research, not only do we have to, like, identify the two variables we want to do an analysis of, we also have to say what those variables mean. That is the process of conceptualization. You'll also hear it talked about as operationalization, right? How do we take an idea and operationalize it, right?

It's the same thing we did for income, right? We just had a survey question that asked, how much income do you earn, right? But we actually used that as a stand-in, right, for this idea of how much money you earn in the labor market, right?

And so there's all of this work that has to get done to actually say what education and income actually mean. And we will use the term conceptualization and operationalization to actually deal with that issue. So let's talk more about this. What do we mean by conceptualization?

It is the process by which we define the general things that we are interested in. Here, I'm going to use a different example than human capital theory, just so that you have a different idea, right?

Let's say, for instance, we are interested in the concept of political engagement, political engagement, right? Well, we have to start off with defining what we mean by that idea, right? Well, what do we mean by it? How involved is someone in political activity? That is the concept that we're going to want to investigate, right?

What do we have to do to actually get at that, to actually investigate it? Well, the first thing is we have to define our concept, right? In this case, we have identified political engagement as our concept, and we have defined it as how involved someone is in political activity.

Then we have to identify dimensions of that concept. Those are the different aspects of a single concept that help us measure the concept in its totality. In other words, if I can use a shorthand, what the hell do you mean by political engagement?

That can mean a bunch of different things, right? More specifically, what do you mean by political activity in your definition of political engagement, right? So here are some examples.

Well, voting, right? Is voting a form of political activity? Yes, it is.

And, you know, that encapsulates voting behavior, right? Information, right? We may include political information gathering as a form of political activity, right?

To keep yourself informed, you might have to go out and learn things about what's going on in politics, right? Have you ever contacted your political representative, right? Is that a form of political activity?

Sure it is, right? Elections. Have you ever worked on an election?

Is that another form of political activity? It is, right? And so what have I just done?

I have defined political engagement as my concept and I've given a definition of it, right? Then what did I do? I identified different dimensions that I think help me show what I mean by political activity and political engagement. I identified four dimensions: voting, information, contact, and elections, right?

Those are the different dimensions that I've identified. Now, what do I need to do? I need to actually construct or identify indicators of those dimensions of my concept.

What is an indicator? It is a variable that indicates the presence of the concept we have chosen to study. This is where measurement happens.

What do we say a variable is? A variable is a way of measuring the social world, right? Of measuring the social world.

In other words, you know, variables are indicators, right? So, let's say I'm interested in the dimension of voting. The dimension of voting. How would I measure whether or not somebody voted or not? What question would I ask?

Well, I could ask, do you vote? Does that make sense? If I ask somebody, do you vote? Have I measured something about the social world?

I have. I've measured whether or not that respondent engages in one form of political activity, voting, right? And we defined voting as one form of political engagement, right?

Now, is that a really good version? Is that a really good way of measuring whether or not somebody is involved in voting? I don't think so. We can make it a little bit better.

How about, did you vote in the last election? Is that a little bit better? Yeah, because it gives us a little bit more information, right? By the way, what would you say would be the response categories for this? Probably yes or no, right?

So what type of variable would it be? It would be a dichotomous variable, right? How about this?

Did you vote in the last presidential election? Is this even a better version of the question? Yeah, because it specifies which specific type of voting they engaged in.

And really, we always want to make sure that we're asking the right questions to make sure we're measuring the thing that we really care about. We want to make sure we're measuring the world in the way that we expect to be measuring, right? And we want to have different indicators for different dimensions, right? And here, I want to take a second and point out to you something that I think is really important for you to think about, okay?

And this is a difference between different types of variables when we try to measure the social world, particularly when it comes to individuals and what we measure. Sometimes we measure attitudes and sometimes we measure behaviors. And what's the difference between those two? What's an attitude, do you think?

An attitude is measuring how people think or feel. So what's an example of this with respect to voting behavior, or rather to political engagement? Do you think it's important to be informed about policy before voting?

And you can imagine, you know, different responses to this. Yes or no. Or it's very important to be informed.

Slightly important. Not important at all. Right. You can imagine different responses to this.

Right. Do you think it's important to be informed about policy before voting? But is this measuring an attitude or behavior?

It's measuring an attitude, right? It's just asking what they think about the world, not what they do in the world, right? And you have to be clear about that. Well, what would be the version of a question that actually gets at behavior, right?

What do we even mean by behavior? Behaviors measure what people actually do. Did you read about the issues before you voted in the last midterm election?

Do you see the difference between those two things? One thing measures attitudes, the other thing measures behaviors. You always have to be clear with yourself about which one of those things you actually care about. Measuring attitudes is not measuring behaviors, right? You'd also be very surprised about how often behaviors don't match up with attitudes.

And so sometimes it's important to measure both attitudes and behaviors, right? And so, for instance, asking, do you think it's important to vote? And somebody says yes, does that mean that they actually voted? No, it does not mean that, right?

Asking people if they voted, does that actually answer the question about whether or not they think it's important to vote? It does not. You would want to have two separate questions for those things.
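As a small illustration of why attitudes and behaviors need separate questions, here is a sketch in Python that cross-tabulates one attitude item against one behavior item. The six respondents and the field names (`attitude_important`, `behavior_voted`) are made up for illustration; they are not from any real survey.

```python
from collections import Counter

# Hypothetical survey responses (invented for illustration).
# attitude_important: "Do you think it's important to vote?"
# behavior_voted:     "Did you vote in the last presidential election?"
respondents = [
    {"attitude_important": "yes", "behavior_voted": "yes"},
    {"attitude_important": "yes", "behavior_voted": "no"},
    {"attitude_important": "yes", "behavior_voted": "no"},
    {"attitude_important": "no", "behavior_voted": "yes"},
    {"attitude_important": "no", "behavior_voted": "no"},
    {"attitude_important": "yes", "behavior_voted": "yes"},
]

# Count each (attitude, behavior) combination.
crosstab = Counter(
    (r["attitude_important"], r["behavior_voted"]) for r in respondents
)

# Respondents whose stated attitude does not match what they did.
mismatches = sum(n for (att, beh), n in crosstab.items() if att != beh)
share_mismatch = mismatches / len(respondents)

for (att, beh), n in sorted(crosstab.items()):
    print(f"thinks voting is important: {att:3}  voted: {beh:3}  count: {n}")
print(f"share whose attitude and behavior differ: {share_mismatch:.2f}")
```

If we had asked only one of the two questions, the mismatched respondents would be invisible, which is the reason for keeping attitude and behavior as two separate variables.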

So it's very important for you always when you're thinking about the relationship between variables, when you're thinking about what a variable means, right, which is what we're doing right now. You should really think about, well, is the thing you're looking at, is it an attitude or is it a behavior? And how should you make sense about that, right? So let's return to conceptualization for a second, right?

And I want to make a broader point. Thinking about research. Concepts are abstract, right? They're a broad definition of the concept that you are interested in. Political engagement.

I think of political engagement as how involved is someone in political activity. Dimensions are a little bit less abstract, right? Because you're identifying the specific different dimensions that make up the idea, the concept that you are really interested in.

Then indicators are very concrete. Why? Because they actually measure the presence of that dimension of your concept. They are actually where the rubber hits the road, right?

Here is where the rubber hits the road. How am I going to measure their voting behavior? How am I going to measure their information gathering behavior? How am I going to measure their contact, whether or not they contact representatives?

How am I going to measure whether or not they work on elections, right? Indicators are the variables you use to measure your dimensions of your concept, right? And so then what do you end up with?

You end up with something that looks like this. Here is my very broad concept, political engagement, that I have identified as having these different dimensions, right? Voting, information, contact, and elections, right?

And then for each one of my dimensions, I have at least one indicator that measures the presence, right, of that dimension. Did you vote in the last election?

If they say no, right, does that respondent, are they engaged in that specific dimension of political engagement? No. If you ask them, do you follow any political blogs and they say yes, right?

Are they involved in the information gathering portion of political engagement? They are. The number of minutes spent watching C-SPAN last week?

If they say 100 minutes, right? Are they engaged in the information gathering portion of political engagement? They are. If they say they gave feedback to their congressman on pending legislation, are they involved in that dimension of political engagement?

They are. If they say they have not ever worked on an election campaign, are they involved in the elections dimension of political engagement? They are not.
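The concept, dimension, and indicator layers just described can be written down as a small data structure. This is only a sketch: the indicator names such as `voted_last_election` are hypothetical labels for the survey questions mentioned in the lecture, and the single respondent is invented to match the answers walked through above.

```python
# Concept -> dimensions -> indicators, following the lecture's example.
# Indicator names are hypothetical variable names, not real survey items.
concept = {
    "name": "political engagement",
    "definition": "how involved someone is in political activity",
    "dimensions": {
        "voting": ["voted_last_election"],
        "information": ["follows_political_blogs", "cspan_minutes_last_week"],
        "contact": ["gave_feedback_to_congressperson"],
        "elections": ["worked_on_campaign"],
    },
}

# One invented respondent, with the answers used in the walkthrough.
respondent = {
    "voted_last_election": False,
    "follows_political_blogs": True,
    "cspan_minutes_last_week": 100,
    "gave_feedback_to_congressperson": True,
    "worked_on_campaign": False,
}


def engaged_in(dimension, answers):
    """True if any indicator for the dimension shows activity.

    A yes/no indicator counts when it is True; a numeric indicator
    (such as minutes watched) counts when it is greater than zero.
    """
    return any(bool(answers[ind]) for ind in concept["dimensions"][dimension])


profile = {dim: engaged_in(dim, respondent) for dim in concept["dimensions"]}
print(profile)
```

Treating any nonzero number of minutes as "engaged" is a simplifying assumption of the sketch; a real study would have to decide and justify that threshold.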

And what could we do here? For every respondent that we care about when we ask questions, we could actually add up the different forms of political behavior that they are engaged in and say, well, some people are very high on political engagement and some people are very low on political engagement. Some people just vote and they don't do this other stuff, right?

Some people don't contact representatives, don't work on elections. They have very low information about what's going on in politics, but they vote.

You've heard of low information voters before, right? And there are other people who are high information voters, right? You see what I'm saying?

We also have highly engaged voters and lowly engaged voters. Well, what are those people? Low engaged people are those who have low information, who don't contact people, who don't work on elections, right? And so they're low engagement. What about the high engagement people? Well, they have high levels of information, they contact representatives, and they work on elections, right? And so we can look at high levels of political engagement and low levels of political engagement, and we can try to identify independent variables that produce low levels of political engagement or high levels of political engagement. Does that make sense to everybody? And so we want to do a careful level of conceptualization when we do our research.
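Adding up the forms of political activity, as described above, amounts to building a simple additive index. Here is a minimal sketch in Python. The three respondents are invented, and the cutoff between "low" and "high" engagement is an assumption made for illustration; the lecture does not specify one.

```python
def engagement_score(profile):
    """Count the dimensions in which a respondent is active (0 to 4)."""
    return sum(1 for active in profile.values() if active)


def engagement_level(score, cutoff=2):
    """Label a score 'high' if it exceeds the assumed cutoff, else 'low'."""
    return "high" if score > cutoff else "low"


# Invented respondents: True means active in that dimension.
respondents = {
    "A": {"voting": True, "information": False, "contact": False, "elections": False},
    "B": {"voting": True, "information": True, "contact": True, "elections": True},
    "C": {"voting": True, "information": True, "contact": True, "elections": False},
}

scores = {rid: engagement_score(p) for rid, p in respondents.items()}
levels = {rid: engagement_level(s) for rid, s in scores.items()}

for rid in respondents:
    print(rid, scores[rid], levels[rid])
```

Respondent A is the "just votes" case from the lecture: active on one dimension only, so low on the index even though they vote.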

Sometimes though, instead of using all of these variables, we may just choose one, right? We may just choose, did you vote in the last election? And we're like, you know what? That's going to be our most important type of political engagement. And so I'm just going to use voting as a stand-in for all political engagement. Now, would I say that's a very smart analytical strategy? I would say no, but sometimes that's what you do. Sometimes that's what you do because you don't have the data that you want, right?

And so then what would the analysis look like, right? It would look something like this. My dependent variable would be, did you vote in the last election, right?

And then I would select a bunch of independent variables that I thought had an impact on this dependent variable. My dependent variable is dependent upon these variables that I've identified as important for the thing I'm interested in. Now, keep in mind, and I would write this down if I were you, that we would really want to use, you know, not just one indicator as our dependent variable. We would actually put them all together.

And so, instead of just voting, we would say our dependent variable would be low versus high political engagement, as one variable, right? And we would say that these independent variables have an impact on whether or not you're low or high. But sometimes we don't have information that is that good.

We don't have data that's that good. And so we use one variable to stand in for that, right? And so what would we do? We would identify independent variables that we think have an important effect on our dependent variable.

And to be even clearer than that, what we would say is we would hope that each one of these independent variables has a causal relationship with our dependent variable, right? Actually causes it, right? So here, belief in limited government.

Is that an attitude variable or a behavior variable? It's an attitude variable, right? What we would say is, you know, we might have a hypothesis that those who believe that limited government is the best form of government are less likely to have voted in the last election, right?

And that would be our hypothesis. And we would look at, we would collect some data and then analyze that data and see if there is support for that, right? Number of hours spent online last week.

We may believe that those who spent more hours online last week are more likely to have voted in the last election. And we want to collect data to test that, right? Does that make sense to everybody? So in other words, when we do research, we don't just merely identify variables. We have to go through a process of actually making the variables that we identify in our research meaningful in some way.
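As a rough sketch of what "collect some data and see if there is support" can look like, the Python below compares voting rates between people who do and do not believe in limited government. The eight data points are entirely made up for illustration, so the result says nothing about the real world, and the helper name `vote_rate` is hypothetical.

```python
# Invented data: (believes_in_limited_government, voted_last_election).
data = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, True),
]


def vote_rate(rows, believes):
    """Share of respondents in the given belief group who voted."""
    group = [voted for belief, voted in rows if belief == believes]
    return sum(group) / len(group)


rate_believers = vote_rate(data, True)
rate_nonbelievers = vote_rate(data, False)
difference = rate_believers - rate_nonbelievers

print(f"voted, believe in limited government:   {rate_believers:.2f}")
print(f"voted, do not believe in limited gov.:  {rate_nonbelievers:.2f}")
print(f"difference:                             {difference:+.2f}")
```

A lower voting rate among believers would be consistent with the hypothesis, but on its own it only shows a correlation. Time order and non-spuriousness would still have to be addressed before calling the relationship causal.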

It's not just education and income. It's not just did you vote in the last election, right? Did you vote in the last election has some meaning and is associated with a broader concept. And so I started out thinking about this class, particularly in the intro, as a class in storytelling with numbers, right?

Identifying the concept that your variables are associated with is part of telling a story about your statistical analysis, right? And we're going to do a lot more of that as we go through this course. So let's finish up with two last points, right? Two last points in our research methods review, which is comparing statistics, and specifically survey statistics, which is what we're doing here, with other methods, right?

And I'm going to make this fast just because this lecture has gone on longer than I would have liked. Here, I have three methods. Experiments, right? For those who are psychology majors, experiments, you know, are familiar to you.

If you're a different type of social science major, if you're a comm major, if you're a business major, if you're, you know, essentially anything other than psychology, you may not have spent that much time with experiments. That's when you have a control group and an experimental group, and you look at the differences between the control group and the experimental group, right?

Then we have ethnographies. Ethnography is when we spend a lot of time in a particular neighborhood, right, or a specific social world. And an important ethnography in my life has been by a professor who used to be at Cal State Fullerton, in the sociology department, who's no longer there.

He went back to a neighborhood that he grew up in. And I think it was, I think if I'm not mistaken, it was in Brooklyn, New York. And he looked into the lives of these people he knew when he was growing up who robbed drug dealers.

They had been drug dealers themselves and decided that they no longer wanted to be drug dealers. And instead, what they wanted to do to earn money was to rob drug dealers. And the name of this book, and I encourage you to read it if you ever get a chance. I mean, you can, you know, download portions of this through the library.

The name of the book is The Stickup Kids. It's by a professor named Randol Contreras. A really great book.

And it's, again, it's about how drug dealers become stick-up kids, who become robbers of other drug dealers, right? That's an ethnography. And he spent a year and a half in his old neighborhood, spending time with them, watching what they do, interviewing them, trying to understand why they decided, you know, not to go to college and finish their college degrees, but instead become people who rob drug dealers, right?

We call those ethnographies. And then finally, we have, you know, a survey, right? Each one of those methods has a different strength and a different weakness, right? Experiments are the gold standard of causality. Notice here, I've listed out causality as very good, right?

Experiments are very good at determining if the relationship between variables is a causal relationship, right? However, does that mean that experiments are good at everything else that we care about in the sciences? No, right?

I said earlier on that we care about things like generalizability. Are experiments very good at generalizability? No, they're only okay.

Well, why is that? Well, because experiments take place within experimental settings, don't they, right? And do we expect people, the way that people behave in experimental settings in a laboratory to be exactly the way that they behave in the real world? No, right? So there are limitations on the generalizability of the results of experiments.

And then finally, we have in-depth understanding. Do we always fully understand why people make the decisions that they do in experimental settings? No. And in fact, experiments are not very good on their own at providing an in-depth understanding of why there is a causal relationship between your independent variable and dependent variable. So it's not so good for that, right?

Well, what about ethnographies? Are ethnographies good at causality? No, actually, they're not very good.

Why? Well, they're not always good at actually identifying correlations. They're not always good at identifying time order, right? They're not always good at identifying spurious relationships.

And as a matter of fact, ethnographies very rarely use a variable-oriented way of thinking about the world, right? They're not very good at that, right? Are they good at generalizability? Nah, they're not that good either at that, right?

What do you mean? Well, do we think the same things that made these Brooklyn people become stick-up kids are the same things that would lead people to become stick-up kids in, say, Boise, Idaho, or, say, Stockton, California, or, say, Orlando, Florida? No, we don't, right? So it's not very good at generalizability.

Well, what is it good at? It's excellent at in-depth understanding. Well, why is that?

Well, because it adopts an inductive approach. It spends a lot of time with those people, right? And so there's about 12 people in his study that he spends a year and a half with. And so he knows those people very, very well.

He knows their social environments very, very, very well, right? Does that make sense? So let me ask you this question. Do you think you could set up an experiment that would help you understand why people become stick-up kids, why they decide to become robbers of drug dealers?

No, right? That's a very difficult proposition, right? Will the experiment get at that? No, it's very difficult to do, right?

Let's back up. Ethnographies are very good at getting an in-depth understanding, but they're not very good at generalizability, and they're not very good at causality. Well, what about surveys? Are surveys good? And much of what we do in this class is predicated on survey data analysis.

The GSS data that we use in this class is a giant survey of about 4,000 people. Is it good at causality? No, it's only okay.

It's very good at identifying correlations, and that's really useful for us, and that's what we're going to do with it, so we can do correlations, that's great. Is it good at time order? No, not really, actually, right? Because it collects data at one point in time, right?

So what we use in this class is the GSS 2014, right? That means in 2013 and 2014, people were asked about their education, their income and their race and their age and all that good stuff, right? Do we know when their education changed? Do we know when their income changed?

No, we just know about their education today, right? This second, right? Do we know about what they believed in 2012?

Do we know about what they believed in 2010? Do we know about what they believed in 1999? We don't, right? We have no information about that at all, right?

So it's not good at time order. Well, what about non-spuriousness? Well, it is kind of good at that because we have lots of variables.

The GSS has about 1,000 variables in it. So we can actually put a lot of variables in our model to try to control for spuriousness, right? To try to include these additional third variables that might be influencing our analysis, right? So it helps us with that. But if we don't have a variable that gets at that possibility of spurious effects, then I got nothing for you.

So really, spuriousness is a real problem for our survey data, as is time order. So surveys are only okay at causality. What they are very good at is generalizability.

Well, why is that? Anybody know? Well, you can't answer.

Sorry. Well, it's because we use very, very, very careful methods of generating our samples. We use very, very, very careful methods of selecting the people who are going to be in our survey.
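To give a feel for why a carefully drawn sample of about 4,000 people can speak for a much larger population, here is a small worked calculation in Python. It uses the standard 95% margin-of-error formula for a proportion and assumes a simple random sample, which is a simplification: real surveys such as the GSS typically use more complex sampling designs, so their actual margins of error will differ somewhat. The function name `margin_of_error` is just a label for this sketch.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    p: observed proportion (0.5 gives the widest margin)
    n: sample size
    z: critical value (1.96 for roughly 95% confidence)
    Assumes a simple random sample.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(p=0.5, n=4000)
print(f"margin of error: about {moe * 100:.1f} percentage points")
```

Under these assumptions, a proportion estimated from 4,000 respondents is pinned down to within roughly a point and a half either way, which is part of why careful sampling makes surveys strong on generalizability.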

And as a matter of fact, one of our lectures in this class is going to be about the importance of sampling. And so in fact, many of the correlations and relationships we identify in our data, we will have very sophisticated tools to determine if they are generalizable to a much broader population outside of our sample. So for instance, we have a GSS sample of 4,000 people, and we will be very comfortable generalizing our analysis of that sample to the population at large. Finally, can we get in-depth understanding from our surveys?

Not really, right? And again, you can ask yourself, do I think that I can write survey questions that will get at why somebody decides to become a stick-up kid, decides to stop being a drug dealer, and decides to become someone who robs drug dealers? Do you think people would even answer that question on a survey? No, they really wouldn't.

So what's the point that I'm trying to make to you here? I'm trying to make two points to you here, and I will write those down. There are strengths and weaknesses to the methods that you decide to use. And you should choose the method that allows you to answer the question that you are trying to answer, right? However, in this class, we're going to be focusing on one specific method, surveys, right?

Surveys are only okay at causality, they are very good at generalizability, and they're really not so good at in-depth understanding. And you should understand why that is, right? And so if you were really determined to only always focus on causality and focus less on generalizability and focus less on in-depth understanding, which is what psychologists do, you would focus on experiments.

If you were willing to give up some of your causal claims and give up generalizability, you would do an ethnography, so you get an in-depth understanding of the phenomenon that you are trying to understand, right? There are different methods and there are different ways of going about answering research questions. I strongly encourage you to not think that statistics is the only way to do research. There are other ways to do research.
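The strengths and weaknesses laid out above can be summarized as a small lookup table. In this Python sketch, the ratings follow what was said in the lecture, but putting them on a 1 to 3 numeric scale (3 for very good or excellent, 2 for only okay, 1 for not so good) is an assumption made purely so they can be compared; `best_method` is a hypothetical helper.

```python
# Ratings as described in the lecture, mapped to an assumed 1-3 scale:
# 3 = very good / excellent, 2 = only okay, 1 = not so good.
RATINGS = {
    "experiment": {"causality": 3, "generalizability": 2, "in_depth_understanding": 1},
    "ethnography": {"causality": 1, "generalizability": 1, "in_depth_understanding": 3},
    "survey": {"causality": 2, "generalizability": 3, "in_depth_understanding": 1},
}


def best_method(goal):
    """Return the method rated highest for the given research goal."""
    return max(RATINGS, key=lambda method: RATINGS[method][goal])


for goal in ("causality", "generalizability", "in_depth_understanding"):
    print(f"{goal}: {best_method(goal)}")
```

No single method comes out on top for every goal, which is the reason for choosing the method that fits the question you are trying to answer.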

All right, you all. This has gone on for long enough. Thank you for your patience.

I hope that that was interesting to you and you learned something. Really, please make sure you take good notes on this. This is one of the most important lectures in the entire class. And please do email me with any questions that you may have.

Okay? All right, you all. Have a good one.
