Transcript for:
Understanding Experiments and Observational Studies

Hello, time for in-class activity 2D. Now, it's called effective vaccines, but what it's really about is introducing you to a kind of study called an observational study, and we want to compare that to an experiment. So far, all we've done is talk about experiments, which are the most powerful (I wonder why my... oh, there we go), very powerful. So we've talked about experiments, and what we've actually talked about are well-designed experiments. To be a well-designed experiment, well, to be an experiment at all, you have to have a treatment imposed on observational units.

So if the researcher is actively doing something to the observational units, it's an experiment. You're not just observing, you're doing; you're acting on those observational units.

To be a well-designed experiment, you need to have random assignment of those observational units, and you want to put them into at least two groups, where you control factors and compare results or outcomes. And those results or outcomes are about the response variables. So to be an experiment, you need that treatment imposed.

To be a good experiment, you want random assignment, you want control, and you want a comparison. Those are experiments. We're now moving on to observational studies. And you might be thinking, well, why? Why do observational studies?

Because here's the awesome, awesome thing about experiments: only with experiments, and not always, but often, can you say the explanatory variable causes the response. So we know from experimentation that changing your diet can lower your heart rate. We know that; we've done experiments on it. But can we do experiments on everything?

That's the question. So here's a question. Can you think of a situation in which just collecting data to answer the question is more appropriate, because an experiment would not be appropriate? So with an observational study, you're just looking and you're collecting data. And for an observational study, you're finding an association, but you can't necessarily say that one thing causes the other.

You can just say the explanatory variable is associated with the response; there's some sort of an association. That's all you can say if your study is observational.

So it's a weaker statement. I can see that there is an association between smoking and having a shorter life expectancy. So go ahead and see, is there a time when you can't do an experiment?

So pause and write your own thoughts down there. Okay, welcome back. I hope you paused. I'm going to show you a clip from 2015. I really like the PBS NewsHour.

So very reliable source. And so here's a lovely picture. So let's hear what our government did around World War II.

It was a painful, horrifying, and secret part of America's history during World War II. The U.S. government conducted experiments with mustard gas and other chemicals on some U.S. troops at the time. That chapter of history was first revealed in the early 90s. Now, a new investigation by NPR finds the military used race-based experiments as part of those tests.

African-American men shown here in protective gear as well as Japanese American and Puerto Rican soldiers were singled out. These pictures show the forearms of men exposed to mustard gas and other agents. Some, like Rollins Edwards, are living with the effects decades later, including injuries to their skin.

Caitlin Dickerson led in. Okay, so there you have it from one of my favorite people, Judy Woodruff. And let's just be clear.

In the 1940s, our government experimented on Black soldiers, Puerto Rican soldiers, Japanese American soldiers, and some white soldiers, but the scientists had the idea that it would be more beneficial to focus on darker-skinned people; somehow they had the idea that darker-skinned people would handle exposure to mustard gas better. I don't know if there were any conclusive reports, any conclusion on that, but was it ethical to do that to soldiers? So, my response: actually, I don't know what they were called, but there were some guidelines that came out after World War II that stated very clearly that it is unethical to experiment on people (and you know what, I'm going to leave a little gap there) without their informed consent. And even then it's problematic if you pay somebody to participate in a study where it might actually hurt them, and the results, once somebody is paid, are problematic too. And I'm going to put in here: what about animals? Other countries do not experiment on animals at the same rate that we do. So is it ethical to inflict pain and suffering to find out maybe something that's really valid information? I don't know. Is that our right?

So it's unethical to experiment on people without their informed consent. So that means we cannot experiment on children. It just wouldn't be right, I mean, if it hurts them and we don't have somebody looking out for them.

So secondhand smoke. Is secondhand smoke exposure as a child, is it going to lead to asthma in adulthood? We can't experiment on the children to see if that's the case, even if, you know, even if their parents agree. So what was being mentioned there in the PBS NewsHour was an experiment on mustard gas.

So I know that some of you are still not comfortable with the diagram. So I'm going to go ahead and just draw an experimental diagram here, and I apologize if this triggers you, but this actually happened.

So you want to start with a group of people. So this is going to be your sample. So you want to describe your sample.

Describe the sample, the people that you're interested in studying, and then you're probably going to go ahead and break things up into at least two groups. So we're going to have a group here and we're going to have a group here, because you're going to want to compare them. You want to compare results.

And one group is a treatment group. And that group is getting some special treatment. And the other group is called the control group.

So I'm going to say here, this is the control group and this is the treatment group. And it's a little counterintuitive: the control group is not where the researchers are controlling the treatment and doing something different. The control group is the baseline.

And then from there, you're going to measure some results. And after that, you're going to compare the results from the two groups. Okay, so that's kind of the broad template. So on the exam, I'm going to ask you to do this. So this is a diagram of an experiment.

And I'm going to say diagram of a race-based... well, actually, sometimes it wasn't comparing races. So: diagram of mustard gas experiment. And that's not like the mustard you eat on a hot dog. That's not what it is. So who's the sample?

So I cut the clip off, but, I mean, the U.S. government experimented on... I think they speculated it was in the tens of thousands of people. But I'm going to focus on just one experiment.

Let's say that they took 600 African-American soldiers. So they're men, because this is wartime. So what they did was they put 300 in the control group and they put 300 in the treatment group. I'm going to describe it. So on an island, I think it was on an island: same food, same exercise, same environment, so same location, everything the same, except the treatment group was also exposed to mustard gas. Yes, that's the treatment.

Okay, so we're establishing that the 300 men in each group are having the same experience. The people who get the added treatment are called the treatment group.

The people who have just everything the same except the treatment are called the control group. So we've got two different groups, but you have to be careful, because for it to be a well-designed experiment there's more. We've talked about the treatment. Got that. We've identified who the observational units are. We still have to talk about random assignment.

Where does the random assignment come in? Well, it comes in when you're deciding who goes into each group. So use randomization, a random number generator or flipping a coin, something like that, to decide who gets the treatment. So I'm illustrating where the randomization comes in. That's where the randomization comes in. And so we're going to experiment on these men. So what did they do?
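A minimal sketch of where the randomization comes in, written in Python. It assumes the 600 men are simply numbered 1 to 600; the numbering, the fixed seed, and the variable names are illustrative choices, not details from the actual study.

```python
import random

# Hypothetical ID numbers standing in for the 600 observational units.
soldier_ids = list(range(1, 601))

# A seeded generator so the sketch is repeatable; the seed value is arbitrary.
rng = random.Random(2024)

# Shuffle a copy of the IDs, then split the shuffled list in half.
# Chance alone, not the researcher, decides who lands in which group.
shuffled = soldier_ids[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:300]
control_group = shuffled[300:]

print(len(treatment_group), len(control_group))
```

Shuffling and splitting does the same job as flipping a coin for each man, with the side benefit that the two groups come out exactly the same size.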

In some cases, they would rub the agent on the arms of 300 of the men, and on the other men they wouldn't, to get the baseline. Other times they would actually put the man in a chamber and fill the chamber with mustard gas to see how it impacted his lungs. And those men, I don't think they're around to be interviewed anymore.

Well, now nobody is, but that would have been probably the worst experiment to be in. So what were the results?

So let's imagine this is the skin one. So what we're going to do is, after two weeks of exposure, see how many men have severe skin lesions.

So you do that for the control group and you also do it for the treatment group. So you count up how many men have severe lesions. You'd have to decide what counts as a severe lesion. And then when all is said and done, you're going to compare the proportion of men with severe lesions in each group.

So if the men in the treatment group have way more severe lesions, proportionally speaking, than the other men, then you know that, since everything else was equal, since you controlled the situation (they're on the same island, they eat the same food, everything's the same), the only difference is that this group was exposed to the mustard gas, so the exposure is what caused the extra lesions.
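To make the comparison step concrete, here is a small Python sketch. The lesion counts are invented purely for illustration, since no actual results are given here; only the group size of 300 comes from the example.

```python
GROUP_SIZE = 300  # men per group, as in the example

# Invented counts of men with severe lesions, for illustration only.
severe_in_treatment = 120
severe_in_control = 6

def proportion(count, group_size):
    """Fraction of a group showing the response."""
    return count / group_size

p_treatment = proportion(severe_in_treatment, GROUP_SIZE)
p_control = proportion(severe_in_control, GROUP_SIZE)
difference = p_treatment - p_control

print(f"treatment: {p_treatment:.2f}, control: {p_control:.2f}, difference: {difference:.2f}")
```

Comparing proportions rather than raw counts is what keeps the comparison fair if the two groups ever end up different sizes.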

So that's control. You use randomization to make sure that you don't have a bias, like putting the healthiest men in one group or the other, and then you have some objective way of measuring the response, and then you compare. That is the mark of a well-designed experiment, but in this case it's also the mark of a completely unethical, inappropriate, terrible experiment with no oversight. So we can't, and we shouldn't, experiment on people without their informed consent.

And even then we want to think twice about it. The soldiers, they were soldiers. There's some argument as to whether or not they agreed to do it.

They were paid, but did they know what they were signing up for? I think that there were some rulings on it, and it's a shameful part of our past. So you can't always do experiments to establish some sort of association. So what you do instead is an observational study, which is a little more gentle and a little weaker, but it's also a little more respectful of the people, animals, whatever would otherwise be experimented on.

So we're going to talk about observational studies and how they are different from experimental studies. And the true difference is whether a treatment is imposed. That is the true difference.

So if you're a researcher in an experiment, you are not passive. In an observational study, you're passively looking, observing the data. So you have no role in guiding what's happening, other than deciding what you're going to look at. That's actually a pretty good role.

So this is the real mark of an experiment. But if it's a well-designed experiment, this is the one that has random assignment. This is the one that has control.

But they both have comparison. You're going to be comparing results, probably, in both of them. Okay, confounding factors.

So if you're looking at an experiment, the possible confounding variables are what we call the nuisance factors. But there's not as much control in an observational study, so you just have to be aware of what could be lurking.

A confounding variable can create an association in observational studies in ways that shouldn't occur in a controlled experiment. So that doesn't really explain what a confounding variable is. I like, and it's not just me, another term for confounding variable: lurking variable.

You know, a famous example is when the weather is... oops, I gave it away, darn it. A mayor noticed that the days when the most ice creams were sold were the days when more people drowned. So, let's outlaw the sale of ice cream? Will people still drown? Yes. There is an association between the number of ice creams sold and drowning deaths, but that's all it is. So a mayor who has not taken statistics will then say, oh well, maybe the ice cream causes the deaths, and outlaw it. Well, no.

What's the lurking variable in this case? What's the actual factor that could explain the situation? The confounding variable in that situation is the weather. That could be one of them.

Okay. Confounding variable. So the weather: it's hot.

People go to the beach and they buy ice creams, and also more people are in the water. So the more people that are in the water, the more likely it is that someone's going to drown. Another one could be age.

Children are who buy the ice cream. So ice cream stands can get set up where they know there are going to be children, where they can make a lot more money. And younger people are more likely to drown. So that's another confounding variable.

So a confounding variable (or hidden variable, or lurking variable, which is my favorite) is a variable that actually explains the association and might actually cause both things: the weather causes a rise in ice cream sales and it causes a rise in deaths. So you see those two things go together, but it's not actually causation between them.
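The mayor's mistake can be sketched as a small Python simulation. Everything here is made up for illustration: the temperature range, the formulas, the noise levels, and the function names. The one design decision that matters is that ice cream sales and drownings are each computed from the temperature only, never from each other.

```python
import random

def simulate_days(n_days, ban_ice_cream, seed=0):
    """Return (sales, drownings) lists for n_days simulated days."""
    rng = random.Random(seed)
    sales, drownings = [], []
    for _ in range(n_days):
        temp = rng.uniform(10, 35)       # made-up daily high, degrees C
        sales_noise = rng.gauss(0, 50)
        drowning_noise = rng.gauss(0, 1)
        # Both outcomes depend on the weather; neither depends on the other.
        day_sales = 0.0 if ban_ice_cream else max(0.0, 20 * temp + sales_noise)
        day_drownings = max(0.0, 0.2 * temp + drowning_noise)  # made-up scale
        sales.append(day_sales)
        drownings.append(day_drownings)
    return sales, drownings

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

sales, drownings = simulate_days(5000, ban_ice_cream=False)
sales_after_ban, drownings_after_ban = simulate_days(5000, ban_ice_cream=True)

r_sales_drownings = pearson(sales, drownings)
print("correlation between sales and drownings:", round(r_sales_drownings, 2))
print("drownings unchanged by the ban:", drownings == drownings_after_ban)
```

In this toy world the two series are strongly correlated, yet outlawing ice cream leaves every day's drownings exactly as they were, because the weather was driving both.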

Okay, so what we're going to do today is we're going to be identifying what an observational study is compared to an experimental study. And we're going to distinguish, we're going to practice the difference between those two things. And we're going to identify possible confounding factors or variables that might explain the apparent association. So that's where we're heading. Okay.

So, oh, they're saying this as if we didn't know it: in March 2020, COVID was declared a pandemic by the World Health Organization. In December 2020, vaccines using messenger RNA technology were given emergency authorization by the Food and Drug Administration in the United States.

So it didn't go through the full, usual approval process to get the vaccine passed. But now, I mean, millions and millions and millions of people have had the vaccine.

So we certainly know now, but at that time, there was a rush to get this through because people were dying in droves from COVID. Over the next 13 weeks (I'm going to highlight things that I think are important), 3,950 healthcare personnel, first responders, and other essential frontline workers who were given the vaccine completed weekly COVID testing. So they received the vaccine, and they were given weekly COVID testing to see whether they tested positive for COVID. Okay.

And so in this study, researchers were following the effectiveness of vaccines in a real-world setting. So when I answer these questions, I do want to identify, I want to think about, sample, population, and variables.

I know my teacher is always going to ask about it. So I'm going to go ahead and annotate this. If I read it again, I would see that this is the sample. It's not everybody in the world, but we have a sample of frontline workers, and they come from the population of frontline workers. And then one variable is whether or not you received the vaccine.

Yes. Sorry guys. I'm in the middle of this. Okay, so one of the variables is whether or not you receive a vaccine and the other variable is whether or not you develop COVID.

So we've got two different things going on, and they're both really interesting and they're clearly related. So come up with at least two reasons why this is an observational study. So how do you distinguish? "Why is this not an experiment?" might be an easier question to answer. And then identify at least one potential confounding variable in this study. So they're linking the two together, and I'm going to write this out.

How is vaccine status associated with infection status, COVID status? So I've got my two variables. So I know from the way this is going that this one, because it goes first, is my explanatory variable.

And this is my response. And it literally is a response. Did you get COVID or not?

Now, the real question is not just did you get COVID or not, but is the vaccine keeping you from being in the hospital, from being very, very sick? And I'm dying to say, I mean, I'm pro-vaccine, so I'm dying to say that if you get a vaccine, it causes you to be less likely to go in the hospital. But because this is not an experiment, I can't really say causation. So I'm going to stick with "associated with." So there's a relationship between the two.

So pause and answer these two questions and then come back. So two reasons this is an observational study. So it's easier for me to think, what is an experiment?

And if that didn't happen, it has to be observational. So if I go back and look at my notes, I'll see. You could say a treatment was imposed because they got the vaccine, but they all got the vaccine. They all got the vaccine.

Sorry, sorry about that, my mother-in-law calling in the middle. So come up with at least two reasons why this was an observational study. So you can say, well, a treatment was imposed and that's the vaccine. But no, for it to be a treatment being imposed, it's like one group will get it and one group won't.

And here we don't have two groups. It would be unethical to offer an effective vaccine to only half the workers. That would be completely unethical. So there is no treatment imposed on one half and withheld from the other. So I'm just going to say no treatment is actively imposed by a researcher. Instead, essential workers, frontline workers, all those great people who themselves decide to get the vaccine, are observed.

Sorry, it's just been one of these days; I'm trying to get this out. So instead, essential workers who decide to get the vaccine are observed by a more passive (they're not totally passive) researcher. Now, it's also true that if you see random assignment, if you see two different groups being set up, if you see replication, if you see all that stuff, then you know it's a well-designed experiment, but the real tip-off here is that no treatment is being actively imposed by the researcher. So, come up with at least two reasons that this is an observational study.

So no actively imposed treatment; I mean, that one's good enough, but I'm also going to say there's no comparison group. That's my other favorite: no comparison groups are being set up.

You need to have some way of comparing two different groups. Those are the essential ingredients of an experiment. So if instead you took these almost 4,000 people, gave 2,000 of them a real vaccine and gave the other 2,000 a placebo, a fake vaccine, and then later saw what proportion of each group actually came down with COVID, that would be an experiment, but a totally unethical one, because all of these frontline workers wanted the vaccine. Identify one potentially confounding variable. So confounding, lurking, some variable. Let's say that we compare the frontline workers to the general population, and the frontline workers got the vaccine and the general population didn't. And lo and behold, the frontline workers are coming down with COVID at a much higher rate.

Would you say that the vaccine is causing the high rate of COVID infection? No. What are those workers experiencing that is going to make it hard for a researcher to really gauge how effective the vaccine is?

Or let's say we've got people in Oakland, California, and we're looking at the frontline workers there. And then we also have frontline workers in New York, and the frontline workers in New York have a way higher infection rate. What's going on in those environments?

Well, people in New York got hit way harder. California didn't get hit as hard. I think because of public transportation, there were lots of things going against New York. It was the first point of contact. So I could do a whole bunch, but I think what I'm going to do is just ask: what is it that frontline workers are going through that maybe the rest of us aren't?

I'm going to say level of exposure. The more exposed you are to COVID, the more likely it is you're going to get it. So level of exposure, level of stress. They only ask for one.

So stress, extreme stress, we know might weaken your immune system. Level of sleep. Maybe you're not that stressed out, but you're very sleep deprived.

That can weaken your immune system too. Level of exposure. The more exposed you are, the more likely it is you're going to develop COVID.

The list goes on and on and on. So what else could mess up? I think that's good enough to answer the question. So what are some of the differences between an experimental study and an observational study?

So I'm going to have experimental study. I like to experiment in the movie. So experimental study versus observational study.

All experimental studies have treatments. The treatment is imposed by the researcher. There are at least two comparison groups.

There should be random assignment to the groups, and there should be replication. I think I didn't note replication in the diagram that I drew. I'm going to go back and add that in.

The replication, I'll do red for replication. Oh, the red is for random too. Okay, so the replication occurs when they, the evil government, exposed 300 men rather than just one, to minimize the effect of a few extreme responses. There might be some man in this group who gets exposed to a lot of toxic stuff and never develops anything.

So there's some very strong individuals that are kind of inert and don't respond to anything. I'm kind of one of those where I just, I'm not that affected by my environment compared to my family members. And then there can be some very, very sensitive people that even if they had no exposure, they would break out in a rash because they're stressed out of just being in an experiment.

So you could have some weaker, sensitive individuals over here. Well, if you have 300 people and 300 people, chances are there are a few of those odd birds mixed in. First of all, they'll likely be found about equally in both groups, and even one or two really weird responses are going to get washed out by all the others.

That's the benefit of replication. So in a good experiment, you will see replication. You won't just experiment on one person.
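A quick bit of arithmetic, sketched in Python, shows why a couple of unusual individuals get washed out in a large group. The helper name and the choice of two odd responders are assumptions made for illustration.

```python
def largest_possible_shift(group_size, odd_responders):
    """Most that `odd_responders` unusual people can move a group's proportion."""
    return odd_responders / group_size

# Compare a tiny group, a small group, and the 300-man groups from the example.
for group_size in (3, 30, 300):
    shift = largest_possible_shift(group_size, odd_responders=2)
    print(f"group of {group_size}: proportion can shift by at most {shift:.3f}")
```

With only three people, two odd responders can swing the group's proportion by about two thirds; with 300, the same two people move it by less than one percentage point.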

You'll experiment on a whole bunch of people and they will be randomly assigned to comparison groups. And then you will actively impose a treatment on one of the groups.

But comparison you find in both; it's nice to try to look for comparisons and patterns in both observational studies and experiments. So it's easier for me to spot what is an experiment than what is not. If it's lacking these elements, it's not a well-designed experiment. And if it's lacking this one, the imposed treatment, it's not an experiment at all. So, for observational studies.

So the big difference is that the researcher just gathers data and looks for patterns, slash, associations. Okay.

So they do that over here. They look for patterns and associations. But the big benefit of the experiment is that if there is a difference in responses between the groups, you can conclude that the change in the explanatory variable caused the response.

That is firmly in the realm of the experiment, not of the observational study. For observational studies, it's much more difficult to say causation because you cannot control for those confounding variables. Okay, so that's the big theory. Let's go ahead and do some practicing to see how that sits in your head. So number five, a PhD student collects data from a local elementary school.

For each student, she records their age, their grade, their height, their weight, their shoe size, and their GPA, and then also their score on a statewide reading test for elementary school. So there's a lot of data, and she's recording all of it. Is this an experiment or an observational study?

Okay, so answer that. And I'm going to just, I'm going to shrink this down. Is it blue or is it orange? That's what you're being asked.

And when in doubt, if it's not obviously an experiment, it probably isn't an experiment. They're usually very obvious. And for this one, it's going to be an observational study. And I should have said why.

And so I'm going to add that in: explain the answer. And why is anything ever an observational study? No treatment was imposed on the children. Instead, data was only gathered.

Okay. So hopefully that's straightforward. So, not an experiment.

Okay. The PhD student observes that students with larger shoe sizes tend to score higher on the state reading test. Okay.

She concludes that having bigger feet improves reading ability. Can you explain why her reasoning is likely to be incorrect? Can you think of another explanation for the higher test scores? So by thinking of that, what they're actually asking you here is can you identify the lurking variable, the confounding variable? So I'm going to say use statistical language here.

Okay. So pause and answer that. So clearly she didn't take a stats class, because what she's saying is that bigger feet actually improve test scores.

So I'm going to write "cause better test scores." That's what she said, so I'll put it in quotes: not correct.

What she should have said, because it's an observational study, is "are associated with." Is it fair to say, gosh, I noticed that the children who had the bigger feet also scored better? That is a correct statement.

I have the biggest feet for my size: 10 and a half, and I'm about five foot four. So I kind of like that statement, but what is the lurking variable there? What is the confounding variable? What could explain that association?

What's something that's driving it? So a possible confounding variable, or confounding factor, that messed up her study and misled her is the children's age. Younger kids have smaller feet, and they might have just as much or even more potential but score lower because they haven't been in school as long. So it's the age that actually explains the association: the older you get, the bigger your feet and the better you score. Okay.
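One way to see that age explains the association is to compare children of the same age, sketched here in Python. The data is simulated, not from any real school: the age range, the shoe-size and score formulas, and all names are assumptions, and both shoe size and score are generated from age alone.

```python
import random
from statistics import mean, median

rng = random.Random(1)
children = []
for _ in range(3000):
    age = rng.randint(6, 11)                  # assumed elementary-school ages
    shoe_size = age + rng.gauss(0, 1)         # made-up scale, grows with age
    score = 50 + 8 * age + rng.gauss(0, 10)   # made-up scale, grows with age
    children.append((age, shoe_size, score))

def big_minus_small(group):
    """Mean score of above-median shoe sizes minus mean score of the rest."""
    cutoff = median(shoe for _, shoe, _ in group)
    big = [score for _, shoe, score in group if shoe > cutoff]
    small = [score for _, shoe, score in group if shoe <= cutoff]
    return mean(big) - mean(small)

# All ages pooled together: big feet appear to go with higher scores.
gap_all_ages = big_minus_small(children)

# One age at a time: the gap between big and small feet mostly disappears.
gap_by_age = {
    age: big_minus_small([c for c in children if c[0] == age])
    for age in range(6, 12)
}

print("score gap, all ages pooled:", round(gap_all_ages, 1))
for age, gap in gap_by_age.items():
    print(f"score gap among {age}-year-olds:", round(gap, 1))
```

In the pooled data the bigger-footed children score noticeably higher, but among children of the same age the gap is close to zero, which is what you would expect if age, not foot size, is doing the work.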

So that's that one. A study finds that people living in high density urban areas develop fatal lung cancer at lower rates than people living in low density rural areas. That's kind of, that is so surprising. So it's kind of counterintuitive.

So let me put this in terms that make sense to me. And that is that people living in New York City are not going to develop lung cancer at as high a rate as people living in maybe, you know, the Blue Ridge Mountains, so a rural area of Tennessee. So maybe that makes a little more sense when you put it in context like that.

But it's really counterintuitive, because people in high-density urban areas, so city folk, suffer from cancer less than country folk. That's really shocking to me. So, first question: "a study finds." Do you think that study is an experiment or an observational study? Did they find kids in an orphanage, split them up and randomly assign some of them to grow up in a city and force the others to grow up in the country, keep everything the same in all other ways, and then come back 20 years later, or I guess it would have to be 50 years later, to see who got cancer? No, they didn't do that.

I really hope they didn't. That would be awful. So this is going to be an observational study, because you can't force people to live somewhere. So it's an observational study, most likely, because one of the factors is where you live.

It's unlikely that you'd force people to live in randomly chosen places. Okay, they didn't ask, but there we go. Okay, a student reads this study and determines that living near many people prevents cancer. Can you think of other things that might explain why low-density areas see higher rates of lung cancer?

So besides the density of people, think of other things that might explain this, i.e., what's the confounding variable? Can you think of other things? And I've got to admit, sometimes I'm just like a deer in the headlights.

A possible confounding variable or lurking variable in this study is... Okay. Well, I think it's pretty fair to say it's hard to find a place to smoke in a city anymore. It's all illegal.

For people who are living in cities, laws are more likely to be enforced that say you can't go in a public place and smoke. Whereas people in the country are a little bit more laissez-faire about it. So I think a possible one is that people in rural areas are more likely to smoke. So your smoking status could be one confounding variable.

It's also absolutely true that people in rural areas are likely to be less educated, on average, than city people. And I don't mean that disrespectfully. If you look at the proportion of people in New York who are highly educated, and by highly educated I mean more than a high school education, the proportion in New York is so much higher, even in the poorest neighborhoods. And same with San Francisco.

Whereas if you go to the countryside, people have less access to education, so they don't have the same opportunities. And if you are not as well educated, there are all sorts of other things that are more likely to go against you. So if you are not well educated, you might not recognize the signs. And then also, people in rural areas are more likely to be poor, so they may have less access to seeing a doctor. Okay.

So that's very true. You know, whether you see a doctor in Santa Barbara or you see a doctor in Lompoc... well, Lompoc's not low density. Well, there might not be. And maybe the doctors aren't even around. Money goes a lot further, but you earn a lot more money in the cities to spend on doctors. And then the doctors just might not be there.

And we found that that was really highlighted in the pandemic studies: people couldn't get to the hub. So that's the pandemic, not lung cancer. But that's a couple; I could keep going on and on. People who live in rural areas probably don't have as many doctors or hospitals nearby.

And if they do get diagnosed, it's also more likely that they might be far away from the most effective treatment. So, um, I guess that's not fatality of cancer of lung cancer. It's just, well, lung cancer is a pretty deadly one. That's not one you want to get.

Okay. So I think, um, uh, I think we're done. Yep, we're done. So let's check this.

So did I cover everything? Are you comfortable with the differences between an observational study and an experimental study? When in doubt, it's an observational study.

Are you comfortable with what a confounding, i.e. lurking, hidden variable is that actually explains the association between the two? Can you identify an observational study compared to an experimental study? And can you identify the confounding factors or variables? So that's what you need.

So that is the end of in-class activity 2D. So now take a break, just a little break, and then do the practice if you can. Okay, talk to you later.