Transcript for:
Understanding Confounding Variables in Experiments

This week we'll be covering Chapter 11, "More on Experiments: Confounding and Obscuring Variables." This chapter is focused on two major concepts: first, threats to internal validity, and second, interrogating null results in experiments. We'll cover each of these topics in detail as we go through this chapter.

Like I said, there are two main sections in this chapter. The first one discusses potential internal validity problems and how to avoid them. We'll first quickly review the three threats we covered in Chapter 10, specifically design confounds, selection effects, and order effects, and then cover an additional nine threats. The second major section describes some of the reasons experiments might yield null results. Did the study give a null result because the manipulation truly makes no difference? Or is the null result due to too much variability within groups, or a lack of sensitivity in the measure? We'll go over how we would answer these types of questions in this chapter.

So let's talk about the first main section: threats to internal validity. Listed here are the subsections we'll cover. We'll talk about an example of a really bad experiment, a tale of what not to do. Then we'll discuss six of our nine new threats that have to do with pretest/posttest designs, and then the remaining three threats to internal validity that can be present in any study. Finally, we'll cover how to still see the value in a study and how to see past this field of potential problems.

Let's start by describing the one-group pretest/posttest design, also known as the really bad experiment. You can see the outline of this type of experiment here on the slide: part A is a general diagram, and parts B and C are diagrams of the specific studies we'll discuss. The really bad experiment, or one-group pretest/posttest design, means that there is one group of participants who are measured on a pretest, exposed to a treatment, intervention, or change, and then measured on a posttest. Such a design is problematic because it is very vulnerable to threats to internal validity.

The first example study, outlined in line B of the diagram, describes Nikhil, a summer camp counselor and psychology major who has noticed that his current cabin of 15 boys is an especially rowdy bunch. He's heard a change in diet might help them calm down, so he eliminates the sugary snacks and desserts from their meals for two days. As he expected, the boys are much quieter and calmer by the end of the week, after sugar has been eliminated from their diets.

The second example experiment, outlined here in line C, describes Dr. Yuki, who has recruited a sample of 40 depressed women, all of whom are interested in receiving psychotherapy to treat their depression. She measures their level of depression using a standard depression inventory at the start of therapy. For 12 weeks, all of the women participate in Dr. Yuki's style of cognitive therapy. At the end of the 12-week session she measures the women again and finds that, on the whole, their levels of depression have significantly decreased.

Here we can see the results from these two bad-experiment examples. It would appear from the graphs that in both experiments there was a change in the groups from pretest to posttest. On the surface this appears meaningful, but let's now talk about why these studies are problematic and how they can break down under threats to their internal validity.
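If it helps to see this with numbers, here is a minimal simulation sketch in Python. It is not from the textbook or either study; the sample size, scores, and improvement amounts are invented purely for illustration. It generates pretest and posttest scores for a single group whose treatment effect is exactly zero but who improve on their own over time, and the pre-post comparison still looks like the treatment worked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40  # one group, roughly like Dr. Yuki's sample (numbers are hypothetical)

# Hypothetical depression scores (higher = more depressed)
pretest = rng.normal(loc=25, scale=5, size=n)

treatment_effect = 0.0   # assume the therapy truly does nothing
maturation = -6.0        # people tend to improve on their own over 12 weeks

posttest = pretest + treatment_effect + maturation + rng.normal(0, 3, size=n)

print(f"Pretest mean:  {pretest.mean():.1f}")
print(f"Posttest mean: {posttest.mean():.1f}")
# The single group improves even though the treatment effect is zero,
# so a pretest-to-posttest drop alone cannot show that the therapy worked.
```

The point of the sketch is simply that, with one group and no comparison, an improvement over time is consistent with many explanations besides the treatment.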
If you recall, this chapter covers a total of 12 threats to internal validity. The first three are design confounds, selection effects, and order effects; those were the focus of the previous chapter. Here we'll discuss the remaining nine threats. This slide presents six of those nine, all of which can be present when using a pretest/posttest design, as in our two bad-experiment examples. These six threats are maturation threats, history threats, regression threats, attrition threats, testing threats, and instrumentation threats. The final three threats, observer bias, demand characteristics, and placebo effects, are applicable to any experiment, not just ones that employ a pretest/posttest design. There are also combined threats, which we will discuss along with each of these individual threats in detail.

The first threat listed is known as a maturation threat. A maturation threat is a change in behavior that emerges spontaneously over time. For example, children become better and faster at solving addition and subtraction problems as they get older, trees grow taller with age, and over time people tend to recover on their own from various psychological disorders. In a really bad experiment, remember, there is only one group of participants and they all get the treatment, but in a true experiment there is a comparison group. To prevent maturation threats in the depression study graphed here, a comparison group has been added to Dr. Yuki's really bad experiment. Notice that the two groups are quite similar in their depression scores at pretest, but the depression scores at posttest differ, with the therapy group having lower depression scores than the no-therapy group. Thus the effect of maturation can be subtracted out when interpreting the results of this study, and a maturation threat has been prevented. If a change is still present after subtracting out this effect, which it is here, then we can say that there truly was an effect of the treatment and that the change didn't simply come from maturation.

The next threat is called a history threat. History threats result when some external or historical event affects most members of the treatment group at the same time as the treatment, so it's unclear whether the change in the dependent variable for the experimental group was the result of the treatment or the result of the historical factor. For example, suppose you are studying the effects of meditation on stress levels among college students, and while you were conducting the study a violent event occurred on the college campus where you were collecting your data. The meditation group did not show significant decreases in stress levels as expected, but was that because the treatment wasn't effective? Perhaps it was effective, but the campus violence raised people's stress levels, which made the treatment look ineffective. Another example can be seen here: the figure on the slide shows two graphs from the Go Green study mentioned in your textbook. In the graph on the left, it appears the experimental and comparison groups both had similar declines in electrical usage from September to November because of the change of seasons, so it looks like the Go Green campaign didn't actually have an effect. However, in the graph on the right you can see that the treatment group decreased its usage more than the comparison group, so we can rule out the history threat and conclude that it was the treatment that caused the decrease.

Next we have regression threats. A regression threat, also known as regression to the mean, is a statistical concept in which extremely low or extremely high performance at time 1 is likely to be less extreme, or closer to average, at time 2. For example, if you're in a really good mood or a really bad mood today, you will probably be in a more moderate mood tomorrow due to regression. Regression threats only occur in pretest/posttest designs, and specifically only when a group has an extreme pretest score, high or low; you can anticipate that the scores of those participants will regress toward the mean at posttest. Regression threats can be avoided by using comparison groups and inspecting the results. If you look at the top graph, you can see that both groups had similar levels of depression at pretest, but the therapy group had lower levels of depression at posttest than the no-therapy group; regression can be ruled out and we can conclude that therapy worked. The middle graph shows the possibility of a regression threat because the two groups don't start out with equal depression scores at pretest: the therapy group is more depressed at pretest. The therapy group's level of depression has declined from its pretest level, but it's still greater than that of the no-therapy group, so it's not clear whether the decline occurred because of statistical regression or because of the treatment. The bottom graph depicts the therapy group starting out with higher depression scores than the no-therapy group, but this pattern is reversed at posttest, when the no-therapy group had higher depression scores than the therapy group. This could not be accounted for by regression alone, so the IV, or independent variable, must have had an effect on the dependent variable in this graph.
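Regression to the mean is easy to demonstrate with a quick simulation. Here is a minimal sketch in Python; the numbers are made up and are not from any study in the chapter. Each person's observed score is a stable "true" level plus random luck, and if we select people because their time 1 scores were extreme, their time 2 scores drift back toward the average even though nothing was done to them.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

true_level = rng.normal(50, 10, size=n)          # stable individual level
time1 = true_level + rng.normal(0, 10, size=n)   # true level plus random luck
time2 = true_level + rng.normal(0, 10, size=n)   # new, independent luck

extreme = time1 > 70                             # select only extreme scorers
print(f"Selected group, time 1 mean: {time1[extreme].mean():.1f}")
print(f"Selected group, time 2 mean: {time2[extreme].mean():.1f}")
# The time 2 mean is noticeably closer to 50 even with no treatment at all.
# This is the kind of drop a one-group pretest/posttest design could
# mistake for a treatment effect when the group starts out extreme.
```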
The fourth threat to internal validity in pretest/posttest designs is known as an attrition threat. An attrition threat is a reduction in participant numbers from pretest to posttest. Attrition is only problematic when it is systematic. In the top graph, two people drop out of the study between pretest and posttest, and those individuals scored at the high end of the distribution, which is systematic; this decreases the posttest mean dramatically, and you don't know whether it was the attrition or the independent variable causing the decrease. In the bottom graph, however, the participants who dropped out have scores closer to the mean, so removing their scores doesn't dramatically change the group mean at posttest, and attrition isn't a threat. One way to prevent an attrition threat is, when participants drop out of a study, for the researchers to remove those participants' scores from the pretest average as well. Another approach is to look at the pretest scores of the dropouts: if they have extreme scores, they are more likely to threaten internal validity than if they have more moderate scores, and their scores should be removed.
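Here is a minimal sketch of that correction in Python. The data are invented for illustration: two extreme scorers drop out before the posttest, and comparing only the completers' pretest and posttest scores (rather than all pretest scores against the remaining posttest scores) removes the distortion.

```python
import numpy as np

# Hypothetical pretest scores; np.nan marks people who dropped out before posttest
pretest  = np.array([10, 12, 11, 13, 12, 30, 29])          # last two score very high
posttest = np.array([ 9, 11, 10, 12, 11, np.nan, np.nan])  # and they drop out

completers = ~np.isnan(posttest)

naive_change = np.nanmean(posttest) - pretest.mean()
adjusted_change = posttest[completers].mean() - pretest[completers].mean()

print(f"Naive change (all pretests vs. remaining posttests): {naive_change:.2f}")
print(f"Completers-only change:                              {adjusted_change:.2f}")
# The naive comparison exaggerates the drop because the two high scorers are
# included in the pretest mean but missing from the posttest mean: that is
# systematic attrition. The completers-only comparison removes the artifact.
```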
The fifth threat is known as a testing threat. A testing threat is a type of order effect in which there is a change in participants as a result of experiencing the dependent variable, the test, more than once. Their scores might go up due to practice, known as a practice effect, or their scores might go down due to fatigue, known as a fatigue effect. Testing threats affect internal validity because it's not clear whether the treatment caused the change in the DV or whether practice or fatigue did. One way to prevent testing threats is to not use a pretest at all, which is called a posttest-only design. Another way is to use alternate forms of the test at pretest and posttest. Having a comparison group is also helpful; see the two graphs here. You can rule out testing threats if both groups take the pretest and the posttest but the treatment group exhibits a larger change than the comparison group.

Finally, we have instrumentation threats. An instrumentation threat, also known as instrumentation decay, occurs when a measuring instrument changes over time; for example, observers change their observation criteria over time, or a researcher uses different forms of a test at pretest and posttest and they are not equivalent forms. One way to prevent instrumentation threats is to use a posttest-only design. However, if you need a pretest/posttest design, you should make sure that your pretest and posttest forms are equivalent. When making observations, you might retrain your observers throughout the study. Finally, you might counterbalance the order of the pretest and posttest forms, such that some participants get form A at pretest and some get form B, and then each gets the other form at posttest. An important distinction between instrumentation and testing threats: in an instrumentation threat, the measuring instrument has changed from time 1 to time 2, whereas in a testing threat, the participant changes over the period between time 1 and time 2.

I mentioned before that there are also combined threats to internal validity. Sometimes, in a pretest/posttest design, two types of threats to internal validity might work together. Two examples of combined threats are a selection-history threat and a selection-attrition threat. Here's an example of a selection-history threat. Suppose students at one university were in your treatment group and students at another university were in your control group in a study of the effects of meditation on stress; however, during the course of the study, a stressful event occurs on one of the campuses. Now you are unable to determine whether your effect is coming from your meditation manipulation, the school, or the event that happened at one of the schools. For an example of a selection-attrition threat, let's say that participants in one group have to travel one mile for the study and participants in the other group have to travel 20 miles. You might have more attrition in the 20-mile group because of the distance from the lab, so you couldn't be sure whether differences between groups were due to the independent variable or to the distance and attrition. So you can see how multiple threats can combine to cause additional threats to internal validity.

Next, let's talk about the last three threats to internal validity, which can happen in any study design. Observer bias, demand characteristics, and placebo effects are three potential threats to internal validity, and they can occur not only in the really bad experiment, the one-group pretest/posttest design, but also in experiments that have a comparison group. Observer bias is bias caused by researchers' expectations influencing how they interpret the results. For example, Dr. Yuki might be a biased observer of her patients' depression: she expects to see her patients improve, whether they do or not. Nikhil may be a biased observer of his campers: he may expect the low-sugar diet to work, so he views the boys' posttest behavior more positively. Although comparison groups can prevent many threats to internal validity, they do not necessarily control for observer bias. Even if Dr. Yuki used a no-therapy comparison group, observer bias could still occur if she knew which participants were in which group; her bias could lead her to see more improvement in the therapy group than in the comparison group.
Demand characteristics are biases that occur when participants figure out what a research study is about and change their behavior in the expected direction. To control for observer bias and demand characteristics, a researcher can conduct a double-blind study, which is designed so that neither the participants nor the experimenters working with the participants know who is in the treatment group and who is in the control group. If a double-blind study isn't feasible, then an acceptable alternative is a masked design, also known as a blind design: participants know which group they're in, but observers don't.

The final threat to internal validity is placebo effects. Placebo effects are present when people receive a treatment and improve, but only because they believe they are receiving a valid or effective treatment. For example, participants are told that they are receiving a new therapy, pill, or injection, but in fact it's missing the active ingredient; or they may be told they are getting a new type of therapy, but in fact they simply chatted with someone and didn't get any therapy. Nonetheless, participants may improve because they thought they had the therapy. It's important to note that placebo effects aren't imaginary; in fact, placebos can be strong treatments. To rule out the placebo effect, a special comparison group is used that receives the placebo therapy or placebo medication, and neither the people working with the participants nor the participants know who is in which group. This is known as a double-blind placebo control study. In the figure on the left, you can see that both groups' symptoms decreased; however, the group getting the true therapy improved more than the group getting the placebo therapy, which suggests that the therapy had some effect. But is the improvement in the placebo group really a placebo effect? In the figure on the left that we just talked about, there might not be a placebo effect at all; perhaps some of the improvement was due to maturation, history, regression, testing, or instrumentation threats. To determine whether there is in fact a placebo effect, the researchers might add a third comparison group that doesn't receive the true therapy and doesn't receive the placebo therapy; they don't receive any therapy at all. If you have a placebo effect, as in the graph on the right, then the no-treatment group should not improve as much as the placebo group.

We have now discussed 12 threats to internal validity, also known as the "dirty dozen," and they are summarized in this table. With so many potential threats to internal validity out there, why do we bother to conduct experiments at all? Well, a number of the threats described are only an issue in the one-group pretest/posttest design, and if researchers design their studies well, using comparison groups, reliable coding procedures, double-blind designs, placebo conditions, and control variables, then they can prevent these threats to internal validity. Here we see the first three threats, design confounds, selection effects, and order effects; once again, these three were covered in the previous chapter. Here we see the next three threats that can occur in the one-group pretest/posttest design: maturation, history, and regression to the mean. Here we see the other three threats for one-group pretest/posttest designs: attrition, testing, and instrumentation effects. And finally, we see the last three threats, which can happen in any experiment: observer bias, demand characteristics, and the placebo effect.

Let's shift now to the second major section of this chapter: interrogating null effects.
What happens when a study finds a null effect, also known as a null result? In other words, what if the independent variable did not affect the dependent variable, so there is no significant covariance between the IV and the DV? Typically, null effects aren't discussed much in the popular media, because the media are more focused on effects in which the IV does affect the DV. Here we see the subtopics we'll discuss in more detail in this section.

The three graphs in this figure all come from posttest-only designs; however, null effects can happen in pretest/posttest designs, within-groups designs, and even in correlational studies. In graph A, money didn't cause participants to feel happier. Why? In graph B, taking a GRE prep course didn't have a significant effect on GRE scores. Again, why? In graph C, anxiety didn't affect logical reasoning scores. Why might this be? Sometimes when a study has a null effect, the independent variable really didn't affect the dependent variable. Other times there's a null result because the study wasn't designed or conducted properly, so the IV actually did affect the DV, but some obscuring factor got in the way of the researchers detecting the difference. There are two types of obscuring factors: first, there might not have been enough difference between groups, and second, there might have been too much variability within groups. Let's look at each of these in detail.

Sometimes the cause of a null result is not enough between-groups difference. This might occur because of weak manipulations, insensitive measures, ceiling or floor effects, or a design confound acting in reverse. Thinking back to the three examples we just looked at, consider weak manipulations: perhaps in the study on money and mood, with the levels of the IV being no cash, 25 cents, or a dollar, that just wasn't enough money to affect people's mood; the difference didn't really matter. Sometimes a null result occurs because the researchers haven't operationalized the DV with enough sensitivity. In the GRE prep course example, perhaps taking a prep course improves people's scores by 10 points; if you were using a pass/fail measure, or even coarse groupings like high, medium, and low, you wouldn't detect that change. Your best bet is to use detailed quantitative increments instead of just two or three levels. We'll discuss ceiling and floor effects, manipulation checks, and design confounds on the next few slides.

Ceiling effects occur when participants' scores on the dependent variable are clustered at the high end, for example when giving college students a very simple addition test. As another example, suppose the researchers manipulated anxiety by telling the groups that they were about to receive an electric shock: the low-anxiety group was told to expect a 10-volt shock, the medium-anxiety group a 50-volt shock, and the high-anxiety group a 100-volt shock. This manipulation would probably result in a ceiling effect, because expecting any amount of shock would cause anxiety, regardless of the shock's intensity. As a result, the various levels of the independent variable would appear to make no difference. A floor effect occurs when participants' scores on the dependent variable are clustered at the low end. For example, if a researcher really did manipulate the independent variable by giving people either no money, 25 cents, or a dollar, that would be a floor effect, because those three amounts are all low; they're squeezed close to the floor of zero.
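Ceiling and floor effects are easy to see with a quick simulation. Here is a minimal sketch in Python of a ceiling effect on a dependent variable; the groups, means, and maximum score are invented for illustration and are not from the studies just described. Two groups genuinely differ, but because the test cannot score above 100, the observed difference shrinks and could easily look like a null effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Hypothetical "true" performance in two groups; group B really is higher
group_a = rng.normal(80, 10, size=n)
group_b = rng.normal(90, 10, size=n)

# An easy test tops out at 100, so high performers pile up at the ceiling
max_score = 100
obs_a = np.minimum(group_a, max_score)
obs_b = np.minimum(group_b, max_score)

print(f"True difference:     {group_b.mean() - group_a.mean():.1f}")
print(f"Observed difference: {obs_b.mean() - obs_a.mean():.1f}")
# The observed gap shrinks because many group B scores are squashed to 100.
# The measure, not the manipulation, is hiding the effect; a floor effect
# works the same way at the low end of the scale.
```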
Sometimes ceiling and floor effects are the result of a problematic independent variable, as in the money-and-mood study, in which all three levels of the independent variable were very low amounts of money: none, 25 cents, or a dollar. Poorly designed dependent variables can also cause ceiling and floor effects. The graph here is an example illustrating how a ceiling effect and a floor effect on the dependent variable can obscure true group (here, gender) differences on the independent variable.

A manipulation check is a second dependent variable included in a study to make sure the independent variable manipulation worked. In the manipulation check depicted on the left, it appears that the anxiety manipulation didn't work, as the three anxiety groups self-reported very similar levels of anxiety. In the graph on the right, it looks like the anxiety manipulation did work, as it differentiates the three IV groups as would be expected.

A design confound can also counteract the true effect of an IV. For example, in the GRE study, perhaps the test prep group was also under additional pressure to perform well on the GRE; as a result, they were actually receiving test prep plus pressure, while the no-test-prep group had neither test prep nor pressure. The added pressure is a confound, but it didn't work in favor of the test prep group; it worked against them by lowering their scores.

Another cause of a null effect might be too much within-groups variability, also known as noise, error variance, or unsystematic variance. Having too much noise can get in the way of detecting between-groups differences. If you look at the two graphs, you will notice that the within-groups variability is larger in graph A than in graph B; you can see this by examining the length of the error bars in the bar graphs or the spread of the individual scores in the scatterplots. Notice how having more within-groups variability obscures the group differences. Let's look at the different causes of too much variability within groups: measurement error, individual differences, and situation noise.

Measurement error is any factor that can inflate or deflate a person's true score on the dependent variable. For example, a man who is 160 centimeters tall might be measured at 160.5 centimeters because of the angle of vision of the person using the meter stick, or he might be recorded as 159.5 centimeters because he slouched a bit. The goal is to keep measurement error as small as possible. Here are some solutions for reducing measurement error. First, use reliable, precise measurements: measurement errors are reduced when researchers use measurement tools that are reliable (internal, interrater, and test-retest reliability) and valid, that is, tools with good construct validity. Another solution is to measure more instances: if researchers can't find a measurement tool that's reliable and valid, then the best alternative is to measure a large sample of participants, because random errors will cancel each other out with more people in the sample.

Another source of within-groups variability is individual differences: some people are faster runners than others, some are smarter, some are funnier, and so on. In the graph you can see that participants who received money were slightly more cheerful than those who did not receive money, but there was a lot of overlap between the two conditions. Thus the individual differences within each group obscured the between-groups differences.
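Here is a minimal sketch in Python of how within-groups variability, whether it comes from measurement error or from individual differences, can hide a real group difference. The effect size, sample size, and noise levels are made-up values for illustration, loosely modeled on the money-and-mood example: the same true effect is simulated twice, once with little within-groups spread and once with a lot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 30
true_effect = 2.0  # assume the money group really is a bit more cheerful

def simulate(noise_sd):
    """Simulate one two-group study with the given within-groups spread."""
    no_money = rng.normal(50, noise_sd, size=n)
    money = rng.normal(50 + true_effect, noise_sd, size=n)
    _, p = stats.ttest_ind(money, no_money)
    return money.mean() - no_money.mean(), p

for noise_sd in (2, 10):   # small vs. large within-groups variability
    diff, p = simulate(noise_sd)
    print(f"within-groups SD = {noise_sd:>2}: observed diff = {diff:4.1f}, p = {p:.3f}")
# The same true effect stands out when within-groups spread is small,
# but it tends to be statistically undetectable when the groups are noisy.
```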
There are two solutions for reducing the effects of individual differences. First, change the design: use a within-groups design instead of an independent-groups design (see the graph on the right). When you do this, each person receives both levels of the independent variable, individual differences are controlled for, and it's easier to see the effects of the IV because individual differences aren't obscuring between-groups differences. You can also use a matched-groups design, in which pairs of participants are matched on an individual-difference variable, which again makes it easier to see the effects of the IV. The second solution is to add more participants: if it's not feasible to change the design to a within-groups or matched-groups design, then try adding more participants; this will lessen the effect that any one participant has on the group average.

Situation noise refers to external distractions of any kind that obscure between-groups differences and cause variability within groups. This includes smells, sights, and sounds that might distract participants and increase within-groups variability; it adds unsystematic variability to each group. One way to reduce situation noise is to control the surroundings of the experiment that might affect the DV, for example by testing participants in a quiet room with no outside odors, distractions, and so on.

All of these solutions relate to a concept called power, which is also an aspect of statistical validity. For example, if GRE prep courses really do increase GRE scores, then a study with enough power will detect this difference. Power refers to the strength a study has to detect such differences; studies with more power can detect even smaller effects. The figure here illustrates this with light as an analogy for statistical power. Not having very much power is like trying to find an object in a room lit by a candle: you probably wouldn't be able to find small objects by candlelight, but you could find larger ones. Having high statistical power is like having a flashlight: it's easy to find even small objects in a room lit with a bright flashlight. This table summarizes the reasons for null results and can be found in your textbook as Table 11.2.
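To make the candle-versus-flashlight idea concrete, here is a minimal simulation sketch in Python. The effect size and sample sizes are assumptions chosen for illustration, not values from the chapter. It estimates power as the proportion of simulated two-group studies that detect a real effect, for a few different sample sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def estimated_power(n_per_group, effect_size, n_sims=2000, alpha=0.05):
    """Fraction of simulated two-group studies that detect a real effect."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0, 1, size=n_per_group)
        treatment = rng.normal(effect_size, 1, size=n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        hits += p < alpha
    return hits / n_sims

for n in (20, 80, 200):
    print(f"n = {n:>3} per group: power ~ {estimated_power(n, effect_size=0.3):.2f}")
# With a smallish true effect (d = 0.3), 20 people per group usually misses it
# (the candle), while 200 per group usually detects it (the flashlight).
```

The same logic applies to the other solutions we discussed: reducing within-groups noise or switching to a within-groups design raises power just as adding participants does.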
We have discussed how we might fail to detect an effect that really exists because there isn't enough variability between groups or because there is too much variability within groups. However, a third alternative is that there truly isn't a difference between your groups and the IV simply has no effect on the DV. If the study has adequate power and no difference was found, then there is a true null effect. A null effect in a study can be just as interesting as a group difference, and when studies are conducted with adequate power, null effects are reported in the scholarly literature. In the popular media, however, you will rarely find null results; the popular press focuses on group differences and often fails to see the value of null effects in research. It can be really important to demonstrate that a specific treatment did not have an effect, because we wouldn't want to think it might help someone when it really doesn't.

So let's wrap up this week's topics. Remember, this chapter is broken down into two main sections. The first covered the 12 threats to internal validity: three we covered last week, six that are specific to one-group pretest/posttest designs, and three that can apply to all research study designs. The second major section covered how we can interrogate a null effect in research. We discussed how null effects can be the result of not enough difference between groups or of too much variability within groups. Finally, we talked about how sometimes there simply isn't an effect, because the independent variable really has no effect on the dependent variable. Lastly, remember that although null effects are not often discussed in the popular media, they can still provide valuable knowledge gained from research.