Transcript for:
Exploring Simple Experiments and Validity

This week we'll be covering Chapter 10, Introduction to Simple Experiments. In this chapter, we're going to introduce simple experiments. We'll go through some examples of simple experiments and discuss aspects of the experimental design and methodology. We'll talk about validity and how we can use experiments to make causal claims.

Here we see an overview of the topics we'll cover in this chapter. I'll first start off by presenting two examples of simple experiments, as well as discuss specifically how we can define experimental variables. We'll then cover causal claims and why experiments are able to support such inferences. We'll discuss the differences between the two methods of experimental design, specifically independent-groups and within-groups designs. Finally, we'll talk about how we can go about interrogating causal claims using the four big validities.

Let's first start off by discussing two simple experiments. One of them should be familiar to you from a question on a previous class assignment. We'll talk first about an experiment that examined the effect of note-taking method on test scores, as well as an experiment on serving bowl size and portion sizes when eating pasta.

Researchers Pam Mueller and Daniel Oppenheimer conducted an experiment in 2014 that compared the effectiveness of two methods of taking notes in class. Sixty-seven college students were recruited to come to a laboratory classroom, usually in pairs. The classroom was prepared in advance: half the time it contained laptops, and the other half of the time it contained notebooks and pens. Having selected five different TED Talks on interesting topics, the researchers showed one of the lectures on a video screen. They told the students to take notes on the lecture using their assigned method, either laptop or handwritten. After the lecture, students spent 30 minutes doing another activity meant to distract them from thinking about the lecture. Then they were tested on what they had learned from the TED Talk. The results
obtained are shown here in Figure 10.2. Students in both the laptop and the longhand groups scored about equally on the factual questions, but the longhand group scored higher on the conceptual questions. Like we've discussed in class, this is an experimental design primarily due to the inclusion of a manipulated variable, here the method of note-taking. We can see that this variable appears to have an effect on the conceptual understanding or memory of the material.

The second experimental example is presented here. Researchers at Cornell University conducted an experiment in 2012 to see if serving bowl size has an effect on portion size. Participants were randomly assigned to either the large bowl or the medium bowl condition. Then, during a lunch held in a kitchen laboratory, participants served themselves from a communal serving bowl. Each participant's plate was weighed before he or she ate the pasta and afterwards to determine the amount of pasta they consumed. The graph on the left shows that participants took more pasta from the large serving bowl than from the medium one. The graph on the right shows that the large bowl participants consumed about 140 calories more than the medium bowl participants. The researchers concluded that the size of the serving bowl influenced how much pasta people served themselves and how much they ate. Again, we can see here the presence of a manipulated variable, the serving bowl size, and the effect it has on both the amount of food taken and the amount consumed.

Remember that all experiments must have at least one variable that was manipulated and at least one variable that was measured by the researcher. A manipulated variable is one where the researcher assigns participants to a particular level of the variable, for example note-taking method, which had the levels of either computer or longhand, or serving bowl size, either large or medium. A measured variable is a variable in which researchers record what happens in terms of behavior or attitudes based on
self-report, behavioral observations, or physiological measures, for example the score on an exam or the amount of pasta that was consumed.

We call these two types of variables, manipulated and measured, independent and dependent variables. An independent variable, or IV, is manipulated by the researcher. For example, note-taking method was the independent variable in the academic achievement study. Independent variables have levels, and the levels of the IVs are called conditions. So if your IV was note-taking method, the levels or conditions could be handwritten or laptop. It's up to the researcher to select the conditions or levels. Dependent variables are variables that are measured by the researcher. They depend on the outcome of the experiment or the levels of the independent variable. There's also a third type of variable in experiments called a control variable. Control variables are any other variables that the experimenter holds constant across all participants. The term control variable is actually a misnomer, because the word variable suggests change, yet a control variable is held constant. An example here is in the pasta study: all participants ate the same kind of pasta from the same type and size of plate.

Now let's talk about why we can use experiments to support causal claims. We've discussed the three criteria necessary to make a causal claim before, which are presented here. We'll discuss each of them as they specifically apply to experimental designs: how experiments establish covariance (is the cause variable related to the effect variable?), how they establish temporal precedence (does the cause variable come before the effect variable?), and how well-designed experiments establish internal validity (are there alternative explanations for the results?).

Let's first talk about covariance. Both the pasta study and the note-taking study show covariance between the independent variable and the dependent variable. Covariance is indicated by a difference in group
means. In the pasta study, participants in the large bowl condition ate more calories than those in the medium bowl condition. Independent variables answer the question "Compared to what?" You might recall that our personal experiences don't have the benefit of a comparison group or a comparison condition. For example, let's say you always eat a lot of pasta at home, but you can't be sure if it's because of your mom's large serving bowl, because you don't have anything to compare it to. If independent variables don't vary, or there's not a mean difference between groups, then covariance has not been established. Covariance is also about the outcome: your outcome variable, or dependent variable, is important too. If the amount of pasta consumed was the same in both groups, then there's no covariance, and serving bowl size did not influence how much people ate.

It's important here to understand the difference between the different group types as well. The control group is the neutral or no-treatment level of the IV. Not all experiments have or need a control group. The treatment group or groups are the non-neutral level or levels of the independent variable. Finally, the placebo group is a control group that is exposed to an inert or non-active treatment.

The second condition necessary for a causal relationship is temporal precedence, or that the cause variable occurred before the effect variable. For example, in the pasta study, the experimenters manipulated the IV, the serving bowl size, first and measured the dependent variable second, to ensure that the causal variable preceded the effect variable in time. This ordering of variables is present in all experimental study designs.

Finally, to determine a causal relationship, we must also have good internal validity. Well-designed experiments do in fact establish good internal validity. However, poorly designed experiments include threats to internal validity, such as design confounds or selection effects. If you will recall, it is of primary importance to interrogate
internal validity when making a causal claim. You need to be able to rule out third-variable explanations and feel confident that the IV caused the change in the DV, and not some other variable. Confounds are these alternative explanations, and they can threaten internal validity. For example, in the pasta study, it would be a confound if participants in one condition received less appetizing pasta and those in the other received more appetizing pasta. In this case your IV, serving bowl size, would be confounded, or confused, with pasta quality, so you couldn't be sure if it was the bowl size or the pasta quality that was affecting participants' consumption amount.

We specifically refer to confounds present throughout the study as design confounds. A design confound is a confound that appears systematically throughout your entire experimental design. A design confound is present when a second variable varies systematically along with the independent variable and provides an alternative explanation for the results. For example, consider the study on note-taking. If all of the students in the laptop group had to answer more difficult essay questions than those in the handwritten group, that would be a design confound. We would not know whether the difference in conceptual performance was caused by the question difficulty or the note-taking method. If a design confound is present, it threatens internal validity, and we can't support a causal claim in that case.

It's important to note that internal validity is only threatened if there is systematic variability with the IV. For example, if those participants in the large-dish group had higher quality pasta than those in the medium-dish group, then there's systematic variability and a design confound. Unsystematic variability is random or haphazard and affects both groups. It is not a confound; however, it can make it difficult to detect differences in your dependent variable.

A selection effect occurs in an experiment when the participants in one level of the independent
variable are systematically different from the participants in the other level or levels of the IV. For example, a study in 1987 was conducted to test an intensive therapy for children with autism. Some children in the study received the new treatment, and others continued their usual treatment. However, they were not randomly assigned to these groups, because some families lived too far away for the intensive treatment and other families requested the intensive treatment. The researchers found that the autistic symptoms of children in the intensive treatment group showed more improvement than those in the treatment-as-usual group. However, the study had a selection effect, in which families in the intensive treatment group were probably systematically different from those in the treatment-as-usual group. Because of this selection effect, it's impossible to determine the reason for the improvement.

Random assignment is a way of assigning participants to levels of the IV such that each participant has an equal chance of being in each group. There should be no systematic difference between groups with random assignment, and this acts as a way to avoid any selection effects that might otherwise be present. Random assignment doesn't always work well if your sample size is small, as groups may be imbalanced. Some researchers prefer to use matched groups with small samples. Matching involves matching groups on some variable, for example IQ. Researchers randomly assign the three participants with the highest IQs to the two groups, then assign the next three highest participants, and so on. This ensures a much more even distribution of the selected variable, here IQ, between the groups.

Next, we're going to talk a bit about independent-groups designs: what they are, how they're different from within-groups designs, and different ways and times to include measurements within them. In general, there are many forms of experiments, and the most basic distinction is between independent-groups designs and
within-groups designs. Also, just to clarify, independent-groups designs and between-groups designs are two ways to say the same thing. So let's discuss this basic distinction between the two experimental methodologies. Again, you can notice on the slide that there are multiple ways to refer to each method, but know that saying any of these names is simply making this distinction. Independent-groups designs have different groups of participants placed at different levels of the independent variable. For example, each participant was randomly assigned to either the large or the medium serving bowl condition. There are two types: post-test only and pre-test/post-test. Within-groups designs are when each participant is presented with all levels of the independent variable, for example, if you conducted a note-taking study and each participant engaged in both longhand and laptop note-taking.

So I just said that there are two different types of independent-groups designs: post-test only and pre-test/post-test. Post-test-only designs are also known as equivalent groups, post-test-only designs. It's a type of independent-groups experiment in which participants are randomly assigned to IV groups and are tested on the dependent variable just once, at the end of the experiment. You can see the flow of the experiment in the two diagrams here.

In a pre-test/post-test design, or an equivalent groups, pre-test/post-test design, participants are randomly assigned to at least two different groups and are tested on the key dependent variable twice, once before and once after exposure to the independent variable. For example, the study on the effects of mindfulness training introduced in Chapter 1 is an example of a pre-test/post-test design. In this study, 48 students were randomly assigned to participate in either a two-week mindfulness class or a two-week nutrition class. One week before starting their respective classes, all students completed a verbal reasoning section of a Graduate
Record Examination, or GRE, test. One week after their classes ended, all students completed another verbal reasoning GRE test of the same difficulty. Figure 10.12 here shows that while the nutrition group did not improve significantly from pre-test to post-test, the mindfulness group scored significantly higher at post-test than at pre-test.

So which of these two methods is better? Well, deciding which design is better depends on the researcher's particular goal. In some situations, it is problematic to use a pre-test/post-test design. For example, if the DV involves eating or physical exertion, participants might become full or exhausted by having a pre-test and a post-test. In other situations, a pre-test/post-test design makes sense. As in Mueller's note-taking study, post-test-only designs can still be very powerful given the combination of random assignment and a manipulated variable, or IV, but if you want to make sure that groups are equivalent at the start, you'll need a pre-test.

Next, let's talk about the second type of group design, within-groups designs. There are two types of within-groups designs: concurrent-measures designs and repeated-measures designs. We're going to talk about each of these, along with some of the advantages and disadvantages of using a within-groups design. We'll also talk about the three criteria for a causal relationship with these types of designs, and also discuss whether or not a pre-test/post-test design is considered a within-groups design.

Let's first discuss a repeated-measures design. A repeated-measures design is a type of within-groups design in which participants are measured on the dependent variable more than once, after exposure to each level of the independent variable. Researchers Erica Boothby and her colleagues used a repeated-measures design to investigate whether a shared experience would be intensified even when people do not interact with the other person. They recruited 23 college women to a laboratory. Each participant was
joined by a female confederate. For the first condition, the two sat side by side, facing forward, and never spoke to each other. The participant and confederate were each given a piece of chocolate and asked to rate how much they liked it. Then the confederate was removed, and the participant was given another piece of chocolate and asked to rate how much they liked that piece. The participant was told that the two chocolates were different, but in fact they were exactly the same. The results showed that people liked the chocolate more when the confederate was also tasting it. We can see here that this is what's referred to as a repeated-measures design: the procedure of tasting and rating a chocolate was repeated multiple times with the same participant, only under different circumstances, or levels of the independent variable.

The other type of within-subjects design we have is called a concurrent-measures design. Here, rather than having a single participant undergo each level of the independent variable in sequence, one after the other, like in a repeated-measures design, they are exposed to all levels of the independent variable at the same time, then measured on the dependent variable. The example on this slide is of a study investigating looking preferences in infants. Here they are exposed to both levels of the independent variable, male or female faces, then measured once on the dependent variable, their looking preference.

Now let's talk about some advantages of within-groups designs. First, participants in your groups are equivalent because they are the same participants and serve as their own controls. This helps to reduce any random differences that could occur between different groups of people. For example, some people really like dark chocolate and others do not, but in a repeated-measures design, people bring their same level of affection for chocolate to both conditions, so their individual liking for the chocolate stays the same. The only difference between the two conditions
will be attributed to the independent variable, whether people were sharing the experience with the confederate or not. Second, these designs give researchers more power to notice differences between conditions. Within-groups designs provide more statistical power, so that group differences are more likely to be detected. Power is the ability of a study to show a statistically significant result when an IV truly has an effect on a DV. Since the same people are in your groups, it reduces unsystematic variability, or noise, between groups that may obscure true differences that exist. And third, within-groups designs require fewer participants than other designs. If you want 20 participants in each condition, you'll need a total of 40 people for an independent-groups design. However, if you run the same study as a within-groups design, you'll only need 20 participants, because each participant experiences all levels of the independent variable. You can see this shown here in Figure 10.15.

However, we also need to remember the three criteria for causal claims to be made: we must have covariance, temporal precedence, and internal validity. Do within-groups designs fulfill all three of these criteria? Well, we need to be careful of the second criterion, temporal precedence. Does the independent variable come before the dependent variable? Well, sort of. One level of it does. However, the second time the independent variable is presented, it has now come after a dependent measure. It then becomes difficult to say that the participant isn't responding differently simply because they've already seen the stimuli or information. This issue is called an order effect. To combat order effects, we must rely on a technique called counterbalancing. Counterbalancing is when you have some participants experience one level of the independent variable first, while other participants experience that level second. You can see an example in the chart on screen here: some participants first tasted the chocolate alone, while
others first tasted it with the confederate. This helps to average out any effect that might be caused by the order in which the variable was presented.

Order effects have the potential to threaten internal validity. An order effect occurs when exposure to one level of the IV influences reactions to other levels of the IV. Order effects are confounds for within-groups designs, because it might not be the levels of the IV that cause changes in the DV, but the order in which the conditions were experienced. One type of order effect is a practice effect, which occurs when participants either get better at a task from practicing or get worse at a task due to fatigue, called a fatigue effect. Another type of order effect is a carryover effect. This occurs when there is contamination carrying over from one condition to the next. For example, you drink caffeinated coffee and then take a test, then you drink decaf coffee and take a test. However, the caffeinated coffee is still having an effect on you during the second test.

Again, in order to avoid order effects, researchers use counterbalancing, or presenting levels of the IV to participants in different orders. There are two types of counterbalancing: full and partial. Full counterbalancing occurs when all possible condition orders are presented. For example, with two conditions there are two orders; with three conditions there are six orders. Partial counterbalancing occurs when only some of the possible condition orders are used. For example, a researcher could present a randomized order for each participant, or use what's called a Latin square, where each condition appears in each position at least once, shown here.

There are also some disadvantages to using within-groups designs. First, like we talked about, there's the potential for order effects, but again, these can be controlled by using counterbalancing techniques. Second, a within-groups design simply might not be practical or possible. For example, suppose someone has devised a new way of teaching
children how to ride a bike, called Method A. They want to compare Method A with the older method, Method B. Obviously, they cannot teach a group of children to ride a bike with Method A and then return them to baseline and teach them again with Method B. Once taught, the children are permanently changed. In such a case, a within-groups design, with or without counterbalancing, would simply make no sense. And third, demand characteristics present problems as well. Demand characteristics, or experimental demand, occur when participants pick up on cues that lead them to guess the experimenter's hypothesis, thus changing the way participants would normally act. These are far more likely to occur the more times a participant is exposed to a condition.

Lastly, let's talk about whether pre-test/post-test designs are considered repeated-measures designs. Should a pre-test/post-test independent-groups design be considered a within-groups design because participants are tested twice, once at pre-test and once at post-test? It's important to remember that in a true within-groups design, participants are exposed to all levels of the independent variable, and the levels can be counterbalanced. But in a pre-test/post-test design, participants experience only one level of the independent variable, not all levels. So the answer is no: experiments that use pre-test/post-test methods are not considered repeated-measures designs.

The last major section we'll cover in this chapter is how we can once again use the four big validities to interrogate our causal claims. We'll touch on each of these: construct, external, statistical, and internal validity. Let's first look at how we go about interrogating construct validity. We'll once again use the note-taking study as our example. To interrogate construct validity in the note-taking study, you would start by asking how well the researchers measured their dependent variables, factual knowledge and conceptual knowledge. The researchers actually provided examples of the
factual and conceptual questions they used, so you could examine them by evaluating if they actually do constitute good measures of factual learning (for example, "What is the purpose of adding calcium propionate to bread?") or conceptual learning (for example, "If a person's epiglottis was not working properly, what would be likely to happen?"). These two examples do seem to be appropriate types of questions, because the first asks for direct recall of a lecture's factual information and the second requires people to understand the epiglottis and make inferences. To interrogate the construct validity of the independent variables, you would ask how well the researchers manipulated, or operationalized, them. In the note-taking study this was straightforward: people were given either a pen or a laptop. This operationalization clearly manipulated the intended independent variable.

A manipulation check is an extra dependent variable that researchers can insert into an experiment to convince them that their experimental manipulation worked. A manipulation check was not necessary in the note-taking study, because research assistants could simply observe participants to make sure they were actually using the laptops or pens they had been assigned. A pilot study is a simple study, using a separate group of participants, that is completed either before or sometimes after conducting the study of primary interest. Researchers use pilot study data to confirm the effectiveness of their manipulations before using them in a target study. Although manipulation checks and pilot studies aren't always needed, they are often used by careful researchers.

Recall that Mueller and Oppenheimer originally proposed that laptop note-taking would let students more easily take notes verbatim compared to taking handwritten notes. In fact, their study included measures of verbatim overlap so they could test their theory about why laptop note-takers might perform worse. After transcribing each person's notes, they measured how closely
the notes overlapped verbatim with the lecture material. It turned out that people in the laptop condition had in fact written more verbatim notes than people in the handwritten condition. In addition, the more people wrote verbatim notes, the worse they did on the essay test. The researchers supported their theory by measuring key constructs that their theory proposed.

For external validity, we need to consider how the claims can be generalized. How were the participants selected for the experiment? Was random sampling used? In the note-taking study, the 67 students were a convenience sample, rather than a random sample, of undergraduates from Princeton University. Because they were a convenience sample, you can't be sure if the results would generalize to all Princeton University students, not to mention to college students in general. In addition, because the study was run only on college students, you can't assume the results would apply to middle school or high school students. External validity also applies to the types of situations to which an experiment might generalize. The note-taking study used five videotaped TED Talk lectures. In their published article, Mueller and Oppenheimer reported two additional experiments, each of which used new video lectures. All three experiments found the same pattern, so you can infer that the effect of laptop note-taking does generalize to other TED Talks. However, you cannot be sure from the study if the effect of laptop note-taking would generalize to a live lecture class. You also don't know if the effect of laptop note-taking would generalize to other kinds of college teaching, such as team-based learning or lab courses.

What if external validity is poor? Remember that in an experiment, the validity that is emphasized most is internal validity, or experimental control. In order to achieve experimental control, researchers sometimes conduct their studies in artificial laboratory environments that may not represent the real world. Many experiments sacrifice real
world representativeness in exchange for internal validity. Testing their theory and teasing out the causal variable from potential confounds are the steps most experimenters take care of first. In addition, running an experiment on a relatively homogeneous sample, such as college students, means that unsystematic variability is less likely to obscure the effects of the independent variable.

We must also ask ourselves: is the difference statistically significant? Is the difference between means obtained in the experiment statistically significant? Are you reasonably sure that the results did not occur by chance? If your result isn't statistically significant, then you can't conclude that you have covariance. How large is the effect? Effect size can help determine covariance. Typically, the larger the effect size, the stronger the causal effect is. In experiments, researchers use an indicator of standardized effect size called d. This coefficient quantifies how far apart two groups are on the dependent variable. It indicates the distance between group means and how much scores within groups overlap. Large d values mean a stronger effect of the independent variable on the dependent variable. A small d means a lot of overlap between your groups. See the figure on the right, which illustrates that the difference between means is exactly the same in the two cases. The effect sizes are different, however, because of the spread of those scores, or the amount of overlap. Table 10.2 shows Cohen's guidelines for effect sizes using d.

Remember that internal validity is the top priority when interrogating a causal claim, and we must ask ourselves three important questions. One: did the experimental design ensure that there were no design confounds, or did some other variable accidentally covary along with the independent variable? The researchers in the note-taking study made sure people in both groups saw the same video lectures, in the same room, and so on. Two: if the experimenters used an independent-groups
design, did they control for selection effects by using random assignment or matching? Random assignment helped control for selection effects in the note-taking study. And three: if the experimenters used a within-groups design, did they control for order effects by counterbalancing? Counterbalancing was not relevant in the note-taking study, because it was an independent-groups design.

So let's wrap up this week's topics. We first went over two examples of simple experiments: Mueller and Oppenheimer's note-taking study and the pasta bowl serving size study. We talked about the different types of variables we have in experimental studies; specifically, we have independent and dependent variables, as well as control variables. We discussed the reasons why experiments are able to support causal claims through the three criteria of covariance, temporal precedence, and internal validity. We talked about the two different design methods experimenters can use, either between-groups designs or within-groups designs. We also talked about the advantages and disadvantages of each of these two methods. Finally, we discussed how we can still use the four validities to interrogate causal claims made using experiments.
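
As a supplement to the recap, here is a minimal Python sketch of three techniques from this chapter: random assignment, full counterbalancing, and the standardized effect size d. The function names, the seed, the condition labels, and the example scores are all made up for illustration and are not data from any of the studies discussed. The pooled-standard-deviation formula for d is one common convention and is an assumption here, not something stated in the lecture.

```python
import itertools
import random
import statistics


def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to the conditions in turn."""
    rng = random.Random(seed)  # the seed is only here to make the demo repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


def full_counterbalance(conditions):
    """Return every possible order in which the conditions could be presented."""
    return list(itertools.permutations(conditions))


def cohens_d(group_a, group_b):
    """Difference between group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical participant IDs 1..40 assigned to two bowl-size conditions.
    groups = randomly_assign(range(1, 41), ["large bowl", "medium bowl"], seed=1)
    print({name: len(members) for name, members in groups.items()})

    # Two conditions give two orders; three conditions give six.
    print(len(full_counterbalance(["alone", "shared"])))
    print(len(full_counterbalance(["A", "B", "C"])))

    # Same 2-point difference between means, but different spread of scores.
    print(cohens_d([8, 10, 12], [6, 8, 10]))
    print(cohens_d([2, 10, 18], [0, 8, 16]))
```

With these made-up scores, the first call to cohens_d gives 1.0 and the second gives 0.25, even though the difference between means is 2 points in both cases. That mirrors the point about the figure: more spread within groups means more overlap and a smaller d.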