What’s at the heart of reproducibility when we talk about research? The ability to do the same experiment and get the same results. It’s replicability. Replication is one of the most important aspects of good research, and it’s the topic of this module in Experimental Design. [credits]

We’re scientists. We want to do an experiment. Say that I’m interested in squirrels. Specifically, I’m interested in whether male squirrels are bigger than female squirrels. Now, it’s likely that many factors are related to squirrel size besides sex. Maybe they grow differently based on what they eat or how much room they have to run around. Other factors beyond sex, like how readily available food is, might also come into play. Scientists want to remove as much of the effect of the things they’re not studying as possible. These are called “confounding factors,” or “confounders.”

Sometimes we can eliminate the effect of confounders by holding them constant across groups. We could make sure that all of the squirrels we observe come from an area with abundant food, or live in similar locations, for example. But even if we try to control for every confounding factor we can think of, there could still be other confounders we don’t think of or can’t control for. After all, not all male squirrels and female squirrels are the same size, even those that eat the same amount of food and live in the same location. There’s still variation. We call this random variation.

Random variation makes it hard to learn anything from a single measurement. If we measure one group of squirrels at one point in time, some of those squirrels will be randomly large, and some will be randomly small. To get a sense of what the “real” size of a male or female squirrel is, we need to measure a lot of squirrels. This is a simple but unbelievably important concept. If we do an experiment on only a small number of subjects, we may get a result that’s due to random variation and not due to the factor of interest. Measure just a couple of squirrels, and you might get a few big females and a few small males. Measure a lot of squirrels, and you’re more likely to get many more average-sized ones. The more times we measure, or replicate our observations, the more assured we can be that what we are seeing represents the truth.
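To make that sampling idea concrete, here’s a minimal simulation sketch in Python. The numbers in it (a hypothetical 500-gram average squirrel and a made-up amount of individual variation) are assumptions for illustration, not real data. It just shows that a small sample can land far from the true value, while a large one tends to sit close to it.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_MEAN_G = 500.0  # hypothetical "true" average squirrel mass, in grams
SPREAD_G = 60.0      # hypothetical random variation between individuals

def measure_squirrels(n):
    """Simulate weighing n squirrels and return their average mass."""
    masses = [random.gauss(TRUE_MEAN_G, SPREAD_G) for _ in range(n)]
    return sum(masses) / n

# A couple of squirrels can land far from the truth...
print("n = 3:  ", round(measure_squirrels(3), 1), "g")
# ...while a large sample tends to sit close to the true value.
print("n = 300:", round(measure_squirrels(300), 1), "g")
```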
Let’s work through some experiments to see what I’m talking about. Animal first, then human.

Say we have some mice. We’re interested in whether a newly developed feed will make them faster. Of course, we need to do an experiment. The simplest thing to do is give the mice different feeds and see if they can run a certain distance faster. Savvy scientists will immediately see some issues. Maybe the time of day is important. Maybe how much sleep the mice got is a factor. Maybe they run at different speeds depending on what’s motivating their running. Maybe the temperature affects their speed. We’d want to control for all of these things. We could make them all race at the same time of the day, and all at the same time of the year. We could make sure that they all go to bed at a certain hour (or at least turn out the lights) and wake them at the same time. We could entice them to run with the same tasty treat at the end of the track every race. We could try to make every factor that we can control the same. But even after controlling for these factors, we’ll still get some variation.

Here, we’re going to chart the differences between the race times of the mice on feed A and feed B. If there’s no difference, we get 0. If mice on feed A are faster, the results are positive, and if mice on feed B are faster, the results are negative. Our first few races might look like this. But the more we do, the more they might fill in like this. If the variation we are seeing is random, then as we replicate the experiment over and over again, we will get what’s called a normal distribution. If there were no difference between the feeds, then our results should cluster around 0. Because of random variation, sometimes the mice on A will be faster, and sometimes those on B will be faster. But more results will be closer to the actual value of 0. This is an example of between-experiment replication, where scientists repeat the same experiment more than once.

Another approach to addressing random variation is to increase the number of mice that we include in each experiment. The more mice we measure per experiment, the more the results will look like this. Individual experiments will be more likely to be close to the true value of 0. Variation will still occur, but it will be decreased. This is an example of within-experiment replication, where scientists measure more subjects in each experiment.

All of this holds true for human subjects research as well. Imagine the same experiment, but with kids, where we want to know how different foods affect activity. If we only take one measurement, we might get this. But as we take more and more, we see the normal distribution that clusters around the most likely result. This isn’t the fault of the subjects, and it’s not something that would be different if we were working with molecules, cells, tissues, animals, or people. Variation will exist, and we’ll need replication to account for it.

Let’s review. The more times you replicate an experiment among different subjects, the more likely it is that you will see results cluster around the true value. The more subjects you measure in each experiment, the more likely it is that any single experiment will land close to the true value. Put another way, within-experiment replication beats down error variance: it improves precision. Between-experiment replication is more about rooting out bias. Any one experiment can be biased, but a bunch of experiments will (hopefully) be biased differently. Within-experiment replication (measuring more subjects) and between-experiment replication (repeating experiments more than once) both increase the likelihood that results are robust, and that others will get the same result later. A small simulation of both ideas follows.
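Here is a minimal sketch in Python of both kinds of replication at once. The numbers (a true feed difference of zero and a made-up amount of per-mouse variation in race time) are assumptions for illustration. Each call to the hypothetical run_experiment is one experiment; repeating it a thousand times is between-experiment replication, and raising the number of mice per experiment is within-experiment replication.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_DIFF = 0.0  # assumed true difference between feed A and feed B
NOISE = 1.5      # assumed per-mouse random variation in race time (seconds)

def run_experiment(n_mice):
    """One experiment: simulate n_mice per-mouse time differences
    (feed A minus feed B) and return their mean."""
    diffs = [random.gauss(TRUE_DIFF, NOISE) for _ in range(n_mice)]
    return statistics.mean(diffs)

# Between-experiment replication: repeat the whole experiment 1,000 times.
# Within-experiment replication: vary how many mice each experiment measures.
for n_mice in (5, 50, 500):
    results = [run_experiment(n_mice) for _ in range(1000)]
    print(f"{n_mice:>3} mice per experiment -> spread of results (SD): "
          f"{statistics.stdev(results):.3f}")
```

With more mice per experiment, the spread of the repeated results shrinks (roughly in proportion to one over the square root of the sample size): individual experiments land closer to the true value of 0, while the collection of repeated experiments still clusters around it, which is the precision gain described above.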