Transcript for:
Ethics in Statistical Data Collection

Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics. Today we're going to take a step back from sampling and regressions to talk about the impact of all that data gathering. We've seen that the interpretation of this information can have real, lasting effects on our society, but its collection can also have lasting effects on the subjects.

The process of gathering and applying statistics can affect real people's lives, which means there is a responsibility to gather and use this data ethically. Today we're going to discuss five stories. Four of them are real, and all of them can help us learn where collecting data can go wrong, and how we can help prevent these things from happening again.

Our first story begins in 1822, when a young fur trapper named Alexis St. Martin was shot in the stomach after another trapper's gun accidentally went off. The wound was serious, but a local army doctor, William Beaumont, was able to stabilize St. Martin through a series of presumably painful, anesthetic-free surgeries.

But Dr. Beaumont couldn't close the wound, which left a small hole called a gastric fistula that allowed access to the stomach. St. Martin was out of a job, since it's hard to be an active fur trapper with a hole in your stomach. So he signed a contract to become a servant to Dr. Beaumont. In addition to traditional chores, St. Martin participated in all sorts of experiments at the whim of the doctor. Beaumont used the gastric fistula to study how the body digested food.

He made huge strides in the field, including exploring the influence of mental disturbance on the process of digestion and correcting the long-held belief that the stomach digested food by grinding it up. When the two finally parted ways in 1838, Beaumont spent the last 15 years of his life pleading with St. Martin to come back. Maybe unsurprisingly, St. Martin declined.

Without this strange situation, the field of gastroenterology may have progressed more slowly. In fact, St. Martin's fistula was an inspiration to Pavlov, who used fistulas in dogs during his famous classical conditioning experiments. But all this progress came at a cost to St. Martin, and also to those dogs.

One of the most important ethical considerations in research is whether the humans who participate are feasibly able to say no. People with little power, few resources, or little money can be coerced into participating in experiments that they're uncomfortable with. Most research institutions have a committee called an Institutional Review Board, or IRB, which oversees all the research at that institution to make sure that it's ethical.

Voluntariness is one of the most important things that they check for. This limits how people with undue power or influence over us can ask us to participate in a research study. For example, your boss or professor is restricted in how they can ask you to participate in a study, because you might feel that you have no choice, that you have to participate.

Otherwise, they might fire you or give you a failing grade. Ethical research needs to be voluntary, at least in humans. Animal rights activists argue that since animals cannot volunteer for a study, we shouldn't use them.

In addition to their voluntary participation, subjects should also know what will happen to them during the study. This was not the case in 1932, when the U.S. Public Health Service, working with the Tuskegee Institute, began a 40-year-long study on over 600 black men. Under the guise of free medical care, the men were secretly enrolled in a study to observe the long-term progression of syphilis.

Over 300 of the men enrolled had the disease, but researchers treated them with nothing but fake or innocuous medicines like aspirin, even after it became clear that penicillin was a highly effective treatment for the disease. Late-stage symptoms of syphilis include serious neurological and cardiovascular problems, yet the researchers allowed the study to go on. Some of the men's wives and children also contracted syphilis.

In 1972, the study was shut down after news of its unethical conditions leaked to the media and sparked public outrage. In 1951, while the Tuskegee study was still running, a poor tobacco farmer named Henrietta Lacks went to Johns Hopkins Hospital in Maryland and had cells from a tumor collected without her knowledge or consent.

These cells were used to grow a new cell line, called the HeLa line, which scientists used for in vitro experiments. The cells' ability to thrive and multiply outside her body made the line especially useful to researchers. It's still used today for medical research, lending itself to cancer and AIDS research, as well as immunology studies, like the work that helped Jonas Salk develop the polio vaccine.

And in 1955, HeLa cells were the first human cells to be successfully cloned. Over time, the cell line and the discoveries it facilitated became extremely lucrative for researchers, but Lacks and her family never received any financial benefit. These studies emphasize the need for informed consent. Subjects have the right not only to receive all the facts relevant to their decision to participate, but also to understand them.

Many institutions require that information be presented clearly and in a way that's appropriate for the subject's comprehension level. Even children, whose parents are legally allowed to consent for them, must get an age-appropriate explanation of what will happen in the study. This is incredibly important because it respects the dignity and autonomy of the subject, allowing them to stop research procedures at any time.

That incentivizes researchers to design studies with more acceptable levels of risk. In all three of those stories, the research procedures offered no benefit to the subjects themselves. In 1947, the Nuremberg Code was created to establish guidelines for the ethical treatment of human subjects. One of its main tenets is beneficence, which not only requires that researchers minimize the potential risk to subjects, but also requires that any remaining risk be outweighed by potential benefits to the patient and the scientific community.

The Nuremberg Code was created and implemented after the Second World War, during which horrifying experiments were conducted on prisoners in Nazi concentration camps. It lays out ten principles to which modern-day studies must still adhere. These ten principles stand as the basis for much of current research ethics, and include things like voluntariness, informed consent, and beneficence.

But as we settle into the age of technology, the application of these ethical principles can get murkier. Our last story here isn't real, but it illustrates the complexities of research ethics in the digital age. In the seventh season of the hit show Parks and Recreation, a giant internet corporation comes to the small town of Pawnee, Indiana, to offer free Wi-Fi to the entire city.

Everyone gladly accepts. They like the free service. But when boxes of personalized gifts arrive at every citizen's doorstep, some become a little concerned because the gifts are perfect, fitting the exact interests of the recipient.

Someone who collects stuffed pigs dressed as celebrities gets Hamuel L. Jackson, and someone obsessed with politics gets the newest Joe Biden poetry collection. These boxes are perfect for the people who receive them. Eerily perfect.

So how did the internet company know what each person would want? Well, in the show it turns out that the free Wi-Fi came with a pretty high cost: privacy. In exchange for the free Wi-Fi, the internet company, Gryzzl, was collecting all the data that was transferred over the network. This is called data mining.

And it may seem far-fetched, but it's happening right now. Not the gift stuff, the data mining. Grocery stores track what we buy with our rewards cards, Netflix keeps track of everything we watch, and Amazon knows exactly what we buy and what we look at. And those terms of service agreements we click through without reading when we download an app or sign up for a social media account often include some kind of data-collection stipulation. When we use free internet services, we're agreeing to pay not with money, but usually with our information.

Facebook and Google offer their services for free in part because they're profiting off of our data. They might be using it for research, or to customize our experience on the site so that maybe we buy or watch more stuff on Amazon and YouTube. They also use it to sell targeted ads, giving advertisers the opportunity to select exactly the type of people who are going to see their ads.

And sometimes the way these ads are targeted can be pretty unethical. For example, companies have discriminated based on age by specifying that job ads be shown only to young people. Data is being used in ways that affect every facet of our lives.

But since we're still in the beginning stages of this huge influx of digital information, we get to see the progression of ethics in this area unfold right in front of us. The laws that will protect your data and privacy, and mine, the way the Nuremberg Code protects participants in scientific experiments, are still being written. And many of the same concepts are coming up. For example, the internet, Google, and social media have become so entrenched in some societies that it's almost impossible to hold a job without them.

And if that's the case, we need to ask whether it's ethical to require that users sign over their right to privacy in order to use them. Or does that, like the pressure someone might feel to enroll in a clinical study, border on coercion? We also need to ask whether companies that use or sell our information should be held to the standard of informed consent, which requires agreements to be written in language simple enough for users to understand what they're agreeing to, even if they don't have a law degree. Or, on the other hand, should companies be exempt from this requirement if they only use the data internally?

It's possible to draw parallels between data mining and the stories we talked about at the beginning of the episode, though admittedly it's not quite as harrowing. Alexis St. Martin may have felt pressure to stay with Dr. Beaumont because he could no longer work as a fur trapper. It can be argued, to a much lesser degree, that we use sites like Google or Twitter because we feel there's no other option as we try to remain informed in our hyper-connected world.

And we might not be getting all the information we need to consent, presented in an understandable way, similar to how Henrietta Lacks was never told why her cells were being taken or what they'd be used for. These situations are obviously not exactly the same, and we, as a society, need to decide how to apply the principles of research ethics in these new digital spaces. As we move forward and gain the ability to do things like sequence an entire genome in days rather than years, we open the door for amazing advances in personalized medicine that could save millions of lives. But we also open the door for abuse of this sensitive information.

The conversation about how to handle these types of situations is still going on. We're the ones who will decide what is said, and we're going to be the subject of those decisions. Thanks for watching, I'll see you next time.

Crash Course Statistics is filmed in the Chad and Stacey Emigholz Studio in Indianapolis, Indiana, and it's made with the help of all these nice people. Our animation team is Thought Cafe. If you'd like to keep Crash Course free for everyone, forever, you can support the series at Patreon, a crowdfunding platform that allows you to support the content you love. Thank you to all our patrons for your continued support. Crash Course is a production of Complexly.

If you like content designed to get you thinking, check out some of our other channels at Complexly.com. Thanks for watching.