How Not to Run a Consumer-Side Experiment - by Cecilia Chen
Jul 12, 2024
Introduction
Cecilia Chen is a seasoned behavioral economist, a senior data scientist at Livery, and a senior lecturer in Economics at the University of Exeter.
She specializes in combining experimental methodology, game theory, and insights from psychology and sociology.
She focuses on bringing behavioral science out of academia and into the real world.
Talk: "How not to run a consumer-side experiment."
Topics Covered
How to run and not run an experiment
Experiments with imperfect compliance
Practical example and learnings from a real experiment
Types of Experiments
AB Testing
Control and treatment groups are randomly assigned.
Common in testing new features against the current version.
Experiments with Non-Compliance
Some units in the treatment group may not actually receive the treatment (e.g., clinical trials where patients forget to take their medication).
Examples: health regimens in clinical trials and feature adoption in app testing.
Problems with Non-Compliance
Naively averaging outcomes over everyone assigned to treatment (the intention-to-treat comparison) dilutes the measured impact and inflates variance.
The result is higher p-values, making it harder to reach statistical significance, as illustrated below.
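A minimal sketch of the dilution problem; the effect size and compliance rate are assumed numbers for illustration only (and the arithmetic assumes non-compliers are unaffected by assignment).

```python
# Hypothetical numbers illustrating dilution under non-compliance:
# if only 40% of the treatment group actually receives the treatment,
# the intention-to-treat (ITT) comparison shrinks the measured effect.
true_effect_on_treated = 0.05   # assumed +5% lift among units actually treated
compliance_rate = 0.40          # assumed share of the treatment group actually treated

itt_effect = compliance_rate * true_effect_on_treated
print(itt_effect)  # 0.02 -- the smaller effect a naive group-mean comparison would estimate
```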
Solutions for Non-Compliance
Analytic Solutions
Local Average Treatment Effect (LATE): Adjusts the diluted average treatment effect by dividing by the share of units actually treated.
Instrumental Variables (IVs): Uses random assignment as an instrument for actual treatment status to recover the effect on treated units (more complex, but precise); a sketch follows below.
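A minimal sketch of the Wald/IV estimator that both bullets describe, assuming a DataFrame with hypothetical column names `assigned` (random assignment), `treated` (actual treatment status), and `outcome`.

```python
import pandas as pd


def late_wald(df: pd.DataFrame) -> float:
    """Wald estimator of the Local Average Treatment Effect.

    Divides the diluted intention-to-treat effect by the difference in
    treatment take-up between the two arms (assignment is the instrument).
    """
    itt = (
        df.loc[df["assigned"] == 1, "outcome"].mean()
        - df.loc[df["assigned"] == 0, "outcome"].mean()
    )
    compliance = (
        df.loc[df["assigned"] == 1, "treated"].mean()
        - df.loc[df["assigned"] == 0, "treated"].mean()
    )
    return itt / compliance
```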
Experimental Design Solutions
Design the experiment to identify units in both control and treatment that would have been treated.
Compare only those units within both groups to get an accurate measure of treatment impact.
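A minimal pandas sketch of this design-based approach, assuming a hypothetical `would_be_treated` flag that is logged identically for every unit in both groups before assignment determines whether it takes effect.

```python
import pandas as pd


def treated_subset_effect(df: pd.DataFrame) -> float:
    """Compare only units that were (or would have been) treated.

    Valid because the flag is logged the same way in control and treatment,
    so restricting to flagged units keeps the two groups comparable.
    """
    eligible = df[df["would_be_treated"] == 1]
    return (
        eligible.loc[eligible["assigned"] == 1, "outcome"].mean()
        - eligible.loc[eligible["assigned"] == 0, "outcome"].mean()
    )
```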
Case Study: Delivery Time Promises
Problem: Ambitious delivery time promises by restaurant partners.
Solution: An algorithm replaces unrealistic promised delivery times with estimates based on actual travel times.
Experiment Design: For every order in both groups, logged whether the delivery time would be modified, but showed modified times only to the treatment group; compared orders that were (or would have been) modified across the two groups.
Outcome: Identified an issue with incomplete data logging, resolved by linking records via timestamps (a logging sketch follows below).
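A minimal sketch of the kind of logging the case study calls for; the function name, field names, and print-based sink are hypothetical stand-ins for the real pipeline.

```python
import json
import time


def log_delivery_time_decision(order_id: str, original_eta_min: int, adjusted_eta_min: int) -> None:
    """Record, for every order in both groups, whether the promised delivery
    time was modified, keyed by order id and timestamp so the record joins
    cleanly to outcome data later."""
    record = {
        "order_id": order_id,
        "ts": time.time(),
        "original_eta_min": original_eta_min,
        "adjusted_eta_min": adjusted_eta_min,
        "was_modified": adjusted_eta_min != original_eta_min,
    }
    print(json.dumps(record))  # stand-in for the real logging sink
```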
Key Takeaways
Always Log Your Data: If in doubt, log it.
Plan Ahead: Think through the type of analysis beforehand to identify potential missing links.
Use Conservative Analytical Approaches: Estimate variances and significance conservatively rather than overstating effects.
Identify and Isolate Treatment Effects Accurately: Make sure you have enough data to gauge the effect on the units that were actually treated.
Q&A Highlights
Sample Size and Variance: Dilution from non-compliance inflates variance, so a larger sample size is needed to reach statistical significance (see the sketch after this list).
Algorithm Evaluation: Random assignment ensures comparable treatment rates between control and treatment.
Effectiveness of the Feature: Improved order volumes were linked to more accurate delivery time promises.
Customer Satisfaction: Assessed through order volume, care contacts, and reduction in complaints.
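A rough power calculation, using statsmodels, illustrating the sample-size point: if non-compliance halves the detectable effect size, the required sample roughly quadruples. The effect sizes are assumed for illustration only.

```python
from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()
full_effect = 0.10     # assumed standardized effect if every unit complied
diluted_effect = 0.05  # effect after roughly 50% non-compliance

# Per-group sample sizes needed for 80% power at alpha = 0.05.
n_full = power_calc.solve_power(effect_size=full_effect, alpha=0.05, power=0.8)
n_diluted = power_calc.solve_power(effect_size=diluted_effect, alpha=0.05, power=0.8)
print(round(n_full), round(n_diluted))  # the diluted case needs roughly 4x the sample
```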
Conclusion
Real-life experiments in data science may not always go as planned, but effective logging, thoughtful planning, and rigorous analytic approaches can help avoid many common pitfalls.