How Not to Run a Consumer-Side Experiment - by Cecilia Chen

Jul 12, 2024


Introduction

  • Cecilia Chen is a seasoned behavioral economist, a senior data scientist at Livery, and a senior lecturer in Economics at the University of Exeter.
  • Specializes in combining experimental methodology, game theory, and insights from psychology and sociology.
  • Focuses on bringing behavioral science out of academia into the real world.
  • Talk: "How not to run a consumer-side experiment."

Topics Covered

  1. How to run and not run an experiment
  2. Experiments with imperfect compliance
  3. Practical example and learnings from a real experiment

Types of Experiments

  • A/B Testing
    • Control and treatment groups are randomly assigned.
    • Common in testing new features against the current version.
  • Experiments with Non-Compliance
    • Some units in the treatment group may not actually receive the treatment (e.g., clinical trials where patients forget to take their medication).
    • Example: Health regimens and user feature adoption in app testing.

Problems with Non-Compliance

  • Naively calculating the average treatment effect over everyone assigned to treatment dilutes the measured impact across units that were never actually treated, leaving more noise relative to the effect.
  • The result is higher p-values, making it harder to reach statistical significance (see the simulation sketch after this list).
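
As a quick illustration (numbers are illustrative, not from the talk), a minimal simulation with a 40% compliance rate and a true per-treated effect of +5 shows how the naive assigned-group comparison shrinks toward zero:

```python
# Hypothetical simulation of non-compliance dilution: only ~40% of the
# treatment arm actually receives the treatment, so the naive difference
# between assigned groups is roughly 0.4 * 5 = 2 rather than 5.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000            # units per arm (illustrative)
true_effect = 5.0     # effect on units that actually receive treatment
compliance = 0.4      # share of the treatment arm that complies

control = rng.normal(100, 20, n)
complied = rng.random(n) < compliance
treatment = rng.normal(100, 20, n) + true_effect * complied

naive_diff = treatment.mean() - control.mean()
print(f"Naive assigned-group difference: {naive_diff:.2f}")  # diluted, ~2
```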

Solutions for Non-Compliance

Analytic Solutions

  • Local Average Treatment Effect (LATE): Adjusts the diluted average treatment effect using the percentage of units treated.
  • Instrumental Variables (IVs): Uses random assignment as an instrument for actual treatment status to recover the effect size (more complex, but more precise). A minimal estimator sketch follows this list.
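
The sketch below, assuming arrays of assignment `z`, treatment actually received `d`, and outcome `y` (names are illustrative, not from the talk), shows how both fixes reduce to the same calculation for a binary instrument: the diluted difference divided by the take-up gap (the Wald / IV estimator).

```python
import numpy as np

def late_wald(z: np.ndarray, d: np.ndarray, y: np.ndarray) -> float:
    """Local Average Treatment Effect via the Wald / IV estimator:
    the diluted intention-to-treat difference divided by the gap in
    actual treatment take-up between the two assigned arms."""
    itt = y[z == 1].mean() - y[z == 0].mean()        # diluted effect
    take_up = d[z == 1].mean() - d[z == 0].mean()    # compliance gap
    return itt / take_up

# With one-sided non-compliance (nobody in control is treated), the
# denominator is simply the share of the treatment arm actually treated.
```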

Experimental Design Solutions

  • Design the experiment to identify units in both control and treatment that would have been treated.
  • Compare only those units across the two groups to get an accurate measure of the treatment's impact; a sketch of this filtering follows below.
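
A minimal sketch of that design, assuming every logged unit carries a `would_be_treated` flag computed for both arms (column names are hypothetical):

```python
import pandas as pd

def treated_subset_effect(df: pd.DataFrame) -> float:
    """Compare only the units the algorithm would have treated, in both arms."""
    eligible = df[df["would_be_treated"]]
    treat = eligible.loc[eligible["arm"] == "treatment", "outcome"].mean()
    ctrl = eligible.loc[eligible["arm"] == "control", "outcome"].mean()
    return treat - ctrl
```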

Case Study: Delivery Time Promises

  • Problem: Overly ambitious delivery-time promises made by restaurant partners.
  • Solution: Algorithm to replace unrealistic delivery times with actual travel times.
  • Experiment Design: Logged, for every order, whether the algorithm would modify the promised delivery time, but showed modified times only to users in the treatment group; the control group saw the original promises. The analysis compared would-be-modified orders in treatment against their untreated counterparts in control.
  • Outcome: An incomplete data-logging issue was identified; it was resolved by linking the logs via timestamps (a sketch of such a join follows this list).
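
A hedged sketch of a timestamp-based join, using pandas.merge_asof; column names, the datetime `ts` key, and the one-minute tolerance are assumptions for illustration, not details from the talk:

```python
import pandas as pd

def link_logs(orders: pd.DataFrame, exposures: pd.DataFrame) -> pd.DataFrame:
    """Attach the 'delivery time was modified' flag to each order by
    matching the closest earlier exposure event for the same user."""
    orders = orders.sort_values("ts")          # merge_asof needs sorted keys
    exposures = exposures.sort_values("ts")
    return pd.merge_asof(
        orders,
        exposures[["user_id", "ts", "time_was_modified"]],
        on="ts",                               # assumed datetime column
        by="user_id",
        direction="backward",
        tolerance=pd.Timedelta("1min"),
    )
```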

Key Takeaways

  1. Always Log Your Data: If in doubt, log it.
  2. Plan Ahead: Think through the type of analysis beforehand to identify potential missing links.
  3. Use Conservative Analytical Approaches: Treat variances and probabilities conservatively rather than optimistically.
  4. Identify and Isolate Treatment Effects Accurately: Ensure you have enough data to gauge the effect precisely.

Q&A Highlights

  • Sample Size and Variance: Non-compliance dilutes the effect and inflates the noise relative to it, so a larger sample is needed to reach statistical significance.
  • Algorithm Evaluation: Random assignment ensures comparable treatment rates between control and treatment.
  • Effectiveness of the Feature: Improved order volumes were linked to more accurate delivery-time promises.
  • Customer Satisfaction: Assessed through order volume, care contacts, and reduction in complaints.

Conclusion

  • Real-life experiments in data science may not always go as planned, but effective logging, thoughtful planning, and rigorous analytic approaches help avoid many common pitfalls.
  • Embrace mistakes as learning opportunities.

Wrap-Up

  • Encouragement to ask further questions.
  • Reminder: When in doubt, always log your data!