Introduction to ANOVA in Statistics

Jun 26, 2024

Instructor: Adriene Hill, Crash Course Statistics

Key Topics Covered:

  • Differences between two groups using t-tests
  • Comparison of more than two groups using ANOVA

General Linear Model (GLM) Framework

  • Objective: Partition data into two piles: explained information (model) and unexplained information (error).

Introduction to ANOVA (Analysis of Variance)

  • ANOVA: Similar to regression but uses categorical variables to predict a continuous variable.
  • Example: Using a soccer player's position to predict running distance.

How ANOVA Works

  • Model Example: Predicting bunny sightings based on weather (rainy vs sunny days).
  • Representation: Error is the difference between observed and predicted values.
  • ANOVA Model: Represented as a regression with categorical variables, e.g., rainy days coded as 0 and sunny days as 1.

Regression and ANOVA Similarities

  • Regression: Slope shows relationship between variables (e.g., years and shoe size).
  • ANOVA: Slope shows difference between group means (e.g., bunnies seen on rainy vs. sunny days).
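The link between the two can be checked numerically. The sketch below fits an ordinary least-squares line to a 0/1-coded weather variable and compares the slope with the difference in group means. The bunny counts are invented for illustration and do not come from the lecture.

```python
from statistics import mean

# Hypothetical bunny counts, invented purely for illustration.
rainy = [2, 3, 1, 4, 2]   # coded as x = 0
sunny = [5, 6, 4, 7, 5]   # coded as x = 1

x = [0] * len(rainy) + [1] * len(sunny)
y = rainy + sunny

# Ordinary least squares with one predictor:
# slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# With 0/1 coding, the intercept is the mean of the group coded 0
# and the slope is the difference between the two group means.
mean_difference = mean(sunny) - mean(rainy)
print(f"intercept = {intercept:.2f}, rainy mean = {mean(rainy):.2f}")
print(f"slope = {slope:.2f}, sunny mean - rainy mean = {mean_difference:.2f}")
```

With this coding, the fitted line passes through the two group means, which is why the slope and the difference in means coincide.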

Example: Chocolate Bar Ratings

  • Dataset: Chocolate bars rated based on type of cocoa bean (Criollo, Forastero, Trinitario).
  • Goal: Determine if bean type affects ratings.

Calculation Steps

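  1. Sums of Squares Total (SST): Total variation in the data, i.e., the sum of squared deviations from the grand mean. This equals N * variance when the variance is computed with N in the denominator, or (N - 1) * the sample variance.
  2. Sums of Squares for Model (SSM): Variation explained by the model, i.e., the squared distance of each group mean from the grand mean, weighted by group size.
  3. Sums of Squares for Error (SSE): Variation not explained by the model, i.e., the squared distance of each observation from its own group mean. SST = SSM + SSE.
  4. Degrees of Freedom: k - 1 for the model and N - k for the error, where k is the number of groups and N is the total number of observations.
  5. F-Statistic: Ratio of explained to unexplained variation, each scaled by its degrees of freedom: (SSM / (k - 1)) / (SSE / (N - k)).
  6. P-value: Probability of an F-statistic at least this large if all group means were equal; compared against a chosen threshold to determine statistical significance.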
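The calculation steps above can be sketched in a few lines of plain Python. The ratings are hypothetical values made up for this example, not the chocolate dataset from the lecture, and the variable names are illustrative.

```python
from statistics import mean

# Hypothetical ratings for three bean types, invented for illustration only.
groups = {
    "Criollo": [3.0, 3.5, 4.0, 3.5],
    "Forastero": [2.5, 3.0, 3.0, 3.5],
    "Trinitario": [3.0, 3.0, 3.5, 2.5],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = mean(all_values)
n_total = len(all_values)   # N: total number of observations
k = len(groups)             # k: number of groups

# Step 1: total variation around the grand mean.
sst = sum((v - grand_mean) ** 2 for v in all_values)
# Step 2: variation explained by group membership (the model).
ssm = sum(len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values())
# Step 3: variation left over within each group (the error).
sse = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)

# Step 4: degrees of freedom for the model and the error.
df_model = k - 1
df_error = n_total - k

# Step 5: F is the ratio of the two mean squares.
ms_model = ssm / df_model
ms_error = sse / df_error
f_stat = ms_model / ms_error

print(f"SST = {sst:.3f}, SSM = {ssm:.3f}, SSE = {sse:.3f}")
print(f"df = ({df_model}, {df_error}), F = {f_stat:.3f}")
```

Step 6 is left out to keep the sketch free of dependencies: the p-value is the upper-tail area of the F distribution with (df_model, df_error) degrees of freedom, which could be obtained with, for example, `scipy.stats.f.sf(f_stat, df_model, df_error)` if SciPy is installed.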

Follow-Up Tests

  • After a significant ANOVA result, run a t-test for each pair of groups to find where the specific differences lie.
  • Example results: Criollo beans significantly different from Trinitario and Forastero beans.
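A minimal sketch of the pairwise follow-up step, assuming equal-variance (pooled) two-sample t-tests. The ratings are hypothetical, so the output does not reproduce the results reported above, and `pooled_t` is a helper name introduced here for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Hypothetical ratings for three bean types, invented for illustration only.
groups = {
    "Criollo": [3.0, 3.5, 4.0, 3.5],
    "Forastero": [2.5, 3.0, 3.0, 3.5],
    "Trinitario": [3.0, 3.0, 3.5, 2.5],
}


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# One t-test per pair of groups: 3 groups give 3 pairs.
results = {}
for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
    t, df = pooled_t(a, b)
    results[(name_a, name_b)] = (t, df)
    print(f"{name_a} vs {name_b}: t = {t:.3f}, df = {df}")
```

Each t statistic would then be compared against the t distribution with the listed degrees of freedom. Because several tests are run at once, a correction for multiple comparisons (such as Bonferroni) is commonly applied to the significance threshold.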

Application to More Groups

  • Original Use: R.A. Fisher's fertilizer studies on potato farms.
  • Example: 12 varieties of potatoes and different fertilizers.
  • ANOVA Table: Summarizes degrees of freedom, sums of squares, mean squares, F-statistic, and p-value.

Conclusion

  • Key Ideas:
    1. ANOVA and regression use similar models (General Linear Model form).
    2. ANOVA acts as a filter for significant effects, saving time and focusing the analysis on meaningful differences.
  • Practical Tip: Skip the follow-up tests if the initial ANOVA shows no significant effect.

Additional Notes

  • Omnibus Test: ANOVA is an overall test that indicates the presence of a significant difference without specifying where.
  • Next Steps: If significant, perform follow-up tests to find specific differences.