Understanding Variation and Standard Deviation

Sep 8, 2024

Lecture on Variation and Standard Deviation

Introduction

  • Topic: Variation in data sets
  • Purpose: Understanding how data is spread out.

Importance of Variation

  • Mean vs. Variation:
    • Mean is important, but not the only metric.
    • Example: Bank line wait times.
      • Bank 1: Managed queue, all waited 6 minutes.
      • Bank 2: Typical line, waits were 4, 7, 7 minutes.
      • Bank 3: Choose line, waits were 1, 3, 14 minutes.
    • All have the same mean (6), but different spread.
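
The bank example can be checked with a short Python sketch. The notes do not say how many customers Bank 1 had, so three is assumed here to match the other two banks; the variable name `waits` is illustrative only.

```python
from statistics import mean

# Wait times in minutes from the bank example.
# Assumption: Bank 1 had three customers, like the other two banks.
waits = {
    "Bank 1": [6, 6, 6],
    "Bank 2": [4, 7, 7],
    "Bank 3": [1, 3, 14],
}

# Every bank has the same mean, even though the individual waits differ.
for bank, times in waits.items():
    print(f"{bank}: mean = {mean(times)}, waits = {times}")
```

The mean alone cannot tell the three banks apart, which is why a measure of spread is needed.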

Measuring Variation

  1. Range:

    • Definition: Difference between max and min values.
    • Pros: Easy to calculate.
    • Cons: Only considers two data points, ignores others.
  2. Standard Deviation (SD):

    • Definition: A measure of how far data points typically lie from the mean.
    • Properties:
      • Never negative.
      • Zero if all data points are the same.
      • Affected by outliers.
    • Calculation for Sample (s):
      • Formula: ( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} )
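      • Example: Calculate SD for data 1, 3, 14.
        • Mean: (1 + 3 + 14) / 3 = 6.
        • Squared deviations: 25, 9, 64 (sum = 98).
        • s = sqrt(98 / 2) = 7.
      • Variance is SD squared (s², here 49).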
  3. Population Standard Deviation (σ):

    • Formula: ( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} )
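
The three measures above can be sketched in Python as follows. The function names are illustrative, not from the lecture; the formulas follow the sample and population definitions given in the list, and the standard library's `stdev` and `pstdev` can serve as a cross-check.

```python
import math
from statistics import pstdev, stdev


def value_range(data):
    """Range: difference between the largest and smallest value."""
    return max(data) - min(data)


def sample_sd(data):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    squared_devs = sum((x - x_bar) ** 2 for x in data)
    return math.sqrt(squared_devs / (n - 1))


def population_sd(data):
    """Population standard deviation sigma, dividing by N."""
    N = len(data)
    mu = sum(data) / N
    squared_devs = sum((x - mu) ** 2 for x in data)
    return math.sqrt(squared_devs / N)


bank3 = [1, 3, 14]  # Bank 3 wait times in minutes
print("range:", value_range(bank3))
print("sample SD:", sample_sd(bank3))
print("population SD:", population_sd(bank3))
```

For the same data, the sample SD is larger than the population SD because it divides by n - 1 rather than N.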

Empirical Rule

  • Applies to normally distributed data.
  • 68-95-99.7 Rule:
    • 68% falls within 1 SD of the mean.
    • 95% falls within 2 SDs of the mean.
    • 99.7% falls within 3 SDs of the mean.
    • Used to judge whether a value is usual or unusual; a common convention treats values more than 2 SDs from the mean as unusual.
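
As a rough check of the rule, the sketch below computes the share of a normal distribution within 1, 2, and 3 SDs using Python's `statistics.NormalDist`. The helper names `share_within` and `is_unusual` are made up for illustration, and the 2-SD cutoff is a common convention assumed here, not something fixed by the rule itself.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")


def is_unusual(x, mean, sd, k=2):
    """Flag x as unusual if it lies more than k SDs from the mean.

    k=2 is an assumed cutoff, not part of the empirical rule itself.
    """
    return abs(x - mean) > k * sd
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7. With hypothetical values, `is_unusual(130, 100, 10)` flags a value 3 SDs above the mean, while `is_unusual(115, 100, 10)` does not.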

Coefficient of Variation

  • Expresses the SD as a percentage of the mean.
  • Formula: ( CV = \frac{s}{\bar{x}} \times 100\% )
  • Helps compare variability between datasets with different units or scales.
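
A minimal sketch of the calculation, reusing the Bank 2 and Bank 3 wait times from earlier in the lecture. The function name `coefficient_of_variation` is illustrative, and the sketch assumes sample data with a nonzero mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample SD expressed as a percentage of the sample mean.

    Assumes the mean is nonzero; the ratio is undefined otherwise.
    """
    return stdev(data) / mean(data) * 100


bank2 = [4, 7, 7]   # wait times in minutes
bank3 = [1, 3, 14]  # wait times in minutes

print(f"Bank 2 CV: {coefficient_of_variation(bank2):.1f}%")
print(f"Bank 3 CV: {coefficient_of_variation(bank3):.1f}%")
```

Both banks share a mean of 6 minutes, so the larger CV for Bank 3 reflects only its larger SD. The measure is most useful when the datasets being compared have different means or units.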

Conclusion

  • Understanding variation is crucial in data analysis.
  • The standard deviation and the empirical rule describe how data spread around the mean.
  • Coefficient of variation helps compare different data sets effectively.

Homework: Complete 3.3 exercises, due Monday. Explore online resources for further practice.