Lecture on Variation and Standard Deviation
Introduction
- Topic: Variation in data sets
- Purpose: Understanding how data is spread out.
Importance of Variation
- Mean vs. Variation:
- The mean is important, but it is not the only metric: it describes the center of the data and says nothing about spread.
- Example: Bank line wait times.
- Bank 1: Managed queue, all waited 6 minutes.
- Bank 2: Typical line, waits were 4, 7, 7 minutes.
- Bank 3: Customers choose a line, waits were 1, 3, 14 minutes.
- All three have the same mean (6 minutes), but different spreads.
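The claim that the three banks share a mean can be checked with a minimal Python sketch. The three-value list for Bank 1 is an assumption made here so that it matches the other banks; the notes only say that everyone waited 6 minutes.

```python
from statistics import mean

# Wait times in minutes. Bank 1's three values are an assumption:
# the notes only state that everyone waited 6 minutes.
waits = {
    "Bank 1": [6, 6, 6],
    "Bank 2": [4, 7, 7],
    "Bank 3": [1, 3, 14],
}

# Each list sums to 18 over 3 customers.
means = {bank: mean(times) for bank, times in waits.items()}
for bank, m in means.items():
    print(bank, "mean wait:", m)
```

The means agree, so the mean alone cannot tell these three banks apart.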
Measuring Variation
- Range:
- Definition: Difference between max and min values.
- Pros: Easy to calculate.
- Cons: Only considers two data points, ignores others.
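As a rough illustration of the range and its main weakness, here is a short Python sketch using the Bank 2 and Bank 3 wait times. The helper name `value_range` and the two extra lists are made up for this example.

```python
def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)

bank2 = [4, 7, 7]
bank3 = [1, 3, 14]

range2 = value_range(bank2)  # 7 - 4
range3 = value_range(bank3)  # 14 - 1

# The range ignores everything between the extremes: these two
# illustrative lists have the same range as bank3 even though
# their middle values are very different.
same_range = value_range([1, 13, 14]) == value_range([1, 2, 14]) == range3

print(range2, range3, same_range)
```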
- Standard Deviation (SD):
- Definition: A measure of how far data values typically fall from the mean (roughly an average distance).
- Properties:
- Never negative.
- Zero if all data points are the same.
- Sensitive to outliers, because deviations from the mean are squared.
- Calculation for Sample (s):
- Formula: ( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} )
- Example: Calculate SD for data 1, 3, 14.
- Variance is the SD squared (s² for a sample, σ² for a population).
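The example for the data 1, 3, 14 can be worked step by step. The sketch below follows the sample formula directly and then cross-checks against Python's standard `statistics` module; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import stdev, variance

data = [1, 3, 14]
n = len(data)

x_bar = sum(data) / n                             # mean: 18 / 3 = 6
squared_devs = [(x - x_bar) ** 2 for x in data]   # 25, 9, 64
sample_variance = sum(squared_devs) / (n - 1)     # 98 / 2 = 49
s = sqrt(sample_variance)                         # sqrt(49) = 7

print(x_bar, squared_devs, sample_variance, s)

# Cross-check against the standard library's sample formulas,
# which also divide by n - 1.
print(stdev(data), variance(data))
```

So for this sample the variance is 49 and the standard deviation is 7 minutes.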
- Population Standard Deviation (σ):
- Formula: ( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} )
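To see how the population formula differs from the sample formula, the sketch below treats the values 1, 3, 14 as an entire population. That treatment is an assumption made only for comparison.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [1, 3, 14]      # assumed here to be the whole population
N = len(data)
mu = sum(data) / N     # population mean

# Population formula: divide the sum of squared deviations by N.
sigma = sqrt(sum((x - mu) ** 2 for x in data) / N)

print("population SD:", round(sigma, 3))
print("pstdev check: ", round(pstdev(data), 3))
print("sample SD:    ", stdev(data))   # divides by n - 1 instead
```

Dividing by N rather than n - 1 gives a smaller value for the same numbers, and the gap shrinks as the data set grows.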
Empirical Rule
- Applies to data that are approximately normally distributed (bell-shaped).
- 68-95-99.7 Rule:
- About 68% of the data fall within 1 SD of the mean.
- About 95% of the data fall within 2 SDs of the mean.
- About 99.7% of the data fall within 3 SDs of the mean.
- Used to judge whether a data value is usual or unusual.
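The rule can be turned into intervals with a few lines of Python. In this sketch the mean of 100 and SD of 15 are hypothetical values, the function names are invented for illustration, and the 2-SD cutoff for "unusual" is an assumed convention that the notes do not specify.

```python
def empirical_intervals(mean, sd):
    """Return the 1-, 2-, and 3-SD intervals around the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

def is_unusual(x, mean, sd, cutoff=2):
    """Flag x as unusual if it lies more than `cutoff` SDs from the mean.

    The default cutoff of 2 SDs is an assumed convention.
    """
    return abs(x - mean) > cutoff * sd

# Hypothetical bell-shaped data with mean 100 and SD 15.
intervals = empirical_intervals(100, 15)
print(intervals[1])   # holds about 68% of values
print(intervals[2])   # holds about 95% of values
print(intervals[3])   # holds about 99.7% of values

print(is_unusual(140, 100, 15))   # 40 away from the mean, beyond 2 SDs
print(is_unusual(120, 100, 15))   # 20 away from the mean, within 2 SDs
```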
Coefficient of Variation
- Expresses the SD as a percentage of the mean.
- Formula: ( CV = \frac{s}{\bar{x}} \times 100\% )
- Helps compare variability between different data sets, even when their units or scales differ.
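A minimal sketch of the coefficient of variation, applied to the Bank 2 and Bank 3 wait times from earlier in the lecture. The function name is made up here, and the calculation assumes a sample (so it uses s) with a nonzero mean.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample SD expressed as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100

bank2 = [4, 7, 7]
bank3 = [1, 3, 14]

cv2 = coefficient_of_variation(bank2)
cv3 = coefficient_of_variation(bank3)

print(f"Bank 2 CV: {cv2:.1f}%")
print(f"Bank 3 CV: {cv3:.1f}%")
```

Both banks have a mean of 6 minutes, so the much larger CV for Bank 3 comes entirely from its larger standard deviation.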
Conclusion
- Understanding variation is crucial in data analysis.
- Empirical rule and standard deviation provide insights into data spread.
- Coefficient of variation helps compare different data sets effectively.
Homework: Complete 3.3 exercises, due Monday. Explore online resources for further practice.