L5

Sep 17, 2024

Lecture Notes: Quantifying Data Variation

Recap of Previous Lecture

  • Discussed three metrics for quantifying data:
    • Mean
    • Median
    • Mode

Quantifying Data Variation

Arithmetic Mean

  • Formula: ( \bar{x} = \frac{\Sigma x_i}{n} ), where ( n ) is the number of observations.
  • For a population, replace ( \bar{x} ) with ( \mu ) and ( n ) with ( N ).
  • Transformation examples:
    • ( y = ax ) leads to ( \bar{y} = a\bar{x} )
    • ( y = c + x ) results in ( \bar{y} = c + \bar{x} )
    • ( y = c + ax ) provides ( \bar{y} = c + a\bar{x} )
  • Caveat: Sensitive to outliers (e.g., data 1, 1, 1, 2, 2, 20 yields ( \bar{x} = 4.5 )).
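
The transformation rules above can be checked numerically. The following is a minimal sketch using the data from the caveat; the constants a = 3 and c = 10 are arbitrary illustrative choices, not values from the lecture.

```python
from statistics import mean

# Data from the outlier caveat above
data = [1, 1, 1, 2, 2, 20]
x_bar = mean(data)  # 27 / 6 = 4.5

# Illustrative constants (assumed, not from the lecture)
a, c = 3, 10

# Apply y = c + a*x to every observation and average the result
y = [c + a * x for x in data]
y_bar = mean(y)

# The mean of the transformed data matches c + a * x_bar
print(x_bar, y_bar, c + a * x_bar)
```

Setting c = 0 or a = 1 in the same sketch reproduces the two simpler cases, y = ax and y = c + x.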

Geometric Mean

  • Formula: ( \sqrt[n]{\Pi x_i} )
  • Example values: 15, 10, 5, 8, 17, 100
    • Arithmetic mean = 25.8
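    • Geometric mean = 14.7 (closer to the bulk of the data, because the outlier 100 has less pull)
  • Property: for positive values, Geometric mean ( \leq ) Arithmetic mean, with equality only when all values are equal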
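
The example values can be reproduced with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.geometric_mean` and `math.prod` are available; the manual n-th root is included only to mirror the formula.

```python
import math
from statistics import mean, geometric_mean

values = [15, 10, 5, 8, 17, 100]

# Arithmetic mean: sum of values divided by n
am = mean(values)

# Geometric mean from the standard library
gm = geometric_mean(values)

# Same quantity computed directly as the n-th root of the product
gm_manual = math.prod(values) ** (1 / len(values))

print(round(am, 1), round(gm, 1))
```

For much longer lists the direct product can overflow or lose precision, which is one reason to prefer the library function over the manual root.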

Median

  • Middle value of an ordered data set.
  • Formula for position in the ordered list (counting from 1): ( 0.5(n + 1) )
  • If ( n ) is even, this position falls halfway between two values; average the two middle values.
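
As a rough sketch of how the position formula works in practice, the helper below (a hypothetical name, `median_by_position`) sorts the data, computes the position, and averages the values on either side of it. It assumes a non-empty list of numbers.

```python
import math
from statistics import median

def median_by_position(values):
    """Median via the 1-based position 0.5 * (n + 1) in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    pos = 0.5 * (n + 1)
    # For odd n, pos is a whole number and both indices coincide;
    # for even n, pos ends in .5 and the two neighbours are averaged.
    lower = ordered[math.floor(pos) - 1]
    upper = ordered[math.ceil(pos) - 1]
    return (lower + upper) / 2

odd_example = median_by_position([7, 3, 5])      # position 2
even_example = median_by_position([7, 3, 5, 9])  # position 2.5
print(odd_example, even_example)
```

The results agree with `statistics.median`, which is the more convenient choice outside of a teaching context.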

Mode

  • Most frequently occurring value in the data set.
  • Most informative for large datasets, where repeated values reveal a clear peak; in small datasets the most frequent value may occur by chance.

Comparing Mean, Median, and Mode

  • Mode is most informative for large datasets.
  • Mean is sensitive to outliers.
  • Median is less sensitive to outliers.
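
To make the comparison concrete, the sketch below reuses the data 1, 1, 1, 2, 2, 20 from the arithmetic mean section and measures how far each statistic moves when the outlier 20 is removed.

```python
from statistics import mean, median, mode

with_outlier = [1, 1, 1, 2, 2, 20]
without_outlier = with_outlier[:-1]  # drop the value 20

# How far does each statistic move when the outlier is present?
mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)
mode_with = mode(with_outlier)
mode_without = mode(without_outlier)

print(mean_shift, median_shift, mode_with, mode_without)
```

In this small example the mean moves by about 3.1, the median by 0.5, and the mode does not move at all.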

Examples

  1. Example data set: 1, 2, 3, 2, 4, 2, 8, 3, 6, 3, 2, 5, 45, 36, 89
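    • Median = 3 (the 8th of the 15 ordered values)
    • Mode = 2
    • Mean (about 14.1) > Median > Mode, because the large values 36, 45, and 89 pull the mean upward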
  2. When the data contain two subpopulations, none of the mean, median, or mode summarizes the data well.
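
The second point can be illustrated with made-up data. The two groups below are hypothetical values chosen only to form two well-separated clusters; they are not from the lecture.

```python
from statistics import mean, median, multimode

# Hypothetical subpopulations (assumed for illustration)
group_a = [10, 11, 11, 12]
group_b = [50, 51, 51, 52]
combined = group_a + group_b

center_mean = mean(combined)
center_median = median(combined)
modes = multimode(combined)  # all values tied for most frequent

# Distance from the mean to the nearest actual observation
gap = min(abs(x - center_mean) for x in combined)

print(center_mean, center_median, modes, gap)
```

Here the mean and median both land at 31, far from every observation, while the mode is not unique. Summarizing each group separately is usually more informative in such cases.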

Variability Measures

Range

  • Formula: Maximum - Minimum
  • Sensitive to outliers, since it depends only on the two extreme values; it may not represent the spread of the data well.

Mean Absolute Deviation

  • Formula: ( \frac{\Sigma |x_i - \bar{x}|}{n} )
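
A minimal sketch of the range and the mean absolute deviation, using the small data set with an outlier from the arithmetic mean section. The function names are illustrative.

```python
from statistics import mean

def value_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)

def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the mean."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)

data = [1, 1, 1, 2, 2, 20]
data_range = value_range(data)
mad = mean_absolute_deviation(data)

print(data_range, round(mad, 2))
```

The range is set entirely by the outlier 20, whereas the mean absolute deviation averages the contribution of every observation.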

Standard Deviation

  • More commonly used than the mean absolute deviation as a measure of variability.
  • Formula for population: ( \sigma = \sqrt{\frac{\Sigma (x_i - \mu)^2}{N}} )
  • Formula for sample: ( s = \sqrt{\frac{\Sigma (x_i - \bar{x})^2}{n-1}} )
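    • Dividing by ( n-1 ) instead of ( n ) (Bessel's correction) compensates for using ( \bar{x} ) in place of the unknown ( \mu ); the difference matters most for small sample sizes.
  • Variance: the square of the standard deviation (( \sigma^2 ) for a population, ( s^2 ) for a sample).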
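
The two formulas differ only in the divisor, which the sketch below makes explicit. The hand-written functions are for illustration; `statistics.pstdev` and `statistics.stdev` from the standard library compute the same quantities.

```python
import math
from statistics import pstdev, stdev

def population_sd(values):
    """Square root of the mean squared deviation, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_sd(values):
    """Same sum of squares, but dividing by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [1, 1, 1, 2, 2, 20]
sigma = population_sd(data)
s = sample_sd(data)

print(round(sigma, 3), round(s, 3))
```

With only six observations the sample value is noticeably larger than the population value; as n grows, the two converge.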

Transformation of Standard Deviation

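  • ( y = ax ) results in ( s_y = |a| \times s_x ) (the absolute value is needed because a standard deviation is never negative)
  • ( y = c + x ) leaves the standard deviation unchanged: ( s_y = s_x )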
  • For grouped data, use frequency to calculate mean and variance.
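
The sketch below checks the scaling and shifting rules numerically and then computes the mean and sample variance from grouped data. The constants a = -2 and c = 5 are arbitrary; a negative scale factor is used on purpose, since a standard deviation cannot be negative and the factor therefore enters through its absolute value. The helper name `grouped_mean_and_variance` and the choice of the n - 1 divisor are assumptions for illustration.

```python
from statistics import stdev

data = [1, 1, 1, 2, 2, 20]
a, c = -2, 5  # illustrative constants (assumed)

s_x = stdev(data)
s_scaled = stdev([a * x for x in data])   # y = a*x
s_shifted = stdev([c + x for x in data])  # y = c + x

def grouped_mean_and_variance(values, freqs):
    """Mean and sample variance from distinct values and their frequencies."""
    n = sum(freqs)
    x_bar = sum(f * x for x, f in zip(values, freqs)) / n
    var = sum(f * (x - x_bar) ** 2 for x, f in zip(values, freqs)) / (n - 1)
    return x_bar, var

# The same data expressed as value/frequency pairs: 1 x3, 2 x2, 20 x1
g_mean, g_var = grouped_mean_and_variance([1, 2, 20], [3, 2, 1])

print(s_x, s_scaled, s_shifted, g_mean, g_var)
```

The grouped calculation gives the same mean and variance as the raw list, because each distinct value is simply weighted by how often it occurs.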

Conclusion

  • Arithmetic mean, median, and mode have specific use cases.
  • Standard deviation is vital for understanding data dispersion.
  • Population and sample standard deviations differ in the divisor: ( N ) for a population, ( n-1 ) for a sample.
  • Adding a constant to the data leaves the standard deviation unchanged; multiplying by a factor ( a ) scales it by ( |a| ).

Thank you for your attention. We will continue in the next lecture.