L7

Sep 19, 2024

Lecture Notes on Dispersion and Skewness

Review of Last Class

  • Discussed ways of quantifying dispersion in a population/sample.
  • Standard deviation:
    • Population: ( \sigma = \sqrt{\sum (x - \mu)^2 / N} )
    • Sample: ( s = \sqrt{\sum (x - \bar{x})^2 / (n - 1)} )
    • Dividing by ( n - 1 ) rather than ( n ) (Bessel's correction) makes ( s^2 ) an unbiased estimator of the population variance ( \sigma^2 ).
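
A minimal sketch of the two formulas above, to show how the divisor changes the result. The function names and the data values are made up for illustration and are not from the lecture.

```python
import math

def population_std(values):
    """Standard deviation with divisor N (values are the whole population)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_std(values):
    """Standard deviation with divisor n - 1 (values are a sample)."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

print(population_std(data))  # 2.0
print(sample_std(data))      # about 2.138, slightly larger than the population value
```

The sample version is always the larger of the two for the same data, and the gap shrinks as ( n ) grows.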

Practical Significance of Standard Deviation

  • Chebyshev's Theorem: For any distribution and any (k > 1), at least (1 - \frac{1}{k^2}) of the data lies within (k) standard deviations of the mean.
    • Example: (k = 2) guarantees at least 75% coverage.
  • Normal distribution (empirical rule):
    • About 68% within ±1 standard deviation of the mean.
    • About 95% within ±2 standard deviations.
    • About 99.7% within ±3 standard deviations.
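
The sketch below compares Chebyshev's guaranteed minimum with the coverage of a normal distribution. It assumes the standard identity that a normal variable falls within (k) standard deviations of its mean with probability erf(k / sqrt(2)); the function names are illustrative.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of any dataset within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(chebyshev_lower_bound(2))      # 0.75
print(round(normal_coverage(1), 4))  # 0.6827
print(round(normal_coverage(2), 4))  # 0.9545
print(round(normal_coverage(3), 4))  # 0.9973
```

Chebyshev's bound is much weaker because it has to hold for every distribution, not just the normal one.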

Z-Score

  • Defined as ( z = \frac{x - \text{mean}}{\text{standard deviation}} ): the number of standard deviations a value lies from the mean.
  • Used to detect outliers (e.g., ( |z| > 3 )).
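
A short sketch of z-score outlier detection, assuming the sample standard deviation (divisor ( n - 1 )) and a two-sided cutoff of 3. The data and function names are made up for illustration.

```python
import statistics

def z_scores(values):
    """Return the z-score of every value, using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # divisor n - 1
    return [(x - mean) / sd for x in values]

def flag_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# 19 values close to 10 plus one extreme value (illustrative data)
data = [9, 10, 11, 10] * 4 + [9, 10, 11, 50]

print(flag_outliers(data))  # [50]
```

With the sample standard deviation, no value can reach ( |z| > 3 ) in a dataset of 10 or fewer points, so this rule needs a reasonably large sample to flag anything.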

Percentiles and Quartiles

  • Percentiles indicate position within a dataset: the p-th percentile is the value below which roughly p% of the observations fall.
  • Quartiles:
    • Q1 (25%), Q2 (Median, 50%), Q3 (75%).
    • Interquartile Range (IQR): ( Q3 - Q1 ).
  • Box Plot:
    • Visual representation of data spread and outliers.
    • Outliers lie outside "fence" values: ( Q1 - 1.5 \times IQR ) and ( Q3 + 1.5 \times IQR ).
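
The fence rule can be sketched with Python's standard library. Textbooks and software use several conventions for computing quartiles; this sketch assumes the "inclusive" interpolation method of `statistics.quantiles`, so other tools may give slightly different Q1 and Q3 for the same data. The data values are illustrative.

```python
import statistics

def box_plot_fences(values):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # illustrative values only

lower, upper = box_plot_fences(data)
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper)  # -3.5 14.5
print(outliers)      # [30]
```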

Moments

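  • Moments describe the shape of a distribution.
  • ( r )-th moment about zero: ( M_r^* = \frac{\sum y^r}{n} ).
  • ( r )-th moment about the mean: ( M_r = \frac{\sum (y - \bar{y})^r}{n} ).
    • First moment: ( M_1 = 0 ), since deviations from the mean always sum to zero (the mean itself is ( M_1^* )).
    • Second moment: ( M_2 ) is the variance (with divisor ( n )).
    • Third moment: ( M_3 ) measures asymmetry and is the numerator of the skewness coefficient.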
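
A minimal sketch of both kinds of moment, assuming the divisor ( n ) throughout as in the formula for ( M_r^* ). The function names and data are illustrative.

```python
def raw_moment(values, r):
    """r-th moment about zero: the mean of y**r."""
    return sum(y ** r for y in values) / len(values)

def central_moment(values, r):
    """r-th moment about the mean: the mean of (y - ybar)**r."""
    ybar = raw_moment(values, 1)
    return sum((y - ybar) ** r for y in values) / len(values)

data = [1, 2, 3, 4, 10]  # illustrative values only

print(raw_moment(data, 1))      # 4.0  (the mean)
print(central_moment(data, 1))  # 0.0  (always zero)
print(central_moment(data, 2))  # 10.0 (variance with divisor n)
print(central_moment(data, 3))  # 36.0 (positive: the long tail is on the right)
```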

Skewness

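  • Skewness coefficient: ( A_3 = \frac{M_3}{M_2^{3/2} } ), a unit-free measure because ( M_3 ) is scaled by the cube of the standard deviation.
  • Indicates asymmetry:
    • ( A_3 = 0 ) for symmetric distributions.
    • ( A_3 < 0 ) when the distribution is skewed left (longer left tail).
    • ( A_3 > 0 ) when the distribution is skewed right (longer right tail).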
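
The three cases can be checked numerically. This is a sketch that assumes moments about the mean with divisor ( n ); the three small datasets are made up to show each sign and are not the example worked in class.

```python
def skewness(values):
    """Skewness coefficient A3 = M3 / M2**1.5, using moments about the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n
    m3 = sum((y - mean) ** 3 for y in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 10]     # one large value stretches the right tail
left_skewed = [-10, -4, -3, -2, -1]  # mirror image of right_skewed

print(skewness(symmetric))               # 0.0
print(round(skewness(right_skewed), 3))  # 1.138
print(round(skewness(left_skewed), 3))   # -1.138
```

Mirroring the data flips the sign of ( A_3 ) but leaves its magnitude unchanged, which is what a measure of asymmetry should do.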

Example: Calculating Skewness

  • Provided example with numeric data to calculate skewness.
  • Noted that skewness indicates whether the data are spread evenly on both sides of the mean or have a longer tail on one side.

Summary

  • Discussed measures of dispersion, significance of standard deviation, z-scores, percentiles, quartiles, and moments.
  • Skewness as a measure of asymmetry in data distribution.

Note: Always verify calculations and understand the implications of skewness and other statistical measures in data analysis.