Lecture Notes on Dispersion and Skewness
Review of Last Class
- Discussed ways of quantifying dispersion in a population/sample.
- Standard deviation:
- Population: ( \sigma = \sqrt{\sum (x - \mu)^2 / N} )
- Sample: ( s = \sqrt{\sum (x - \bar{x})^2 / (n - 1)} )
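- The sample formula divides by ( n - 1 ) (Bessel's correction) because dividing by ( n ) tends to underestimate the population variance; with ( n - 1 ), ( s^2 ) is an unbiased estimator of ( \sigma^2 ).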
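The two formulas above can be sketched in a few lines of Python. The function names and the dataset are illustrative assumptions, not material from the lecture.

```python
import math

def population_std(values):
    """Population standard deviation: squared deviations divided by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def sample_std(values):
    """Sample standard deviation: squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

# Hypothetical data chosen so the arithmetic is easy to check by hand:
# mean = 5, sum of squared deviations = 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(data))  # sqrt(32 / 8) = 2.0
print(sample_std(data))      # sqrt(32 / 7), slightly larger than 2.0
```

The sample value is always a little larger than the population value computed from the same numbers, because the divisor is smaller.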
Practical Significance of Standard Deviation
- Chebyshev's Theorem: For any distribution and any ( k > 1 ), at least (1 - \frac{1}{k^2}) of the data lies within (k) standard deviations of the mean.
- Example: (k = 2) guarantees at least 75% coverage.
- Normal distribution (empirical rule), approximately:
- 68% within ±1 standard deviation of the mean.
- 95% within ±2 standard deviations of the mean.
- 99.7% within ±3 standard deviations of the mean.
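To compare the two statements numerically, the following sketch evaluates Chebyshev's lower bound next to the exact coverage of a normal distribution. The function names are illustrative; the normal coverage uses the standard identity that the probability within k standard deviations equals erf(k / sqrt(2)).

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction of any dataset within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), round(normal_coverage(k), 4))
```

Chebyshev's bound is much weaker than the normal figures because it must hold for every distribution, not only bell-shaped ones.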
Z-Score
- Defined as ( \frac{x - \text{mean}}{\text{standard deviation}} ).
- Used to detect outliers (e.g., ( |z| > 3 ), so that unusually low values are flagged as well as unusually high ones).
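A minimal sketch of z-score outlier detection follows. The helper names, the dataset, and the choice of the sample standard deviation are assumptions made for illustration; the notes do not say which standard deviation to use.

```python
import statistics

def z_scores(values):
    """Return (x - mean) / standard deviation for every value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / sd for x in values]

def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical data: nineteen values near 10 and one extreme value.
data = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11,
        10, 9, 11, 10, 10, 9, 11, 10, 10, 50]
print(flag_outliers(data))  # [50]
```

With the sample standard deviation, no z-score can exceed (n - 1) / sqrt(n) in absolute value, so a threshold of 3 can never trigger on ten or fewer observations.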
Percentiles and Quartiles
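- Percentiles indicate the position of a value within a dataset: the ( p )-th percentile is the value at or below which roughly ( p )% of the observations fall.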
- Quartiles:
- Q1 (25%), Q2 (Median, 50%), Q3 (75%).
- Interquartile Range (IQR): ( Q3 - Q1 ).
- Box Plot:
- Visual representation of data spread and outliers.
- Outliers lie outside "fence" values: ( Q1 - 1.5 \times IQR ) and ( Q3 + 1.5 \times IQR ).
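The quartile and fence calculations can be sketched with Python's standard library. Quartile conventions differ between textbooks and software, so the "inclusive" method used here is an assumption, as are the function name and the data.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, IQR, fences, and fence-based outliers for a dataset."""
    # statistics.quantiles sorts the data internally; n=4 returns Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers,
    }

# Hypothetical data with one unusually large value.
data = [3, 5, 7, 8, 9, 11, 13, 15, 40]
summary = box_plot_summary(data)
print(summary)
```

For this data the quartiles are 7, 9, and 13, so the IQR is 6 and the fences sit at -2 and 22; only the value 40 falls outside them.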
Moments
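- Moments describe the shape of a distribution.
- Moment about zero: ( M_r^* = \frac{\sum y^r}{n} ). The first moment about zero, ( M_1^* ), is the mean ( \bar{y} ).
- Moments about the mean: ( M_r = \frac{\sum (y - \bar{y})^r}{n} ).
- First moment: ( M_1 = 0 ) always, because deviations from the mean sum to zero.
- Second moment: ( M_2 = \text{variance} ) (computed with divisor ( n )).
- Third moment: ( M_3 ), which measures asymmetry and is the numerator of the skewness coefficient.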
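The moment definitions translate directly into code. In this sketch the function names and the dataset are illustrative assumptions; the divisor is n throughout, matching the moment-about-zero formula above.

```python
def raw_moment(values, r):
    """r-th moment about zero: sum of y**r divided by n."""
    return sum(y ** r for y in values) / len(values)

def central_moment(values, r):
    """r-th moment about the mean: sum of (y - mean)**r divided by n."""
    mean = raw_moment(values, 1)
    return sum((y - mean) ** r for y in values) / len(values)

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(raw_moment(data, 1))      # the mean: 5.0
print(central_moment(data, 1))  # always zero: 0.0
print(central_moment(data, 2))  # variance with divisor n: 4.0
print(central_moment(data, 3))  # third central moment: 5.25
```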
Skewness
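- Coefficient of skewness: ( A_3 = \frac{M_3}{M_2^{3/2} } ), a unitless ratio of the third moment about the mean to the cube of the standard deviation.
- Indicates asymmetry:
- ( A_3 = 0 ) for symmetric distributions.
- ( A_3 < 0 ) when the distribution is skewed left (longer tail on the left).
- ( A_3 > 0 ) when the distribution is skewed right (longer tail on the right).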
Example: Calculating Skewness
- Provided example with numeric data to calculate skewness.
- Noted that skewness indicates whether the data are distributed symmetrically around the mean.
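The numeric data used in class is not reproduced in these notes, so the sketch below uses hypothetical datasets to show the calculation of ( A_3 = M_3 / M_2^{3/2} ). The function name is an illustrative assumption.

```python
def skewness(values):
    """Moment coefficient of skewness A_3 = M_3 / M_2**1.5 (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((y - mean) ** 2 for y in values) / n
    m3 = sum((y - mean) ** 3 for y in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Hypothetical datasets, not the ones used in the lecture.
right_skewed = [2, 4, 4, 4, 5, 5, 7, 9]       # long tail to the right
left_skewed = [10 - y for y in right_skewed]  # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive: M_3 = 5.25, M_2 = 4
print(skewness(left_skewed))   # negative, same magnitude
print(skewness(symmetric))     # 0.0
```

Mirroring a dataset around any point flips the sign of ( A_3 ) but keeps its magnitude, which is one quick way to check a hand calculation.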
Summary
- Discussed measures of dispersion, significance of standard deviation, z-scores, percentiles, quartiles, and moments.
- Skewness as a measure of asymmetry in data distribution.
Note: Always verify calculations and understand the implications of skewness and other statistical measures in data analysis.