Numerical Summary Measures

Jul 3, 2024

Lecture Notes: Numerical Summary Measures

Overview

  • Discussed various numerical summary measures, including familiar and less common ones.
  • Introduced sample estimates vs. population measures.
  • Emphasis on understanding basic statistical concepts for continuous data.

Measures of Central Tendency

Sample Mean (Arithmetic Mean)

  • Definition: Sum of all data points divided by the number of observations (n).
  • Notation: X̄ (X bar) represents the sample mean.
  • Example: Given data (in mmHg): 120, 80, 90, 110, 95.
    • Sum = 495, n = 5, X̄ = 495/5 = 99 mmHg.
  • Formula: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
  • Sensitivity: Sensitive to extreme values. One data point can significantly affect the mean, especially in small samples.
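
A minimal Python sketch of the sample mean, using the five readings from the example. The helper name `sample_mean` and the extra reading of 300 are illustrative assumptions, not part of the lecture; the extra value is there only to show how one extreme point shifts the mean.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    if not values:
        raise ValueError("sample_mean requires at least one observation")
    return sum(values) / len(values)


readings = [120, 80, 90, 110, 95]  # data from the example (mmHg)
mean_original = sample_mean(readings)  # 495 / 5

# Hypothetical extreme reading, added only to illustrate sensitivity.
readings_with_outlier = readings + [300]
mean_with_outlier = sample_mean(readings_with_outlier)  # 795 / 6

print(mean_original, mean_with_outlier)
```

With only five observations, the single added value moves the mean from 99 to 132.5 mmHg.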

Sample Median

  • Definition: Middle value in an ordered dataset, also known as the 50th percentile.
  • Computation: Arrange data in order and pick the middle value; with an even number of observations, average the two middle values.
  • Example: Given data: 80, 90, 95, 110, 120.
    • Median = 95 mmHg.
  • Robustness: Less sensitive to extreme values compared to the mean.
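
The median computation can be sketched the same way. The helper `sample_median` is a hypothetical name, and averaging the two middle values for an even n is the usual convention, assumed here. The reading of 300 is again an illustrative extreme value, not lecture data.

```python
def sample_median(values):
    """Return the middle value of the ordered data (50th percentile)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample_median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    # even n: average of the two middle values (common convention)
    return (ordered[mid - 1] + ordered[mid]) / 2


readings = [120, 80, 90, 110, 95]  # data from the example (mmHg)
median_original = sample_median(readings)

# Same hypothetical extreme reading used to illustrate robustness.
median_with_outlier = sample_median(readings + [300])

print(median_original, median_with_outlier)
```

Adding the extreme value moves the median only from 95 to 102.5 mmHg, far less than it would move the mean.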

Measures of Variability

Sample Variance (s²) and Standard Deviation (s)

  • Variance (s²): Sum of squared deviations from the sample mean divided by n-1; roughly the average squared distance of the data points from the mean.
    • Formula: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
  • Standard Deviation (s): Square root of the variance; a measure of the typical distance of data points from the mean, expressed in the original units.
    • Example: Given data: 120, 80, 90, 110, 95.
      • Mean (X̄) = 99 mmHg.
      • Variance (s²) = 255 mmHg².
      • Standard Deviation (s) = √255 ≈ 16 mmHg.
  • Conclusion: Larger s indicates more spread in data.
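
The worked example can be checked with a short sketch. The function names are illustrative assumptions; the arithmetic follows the definition of s² with the n-1 divisor.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_variance requires at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance, in the units of the data."""
    return math.sqrt(sample_variance(values))


readings = [120, 80, 90, 110, 95]  # data from the example (mmHg)

# Squared deviations from 99: 441, 361, 81, 121, 16 (sum 1020)
s2 = sample_variance(readings)  # 1020 / 4
s = sample_sd(readings)

print(s2, round(s, 2))
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which use the same n-1 divisor.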

Population vs. Sample

  • Sample: Subset of a population used to estimate population parameters.
  • Population: Entire group of interest.
  • Sample Statistics: Estimates of population parameters (e.g., mean, standard deviation).
  • Population Parameters: True values of interest (e.g., population mean (µ), population standard deviation (σ)).
  • Representation: Simple random sampling ensures every subset of a given size n is equally likely to be chosen.
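
As a rough illustration of simple random sampling, the sketch below uses a hypothetical population of six labelled units and a sample size of 2, both chosen only for the example. It lists every possible sample and then draws one with the standard library.

```python
import random
from itertools import combinations

population = list(range(1, 7))  # hypothetical population of 6 labelled units
n = 2  # sample size, chosen for illustration

# Every possible sample of size n; under simple random sampling each
# one has the same probability of being the sample actually drawn.
all_samples = list(combinations(population, n))
prob_each = 1 / len(all_samples)

rng = random.Random(0)  # seeded so the draw is reproducible
sample = rng.sample(population, n)  # n distinct units, without replacement

print(len(all_samples), prob_each, sample)
```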

Importance of Sample Size

  • Effect on Estimates: Larger samples tend to provide more precise estimates of population parameters.
  • Sample Statistics: The sample mean and standard deviation estimate µ and σ; how close they are likely to be depends on the sample size.
  • Predictability: Increasing the sample size does not tell us whether a sample estimate will go up or down; it only makes the estimate a better approximation of the population parameter.
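
A small simulation can make the effect of sample size concrete. Everything here is an assumption for illustration: a normal population with µ = 100 and σ = 15, the sample sizes 10 and 1000, and 300 repetitions. It measures how much the sample mean varies from one sample to the next.

```python
import random
import statistics

rng = random.Random(42)  # seeded so the run is reproducible
MU, SIGMA = 100.0, 15.0  # hypothetical population parameters


def spread_of_sample_means(n, repeats=300):
    """Draw `repeats` samples of size n; return the SD of their means."""
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(1000)

print(round(spread_small, 2), round(spread_large, 2))
```

Individual sample means still land above or below µ unpredictably, but with n = 1000 they cluster much more tightly around it than with n = 10.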

Degrees of Freedom

  • Explanation: The number of values that can vary in a dataset after a constraint (e.g., mean) is imposed.
  • Example: If sample mean is known and n readings are provided, only n-1 values are free to vary.
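  • Division by n-1: Dividing by n would, on average, underestimate the population variance, because deviations are measured from the sample mean rather than µ; dividing by n-1 corrects this bias (Bessel's correction).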
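
Both ideas can be checked exactly on a toy case. The first part uses the readings from the earlier example; the second uses a hypothetical population of four values and samples of size 3 drawn independently with replacement (an assumption made so that every possible sample can be enumerated). Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Part 1: once the mean is fixed, the last reading is determined.
readings = [120, 80, 90, 110, 95]
n = len(readings)
mean = Fraction(sum(readings), n)
last_value = n * mean - sum(readings[:-1])  # recovered from the other n - 1

# Part 2: average the variance estimate over every possible sample.
population = [1, 2, 3, 4]  # hypothetical population
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)


def variance_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean over `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor


k = 3  # sample size
samples = list(product(population, repeat=k))  # all ordered samples
avg_n_minus_1 = sum(variance_with_divisor(s, k - 1) for s in samples) / len(samples)
avg_n = sum(variance_with_divisor(s, k) for s in samples) / len(samples)

print(last_value, sigma2, avg_n_minus_1, avg_n)
```

Averaged over all 64 samples, the n-1 version equals the population variance (5/4), while the divide-by-n version comes out smaller (5/6).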

Range vs. Standard Deviation

  • Range: Difference between maximum and minimum values.
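    • Limitation: Range tends to increase with sample size, because a larger sample is more likely to include extreme values; adding observations can never make it smaller.
  • Standard Deviation: Does not systematically grow with sample size; a larger sample simply estimates the population standard deviation more precisely.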
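
The contrast can be sketched with nested samples from one simulated dataset. The normal population with mean 100 and SD 15, the seed, and the sample sizes are all assumptions for illustration.

```python
import random
import statistics

rng = random.Random(7)  # seeded so the run is reproducible

# Hypothetical population: normal with mean 100 and SD 15.
draws = [rng.gauss(100.0, 15.0) for _ in range(1000)]

results = []
for n in (10, 100, 1000):
    prefix = draws[:n]  # nested samples: each larger one contains the smaller
    width = max(prefix) - min(prefix)  # range = maximum - minimum
    sd = statistics.stdev(prefix)  # sample SD (n - 1 divisor)
    results.append((n, width, sd))

for n, width, sd in results:
    print(n, round(width, 1), round(sd, 1))
```

Because each larger sample contains the smaller one, the range can only stay the same or grow, while the standard deviation settles near the population value.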

Conclusion

  • Summary Measures: Mean, median, and standard deviation are key summaries but do not alone tell the whole story of continuous data.
  • Next Steps: Explore visualization of continuous data to understand the overall distribution and shape.