Lecture Notes: Numerical Summary Measures

Overview

Discussed various numerical summary measures, including familiar and less common ones.
Introduced sample estimates vs. population measures.
Emphasis on understanding basic statistical concepts for continuous data.

Sample Mean

Definition: Sum of all data points divided by the number of observations (n).
Notation: X̄ (X bar) represents the sample mean.
Example: Given data: 120, 80, 90, 110, 95.
- Sum = 495, n = 5, X̄ = 495/5 = 99 mmHg.
Formula: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sensitivity: The mean is sensitive to extreme values. A single outlying data point can shift it substantially, especially in small samples.
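The calculation and the sensitivity point can be checked with a short Python sketch using the standard-library `statistics` module. The value 250 is a hypothetical extreme reading (not from the lecture) substituted for the largest reading, 120, to show how far one value can pull the mean.

```python
from statistics import fmean

readings = [120, 80, 90, 110, 95]  # example data from the notes (mmHg)

# Sample mean: sum of the observations divided by n
x_bar = sum(readings) / len(readings)
print(x_bar)  # 99.0

# Hypothetical outlier: replace the largest reading (120) with 250
with_outlier = [250, 80, 90, 110, 95]
x_bar_outlier = fmean(with_outlier)
print(x_bar_outlier)  # 125.0
```

Changing one of the five values moves the mean by 26 mmHg.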

Sample Median

Definition: Middle value in an ordered dataset, also known as the 50th percentile.
Computation: Arrange the data in order and pick the middle value; if n is even, take the average of the two middle values.
Example: Given data: 80, 90, 95, 110, 120.
- Median = 95 mmHg.
Robustness: Less sensitive to extreme values compared to the mean.
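A minimal sketch of the median computation, assuming Python's standard library. `sample_median` is a hypothetical helper written for illustration and is checked against `statistics.median`. It uses a hypothetical outlier (250 in place of the largest reading, 120) to contrast the median's behavior with the mean's.

```python
from statistics import median

def sample_median(values):
    """Sort the data and take the middle value (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

readings = [120, 80, 90, 110, 95]
with_outlier = [250, 80, 90, 110, 95]    # hypothetical extreme value replacing 120

print(sample_median(readings))           # 95
print(sample_median(with_outlier))       # 95, unchanged by the outlier
print(sample_median([80, 90, 95, 110]))  # 92.5 (even n: mean of 90 and 95)

# The helper should agree with the standard-library implementation
assert sample_median(readings) == median(readings)
```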

Variance and Standard Deviation

Variance (s²): Roughly the average squared distance of the data points from the mean; the sum of squared deviations is divided by n − 1 rather than n.
- Formula: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
Standard Deviation (s): Square root of the variance; roughly the typical distance of a data point from the mean, expressed in the same units as the data.
- Example: Given data: 120, 80, 90, 110, 95.
  - Mean (X̄) = 99 mmHg.
  - Variance (s²) = 255 mmHg².
  - Standard Deviation (s) = √255 ≈ 16 mmHg.
Conclusion: A larger s indicates more spread in the data.
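The worked example can be reproduced step by step in Python. `statistics.variance` and `statistics.stdev` from the standard library also use the n − 1 divisor, so they should agree with the hand calculation.

```python
import math
import statistics

readings = [120, 80, 90, 110, 95]
n = len(readings)

x_bar = sum(readings) / n                             # 99.0
squared_devs = [(x - x_bar) ** 2 for x in readings]   # 441, 361, 81, 121, 16
s2 = sum(squared_devs) / (n - 1)                      # 1020 / 4 = 255.0
s = math.sqrt(s2)                                     # about 15.97

# Standard-library versions use the same n - 1 divisor
assert math.isclose(s2, statistics.variance(readings))
assert math.isclose(s, statistics.stdev(readings))

print(s2, round(s, 2))  # 255.0 15.97
```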

Samples and Populations

Sample: Subset of a population used to estimate population parameters.
Population: Entire group of interest.
Sample Statistics: Estimates of population parameters (e.g., mean, standard deviation).
Population Parameters: True values of interest (e.g., population mean (µ), population standard deviation (σ)).
Representation: Simple random sampling ensures that every possible subset of size n is equally likely to be chosen.
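The distinction between parameters and statistics can be illustrated with a simulation. This is a sketch under stated assumptions: the population is hypothetical (10,000 values drawn from a normal distribution with mean 100 and SD 15, not real data), and `random.sample` is used because it draws without replacement with every subset of size k equally likely, which matches simple random sampling. Note that `statistics.pstdev` divides by the population size N (a parameter), while `statistics.stdev` divides by n − 1 (a sample statistic).

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 simulated measurements
population = [random.gauss(100, 15) for _ in range(10_000)]

# Population parameters: mu and sigma (sigma divides by N)
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Simple random sample of size 25, drawn without replacement
sample = random.sample(population, k=25)

# Sample statistics: x_bar and s (s divides by n - 1)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")
print(f"x_bar = {x_bar:.1f}, s = {s:.1f}")
```

The sample statistics will differ somewhat from the parameters; which way they differ depends on the particular sample drawn.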

Effect of Sample Size

Effect on Estimates: Larger samples tend to provide more accurate estimates of population parameters.
Sample Statistics: How close X̄ and s are likely to be to µ and σ depends on the sample size.
Predictability: A larger sample does not tell us whether a sample estimate will increase or decrease; it only makes the estimate likely to be closer to the population parameter.
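A small simulation sketch of both points, again using a hypothetical normal population (mean 100, SD 15) rather than real data. `mean_abs_error` is a hypothetical helper that averages |X̄ − µ| over repeated simple random samples. The exact numbers depend on the seed, but the error should shrink as n grows, while the share of sample means landing above µ stays near one half.

```python
import random
import statistics

random.seed(2)
population = [random.gauss(100, 15) for _ in range(10_000)]  # hypothetical population
mu = statistics.fmean(population)

def mean_abs_error(n, reps=200):
    """Average distance between x_bar and mu over `reps` simple random samples of size n."""
    errors = [abs(statistics.fmean(random.sample(population, n)) - mu)
              for _ in range(reps)]
    return statistics.fmean(errors)

for n in (5, 50, 500):
    print(f"n = {n:>3}: mean |x_bar - mu| = {mean_abs_error(n):.2f}")

# Direction is not predictable: about half of the sample means exceed mu
share_above = statistics.fmean(
    statistics.fmean(random.sample(population, 50)) > mu for _ in range(1_000)
)
print(f"share of x_bar above mu: {share_above:.2f}")
```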

Degrees of Freedom

Explanation: The number of values in a dataset that are free to vary once a constraint (e.g., a fixed sample mean) is imposed.
Example: If the sample mean of n readings is known, only n − 1 of the readings are free to vary; the last one is determined by the others.
Division by n − 1: Corrects the bias in estimating the population variance from a sample (Bessel's correction); dividing by n would underestimate σ² on average.
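Both ideas can be checked numerically. The first part recovers the last reading of the earlier example from the mean and the other four values. The second part is a simulation sketch (hypothetical normal data with σ² = 225, not lecture data) comparing the two divisors over many samples of size 5. In theory the n divisor averages (n − 1)/n · σ² = 180, while the n − 1 divisor averages σ² = 225; the simulated averages should land close to those values.

```python
import random
import statistics

# Degrees of freedom: with n and x_bar fixed, the first n - 1 values determine the last
n, x_bar = 5, 99
first_four = [120, 80, 90, 110]
last = n * x_bar - sum(first_four)
print(last)  # 95

# Bessel's correction: average the two variance estimates over many small samples
random.seed(3)
sigma2 = 15 ** 2          # true variance of the hypothetical population
n_small, reps = 5, 20_000
est_div_n, est_div_n1 = [], []
for _ in range(reps):
    xs = [random.gauss(100, 15) for _ in range(n_small)]
    m = statistics.fmean(xs)
    ss = sum((x - m) ** 2 for x in xs)   # sum of squared deviations
    est_div_n.append(ss / n_small)
    est_div_n1.append(ss / (n_small - 1))

mean_div_n = statistics.fmean(est_div_n)
mean_div_n1 = statistics.fmean(est_div_n1)
print(f"true sigma^2 = {sigma2}")
print(f"divide by n:     {mean_div_n:.1f}")
print(f"divide by n - 1: {mean_div_n1:.1f}")
```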

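Range and Standard Deviation

Range: Difference between the maximum and minimum values.
- Limitation: The range tends to increase with sample size, because larger samples are more likely to include extreme values.
Standard Deviation: Does not systematically grow with sample size; it settles near the population value σ, making it a more consistent measure of variability.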
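A simulation sketch of this contrast, assuming a hypothetical normal population with mean 100 and SD 15 (simulated, not lecture data): as n grows, the range keeps widening, while s stays close to 15.

```python
import random
import statistics

random.seed(4)
ranges, sds = {}, {}
for n in (10, 100, 1_000, 10_000):
    xs = [random.gauss(100, 15) for _ in range(n)]  # hypothetical data, sigma = 15
    ranges[n] = max(xs) - min(xs)
    sds[n] = statistics.stdev(xs)
    print(f"n = {n:>6}: range = {ranges[n]:6.1f}, s = {sds[n]:5.1f}")
```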

Conclusion

Summary Measures: Mean, median, and standard deviation are key summaries, but on their own they do not tell the whole story of continuous data.
Next Steps: Explore visualization of continuous data to understand the overall distribution and shape.