Chapter 2: Numerical Descriptors

Jun 12, 2024

Chapter Objectives

  • Describing distributions with numbers:
    • Measures of center: Mean and Median
    • Measures of spread: Quartiles and Standard Deviation
    • Five-number summary and Box plots (a.k.a. Box and Whisker plots)
    • Interquartile range (IQR) and Outliers
    • Dealing with outliers
    • Choosing among summary statistics
    • Organizing statistical problems

Measure of Center

Mean (Arithmetic Average)

  • Calculation: Sum of values / Number of values (n)
  • Symbol: x̄ (x-bar)
  • Formula: x̄ = (Σ xi) / n, where xi are the values and n is the number of values
  • Nature: Not resistant; pulled toward the long tail of a skewed distribution and toward outliers

Median

  • Definition: Midpoint of a distribution
  • Calculation:
    • Sort values from smallest to largest
    • Median location: (n + 1) / 2
    • If n is odd: Median is the center value
    • If n is even: Median is the mean of the two center values
  • Nature: Resistant to skewing and outliers
  • Median and Mean should be close if the distribution is symmetric; in a skewed distribution the mean is pulled farther toward the long tail than the median
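
The mean formula and the median steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names and both data sets are hypothetical, chosen only to show how a single outlier moves the mean while leaving the median unchanged.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values (n)."""
    return sum(values) / len(values)


def median(values):
    """Midpoint of the sorted values, located at position (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the center value
        return ordered[mid]
    # n even: the mean of the two center values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data: the second list replaces the largest value with an outlier.
symmetric = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]

print(arithmetic_mean(symmetric), median(symmetric))        # 4.0 4
print(arithmetic_mean(with_outlier), median(with_outlier))  # 14.8 4
```

The outlier raises the mean from 4.0 to 14.8, while the median stays at 4.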

Study Calculation Example:

Group Sizes in Bars

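  • n: 500 groups, so the median location is (500 + 1) / 2 = 250.5 (the mean of the 250th and 251st sorted values)
  • Median: Group size 2 (also the most frequent group size)
  • Mean: Slightly larger than 2, as expected when a few large groups pull the mean above the median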

Measure of Spread

Five-Number Summary

  • Components: Minimum, Q1 (First Quartile), Median, Q3 (Third Quartile), Maximum
  • Quartiles:
    • Q1: Median of values below the overall median
    • Q3: Median of values above the overall median
  • Example Calculation:
    • Median location: (n + 1) / 2
    • Q1: Median of lower half
    • Q3: Median of upper half
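
As a rough sketch of the quartile rule described above (Q1 and Q3 as medians of the lower and upper halves, leaving out the overall median when n is odd), the following Python computes a five-number summary. The function names and the data are hypothetical. Statistical software often uses slightly different quartile conventions, so its results may differ from this hand method.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]  # values below the overall median
    # When n is odd, skip the median itself; when n is even, take the top half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


# Hypothetical data, one odd-sized and one even-sized sample.
print(five_number_summary([7, 1, 10, 4, 3, 8, 6]))     # (1, 3, 6, 8, 10)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))   # (1, 2.5, 4.5, 6.5, 8)
```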

Standard Deviation (SD)

  • Definition: Measures spread by how far the observations lie from the mean
  • Calculation:
    • Variance (s²): s² = Σ (xi - x̄)² / (n - 1)
    • Standard Deviation (s): s = √(s²)
  • Example Calculation: Metabolic rates of 7 men
    • Sample mean (x̄) = 1600
    • Variance (s²) = 35811.7
    • Standard Deviation (s) = 189.2
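
The variance and standard deviation formulas can be traced step by step in Python. The notes do not list the seven observations, so the values below are an assumption: they were chosen only because they reproduce the mean and variance quoted above, and they should not be read as the original data.

```python
from math import sqrt

# Assumed values (not given in the notes), picked to match the quoted summary.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
x_bar = sum(rates) / n                          # sample mean
squared_devs = [(x - x_bar) ** 2 for x in rates]
variance = sum(squared_devs) / (n - 1)          # divide by n - 1, not n
sd = sqrt(variance)

print(x_bar, round(variance, 1), round(sd, 1))  # 1600.0 35811.7 189.2
```

Dividing by n - 1 rather than n follows the sample variance formula given in the notes.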

Box and Whisker Plot

  • Components: Box (Q1 to Q3), Median line, Whiskers (extending to the smallest and largest values that are not outliers), and Outliers marked individually as asterisks
  • IQR: Q3 - Q1
  • Finding Outliers:
    • Lower bound: Q1 - 1.5 * IQR
    • Upper bound: Q3 + 1.5 * IQR
  • Example Calculation:
    • Q2 (Median): 3.4
    • Q1: 2.2
    • Q3: 4.35
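    • IQR: 4.35 - 2.2 = 2.15
    • Upper bound: 4.35 + 1.5 * 2.15 = 7.575
    • Outlier: 7.9 (above the upper bound)
    • Upper whisker drawn to 5.6, the largest value that is not an outlier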
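
The 1.5 * IQR rule can be checked numerically. The sketch below uses the Q1 and Q3 values quoted in the example. The function name `outlier_fences` and the `sample` list are hypothetical, since the notes give only the quartiles, the outlier 7.9 and the whisker value 5.6, not the full data set.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) bounds Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles taken from the example in the notes.
q1, q3 = 2.2, 4.35
lower, upper = outlier_fences(q1, q3)
print(round(lower, 3), round(upper, 3))  # -1.025 7.575

# Hypothetical sample that includes the two values mentioned in the notes.
sample = [1.5, 2.2, 3.4, 4.35, 5.6, 7.9]
outliers = [x for x in sample if x < lower or x > upper]
whisker_max = max(x for x in sample if x <= upper)

print(outliers)      # [7.9]
print(whisker_max)   # 5.6
```

The quartiles are passed in directly rather than computed from `sample`, because the hypothetical list is not meant to reproduce them.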

Dealing with Outliers

  • Types of Outliers:
    • Human error (recording/experiment)
    • Legitimate but wild observations
  • Action: Decide whether to exclude an outlier based on whether the question concerns typical individuals or all individuals; never exclude an observation just to make the data fit an expected pattern

Choosing Summary Statistics

  • Mean + SD: Use for fairly symmetrical distributions
  • Median + Five-Number Summary: Use for skewed distributions or with outliers
  • Example: Phytopigment concentrations are highly right-skewed, so the median and five-number summary are likely preferred

Organizing Statistical Problems

  1. State: Practical question in real-world context
  2. Plan: Specific statistical operations needed
  3. Solve: Make graphs and calculations
  4. Conclude: Practical conclusion in real-world context

End of Chapter 2