Measures of Central Tendency and Dispersion

Jul 12, 2024

Lecture Notes: Measures of Central Tendency and Dispersion

Announcements

  • Tomorrow is a holiday; no lecture.
  • Material covered today will be tested next Monday.
  • Yesterday's and today's lectures will be uploaded tonight.

Finding Medians and Modes

Median

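  • Def: The median is the middle value of an ordered data set; it splits the data into two equal parts, with half the values at or below it and half at or above it.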
  • Grouped Data Example: Values with assigned frequencies:
    • Four 1s, three 2s, two 3s, six 4s, eight 5s.
  • Position of Median:
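    • Calculate the sum of frequencies: 4 + 3 + 2 + 6 + 8 = 23
    • Formula: (Sum of frequencies + 1) / 2 = (23 + 1) / 2 = 12, so the median is in the 12th position.
    • Line up values in ascending order; the 4s occupy positions 10 through 15, so the 12th position falls under value 4, making the median 4.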
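
The position rule above can be sketched in Python. This is a minimal illustration rather than the lecture's own procedure: the helper names `value_at` and `median_from_frequencies` are made up for this example, and the even-total branch (averaging the two central values) follows the general definition of the median, not the grouped example itself.

```python
def value_at(freq, position):
    """Return the value found at a 1-based position in the ordered data."""
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]          # running total of frequencies
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the total frequency")


def median_from_frequencies(freq):
    """Median of a frequency table given as {value: frequency}."""
    n = sum(freq.values())
    if n % 2 == 1:
        # Odd total: a single middle position, (n + 1) / 2
        return value_at(freq, (n + 1) // 2)
    # Even total: average the two central values
    return (value_at(freq, n // 2) + value_at(freq, n // 2 + 1)) / 2


# Frequency table from the example: four 1s, three 2s, two 3s, six 4s, eight 5s
freq = {1: 4, 2: 3, 3: 2, 4: 6, 5: 8}
print(median_from_frequencies(freq))  # 4
```

Accumulating frequencies avoids writing out all 23 values: the running total first reaches 12 while counting the 4s.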

Mode

  • Def: Mode is the value that occurs most often.
  • Examples:
    • Small sample (10 students’ number of siblings): Mode is 3.
    • Frequency distribution: 5 has the highest frequency (8), so the mode is 5.
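
As a rough sketch, the mode can be read off a frequency table by looking for the largest frequency. The function name `modes_from_frequencies` is an assumption for this example, and the sibling counts are hypothetical stand-ins because the lecture's actual sample is not recorded in these notes.

```python
from collections import Counter


def modes_from_frequencies(freq):
    """Return every value that attains the highest frequency, in ascending order."""
    top = max(freq.values())
    return sorted(value for value, count in freq.items() if count == top)


# Frequency distribution from the median example
freq = {1: 4, 2: 3, 3: 2, 4: 6, 5: 8}
print(modes_from_frequencies(freq))  # [5]

# Raw data: Counter builds the frequency table first.
# Hypothetical sibling counts for 10 students (not the lecture's data).
siblings = [3, 1, 2, 3, 0, 3, 2, 1, 3, 4]
print(modes_from_frequencies(Counter(siblings)))  # [3]
```

Returning a list rather than a single value keeps the sketch usable for bimodal or multimodal data.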

Stem and Leaf Displays

  • Advantage: The actual data values remain visible in the display.
  • Example: Stem and Leaf Display for data values
    • Find mode and median from display:
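    • Median position: (Sum of frequencies + 1) / 2 = 11th position, which implies 21 data values → Median is 37.
    • Mode: The value that appears most often, found as the leaf repeated most within a single stem. The stem with the most leaves marks the most crowded group, not the mode itself.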
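
To make the display concrete, here is a small sketch that builds a stem and leaf display for non-negative whole numbers, using the tens digit as the stem and the units digit as the leaf. The function name `stem_and_leaf` and the data are hypothetical; the values used in lecture are not recorded in these notes.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    display = defaultdict(list)
    for v in sorted(values):
        display[v // 10].append(v % 10)
    return dict(display)


# Hypothetical two-digit data, not the lecture's data set
data = [12, 15, 21, 24, 24, 28, 33, 37, 37, 37, 41, 45, 52]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 1 | 2 5
# 2 | 1 4 4 8
# 3 | 3 7 7 7
# 4 | 1 5
# 5 | 2
```

In this hypothetical display, leaf 7 repeats three times on stem 3, so the mode is 37. With 13 values the median sits in position (13 + 1) / 2 = 7, which is the value 33.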

Symmetry in Data Sets

  • Symmetric Distribution: Pattern of frequencies is similar on the left and right of a central point.
  • Non-symmetric Distribution: Different patterns on the left and right.
  • Common shapes:
    • Uniform: All values have the same or nearly the same frequency (a symmetric shape).
    • Bimodal: Two peaks in the distribution, not necessarily equal; symmetric only when the two sides mirror each other.
    • Skewed Left: The longer tail extends to the left (non-symmetric).
    • Skewed Right: The longer tail extends to the right (non-symmetric).

Measures of Central Tendency: Summary

  • Mean: Sum of values divided by the number of values.
  • Median: Middle value when values are ordered; for an even number of values, it is the mean of the two central values.
  • Mode: Value that occurs most frequently; some data sets may be bimodal or multimodal.
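
For a quick check of all three measures, Python's standard `statistics` module can be used. The sample below is hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import statistics

# Hypothetical sample with an even number of values
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 30 / 6 = 5
median = statistics.median(data)    # mean of the two central values: (3 + 5) / 2 = 4.0
modes = statistics.multimode(data)  # list of all most frequent values: [3]

print(mean, median, modes)
```

`statistics.multimode` (Python 3.8 and later) returns a list, so bimodal and multimodal data sets are reported in full.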

Measures of Dispersion

Range

  • Def: Difference between the maximum and minimum values in a data set.
  • Example: Set A (Range = 12); Set B (Range = 4)
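
A minimal sketch of the range calculation follows. The two sets are hypothetical, picked only so that their ranges match the values quoted above; the lecture's actual Set A and Set B are not recorded in these notes.

```python
def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


# Hypothetical data sets with the same mean (7) but different spread
set_a = [1, 4, 7, 10, 13]
set_b = [5, 6, 7, 8, 9]

print(data_range(set_a))  # 12
print(data_range(set_b))  # 4
```

Because the range uses only the two extreme values, it says nothing about how the values in between are distributed.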

Standard Deviation

  • Def: Measures how far the data values typically lie from the mean.
  • Calculation Steps: (Sample)
    1. Calculate mean.
    2. Find deviations from mean.
    3. Square each deviation.
    4. Sum squared deviations.
    5. Divide by (n - 1), where n is the sample size; the result is the sample variance.
    6. Take the square root.
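
The six steps translate almost line for line into code. This is a sketch under the sample (n - 1) convention stated above; the function name `sample_std_dev` and the data are hypothetical.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation, following the six steps one at a time."""
    n = len(values)
    mean = sum(values) / n                    # 1. calculate the mean
    deviations = [x - mean for x in values]   # 2. deviations from the mean
    squared = [d ** 2 for d in deviations]    # 3. square each deviation
    total = sum(squared)                      # 4. sum the squared deviations
    variance = total / (n - 1)                # 5. divide by n - 1 (sample variance)
    return math.sqrt(variance)                # 6. take the square root


# Hypothetical sample: mean 7, squared deviations 36 + 9 + 0 + 9 + 36 = 90
data = [1, 4, 7, 10, 13]
s = sample_std_dev(data)
print(round(s, 2))  # 4.74

# Cross-check against the standard library's sample standard deviation
print(math.isclose(s, statistics.stdev(data)))  # True
```

Here 90 / 4 = 22.5 is the sample variance, and its square root is about 4.74.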

Comparing Standard Deviations

  • Used to compare the spread or consistency between different data sets.
  • Example: Comparing sugar pack consistency from two companies using their mean and standard deviation values.

Coefficient of Variation

  • Def: Standard deviation expressed as a percentage of the mean: CV = (standard deviation / mean) × 100%.
  • Usage: Allows meaningful comparisons between different data sets by normalizing the spread.
  • Example: Comparing relative dispersions between two samples with different units.
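
The sketch below shows why the coefficient of variation helps when units differ. It assumes the sample standard deviation is used, and both data sets are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical samples in different units
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(round(cv_heights, 1))  # 9.3
print(round(cv_weights, 1))  # 22.6
```

Both samples have the same standard deviation (about 15.81 in their own units), yet the weights vary more relative to their mean, which only the percentage form reveals.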

Measures of Position

Z-Score

  • Def: Measures how many standard deviations a value is from the mean.
  • Calculation: Z = (X - mean) / standard deviation
  • Example: Comparing performance of students (Jen and Joy) in different settings using their Z-scores.
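
A short sketch of the Z-score comparison follows. The scores, means, and standard deviations are hypothetical; the lecture's actual figures for Jen and Joy are not recorded in these notes.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Hypothetical results from two different classes
jen = z_score(85, mean=75, std_dev=5)  # (85 - 75) / 5 = 2.0
joy = z_score(90, mean=84, std_dev=6)  # (90 - 84) / 6 = 1.0

print(jen, joy)  # 2.0 1.0
```

In this made-up case Joy has the higher raw score, but Jen's score is further above her own class mean, so Jen performed better relative to her group.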

Percentiles

  • Def: Value below which a specified percent of observations fall.
  • Calculation: Position of the p-th percentile = (p / 100) × n, where n is the total number of data values; rounding rules apply when the value at that position is located.
  • Example: Finding the 40th percentile in a given test score distribution.
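
The notes do not record which rounding rules were used, so the sketch below assumes one common textbook convention: if the position is not a whole number, round up and take that value; if it is a whole number, average that value with the next one. The function name `percentile_value` and the test scores are hypothetical, and p is assumed to be a whole number between 1 and 99.

```python
def percentile_value(values, p):
    """Value at the p-th percentile under the assumed rounding convention."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    # Position = (p / 100) * n, split into whole part and remainder
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # Not a whole number: round up to position whole + 1 (index whole)
        return ordered[whole]
    # Whole number: average the values at positions whole and whole + 1
    return (ordered[whole - 1] + ordered[whole]) / 2


# Hypothetical test scores (n = 10)
scores = [55, 60, 62, 68, 70, 74, 78, 81, 85, 92]

print(percentile_value(scores, 40))  # 69.0
print(percentile_value(scores, 75))  # 81
```

For the 40th percentile the position is 0.40 × 10 = 4, a whole number, so the 4th and 5th values (68 and 70) are averaged. For the 75th percentile the position is 7.5, which rounds up to the 8th value. Other conventions, including the interpolation used by many software packages, can give slightly different answers.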

Additional Measures

  • Deciles, Quartiles: Will be covered in the next lecture.
  • Box Plots: Visualization tool for showing measures of position.

Summary

  • Measures of Central Tendency: Mean, Median, Mode.
  • Measures of Dispersion: Range, Standard Deviation, Coefficient of Variation.
  • Measures of Position: Z-scores, Percentiles, Deciles, Quartiles.
  • Understanding symmetry and shape of data distributions.