Fundamentals of Statistics and Data Analysis

Jul 31, 2024

Statistics Lecture Notes

Key Concepts

  • Mean
  • Median
  • Mode
  • Range
  • Quartiles and Interquartile Range (IQR)
  • Outliers
  • Box and Whisker Plot
  • Skewness
  • Dot Plot
  • Stem-and-Leaf Plot
  • Frequency Table
  • Histogram
  • Percentiles

Data Set Analysis

Example Data Set 1: {7, 7, 10, 14, 15, 23, 32}

  • Mean:
    • Sum: 108
    • Mean = 108 / 7 = 15.43
  • Median:
    • Middle number: 14
  • Mode:
    • Most frequent number: 7
  • Range:
    • Difference between highest and lowest number: 32 - 7 = 25

Example Data Set 2: {11, 15, 15, 21, 37, 41, 59, 59}

  • Mean:
    • Sum: 258
    • Mean = 258 / 8 = 32.25
  • Median:
    • Middle numbers: 21 and 37
    • Median = (21 + 37) / 2 = 29
  • Mode:
    • Numbers that appear most frequently: 15 and 59 (Bimodal)
  • Range:
    • Difference between highest and lowest number: 59 - 11 = 48
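
The four measures worked out by hand for the two example data sets can be checked with a short Python sketch. It uses only the standard-library statistics module; the helper name summarize is an illustrative choice, not something from the lecture.

```python
from statistics import mean, median, multimode

def summarize(data):
    """Return mean (rounded to 2 places), median, mode(s), and range."""
    return {
        "mean": round(mean(data), 2),
        "median": median(data),
        # multimode returns every value tied for the highest count,
        # so a bimodal data set yields two modes.
        "modes": sorted(multimode(data)),
        "range": max(data) - min(data),
    }

set_1 = [7, 7, 10, 14, 15, 23, 32]
set_2 = [11, 15, 15, 21, 37, 41, 59, 59]

print(summarize(set_1))
print(summarize(set_2))
```

Note that multimode returns every value when no value repeats, so for such data the "modes" entry is not meaningful.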

Quartiles and Interquartile Range (IQR)

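  • Quartiles: Three values that divide a sorted data set into four equal parts
    • Q1: Median of the lower half
    • Q2: Median of the entire data set
    • Q3: Median of the upper half
    • With an odd number of values, the overall median is left out of both halves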
  • IQR: Difference between Q3 and Q1 (IQR = Q3 - Q1)
  • Determining Outliers:
    • A value is not an outlier if it lies within the range:
      • Q1 - 1.5 * IQR to Q3 + 1.5 * IQR
    • Any value below or above this range is an outlier

Example Data Set: {7, 11, 14, 5, 8, 27, 16, 10, 13, 17, 16}

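  • Sorted: {5, 7, 8, 10, 11, 13, 14, 16, 16, 17, 27}
  • Q1: 8 (median of the lower half: 5, 7, 8, 10, 11)
  • Q2: 13
  • Q3: 16 (median of the upper half: 14, 16, 16, 17, 27)
  • IQR: Q3 - Q1 = 16 - 8 = 8
  • Outliers:
    • Non-outlier range: 8 - 1.5 * 8 = -4 to 16 + 1.5 * 8 = 28
    • 27 lies within this range, so it is not an outlier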
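
The quartile and outlier steps can be sketched in Python as follows. This is a minimal sketch that assumes the median-of-halves method, leaving the overall median out of both halves when the count is odd, as in the example above. Library routines often use other quartile conventions and may give slightly different values. The function names are illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # excludes the middle value when the count is odd
    upper = values[-half:]
    return median(lower), median(values), median(upper)

def outlier_fences(data, k=1.5):
    """Return (Q1 - k * IQR, Q3 + k * IQR)."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, k=1.5):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(data, k)
    return [x for x in data if x < low or x > high]

data = [7, 11, 14, 5, 8, 27, 16, 10, 13, 17, 16]
print(quartiles(data))       # (8, 13, 16)
print(outlier_fences(data))  # (-4.0, 28.0)
print(find_outliers(data))   # []
```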

Box and Whisker Plot

  • Components:
    • Minimum
    • Q1
    • Median (Q2)
    • Q3
    • Maximum
  • Outliers: Plotted as individual points beyond the whiskers

Skewness

  • Symmetric Distribution: Mean is approximately equal to the median
  • Skewed Right (Positive Skew):
    • Mean > Median
    • Box and whisker plot: Right whisker (tail) is longer
  • Skewed Left (Negative Skew):
    • Mean < Median
    • Box and whisker plot: Left whisker (tail) is longer
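
The mean-versus-median comparison can be expressed as a small Python check. This is only a rule-of-thumb sketch: the function name, the tolerance parameter, and the second and third data sets are hypothetical and are there for illustration.

```python
from statistics import mean, median

def skew_direction(data, tolerance=0.0):
    """Classify skew by comparing the mean with the median."""
    gap = mean(data) - median(data)
    if gap > tolerance:
        return "right (positive)"
    if gap < -tolerance:
        return "left (negative)"
    return "roughly symmetric"

print(skew_direction([7, 7, 10, 14, 15, 23, 32]))  # mean 15.43 > median 14
print(skew_direction([1, 2, 3, 4, 5]))             # hypothetical, mean = median
print(skew_direction([1, 20, 21, 22, 23]))         # hypothetical, mean < median
```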

Dot Plot

  • Example: Data set {5, 8, 3, 7, 1, 5, 3, 2, 3, 3, 8, 5}
    • Dot plot shows frequency of each number
    • Mode is the number with the most dots (here 3, which appears four times)
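
A text-only version of the dot plot can be sketched in Python. It assumes integer data and prints one row per value with one '*' per occurrence. The helper names dot_plot and modes are illustrative.

```python
from collections import Counter

def dot_plot(data):
    """Return a text dot plot: one row per integer value, one '*' per occurrence."""
    counts = Counter(data)
    rows = []
    for value in range(min(data), max(data) + 1):
        rows.append(f"{value:>2} | {'*' * counts[value]}".rstrip())
    return "\n".join(rows)

def modes(data):
    """Return the value(s) with the most dots."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [5, 8, 3, 7, 1, 5, 3, 2, 3, 3, 8, 5]
print(dot_plot(data))
print("Mode:", modes(data))  # Mode: [3]
```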

Stem-and-Leaf Plot

  • Construction:
    • Separate each number into a stem and leaf
    • Example: 78 -> Stem: 7, Leaf: 8
  • Key: Explains how to read the plot, e.g. 7 | 8 means 78
  • Decimal Values: Same method, with the stem as the integer part and the leaf as the decimal digit (e.g. 4.6 -> Stem: 4, Leaf: 6)
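
The construction step can be sketched in Python by splitting each number into tens (stem) and ones (leaf). The sketch assumes non-negative integers. The scores list is hypothetical apart from 78, and the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens digit and above) to its sorted list of leaves (ones digit)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [78, 72, 85, 91, 85, 67]  # hypothetical data for illustration
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
print("Key: 7 | 8 means 78")
```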

Frequency Table

  • Components:
    • Value
    • Frequency
    • Relative Frequency
    • Cumulative Relative Frequency
  • Example Calculation:
    • Data set: {5, 9, 8, 7, 8, 12, 9, 8, 10, 8, 9, 7}
    • Frequency: Number of times each value occurs
    • Relative Frequency: Frequency / Total number of data points
    • Cumulative Relative Frequency: Running total of the relative frequencies
  • Mean Calculation from Frequency Table: (Sum of value * frequency) / Total frequency; for the data set above, 100 / 12 = 8.33
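
The table columns and the mean formula can be reproduced with a short Python sketch for the data set above. The row layout (a list of dictionaries) and the function names are illustrative choices.

```python
from collections import Counter

def frequency_table(data):
    """Build rows of value, frequency, relative and cumulative relative frequency."""
    counts = Counter(data)
    total = len(data)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append({
            "value": value,
            "frequency": counts[value],
            "relative": counts[value] / total,
            "cumulative": running / total,
        })
    return rows

def mean_from_table(rows):
    """Mean = sum of (value * frequency) divided by the total frequency."""
    total = sum(r["frequency"] for r in rows)
    return sum(r["value"] * r["frequency"] for r in rows) / total

data = [5, 9, 8, 7, 8, 12, 9, 8, 10, 8, 9, 7]
rows = frequency_table(data)
for r in rows:
    print(f"{r['value']:>5} {r['frequency']:>3} {r['relative']:.3f} {r['cumulative']:.3f}")
print(f"Mean: {mean_from_table(rows):.2f}")  # Mean: 8.33
```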

Histogram

  • Construction:
    • Bar graph showing how many data points fall in each interval
    • Adjacent bars touch (no gaps) because the intervals are contiguous
    • Example: Test scores categorized by grade intervals
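
Grouping scores into intervals, which is the counting step behind a histogram, can be sketched in Python. The test scores and the interval width of 10 are hypothetical. The sketch assumes integer scores, and the function name is illustrative.

```python
from collections import Counter

def bin_counts(scores, width=10):
    """Count scores per interval [start, start + width), keeping empty intervals."""
    counts = Counter((s // width) * width for s in scores)
    start, stop = min(counts), max(counts)
    return {lo: counts[lo] for lo in range(start, stop + width, width)}

scores = [55, 62, 68, 71, 74, 75, 79, 83, 88, 91]  # hypothetical test scores
for lo, n in bin_counts(scores).items():
    # With the default width of 10, each interval covers lo through lo + 9.
    print(f"{lo}-{lo + 9}: " + "#" * n)
```

Empty intervals are kept in the result so that the bars stay adjacent with no interval skipped.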

Percentiles

  • Calculation: Use the cumulative relative frequency column to find the data value at a given percentile
  • Example:
    • Start from a table with value, frequency, and cumulative relative frequency columns
    • 60th percentile: Average of the data values on either side of the 0.60 cumulative relative frequency mark
    • 80th percentile: Average of the data values on either side of the 0.80 cumulative relative frequency mark
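
One way to turn the percentile lookup into code is sketched below in Python. It rests on an assumed convention, since the notes do not spell one out: return the first value whose cumulative relative frequency exceeds p / 100, and if a value's cumulative relative frequency equals p / 100 exactly, average that value with the next one. The data set is borrowed from the frequency table section for illustration, and the function name is hypothetical.

```python
from collections import Counter

def percentile_from_table(data, p):
    """Value at the p-th percentile, for an integer p from 1 to 99."""
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    counts = Counter(data)
    values = sorted(counts)
    n = len(data)
    running = 0  # cumulative frequency so far
    for i, value in enumerate(values):
        running += counts[value]
        # Compare running / n with p / 100 using integers to avoid rounding error.
        if running * 100 > p * n:
            return value
        if running * 100 == p * n:
            # Assumed convention: average this value and the next one.
            return (value + values[i + 1]) / 2
    raise ValueError("data must not be empty")

data = [5, 9, 8, 7, 8, 12, 9, 8, 10, 8, 9, 7]
print(percentile_from_table(data, 60))  # 9
print(percentile_from_table(data, 80))  # 9
print(percentile_from_table(data, 25))  # 7.5
```

Other percentile conventions exist and can give slightly different answers on small data sets.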

Summary

  • Statistics Basics: Mean, median, mode, range, quartiles, IQR, outliers, skewness, dot plots, stem-and-leaf plots, frequency tables, histograms, percentiles
  • Importance: Understanding these concepts is crucial for analyzing and interpreting data effectively