Understanding One-Variable Data Analysis

May 7, 2025

Unit 1: Exploring One-Variable Data

Categorical Variables

  • Definition: Take on values that are category names or group labels.
  • Representation:
    • Frequency tables or relative frequency tables.
    • Graphical displays: Bar graphs, dot plots, pie charts.
  • Example 1.1: Survey of 2,000 parents about desired school year length:
    • 180 days: 1,100 parents (55%).
    • 160 days: 300 parents (15%).
    • 200 days: 500 parents (25%).
    • No Opinion: 100 parents (5%).
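
The percentages in Example 1.1 are relative frequencies: each count divided by the total number of responses. A minimal sketch of that arithmetic, where the names counts and relative_frequencies are illustrative and not from the source:

```python
# Counts from Example 1.1 (2,000 parents surveyed).
counts = {
    "180 days": 1100,
    "160 days": 300,
    "200 days": 500,
    "No Opinion": 100,
}

def relative_frequencies(counts):
    """Return each category's share of the total as a percentage."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

rel_freq = relative_frequencies(counts)
for category, pct in rel_freq.items():
    print(f"{category}: {pct:.0f}%")
```

The four shares come out to 55%, 15%, 25%, and 5%, matching the table and summing to 100%.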

Quantitative Variables

  • Definition: Take on numerical values for measured or counted quantities.
  • Categories:
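    • Discrete: Finite or countably many possible values, with gaps between them (typically counts).
    • Continuous: Any value within an interval, so infinitely many possible values with no gaps (typically measurements).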
  • Graphical Representation: Dotplots, histograms, stemplots, cumulative plots, boxplots.
  • Example 1.2: AP classes taken by 2,200 seniors (percentages are rounded, so they total 101%):
    • 0 classes: 400 seniors (18%).
    • 1 class: 500 seniors (23%).
    • 2 classes: 900 seniors (41%).
    • 3 classes: 300 seniors (14%).
    • 4 classes: 100 seniors (5%).
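
A cumulative plot is built from running totals of a frequency table like the one in Example 1.2. The sketch below computes those cumulative relative frequencies; the variable names are illustrative assumptions, and only the counts come from the example.

```python
from itertools import accumulate

# Counts from Example 1.2 (2,200 seniors), keyed by number of AP classes.
ap_counts = {0: 400, 1: 500, 2: 900, 3: 300, 4: 100}

total = sum(ap_counts.values())
# Running totals: how many seniors took this many classes or fewer.
cumulative_counts = list(accumulate(ap_counts.values()))
cumulative_pct = [100 * c / total for c in cumulative_counts]

for classes, pct in zip(ap_counts, cumulative_pct):
    print(f"{classes} or fewer classes: {pct:.1f}%")
```

The cumulative share first passes 50% at 2 classes, so the median number of AP classes in this table is 2.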

Describing Quantitative Data Distribution

  • Center: A typical value that separates the data roughly in half.
  • Spread: How widely the values vary, from the smallest to the largest.
  • Patterns:
    • Clusters: Natural subgroups.
    • Gaps: Areas with no values.
  • Shape:
    • Unimodal: Single peak.
    • Bimodal: Two peaks.
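    • Symmetric: The two halves are mirror images of each other.
    • Skewed: Asymmetric, with one tail stretching farther than the other (right-skewed toward higher values, left-skewed toward lower values).
    • Bell-shaped: Symmetric with a mound in the center and tails that taper off on both sides.
    • Uniform: Values spread evenly across the range, so the histogram is roughly flat.
  • Example 1.3: The age distribution for Hodgkin lymphoma shows a bimodal pattern.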

Summary Statistics for Quantitative Variables

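  • Descriptive Statistics: Summarizing and presenting the data at hand.
  • Inferential Statistics: Drawing conclusions about a population from sample data.
  • Average: A single representative value (mean or median).
    • Median: Middle value of the ordered set (the mean of the two middle values when the count is even).
    • Mean: Sum of the values divided by the number of values.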
  • Variability:
    • Range: Difference between the maximum and minimum values.
    • Interquartile Range (IQR): Q3 - Q1, the range of the middle 50% of the values.
    • Variance: Average of the squared differences from the mean (dividing by n for a population, or by n - 1 for a sample).
    • Standard Deviation: Square root of the variance.
  • Examples 1.4 and 1.5: Home run distances and salaries illustrate the mean and variability.
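
The measures listed above can be computed with Python's standard statistics module. This is a sketch under stated assumptions: the data list is hypothetical (it is not the home run or salary data from Examples 1.4 and 1.5), and the quartiles helper is an illustrative function that uses the median-of-halves convention, which is only one of several quartile conventions in use.

```python
from statistics import mean, median, pvariance, pstdev, variance

# Hypothetical data for illustration only; not the values from Examples 1.4 or 1.5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. When the count is odd, the middle value
    is left out of both halves. Other quartile conventions exist.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

q1, q3 = quartiles(data)

summary = {
    "mean": mean(data),
    "median": median(data),
    "range": max(data) - min(data),
    "iqr": q3 - q1,
    "population variance": pvariance(data),  # divides by n
    "sample variance": variance(data),       # divides by n - 1
    "population sd": pstdev(data),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For this hypothetical data the mean is 5, the median is 4.5, the range is 7, the IQR is 2, the population variance is 4, and the population standard deviation is 2. The sample variance is slightly larger (32/7) because it divides by n - 1.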

Graphical Representations

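  • Skewness: Comparing the mean and median suggests the direction of skew. A mean above the median suggests right skew; a mean below the median suggests left skew.
  • Example 1.7: Faculty salaries whose mean exceeds the median, suggesting right skew.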
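
The mean-versus-median comparison can be sketched as a small check. The function name skew_direction and the salary figures are hypothetical, chosen only to illustrate the idea; they are not the faculty salaries from Example 1.7. The comparison is a rule of thumb, not a formal test of skewness.

```python
from statistics import mean, median

def skew_direction(values):
    """Suggest a skew direction by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"   # a long upper tail pulls the mean above the median
    if m < med:
        return "left"    # a long lower tail pulls the mean below the median
    return "none indicated"

# Hypothetical salaries in thousands of dollars; one high value stretches the upper tail.
salaries = [52, 55, 58, 60, 63, 65, 140]
print(skew_direction(salaries))
```

Here the single high salary raises the mean to about 70.4 while the median stays at 60, so the check reports a right skew.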

Comparing Distributions

  • Methods: Back-to-back stemplots, side-by-side histograms, parallel boxplots, cumulative plots.
  • Example Comparisons:
    • Example 1.9: NBA team wins via stemplot.
    • Example 1.10: Student sleep hours via histograms.
    • Example 1.11: Stock price trends via boxplots.
    • Example 1.12: Population age comparisons via cumulative frequency plots.
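
As one illustration of these comparison methods, the sketch below prints a text back-to-back stemplot. It assumes non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf. The function name and both data lists are hypothetical; they are not the NBA win totals from Example 1.9.

```python
from collections import defaultdict

def back_to_back_stemplot(left, right):
    """Return the lines of a back-to-back stemplot (stem = tens, leaf = ones)."""
    left_leaves = defaultdict(list)
    right_leaves = defaultdict(list)
    for value in sorted(left):
        left_leaves[value // 10].append(value % 10)
    for value in sorted(right):
        right_leaves[value // 10].append(value % 10)

    # Include every stem between the overall minimum and maximum,
    # so that gaps in the data stay visible.
    first_stem = min(min(left), min(right)) // 10
    last_stem = max(max(left), max(right)) // 10
    lines = []
    for stem in range(first_stem, last_stem + 1):
        # Left-hand leaves read outward from the stem, so reverse their order.
        left_part = " ".join(str(d) for d in reversed(left_leaves[stem]))
        right_part = " ".join(str(d) for d in right_leaves[stem])
        lines.append(f"{left_part:>12} | {stem} | {right_part}")
    return lines

# Hypothetical win totals for two groups of teams.
group_a = [22, 35, 41, 41, 48, 53]
group_b = [27, 33, 39, 44, 51, 57]
plot_lines = back_to_back_stemplot(group_a, group_b)
print("\n".join(plot_lines))
```

Sharing one column of stems puts both groups on the same scale, which is what makes differences in center and spread easy to see.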

The Normal Distribution

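  • Characteristics: Symmetric, bell-shaped, and unimodal.
  • Mean = Median: Because the curve is symmetric, both lie at the center of the distribution.
  • Empirical Rule (68-95-99.7 rule): In a normal distribution, about 68% of values lie within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.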
  • Example 1.13: Taxicab mileage modeled with a normal distribution.
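
The empirical rule percentages can be checked numerically with NormalDist from Python's standard statistics module. This is a minimal sketch: the helper name share_within is an illustrative assumption, and the defaults describe a standard normal distribution rather than the taxicab data from Example 1.13.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The three proportions are roughly 68.3%, 95.4%, and 99.7%, and they do not depend on the particular mean or standard deviation chosen.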