
Comprehensive AP Statistics Lecture Notes

May 8, 2025

AP Statistics Units 1-9 Lecture Notes

Unit 1: Introduction to Statistics

Data Types

  • Quantitative Data:
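    • Numerical values that are counted or measured (e.g., heights, class size, population).
    • Arithmetic on the values (such as averaging) is meaningful.
  • Categorical Data:
    • Names or labels that place individuals into groups (e.g., eye color, hair color).
    • Numbers can serve as labels (e.g., zip codes), but arithmetic on them is not meaningful.
    • Summarized with bar graphs or pie charts; two-way tables display the relationship between two categorical variables.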

Two-Way Tables

  • Used to represent categorical data.
  • Shows intersections between variables.
  • Marginal Relative Frequency: A row total or column total divided by the table's grand total.
  • Joint Relative Frequency: The count in a single cell divided by the grand total.
  • Conditional Relative Frequency: The count in a single cell divided by the total of its row (or column), i.e., the percentage in one category given a specific group.
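The three relative frequencies differ only in their denominators. The sketch below computes one of each from a small two-way table; the counts, the row and column labels, and the variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred pet
table = {
    "Freshman": {"Dog": 20, "Cat": 10},
    "Senior":   {"Dog": 15, "Cat": 5},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal relative frequency: a row (or column) total divided by the grand total
row_totals = {grade: sum(row.values()) for grade, row in table.items()}
marginal_freshman = row_totals["Freshman"] / grand_total

# Joint relative frequency: a single cell divided by the grand total
joint_senior_cat = table["Senior"]["Cat"] / grand_total

# Conditional relative frequency: a single cell divided by its own row total
cond_dog_given_freshman = table["Freshman"]["Dog"] / row_totals["Freshman"]

print(marginal_freshman, joint_senior_cat, cond_dog_given_freshman)
```

Here the marginal value is 30/50, the joint value is 5/50, and the conditional value is 20/30.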

Quantitative Data

  • Described using the acronym "C-SOCS":
    • Context
    • Shape: Symmetrical, skewed, number of peaks.
    • Outliers: Identify any extreme data points.
    • Center: Mean or median.
    • Spread: Range, standard deviation, IQR.

Basic Statistical Terms

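  • Mean: Sum of all values divided by the number of values.
  • Standard Deviation: Measure of variation; roughly the typical distance of values from the mean.
  • Median: Middle value of the ordered data (50th percentile).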
  • Range: Difference between max and min values.
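A minimal sketch of these four summaries using Python's standard `statistics` module, on a hypothetical sample. Note that `statistics.stdev` is the sample standard deviation (dividing by n - 1), which is the version usually reported as Sx.

```python
import statistics

data = [12, 15, 11, 18, 15, 20, 14]  # hypothetical sample

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the sorted data
sd = statistics.stdev(data)           # sample standard deviation (divides by n - 1)
data_range = max(data) - min(data)    # max - min

print(f"mean={mean} median={median} sd={sd:.3f} range={data_range}")
```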

Box Plots

  • Use the five-number summary: Minimum, Q1, Median, Q3, Maximum.
  • IQR (Interquartile Range): Q3 - Q1.
  • Outliers: Values below Q1 - 1.5 × IQR (low-end) or above Q3 + 1.5 × IQR (high-end).
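The 1.5 × IQR rule can be sketched as follows. This assumes the common by-hand quartile method (Q1 and Q3 are medians of the lower and upper halves, with the overall median excluded from both halves when n is odd); other software may compute quartiles slightly differently. The data and function names are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max; quartiles are medians of the lower/upper halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]           # values below the median position
    upper = xs[(n + 1) // 2 :]     # values above the median position
    return xs[0], statistics.median(lower), statistics.median(xs), statistics.median(upper), xs[-1]

def iqr_outliers(data):
    """Values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 4, 5, 6, 7, 8, 9, 30]  # hypothetical
print(five_number_summary(data))
print(iqr_outliers(data))
```

For this data, Q1 = 4.5 and Q3 = 8.5, so the high fence is 14.5 and 30 is flagged as a high-end outlier.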

Percentiles and Frequency

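  • Percentile: Percentage of values less than (or, by some definitions, at or below) a given value.
  • Cumulative Relative Frequency: Running total of relative frequencies across intervals; the value for an interval is the proportion of data at or below that interval's upper end.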

Z-scores

  • z = (value - mean) / standard deviation: the number of standard deviations a value lies above (positive z) or below (negative z) the mean.

Data Transformation

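  • Adding/Subtracting a constant: Shape and variability (SD, IQR, range) remain the same; center and location (mean, median, percentiles) shift by that constant.
  • Multiplying/Dividing by a constant: Shape remains the same; measures of center and location are multiplied/divided by the constant, and measures of variability by its absolute value.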
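A quick check of both rules using a Celsius-to-Fahrenheit conversion (F = 9/5 × C + 32), which multiplies and then adds. The temperatures are hypothetical.

```python
import statistics

temps_c = [10.0, 12.5, 15.0, 20.0, 22.5]  # hypothetical temperatures in degrees C

# Convert to degrees F: multiply by 9/5, then add 32
temps_f = [9 / 5 * c + 32 for c in temps_c]

mean_c, sd_c = statistics.mean(temps_c), statistics.stdev(temps_c)
mean_f, sd_f = statistics.mean(temps_f), statistics.stdev(temps_f)

# Center: multiplied by 9/5 AND shifted by 32
print(mean_f, 9 / 5 * mean_c + 32)
# Spread: multiplied by 9/5 only; adding 32 has no effect
print(sd_f, 9 / 5 * sd_c)
```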

Density Curves and Normal Distribution

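  • Density Curve: A curve on or above the horizontal axis with a total area of exactly 1; the area over an interval gives the proportion (probability) of values in that interval.
  • Normal Distribution: Symmetric, bell-shaped density curve that follows the 68-95-99.7 rule: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean.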

Calculator Commands for Normal Distribution

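  • normalpdf: Height of the normal curve (density) at a specific value; this is not a probability, since P(X = x) = 0 for a continuous variable.
  • normalcdf: Probability (area) between a lower and an upper bound.
  • invNorm: Finds the value with a given area to its left (the value at a given percentile).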
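Outside the calculator, Python's standard `statistics.NormalDist` offers rough analogs of these commands: `pdf` (density), a difference of `cdf` values (area between bounds), and `inv_cdf` (value at a percentile). The sketch assumes a hypothetical N(64, 2.5) distribution of heights; it is an illustration, not the calculator's own implementation.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.0, sigma=2.5)  # hypothetical: heights ~ N(64, 2.5)

# Density (height of the curve) at x = 66; NOT a probability
density = heights.pdf(66)

# P(60 < X < 68): area between two bounds, like normalcdf(60, 68, 64, 2.5)
prob_between = heights.cdf(68) - heights.cdf(60)

# Value with 90% of the area to its left, like invNorm(0.90, 64, 2.5)
p90 = heights.inv_cdf(0.90)

# Empirical-rule check on the standard normal: area within 1 SD of the mean
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)   # about 0.683

print(density, prob_between, p90, within_one_sd)
```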

Normal Probability Plot

  • Plots each observed value against its expected (theoretical) z-score; a roughly linear pattern suggests the data are approximately normal.

Unit 2: Correlation and Regression

Scatter Plots and Correlation

  • Acronym SEED:
    • State in context.
    • Examine direction (positive/negative).
    • Outliers noted.
    • Form (linear/nonlinear).
    • Strength of correlation.

r (Correlation Coefficient)

  • Ranges from -1 to 1; the sign gives the direction.
  • Values closer to -1 or 1 indicate a stronger linear association; r measures only linear relationships.
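One way to compute r is as the average product of z-scores, r = Σ(z_x × z_y) / (n - 1). The sketch below implements that formula directly with hypothetical data; Python 3.10+ also provides `statistics.correlation`, which should give the same value.

```python
import statistics

def correlation(xs, ys):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]    # hypothetical hours studied
scores = [2, 4, 5, 4, 5]   # hypothetical quiz scores
print(round(correlation(hours, scores), 3))
```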

Regression Lines

  • Best Fit Line: y-hat = a + bx, where b is the slope and a is the y-intercept; predicts y from x.
  • Residuals: Actual minus predicted (residual = y - y-hat).

Least Squares Regression Line

  • Minimizes sum of squared residuals.
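  • S Value (s): Typical distance between the actual y-values and the predicted values (the size of a typical residual).
  • R-squared Value (r²): Coefficient of determination; the proportion of the variation in y explained by the linear model with x.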
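A minimal sketch of the least-squares line and its summary values, using hypothetical data. It uses the standard formulas b = Σ(x - x-bar)(y - y-bar) / Σ(x - x-bar)² (equivalent to r × s_y / s_x), a = y-bar - b × x-bar, s = √(Σ residual² / (n - 2)), and r² = 1 - SSE/SST.

```python
import math
import statistics

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope (equivalently r * s_y / s_x)
    a = my - b * mx        # the line passes through (x-bar, y-bar)
    return a, b

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = lsrl(x, y)

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # actual - predicted
sse = sum(res ** 2 for res in residuals)                  # sum of squared residuals (minimized)
s = math.sqrt(sse / (len(x) - 2))                         # typical residual size
sst = sum((yi - statistics.mean(y)) ** 2 for yi in y)     # total variation in y
r_squared = 1 - sse / sst                                 # fraction of variation explained

print(f"y-hat = {a:.2f} + {b:.2f}x, s = {s:.3f}, r^2 = {r_squared:.2f}")
```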

Impact of Outliers

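  • Outliers and influential points can change the slope, y-intercept, r, and r²; points with extreme x-values (high leverage) tend to have the most influence.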

Residual Plots

  • Used to judge whether a linear model is appropriate.
  • A random scatter of residuals around 0 supports a linear model; a clear pattern (e.g., a curve) implies a linear model may not be best.
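To see what a "clear pattern" looks like, the sketch below fits a line to hypothetical curved data (y = x²) and lists the residuals. Instead of random scatter, the signs run positive, negative, negative, negative, positive: a U-shape that signals curvature the line misses.

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]   # hypothetical curved (quadratic) data

# Least-squares slope and intercept
mx, my = statistics.mean(x), statistics.mean(y)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]: U-shaped, not random
```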

Unit 3: Sampling Methods

Common Sampling Methods

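  • Simple Random Sample (SRS): Every group of n individuals has an equal chance of being selected.
  • Stratified Random Sample: Divides the population into groups (strata) of individuals with a shared trait, then takes an SRS from each stratum.
  • Cluster Sample: Divides the population into groups (clusters), randomly selects some clusters, and samples every individual in the chosen clusters.
  • Systematic Random Sample: Selects individuals at regular intervals (every kth) after a random start.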
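The four methods can be sketched with Python's `random` module. The population below (100 students in 4 homerooms and 2 grades), the sample sizes, and the fixed seed are all hypothetical choices for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 students, 4 homerooms (clusters), 2 grades (strata)
population = [{"id": i, "homeroom": i % 4, "grade": "junior" if i < 50 else "senior"}
              for i in range(100)]

# SRS of 10: every group of 10 students is equally likely
srs = random.sample(population, 10)

# Stratified: an SRS of 5 from each grade
stratified = []
for grade in ("junior", "senior"):
    stratum = [p for p in population if p["grade"] == grade]
    stratified += random.sample(stratum, 5)

# Cluster: randomly choose 1 homeroom and take everyone in it
chosen_room = random.choice(range(4))
cluster = [p for p in population if p["homeroom"] == chosen_room]

# Systematic: random start among the first 10, then every 10th student
start = random.randrange(10)
systematic = population[start::10]
```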

Poor Sampling Methods

  • Convenience Sample: Chooses easy-to-reach individuals.
  • Voluntary Response Sampling: Individuals choose whether to participate; tends to overrepresent people with strong opinions.

Shortcomings and Biases

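  • Undercoverage: Some groups in the population are left out of the sampling process or underrepresented.
  • Non-response Bias: Selected individuals can't be reached or don't respond.
  • Response Bias: Respondents give false or misleading answers.
  • Wording Bias: Confusing or leading questions influence the answers.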

Observational Studies vs. Experiments

  • Observational Study: Observes without influencing subjects.
  • Experiment: Deliberately imposes treatments on subjects to measure their responses; a well-designed experiment can show cause and effect.

Principles of Experimental Design

  • Comparison, Random Assignment, Control, Replication.

Terminology

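  • Factor: An explanatory variable manipulated in the experiment; Level: a specific value of a factor.
  • Confounding: Two variables whose effects on the response cannot be separated.
  • Placebo: Fake treatment with no active ingredient.
  • Blinding: Single-blind (either the subjects or those measuring responses don't know which treatment was received) and double-blind (neither knows).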

Experimental Designs

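  • Randomized Block Design: Groups similar subjects into blocks, then randomly assigns treatments within each block.
  • Matched Pairs Design: Pairs similar subjects (or uses each subject twice) and randomly assigns the two treatments within each pair (or randomizes the order).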
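A sketch of random assignment within blocks. It assumes two treatments and an even split within each block; the block names, subject labels, and seed are hypothetical.

```python
import random

random.seed(2)  # fixed seed for reproducibility

# Hypothetical subjects, blocked by a trait believed to affect the response
blocks = {
    "male":   ["M1", "M2", "M3", "M4"],
    "female": ["F1", "F2", "F3", "F4", "F5", "F6"],
}
treatments = ["new drug", "placebo"]

assignment = {}
for block, subjects in blocks.items():
    shuffled = random.sample(subjects, len(subjects))   # random order within this block only
    half = len(shuffled) // 2
    for s in shuffled[:half]:
        assignment[s] = treatments[0]
    for s in shuffled[half:]:
        assignment[s] = treatments[1]

print(assignment)
```

Randomizing separately inside each block keeps the blocking variable from being confounded with the treatment.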

Unit 4: Probability and Random Variables

Probability Basics

  • Probability ranges from 0 to 1.
  • Short-term outcomes are unpredictable, but long-run relative frequencies settle down to the true probability (law of large numbers).

Simulation

  • Uses models to mimic real-world events.
  • Four-step process: Define problem, model, perform, estimate probability.
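A sketch of the four steps for a hypothetical question: what is the probability of rolling at least one six in four rolls of a fair die? The number of trials and the seed are arbitrary choices; the exact answer from the complement rule is included only for comparison.

```python
import random

random.seed(42)

# Step 1 (problem): P(at least one six in 4 rolls of a fair die)
# Step 2 (model): each roll is a random integer from 1 to 6
def at_least_one_six(rolls=4):
    """One trial of the model."""
    return any(random.randint(1, 6) == 6 for _ in range(rolls))

# Step 3 (perform): repeat many trials
trials = 100_000
successes = sum(at_least_one_six() for _ in range(trials))

# Step 4 (estimate): proportion of trials that were successes
estimate = successes / trials
exact = 1 - (5 / 6) ** 4   # complement rule, about 0.518

print(estimate, exact)
```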

Probability Rules

  • Mutually Exclusive: Events can't occur together, so P(A and B) = 0.
  • Independence: The outcome of one event doesn't affect the probability of the other, so P(B | A) = P(B).
  • Addition Rule: P(A or B) = P(A) + P(B) - P(A and B); for mutually exclusive events this reduces to P(A) + P(B).
  • Multiplication Rule: P(A and B) = P(A) × P(B | A); for independent events this reduces to P(A) × P(B).
  • Complement Rule: P(not A) = 1 - P(A).
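These rules can be checked exactly for one roll of a fair die by treating events as sets of faces and using `fractions.Fraction` to avoid rounding. The events A, B, C, and E are hypothetical examples.

```python
from fractions import Fraction

die = set(range(1, 7))  # one roll of a fair six-sided die

def P(event):
    """Probability of an event (a set of faces)."""
    return Fraction(len(event & die), len(die))

A = {2, 4, 6}   # even
B = {4, 5, 6}   # greater than 3
C = {1, 3}      # odd and less than 4 (no overlap with A)
E = {1, 2}      # at most 2

# Addition rule (general)
assert P(A | B) == P(A) + P(B) - P(A & B)
# Mutually exclusive: the overlap term is 0
assert P(A | C) == P(A) + P(C)
# Complement rule
assert P(die - A) == 1 - P(A)
# Multiplication rule (general): P(A and B) = P(A) * P(B | A)
P_B_given_A = Fraction(len(A & B), len(A))
assert P(A & B) == P(A) * P_B_given_A
# A and B are NOT independent (P(B | A) = 2/3 but P(B) = 1/2) ...
assert P_B_given_A != P(B)
# ... while A and E are independent
assert P(A & E) == P(A) * P(E)
```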

Visualization of Probability

  • Venn diagrams, two-way tables, probability trees.

Random Variables

  • Discrete: Countable values.
  • Continuous: Any value within a range.
  • Binomial: Counts the number of successes in a fixed number of trials.
  • Geometric: Counts the number of trials up to and including the first success.

Mean and Standard Deviation of Random Variables

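  • Mean (expected value): μ = Σ x × P(x); standard deviation: σ = √(Σ (x - μ)² × P(x)).
  • On a calculator, enter the values in one list and their probabilities in another, then run one-variable stats with the probability list as the frequency list.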
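A worked discrete example of the two formulas, with a hypothetical probability distribution for the number of pets in a household.

```python
import math

# Hypothetical distribution: X = number of pets in a household
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

assert abs(sum(probs) - 1) < 1e-9   # a valid distribution sums to 1

mu = sum(x * p for x, p in zip(values, probs))                             # E(X) = sum of x * P(x)
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))  # sqrt of weighted squared deviations

print(mu, sigma)  # about 1.7 and 0.9
```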

Transforming Probability Distributions

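  • Adding/Subtracting a constant: Shape and variability unchanged; the mean shifts by that constant.
  • Multiplying/Dividing by a constant b: Shape unchanged; the mean is multiplied by b and the standard deviation by |b| (for Y = a + bX: μ_Y = a + bμ_X and σ_Y = |b|σ_X).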

Binomial Random Variables

  • Acronym BINS: Binary outcomes (success/failure), Independent trials, fixed Number of trials, Same (set) probability of success on each trial.

Calculator Commands for Binomial Random Variables

  • binompdf(n, p, x): Probability of exactly x successes, P(X = x).
  • binomcdf(n, p, x): Cumulative probability of at most x successes, P(X ≤ x).
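A sketch of what the two commands compute, using the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k), with mean np and standard deviation √(np(1 - p)). The function names are hypothetical; the example is 10 fair coin flips.

```python
from math import comb

def binom_pdf(n, p, k):
    """P(X = k), like binompdf(n, p, k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(X <= k), like binomcdf(n, p, k)."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

# Hypothetical: 10 fair coin flips, X = number of heads
exactly_5 = binom_pdf(10, 0.5, 5)           # 252/1024
at_most_3 = binom_cdf(10, 0.5, 3)           # 176/1024
at_least_8 = 1 - binom_cdf(10, 0.5, 7)      # complement: P(X >= 8) = 1 - P(X <= 7)
mean, sd = 10 * 0.5, (10 * 0.5 * 0.5) ** 0.5

print(exactly_5, at_most_3, at_least_8, mean, sd)
```

Note the complement step for "at least": a cdf command only gives areas at or below a value.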

Geometric Random Variables

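  • Models the number of trials up to and including the first success; P(X = k) = (1 - p)^(k-1) × p, with mean 1/p.
  • Calculator commands: geometpdf(p, k) for P(X = k) and geometcdf(p, k) for P(X ≤ k).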
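A sketch of the geometric formulas, assuming a hypothetical prize that appears in 20% of cereal boxes (p = 0.2). The cdf uses the shortcut P(X ≤ k) = 1 - (1 - p)^k: the only way to need more than k trials is to fail k times in a row.

```python
def geom_pdf(p, k):
    """P(X = k): first success on trial k, like geometpdf(p, k)."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(p, k):
    """P(X <= k): first success within k trials, like geometcdf(p, k)."""
    return 1 - (1 - p) ** k

p = 0.2                              # hypothetical success probability
first_on_3rd = geom_pdf(p, 3)        # 0.8^2 * 0.2 = 0.128
within_3 = geom_cdf(p, 3)            # 1 - 0.8^3 = 0.488
expected_trials = 1 / p              # mean of a geometric random variable

print(first_on_3rd, within_3, expected_trials)
```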

Unit 5: Sampling Distributions

Statistic vs. Parameter

  • Statistic: From a sample.
  • Parameter: From a population.

Sampling Distribution

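  • Distribution of the values of a statistic from all possible samples of the same size from the same population (approximated by repeated sampling).
  • Unbiased estimator: The mean of the statistic's sampling distribution equals the parameter being estimated.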

Proportions

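  • Built by repeatedly sampling and plotting the sample proportions (p-hat); the sampling distribution has mean p and standard deviation √(p(1 - p)/n).
  • Conditions: Random sampling; 10% condition (n ≤ 10% of the population); large counts (np ≥ 10 and n(1 - p) ≥ 10) for an approximately normal shape.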
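A simulation sketch of the sampling distribution of p-hat. The values p = 0.30, n = 100, and 10,000 repetitions are hypothetical. Each sample is drawn as n independent trials, which models sampling from a very large population (the situation the 10% condition is meant to approximate).

```python
import random
import statistics

random.seed(7)

p, n, reps = 0.30, 100, 10_000   # hypothetical population proportion, sample size, repetitions

# Large counts condition: np >= 10 and n(1 - p) >= 10
assert n * p >= 10 and n * (1 - p) >= 10

# Repeatedly sample and record each sample proportion p-hat
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.stdev(p_hats)
theory_sd = (p * (1 - p) / n) ** 0.5   # sqrt(p(1-p)/n), about 0.046

print(sim_mean, sim_sd, theory_sd)
```

The simulated mean lands close to p (p-hat is unbiased) and the simulated spread close to the formula.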

Calculating Z-scores

  • z = (p-hat - p) / √(p(1 - p)/n); used with the normal distribution to find probabilities of sample proportions.
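A worked example, assuming a hypothetical population proportion p = 0.50 and samples of n = 100: what is the probability that a sample proportion is 0.60 or higher?

```python
from statistics import NormalDist

p, n, p_hat = 0.50, 100, 0.60             # hypothetical values
sd_p_hat = (p * (1 - p) / n) ** 0.5       # 0.05
z = (p_hat - p) / sd_p_hat                # 2.0
prob = 1 - NormalDist().cdf(z)            # area to the right of z, about 0.023

print(z, prob)
```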

Central Limit Theorem

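  • The sampling distribution of the sample mean (x-bar) is approximately normal when n ≥ 30, whatever the population's shape (rule of thumb); if the population itself is normal, it is normal for any n. Its mean is μ and its standard deviation is σ/√n.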
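A simulation sketch of the theorem. The population here is a hypothetical, strongly right-skewed exponential distribution with mean 1 and standard deviation 1. Even so, sample means with n = 30 center on μ, spread about σ/√n, and land within 1.96 standard errors of μ roughly 95% of the time, as a normal model predicts.

```python
import random
import statistics

random.seed(3)

# Hypothetical skewed population: exponential with mean 1 and SD 1
mu, sigma, n, reps = 1.0, 1.0, 30, 20_000

# Each entry is the mean of one sample of size n
x_bars = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = sigma / n ** 0.5                                              # sigma / sqrt(n)
share_within = sum(abs(xb - mu) <= 1.96 * se for xb in x_bars) / reps

print(statistics.fmean(x_bars), statistics.stdev(x_bars), se, share_within)
```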

Unit 6: Inference for Proportions

Point Estimate

  • A single statistic (e.g., p-hat) used to estimate a population parameter.

Confidence Interval vs. Confidence Level

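  • Interval: Range of plausible values for the parameter (point estimate ± margin of error).
  • Level: The long-run percentage of intervals, from repeated sampling, that capture the true parameter; not the probability that one particular interval contains it.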

Confidence Interval Interpretation

  • We are X% confident the interval from Y to Z captures the true parameter in context.

Factors Affecting Confidence Interval

  • Higher confidence level = wider interval.
  • Larger sample size = narrower interval.
  • The margin of error accounts only for sampling variability; it does not account for bias (e.g., undercoverage or nonresponse).

PANIC for Confidence Intervals

  • Parameter, Assumptions (conditions), Name of the interval, Interval, Conclusion.

Confidence Interval Calculation

  • Use a calculator (1-PropZInt) or the formula p-hat ± z* × √(p-hat(1 - p-hat)/n).
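A sketch of the one-proportion z interval formula, with z* taken from the standard normal so that the middle area equals the confidence level. The function name and the data (60 of 100 sampled students) are hypothetical.

```python
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 for 95%
    margin = z_star * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

# Hypothetical: 60 of 100 sampled students prefer online homework
low, high = one_prop_z_interval(60, 100)
print(round(low, 3), round(high, 3))
```

Raising `confidence` to 0.99 widens the interval, matching the factors listed above.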

Significance Tests

  • PHANTOMS:
    • Parameter, Hypotheses, Assumptions, Name of test, Test statistic, Obtain p-value, Make decision, State conclusion.
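A sketch of the Test statistic, Obtain p-value, and Make decision steps for a one-proportion z test. The function name, the `alternative` parameter, the data, and α = 0.05 are hypothetical. The standard deviation uses the null value p0 because the test assumes H0 is true.

```python
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative="greater"):
    """Return (z, p_value) for H0: p = p0."""
    p_hat = successes / n
    sd = (p0 * (1 - p0) / n) ** 0.5     # uses p0 (null value), not p-hat
    z = (p_hat - p0) / sd
    if alternative == "greater":
        p_value = 1 - NormalDist().cdf(z)
    elif alternative == "less":
        p_value = NormalDist().cdf(z)
    else:  # two-sided
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: H0: p = 0.5 vs Ha: p > 0.5, with 60 successes in 100 trials
z, p_value = one_prop_z_test(60, 100, 0.5)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(z, p_value, decision)
```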

Type I & II Errors

  • Type I: False positive; null hypothesis is true but rejected.
  • Type II: False negative; null hypothesis is false but not rejected.
  • Power: Probability of correctly rejecting a false null hypothesis.

Unit 7: Inference for Means

Key Differences

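  • Use t instead of z for means, because the population standard deviation σ is estimated by the sample standard deviation s.
  • Degrees of freedom: n - 1 (one-sample t).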

PANIC Procedure

  • Same as proportions but with means.
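The Interval step for a mean follows x-bar ± t* × s/√n. Python's standard library has no t-distribution quantile function, so this sketch takes t* as an input (read from a t table or the calculator's invT; SciPy's `scipy.stats.t.ppf` is another option). The data and the function name are hypothetical; t* = 2.262 is the 95% critical value for df = 9.

```python
import math
import statistics

def one_sample_t_interval(data, t_star):
    """x-bar +/- t* * s / sqrt(n); t* must match df = n - 1 and the confidence level."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # sample SD estimates sigma
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 10 values; t* = 2.262 for 95% confidence with df = 9
data = [4, 5, 6, 5, 4, 6, 5, 5, 6, 4]
low, high = one_sample_t_interval(data, t_star=2.262)
print(round(low, 3), round(high, 3))
```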

Interpretation

  • Similar to proportions but refers to means.

Significance Tests

  • Similar to proportions but for means.

Unit 8: Chi-Squared Tests

Chi-Squared Tests Overview

  • Tests for goodness of fit, homogeneity, and association/independence.

Goodness of Fit

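  • Compares the observed counts of one categorical variable in a single sample to the counts expected under a hypothesized distribution.
  • No single parameter or sample statistic; the hypotheses are about the whole distribution.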
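A sketch of the chi-square statistic χ² = Σ (observed - expected)² / expected with df = (number of categories) - 1, assuming hypothetical results from 60 rolls of a die and H0: the die is fair. The p-value would then come from the chi-square distribution with that df (e.g., the calculator's χ²cdf).

```python
# Hypothetical: 60 rolls of a die; H0: the die is fair (10 of each face expected)
observed = [8, 9, 12, 11, 6, 14]
expected = [sum(observed) / 6] * 6

# Large counts: all expected counts should be at least 5
assert all(e >= 5 for e in expected)

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))   # 4.2
df = len(observed) - 1                                               # categories - 1

print(chi_sq, df)
```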

Homogeneity

  • Compares the distribution of a single categorical variable across two or more populations or treatment groups (separate samples).

Association/Independence

  • Tests for association between two categorical variables measured on a single sample.
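Homogeneity and independence tests share the same arithmetic on a two-way table: expected count = (row total × column total) / grand total, χ² = Σ (O - E)² / E, and df = (rows - 1)(columns - 1). The table below is hypothetical.

```python
# Hypothetical two-way table of observed counts
observed = [
    [10, 20],
    [30, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count for each cell = (row total * column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi_sq = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_sq, 4), df)
```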

Unit 9: Inference for Slope

Confidence Intervals and Significance Tests

  • For the true slope β of the population regression line.
  • Check conditions: Linearity, Independence, Normality of residuals, Equal variance (constant SD of residuals), Random sample.

Degrees of Freedom

  • n - 2 for slope inference.
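A sketch of the test statistic for H0: β = 0, using t = b / SE_b with SE_b = s / √(Σ(x - x-bar)²) (equivalently s / (s_x √(n - 1))) and df = n - 2. The data are hypothetical; the p-value would come from a t distribution with df = n - 2.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)

mx, my = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx   # sample slope
a = my - b * mx

s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
se_b = s / math.sqrt(sxx)      # standard error of the slope
t = (b - 0) / se_b             # H0: beta = 0
df = n - 2

print(round(t, 3), df)
```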

Conclusion

  • Always in context.

These notes provide a comprehensive review of the major concepts across various units in AP Statistics, serving as a helpful study guide for the course.