AP Statistics Units 1-9 Lecture Notes
Unit 1: Introduction to Statistics
Data Types
- Quantitative Data:
- Deals with numbers (e.g., heights, class size, population).
- Represented numerically.
- Categorical Data:
- Deals with names and labels (e.g., eye color, hair color).
- Arithmetic (such as averaging) is not meaningful, even when categories are coded as numbers (e.g., zip codes).
- Often summarized with two-way tables.
Two-Way Tables
- Used to represent categorical data.
- Shows intersections between variables.
- Marginal Relative Frequency: A row or column total divided by the grand total.
- Joint Relative Frequency: A single cell count divided by the grand total.
- Conditional Relative Frequency: A single cell count divided by its row or column total (the proportion in one category given a specific group).
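As a rough illustration of the three relative frequencies above, the sketch below uses a small made-up two-way table. The counts and the grade and eye-color labels are hypothetical, chosen only to keep the arithmetic easy.

```python
# Hypothetical counts: rows are grade levels, columns are eye colors.
table = {
    "junior": {"brown": 30, "blue": 20},
    "senior": {"brown": 25, "blue": 25},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal relative frequency: one row (or column) total over the grand total.
marginal_junior = sum(table["junior"].values()) / grand_total

# Joint relative frequency: one cell over the grand total.
joint_junior_blue = table["junior"]["blue"] / grand_total

# Conditional relative frequency: one cell over its row (or column) total.
blue_given_junior = table["junior"]["blue"] / sum(table["junior"].values())

print(marginal_junior, joint_junior_blue, blue_given_junior)
```

Note that the joint and conditional values use the same cell count; only the denominator changes.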
Quantitative Data
- Described using the acronym "C-SOCS":
- Context
- Shape: Symmetrical, skewed, number of peaks.
- Outliers: Identify any extreme data points.
- Center: Mean or median.
- Spread: Range, standard deviation, IQR.
Basic Statistical Terms
- Mean: Average of all values.
- Standard Deviation: Typical distance of values from the mean.
- Median: 50th percentile.
- Range: Difference between max and min values.
Box Plots
- Use the five-number summary: Minimum, Q1, Median, Q3, Maximum.
- IQR (Interquartile Range): Q3 - Q1.
- Identifies low-end and high-end outliers: values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR).
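A minimal sketch of the five-number summary and the common 1.5 × IQR outlier rule. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data (other software may use slightly different quartile conventions), and the data set and function names are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values before the median position
    upper = data[half + n % 2:]    # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers_by_iqr(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

scores = [2, 4, 5, 7, 8, 9, 10, 12, 30]  # made-up data
print(five_number_summary(scores))
print(outliers_by_iqr(scores))
```

For this data the fences work out to -5.25 and 20.75, so only the value 30 is flagged.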
Percentiles and Frequency
- Percentile: Percentage of values below a certain point.
- Cumulative Relative Frequency: Cumulative percentages from intervals.
Z-scores
- Measures the number of standard deviations a value is from the mean.
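The z-score definition above can be sketched in one line of code. The function name and the example numbers are hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# A score of 85 in a distribution with mean 70 and standard deviation 10.
z = z_score(85, mean=70, sd=10)
print(z)
```

A positive result means the value is above the mean; a negative result means it is below.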
Data Transformation
- Addition/Subtraction: Shape and variability remain the same; center changes.
- Multiplication/Division: Shape remains the same; center and variability change proportionally.
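A quick check of the transformation rules above, using made-up scores. Adding 5 should move the mean but leave the standard deviation alone, while doubling should double both.

```python
from statistics import mean, stdev

data = [60, 70, 80, 90]            # made-up scores
shifted = [x + 5 for x in data]    # add a constant
scaled = [x * 2 for x in data]     # multiply by a constant

print(mean(data), stdev(data))
print(mean(shifted), stdev(shifted))   # mean up by 5, same standard deviation
print(mean(scaled), stdev(scaled))     # mean and standard deviation both doubled
```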
Density Curves and Normal Distribution
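- Density Curve: Models a distribution with a curve that lies on or above the horizontal axis and has a total area of exactly one underneath it; areas under the curve represent proportions.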
- Normal Distribution: Follows the 68-95-99.7 rule.
Calculator Commands for Normal Distribution
- normPDF: Gives the height of the normal density curve at a specific value (not a probability; rarely needed).
- normCDF: Finds the probability (area under the curve) over an interval.
- Inverse Normal: Finds value for a given percentile.
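For readers without the calculator at hand, Python's standard-library `statistics.NormalDist` offers rough analogs of the three commands above. This is only a sketch of the correspondence, not the calculator's own implementation, and the variable names are made up.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal distribution

height_at_zero = z.pdf(0)               # like normPDF: curve height, not a probability
within_one_sd = z.cdf(1) - z.cdf(-1)    # like normCDF on the interval from -1 to 1
percentile_90 = z.inv_cdf(0.90)         # like Inverse Normal for the 90th percentile

print(round(within_one_sd, 4), round(percentile_90, 4))
```

The middle value comes out near 0.68, which matches the first number in the 68-95-99.7 rule.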
Normal Probability Plot
- Plots actual values against theoretical Z-values; a roughly linear pattern suggests the data are approximately normal.
Unit 2: Correlation and Regression
Scatter Plots and Correlation
- Acronym SEED:
- State in context.
- Examine direction (positive/negative).
- Outliers noted.
- Form (linear/nonlinear).
- Strength of correlation.
R Value (Correlation Coefficient)
- Ranges from -1 to 1.
- Closer to -1 or 1 indicates a stronger linear relationship; closer to 0 indicates a weaker one.
Regression Lines
- Best Fit Line (y-hat): Predicts values based on the slope and intercept.
- Residuals: Actual value minus predicted value (y - y-hat).
Least Squares Regression Line
- Minimizes sum of squared residuals.
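- S Value: Standard deviation of the residuals; the typical distance between actual values and the values predicted by the LSRL.
- R-square Value: Coefficient of determination; the proportion of variation in the response variable explained by the linear model.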
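The quantities in this section can be computed by hand for a tiny data set. The sketch below uses made-up x and y values and the standard least-squares formulas; the variable names are illustrative only.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]   # made-up explanatory values
y = [2, 4, 5, 4, 5]   # made-up response values
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                     # slope of the least squares line
intercept = y_bar - slope * x_bar     # line passes through (x_bar, y_bar)
r = sxy / sqrt(sxx * syy)             # correlation coefficient

# Residual = actual - predicted.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)  # the quantity the line minimizes
s = sqrt(sse / (n - 2))               # standard deviation of the residuals
r_squared = 1 - sse / syy             # coefficient of determination

print(round(slope, 3), round(intercept, 3), round(s, 3), round(r_squared, 3))
```

Here the fitted line is y-hat = 2.2 + 0.6x, and r-squared equals the square of r, as expected for a line with one explanatory variable.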
Impact of Outliers
- Affects slope and y-intercept of regression lines.
Residual Plots
- Used to determine whether a linear model is appropriate.
- A clear pattern implies a linear model may not be the best fit; random scatter around zero supports a linear model.
Unit 3: Sampling Methods
Common Sampling Methods
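- Simple Random Sample (SRS): Every individual, and every possible group of the chosen size, has an equal chance of selection.
- Stratified Random Sample: Divides the population into strata by shared traits, then takes a random sample from each stratum.
- Cluster Sample: Divides the population into clusters, randomly selects clusters, and samples everyone in the chosen clusters.
- Systematic Random Sample: Selects individuals at regular intervals after a random start.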
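One way to see how the four methods differ is to draw each kind of sample from the same list. The sketch below assumes a hypothetical population of ID numbers 1 to 100, with made-up strata and clusters; a fixed seed keeps the run repeatable.

```python
import random

rng = random.Random(1)               # fixed seed so the sketch is repeatable
population = list(range(1, 101))     # hypothetical ID numbers 1..100

# Simple random sample: every group of 10 IDs is equally likely.
srs = rng.sample(population, 10)

# Stratified: split into strata (hypothetical "low" and "high" IDs),
# then take a random sample of 5 from each stratum.
strata = {"low": population[:50], "high": population[50:]}
stratified = [pid for group in strata.values() for pid in rng.sample(group, 5)]

# Cluster: split into 10 clusters of 10, pick 2 clusters at random,
# and keep everyone in the chosen clusters.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [pid for c in rng.sample(clusters, 2) for pid in c]

# Systematic: random start among the first 10 IDs, then every 10th ID.
start = rng.randrange(10)
systematic = population[start::10]

print(len(srs), len(stratified), len(cluster_sample), len(systematic))
```

The stratified sample is guaranteed to contain members of every stratum, while the cluster sample contains only members of the chosen clusters.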
Poor Sampling Methods
- Convenience Sample: Chooses easy-to-reach individuals.
- Voluntary Response Sampling: Participants choose to join.
Shortcomings and Biases
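- Undercoverage: Some groups in the population are left out of, or underrepresented in, the sample.
- Non-response Bias: Selected individuals don't respond.
- Response Bias: Respondents give false or misleading answers.
- Wording Bias: Confusing or leading question wording influences answers.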
Observational Studies vs. Experiments
- Observational Study: Observes without influencing subjects.
- Experiment: Manipulates variables to measure effects.
Principles of Experimental Design
- Comparison, Random Assignment, Control, Replication.
Terminology
- Factors, Levels, Confounding.
- Placebo: Fake treatment.
- Blinding: Single and Double Blind.
Experimental Designs
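- Randomized Block Design: Groups similar subjects into blocks, then randomly assigns treatments within each block.
- Matched Pairs Design: Pairs similar subjects and randomly assigns one treatment to each member of the pair, or gives each subject both treatments in random order.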
Unit 4: Probability and Random Variables
Probability Basics
- Probability ranges from 0 to 1.
- Random outcomes are unpredictable in the short term but follow a predictable pattern in the long run.
Simulation
- Uses models to mimic real-world events.
- Four-step process: Define the problem, build a model, perform many trials, estimate the probability.
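As an illustration of the four-step process, the sketch below estimates the chance of rolling at least one six in four rolls of a fair die. The question, the function name, and the number of trials are all made up for this example.

```python
import random

def estimate_at_least_one_six(trials=100_000, seed=0):
    """Simulate four die rolls many times and return the proportion with a six."""
    rng = random.Random(seed)                           # model: a fair six-sided die
    hits = 0
    for _ in range(trials):                             # perform many trials
        rolls = [rng.randint(1, 6) for _ in range(4)]
        if 6 in rolls:
            hits += 1
    return hits / trials                                # estimate the probability

estimate = estimate_at_least_one_six()
exact = 1 - (5 / 6) ** 4    # complement rule, for comparison
print(round(estimate, 3), round(exact, 3))
```

With this many trials the estimate should land close to the exact value of about 0.518, though any single run will differ slightly.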
Probability Rules
- Mutually Exclusive: Events can't occur together.
- Independence: Outcome of one doesn’t affect the other.
- Addition Rule: P(A or B) = P(A) + P(B) - P(A and B); the last term is 0 when the events are mutually exclusive.
- Multiplication Rule: P(A and B) = P(A) × P(B | A); this reduces to P(A) × P(B) when the events are independent.
- Complement Rule: Probability of not occurring is 1 minus probability of occurring.
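The rules above can be checked with exact fractions on a familiar example: drawing one card from a standard 52-card deck. The choice of events (heart, king) is just for illustration.

```python
from fractions import Fraction

p_heart = Fraction(13, 52)
p_king = Fraction(4, 52)
p_heart_and_king = Fraction(1, 52)    # only the king of hearts

# General addition rule: subtract the overlap so it is not counted twice.
p_heart_or_king = p_heart + p_king - p_heart_and_king

# Complement rule.
p_not_heart = 1 - p_heart

# Conditional probability, then an independence check via the multiplication rule.
p_king_given_heart = p_heart_and_king / p_heart
independent = p_heart_and_king == p_heart * p_king

print(p_heart_or_king, p_not_heart, p_king_given_heart, independent)
```

Suit and rank turn out to be independent here, since knowing the card is a heart does not change the chance that it is a king.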
Visualization of Probability
- Venn diagrams, two-way tables, probability trees.
Random Variables
- Discrete: Countable values.
- Continuous: Any value within a range.
- Binomial: Counts successes in a fixed number of trials.
- Geometric: Counts trials until the first success.
Mean and Standard Deviation of Random Variables
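- Use the calculator's one-variable stats with values in one list and probabilities in a second list.
- For a discrete variable: mean = sum of x × P(x); variance = sum of (x - mean)^2 × P(x); standard deviation = square root of the variance.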
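A small worked example of the mean and standard deviation of a discrete random variable, using the standard formulas (mean = sum of x·P(x), variance = sum of (x - mean)²·P(x)). The probability distribution is made up.

```python
from math import sqrt

values = [0, 1, 2, 3]            # possible values of X (hypothetical)
probs = [0.1, 0.3, 0.4, 0.2]     # their probabilities; must sum to 1

mean_x = sum(x * p for x, p in zip(values, probs))
var_x = sum((x - mean_x) ** 2 * p for x, p in zip(values, probs))
sd_x = sqrt(var_x)

print(round(mean_x, 2), round(sd_x, 2))
```

For this distribution the mean is 1.7 and the standard deviation is 0.9.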
Transforming Probability Distributions
- Addition/Subtraction: Shape unchanged, center changes.
- Multiplication/Division: Shape unchanged, center and variability change.
Binomial Random Variables
- Acronym BINS: Binary outcomes, Independent trials, Number of trials is fixed, Set probability (the same on every trial).
Calculator Commands for Binomial Random Variables
- binomialPDF: Probability of exactly x successes.
- binomialCDF: Cumulative probability of x or fewer successes.
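What the two commands compute can be sketched directly from the binomial formula. The function names below are made up to mirror the commands and are not the calculator's own code.

```python
from math import comb

def binomial_pdf(n, p, x):
    """P(X = x): probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(n, p, x):
    """P(X <= x): probability of x or fewer successes in n trials."""
    return sum(binomial_pdf(n, p, k) for k in range(x + 1))

# Example: 10 fair coin flips, 3 heads.
print(binomial_pdf(10, 0.5, 3))
print(binomial_cdf(10, 0.5, 3))
```

For "at least" questions, use the complement: P(X ≥ 4) = 1 - binomial_cdf(n, p, 3).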
Geometric Random Variables
- Models the number of trials up to and including the first success.
- Calculator commands: geometricPDF (first success on exactly trial x), geometricCDF (first success on or before trial x).
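The geometric commands follow from two short formulas. The sketch below uses made-up function names that mirror the commands, with a hypothetical success probability of 0.2.

```python
def geometric_pdf(p, x):
    """P(first success occurs on exactly trial x): x - 1 failures, then a success."""
    return (1 - p) ** (x - 1) * p

def geometric_cdf(p, x):
    """P(first success occurs on or before trial x): 1 minus P(x failures in a row)."""
    return 1 - (1 - p) ** x

print(geometric_pdf(0.2, 3))
print(geometric_cdf(0.2, 3))
```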
Unit 5: Sampling Distributions
Statistic vs. Parameter
- Statistic: From a sample.
- Parameter: From a population.
Sampling Distribution
- Distribution of a statistic's values across all possible samples of the same size from a population.
- Unbiased estimator: The mean of the statistic's sampling distribution equals the parameter.
Proportions
- Built by taking repeated samples and plotting each sample proportion.
- Conditions: Random sampling, 10% condition (n ≤ 10% of the population), large counts (np ≥ 10 and n(1 - p) ≥ 10).
Calculating Z-scores
- Used to find probabilities of sample proportions.
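A sketch of the z-score calculation for a sample proportion, assuming the usual large-counts check (np ≥ 10 and n(1 - p) ≥ 10) and a normal approximation. The population proportion, sample size, and observed proportion are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p, n, p_hat = 0.5, 100, 0.56    # hypothetical population proportion, sample size, sample result

# Large counts check for using the normal approximation.
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10

sd_p_hat = sqrt(p * (1 - p) / n)            # standard deviation of the sampling distribution
z = (p_hat - p) / sd_p_hat                  # z-score of the observed sample proportion
prob_at_least = 1 - NormalDist().cdf(z)     # P(sample proportion >= 0.56)

print(round(z, 2), round(prob_at_least, 4))
```

Here the sample proportion sits 1.2 standard deviations above the population proportion, so a result this high or higher happens about 11.5% of the time.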
Central Limit Theorem
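- The sampling distribution of the sample mean is approximately normal when the sample size is large (rule of thumb: n ≥ 30), regardless of the population's shape.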
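A small simulation can make the Central Limit Theorem concrete. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1) and draws 5000 samples of size 30; these choices, the seed, and the names are illustrative only.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)   # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from an exponential population (mean 1, SD 1)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

sample_means = [sample_mean(30) for _ in range(5000)]

center = mean(sample_means)      # should be close to the population mean, 1
spread = stdev(sample_means)     # should be close to 1 / sqrt(30)
print(round(center, 3), round(spread, 3), round(1 / sqrt(30), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.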
Unit 6: Inference for Proportions
Point Estimate
- Statistic estimating population parameter.
Confidence Interval vs. Confidence Level
- Interval: Range of values estimating parameter.
- Level: Long-run percentage of intervals, built by the same method from repeated samples, that capture the parameter.
Confidence Interval Interpretation
- We are X% confident the interval from Y to Z captures the true parameter in context.
Factors Affecting Confidence Interval
- Higher confidence level = wider interval.
- Larger sample size = narrower interval.
- Bias does not affect margin of error; the margin of error accounts only for sampling variability.
PANIC for Confidence Intervals
- Parameter, Assumptions, Name of interval, Interval, Conclusion in context.
Confidence Interval Calculation
- Use calculator or formula.
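The formula route can be sketched for a one-proportion z interval (point estimate ± critical value × standard error). The function name and the example counts are hypothetical, and the conditions are assumed to have been checked already.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for a one-proportion z interval."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)           # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 120 successes out of 200.
low, high = one_prop_z_interval(120, 200)
wide_low, wide_high = one_prop_z_interval(120, 200, confidence=0.99)
print(round(low, 3), round(high, 3))
```

Raising the confidence level to 99% with the same data gives a wider interval, consistent with the factors listed earlier.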
Significance Tests
- PHANTOMS:
- Parameter, Hypothesis, Assumptions, Name of test, Test statistic, Obtain p-value, Make decision, State conclusion.
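The test statistic and p-value steps can be sketched for a two-sided one-proportion z test. The function name, the null value, and the sample counts are hypothetical, and the conditions are assumed to have been checked.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Return (z, two-sided p-value) for H0: p = p0 versus Ha: p != p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)     # standard error uses the null value p0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 112 successes out of 200, testing p0 = 0.5.
z, p_value = one_prop_z_test(112, 200, 0.5)
print(round(z, 2), round(p_value, 3))
```

With a p-value near 0.09, this sample would not lead to rejecting the null hypothesis at the 0.05 significance level.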
Type I & II Errors
- Type I: False positive; null hypothesis is true but rejected.
- Type II: False negative; null hypothesis is false but not rejected.
- Power: Probability of correctly rejecting a false null hypothesis.
Unit 7: Inference for Means
Key Differences
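- Use T instead of Z for means, because the population standard deviation is unknown and is estimated by the sample standard deviation.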
- Degrees of freedom: n - 1.
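A sketch of the one-sample t statistic and its degrees of freedom, using a made-up sample and null value. Python's standard library has no t distribution, so the p-value would still come from a calculator or table.

```python
from math import sqrt
from statistics import mean, stdev

sample = [4, 6, 5, 7, 8]    # made-up measurements
mu_0 = 5                    # hypothetical null-hypothesis mean

n = len(sample)
standard_error = stdev(sample) / sqrt(n)       # sample SD stands in for the unknown population SD
t_stat = (mean(sample) - mu_0) / standard_error
df = n - 1

print(round(t_stat, 3), df)
```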
PANIC Procedure
- Same as proportions but with means.
Interpretation
- Similar to proportions but refers to means.
Significance Tests
- Similar to proportions but for means.
Unit 8: Chi-Squared Tests
Chi-Squared Tests Overview
- Tests for goodness of fit, homogeneity, and association/independence.
Goodness of Fit
- Compares observed distribution to expected.
- Hypotheses describe the whole distribution rather than a single parameter or sample statistic.
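The goodness-of-fit statistic adds up (observed - expected)² / expected over all categories. The sketch below uses made-up counts for four categories that are expected to be equally likely.

```python
observed = [18, 22, 20, 40]     # made-up observed counts
expected = [25, 25, 25, 25]     # 100 observations spread evenly over 4 categories

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1          # number of categories minus 1

print(round(chi_sq, 2), df)
```

The last category contributes most of the statistic, which shows where the observed counts depart furthest from the expected ones.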
Homogeneity
- Compares the distribution of a single categorical variable across multiple populations or groups, each sampled separately.
Association/Independence
- Tests for association between two categorical variables in one sample.
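For tests on a two-way table, each expected count is (row total × column total) / grand total, and the chi-squared statistic is computed cell by cell. The 2 × 2 table below is made up for illustration.

```python
observed = [
    [30, 20],    # made-up counts, first row
    [20, 30],    # made-up counts, second row
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if the two variables were unrelated.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)    # (rows - 1)(columns - 1)

print(expected, chi_sq, df)
```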
Unit 9: Inference for Slope
Confidence Intervals and Significance Tests
- For true population slope.
- Check assumptions for linearity, independence, equal variance, normality.
Degrees of Freedom
- n - 2 for slope inference.
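A sketch of the t statistic for testing whether the population slope is zero, using the standard formulas (standard error of the slope = s / sqrt(sum of squared x-deviations)). The data are made up, and the p-value would still come from a t table or calculator with n - 2 degrees of freedom.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]   # made-up explanatory values
y = [2, 4, 5, 4, 5]   # made-up response values
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Standard deviation of the residuals, with n - 2 degrees of freedom.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))

se_slope = s / sqrt(sxx)        # standard error of the sample slope
t_stat = (slope - 0) / se_slope # test statistic for H0: population slope = 0
df = n - 2

print(round(se_slope, 3), round(t_stat, 3), df)
```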
Conclusion
These notes provide a comprehensive review of the major concepts across various units in AP Statistics, serving as a helpful study guide for the course.