Lecture Notes: A-Level Statistics - Mr. Bison's Stats Series Part 3
Introduction
- Welcome to the third part of the A-Level maths series focused on statistics by Mr. Bison.
- Previous parts: Part 1 (Pure Maths), Part 2 (Mechanics).
- Importance: Marks vary widely between students; those who memorize the formulas often perform better.
Data Collection
Key Terms:
- Census: Measures every member of a population.
- Advantage: Accurate results.
- Disadvantage: Expensive, time-consuming, may destroy testing units (e.g., avocados).
- Sampling Units: Individuals or items being sampled.
- Sampling Frame: List of sampling units.
Types of Sampling:
Random Sampling:
- Simple Random Sampling:
- Equal chance for all units.
- Methods: Random number generator, lottery sampling.
- Advantage: Bias-free.
- Disadvantage: Requires a sampling frame.
- Systematic Sampling:
- Select every kth unit (formula: k = population size/sample size).
- Must start with a random number between 1 and k.
- Advantage: Quick.
- Disadvantage: Requires a sampling frame.
- Stratified Sampling:
- Sample reflects groups (strata) of a population.
- Method: Proportional sampling within strata.
- Advantage: Reflects population accurately.
- Disadvantage: Strata must be clearly defined.
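The systematic and stratified procedures above can be sketched in Python. This is only an illustration under stated assumptions: the function names `systematic_sample` and `stratified_sizes`, the numbered frame, and the year-group figures are all hypothetical.

```python
import random

def systematic_sample(frame, sample_size, rng=random):
    """Pick every k-th unit from a sampling frame, starting at a random point."""
    k = len(frame) // sample_size          # k = population size / sample size
    start = rng.randint(1, k)              # random start between 1 and k (inclusive)
    # Units are numbered from 1, so unit number i sits at index i - 1.
    return [frame[i - 1] for i in range(start, len(frame) + 1, k)][:sample_size]

def stratified_sizes(strata_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its size."""
    population = sum(strata_sizes.values())
    return {name: round(size / population * sample_size)
            for name, size in strata_sizes.items()}

frame = list(range(1, 101))                # hypothetical frame of 100 numbered units
print(systematic_sample(frame, 10))        # 10 units, each 10 apart

# Hypothetical school with three year groups; sample of 20 from 200 students.
print(stratified_sizes({"Year 11": 100, "Year 12": 60, "Year 13": 40}, 20))
```

Within each stratum the units would then be chosen by simple random sampling. Rounded stratum sizes do not always add up to the intended sample size, so they may need adjusting by hand.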
Non-Random Sampling:
- Quota Sampling:
- Like stratified sampling, but the units within each group are chosen non-randomly (e.g., by an interviewer filling a quota).
- Advantage: No need for a sampling frame.
- Disadvantage: Potential bias.
- Opportunity Sampling:
- Selecting individuals available at the time (e.g., outside supermarkets).
- Advantage: Easy, cheap.
- Disadvantage: Unlikely to be representative.
Types of Data:
- Qualitative: Non-numerical (e.g., color).
- Quantitative: Numerical data.
- Discrete: Takes only specific values (e.g., a count).
- Continuous: Any value within a range.
Large Data Set
Key Points:
- UK Weather Stations: Camborne, Heathrow, Hurn, Leeming, Leuchars.
- Coastal stations: Windy, rainy.
- Southern stations: Warmer, more sunshine.
- Recorded data: May to October (1987, 2015).
- International Stations: Perth (Australia), Beijing (China), Jacksonville (Florida, USA).
- Notable events: Hurricanes in Jacksonville, great storm in the UK (Oct 1987).
- Data Types:
- TR: Less than 0.05mm rain. Treated as zero.
- NA: Reading not available, problematic for sampling.
- Cloud cover: Measured in oktas (discrete, 0-8).
- Max gust: Strongest wind speed in knots. 1 knot = 1.15 mph.
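A minimal sketch of how the TR and NA entries and the knot conversion might be handled before calculating with the data. The spellings `tr` and `n/a`, the function names, and the sample readings are assumptions for illustration, not taken from the data set itself.

```python
KNOT_TO_MPH = 1.15  # conversion factor quoted in the notes

def clean_rainfall(readings):
    """Convert raw rainfall entries to floats: trace becomes 0.0, missing is dropped."""
    cleaned = []
    for value in readings:
        text = str(value).strip().lower()
        if text == "tr":
            cleaned.append(0.0)            # trace: less than 0.05 mm, treated as zero
        elif text in ("n/a", "na"):
            continue                       # reading not available: cannot be used
        else:
            cleaned.append(float(text))
    return cleaned

def knots_to_mph(knots):
    """Convert a wind speed in knots to miles per hour."""
    return knots * KNOT_TO_MPH

raw = ["0.4", "tr", "n/a", "12.6", "tr"]   # made-up values for illustration
print(clean_rainfall(raw))                 # [0.4, 0.0, 12.6, 0.0]
print(knots_to_mph(20))
```

Dropping missing readings shrinks the sample, which is why NA values are a problem when a fixed sample size is required.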
Measures of Central Tendency and Spread
Calculation Formulas:
- Mean (x̄): Σx/n, or Σfx/Σf for grouped data.
- Quartiles & Percentiles:
- Lower quartile (Q1): value at position n/4
- Median (Q2): value at position n/2
- Upper quartile (Q3): value at position 3n/4
- Linear Interpolation: Required for grouped data.
- Variance: Mean of the squares minus the square of the mean, Σx²/n - (Σx/n)².
- For groups: Σfx²/Σf - (Σfx/Σf)²
- Standard Deviation: Square root of variance.
- Coding:
- y = ax + b
- Mean (ȳ) = a(x̄) + b
- Standard Deviation (σy) = |a|(σx); adding b does not change the spread.
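The formulas above can be checked numerically. The sketch below uses a made-up frequency table and hypothetical helper names (`grouped_stats`, `interpolate`); it shows one way to lay out the arithmetic rather than a required method.

```python
from math import sqrt

def grouped_stats(midpoints, freqs):
    """Mean, variance and standard deviation from a frequency table."""
    total_f = sum(freqs)                                        # Σf
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))       # Σfx
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))  # Σfx²
    mean = sum_fx / total_f
    variance = sum_fx2 / total_f - mean ** 2   # mean of squares minus square of mean
    return mean, variance, sqrt(variance)

def interpolate(position, lower_bound, class_width, cum_freq_before, class_freq):
    """Linear interpolation inside the class containing the required position."""
    return lower_bound + (position - cum_freq_before) * class_width / class_freq

# Made-up table: classes 0-10, 10-20, 20-30 with frequencies 4, 10, 6.
mids, freqs = [5, 15, 25], [4, 10, 6]
mean, var, sd = grouped_stats(mids, freqs)
print(mean, var, sd)                           # 16.0 49.0 7.0

# Median position n/2 = 10 falls in the 10-20 class (4 values lie below it).
print(interpolate(10, 10, 10, 4, 10))          # 16.0

# Coding y = ax + b with a = -2, b = 3: the mean maps directly, the sd scales by |a|.
a, b = -2, 3
y_mean, y_var, y_sd = grouped_stats([a * x + b for x in mids], freqs)
print(y_mean, a * mean + b)                    # -29.0 -29.0
print(y_sd, abs(a) * sd)                       # 14.0 14.0
```

A negative value of a is used deliberately: the coded standard deviation is still positive, which is why the scale factor is |a| rather than a.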
Data Representation
Diagrams:
- Cumulative Frequency and Box Plots: Used to visualize data distribution.
- Histograms: Continuous data representation.
- Frequency density = Frequency / Class width.
- Comparing distributions: Comment on one measure of location and one measure of spread (e.g., median with interquartile range, or mean with standard deviation).
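As a quick illustration of the frequency density rule, here is a short sketch with made-up class boundaries and frequencies; the function name `frequency_densities` is hypothetical.

```python
def frequency_densities(boundaries, freqs):
    """Bar heights for a histogram: frequency / class width for each class."""
    widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
    return [f / w for f, w in zip(freqs, widths)]

# Made-up classes 0-10, 10-15, 15-30 with frequencies 5, 10, 6.
print(frequency_densities([0, 10, 15, 30], [5, 10, 6]))   # [0.5, 2.0, 0.4]
```

The middle class has the tallest bar even though its width is smallest, because the area of a bar, not its height, represents frequency.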
Regression and Correlation
Key Concepts:
- Product Moment Correlation Coefficient (PMCC): Measures the strength and direction of linear correlation; r lies between -1 and 1.
- Regression Line (y = a + bx): Line of best fit.
- a: y-intercept (value of y when x=0).
- b: Slope (change in y per unit change in x).
- Interpolation vs. Extrapolation:
- Interpolation is reliable; estimating within range.
- Extrapolation is less reliable; estimating outside range.
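In the exam these values normally come from a calculator, but the sketch below shows what it computes. It assumes the standard summary quantities Sxx, Syy and Sxy (notation not used in these notes), with r = Sxy/√(Sxx·Syy) and b = Sxy/Sxx; the data and function names are made up.

```python
from math import sqrt

def summary_sums(xs, ys):
    """Means plus Sxx, Syy and Sxy computed from raw data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return x_bar, y_bar, sxx, syy, sxy

def pmcc(xs, ys):
    """Product moment correlation coefficient r."""
    _, _, sxx, syy, sxy = summary_sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def regression_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + bx."""
    x_bar, y_bar, sxx, _, sxy = summary_sums(xs, ys)
    b = sxy / sxx
    a = y_bar - b * x_bar          # the line passes through (x̄, ȳ)
    return a, b

def predict(x, xs, ys):
    """Estimate y at x; the flag is True for interpolation, False for extrapolation."""
    a, b = regression_line(xs, ys)
    return a + b * x, min(xs) <= x <= max(xs)

xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]   # made-up data
print(pmcc(xs, ys))
print(regression_line(xs, ys))
print(predict(3, xs, ys))                   # within the data range
print(predict(10, xs, ys))                  # outside the data range, less reliable
```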
Probability
Venn Diagrams:
- Union (A ∪ B): Shade everything inside A or B, including the overlap.
- Intersection (A ∩ B): Shade overlap.
Tree Diagrams:
- Multiply probabilities along branches.
Key Formulas:
- Mutual Exclusivity: P(A ∩ B) = 0
- Independence:
- P(A ∩ B) = P(A) * P(B)
- P(A|B) = P(A)
- Conditional Probability: P(A|B) = P(A ∩ B) / P(B)
- Addition Law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
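The formulas can be tried out with exact fractions. The probabilities below are made up, and the conditional probability uses the standard definition P(A|B) = P(A ∩ B) / P(B).

```python
from fractions import Fraction

# Made-up probabilities for two events A and B.
p_a = Fraction(1, 2)
p_b = Fraction(1, 3)
p_a_and_b = Fraction(1, 6)

p_a_or_b = p_a + p_b - p_a_and_b          # addition law
p_a_given_b = p_a_and_b / p_b             # conditional probability P(A|B)

mutually_exclusive = p_a_and_b == 0
independent = p_a_and_b == p_a * p_b      # equivalent to P(A|B) == P(A)

print(p_a_or_b)            # 2/3
print(p_a_given_b)         # 1/2
print(mutually_exclusive)  # False
print(independent)         # True
```

Here both independence checks agree: P(A ∩ B) equals P(A) × P(B), and P(A|B) equals P(A).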
Discrete and Continuous Distributions
Discrete Uniform Distribution:
- Equal probability for all outcomes.
Binomial Distribution:
- X ~ B(n, p) (number of trials = n, probability = p).
- Conditions:
- Fixed number of trials.
- Fixed probability of success.
- Trials are independent.
- Two outcomes (success/failure).
- Cumulative Probability: Use tables or calculators.
- P(X ≤ k) is read directly; P(X ≥ k) = 1 - P(X ≤ k - 1)
- P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a - 1)
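A calculator or tables give these values directly; the sketch below shows the same calculations from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The parameters and function names are made up for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

n, p = 10, 0.3                                              # made-up parameters
at_most_3 = binom_cdf(3, n, p)                              # P(X <= 3)
at_least_4 = 1 - binom_cdf(3, n, p)                         # P(X >= 4) = 1 - P(X <= 3)
between_2_and_5 = binom_cdf(5, n, p) - binom_cdf(1, n, p)   # P(2 <= X <= 5)

print(round(at_most_3, 4), round(at_least_4, 4), round(between_2_and_5, 4))
```

Because X is discrete, the lower end of each range has to be stepped down by one before subtracting cumulative probabilities.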
Normal Distribution:
- Y ~ N(μ, σ^2) (mean = μ, variance = σ^2).
- Standard Normal Distribution (Z):
- Z ~ N(0, 1)
- Z = (Y - μ) / σ
- Approximation of Binomial: If n is large and p ≈ 0.5.
- Mean (μ) = np
- Std. Dev (σ) = √(np(1 - p))
- Continuity Corrections: Discrete to continuous adjustments.
- E.g., P(X > 5) = P(X ≥ 6) becomes P(Y > 5.5); P(X ≥ 5) becomes P(Y > 4.5)
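A short sketch of standardising and of the normal approximation with a continuity correction, using Python's `statistics.NormalDist` (which takes the mean and the standard deviation, not the variance). All numbers are made up.

```python
from math import sqrt
from statistics import NormalDist

# Standardising: Y ~ N(50, 4^2), made-up values.
mu, sigma = 50, 4
y = 56
z = (y - mu) / sigma                               # Z = (Y - μ) / σ
p_direct = NormalDist(mu, sigma).cdf(y)            # P(Y < 56)
p_standard = NormalDist(0, 1).cdf(z)               # P(Z < 1.5), the same probability
print(z, round(p_direct, 4), round(p_standard, 4))

# Normal approximation to X ~ B(100, 0.5), with a continuity correction.
n, p = 100, 0.5
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))  # mean np, sd √(np(1 - p))
p_more_than_55 = 1 - approx.cdf(55.5)              # P(X > 55) = P(X >= 56) -> P(Y > 55.5)
p_at_least_55 = 1 - approx.cdf(54.5)               # P(X >= 55) -> P(Y > 54.5)
print(round(p_more_than_55, 4), round(p_at_least_55, 4))
```

The correction always moves the boundary by 0.5 in the direction that keeps every whole number included in the original inequality.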
Hypothesis Testing
Definitions:
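- Null Hypothesis (H₀): Assumed true unless the data give sufficient evidence against it.
- Alternative Hypothesis (H₁): The claim being tested for; accepted only if H₀ is rejected.
- Significance Level (α): Probability of rejecting H₀ when it is actually true; the threshold for rejecting H₀.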
Testing Types:
- One-tailed: H₁ states > or <
- Two-tailed: H₁ states ≠
- Split the significance level equally between the two tails (α/2 in each).
Correlation Testing:
- Null: No correlation (ρ = 0).
- Alternative: ρ > 0 (positive), ρ < 0 (negative), or ρ ≠ 0 (some correlation).
- Compare the sample PMCC r with the critical value from tables; reject H₀ if r lies beyond it.
Binomial Testing:
- H₀: p = k
- H₁: p > k or p < k (one-tailed); p ≠ k (two-tailed)
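- Calculate the probability of a result at least as extreme as the one observed, assuming H₀ is true; reject H₀ if it is less than α (α/2 for a two-tailed test).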
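A minimal sketch of a one-tailed binomial test in the upper tail. The scenario (14 successes in 20 trials, testing at the 5% level) and the function names are made up; a lower-tail test would use P(X ≤ observed) instead.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def upper_tail_test(observed, n, p0, alpha):
    """One-tailed test of H0: p = p0 against H1: p > p0.

    Returns P(X >= observed) under H0 and whether H0 is rejected.
    """
    p_value = 1 - binom_cdf(observed - 1, n, p0)
    return p_value, p_value < alpha

# Made-up example: 14 successes in 20 trials, H0: p = 0.5, H1: p > 0.5, 5% level.
p_value, reject = upper_tail_test(14, 20, 0.5, 0.05)
print(round(p_value, 4), reject)    # 0.0577 False
```

Since 0.0577 is greater than 0.05, there is insufficient evidence to reject H₀ in this example; 15 successes would have been enough.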
Normal Testing of Sample Mean:
- Sample mean (x̄) ~ Normal(μ, σ²/n)
- H₀: μ = k
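- H₁: μ > k or μ < k (one-tailed); μ ≠ k (two-tailed)
- Find the probability of a sample mean at least as extreme as the one observed, assuming H₀ is true, and compare it with α (α/2 for a two-tailed test).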
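The same idea for the sample mean, sketched with `statistics.NormalDist`. The function name `sample_mean_test`, its `tail` argument, and the numbers are hypothetical; the population standard deviation σ is assumed known.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_test(x_bar, mu0, sigma, n, alpha, tail="upper"):
    """Test H0: μ = mu0 using X̄ ~ N(mu0, σ²/n) under H0.

    tail is "upper" (H1: μ > mu0), "lower" (H1: μ < mu0) or "two" (H1: μ ≠ mu0).
    Returns the tail probability used and whether H0 is rejected.
    """
    dist = NormalDist(mu0, sigma / sqrt(n))   # sd of the sample mean is σ/√n
    lower = dist.cdf(x_bar)                   # P(X̄ <= observed mean)
    upper = 1 - lower                         # P(X̄ >= observed mean)
    if tail == "upper":
        return upper, upper < alpha
    if tail == "lower":
        return lower, lower < alpha
    smaller = min(lower, upper)
    return smaller, smaller < alpha / 2       # two-tailed: α/2 in each tail

# Made-up example: sample of 25 with mean 106, H0: μ = 100, σ = 15, 5% level.
prob, reject = sample_mean_test(106, 100, 15, 25, 0.05)
print(round(prob, 4), reject)                 # 0.0228 True
```

Here the standard deviation of the sample mean is 15/√25 = 3, so the observed mean is 2 standard deviations above the value in H₀.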
Conclusion
- Thorough memorization of the formulas and definitions supports a high mark.
- Explore other resources on the Bison Maths channel for more insights.