A-Level Statistics - Part 3 with Mr. Bison

Jun 13, 2024

Lecture Notes: A-Level Statistics - Mr. Bison's Stats Series Part 3

Introduction

  • Welcome to the third part of the A-Level maths series focused on statistics by Mr. Bison.
  • Previous parts: Part 1 (Pure Maths), Part 2 (Mechanics).
  • Importance: Marks vary widely between students; those who memorize the formulas and definitions often perform better.

Data Collection

Key Terms:

  • Census: Measures every member of a population.
    • Advantage: Accurate results.
    • Disadvantage: Expensive, time-consuming, may destroy testing units (e.g., avocados).
  • Sampling Units: Individuals or items being sampled.
  • Sampling Frame: List of sampling units.

Types of Sampling:

  1. Random Sampling:
    • Simple Random Sampling:
      • Equal chance for all units.
      • Methods: Random number generator, lottery sampling.
      • Advantage: Bias-free.
      • Disadvantage: Requires a sampling frame.
    • Systematic Sampling:
      • Select every kth unit (formula: k = population size/sample size).
      • Must start with a random number between 1 and k.
      • Advantage: Quick.
      • Disadvantage: Requires a sampling frame.
    • Stratified Sampling:
      • Sample reflects groups (strata) of a population.
      • Method: Proportional sampling within strata.
      • Advantage: Reflects population accurately.
      • Disadvantage: Strata must be clearly defined.
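
The three random methods above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 numbered units, the stratum names and sizes, and the function names are all made up for the example.

```python
import random

def simple_random_sample(frame, n, rng):
    """Pick n units so that every unit has an equal chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick every k-th unit after a random start between 1 and k."""
    k = len(frame) // n            # k = population size / sample size
    start = rng.randint(1, k)      # random start, counting units from 1
    return [frame[i - 1] for i in range(start, start + k * n, k)]

def stratified_counts(strata_sizes, n):
    """Number of units to take from each stratum, in proportion to its size."""
    total = sum(strata_sizes.values())
    return {name: round(size * n / total) for name, size in strata_sizes.items()}

rng = random.Random(1)             # fixed seed so the run is repeatable
frame = list(range(1, 101))        # hypothetical sampling frame: units 1 to 100

print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
print(stratified_counts({"Year 12": 120, "Year 13": 80}, 20))
# last line prints {'Year 12': 12, 'Year 13': 8}
```

Once the stratified counts are known, a simple random sample of that size would be taken within each stratum.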

Non-Random Sampling:

  1. Quota Sampling:
    • Like stratified sampling, but the units within each group are not chosen at random.
    • Advantage: No need for a sampling frame.
    • Disadvantage: Potential bias.
  2. Opportunity Sampling:
    • Selecting individuals available at the time (e.g., outside supermarkets).
    • Advantage: Easy, cheap.
    • Disadvantage: Unlikely to be representative.

Types of Data:

  • Qualitative: Non-numerical (e.g., color).
  • Quantitative: Numerical data.
    • Discrete: Takes only specific, separate values (e.g., counts).
    • Continuous: Any value within a range.

Large Data Set

Key Points:

  • UK Weather Stations: Camborne, Heathrow, Hurn, Leeming, Leuchars.
    • Coastal stations: Windy, rainy.
    • Southern stations: Warmer, more sunshine.
  • Recorded data: May to October (1987, 2015).
  • International Stations: Perth (Australia), Beijing (China), Jacksonville (Florida, USA).
    • Notable events: Hurricanes in Jacksonville, great storm in the UK (Oct 1987).
  • Data Types:
    • tr (trace): Less than 0.05 mm of rain. Treated as zero in calculations.
    • NA: Reading not available, problematic for sampling.
    • Cloud cover: Measured in oktas (discrete, 0-8).
    • Max gust: Strongest wind speed in knots. 1 knot = 1.15 mph.

Measures of Central Tendency and Spread

Calculation Formulas:

  • Mean (x̄): Σx/n, or Σfx/Σf for grouped data (x is the class midpoint).
  • Quartiles & Percentiles:
    • Lower quartile (Q1): value at position n/4
    • Median (Q2): value at position n/2
    • Upper quartile (Q3): value at position 3n/4
  • Linear Interpolation: Required for grouped data.
  • Variance: Mean of the squares minus the square of the mean.
    • For groups: Σfx²/Σf - (Σfx/Σf)²
    • Standard Deviation: Square root of variance.
  • Coding:
    • y = ax + b
    • Mean (ȳ) = a(x̄) + b
    • Standard Deviation (σy) = |a|(σx); adding b does not change the spread.
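
As a rough worked example of the formulas above, the sketch below estimates the mean, standard deviation and quartiles of a small grouped table, then checks the coding rules with y = 2x + 3. The table and the function names are invented for illustration.

```python
import math

# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]

def grouped_mean_sd(classes):
    """Estimate the mean and standard deviation using class midpoints."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2   # mean of squares minus square of mean
    return mean, math.sqrt(variance)

def interpolate(classes, position):
    """Linear interpolation for the value at a given cumulative position."""
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= position:
            return lo + (position - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("position exceeds total frequency")

mean, sd = grouped_mean_sd(classes)
n = sum(f for _, _, f in classes)
q1, q2, q3 = (interpolate(classes, n * q / 4) for q in (1, 2, 3))
print(mean, sd)                          # 16.0 7.0
print(q1, q2, round(q3, 2))              # 11.0 16.0 21.67

# Coding check with y = 2x + 3: mean becomes 2(16) + 3, SD becomes 2(7)
a, b = 2, 3
coded = [(a * lo + b, a * hi + b, f) for lo, hi, f in classes]
print(grouped_mean_sd(coded))            # (35.0, 14.0)
```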

Data Representation

Diagrams:

  • Cumulative Frequency and Box Plots: Used to visualize data distribution.
  • Histograms: Continuous data representation.
    • Frequency density = Frequency / Class width.
  • Comparison: One measure of location & one measure of spread.
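
A quick sketch of the frequency density calculation for a histogram with unequal class widths. The classes and frequencies are hypothetical.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency)
histogram_classes = [(0, 5, 10), (5, 10, 15), (10, 20, 12), (20, 40, 8)]

def frequency_densities(classes):
    """Frequency density = frequency / class width, one value per class."""
    return [f / (hi - lo) for lo, hi, f in classes]

for (lo, hi, f), density in zip(histogram_classes,
                                frequency_densities(histogram_classes)):
    print(f"{lo}-{hi}: frequency {f}, width {hi - lo}, density {density}")
# densities are 2.0, 3.0, 1.2 and 0.4
```

The bar heights are the densities, so the area of each bar is proportional to the frequency of its class.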

Regression and Correlation

Key Concepts:

  • Product Moment Correlation Coefficient (PMCC): Measures the strength and direction of linear correlation (-1 to 1).
  • Regression Line (y = a + bx): Line of best fit.
    • a: y-intercept (value of y when x=0).
    • b: Slope (change in y per unit change in x).
  • Interpolation vs. Extrapolation:
    • Interpolation: Estimating within the range of the data; generally reliable.
    • Extrapolation: Estimating outside the range of the data; less reliable.
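
In the exam the calculator normally produces a, b and r, but a short sketch shows what it is doing. The data set is made up, and the names s_xx, s_yy and s_xy (sums of squared deviations and products) are notation assumed here, not taken from the notes.

```python
import math

def regression_and_pmcc(xs, ys):
    """Return (a, b, r) for the line y = a + bx and the PMCC r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                      # slope
    a = y_bar - b * x_bar                # intercept: line passes through the means
    r = s_xy / math.sqrt(s_xx * s_yy)    # PMCC
    return a, b, r

xs = [1, 2, 3, 4, 5]                     # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b, r = regression_and_pmcc(xs, ys)
print(round(a, 3), round(b, 3), round(r, 3))   # 2.2 0.6 0.775

# Interpolation: x = 3.5 lies inside the data range 1 to 5
print(round(a + b * 3.5, 3))             # 4.3
```

Using the same line at x = 20 would be extrapolation, so that estimate should be treated with caution.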

Probability

Venn Diagrams:

  • Union (A ∪ B): Shade all parts.
  • Intersection (A ∩ B): Shade overlap.

Tree Diagrams:

  • Multiply probabilities along branches.

Key Formulas:

  • Mutual Exclusivity: P(A ∩ B) = 0
    • P(A ∪ B) = P(A) + P(B)
  • Independence:
    • P(A ∩ B) = P(A) * P(B)
    • P(A|B) = P(A)
  • Conditional Probability:
    • P(B|A) = P(A ∩ B) / P(A)
  • Addition Law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
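
The formulas can be checked on a small sample space. The sketch below assumes a fair six-sided die, with events A (even score) and B (score of 4 or less) chosen purely for illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))       # fair six-sided die
A = {2, 4, 6}                     # even score
B = {1, 2, 3, 4}                  # score of 4 or less

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_union = prob(A | B)
assert p_union == prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(B|A) = P(A ∩ B) / P(A)
p_b_given_a = prob(A & B) / prob(A)

# Independence: P(A ∩ B) = P(A) * P(B); mutual exclusivity: P(A ∩ B) = 0
independent = prob(A & B) == prob(A) * prob(B)
mutually_exclusive = prob(A & B) == 0

print(p_union, p_b_given_a, independent, mutually_exclusive)
# prints: 5/6 2/3 True False
```

Here P(B|A) equals P(B) = 2/3, which is another way of seeing that A and B are independent.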

Discrete and Continuous Distributions

Discrete Uniform Distribution:

  • Equal probability for all outcomes.

Binomial Distribution:

  • X ~ B(n, p) (number of trials = n, probability = p).
  • Conditions:
    • Fixed number of trials.
    • Fixed probability of success.
    • Trials are independent.
    • Two outcomes (success/failure).
  • Cumulative Probability: Use tables or calculators.
    • P(X ≤ k) is read directly; P(X ≥ k) = 1 - P(X ≤ k - 1)
    • P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a - 1)
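
A minimal sketch of what the cumulative binomial tables or calculator functions compute, assuming X ~ B(10, 0.5) as an example. The function names are invented here.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(n, p, i) for i in range(0, k + 1))

n, p = 10, 0.5
p_at_most_3 = binom_cdf(n, p, 3)                      # P(X <= 3)
p_at_least_4 = 1 - binom_cdf(n, p, 3)                 # P(X >= 4) = 1 - P(X <= 3)
p_between = binom_cdf(n, p, 6) - binom_cdf(n, p, 3)   # P(4 <= X <= 6)

print(p_at_most_3, p_at_least_4, p_between)
# prints: 0.171875 0.828125 0.65625
```

Because X is discrete, "at least 4" and "more than 3" are the same event, which is why the complement uses P(X ≤ 3).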

Normal Distribution:

  • Y ~ N(μ, σ^2) (mean = μ, variance = σ^2).
  • Standard Normal Distribution (Z):
    • Z ~ N(0, 1)
    • Z = (Y - μ) / σ
  • Approximation of Binomial: If n is large and p ≈ 0.5.
    • Mean (μ) = np
    • Std. Dev (σ) = √(np(1 - p))
  • Continuity Corrections: Discrete to continuous adjustments.
    • E.g., P(X > 5) = P(X ≥ 6) becomes P(Y > 5.5); P(X ≤ 5) becomes P(Y < 5.5)
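
To see how well the approximation works, the sketch below compares an exact binomial probability with the normal approximation using a continuity correction. X ~ B(100, 0.5) is an assumed example, not one from the lecture.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5
mu = n * p                              # mean = np = 50
sigma = sqrt(n * p * (1 - p))           # standard deviation = 5
Y = NormalDist(mu, sigma)               # Y ~ N(50, 5^2)

# Exact binomial P(X <= 55)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, 56))

# Continuity correction: P(X <= 55) is approximated by P(Y < 55.5)
approx = Y.cdf(55.5)

# The same value via standardising: Z = (Y - mu) / sigma
z = (55.5 - mu) / sigma                 # 1.1
approx_via_z = NormalDist().cdf(z)

print(round(exact, 4), round(approx, 4), round(approx_via_z, 4))

# P(X > 55) = P(X >= 56) is approximated by P(Y > 55.5)
print(round(1 - approx, 4))
```

The correction always moves half a unit so that the whole of each included integer's bar is covered by the continuous region.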

Hypothesis Testing

Definitions:

  • Null Hypothesis (H₀): Assumed true.
  • Alternative Hypothesis (H₁): What is concluded if H₀ is rejected.
  • Significance Level (α): Threshold for rejecting H₀.

Testing Types:

  • One-tailed: H₁ states > or <
  • Two-tailed: H₁ states ≠
    • Split the significance level equally between the two tails (α/2 each).

Correlation Testing:

  • Null: No correlation (ρ = 0).
  • Alternative: Positive/Negative/Some correlation.
  • Compare r with critical value.

Binomial Testing:

  • H₀: p = k
  • H₁: p > k or p < k (one-tailed); p ≠ k (two-tailed)
  • Assuming H₀, calculate the probability of a result at least as extreme as the one observed, and compare with α.
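
A sketch of a one-tailed binomial test following the steps above. The scenario is hypothetical: H₀: p = 0.5, H₁: p > 0.5, 20 trials, 15 successes observed, and a 5% significance level.

```python
from math import comb

def binom_upper_tail(n, p, x):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

n, p0, observed, alpha = 20, 0.5, 15, 0.05

# Probability of 15 or more successes if H0 is true
p_value = binom_upper_tail(n, p0, observed)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)     # 0.0207 True
```

Since 0.0207 < 0.05, the result is significant and H₀ is rejected in this example. For a two-tailed test the same probability would be compared with α/2 = 0.025.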

Normal Testing of Sample Mean:

  • Sample mean (x̄) ~ Normal(μ, σ²/n)
  • H₀: μ = k
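  • H₁: μ > k or μ < k (one-tailed); μ ≠ k (two-tailed)
  • Assuming H₀, calculate the probability of a sample mean at least as extreme as the one observed, and compare with α.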
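
The same procedure for a sample mean can be sketched as follows. All figures are assumed for illustration: population standard deviation 4, sample size 25, H₀: μ = 50, H₁: μ > 50, observed sample mean 51.5, and a 5% significance level.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 50, 4, 25
sample_mean, alpha = 51.5, 0.05

# Under H0 the sample mean follows N(mu0, sigma^2 / n)
sampling_dist = NormalDist(mu0, sigma / sqrt(n))   # second argument is the SD

# One-tailed test: probability of a sample mean this large or larger
p_value = 1 - sampling_dist.cdf(sample_mean)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)     # 0.0304 True
```

Note that `NormalDist` takes the standard deviation σ/√n, not the variance σ²/n, which is a common slip when moving from the N(μ, σ²/n) notation to a calculator or code.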

Conclusion

  • Memorizing the formulas and definitions thoroughly supports high marks.
  • Explore other resources on the Bison Maths channel for more insights.