Understanding Normal Distribution and Histograms

Aug 29, 2024

Notes on Normal Distribution Lecture

Introduction to Data Analysis

  • Analyzed data on Golden Retriever weights.
  • Plotted data on a number line.
  • Constructed a histogram from the number line.

Understanding Histograms

  • A histogram divides data into equal-sized bins.
  • The height of each bar represents the count of items in each bin.
  • Too many bins can obscure meaningful patterns.
  • An optimal bin size reveals a bell-shaped curve (normal distribution).
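
The effect of bin size can be sketched in a few lines of Python. This is only an illustration: the weights are synthetic values drawn with an assumed mean of 64.53 and standard deviation of 3.05 (not the lecture's actual data), and `histogram` is a hypothetical helper name.

```python
import random
from collections import Counter

def histogram(data, bin_width):
    """Count how many values fall into each equal-width bin.

    Returns a dict mapping each bin's lower edge to its count.
    """
    counts = Counter()
    for value in data:
        # Lower edge of the bin that contains this value.
        lower_edge = (value // bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Synthetic stand-in for the Golden Retriever weights (assumed parameters).
rng = random.Random(42)
weights = [rng.gauss(64.53, 3.05) for _ in range(500)]

# Coarse bins keep the overall shape; very fine bins fragment it.
for width in (2.0, 0.1):
    bins = histogram(weights, width)
    print(f"bin width {width}: {len(bins)} bins")

# Text rendering of the coarse histogram: one '#' per 5 items.
for lower_edge, count in histogram(weights, 2.0).items():
    print(f"{lower_edge:5.1f} | {'#' * (count // 5)}")
```

With the wider bins the printed bars rise toward the middle and fall away on both sides, which is the bell shape the notes describe.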

Normal Distribution

  • Normal Distribution Formula:

    $$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$

    • Variables involved:
      • μ (mu): Mean (center of the bell curve).
      • σ (sigma): Standard Deviation.
      • π (pi): Constant (approximately 3.14).
      • e: Euler's number (approximately 2.718).
      • X: Variable input on the x-axis.
  • PDF (Probability Density Function):

    • Gives the relative likelihood (density) of each X value; the height of the curve is not itself a probability.
    • Mean (μ) = 64.53, Standard Deviation (σ) = 3.05.
    • A larger σ produces a wider, flatter bell curve.
    • Moving μ shifts the curve left or right.
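
As a rough sketch, the PDF can be written out term by term from the variables listed above and compared against Python's standard-library `statistics.NormalDist`. The function name `normal_pdf` is a hypothetical helper chosen for this example; μ and σ are the values quoted in the notes.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Evaluate the normal PDF at x, written out term by term."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)

mu, sigma = 64.53, 3.05

# The hand-written formula should agree with the standard library.
peak = normal_pdf(mu, mu, sigma)
print(peak, NormalDist(mu, sigma).pdf(mu))

# A larger sigma widens the curve, so its peak is lower.
print(normal_pdf(mu, mu, 2 * sigma) < peak)

# Shifting mu moves the curve: same peak height, just at a new x.
print(math.isclose(normal_pdf(mu + 5, mu + 5, sigma), peak))
```

The last two checks mirror the points about σ and μ: doubling σ flattens the peak, while changing μ only relocates it.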

Properties of Normal Distribution

  • Area under the entire curve = 1.0.
  • Tails approach the x-axis but never reach it.
  • Probabilities are meaningful for ranges rather than single values: for a continuous variable, the area over any single exact value is zero.
    • Example probabilities:
      • Area between 69 and 70: 0.03 (3%).
      • Area between 68 and 71: 0.11 (11%).
      • Area between 64 and 77: 0.57 (57%).
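
One way to check these areas is to integrate the PDF numerically. The sketch below uses a simple trapezoid rule with the μ and σ from the notes; the helper names `pdf` and `area` and the step count are assumptions made for illustration, not the lecture's method.

```python
import math

MU, SIGMA = 64.53, 3.05  # parameters quoted in the notes

def pdf(x):
    """Normal PDF with the fixed parameters above."""
    return math.exp(-0.5 * ((x - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

def area(lo, hi, steps=10_000):
    """Trapezoid-rule approximation of the area under the PDF from lo to hi."""
    width = (hi - lo) / steps
    total = 0.5 * (pdf(lo) + pdf(hi))
    for i in range(1, steps):
        total += pdf(lo + i * width)
    return total * width

# Ten standard deviations either side captures essentially the whole curve.
print(round(area(MU - 10 * SIGMA, MU + 10 * SIGMA), 4))

# The three example ranges from the notes.
for lo, hi in [(69, 70), (68, 71), (64, 77)]:
    print(lo, hi, round(area(lo, hi), 2))

# A zero-width range has zero area: a single exact value has probability 0.
print(area(70, 70))
```

Rounded to two decimals, the three ranges should reproduce the 0.03, 0.11, and 0.57 listed above.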

Cumulative Distribution Function (CDF)

  • CDF: Gives the area under the PDF from negative infinity up to a given x value, i.e. the probability that X is at most x.
  • For X ≤ 70:
    • Area = 0.964.
  • To find area between two values (e.g., 65 and 70):
    • Calculate area up to 70 and area up to 65.
    • Area between 65 and 70 = Area(70) - Area(65) = 0.403.
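
The subtraction above can be sketched with `statistics.NormalDist` from Python's standard library, assuming the μ = 64.53 and σ = 3.05 quoted earlier. Because those parameters are rounded, the last digit may differ slightly from the figures in the notes.

```python
from statistics import NormalDist

# Distribution with the parameters quoted in the notes.
weights = NormalDist(mu=64.53, sigma=3.05)

# Area from negative infinity up to each x value.
area_up_to_70 = weights.cdf(70)
area_up_to_65 = weights.cdf(65)

# Area between the two values is the difference of the two CDF lookups.
between = area_up_to_70 - area_up_to_65

print(round(area_up_to_70, 3))
print(round(between, 3))
```

Two CDF lookups and one subtraction replace any explicit integration over the range.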

Inverse CDF (PPF)

  • Inverse CDF (PPF, percent point function): Reverses the CDF lookup, mapping a cumulative area back to its x value.
  • Useful for finding the x value that corresponds to a specific area (e.g., 0.75).
  • Simply input the probability into the inverse CDF to get the x value.
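
A minimal sketch of the inverse lookup, again assuming μ = 64.53 and σ = 3.05. Python's standard library calls this `inv_cdf`; the name PPF comes from libraries such as SciPy, where the equivalent call is `scipy.stats.norm.ppf`.

```python
from statistics import NormalDist

# Distribution with the parameters quoted in the notes.
weights = NormalDist(mu=64.53, sigma=3.05)

# x value below which 75% of the area lies.
x_75 = weights.inv_cdf(0.75)
print(round(x_75, 2))

# Round trip: feeding that x back into the CDF recovers the area.
print(round(weights.cdf(x_75), 4))

# Half the area lies below the mean, so the 0.5 lookup returns mu.
print(weights.inv_cdf(0.5))
```

The round trip shows that the CDF and its inverse undo each other, which is what makes the "inverted lookup" possible.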

Conclusion

  • Additional topics related to normal distribution:
    • Central Limit Theorem.
    • Hypothesis Testing.
    • Confidence Intervals.
  • Encouragement to subscribe for more content and reference to related books:
    • "Essential Math for Data Science"
    • "Getting Started with SQL"

Thank you for attending the lecture!