Statistics for Data Science - Lecture 1

May 30, 2024

Statistics for Data Science-1 Lecture Notes

Introduction

  • Course Overview: A foundation course in statistics tailored for beginners.
  • Key Objectives: Learning to create and summarize data using graphical and numerical techniques; understanding uncertainty and probability; focusing on applications over theory.
  • Target Audience: Anyone with class 10 level mathematics.

Key Learning Objectives

  • Create and manipulate datasets.
  • Present and describe data using appropriate graphical and numerical techniques.
  • Understand uncertainty through probability and the application of random variables.

Course Structure

  • Duration: 12 weeks, divided into 3 modules:
    • Basics of Data and Summarization (Weeks 1-4)
    • Introduction to Probability (Weeks 5-7)
    • Random Variables and Distributions (Weeks 8-12)

Week-by-Week Breakdown

Weeks 1-4: Basics of Data

  1. Week 1: Introduction to Data
    • Understanding data collection, variables, and observations.
    • Classifying data: Quantitative vs Qualitative, Numerical vs Categorical.
  2. Week 2: Categorizing and Summarizing Categorical Data
    • Framing questions and finding answers from data.
    • Using frequency tables and appropriate graphical techniques.
  3. Week 3: Summarizing Numerical Data
    • Numerical summaries such as the mean and measures of variability.
    • Graphical summaries like histograms and box plots.
  4. Week 4: Associations Between Variables
    • Understanding relationships between variables using contingency tables and scatter plots.
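
As a rough illustration of the summaries covered in Weeks 2-3, the sketch below builds a frequency table for a categorical variable and computes a few numerical summaries using only the Python standard library. The variable names and all values are hypothetical and were not part of the lecture.

```python
from collections import Counter
import statistics

# Hypothetical sample (invented values): one categorical and one numerical variable
boards = ["CBSE", "State", "CBSE", "ICSE", "State", "CBSE"]
marks = [72, 85, 90, 66, 78, 85]

# Frequency table and relative frequencies for the categorical variable
freq = Counter(boards)
rel_freq = {board: count / len(boards) for board, count in freq.items()}

# Numerical summaries: a measure of centre and a measure of spread
mean_marks = statistics.mean(marks)
median_marks = statistics.median(marks)
sd_marks = statistics.stdev(marks)  # sample standard deviation

print("Frequencies:", dict(freq))
print("Relative frequencies:", rel_freq)
print("Mean:", round(mean_marks, 2), "Median:", median_marks, "SD:", round(sd_marks, 2))
```

The same counts could be drawn as a bar chart and the marks as a histogram or box plot; the plotting step is left out here to keep the sketch free of external libraries.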

Weeks 5-7: Introduction to Probability

  1. Week 5: Principles of Counting
    • Understanding permutations and combinations for real-life applications.
  2. Weeks 6-7: Basic Probability Concepts
    • Uncertainty in real life and introduction to set algebra.
    • Understanding simple/compound events, mutually exclusive events, and independent events.
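
The counting and event ideas listed for Weeks 5-7 can be sketched in a few lines of Python. The two-dice sample space and the events A, B, and C are assumptions chosen for illustration; they were not taken from the lecture.

```python
import math
from fractions import Fraction
from itertools import product

# Permutations: ordered arrangements of 3 items chosen from 5
n_perm = math.perm(5, 3)
# Combinations: unordered selections of 3 items from 5
n_comb = math.comb(5, 3)

# Sample space for rolling two fair dice: 36 equally likely outcomes
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if o[0] == 6}          # first die shows 6
B = {o for o in sample_space if o[1] == 6}          # second die shows 6
C = {o for o in sample_space if o[0] + o[1] == 2}   # the sum is 2

# A and B are independent if P(A and B) equals P(A) * P(B)
independent = prob(A & B) == prob(A) * prob(B)
# A and C are mutually exclusive if they share no outcome
mutually_exclusive = len(A & C) == 0

print(n_perm, n_comb)
print(prob(A), prob(A & B), prob(A | B))
print(independent, mutually_exclusive)
```

Representing events as Python sets makes the link to set algebra direct: `&` is intersection and `|` is union.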

Weeks 8-12: Random Variables and Distributions

  1. Weeks 8-10: Discrete Random Variables
    • Concept of random variables, expectation, and variance.
    • Binomial distribution and its applications (e.g., guessing in MCQ exams).
  2. Weeks 11-12: Continuous Random Variables
    • Concepts of continuous random variables and the probability density function.
    • Focus on the normal distribution and the empirical rule.
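
To make the two headline distributions concrete, here is a minimal sketch assuming a hypothetical exam of 10 multiple-choice questions with 4 options each, answered purely by guessing. The exam size, the number of options, and the helper `binomial_pmf` are illustrative assumptions, not details from the lecture. The second part checks the empirical rule against the standard normal distribution.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical exam: 10 questions, 4 options each, every answer guessed
n, p = 10, 0.25
expected_correct = n * p            # expectation of a binomial: n * p
variance_correct = n * p * (1 - p)  # variance of a binomial: n * p * (1 - p)

# Probability of getting at least 5 answers right by guessing alone
p_at_least_5 = sum(binomial_pmf(k, n, p) for k in range(5, n + 1))

# Empirical rule: share of a normal distribution within 1, 2, and 3
# standard deviations of the mean (roughly 68%, 95%, and 99.7%)
std_normal = NormalDist(mu=0, sigma=1)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(expected_correct, variance_correct, round(p_at_least_5, 4))
print({k: round(v, 4) for k, v in within.items()})
```

Under these assumptions a guesser expects 2.5 correct answers, and scoring 5 or more happens a little under 8% of the time.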

Summary and Expectations

  • By the end of the course, students should:
    • Understand and manipulate data sets.
    • Classify and summarize variables.
    • Formulate questions based on data and find appropriate summaries.
    • Understand basic probability and its applications.
    • Differentiate between and work with discrete and continuous random variables.

Conclusion

  • The course focuses on a practical understanding of statistics and its applications rather than theoretical proofs.
  • Students should gain a strong conceptual foundation in dealing with data and uncertainty.

Miscellaneous Information

Example discussed in the lecture

  • University admissions data:
    • Captured fields: Name, gender, DOB, class 10 and 12 marks, board, mobile number.
    • Questions that can be asked of such a data set include: the proportion of female students, the distribution by region, average marks, etc.
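
A minimal sketch of how such questions might be answered in Python follows. The records are invented for illustration, the field names are assumptions loosely based on the captured fields above, and `board` is used as a stand-in grouping variable because the notes do not list a region field.

```python
from collections import Counter
from statistics import mean

# Hypothetical admissions records; every value here is invented
applicants = [
    {"name": "A", "gender": "F", "class10": 88, "class12": 91, "board": "CBSE"},
    {"name": "B", "gender": "M", "class10": 74, "class12": 76, "board": "State"},
    {"name": "C", "gender": "F", "class10": 81, "class12": 84, "board": "CBSE"},
    {"name": "D", "gender": "F", "class10": 70, "class12": 69, "board": "ICSE"},
    {"name": "E", "gender": "M", "class10": 79, "class12": 80, "board": "State"},
]

# Proportion of female students
n_female = sum(1 for a in applicants if a["gender"] == "F")
prop_female = n_female / len(applicants)

# Average class 12 marks
avg_class12 = mean(a["class12"] for a in applicants)

# Distribution of students by board (stand-in for region)
by_board = Counter(a["board"] for a in applicants)

print("Proportion female:", prop_female)
print("Average class 12 marks:", avg_class12)
print("Students by board:", dict(by_board))
```

Each question maps to one summary: a proportion for a categorical variable, a mean for a numerical variable, and a frequency table for a grouping variable.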