-
Introduction to Data Science Statistics
- Differences between Descriptive and Inferential Statistics
-
Descriptive Statistics Topics
- Measures of Central Tendency
- Measures of Dispersion
- Data Summarizing Tools
- Histograms
- Box Plot (Box-and-Whisker Plot)
-
Detailed Breakdown of Descriptive Statistics
- Histograms: Understanding PDF, CDF, and creation techniques.
- Probability & Permutations:
- Importance in Data Science
- Mean, Median, Mode
- Variance, Standard Deviation
- Distributions:
- Gaussian (Normal)
- Log-Normal
- Binomial
- Bernoulli
- Pareto (Power Law)
- Standard Normal
-
Techniques in Standard Normal Distribution
- Transformation
- Standardization
- QQ Plot
- Determining Normality of a Distribution
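
Standardization can be illustrated with a minimal sketch. It assumes the usual z-score formula, z = (x - mu) / sigma, and uses a small made-up data set (the values are hypothetical and not taken from the course material).

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(data)       # mean of the data (5)
sigma = statistics.pstdev(data)  # population standard deviation (2.0)

# z = (x - mu) / sigma for every observation
z_scores = [(x - mu) / sigma for x in data]

print(z_scores)
# After standardization the values have mean 0 and standard deviation 1.
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```

Standardizing only rescales the data; it does not make a skewed distribution normal, which is why a QQ plot or another normality check is still needed.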
-
Inferential Statistics
- Hypothesis Testing: Null and Alternative Hypotheses
- P-Values
- Confidence Intervals
- Z-Test, T-Test, Chi-Square Test
- ANOVA (F-Test)
- Importance of Hypothesis Testing
- Defining P-Value and Using Z and T Tables
-
Introduction to Statistics
- Definitions: Collection, Organization, and Analysis of Data
- Importance in Decision Making
- Data: Facts/Information that can be Measured
-
Types of Statistics: Descriptive vs. Inferential
- Descriptive Statistics: Organizing and Summarizing Data
- Example: Average age of students
- Inferential Statistics: Making Conclusions from Data
- Example: IQ of class vs. college
-
Population and Sample Concepts
- Population: Complete data set
- Sample: Subset of the population
- Sampling Techniques
- Simple Random Sampling
- Stratified Sampling
- Systematic Sampling
- Convenience Sampling
-
Variables
- Types
- Quantitative: Can be measured numerically (e.g., Age, Height, Weight)
- Qualitative (Categorical): Based on characteristics (e.g., Gender, Blood Group)
- Quantitative Variables
- Discrete: Whole numbers (e.g., Number of Bank Accounts)
- Continuous: Any value (e.g., Height, Weight)
-
Variable Measurement Levels
- Nominal: Categorical (e.g., Color, Gender)
- Ordinal: Rank-Ordered (e.g., Ranks)
- Interval: Order and Differences Matter, No Natural Zero (e.g., Temperature in Fahrenheit)
- Ratio: Interval with a Natural Zero (e.g., Height, Weight)
-
Frequency Distribution
- Organizing data into a frequency table
- Cumulative Frequency
- Example: Counting flower types
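
A frequency table with cumulative frequencies might be built as in the sketch below. The flower names are hypothetical stand-ins for the flower-type example, and only the Python standard library is used.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical observations of flower types.
flowers = ["rose", "lily", "rose", "tulip", "lily", "rose"]

freq = Counter(flowers)       # frequency of each category
ordered = freq.most_common()  # (type, count) pairs, most frequent first

# Cumulative frequency: running total of the counts in table order.
cumulative = list(accumulate(count for _, count in ordered))

for (name, count), running in zip(ordered, cumulative):
    print(f"{name:<6} {count:>3} {running:>3}")
```

The last cumulative value always equals the total number of observations, which is a quick sanity check on the table.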
-
Bar Graph vs. Histogram
- Bar Graph: Categorical or discrete variables
- Histogram: Continuous variables
-
Probability and Sampling Techniques
- Determining the likelihood of events
- Various Sampling Techniques
- Simple Random Sampling
- Stratified Sampling
- Systematic Sampling
- Convenience Sampling
-
Measures of Central Tendency: Mean, Median, Mode
- How to compute each measure and how outliers affect it
- When to use each measure
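
The effect of an outlier on each measure can be seen in a short sketch. The ages below are made up for illustration, with one deliberately extreme value added.

```python
import statistics

# Hypothetical ages; 95 is a deliberate outlier.
ages = [21, 22, 22, 23, 24]
ages_with_outlier = ages + [95]

mean_before = statistics.mean(ages)                # 22.4
mean_after = statistics.mean(ages_with_outlier)    # 34.5, pulled up by the outlier

median_before = statistics.median(ages)                # 22
median_after = statistics.median(ages_with_outlier)    # 22.5, barely moves

mode_after = statistics.mode(ages_with_outlier)    # 22, unaffected

print(mean_before, mean_after)
print(median_before, median_after)
print(mode_after)
```

The mean shifts by about 12 years while the median moves by half a year, which is why the median is usually preferred for skewed data or data with outliers, and the mode for categorical data.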
-
Measures of Dispersion: Variance and Standard Deviation
- Concept and importance of spread
- Calculations involving Population and Sample variance
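
The difference between population and sample variance is only the divisor: N for a population, n - 1 for a sample. A minimal sketch with made-up values:

```python
import statistics

# Hypothetical data; the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)    # 32 / 8 = 4   (divide by N)
sample_var = statistics.variance(data)  # 32 / 7       (divide by n - 1)

pop_std = statistics.pstdev(data)       # square root of the population variance
sample_std = statistics.stdev(data)     # square root of the sample variance

print(pop_var, sample_var)
print(pop_std, sample_std)
```

The sample version is slightly larger because dividing by n - 1 corrects for the tendency of a sample to underestimate the spread of the population it came from.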
-
Percentiles and Quartiles for Outlier Detection
- Examples of outlier detection
- Calculation of Percentiles, Interquartile Range (IQR), and Fences
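
Outlier detection with the IQR and fences can be sketched as follows. The data are hypothetical, and the quartiles come from `statistics.quantiles` with its default method; other tools use slightly different percentile conventions, so quartile values may differ a little elsewhere. The 1.5 × IQR multiplier is the common convention, assumed here.

```python
import statistics

# Hypothetical data with one suspiciously large value.
data = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 27]

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                  # interquartile range
lower_fence = q1 - 1.5 * iqr   # values below this are flagged
upper_fence = q3 + 1.5 * iqr   # values above this are flagged

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)
print(lower_fence, upper_fence)
print(outliers)
```

With these values Q1 is 2 and Q3 is 7, so the fences are -5.5 and 14.5 and only 27 is flagged.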
-
Hypothesis Testing and Types of Errors
- Type I and Type II Errors
- One-Tailed and Two-Tailed Tests
-
Confusion Matrix
- True Positive, True Negative, False Positive, False Negative
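
The four cells of a confusion matrix can be counted directly from actual and predicted labels. The labels below are made up, with 1 as the positive class and 0 as the negative class.

```python
# Hypothetical binary labels: 1 = positive, 0 = negative.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(actual, predicted))

tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

In hypothesis-testing terms, a false positive corresponds to a Type I error (rejecting a true null hypothesis) and a false negative to a Type II error (failing to reject a false one).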
-
Distributions and Tests with Examples
- Binomial, Bernoulli, Poisson, and more
- Central Limit Theorem
- Practical examples and calculations
- Z Test, T Test, Chi-Square Test, and ANOVA Test
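
As one worked case, a one-sample Z test might look like the sketch below. All numbers are hypothetical: a population mean of 100 with a known standard deviation of 15, and a sample of 36 with mean 103. The test assumes the population standard deviation is known; when it is not, a T test is the usual choice.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs for illustration only.
pop_mean = 100     # mean under the null hypothesis
pop_sd = 15        # known population standard deviation
sample_mean = 103
n = 36
alpha = 0.05       # significance level

standard_error = pop_sd / sqrt(n)               # 15 / 6 = 2.5
z = (sample_mean - pop_mean) / standard_error   # 3 / 2.5 = 1.2

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))  # H1: mean != 100
p_one_tailed = 1 - std_normal.cdf(z)             # H1: mean > 100

reject_null = p_two_tailed < alpha

print(round(z, 2), round(p_two_tailed, 4), reject_null)
```

Here the two-tailed p-value is roughly 0.23, well above 0.05, so the null hypothesis is not rejected. The one-tailed p-value is half the two-tailed one because it considers only one direction of deviation.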
-
Covariance and Correlation
- Definitions and formulas
- Real-world examples
- Calculating covariance and correlation using Python
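
Covariance and Pearson correlation can be computed from their definitions, as in this sketch. The study-hours and score values are hypothetical, the function names are illustrative, and the sample (n - 1) form of covariance is assumed.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson_correlation(xs, ys):
    """Covariance scaled by both standard deviations; always in [-1, 1]."""
    return sample_covariance(xs, ys) / sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )

# Hypothetical data: hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

cov = sample_covariance(hours, scores)     # 41 / 4 = 10.25
corr = pearson_correlation(hours, scores)  # close to +1

print(cov, round(corr, 4))
```

Covariance depends on the units of both variables, so its size is hard to interpret. Correlation removes the units, which makes values comparable across data sets. With NumPy, `np.cov` and `np.corrcoef` return the corresponding matrices.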
-
Tools and Implementations in Python
- Google Colab for Z Test, T Test, Chi-Square Test
- Libraries: Pandas, Seaborn, NumPy