Overview
This lecture explains how to calculate and interpret variance, standard deviation, typical values, and outliers in a dataset with a normal (bell-shaped) distribution.
Variance and Standard Deviation
- Variance is the average of the squared deviations from the mean (for a sample, the sum of squared deviations divided by n − 1); it equals the square of the standard deviation.
- Standard deviation is the square root of the variance and represents the typical distance of values from the mean.
- In the brick weight data example, the standard deviation was approximately 14.7 kilograms.
- Standard deviation is rounded to one more decimal place than the original data.
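The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's worked example: the data values are hypothetical, and the sketch assumes the sample form of variance (dividing by n − 1), which matches the s² notation in the key terms.

```python
import math

# Hypothetical sample for illustration only (not the lecture's brick data).
weights_kg = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(weights_kg)
mean = sum(weights_kg) / n

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in weights_kg]

# Assumption: sample variance divides by n - 1;
# a population variance would divide by n instead.
variance = sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# The data are whole numbers, so round to one more decimal place.
std_dev_rounded = round(std_dev, 1)

print(f"mean = {mean}, variance = {variance:.2f}, standard deviation = {std_dev_rounded}")
```

The standard library's `statistics.variance` and `statistics.stdev` compute the same sample quantities and could replace the manual steps.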
Identifying Typical Values
- Typical values fall within one standard deviation above or below the mean.
- For normal data, about 68% of values lie within one standard deviation of the mean.
- Calculation: Mean ± Standard Deviation (e.g., 36.1 kg ± 14.7 kg yields 21.4 kg to 50.8 kg for typical brick weights).
- Visually, these are the middle data points in the plot.
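As a quick check of the arithmetic, the typical range can be reproduced from the mean and standard deviation given in the example. The variable and function names here are illustrative choices, not part of the lecture.

```python
# Values taken from the brick weight example.
mean_kg = 36.1
std_dev_kg = 14.7

# Typical range: mean minus and plus one standard deviation.
# Rounding to one decimal place avoids floating-point noise.
typical_low_kg = round(mean_kg - std_dev_kg, 1)
typical_high_kg = round(mean_kg + std_dev_kg, 1)


def is_typical(weight_kg):
    """Return True if the weight lies within one standard deviation of the mean."""
    return typical_low_kg <= weight_kg <= typical_high_kg


print(f"Typical brick weights: {typical_low_kg} kg to {typical_high_kg} kg")
```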
Identifying Outliers
- Outliers are values more than two standard deviations away from the mean.
- High outliers: ≥ Mean + 2 × Standard Deviation (top 2.5%).
- Low outliers: ≤ Mean − 2 × Standard Deviation (bottom 2.5%).
- In the example, cutoffs were 65.5 kg (high) and 6.7 kg (low).
- Bricks weighing 70 kg (high) and 3 kg (low) were considered outliers.
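The cutoffs above can be verified with a short sketch. It assumes the comparison includes the cutoff itself, following the inequalities in the list, and the function name is a hypothetical helper chosen for illustration.

```python
# Values taken from the brick weight example.
mean_kg = 36.1
std_dev_kg = 14.7

# Cutoffs sit two standard deviations above and below the mean.
high_cutoff_kg = round(mean_kg + 2 * std_dev_kg, 1)
low_cutoff_kg = round(mean_kg - 2 * std_dev_kg, 1)


def outlier_kind(weight_kg):
    """Return 'high outlier', 'low outlier', or None for a given weight."""
    if weight_kg >= high_cutoff_kg:
        return "high outlier"
    if weight_kg <= low_cutoff_kg:
        return "low outlier"
    return None


# The two bricks flagged in the example.
for weight in (70, 3):
    print(f"{weight} kg: {outlier_kind(weight)}")
```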
Data Distribution Zones
- Data is not just typical values and outliers; there is a "could happen" zone between them.
- Most data (middle 68%) is typical; outliers are only a small portion at the extremes.
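- Values between one and two standard deviations from the mean (in the example, 6.7 kg to 21.4 kg and 50.8 kg to 65.5 kg) are neither typical nor outliers.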
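One way to make the three zones concrete is to classify each value by its distance from the mean. The sketch below is an assumption about how this might be coded, not a method from the lecture; the function name is hypothetical, and apart from 70 kg and 3 kg the sample weights are made up for illustration.

```python
def classify(value, mean, std_dev):
    """Place a value in one of three zones by its distance from the mean."""
    distance = abs(value - mean)
    if distance <= std_dev:
        return "typical"
    if distance >= 2 * std_dev:
        return "outlier"
    # Between one and two standard deviations from the mean.
    return "could happen"


# Mean and standard deviation from the brick weight example.
mean_kg = 36.1
std_dev_kg = 14.7

# 70 and 3 come from the example; the other weights are hypothetical.
for weight in (3, 10, 30, 45, 60, 70):
    print(f"{weight} kg: {classify(weight, mean_kg, std_dev_kg)}")
```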
Data Analysis Report Summary
- The dataset was normal (bell-shaped) with a mean of 36.1 kg and a standard deviation of 14.7 kg.
- Typical values ranged from 21.4 kg to 50.8 kg.
- There were two outliers: 70 kg (high) and 3 kg (low).
Key Terms & Definitions
- Variance – The average of squared differences from the mean; symbolized as s².
- Standard Deviation – The square root of variance; measures spread from the mean.
- Mean – The average of all values in a dataset.
- Outlier – A value more than two standard deviations from the mean.
- Typical Value – A value within one standard deviation of the mean.
Action Items / Next Steps
- Practice calculating variance and standard deviation with sample data.
- Identify typical values and outliers in a given dataset.
- Prepare for discussion on the empirical rule (68-95-99.7%) in future lectures.