Lecture Notes on Data Variation and Standard Deviation

Introduction to Variation

Variation in a data set refers to how spread out the data is.
The mean alone does not describe a data set; a measure of spread is also needed to understand the data.
Example: Bank line wait times from three banks demonstrate different data spreads despite having the same mean wait time.
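
To make the bank example concrete, here is a minimal sketch. The wait times are invented for illustration and are not the data used in the lecture; the names bank_a, bank_b, and bank_c are likewise hypothetical. It shows three data sets that share a mean but cover very different intervals.

```python
from statistics import mean

# Hypothetical wait times in minutes (invented for illustration,
# not the lecture's actual bank data).
bank_a = [6.0, 6.0, 6.0, 6.0, 6.0]
bank_b = [4.0, 5.0, 6.0, 7.0, 8.0]
bank_c = [1.0, 2.0, 6.0, 10.0, 11.0]

for name, waits in [("A", bank_a), ("B", bank_b), ("C", bank_c)]:
    # Same mean for every bank, but the lowest and highest waits differ a lot.
    print(name, "mean:", mean(waits), "min:", min(waits), "max:", max(waits))
```

All three lists average 6 minutes, yet a customer at the third bank could wait anywhere from 1 to 11 minutes.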

Range
- Calculated as the highest value minus the lowest value.
- Pros: Easiest to calculate.
- Cons: Considers only two data points, so it does not represent the entire data set well.
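
A short sketch of the range calculation, using made-up data. The helper name data_range is an assumption for illustration. The two lists are chosen so that they have the same range even though their values are distributed differently, which illustrates the drawback listed above.

```python
def data_range(values):
    """Return the highest value minus the lowest value."""
    return max(values) - min(values)

# Hypothetical data: identical lowest and highest values,
# but the values in between are arranged very differently.
clustered = [1, 5, 5, 5, 5, 9]
spread_out = [1, 2, 4, 6, 8, 9]

print(data_range(clustered))
print(data_range(spread_out))
```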

Standard Deviation
- More useful than range because it takes every data value into account.
- Measures roughly the typical distance of data values from the mean.
- Properties:
  - Never negative.
  - Only zero if all values are the same.
  - Affected by outliers.
  - Symbol for sample standard deviation: S.
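
The properties above can be checked with a small sketch using Python's standard statistics module. The three lists are hypothetical values chosen only to illustrate each property.

```python
from statistics import stdev

# Hypothetical data sets chosen to illustrate the listed properties.
same = [5.0, 5.0, 5.0, 5.0]           # all values identical
typical = [4.0, 5.0, 5.0, 6.0]        # small spread
with_outlier = [4.0, 5.0, 5.0, 60.0]  # one extreme value

print(stdev(same))          # zero, because every value is the same
print(stdev(typical))       # small positive number
print(stdev(with_outlier))  # much larger, pulled up by the outlier
```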

Calculating the sample standard deviation involves:
- Finding the distance from each data value to the mean.
- Squaring the distances so that negative and positive distances do not cancel out.
- Averaging the squared distances (for a sample, dividing their sum by n-1 rather than n).
- Taking the square root of the result.
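
The steps above can be sketched in code as follows. The function name sample_std_dev and the data values are assumptions for illustration; the division by n-1 follows the sample convention these notes describe.

```python
import math

def sample_std_dev(values):
    """Sample standard deviation, computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    mean = sum(values) / n
    # Step 1: distance from each data value to the mean.
    deviations = [x - mean for x in values]
    # Step 2: square the distances so none are negative.
    squared = [d ** 2 for d in deviations]
    # Step 3: average the squared distances, dividing by n - 1 for a sample.
    variance = sum(squared) / (n - 1)
    # Step 4: take the square root.
    return math.sqrt(variance)

data = [2, 4, 4, 6, 9]  # hypothetical data with mean 5
print(sample_std_dev(data))
```

For this data the squared distances are 9, 1, 1, 1, and 16, which sum to 28; dividing by n-1 = 4 gives 7, so the result is the square root of 7.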
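
Two Formulas
1. Definition formula: calculate the mean first, then work with each value's distance from it.
2. Shortcut formula: works from the sum of the values and the sum of their squares, with no separate mean calculation (saves steps in large data sets).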
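
As a hedged sketch, the two approaches can be compared in code. This assumes the second formula is the usual computational shortcut, s = sqrt((n·Σx² − (Σx)²) / (n(n−1))); the notes do not spell it out, so treat that as an assumption. Function names and data are hypothetical.

```python
import math

def std_dev_with_mean(values):
    """Formula 1: compute the mean, then the squared distances from it."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def std_dev_shortcut(values):
    """Formula 2 (assumed shortcut): uses only the sum and sum of squares."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return math.sqrt((n * total_sq - total ** 2) / (n * (n - 1)))

data = [2, 4, 4, 6, 9]  # hypothetical data
print(std_dev_with_mean(data))
print(std_dev_shortcut(data))
```

Both functions give the same value for this data. In floating-point arithmetic the shortcut can lose precision when the values are very large relative to their spread, so the first form is usually the safer choice in software.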

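Population vs. Sample
Population standard deviation uses the Greek letter σ and divides by N (the total number of data points in the population).
Sample standard deviation uses S and divides by n-1.
The calculations differ because a sample only estimates the population: distances are measured from the sample mean rather than the true population mean, which tends to understate the spread, and dividing by n-1 instead of n compensates for this.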
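
A minimal sketch of the difference, using Python's standard statistics module, where pstdev divides by N and stdev divides by n-1. The data values are hypothetical.

```python
from statistics import pstdev, stdev

data = [2.0, 4.0, 4.0, 6.0, 9.0]  # hypothetical data

sigma = pstdev(data)  # treats data as a whole population: divides by N
s = stdev(data)       # treats data as a sample: divides by n - 1

print("population:", sigma)
print("sample:    ", s)
```

Because n-1 is smaller than N, the sample value is always at least as large as the population value computed from the same numbers.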

Empirical Rule (68-95-99.7 Rule)
Applies to data that is approximately normally distributed (bell-shaped).
- About 68% of data falls within 1 standard deviation of the mean.
- About 95% falls within 2 standard deviations.
- About 99.7% falls within 3 standard deviations.
The rule is used to separate usual from unusual data values: a value within two standard deviations of the mean is usual, and a value farther away is unusual.
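
The two-standard-deviation cutoff can be sketched as a small check. The function name is_unusual, the mean of 100, and the standard deviation of 15 are all hypothetical. Treating a value exactly two standard deviations away as usual is an assumption, since the notes do not address the boundary.

```python
def is_unusual(value, mean, std_dev):
    """True if value lies more than two standard deviations from the mean."""
    return abs(value - mean) > 2 * std_dev

# Hypothetical mean and standard deviation.
mean, std_dev = 100.0, 15.0
low, high = mean - 2 * std_dev, mean + 2 * std_dev
print("usual values fall between", low, "and", high)

for value in (125.0, 135.0, 68.0):
    label = "unusual" if is_unusual(value, mean, std_dev) else "usual"
    print(value, label)
```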

Understanding variation and standard deviation is fundamental in statistics.
These concepts aid in hypothesis testing and data analysis.
More advanced concepts like z-scores and normal distribution analysis will build on these foundations.