Understanding Data Variation and Standard Deviation

Sep 8, 2024

Lecture Notes on Data Variation and Standard Deviation

Introduction to Variation

  • Variation in a data set refers to how spread out the data is.
  • The mean alone does not describe a data set; a measure of spread is also needed.
  • Example: Bank line wait times from three banks demonstrate different data spreads despite having the same mean wait time.

Measures of Variation

  1. Range

    • Calculated as the highest value minus the lowest value.
    • Pros: Easiest to calculate.
    • Cons: Uses only two data points (the extremes), so it is sensitive to outliers and represents the data set poorly.
  2. Standard Deviation

    • Generally more useful than the range because it uses every data value.
    • Measures roughly how far data values typically lie from the mean.
    • Properties:
      • Never negative.
      • Only zero if all values are the same.
      • Affected by outliers.
      • Symbol for sample standard deviation: S.
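
To make the contrast between the two measures concrete, here is a minimal sketch in Python. The three lists of wait times are invented for illustration (the lecture's actual bank data is not reproduced in these notes), and `data_range` is a hypothetical helper name. All three lists have the same mean, yet their range and sample standard deviation differ.

```python
import statistics

# Hypothetical wait times in minutes; NOT the lecture's data.
bank_a = [6, 6, 6, 6, 6]    # no variation at all
bank_b = [4, 5, 6, 7, 8]    # moderate spread
bank_c = [1, 3, 6, 9, 11]   # wide spread

def data_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

for name, waits in [("A", bank_a), ("B", bank_b), ("C", bank_c)]:
    mean = statistics.mean(waits)
    spread = data_range(waits)
    s = statistics.stdev(waits)  # sample standard deviation (divides by n-1)
    print(f"Bank {name}: mean={mean}, range={spread}, s={s:.2f}")
```

Bank A illustrates the property that the standard deviation is zero only when every value is the same.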

Calculating Standard Deviation

  • Formula for sample standard deviation involves:
    • Finding the distance from each data value to the mean.
    • Squaring the distances so that positive and negative deviations do not cancel.
    • Averaging these squared distances (dividing by n-1 rather than n for a sample).
    • Taking the square root of the average.
  • Two Formulas
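    1. The definitional formula, which uses the mean.
    2. A shortcut formula that works from the sum of the values and the sum of their squares, with no separate mean calculation (saves steps in large data sets).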
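
The notes do not write either formula out, so the sketch below assumes the two standard textbook forms: the definitional form, sqrt(Σ(x - mean)² / (n-1)), and the shortcut form, sqrt((nΣx² - (Σx)²) / (n(n-1))). The function names and the sample data are hypothetical. Both functions should return the same value up to rounding.

```python
import math

def stdev_with_mean(values):
    """Definitional form: deviations from the mean, divided by n-1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

def stdev_shortcut(values):
    """Shortcut form: needs only n, the sum of x, and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

data = [4, 5, 6, 7, 8]  # hypothetical sample
print(stdev_with_mean(data), stdev_shortcut(data))
```

The shortcut form is convenient by hand, but in floating-point arithmetic it can lose precision when the values are large and close together, so the definitional form is usually the safer choice in code.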

Example Calculation

  • Calculating standard deviation using examples from bank wait times.
  • Step-by-step method:
    • List data values.
    • Calculate the mean.
    • Subtract the mean from each value and square the result.
    • Sum the squared differences.
    • Divide by n-1 (for a sample), then take the square root.
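
The steps above can be traced one at a time in a short script. The wait times used here are made up for the illustration and are not the lecture's bank data; the variable names are likewise only illustrative.

```python
import math

# Step 1: list the data values (hypothetical wait times in minutes).
waits = [4, 7, 7, 2, 10]
n = len(waits)

# Step 2: calculate the mean.
mean = sum(waits) / n                                # 6.0

# Step 3: subtract the mean from each value and square the result.
squared_diffs = [(x - mean) ** 2 for x in waits]     # [4.0, 1.0, 1.0, 16.0, 16.0]

# Step 4: sum the squared differences.
total = sum(squared_diffs)                           # 38.0

# Step 5: divide by n-1 (sample), then take the square root.
sample_variance = total / (n - 1)                    # 9.5
s = math.sqrt(sample_variance)                       # about 3.08

print(f"mean={mean}, sum of squares={total}, s={s:.2f}")
```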

Population vs Sample Standard Deviation

  • Population standard deviation uses the Greek letter σ and divides by N (total number of data points).
  • Sample standard deviation uses S and divides by n-1.
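  • The sample formula divides by n-1 instead of n because deviations are measured from the sample mean, which tends to understate the true spread; dividing by n-1 corrects for that underestimate.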
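
As a quick check of the difference, Python's standard `statistics` module provides both versions: `pstdev` divides by N and `stdev` divides by n-1. The data below is hypothetical. For the same values, the sample result is always at least as large as the population result.

```python
import statistics

data = [4, 7, 7, 2, 10]  # hypothetical values

# Treating the data as an entire population: divide by N.
sigma = statistics.pstdev(data)

# Treating the data as a sample from a larger population: divide by n-1.
s = statistics.stdev(data)

print(f"population sigma = {sigma:.4f}")
print(f"sample s         = {s:.4f}")
```

The gap between the two shrinks as the number of data points grows, since n and n-1 become nearly equal.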

Variance

  • Variance is simply the square of the standard deviation.
  • Symbols: S^2 for sample variance and σ^2 for population variance.
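
A brief sketch of this relationship, using the standard `statistics` module and hypothetical data: squaring each standard deviation reproduces the matching variance. Note that variance is expressed in squared units (for example, minutes squared), which is one reason the standard deviation is usually easier to interpret.

```python
import statistics

data = [4, 7, 7, 2, 10]  # hypothetical values

sample_var = statistics.variance(data)        # S^2, divides by n-1
population_var = statistics.pvariance(data)   # sigma^2, divides by N

# Squaring the standard deviations gives the same numbers (up to rounding).
print(sample_var, statistics.stdev(data) ** 2)
print(population_var, statistics.pstdev(data) ** 2)
```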

Using a Calculator

  • Explanation of how to use a statistical calculator to find the standard deviation.
  • Understanding outputs: S for sample, σ for population.

Empirical Rule

  • Applies to data that is approximately normally distributed (bell-shaped).
  • 68-95-99.7 rule:
    • About 68% of data falls within 1 standard deviation of the mean.
    • About 95% falls within 2 standard deviations.
    • About 99.7% falls within 3 standard deviations.
  • Used to separate usual from unusual data values.
  • A data value within 2 standard deviations of the mean is usual; a value outside that range is unusual.
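
The two-standard-deviation rule of thumb can be sketched as a pair of small helpers. The names `usual_range` and `is_unusual`, and the mean of 100 with standard deviation 15, are assumptions made for this illustration. The last part uses `statistics.NormalDist` (Python 3.8+) to check the 68-95-99.7 percentages against an exact normal distribution.

```python
from statistics import NormalDist

def usual_range(mean, sd):
    """Return (low, high): the interval within 2 standard deviations of the mean."""
    return (mean - 2 * sd, mean + 2 * sd)

def is_unusual(value, mean, sd):
    """True if the value lies more than 2 standard deviations from the mean."""
    low, high = usual_range(mean, sd)
    return value < low or value > high

# Hypothetical distribution: mean 100, standard deviation 15.
print(usual_range(100, 15))       # interval of usual values
print(is_unusual(120, 100, 15))   # inside the interval
print(is_unusual(135, 100, 15))   # outside the interval

# Share of a normal distribution within k standard deviations of the mean;
# the results come out close to 68%, 95%, and 99.7%.
std_normal = NormalDist()
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} sd: {share * 100:.1f}%")
```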

Coefficient of Variation

  • Used to compare variability between different data sets.
  • Calculated as (Standard Deviation / Mean) * 100%.
  • Expresses the standard deviation as a percentage of the mean, so data sets with different units or scales can be compared.
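
A minimal sketch of the calculation follows. The function name and the height and weight figures are invented for illustration. Because the result is a percentage of the mean, the two data sets can be compared even though one is in centimeters and the other in kilograms.

```python
def coefficient_of_variation(sd, mean):
    """Return (standard deviation / mean) * 100, as a percentage."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return sd / mean * 100

# Hypothetical summary statistics.
height_cv = coefficient_of_variation(sd=8.5, mean=170)   # heights in cm
weight_cv = coefficient_of_variation(sd=10.5, mean=70)   # weights in kg

print(f"height CV = {height_cv:.1f}%")
print(f"weight CV = {weight_cv:.1f}%")
```

In this made-up case the weights vary more relative to their mean than the heights do, which a direct comparison of 8.5 cm against 10.5 kg could not show.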

Conclusion

  • Understanding variation and standard deviation is fundamental in statistics.
  • These concepts aid in hypothesis testing and data analysis.
  • More advanced concepts like z-scores and normal distribution analysis will build on these foundations.