
Understanding Sample Mean, Standard Deviation, and Outliers

Mar 28, 2025

Lecture Notes on Sample Mean, Standard Deviation, and Outliers

Key Concepts

  • We have data on the distances between 46 retail stores and a central distribution center.
  • The objective is to calculate the sample mean, sample standard deviation, and identify outliers in the data set.

Sample Mean Calculation

  1. Data Handling:

    • Data consists of distances in miles from 46 stores to a distribution center.
    • This is a sample, not a population.
  2. Process:

    • Sum up all data values.
    • Divide the sum by the number of data points (n = 46) to get the mean.
    • Store the result as X̄ (the sample mean).
  3. Result:

    • Sample mean (X̄) = 197.2826 miles.
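
The process above can be sketched in Python. This is only an illustration: the list `distances` holds made-up values, since the lecture's 46 distances are not reproduced in these notes, and the variable names are assumptions.

```python
from statistics import mean

# Hypothetical distances in miles (NOT the lecture's data set).
distances = [150.0, 180.0, 200.0, 210.0, 260.0]

n = len(distances)           # number of data points
x_bar = sum(distances) / n   # sum all values, then divide by n

print(f"n = {n}, sample mean = {x_bar:.4f} miles")

# The standard library computes the same quantity.
print(f"statistics.mean gives {mean(distances):.4f} miles")
```

With the lecture's data, the same two steps (sum, then divide by n = 46) would yield the reported 197.2826 miles.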

Sample Standard Deviation Calculation

  1. Process:

    • Calculate the difference between each data value and the mean (X - X̄).
    • Square each difference so that positive and negative deviations do not cancel out.
    • Sum these squared differences.
    • Divide by n - 1 rather than n, because the data is a sample. This adjustment is known as Bessel's correction.
    • Take the square root of the result to find the standard deviation.
  2. Result:

    • Sample standard deviation (s) = 32.4884 miles.
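
As a rough sketch, the steps above translate to Python as follows. The data values are hypothetical (the lecture's data set is not listed here), and the variable names are chosen only for illustration.

```python
import math
from statistics import stdev

# Hypothetical distances in miles (NOT the lecture's data set).
distances = [150.0, 180.0, 200.0, 210.0, 260.0]

n = len(distances)
x_bar = sum(distances) / n

# Difference from the mean for each value, squared.
squared_diffs = [(x - x_bar) ** 2 for x in distances]

# Divide by n - 1 (Bessel's correction) because this is a sample.
sample_variance = sum(squared_diffs) / (n - 1)

# The standard deviation is the square root of the variance.
s = math.sqrt(sample_variance)

print(f"sample variance = {sample_variance:.4f}")
print(f"sample standard deviation = {s:.4f} miles")

# statistics.stdev also divides by n - 1, so it should agree.
print(f"statistics.stdev gives {stdev(distances):.4f} miles")
```

Note that `statistics.stdev` is the sample version; `statistics.pstdev` divides by n and would be appropriate only for a full population.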

Finding Outliers

  1. Definition:

    • In these notes, an outlier is a data point that lies more than two standard deviations from the mean.
  2. Bounds Calculation:

    • Upper bound = X̄ + 2s = 262.2594 miles.
    • Lower bound = X̄ - 2s = 132.3058 miles.
  3. Outliers Identification:

    • Identify data values not between the bounds (132.3058 and 262.2594).
  4. Results:

    • Lower end: 132 is an outlier (below the lower bound).
    • Upper end: 277 is an outlier (above the upper bound).
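
The bounds can be checked from the reported mean and standard deviation, as in the sketch below. The helper `two_sd_outliers` and the list `sample` are hypothetical, added only to show the rule applied end to end on made-up data.

```python
from statistics import mean, stdev

# Bounds from the values reported in these notes.
x_bar_reported = 197.2826
s_reported = 32.4884
lower = x_bar_reported - 2 * s_reported
upper = x_bar_reported + 2 * s_reported
print(f"lower = {lower:.4f}, upper = {upper:.4f}")

# The two values flagged in the notes fall outside these bounds.
print(132 < lower, 277 > upper)


def two_sd_outliers(values, k=2.0):
    """Return (lower, upper, outliers) using mean +/- k sample std devs."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    lo, hi = x_bar - k * s, x_bar + k * s
    # Sorting first means outliers come out grouped by low end, then high end.
    outliers = [x for x in sorted(values) if x < lo or x > hi]
    return lo, hi, outliers


# Hypothetical data (NOT the lecture's data set).
sample = [120, 190, 195, 198, 200, 202, 205, 210, 215, 290]
lo, hi, outliers = two_sd_outliers(sample)
print(f"bounds = ({lo:.4f}, {hi:.4f}), outliers = {outliers}")
```

The two-standard-deviation cutoff is the convention used in this lecture; other courses and tools use different rules, so `k` is left as a parameter.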

Conclusion

  • The outliers identified are 132 and 277 miles; each lies more than two standard deviations from the sample mean.
  • Sort the data first so that outliers can be found quickly at either end.
  • Remember to use full precision for calculations to avoid rounding errors.
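
To illustrate why precision matters, the sketch below compares the lower bound computed from the reported values with one computed after rounding the mean and standard deviation to whole numbers. The 132.5-mile store is a hypothetical value chosen to sit between the two bounds; it is not part of the lecture's data.

```python
# Values as reported in these notes (already rounded to 4 decimal places).
x_bar = 197.2826
s = 32.4884

# Lower bound using the reported precision.
lower_full = x_bar - 2 * s

# Lower bound after rounding both inputs to whole numbers first.
lower_rounded = round(x_bar) - 2 * round(s)

print(f"lower bound, reported precision: {lower_full:.4f}")
print(f"lower bound, rounded inputs:     {lower_rounded}")

# A hypothetical store at 132.5 miles is classified differently
# depending on which bound is used.
hypothetical_distance = 132.5
print("outlier with reported precision:", hypothetical_distance < lower_full)
print("outlier with rounded inputs:    ", hypothetical_distance < lower_rounded)
```

Rounding early shifts the bound by well under a mile here, yet that is enough to change the classification of a value near the cutoff.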

Tips

  • Use the print function to display several calculations at once, but be aware of an artifact: the last line may be displayed twice.
  • Confirm outlier values by checking both ends of the ordered data set.