Fundamentals of Statistics Explained

Aug 20, 2024

Key Concepts in Statistics

Measures of Central Tendency

  1. Mean: The average of a data set.

    • Add all the numbers, then divide the sum by the total count. The order of the numbers does not affect the result.
    • Example: For the set [7, 7, 10, 14, 15, 23, 32], sum = 108, mean = $108/7 \approx 15.43$.
  2. Median: The middle number when numbers are arranged in order.

    • With the numbers in order, cross off one number from each end at a time, working toward the center.
    • If two numbers remain in the middle, the median is their average.
    • Example: For [7, 7, 10, 14, 15, 23, 32], median = 14.
  3. Mode: The most frequent number in the data set.

    • Example: Mode in [7, 7, 10, 14, 15, 23, 32] is 7.
    • Bimodal Data Set: [15, 15, 21, 37, 41, 59, 59], mode = 15 and 59.
  4. Range: Difference between the highest and lowest numbers. Strictly, this measures spread rather than central tendency.

    • Example: For [7, 7, 10, 14, 15, 23, 32], range = 32 - 7 = 25.
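
The four measures above can be checked with a short Python sketch. It uses the standard-library `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

data = [7, 7, 10, 14, 15, 23, 32]

data_mean = mean(data)              # sum (108) divided by count (7)
data_median = median(data)          # middle value of the sorted data
data_modes = multimode(data)        # list of every most-frequent value
data_range = max(data) - min(data)  # highest minus lowest

# The bimodal example from the notes has two modes.
bimodal_modes = multimode([15, 15, 21, 37, 41, 59, 59])

print(round(data_mean, 2), data_median, data_modes, data_range)
print(bimodal_modes)
```

`multimode` is used rather than `mode` because it returns every value tied for the highest frequency, which is what a bimodal data set needs.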

Quartiles and Interquartile Range

  • Quartiles: Values dividing data into four equal parts.
    • Q1 (first quartile): Median of the lower half.
    • Q2 (second quartile): Median of the entire data set.
    • Q3 (third quartile): Median of the upper half.
  • Interquartile Range (IQR): Q3 - Q1.
  • Outliers:
    • A point is an outlier if it is below $Q1 - 1.5 \times IQR$ or above $Q3 + 1.5 \times IQR$.
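
A minimal sketch of the quartile and outlier rules above, assuming the median-of-halves convention in which the overall median is left out of both halves when the count is odd. Other conventions exist and can give slightly different quartiles. The function names and the extra value 60 are hypothetical, added only for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) by the median-of-halves method.

    Assumes at least two values. For an odd count, the overall
    median is excluded from both the lower and the upper half.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the middle
    upper = ordered[-half:]   # values above the middle
    return median(lower), median(ordered), median(upper)

def find_outliers(values):
    """Return the values outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [7, 7, 10, 14, 15, 23, 32]
print(quartiles(data))             # Q1, Q2, Q3
print(find_outliers(data))         # no value lies outside the fences
print(find_outliers(data + [60]))  # 60 is a made-up extra value
```

For the example set, Q1 = 7 and Q3 = 23, so IQR = 16 and the fences are -17 and 47; none of the seven values falls outside them.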

Box and Whisker Plots

  • Construction:
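    • Plot the minimum, Q1, median (Q2), Q3, and maximum: the box spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.
    • Use the plot to visualize how the data are distributed and to detect outliers.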
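
The five values a box plot is drawn from can be gathered in a few lines. This sketch uses `statistics.quantiles` from the standard library; the choice of `method="exclusive"` is an assumption, and the other available method (`"inclusive"`) can return different quartiles.

```python
from statistics import quantiles

data = [7, 7, 10, 14, 15, 23, 32]

# n=4 asks for the three cut points that split the data into quarters.
# method="exclusive" is an assumed convention, not the only valid one.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```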

Skewness in Data

  • Symmetrical Data: Mean = Median (approximately equal in real data).
  • Right Skewed (Positive Skew):
    • Mean > Median; longer whisker on the right in a box plot.
  • Left Skewed (Negative Skew):
    • Mean < Median; longer whisker on the left in a box plot.
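
The mean-versus-median comparison above can be written as a small helper. This is a rule-of-thumb sketch, not a formal skewness statistic; the function name `skew_direction` and the second and third data sets are hypothetical.

```python
from statistics import mean, median

def skew_direction(values):
    """Classify skew by comparing the mean with the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "right"      # mean pulled up by large values
    if m < md:
        return "left"       # mean pulled down by small values
    return "symmetric"

print(skew_direction([7, 7, 10, 14, 15, 23, 32]))  # mean 15.43 vs median 14
print(skew_direction([1, 2, 3, 4, 5]))             # made-up symmetric set
print(skew_direction([1, 20, 22, 23, 24]))         # made-up set with a low tail
```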

Graphical Representations

  1. Dot Plot:

    • Dots represent frequency of data points on a number line.
    • Mode is the number with the most dots.
  2. Stem and Leaf Plot:

    • Splits each value into a 'stem' (all but the last digit) and a 'leaf' (the last digit); for example, 23 has stem 2 and leaf 3.
    • Useful for small data sets.
  3. Frequency Table:

    • Lists each value alongside its frequency of occurrence.
    • The mean can be computed by multiplying each value by its frequency, summing the products, and dividing by the total frequency.
  4. Histogram:

    • Bars represent frequency of data within certain ranges (bins).
    • Bars touch with no gaps between them, unlike the bars in a bar graph.
  5. Relative Frequency Table:

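    • Shows each frequency as a fraction (or percentage) of the total number of data points.
    • Includes cumulative relative frequency, a running total of the relative frequencies that reaches 1 (100%) in the last row.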
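
As a rough illustration of the frequency table and relative frequency table, the sketch below builds both for the earlier example set and recovers the mean from the frequencies. It relies only on the standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

data = [7, 7, 10, 14, 15, 23, 32]

freq = Counter(data)        # frequency table: value -> count
total = sum(freq.values())  # total number of data points

# Mean from the table: sum of (value * frequency) / total frequency.
mean_from_table = sum(value * count for value, count in freq.items()) / total

values = sorted(freq)
relative = [freq[v] / total for v in values]  # share of the total per value
cumulative = list(accumulate(relative))       # running total, ends near 1.0

for v, r, c in zip(values, relative, cumulative):
    print(v, freq[v], round(r, 3), round(c, 3))
```

The mean computed from the table matches the one obtained by summing the raw data, since multiplying by frequency only groups the repeated values.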

Percentiles

  • Percentile: Indicates the value below which a given percentage of the data falls.
    • Example: 60th percentile is the value separating the lowest 60% from the highest 40%.
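
Several conventions exist for computing a percentile from a finite data set. The sketch below assumes the nearest-rank convention, which always returns one of the data values; the function name is hypothetical, and interpolating methods can give different answers.

```python
import math

def percentile_nearest_rank(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank rule."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(values)
    # Smallest rank whose share of the data is at least p percent.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

data = [7, 7, 10, 14, 15, 23, 32]
print(percentile_nearest_rank(data, 60))
print(percentile_nearest_rank(data, 50))
```

With seven values, the 60th percentile has rank ceil(4.2) = 5, so this convention returns the fifth ordered value, 15.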

These concepts form foundational knowledge for statistical analysis, providing tools for data interpretation, summarization, and visualization.