Key Concepts in Statistics
Measures of Central Tendency
- Mean: The average of a data set.
- Sum of the numbers divided by the total count. Sorting is not required for the mean.
- Example: For the set [7, 7, 10, 14, 15, 23, 32], sum = 108, mean = $108/7 \approx 15.43$.
- Median: The middle number when the numbers are arranged in order.
- Arrange the numbers in increasing order, then eliminate numbers in pairs from the ends toward the center.
- If two numbers remain in the middle, take their average.
- Example: For [7, 7, 10, 14, 15, 23, 32], median = 14.
- Mode: The most frequent number in the data set.
- Example: The mode of [7, 7, 10, 14, 15, 23, 32] is 7.
- Bimodal Data Set: [15, 15, 21, 37, 41, 59, 59] has two modes, 15 and 59.
- Range: Difference between the highest and lowest numbers. Strictly, it measures spread rather than central tendency.
- Example: For [7, 7, 10, 14, 15, 23, 32], range = 32 - 7 = 25.
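The four measures above can be checked with Python's standard `statistics` module. This is a minimal sketch. The variable names are illustrative and not part of the original notes.

```python
from statistics import mean, median, multimode

data = [7, 7, 10, 14, 15, 23, 32]
bimodal = [15, 15, 21, 37, 41, 59, 59]

data_mean = mean(data)               # sum / count = 108 / 7
data_median = median(data)           # sorts internally, then takes the middle value
data_modes = multimode(data)         # list of every most-frequent value
bimodal_modes = multimode(bimodal)   # two values tie for most frequent
data_range = max(data) - min(data)   # highest minus lowest

print(round(data_mean, 2))  # 15.43
print(data_median)          # 14
print(data_modes)           # [7]
print(bimodal_modes)        # [15, 59]
print(data_range)           # 25
```

`multimode` is used rather than `mode` so that a bimodal set reports both values.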
Quartiles and Interquartile Range
- Quartiles: Values dividing the ordered data into four equal parts.
- Q1 (first quartile): Median of the lower half.
- Q2 (second quartile): Median of the entire data set.
- Q3 (third quartile): Median of the upper half.
- Interquartile Range (IQR): Q3 - Q1.
- Outliers: A point is an outlier if it lies outside the range from $Q1 - 1.5 \times IQR$ to $Q3 + 1.5 \times IQR$.
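The quartile and outlier rules above can be sketched as follows. This assumes the median-of-halves convention in which the middle value is left out of both halves when the count is odd. Other conventions exist and can give slightly different quartiles. The function names and the extra value 100 are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # For an odd count, skip the middle value so it belongs to neither half.
    upper = s[half + 1:] if n % 2 else s[half:]
    return median(lower), median(s), median(upper)

def outliers(data):
    """Return the points outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

data = [7, 7, 10, 14, 15, 23, 32]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)               # 7 14 23
print(q3 - q1)                  # IQR = 16
print(outliers(data))           # []
print(outliers(data + [100]))   # [100]
```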
Box and Whisker Plots
- Construction: Draw a box from Q1 to Q3 with a line at the median (Q2), and extend whiskers to the minimum and maximum.
- Use: Visualize the data distribution and detect outliers.
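The numbers needed to draw a box and whisker plot can be collected in one place. The sketch below assumes the common convention of stopping each whisker at the most extreme point that is not an outlier and marking outliers separately. It uses `statistics.quantiles` with its default method, which may give quartiles slightly different from other conventions. The function name and sample data are hypothetical.

```python
from statistics import quantiles

def box_plot_stats(data):
    """Collect the values a box and whisker plot is drawn from."""
    s = sorted(data)
    q1, q2, q3 = quantiles(s, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in s if low_fence <= x <= high_fence]
    return {
        "min": s[0], "q1": q1, "median": q2, "q3": q3, "max": s[-1],
        "whisker_low": inside[0],    # smallest non-outlier
        "whisker_high": inside[-1],  # largest non-outlier
        "outliers": [x for x in s if x < low_fence or x > high_fence],
    }

stats = box_plot_stats([7, 7, 10, 14, 15, 23, 32, 100])
for name, value in stats.items():
    print(f"{name}: {value}")
```

With this sample, the maximum (100) is flagged as an outlier, so the upper whisker ends at 32 rather than at the maximum.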
Skewness in Data
- Symmetrical Data: Mean and median are approximately equal.
- Right Skewed (Positive Skew): Typically Mean > Median; longer whisker on the right in box plots.
- Left Skewed (Negative Skew): Typically Mean < Median; longer whisker on the left.
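The mean-versus-median comparison above can be turned into a quick check. This is a rule-of-thumb sketch, not a formal skewness statistic. The function name, the `tolerance` parameter, and the second and third sample lists are hypothetical.

```python
from statistics import mean, median

def skew_hint(data, tolerance=0.0):
    """Guess the skew direction by comparing the mean with the median."""
    diff = mean(data) - median(data)
    if diff > tolerance:
        return "right-skewed"   # mean pulled above the median
    if diff < -tolerance:
        return "left-skewed"    # mean pulled below the median
    return "symmetric"

print(skew_hint([7, 7, 10, 14, 15, 23, 32]))  # right-skewed (mean 15.43 > median 14)
print(skew_hint([1, 20, 22, 23, 24]))         # left-skewed (mean 18 < median 22)
print(skew_hint([1, 2, 3, 4, 5]))             # symmetric (mean 3 = median 3)
```

A nonzero `tolerance` is useful with real data, where the mean and median are rarely exactly equal.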
Graphical Representations
- Dot Plot:
- Dots represent frequency of data points on a number line.
- Mode is the number with the most dots.
- Stem and Leaf Plot:
- Splits each value into a 'stem' (all but the last digit) and a 'leaf' (the last digit).
- Useful for small data sets.
- Frequency Table:
- Numerical data organized by frequency of occurrence.
- The mean can be computed by multiplying each value by its frequency, summing the products, and dividing by the total frequency.
- Histogram:
- Bars represent the frequency of data within certain ranges (bins).
- Bars are adjacent with no gaps, unlike in bar graphs, because the bins cover a continuous range of values.
- Relative Frequency Table:
- Shows frequency relative to total data points.
- Includes cumulative relative frequency to track accumulation.
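The tables and plots above can all be built from the same counts. The sketch below reuses the sample set from the earlier examples and prints a text-only dot plot and stem and leaf plot. The bin width of 10 and all variable names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

data = [7, 7, 10, 14, 15, 23, 32]

# Frequency table: value -> number of occurrences
freq = Counter(data)
total = sum(freq.values())

# Mean from the table: sum(value * frequency) / total frequency
mean_from_freq = sum(value * count for value, count in freq.items()) / total

# Relative frequency table with cumulative relative frequency
rel_rows = []
running = 0
for value in sorted(freq):
    running += freq[value]
    rel_rows.append((value, freq[value] / total, running / total))

# Stem and leaf: stem = all but the last digit, leaf = last digit
stems = defaultdict(list)
for x in sorted(data):
    stems[x // 10].append(x % 10)

# Histogram counts for bins [0, 10), [10, 20), ... keyed by each bin's lower edge
bin_width = 10
hist = Counter((x // bin_width) * bin_width for x in data)

print("Dot plot")
for value in sorted(freq):
    print(f"{value:>2} | {'.' * freq[value]}")

print("Stem and leaf")
for stem in sorted(stems):
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")

print("Value, relative frequency, cumulative relative frequency")
for value, rel, cum in rel_rows:
    print(f"{value:>2}  {rel:.3f}  {cum:.3f}")
```

The cumulative column is computed from a running count rather than by adding fractions, so the last row comes out at exactly 1.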
Percentiles
- Percentile: The value below which a given percentage of the data falls.
- Example: 60th percentile is the value separating the lowest 60% from the highest 40%.
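One simple way to compute a percentile is the nearest-rank method, sketched below. This is only one of several conventions. It counts values at or below the result, and interpolating methods can return values between data points. The function name and the `scores` list are hypothetical.

```python
import math

def percentile_nearest_rank(data, p):
    """Smallest value with at least p percent of the data at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    s = sorted(data)
    rank = math.ceil(p * len(s) / 100)  # 1-based position in the sorted data
    return s[rank - 1]

scores = [55, 60, 62, 68, 70, 75, 80, 85, 90, 95]
print(percentile_nearest_rank(scores, 60))   # 75: six of the ten scores are at or below it
print(percentile_nearest_rank(scores, 50))   # 70
print(percentile_nearest_rank(scores, 100))  # 95
```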
These concepts form foundational knowledge for statistical analysis, providing tools for data interpretation, summarization, and visualization.