Understanding Data Types for Analysis

Sep 5, 2024

Types of Data in Statistical Analysis

Introduction to Data

  • Data is essential for statistical analysis.
  • Collection of data is necessary to understand phenomena or processes.
  • Each data unit collected is termed an observation.
    • Observations can be people, businesses, products, or time periods.
  • Variables are the measurements we are interested in (e.g., age, sex, chocolate preference).
  • In data representation:
    • Each row in a spreadsheet represents a single observation.
    • Each column corresponds to a variable.
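
The row-and-column layout described above can be sketched in plain Python. This is a minimal illustration only: the three rows and their values are made up, and the variable names (age, sex, chocolate) are simply borrowed from the examples in these notes.

```python
# Hypothetical survey rows: each dict is one observation (a row),
# and each key is a variable (a column).
observations = [
    {"age": 34, "sex": "F", "chocolate": "dark"},
    {"age": 51, "sex": "M", "chocolate": "milk"},
    {"age": 27, "sex": "F", "chocolate": "white"},
]

# A "column" is one variable read across all observations.
ages = [row["age"] for row in observations]

print(len(observations))  # number of observations (rows)
print(ages)               # values of the age variable
```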

Levels of Measurement

  • The level of measurement determines the analysis and summary statistics that can be used.

1. Nominal Level (Categorical/Qualitative)

  • Definition: The most basic level of measurement.
  • Characteristics:
    • Descriptions or labels with no inherent order.
    • Examples: sex, preferred chocolate type, color.
  • Data Representation:
    • Can be stored as text or numerical codes (numbers do not imply order).
    • Summary using frequency or percentage.
    • A mean (average) is not meaningful for nominal data; the most common category (the mode) can be reported instead.
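
As a rough sketch of how nominal data might be summarized, the snippet below counts hypothetical chocolate preferences and converts the counts to percentages. The responses are invented for illustration and are not taken from any survey in these notes.

```python
from collections import Counter

# Hypothetical responses; the labels have no inherent order.
preferences = ["dark", "milk", "dark", "white", "milk", "dark", "dark", "milk"]

counts = Counter(preferences)
total = len(preferences)

# Frequency and percentage are the natural summaries for nominal data.
percentages = {label: 100 * n / total for label, n in counts.items()}

# The most common category (mode) is meaningful; a mean of labels is not.
mode_label, mode_count = counts.most_common(1)[0]

print(dict(counts))
print(percentages)
print(mode_label)
```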

2. Ordinal Level

  • Definition: Measurement where order is meaningful, but intervals may not be equal.
  • Characteristics:
    • Examples: rank, satisfaction levels, fanciness.
    • The gap between ranked values may vary (e.g., race positions, satisfaction levels).
  • Data Representation:
    • Summarized using frequencies.
    • Some researchers calculate mean values from coded responses, but this must be justified because it treats the gaps between levels as equal.
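
One possible way to summarize ordinal data is sketched below, assuming a hypothetical four-level satisfaction scale and made-up responses. The numeric coding (1 for the best level up to 4 for the worst) is an assumption chosen for illustration; the notes do not say how the levels were coded.

```python
from collections import Counter

# Hypothetical scale, listed from best to worst; the order matters.
levels = ["very satisfied", "satisfied", "neutral", "dissatisfied"]
responses = ["satisfied", "very satisfied", "neutral", "satisfied",
             "very satisfied", "dissatisfied", "satisfied", "neutral",
             "satisfied", "very satisfied"]

counts = Counter(responses)
total = len(responses)

# Cumulative percentages only make sense because the categories are ordered.
cumulative = {}
running = 0
for level in levels:
    running += counts[level]
    cumulative[level] = 100 * running / total

# A median category: the first level where the cumulative share reaches 50%.
median_level = next(level for level in levels if cumulative[level] >= 50)

# A coded mean assumes equal gaps between levels, which is the step
# that needs justifying. The 1..4 coding here is an assumption.
codes = {level: i for i, level in enumerate(levels, start=1)}
coded_mean = sum(codes[r] for r in responses) / total

print(cumulative)
print(median_level)
print(coded_mean)
```

Reporting the cumulative percentages or the median category avoids the equal-gap assumption entirely, which is why they are often the safer summary.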

3. Interval/Ratio Level

  • Definition: The most precise measurement level.
  • Characteristics:
    • Measurable data (e.g., number of customers, weight, age, size).
    • Can be discrete (whole numbers, such as counts) or continuous (any value within a range, including fractions).
  • Data Representation:
    • Mathematically versatile; common summary statistics include mean, median, and standard deviation.
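
The summary statistics named above can be computed with Python's standard `statistics` module. This is a small sketch using made-up ages, not data from the notes.

```python
import statistics

# Hypothetical ages in years (interval/ratio data).
ages = [20, 30, 40, 50, 35, 45, 25]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
sd_age = statistics.stdev(ages)  # sample standard deviation (n - 1 divisor)

print(mean_age, median_age, round(sd_age, 2))
```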

Graphical Representation of Data

  • Appropriate graph types depend on the level of measurement:
    • Nominal Data: Pie chart, column chart, bar chart.
    • Ordinal Data: Column chart, bar chart (not pie charts).
    • Interval/Ratio Data: Bar chart, histogram.
    • Box Plots: Illustrate summary statistics neatly.
    • Time Series Data: Best displayed as a line chart.
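
The chart guidance above amounts to a simple lookup, which could be written as follows. The function name `suggest_charts` and its `time_series` flag are hypothetical, introduced only for this sketch; the chart lists themselves come from the bullets above.

```python
# Chart suggestions per level of measurement, as listed in these notes.
CHARTS_BY_LEVEL = {
    "nominal": ["pie chart", "column chart", "bar chart"],
    "ordinal": ["column chart", "bar chart"],
    "interval/ratio": ["bar chart", "histogram"],
}


def suggest_charts(level, time_series=False):
    """Return chart types for a level; time series data get a line chart."""
    if time_series:
        return ["line chart"]
    return CHARTS_BY_LEVEL[level.lower()]


print(suggest_charts("Nominal"))
print(suggest_charts("ordinal"))
print(suggest_charts("interval/ratio", time_series=True))
```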

Example Case: Helen's Choconutties Survey

  • Helen surveys 50 customers about their preferences and satisfaction with Choconutties.
  • Collected Data:
    • Nominal: Type of chocolate preferred (e.g., dark, milk, white).
      • Example Summary: 46% dark, 40% milk, 14% white.
    • Ordinal: Satisfaction levels (very satisfied, satisfied, etc.).
      • Example Summary: 32% very satisfied; 72% satisfied or very satisfied.
      • Average satisfaction score: 2.06 (interpretation debated).
    • Interval/Ratio: Age, amount spent, number of chocolate bars.
      • Example Summary: Mean age is 38 years, mean amount spent on groceries is $192, mean number of chocolate bars bought is 3.3.
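
As a quick arithmetic check, the percentages reported for the 50 customers can be converted back into counts. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
# Figures from the survey summary: 50 customers, percentages by chocolate type.
n_customers = 50
percent_by_type = {"dark": 46, "milk": 40, "white": 14}

# Convert each percentage back to a whole number of customers.
count_by_type = {k: n_customers * p // 100 for k, p in percent_by_type.items()}

# Satisfaction: 32% very satisfied; 72% satisfied or very satisfied,
# so 72 - 32 = 40% chose "satisfied" specifically.
very_satisfied = n_customers * 32 // 100
satisfied_only = n_customers * (72 - 32) // 100

print(count_by_type)
print(very_satisfied, satisfied_only)
```

The chocolate-type counts add up to all 50 customers, which confirms that the three percentages cover the whole sample.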

Conclusion

  • The type of analysis performed depends on the level of measurement of the dataset.
  • Further insights can be gained from related resources such as the video "Choosing the Test."