Overview
This lecture explains the main types of data used in statistical analysis, focusing on levels of measurement and how each type determines suitable summary statistics and charts.
Types of Data and Observations
- Data is collected to learn about phenomena by observing multiple measures on people or things.
- Each row in a dataset represents an observation (e.g., person, business, product, or time period).
- Columns in a dataset represent variables (traits, characteristics, or measures).
- For every observation, a value is recorded for each variable.
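As a minimal sketch of the row/column structure described above, the snippet below stores a tiny dataset as a list of dictionaries. The variable names and every value are invented for illustration and do not come from the lecture.

```python
# Hypothetical dataset: each dict is one observation (a row),
# each key is a variable (a column). All values are invented.
observations = [
    {"respondent": 1, "chocolate": "dark", "satisfaction": 4, "age": 34},
    {"respondent": 2, "chocolate": "milk", "satisfaction": 5, "age": 27},
    {"respondent": 3, "chocolate": "white", "satisfaction": 2, "age": 45},
]

# The variables are the keys shared by the observations.
variables = list(observations[0].keys())

# Check that every observation records a value for every variable.
complete = all(set(row) == set(variables) for row in observations)

# Reading one column: the values of a single variable across all rows.
ages = [row["age"] for row in observations]

print(variables)
print(complete)
print(ages)
```

A list of dictionaries is only one convenient representation; a spreadsheet or data frame follows the same rows-as-observations, columns-as-variables layout.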
Levels of Measurement
- The level of measurement affects which statistics, graphs, and analyses are appropriate.
Nominal Data
- Nominal (categorical/qualitative) data labels or categorizes observations without any ordering (e.g., sex, chocolate type, color).
- Nominal values can be words or codes; numeric codes do not imply order.
- Only frequencies or percentages are appropriate; calculating a mean of category codes is invalid because the codes are arbitrary labels.
- Best shown with pie charts or bar/column charts.
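The points above can be sketched in a few lines: frequencies and percentages stay the same however the categories are coded, whereas a "mean" of the codes changes with the coding and so carries no information. The codings and responses below are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical numeric coding and responses (invented for illustration).
coding_a = {1: "dark", 2: "milk", 3: "white"}
responses = [1, 2, 1, 3, 2, 1, 1, 2, 1, 3]

# Frequencies and percentages are the appropriate summaries.
labels = [coding_a[code] for code in responses]
counts = Counter(labels)
percentages = {label: 100 * n / len(labels) for label, n in counts.items()}

# An equally valid coding that assigns the numbers differently.
coding_b = {"dark": 3, "milk": 1, "white": 2}
recoded = [coding_b[label] for label in labels]

# The "mean code" depends entirely on the arbitrary coding chosen.
mean_a = mean(responses)
mean_b = mean(recoded)

print(counts)
print(percentages)
print(mean_a, mean_b)
```

The two means differ even though the underlying answers are identical, which is why only the counts and percentages are reported for nominal variables.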
Ordinal Data
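- Ordinal data has a meaningful order, but the intervals between categories are not necessarily equal (e.g., ranks, satisfaction ratings).
- Frequencies are appropriate; calculating a mean is possible but sometimes questionable, because it treats the gaps between categories as equal.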
- Best shown with column/bar charts, not pie charts.
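A minimal sketch of summarizing an ordinal variable: frequencies are reported in scale order, and the median category is found from rank positions, since it relies only on order and not on the spacing between categories. The five-point scale and the responses are assumptions made for illustration; the median is a common alternative to the mean and is not prescribed by the lecture.

```python
from collections import Counter
from statistics import median_low

# Assumed five-point scale, listed from lowest to highest.
scale = ["very dissatisfied", "dissatisfied", "neutral",
         "satisfied", "very satisfied"]

# Hypothetical responses (invented for illustration).
responses = ["satisfied", "neutral", "very satisfied", "satisfied",
             "dissatisfied", "satisfied", "neutral"]

counts = Counter(responses)

# Report frequencies in scale order, including categories nobody chose.
ordered_counts = [(level, counts.get(level, 0)) for level in scale]

# Convert each response to its rank position, then take the (low) median,
# which is always an actual category rather than a value between two.
ranks = [scale.index(r) for r in responses]
median_level = scale[median_low(ranks)]

print(ordered_counts)
print(median_level)
```

Keeping the categories in scale order is also what makes a column chart readable for ordinal data.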
Interval/Ratio Data
- Interval/ratio (scale/quantitative/parametric) data are numeric measurements and can be discrete or continuous (e.g., age, weight).
- Common summary statistics: mean, median, standard deviation.
- Best shown with bar charts, histograms, or box plots (for grouped or continuous data).
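The summary statistics listed above can be computed with Python's standard `statistics` module. The ages are hypothetical, and the ten-year bins are an arbitrary choice showing the kind of grouping a histogram performs.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical ages in years (invented for illustration).
ages = [23, 27, 31, 34, 35, 41, 45, 52]

age_mean = mean(ages)
age_median = median(ages)
age_sd = stdev(ages)  # sample standard deviation (n - 1 denominator)

# Group into ten-year bins keyed by the lower edge (20 means ages 20-29).
bins = Counter((age // 10) * 10 for age in ages)

print(age_mean, age_median, round(age_sd, 2))
print(sorted(bins.items()))
```

`stdev` gives the sample standard deviation; `pstdev` from the same module would be used if the values were treated as the whole population.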
Data Visualization Guidelines
- Choose chart types based on data level: bar or pie chart for nominal; column/bar for ordinal; histogram/bar/box plot for interval/ratio.
- Data over time is best shown in a line chart.
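The guidelines above amount to a simple lookup from measurement level to chart type. The sketch below encodes them in a hypothetical helper, `suggest_charts`, written for these notes; it is not part of any library, and real chart choice also depends on the audience and the message.

```python
# Lookup table mirroring the guidelines in this section.
CHARTS_BY_LEVEL = {
    "nominal": ["bar chart", "pie chart"],
    "ordinal": ["column/bar chart"],
    "interval/ratio": ["histogram", "bar chart", "box plot"],
}


def suggest_charts(level, over_time=False):
    """Return suggested chart types for a level of measurement.

    Hypothetical helper: `level` is one of the keys of CHARTS_BY_LEVEL;
    `over_time` marks data recorded across time periods.
    """
    if over_time:
        # Data over time is best shown in a line chart.
        return ["line chart"]
    key = level.strip().lower()
    if key not in CHARTS_BY_LEVEL:
        raise ValueError(f"unknown level of measurement: {level!r}")
    return list(CHARTS_BY_LEVEL[key])


print(suggest_charts("nominal"))
print(suggest_charts("Ordinal"))
print(suggest_charts("interval/ratio", over_time=True))
```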
Example: Helen's Choconutties Survey
- Nominal: Preferred chocolate type summarized via pie/bar chart (e.g., 46% prefer dark).
- Ordinal: Satisfaction measured and shown in an ordered column chart, not a pie chart; whether a mean satisfaction score is meaningful is debated.
- Interval/Ratio: Age, spending, and quantity summarized by means and shown in bar charts or histograms.
Key Terms & Definitions
- Observation — a single entity measured in a dataset (person, object, or time period).
- Variable — a characteristic or measure recorded for each observation.
- Nominal Data — data categorized by label without order.
- Ordinal Data — data with an inherent order, but intervals between categories that are not necessarily equal.
- Interval/Ratio Data — numerical data with meaningful intervals and ratios; supports most statistical analysis.
Action Items / Next Steps
- Watch the video "Choosing the test" for further guidance on analysis selection.