Overview
This lecture covers how to describe categorical data using frequency and relative frequency tables, including how to construct these tables both manually and in Google Sheets.
Recap: Statistics Basics
- Statistics has two main branches: descriptive statistics (summarizing data) and inferential statistics (drawing conclusions about a population from a sample).
- A population is the whole group of interest; a sample is a subset used for analysis.
- This course focuses on structured, table-based data, with rows as cases and columns as variables.
- Data types: categorical (qualitative) and numerical (quantitative).
- Categorical data can be nominal (no order) or ordinal (ordered); numerical data can be interval or ratio.
- Data can be cross-sectional (collected at a single point in time) or time series (collected over time).
Describing Categorical Data
- Categorical data are best summarized using frequency distributions (tables showing counts for each category).
- Frequency is the count of each unique value (category) in the dataset.
- To construct a frequency table manually: list the distinct categories, tally each occurrence, and total the tallies for each category.
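- The examples showed that two different datasets can have the same frequency distribution, because the table records only the count per category and not the order of the observations; datasets with different counts produce different distributions.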
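The manual procedure (list the distinct categories, tally, total) can be sketched in Python. This is a minimal illustration rather than part of the lecture; the blood-group values and the function name `frequency_table` are made up for the example.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: count} for a list of categorical values."""
    # Counter tallies each distinct value, mirroring the manual tally step.
    counts = Counter(values)
    # Sort by category label so the table has a stable, readable order.
    return dict(sorted(counts.items()))

# Hypothetical sample of blood groups (illustrative data only).
blood_groups = ["A", "O", "B", "O", "A", "AB", "O", "A"]
table = frequency_table(blood_groups)

for category, count in table.items():
    print(f"{category:<6}{count}")
print(f"{'Total':<6}{sum(table.values())}")
```

Because only the counts matter, passing the same observations in a different order gives the same table.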
Creating Frequency Tables in Google Sheets
- Enter the data in a column, select the cells, and choose Insert > Pivot table (Data > Pivot table in older versions of Google Sheets) to create the table.
- In the pivot table editor, add the categorical variable under 'Rows', then add the same variable under 'Values' and summarize it by COUNTA to produce the frequency table.
- This process can be used for any categorical variable, such as blood group or gender.
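A pivot-table count can also be cross-checked outside the spreadsheet. The sketch below assumes the sheet was exported as CSV with a column named `blood_group` (a hypothetical name) and uses only Python's standard library; the inline CSV text stands in for the exported file, and `count_column` is an illustrative helper.

```python
import csv
import io
from collections import Counter

# Inline text standing in for a CSV export of the sheet (hypothetical data).
csv_text = """name,blood_group
P1,A
P2,O
P3,O
P4,B
P5,A
P6,O
"""

def count_column(csv_file, column):
    """Count how often each value appears in one column of a CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)

counts = count_column(io.StringIO(csv_text), "blood_group")

# most_common() lists categories from most to least frequent.
for category, count in counts.most_common():
    print(category, count)
```

To use it with a real export, open the file with `open(path, newline="")` and pass the file object in place of the `io.StringIO` wrapper.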
Relative Frequency
- Relative frequency is the ratio of a category's frequency to the total number of observations.
- It is calculated by dividing each frequency by the total number of observations; the relative frequencies of all categories sum to 1.
- Relative frequencies allow comparison between datasets of different sizes.
- Relative frequency tables put datasets on a common scale (proportions between 0 and 1), which makes such comparisons straightforward.
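The calculation (frequency divided by total number of observations) and the comparison across dataset sizes can be illustrated with a short Python sketch. The two datasets and the function name `relative_frequencies` are hypothetical, and the function assumes a non-empty list.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: proportion of all observations}."""
    counts = Counter(values)
    total = len(values)  # assumes at least one observation
    return {category: count / total for category, count in sorted(counts.items())}

# Two hypothetical datasets of different sizes (illustrative data only).
small = ["A", "A", "B", "O"]                  # 4 observations
large = ["A"] * 50 + ["B"] * 25 + ["O"] * 25  # 100 observations

rel_small = relative_frequencies(small)
rel_large = relative_frequencies(large)

# The raw counts differ (2 vs 50 for "A"), but the proportions match.
print(rel_small)  # {'A': 0.5, 'B': 0.25, 'O': 0.25}
print(rel_large)  # {'A': 0.5, 'B': 0.25, 'O': 0.25}
```

Multiplying each proportion by 100 gives the same table expressed as percentages.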
Key Terms & Definitions
- Descriptive Statistics — methods for summarizing data.
- Inferential Statistics — methods for making predictions or inferences about a population.
- Structured Data — data organized in tables with rows and columns.
- Categorical Data — data that can be sorted into groups, e.g., A, B, C, D.
- Frequency — the count of observations in each category.
- Frequency Table — a table listing each category and its frequency.
- Relative Frequency — the proportion or percentage of the total for each category.
Action Items / Next Steps
- Practice constructing frequency and relative frequency tables (manually and in Google Sheets) for given categorical data sets.
- Prepare for next class: describing associations between two categorical variables.