Overview
This lecture explains how to classify and organize the types of data found in a data table. It focuses on three things: distinguishing categorical from numerical variables, the measurement units of numerical variables, and the difference between time series and cross-sectional data.
Data Organization
- Data tables consist of columns (variables) and rows (cases or observations).
- Variables can hold different kinds of information, such as a student's name, gender, marks, or school board.
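One way to picture this rows-and-columns layout is a minimal Python sketch that stores the table as a list of rows. Each row maps variable names to one case's values. The student records are made-up illustrative values, and `table_shape` and `column` are hypothetical helper names, not part of any library.

```python
# Hypothetical data table: each dict is one row (case), each key is one column (variable).
students = [
    {"name": "Asha",  "gender": "Female", "board": "CBSE",  "marks": 87},
    {"name": "Ravi",  "gender": "Male",   "board": "State", "marks": 72},
    {"name": "Meera", "gender": "Female", "board": "ICSE",  "marks": 91},
]

def table_shape(rows):
    """Return (number of cases, number of variables) for a list-of-dicts table."""
    n_cases = len(rows)
    # Assumes every row has the same columns, so the first row defines them.
    n_variables = len(rows[0]) if rows else 0
    return n_cases, n_variables

def column(rows, variable):
    """Collect the values of one variable (one column) across every case."""
    return [row[variable] for row in rows]

print(table_shape(students))      # (3, 4)
print(column(students, "board"))  # ['CBSE', 'State', 'ICSE']
```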
Types of Variables
- Variables are broadly divided into categorical (qualitative) and numerical (quantitative).
- Categorical variables represent group membership (e.g., gender, board, blood group).
- Numerical variables take numeric values that are measured or counted (e.g., marks, height, weight).
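Whether a variable is categorical or numerical is a judgment about what it represents, so one option is to declare it explicitly rather than infer it from how the values are written. The sketch below assumes a hand-written schema. `Kind`, `SCHEMA`, and `variables_of_kind` are hypothetical names chosen for illustration.

```python
from enum import Enum

class Kind(Enum):
    CATEGORICAL = "categorical"  # value names a group the case belongs to
    NUMERICAL = "numerical"      # value is a measured or counted quantity

# Hypothetical schema: the analyst declares each variable's kind
# instead of guessing it from the Python type of the stored values.
SCHEMA = {
    "gender": Kind.CATEGORICAL,
    "board": Kind.CATEGORICAL,
    "blood_group": Kind.CATEGORICAL,
    "marks": Kind.NUMERICAL,
    "height_cm": Kind.NUMERICAL,
    "weight_kg": Kind.NUMERICAL,
}

def variables_of_kind(schema, kind):
    """List the variables in the schema that have the requested kind."""
    return [name for name, k in schema.items() if k is kind]

print(variables_of_kind(SCHEMA, Kind.CATEGORICAL))  # ['gender', 'board', 'blood_group']
print(variables_of_kind(SCHEMA, Kind.NUMERICAL))    # ['marks', 'height_cm', 'weight_kg']
```

Declaring the kind up front matters for variables such as jersey numbers. Their values are digits, but they still belong in the categorical group.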
Categorical Data
- Categorical variables have values that are names or categories (e.g., Male/Female, CBSE/State/ICSE).
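- Group membership lets each observation be classified into a group based on its value (e.g., each student belongs to one board).
- Some variables look numerical but act as categories (e.g., jersey number). The number is only a label, so arithmetic such as averaging jersey numbers has no meaning.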
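Group membership can be illustrated with a short Python sketch that counts and lists the cases in each category. The records are hypothetical, and `group_counts` and `group_members` are illustrative helper names. Jersey numbers are deliberately stored as strings because they are labels, not quantities.

```python
from collections import Counter

# Hypothetical observations; "jersey" is kept as text because it only identifies a player.
students = [
    {"name": "Asha",  "board": "CBSE",  "jersey": "7"},
    {"name": "Ravi",  "board": "State", "jersey": "18"},
    {"name": "Meera", "board": "CBSE",  "jersey": "10"},
    {"name": "Kiran", "board": "ICSE",  "jersey": "45"},
]

def group_counts(rows, variable):
    """Count how many cases fall into each category of a categorical variable."""
    return dict(Counter(row[variable] for row in rows))

def group_members(rows, variable):
    """Map each category to the names of the cases that belong to it."""
    groups = {}
    for row in rows:
        groups.setdefault(row[variable], []).append(row["name"])
    return groups

print(group_counts(students, "board"))   # {'CBSE': 2, 'State': 1, 'ICSE': 1}
print(group_members(students, "board"))  # {'CBSE': ['Asha', 'Meera'], 'State': ['Ravi'], 'ICSE': ['Kiran']}
```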
Numerical Data
- Numerical variables are measured in units (e.g., marks as a score, height in cm, weight in kg).
- Numerical data are divided into discrete data, which take separate countable values, usually whole numbers (e.g., matches played), and continuous data, which can take any value within a range (e.g., batting average).
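A quick Python check can flag whether recorded values are all whole numbers. This is only a rough aid: whether a variable is discrete or continuous depends on what it measures, not on the particular sample. The helper name `all_whole_numbers` and the cricket figures are hypothetical.

```python
def all_whole_numbers(values):
    """Return True if every value is a whole number (e.g. 3 or 3.0).

    A rough check on recorded values only: a continuous variable can
    still happen to produce a whole-number reading, such as 28.0 below.
    """
    return all(float(v).is_integer() for v in values)

matches_played = [12, 30, 7, 21]               # counts: discrete
batting_average = [34.5, 41.25, 28.0, 52.67]   # any value in a range: continuous

print(all_whole_numbers(matches_played))   # True
print(all_whole_numbers(batting_average))  # False
```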
Measurement Units
- All observations of a numerical variable must use the same measurement unit (e.g., every height in cm). Otherwise the values cannot be compared or combined.
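One way to enforce a single unit is to convert every reading before analysis. The sketch below assumes heights were recorded as hypothetical (value, unit) pairs in mixed units. The conversion factors are the exact definitions 1 m = 100 cm and 1 in = 2.54 cm, and `heights_in_cm` is an illustrative name.

```python
# Exact conversion factors into centimetres.
TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54}

def heights_in_cm(readings):
    """Convert (value, unit) pairs to centimetres so every observation shares one unit."""
    converted = []
    for value, unit in readings:
        if unit not in TO_CM:
            # Refuse to guess: an unknown unit would silently corrupt the column.
            raise ValueError(f"unknown unit: {unit!r}")
        converted.append(value * TO_CM[unit])
    return converted

readings = [(162, "cm"), (1.75, "m"), (70, "in")]  # hypothetical mixed-unit heights
print(heights_in_cm(readings))
```

Raising an error on an unrecognised unit is a design choice. It makes inconsistent data visible instead of mixing units in the same column.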
Time Series vs. Cross-Sectional Data
- Time series data track a single variable over a sequence of times (e.g., the quantity of potatoes recorded each day for a month).
- Cross-sectional data record one or more variables for many different cases at a single point in time.
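The two layouts differ in what indexes the data: time for a time series, the case for a cross-section. The Python sketch below shows both. All values, the start date, and the use of kilograms for potatoes are hypothetical, and `change_over_time` is an illustrative helper.

```python
from datetime import date, timedelta

# Time series: ONE variable (potato quantity, assumed in kg) recorded on successive days.
start = date(2024, 6, 1)
potato_kg = [40, 38, 45, 42, 50]
time_series = {start + timedelta(days=i): qty for i, qty in enumerate(potato_kg)}

# Cross-section: SEVERAL variables recorded for different cases on the same day.
cross_section = {
    "Asha":  {"height_cm": 158, "weight_kg": 52},
    "Ravi":  {"height_cm": 171, "weight_kg": 66},
    "Meera": {"height_cm": 164, "weight_kg": 57},
}

def change_over_time(series):
    """Day-to-day differences; only meaningful because entries are ordered in time."""
    days = sorted(series)
    return [series[b] - series[a] for a, b in zip(days, days[1:])]

print(change_over_time(time_series))  # [-2, 7, -3, 8]
```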
Key Terms & Definitions
- Variable — a property or characteristic measured for each case in a data set.
- Categorical Variable — a variable with values as categories or groups.
- Numerical Variable — a variable with numeric values; it can be discrete or continuous.
- Discrete Data — numerical data that can take only specific values (usually counts).
- Continuous Data — numerical data that can take any value within a range.
- Measurement Unit — the standard unit used to measure a variable.
- Time Series Data — data collected on the same variable at different times.
- Cross-Sectional Data — data collected at one point in time on multiple variables or cases.
Action Items / Next Steps
- Classify mobile numbers and jersey numbers as categorical or numerical variables.
- Practice identifying categorical and numerical variables in sample datasets.
- Ensure measurement units are consistent when analyzing numerical data.