Data Classification and Organization

Sep 7, 2025

Overview

This lecture explains how to classify and organize data in a data table: distinguishing categorical from numerical variables, recording numerical variables in consistent measurement units, and telling time series data apart from cross-sectional data.

Data Organization

  • Data tables consist of columns (variables) and rows (cases or observations).
  • Each variable records one kind of information about every case, such as name, gender, marks, or school board (e.g., CBSE, State, ICSE).
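A data table can be sketched in Python as a list of rows, one dict per case, where each key is a variable. The student names, boards, and marks below are made-up illustrative values, not data from the lecture.

```python
# Hypothetical student records: each dict is one row (case),
# each key is one column (variable).
students = [
    {"name": "Asha", "gender": "Female", "board": "CBSE", "marks": 87},
    {"name": "Ravi", "gender": "Male", "board": "State", "marks": 74},
    {"name": "Meera", "gender": "Female", "board": "ICSE", "marks": 91},
]

variables = list(students[0].keys())  # column names
n_cases = len(students)               # number of rows

print("Variables:", variables)
print("Number of cases:", n_cases)

# Reading one column means taking the same key from every row.
marks_column = [row["marks"] for row in students]
print("marks column:", marks_column)
```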

Types of Variables

  • Variables are broadly divided into categorical (qualitative) and numerical (quantitative).
  • Categorical variables represent group membership (e.g., gender, board, blood group).
  • Numerical variables take numeric values that are measured or counted and on which arithmetic is meaningful (e.g., marks, height, weight).
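One way to see why the split matters is that each type supports different summaries. The sketch below, using hypothetical values, labels each variable by hand (the `variable_types` mapping is an illustrative assumption, not a standard API) and then counts categories but averages numbers.

```python
from collections import Counter
from statistics import mean

# Hypothetical data; the type labels are assigned by hand, because
# the kind of variable depends on its meaning, not its storage type.
data = {
    "gender": ["Female", "Male", "Female", "Male", "Female"],
    "blood_group": ["O+", "A+", "O+", "B+", "AB+"],
    "height_cm": [152.0, 168.5, 160.2, 171.0, 158.4],
}
variable_types = {
    "gender": "categorical",
    "blood_group": "categorical",
    "height_cm": "numerical",
}

def summarize(name):
    """Pick a summary that makes sense for the variable's type."""
    values = data[name]
    if variable_types[name] == "categorical":
        # Categories can only be counted.
        return dict(Counter(values))
    # Numeric values support arithmetic, e.g. the mean.
    return round(mean(values), 2)

for name in data:
    print(name, "->", summarize(name))
```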

Categorical Data

  • Categorical variables have values that are names or categories (e.g., Male/Female, CBSE/State/ICSE).
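  • Because each value names a group, every observation can be assigned to one group (e.g., all CBSE students form one group).
  • Some variables look numerical but behave as categories: a jersey number is written with digits, yet it only labels a player, so arithmetic such as averaging jersey numbers is meaningless.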
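The sketch below illustrates both points with hypothetical players: observations are grouped by a categorical variable, and jersey numbers are stored as text labels so they are counted rather than averaged.

```python
from collections import defaultdict, Counter

# Hypothetical players. Jersey numbers are digits, but they only
# identify a player, so they are stored and treated as labels.
players = [
    {"name": "P1", "team": "Red", "jersey": "7"},
    {"name": "P2", "team": "Blue", "jersey": "10"},
    {"name": "P3", "team": "Red", "jersey": "10"},
    {"name": "P4", "team": "Blue", "jersey": "23"},
]

# Group membership: place each observation into the group named by its category.
groups = defaultdict(list)
for p in players:
    groups[p["team"]].append(p["name"])
print(dict(groups))

# Meaningful for a categorical variable: how often each label occurs.
jersey_counts = Counter(p["jersey"] for p in players)
print(jersey_counts)
# Not meaningful: an "average jersey number" (7 + 10 + 10 + 23) / 4 = 12.5
# describes no player, which is why jersey number is categorical.
```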

Numerical Data

  • Numerical variables are measured or counted and carry units (e.g., marks as a numeric score, height in cm, weight in kg).
  • Numerical data are divided into discrete data, which take separate countable values, usually whole numbers (e.g., matches played), and continuous data, which can take any value within a range (e.g., batting average).
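A minimal sketch of the discrete/continuous distinction, using made-up career figures and assuming the usual cricket definition of batting average (runs scored divided by times dismissed): counts stay whole numbers, while a ratio of counts can land anywhere in a range.

```python
# Hypothetical career figures for one batter.
matches_played = 52        # discrete: a count, always a whole number
runs_scored = 1873         # discrete: also a count
times_dismissed = 41

# Batting average (runs per dismissal) is a ratio of counts, so it can
# fall anywhere in a range and is treated as continuous.
batting_average = runs_scored / times_dismissed

print(type(matches_played).__name__, matches_played)
print(type(batting_average).__name__, round(batting_average, 2))
```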

Measurement Units

  • Numerical variables require consistent measurement units across all observations (e.g., all heights in cm).
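Before analysis, mixed units should be converted to one unit. This sketch assumes hypothetical height records where one value was entered in inches; the `to_cm` helper is illustrative, and it relies on the exact definition 1 in = 2.54 cm.

```python
# Hypothetical height records where one value was entered in inches.
records = [
    (170.0, "cm"),
    (65.0, "in"),
    (158.0, "cm"),
]

CM_PER_INCH = 2.54  # exact by definition

def to_cm(value, unit):
    """Convert a height to centimetres; reject units we do not know."""
    if unit == "cm":
        return value
    if unit == "in":
        return value * CM_PER_INCH
    raise ValueError(f"unknown unit: {unit}")

heights_cm = [round(to_cm(v, u), 1) for v, u in records]
print(heights_cm)
```

Raising an error on an unknown unit, rather than passing the value through, keeps an unconverted measurement from silently entering the column.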

Time Series vs. Cross-Sectional Data

  • Time series data tracks a variable for the same subject at successive points in time (e.g., the quantity of potatoes recorded each day over a month).
  • Cross-sectional data records variables at a single point in time across different cases.
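The two layouts can be contrasted in a short sketch. The daily potato quantities (assumed here to be in kg), the dates, and the shop survey are all hypothetical values chosen for illustration.

```python
from datetime import date, timedelta

# Time series: one variable (hypothetical daily potato quantity, in kg)
# recorded for the same subject on successive days.
start = date(2025, 9, 1)
potato_kg = [12, 15, 11, 14, 13]
time_series = [(start + timedelta(days=i), q) for i, q in enumerate(potato_kg)]

# Cross-sectional: several cases observed at one point in time.
survey_date = date(2025, 9, 7)
cross_section = [
    {"shop": "A", "potato_kg": 12, "onion_kg": 8},
    {"shop": "B", "potato_kg": 20, "onion_kg": 5},
    {"shop": "C", "potato_kg": 9, "onion_kg": 11},
]

# The time index varies in a time series ...
print([d.isoformat() for d, _ in time_series])
# ... while a cross-section varies across cases at a single date.
print(survey_date.isoformat(), [row["shop"] for row in cross_section])
```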

Key Terms & Definitions

  • Variable — a property or characteristic measured for each case in a data set.
  • Categorical Variable — a variable with values as categories or groups.
  • Numerical Variable — a variable with numeric values; it can be discrete or continuous.
  • Discrete Data — numerical data that can take only specific values (usually counts).
  • Continuous Data — numerical data that can take any value within a range.
  • Measurement Unit — the standard unit used to measure a variable.
  • Time Series Data — data collected on the same variable at different times.
  • Cross-Sectional Data — data collected at one point in time on multiple variables or cases.

Action Items / Next Steps

  • Classify mobile number and jersey number as categorical or numerical variables.
  • Practice identifying categorical and numerical variables in sample datasets.
  • Ensure measurement units are consistent when analyzing numerical data.