Forecasting Techniques in Data Science

Aug 6, 2024

30 Days of Data Series: Forecasting Pipeline

Introduction

  • Objective of the Series: Understand data science processes, mathematical models, and actionable coding for projects.
  • Engagement: Students are encouraged to ask questions and request specific videos.
  • Structure: 30 videos covering feature correlation, segmentation pipelines, regression models, etc.
  • Today's Topic: Introduction to forecasting with a focus on a forecasting pipeline.

Presenter Introduction

  • Name: Priya
  • Current Role: Senior Data Scientist at Uber (Uber Advertising)
  • Background: Degree in Astrophysics from UChicago, experience as a data scientist.

Forecasting Overview

  • Definition: Forecasting is treated as time series modeling, that is, predicting future values from historical observations ordered in time.
  • Applications: Common in sales/demand forecasting (e.g., daily sales data).
  • Data Requirements: Minimum requirements include a datetime stamp and a corresponding value.
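
As a minimal sketch of that input shape, the frame below holds one timestamp column and one value column. The column names ds and y follow Prophet's input convention; the source column names and the sales figures are made up for illustration.

```python
import pandas as pd

# Made-up daily sales values, for illustration only.
raw = pd.DataFrame(
    {
        "date": ["2017-01-01", "2017-01-02", "2017-01-03"],
        "sales": [120.0, 135.5, 128.0],
    }
)

# Prophet expects a datetime column named "ds" and a numeric column named "y".
series = raw.rename(columns={"date": "ds", "sales": "y"})
series["ds"] = pd.to_datetime(series["ds"])

print(series.dtypes)
```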

Data Used for Forecasting

  • Dataset Source: Sales data from Kaggle (store in Ecuador).
  • Data Characteristics: Daily sales data across various categories.
  • Initial Data Exploration:
    • Date range: the data runs from 2013 (minimum date) to 2017 (maximum date).
    • Recommended history: 2-3 years of data for accurate forecasts.
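
The date check above can be sketched as follows. The helper name date_coverage and the column name date are assumptions for illustration, and the three-row frame is a stand-in for the real sales table.

```python
import pandas as pd

def date_coverage(df: pd.DataFrame, date_col: str = "date") -> dict:
    """Return the first date, last date, and span in years of a date column."""
    dates = pd.to_datetime(df[date_col])
    first, last = dates.min(), dates.max()
    return {
        "first": first,
        "last": last,
        "years": (last - first).days / 365.25,
    }

# Tiny made-up frame standing in for the sales table.
demo = pd.DataFrame({"date": ["2013-01-01", "2015-06-15", "2017-08-15"]})
coverage = date_coverage(demo)
print(coverage)
```

A span well above the recommended 2-3 years leaves room to hold out recent data for testing.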

Data Processing Steps

  1. Data Import: Load the data and define key helper functions for checking missing data and calculating metrics (MAPE).
  2. Data Aggregation: Aggregate by the category (family) for daily sales.
  3. Visualize Data: Plot data to understand trends and seasonality.
  4. Model Selection: Start with the Prophet model for forecasting.
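
Steps 1 and 2 might look like the sketch below. The column names date, family, store_nbr, and sales are assumed from the Kaggle dataset, the helper names are hypothetical, and the rows are made up.

```python
import pandas as pd

def aggregate_daily(df: pd.DataFrame, category: str) -> pd.DataFrame:
    """Sum sales per day for one category (the `family` column)."""
    subset = df[df["family"] == category]
    daily = subset.groupby("date", as_index=False)["sales"].sum()
    return daily.sort_values("date").reset_index(drop=True)

def missing_dates(daily: pd.DataFrame) -> pd.DatetimeIndex:
    """List calendar days between the first and last date that have no row."""
    full = pd.date_range(daily["date"].min(), daily["date"].max(), freq="D")
    return full.difference(pd.DatetimeIndex(daily["date"]))

# Made-up rows: two stores selling PRODUCE, with 2017-01-03 absent.
demo = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02",
             "2017-01-04", "2017-01-04", "2017-01-01"]
        ),
        "store_nbr": [1, 2, 1, 2, 1, 2, 1],
        "family": ["PRODUCE"] * 6 + ["DAIRY"],
        "sales": [10.0, 5.0, 20.0, 5.0, 30.0, 5.0, 7.0],
    }
)

daily = aggregate_daily(demo, "PRODUCE")
gaps = missing_dates(daily)
print(daily)
print(gaps)
```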

Time Series Modeling

  • Data Aggregation: Focus on high-volume sales categories for better accuracy.
  • Modeling Approach: One model per category due to seasonal differences in sales.
  • Trend Analysis: Identify linear and irregular trends across different categories.
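
One way to prepare one series per category, keeping only the highest-volume ones, is sketched below. The function name top_category_series is hypothetical, the column names are assumed from the dataset, and the numbers are made up.

```python
import pandas as pd

def top_category_series(df: pd.DataFrame, n: int) -> dict:
    """Build one ds/y frame per category for the n highest-volume categories."""
    daily = df.groupby(["family", "date"], as_index=False)["sales"].sum()
    totals = daily.groupby("family")["sales"].sum().sort_values(ascending=False)
    keep = totals.head(n).index
    result = {}
    for family, group in daily[daily["family"].isin(keep)].groupby("family"):
        frame = group[["date", "sales"]].rename(columns={"date": "ds", "sales": "y"})
        result[family] = frame.sort_values("ds").reset_index(drop=True)
    return result

# Made-up sales for three categories over two days.
demo = pd.DataFrame(
    {
        "date": pd.to_datetime(["2017-01-01", "2017-01-02"] * 3),
        "family": ["PRODUCE", "PRODUCE", "DAIRY", "DAIRY", "BOOKS", "BOOKS"],
        "sales": [60.0, 40.0, 35.0, 25.0, 1.0, 0.0],
    }
)

per_category = top_category_series(demo, n=2)
print(sorted(per_category))
```

Each frame in the returned dictionary can then be passed to its own model, so that seasonal patterns in one category do not leak into another.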

Future Steps in the Series

  • Upcoming Videos:
    • Video 2: Automate the forecasting pipeline for all categories.
    • Video 3: Deep dive into the Prophet model and its mathematics.

Practical Example: Demand Forecasting

  • Model used: Prophet
  • Data Preparation:
    • Use daily sales data for the produce category over a specified date range.
    • Incorporate holiday data for seasonality effects.
  • Back Testing: Train the model on historical data, predict the next 30 days, and compare the predictions against the actual values held out for that period.
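
A rough sketch of the back test follows, assuming a ds/y frame as input. The helper names split_holdout and forecast_holdout are hypothetical, and the 100-day series is synthetic. The Prophet import sits inside the function so the split can run without the prophet package installed.

```python
import pandas as pd

def split_holdout(series: pd.DataFrame, horizon_days: int = 30):
    """Hold out the final `horizon_days` days of a ds/y frame for back testing."""
    series = series.sort_values("ds").reset_index(drop=True)
    cutoff = series["ds"].max() - pd.Timedelta(days=horizon_days)
    train = series[series["ds"] <= cutoff]
    test = series[series["ds"] > cutoff]
    return train, test

def forecast_holdout(train: pd.DataFrame, test: pd.DataFrame, holidays=None):
    """Fit Prophet on `train` and predict the dates in `test`.

    Requires the `prophet` package. `holidays` is an optional frame with
    `holiday` and `ds` columns.
    """
    from prophet import Prophet  # imported lazily so the split works alone

    model = Prophet(holidays=holidays)
    model.fit(train)
    forecast = model.predict(test[["ds"]])
    return forecast[["ds", "yhat"]]

# Synthetic 100-day series, for illustration only.
demo = pd.DataFrame(
    {"ds": pd.date_range("2017-01-01", periods=100, freq="D"), "y": range(100)}
)
train, test = split_holdout(demo)
print(len(train), len(test))
```

With prophet installed, forecast_holdout(train, test) returns the predicted yhat values for the 30 held-out days, which can then be scored against test["y"].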

Model Evaluation Metrics

  • MAPE: Mean Absolute Percentage Error, the average absolute forecast error expressed as a percentage of the actual value; used to assess model accuracy.
  • Results: Initial accuracy observed at 93%.
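
MAPE can be computed as in the sketch below. Skipping rows whose actual value is zero is an assumption made here to avoid division by zero, not something the notes specify, and the numbers are made up.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Rows where the actual value is 0 are skipped, since the percentage
    error is undefined there.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

# Made-up values: errors of 10%, 5%, and 0% average to about 5%.
actual = [100.0, 200.0, 400.0]
predicted = [90.0, 210.0, 400.0]
error = mape(actual, predicted)
print(round(error, 2))
```

If accuracy is read as 100 minus MAPE, which is an assumption about how the figure was reported, a MAPE near 7% would correspond to the 93% quoted above.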

Cross Validation in Time Series

  • Purpose: Evaluate the model's performance over different time periods.
  • Procedure: Split the dataset at successive cutoff dates; for each cutoff, train on the data before it, predict the period after it, and assess accuracy.
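
The splitting procedure can be sketched without any forecasting library, as below. The function name rolling_folds, its parameters, and the synthetic series are illustrative assumptions. Prophet also ships its own helper for this, prophet.diagnostics.cross_validation, which takes initial, period, and horizon arguments.

```python
import pandas as pd

def rolling_folds(series, initial_days, horizon_days, step_days):
    """Yield (train, test) pairs with an expanding training window."""
    series = series.sort_values("ds").reset_index(drop=True)
    start, end = series["ds"].min(), series["ds"].max()
    horizon = pd.Timedelta(days=horizon_days)
    # The first cutoff is the last day of the initial training window.
    cutoff = start + pd.Timedelta(days=initial_days - 1)
    while cutoff + horizon <= end:
        train = series[series["ds"] <= cutoff]
        test = series[(series["ds"] > cutoff) & (series["ds"] <= cutoff + horizon)]
        yield train, test
        cutoff = cutoff + pd.Timedelta(days=step_days)

# Synthetic 100-day series, for illustration only.
demo = pd.DataFrame(
    {"ds": pd.date_range("2017-01-01", periods=100, freq="D"), "y": range(100)}
)
folds = list(rolling_folds(demo, initial_days=40, horizon_days=30, step_days=15))
for train, test in folds:
    print(len(train), len(test))
```

Each test window always lies strictly after its training window, so no future information leaks into the fit.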

Data Integrity Issues

  • Common Problems: Systematic errors in data (e.g., zero sales on certain dates).
  • Remediation: Investigate and clean anomalies in the data.
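
A simple way to surface the zero-sales dates for investigation is sketched below, assuming a ds/y frame. The helper names are hypothetical and the data is made up. Whether a zero is a recording error or a genuine closure still has to be checked by hand before anything is dropped.

```python
import pandas as pd

def zero_sales_days(series: pd.DataFrame) -> list:
    """Return the dates whose recorded value is exactly zero."""
    return series.loc[series["y"] == 0, "ds"].tolist()

def drop_zero_sales_days(series: pd.DataFrame) -> pd.DataFrame:
    """Remove zero-value rows once they are confirmed to be anomalies."""
    return series[series["y"] != 0].reset_index(drop=True)

# Made-up series with two suspicious zero days.
demo = pd.DataFrame(
    {
        "ds": pd.date_range("2017-01-01", periods=5, freq="D"),
        "y": [10.0, 0.0, 12.0, 0.0, 11.0],
    }
)

suspect = zero_sales_days(demo)
cleaned = drop_zero_sales_days(demo)
print(suspect)
print(len(cleaned))
```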

Parameter Tuning for Prophet Model

  • Key Parameters: Changepoint Prior Scale (changepoint_prior_scale), Seasonality Prior Scale (seasonality_prior_scale).
  • Optimization: Use tuning results to improve model performance (e.g., reducing MAPE).
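
A grid search over the two parameters might be organized as in the sketch below. The candidate values are illustrative, the helper names are hypothetical, and fake_score is a stand-in with made-up behavior; a real scorer would fit Prophet(**params) and return the back-test MAPE.

```python
import itertools

def parameter_grid(options: dict) -> list:
    """Expand a dict of candidate values into a list of parameter dicts."""
    keys = list(options)
    combos = itertools.product(*(options[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]

def best_parameters(grid: list, score) -> dict:
    """Return the parameter set with the lowest score (for example, MAPE)."""
    return min(grid, key=score)

# Illustrative candidate values, not recommendations from the video.
options = {
    "changepoint_prior_scale": [0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.1, 1.0, 10.0],
}
grid = parameter_grid(options)

def fake_score(params: dict) -> float:
    """Stand-in scorer: pretends the error is lowest at (0.1, 1.0)."""
    return (
        abs(params["changepoint_prior_scale"] - 0.1)
        + abs(params["seasonality_prior_scale"] - 1.0)
    )

best = best_parameters(grid, fake_score)
print(len(grid), best)
```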

Conclusion

  • Next Steps: Upcoming videos will cover automation and deeper insights into the Prophet model.
  • Call to Action: Subscribe for more content and engagement with the series.