Forecasting Techniques in Data Science

Aug 6, 2024

30 Days of Data Series: Forecasting Pipeline

Introduction

Objective of the Series: Understand data science processes, the mathematical models behind them, and practical code for real projects.
Engagement: Students are encouraged to ask questions and request specific videos.
Structure: 30 videos covering feature correlation, segmentation pipelines, regression models, etc.
Today's Topic: Introduction to forecasting with a focus on a forecasting pipeline.

Presenter Introduction

Name: Priya
Current Role: Senior Data Scientist at Uber (Uber Advertising)
Background: Degree in astrophysics from UChicago, plus prior experience as a data scientist.

Forecasting Overview

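Definition: In this series, forecasting means time series modeling: predicting future values from a history of values recorded at successive points in time.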
Applications: Common in sales/demand forecasting (e.g., daily sales data).
Data Requirements: At minimum, each record needs a datetime stamp and a corresponding value.
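
To make the minimum requirement concrete, here is a small plain-Python sketch (no pandas) that turns (date, value) rows into a date-sorted series. The rows are invented for illustration, and the function name to_series is hypothetical. Prophet itself expects these two fields as DataFrame columns named "ds" and "y"; the last line only mirrors that naming with plain dicts.

```python
from datetime import date

# Hypothetical raw rows: (ISO date string, sales value). These two fields are
# the minimum a forecasting model needs.
raw_rows = [
    ("2017-01-03", 512.0),
    ("2017-01-01", 480.5),
    ("2017-01-02", 0.0),
]


def to_series(rows):
    """Parse (date string, value) pairs into a date-sorted list of (date, float)."""
    series = []
    for ds, y in rows:
        if ds is None or y is None:
            raise ValueError(f"row is missing a timestamp or a value: {(ds, y)!r}")
        series.append((date.fromisoformat(ds), float(y)))
    series.sort(key=lambda pair: pair[0])
    return series


series = to_series(raw_rows)

# Prophet looks for the same two fields in columns named "ds" (timestamp) and
# "y" (value); plain dicts stand in for a DataFrame here.
prophet_rows = [{"ds": d, "y": y} for d, y in series]
```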

Data Used for Forecasting

Dataset Source: Sales data from Kaggle for a retailer in Ecuador.
Data Characteristics: Daily sales data across various categories.
Initial Data Exploration:
- The minimum and maximum dates show the data spans 2013 to 2017.
- Recommended history: 2-3 years of data for accuracy, so that yearly seasonal patterns appear more than once.
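
The exploration step above can be reproduced with the standard library: find the first and last dates and compare the span with the 2-3 year guideline. The function names and the 2-year default threshold are illustrative choices, not part of the original pipeline, and the sample dates are made up.

```python
from datetime import date


def history_span(dates):
    """Return (first date, last date, span in years) for a collection of dates."""
    first, last = min(dates), max(dates)
    return first, last, (last - first).days / 365.25


def enough_history(dates, min_years=2.0):
    """True if the dates cover at least min_years (the notes suggest 2-3 years)."""
    return history_span(dates)[2] >= min_years


# Illustrative dates only; a real check would pass every date in the dataset.
sample_dates = [date(2015, 6, 30), date(2013, 1, 1), date(2017, 1, 1)]
first, last, years = history_span(sample_dates)
```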

Data Processing Steps

Data Import: Load the data and define helper functions for checking missing data and for calculating error metrics such as MAPE.
Data Aggregation: Aggregate daily sales by product category (the "family" column).
Visualize Data: Plot data to understand trends and seasonality.
Model Selection: Start with the Prophet model for forecasting.
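
The aggregation and missing-data steps can be sketched in plain Python: sum sales across stores for each family and day, then list calendar days with no record. The row layout (date, store number, family, sales) is an assumption modeled on the description above, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical store-level rows: (date, store number, family, sales).
rows = [
    (date(2017, 1, 1), 1, "PRODUCE", 10.0),
    (date(2017, 1, 1), 2, "PRODUCE", 5.0),
    (date(2017, 1, 2), 1, "PRODUCE", 7.0),
    (date(2017, 1, 4), 1, "PRODUCE", 9.0),
    (date(2017, 1, 1), 1, "DAIRY", 3.0),
]


def aggregate_by_family(rows):
    """Sum sales across stores, giving {family: {date: total daily sales}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for day, _store, family, sales in rows:
        totals[family][day] += sales
    return {family: dict(by_day) for family, by_day in totals.items()}


def missing_dates(daily):
    """List calendar days between the first and last date that have no record."""
    start, end = min(daily), max(daily)
    gaps, day = [], start
    while day <= end:
        if day not in daily:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


by_family = aggregate_by_family(rows)
```

With pandas, a similar aggregation would be df.groupby(["date", "family"])["sales"].sum(), assuming columns with those names.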

Time Series Modeling

Data Aggregation: Focus on high-volume sales series for better accuracy; they are typically less noisy than sparse, low-volume ones.
Modeling Approach: Fit one model per category, because seasonal patterns in sales differ between categories.
Trend Analysis: Identify which categories follow a roughly linear trend and which show irregular trends.
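
One way to act on these points, sketched under assumptions: rank categories by total sales to pick the high-volume ones, then fit a least-squares line to each category's daily totals as a rough indicator of a linear trend. The category names and numbers are invented, and top_families and trend_slope are hypothetical helpers; in the pipeline each selected category would then get its own model.

```python
from datetime import date


def top_families(by_family, n):
    """Names of the n families with the largest total sales."""
    return sorted(by_family, key=lambda f: sum(by_family[f].values()), reverse=True)[:n]


def trend_slope(daily):
    """Ordinary least-squares slope of sales per day for {date: sales}."""
    days = sorted(daily)
    xs = [(d - days[0]).days for d in days]
    ys = [daily[d] for d in days]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical daily totals per category.
by_family = {
    "GROCERY": {date(2017, 1, 1): 100.0, date(2017, 1, 2): 110.0, date(2017, 1, 3): 120.0},
    "PRODUCE": {date(2017, 1, 1): 50.0, date(2017, 1, 2): 40.0, date(2017, 1, 3): 60.0},
    "BOOKS": {date(2017, 1, 1): 1.0, date(2017, 1, 2): 0.0, date(2017, 1, 3): 1.0},
}
selected = top_families(by_family, 2)
slopes = {family: trend_slope(by_family[family]) for family in selected}
```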

Future Steps in the Series

Upcoming Videos:
- Video 2: Automate the forecasting pipeline for all categories.
- Video 3: Deep dive into the Prophet model and its mathematics.

Practical Example: Demand Forecasting

Model Used: Prophet
Data Preparation:
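- Use daily sales for the produce category over a chosen date range.
- Incorporate holiday data so the model can capture holiday and calendar effects on sales.
Back Testing: Train the model on historical data up to a cutoff date, predict the next 30 days, and compare the predictions with actual sales over those days.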
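
The back-testing split can be sketched as follows: everything before the cutoff is training data, and the 30 days starting at the cutoff are held out for comparison. The sample series and the cutoff date are hypothetical, as is the helper name backtest_split.

```python
from datetime import date, timedelta


def backtest_split(daily, cutoff, horizon_days=30):
    """Split {date: value} into training data (before cutoff) and a holdout window."""
    end = cutoff + timedelta(days=horizon_days)
    train = {d: v for d, v in daily.items() if d < cutoff}
    test = {d: v for d, v in daily.items() if cutoff <= d < end}
    return train, test


# Hypothetical 90 days of sales with a simple upward pattern.
start = date(2017, 1, 1)
daily = {start + timedelta(days=i): 100.0 + i for i in range(90)}
train, test = backtest_split(daily, cutoff=date(2017, 3, 2))
```

With Prophet, the training part would be passed to Prophet().fit(...) as a "ds"/"y" DataFrame (holidays can be supplied through the holidays argument of Prophet), and the 30-day forecast produced with make_future_dataframe(periods=30) followed by predict(...).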

Model Evaluation Metrics

MAPE: Mean Absolute Percentage Error, the average of |actual - forecast| / |actual| over the forecast period, expressed as a percentage; lower is better.
Results: Initial accuracy observed at 93%.
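
A minimal MAPE helper, as the metric is defined above. Reporting accuracy as 100% minus MAPE is an assumed convention here (under it, 93% accuracy would mean a MAPE of about 7%). Because MAPE divides by the actual value, days with zero sales make it undefined; this sketch raises an error rather than silently skipping them.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    MAPE = 100 / n * sum(|actual_t - forecast_t| / |actual_t|)
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and the same length")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs(a - f) / abs(a)
    return 100.0 * total / len(actual)


def accuracy(actual, forecast):
    """Accuracy as 100% minus MAPE (an assumed reporting convention)."""
    return 100.0 - mape(actual, forecast)


# Example: both points are off by 10%, so MAPE is 10% and accuracy is 90%.
example_mape = mape([100.0, 200.0], [90.0, 220.0])
```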

Cross Validation in Time Series

Purpose: Evaluate the model's performance over different time periods.
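Procedure: Choose a series of cutoff dates; at each cutoff, train on the data before it, forecast the following window, and measure accuracy, so performance is assessed over many periods instead of one.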
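
The procedure can be sketched as rolling-origin evaluation: generate cutoffs, train on data before each one, forecast the next window, and score it. The forecaster and scorer passed in are hypothetical stand-ins (a naive "repeat the last value" forecast and mean absolute error). Prophet ships its own version in prophet.diagnostics.cross_validation (with initial, period, and horizon arguments) plus performance_metrics; note that Prophet places its cutoffs working backward from the end of the data, whereas this sketch steps forward from the start.

```python
from datetime import date, timedelta


def rolling_cutoffs(first, last, initial_days, period_days, horizon_days):
    """Cutoffs for rolling-origin evaluation, moving forward in time.

    Each cutoff has initial_days of history before it and a full window of
    horizon_days (cutoff through cutoff + horizon_days - 1) inside the data.
    """
    cutoffs = []
    cutoff = first + timedelta(days=initial_days)
    while cutoff + timedelta(days=horizon_days - 1) <= last:
        cutoffs.append(cutoff)
        cutoff += timedelta(days=period_days)
    return cutoffs


def cross_validate(daily, cutoffs, horizon_days, forecast_fn, score_fn):
    """Train on data before each cutoff, forecast the next window, and score it."""
    scores = []
    for cutoff in cutoffs:
        train = {d: v for d, v in daily.items() if d < cutoff}
        window = [cutoff + timedelta(days=i) for i in range(horizon_days)]
        actual = [daily[d] for d in window]
        scores.append(score_fn(actual, forecast_fn(train, window)))
    return scores


# Hypothetical stand-ins: a naive "repeat the last value" forecaster and MAE.
def naive_forecast(train, window):
    return [train[max(train)]] * len(window)


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


daily = {date(2017, 1, 1) + timedelta(days=i): 50.0 for i in range(365)}
cutoffs = rolling_cutoffs(date(2017, 1, 1), date(2017, 12, 31), 180, 30, 30)
scores = cross_validate(daily, cutoffs, 30, naive_forecast, mean_abs_error)
```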

Data Integrity Issues

Common Problems: Systematic errors in data (e.g., zero sales on certain dates).
Remediation: Investigate these anomalies and clean them (for example, by removing or masking the affected dates) before training.
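
A heuristic sketch for spotting systematic zeros: list zero-sales dates, flag calendar days that are zero in several different years, and mask those values as missing. Whether a zero is a genuine outcome or a data error still has to be investigated; the "same day every year" rule is only one assumed heuristic, and the sample data is invented. Prophet's documentation notes that it can handle missing values in the history, which is why masking (rather than deleting) is shown here.

```python
from collections import Counter
from datetime import date


def zero_sales_dates(daily):
    """Sorted dates whose recorded sales are exactly zero."""
    return sorted(d for d, v in daily.items() if v == 0)


def recurring_zero_days(daily, min_years=2):
    """(month, day) pairs with zero sales in at least min_years different years.

    A zero that repeats on the same calendar day every year hints at a
    systematic cause rather than random noise.
    """
    counts = Counter((d.month, d.day) for d in zero_sales_dates(daily))
    return sorted(md for md, n in counts.items() if n >= min_years)


def mask_dates(daily, bad_dates):
    """Replace values on flagged dates with None so they can be treated as missing."""
    bad = set(bad_dates)
    return {d: (None if d in bad else v) for d, v in daily.items()}


# Hypothetical daily totals: two New Year's Day zeros and one isolated zero.
daily = {
    date(2015, 1, 1): 0.0, date(2015, 1, 2): 120.0,
    date(2016, 1, 1): 0.0, date(2016, 1, 2): 130.0,
    date(2016, 3, 10): 0.0, date(2016, 3, 11): 95.0,
}
zeros = zero_sales_dates(daily)
recurring = recurring_zero_days(daily)
cleaned = mask_dates(daily, [d for d in zeros if (d.month, d.day) in recurring])
```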

Parameter Tuning for Prophet Model

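Key Parameters: changepoint_prior_scale (how flexible the trend is allowed to be) and seasonality_prior_scale (how strongly the seasonal components can fit the data).
Optimization: Try combinations of these parameters and keep the one that gives the lowest cross-validated MAPE.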
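
The tuning step can be sketched as a grid search: evaluate every parameter combination and keep the one with the lowest score. The candidate values are illustrative (Prophet's defaults are changepoint_prior_scale=0.05 and seasonality_prior_scale=10.0), and the evaluate callable is hypothetical. In real use it would build Prophet(**params), fit it, run prophet.diagnostics.cross_validation, and return the mean "mape" from performance_metrics; a toy function stands in here so the sketch runs on its own.

```python
from itertools import product

# Illustrative candidate values for the two Prophet parameters.
param_grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
}


def grid_search(param_grid, evaluate):
    """Try every combination and return (best params, best score, all results).

    evaluate is a hypothetical callable mapping a params dict to a score
    where lower is better, e.g. cross-validated MAPE.
    """
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, evaluate(params)))
    best_params, best_score = min(results, key=lambda r: r[1])
    return best_params, best_score, results


# Toy stand-in for "fit Prophet and cross-validate": lowest at (0.1, 1.0).
def toy_evaluate(params):
    return abs(params["changepoint_prior_scale"] - 0.1) + abs(params["seasonality_prior_scale"] - 1.0)


best_params, best_score, results = grid_search(param_grid, toy_evaluate)
```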

Conclusion

Next Steps: Upcoming videos will cover automation and deeper insights into the Prophet model.
Call to Action: Subscribe for more content and stay engaged with the series.
