Forecasting Techniques in Data Science
Aug 6, 2024
30 Days of Data Series: Forecasting Pipeline
Introduction
Objective of the Series
: Understand data science, processes, mathematical models, and actionable coding for projects.
Engagement
: Students are encouraged to ask questions and request specific videos.
Structure
: 30 videos covering feature correlation, segmentation pipelines, regression models, etc.
Today's Topic
: Introduction to forecasting with a focus on a forecasting pipeline.
Presenter Introduction
Name
: Priya
Current Role
: Senior Data Scientist at Uber (Uber Advertising)
Background
: Degree in Astrophysics from UChicago, experience as a data scientist.
Forecasting Overview
Definition
: Forecasting is treated here as time series modeling, that is, predicting future values from past, time-ordered observations.
Applications
: Common in sales/demand forecasting (e.g., daily sales data).
Data Requirements
: Minimum requirements include a datetime stamp and a corresponding value.
Data Used for Forecasting
Dataset Source
: Sales data from Kaggle (store in Ecuador).
Data Characteristics
: Daily sales data across various categories.
Initial Data Exploration
:
The data spans 2013 to 2017 (minimum to maximum date).
Recommended history: 2-3 years of data for an accurate forecast.
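The date-range check above can be sketched in a few lines. This is a minimal illustration using only the standard library; the rows and their values are hypothetical, not taken from the Kaggle dataset, and `date_span` is a name introduced here.

```python
from datetime import date

# Hypothetical rows: (date, sales value), the minimum a forecast needs.
rows = [
    (date(2013, 1, 1), 0.0),
    (date(2015, 6, 15), 410.5),
    (date(2017, 8, 15), 532.0),
]

def date_span(rows):
    """Return (min_date, max_date, span_in_years) for (date, value) rows."""
    dates = [d for d, _ in rows]
    first, last = min(dates), max(dates)
    years = (last - first).days / 365.25
    return first, last, years

first, last, years = date_span(rows)
print(f"{first} to {last}: {years:.1f} years of history")
```

If the span comes out well under the recommended 2-3 years, that is a signal to gather more history before modeling.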
Data Processing Steps
Data Import
: Load the data and define helper functions for checking missing data and calculating metrics (MAPE).
Data Aggregation
: Aggregate by the category (family) for daily sales.
Visualize Data
: Plot data to understand trends and seasonality.
Model Selection
: Start with the Prophet model for forecasting.
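The missing-data check and the aggregation by family could look roughly like the following. It is a sketch under assumptions: the record layout (date, store id, family, sales), the sample values, and the helper names `aggregate_daily` and `missing_dates` are all hypothetical, and the video itself most likely uses a dataframe library rather than plain Python.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical records: (date, store id, family, sales).
records = [
    (date(2017, 1, 1), 1, "PRODUCE", 100.0),
    (date(2017, 1, 1), 2, "PRODUCE", 50.0),
    (date(2017, 1, 1), 1, "DAIRY", 30.0),
    (date(2017, 1, 3), 1, "PRODUCE", 120.0),
]

def aggregate_daily(records):
    """Sum sales across stores for each (family, date) pair."""
    totals = defaultdict(float)
    for day, _store, family, sales in records:
        totals[(family, day)] += sales
    return dict(totals)

def missing_dates(days):
    """Return calendar days absent between the first and last observed day."""
    present = set(days)
    first, last = min(present), max(present)
    n_days = (last - first).days + 1
    all_days = (first + timedelta(days=i) for i in range(n_days))
    return [d for d in all_days if d not in present]

daily = aggregate_daily(records)
produce_days = [day for (family, day) in daily if family == "PRODUCE"]
gaps = missing_dates(produce_days)
print(gaps)
```

Gaps found this way are worth resolving before plotting, since a missing day and a zero-sales day mean different things to a model.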
Time Series Modeling
Data Aggregation
: Focus on high-volume sales metrics for better accuracy.
Modeling Approach
: One model per category due to seasonal differences in sales.
Trend Analysis
: Identify linear and irregular trends across different categories.
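One way to picture "one model per category, starting with the high-volume ones" is the sketch below. The data, the helper names, and especially `fit_one_model` are hypothetical; `fit_one_model` is a placeholder that returns the mean sales level and stands in for whatever real model is fitted per family.

```python
from collections import defaultdict

# Hypothetical daily totals: (family, day index, sales).
daily = [
    ("PRODUCE", 0, 900.0), ("PRODUCE", 1, 950.0),
    ("DAIRY", 0, 400.0), ("DAIRY", 1, 420.0),
    ("BOOKS", 0, 2.0), ("BOOKS", 1, 0.0),
]

def series_by_family(daily):
    """Group rows into one time-ordered series per family."""
    series = defaultdict(list)
    for family, day, sales in daily:
        series[family].append((day, sales))
    return {f: sorted(points) for f, points in series.items()}

def top_families(series, n):
    """Return the n families with the largest total sales volume."""
    volume = {f: sum(s for _, s in pts) for f, pts in series.items()}
    return sorted(volume, key=volume.get, reverse=True)[:n]

def fit_one_model(points):
    """Placeholder for a real fit: here just the mean sales level."""
    return sum(s for _, s in points) / len(points)

series = series_by_family(daily)
models = {f: fit_one_model(series[f]) for f in top_families(series, n=2)}
print(models)
```

Keeping a separate entry per family means each category can later get its own seasonality and trend settings.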
Future Steps in the Series
Upcoming Videos
:
Video 2: Automate the forecasting pipeline for all categories.
Video 3: Deep dive into the Prophet model and its mathematics.
Practical Example: Demand Forecasting
Model used: Prophet
Data Preparation
:
Use daily sales data for produce from a specified date range.
Incorporate holiday data for seasonality effects.
Back Testing
: Train the model on historical data and predict the next 30 days.
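A back test of this kind might be sketched as follows. The helper names and row layouts are assumptions made for illustration; the Prophet calls used (`Prophet(holidays=...)`, `fit`, `make_future_dataframe`, `predict`) are part of its public API, but this is not necessarily how the video wires them together. The Prophet and pandas imports are kept inside the function so the splitting helper can be used on its own.

```python
def split_backtest(rows, horizon=30):
    """Hold out the last `horizon` rows of a date-sorted series."""
    rows = sorted(rows)
    if len(rows) <= horizon:
        raise ValueError("need more history than the forecast horizon")
    return rows[:-horizon], rows[-horizon:]

def forecast_with_prophet(train_rows, holiday_rows, horizon=30):
    """Fit Prophet on (date, sales) rows and forecast `horizon` days ahead.

    `holiday_rows` is a hypothetical list of (date, holiday name) pairs.
    """
    import pandas as pd
    from prophet import Prophet

    # Prophet expects the columns `ds` (timestamp) and `y` (value).
    train = pd.DataFrame(train_rows, columns=["ds", "y"])
    train["ds"] = pd.to_datetime(train["ds"])

    holidays = None
    if holiday_rows:
        holidays = pd.DataFrame(holiday_rows, columns=["ds", "holiday"])
        holidays["ds"] = pd.to_datetime(holidays["ds"])

    model = Prophet(holidays=holidays)
    model.fit(train)
    future = model.make_future_dataframe(periods=horizon)  # daily by default
    forecast = model.predict(future)
    # The last `horizon` rows are the days after the training window.
    return forecast[["ds", "yhat"]].tail(horizon)
```

The held-out rows from `split_backtest` are then compared with the `yhat` column to score the forecast.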
Model Evaluation Metrics
MAPE
: Mean Absolute Percentage Error for model accuracy assessment.
Results
: Initial accuracy observed at 93%.
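For reference, MAPE can be computed as below. The sample numbers are made up, and skipping days with zero actual sales is only one possible convention, chosen here because the percentage error is undefined on those days. The notes do not say how the 93% accuracy figure was derived from MAPE, so no conversion is shown.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Days with zero actual sales are skipped because the percentage error
    is undefined there; that is one convention among several.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

# Hypothetical actuals and forecasts for four days.
actual = [100.0, 200.0, 0.0, 400.0]
forecast = [110.0, 180.0, 5.0, 400.0]
error = mape(actual, forecast)
print(f"MAPE: {error:.2f}%")
```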
Cross Validation in Time Series
Purpose
: Evaluate the model's performance over different time periods.
Procedure
: Split the dataset into successive training and test windows, forecasting each test window and assessing accuracy on it.
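The splitting procedure can be illustrated with an expanding-window scheme, sketched below. The function name and the window sizes are hypothetical. Prophet also ships its own `prophet.diagnostics.cross_validation` helper, which takes `initial`, `period`, and `horizon` arguments; whether the video uses it is not stated in these notes.

```python
def rolling_origin_splits(n_obs, initial, horizon, step):
    """Yield (train_indices, test_indices) for expanding-window backtests."""
    cutoff = initial
    while cutoff + horizon <= n_obs:
        # Train on everything before the cutoff, test on the next `horizon`.
        yield range(0, cutoff), range(cutoff, cutoff + horizon)
        cutoff += step

# Hypothetical setup: 100 daily observations, 30-day forecasts.
splits = list(rolling_origin_splits(n_obs=100, initial=40, horizon=30, step=15))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

Each test window always lies after its training window, so the model never sees the future it is asked to predict.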
Data Integrity Issues
Common Problems
: Systematic errors in data (e.g., zero sales on certain dates).
Remediation
: Investigate and clean anomalies in the data.
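A first pass at spotting systematic zero-sales dates might look like this. The rows and helper names are hypothetical; the idea is simply to list zero-sales days and then check whether the same calendar day recurs across years, which would point to a systematic cause such as a store closure rather than a random glitch.

```python
from datetime import date

# Hypothetical (date, sales) rows with a few zero-sales days.
rows = [
    (date(2016, 1, 1), 0.0),
    (date(2016, 1, 2), 310.0),
    (date(2016, 3, 5), 0.0),
    (date(2017, 1, 1), 0.0),
    (date(2017, 1, 2), 295.0),
]

def zero_sales_dates(rows):
    """Return the sorted dates on which recorded sales were exactly zero."""
    return sorted(day for day, sales in rows if sales == 0)

def recurring_month_days(dates, min_years=2):
    """Return (month, day) pairs that appear in at least `min_years` years."""
    years = {}
    for d in dates:
        years.setdefault((d.month, d.day), set()).add(d.year)
    return sorted(md for md, ys in years.items() if len(ys) >= min_years)

zeros = zero_sales_dates(rows)
recurring = recurring_month_days(zeros)
print(zeros, recurring)
```

Recurring dates are candidates for holiday handling, while one-off zeros are candidates for investigation and cleaning.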
Parameter Tuning for Prophet Model
Key Parameters
: Changepoint prior scale (`changepoint_prior_scale`) and seasonality prior scale (`seasonality_prior_scale`).
Optimization
: Use tuning results to improve model performance (e.g., reducing MAPE).
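Tuning these two parameters is often done with a small grid search, sketched below. `changepoint_prior_scale` and `seasonality_prior_scale` are real Prophet constructor arguments, but the grid values are only illustrative, and `toy_score` is a hypothetical stand-in for "fit Prophet with these parameters and return the back-test MAPE".

```python
from itertools import product

def grid_search(score, grid):
    """Return (best_params, best_score) minimizing `score` over the grid."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        result = score(params)
        if result < best_score:
            best_params, best_score = params, result
    return best_params, best_score

# Illustrative candidate values for the two Prophet parameters.
grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
}

def toy_score(params):
    # Stand-in for "fit Prophet with these params and return backtest MAPE".
    return (abs(params["changepoint_prior_scale"] - 0.1)
            + abs(params["seasonality_prior_scale"] - 1.0))

best_params, best_score = grid_search(toy_score, grid)
print(best_params, best_score)
```

In a real run, the scoring function would be the expensive step, so a small grid like this one keeps the number of model fits manageable.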
Conclusion
Next Steps
: Upcoming videos will cover automation and deeper insights into the Prophet model.
Call to Action
: Subscribe for more content and engagement with the series.