Gradient Boosting: Lecture Notes

Jul 15, 2024

Introduction

  • Presenter: Ranjan
  • Topic: Gradient Boosting
  • Context: AI Playlist on YouTube
  • Previous Topics Covered: Ensemble techniques including Voting, Bagging, Stacking, Pasting, and Adaptive Boosting (AdaBoost)

Gradient Boosting Overview

  • Types of Boosting: Four commonly cited boosting variants; AdaBoost was covered previously
  • Importance: Widely used in data science problems and competitions, common interview topic
  • Difference from Other Techniques: Boosting involves dependency between models, unlike Bagging where models are independent

Key Concepts

  • AdaBoost Recap: Weight Manipulation
    • Correct classification: decrease weight
    • Misclassification: increase weight
  • Gradient Boosting: Loss Function Optimization
    • No sample re-weighting
    • Each new model learns by optimizing a loss function; for squared-error loss, the signal it fits is simply the residual (actual - predicted)
    • Various loss functions can be used: L1 (absolute error), L2 (MSE), RMSE (the gradient view of these losses is sketched after this list)
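To make the contrast with AdaBoost concrete, the sketch below (plain NumPy, with invented illustrative values) shows how the learning signal in gradient boosting comes from the loss function rather than from sample weights: for squared-error (L2) loss the negative gradient is exactly the residual (actual - predicted), while for absolute-error (L1) loss it is only the sign of the residual.

```python
import numpy as np

def l2_negative_gradient(y, y_pred):
    # For squared-error (L2) loss 0.5 * (y - y_pred)**2, the negative gradient
    # with respect to y_pred is simply the residual (actual - predicted).
    return y - y_pred

def l1_negative_gradient(y, y_pred):
    # For absolute-error (L1) loss |y - y_pred|, the negative gradient is only
    # the sign of the residual, so large errors are not over-weighted.
    return np.sign(y - y_pred)

y = np.array([300.0, 450.0, 500.0])       # actual home prices (illustrative)
y_pred = np.full_like(y, y.mean())        # base model: predict the average
print(l2_negative_gradient(y, y_pred))    # approx. [-116.67, 33.33, 83.33]
print(l1_negative_gradient(y, y_pred))    # [-1., 1., 1.]
```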

Detailed Process of Gradient Boosting

  • Base Model: A simple starting model, typically the average of the target, that is only slightly better than random guessing
  • Errors & Residuals: Use of Residuals
    • First, use the base model for the initial prediction
    • Calculate errors (residuals) as actual - predicted
    • Residuals become the target for each subsequent model
  • Learning Rate: Determines the magnitude of the correction added by each subsequent model
  • Model Iteration: Continue until the number of estimators is reached or the residuals are close to zero (see the training-loop sketch below)
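A minimal training loop for this process, assuming squared-error loss and scikit-learn regression trees as the residual models (function names here are illustrative, not from the lecture):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=2):
    """Minimal gradient boosting for regression with squared-error loss (illustrative)."""
    base_pred = y.mean()                          # base model: just the average target
    pred = np.full(len(y), base_pred)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred                      # errors of the current ensemble
        if np.allclose(residuals, 0):             # stop early once residuals vanish
            break
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                    # next model learns the residuals
        pred = pred + learning_rate * tree.predict(X)   # small corrective step
        trees.append(tree)
    return base_pred, trees

def predict(X, base_pred, trees, learning_rate=0.1):
    # Final output = base model output + eta * RM1 output + eta * RM2 output + ...
    pred = np.full(len(X), base_pred)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred
```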

Example Walkthrough

  • Dataset: Home price data with variables like number of rooms, floor, and room size
  • Steps:
    1. Base Model: Calculate average dependent variable (home price)
    2. Calculate Residuals: Difference between actual values and predictions from the base model
    3. Train First Residual Model (RM1): Use residuals as target for RM1
    4. Update Predictions: Add RM1’s predictions to base model’s predictions
    5. Recalculate Residuals: Use new predictions to calculate residuals
    6. Repeat: Fit subsequent residual models (RM2, RM3, etc.) on the updated residuals until they are close to zero or the estimator limit is reached
  • Formula for Final Output (a worked code sketch of these steps follows the formula):

$$ \text{Final Output} = \text{Base Model Output} + \eta \times \text{RM1 Output} + \eta \times \text{RM2 Output} + \ldots $$
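The six steps can be traced in code on a small, hypothetical home-price table (all values invented for illustration); each commented line corresponds to one step of the walkthrough:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: [rooms, floor, room size] -> home price (illustrative units)
X = np.array([[2, 1, 60], [3, 2, 85], [3, 5, 90], [4, 3, 120]])
y = np.array([40.0, 55.0, 62.0, 80.0])

eta = 0.1                                    # learning rate
pred = np.full(len(y), y.mean())             # Step 1: base model = average price
residuals = y - pred                         # Step 2: residuals = actual - predicted

rm1 = DecisionTreeRegressor(max_depth=1).fit(X, residuals)   # Step 3: train RM1 on residuals
pred = pred + eta * rm1.predict(X)           # Step 4: add RM1's (scaled) predictions
residuals = y - pred                         # Step 5: recalculate residuals
print("residuals after RM1:", residuals)     # overall squared error is smaller than before
# Step 6: repeat with RM2, RM3, ... until residuals are close to zero
```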

Hyperparameters in Gradient Boosting

  • Learning Rate (η): Value between 0 and 1 that controls the contribution of each tree; lower values generally give better robustness and generalization
  • Number of Estimators: Number of decision trees used
  • Subsample: Fraction of data used in each iteration
  • Loss Function: Type of loss function (e.g., MSE, RMSE)
  • Maximum Features & Depth: Control overfitting by limiting the number of features considered at each split and the depth of each tree (see the scikit-learn example below)
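These hyperparameters map directly onto scikit-learn's GradientBoostingRegressor. A quick illustrative configuration follows; the settings are chosen for demonstration rather than tuned, and loss="squared_error" assumes a recent scikit-learn version:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data, just to have something to fit
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

gbr = GradientBoostingRegressor(
    learning_rate=0.05,      # eta: contribution of each tree (lower = more robust)
    n_estimators=300,        # number of sequential trees
    subsample=0.8,           # fraction of rows used to fit each tree
    loss="squared_error",    # loss function to optimize
    max_features="sqrt",     # features considered at each split
    max_depth=3,             # depth of each individual tree
    random_state=42,
)
gbr.fit(X_train, y_train)
print("R^2 on test data:", gbr.score(X_test, y_test))
```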

Summary

  • Gradient Boosting: Uses residuals from prior models as a corrective measure, iteratively reduces error
  • Final Steps: Add up the models to get final predictions, minimize error through iterations
  • Next Video: Practical implementation using a dataset

Conclusion

  • Engagement: Like, subscribe, press the bell icon, share the video
  • Call to Action: Stay tuned for the next video on using Gradient Boosting in a model