Jul 15, 2024

**Presenter:** Ranjan
**Topic:** Gradient Boosting
**Context:** AI Playlist on YouTube
**Previous Topics Covered:** Ensemble techniques including Voting, Bagging, Stacking, Pasting, and Adaptive Boosting (AdaBoost)

**Types of Boosting:** Four known types of boosting; AdaBoost covered previously
**Importance:** Widely used in data science problems and competitions; a common interview topic
**Difference from Other Techniques:** Boosting involves dependency between models (each model is trained on the errors of the previous one), unlike Bagging, where models are trained independently

**AdaBoost Recap: Weight Manipulation**
- Correct classification: decrease weight
- Misclassification: increase weight

**Gradient Boosting: Loss Function Optimization**
- No sample weights
- Learns by optimizing a loss function on the error (actual - predicted); see the derivation after this list
- Various loss functions can be used: L1, L2, MSE, RMSE
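Why residuals show up at all can be made explicit. For the squared-error (L2) loss, the negative gradient of the loss with respect to the current prediction is exactly the residual, so fitting each new model to the residuals is a step of gradient descent on the loss:

$$ L\big(y, F(x)\big) = \frac{1}{2}\big(y - F(x)\big)^2, \qquad -\frac{\partial L}{\partial F(x)} = y - F(x) = \text{residual} $$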

**Base Model:** A simple average model, just better than random guessing

**Errors & Residuals: Use of Residuals** (see the sketch after this list)
- First, use the base model to make an initial prediction
- Calculate errors (residuals) as actual - predicted
- Residuals are used as the target for subsequent models
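A minimal sketch of these two steps in Python; the prices are made-up illustrative values:

```python
import numpy as np

# Hypothetical home prices (the dependent variable)
actual = np.array([300.0, 450.0, 500.0, 650.0])

# Base model: predict the average price for every sample
base_prediction = actual.mean()        # 475.0

# Residuals = actual - predicted; these become the target
# for the next model in the sequence
residuals = actual - base_prediction   # [-175., -25., 25., 175.]
```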

**Learning Rate:** Determines the magnitude of the correction added by each subsequent model
**Model Iteration:** Continue until the number of estimators is reached or the residuals are minimized toward zero

**Dataset:** Home price data with variables like number of rooms, floor, and room size

**Steps** (see the sketch after this list):
1. **Base Model:** Calculate the average of the dependent variable (home price)
2. **Calculate Residuals:** Difference between actual values and the base model's predictions
3. **Train First Residual Model (RM1):** Use the residuals as the target for RM1
4. **Update Predictions:** Add RM1's predictions, scaled by the learning rate, to the base model's predictions
5. **Recalculate Residuals:** Use the new predictions to calculate fresh residuals
6. **Repeat:** Fit subsequent residual models (RM2, RM3, etc.) on the updated residuals until the residuals are minimized
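A from-scratch sketch of this whole loop, using scikit-learn decision trees as the residual models. The feature values and prices are hypothetical, and `learning_rate` and `n_estimators` are arbitrary illustrative settings, not tuned choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical home data: [number of rooms, floor, room size (sq m)]
X = np.array([[2, 1, 55],
              [3, 2, 70],
              [3, 5, 80],
              [4, 3, 95]])
y = np.array([300.0, 450.0, 500.0, 650.0])  # home prices

learning_rate = 0.1   # eta in the formula below
n_estimators = 100    # illustrative setting

# Step 1: base model = average home price
prediction = np.full(len(y), y.mean())
residual_models = []

for _ in range(n_estimators):
    # Steps 2 and 5: residuals of the current combined model
    residuals = y - prediction
    # Steps 3 and 6: fit the next residual model (RM1, RM2, ...) on them
    rm = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    residual_models.append(rm)
    # Step 4: add the scaled correction to the running prediction
    prediction += learning_rate * rm.predict(X)

print(prediction)  # approaches y as the residuals shrink toward zero
```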

**Formula for Final Output:**

$$ \text{Final Output} = \text{Base Model Output} + \eta \times \text{RM1 Output} + \eta \times \text{RM2 Output} + \ldots $$
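For example (made-up numbers): with a base prediction of 475, η = 0.1, and RM1 and RM2 predicting residuals of 170 and 150 for a given home, the running output is 475 + 0.1 × 170 + 0.1 × 150 = 507, inching toward the actual price as more trees are added.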

**Learning Rate (η):** Value between 0 and 1; controls the contribution of each tree. Lower values are preferable for robustness and generalization.
**Number of Estimators:** Number of decision trees used
**Subsample:** Fraction of the training data used in each iteration
**Loss Function:** Type of loss function (e.g., MSE, RMSE)
**Maximum Features & Depth:** Control overfitting by limiting the number of features considered and the depth of each decision tree
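These hyperparameters map directly onto scikit-learn's `GradientBoostingRegressor`; a sketch with illustrative, untuned values:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative values only; tune for a real dataset
model = GradientBoostingRegressor(
    learning_rate=0.1,     # η: contribution of each tree
    n_estimators=100,      # number of residual trees
    subsample=0.8,         # fraction of samples per iteration
    loss="squared_error",  # MSE-style loss
    max_features="sqrt",   # limit features considered per split
    max_depth=3,           # limit tree depth to control overfitting
)
# model.fit(X, y); preds = model.predict(X_new)
```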

**Gradient Boosting:** Uses residuals from prior models as a corrective measure and iteratively reduces error
**Final Steps:** Add up the models' outputs to get the final predictions, minimizing error across iterations
**Next Video:** Practical implementation using a dataset

**Engagement:** Like, subscribe, press the bell icon, and share the video
**Call to Action:** Stay tuned for the next video on using Gradient Boosting in a model