🌲

XGBoost and Its Trees for Regression

Jun 27, 2024

Introduction

  • Presenter: Josh Starmer
  • Topic: XGBoost Basics - Part 1 (Regression Trees)
  • Assumes familiarity with gradient boost for regression and regularization.
  • Split into three parts: Regression (Part 1), Classification (Part 2), Mathematical Details (Part 3).

XGBoost Overview

  • XGBoost: a comprehensive ML algorithm with many components.
  • Despite complexity, individual parts are simple to understand.
  • Designed for large, complex datasets (example used is simple for explanation).

Initial Steps in XGBoost

  1. Initial Prediction: Default is 0.5 for both regression and classification.
  2. Residual Calculation: Differences between the observed values and the predicted values (a sketch follows below).
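
A minimal sketch of these two steps, assuming a small made-up dataset whose observed values are chosen only so that the residuals match the ones quoted later in these notes:

```python
# Hypothetical observed values, chosen so the residuals match those
# quoted later in these notes.
observed = [-10.0, 7.0, 8.0, -7.0]

initial_prediction = 0.5  # XGBoost's default first prediction

# Residual = observed value - predicted value
residuals = [y - initial_prediction for y in observed]
print(residuals)  # [-10.5, 6.5, 7.5, -7.5]
```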

Building XGBoost Trees for Regression

Key Differences from Gradient Boost

  • Uses its own kind of regression tree (XGBoost trees)
  • Each tree starts as a single leaf containing all of the residuals

Calculating Similarity Score

  • Formula: Similarity Score = (sum of residuals)² / (number of residuals + lambda), where lambda is the regularization parameter
  • Example (with lambda = 0):
    • Positive and negative residuals partially cancel when summed
    • Root residuals (7.5, -7.5, 6.5, -10.5): sum = -4, so Similarity Score = (-4)² / 4 = 4 (see the sketch below)
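
A small sketch of the similarity score calculation, using the root residuals from the example (lambda = 0):

```python
def similarity_score(residuals, lam=0.0):
    # Similarity Score = (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

root_residuals = [7.5, -7.5, 6.5, -10.5]
print(similarity_score(root_residuals, lam=0.0))  # (-4)^2 / 4 = 4.0
```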

Splitting Nodes

  • Process: Split the observations into two leaves using candidate dosage thresholds
  • Example: Threshold Dosage < 15
    • Left leaf: the single residual -10.5 -> Similarity = 110.25
    • Right leaf: residuals 6.5, 7.5, -7.5 mostly cancel out -> Similarity = 14.08

Gain Calculation

  • Formula: Gain = Similarity(left leaf) + Similarity(right leaf) - Similarity(root)
  • Compare gains for different thresholds to choose the best split
    • Example: Dosage < 15 has highest gain (120.33)
  • Continue splitting each node, keeping the threshold with the highest gain (see the sketch below)
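
A sketch of the gain comparison across candidate thresholds; the residuals come from the notes above, while the dosage values and the 22.5 / 30 thresholds are assumed for illustration:

```python
def similarity_score(residuals, lam=0.0):
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, parent, lam=0.0):
    # Gain = Similarity(left) + Similarity(right) - Similarity(parent)
    return (similarity_score(left, lam)
            + similarity_score(right, lam)
            - similarity_score(parent, lam))

# Hypothetical (dosage, residual) pairs; the residuals match these notes,
# the dosage values are assumed for illustration.
data = [(10, -10.5), (20, 6.5), (25, 7.5), (35, -7.5)]
parent = [r for _, r in data]

for threshold in (15, 22.5, 30):  # candidate splits between adjacent dosages
    left = [r for d, r in data if d < threshold]
    right = [r for d, r in data if d >= threshold]
    print(f"Dosage < {threshold}: gain = {gain(left, right, parent):.2f}")
# Dosage < 15 gives the highest gain (120.33), so it becomes the first split.
```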

Pruning Trees

  • Pruning Process:
    1. Choose a tree-complexity threshold gamma (e.g., 130)
    2. Starting from the lowest branch, calculate Gain - gamma
    3. If the result is negative, remove the branch and test the branch above it; if it is positive, keep the branch and stop pruning
    4. Higher branches are only tested when the branch below them was removed (see the sketch below)
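
A rough sketch of that bottom-up check with gamma = 130; the branch gains listed are assumed example values, not figures quoted in these notes:

```python
gamma = 130  # user-chosen tree-complexity threshold

# Hypothetical gains, listed from the lowest branch up toward the root;
# the values are assumed for illustration.
branch_gains = [140.17, 120.33]

for g in branch_gains:
    if g - gamma < 0:
        print(f"{g} - {gamma} < 0: remove this branch, then test the branch above")
    else:
        print(f"{g} - {gamma} >= 0: keep this branch and stop pruning")
        break
# Because the lowest branch is kept, the branch above it is never tested,
# even though its gain (120.33) is below gamma.
```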

Regularization Parameter (Lambda)

  • Alters similarity scores and gain calculations, affecting pruning and tree structure.
  • Example: Lambda = 1 shrinks every similarity score (and therefore every gain), making pruning more aggressive (see the sketch below).
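
A small sketch of that effect on a single-residual leaf (the -10.5 leaf from the split example):

```python
def similarity_score(residuals, lam=0.0):
    return sum(residuals) ** 2 / (len(residuals) + lam)

leaf = [-10.5]  # a leaf holding a single residual
print(similarity_score(leaf, lam=0))  # 110.25
print(similarity_score(leaf, lam=1))  # 55.125

# With lambda = 1 every similarity score shrinks (leaves with few residuals
# shrink the most), so gains shrink, gain - gamma turns negative more often,
# and more branches get pruned.
```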

Output Values

  • Formula: Output Value = sum of residuals / (number of residuals + lambda)
  • Lambda pulls output values toward 0, reducing sensitivity to individual observations and outliers
  • Each leaf is assigned an output value, which is used to adjust the predictions (sketched below)
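
A sketch of the output value calculation for the same single-residual leaf, with and without regularization:

```python
def output_value(residuals, lam=0.0):
    # Output Value = sum of residuals / (number of residuals + lambda)
    return sum(residuals) / (len(residuals) + lam)

leaf = [-10.5]  # a leaf holding one extreme residual
print(output_value(leaf, lam=0))  # -10.5
print(output_value(leaf, lam=1))  # -5.25: lambda pulls the output toward 0,
# damping the influence of a single extreme observation.
```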

Predictions and Iterative Process

  • New prediction = initial prediction (0.5) + learning rate (eta, default 0.3) × the tree's output value
  • Residual Update: the new residuals are smaller than the originals
  • Repeat, building new trees on the updated residuals, until the residuals are very small or the maximum number of trees is reached (see the sketch below)
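
A sketch of one round of the prediction update, assuming an observation with observed value -10 that falls into the leaf with output value -10.5:

```python
eta = 0.3                 # learning rate, XGBoost's default
initial_prediction = 0.5

# Hypothetical observation that landed in the leaf with output value -10.5
observed = -10.0
leaf_output = -10.5

new_prediction = initial_prediction + eta * leaf_output
new_residual = observed - new_prediction
print(new_prediction)  # -2.65
print(new_residual)    # -7.35: smaller in magnitude than the original -10.5
```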

Summary

  • Steps to building XGBoost trees: calculate similarity scores -> split on the threshold with the highest gain -> prune using gamma -> calculate output values -> adjust predictions iteratively.
  • Increasing lambda makes pruning more likely and reduces tree complexity.
  • Values used here: Lambda = 0 in the examples, learning rate (eta) = 0.3 (the default), and gamma must be chosen by the user.
  • Part 2 will cover classification trees.

Conclusion

  • Encouragement to support StatQuest and subscribe for more content.