XGBoost Mathematical Formulation

Jul 19, 2024

Prerequisites

  • Adaboost Math (covered in Part 5)
  • Gradient Boost Math (covered in Part 6)
  • XGBoost Concepts (covered in Part 7)

Overview

  • XGBoost: An optimized implementation of gradient boosted trees
  • Builds on the 8-step gradient boosting algorithm with additional improvements
  • Regularized objective: Combines two components:
    • Training Loss: Measures how well the model fits the training data
    • Regularization Component: Measures the complexity of the trees

Loss Function

  • Uses squared loss: Sum of squared differences between predicted and actual values
  • Trade-off optimization between training loss and the regularization term (see the objective sketched below)
    • Training Loss Optimization: Higher accuracy on the training data
    • Regularization Optimization: Simpler, more generalized models for production
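
As a reference, the regularized objective described above can be sketched as follows; the notation (Y_hat_i for the prediction of example i, f_k for the k-th tree, K for the number of trees) is assumed here for illustration:

```latex
\mathrm{Obj} \;=\; \underbrace{\sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr)}_{\text{training loss}}
\;+\; \underbrace{\sum_{k=1}^{K} \Omega(f_k)}_{\text{regularization}},
\qquad
l\bigl(y_i, \hat{y}_i\bigr) \;=\; \bigl(y_i - \hat{y}_i\bigr)^2 \ \ \text{(squared loss)}
```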

Regularization Parameters

  • Number of Leaves
  • L2 Norm of Leaf Weight
  • Hyperparameters: Gamma and Lambda
  • Regularization formula (written out below) includes:
    • Number of leaves T, weighted by gamma
    • L2 norm of the leaf weights, weighted by 0.5 * lambda
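
Written out, with T the number of leaves and w_j the weight (score) of leaf j, the regularization term is:

```latex
\Omega(f) \;=\; \gamma\, T \;+\; \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2}
```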

Example

  • Sample tree with 2 split (internal) nodes and 3 leaves:
    • Weights: Leaf 1 = +2, Leaf 2 = 0.1, Leaf 3 = -1
    • Regularization: Gamma * 3 + 0.5 * Lambda * (W1^2 + W2^2 + W3^2) = 3 * Gamma + 2.505 * Lambda (checked numerically below)
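
A minimal numeric check of the example above; the values gamma = 1.0 and lambda = 1.0 are placeholders, since the lecture keeps the hyperparameters symbolic:

```python
# Regularization of the 3-leaf example tree: Omega = gamma * T + 0.5 * lambda * sum(w_j^2)
leaf_weights = [2.0, 0.1, -1.0]              # W1, W2, W3 from the example
gamma_, lambda_ = 1.0, 1.0                   # placeholder hyperparameter values

T = len(leaf_weights)                        # number of leaves = 3
l2_term = sum(w ** 2 for w in leaf_weights)  # 4 + 0.01 + 1 = 5.01
omega = gamma_ * T + 0.5 * lambda_ * l2_term # 3 * gamma + 2.505 * lambda

print(omega)                                 # approximately 5.505 with gamma = lambda = 1
```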

Training Loss Component

  • Represented as an additive model due to boosting
  • Follows a construction similar to AdaBoost and Gradient Boosting
  • Formula: Y_hat_i(t) = Y_hat_i(t-1) + f_t(x_i), i.e., the previous model's prediction plus the new tree's contribution (formalized below)
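
In symbols, with f_t denoting the tree added at iteration t, the additive update and the objective it is plugged into are:

```latex
\hat{y}_i^{(t)} \;=\; \hat{y}_i^{(t-1)} + f_t(x_i),
\qquad
\mathrm{Obj}^{(t)} \;=\; \sum_{i=1}^{n} l\bigl(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) \;+\; \Omega(f_t) \;+\; \text{constant}
```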

Taylor Approximation

  • Approximates differentiable functions
  • Formula: F(x + deltaX) ≈ F(x) + F'(x) * deltaX + 0.5 * F''(x) * deltaX^2
  • Applied to XGBoost’s objective function to obtain a quadratic approximation
  • Introduces g_i and h_i: First- and second-order derivatives of the loss with respect to the previous prediction (see the sketch after this list)
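
As a hedged sketch of what g_i and h_i look like for the squared loss named earlier (the function name below is illustrative, not part of the XGBoost API): for l(y, y_hat) = (y - y_hat)^2, differentiating with respect to the previous prediction gives g_i = 2 * (y_hat_i - y_i) and h_i = 2.

```python
import numpy as np

def squared_loss_grad_hess(y, y_pred_prev):
    """g_i and h_i for the squared loss l(y, yhat) = (y - yhat)^2,
    differentiated with respect to the previous round's prediction."""
    g = 2.0 * (y_pred_prev - y)    # first-order derivative g_i
    h = 2.0 * np.ones_like(y)      # second-order derivative h_i (constant for squared loss)
    return g, h

# Toy usage: derivatives for three examples given last round's predictions
y = np.array([1.0, 0.5, -2.0])
y_pred_prev = np.array([0.8, 0.7, -1.5])
g, h = squared_loss_grad_hess(y, y_pred_prev)
```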

Simplified Objective Function

  • Combines training loss and regularization
  • Uses per-leaf sums G_j and H_j of the g_i and h_i as notation
  • New objective function: Sum of quadratic terms, one per leaf
  • Optimal leaf weight derived as W_j = -G_j / (H_j + Lambda)
  • Substituting the optimal W’s back removes them from the objective, giving the minimum objective function: -0.5 * Sum_j G_j^2 / (H_j + Lambda) + Gamma * T (see the sketch below)
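
A minimal sketch of these two formulas, assuming G_j and H_j are the per-leaf sums of g_i and h_i; the function names and numbers are illustrative:

```python
import numpy as np

def optimal_leaf_weight(G_j, H_j, lambda_):
    """Optimal weight of leaf j: W_j = -G_j / (H_j + lambda)."""
    return -G_j / (H_j + lambda_)

def minimum_objective(G, H, lambda_, gamma_):
    """Minimum objective for a fixed tree structure:
    -0.5 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    G, H = np.asarray(G, dtype=float), np.asarray(H, dtype=float)
    return -0.5 * np.sum(G ** 2 / (H + lambda_)) + gamma_ * len(G)

# Toy usage with three leaves (illustrative per-leaf sums)
G, H = [4.0, -2.0, 1.0], [3.0, 2.0, 1.5]
weights = [optimal_leaf_weight(g, h, lambda_=1.0) for g, h in zip(G, H)]
score = minimum_objective(G, H, lambda_=1.0, gamma_=1.0)
```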

Tree Building Process

  • Greedy tree-growing algorithm
  • Starts with tree depth zero
  • Tries to add splits for each leaf node (greedy approach)
  • Compares the objective function “before” and “after” each candidate split
  • Gain function calculates the improvement from splitting a leaf (see the sketch after this list):
    • Gain = 0.5 * [G_L^2 / (H_L + Lambda) + G_R^2 / (H_R + Lambda) - (G_L + G_R)^2 / (H_L + H_R + Lambda)] - Gamma
    • Interpretation: (Score of left child + Score of right child) - (Score of the unsplit node) - penalty Gamma for the extra leaf
  • Stops splitting a leaf when the best candidate split has negative gain
  • Uses a sparsity-aware split-finding algorithm to handle missing and sparse values
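
A minimal sketch of the gain rule, assuming G_L, H_L and G_R, H_R are the summed statistics of the examples routed to the left and right children; the values are illustrative:

```python
def split_gain(G_L, H_L, G_R, H_R, lambda_, gamma_):
    """Gain of a candidate split:
    0.5 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma.
    A negative value means the split does not pay for the extra leaf."""
    def leaf_score(G, H):
        return G ** 2 / (H + lambda_)
    return 0.5 * (leaf_score(G_L, H_L) + leaf_score(G_R, H_R)
                  - leaf_score(G_L + G_R, H_L + H_R)) - gamma_

# Toy usage: a negative gain means this leaf should not be split
gain = split_gain(G_L=5.0, H_L=3.0, G_R=-4.0, H_R=2.0, lambda_=1.0, gamma_=1.0)
```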

Summary of Algorithm

  1. Start with weak learner F(xi)
  2. For loop: Build T trees using tree learning algorithm
  3. Use minimum objective function for each tree built
  4. Additively combine models
  5. Training stops when T trees have been built or an acceptable accuracy level is reached
  6. Final model: Additive combination of all models from the loop (a compact sketch follows this list)
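
A compact, self-contained sketch of this loop under simplifying assumptions: squared loss, and each “tree” reduced to a single leaf whose weight is the optimal W = -G / (H + Lambda). A real implementation would instead grow each tree greedily with the gain rule above; the function name and parameters here are illustrative only.

```python
import numpy as np

def boosting_loop_sketch(y, num_rounds=10, learning_rate=0.3, lambda_=1.0):
    """Illustrative additive training loop; each round fits a degenerate
    one-leaf 'tree' using the optimal-weight formula derived earlier."""
    y = np.asarray(y, dtype=float)
    y_pred = np.zeros_like(y)                # step 1: constant weak learner
    leaf_weights = []
    for t in range(num_rounds):              # step 2: build T models
        g = 2.0 * (y_pred - y)               # first-order statistics (squared loss)
        h = 2.0 * np.ones_like(y)            # second-order statistics
        w = -g.sum() / (h.sum() + lambda_)   # step 3: optimal weight for a single leaf
        leaf_weights.append(w)
        y_pred = y_pred + learning_rate * w  # step 4: additive combination
    return y_pred, leaf_weights              # step 6: final model = sum of all rounds

# Toy usage: predictions drift toward the mean of y as rounds accumulate
preds, ws = boosting_loop_sketch([1.0, 2.0, 3.0, 4.0], num_rounds=50)
```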

Recap

  • General objective function and squared loss
  • Trade-off optimization
  • Measure of tree complexity
  • Decomposition into an additive model
  • Taylor approximation for quadratic form
  • Deriving optimal weights and minimum objective function
  • Tree building and structure learning process

Next Steps

  • Next video: Illustrative example of XGBoost tree-building based on discussed formulations