XGBoost Mathematical Formulation - Lecture Notes
Prerequisites
AdaBoost Math (covered in Part 5)
Gradient Boost Math (covered in Part 6)
XGBoost Concepts (covered in Part 7)
Overview
XGBoost: An optimized implementation of gradient boosted trees
Builds on the 8-step gradient boosting algorithm with additional improvements
Regularized model: The objective combines two components:
Training Loss: Measures how well the model fits the training data
Regularization Component: Measures the complexity of the tree
Loss Function
Uses squared loss: Sum of squared differences between predicted and actual values
Trade-off optimization between training loss and regularization (see the formula below)
Optimizing Training Loss: Higher accuracy on the training data
Optimizing Regularization: Simpler models that generalize better in production
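For reference, a standard way to write this regularized objective (notation as in the XGBoost paper: n training points, K trees, f_k the k-th tree) is:

```latex
\text{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
l\left(y_i, \hat{y}_i\right) = \left(y_i - \hat{y}_i\right)^2 \ \text{(squared loss)}
```

The first sum is the training loss over all examples; the second sum penalizes the complexity of every tree in the ensemble.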
Regularization Parameters
Number of Leaves
L2 Norm of Leaf Weight
Hyperparameters: Gamma and Lambda
Regularization formula includes:
Number of leaves weighted by gamma
Squared L2 norm of the leaf weights, weighted by 0.5 * Lambda
Example
Sample tree with 2 split nodes and 3 leaves:
Weights: Leaf 1 = +2, Leaf 2 = 0.1, Leaf 3 = -1
Regularization: Gamma * 3 + 0.5 * Lambda * (W1^2 + W2^2 + W3^2) = 3 * Gamma + 0.5 * Lambda * 5.01 (computed in the sketch below)
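A minimal Python sketch of this calculation for the example tree (the Gamma and Lambda values here are hypothetical, chosen only for illustration):

```python
# Regularization term: Omega(f) = Gamma * T + 0.5 * Lambda * sum_j(w_j^2)
leaf_weights = [2.0, 0.1, -1.0]   # W1, W2, W3 from the example tree
gamma, lam = 1.0, 1.0             # hypothetical hyperparameter values

T = len(leaf_weights)                      # number of leaves (3)
l2_sq = sum(w * w for w in leaf_weights)   # 4 + 0.01 + 1 = 5.01
omega = gamma * T + 0.5 * lam * l2_sq
print(omega)                               # prints approximately 5.505
```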
Training Loss Component
Represented as an additive model due to boosting
Follows a construction similar to AdaBoost and Gradient Boost
Formula: Y_hat(t) = Y_hat(t-1) + f_t(x), i.e. the round-t prediction is the previous round's prediction plus the output of the newly added tree (written out below)
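In the notation used above, the additive step and the objective at round t can be written as (f_t is the new tree added at round t; the constant collects terms that do not depend on f_t):

```latex
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i),
\qquad
\text{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}
```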
Taylor Approximation
Approximates differentiable functions
Formula: F(x + deltaX) ≈ F(x) + F'(x) * deltaX + 0.5 * F''(x) * deltaX^2
Applied to XGBoost's objective function to obtain a quadratic approximation
Introduction of Gi and Hi terms: the first- and second-order derivatives of the loss with respect to the previous prediction (written out below)
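Applying the second-order Taylor expansion to the round-t objective (and dropping terms that do not depend on the new tree) gives the quadratic form the notes refer to, with Gi and Hi defined as derivatives of the loss at the previous prediction:

```latex
\text{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ G_i\, f_t(x_i) + \tfrac{1}{2} H_i\, f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
G_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
H_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}
```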
Simplified Objective Function
Combines training loss and regularization
Uses per-leaf summations Gj and Hj (sums of Gi and Hi over the instances falling into leaf j) for notation
New objective function: a sum of independent quadratic expressions, one per leaf
Optimal leaf weight Wj derived as -Gj / (Hj + Lambda)
Substituting the optimal weights back removes the W's and gives the minimum objective function: -0.5 * Sum_j Gj^2 / (Hj + Lambda) + Gamma * T (sketched below)
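A small Python sketch of these two results, using per-leaf sums Gj and Hj as defined above (function and variable names are my own):

```python
def optimal_leaf_weight(G_j, H_j, lam):
    """Wj* = -Gj / (Hj + Lambda): the weight minimizing the leaf's quadratic term."""
    return -G_j / (H_j + lam)

def minimum_objective(G_sums, H_sums, lam, gamma):
    """Obj* = -0.5 * sum_j Gj^2 / (Hj + Lambda) + Gamma * T,
    obtained by plugging the optimal weights back into the objective."""
    T = len(G_sums)  # number of leaves
    return -0.5 * sum(G * G / (H + lam) for G, H in zip(G_sums, H_sums)) + gamma * T
```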
Tree Building Process
Greedy tree-growing algorithm
Starts with tree depth zero
Tries to add splits for each leaf node (greedy approach)
Compares the objective function “before” and “after” a candidate split
Gain function calculates the gain from splitting (see the sketch after this list):
Gain = (score of left child + score of right child) - (score of the node if not split) - Gamma
Does not make the split if the best candidate split has negative gain (pruning)
Uses a sparsity-aware split-finding algorithm to handle missing and sparse feature values
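A sketch of the gain computation used to score a candidate split (GL, HL and GR, HR are the gradient/Hessian sums of the proposed left and right children; names are illustrative):

```python
def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    """Gain = 0.5 * [GL^2/(HL+Lambda) + GR^2/(HR+Lambda)
                     - (GL+GR)^2/(HL+HR+Lambda)] - Gamma.
    A negative gain means the split does not pay for the extra leaf,
    so the leaf is left unsplit."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma
```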
Summary of Algorithm
Start with weak learner F(xi)
For loop: Build T trees using the tree-learning algorithm
Each tree is built to minimize the (approximated) objective function derived above
Additively combine models
Training stops when T trees have been built or an acceptable accuracy level is reached
Final model: Additive combination of all the trees built in the loop (a usage sketch follows below)
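The hyperparameters discussed in these notes map directly onto parameters of the xgboost Python package; a usage sketch with illustrative, untuned values:

```python
import xgboost as xgb  # assumes the xgboost package is installed

model = xgb.XGBRegressor(
    n_estimators=100,              # T: number of trees built in the additive loop
    max_depth=3,                   # depth limit for each tree (weak learner)
    learning_rate=0.1,             # shrinkage applied to each new tree
    gamma=1.0,                     # minimum gain required to make a split (the Gamma term)
    reg_lambda=1.0,                # L2 regularization on leaf weights (the Lambda term)
    objective="reg:squarederror",  # squared loss, as used in these notes
)
# model.fit(X_train, y_train) would run the boosting loop described above.
```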
Recap
General objective function and squared loss
Trade-off optimization
Measure of tree complexity
Decomposition into an additive model
Taylor approximation for quadratic form
Deriving optimal weights and minimum objective function
Tree building and structure learning process
Next Steps
Next video: Illustrative example of XGBoost tree-building based on discussed formulations