XGBoost Mathematical Formulation

Jul 19, 2024

Prerequisites

  • Adaboost Math (covered in Part 5)
  • Gradient Boost Math (covered in Part 6)
  • XGBoost Concepts (covered in Part 7)

Overview

  • XGBoost: An optimized implementation of gradient boosted trees
  • Builds on the 8-step gradient boosting algorithm with additional improvements
  • Regularized objective: Combines two components:
    • Training Loss: Measures how well the model fits the training data
    • Regularization Component: Measures the complexity of the trees

Loss Function

  • Uses squared loss: Sum of squared differences between predicted and actual values
  • Trade-off optimization between training loss and the regularization term (see the objective sketched below)
    • Training Loss Optimization: Higher accuracy on the training data
    • Regularization Optimization: Simpler, more generalized models for production
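
As a reference, the regularized objective described above can be sketched as follows; the notation (Y_hat_i for the prediction of example i, f_k for the k-th tree, K for the number of trees) is assumed here for illustration:

```latex
\mathrm{Obj} \;=\; \underbrace{\sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr)}_{\text{training loss}}
\;+\; \underbrace{\sum_{k=1}^{K} \Omega(f_k)}_{\text{regularization}},
\qquad
l\bigl(y_i, \hat{y}_i\bigr) \;=\; \bigl(y_i - \hat{y}_i\bigr)^2 \ \ \text{(squared loss)}
```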

Regularization Parameters

  • Number of Leaves
  • L2 Norm of Leaf Weight
  • Hyperparameters: Gamma and Lambda
  • Regularization formula (written out below) includes:
    • Number of leaves T, weighted by gamma
    • L2 norm of the leaf weights, weighted by 0.5 * lambda
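
Written out, with T the number of leaves and w_j the weight (score) of leaf j, the regularization term is:

```latex
\Omega(f) \;=\; \gamma\, T \;+\; \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^{2}
```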

Example

  • Sample tree with 2 split (internal) nodes and 3 leaves:
    • Weights: Leaf 1 = +2, Leaf 2 = 0.1, Leaf 3 = -1
    • Regularization: Gamma * 3 + 0.5 * Lambda * (W1^2 + W2^2 + W3^2) = 3 * Gamma + 2.505 * Lambda (checked numerically below)
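
A minimal numeric check of the example above; the values gamma = 1.0 and lambda = 1.0 are placeholders, since the lecture keeps the hyperparameters symbolic:

```python
# Regularization of the 3-leaf example tree: Omega = gamma * T + 0.5 * lambda * sum(w_j^2)
leaf_weights = [2.0, 0.1, -1.0]              # W1, W2, W3 from the example
gamma_, lambda_ = 1.0, 1.0                   # placeholder hyperparameter values

T = len(leaf_weights)                        # number of leaves = 3
l2_term = sum(w ** 2 for w in leaf_weights)  # 4 + 0.01 + 1 = 5.01
omega = gamma_ * T + 0.5 * lambda_ * l2_term # 3 * gamma + 2.505 * lambda

print(omega)                                 # approximately 5.505 with gamma = lambda = 1
```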

Training Loss Component

  • Represented as an additive model due to boosting
  • Follows a construction similar to AdaBoost and Gradient Boosting
  • Formula: Y_hat_i(t) = Y_hat_i(t-1) + f_t(x_i), i.e., the previous model's prediction plus the new tree's contribution (formalized below)
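
In symbols, with f_t denoting the tree added at iteration t, the additive update and the objective it is plugged into are:

```latex
\hat{y}_i^{(t)} \;=\; \hat{y}_i^{(t-1)} + f_t(x_i),
\qquad
\mathrm{Obj}^{(t)} \;=\; \sum_{i=1}^{n} l\bigl(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) \;+\; \Omega(f_t) \;+\; \text{constant}
```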

Taylor Approximation

  • Approximates differentiable functions
  • Formula: F(x + deltaX) ≈ F(x) + F'(x) * deltaX + 0.5 * F''(x) * deltaX^2
  • Applied to XGBoost’s objective function to obtain a quadratic approximation
  • Introduces g_i and h_i: First- and second-order derivatives of the loss with respect to the previous prediction (see the sketch after this list)
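
As a hedged sketch of what g_i and h_i look like for the squared loss named earlier (the function name below is illustrative, not part of the XGBoost API): for l(y, y_hat) = (y - y_hat)^2, differentiating with respect to the previous prediction gives g_i = 2 * (y_hat_i - y_i) and h_i = 2.

```python
import numpy as np

def squared_loss_grad_hess(y, y_pred_prev):
    """g_i and h_i for the squared loss l(y, yhat) = (y - yhat)^2,
    differentiated with respect to the previous round's prediction."""
    g = 2.0 * (y_pred_prev - y)    # first-order derivative g_i
    h = 2.0 * np.ones_like(y)      # second-order derivative h_i (constant for squared loss)
    return g, h

# Toy usage: derivatives for three examples given last round's predictions
y = np.array([1.0, 0.5, -2.0])
y_pred_prev = np.array([0.8, 0.7, -1.5])
g, h = squared_loss_grad_hess(y, y_pred_prev)
```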

Simplified Objective Function

  • Combines training loss and regularization
  • Uses per-leaf sums G_j and H_j of the g_i and h_i as notation
  • New objective function: Sum of quadratic terms, one per leaf
  • Optimal leaf weight derived as W_j = -G_j / (H_j + Lambda)
  • Substituting the optimal W’s back removes them from the objective, giving the minimum objective function: -0.5 * Sum_j G_j^2 / (H_j + Lambda) + Gamma * T (see the sketch below)
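
A minimal sketch of these two formulas, assuming G_j and H_j are the per-leaf sums of g_i and h_i; the function names and numbers are illustrative:

```python
import numpy as np

def optimal_leaf_weight(G_j, H_j, lambda_):
    """Optimal weight of leaf j: W_j = -G_j / (H_j + lambda)."""
    return -G_j / (H_j + lambda_)

def minimum_objective(G, H, lambda_, gamma_):
    """Minimum objective for a fixed tree structure:
    -0.5 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    G, H = np.asarray(G, dtype=float), np.asarray(H, dtype=float)
    return -0.5 * np.sum(G ** 2 / (H + lambda_)) + gamma_ * len(G)

# Toy usage with three leaves (illustrative per-leaf sums)
G, H = [4.0, -2.0, 1.0], [3.0, 2.0, 1.5]
weights = [optimal_leaf_weight(g, h, lambda_=1.0) for g, h in zip(G, H)]
score = minimum_objective(G, H, lambda_=1.0, gamma_=1.0)
```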

Tree Building Process

  • Greedy tree-growing algorithm
  • Starts with tree depth zero
  • Tries to add splits for each leaf node (greedy approach)
  • Compares the objective function “before” and “after” each candidate split
  • Gain function calculates the improvement from splitting a leaf (see the sketch after this list):
    • Gain = 0.5 * [G_L^2 / (H_L + Lambda) + G_R^2 / (H_R + Lambda) - (G_L + G_R)^2 / (H_L + H_R + Lambda)] - Gamma
    • Interpretation: (Score of left child + Score of right child) - (Score of the unsplit node) - penalty Gamma for the extra leaf
  • Stops splitting a leaf when the best candidate split has negative gain
  • Uses a sparsity-aware split-finding algorithm to handle missing and sparse values
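
A minimal sketch of the gain rule, assuming G_L, H_L and G_R, H_R are the summed statistics of the examples routed to the left and right children; the values are illustrative:

```python
def split_gain(G_L, H_L, G_R, H_R, lambda_, gamma_):
    """Gain of a candidate split:
    0.5 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma.
    A negative value means the split does not pay for the extra leaf."""
    def leaf_score(G, H):
        return G ** 2 / (H + lambda_)
    return 0.5 * (leaf_score(G_L, H_L) + leaf_score(G_R, H_R)
                  - leaf_score(G_L + G_R, H_L + H_R)) - gamma_

# Toy usage: a negative gain means this leaf should not be split
gain = split_gain(G_L=5.0, H_L=3.0, G_R=-4.0, H_R=2.0, lambda_=1.0, gamma_=1.0)
```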

Summary of Algorithm

  1. Start with weak learner F(xi)
  2. For loop: Build T trees using tree learning algorithm
  3. Use minimum objective function for each tree built
  4. Additively combine models
  5. Training stops when T trees have been built or an acceptable accuracy level is reached
  6. Final model: Additive combination of all models from the loop (a compact sketch follows this list)
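
A compact, self-contained sketch of this loop under simplifying assumptions: squared loss, and each “tree” reduced to a single leaf whose weight is the optimal W = -G / (H + Lambda). A real implementation would instead grow each tree greedily with the gain rule above; the function name and parameters here are illustrative only.

```python
import numpy as np

def boosting_loop_sketch(y, num_rounds=10, learning_rate=0.3, lambda_=1.0):
    """Illustrative additive training loop; each round fits a degenerate
    one-leaf 'tree' using the optimal-weight formula derived earlier."""
    y = np.asarray(y, dtype=float)
    y_pred = np.zeros_like(y)                # step 1: constant weak learner
    leaf_weights = []
    for t in range(num_rounds):              # step 2: build T models
        g = 2.0 * (y_pred - y)               # first-order statistics (squared loss)
        h = 2.0 * np.ones_like(y)            # second-order statistics
        w = -g.sum() / (h.sum() + lambda_)   # step 3: optimal weight for a single leaf
        leaf_weights.append(w)
        y_pred = y_pred + learning_rate * w  # step 4: additive combination
    return y_pred, leaf_weights              # step 6: final model = sum of all rounds

# Toy usage: predictions drift toward the mean of y as rounds accumulate
preds, ws = boosting_loop_sketch([1.0, 2.0, 3.0, 4.0], num_rounds=50)
```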

Recap

  • General objective function and squared loss
  • Trade-off optimization
  • Measure of tree complexity
  • Decomposition into an additive model
  • Taylor approximation for quadratic form
  • Deriving optimal weights and minimum objective function
  • Tree building and structure learning process

Next Steps

  • Next video: Illustrative example of XGBoost tree-building based on discussed formulations