Building a Decision Tree Using CART Algorithm

Jul 18, 2024

Introduction

  • Discussed how to build a decision tree using the Classification and Regression Trees (CART) algorithm.
  • Used a simple solved example with a dataset of 4 attributes and 10 examples.
  • Target attribute is 'job offer'.

Step 1: Calculate Gini Index of the Whole Dataset

  • Gini index formula: 1 - Σ (Pi^2)
  • Pi = probability of class
  • For job offer: 7 'yes' and 3 'no'
  • Gini index calculation:
    • Probability 'yes' (P1): 7/10
    • Probability 'no' (P2): 3/10
    • Gini = 1 - (7/10)^2 - (3/10)^2 = 0.42
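
The calculation above can be reproduced with a short sketch. The helper name `gini_index` and the list `job_offer` are illustrative choices, not part of the original example; only the 7 'yes' / 3 'no' counts come from the notes.

```python
from collections import Counter

def gini_index(labels):
    """Gini index 1 - sum(p_i^2) over the class proportions in `labels`."""
    total = len(labels)
    if total == 0:
        return 0.0  # convention assumed here: an empty set is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Target attribute 'job offer': 7 'yes' and 3 'no', as stated in the notes.
job_offer = ["yes"] * 7 + ["no"] * 3
whole_dataset_gini = gini_index(job_offer)
print(round(whole_dataset_gini, 2))  # 0.42
```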

Step 2: Calculate Gini Index for Each Attribute

CGPA

  • Three values: >= 9, >= 8, < 8
  • Calculate Gini for all possible subsets:
    • Subsets: 2^3 = 8 in total; dropping the empty set and the full set leaves 6, which pair up into 3 distinct binary splits
  • Find best subset using minimum Gini.
  • Example Splitting:
    • First split: [>=9, >=8], <8
    • Second split: [>=8, <8], >=9
    • Optimal split: [>=9, >=8], <8 (Gini = 0.17552)
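
As a rough illustration of how the candidate splits for a three-valued attribute can be listed, the sketch below enumerates every two-way partition once. The function name `binary_splits` is hypothetical; the value labels are the CGPA categories from the notes.

```python
from itertools import combinations

def binary_splits(values):
    """List each way to divide `values` into two non-empty groups, once."""
    values = list(values)
    first, rest = values[0], values[1:]
    splits = []
    # Pin the first value to the left group so mirror-image splits are not repeated.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            left = [first, *extra]
            right = [v for v in values if v not in left]
            if right:  # skip the trivial split that puts everything on one side
                splits.append((left, right))
    return splits

cgpa_splits = binary_splits([">=9", ">=8", "<8"])
for left, right in cgpa_splits:
    print(left, "vs", right)
```

For three values this yields three candidate splits; CART would then score each one and keep the split with the lowest weighted Gini.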

Interactiveness

  • Two values: yes, no
  • Calculate Gini for each value: yes and no
  • Only one possible split (yes vs. no); combined Gini is 0.52

Practical Knowledge

  • Three values: very good, good, average
  • Splitting based on possible combinations:
    • Subset splits similar to CGPA.
    • Optimal split: [very good, good], average (Gini = 0.3750)

Communication Skill

  • Three values: good, moderate, poor
  • Evaluate all splits:
    • Optimal split: [good, moderate], poor (Gini gain = 0.245)

Choose Best Root Node

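  • Gini gain = Gini of the whole dataset - weighted Gini of the split; select the attribute with the maximum gain.
  • CGPA and Communication Skill have the highest gains (for CGPA: 0.42 - 0.17552 ≈ 0.245).
  • Select CGPA as the root node.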
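
A minimal sketch of how a split can be scored and its gain computed. The per-branch class counts used here are an assumption: the notes do not list them, so they were chosen only to agree with the 7 'yes' / 3 'no' totals and with a '<8' branch that contains no 'yes' examples. All function names are hypothetical.

```python
def gini_from_counts(counts):
    """Gini index from a tuple of class counts, e.g. (yes_count, no_count)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts)

def weighted_gini(branches):
    """Size-weighted average of the Gini index of each branch."""
    n_total = sum(sum(counts) for counts in branches)
    return sum(sum(counts) / n_total * gini_from_counts(counts) for counts in branches)

def gini_gain(parent_counts, branches):
    """Gini of the parent node minus the weighted Gini of its branches."""
    return gini_from_counts(parent_counts) - weighted_gini(branches)

parent = (7, 3)                      # (yes, no) for the whole dataset, from the notes
assumed_branches = [(7, 1), (0, 2)]  # ASSUMED counts for [>=9, >=8] and <8
split_gini = weighted_gini(assumed_branches)
gain = gini_gain(parent, assumed_branches)
print(round(split_gini, 3), round(gain, 3))  # 0.175 0.245
```

Under these assumed counts the weighted Gini comes out close to the 0.17552 quoted for CGPA; the small difference would be consistent with rounding of intermediate values in a hand calculation.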

Build the Tree

  1. Root: CGPA
    • Branch: <8 -> Job Offer = 'No'
    • Branch: [>=8, >=9]
      • Mix of 'yes' and 'no'
      • Requires further splitting
  2. Further Split on [>=8, >=9]
    • Choose Communication Skill
    • Values: good, moderate, poor
      • Branch: poor -> Job Offer = 'No'
      • Branch: [good, moderate] -> Job Offer = 'Yes'
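
The finished tree can be read as two nested tests. The sketch below encodes it as a plain function; the name `predict_job_offer` and the use of the category labels as strings are assumptions made for illustration.

```python
def predict_job_offer(cgpa, communication_skill):
    """Follow the tree: CGPA at the root, then Communication Skill."""
    if cgpa == "<8":
        return "no"                      # left branch of the root is a leaf
    # cgpa is ">=8" or ">=9": split further on Communication Skill
    if communication_skill == "poor":
        return "no"
    return "yes"                         # good or moderate

print(predict_job_offer(">=9", "good"))      # yes
print(predict_job_offer(">=8", "poor"))      # no
print(predict_job_offer("<8", "moderate"))   # no
```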

Conclusion

  • CART algorithm effectively splits data to build a decision tree.
  • Example demonstrated splitting and Gini calculation.
  • Final tree splits based on CGPA and Communication Skill.
  • Formula Recap:
    • Gini Index: 1 - Σ (Pi^2)
    • Split Evaluation: weighted Gini of the subsets, compared against the Gini of the whole dataset.
  • Practical steps for splits and root node determination explained.