One Hot Encoding in Machine Learning

Jul 14, 2024

Introduction

  • Objective: Understand one-hot encoding and its usage in machine learning.
  • Previously discussed: Labels for images encoded as one-hot vectors.

Supervised Learning and Labels

  • During training, labeled inputs are passed to the model, which produces predicted outputs that are compared against the labels.
  • Example: Image classifier with labeled images of animals.
  • Labels and outputs are typically not words (e.g., 'cat', 'dog'); they are encoded as integers or vectors.
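
As a minimal sketch of the integer form mentioned above, the snippet below replaces each label word with an index. The label names and the alphabetical index assignment are assumptions for illustration; a real library may assign indices differently.

```python
# Hypothetical label words for a small image dataset.
labels = ["cat", "dog", "dog", "cat"]

# Assumption: indices are assigned in sorted (alphabetical) order.
class_to_index = {name: i for i, name in enumerate(sorted(set(labels)))}

# Replace each word with its integer index.
encoded = [class_to_index[name] for name in labels]

print(class_to_index)  # {'cat': 0, 'dog': 1}
print(encoded)         # [0, 1, 1, 0]
```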

What is One Hot Encoding?

  • Definition: Transforms categorical labels into vectors of zeros and ones.
  • Vector Length: Equal to the number of classes/categories.
    • Example: Two categories (cat, dog) = vector length of 2.
    • Three categories (cat, dog, lizard) = vector length of 3.

Detailed Explanation

  • Index Association: Each element in the vector corresponds to a category.
    • E.g., Cat -> 1st element, Dog -> 2nd element, Lizard -> 3rd element.
  • Vector Composition: All elements are 0 except the one corresponding to the actual category, which is 1.
  • Examples:
    • Cat: [1, 0, 0]
    • Dog: [0, 1, 0]
    • Lizard: [0, 0, 1]
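
The three vectors above can be reproduced with a few lines of plain Python. This is only a sketch: the helper name `one_hot` and the list-based ordering are assumptions made here, not something taken from a particular library.

```python
def one_hot(label, categories):
    """Return a list of 0s with a single 1 at the label's index."""
    vector = [0] * len(categories)       # length equals the number of categories
    vector[categories.index(label)] = 1  # mark the position of the actual category
    return vector

# Assumed ordering: cat -> 1st element, dog -> 2nd, lizard -> 3rd.
categories = ["cat", "dog", "lizard"]

for name in categories:
    print(name, one_hot(name, categories))
# cat [1, 0, 0]
# dog [0, 1, 0]
# lizard [0, 0, 1]
```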

Adding More Categories

  • Example with added category (llama):
    • Four categories: Cat, Dog, Lizard, Llama -> vector length of 4.
    • Encoding:
      • Cat: [1, 0, 0, 0]
      • Dog: [0, 1, 0, 0]
      • Lizard: [0, 0, 1, 0]
      • Llama: [0, 0, 0, 1]
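  • Order Flexibility: The assignment of categories to vector positions is arbitrary and depends on the underlying code or library; it only needs to stay consistent between training and prediction.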
    • Example variation:
      • Dog: [1, 0, 0, 0]
      • Lizard: [0, 1, 0, 0]
      • Llama: [0, 0, 1, 0]
      • Cat: [0, 0, 0, 1]
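
To make the effect of ordering concrete, the sketch below builds the encoding table for both orderings listed above. The helper `one_hot_table` is a hypothetical name introduced for this illustration.

```python
def one_hot_table(categories):
    """Map each category to its one-hot vector under the given ordering."""
    n = len(categories)
    return {
        name: [1 if i == position else 0 for i in range(n)]
        for position, name in enumerate(categories)
    }

order_a = one_hot_table(["cat", "dog", "lizard", "llama"])
order_b = one_hot_table(["dog", "lizard", "llama", "cat"])

# The same category gets a different vector under each ordering.
print(order_a["cat"])  # [1, 0, 0, 0]
print(order_b["cat"])  # [0, 0, 0, 1]
```

Both tables are valid encodings; a vector is only meaningful together with the ordering that produced it.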

Practical Tips

  • Check how the chosen library maps categories to vector positions (e.g., Keras for image data).
  • Refer to previous tutorials for detailed steps on accessing this mapping.
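
Once the mapping is known, it can be inverted to turn a one-hot label or a model's output vector back into a category name. The sketch below assumes the library exposes the mapping as a name-to-index dictionary (Keras directory iterators, for instance, provide one as `class_indices`); the dictionary contents, the `decode` helper, and the prediction values are made up for illustration.

```python
# Hypothetical mapping, in the style of a name -> index dictionary.
class_indices = {"cat": 0, "dog": 1, "lizard": 2}

# Invert it so a category can be looked up by vector position.
index_to_class = {index: name for name, index in class_indices.items()}

def decode(vector):
    """Return the category whose position holds the largest value."""
    best = max(range(len(vector)), key=lambda i: vector[i])
    return index_to_class[best]

print(decode([0, 1, 0]))        # dog  (a one-hot label)
print(decode([0.1, 0.2, 0.7]))  # lizard  (made-up model output)
```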

Conclusion

  • One-hot encoding is a standard way to represent class labels for classification tasks in neural networks.
  • Ensures categorical data is in a format suitable for model processing.