One Hot Encoding in Machine Learning

Jul 14, 2024

Introduction

Objective: Understand one-hot encoding and its usage in machine learning.
Previously discussed: Labels for images encoded as one-hot vectors.

Supervised Learning and Labels

During training, labeled inputs are passed to the model, which produces predicted outputs that are compared against the labels.
Example: Image classifier with labeled images of animals.
Labels and outputs are typically not words (e.g., 'cat', 'dog'); they are encoded as integers or vectors.
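
As a minimal sketch of the integer form of this encoding, the snippet below assigns each distinct label an index. The label list and the alphabetical ordering are assumptions made for illustration; a real library may choose the indices differently.

```python
# Hypothetical labels for a small batch of animal images.
labels = ["cat", "dog", "dog", "cat"]

# Give each distinct label an integer index.
# Sorting is an assumption here, used only to make the order reproducible.
classes = sorted(set(labels))
label_to_index = {name: i for i, name in enumerate(classes)}

# Replace each word label with its integer index.
encoded = [label_to_index[name] for name in labels]
print(encoded)  # [0, 1, 1, 0]
```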

What is One Hot Encoding?

Definition: Transforms categorical labels into vectors of zeros and ones.
Vector Length: Equal to the number of classes/categories.
- Two categories (cat, dog) -> vector length of 2.
- Three categories (cat, dog, lizard) -> vector length of 3.

Detailed Explanation

Index Association: Each element in the vector corresponds to a category.
- E.g., Cat -> 1st element, Dog -> 2nd element, Lizard -> 3rd element.
Vector Composition: All elements are 0 except the one corresponding to the actual category, which is 1.
Examples:
- Cat: [1, 0, 0]
- Dog: [0, 1, 0]
- Lizard: [0, 0, 1]
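
The three vectors above can be reproduced with a few lines of plain Python. This is a sketch only: the helper name `one_hot` and the use of Python lists are illustrative choices, not part of any particular library.

```python
def one_hot(label, classes):
    """Return a one-hot list for `label`, given an ordered list of classes."""
    vector = [0] * len(classes)          # start with all zeros
    vector[classes.index(label)] = 1     # set the element for this category to 1
    return vector

# The position of each name in this list is its index in the vector.
classes = ["cat", "dog", "lizard"]

for name in classes:
    print(name, one_hot(name, classes))
```

The vector length follows directly from `len(classes)`, so adding a category to the list lengthens every vector by one element.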

Adding More Categories

Example with added category (llama):
- Four categories: Cat, Dog, Lizard, Llama -> vector length of 4.
- Encoding:
  - Cat: [1, 0, 0, 0]
  - Dog: [0, 1, 0, 0]
  - Lizard: [0, 0, 1, 0]
  - Llama: [0, 0, 0, 1]
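Order Flexibility: Which index is assigned to which category is arbitrary and can vary based on the underlying code or library; what matters is that the same mapping is used consistently.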
- Example variation:
  - Dog: [1, 0, 0, 0]
  - Lizard: [0, 1, 0, 0]
  - Llama: [0, 0, 1, 0]
  - Cat: [0, 0, 0, 1]
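
To illustrate why the ordering matters, the sketch below encodes the same label under the two orderings shown above and then decodes the result. The helper names `one_hot` and `decode` are hypothetical and exist only for this example.

```python
def one_hot(label, classes):
    """Return a one-hot list for `label`, given an ordered list of classes."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector

def decode(vector, classes):
    """Map a one-hot list back to its label using the same class order."""
    return classes[vector.index(1)]

order_a = ["cat", "dog", "lizard", "llama"]
order_b = ["dog", "lizard", "llama", "cat"]

vec_a = one_hot("cat", order_a)
vec_b = one_hot("cat", order_b)
print(vec_a)  # [1, 0, 0, 0]
print(vec_b)  # [0, 0, 0, 1]

# Decoding with the ordering used for encoding recovers the label.
print(decode(vec_a, order_a))  # cat
# Decoding with a different ordering silently gives the wrong label.
print(decode(vec_a, order_b))  # dog
```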

Practical Tips

Check how the chosen library maps categories to vector indices (e.g., Keras for image data).
Refer to previous tutorials for detailed steps on accessing this mapping.
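
Once the mapping is known, it can be inverted to turn a model output back into a label. The sketch below uses a hand-written dictionary as a stand-in for whatever mapping the library reports (Keras directory iterators, for instance, expose one as `class_indices`); the dictionary contents and the prediction values are made up for illustration.

```python
# Hypothetical class-to-index mapping, standing in for the one a library reports.
class_indices = {"cat": 0, "dog": 1, "lizard": 2, "llama": 3}

# Invert it so an index can be looked up to find its label.
index_to_class = {index: name for name, index in class_indices.items()}

# Made-up model output: one score per class, in index order.
prediction = [0.05, 0.10, 0.80, 0.05]

# Pick the index with the highest score and translate it to a label.
predicted_index = max(range(len(prediction)), key=prediction.__getitem__)
print(index_to_class[predicted_index])  # lizard
```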

Conclusion

One-hot encoding is crucial for transforming labels for classification tasks in neural networks.
It ensures categorical data is in a numeric format suitable for model processing.
