Machine Learning with Python: Feature Scaling

Jul 8, 2024

Machine Learning with Python: Feature Scaling

Introduction

  • Ninth video in the series
  • Topic: Feature Scaling for your dataset

What is Feature Scaling?

  • Converting numerical columns with widely different ranges onto a common scale
  • Example: Columns like mpg, displacement, horsepower, weight, and acceleration have very different value ranges
  • Goal: Bring numerical data into a small common range (e.g. roughly -1 to 1) for uniformity

Why Do We Need Feature Scaling?

Reason 1: Algorithms Using Euclidean Distance

  • Euclidean distance formula used by many algorithms
  • Issue: Features with much larger numeric ranges dominate the distance calculation
  • Example: A feature in the thousands overwhelms one in the tens, skewing the result
  • Solution: Feature scaling makes ranges uniform, improving model accuracy
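The dominance effect above can be sketched with made-up car values (mpg and weight are assumed illustrative numbers, not from the video's dataset):

```python
import numpy as np

# Two cars described by (mpg, weight_lbs). The mpg gap is large in
# relative terms, but the raw Euclidean distance is dominated by weight.
car_a = np.array([18.0, 3500.0])
car_b = np.array([32.0, 3600.0])

raw_dist = np.linalg.norm(car_a - car_b)  # driven almost entirely by weight

# After standardizing both features (assumed column stats), mpg matters too.
mean = np.array([25.0, 3550.0])
std = np.array([7.0, 50.0])
scaled_dist = np.linalg.norm((car_a - mean) / std - (car_b - mean) / std)
print(raw_dist, scaled_dist)
```

Here the raw distance is about 101 (almost all from the 100 lb weight gap), while the scaled distance weighs both features equally.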

Reason 2: Training Time Efficiency

  • Values in a small range (e.g. -1 to 1) help the model converge faster, reducing training time

Methods of Feature Scaling

  1. Standardization
  2. Normalization
  • Both methods provided by Python libraries
  • Python package handles calculations automatically

Practical Implementation

  • Jupyter notebook used (same as test and train split video)
  • Import necessary libraries

Formulas

  • Standardization formula shown in the video
  • Detailed math not covered, as the Python library handles the calculation
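For reference, the two formulas can be sketched directly in NumPy (toy values; standardization centers on the mean and divides by the standard deviation, min-max normalization maps values into [0, 1]):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])

# Standardization: z = (x - mean) / std  -> zero mean, unit variance
z = (x - x.mean()) / x.std()

# Normalization (min-max): x' = (x - min) / (max - min) -> values in [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())
print(z, x_norm)
```

In practice the Python package computes these automatically, as the notes say.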

Implementation Steps

  1. Create a StandardScaler class object
  2. Fit the scaler on the train data (X_train)
  3. Transform the train data
  4. Transform the test data (X_test) with the same fitted scaler

Results

  • Values in each column now fall roughly between -1 and 1 for both train and test data

Special Note: Dependent Variable (Y)

  • Feature scaling is not applied to Y because it contains categorical data
  • Y has values 0 (not buy) or 1 (buy)
  • Scaling Y would distort the class labels and spoil the model

Conclusion

  • Final video on data pre-processing
  • Next: Supervised and unsupervised machine learning algorithms
  • Future videos will focus on machine learning journey

Prepared by [Your Name]