Definition: Regression models are simple models in machine learning used for predicting continuous numerical values (e.g., stock prices, temperatures).
Example Use Case: Predicting salaries based on years of experience.
Exploratory Data Analysis: Understand the relationship between years of experience and salaries by plotting the data.
Observation: Typically, as experience increases, salaries increase (positive relationship).
Training the Regression Model
Model Representation: A simple linear regression model is a straight line represented by the equation: Y = WX + B
Where:
Y: Predicted value
W: Model weight
X: Input value
B: Bias term
Model Parameters: W and B are the parameters the model learns during training.
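The line equation above can be sketched as a small function. This is a minimal illustration for the single-feature case; the names predict, w, and b are illustrative, not from the notes.

```python
def predict(x, w, b):
    # Linear model: Y = W*X + B for a single input feature.
    return w * x + b

# With weight 2.0 and bias 1.0, an input of 3.0 predicts 2.0*3.0 + 1.0 = 7.0
print(predict(3.0, 2.0, 1.0))  # 7.0
```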
Error Measurement
Definition of Error: The difference between the model's predicted value and the actual value.
Error Calculation:
Subtract actual Y value from predicted Y value.
Focus on the magnitude of the error, either by taking the absolute value or squaring it.
Loss Functions: Commonly used loss functions for regression tasks include:
Mean Squared Error (MSE):
Calculation:
Sum of squared errors divided by the number of data points.
Mean Absolute Error (MAE):
Calculation:
Sum of absolute errors divided by the number of data points.
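The two loss functions above can be computed directly from their definitions. A minimal sketch, using illustrative sample data:

```python
def mse(y_true, y_pred):
    # Mean Squared Error: average of squared prediction errors.
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error: average of absolute prediction errors.
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3 ≈ 1.667
print(mae(y_true, y_pred))  # (1 + 0 + 2) / 3 = 1.0
```

Note how squaring penalizes the error of 2 four times as heavily as the error of 1, which is why MSE is more sensitive to outliers than MAE.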
Model Training Process
Objective: Find the combination of model parameters (W and B) that minimizes the loss function.
Gradient Descent: Algorithm that iteratively adjusts model parameters based on the gradient of the loss function.
Learning Rate: Controls the size of each parameter update, i.e., how fast the model learns (often a small value).
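A single gradient-descent update follows the rule "new parameter = old parameter − learning rate × gradient". A minimal sketch with illustrative values (the function name gd_step is not from the notes):

```python
def gd_step(param, gradient, learning_rate=0.06):
    # Move the parameter against the gradient, scaled by the learning rate.
    return param - learning_rate * gradient

w = 1.0
w = gd_step(w, gradient=5.0)  # 1.0 - 0.06 * 5.0 = 0.7
print(w)
```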
Standardizing the Dataset
Importance: Standardizing ensures features are on the same scale with mean 0 and standard deviation 1, aiding model learning efficiency.
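Standardization subtracts the mean and divides by the standard deviation. A minimal sketch, assuming a single feature and population standard deviation:

```python
def standardize(values):
    # Rescale values to mean 0 and standard deviation 1.
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
z = standardize(xs)
print(sum(z) / len(z))  # ≈ 0.0 (mean of the standardized values)
```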
Training Steps Overview
Initialize model with random parameters.
Set a learning rate (e.g., 0.06).
Calculate loss for parameter values.
Update parameters based on loss gradient.
Repeat the process (each full pass over the data is one epoch) until the loss stops decreasing.
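The steps above can be sketched as a complete training loop. The data, epoch count, and initialization here are illustrative; the gradients are the partial derivatives of MSE with respect to W and B:

```python
# Fit y = w*x + b by gradient descent on MSE (illustrative data: y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0   # step 1: initialize parameters
lr = 0.06         # step 2: learning rate from the notes
n = len(xs)

for epoch in range(2000):
    preds = [w * x + b for x in xs]          # step 3: predictions / loss
    # Gradients of MSE: dL/dw = (2/n)·Σ(pred−y)·x, dL/db = (2/n)·Σ(pred−y)
    dw = (2 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
    db = (2 / n) * sum(p - y for p, y in zip(preds, ys))
    w -= lr * dw                             # step 4: update parameters
    b -= lr * db

print(round(w, 2), round(b, 2))  # ≈ 2.0 1.0, recovering y = 2x + 1
```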
Observing Changes in Loss
Epochs: The process of training (adjusting parameters) continues until the loss function reaches a minimum value.
Learning Rate Impact:
Small learning rate = gradual convergence.
Large learning rate (e.g., 0.9) = unstable learning, causing the model to oscillate.
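The effect of the learning rate can be seen on a toy loss function. This demo (not from the notes) minimizes L(w) = w², whose gradient is 2w:

```python
def steps(lr, w=1.0, n=4):
    # Run n gradient-descent steps on L(w) = w^2 and record each w.
    out = []
    for _ in range(n):
        w = w - lr * 2 * w   # gradient of w^2 is 2w
        out.append(round(w, 3))
    return out

print(steps(0.06))  # [0.88, 0.774, 0.681, 0.6]  — steady approach to 0
print(steps(0.9))   # [-0.8, 0.64, -0.512, 0.41] — sign flips: oscillation
```

With lr = 0.9 the update multiplies w by (1 − 1.8) = −0.8 each step, so the parameter overshoots the minimum and oscillates from side to side instead of descending smoothly.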
Incorporating Bias
Additional Parameter: Adding bias into the model necessitates calculating gradients for both weight and bias concurrently.
3D Representation: The loss plot becomes three-dimensional with respect to both weight and bias.
Applications of Gradient Descent
Gradient descent allows for the modeling of various relationships in different datasets.
Advanced Optimization Algorithms: The concepts behind gradient descent lead to more sophisticated methods such as stochastic and mini-batch gradient descent, momentum, and Adam (adaptive moment estimation).
Conclusion
Understanding gradient descent is fundamental for building and optimizing regression models in machine learning.