Definition: Regression models are simple models in machine learning used for predicting continuous numerical values (e.g., stock prices, temperatures).
Example Use Case: Predicting salaries based on years of experience.
Exploratory Data Analysis: Understand the relationship between years of experience and salaries by plotting the data.
Observation: Typically, as experience increases, salaries increase (positive relationship).
Training the Regression Model
Model Representation: A simple linear regression model is a straight line represented by the equation: Y = WX + B
Where:
Y: Predicted value
W: Model weight
X: Input value
B: Bias term
Model Parameters: W and B are the parameters the model learns during training.
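The line equation above can be sketched as a small function. This is a minimal illustration for the single-feature case; the names predict, w, and b are illustrative, not from the notes.

```python
def predict(x, w, b):
    # Linear model: Y = W*X + B for a single input feature.
    return w * x + b

# With weight 2.0 and bias 1.0, an input of 3.0 predicts 2.0*3.0 + 1.0 = 7.0
print(predict(3.0, 2.0, 1.0))  # 7.0
```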
Error Measurement
Definition of Error: The difference between the model's predicted value and the actual value.
Error Calculation:
Subtract actual Y value from predicted Y value.
Focus on the magnitude of the error, either by taking the absolute value or squaring it.
Loss Functions: Commonly used loss functions for regression tasks include:
Mean Squared Error (MSE):
Calculation:
Sum of squared errors divided by the number of data points.
Mean Absolute Error (MAE):
Calculation:
Sum of absolute errors divided by the number of data points.
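The two loss functions above can be computed directly from their definitions. A minimal sketch, using illustrative sample data:

```python
def mse(y_true, y_pred):
    # Mean Squared Error: average of squared prediction errors.
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error: average of absolute prediction errors.
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3 ≈ 1.667
print(mae(y_true, y_pred))  # (1 + 0 + 2) / 3 = 1.0
```

Note how squaring penalizes the error of 2 four times as heavily as the error of 1, which is why MSE is more sensitive to outliers than MAE.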
Model Training Process
Objective: Find the combination of model parameters (W and B) that minimizes the loss function.
Gradient Descent: Algorithm that iteratively adjusts model parameters based on the gradient of the loss function.
Learning Rate: Controls the size of each parameter update, i.e., how fast the model learns (often a small value).
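A single gradient-descent update follows the rule "new parameter = old parameter − learning rate × gradient". A minimal sketch with illustrative values (the function name gd_step is not from the notes):

```python
def gd_step(param, gradient, learning_rate=0.06):
    # Move the parameter against the gradient, scaled by the learning rate.
    return param - learning_rate * gradient

w = 1.0
w = gd_step(w, gradient=5.0)  # 1.0 - 0.06 * 5.0 = 0.7
print(w)
```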
Standardizing the Dataset
Importance: Standardizing ensures features are on the same scale with mean 0 and standard deviation 1, aiding model learning efficiency.
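Standardization subtracts the mean and divides by the standard deviation. A minimal sketch, assuming a single feature and population standard deviation:

```python
def standardize(values):
    # Rescale values to mean 0 and standard deviation 1.
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
z = standardize(xs)
print(sum(z) / len(z))  # ≈ 0.0 (mean of the standardized values)
```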
Training Steps Overview
Initialize model with random parameters.
Set a learning rate (e.g., 0.06).
Calculate loss for parameter values.
Update parameters based on loss gradient.
Repeat the process (each full pass over the data is one epoch) until the loss stops decreasing.
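The steps above can be sketched as a complete training loop. The data, epoch count, and initialization here are illustrative; the gradients are the partial derivatives of MSE with respect to W and B:

```python
# Fit y = w*x + b by gradient descent on MSE (illustrative data: y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0   # step 1: initialize parameters
lr = 0.06         # step 2: learning rate from the notes
n = len(xs)

for epoch in range(2000):
    preds = [w * x + b for x in xs]          # step 3: predictions / loss
    # Gradients of MSE: dL/dw = (2/n)·Σ(pred−y)·x, dL/db = (2/n)·Σ(pred−y)
    dw = (2 / n) * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
    db = (2 / n) * sum(p - y for p, y in zip(preds, ys))
    w -= lr * dw                             # step 4: update parameters
    b -= lr * db

print(round(w, 2), round(b, 2))  # ≈ 2.0 1.0, recovering y = 2x + 1
```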
Observing Changes in Loss
Epochs: The process of training (adjusting parameters) continues until the loss function reaches a minimum value.
Learning Rate Impact:
Small learning rate = gradual convergence.
Large learning rate (e.g., 0.9) = unstable learning, causing the model to oscillate.
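The effect of the learning rate can be seen on a toy loss function. This demo (not from the notes) minimizes L(w) = w², whose gradient is 2w:

```python
def steps(lr, w=1.0, n=4):
    # Run n gradient-descent steps on L(w) = w^2 and record each w.
    out = []
    for _ in range(n):
        w = w - lr * 2 * w   # gradient of w^2 is 2w
        out.append(round(w, 3))
    return out

print(steps(0.06))  # [0.88, 0.774, 0.681, 0.6]  — steady approach to 0
print(steps(0.9))   # [-0.8, 0.64, -0.512, 0.41] — sign flips: oscillation
```

With lr = 0.9 the update multiplies w by (1 − 1.8) = −0.8 each step, so the parameter overshoots the minimum and oscillates from side to side instead of descending smoothly.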
Incorporating Bias
Additional Parameter: Adding bias into the model necessitates calculating gradients for both weight and bias concurrently.
3D Representation: The loss plot becomes three-dimensional with respect to both weight and bias.
Applications of Gradient Descent
Gradient descent allows for the modeling of various relationships in different datasets.
Advanced Optimization Algorithms: The concepts behind gradient descent lead to more sophisticated methods such as stochastic and mini-batch gradient descent, momentum, and Adam (adaptive moment estimation).
Conclusion
Understanding gradient descent is fundamental for building and optimizing regression models in machine learning.