
Hands-On Implementation of Linear Regression using Microsoft Azure ML Studio Classic

Jul 11, 2024


Introduction

  • Welcome Remarks: The professor greets participants joining from different parts of the world and thanks them for their time.
  • Lab Objective: Hands-on implementation of linear regression.
  • Tool: Microsoft Azure ML Studio Classic.

Set Up

  1. Open Microsoft Azure ML Studio Classic and log in with the appropriate ID.
  2. Start a new blank experiment.
  3. Confirm in the chat window that everyone is logged in and on the same page.

Data Setup

Data Set Selection

  • Use the sample dataset provided: Automobile Price Data.
  • Purpose: Predict automobile prices using linear regression.

Understanding Dataset Attributes

  • Rows and Columns: 205 rows, 26 columns.
  • Target Variable: Price (supervised learning problem).
  • Features: Various attributes that describe each automobile (make, fuel type, engine location, etc.).
  • Categorical Variables: Symboling, Make, Fuel Type, Aspiration, etc.
  • Numeric Variables: Length, Weight, Horsepower, etc.

Important Columns

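  • Symboling: Insurance risk rating of the automobile, from -3 (probably safe) to +3 (risky), adjusted for how much riskier or safer the car is than its price suggests.
  • Normalized Losses: Average insurance loss payment per insured vehicle per year, normalized within the vehicle's size class.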

Data Preparation

Initial Steps

  1. Select Columns in Data Set: Select all features.
  2. Remove Duplicate Rows: Check for duplicate rows and remove any that are found.
  3. Edit Metadata: Separate categorical features from numeric ones.
    • Explicitly mark 11 columns as categorical and the remaining 15 (including Price) as numeric.
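The duplicate-removal and metadata steps can be mimicked outside Studio in a minimal Python sketch. It assumes rows are tuples and uses the UCI Automobile column names; the exact 11-column categorical list below is an assumption consistent with the 11/15 split above, not something taken from the session.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[object, ...]

# Assumed categorical columns (UCI Automobile column names); every other
# column, including price, is treated as numeric.
CATEGORICAL = {
    "symboling", "make", "fuel-type", "aspiration", "num-of-doors",
    "body-style", "drive-wheels", "engine-location", "engine-type",
    "num-of-cylinders", "fuel-system",
}


def remove_duplicate_rows(rows: Sequence[Row]) -> List[Row]:
    """Keep the first copy of each identical row, preserving order."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def edit_metadata(columns: Sequence[str]) -> Dict[str, str]:
    """Tag every column explicitly as 'categorical' or 'numeric'."""
    return {c: "categorical" if c in CATEGORICAL else "numeric" for c in columns}
```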

Data Imputation

  1. Categorical Variables: Use the Clean Missing Data component to replace missing values with the mode.
  2. Numeric Variables: Use the Clean Missing Data component to replace missing values with the median.
  3. Data Normalization: Rescale numeric features with Min-Max normalization (Normalize Data component).
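A minimal Python sketch of what these three steps compute, one column at a time. It assumes missing entries are represented as `None`; the function names are hypothetical, and this is not how the Clean Missing Data or Normalize Data components are implemented internally.

```python
import statistics
from typing import List, Optional, Sequence


def impute_mode(values: Sequence[Optional[str]]) -> List[str]:
    """Categorical column: replace missing (None) entries with the most frequent value."""
    fill = statistics.mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Numeric column: replace missing (None) entries with the median of observed values."""
    fill = statistics.median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale a numeric column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

The median is used for numeric columns because, unlike the mean, it is not pulled around by a few extreme values such as very expensive cars.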

Splitting Data

  1. Initial Split: Split data into training (95%) and test (5%) datasets.
  2. Secondary Split: Further split training data into 95% training and 5% validation datasets.
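The two-stage split can be sketched as follows, assuming a seeded random shuffle (the seed value 42 and the helper name `split_rows` are arbitrary). Studio's Split Data component may round partition sizes differently, so the counts produced here are illustrative only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], fraction: float, seed: int) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then put `fraction` of the rows in the first part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


row_ids = list(range(205))                                 # stand-ins for the 205 rows
train_full, test = split_rows(row_ids, 0.95, seed=42)      # initial split: 95% / 5%
train, validation = split_rows(train_full, 0.95, seed=42)  # secondary split: 95% / 5%
```

With 205 rows this leaves only about 10 rows each for validation and test, so metric values on those splits can vary noticeably with the seed.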

Linear Regression Model Setup

Initialize Models

  1. Initialize two linear regression models with different solution methods:
    • Ordinary Least Squares (OLS).
    • Online Gradient Descent.
  2. Adjust hyperparameters for both models.
    • L2 Regularization Weight: Set to a larger value so that large coefficients are penalized more strongly.
    • Online Gradient Descent: Set the learning rate to 0.1, enable learning rate decay, and set the number of training epochs.
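To make the difference between the two solution methods concrete, here is a one-feature Python sketch (function names hypothetical). `fit_ols` solves the L2-regularized least-squares problem in closed form; `fit_online_gd` updates the weights one training row at a time. The `lr / sqrt(epoch)` decay schedule is an assumption, not Studio's documented schedule, and because the gradient-descent version applies its L2 term on every update, its `l2` value is not on the same scale as the OLS one.

```python
import math
from typing import Sequence, Tuple


def fit_ols(x: Sequence[float], y: Sequence[float], l2: float = 0.0) -> Tuple[float, float]:
    """Closed-form least squares for y ~ w*x + b, with an L2 penalty on w only."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    w = sxy / (sxx + l2)
    return w, mean_y - w * mean_x


def fit_online_gd(x: Sequence[float], y: Sequence[float], lr: float = 0.1,
                  epochs: int = 1000, l2: float = 0.0,
                  decay: bool = True) -> Tuple[float, float]:
    """Iterative fit: one gradient step per training row, repeated for `epochs` passes."""
    w = b = 0.0
    for epoch in range(1, epochs + 1):
        # Assumed decay schedule: shrink the step as lr / sqrt(epoch).
        step = lr / math.sqrt(epoch) if decay else lr
        for xi, yi in zip(x, y):
            err = (w * xi + b) - yi            # prediction error on this row
            w -= step * (err * xi + l2 * w)    # squared-loss gradient plus L2 term
            b -= step * err                    # intercept is not regularized
    return w, b
```

On min-max-normalized, noise-free data such as `xs = [i / 10 for i in range(11)]` with `ys = [2 * x + 1 for x in xs]`, both fits should land near w = 2, b = 1: the closed-form fit gets there directly, while gradient descent only approaches it as epochs accumulate.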

Hyperparameter Tuning

  1. Use the Tune Model Hyperparameters component.
    • Parameter Sweeping Mode: Random grid, with a maximum number of runs.
    • Random Seed: Set a fixed seed for reproducibility.
    • Metric for Performance: Root Mean Squared Error (RMSE).
  2. Run the tuning process on both models.
    • Explore which hyperparameter combinations yield the best performing models.
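The random-grid sweep can be sketched like this: enumerate the grid, let a seeded RNG pick up to `max_runs` combinations, fit each on the training split, and keep the combination with the lowest validation RMSE. The grid key `l2`, the helper names, and the one-feature ridge model are assumptions for illustration; the Tune Model Hyperparameters component has its own internals.

```python
import itertools
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def random_grid_sweep(grid: Dict[str, Sequence[float]],
                      evaluate: Callable[[Params], float],
                      max_runs: int, seed: int) -> Tuple[Params, float]:
    """Try up to `max_runs` randomly chosen grid points; return the lowest-RMSE one."""
    names = sorted(grid)
    combos: List[Params] = [dict(zip(names, vals))
                            for vals in itertools.product(*(grid[k] for k in names))]
    rng = random.Random(seed)  # fixed seed -> the same runs every time
    chosen = rng.sample(combos, min(max_runs, len(combos)))
    results = [(evaluate(p), p) for p in chosen]
    best_rmse, best_params = min(results, key=lambda r: r[0])
    return best_params, best_rmse


def fit_ridge(x: Sequence[float], y: Sequence[float], l2: float) -> Tuple[float, float]:
    """Hypothetical one-feature closed-form fit with an L2 weight on the slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    w = sxy / (sxx + l2)
    return w, my - w * mx


def make_evaluator(train_x: Sequence[float], train_y: Sequence[float],
                   valid_x: Sequence[float],
                   valid_y: Sequence[float]) -> Callable[[Params], float]:
    """Fit on the training split and report RMSE on the validation split."""
    def evaluate(params: Params) -> float:
        w, b = fit_ridge(train_x, train_y, params["l2"])
        return rmse(valid_y, [w * v + b for v in valid_x])
    return evaluate
```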

Training the Model

  1. Use the Train Model component to train the best model from hyperparameter tuning.
  2. Specify the label column as Price.
  3. Connect the training dataset from the initial split.
  4. Run the training process for both models.

Model Testing and Evaluation

  1. Score Model: Score both trained models using the test dataset.
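  2. Evaluate Model: Evaluate both models and compare their performance metrics.
    • Coefficient of Determination (R²): Proportion of the variance in Price explained by the model; 1 indicates a perfect fit.
    • Mean Absolute Error (MAE): Average absolute difference between predicted and actual prices.
    • Root Mean Squared Error (RMSE): Square root of the mean of squared errors; penalizes large errors more heavily than MAE.
    • Relative Absolute Error (RAE): Total absolute error divided by the total absolute error of a baseline that always predicts the mean price, so models can be compared independently of scale.
    • Relative Squared Error (RSE): The squared-error counterpart of RAE (R² = 1 − RSE); often used in financial modeling.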
  3. Visualize and compare results from both models.
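The metrics above follow standard definitions, sketched below in Python. It is assumed, not verified here, that Evaluate Model uses these same formulas; the baseline for RAE and RSE is a predictor that always outputs the mean of the actual values.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Standard regression metrics; assumes y_true is not constant."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Baseline: a model that always predicts the mean of the actual values.
    abs_base = sum(abs(t - mean_true) for t in y_true)
    sq_base = sum((t - mean_true) ** 2 for t in y_true)
    rse = sq_err / sq_base
    return {
        "MAE": abs_err / n,             # mean absolute error
        "RMSE": math.sqrt(sq_err / n),  # root mean squared error
        "RAE": abs_err / abs_base,      # relative absolute error
        "RSE": rse,                     # relative squared error
        "R2": 1.0 - rse,                # coefficient of determination
    }
```

Lower MAE, RMSE, RAE, and RSE are better and higher R² is better, so the two models can be compared by calling the function once on each model's scored test predictions.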

Key Insights

  • OLS vs. Online Gradient Descent: OLS typically performs better here because it computes the exact least-squares solution in closed form, whereas online gradient descent approximates it iteratively.
  • Adjusting hyperparameters and applying regularization can significantly affect model performance.

Q&A and Conclusion

  • Addressed participants' questions about the data preparation steps, model configurations, and the statistical significance of the parameters.
  • Discussion on Future Sessions: The professor will check whether logistic regression is included in upcoming sessions.
  • Feedback Request: Participants encouraged to provide feedback.
  • Next Steps: Concluding remarks and thanks to participants.

Contact Information

  • Professor’s Email: Provided for further queries.