Building a Machine Learning Model with Weka

Feb 13, 2025

Introduction

  • Aim: Build a machine learning model without coding.
  • Software: Weka (Waikato Environment for Knowledge Analysis)
    • Developed by the University of Waikato.
    • Coded in Java.
    • First machine learning software used by the presenter.

Getting Started with Weka

  • Download Weka from the official website.
  • Installation:
    • Choose the installer that matches your operating system.

Weka GUI Overview

  • Open the Weka GUI Chooser and select "Explorer".
  • The Explorer interface has six tabs and opens on the "Preprocess" tab.

Importing Data

  • Use the "Open file" option to import a dataset.
  • Example dataset: CPU Data.
    • Instances: 209
    • Attributes: 7 (6 independent variables, 1 dependent variable)

Data Preprocessing

  • Scaling matters because the variables have differing ranges.
  • Min-Max Normalization:
    • Scale values between 0 and 1.
    • Steps:
      1. Click "Choose" in the Filter panel.
      2. Select "Unsupervised" -> "Attribute" -> "Normalize", then click "Apply".
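
To make the min-max step concrete, here is a minimal Python sketch of the same arithmetic, x' = (x - min) / (max - min). It is not Weka's implementation; the function name and the sample values are assumptions chosen only for illustration.

```python
def min_max_normalize(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values for two attributes with very different ranges.
cycle_time = [125, 29, 480, 1500]
cache_size = [256, 32, 0, 128]

print(min_max_normalize(cycle_time))
print(min_max_normalize(cache_size))
```

After scaling, both columns lie between 0 and 1, so neither dominates the other simply because of its units.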

Building the Model

  • Navigate to "Classify" tab.
  • Steps to create a model:
    1. Click "Choose" under Classifier.
    2. Select "Functions" -> "Linear Regression".
    3. Set cross-validation to 10 folds.
    4. Click "Start" to build the model.
  • Model Evaluation:
    • Correlation coefficient: 0.9
    • Root mean squared error: 69.556
    • Displays the linear regression equation.
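
For readers curious about what the evaluation step computes, the sketch below shows one way to split 209 instances into 10 folds and how the two reported metrics are defined. It is a simplified illustration, not Weka's code: it does not shuffle the data, and the function names and sample numbers are assumptions.

```python
import math


def k_fold_indices(n, k):
    """Yield (train, test) index lists for each of k folds over n instances."""
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared difference between actual and predicted."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# 209 instances in 10 folds: every instance is used for testing exactly once.
folds = list(k_fold_indices(209, 10))
print([len(test) for _, test in folds])

# Hypothetical actual and predicted values, only to exercise the metrics.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(correlation_coefficient(actual, predicted))
print(root_mean_squared_error(actual, predicted))
```

A correlation coefficient close to 1 means predictions track the actual values closely, while the root mean squared error is expressed in the units of the dependent variable, so lower is better.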

Making Predictions

  • To get predictions on the training data, select "Use training set" under Test options and click "Start".
  • Alternatively, use "Percentage split" at 80% for an 80/20 training/testing split.
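
The 80/20 split can be sketched in a few lines of Python. This is an illustrative approximation of the idea, not Weka's procedure; the function name, the seed, and the rounding rule are assumptions.

```python
import random


def percentage_split(rows, train_fraction=0.8, seed=1):
    """Shuffle rows reproducibly, then cut them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 209 instances of the example dataset.
train, test = percentage_split(range(209))
print(len(train), len(test))  # prints "167 42"
```

Holding out the testing set gives an estimate of how the model behaves on data it did not see during training.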

Exploring Different Algorithms

  • Other algorithms include:
    • Multi-Layer Perceptron (Neural Network)
    • Support Vector Machine (SVM)
    • Random Forest
  • Performance Results:
    • Random Forest showed the best performance (0.9737).

Visualization

  • Visualize data distribution using scatter plots.

Creating Custom Datasets

  • Example: Delaney Solubility Prediction Dataset.
  • Steps to prepare data:
    1. Download the dataset.
    2. Convert it to Weka's .arff format.
    3. Normalize or standardize the data if necessary.
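
As a rough illustration of the formatting step, the Python sketch below builds a minimal ARFF document: a `@relation` line, one `@attribute` line per numeric column, and a `@data` section with comma-separated rows. The function name, the column names, and the values are hypothetical and are not taken from the Delaney dataset.

```python
def to_arff(relation, attributes, rows):
    """Return ARFF text for a table whose columns are all numeric."""
    lines = [f"@relation {relation}", ""]
    for name in attributes:
        lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical column names and made-up values, for illustration only.
arff_text = to_arff(
    "delaney",
    ["mol_weight", "log_solubility"],
    [(100.0, -1.5), (250.5, -3.2)],
)
print(arff_text)
```

Writing the returned text to a file with the .arff extension should produce something that can be opened from the "Preprocess" tab; by convention the dependent variable is placed in the last column.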

Additional Notes

  • Random Forest performed best overall in the example.

Conclusion

  • Reminder: The best way to learn data science is through practice.