
Rock vs. Mine Prediction Using Sonar Data

Jul 12, 2024


Introduction

  • Channel Focus: Artificial Intelligence and Machine Learning
  • Content Schedule:
    • Monday & Wednesday: Basics of ML concepts and topics
    • Friday: A detailed ML use case/project
  • Current Topic: Rock vs. Mine Prediction using Sonar Data

Tools and Setup

  • Programming Language: Python
  • Platform: Google Colaboratory (cloud-based, no installation required)
    • Only needs a web browser such as Google Chrome
    • Connects to Google Drive
    • Provides free resources (around 12 GB of RAM plus disk storage)

Use Case Explanation

  • Problem: Predict whether an object underwater is a mine or a rock using sonar data
    • Example: Submarine navigation and identification of underwater mines
  • Data: Sonar signals reflecting off objects (rock vs. metal)
    • Collected in a lab setting using a sonar device
    • Data format: CSV (Comma Separated Values)

Workflow

  1. Data Collection
  2. Pre-processing Data
    • Analyze and split data into training and test sets
  3. Training Machine Learning Model
    • Logistic Regression (Binary classification)
    • Supervised Learning Algorithm
  4. Testing and Validation
  5. Prediction
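
The five steps above can be sketched end to end. This is a minimal sketch using synthetic stand-in data (random features and labels, invented here only to make the pipeline runnable); the real sonar dataset and code come in the sections below.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

# 1. Data collection (simulated: 208 samples, 60 features, like the sonar data)
X = rng.random((208, 60))
y = np.where(rng.random(208) < 0.53, "M", "R")  # rough two-class balance

# 2. Pre-processing: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=1)

# 3. Training: logistic regression, a supervised binary classifier
model = LogisticRegression()
model.fit(X_train, y_train)

# 4. Testing and validation
accuracy = accuracy_score(y_test, model.predict(X_test))

# 5. Prediction on a new (here: held-out) sample
prediction = model.predict(X_test[:1])
print(prediction[0], accuracy)
```

On random labels the accuracy is meaningless; the sketch only demonstrates the shape of the workflow.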

Implementation Steps

Data Collection

  • Source: Kaggle or the UCI Machine Learning Repository
  • Dataset: sonar_data.csv (link provided in the video description)
  • Features: 60
  • Labels: 'R' for rock, 'M' for mine

Data Pre-processing

  • Import Dependencies

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
  • Loading Data into a Pandas DataFrame

```python
sonar_data = pd.read_csv(path_to_csv_file, header=None)
```
  • Description of Data
    • 208 rows, 61 columns
    • Columns 0–59 (the first 60): numeric sonar features
    • Column 60 (the 61st): labels ('R' for rock, 'M' for mine)

Data Analysis

  • Checking basic statistics: sonar_data.describe()
  • Count of 'R' and 'M': sonar_data[60].value_counts()
  • Grouping data based on label: sonar_data.groupby(60).mean()
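
To see what value_counts and groupby(...).mean() actually return, here is a tiny toy frame (made-up values, not the real sonar data) that mirrors the sonar frame's integer column names, with labels in column 60:

```python
import pandas as pd

# Toy stand-in: two feature columns (0, 1) and a label column named 60
toy = pd.DataFrame({
    0: [0.1, 0.2, 0.3, 0.4],
    1: [1.0, 1.0, 2.0, 2.0],
    60: ["R", "M", "R", "M"],
})

counts = toy[60].value_counts()   # rows per class
means = toy.groupby(60).mean()    # per-class mean of every feature column

print(counts["R"], counts["M"])   # -> 2 2
print(means.loc["R", 0])          # -> 0.2 (mean of 0.1 and 0.3)
```

On the real data, groupby(60).mean() is useful because the per-class feature means differ between rocks and mines, which is what the classifier will exploit.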

Splitting Data

  • Separate Features and Labels

```python
X = sonar_data.drop(columns=60)
Y = sonar_data[60]
```

  • Split into training and test sets

```python
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.1, stratify=Y, random_state=1)
```
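
The stratify=Y argument keeps the R/M class ratio the same in the training and test sets, which matters with only 208 rows. A small illustration with made-up labels (not the sonar data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 dummy samples: 60 'M' labels and 40 'R' labels
X = np.arange(100).reshape(-1, 1)
y = np.array(["M"] * 60 + ["R"] * 40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=1)

# With stratification, the 10-sample test set keeps the 60/40 ratio: 6 M, 4 R
print(np.unique(y_test, return_counts=True))
```

Without stratify, a small test split can by chance be dominated by one class, making the test accuracy misleading.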

Model Training

  • Logistic Regression Model

```python
model = LogisticRegression()
model.fit(X_train, Y_train)
```

Model Evaluation

  • Training data accuracy

```python
training_predictions = model.predict(X_train)
training_accuracy = accuracy_score(Y_train, training_predictions)
print(f"Training Accuracy: {training_accuracy}")
```

  • Test data accuracy

```python
test_predictions = model.predict(X_test)
test_accuracy = accuracy_score(Y_test, test_predictions)
print(f"Test Accuracy: {test_accuracy}")
```

Making Predictions

  • Create a predictive system

```python
def predictive_system(data_example):
    # Convert the data example to a NumPy array
    input_array = np.asarray(data_example)
    # Reshape to a single-row 2D array, as the model expects
    input_array = input_array.reshape(1, -1)
    # Predict using the trained logistic regression model
    prediction = model.predict(input_array)
    return prediction
```

  • Example: check whether a sample is a rock or a mine

```python
sample_input = [...]  # one row of 60 sonar readings
prediction = predictive_system(sample_input)
if prediction[0] == 'R':
    print("The object is a Rock")
else:
    print("The object is a Mine")
```

Conclusion

  • Check the description for the dataset and Colab notebook
  • Practice writing the Python scripts on your own
  • Post queries in comments for clarification