Developing a Stress-Checking Algorithm with NLP

Jun 1, 2024

Lecture: Developing a Stress-Checking Algorithm with NLP

Introduction

  • Topic: Checking stress levels using data on personal feelings
  • Objective: Determine if textual data on personal feelings indicates stress

Importing Libraries

  1. Pandas: import pandas as pd
  2. Numpy: import numpy as np
  3. NLTK: import nltk
    • Needed for NLP tasks
    • Importing stop words: from nltk.corpus import stopwords
    • Using SnowballStemmer for English: stemmer = nltk.SnowballStemmer('english')
  4. Regex: import re (for text cleaning)
  5. Matplotlib: For plotting word clouds and visualizations
    • import matplotlib.pyplot as plt
  6. Word Cloud: For generating word clouds
    • from wordcloud import WordCloud, ImageColorGenerator
  7. Sklearn: For machine learning tasks
    • CountVectorizer: from sklearn.feature_extraction.text import CountVectorizer
    • Train-Test Split: from sklearn.model_selection import train_test_split
    • Bernoulli Naive Bayes: from sklearn.naive_bayes import BernoulliNB

Bringing in the Data

  • Dataset: stress.csv
    • Loaded using Pandas: pd.read_csv('stress.csv')
  • Data columns: text, ID, label, confidence, social timestamp, problem description (e.g., PTSD, assistance syndrome, relationship issues)
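
As a rough illustration of the loading step, the sketch below reads a tiny in-memory CSV in place of stress.csv. The two rows and the three columns shown are made up for the example and are only a stand-in for the real file.

```python
import io

import pandas as pd

# Made-up stand-in for stress.csv so the example runs without the file.
# In the lecture the call is simply: data = pd.read_csv('stress.csv')
sample_csv = io.StringIO(
    "text,label,confidence\n"
    '"I cannot sleep and feel overwhelmed",1,0.8\n'
    '"Had a relaxing walk today",0,1.0\n'
)

data = pd.read_csv(sample_csv)

# Quick look at what was loaded: first rows, then (rows, columns)
print(data.head())
print(data.shape)
```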

Data Preparation

  1. Text Cleaning Function

    • Convert text to lowercase
    • Remove HTML tags, URLs, and unnecessary characters
  2. Stop Words Processing

    • Use NLTK stop words: stopwords.words('english')
    • Create a set of stop words for efficient lookup
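
The preparation steps above could be sketched roughly as follows. The function name clean, the exact regular expressions, and the small SAMPLE_STOP_WORDS set are assumptions made for illustration; in the lecture the stop-word set comes from set(stopwords.words('english')), which needs nltk.download('stopwords') first.

```python
import re
import string

# Illustrative stand-in for set(stopwords.words('english')).
# A set gives constant-time membership checks.
SAMPLE_STOP_WORDS = {"i", "am", "the", "a", "about", "my", "so", "and", "to", "is"}


def clean(text, stop_words=SAMPLE_STOP_WORDS, stemmer=None):
    """Lowercase, strip HTML tags, URLs, punctuation and digit tokens,
    drop stop words, and optionally stem the remaining words."""
    text = str(text).lower()
    text = re.sub(r"<.*?>", " ", text)                  # HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", text)  # punctuation
    text = re.sub(r"\w*\d\w*", " ", text)               # tokens containing digits
    words = [w for w in text.split() if w not in stop_words]
    if stemmer is not None:
        # Any object with a .stem(word) method works here
        words = [stemmer.stem(w) for w in words]
    return " ".join(words)


print(clean("<p>I am SO stressed about my exams!</p> See https://example.com"))
```

Passing nltk.SnowballStemmer('english') as the stemmer argument adds the stemming step, and the function can then be applied to the whole column with data['text'].apply(clean).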

Data Visualization

  • Generate a word cloud using WordCloud to visualize frequently occurring words in the dataset

Label Mapping

  • Map label values to stress indicators
    • 0: No stress
    • 1: Stress
  • Code: data['label'] = data['label'].map({0: 'no stress', 1: 'stress'})
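
The mapping can be tried on a small made-up frame, as in the sketch below. One point worth knowing: Series.map returns NaN for any value that is missing from the dictionary, so unexpected label values would show up as missing data.

```python
import pandas as pd

# Made-up labels standing in for the 'label' column of stress.csv
data = pd.DataFrame({"label": [0, 1, 1, 0]})

# Replace the numeric codes with readable class names
data["label"] = data["label"].map({0: "no stress", 1: "stress"})

print(data["label"].value_counts())
```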

Feature Extraction

  1. Count Vectorizer

    • Transform textual data into numerical data
    • Code: CV = CountVectorizer()
    • Fit and transform the text data: X = CV.fit_transform(data['text'])
    • Target labels for the split below: y = data['label']
  2. Train-Test Split

    • Split data into training and testing sets
    • Code: train_test_split(X, y, test_size=0.3, random_state=42)
    • Variables: X_train, X_test, y_train, y_test
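
The two feature-extraction steps can be sketched together on a toy corpus. The ten sentences and their labels are invented for the example, and the vectorizer is named cv here; with ten rows and test_size=0.3, seven rows go to training and three to testing.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Invented toy data standing in for the cleaned stress.csv
data = pd.DataFrame({
    "text": [
        "i feel overwhelmed and anxious",
        "cannot sleep because of work pressure",
        "panic attacks every night",
        "so worried about money and rent",
        "i am exhausted and scared",
        "had a lovely walk in the park",
        "enjoyed dinner with friends",
        "feeling calm and rested today",
        "great weekend with family",
        "reading a good book at home",
    ],
    "label": ["stress"] * 5 + ["no stress"] * 5,
})

# Each row becomes a vector of word counts over the learned vocabulary
cv = CountVectorizer()
X = cv.fit_transform(data["text"])
y = data["label"]

# Hold out 30% of the rows for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X.shape, X_train.shape, X_test.shape)
```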

Model Building

  1. Bernoulli Naive Bayes Model
    • Create the model (imported above): model = BernoulliNB()
    • Fit the model: model.fit(X_train, y_train)
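
Putting the pieces together, a minimal end-to-end sketch might look like the following. The toy sentences, the variable names, and the sample input are assumptions for illustration; on a dataset this small the accuracy figure is not meaningful, so the point is only the sequence of calls. New text has to go through the same fitted vectorizer (transform, not fit_transform) before it reaches predict.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Invented toy data standing in for the cleaned stress.csv
texts = [
    "i feel overwhelmed and anxious",
    "cannot sleep because of work pressure",
    "panic attacks every night",
    "so worried about money and rent",
    "i am exhausted and scared",
    "had a lovely walk in the park",
    "enjoyed dinner with friends",
    "feeling calm and rested today",
    "great weekend with family",
    "reading a good book at home",
]
labels = ["stress"] * 5 + ["no stress"] * 5

cv = CountVectorizer()
X = cv.fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=42
)

# BernoulliNB models each word as present or absent in a document
model = BernoulliNB()
model.fit(X_train, y_train)

# Fraction of held-out rows classified correctly
accuracy = model.score(X_test, y_test)
print("accuracy:", accuracy)

# Classify unseen text using the vocabulary learned during fitting
user_text = "i feel anxious and cannot sleep"
prediction = model.predict(cv.transform([user_text]))
print(prediction[0])
```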

Prediction

  1. Example Inputs
    • Input: