Sentiment Analysis Using BERT and Transformers

Jul 10, 2024

Lecture on Sentiment Analysis with BERT and Transformers

Introduction

  • Topic: Understanding Sentiment Analysis using BERT and the Transformers library.
  • Goal: Build a sentiment analysis model and apply it to Yelp reviews.

Steps Overview

  1. Install and import dependencies.
  2. Instantiate and download pre-trained models (BERT).
  3. Perform sentiment scoring on sample text.
  4. Scrape and analyze Yelp reviews.
  5. Store the results in a pandas DataFrame.

Installing Dependencies

  • Transformers: For NLP models and sentiment analysis.
  • PyTorch: Deep learning backend that the Transformers library runs on.
  • Requests: For making HTTP requests to scrape data.
  • BeautifulSoup: For parsing HTML and extracting data.
  • Pandas: For data manipulation and storage.
  • NumPy: Additional data transformation utilities.

Installation Commands

    !pip install transformers
    !pip install torch
    !pip install requests
    !pip install beautifulsoup4
    !pip install pandas
    !pip install numpy

Loading the BERT Model

  • Tokenizer: Converts text into a sequence of numeric token IDs.
  • Model: AutoModelForSequenceClassification for sequence classification tasks.
  • Tokenizer & Model Initialization:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch  # also for argmax
    import requests
    from bs4 import BeautifulSoup
    import re

    # Initialize tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained('nlp-town/bert-base-multilingual-uncased-sentiment')
    model = AutoModelForSequenceClassification.from_pretrained('nlp-town/bert-base-multilingual-uncased-sentiment')

Sentiment Scoring Process

Tokenize Text

    text = "I hated this, absolutely the worst!"
    tokens = tokenizer.encode(text, return_tensors='pt')
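
As a rough picture of what the tokenizer does, the sketch below splits words into subword pieces by greedy longest match (the WordPiece scheme that BERT tokenizers use) and maps each piece to an integer ID. The vocabulary and IDs are invented for illustration; the real tokenizer ships with its own vocabulary and also handles punctuation and text normalization, which this toy version skips.

```python
# Toy vocabulary: these pieces and IDs are made up for illustration only.
TOY_VOCAB = {
    "[UNK]": 0, "[CLS]": 1, "[SEP]": 2,
    "i": 3, "hat": 4, "##ed": 5, "this": 6, "worst": 7, "the": 8,
}

def wordpiece(word, vocab):
    """Split one word into subword pieces by greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits, so the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces

def encode(text, vocab):
    """Lowercase, split on whitespace, and map pieces to IDs with [CLS]/[SEP] added."""
    pieces = ["[CLS]"]
    for word in text.lower().split():
        pieces.extend(wordpiece(word, vocab))
    pieces.append("[SEP]")
    return [vocab[p] for p in pieces]

ids = encode("I hated this", TOY_VOCAB)
print(ids)
```

Here "hated" is not in the toy vocabulary as a whole word, so it is split into "hat" and "##ed" before being converted to IDs.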

Predict Sentiment

    result = model(tokens)
    sentiment = torch.argmax(result.logits)
    sentiment_score = sentiment.item() + 1  # Scores 1-5
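
To make the `+ 1` in the last step concrete: the model returns five logits, one per score, and argmax picks the index (0 to 4) of the largest one. The sketch below reproduces that mapping in plain Python, with a softmax added to show the logits as probabilities. The logit values are made up for illustration and are not output from the model.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def star_rating(logits):
    """Return the 1-5 score for a list of five logits."""
    # Index of the largest logit, the same role torch.argmax plays above
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return best_index + 1  # class indices are 0-4, scores are 1-5

# Hypothetical logits for a strongly negative review (illustrative values only).
example_logits = [4.2, 1.9, -0.3, -2.1, -2.8]
probabilities = softmax(example_logits)
rating = star_rating(example_logits)
print(rating)
```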

Web Scraping Yelp Reviews

Scraper Function

  1. Request Yelp Page
      url = 'https://www.yelp.com/biz/mexico-sydney-2'
      r = requests.get(url)
      soup = BeautifulSoup(r.text, 'html.parser')
  2. Find Reviews in HTML
      regex = re.compile('.*comment.*')
      results = soup.find_all('p', {'class': regex})
  3. Extract Text
      reviews = [result.text for result in results]
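
The class-matching step can be tried offline on a small HTML string before pointing it at a live page. The markup below is a hypothetical stand-in, not Yelp's actual page structure, which may differ and change over time; `extract_reviews` is an illustrative helper name.

```python
import re
from bs4 import BeautifulSoup

def extract_reviews(html):
    """Return the text of every <p> whose class contains 'comment'."""
    soup = BeautifulSoup(html, 'html.parser')
    pattern = re.compile('.*comment.*')
    paragraphs = soup.find_all('p', {'class': pattern})
    return [p.text.strip() for p in paragraphs]

# Hypothetical markup for illustration; real pages will look different.
sample_html = """
<div>
  <p class="comment__abc123">Great tacos, friendly staff.</p>
  <p class="nav-text">Write a review</p>
  <p class="comment__abc123">Slow service and cold food.</p>
</div>
"""

reviews = extract_reviews(sample_html)
print(reviews)
```

If the list comes back empty on a real page, the class names have probably changed and the regular expression needs adjusting.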

Aggregating Reviews in a DataFrame

Create DataFrame

    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.array(reviews), columns=['review'])

Apply Sentiment Analysis to each review

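    def sentiment_score(text):
        tokens = tokenizer.encode(text, return_tensors='pt')
        result = model(tokens)
        sentiment = torch.argmax(result.logits)
        return sentiment.item() + 1  # class index 0-4 becomes score 1-5

    # BERT accepts at most 512 tokens; slicing to 512 characters is a rough way to keep inputs short
    df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))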

Testing with another Yelp page

Steps

  1. Change URL in the scraper function.
  2. Re-run the code block to scrape and analyze the new reviews.
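
One way to make the re-run less manual is to wrap the DataFrame step in a function that takes a list of reviews and a scoring function. This is only a sketch: `build_review_frame` and `stub_score` are hypothetical names, and the stub scorer merely stands in for the BERT-based `sentiment_score` so that the example runs without downloading the model.

```python
import pandas as pd

def build_review_frame(reviews, score_fn, max_chars=512):
    """Build a DataFrame of reviews and score each one with score_fn."""
    df = pd.DataFrame({'review': list(reviews)})
    # Mirror the lecture's approach: slice each review to max_chars before scoring.
    df['sentiment'] = df['review'].apply(lambda text: score_fn(text[:max_chars]))
    return df

def stub_score(text):
    """Placeholder scorer (not a real model): 1 if 'worst' appears, else 5."""
    return 1 if 'worst' in text.lower() else 5

frame = build_review_frame(
    ["Absolutely the worst!", "Loved every bite."],
    stub_score,
)
print(frame)
```

In the real pipeline, `sentiment_score` would be passed as `score_fn`, and the list of review texts scraped from the new URL as `reviews`.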

Summary

  • Installed necessary libraries.
  • Used BERT model and tokenizer from Transformers for sentiment analysis.
  • Scraped reviews from Yelp and analyzed sentiment.
  • Aggregated results in pandas DataFrame.
  • The approach can be extended to other data sources or languages.

End Note: This approach is useful for businesses looking to gauge customer sentiment from reviews.


Fun Fact: Some models, including the multilingual BERT model used here, can analyze text in multiple languages, making them versatile for international applications.

Tools Mentioned

  • PyTorch
  • Transformers Library
  • Beautiful Soup
  • Pandas
  • NumPy
  • Mito (for Excel to Python transformations)