Lecture on Sentiment Analysis with BERT and Transformers
Introduction
- Topic: Understanding Sentiment Analysis using BERT and the Transformers library.
- Goal: Build a sentiment analysis model and apply it to Yelp reviews.
Steps Overview
- Install and import dependencies.
- Instantiate and download the pre-trained BERT model.
- Perform sentiment scoring on sample text.
- Scrape and analyze Yelp reviews.
- Store the results in a pandas DataFrame.
Installing Dependencies
- Transformers: For NLP models and sentiment analysis.
- PyTorch: Deep learning backend used by the Transformers library.
- Requests: For making HTTP requests to scrape data.
- BeautifulSoup: For parsing HTML and extracting data.
- Pandas: For data manipulation and storage.
- NumPy: Array utilities used when building the DataFrame.
Installation Commands
!pip install transformers
!pip install torch
!pip install requests
!pip install beautifulsoup4
!pip install pandas
!pip install numpy
Loading the BERT Model
- Tokenizer: Converts text into a sequence of token IDs the model can process.
- Model: AutoModelForSequenceClassification for sequence classification tasks.
- Tokenizer & Model Initialization:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch # also for argmax
import requests
from bs4 import BeautifulSoup
import re
# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')
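Optional sanity check (a sketch; num_labels and id2label are standard attributes on a Transformers model config, but the exact label names depend on the checkpoint):
print(model.config.num_labels)  # expected: 5 classes, one per star rating
print(model.config.id2label)    # e.g. {0: '1 star', ..., 4: '5 stars'}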
Sentiment Scoring Process
Tokenize Text
text = "I hated this, absolutely the worst!"
tokens = tokenizer.encode(text, return_tensors='pt')
Predict Sentiment
result = model(tokens)
sentiment = torch.argmax(result.logits)
sentiment_score = sentiment.item() + 1  # classes 0-4 map to a 1-5 star score
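The model returns raw logits over the five star classes; if a confidence estimate is useful, a softmax converts them to probabilities. A minimal sketch using standard PyTorch:
import torch.nn.functional as F
probs = F.softmax(result.logits, dim=1)   # probabilities over the 5 star classes
print(sentiment_score)                    # predicted star rating, likely 1 for this negative example
print(probs[0][sentiment].item())         # model's confidence in that rating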
Web Scraping Yelp Reviews
Scraper Function
- Request Yelp Page
url = 'https://www.yelp.com/biz/mexico-sydney-2'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
- Find Reviews in HTML
regex = re.compile('.*comment.*')
results = soup.find_all('p', {'class': regex})
- Extract Text
reviews = [result.text for result in results]
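Yelp's markup changes over time, so the class regex may stop matching; a quick check that the scrape actually returned text (a sketch):
print(len(reviews))   # number of review paragraphs found
print(reviews[0])     # first review as plain text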
Aggregating Reviews in a DataFrame
Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array(reviews), columns=['review'])
Apply Sentiment Analysis to each review
def sentiment_score(text):
    tokens = tokenizer.encode(text, return_tensors='pt')
    result = model(tokens)
    sentiment = torch.argmax(result.logits)
    return sentiment.item() + 1
df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))  # truncate long reviews so they stay within BERT's 512-token limit
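To eyeball the results, standard pandas calls work (a sketch):
print(df.head())               # review text alongside its 1-5 star score
print(df['sentiment'].mean())  # rough average rating across the scraped reviews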
Testing with another Yelp page
Steps
- Change the URL in the scraper code (a reusable helper is sketched below).
- Re-run the code block to scrape and analyze the new reviews.
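To avoid copy-pasting the scraping cell for each new page, the steps above can be wrapped into one helper. A minimal sketch, assuming the same class-regex selector still matches Yelp's HTML; the function name and URL below are hypothetical:
def scrape_and_score(url):
    # Download the page and pull out the review paragraphs (same selector as above)
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    reviews = [p.text for p in soup.find_all('p', {'class': re.compile('.*comment.*')})]
    # Score each review, truncating to the first 512 characters to stay within BERT's input limit
    df = pd.DataFrame(reviews, columns=['review'])
    df['sentiment'] = df['review'].apply(lambda x: sentiment_score(x[:512]))
    return df

df2 = scrape_and_score('https://www.yelp.com/biz/some-other-restaurant')  # hypothetical URL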
Summary
- Installed necessary libraries.
- Used BERT model and tokenizer from Transformers for sentiment analysis.
- Scraped reviews from Yelp and analyzed sentiment.
- Aggregated results in pandas DataFrame.
- The approach extends readily to other data sources and, since the model is multilingual, to reviews in other languages.
End Note: This approach is useful for businesses looking to gauge customer sentiment from reviews.
Fun Fact: Some models can analyze text in multiple languages, making them versatile for international applications.
Tools Mentioned
- PyTorch
- Transformers Library
- Beautiful Soup
- Pandas
- Numpy
- Mito (for Excel to Python transformations)