Build a Sentiment Analysis Model in Python

How to Build a Sentiment Analysis Model in Python

TechsBucket, November 23, 2024

Understanding sentiments in text has become a cornerstone of many modern applications. From analyzing customer feedback to monitoring social media, sentiment analysis enables businesses and developers to uncover insights from textual data.

In this article, we’ll walk you through how to build a sentiment analysis model in Python using natural language processing (NLP) techniques and machine learning.

What Is Sentiment Analysis?

Sentiment analysis, also known as opinion mining, is a text analysis technique used to determine the emotional tone behind words. It categorizes text into sentiments such as positive, negative, or neutral, helping machines interpret human emotions.

Applications of Sentiment Analysis:

Customer feedback analysis
Social media monitoring
Product reviews classification
Political sentiment tracking

Prerequisites for Building a Sentiment Analysis Model

Before we start coding, ensure you have the following installed on your machine:

Python 3.6 or later: Download it from python.org.
Jupyter Notebook: Available via Anaconda Distribution.
Basic Libraries: We’ll use pandas, numpy, scikit-learn, and NLTK. Install them using pip if not already installed:

pip install pandas numpy scikit-learn nltk


Step-by-Step Guide to Building a Sentiment Analysis Model

1. Import Necessary Libraries

Start by importing the required libraries in your Python environment.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

Download the NLTK stopwords and tokenizer data if you haven’t already:
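The download step is not shown above, so here is a minimal sketch of it. The resource names are standard NLTK identifiers, but the helper name download_nltk_resources is our own, and including "punkt_tab" is an assumption that you may be running a recent NLTK release, where word_tokenize looks for that resource instead of "punkt".

```python
import nltk

# "stopwords" feeds stopwords.words('english'); "punkt" / "punkt_tab" back
# word_tokenize. Which tokenizer resource is needed depends on the NLTK version.
REQUIRED_RESOURCES = ["stopwords", "punkt", "punkt_tab"]


def download_nltk_resources(resources=REQUIRED_RESOURCES):
    """Download each resource and report whether the download succeeded."""
    return {name: nltk.download(name, quiet=True) for name in resources}


if __name__ == "__main__":
    # Needs an internet connection the first time it runs
    print(download_nltk_resources())
```

In a notebook you would typically run this cell once; later runs find the data already on disk.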

2. Load and Explore the Dataset

For this tutorial, we’ll use a dataset of labeled movie reviews from Kaggle.

Load the dataset into a pandas DataFrame and inspect its structure.

The dataset should have two columns:

Review: The textual review
Sentiment: The target variable (positive/negative)
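Loading and inspecting the data could look roughly like the sketch below. To keep it runnable without the download, it reads a tiny in-memory CSV with the two columns described above; the file name movie_reviews.csv in the comment is hypothetical, so substitute the name of the file you downloaded.

```python
import io

import pandas as pd

# Stand-in for the Kaggle file. With the real dataset you would call
# pd.read_csv("movie_reviews.csv") instead (hypothetical file name).
sample_csv = io.StringIO(
    "Review,Sentiment\n"
    '"A wonderful, moving film.",positive\n'
    '"Dull plot and wooden acting.",negative\n'
    '"I loved every minute of it.",positive\n'
    '"A complete waste of time.",negative\n'
)

data = pd.read_csv(sample_csv)

print(data.shape)                        # number of rows and columns
print(data.head())                       # first few reviews
print(data["Sentiment"].value_counts())  # class balance
```

Checking the class balance early is useful because a heavily skewed dataset makes plain accuracy a misleading metric.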

3. Preprocess the Text Data

Clean and tokenize the text data to prepare it for model training. Here are the steps:

a) Remove Stopwords

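Stopwords like “is,” “the,” and “an” carry little sentiment on their own, so removing them reduces noise. Note that NLTK’s English list also contains negations such as “not” and “no,” which can flip the meaning of a review, so you may want to keep those.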

stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # Convert to lowercase and split the review into tokens
    tokens = word_tokenize(text.lower())
    # Keep alphanumeric tokens that are not stopwords
    filtered_tokens = [word for word in tokens if word.isalnum() and word not in stop_words]
    return " ".join(filtered_tokens)

# Store the cleaned text in a new column
data['Cleaned_Review'] = data['Review'].apply(preprocess_text)

b) Check for Null Values

Ensure there are no missing or null values in the dataset.
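A minimal sketch of the null check, using a toy DataFrame in place of the real reviews. In practice it makes sense to run this before the cleaning step, since a missing review is not a string and cannot be lowercased or tokenized.

```python
import pandas as pd

# Toy frame standing in for the review dataset; one review is missing
data = pd.DataFrame({
    "Review": ["Loved it", None, "Dull and far too long"],
    "Sentiment": ["positive", "negative", "negative"],
})

# Count missing values in each column
missing_counts = data.isnull().sum()
print(missing_counts)

# Drop rows that lack a review or a label, then renumber the index
data = data.dropna(subset=["Review", "Sentiment"]).reset_index(drop=True)
print(len(data))
```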

4. Split the Dataset

Split the dataset into training and testing sets. Typically, an 80-20 split works well.

X = data['Cleaned_Review']
y = data['Sentiment']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

5. Convert Text to Numerical Data

Since machine learning models cannot directly interpret text, convert the reviews into numerical data using CountVectorizer.
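The vectorization step might look like the sketch below. The names vectorizer and X_test_vectorized match the ones used in the later steps; X_train_vectorized is our assumption by analogy, and the two short lists stand in for the real training and test splits.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-ins for the X_train / X_test splits created in step 4
X_train = ["great movie loved it", "terrible plot awful acting"]
X_test = ["great acting", "awful unseen words"]

vectorizer = CountVectorizer()

# Learn the vocabulary from the training text only, then count words
X_train_vectorized = vectorizer.fit_transform(X_train)

# Reuse that vocabulary for the test text; unseen words are ignored
X_test_vectorized = vectorizer.transform(X_test)

print(X_train_vectorized.shape)
print(X_test_vectorized.toarray())
```

Fitting on the training split alone keeps information from the test reviews out of the model, which is why the test data only gets transform.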

6. Train a Sentiment Analysis Model

We’ll use the Naive Bayes classifier, which is effective for text classification tasks.
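Training could be sketched as follows. The variable name model matches the one used in the evaluation step; the toy reviews and X_train_vectorized are illustrative assumptions so that the block runs on its own.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data standing in for the cleaned reviews and their labels
X_train = [
    "great movie loved it",
    "wonderful acting great story",
    "terrible plot awful acting",
    "boring and awful movie",
]
y_train = ["positive", "positive", "negative", "negative"]

vectorizer = CountVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)

# Multinomial Naive Bayes works directly on word-count features
model = MultinomialNB()
model.fit(X_train_vectorized, y_train)

print(model.classes_)
print(model.predict(vectorizer.transform(["great story", "awful plot"])))
```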

7. Evaluate the Model

Predict on the test data and evaluate the model’s performance using accuracy.

y_pred = model.predict(X_test_vectorized)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
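Accuracy is simply the fraction of predictions that match the true labels. The sketch below shows this on four hand-written labels (illustrative values, not results from the movie review dataset) and adds a confusion matrix, which is one optional way to see which class the errors fall in.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration only
y_true = ["positive", "negative", "positive", "negative"]
y_hat = ["positive", "negative", "negative", "negative"]

# Three of the four predictions match the true labels
accuracy = accuracy_score(y_true, y_hat)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Rows are true classes, columns are predicted classes, in the given order
matrix = confusion_matrix(y_true, y_hat, labels=["negative", "positive"])
print(matrix)
```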

8. Test with Custom Inputs

Test the model with your own sentences.

custom_reviews = [
"I absolutely loved this movie! The story was fantastic.",
"The plot was terrible and the acting was even worse.",
"An average film with a predictable storyline."
]

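# Apply the same cleaning used on the training reviews
cleaned_custom_reviews = [preprocess_text(review) for review in custom_reviews]
custom_reviews_vectorized = vectorizer.transform(cleaned_custom_reviews)
predictions = model.predict(custom_reviews_vectorized)
print(predictions)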

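Output: The model labels each review as positive or negative. Because the training data contains only these two classes, a lukewarm review like the third one will still be assigned one of them, not a neutral label.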

Advanced Tips to Improve the Model

Use TF-IDF Vectorizer: Instead of CountVectorizer, try TfidfVectorizer, which down-weights words that appear in most reviews and often improves results:
```
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
```
Try Other Algorithms: Experiment with Support Vector Machines (SVM) or deep learning models like LSTMs for potentially higher accuracy.
Hyperparameter Tuning: Use GridSearchCV to find the best parameters for your model.
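As a rough sketch of the tuning tip, the block below wraps the vectorizer and classifier in a scikit-learn Pipeline and lets GridSearchCV try a few settings. The toy reviews, the parameter values, and cv=2 are illustrative assumptions chosen so the example runs quickly; with the real dataset you would pass X_train and y_train and a larger number of folds.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data for illustration only
texts = [
    "great movie loved it", "wonderful acting great story",
    "fantastic film truly enjoyable", "loved the wonderful story",
    "terrible plot awful acting", "boring and awful movie",
    "dull film truly terrible", "awful story boring plot",
]
labels = ["positive"] * 4 + ["negative"] * 4

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("nb", MultinomialNB()),
])

# Parameter names follow the pattern "<step name>__<parameter>"
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "nb__alpha": [0.1, 0.5, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

print(search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.2f}")
```

Putting the vectorizer inside the pipeline means it is refit on each training fold, so the cross-validation scores are not inflated by vocabulary learned from held-out reviews. To try another algorithm, you could swap the "nb" step for a different classifier and adjust the grid to its parameters.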

FAQs

What is sentiment analysis used for?
It is widely used in customer feedback analysis, social media monitoring, and brand reputation management.

Why use Naive Bayes for sentiment analysis?
Naive Bayes is simple, fast, and effective for text-based classification tasks.

Can I use a pre-trained model for sentiment analysis?
Yes, libraries like Hugging Face Transformers provide pre-trained models like BERT for sentiment analysis.

What is the difference between CountVectorizer and TfidfVectorizer?
CountVectorizer counts word occurrences, while TfidfVectorizer considers word importance relative to the entire corpus.
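A two-document toy corpus makes the difference visible. In this sketch (the corpus is made up for illustration), “movie” appears in both documents, so TF-IDF gives it a lower weight than the word that is unique to each document, while the raw counts treat them the same.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["good movie", "bad movie"]

# Raw word counts: every word that occurs once gets a 1
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs).toarray()

# TF-IDF weights: words shared by all documents are down-weighted
tfidf_vec = TfidfVectorizer()
weights = tfidf_vec.fit_transform(docs).toarray()

# Column positions of two words in the TF-IDF matrix
good = tfidf_vec.vocabulary_["good"]
movie = tfidf_vec.vocabulary_["movie"]

print(sorted(count_vec.vocabulary_))  # column order of both matrices
print(counts[0])                      # counts for "good movie"
print(weights[0].round(2))            # TF-IDF weights for "good movie"
print(weights[0][good] > weights[0][movie])
```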

How accurate is sentiment analysis?
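Accuracy depends on the dataset, preprocessing, and model. With proper tuning, accuracy in the 85–95% range is achievable on clean, balanced two-class datasets, though results vary by domain and tend to drop for sarcasm, mixed opinions, and informal text.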

Wrap Up

By following this guide, you now know how to build a sentiment analysis model in Python. From preprocessing text to training a Naive Bayes classifier, this project gives you a hands-on introduction to natural language processing.

As you advance, consider exploring deep learning frameworks like TensorFlow or PyTorch for more sophisticated sentiment analysis models.

A big thank you for exploring TechsBucket! Your visit means a lot to us, and we’re grateful for your time on our platform. If you have any feedback or suggestions, we’d love to hear them.
