Sentiment Analysis
Recap
In the last class we saw how to build Lambda functions in AWS.
We chose the API Gateway service to expose our Lambda functions. Thus, the function is triggered whenever the user makes a request to an endpoint.
Info!
AWS Lambda allows machine learning models to be deployed and hosted serverlessly.
Looking back, the examples we built were a function that returned a fixed JSON response and a Lambda function that counted the number of words in a sentence. No ML so far!
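As a reminder of the general shape, here is a minimal sketch of a word-counting handler. It assumes the API Gateway proxy integration, where the request body arrives as a JSON string in `event["body"]`; the `phrase` field name is hypothetical, and the code from last class may differ.

```python
import json


def lambda_handler(event, context):
    """Count the words in the "phrase" field of a JSON request body."""
    # With the API Gateway proxy integration, the body arrives as a string.
    body = json.loads(event.get("body") or "{}")
    phrase = body.get("phrase", "")  # "phrase" is a hypothetical field name

    # str.split() with no arguments splits on any run of whitespace.
    word_count = len(phrase.split())

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"phrase": phrase, "word_count": word_count}),
    }


# Local smoke test with a hand-built event (no AWS needed).
if __name__ == "__main__":
    fake_event = {"body": json.dumps({"phrase": "no ML for now"})}
    print(lambda_handler(fake_event, None))
```

Calling the handler locally with a hand-built event is a quick way to check the logic before deploying.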
Let's use this class to build more complex examples using AWS Lambda!
Sentiment Analysis
Sentiment analysis (SA) is the process of determining whether a given phrase is positive, negative or neutral.
It can be applied to analyze feedback, reviews, survey responses, social media posts and more to gauge public opinion on certain topics.
Question 1
Answer!
We can represent it as a continuous score, for example, in the range `[-1, 1]`, where `-1.0` represents a very negative text and `1.0` represents a very positive text.
Another option is to use categorical variables, such as:
- Very negative
- Negative
- Neutral
- Positive
- Very positive
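The two representations can be connected by binning the continuous score. The sketch below is only an illustration: the function name `polarity_to_label` and the cut points (-0.6, -0.2, 0.2, 0.6) are assumptions, not a standard.

```python
def polarity_to_label(score: float) -> str:
    """Map a polarity score in [-1, 1] to one of five categories."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be in [-1, 1], got {score}")
    # Cut points are arbitrary assumptions chosen for illustration.
    if score < -0.6:
        return "Very negative"
    if score < -0.2:
        return "Negative"
    if score <= 0.2:
        return "Neutral"
    if score <= 0.6:
        return "Positive"
    return "Very positive"


for s in (-1.0, -0.3, 0.0, 0.5, 0.9):
    print(s, "->", polarity_to_label(s))
```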
Question 2
Answer!
One of the simplest ways is to create fixed rules. For example, count how many times a word like `disappointed` occurs in the text. A high occurrence may indicate negativity!
Another way is to create two lists of words: one of negative words and another of positive words. We count how many words from each list appear in the sentence we are analyzing. If the sentence has more positive words, we say the text has positive sentiment; otherwise, we say it has negative sentiment.
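A minimal sketch of the two-list idea follows. The word lists are tiny and hand-picked for illustration, and the names `POSITIVE_WORDS`, `NEGATIVE_WORDS`, and `count_based_sentiment` are hypothetical. Unlike the rule described above, this version returns `neutral` on a tie, which seems a safer default when no listed word appears at all.

```python
import re

# Tiny hand-picked word lists, for illustration only (assumption).
POSITIVE_WORDS = {"good", "great", "happy", "cool", "love"}
NEGATIVE_WORDS = {"bad", "worst", "hate", "disappointed", "terrible"}


def tokenize(text: str) -> list:
    """Lowercase the text and extract runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_based_sentiment(text: str) -> str:
    """Compare how many tokens fall in each word list."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"  # a tie, including no matches at all


print(count_based_sentiment("I am so happy to be here"))
print(count_based_sentiment("You guys are the worst"))
```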
Furthermore, we can ask an expert to provide weights for words. For example, `hate` could have a weight of `-0.9` while `cool` could have a weight of `0.4`. Thus, we can compute whether the total weight is predominantly positive or negative.
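The weighted variant can be sketched as a dictionary lookup plus a sum. Only the weights for `hate` and `cool` come from the text; the other entries, and the names `WORD_WEIGHTS` and `weighted_score`, are assumptions for illustration.

```python
import re

WORD_WEIGHTS = {
    "hate": -0.9,   # weight given in the text
    "cool": 0.4,    # weight given in the text
    "worst": -1.0,  # assumed for illustration
    "happy": 0.8,   # assumed for illustration
}


def weighted_score(text: str) -> float:
    """Sum the weights of known words; unknown words contribute 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((WORD_WEIGHTS.get(token, 0.0) for token in tokens), 0.0)


score = weighted_score("I hate this, but the design is cool")
print(score, "->", "positive" if score > 0 else "negative" if score < 0 else "neutral")
```

Here the negative weight of `hate` outweighs the positive weight of `cool`, so the total is negative.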
Question 3
Answer!
Given a manually labeled dataset, we can pre-process the text by removing punctuation and stop words. Then we can tokenize the text and train a Naive Bayes classifier. You probably did this in the Ciência dos dados course (where you wrote the Python code to calculate the probabilities yourself) and in the Megadados course (using Spark).
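As a refresher, the pipeline above can be sketched with the standard library only. This is a toy multinomial Naive Bayes with Laplace smoothing, not the code from either course: the stop-word list, the training sentences, and all function names are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "the", "is", "to", "i", "was", "this", "and"}


def preprocess(text):
    """Lowercase, drop punctuation, tokenize, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train(labeled_texts):
    """Return class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        tokens = preprocess(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(text, model):
    """Pick the label with the highest log posterior."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(label) + sum of log P(word | label), with Laplace smoothing
        log_prob = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in preprocess(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


# Toy, made-up training data for illustration only.
TRAINING_DATA = [
    ("I love this product", "positive"),
    ("What a great experience", "positive"),
    ("This was terrible", "negative"),
    ("I hate the delays", "negative"),
]

model = train(TRAINING_DATA)
print(predict("great product", model))
print(predict("terrible delays", model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the `+ 1` smoothing keeps a word unseen in one class from zeroing out that class entirely.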
Instead of training our own sentiment analysis model, we will use a ready-made library that already provides this functionality.
Textblob library
We chose to use the `textblob` library for sentiment analysis. It provides a series of features such as:
- Calculation of n-grams
- Tokenization
- Spelling correction
- Sentiment analysis
- etc.
See the documentation here and here.
Let's see an example of how to use the library. But first, install it, for example with `pip install textblob`.
Let's use `textblob` to obtain the polarity of three different texts:
from textblob import TextBlob
text1 = "What a damn company. You guys are the worst, you can't meet the deadline."
text2 = "Hello everybody"
text3 = "I am so happy to be here"
blob1 = TextBlob(text1)
blob2 = TextBlob(text2)
blob3 = TextBlob(text3)
print(f"Polarity of text1: {blob1.polarity}")
print(f"Polarity of text2: {blob2.polarity}")
print(f"Polarity of text3: {blob3.polarity}")
Tip! 1
Here is the `textblob` repository: https://github.com/sloria/textblob
Question 4