07 - Lambda Functions
Part 2

Sentiment Analysis

Recap

In the last class, we saw how to build Lambda functions on AWS.

We chose the API Gateway service to expose our Lambda functions, so each function is triggered whenever a user makes a request to its endpoint.

Info!

AWS Lambda can also be used to deploy and host machine learning models in a serverless way.

The examples we built so far were a function that returned a fixed JSON and a Lambda function that counted the words in a sentence. No ML yet!
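As a reminder of how the pieces fit together, here is a minimal sketch of a word-counting handler behind API Gateway. It assumes the Lambda proxy integration, where the request body arrives as a JSON string in `event["body"]` and the response must carry a `statusCode` and a string `body`. The `text` request field and the `word_count` response key are hypothetical names chosen for illustration, not necessarily the ones used in the previous class.

```python
import json


def lambda_handler(event, context):
    """Count the words in the "text" field of a JSON request body.

    Assumes API Gateway's Lambda proxy integration: the body is a JSON
    string in event["body"] (or missing/None when no body is sent).
    """
    # Parse the body; fall back to an empty object when there is none.
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")  # hypothetical field name

    # str.split() with no arguments splits on any run of whitespace.
    word_count = len(text.split())

    # Proxy integration expects statusCode, optional headers, and a string body.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"word_count": word_count}),
    }


if __name__ == "__main__":
    # Simulate the part of the API Gateway event that the handler reads.
    fake_event = {"body": json.dumps({"text": "Lambda functions are fun"})}
    print(lambda_handler(fake_event, None))
```

Running the file locally with a fake event is a quick way to check the handler before uploading it to Lambda.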

Let's use this class to build more complex examples using AWS Lambda!

Sentiment Analysis

Sentiment analysis (SA) is the process of determining whether a given phrase is positive, negative or neutral.

It can be applied to analyze feedback, reviews, survey responses, social media posts and more to gauge public opinion on certain topics.

Question 1

Answer!

We can represent it as a continuous score, for example, in the range `[-1, 1]`, where `-1.0` represents a very negative text and `1.0` represents a very positive text.

Another option is to use categorical variables, such as:

Very negative
Negative
Neutral
Positive
Very positive
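The two representations can be connected by slicing the continuous range into bins. The sketch below maps a score in `[-1, 1]` to the five categories above; the cut-off points (±0.2 and ±0.6) are assumptions picked for illustration, not a standard.

```python
def score_to_category(score: float) -> str:
    """Map a sentiment score in [-1, 1] to one of five categories.

    The thresholds used here are illustrative assumptions.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score must be in [-1, 1], got {score}")
    if score <= -0.6:
        return "Very negative"
    if score < -0.2:
        return "Negative"
    if score <= 0.2:
        return "Neutral"
    if score < 0.6:
        return "Positive"
    return "Very positive"


if __name__ == "__main__":
    for s in (-0.9, -0.3, 0.0, 0.4, 1.0):
        print(s, score_to_category(s))
```

Moving the thresholds changes how wide the "Neutral" band is, which is usually tuned to the application.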

Question 2

Answer!

One of the simplest ways is to create fixed rules. For example, count how many times a word like disappointed occurs in the text. A high count may indicate negativity!
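A fixed rule like this fits in a few lines. The sketch below assumes case-insensitive, whole-word matching and an arbitrary threshold of two occurrences; both are illustrative choices, not part of the original rule.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count case-insensitive, whole-word occurrences of `word` in `text`."""
    # \b marks a word boundary, so "disappointed" does not match
    # inside a longer word such as "disappointedly".
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def is_negative_by_rule(text: str, threshold: int = 2) -> bool:
    """Fixed rule: flag the text as negative when "disappointed" appears
    at least `threshold` times (the threshold is an arbitrary assumption)."""
    return count_word(text, "disappointed") >= threshold


if __name__ == "__main__":
    review = "Disappointed again. I was so disappointed with the delivery."
    print(count_word(review, "disappointed"), is_negative_by_rule(review))
```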

Another way is to create two lists of words: one of negative words and another of positive words. We count how many words from each list appear in the sentence we are analyzing. If there are more positive words, we say the text has positive sentiment; otherwise, we say it has negative sentiment.
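A minimal sketch of this two-list approach is shown below. The word lists are tiny, hypothetical examples (real lexicons contain thousands of entries), and tokenization is a simple regular expression.

```python
import re

# Tiny illustrative word lists (assumptions, not a real lexicon).
POSITIVE_WORDS = {"good", "great", "happy", "love", "excellent", "cool"}
NEGATIVE_WORDS = {"bad", "worst", "hate", "terrible", "disappointed", "damn"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters (and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def list_based_sentiment(text: str) -> str:
    """Positive if there are more positive than negative words, else negative."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    # Following the rule as stated: anything that is not "more positive"
    # is labeled negative, including ties.
    return "positive" if positives > negatives else "negative"


if __name__ == "__main__":
    print(list_based_sentiment("I love this, it is great!"))
    print(list_based_sentiment("The worst service, I hate it"))
```

Note that under this rule a text with no listed words at all (such as "Hello everybody") is labeled negative; returning a neutral label on ties is a common refinement.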

We can also ask an expert to assign a weight to each word: for example, hate could have a weight of -0.9 and cool a weight of 0.4. Summing the weights of the words in a text then tells us whether the positive or the negative weight predominates.
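The weighted version replaces counting with summing. In the sketch below, the weights for hate (-0.9) and cool (0.4) come from the text; the other entries are hypothetical, and words missing from the table contribute 0. A total of exactly 0 is reported as neutral, an assumption added for completeness.

```python
import re

# Expert-style word weights in [-1, 1]. "hate" and "cool" follow the text;
# the remaining entries are illustrative assumptions.
WORD_WEIGHTS = {
    "hate": -0.9,
    "cool": 0.4,
    "love": 0.8,
    "boring": -0.5,
}


def weighted_score(text: str) -> float:
    """Sum the weights of known words; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(WORD_WEIGHTS.get(token, 0.0) for token in tokens)


def weighted_sentiment(text: str) -> str:
    """Report which side predominates in the total weight."""
    score = weighted_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    text = "I hate this cool place"
    print(weighted_score(text), weighted_sentiment(text))
```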

See more here.

Question 3

Answer!

Given a manually labeled dataset, we can pre-process the text by removing punctuation and stop words, tokenize it, and train a Naive Bayes classifier. You probably did this in the Ciência dos Dados course (where you wrote the Python code to compute the probabilities yourself) and in the Megadados course (using Spark).
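The pipeline above can be sketched from scratch in plain Python. This is a minimal multinomial Naive Bayes with add-one (Laplace) smoothing, assuming a tiny hypothetical labeled dataset, a short hand-picked stop-word list, and regex tokenization; it is an illustration, not the code from either course.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "i", "was", "and", "to"}


def preprocess(text: str) -> list[str]:
    """Lowercase, drop punctuation, tokenize, and remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train_naive_bayes(docs: list[tuple[str, str]]):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(preprocess(text))
    vocabulary = set().union(*word_counts.values())
    return label_counts, word_counts, vocabulary


def predict(model, text: str) -> str:
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log P(label)
        total_words = sum(word_counts[label].values())
        for token in preprocess(text):
            if token not in vocabulary:
                continue  # skip words never seen during training
            # log P(token | label) with Laplace smoothing
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled training set, for illustration only.
TRAIN = [
    ("I love this product, it is great", "positive"),
    ("Excellent service and friendly staff", "positive"),
    ("This was the worst purchase ever", "negative"),
    ("Terrible quality, I hate it", "negative"),
]

model = train_naive_bayes(TRAIN)

if __name__ == "__main__":
    print(predict(model, "great staff"))
    print(predict(model, "worst quality ever"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and smoothing keeps a word unseen in one class from zeroing out that class entirely.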

See more here and here.

Instead of training our own sentiment analysis model, we will use a ready-made library that already provides this functionality.

TextBlob library

We chose the TextBlob library for sentiment analysis. It provides a range of features, such as:

N-gram extraction
Tokenization
Spelling correction
Sentiment analysis
etc.

See the documentation here and here.

Let's see an example of how to use the library. First, install it:

$ pip install textblob

Let's use textblob to obtain the polarity of three different texts:

from textblob import TextBlob

text1 = "What a damn company. You guys are the worst, you can't meet the deadline."
text2 = "Hello everybody"
text3 = "I am so happy to be here"

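# Wrap each string in a TextBlob to access its NLP helpers
blob1 = TextBlob(text1)
blob2 = TextBlob(text2)
blob3 = TextBlob(text3)

# polarity is a float in [-1.0, 1.0] (below 0 = negative, above 0 = positive)
print(f"Polarity of text1: {blob1.polarity}")
print(f"Polarity of text2: {blob2.polarity}")
print(f"Polarity of text3: {blob3.polarity}")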

Tip!

Here is the TextBlob repository: https://github.com/sloria/textblob

Question 4