2.2 Sentiment/emotion Lexicons
A sentiment lexicon is a dictionary of words with their semantic orientation (for example, positive or negative).
General-purpose sentiment lexicons:
- bing: labels each word as positive or negative
- nrc: labels words as positive, negative, anger, anticipation, disgust, fear, joy, sadness, surprise, and trust
- AFINN: assigns each word a score from -5 (most negative) to +5 (most positive)
Domain-specific sentiment lexicons perform better on text from their domain, but they are difficult to generate.
A lexicon can be built on unigrams (single words) or on n-grams (sequences of words).
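To illustrate the difference between a unigram and an n-gram lexicon, here is a minimal sketch. Both lexicons (`unigram_lexicon`, `bigram_lexicon`), their scores, and the example sentence are invented for illustration and do not come from any real lexicon.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical lexicons: entries and scores are made up for this example
unigram_lexicon <- tibble(word = c("good", "bad"), value = c(1, -1))
bigram_lexicon  <- tibble(bigram = c("not good", "not bad"), value = c(-1, 1))

review <- tibble(text = "the food was not good")

# Unigram scoring: split into single words, then look each word up
unigram_score <- review %>%
  unnest_tokens(word, text) %>%
  inner_join(unigram_lexicon, by = "word") %>%
  summarise(score = sum(value)) %>%
  pull(score)

# Bigram scoring: split into overlapping word pairs, then look each pair up
bigram_score <- review %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  inner_join(bigram_lexicon, by = "bigram") %>%
  summarise(score = sum(value)) %>%
  pull(score)

unigram_score
bigram_score
```

The unigram lexicon only sees "good" and scores the sentence as positive, while the bigram lexicon matches "not good" and scores it as negative.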
One issue arises with text that spans many paragraphs: positive and negative sentiment can often average out to about zero.
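The averaging problem can be sketched with a small example. The lexicon (`toy_lexicon`) is a made-up AFINN-style table and the two paragraphs are invented, so the numbers only illustrate the effect.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical AFINN-style lexicon: words and scores are made up
toy_lexicon <- tribble(
  ~word,       ~value,
  "wonderful",      3,
  "happy",          2,
  "terrible",      -3,
  "sad",           -2
)

# Two invented paragraphs with opposite sentiment
toy_text <- tibble(
  paragraph = c(1, 2),
  text = c("A wonderful day and a happy ending.",
           "A terrible night and a sad morning.")
)

scored <- toy_text %>%
  unnest_tokens(word, text) %>%          # one row per lower-cased word
  inner_join(toy_lexicon, by = "word")   # keep only words found in the lexicon

# Score per paragraph: the two paragraphs differ clearly
by_paragraph <- scored %>%
  group_by(paragraph) %>%
  summarise(score = sum(value))

# Score for the whole text: the paragraphs cancel each other out
whole_text <- sum(scored$value)

by_paragraph
whole_text
```

Scoring smaller units (paragraphs, sentences, or fixed-size chunks of lines) keeps the two opposite signals visible, whereas the whole-text score hides them.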
How are sentiment lexicons created?
- crowdsourcing (using, for example, Amazon Mechanical Turk)
- by the labor of one of the authors.
library(tidyverse)
library(tidytext)
library(janeaustenr)
library(stringr)
library(lexicon)
Note: These sentiments aren’t shown in the online version of these notes, because they need to be manually downloaded for licensing reasons.
head(get_sentiments("afinn"))
head(get_sentiments("bing"))
nrc_emotions_lex <- read_tsv("data/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt", col_names = FALSE)
head(nrc_emotions_lex)
## # A tibble: 6 × 3
##   X1    X2              X3
##   <chr> <chr>        <dbl>
## 1 aback anger            0
## 2 aback anticipation     0
## 3 aback disgust          0
## 4 aback fear             0
## 5 aback joy              0
## 6 aback negative         0
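# get_sentiments("nrc") is not working here, so read the lexicon file directly
nrc_emotions_lex <- read_tsv("data/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt", col_names = FALSE) %>%
  rename("word" = 1, "sentiment" = 2, "score" = 3) %>%
  # dropping score keeps every word-sentiment pair, including those with score 0
  select(-score)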
head(nrc_emotions_lex)
## # A tibble: 6 × 2
##   word  sentiment
##   <chr> <chr>
## 1 aback anger
## 2 aback anticipation
## 3 aback disgust
## 4 aback fear
## 5 aback joy
## 6 aback negative
head(lexicon::nrc_emotions)
## # A tibble: 6 × 9
##   term        anger anticipation disgust  fear   joy sadness surprise trust
##   <chr>       <int>        <int>   <int> <int> <int>   <int>    <int> <int>
## 1 aback           0            0       0     0     0       0        0     0
## 2 abacus          0            0       0     0     0       0        0     1
## 3 abandon         0            0       0     1     0       1        0     0
## 4 abandoned       1            0       0     1     0       1        0     0
## 5 abandonment     1            0       0     1     0       1        1     0
## 6 abate           0            0       0     0     0       0        0     0
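The wide table above can be reshaped into the one-row-per-word-and-sentiment layout that tidytext joins expect. The sketch below copies the six printed rows by hand into a hypothetical `nrc_wide` tibble so that it runs without the lexicon package; the same pipeline should, in principle, work on `lexicon::nrc_emotions` itself.

```r
library(dplyr)
library(tidyr)
library(tibble)

# The six rows printed above, copied by hand (nrc_wide is a stand-in name)
nrc_wide <- tribble(
  ~term,         ~anger, ~anticipation, ~disgust, ~fear, ~joy, ~sadness, ~surprise, ~trust,
  "aback",            0,             0,        0,     0,    0,        0,         0,      0,
  "abacus",           0,             0,        0,     0,    0,        0,         0,      1,
  "abandon",          0,             0,        0,     1,    0,        1,         0,      0,
  "abandoned",        1,             0,        0,     1,    0,        1,         0,      0,
  "abandonment",      1,             0,        0,     1,    0,        1,         1,      0,
  "abate",            0,             0,        0,     0,    0,        0,         0,      0
)

nrc_long <- nrc_wide %>%
  # one row per (term, emotion) pair
  pivot_longer(-term, names_to = "sentiment", values_to = "score") %>%
  # keep only pairs where the word is actually associated with the emotion
  filter(score == 1) %>%
  select(word = term, sentiment)

nrc_long
```

Words with no associated emotion, such as "aback" and "abate" here, drop out of the long table entirely.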