2.2 Sentiment/Emotion Lexicons

  • A sentiment lexicon is a dictionary of words and their semantic orientation

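  • General-purpose sentiment lexicons

    • bing: labels each word as positive or negative
    • nrc: labels words as positive or negative and with eight emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust
    • AFINN: assigns each word an integer score from -5 (most negative) to +5 (most positive)
  • Domain-specific sentiment lexicons tend to perform better on text from their domain, but they are difficult to generate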

  • Lexicons can be based on unigrams (single words) or n-grams

  • Issue with long texts spanning many paragraphs: positive and negative sentiment can often average out to about zero

  • How sentiment lexicons are created

    • crowdsourcing (using, for example, Amazon Mechanical Turk)
    • manual annotation by one of the lexicon's authors
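
The averaging-out issue noted above can be illustrated with a minimal sketch. The lexicon here is a hypothetical AFINN-style table whose words and scores are made up for illustration (it is not the real AFINN data), and the two "paragraphs" are invented. Scoring per paragraph keeps the contrast, while a single whole-document score cancels to zero.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical AFINN-style lexicon; words and scores are made up for illustration
toy_lexicon <- tribble(
  ~word,       ~value,
  "wonderful",  3,
  "happy",      2,
  "terrible",  -3,
  "sad",       -2
)

# Two invented "paragraphs" with opposite sentiment
doc <- tibble(
  paragraph = 1:2,
  text = c("A wonderful and happy day", "A terrible and sad night")
)

scored <- doc %>%
  unnest_tokens(word, text) %>%          # one lowercase word per row
  inner_join(toy_lexicon, by = "word")   # keep only words found in the lexicon

# Scoring each paragraph separately preserves the contrast
by_paragraph <- scored %>%
  group_by(paragraph) %>%
  summarise(score = sum(value))

# Scoring the whole document at once lets the two paragraphs cancel
doc_score <- sum(scored$value)

print(by_paragraph)
print(doc_score)
```

Splitting a long text into smaller units (paragraphs, sentences, or fixed-size chunks of lines) before scoring is one way to keep opposing sentiments from masking each other.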
library(tidyverse)
library(tidytext)
library(janeaustenr)
library(stringr)
library(lexicon)

Note: These sentiments aren’t shown in the online version of these notes, because they need to be manually downloaded for licensing reasons.

head(get_sentiments("afinn"))
head(get_sentiments("bing"))
nrc_emotions_lex <- read_tsv("data/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt", col_names = FALSE) 

head(nrc_emotions_lex)
## # A tibble: 6 × 3
##   X1    X2              X3
##   <chr> <chr>        <dbl>
## 1 aback anger            0
## 2 aback anticipation     0
## 3 aback disgust          0
## 4 aback fear             0
## 5 aback joy              0
## 6 aback negative         0
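# get_sentiments("nrc") is not working here, so read the lexicon file directly
# and rename the columns to match the tidytext format (word, sentiment)

nrc_emotions_lex <- read_tsv("data/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt", col_names = FALSE) %>%
  rename(word = 1, sentiment = 2, score = 3) %>%
  # Dropping score without filtering keeps every word/sentiment pair, including those with score 0
  select(-score)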

head(nrc_emotions_lex)
## # A tibble: 6 × 2
##   word  sentiment   
##   <chr> <chr>       
## 1 aback anger       
## 2 aback anticipation
## 3 aback disgust     
## 4 aback fear        
## 5 aback joy         
## 6 aback negative
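
In the three-column file read above, the third column appears to be a 0/1 flag saying whether the word is associated with the sentiment (for example, aback has 0 for anger). Under that assumption, a word/sentiment table that contains only real associations needs a filter before the score column is dropped. The sketch below uses a small hand-typed stand-in for the file (`nrc_raw` is a hypothetical name); its rows are taken from values shown in this section.

```r
library(dplyr)
library(tibble)

# Hand-typed stand-in for the raw file, in the same X1/X2/X3 layout
nrc_raw <- tribble(
  ~X1,       ~X2,       ~X3,
  "aback",   "anger",     0,
  "aback",   "fear",      0,
  "abacus",  "trust",     1,
  "abandon", "fear",      1,
  "abandon", "joy",       0,
  "abandon", "sadness",   1
)

nrc_long <- nrc_raw %>%
  rename(word = X1, sentiment = X2, score = X3) %>%
  filter(score == 1) %>%   # keep only pairs flagged as associated
  select(-score)

print(nrc_long)
```

With the filter in place, aback drops out entirely, because none of its rows are flagged.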
head(lexicon::nrc_emotions)
## # A tibble: 6 × 9
##   term        anger anticipation disgust  fear   joy sadness surprise trust
##   <chr>       <int>        <int>   <int> <int> <int>   <int>    <int> <int>
## 1 aback           0            0       0     0     0       0        0     0
## 2 abacus          0            0       0     0     0       0        0     1
## 3 abandon         0            0       0     1     0       1        0     0
## 4 abandoned       1            0       0     1     0       1        0     0
## 5 abandonment     1            0       0     1     0       1        1     0
## 6 abate           0            0       0     0     0       0        0     0
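
The wide table above can also be reshaped into the one-row-per-word-and-sentiment layout used by the other lexicons. The following is a minimal sketch, assuming the 0/1 columns mark whether a term carries each emotion; `nrc_wide` is a hypothetical hand-typed copy of four rows from the output above, so the block runs without the lexicon package.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hand-typed copy of four rows from the lexicon::nrc_emotions output shown above
nrc_wide <- tribble(
  ~term,       ~anger, ~anticipation, ~disgust, ~fear, ~joy, ~sadness, ~surprise, ~trust,
  "aback",        0L,     0L,            0L,       0L,    0L,   0L,       0L,        0L,
  "abacus",       0L,     0L,            0L,       0L,    0L,   0L,       0L,        1L,
  "abandon",      0L,     0L,            0L,       1L,    0L,   1L,       0L,        0L,
  "abandoned",    1L,     0L,            0L,       1L,    0L,   1L,       0L,        0L
)

nrc_tidy <- nrc_wide %>%
  pivot_longer(-term, names_to = "sentiment", values_to = "flag") %>%  # one row per term/emotion
  filter(flag == 1) %>%                                                # keep flagged emotions only
  select(word = term, sentiment)

print(nrc_tidy)
```

The same pipeline should work with `lexicon::nrc_emotions` in place of `nrc_wide`. Note that the table shown above has only the eight emotion columns, so the result would not include the positive and negative labels.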