9.4 Sentiment Analysis

Question 1: How often did positive and negative words appear in the Usenet data? Question 1a: Which words contributed the most to sentiment within each newsgroup? Question 2: Which messages were the most positive and the most negative?

9.4.1 Question 2:

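# Average AFINN score per message, keeping only messages with at least
# five sentiment-bearing words so that one or two words cannot dominate
sentiment_messages <- usenet_words %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(newsgroup, id) %>%
  summarize(sentiment = mean(value),
            words = n()) %>%
  ungroup() %>%
  filter(words >= 5)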

# Most positive messages first
sentiment_messages %>%
  arrange(desc(sentiment))
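
To make the per-message scoring concrete, here is a minimal sketch on toy data. The lexicon and word table below are hypothetical stand-ins (not the real AFINN lexicon or the Usenet data), chosen so the averages are easy to check by hand; only the shape of the pipeline mirrors the code above.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for a numeric sentiment lexicon (assumption:
# same columns as get_sentiments("afinn"), i.e. word and value)
toy_lexicon <- tribble(
  ~word,      ~value,
  "great",     3,
  "win",       4,
  "good",      3,
  "bad",      -3,
  "terrible", -3,
  "lost",     -3
)

# Hypothetical one-row-per-word table, shaped like usenet_words
toy_words <- tibble(
  newsgroup = c(rep("group.a", 11), rep("group.b", 2)),
  id        = c(rep(1, 5), rep(2, 6), rep(3, 2)),
  word      = c("great", "win", "good", "great", "win",
                "bad", "terrible", "lost", "bad", "lost", "good",
                "win", "the")
)

toy_sentiment <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%   # drops words with no score
  group_by(newsgroup, id) %>%
  summarize(sentiment = mean(value),
            words = n(),
            .groups = "drop") %>%
  filter(words >= 5)                          # message 3 has 1 scored word

# Most positive first, then most negative first
toy_sentiment %>% arrange(desc(sentiment))
toy_sentiment %>% arrange(sentiment)
```

Message 3 would have had the highest average (a single word scoring 4), which is exactly the kind of noisy result the `words >= 5` filter is there to remove.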

Message 53560, from rec.sport.hockey, was the most positive message in the whole dataset (among messages with at least five sentiment-bearing words). What did it say?

# Print the non-empty lines of one message, identified by newsgroup and id
print_message <- function(group, message_id) {
  result <- cleaned_text %>%
    filter(newsgroup == group, id == message_id, text != "")

  cat(result$text, sep = "\n")
}

print_message("rec.sport.hockey", 53560)
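
The helper's behavior can be checked on a small example. The sketch below uses a hypothetical `toy_text` table (assumed to have the same `newsgroup`, `id`, and `text` columns as `cleaned_text`, one row per line of a message) and a variant, `print_toy_message`, that takes the data frame as an argument instead of reading a global; both names are illustrative and do not appear in the original code.

```r
library(dplyr)
library(tibble)

# Hypothetical data: one row per line of text, including a blank line
toy_text <- tibble(
  newsgroup = c("group.a", "group.a", "group.a", "group.b"),
  id        = c(1, 1, 1, 2),
  text      = c("Great game last night.", "", "What a win!",
                "Unrelated message.")
)

# Same filtering logic as print_message, but with the data passed in
print_toy_message <- function(text_df, group, message_id) {
  result <- text_df %>%
    filter(newsgroup == group, id == message_id, text != "")

  cat(result$text, sep = "\n")   # one line of the message per output line
  invisible(result$text)         # return the lines without auto-printing
}

print_toy_message(toy_text, "group.a", 1)
```

Filtering on both `newsgroup` and `id` matters if ids are only unique within a newsgroup, which is why the helper asks for both.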

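What about the most negative message? Sorting sentiment_messages in ascending order, with arrange(sentiment), puts message 53907 at the top; it is also from rec.sport.hockey.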

print_message("rec.sport.hockey", 53907)