3.4 Using off-the-shelf stop word lists
A quick option for using stop words is to get a list that has already been created.
There are many lists available, but not all lists are created equal.
quanteda provides multilingual stop word lists through the stopwords package, which bundles several sources:
## [1] "snowball" "stopwords-iso" "misc" "smart"
## [5] "marimo" "ancient" "nltk" "perseus"
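The source names above look like the output of `stopwords_getsources()` from the stopwords package, which quanteda's `stopwords()` is built on. The following is a minimal sketch for reproducing that list. It assumes the stopwords package is installed, and the set of available sources may change between package versions.

```r
library(stopwords)

# Character vector naming every stop word source bundled with the package
sources <- stopwords_getsources()
print(sources)
```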
- Languages supported by the Snowball, NLTK, and stopwords-iso sources, respectively:
## [1] "da" "de" "en" "es" "fi" "fr" "hu" "ir" "it" "nl" "no" "pt" "ro" "ru" "sv"
## [1] "ar" "az" "da" "nl" "en" "fi" "fr" "de" "el" "hu" "id" "it" "kk" "ne" "no"
## [16] "pt" "ro" "ru" "sl" "es" "sv" "tg" "tr"
## [1] "af" "ar" "hy" "eu" "bn" "br" "bg" "ca" "zh" "hr" "cs" "da" "nl" "en" "eo"
## [16] "et" "fi" "fr" "gl" "de" "el" "ha" "he" "hi" "hu" "id" "ga" "it" "ja" "ko"
## [31] "ku" "la" "lt" "lv" "ms" "mr" "no" "fa" "pl" "pt" "ro" "ru" "sk" "sl" "so"
## [46] "st" "es" "sw" "sv" "th" "tl" "tr" "uk" "ur" "vi" "yo" "zu"
- The default stop word source in quanteda is Snowball. Why? Compare the sizes of the English SMART, Snowball, and stopwords-iso lists:
## [1] 571
## [1] 175
## [1] 1298
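The stopwords package can also reproduce the language listings and list sizes. This is a sketch under the assumption that the package is installed, and it is not necessarily the code behind the output above. `stopwords_getlanguages()` takes a source name. `stopwords()` takes a language code plus a `source` argument that defaults to `"snowball"`, which is the default quanteda exposes. The variable names are illustrative.

```r
library(stopwords)

# Language codes offered by three of the sources
snowball_langs <- stopwords_getlanguages("snowball")
nltk_langs     <- stopwords_getlanguages("nltk")
iso_langs      <- stopwords_getlanguages("stopwords-iso")

# Number of words in each English list
n_smart    <- length(stopwords("en", source = "smart"))
n_snowball <- length(stopwords("en", source = "snowball"))
n_iso      <- length(stopwords("en", source = "stopwords-iso"))
print(c(smart = n_smart, snowball = n_snowball, iso = n_iso))

# Omitting `source` gives the same list as asking for Snowball explicitly
print(identical(stopwords("en"), stopwords("en", source = "snowball")))
```

Comparing the sizes makes one part of the "Why?" concrete: Snowball is by far the shortest of the three English lists.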
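These lists overlap, but a shorter list is not necessarily contained in a longer one. For example, the following words appear in both Snowball and stopwords-iso but not in SMART, even though SMART is more than three times as long as Snowball: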
## [1] "she's" "he'd" "she'd" "he'll" "she'll" "shan't" "mustn't"
## [8] "when's" "why's" "how's"
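The comparison itself needs only base R set operations. The sketch below again assumes the stopwords package is installed. `intersect()` keeps the words shared by Snowball and stopwords-iso, and `setdiff()` then drops any that SMART contains. Both functions preserve the order of their first argument, so the result follows the order of the Snowball list.

```r
library(stopwords)

smart    <- stopwords("en", source = "smart")
snowball <- stopwords("en", source = "snowball")
iso      <- stopwords("en", source = "stopwords-iso")

# Words shared by Snowball and stopwords-iso...
in_snowball_and_iso <- intersect(snowball, iso)
# ...minus any word that SMART also contains
missing_from_smart <- setdiff(in_snowball_and_iso, smart)
print(missing_from_smart)

# Words common to all three lists
in_all_three <- Reduce(intersect, list(smart, snowball, iso))
print(length(in_all_three))
```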