Supervised Machine Learning for Text Analysis in R Book Club
Welcome
  Book club meetings
  Pace
1 Language and modeling
  1.1 Linguistics for Text Analysis
  1.2 A Glimpse into Morphology
  1.3 Different Languages
  1.4 Other Ways Text Can Vary
  1.5 Meeting Videos
    1.5.1 Cohort 1
2 Tokenization
  2.1 Define “Token”
  2.2 Different types of tokens
    2.2.1 Token type: Character
    2.2.2 Token type: Word
    2.2.3 Token type: n-grams
    2.2.4 Token type: Lines, sentences, and paragraphs
  2.3 Where does tokenization break down?
  2.4 Building your own tokenizer
    2.4.1 Mimic tokenize_characters()
    2.4.2 Allow for hyphenated words in tokenize_words()
    2.4.3 Character n-gram tokenizer
  2.5 Meeting Videos
    2.5.1 Cohort 1
3 Stop words
  3.1 What are stop words?
  3.2 Why do we remove them?
  3.3 When does stop word removal make sense?
  3.4 Using off-the-shelf stop word lists
  3.5 Stop word removal in R
  3.6 Creating your own stop word list
  3.7 Stop words for Pidgin
  3.8 How many words do we include in our stop word list?
  3.9 All stop word lists are context-specific
  3.10 Problems with off-the-shelf stop word lists
  3.11 Meeting Videos
    3.11.1 Cohort 1
4 Stemming
  4.1 Slide 1
  4.2 Meeting Videos
    4.2.1 Cohort 1
5 Word Embeddings
  5.1 Slide 1
  5.2 Meeting Videos
    5.2.1 Cohort 1
6 Regression
  6.1 Slide 1
  6.2 Meeting Videos
    6.2.1 Cohort 1
7 Classification
  7.1 Slide 1
  7.2 Meeting Videos
    7.2.1 Cohort 1
8 Dense neural networks
  8.1 Slide 1
  8.2 Meeting Videos
    8.2.1 Cohort 1
9 Long short-term memory (LSTM) networks
  9.1 Slide 1
  9.2 Meeting Videos
    9.2.1 Cohort 1
10 Convolutional neural networks
  10.1 Slide 1
  10.2 Meeting Videos
    10.2.1 Cohort 1
5.1 Slide 1
Add slides as sections (marked with ##).
Please give code chunks unique names (such as “01-something” for a block in chapter 1). This makes debugging much easier.
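For example, a slide added to the chapter 5 notes might look like the sketch below. The heading text, the chunk name 05-embedding-example, and the code itself are illustrative placeholders, not part of the club's template.

    ## A first look at word embeddings

    ```{r 05-embedding-example}
    # "05-embedding-example" is a unique chunk name, so knitr/bookdown
    # error messages point straight to this chunk.
    library(tidytext)

    # Tokenize a toy sentence into one word per row.
    tibble::tibble(text = "Word embeddings map tokens to dense numeric vectors.") |>
      unnest_tokens(word, text)
    ```

Prefixing the chunk name with the chapter number (here "05-") keeps names unique even when several chapters have a similarly named chunk.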