3.3 When does stopword removal make sense?

  • Whether to remove stop words depends on the task you are solving.

  • There is no hard and fast rule on when to remove stop words. As a rule of thumb, removing them usually helps when the task is Language Classification, Spam Filtering, Caption Generation, Auto-Tag Generation, Sentiment Analysis, or another task related to text classification.

  • On the other hand, if the task is Machine Translation, Question Answering, Text Summarization, or Language Modeling, it’s better not to remove the stop words, as they are a crucial part of these applications.

  • Less complex model: Stop words carry little meaning on their own; they matter mainly in the context of a sentence. If you use a model (a linear classifier, a decision tree or forest) that is in principle incapable of leveraging that context, keeping the stop words is unlikely to help.

  • Complex model: If you use a more complex model (LSTM, Transformer) that can grasp the grammatical role of stop words, it does not make sense to remove them.

  • Sentiment analysis is sensitive to stop words: removing them reduces "David is not happy" to (David, happy), which drops the negation and flips the meaning.

  • So, which stop words to include in the removal list depends on the task at hand.

  • It’s better to keep these words at first and run tests with and without them to see how they affect the model. Never remove stop words without thinking about their impact on the problem you are trying to solve.
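
The sentiment example above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended pipeline: the stop word list `BASIC_STOPWORDS`, the `NEGATIONS` set, and the helper `remove_stopwords` are all hypothetical names made up for this sketch and are not taken from any library. It shows how a generic list drops the negation, while a task-specific list that keeps negation words preserves it.

```python
import re

# Illustrative stop word list (an assumption for this sketch, not a library list).
BASIC_STOPWORDS = {"a", "an", "the", "is", "are", "was", "were",
                   "not", "no", "and", "or", "of", "to"}

# Negation words we may want to keep for sentiment tasks (also an assumption).
NEGATIONS = {"not", "no"}


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]


sentence = "David is not happy"

# Generic list: the negation is removed along with "is".
full_removal = remove_stopwords(sentence, BASIC_STOPWORDS)

# Task-specific list: negation words are excluded from the stop word list.
sentiment_stopwords = BASIC_STOPWORDS - NEGATIONS
negation_kept = remove_stopwords(sentence, sentiment_stopwords)

print(full_removal)   # ['david', 'happy']
print(negation_kept)  # ['david', 'not', 'happy']
```

Running the same model on both token lists (and on the unfiltered text) is one simple way to do the "with and without" test suggested above before committing to a stop word list.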