Series

This article is part of the NLP Study Notes series.

2. Stop words

In NLP applications, we usually filter out stop words and words that occur very rarely.

For the stop word dictionary, see the previous section: text preprocessing and stop words.

The main reason is that these words have little bearing on the task, so removing them does not affect the analysis. The step is similar to feature selection.

Always take your own application scenario into account.

Example: some adjectives are normally filtered out, but in sentiment analysis the words that convey tone should be kept.

You can adjust the stop word list to fit your needs.
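The filtering described above can be sketched roughly as follows. This is only a minimal illustration in plain Python: the `STOP_WORDS` set, the `filter_tokens` helper, the `min_count` threshold, and the `keep` parameter are all hypothetical names made up for this example, and a real project would load a much fuller stop word dictionary.

```python
from collections import Counter

# Hypothetical, tiny stop word list; a real project would load a fuller dictionary.
STOP_WORDS = {"the", "a", "is", "and", "of", "very", "not"}

def filter_tokens(docs, stop_words, min_count=2, keep=frozenset()):
    """Drop stop words and words seen fewer than min_count times across all docs.

    Words in `keep` are exempt from the stop word filter
    (the frequency threshold still applies to them).
    """
    # Count every token over the whole corpus to find low-frequency words.
    counts = Counter(tok for doc in docs for tok in doc)
    effective_stop = set(stop_words) - set(keep)
    return [
        [tok for tok in doc if tok not in effective_stop and counts[tok] >= min_count]
        for doc in docs
    ]

docs = [
    "the movie is not good".split(),
    "the movie is very good".split(),
    "a good movie and a rare gem".split(),
]

# General-purpose filtering: remove stop words and words seen only once.
general = filter_tokens(docs, STOP_WORDS)

# Sentiment-style filtering: keep the tone words "not" and "very".
sentiment = filter_tokens(docs, STOP_WORDS, min_count=1, keep={"not", "very"})

print(general)
print(sentiment)
```

In the sentiment-style call, "not" and "very" survive even though they are in the stop word list, which is the kind of scenario-specific adjustment mentioned above.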

3. Word normalization

This applies to English. Examples: went, go, going; singular and plural forms; comparatives; and so on.

Techniques involved:

Stemming extracts the stem or root form of a word (the result does not necessarily express the full meaning).

Lemmatization reduces any inflected form of a word to its base form (the result does express the complete meaning).

This does not apply to Chinese, so I won't go into it here. If you are interested, look up the Porter Stemmer.
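To make the difference between the two techniques concrete, here is a toy sketch in plain Python. It is not the Porter algorithm: `toy_stem` is a made-up suffix stripper, and `toy_lemmatize` is a lookup in a small hypothetical `LEMMAS` table. Real lemmatizers rely on a full lexicon and usually on part-of-speech information.

```python
# Toy suffix-stripping stemmer: NOT the Porter algorithm, only an illustration.
SUFFIXES = ("ing", "ies", "es", "ed", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma table; a real lemmatizer uses a full dictionary.
LEMMAS = {
    "went": "go",
    "going": "go",
    "goes": "go",
    "studies": "study",
    "better": "good",
}

def toy_lemmatize(word):
    """Look the word up in the lemma table, falling back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["going", "went", "studies", "better"]:
    print(w, toy_stem(w), toy_lemmatize(w))
```

In this sketch the stemmer turns "studies" into "stud" and leaves "went" unchanged, while the lemma lookup maps them to "study" and "go". That mirrors the point above: a stem is a truncated form that may lose meaning, whereas a lemma is a complete word.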