
What is NLP?

NLP = NLU + NLG

  • NLU: Voice/text -> meaning
  • NLG: meaning -> text/voice

The challenges of natural language processing

Multiple ways to express the same meaning

Ambiguity

Main application scenarios of natural language processing

Question Answering

Sentiment Analysis

Machine Translation

Text Summarization

Chatbot

Information Extraction

Key technologies

Natural language processing has four dimensions

  • Morphology (words)
  • Syntax (sentence structure)
  • Semantics (meaning)
  • Phonetics (sound)

Word Segmentation

Part-of-Speech Tagging

Named Entity Recognition

Parsing

Dependency Parsing

Relation Extraction

Word segmentation technology

Forward maximum matching

First, set MAX_LENGTH (generally the length of the longest word in the dictionary). Scanning the text from front to back, take a substring of MAX_LENGTH characters and look it up in the dictionary. If it is not found, drop its last character and try again, until a dictionary word matches or only a single character remains. Output that longest match as a token, then repeat from the position right after it.
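The steps above can be sketched in Python as follows. This is a minimal illustration, not a production segmenter: the function name `forward_max_match`, the toy dictionary, and the rule of emitting an unmatched single character as its own token are all assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_length=None):
    """Greedy front-to-back segmentation (illustrative sketch)."""
    if max_length is None:
        # Default MAX_LENGTH: length of the longest dictionary word.
        max_length = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, then shrink it from the right.
        for size in range(min(max_length, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Assumption: a single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for demonstration only.
toy_dictionary = {"研究", "研究生", "生命", "命", "的", "起源"}
forward_result = forward_max_match("研究生命的起源", toy_dictionary)
print(forward_result)  # ['研究生', '命', '的', '起源']
```

Because the scan is greedy, it commits to 研究生 at the start and cannot recover the more natural split 研究 / 生命. This is a known weakness of pure maximum matching.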

Backward maximum matching

First, set MAX_LENGTH (generally the length of the longest word in the dictionary). Scanning the text from back to front, take the last MAX_LENGTH characters and look them up in the dictionary. If they are not found, drop the first character of the substring and try again, until a dictionary word matches or only a single character remains. Output that longest match as a token, then repeat on the text that precedes it.
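A minimal Python sketch of the back-to-front scan is shown below. The function name `backward_max_match`, the toy dictionary, and the single-character fallback are assumptions made for illustration only.

```python
def backward_max_match(text, dictionary, max_length=None):
    """Greedy back-to-front segmentation (illustrative sketch)."""
    if max_length is None:
        # Default MAX_LENGTH: length of the longest dictionary word.
        max_length = max((len(w) for w in dictionary), default=1)
    tokens = []
    j = len(text)  # exclusive end of the part not yet segmented
    while j > 0:
        # Try the longest window first, then shrink it from the left.
        for size in range(min(max_length, j), 0, -1):
            candidate = text[j - size:j]
            # Assumption: a single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


# Hypothetical toy dictionary, for demonstration only.
toy_dictionary = {"研究", "研究生", "生命", "命", "的", "起源"}
backward_result = backward_max_match("研究生命的起源", toy_dictionary)
print(backward_result)  # ['研究', '生命', '的', '起源']
```

On this sentence, scanning from the back happens to produce the natural split, because 生命 is matched before 研究生 is ever considered. In general, the two scan directions can disagree on ambiguous text.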

Graph-based Viterbi algorithm

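The Viterbi algorithm is a dynamic programming method that finds the single best path among all candidates. For word segmentation, every dictionary word found in the sentence becomes an edge in a directed graph over character positions, weighted by a cost (for example, the word's negative log probability), and the algorithm picks the lowest-cost path from the start of the sentence to the end.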
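One way to sketch this idea in Python is shown below. Several things here are assumptions, not part of the original article: the function name `viterbi_segment`, the made-up word probabilities, the use of negative log probability as the edge cost, and the small `unknown_prob` assigned to single characters missing from the dictionary so that a path always exists.

```python
import math


def viterbi_segment(text, word_probs, unknown_prob=1e-8):
    """Lowest-cost path through the word graph (illustrative sketch)."""
    n = len(text)
    max_length = max((len(w) for w in word_probs), default=1)
    # best_cost[j]: lowest total cost of any segmentation of text[:j].
    best_cost = [0.0] + [math.inf] * n
    # back[j]: start index of the last word on that best path.
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_length), j):
            word = text[i:j]
            if word in word_probs:
                prob = word_probs[word]
            elif j - i == 1:
                prob = unknown_prob  # fallback edge for an unknown character
            else:
                continue  # no edge in the graph for this substring
            cost = best_cost[i] - math.log(prob)
            if cost < best_cost[j]:
                best_cost[j] = cost
                back[j] = i
    # Follow the back-pointers from the end to recover the words.
    tokens = []
    j = n
    while j > 0:
        i = back[j]
        tokens.append(text[i:j])
        j = i
    tokens.reverse()
    return tokens


# Made-up probabilities, for demonstration only.
toy_probs = {"研究": 0.05, "研究生": 0.01, "生命": 0.04,
             "命": 0.001, "的": 0.2, "起源": 0.02}
viterbi_result = viterbi_segment("研究生命的起源", toy_probs)
print(viterbi_result)  # ['研究', '生命', '的', '起源']
```

Unlike greedy matching, this compares whole paths: 研究 / 生命 / 的 / 起源 wins because the product of its word probabilities is higher than that of 研究生 / 命 / 的 / 起源.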

Word segmentation tools

  • Jieba github.com/fxsjy/jieba
  • SnowNLP github.com/isnowfy/sno…
  • LTP www.ltp-cloud.com/
  • HanLP github.com/hankcs/HanL…