What is NLP?
NLP = NLU + NLG
- NLU (natural language understanding): speech/text -> meaning
- NLG (natural language generation): meaning -> text/speech
The challenges of natural language processing
Multiple ways to express the same meaning
Ambiguity
Main application scenarios of natural language processing
Question Answering
Sentiment Analysis
Machine Translation
Text Summarization
Chatbot
Information Extraction
Key technologies
Natural language processing has four dimensions
- Morphology (words)
- Syntax (sentence structure)
- Semantics (meaning)
- Phonetics (sound)
Word Segmentation
Part-of-Speech Tagging
Named Entity Recognition
Parsing
Dependency Parsing
Relation Extraction
Word segmentation technology
Forward maximum matching
First, set MAX_LENGTH (generally the length of the longest word in the dictionary). Scanning the text from front to back, take a substring of up to MAX_LENGTH characters and look it up in the dictionary. If it is not found, drop the last character and try again, until a dictionary word is found or only one character remains. Emit that match as a token, then repeat the process starting right after it.
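The forward matching procedure can be sketched in a few lines of Python. This is a minimal illustration, not a production segmenter: the function name `forward_max_match`, the tiny `dictionary`, and the sample sentence are all made up for this example, and a single character with no dictionary match is simply emitted as its own token.

```python
def forward_max_match(text, dictionary, max_length=None):
    """Greedy left-to-right segmentation using the longest dictionary match."""
    if max_length is None:
        # Default: length of the longest word in the dictionary.
        max_length = max((len(w) for w in dictionary), default=1)
    max_length = max(1, max_length)

    tokens = []
    start = 0
    while start < len(text):
        # Try the longest window first, then shrink it from the right.
        for size in range(min(max_length, len(text) - start), 0, -1):
            candidate = text[start:start + size]
            # Fall back to a single character when nothing matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                start += size
                break
    return tokens


# Hypothetical toy dictionary and sentence, for illustration only.
dictionary = {"研究", "研究生", "生命", "命", "起源"}
tokens = forward_max_match("研究生命起源", dictionary)
print(tokens)  # ['研究生', '命', '起源']
```

Because the match is greedy from the left, "研究生" wins over "研究" here, which shows how forward matching can pick a segmentation that a human reader would not.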
Backward maximum matching
First, set MAX_LENGTH (generally the length of the longest word in the dictionary). Scanning the text from back to front, take a substring of up to MAX_LENGTH characters ending at the current position and look it up in the dictionary. If it is not found, drop the first character and try again, until a dictionary word is found or only one character remains. Emit that match as a token, then repeat the process on the text that precedes it.
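A matching sketch for the backward direction follows. As before, `backward_max_match`, the toy `dictionary`, and the sample sentence are assumptions made up for illustration, and unmatched single characters are emitted as their own tokens.

```python
def backward_max_match(text, dictionary, max_length=None):
    """Greedy right-to-left segmentation using the longest dictionary match."""
    if max_length is None:
        # Default: length of the longest word in the dictionary.
        max_length = max((len(w) for w in dictionary), default=1)
    max_length = max(1, max_length)

    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest window first, then shrink it from the left.
        for size in range(min(max_length, end), 0, -1):
            candidate = text[end - size:end]
            # Fall back to a single character when nothing matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                end -= size
                break
    # Tokens were collected from the end of the text, so restore reading order.
    tokens.reverse()
    return tokens


# Hypothetical toy dictionary and sentence, for illustration only.
dictionary = {"研究", "研究生", "生命", "命", "起源"}
tokens = backward_max_match("研究生命起源", dictionary)
print(tokens)  # ['研究', '生命', '起源']
```

On this sentence the backward scan matches "起源" and then "生命" before it ever reaches "研究生", so with the same dictionary it produces a different segmentation than a left-to-right scan would.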
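Graph-based Viterbi algorithm
The Viterbi algorithm is a dynamic programming algorithm that finds the most probable path through a graph. For word segmentation, every candidate word in the sentence becomes an edge weighted by its probability, and the algorithm picks the segmentation whose path has the highest overall probability.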
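One common way to apply this idea to segmentation is to treat character positions as graph nodes and candidate words as edges, each costing the negative log of the word's probability, and then find the cheapest path by dynamic programming. The sketch below assumes a simple unigram model; the function name `viterbi_segment`, the `unknown_prob` fallback, and the probabilities in `word_probs` are invented for illustration and are not taken from any real corpus or tool.

```python
import math


def viterbi_segment(text, word_probs, max_length=None, unknown_prob=1e-8):
    """Find the lowest-cost segmentation, where cost = sum of -log P(word)."""
    if max_length is None:
        max_length = max((len(w) for w in word_probs), default=1)
    max_length = max(1, max_length)

    n = len(text)
    best_cost = [math.inf] * (n + 1)  # best_cost[i]: cheapest path to position i
    back = [0] * (n + 1)              # back[i]: start of the last word on that path
    best_cost[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_length), end):
            word = text[start:end]
            if word in word_probs:
                prob = word_probs[word]
            elif end - start == 1:
                # Assumed fallback so every position stays reachable.
                prob = unknown_prob
            else:
                continue
            cost = best_cost[start] - math.log(prob)
            if cost < best_cost[end]:
                best_cost[end] = cost
                back[end] = start

    # Follow the back-pointers from the end of the text to recover the words.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    tokens.reverse()
    return tokens


# Made-up unigram probabilities, for illustration only.
word_probs = {"研究": 0.1, "研究生": 0.02, "生命": 0.1, "命": 0.01, "起源": 0.1}
tokens = viterbi_segment("研究生命起源", word_probs)
print(tokens)  # ['研究', '生命', '起源']
```

Unlike greedy matching, this compares whole paths: "研究 / 生命 / 起源" wins because its total cost is lower than that of "研究生 / 命 / 起源" under the assumed probabilities.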
Word segmentation tools
- Jieba: github.com/fxsjy/jieba
- SnowNLP: github.com/isnowfy/sno…
- LTP: www.ltp-cloud.com/
- HanLP: github.com/hankcs/HanL…