Natural language processing (NLP), a branch of artificial intelligence, is, put simply, the use of computers to process, understand and use human language. It belongs to the category of “cognitive intelligence”. Ever since it was proposed, NLP has been expected to be “understanding”.

What does “understanding” specifically mean? To answer that, we have to look at the three levels of language: grammar, semantics and pragmatics.

01 Grammar

Grammar may seem abstract, but most of us were exposed to it in school. The subject-verb-object structure and the attributive, adverbial and complement that we learned in primary school Chinese class, or the morphological changes, function words and word order that we learned in English class, all belong to the grammatical level of language. Language, from words to sentence formation, follows these grammatical rules, and the development of NLP technology also began with tackling these problems.

02 Semantics

Put simply, semantics is about determining what a given piece of text means. When we see “Xiao Ming bullied Xiao Qiang, so I criticized him”, why do humans naturally decide that “him” refers to Xiao Ming? Why is it that a perfectly grammatical sentence such as “Colorless green ideas sleep furiously” immediately feels semantically illogical? At the semantic level, we pay more attention to semantic components, agent-patient relations, and verb categories and the arguments they govern, that is, to how language maps onto the world as we understand it. At present, much of the competition between NLP vendors takes place at this level.

03 Pragmatics

This level returns to the fundamental purpose of language: communication. What information does a piece of text convey, what is its topic, and what is its focus? In a conversation, how do we make a reasonable judgment about meaning by taking into account the situation, the context, and even body language and other factors? Saying “you hate it” with a smile sends a different message than saying “you hate it” through gritted teeth. At present, few vendors work at this level, but Takesuma has made attempts in business scenarios such as outbound phone calls and multimodal affective computing, with outstanding results that have been highly recognized by customers.

In NLP, only by solving the problems at all three of these levels can we truly achieve the “understanding” that people expect.

Solving language problems at these three levels poses several challenges:

01 Input irregularity

This irregularity shows up as typos, colloquial input or grammatical errors. Typos include, for example, “mi yue chuan” mistyped as “half moon chuan”, or “endless growth and growth” mistyped as “rising and rising ceaseless”. The usual industry practice for this kind of problem is to train an error correction model that fixes such errors automatically. Our error correction model has been validated in projects for Changhong, Huaxia and others, and is now part of the standard product. Colloquial input is common in outbound call text processing; it tends to be long and mixed with a lot of useless information. The way to handle it is to prune the text, straighten it out and extract the key information. Takema has successfully helped many enterprises handle dozens of scenarios such as TV advertisement, questionnaire return visits and insurance identity verification, and has applied for a patent on the processing of long and difficult sentences.
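As a rough illustration of the two ideas above (correcting mistyped terms and pruning colloquial input), here is a minimal Python sketch. It is not the trained error correction model described in the text: it swaps in simple string similarity from the standard library, and the lexicon, the filler list and the function names are all assumptions made up for this example.

```python
import difflib
import re

# Hypothetical lexicon of known-correct terms; an assumption for illustration only.
LEXICON = ["mi yue chuan", "weather forecast", "insurance policy"]

# Hypothetical filler expressions typical of spoken input.
FILLERS = ["um", "uh", "you know"]


def correct_term(text, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon entry, or the input unchanged if none is close enough."""
    matches = difflib.get_close_matches(text.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else text


def prune_fillers(utterance, fillers=FILLERS):
    """Remove filler expressions and collapse the whitespace left behind."""
    # Longest fillers first so multi-word phrases are tried before shorter ones.
    ordered = sorted(fillers, key=len, reverse=True)
    pattern = "|".join(re.escape(f) for f in ordered)
    cleaned = re.sub(rf"\b(?:{pattern})\b", " ", utterance, flags=re.IGNORECASE)
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(correct_term("mi yue chuna"))
    print(prune_fillers("um I want to uh check my policy you know"))
```

A production system would replace the similarity lookup with a model trained on real error pairs; the sketch only shows where correction and pruning sit in the pipeline.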

02 Ambiguity caused by the pervasive uncertainty of language

In word segmentation, we run into situations like this: the same string can be segmented either as “tennis / auction / finished” or as “tennis racket / sell / finished”. Which of the two segmentations should be output? Our answer is: output the one that is correct in context. What if there is no context? Then output both, and let the downstream module use them flexibly according to its needs.

In many cases, the pragmatic meaning of the same sentence varies with the business scenario, the context and the tone. “I know” can mean “OK, I know” in a questionnaire return visit and “stop, I know” in an overdue payment reminder. In practical business processing, in addition to the grammatical and semantic analysis of the sentence, we also take pragmatic factors such as the scenario, the context and user habits into account when assigning a reading, which makes flexible parsing of the same sentence possible.
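The “output the contextually correct segmentation, otherwise output all of them” policy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the example uses an unspaced English string as a stand-in for the original example, the lexicon is invented, and scoring candidates by overlap with a set of context words is a deliberately naive placeholder for whatever disambiguation a real system uses.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words that all appear in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


def resolve(text, lexicon, context_words=None):
    """Pick the reading(s) best supported by context, or return all if there is none."""
    candidates = segmentations(text, lexicon)
    if not context_words:
        # No context: hand every reading to the downstream module.
        return candidates
    # Naive, hypothetical scoring: count words that the context makes likely.
    scores = [sum(word in context_words for word in cand) for cand in candidates]
    best = max(scores, default=0)
    return [cand for cand, score in zip(candidates, scores) if score == best]


if __name__ == "__main__":
    lexicon = {"no", "now", "where", "here", "nowhere"}  # toy lexicon (assumption)
    print(resolve("nowhere", lexicon))            # all readings
    print(resolve("nowhere", lexicon, {"here"}))  # context favours one reading
```

Returning a list in both cases keeps the interface uniform, so the downstream module does not need to know whether context was available.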

03 Complexity of language knowledge acquisition

To return to the example discussed earlier, compare “Xiao Ming bullied Xiao Qiang, so I criticized him” with “Xiao Ming bullied Xiao Qiang, so I comforted him.” Because human beings know the meanings of “bullying”, “criticism” and “comfort”, they conclude that the first “him” refers to Xiao Ming and the second “him” refers to Xiao Qiang. For a computer, it is necessary to build a graph of which arguments each verb governs, together with a semantic graph, and to combine them with the existing NLP modules, in order to match the instant judgment of the human brain and truly achieve cognitive intelligence.
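To make the role of verb knowledge concrete, the following Python sketch resolves the pronoun in the two example sentences using a tiny hand-written table. Everything in it is hypothetical: the role labels (“wrongdoer”, “victim”), the two dictionaries and the function name are assumptions for illustration, standing in for the much larger verb and semantic graphs the text describes.

```python
# Hypothetical knowledge: the role that an event verb assigns to its agent and patient.
EVENT_ROLES = {
    "bullied": {"agent": "wrongdoer", "patient": "victim"},
}

# Hypothetical knowledge: the role that a reaction verb is normally directed at.
REACTION_TARGET = {
    "criticized": "wrongdoer",
    "comforted": "victim",
}


def resolve_pronoun(agent, event_verb, patient, reaction_verb):
    """Guess who the pronoun refers to in '<agent> <event_verb> <patient>, so I <reaction_verb> him'.

    Returns None when the tables have no entry for either verb.
    """
    roles = EVENT_ROLES.get(event_verb)
    target_role = REACTION_TARGET.get(reaction_verb)
    if roles is None or target_role is None:
        return None
    for slot, role in roles.items():
        if role == target_role:
            return agent if slot == "agent" else patient
    return None


if __name__ == "__main__":
    print(resolve_pronoun("Xiao Ming", "bullied", "Xiao Qiang", "criticized"))
    print(resolve_pronoun("Xiao Ming", "bullied", "Xiao Qiang", "comforted"))
```

The point of the sketch is that the answer changes only because the knowledge attached to the second verb changes; the grammar of the two sentences is identical.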