A man falls in love with an AI in the movie Her

Understanding speech is a subtle problem for a computer program. The literal meaning of spoken words may differ from, or be shaped by, tone, expression, and subtext.

Voice-activated systems like Alexa and Siri are great for commands, but sustaining a conversation seems to be a harder, more human problem to crack. A spoken dialogue system (SDS) interprets speech to understand the user’s intention. These systems analyze the semantic concepts involved in speech: the ideas that give a phrase or sentence its general meaning.

Spoken dialogue AI system

The semantic concepts in human conversation can be implicit, indirect, and layered. Humans are sarcastic and referential, and often rely on a shared understanding of context and subtext. These are complex signals for machines to detect and infer.

Unsupervised learning, a subdiscipline of machine learning, is a family of statistical techniques that gleans information, trends, and features from raw, unlabeled data. For example, natural language processing (NLP) and clustering techniques can summarize a corpus of students’ lecture notes to identify the most prominent shared concepts. The same principle is used here to extract knowledge from raw speech data.
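As a toy illustration of that idea, here is a minimal sketch (with a made-up corpus) of how simple co-occurrence counting can surface related concepts from unlabeled text, with no labels required:

```python
from collections import Counter
from itertools import combinations

# Hypothetical unlabeled corpus: each entry is one student's note.
notes = [
    "matrix factorization extracts latent features",
    "latent features capture hidden concepts",
    "matrix factorization handles noisy data",
    "clustering groups similar concepts",
]

# Count how often word pairs co-occur within the same note; frequent pairs
# hint at related concepts, without any labeled data.
pair_counts = Counter()
for note in notes:
    words = sorted(set(note.split()))
    pair_counts.update(combinations(words, 2))

# Pairs like ("factorization", "matrix") and ("features", "latent")
# rise to the top because they recur across notes.
top_pairs = pair_counts.most_common(3)
```

Real systems replace raw counts with clustering or embedding techniques, but the principle of extracting structure from unlabeled data is the same.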

Systems like Siri have a spoken language understanding (SLU) component that connects natural speech with its meaning. The SDS uses this meaning, along with the structure of the utterance, to act on the user’s request. Most systems limit conversations to a few predefined topics, although there is active work on generalizing and extending them.

Understanding unlabeled speech with matrix factorization

The system proposed by Chen aims to understand user intentions using knowledge graphs and matrix factorization. These models can identify both explicit and implicit semantic features. For example, in the query “I want to go to a cheap restaurant”, the explicit features are “cheap” and “restaurant”. Semantic concept induction maps words to underlying concepts, or slots: “cheap” fills an “expensiveness” slot, and “restaurant” fills a “target” (location) slot.
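A minimal sketch of what slot induction produces, assuming a hand-written word-to-slot lexicon (the real system induces these mappings automatically; the slot names here are illustrative):

```python
# Hypothetical word-to-slot lexicon; a real system learns this mapping.
slot_lexicon = {
    "cheap": "expensiveness",
    "expensive": "expensiveness",
    "restaurant": "target",
    "bar": "target",
}

def induce_slots(utterance: str) -> dict:
    """Map each known word in the utterance to its semantic slot."""
    return {w: slot_lexicon[w] for w in utterance.lower().split()
            if w in slot_lexicon}

slots = induce_slots("I want to go to a cheap restaurant")
# slots == {"cheap": "expensiveness", "restaurant": "target"}
```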

Semantic relation models relate underlying concepts and assign probabilities to concepts that are implied but never explicitly spoken in a user’s query. Although the word “food” is not mentioned in the speech, the model learns that the correlation between “food” and “restaurant” has a probability of 0.85. The system thus builds a matrix of the probabilities of concepts appearing in sentences: each row is a spoken sentence or phrase, and each column holds the probability that a concept contributes to that sentence’s meaning.
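A sketch of how such a matrix might be assembled, with made-up probabilities standing in for the learned ones:

```python
# Columns: semantic concepts; rows: spoken utterances.
concepts = ["expensiveness", "target", "food"]
utterances = [
    "i want to go to a cheap restaurant",
    "find an expensive bar nearby",
]

# Hypothetical learned probabilities that a concept underlies an utterance;
# note "food" gets 0.85 for the restaurant query even though it is unspoken.
learned = {
    ("i want to go to a cheap restaurant", "expensiveness"): 0.95,
    ("i want to go to a cheap restaurant", "target"): 0.90,
    ("i want to go to a cheap restaurant", "food"): 0.85,
    ("find an expensive bar nearby", "expensiveness"): 0.92,
    ("find an expensive bar nearby", "target"): 0.88,
}

# Unobserved (utterance, concept) pairs default to 0.0.
matrix = [[learned.get((u, c), 0.0) for c in concepts] for u in utterances]
```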

Diagrams that capture relationships between semantic concepts

Using a pre-trained semantic parser, the system automatically extracts a slot-based semantic knowledge graph that captures dependencies between concepts. A word-based lexical knowledge graph is used alongside it to capture the relationships between the words themselves. Both graphs are generated automatically as the corpus is processed.
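A toy sketch of what these two graphs could look like as weighted adjacency maps (the relations and weights below are illustrative, not taken from the paper):

```python
# Slot-based semantic knowledge graph: edges between induced concepts,
# weighted by how strongly one concept implies another (made-up weights).
slot_graph = {
    "expensiveness": {"target": 0.60},
    "target": {"food": 0.85},
}

# Word-based lexical knowledge graph: edges between surface words.
word_graph = {
    "cheap": {"restaurant": 0.40},
    "restaurant": {"food": 0.85, "bar": 0.30},
}

# Looking up a relation the user never stated explicitly:
implied_food = word_graph["restaurant"]["food"]
```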

Constructing knowledge graphs to interpret discourse

The probability matrix described above is combined with matrices representing the two knowledge graphs to form the final matrix. Matrix factorization is then used to extract latent features from this matrix. This linear algebra technique is chosen because it handles noisy data well and can model hidden information and unobserved dependencies.
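A minimal sketch of this idea, using truncated SVD as one common way to factorize a matrix (the paper’s exact factorization method may differ). The point is that a low-rank reconstruction assigns a nonzero score to an (utterance, concept) pair that was never observed, by borrowing structure from similar rows:

```python
import numpy as np

# Rows: utterances; columns: concept probabilities. The last entry of the
# first row is unobserved (0.0), even though the row resembles the second.
M = np.array([
    [0.90, 0.80, 0.00],
    [0.90, 0.80, 0.85],
    [0.88, 0.82, 0.84],
])

# Factorize, then rebuild from the k strongest latent features.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 1
M_hat = U[:, :k] * s[:k] @ Vt[:k, :]

# The reconstruction "fills in" the unobserved entry with a value inferred
# from the shared latent structure of the other rows.
inferred = M_hat[0, 2]
```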

The purpose of factorization is to use the knowledge captured in the matrix and its factors to infer whether a particular concept is present in an utterance. This lets the SDS understand context and hidden meaning. It also helps resolve ambiguity, because the system has an ontological, relational, and structural understanding of each word in a sentence.

In addition to predicting semantic concepts, the system can also detect whether a query is domain-specific. Adding more domain knowledge can further improve its predictions, and incorporating external data, such as the Google Knowledge Graph, can help with generalization and extensibility.

This article is a summary of this paper. I look forward to the emergence of emotionally intelligent conversational AI systems.


How conversational AI Systems Understand Language was originally published in Nerd For Tech, and people are continuing the conversation by highlighting and responding to the story.