This article was originally published by “TGO Kunpeng Club”. Link: Yu Kai, co-founder of Spitz: Cultivating an artificial intelligence ecosystem to achieve win-win results for enterprises

Yu Kai is both a scholar and an entrepreneur. In academia, he is a researcher in the Department of Computer Science and Engineering and director of the Intelligent Speech Technology Laboratory at Shanghai Jiao Tong University. In industry, he is the founder and chief scientist of Spitz. He holds a PhD in speech from the University of Cambridge and is the only speech expert from the intelligent speech technology industry in China's “Young Thousand Talents Plan”.

Yu Kai has conducted extensive research in the core technical areas of human-computer spoken dialogue interaction and has published more than 100 papers in top international journals and conferences. In 2014, he received the “Wu Wenjun Artificial Intelligence Science and Technology Award for Progress” from The Chinese Society for Artificial Intelligence. In 2016, he was named “Scientific Chinese Person of the Year”.

Author | Echo Tang   Planning | Hai-xing Liu

From 2002 to 2012, Yu Kai spent ten years in Cambridge. In the first five years, he focused on speech recognition, and his research covered its core modules, such as the acoustic model, the speech model, and system construction. In the last five years, he focused on dialogue, including end-to-end dialogue systems, adaptive personalized synthesis, semantic understanding, dialogue management, and end-to-end system architecture. He co-founded VocalIQ, a UK voice technology company, with Professor Steve Young of Cambridge University, a Fellow of the Royal Academy of Engineering, and Dr Blaise Thomson.

In 2007, midway through his time in Cambridge, Yu Kai realized that the development of artificial intelligence must be based on continuous innovation in the underlying technology. Without original work and the ability to keep innovating, falling behind technologically would be inevitable. So he co-founded Spitz with fellow Cambridge student Gao Shixing.

The following is the transcript of TGO Kunpeng Club's interview with Dr. Yu Kai:

TGO Kunpeng: I have heard that you have played many different roles in Spitz. Could you give me a brief introduction?

Yu Kai: I was actually the first chairman of Spitz, and I was also CEO until we raised our first round of funding in 2012. After that round, the CEO needed to spend more energy on matters other than technology, so Gao Shixing took over as CEO of Spitz and I remained chairman. After Spitz closed its second round of financing, however, the amount of coordination among shareholders began to grow, and I felt I needed to hand off that work as well. I stepped down as chairman to focus on advancing the technology and became chief scientist at Spitz.

TGO Kunpeng: With your help, Spitz and Shanghai Jiao Tong University established a joint speech lab. Could you introduce the lab's research results?

Yu Kai: The joint laboratory that Spitz runs with Shanghai Jiao Tong University has allowed us to keep up our basic research capabilities, and several of those capabilities are internationally leading. For example, in noise-robust speech recognition, we have held the lowest error rate in the world on Aurora 4, an international standard test set, since the end of 2015. Our word error rate of 7.09% is a significant advantage over the best results of around 10% currently reported by other institutions around the world. This test set has been used by the speech recognition research community for over 20 years, and all the best schools and institutions you can think of, including IBM and Microsoft, have tested on it, so the error rates are comparable. Note that this figure is for a single system. As for the search speed of speech recognition decoding, the algorithm we released in 2016 made search 5 to 7 times faster, which is the biggest improvement since the paper was published.
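
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is conventionally computed (word-level edit distance divided by the number of reference words) and how the gap between the two figures quoted above can be expressed as a relative reduction. The function names and example sentences are illustrative assumptions; they do not come from Spitz or from the Aurora 4 tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical utterance: one deleted word and one substituted word.
    print(word_error_rate("turn on the kitchen light", "turn on kitchen lights"))
    # Figures quoted in the interview; "around 10%" is approximate.
    print(relative_reduction(10.0, 7.09))
```

Taking the quoted numbers at face value, `relative_reduction(10.0, 7.09)` comes to roughly 0.29, that is, about 29% fewer word errors than a 10% baseline. Because the 10% figure is given only as "around 10%", this should be read as a rough estimate.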

In addition, we have ranked first in international challenges on dialogue state tracking, voice conversion, and speech synthesis. In our work with customers, Spitz has also gone through a number of third-party evaluations, and the results are generally high. Spitz may be the only company, or one of very few, able to serve big clients like Alibaba, Tencent, Xiaomi, Lenovo, and Foxconn at the same time. All of this reflects the technical strength of the company itself. I want to emphasize that this is only a small part of what I could tell you. Because we have these foundations, we can keep producing new methods one after another and adapt better to the rapid, iterative development of artificial intelligence technology as a whole.

TGO Kunpeng: Would you please introduce the DUI dialogue customization platform of Spitz in detail?

Yu Kai: DUI is actually the first platform in China to introduce the concept of dialogue. If you ask what kind of company Spitz is, the first keyword is “artificial intelligence”. What kind of artificial intelligence? The keyword most people probably think of is “speech”, but what we have always emphasized is not “speech” but “spoken dialogue”, which involves both speech and language and, more importantly, has to deal with interaction. I studied speech recognition during my first five years abroad and dialogue systems during the last five, and the two are different. The predecessor of the DUI platform is the Dialogue Workshop we released in 2013. We introduced the concept of dialogue to research in China, and Spitz was also the first company to release this concept. Dialogue intelligence is the core of future artificial intelligence as a whole. The path from the Dialogue Workshop in 2013 to the DUI platform in 2017 is really an extension of the original idea, and DUI is the first large-scale customizable dialogue platform in China. Pay attention to the key phrase “mass customizable”: it means we are no longer offering the same platform to everyone.

TGO Kunpeng: Can you give me an example of “mass customization”?

Yu Kai: For example, one client wants to use speech synthesis, and other clients want to use speech synthesis too, but each of them wants a result that is unique and different. The large-scale customization we do today uses advanced technologies to help customers personalize every level: speech recognition, synthesis, understanding, dialogue, and interaction. That is the most important feature of the DUI platform. More specifically, DUI is an application-oriented platform, so it has not only the core voice technology but also four systems that serve developers, who can freely customize almost every part of the dialogue. There are also native SDKs. On the DUI platform, users can customize voice functions to match their own product features, such as combining graphics with voice or customizing the wake word. For example, if I change some of the semantic understanding, it can be used on your phone almost immediately. How fast is that?

TGO Kunpeng: Is the customization done module by module?

Yu Kai: No, it is actually customized online. We have a local system called Linglong, which can be regarded as a local AI operating system: an artificial intelligence layer that sits on top of the underlying computing system. It is general-purpose. Instead of just executing one specific voice command, it can use all kinds of custom models delivered from the cloud, so it can be updated very quickly.

There is also the sky machine system, which handles all kinds of data analysis and statistics. Previously, customization only gave you some basic technology. If you are an app developer, that does not help you understand the variety of your real users. How many of your users are from Sichuan, and how many are from Shanghai? How do you count them? You used to have to go back to the technology provider for a solution, which was very troublesome. With our own system, all of these statistics can be customized.

The third is the green capsule system, the subsystem responsible for DUI service and R&D support. It records and tracks the problems developers report, handles bug fixes and updates in the background, tracks how developers use the platform, and keeps optimizing the system. This greatly shortens the closed loop from finding a problem to deploying the fix, helps connect the R&D and management processes across the whole platform, and supports the healthy development of DUI.

There is also the Zimicro system, which is responsible for connecting backend content and for standardizing and packaging interfaces. It is compatible with AVS services, so developers can easily complete calls and configuration. The ultimate purpose of dialogue is to let the machine understand the intent behind a task, which requires a large amount of third-party content and services as backend support to meet users' personalized needs. The DUI platform is not just a technology platform; it is essentially a platform that provides a full range of developer services.

TGO Kunpeng: At present, Spitz is trying to solve problems in the field of cognitive intelligence such as understanding, decision making and presentation. Could you briefly introduce the progress?

Yu Kai: Part of the progress on these problems is theoretical and part is practical. On the theoretical side, we have achieved some of the best international results in the dialogue state tracking challenge, and in statistical dialogue management we are implementing some novel frameworks. Until now, unified dialogue management has had problems running online everywhere in the world, and its performance was not good enough. We now have some integrated methods that make it work online, so there has been a breakthrough in the theoretical framework, and we are making further progress. In addition, we have worked on Q&A comprehension, introduced controlled Q&A into Q&A chat, and introduced new approaches to generative controlled chat. These are some of the more advanced things.

TGO Kunpeng: Spitz's dialogue system is already used in smart homes, intelligent vehicles, robots, and other fields. What scenarios will it be applied to in the future?

Yu Kai: There are a great many scenarios. What you can see now is mainly intelligent hardware, and intelligent hardware itself includes cloud services. The technology can also be used widely in many other fields, such as finance, education, call centers, healthcare, government affairs, and security. To sum up, it is the smart city, and all of these areas offer a very wide range of applications.

TGO Kunpeng: Are you starting to expand in this direction?

Yu Kai: Yes, and this involves the overall layout of Spitz. First of all, we provide overall solutions, but we also provide solutions for intelligent services and customized solutions for different fields, which are simply different ways of dividing up the same work. The company's layout has two levels. On one level, the company itself provides these solutions to the corresponding companies. On the other, the company has two funds, and we also invest in enterprises around the larger ecosystem. So one of the company's core ideas is that what you see may not be entirely a product made by the company itself, which is where our approach differs from others. The company hopes to achieve win-win cooperation, cultivate the artificial intelligence ecosystem, and help more companies in vertical fields grow. That way, professionals do what they are professional at, and artificial intelligence can realize greater potential. Spitz achieves this by enabling others, so this layout is realized through cooperation with other parties as well as through ecosystem investment.

TGO Kunpeng: Let’s talk a little bit about the industry. Where do you think China’s AI technology stands in the world?

Yu Kai: At the level of applied technology, China is definitely leading. Just by using a mobile phone, you can see that many of our applied innovations are already ahead of the United States. In research-based technologies, however, including underlying and original technologies, I think China is internationally advanced but not yet a leader. We do not yet have anything that leads the rest of the world.

TGO Kunpeng: What kind of companies do you think should develop AI technology?

Yu Kai: Enterprises that need it. Why do I say that? Artificial intelligence only offers new ways of computing, sensing, and analysis. If those things are part of an enterprise's existing operation chain, then as long as information flow is involved, artificial intelligence can be used to some degree. I think almost all traditional industries need some AI, but consumer products related to information technology need it more.

TGO Kunpeng: What advice would you give to people just starting to learn AI skills?

Yu Kai: On the technical side, I think you must start with deep learning; there is no doubt about that. As long as you have a certain foundation, you can find plenty of learning material on the Internet. From an application perspective, however, I would stress that algorithms are no substitute for understanding the industry or the specific application. For example, a machine learning expert may be able to do research without being good at building products. If you want to build application products, the most important core competence is knowing how to combine algorithms with practical applications: how to understand the application deeply, how to distill the basic artificial intelligence problems from reality, and how to solve them.

