Many of us spend a lot of time each day on our mobile keyboards: writing email, texting, using social media, and so on. However, mobile keyboards are still at a disadvantage in terms of input speed. On average, users type about 35 percent slower on mobile devices than on physical keyboards. To change this, the Google team recently introduced a number of improvements to Gboard for Android, aiming to build an intelligent keyboard that offers suggestions and corrects errors in any language the user chooses, enabling faster and higher-quality input.
In fact, a mobile keyboard converts touch input into text in much the same way that a speech recognition system converts speech input into text, and Lei Feng Network understands that the team drew on its speech recognition experience to implement touch input.
The team first created a robust spatial model that maps fuzzy sequences of raw touch points to keys on the keyboard, much as an acoustic model maps sequences of sound segments to phonetic units.
Second, the team built a core decoding engine based on finite-state transducers (FSTs) to determine the most likely word sequence for a given input touch sequence. Given their mathematical formalism and broad success in speech applications, FST decoders offer the flexibility needed to support a variety of complex keyboard input behaviors as well as language features. This article describes the development of both systems in detail.
Neural spatial model
Errors in mobile keyboard typing are usually attributed to "fat finger" typing (or, in glide typing, to tracing spatially similar words, as shown below) and to cognitive and motor errors (misspellings and character insertions, deletions, or swaps). A smart keyboard needs to account for these errors and predict the intended word quickly and accurately. According to Lei Feng Network, the team built a spatial model for Gboard that handles these errors at the character level by mapping touch points on the screen to actual keys.
Two spatially similar words: the average glide trails for “vampire” and “value”
Until recently, Gboard used Gaussian models to quantify the probability of tapping neighboring keys and rule-based models to represent cognitive and motor errors. These models are simple and intuitive, but they do not directly optimize the metrics associated with better typing quality. Drawing on experience with acoustic models for voice search, the team replaced both the Gaussian model and the rule-based model with a single efficient LSTM model trained with a connectionist temporal classification (CTC) criterion.
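To make the older Gaussian approach concrete, here is a minimal sketch of how such a model could score nearby keys for a single tap. The key positions, the shared standard deviation `SIGMA`, the uniform prior over keys, and the function name `key_posteriors` are all assumptions made for illustration; the article does not describe Gboard's actual layout or parameters.

```python
import math

# Hypothetical key centres in arbitrary screen units (not Gboard's real layout).
KEY_CENTERS = {
    "q": (0.5, 0.5), "w": (1.5, 0.5), "e": (2.5, 0.5),
    "a": (0.75, 1.5), "s": (1.75, 1.5), "d": (2.75, 1.5),
}
SIGMA = 0.6  # assumed standard deviation of touch noise, same for every key


def key_posteriors(x, y, centers=KEY_CENTERS, sigma=SIGMA):
    """Return P(key | touch) using one isotropic Gaussian per key and a uniform prior."""
    # Log-density up to a constant that is identical for every key.
    log_scores = {
        key: -((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)
        for key, (cx, cy) in centers.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {key: math.exp(score - top) for key, score in log_scores.items()}
    total = sum(weights.values())
    return {key: weight / total for key, weight in weights.items()}


if __name__ == "__main__":
    # A tap slightly off-centre from "w": neighbouring keys still get some mass.
    for key, prob in sorted(key_posteriors(1.4, 0.6).items(), key=lambda kv: -kv[1]):
        print(f"{key}: {prob:.3f}")
```

A model like this scores each tap in isolation, which is one reason a sequence model such as an LSTM can be a better fit for glide input.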
Training the model, however, turned out to be more complicated than expected. Acoustic models are trained on human-transcribed audio data, but nobody can easily transcribe millions of touch point sequences and glide traces. So the team used user interaction signals, such as reverted auto-corrections and suggestion picks, as negative and positive semi-supervised learning signals, and built rich training and test sets from them.
The raw data points corresponding to the word “could” (left), and the normalized sampled trajectories with per-sample variances (right)
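CTC suits this kind of weakly labeled data because it needs only the target word for each touch sequence, not a per-touch-point alignment. The standard CTC rule for turning per-frame labels into text is to merge consecutive repeats and then drop blanks. The sketch below shows that rule only; the blank symbol `_` and the function name `ctc_collapse` are choices made for this example, and the article does not say how Gboard decodes its network outputs.

```python
BLANK = "_"  # assumed symbol for the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapse rule: merge consecutive repeats, then remove blanks."""
    output = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return "".join(output)


if __name__ == "__main__":
    print(ctc_collapse("cc_oo_uu_l_dd"))  # many frames collapse to one word
    print(ctc_collapse("l_l"))            # a blank between repeats keeps a double letter
```

Because many frame-level labelings collapse to the same word, the training loss can sum over all of them instead of requiring a hand-made alignment.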
The team used a number of techniques from the speech recognition literature to iterate on the neural spatial model (NSM) until it was small and fast enough to run on any device. The TensorFlow infrastructure was used to train hundreds of models, optimizing the various signals surfaced by the keyboard: completions, suggestions, gliding, and so on.
After more than a year of work, the finished model is about six times faster and one-tenth the size of the initial version. On offline datasets, it also reduced bad autocorrects by about 15% and wrongly decoded gestures by about 10%.
Finite-state transducers
Although the NSM uses spatial information to help determine which characters were tapped or swiped, additional constraints, lexical and grammatical, can also be brought to bear. A lexicon tells us which words occur in a language, while a probabilistic grammar tells us which words are likely to follow other words. Finite-state transducers are used to encode this information. FSTs have long been a key component of Google's speech recognition and synthesis systems. They provide a principled way to represent the various probabilistic models used in natural language processing (lexicons, grammars, normalizers, etc.), together with the mathematical framework needed to manipulate, optimize, combine, and search those models.
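The way lexical and grammatical constraints combine with spatial scores can be sketched as follows. This is not an FST: it enumerates a tiny word list by brute force, where a real decoder would compose and search weighted transducers. The lexicon, the bigram probabilities, the floor value, and the function name `decode` are all hypothetical and exist only for illustration.

```python
import math

# Hypothetical toy lexicon and bigram probabilities; not real Gboard data.
LEXICON = {"could", "cold", "good"}
BIGRAM = {("i", "could"): 0.6, ("i", "cold"): 0.1, ("i", "good"): 0.3}
FLOOR = 1e-6  # assumed probability for unseen characters or word pairs


def decode(prev_word, char_posteriors, lexicon=LEXICON, bigram=BIGRAM):
    """Pick the lexicon word with the best spatial plus grammar log-probability.

    char_posteriors holds one dict per touch, mapping character -> probability,
    such as a spatial model might produce.
    """
    best_word, best_score = None, -math.inf
    for word in sorted(lexicon):  # sorted for a deterministic iteration order
        if len(word) != len(char_posteriors):
            continue  # this toy version allows no insertions or deletions
        spatial = sum(
            math.log(posterior.get(char, FLOOR))
            for posterior, char in zip(char_posteriors, word)
        )
        grammar = math.log(bigram.get((prev_word, word), FLOOR))
        if spatial + grammar > best_score:
            best_word, best_score = word, spatial + grammar
    return best_word


if __name__ == "__main__":
    touches = [
        {"c": 0.55, "g": 0.45},  # first tap is ambiguous between "c" and "g"
        {"o": 0.9},
        {"l": 0.5, "o": 0.5},
        {"d": 0.9},
    ]
    print(decode("i", touches))             # grammar included
    print(decode("i", touches, bigram={}))  # spatial evidence only
```

In this toy setup the spatial evidence alone slightly favors "cold", while the bigram score after "i" tips the decision to "good", which is the kind of interaction between models that a weighted FST framework handles in a principled way.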