A visual tour of data-driven features that make modern smartphone keyboards smart
A personalized probabilistic smartphone keyboard with various visualizations: touch distribution, key areas, uncertainty/entropy measurements. (Photo courtesy of the author)
What you type is what you get, right? Not with modern touch keyboards. This article visually explains the four core features that make your smartphone keyboard smart, including key personalization, language context, autocorrect, and word prediction. Based on material I produced for the “Intelligent User Interface” lecture, we examine the inner workings of these features in our daily typing, and conclude with lessons learned for ideating, evaluating, and critically reflecting on data-driven, “intelligent” user interfaces.
Feature 1: Adapting key areas
Modern smartphone keyboards personalize the screen area assigned to each key. This is usually not shown in graphical user interfaces to avoid confusion and undesirable co-adaptation – but we can reveal it here.
Touch data
As the user types, the keyboard collects touch locations in the background (2D points: x, y). The figure shows this touch data for two people, collected over several days.
Touch data of two users on a smartphone keyboard. (Photo courtesy of the author)
Touch keyboard model
Using this touch data, we can create personalized key models for each user. These key models capture each user’s finger placement behavior on the keyboard. For example, the optimal location of the letter “x” for Anna may differ slightly from that for Bob.
Personalized keyboard models for the two users, fitted to the touch data shown above. (Photo courtesy of the author)
As shown, we set up the personalized keyboard model by fitting a normal distribution _p(t|k)_ for each key _k_ to the touch locations recorded for that key. Formally: p(t|k) = N(t; μₖ, Σₖ).
Each key _k_ is modeled as a normal distribution centered at the average touch position of that key. The figure illustrates this visually using the “C” key as an example. (Photo courtesy of the author)
For each key _k_, the model stores the _mean position (x, y)_ for that key, as well as the _covariance matrix_ (which can intuitively be thought of as describing the size and shape of the key). Visually, the circles show two and three standard deviations of these distributions.
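To make this concrete, here is a minimal sketch of the fitting step in Python. The dictionary format of the touch log, the key names, and the coordinates below are all made-up assumptions for illustration, not the article’s actual data:

```python
# A minimal sketch of fitting personalized key models p(t|k).
# Assumption: touch logs come as {key: [(x, y), ...]} (hypothetical format).
import numpy as np
from scipy.stats import multivariate_normal

def fit_key_models(touch_log):
    """Fit a 2D Gaussian p(t|k) to the logged touches of each key k."""
    models = {}
    for key, touches in touch_log.items():
        pts = np.asarray(touches, dtype=float)   # shape: (num_touches, 2)
        mean = pts.mean(axis=0)                  # average touch position
        cov = np.cov(pts, rowvar=False)          # size/shape of the key
        models[key] = multivariate_normal(mean=mean, cov=cov)
    return models

# Toy example: two keys with a handful of logged touches each.
log = {"c": [(105, 290), (110, 288), (102, 295), (108, 292)],
       "v": [(150, 291), (155, 287), (148, 293), (153, 290)]}
models = fit_key_models(log)
print(models["c"].pdf((106, 291)))   # touch likelihood p(t | k="c")
```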
We can also take a three-dimensional perspective: each key is a “hill” whose height corresponds to the touch likelihood _p(t|k)_. The figure shows the “C” key as an example.
Three-dimensional plot of the touch likelihood of the “C” key. (Photo courtesy of the author)
Which key does the user intend to press? Decoding a touch
This keyboard model can be used to decode touches. Decoding simply means inferring which key the user intended to press.
The results are displayed in these figures, with colors indicating the most likely key for each touch location (i.e., pixel). We now have a personalized keyboard with pixel-to-key assignments based on each user’s individual touch behavior.
The pixel-to-key assignments of the two users reveal differences in finger placement on the keys, which the keyboard can interpret in this personalized way. (Photo courtesy of the author)
Formally, to obtain this figure, we evaluate _p(k|t)_, the probability of key _k_ given the touch location _t_. Via Bayes’ rule, p(k|t) ∝ p(t|k) p(k). This yields the most likely key k* = argmaxₖ p(k|t) (see the sketch after the notes below).
Note:
_p(k)_ is the prior distribution over keys _k_, describing how likely each key is in general, regardless of the touch location. A simple prior is uniform (i.e., all keys are equally likely). A better prior uses the language context, which we examine later.
_p(t|k)_ is the likelihood of observing touch _t_ given that the intended key is _k_.
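As a rough illustration of this decoding rule, the sketch below applies Bayes’ rule using the Gaussian key models fitted above; the `models` dict and the uniform fallback prior are assumptions carried over from the previous sketch:

```python
# A minimal sketch of decoding a single touch t into the most likely key.
import numpy as np

def decode_touch(t, models, prior=None):
    """Compute p(k|t) for all keys via Bayes' rule and return the argmax."""
    keys = list(models)
    if prior is None:
        prior = {k: 1.0 / len(keys) for k in keys}   # uniform prior p(k)
    scores = np.array([models[k].pdf(t) * prior[k] for k in keys])
    posterior = scores / scores.sum()                # normalizing by p(t)
    return dict(zip(keys, posterior)), keys[int(np.argmax(posterior))]

posterior, best = decode_touch((107, 290), models)
print(best, posterior)   # most likely key and the full posterior p(k|t)
```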
Uncertainty
Since this is a probabilistic model, we can examine the keyboard’s uncertainty. The entropy of the posterior _p(k|t)_ is one such uncertainty measure. Intuitively, it is high if many keys _k_ are roughly equally likely for a touch location _t_; conversely, it is low if only one particular key is considered likely.
The posterior uncertainty (entropy) of the model for each pixel on the keyboard. (Photo courtesy of the author)
The figure above shows this entropy. Uncertainty is highest at the “boundaries” between the pixel-to-key assignments. This fits our intuition that touches near the edge of a key are “sloppy”, or in other words, less clearly attributable than touches that hit the middle of a key.
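Computationally, such an entropy map is straightforward to produce. A minimal sketch, reusing the `decode_touch` function from above:

```python
# Posterior entropy of p(k|t): high near key boundaries, low near centers.
import numpy as np

def touch_entropy(t, models, prior=None):
    posterior, _ = decode_touch(t, models, prior)
    p = np.array(list(posterior.values()))
    p = p[p > 0]                          # avoid log(0)
    return float(-(p * np.log2(p)).sum())

print(touch_entropy((128, 290), models))  # a touch between "c" and "v"
```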
Feature 2: Integrating language context
This personalized keyboard can be further improved by integrating language context. To illustrate, we use a simple bigram language model here. A bigram is a pair of letters (e.g., “th”). It is easy to set up: simply count letter pairs in a large amount of text and compute their relative frequencies.
Formally, the bigram language model gives the probability of the next key given the previous one, p(kᵢ|kᵢ₋₁). Taking this as the prior in the decoding equation, we get: p(kᵢ | t, kᵢ₋₁) ∝ p(t|kᵢ) p(kᵢ|kᵢ₋₁).
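Since the training is described as simple counting, here is one way it could look in code; the toy corpus string is a stand-in for a large text collection:

```python
# A minimal sketch of training a letter bigram model by counting pairs.
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Estimate p(next letter | previous letter) via relative frequencies."""
    pair_counts = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        pair_counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in pair_counts.items()}

bigrams = train_bigrams("the quick brown fox quietly queued there")
print(bigrams["q"])   # "u" dominates after "q"
print(bigrams["t"])   # "h" is prominent after "t"
```

Such a table can then serve as the context-dependent prior in the earlier sketch, e.g. `decode_touch(t, models, prior=bigrams[prev_letter])`, assuming the bigram table covers all modeled keys.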
We can study the influence of the language model by comparing examples where the preceding letter is “q” or “t”. Given common letter combinations in English, we would expect “u” to gain screen space after “q”, and “h” after “t”. Indeed, the figure shows this change.
An example of the influence of a (simple) language model: the key assignment changes depending on whether the preceding letter is “q” or “t”. These changes in key size fit our intuition that in English “q” is often followed by “u”, and “t” by “h”. (Photo courtesy of the author)
To investigate this result further, we compare the model’s uncertainty between the two language contexts. This shows the gain (or loss) in certainty at each pixel due to the change in context.
For our example of switching the preceding letter from “q” to “t”, we expect the pixels around “u” to lose certainty and the pixels around “h” to gain certainty. This is because:
- Many letters may come after “t” – as opposed to “q”, which is rarely followed by any letter other than “u”.
- In contrast, “th” is very common, increasing the model’s certainty about “h” after “t”.
In fact, this effect is shown in the diagram.
Keyboard uncertainty (posterior entropy) varies depending on whether the preceding letter is “q” or “t”. (Photo courtesy of the author)
Feature 3: Decoding whole words
So far, we’ve decoded one touch at a time, which corresponds to personalizing the key areas. However, typing consists of many consecutive touches. We can therefore also study how to decode touch sequences into words, again including language context. This can be used for “autocorrect”.
From single touches to sequences
We now seek the most likely letter sequence _s_, given a sequence _o_ of observed touch locations. Formally: s* = argmaxₛ p(s|o), with p(s|o) ∝ p(o|s) p(s).
This equation matches the single-touch equation, and the principle is the same. To decode words, we simply generalize its components to sequences of keys/touches. For simplicity, we assume that a word of length _n_ is entered with exactly _n_ touches (i.e., no missing or spurious touches); formally, o = (t₁, …, tₙ) and s = (k₁, …, kₙ). A scoring sketch follows after the notes below.
Note:
_p(s)_ is the prior over letter sequences. To illustrate, we reuse our bigram model: the joint probability of a sequence of _n_ letters (such as a word) is the product of its key-to-key transitions (i.e., bigrams), p(s) = ∏ᵢ p(kᵢ|kᵢ₋₁).
_p(o|s)_ is the likelihood of observing the touch sequence _o_ given the letter sequence _s_. Here we again use our Gaussian key model, which yields _p(t|k)_ for each touch. We combine these across the sequence by multiplying them into a joint probability: p(o|s) = ∏ᵢ p(tᵢ|kᵢ).
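Putting the two components together, a candidate word can be scored as follows. This sketch reuses the hypothetical `models` and `bigrams` from the earlier sketches and works in log space to avoid numerical underflow:

```python
# A minimal sketch of scoring one candidate word s against touches o.
import math

def word_log_score(word, touches, models, bigrams):
    """log p(o|s) + log p(s), assuming len(word) == len(touches)."""
    score = sum(math.log(models[k].pdf(t))                   # touch likelihoods
                for k, t in zip(word, touches))
    score += sum(math.log(bigrams.get(a, {}).get(b, 1e-9))   # letter transitions
                 for a, b in zip(word, word[1:]))
    return score
```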
Use the token passing algorithm to find the most likely word
Trying all possible letter sequences _s_ of length _n_ to find the most likely one is prohibitively expensive (exponential in _n_). We need to compromise. Specifically, we use a token passing algorithm with beam pruning here.
In short, the algorithm tracks a set of partial sequences as “tokens”, which can be split (“forks” in the path: exploring multiple continuations of the current sequence) or discarded (“dead ends”: the sequence represented by the token has become too unlikely).
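A minimal sketch of this idea, under the same assumptions as the previous sketches (Gaussian key `models`, letter `bigrams`); real implementations add many refinements, such as vocabulary constraints:

```python
# Token passing with beam pruning: each token is a (log score, partial word).
import math

def decode_word(touches, models, bigrams, beam_width=3):
    tokens = [(0.0, "")]                            # one empty start token
    for t in touches:
        extended = []
        for score, prefix in tokens:                # fork each token ...
            for k, model in models.items():         # ... into every letter
                s = score + math.log(model.pdf(t) + 1e-300)
                if prefix:                          # bigram transition prior
                    s += math.log(bigrams.get(prefix[-1], {}).get(k, 1e-9))
                extended.append((s, prefix + k))
        # Beam pruning: discard all but the most likely partial words.
        tokens = sorted(extended, reverse=True)[:beam_width]
    return tokens                                   # best-first hypotheses
```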
Here is a concrete example. A user wants to type “hello” (left). The algorithm explores two paths in the hypothesis space (right): “hello” and “helli”. As the line thickness suggests, “hello” is more likely; decoding thus finds the correct word as the most likely one (red path).
Decoding the intended input “hello” (left) with token passing. The right figure shows the paths explored by the algorithm in the hypothesis space; read from top to bottom (vertical: touches 1–5; horizontal: explored letters; line thickness indicates likelihood; red path is the most likely). (Photo courtesy of the author)
Beam search width
Intuitively, the beam width defines how much exploration we allow. More exploration may increase the chance of finding the correct word, but it also increases computation time.
The influence of the beam width becomes apparent by comparing the hypotheses explored at different beam widths, as shown in the figure below. On the right, the algorithm additionally explores “jello”, but considers it less likely than the correct “hello” (see the thinner path from “je” compared to “he”).
Increasing the beam width leads to more exploration of the hypothesis space (here, “jello” is added to the list of candidates). (Photo courtesy of the author)
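With the `decode_word` sketch from above, this trade-off is a single parameter. The touch sequence below is made up for illustration, and `models` is assumed to now cover all letter keys:

```python
# Hypothetical touch sequence for the five letters of "hello".
touches = [(110, 160), (250, 200), (300, 240), (300, 240), (330, 180)]
narrow = decode_word(touches, models, bigrams, beam_width=2)  # fast, riskier
wide = decode_word(touches, models, bigrams, beam_width=8)    # more thorough
print(wide[:3])   # top hypotheses; a wider beam may surface "jello" as well
```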
Insertions and deletions
Extensions of the token passing algorithm introduce insertions (producing a letter without consuming the next touch) and deletions (consuming the next touch without producing a letter). This addresses two common user errors: spurious (false) touches and accidentally skipped keys.
While we won’t go into the technical details here, the next two figures illustrate decoding with insertions and deletions using concrete examples.
For insertions, let’s look at the example “hllo” (accidentally skipped “e”). The insertion decoder in the figure below correctly finds “hello” as the most likely word: intuitively, inserting “e” yields a higher overall probability than simply following the touch evidence, since “he” is more likely in English than “hl”.
“hllo” (the user accidentally skipped the “e”), decoded with a decoder that can insert additional letters. (Photo courtesy of the author)
Note that the decoder in this example also explores “thllo” by inserting “t” at the beginning, since “th” is very likely in English.
For deletions, let’s look at the example “hqello” (a spurious “q”). The decoder correctly finds “hello” as the most likely hypothesis: while “hq” may seem plausible, “qe” is less likely than skipping the “q” touch via “ε” (empty) and taking the “he” path.
Decoding “hqello” (note the spurious “q”) with a decoder that can consume a touch (ε) without producing a letter. (Photo courtesy of the author)
Feature 4: Suggesting the next word
So far, all features have assumed that the user has already touched a key. Conversely, if we have no touch at all (yet), we have to rely entirely on the language context. This leads to the “word suggestion” feature.
Word suggestions on a smartphone keyboard, displayed after typing “Hello”. (Photo courtesy of the author)
For example, we can look at the last _n−1_ words to predict the next word: p(wᵢ | wᵢ₋ₙ₊₁, …, wᵢ₋₁).
This is a simple word-level _n_-gram model, which can be trained by counting word sequences in a large text corpus. More recently, deep learning language models have also been explored for this task. They offer advantages such as the ability to take longer context into account.
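For the classic counting approach, a word-level sketch mirrors the letter bigrams from earlier; again, the toy corpus stands in for real training text:

```python
# A minimal sketch of next-word suggestion with a word bigram model.
from collections import Counter, defaultdict

def train_word_bigrams(corpus):
    words = corpus.lower().split()
    counts = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def suggest(prev_word, counts, n=3):
    """Return the n most frequent continuations of prev_word."""
    return [w for w, _ in counts[prev_word.lower()].most_common(n)]

counts = train_word_bigrams("hello world hello there hello world again")
print(suggest("hello", counts))   # ['world', 'there']
```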
Discussion and takeaways
What does all this mean, after this deep dive into keyboards? Here are three implications for designing future “smart” user interfaces: 1) ideation; 2) evaluation; 3) critical reflection beyond the interaction.
There’s a lot going on behind your data-driven keyboard
Go back to a keyboard app from five or ten years ago. I did this recently for fun, and something didn’t feel right: typing felt strangely clumsy and slow, probably because that old keyboard didn’t yet have my touch data, and certainly didn’t have the quality of language modeling and decoding that I was used to.
→ Ideation: To inspire new ideas for “smart” user interfaces, we might ask how to transfer successful data-driven and probabilistic concepts from keyboards to GUIs more generally (see also the suggested reading at the end).
The user experience is more important than the user interface, especially when it comes to “smart” user interfaces
That old keyboard looks almost identical to my latest keyboard app, yet my user experience is much worse, as described above. Today’s keyboards are a prime example of interactive systems whose user experience cannot be discerned from the visual UI alone, since the UI does not reveal the underlying algorithmic quality.
→ Evaluation: Empirical evaluation is critical for designing adaptive UIs. Future advances can be expected to benefit from knowledge and methodology from both human–computer interaction and artificial intelligence.
How is data used in a data-driven user interface?
Every day we interact with data-driven user interfaces that collect our data. The keyboard is one of the few interactive systems that, at least in principle, collects input data only to improve the input method itself.
→ Reflection: We can critically examine data-driven UIs by asking about their use of user data in situ (i.e., during the interaction) and ex situ (i.e., beyond it).
Summary and food for thought
To stimulate further discussion and insights, we can think of today’s touch keyboards as:
- A _data-driven UI_: The keyboard uses touch data and language data to improve your typing experience and efficiency.
- A _probabilistic UI_: The keyboard takes uncertainty into account when interpreting your input.
- A _dynamic, adaptive UI_: Integrating a language model is not typing “magic”; rather, it leads to dynamic changes in the pixel-to-key assignment.
- A _biometric UI_: To adapt its keys, the keyboard learns your personal touch behavior, which differs from user to user, like a behavioral fingerprint.
- A _deceptive UI_: The keyboard internally uses a pixel-to-key assignment that differs from the keys displayed on screen (with good intentions: avoiding confusion and undesirable co-adaptation).
- A _search engine_: The keyboard searches a huge hypothesis space to “decode” your input into words and sentences.
Conclusion
We’ve dissected the core features that make today’s smartphone keyboards smart, and explored their effects and key parameters through visualizations based on concrete touch and language data.
Most importantly, the four core features explained here all emerge from one probabilistic framework (Bayes’ rule) by filling its slots with different concrete components. Here is a summary:
An overview of Bayesian keyboards. (Photo courtesy of the author)
In short, mobile touch keyboards have become a common everyday example of “smart”, data-driven, and probabilistic user interfaces.