From task to visualization: understanding neurons in LSTM networks

Transliteration is a relatively easy and explainable task for humans, so it is a good way to examine what neural networks do and whether it resembles what humans do on the same task. So let’s start with the transliteration task and then go one step further: visually explaining what individual neurons in a neural network actually learn and how they actually make decisions.

Directory:

  • Transliteration
  • The network architecture
  • Analyzing the neurons
  • How does “t” become “ծ”?
  • What do the neurons learn?
  • Visualizing the LSTM cells
  • Concluding remarks

Transliteration

About half of the billions of Internet users speak languages written in non-Latin alphabets, such as Russian, Arabic, Chinese, Greek and Armenian. Much of the time they also write these languages in the Latin alphabet, in improvised and inconsistent ways.

  • привет: Privet, Privyet, Priwjet…
  • كيف حالك: Kayf Halk, Keyf 7alek,…
  • Բարև Ձեզ: Barev Dzez, Barew Dzez,…

As a result, user-generated content is increasingly in “Latinized” or “Romanized” formats that are difficult to parse, search, or even identify. Transliteration is the task of automatically converting this content into a canonical format.

  • Aydpes aveli sirun e. : Այդպես ավելի սիրուն է:

What factors make this difficult?

  1. As shown above, different speakers use different romanization conventions. For example, both v and w can stand for վ in Armenian.
  2. Several different letters can be romanized to the same Latin letter. For example, r can stand for either ր or ռ in Armenian.
  3. A single letter can be romanized as a combination of several Latin letters. For example, the combination ch stands for ч in Cyrillic or for չ in Armenian, while c and h on their own stand for other letters.
  4. English words and other Latin-script strings, such as URLs, often appear inside non-Latin text and should be left unchanged. For example, the letters in youtube.com and MSFT must not be transliterated.

Humans are very good at resolving these ambiguities. We have shown that an LSTM network can also learn to resolve all of them, at least for Armenian. For example, our model correctly transliterates es sirum em Deep Learning to ես սիրում եմ Deep Learning, rather than to ես սիրում եմ Դեեփ Լէարնինգ.

The network architecture

We took a large amount of Armenian text from Wikipedia and applied probabilistic rules to produce romanized versions of it. These rules cover most of the romanization conventions that people actually use for Armenian.
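
As an illustration, here is a minimal sketch of how such probabilistic romanization could be implemented. The rule table and probabilities below are made-up examples, not the actual rules used for the experiments.

```python
import random

# Hypothetical romanization rules: each Armenian letter maps to a list of
# (latin_string, probability) alternatives. These entries are illustrative only.
ROMANIZATION_RULES = {
    "վ": [("v", 0.7), ("w", 0.3)],
    "ռ": [("r", 1.0)],
    "ր": [("r", 1.0)],
    "չ": [("ch", 0.8), ("ch'", 0.2)],
    "ի": [("i", 1.0)],
    "ձ": [("dz", 1.0)],
}

def romanize(text: str) -> str:
    """Replace each Armenian character by a randomly chosen Latin variant."""
    out = []
    for ch in text:
        variants = ROMANIZATION_RULES.get(ch)
        if variants is None:
            out.append(ch)  # keep Latin letters, URLs and punctuation unchanged
            continue
        latin, probs = zip(*variants)
        out.append(random.choices(latin, weights=probs, k=1)[0])
    return "".join(out)

print(romanize("ձի"))  # e.g. "dzi"
```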

We encode the Latin letters as one-hot vectors and use a character-level bidirectional LSTM. At each time step, the network tries to guess the corresponding character of the original Armenian sentence. Sometimes a single Armenian letter is represented by several Latin letters, so it helps to align the romanized text with the original text before applying the LSTM (otherwise we would need a sequence-to-sequence model, which is much harder to train). Fortunately we can do this alignment, because we generate the romanized text ourselves. For example, dzi should be transliterated to ձի, where dz corresponds to ձ and i corresponds to ի. So we add a placeholder to the Armenian text: ձի becomes ձ_ի, and z is now transliterated to _. All we need to do after the transliteration is complete is remove every _ from the output string.
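
Here is a minimal sketch of the placeholder trick, assuming we keep track of which Latin substring each Armenian letter was romanized to (a hypothetical helper format, not our actual pipeline):

```python
def align_with_placeholders(pairs):
    """
    pairs: list of (armenian_letter, latin_substring) produced during romanization,
    e.g. [("ձ", "dz"), ("ի", "i")] for the word "dzi" <= "ձի".
    Returns the padded Armenian string whose length matches the Latin string,
    so the two sequences can be aligned character by character.
    """
    padded = []
    for arm, lat in pairs:
        # the Armenian letter is aligned with the first Latin character;
        # every extra Latin character is aligned with the placeholder "_"
        padded.append(arm + "_" * (len(lat) - 1))
    return "".join(padded)

pairs = [("ձ", "dz"), ("ի", "i")]
print(align_with_placeholders(pairs))  # "ձ_ի" (same length as "dzi")
# After transliteration we simply strip the placeholders:
print("ձ_ի".replace("_", ""))          # "ձի"
```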

Our neural network consists of two LSTMs (228 units in total) that read the Latin sequence forward and backward. Their outputs are concatenated at each step (the concatenation layer), then a dense layer with 228 neurons (the hidden layer) is applied on top, followed by a dense layer with a softmax activation (the output layer) that produces the output probabilities. We also connect the input layer directly to the hidden layer, so the hidden layer receives 300 inputs in total. This is a slightly simplified version of the network described in our previous blog post on the same topic (the main difference is that we do not use a second layer of bidirectional LSTMs).
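
For readers who prefer code, here is a rough Keras-style sketch of such an architecture. The vocabulary sizes, the activation functions and the split of the 228 units into two LSTMs of 114 each are assumptions on our part; only the overall shape (bidirectional character-level LSTM, concatenation, dense hidden layer, softmax output, input fed to the hidden layer) follows the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

LATIN_ALPHABET_SIZE = 72      # assumed size of the one-hot input vocabulary
ARMENIAN_ALPHABET_SIZE = 80   # assumed size of the output vocabulary (incl. "_")

inputs = layers.Input(shape=(None, LATIN_ALPHABET_SIZE))  # one-hot Latin characters

# Forward and backward character-level LSTMs; their per-step outputs are
# concatenated (the "concatenation layer" in the text, 228 units in total).
bilstm = layers.Bidirectional(
    layers.LSTM(114, return_sequences=True), merge_mode="concat")(inputs)

# The input layer is also connected to the hidden layer.
concat = layers.Concatenate()([bilstm, inputs])

hidden = layers.Dense(228, activation="relu")(concat)      # hidden layer
outputs = layers.Dense(ARMENIAN_ALPHABET_SIZE, activation="softmax")(hidden)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```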

Analyzing the neurons

We tried to answer two questions:

  1. How does the network handle examples with several possible outputs? (e.g. r => ր vs ռ)
  2. What problems do specific neurons solve?

How does “t” become “ծ”?

First, we fix one specific character of the input and one specific character of the output. We are interested, for example, in how “t” becomes “ծ” (we know it can become տ, թ or ծ).

For each neuron we plot two histograms of its activations: one over the samples where t indeed becomes ծ, and another over the remaining samples. For most neurons the two histograms look very similar, except in a few cases such as the following two:



The two histograms show that by looking at the activation of this particular neuron, we can guess with high accuracy whether the output for t will be ծ. To quantify the difference between the two histograms, we use the Hellinger distance: we take the minimum and maximum activation values of the neuron, split that range into 1000 bins, and apply the Hellinger distance formula to the two binned distributions. We computed the Hellinger distance for every neuron and visualized the most interesting ones in the following picture:
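
Here is a minimal numpy sketch of this procedure, assuming the neuron’s activations have already been split into the two groups of samples (the 1000-bin resolution matches the text; everything else is illustrative):

```python
import numpy as np

def hellinger_distance(act_target, act_rest, bins=1000):
    """
    act_target: activations of one neuron on samples where, e.g., t => ծ
    act_rest:   activations of the same neuron on the remaining samples
    Both histograms are built over the same 1000 bins spanning the full
    activation range, then the Hellinger distance formula is applied:
        H(P, Q) = (1 / sqrt(2)) * sqrt(sum((sqrt(p_i) - sqrt(q_i))**2))
    """
    lo = min(act_target.min(), act_rest.min())
    hi = max(act_target.max(), act_rest.max())
    edges = np.linspace(lo, hi, bins + 1)

    p, _ = np.histogram(act_target, bins=edges)
    q, _ = np.histogram(act_rest, bins=edges)
    p = p / p.sum()  # normalize counts into probability distributions
    q = q / q.sum()

    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

# Example with synthetic activations: well-separated groups give a large distance.
rng = np.random.default_rng(0)
print(hellinger_distance(rng.normal(2.0, 0.5, 400), rng.normal(-1.0, 0.5, 4600)))
```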



The color of a neuron represents the distance between its two histograms (a darker color means a larger distance). The width of the line between two neurons represents the average contribution of the lower-layer neuron to the activation of the higher-layer neuron. Orange and green lines represent positive and negative signals, respectively.

The neurons at the top of the image are the output layer; below them is the hidden layer, for which we show the 12 neurons with the largest histogram distances. Below the hidden layer is the concatenation layer. Its neurons are split into two halves: the left half comes from the LSTM that reads the input from left to right, and the right half from the LSTM that reads it from right to left. From each LSTM we show the top ten neurons ranked by histogram distance.

In the case of t => ծ, it is obvious that the top 12 neurons of the hidden layer all send positive signals to the ծ and ց outputs (ց is also often romanized using t in Armenian) and negative signals to տ, թ and the other characters.



We can also see that the outputs of the right-to-left LSTM have darker colors, which suggests that these neurons “know more” about whether to predict ծ; moreover, their connections to the hidden layer are wider, which means that they contribute more to the activations of the top 12 neurons of the hidden layer. This is natural: t usually becomes ծ when the next symbol is an s, and only the right-to-left LSTM is aware of the next character.

Let’s do a similar analysis for the neurons and gates inside the LSTMs. The results are shown in the bottom six rows of the figure. Interestingly, the most “confident” neurons are the so-called cell inputs. Both the cell inputs and the gates depend on the input of the current step and on the hidden state of the previous step (for the right-to-left LSTM this is the hidden state of the next character), so all of them are “aware” of the upcoming s, but for some reason the cell inputs are more confident than the other parts.
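
To make the distinction concrete, recall the standard LSTM update (this is the textbook formulation, not anything specific to our network): the cell input and the three gates all depend on the current input x_t and the previous hidden state h_{t-1}.

$$
\begin{aligned}
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(cell input)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$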

In the case where s becomes a placeholder (s => _), in particular in the ts => ծ example, the useful information is more likely to come from the forward, left-to-right LSTM. You can see this in the figure below:



What do the neurons learn?

In the second part of the analysis we looked at how each neuron helps in ambiguous situations. We took the Latin characters that can be transliterated into more than one Armenian letter, and removed the input-output pairs that occurred fewer than 300 times in the 5,000 sample sentences, because our distance metric is unreliable with so few samples. We then analyzed each neuron for every remaining input-output pair.
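
A sketch of this per-neuron analysis, reusing the hellinger_distance helper from the sketch above; the data layout (parallel arrays of input characters, output characters and neuron activations) is an assumption for illustration:

```python
import numpy as np
from collections import Counter
# hellinger_distance() is the function defined in the earlier sketch

def rank_pairs_for_neuron(latin, armenian, activations, neuron_id, min_count=300):
    """
    latin, armenian: arrays of input/output characters, one entry per sample
    activations:     array of shape (n_samples, n_neurons)
    Scores every (latin, armenian) pair by how well this neuron's activation
    separates that pair's samples from all other samples (Hellinger distance).
    """
    latin = np.asarray(latin)
    armenian = np.asarray(armenian)
    acts = activations[:, neuron_id]

    pair_counts = Counter(zip(latin, armenian))
    scores = {}
    for (l, a), count in pair_counts.items():
        if count < min_count:  # drop rare pairs, as described above
            continue
        mask = (latin == l) & (armenian == a)
        scores[(l, a)] = hellinger_distance(acts[mask], acts[~mask])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example (hypothetical data): the four best-separated pairs for neuron #70
# top_pairs = rank_pairs_for_neuron(latin, armenian, activations, neuron_id=70)[:4]
```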

For example, here is the analysis for neuron #70 in the output of the left-to-right LSTM. As we saw in the previous visualization, it helps decide whether s will be transliterated to _. Here are the top four input-output pairs for this neuron:



So this neuron is best at predicting whether s becomes _ (which we already knew), but it also helps decide whether the Latin letter h should be transliterated to the Armenian letter հ or to the placeholder _ (for example, the Armenian letter չ is often romanized as ch, so h sometimes becomes _).

We visualized the Hellinger distances between the neuron activation histograms for the case where the input is h and the output is _. It turns out that for h => _, neuron #70 is again among the top ten neurons of the left-to-right LSTM.



Visualizing the LSTM cells

Inspired by the paper Visualizing and Understanding Recurrent Networks (Andrej Karpathy, Justin Johnson and Fei-Fei Li), we tried to find the neurons that respond most strongly to the suffix թյուն (romanized as tyun).



The first line is a visualization of the output sequence. The following lines show how strongly the most interesting neurons fire:

  1. Cell #6 in the backward (right-to-left) LSTM
  2. Cell #147 in the forward (left-to-right) LSTM
  3. Neuron #37 in the hidden layer
  4. Neuron #78 in the concatenation layer



As you can see, cell #6 is active on the tyuns but not on the rest of the sequence. Cell #147 of the forward LSTM behaves in quite the opposite way: it is interested in everything except the tyuns.
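
A rough sketch of how such per-character activation traces can be extracted, assuming a Keras model like the one sketched earlier; the layer name and the encode() helper are placeholders:

```python
import tensorflow as tf

def cell_activations(model, one_hot_input, layer_name="bidirectional", cell_id=6):
    """
    Returns the activation of a single LSTM cell at every character position,
    which can then be rendered as a colored strip under the text
    (as in the Karpathy et al. visualizations).
    """
    layer = model.get_layer(layer_name)
    probe = tf.keras.Model(model.input, layer.output)
    # output shape: (1, sequence_length, n_units) -> (sequence_length,)
    return probe.predict(one_hot_input)[0, :, cell_id]

# Example (hypothetical): print each character with its activation
# acts = cell_activations(model, encode("tyun"), cell_id=6)
# for ch, a in zip("tyun", acts):
#     print(f"{ch}: {a:+.2f}")
```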

We know that in Armenian the t at the beginning of tyun should always become թ, so we thought that if a neuron is interested in tyuns, it might help decide whether the Latin t should be transliterated to թ or to տ. So we visualized the most important neurons for the t => թ case.



Indeed, cell #147 of the forward LSTM is also in the top 10.

Concluding remarks

The interpretability of neural networks remains a challenge in machine learning. Convolutional neural networks and long short-term memory networks perform well on many learning tasks, but few tools exist for understanding the inner workings of these systems. Transliteration is a good problem for analyzing what individual neurons actually do.

Our experiments show that even in simple cases a large number of neurons are involved in making a decision, but it is possible to find a subset of neurons that matter more than the others. On the other hand, most neurons are involved in several different decisions, depending on the context. This is to be expected, since the loss functions we use to train neural networks do not force the neurons to be independent of each other or interpretable. More recently, there have been attempts to use information-theoretic regularization methods in order to obtain more interpretable representations. It would be interesting to test these ideas on the transliteration task.
