Whether a concept is presented literally, symbolically, or conceptually, these neurons respond to it.
Reported by Heart of the Machine. Contributors: Du Wei, Demon King.
OpenAI researchers have found "real" neurons in the artificial neural network CLIP, a mechanism that helps explain how AI models can categorize visual representations with such surprising accuracy. The researchers say this is an important finding that could have major implications for the study of artificial brains and even the human brain.
This may mean that general-purpose AI is not as far away as we think. But neurons that grasp abstract concepts can also be led into absurd interpretations.
Fifteen years ago, Quiroga et al. discovered that the human brain contains multimodal neurons. These neurons respond to clusters of abstract concepts centered on a common high-level theme rather than to any specific visual feature. One of the most famous is the Halle Berry neuron, which responds to photographs, drawings, and text referring to the American actress Halle Berry.
In early January, OpenAI introduced CLIP, a general-purpose vision system that matches ResNet-50 performance and outperforms existing vision systems on some of the most challenging datasets. Given a set of linguistic categories, CLIP can instantly match an image to one of them, without the category-specific fine-tuning that standard neural networks require.
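For readers who want to try this zero-shot behavior themselves, the open-source github.com/openai/CLIP package (linked at the end of this article) provides the model. The sketch below is illustrative only: the image path and label set are arbitrary placeholders, not taken from OpenAI's experiments.

```python
# Minimal sketch of CLIP zero-shot classification with the open-source
# github.com/openai/CLIP package. "example.jpg" and the label set are
# illustrative placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

# Any set of linguistic categories works; no fine-tuning is required.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a spider"]
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each prompt, softmaxed.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```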
Recently, OpenAI made another surprising discovery: multimodal neurons appear in CLIP models! These neurons respond to the same concept whether it is presented as text, as a symbol, or conceptually. For example, a "Spider-Man" neuron (analogous to the Halle Berry neuron) responds to a photo of a spider, an image of the text "spider," and the comic book character "Spider-Man."
The neurons found in CLIP function much like the Halle Berry neuron in the human brain, a level of abstraction not seen in earlier artificial neurons.
This discovery offers a clue to a mechanism of abstraction that may be common to both synthetic and natural visual systems. The researchers found that the highest layers of CLIP organize images into loose semantic collections of ideas, which provides a simple explanation for both the model's versatility and the compactness of its representations.
This finding may explain CLIP's classification accuracy, and OpenAI says it is an important step toward understanding the associations and biases that large models learn during training.
So what does a multimodal neuron in CLIP actually look like? OpenAI researchers explored this question with interpretability tools and found that the high-level concepts in CLIP's weights span much of the human visual vocabulary: geographic regions, facial expressions, religious imagery, celebrities, and more. By tracing the influence of individual neurons, we can learn more about how CLIP performs classification.
Multimodal neurons in CLIP
OpenAI's paper, Multimodal Neurons in Artificial Neural Networks, builds on nearly a decade of research into interpreting convolutional networks, beginning with the observation that many classical interpretability techniques can be applied directly to CLIP. OpenAI used two tools to understand the model's activations: feature visualization (maximizing a neuron's activation through gradient-based optimization of the input) and dataset examples (examining which images in a dataset most strongly activate a neuron).
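As a rough illustration of the first tool, the sketch below runs gradient-based feature visualization against one channel of an intermediate CLIP layer. The layer choice, channel index, and optimization settings are assumptions made for demonstration; OpenAI's actual tooling uses richer image parameterizations and transformations that are omitted here.

```python
# Sketch of feature visualization: optimize an input image so that one
# channel of an intermediate CLIP activation fires strongly. Layer, channel,
# learning rate and step count are illustrative assumptions; normalization
# and the robustness transforms used by real tooling are omitted.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("RN50x4", device=device)
model = model.float().eval()  # fp32 keeps the gradient computation simple

activations = {}
# Hook an intermediate layer of the visual backbone (choice is illustrative).
model.visual.layer4.register_forward_hook(
    lambda module, inputs, output: activations.update(out=output)
)

channel = 89  # hypothetical neuron index
res = model.visual.input_resolution
img = torch.rand(1, 3, res, res, device=device, requires_grad=True)
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(256):
    optimizer.zero_grad()
    model.encode_image(img)                        # forward pass fills the hook
    loss = -activations["out"][0, channel].mean()  # maximize mean activation
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        img.clamp_(0.0, 1.0)                       # keep pixels in a valid range
```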
Using these simple methods, OpenAI found that most of the neurons in CLIP RN50x4 (a ResNet-50 scaled up 4x using the EfficientNet scaling rule) could be interpreted. These neurons appear to be extreme examples of "multi-faceted neurons," neurons that respond to multiple distinct cases, but at a higher level of abstraction.
For example, the neurons for summer and winter respond across many facets, including text, faces, logos, buildings, interiors, nature, and poses:
Neurons for regions such as the US and India likewise respond across text, faces, logos, buildings, interiors, nature, and poses:
OpenAI was surprised to find that many of these categories appear to mirror the neurons recorded with deep intracranial electrodes in the medial temporal lobe of epilepsy patients, which include neurons that respond to emotions, animals, and famous people.
OpenAI's study of CLIP, however, uncovered many more of these strange yet wonderful abstractions, including neurons that appear to count, neurons that respond to artistic styles, and even neurons that respond to images showing traces of digital alteration.
What are multimodal neurons made of?
These multimodal neurons help us understand how CLIP performs classification. With a sparse linear probe, it is easy to inspect CLIP's weights and see which concepts combine to produce the final classification on the ImageNet dataset.
As shown below, a piggy bank appears to be composed of a "finance" neuron and a "porcelain" neuron. The "Spider-Man" neuron likewise acts as a spider detector and plays an important role in classifying the barn spider class.
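To make the probing step concrete, here is a hedged sketch of what a sparse linear probe on frozen CLIP features can look like, using an L1-regularized logistic regression from scikit-learn. The training tensors (train_images, train_labels) and the class index are hypothetical placeholders, not OpenAI's actual setup.

```python
# Sketch of a sparse linear probe on frozen CLIP image features. The L1
# penalty drives most coefficients to zero, so the surviving weights hint at
# which feature dimensions a class relies on. `train_images`, `train_labels`
# and `class_index` are hypothetical placeholders.
import numpy as np
import torch
import clip
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

def featurize(images: torch.Tensor) -> np.ndarray:
    """Encode a batch of preprocessed images into frozen CLIP features."""
    with torch.no_grad():
        feats = model.encode_image(images.to(device))
    return feats.float().cpu().numpy()

X = featurize(train_images)          # assumed: preprocessed image batch
y = train_labels                     # assumed: integer class labels

probe = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=1000)
probe.fit(X, y)

class_index = 0                      # hypothetical class of interest
weights = probe.coef_[class_index]
top = np.argsort(-np.abs(weights))[:10]
print("Most influential CLIP feature dimensions for this class:", top)
```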
A key finding for text classification is that these concepts are contained within neurons in a manner similar to the word2vec objective function: they are almost linear. The concepts therefore form a simple algebra that behaves much like a linear probe. By linearizing attention, we can also inspect any sentence in the same way, as shown in the figure below:
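The "almost linear" behavior can be illustrated with a small experiment on CLIP's text embeddings, in the spirit of word2vec vector arithmetic. The specific word choices below are illustrative, not drawn from the paper.

```python
# Sketch of word2vec-style arithmetic on CLIP text embeddings: compose two
# concept vectors and see which candidate phrase the sum is closest to.
# The word choices are illustrative only.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("RN50x4", device=device)

def embed(text: str) -> torch.Tensor:
    """Return a unit-normalized CLIP text embedding for one phrase."""
    with torch.no_grad():
        v = model.encode_text(clip.tokenize([text]).to(device))[0]
    return v / v.norm()

composed = embed("finance") + embed("porcelain")
composed = composed / composed.norm()

for candidate in ["a piggy bank", "a dog", "a spider"]:
    similarity = (composed @ embed(candidate)).item()
    print(f"{candidate}: cosine similarity {similarity:.3f}")
```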
False abstraction
The degree of abstraction in CLIP exposes a new vector of attack that OpenAI believes was absent from previous systems. As in many deep networks, the representations at the highest layers of the model are dominated by such high-level abstractions. What distinguishes CLIP is a matter of degree: the ability of its multimodal neurons to generalize across words and symbols can be a double-edged sword.
Through a series of carefully designed experiments, OpenAI demonstrated that this reductive behavior can be exploited to trick the model into making absurd classifications. The researchers also observed that the firing of CLIP's neurons can often be controlled by their response to images of text, which provides a simple vector for attacking the model.
The "finance" neuron, for example, responds both to piggy banks and to the string "$$$". By forcing the finance neuron to fire, we can trick the model into classifying a dog as a piggy bank. The details are shown in the figure below:
Attacks in the wild
OpenAI calls these "typographic attacks." By exploiting the model's robust ability to read text, the researchers found that even photographs of handwritten text can often fool the model. As shown in the image below, a Granny Smith apple with the word "iPod" stuck to its surface is misclassified as an iPod.
The researchers believe such attacks may also take subtler, less conspicuous forms. An image given to CLIP is abstracted in many subtle and sophisticated ways, and these abstractions may over-abstract common patterns: they oversimplify and, in doing so, overgeneralize.
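A typographic attack of this kind is straightforward to reproduce in outline: overlay a text label on an image and re-run zero-shot classification. The sketch below assumes a local image file (apple.jpg) and a crude text overlay; real outcomes depend on fonts, placement, and model version.

```python
# Sketch of a typographic attack: classify a clean image, paste a text label
# onto it, and classify again. "apple.jpg", the prompts, and the overlay
# position are illustrative assumptions.
import torch
import clip
from PIL import Image, ImageDraw

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x4", device=device)

def classify(pil_image, labels):
    """Zero-shot classify a PIL image against a list of label strings."""
    image = preprocess(pil_image).unsqueeze(0).to(device)
    text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)[0]
    return dict(zip(labels, probs.tolist()))

labels = ["a Granny Smith apple", "an iPod"]
apple = Image.open("apple.jpg").convert("RGB")
print("clean image:", classify(apple, labels))

# Overlay the word "iPod" roughly where a paper label would sit.
attacked = apple.copy()
ImageDraw.Draw(attacked).text((20, 20), "iPod", fill="black")
print("with text overlay:", classify(attacked, labels))
```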
Bias and overgeneralization
The CLIP model is trained on curated images from the internet, but it still inherits many unexamined biases and associations. The researchers found that many of the associations in CLIP are benign, but some are harmful, such as denigrating associations with particular individuals or groups. For example, a "Middle East" neuron is associated with terrorism, an "immigration" neuron responds to Latin America, and one neuron even fires for both dark-skinned people and gorillas, echoing the unacceptable photo-tagging incidents seen in earlier models.
These associations pose great challenges to deploying such powerful visual systems. Whether the model is fine-tuned or used zero-shot, these biases and associations are likely to remain in the system and to affect deployment in both visible and invisible ways. Many biased behaviors may be hard to predict in advance, and measuring and correcting them can be very difficult. OpenAI believes that interpretability tools like these can help practitioners spot problematic associations ahead of time.
OpenAI says its understanding of the CLIP model is still evolving, and whether larger versions will be released has not been decided.
The research may open up new avenues for AI technology and even neuroscience research. “Because we don’t understand how neural networks work, it’s hard to understand why they go wrong,” said Ilya Sutskever, co-founder and chief scientist at OpenAI. “We don’t know if they’re reliable or if they have some vulnerabilities that weren’t found in the tests.”
In addition, OpenAI has released tools for understanding CLIP models, such as the OpenAI Microscope, which was recently updated with feature visualizations, dataset examples, and text feature visualizations for every neuron in CLIP RN50x4. See: microscope.openai.com/models
Photo source: microscope.openai.com/models/cont…
The researchers have also published weights for CLIP RN50x4 and RN101 in the GitHub project: github.com/openai/CLIP
Original link: openai.com/blog/multim…
Paper: distill.pub/2021/multim…