For a decade now, many of the most impressive artificial intelligence systems have been taught using a huge inventory of labeled data.
An image might be labeled “tabby cat” or “tiger cat,” for example, to “train” an artificial neural network to correctly distinguish a tabby from a tiger. The strategy has been both spectacularly successful and woefully deficient.
Such “supervised” training requires data laboriously labeled by humans, and the neural networks often take shortcuts, learning to associate the labels with minimal and sometimes superficial information. For example, a neural network might use the presence of grass to recognize a photo of a cow, because cows are typically photographed in fields.
“We are raising a generation of algorithms that are like undergrads [who] didn’t come to class the whole semester and then the night before the final, they’re cramming,” said Alexei Efros, a computer scientist at the University of California, Berkeley. “They don’t really learn the material, but they do well on the test.”
For researchers interested in the intersection of animal and machine intelligence, moreover, this “supervised learning” might be limited in what it can reveal about biological brains. Animals — including humans — don’t use labeled data sets to learn. For the most part, they explore the environment on their own, and in doing so, they gain a rich and robust understanding of the world.
Now some computational neuroscientists have begun to explore neural networks that have been trained with little or no human-labeled data.
These “self-supervised learning” algorithms have proved enormously successful at modeling human language and, more recently, at image recognition.
In recent work, computational models of the mammalian visual and auditory systems built using self-supervised learning have shown a closer correspondence to brain function than their supervised-learning counterparts. To some neuroscientists, it seems as if the artificial networks are beginning to reveal some of the actual methods our brains use to learn.
Flawed Supervision
Brain models inspired by artificial neural networks came of age about 10 years ago, around the same time that a neural network named AlexNet revolutionized the task of classifying unknown images.
That network, like all neural networks, was made of layers of artificial neurons, computational units that form connections to one another that can vary in strength, or “weight.” If a neural network fails to classify an image correctly, the learning algorithm updates the weights of the connections between the neurons to make that misclassification less likely in the next round of training.
The algorithm repeats this process many times with all the training images, tweaking weights, until the network’s error rate is acceptably low.
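The training loop described above can be sketched in a few lines. This is a toy illustration, not how AlexNet was trained: it uses a single artificial neuron and the classic perceptron update rule as a stand-in for the learning algorithm. The names `predict`, `train` and `LABELED_EXAMPLES`, the tiny data set and the learning rate are all assumptions made for this example.

```python
# Toy sketch: one artificial neuron trained with the perceptron rule.
# All names, data, and settings here are hypothetical illustrations.

def predict(weights, bias, features):
    """Return 1 if the weighted sum of the inputs is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0

def train(examples, learning_rate=1.0, max_rounds=100, target_error=0.0):
    """Pass over the labeled examples repeatedly until the error rate is low enough."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    error_rate = 1.0
    for _ in range(max_rounds):
        mistakes = 0
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error != 0:
                # Misclassified: nudge each weight so this mistake is less likely next round.
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        error_rate = mistakes / len(examples)
        if error_rate <= target_error:
            break
    return weights, bias, error_rate

# Made-up "human-labeled" data: (features, label). Label is 1 only when both features are on.
LABELED_EXAMPLES = [
    ([0.0, 0.0], 0),
    ([0.0, 1.0], 0),
    ([1.0, 0.0], 0),
    ([1.0, 1.0], 1),
]

weights, bias, error_rate = train(LABELED_EXAMPLES)
```

Every correction in this loop depends on a label that someone supplied in advance, which is the dependence the rest of the article is concerned with.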
Around the same time, neuroscientists developed the first computational models of the primate visual system, using neural networks like AlexNet and its successors.
The union looked promising: When monkeys and artificial neural nets were shown the same images, for example, the activity of the real neurons and the artificial neurons showed an intriguing correspondence. Artificial models of hearing and odor detection followed.
But as the field progressed, researchers realized the limitations of supervised training. For instance, in 2017, Leon Gatys, a computer scientist then at the University of Tübingen in Germany, and his colleagues took an image of a Ford Model T, then overlaid a leopard skin pattern across the photo, generating a bizarre but easily recognizable image.
A leading artificial neural network correctly classified the original image as a Model T, but considered the modified image a leopard. It had fixated on the texture and had no understanding of the shape of a car (or a leopard, for that matter).
Self-supervised learning strategies are designed to avoid such problems. In this approach, humans don’t label the data. Rather, “the labels come from the data itself,” said Friedemann Zenke, a computational neuroscientist at the Friedrich Miescher Institute for Biomedical Research in Basel, Switzerland.
Self-supervised algorithms essentially create gaps in the data and ask the neural network to fill in the blanks. In a so-called large language model, for instance, the training algorithm will show the neural network the first few words of a sentence and ask it to predict the next word. When trained with a massive corpus of text gleaned from the internet, the model appears to learn the syntactic structure of the language, demonstrating impressive linguistic ability — all without external labels or supervision.
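To make "the labels come from the data itself" concrete, here is a minimal sketch of how next-word training pairs can be cut out of raw text with no human labeling. A real large language model uses a neural network. This sketch swaps in a simple count table so it stays short, and the function names, the `context_size` parameter and the one-line corpus are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def make_training_pairs(text, context_size=2):
    """Slide over the text: each run of words becomes an input, the following word its label."""
    words = text.split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        pairs.append((context, words[i]))
    return pairs

def fit_counts(pairs):
    """Stand-in for a neural network: count which word follows each context."""
    counts = defaultdict(Counter)
    for context, next_word in pairs:
        counts[context][next_word] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequently seen next word, or None for an unseen context."""
    options = counts.get(tuple(context))
    if not options:
        return None
    return options.most_common(1)[0][0]

# Hypothetical miniature corpus; nobody annotated it.
CORPUS = "the cat sat on the mat and the cat sat on the sofa"
pairs = make_training_pairs(CORPUS)
model = fit_counts(pairs)
```

With this corpus, `predict_next(model, ["the", "cat"])` returns `"sat"`, because that is the only word that ever follows "the cat" in the text.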
A similar effort is underway in computer vision. In late 2021, Kaiming He and colleagues revealed their “masked auto-encoder,” which builds on a technique pioneered by Efros’ team in 2016. The self-supervised learning algorithm randomly masks images, obscuring almost three-quarters of each one. The masked auto-encoder turns the unmasked portions into latent representations — compressed mathematical descriptions that contain important information about an object. (In the case of an image, the latent representation might be a mathematical description that captures, among other things, the shape of an object in the image.) A decoder then converts those representations back into full images.
The self-supervised learning algorithm trains the encoder-decoder combination to turn masked images into their full versions. Any differences between the real images and the reconstructed ones get fed back into the system to help it learn. This process repeats for a set of training images until the system’s error rate is suitably low. In one example, when a trained masked auto-encoder was shown a previously unseen image of a bus with almost 80% of it obscured, the system successfully reconstructed the structure of the bus.
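The masking and error-measuring steps can be sketched as follows. This is not the published masked auto-encoder: the encoder and decoder are replaced by a deliberately crude placeholder that guesses the average of the visible patches, and each image patch is reduced to a single brightness value. The names `split_mask`, `placeholder_reconstruct`, `reconstruction_error` and `IMAGE_PATCHES` are hypothetical, and measuring the error on the hidden patches only is a choice made for this sketch.

```python
import random

def split_mask(patches, mask_ratio=0.75, seed=0):
    """Randomly choose which patch indices are hidden and which stay visible."""
    rng = random.Random(seed)
    indices = list(range(len(patches)))
    rng.shuffle(indices)
    num_masked = int(len(patches) * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked

def placeholder_reconstruct(patches, visible, masked):
    """Stand-in for the encoder-decoder: guess every hidden patch as the mean visible value."""
    mean_visible = sum(patches[i] for i in visible) / len(visible)
    return {i: mean_visible for i in masked}

def reconstruction_error(patches, guesses):
    """Mean squared difference between the true and the guessed hidden patches."""
    return sum((patches[i] - g) ** 2 for i, g in guesses.items()) / len(guesses)

# Hypothetical "image": 16 patches, each summarized by one brightness value.
IMAGE_PATCHES = [0.1 * i for i in range(16)]

visible, masked = split_mask(IMAGE_PATCHES)
guesses = placeholder_reconstruct(IMAGE_PATCHES, visible, masked)
error = reconstruction_error(IMAGE_PATCHES, guesses)
```

In a real system the error would be fed back to adjust the encoder and decoder weights. Here it is only computed, to show that the training signal comes from the hidden part of the image rather than from a human label.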
“This is a very, very impressive result,” said Efros.
The latent representations created in a system such as this appear to contain substantially deeper information than previous strategies could include. The system might learn the shape of a car, for example — or a leopard — and not just their patterns. “And this is really the fundamental idea of self-supervised learning — you build up your knowledge from the bottom up,” said Efros. No last-minute cramming to pass tests.
Sources:
Toward a realistic model of speech processing in the brain with self-supervised learning. arXiv, published 3 Jun 2022. https://doi.org/10.48550/arXiv.2206.01685