Continual learning

It is all about improving.

The human brain can efficiently learn new tasks without dramatically forgetting the knowledge acquired from previously learned ones. When we learn to play the guitar, we don’t have to learn how to brush our teeth again, right? Somehow, the brain acquires new knowledge while protecting consolidated memories, so that we can still brush our teeth while we get better and better at playing the guitar. But that’s not all. When humans face an unknown task for the first time, they can reuse and exploit relevant memories (experience). Such experience allows humans to learn faster and faster as they face more tasks sharing common characteristics and concepts.

How far AI is

AI, however, is still far from being this efficient at learning new things. Due to the inherent properties of modern Machine Learning (ML) algorithms (mainly based on artificial neural networks and gradient descent optimization), AI often suffers from catastrophic forgetting. This phenomenon causes the AI to dramatically forget old memories whenever it acquires new knowledge. To deal with this, the AI must often be retrained on every single task it is meant to perform.
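To make the phenomenon concrete, here is a minimal sketch in PyTorch: a small network is trained on one toy task and then on a second one, after which its accuracy on the first typically collapses. The tasks, shapes, and hyperparameters are purely illustrative.

```python
import torch
import torch.nn as nn

def make_task(seed):
    # Hypothetical toy task: classify random 2-D points with a random linear rule.
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(512, 2, generator=g)
    w = torch.randn(2, generator=g)
    return x, (x @ w > 0).long()

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

task_a, task_b = make_task(0), make_task(1)
for x, y in (task_a, task_b):  # sequential training: task A first, then task B
    for _ in range(200):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# After training on task B, accuracy on task A typically drops sharply.
print("task A accuracy:", accuracy(model, *task_a))
print("task B accuracy:", accuracy(model, *task_b))
```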

Why it matters

Training an AI on a task requires storing enough data for the AI to effectively learn to perform it. As a consequence, maintaining and improving AI systems becomes more and more expensive, in terms of both storage and computing resources, as we add more tasks, concepts, or characteristics to the system. It is therefore essential to develop ML algorithms that allow the AI to efficiently acquire new knowledge without the need to retrain.

Our research at Neuraptic AI

Inspired by some of the brain functions involved in learning, we are working hard to allow AI systems to continuously learn in an efficient way. In particular, we are designing a dual-memory neural network formed of two learning systems with distinct specializations. One of them learns to perform the tasks at hand, while the other learns to feed the former with the most relevant data so that it can protect memories and avoid forgetting tasks. While this architecture doesn’t get rid of retraining entirely, it significantly reduces storage and computing costs by minimizing the amount of data needed to retain consolidated memories.
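The snippet below is a highly simplified sketch of this general idea, not the actual architecture; all names and shapes are illustrative. A selector network scores stored examples, and only the top-scoring ones are replayed to the task learner.

```python
import torch
import torch.nn as nn

class MemorySelector(nn.Module):
    # Hypothetical scorer: maps a stored example to a relevance score in (0, 1).
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x):
        return torch.sigmoid(self.net(x)).squeeze(-1)

def select_replay(selector, memory_x, memory_y, k):
    # Keep only the k most relevant stored examples, so the task learner can
    # protect consolidated memories while storing and replaying far less data.
    scores = selector(memory_x)
    idx = scores.topk(k).indices
    return memory_x[idx], memory_y[idx]
```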

Our research is based on the Complementary Learning Systems (CLS) framework proposed by McClelland et al. in the field of neuroscience. CLS suggests that the hippocampus would allow for the rapid learning of novel information by performing short-term adaptations with fast learning rates. Conversely, the neocortex would be characterized by a slow learning rate and would build overlapping representations of the learned knowledge. The two learning systems would interact with each other by replaying recent memories for their long-term retention in the neocortex.
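In code, the CLS intuition might look like the following sketch, under toy assumptions throughout: two linear models stand in for the two systems, and a plain list serves as the replay buffer.

```python
import torch
import torch.nn as nn

fast = nn.Linear(2, 2)   # hippocampus-like system: fast learning rate
slow = nn.Linear(2, 2)   # neocortex-like system: slow learning rate
fast_opt = torch.optim.SGD(fast.parameters(), lr=0.5)
slow_opt = torch.optim.SGD(slow.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
replay_buffer = []       # recent experience awaiting consolidation

def observe(x, y):
    # Rapid short-term adaptation in the fast system.
    fast_opt.zero_grad()
    loss_fn(fast(x), y).backward()
    fast_opt.step()
    replay_buffer.append((x, y))

def consolidate():
    # Replay recent memories into the slow system for long-term retention.
    for x, y in replay_buffer:
        slow_opt.zero_grad()
        loss_fn(slow(x), y).backward()
        slow_opt.step()
    replay_buffer.clear()
```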

Other findings from neuroscience have also served as inspiration for our research. The Hippocampal Memory Indexing theory suggests that the hippocampus may function as an index of memories. Such an index would be represented in our model by the network specialized in feeding the system with the most relevant data. This network would learn to reinforce the index of high-impact memories and diminish the index of low-impact memories.
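A minimal sketch of such index reinforcement follows, assuming each memory's impact is available as a scalar signal; that measurement, like everything else here, is an assumption made for illustration rather than a published detail of the model.

```python
import torch

index = torch.zeros(1000)  # one index weight per stored memory (illustrative size)

def update_index(memory_id, impact, rate=0.1):
    # Move the index weight toward the observed impact: the index of
    # high-impact memories is reinforced, that of low-impact memories decays.
    index[memory_id] += rate * (impact - index[memory_id])
```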

Want to know more?

Different approaches have been proposed to mitigate catastrophic forgetting and/or reduce continual learning costs.

Replay

Replay methods interleave data from previous and new tasks, either by explicitly storing real data or by generating synthetic data with a generative model (pseudo-replay). Jointly retraining all tasks on interleaved data allows the AI to learn and keep the overlapping representations and to protect relevant memories. This approach doesn’t usually require any modification to existing ML algorithms, as it only changes the data provided to the AI for training.
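A minimal sketch of replay with an explicit buffer of real examples (pseudo-replay would instead draw the old batch from a trained generative model; all names here are illustrative):

```python
import random
import torch

def train_with_replay(model, opt, loss_fn, new_batches, buffer, replay_k=32):
    for x_new, y_new in new_batches:
        if buffer:
            # Interleave a random sample of old memories with the new batch.
            old = random.sample(buffer, min(replay_k, len(buffer)))
            x_old = torch.stack([x for x, _ in old])
            y_old = torch.stack([y for _, y in old])
            x, y = torch.cat([x_new, x_old]), torch.cat([y_new, y_old])
        else:
            x, y = x_new, y_new
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        # Store the new examples so that future tasks can replay them.
        buffer.extend(zip(x_new, y_new))
```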

Regularization

Regularization methods focus on restricting updates to model parameters in order to protect consolidated memories without retraining previously learned tasks. High regularization coefficients (strong restrictions) lead to stable memories (no forgetting) but poor plasticity (no learning), and vice versa. This trade-off is known as the stability-plasticity dilemma.

In the figure below, parameter updates are represented by dashed lines. As the figure shows, parameters encoding consolidated memories are updated less frequently, while parameters encoding new knowledge are updated more frequently.
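One well-known member of this family is Elastic Weight Consolidation (EWC, Kirkpatrick et al., 2017), which anchors each parameter to its value after the previous task, weighted by an estimate of its importance (in EWC, the diagonal of the Fisher information). A minimal sketch, with `old_params` and `importance` as dictionaries keyed by parameter name:

```python
import torch

def ewc_penalty(model, old_params, importance, lam=100.0):
    # lam is the regularization coefficient: large values favor stability
    # (no forgetting), small values favor plasticity (easier new learning).
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# Training on a new task then minimizes:
#     total_loss = task_loss + ewc_penalty(model, old_params, fisher)
```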

Parameter isolation

Parameter isolation methods devote a subset of parameters to each task using one of the following architectures:

    Fixed

    When a new task is presented to the AI, task-specific parameters are added to the model. To learn the new task, the subsets of parameters devoted to previously learned tasks are frozen (kept untouched), while the remaining parameters are updated.

    Dynamic

    For each new task, the model grows by adding new parameters beyond the task-specific ones (a process known as neurogenesis). When learning the new task, all previous parameters (both task-specific ones and those shared by all tasks) are frozen, while the new parameters are updated.

    Dynamic architectures may offer more protection against catastrophic forgetting than fixed architectures, but they may need to grow indefinitely to continuously learn new tasks, which is infeasible in many cases. A minimal sketch of the fixed variant follows below.
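The sketch below illustrates the fixed variant under simple assumptions: one output head per task, with everything trained so far frozen when a new head is added. A dynamic variant would additionally grow the shared trunk with new neurons before freezing the old ones. All class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, in_dim=16, hidden=32):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList()
        self.hidden = hidden

    def add_task(self, n_classes):
        # Freeze the parameters devoted to previously learned tasks; the
        # shared trunk is trained with the first task and then kept untouched.
        if self.heads:
            for p in self.parameters():
                p.requires_grad = False
        self.heads.append(nn.Linear(self.hidden, n_classes))

    def forward(self, x, task_id):
        return self.heads[task_id](self.trunk(x))

model = MultiHeadNet()
model.add_task(n_classes=5)  # task 0: trunk + head 0 are trainable
model.add_task(n_classes=3)  # task 1: only the new head is trainable
opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.1)
```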