Towards Adaptive AI with Continual Learning

Machine learning (ML) is a type of Artificial Intelligence (AI) in which machines learn from huge amounts of data, such as images or text, containing highly complex patterns. Previously, these patterns had to be discovered through hand-crafted rules of thumb; today, the power of machine learning algorithms lies in finding these patterns automatically.

Traditional machine learning typically first learns a model using all the available examples in the data; the model is then deployed for real-world use. This means that once a model has finished learning, it remains unchanged in practice. This static nature is problematic because it doesn't suit our ever-changing world. For example, with the rise of autonomous vehicles and the Internet of Things (IoT), data becomes available at unprecedented rates and continuously changes over time. This leaves many contemporary machine learning approaches in the dark, as a model that is static at deployment can't put the never-ending flow of data to use. Continual Learning aims to tackle this problem with adaptive machine learning models that adjust flexibly and continuously to the never-ending stream of data.

Representing the world

Whenever we think about a concept, say a donut, we can imagine such a delight in many varieties, differing in properties such as color and size. What's more, if we encountered a sugar-coated, torus-shaped pastry in a color we had never seen before, we would still be able to assign it to the same semantic concept. This is because our human brains often use representativeness heuristics, where each concept is represented by a most representative prototype.

Neural networks are built from multiple consecutive layers of neuron-like units, loosely based on neurons in the human brain. Typically, many consecutive layers are used, which is why this is often referred to as deep learning. This hierarchy of layers allows the network to build increasingly complex representations. In the case of the donut example, early layers would perform low-level processing, detecting properties such as curved lines and color, whereas later layers combine these properties to form the representation of the donut. If we now observe a lot of donut images, we can combine all of their representations to make the donut prototype. You can think of this prototype as the most generic, plain donut that any other donut would resemble.
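To make this concrete, here is a minimal sketch of how a class prototype could be computed as the average representation of a concept's images. The tiny `encoder`, the input size, and the random tensors standing in for donut photos are all illustrative assumptions, not part of the article's setup:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a real network: a tiny stack of layers that
# maps 3x32x32 images to 64-dimensional representations. A real system
# would use a much deeper convolutional backbone.
encoder = nn.Sequential(
    nn.Flatten(),                      # image -> flat vector of pixels
    nn.Linear(3 * 32 * 32, 256),       # earlier layer: low-level features
    nn.ReLU(),
    nn.Linear(256, 64),                # later layer: concept-level features
)

def class_prototype(images: torch.Tensor) -> torch.Tensor:
    """Average the representations of all images of one concept."""
    with torch.no_grad():
        features = encoder(images)     # shape (N, 64)
    return features.mean(dim=0)        # the "most generic" example

# 100 random tensors as stand-ins for donut photos -> one donut prototype.
donut_images = torch.rand(100, 3, 32, 32)
donut_prototype = class_prototype(donut_images)
```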

Eventually, we want such a representative prototype for each of the concepts in our data. In fact, these prototypes capture the most important characteristics of the related concept so well that when we give our network a new image it has never seen before (e.g. take a picture with your smartphone), it can tell you which concept prototype the image corresponds to (you took a picture of a donut). This prediction of previously unseen images is called the testing phase.
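As a sketch of this testing phase, a new image can be assigned to whichever prototype its representation lies closest to. This reuses the toy `encoder` and `donut_prototype` from the sketch above; Euclidean distance is one common choice, cosine similarity another:

```python
def classify(image: torch.Tensor, prototypes: dict[str, torch.Tensor]) -> str:
    """Predict the concept whose prototype is closest to the image's representation."""
    with torch.no_grad():
        feature = encoder(image.unsqueeze(0)).squeeze(0)
    # Smallest Euclidean distance to a prototype wins.
    return min(prototypes, key=lambda name: torch.dist(feature, prototypes[name]).item())

prototypes = {"donut": donut_prototype}   # extend with prototypes of other concepts
photo = torch.rand(3, 32, 32)             # stand-in for your smartphone picture
print(classify(photo, prototypes))        # -> "donut" (the only option here)
```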

But in order to obtain these prototypes, the neural network needs to learn in some way from the images it is presented with; this process is called the training phase. First of all, the neural network needs an indication of which concept an image contains, so that the corresponding prototype can be identified. Typically, a human supervisor comes into play, annotating each image with a label that indicates which concept, and hence which prototype, the image belongs to. The network is then commonly trained by minimizing a Cross-Entropy error, which pushes each image's representation towards the prototype of its labelled concept. The catch is that when the network later trains on new concepts only, minimizing this same error reshapes the internal representations, so images of earlier concepts no longer land near their stored prototypes.
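A minimal sketch of such a training step, assuming the toy `encoder` from above plus a hypothetical linear classifier `head` with one output per concept; the tensors are again random stand-ins for labelled images:

```python
import torch.nn.functional as F

head = nn.Linear(64, 2)   # hypothetical classifier head: 0 = donut, 1 = pretzel
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(head.parameters()), lr=0.01
)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised update minimizing the Cross-Entropy error."""
    logits = head(encoder(images))
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()                    # representations shift to fit the labels
    optimizer.step()
    return loss.item()

images = torch.rand(16, 3, 32, 32)           # a batch of pastry images
labels = torch.zeros(16, dtype=torch.long)   # human-annotated: all donuts here
train_step(images, labels)
```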

In other words, the old prototypes lose their ability to serve as a good approximation for detecting future instances of their related concepts. Hence, the old task concepts are prone to catastrophic forgetting. This indicates that for Continual Learning, training clearly needs an alternative to this Cross-Entropy error, one that maintains the quality of our prototypes.
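One rough way to see this effect, continuing the sketch above: store a prototype, train on a new concept only, and measure how far the old images' representations have moved away from it. This is an illustration, not an experiment from the article:

```python
def prototype_drift(old_prototype: torch.Tensor, images: torch.Tensor) -> float:
    """Distance between a stored prototype and where its images land now."""
    with torch.no_grad():
        current = encoder(images).mean(dim=0)
    return torch.dist(old_prototype, current).item()

stored = donut_prototype.clone()              # prototype frozen at old-task time
pretzel_images = torch.rand(16, 3, 32, 32)    # the new concept, random stand-ins
pretzel_labels = torch.ones(16, dtype=torch.long)
for _ in range(50):                           # train on the new concept only
    train_step(pretzel_images, pretzel_labels)
print(prototype_drift(stored, donut_images))  # grows: the old prototype is stale
```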

Evolving prototypes

The primary goal of this Continual Learning approach is to keep the prototypes for all concepts up-to-date at all times. As the knowledge about our previous concepts resides within our adaptive prototypes, we don't need to store all the data for retraining as in standard machine learning approaches. Therefore, if we decide to add images of new concepts, such as pretzels, to our collection of pastry images, we don't need to retrain the neural network from scratch; instead, it can learn from them much like we humans do, without catastrophic forgetting.
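A hedged sketch of one simple way this could look in code, not a specific published method: keep a running sum and count per concept, so prototypes can be updated from new images alone, without revisiting any stored data.

```python
class PrototypeStore:
    """Running-mean prototypes that evolve with the data stream,
    without keeping any past images around."""

    def __init__(self):
        self.sums: dict[str, torch.Tensor] = {}
        self.counts: dict[str, int] = {}

    def update(self, concept: str, images: torch.Tensor) -> None:
        with torch.no_grad():
            features = encoder(images)
        if concept not in self.sums:
            self.sums[concept] = torch.zeros(features.shape[1])
            self.counts[concept] = 0
        self.sums[concept] += features.sum(dim=0)   # fold new images in
        self.counts[concept] += len(images)

    def prototype(self, concept: str) -> torch.Tensor:
        return self.sums[concept] / self.counts[concept]

store = PrototypeStore()
store.update("donut", donut_images)                  # learned earlier
store.update("pretzel", torch.rand(50, 3, 32, 32))   # new concept, no retraining
```

Note that this only keeps the prototypes themselves current; keeping the encoder's representations stable as it learns new concepts is the harder part, and it is exactly what Continual Learning methods aim to address.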

Overall, the Continual Learning approach described here aims to create a flexible and adaptive system capable of learning new concepts efficiently without sacrificing the knowledge it has gained from past experiences. This is particularly useful in scenarios where data is continually changing or expanding, and the model needs to adapt to novel information without starting from scratch.

By Asif Raza
