
Continual Learning: The Three Common Scenarios


Plus paper recommendations

As the training costs of machine learning models rise [1], continual learning (CL) emerges as a useful countermeasure. In CL, a machine learning model (e.g., an LLM such as GPT) is trained on a continually arriving stream of data (e.g., text data). Crucially, in CL the data cannot be stored, so only the most recent data is available for training. The main challenge is to train on the current data (often called a task) without forgetting the knowledge learned from old tasks. Not forgetting old knowledge is critical because at test time, the model is evaluated on the test data of all tasks seen so far. This challenge is known in the literature as catastrophic forgetting and is part of the stability-plasticity tradeoff.

On the one hand, the stability-plasticity tradeoff refers to keeping network parameters (e.g., layer weights) stable so as not to forget (stability). On the other hand, it refers to allowing parameter changes in order to learn from novel tasks (plasticity). CL methods approach this tradeoff from multiple directions, which I have written about in a previous article.

Photo by lionel mermoz on Unsplash

The focus of today’s article is on the fundamental scenarios that repeatedly appear in CL research: class-incremental, domain-incremental, and task-incremental learning. The choice of scenario strongly affects the difficulty of the problem. In the remainder of this article, I will introduce each scenario in detail. As in my earlier writings, I close with a list of recommended papers to explore the topic further.

Class-incremental learning

Conceptual visualization of the class-incremental learning scenario: The second task introduces novel classes (car, airplane), for which additional output neurons are created. Image by the author; created with draw.io.

Class-incremental learning (CIL) is a continual learning scenario in which each task (i.e., each new portion of data) introduces novel classes. These classes are distinct from all previously seen classes. A real-world scenario would be a car that drives from the city to the outback. As the surroundings change, the onboard system likely encounters a completely different set of objects (pedestrians vs. animals, buildings vs. trees, streets vs. roads, …).

To incorporate the new data into a machine learning model, the output layer of the model is commonly expanded: m new neurons are added, one per newly encountered class.
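As a concrete illustration, here is a minimal sketch of such a head expansion, assuming a PyTorch model with a linear output layer (the function name and class counts are illustrative, not taken from a specific library):

import torch
import torch.nn as nn

def expand_output_layer(old_head: nn.Linear, num_new_classes: int) -> nn.Linear:
    """Return a larger linear head, keeping the weights of the already learned classes."""
    in_features = old_head.in_features
    old_out = old_head.out_features
    new_head = nn.Linear(in_features, old_out + num_new_classes)
    with torch.no_grad():
        # Copy the weights and biases of the previously learned classes.
        new_head.weight[:old_out] = old_head.weight
        new_head.bias[:old_out] = old_head.bias
    return new_head

# Usage: task 1 had 2 classes (pedestrian, building); task 2 adds 2 more (car, airplane).
head = nn.Linear(128, 2)                               # output head after task 1
head = expand_output_layer(head, num_new_classes=2)    # now 4 output neurons
print(head.out_features)                               # -> 4

The old weights are carried over, so the model retains its previous class knowledge in the head; what it forgets (or not) then depends on how the shared layers are updated on the new task.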

The challenge in this scenario is that the model must implicitly identify the correct set of output neurons when facing test data. During training, the mapping from data to neuron set (say, pedestrian <-> the set of neurons added in task 1) is clear. Over the course of training, a new neuron set is introduced with each task. Thus, during testing, considerably more neurons are available to be activated on test data from an earlier task. This is exactly the challenge: the model must activate the correct pedestrian neuron within the now much larger output layer. This is not easy. On the well-explored CIFAR100 image dataset, the CIL scenario reaches a low test accuracy of ~7% if no specific CL methods (see this article) are used [2].

Generally, CIL is the most challenging of the three scenarios, as the task identity (i.e., the subset of neurons related to a task, not the specific output neuron of a data sample, which would be the label) is only available during training. At test time, the task identity must be inferred.

Domain-incremental learning

Conceptual visualization of the domain-incremental learning scenario: The new tasks introduce known classes (people, buildings), but from different domains (raining, snowing). Image by the author; created with draw.io.

In domain-incremental learning (DIL), each task is assumed to bring the same classes, but from a different domain. In the real-world scenario of an autonomous vehicle, this would correspond to driving from the city center to a suburban region. The onboard systems likely detect the same set of classes (pedestrians, pets, buildings, …), but within a new surrounding.

In DIL, the output layer usually remains fixed in size; it is only expanded when entirely new classes are added. Apart from this exception, the output layer does not have task-specific neuron sets. Generally, the output layer is considered to already encode the knowledge of the classes, and new tasks simply deliver known classes from a different domain.

In the DIL scenario, the task identity is available at train time but not at test time. However, it is not necessary at inference: because the output neurons are not task-specific, no neuron sets need to be inferred from the data. This makes the DIL scenario easier than CIL. On the CIFAR100 dataset, a simple MLP can reach ~46% test accuracy [2].
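To make the difference to CIL explicit, the following sketch (again PyTorch, with a hypothetical class count and hypothetical dataloader names) trains the same fixed-size head on one domain after another:

import torch
import torch.nn as nn

num_classes = 5                       # e.g., pedestrian, pet, building, ...
model = nn.Sequential(                # simple MLP on 32x32 RGB images
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),      # fixed-size head shared by all domains
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def train_on_task(domain_loader):
    """Train on one domain (task); the head is never expanded."""
    for x, y in domain_loader:        # y always lies in {0, ..., num_classes - 1}
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

# for loader in [city_loader, rain_loader, snow_loader]:   # hypothetical domain streams
#     train_on_task(loader)

Because every domain maps onto the same output neurons, no task identity is needed at inference; the remaining difficulty lies in the shared weights drifting towards the most recent domain.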

Task-incremental learning

Conceptual visualization of the task-incremental learning scenario: The second task introduces new classes (car, airplane), for which a separate output head is created. Image by the author; created with draw.io.

In the last scenario, task-incremental learning (TIL), a separate output layer is used for each task. This scenario is sometimes also referred to as a multi-head setup, denoting the separate output heads (commonly, an output head is a single feedforward layer). In the example of the self-driving car, TIL would be useful when different driving modes are selected. Each driving mode (sporty, quiet, etc.) comes with its own set of actions (= output neurons), and the actions differ between the tasks (= driving modes). That is, each driving mode has an exclusive set of available actions.

In TIL, the output layer(s) of the model are not modified upon arrival of a new task. Instead, separate task-specific output layers are used.

In this scenario, the task identities are available at both train and test time. Because of the task-specific output layers, catastrophic forgetting is often minimal, if it occurs at all. It can only occur when the weights of the shared backbone, on which the heads build, are modified. However, as the task-specific heads also encode knowledge via their parameters, the shared backbone might not need to change much.
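A minimal multi-head sketch, assuming a PyTorch backbone and illustrative class and variable names, could look as follows:

import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, feature_dim: int = 128, classes_per_task: int = 2):
        super().__init__()
        self.backbone = nn.Sequential(          # shared feature extractor
            nn.Flatten(),
            nn.Linear(32 * 32 * 3, feature_dim),
            nn.ReLU(),
        )
        self.heads = nn.ModuleList()            # one output head per task
        self.feature_dim = feature_dim
        self.classes_per_task = classes_per_task

    def add_task(self):
        """Create a fresh output head when a new task arrives."""
        self.heads.append(nn.Linear(self.feature_dim, self.classes_per_task))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        features = self.backbone(x)
        return self.heads[task_id](features)    # the task identity selects the head

# Usage: two tasks, each with its own head; the task id must be known at test time.
net = MultiHeadNet()
net.add_task()   # head for task 0
net.add_task()   # head for task 1
x = torch.randn(4, 3, 32, 32)
logits_task1 = net(x, task_id=1)                # shape: (4, classes_per_task)

Note that the forward pass requires the task id as an argument; this is exactly the task identity discussed above, and it is the reason TIL cannot simply be run when that information is missing at test time.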

Of all the presented scenarios, TIL is the easiest one, but it requires task identities (to stress: not labels!) at test time. These identities are necessary to load the task-specific head, and if they are not supplied with the incoming data, they might need to be inferred. On CIFAR10, this scenario can reach ~61% final test accuracy [2]. Note that training a separate network for each task can be considered the most extreme case of task-incremental learning (cf. [3]).

Conclusion and recommended reading

Continual learning trains models on a continuously arriving data stream. Its main challenge is balancing stability and plasticity, which can be measured by the strength of forgetting. Forgetting refers to the overwriting of network weights, which causes the model to lose previously learned knowledge. Depending on the scenario, continual learning becomes easier or harder.

I introduced the three commonly used scenarios: class-incremental learning, the most challenging one, where a shared output head is expanded with newly arriving classes; domain-incremental learning, where a shared output head is not modified, making it easier; and finally task-incremental learning, the easiest scenario, as each task uses a separate output head.

Have a look at the following reading recommendations (paper titles given) to further explore the three scenarios:

  • Three types of incremental learning
  • Three scenarios for continual learning
  • Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines

References

[1] https://www.visualcapitalist.com/training-costs-of-ai-models-over-time/; accessed 27 October 2024

[2] Van de Ven, Gido M., Tinne Tuytelaars, and Andreas S. Tolias. “Three types of incremental learning.” Nature Machine Intelligence 4.12 (2022): 1185–1197.

[3] Rusu, Andrei A., et al. “Progressive neural networks.” arXiv preprint arXiv:1606.04671 (2016).





