Spelling suggestions: "subject:"andincremental 1earning"" "subject:"andincremental c1earning""
1 |
Referencing Unlabelled World Data to Prevent Catastrophic Forgetting in Class-incremental LearningLi, Xuan 24 June 2022 (has links)
This thesis presents a novel strategy to address the challenge of "catastrophic forgetting" in deep continual-learning systems. The term refers to severe performance degradation for older tasks, as a system learns new tasks that are presented sequentially. Most previous techniques have emphasized preservation of existing knowledge while learning new tasks, in some cases advocating a memory buffer that grows in proportion to the number of tasks. However, we offer another perspective, which is that mitigating local-task fitness during learning is as important as attempting to preserve existing knowledge. We posit the existence of a consistent, unlabelled world environment that the system uses as an easily-accessible reference to avoid favoring spurious properties over more generalizable ones. Based on this assumption, we have developed a novel method called Learning with Reference (LwR), which delivers substantial performance gains relative to its state-of-the-art counterparts. The approach does not involve a growing memory buffer, and therefore promotes better performance at scale. We present extensive empirical evaluation on real-world datasets. / Master of Science / Rome was not built in a day, and in nature knowledge is acquired and consolidated gradually over time. Evolution has taught biological systems how to address emerging challenges by building on past experience, adapting quickly while retaining known skills. Modern artificial intelligence systems also seek to amortize the learning process over time. Specifically, one large learning task can be divided into many smaller non-overlapping tasks. For example, a classification task of two classes, tiger and horse, is divided into two tasks, where the classifier only sees and learns from tiger data in the first task and horse data in the second task. The systems are expected to sequentially acquire knowledge from these smaller tasks. Such learning strategy is known as continual learning and provides three meaningful benefits: higher resource efficiency, a progressively better knowledge base, and strong adaptability. In this thesis, we investigate the class-incremental learning problem, a subset of continual learning, which refers to learning a classification model from a sequence of tasks.
Different from transfer learning, which targets better performance in new domains, continual learning emphasizes the knowledge preservation of both old and new tasks. In deep neural networks, one challenge against the preservation is "catastrophic forgetting", which refers to severe performance degradation for older tasks, as a system learns new ones that are presented sequentially. An intuitive explanation is that old task data is missing in the new tasks under continual learning setting and the model is optimized toward new tasks without concerning the old ones. To overcome this, most previous techniques have emphasized the preservation of existing knowledge while learning new tasks, in some cases advocating old-data replay with a memory buffer, which grows in proportion to the number of tasks.
In this thesis, we offer another perspective, which is that mitigating local-task fitness during learning is as important as attempting to preserve existing knowledge. We notice that local task data always has strong biases because of its smaller size. Optimization on it leads the model to local optima, therefore losing a holistic view that is crucial for other tasks. To mitigate this, a reliable reference should be enforced across tasks and the model should consistently learn all new knowledge based on this. With this assumption, we have developed a novel method called Learning with Reference (LwR), which posits the existence of a consistent, unlabelled world environment that the system uses as an easily-accessible reference to avoid favoring spurious properties over more generalizable ones. Our extensive empirical experiments show that it significantly outperforms state-of-the-art counterparts in real-world datasets.
|
Page generated in 0.094 seconds