1 |
Developing Toward Generality: Combating Catastrophic Forgetting with Developmental Compression
Beaulieu, Shawn L, 01 January 2018
General intelligence is the exhibition of intelligent behavior across multiple problems in a variety of settings, however intelligence is defined and measured.
Endemic to approaches that try to realize such intelligence in machines is catastrophic forgetting, in which sequential learning corrupts knowledge obtained earlier in the sequence or in which tasks antagonistically compete for system resources. Methods for obviating catastrophic forgetting have sought either to identify and preserve features of the system necessary to solve one problem when learning to solve another, or to enforce modularity such that minimally overlapping sub-functions contain task-specific knowledge. While successful in some domains, both approaches scale poorly because they require larger architectures as the number of training instances grows, causing different parts of the system to specialize for separate subsets of the data.
Presented here is a method called developmental compression that addresses catastrophic forgetting in the neural networks of embodied agents. It exploits the mild impacts of developmental mutations to lessen adverse changes to previously evolved capabilities and 'compresses' specialized neural networks into a single generalized one. In the absence of domain knowledge, developmental compression produces systems that avoid overt specialization, alleviating the need to engineer a bespoke system for every task permutation, and does so in a way that suggests better scalability than existing approaches. This method is validated on a robot control problem and may be extended to other machine learning domains in the future.
|
2 |
Machines Do Not Have Little Gray Cells: Analysing Catastrophic Forgetting in Cross-Domain Intrusion Detection Systems
Valieh, Ramin; Esmaeili Kia, Farid, January 2023
Cross-domain intrusion detection, a critical component of cybersecurity, involves evaluating the performance of neural networks across diverse datasets or databases. The ability of intrusion detection systems to effectively adapt to new threats and data sources is paramount for safeguarding networks and sensitive information. This research delves into the intricate world of cross-domain intrusion detection, where neural networks must demonstrate their versatility and adaptability. The results of our experiments expose a significant challenge: the phenomenon known as catastrophic forgetting. This is the tendency of neural networks to forget previously acquired knowledge when exposed to new information. In the context of intrusion detection, it means that as models are sequentially trained on different intrusion detection datasets, their performance on earlier datasets degrades drastically. This degradation poses a substantial threat to the reliability of intrusion detection systems. In response to this challenge, this research investigates potential solutions to mitigate the effects of catastrophic forgetting. We propose the application of continual learning techniques as a means to address this problem. Specifically, we explore the Elastic Weight Consolidation (EWC) algorithm as an example of preserving previously learned knowledge while allowing the model to adapt to new intrusion detection tasks. By examining the performance of neural networks on various intrusion detection datasets, we aim to shed light on the practical implications of catastrophic forgetting and the potential benefits of adopting EWC as a memory-preserving technique. This research underscores the importance of addressing catastrophic forgetting in cross-domain intrusion detection systems. 
It provides a stepping stone for future endeavours in enhancing multi-task learning and adaptability within the critical domain of intrusion detection, ultimately contributing to the ongoing efforts to fortify cybersecurity defences.
|
3 |
CATASTROPHIC FORGETTING IN NEURAL NETWORKS
Riesenberg, John R., January 2000
No description available.
|
4 |
Avoiding Catastrophic Forgetting in Continual Learning through Elastic Weight Consolidation
Evilevitch, Anton; Ingram, Robert, January 2021
Image classification is an area of computer science with many applications. One key issue with using Artificial Neural Networks (ANN) for image classification is the phenomenon of Catastrophic Forgetting when training tasks sequentially (i.e., Continual Learning): the network quickly loses its performance on a given task after it has been trained on a new task. Elastic Weight Consolidation (EWC) has previously been proposed as a remedy to lessen the effects of this phenomenon through the use of a loss function which utilizes a Fisher Information Matrix. We want to explore and establish whether this still holds true for modern network architectures, and to what extent it can be applied using today's state-of-the-art networks. We focus on applying this approach to tasks within the same dataset. Our results indicate that the approach is feasible and does in fact lessen the effect of Catastrophic Forgetting. These results are achieved, however, at the cost of much longer execution times and time spent tuning the hyperparameters.
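As a rough illustration of the kind of loss the abstract refers to, the sketch below adds a quadratic EWC-style penalty, weighted by a diagonal Fisher estimate, to a new-task loss. It is not the thesis's implementation: the function names, the diagonal approximation, the strength `lam`, and the toy numbers are all assumptions made for this example.

```python
def diagonal_fisher(per_sample_grads):
    """Estimate the diagonal of the Fisher information matrix as the mean of
    squared per-sample gradients of the log-likelihood (one list per sample)."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(theta, theta_star, fisher, lam):
    """Quadratic penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for f, t, ts in zip(fisher, theta, theta_star)
    )


def ewc_loss(new_task_loss, theta, theta_star, fisher, lam):
    """Total objective on the new task: task loss plus the EWC penalty."""
    return new_task_loss + ewc_penalty(theta, theta_star, fisher, lam)


# Toy, hypothetical numbers: two parameters, the first matters far more
# to the old task than the second.
grads = [[2.0, 0.0], [2.0, 1.0]]      # per-sample gradients on the old task
fisher = diagonal_fisher(grads)        # importance of each parameter
theta_star = [1.0, 1.0]                # weights after the old task
theta = [1.5, 3.0]                     # candidate weights on the new task
penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)
total = ewc_loss(0.25, theta, theta_star, fisher, lam=2.0)
```

In this toy case a small move in the first parameter costs about as much as a large move in the second, which is the intended effect: parameters that mattered to the old task are held closer to their old values.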
|
5 |
Extension on Adaptive MAC Protocol for Space Communications
Li, Max Hongming, 06 December 2018
This work devises a novel approach for mitigating the effects of Catastrophic Forgetting in Deep Reinforcement Learning-based cognitive radio engine implementations employed in space communication applications. Previous implementations of cognitive radio space communication systems used a moving-window-based online learning method, which discards part of its understanding of the environment each time the window is moved. This act of discarding is called Catastrophic Forgetting. This work investigated two ways to control the forgetting process more systematically: a recursive training technique that implements forgetting in a more controlled manner, and an ensemble learning technique in which each member of the ensemble represents the engine's understanding over a certain period of time. Both techniques were integrated into a cognitive radio engine proof-of-concept and delivered to the SDR platform on the International Space Station. The results were then compared with those from the original proof-of-concept. In that comparison, the ensemble learning technique showed promise across different communication channel contexts.
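The contrast between a moving window and a per-period ensemble can be sketched in a few lines. This is only a toy under stated assumptions: running means stand in for the deep reinforcement learning engine, and the function names, window size, and period length are hypothetical rather than taken from the thesis.

```python
from collections import deque


def window_estimate(stream, size):
    """Moving-window learner: only the last `size` observations survive."""
    window = deque(maxlen=size)
    for x in stream:
        window.append(x)  # once full, the oldest observation is dropped
    return sum(window) / len(window)


def ensemble_estimate(stream, period):
    """One member per time period; each member keeps its own period's estimate."""
    members = []
    for start in range(0, len(stream), period):
        chunk = stream[start:start + period]
        members.append(sum(chunk) / len(chunk))
    return members, sum(members) / len(members)


# Hypothetical channel measurements whose conditions shift halfway through.
stream = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
win = window_estimate(stream, size=3)
members, ens = ensemble_estimate(stream, period=3)
```

The window ends up reflecting only the most recent channel condition, while the ensemble still holds one member for the earlier condition, which could be consulted if that condition returns.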
|
6 |
Incremental Learning With Sample Generation From Pretrained Networks
January 2020
In the last decade, deep-learning-based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of catastrophic forgetting, where the model forgets previously learned knowledge as it overfits to the newly available data. Incremental learning algorithms enable deep neural networks to prevent catastrophic forgetting by retaining knowledge of previously observed data while also learning from newly available data.
This thesis presents three models for incremental learning: (i) design of an algorithm for generative incremental learning using a pre-trained deep neural network classifier; (ii) development of a hashing-based clustering algorithm for efficient incremental learning; (iii) design of a student-teacher coupled neural network to distill knowledge for incremental learning. The proposed algorithms were evaluated using popular vision datasets for classification tasks. The thesis concludes with a discussion about the feasibility of using these techniques to transfer information between networks and also for incremental learning applications. / Dissertation/Thesis / Masters Thesis Computer Science 2020
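For readers unfamiliar with student-teacher distillation, the sketch below shows one common formulation: the student is penalized by the KL divergence between temperature-softened teacher and student output distributions. The abstract does not say which loss the thesis uses, so the formulation, the function names, and the temperature value are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical logits for a three-class problem.
teacher = [2.0, 1.0, 0.1]
agree = distillation_loss(teacher, [2.0, 1.0, 0.1])     # student matches teacher
disagree = distillation_loss(teacher, [0.1, 1.0, 2.0])  # student reverses ranking
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it on old-class inputs pushes the student to retain the teacher's behavior while it learns new classes.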
|
7 |
Multi-Task Reinforcement Learning: From Single-Agent to Multi-Agent Systems
Trang, Matthew Luu, 06 January 2023
Generalized collaborative drones are a technology that has many potential benefits. General purpose drones that can handle exploration, navigation, manipulation, and more without having to be reprogrammed would be an immense breakthrough for usability and adoption of the technology. The ability to develop these multi-task, multi-agent drone systems is limited by the lack of available training environments, as well as deficiencies of multi-task learning due to a phenomenon known as catastrophic forgetting. In this thesis, we present a set of simulation environments for exploring the abilities of multi-task drone systems and provide a platform for testing agents in incremental single-agent and multi-agent learning scenarios. The multi-task platform is an extension of an existing drone simulation environment written in Python using the PyBullet Physics Simulation Engine, with these environments incorporated. Using this platform, we present an analysis of Incremental Learning and detail the beneficial impacts of using the technique for multi-task learning, with respect to multi-task learning speed and catastrophic forgetting. Finally, we introduce a novel algorithm, Incremental Learning with Second-Order Approximation Regularization (IL-SOAR), to mitigate some of the effects of catastrophic forgetting in multi-task learning. We show the impact of this method and contrast the performance relative to a multi-agent multi-task approach using a centralized policy sharing algorithm. / Master of Science / Machine Learning techniques allow drones to be trained to achieve tasks which are otherwise time-consuming or difficult. The goal of this thesis is to facilitate the work of creating these complex drone machine learning systems by exploring Reinforcement Learning (RL), a field of machine learning which involves learning the correct actions to take through experience. Currently, RL methods are effective in the design of drones which are able to solve one particular task. 
The next step in this technology is to develop RL systems which are able to handle generalization and perform well across multiple tasks. In this thesis, simulation environments for drones to learn complex tasks are created, and algorithms which are able to train drones in multiple hard tasks are developed and tested. We explore the benefits of using a specific multi-task training technique known as Incremental Learning. Additionally, we consider one of the prohibitive factors of multi-task machine learning-based solutions, the degradation problem of agent performance on previously learned tasks, known as catastrophic forgetting. We create an algorithm that aims to prevent the impact of forgetting when training drones sequentially on new tasks. We contrast this approach with a multi-agent solution, where multiple drones learn simultaneously across the tasks.
|
8 |
Non-linguistic Notions in Language Modeling: Learning, Retention, and Applications
Sharma, Mandar, 11 September 2024
Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to show the same proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models (fewer than 1 billion parameters) that can run natively on-device. Between the complementary capabilities of qualitative and quantitative reasoning, this thesis focuses on the latter, where the goal is to devise mechanisms to instill quantitative reasoning capabilities into these models. However, instilling this notion is not as straightforward as traditional end-to-end learning. The learning of quantitative notions includes the ability of the model to discern between regular linguistic tokens and magnitude/scale-oriented non-linguistic tokens. The learning of these notions, especially after pre-training, comes at a cost for these models: catastrophic forgetting. Thus, learning needs to be followed by retention - making sure these models do not forget what they have learned. We therefore first motivate the need for numeracy-enhanced models via their potential applications in the field of data-to-text generation (D2T), showcasing how these models behave as quantitative reasoners as-is. Then, we devise both token-level training interventions and information-theoretic training interventions to numerically enhance these models, with the latter specifically focused on combating catastrophic forgetting. Our information-theoretic interventions not only lead to numerically enhanced models but also lend us critical insights into the learning behavior of these models, especially when it comes to adapting these models to the target task distribution from their pretraining distribution.
Finally, we extrapolate these insights to devise more effective strategies for transfer learning and unlearning in language modeling. / Doctor of Philosophy / Language modeling, especially through the use of transformer-based large language models (LLMs), has drastically changed how we view and use artificial intelligence (AI) and machine learning (ML) in our daily lives. Although LLMs have showcased remarkable linguistic proficiency in their abilities to write, summarize, and phrase, these models have yet to show the same proficiency in quantitative reasoning. This deficiency is especially apparent in smaller models that can run natively on-device. This thesis focuses on instilling within these models the ability to perform quantitative reasoning - the ability to differentiate between words and numbers and understand the notions of magnitude tied to those numbers, while retaining their linguistic skills. The insights learned from our experiments are further used to devise models that better adapt to target tasks.
|
9 |
Lifelong Adaptive Neuronal Learning for Autonomous Multi-Robot Demining in Colombia, and Enhancing the Science, Technology and Innovation Capacity of the Ejército Nacional de Colombia
January 2019
In order to deploy autonomous multi-robot teams for humanitarian demining in Colombia, two key problems need to be addressed. First, a robotic controller with limited power that can completely cover a dynamic search area is needed. Second, the Colombian National Army (COLAR) needs to increase its science, technology and innovation (STI) capacity to help develop, build and maintain such robots. Using Thangavelautham's (2012, 2017) Artificial Neural Tissue (ANT) control algorithm, a robotic controller for an autonomous multi-robot team was developed. Trained by a simple genetic algorithm, ANT is an artificial neural network (ANN) controller with a sparse, coarse coding network architecture and adaptive activation functions. Starting from the exterior of open, basic geometric grid areas, computer simulations of an ANT multi-robot team with limited time steps, no central controller and limited a priori information covered some areas completely in linear time, and other areas nearly completely in quasi-linear time, comparable to the theoretical cover time bounds of grid-based, ant pheromone, area coverage algorithms. To mitigate catastrophic forgetting, a new learning method for ANT, Lifelong Adaptive Neuronal Learning (LANL), was developed, in which the neural network weight parameters for a specific coverage task were frozen, and only the activation function and output behavior parameters were re-trained for a new coverage task. The performance of the LANL controllers was comparable to that of a new ANT controller with all parameters trained ab initio for the new coverage task.
To increase COLAR's STI capacity, a proposal for a new STI officer corps, Project ÉLITE (Equipo de Líderes en Investigación y Tecnología del Ejército) was developed, where officers enroll in a research intensive, master of science program in applied mathematics or physics in Colombia, and conduct research in the US during their final year. ÉLITE is inspired by the Israel Defense Forces Talpiot program. / Dissertation/Thesis / Doctoral Dissertation Applied Mathematics for the Life and Social Sciences 2019
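The freezing idea behind LANL described in this abstract can be sketched as a mutation operator for a genetic algorithm that leaves frozen parameter groups untouched. This is a toy illustration, not the ANT or LANL code: the genome layout, the group names (`weights`, `activation`, `output`), and the Gaussian mutation with step `sigma` are assumptions.

```python
import random


def mutate(genome, frozen, sigma=0.1, rng=random):
    """Return a mutated copy of `genome`; groups named in `frozen` are copied
    unchanged, all other groups receive Gaussian perturbations."""
    child = {}
    for group, values in genome.items():
        if group in frozen:
            child[group] = list(values)  # frozen: carried over as-is
        else:
            child[group] = [v + rng.gauss(0.0, sigma) for v in values]
    return child


# Hypothetical controller evolved on a first coverage task.
rng = random.Random(0)  # seeded for reproducibility
parent = {
    "weights": [0.5, -1.2, 0.3],   # network weights, frozen for the new task
    "activation": [0.0, 1.0],      # activation-function parameters
    "output": [0.7],               # output behavior parameters
}
child = mutate(parent, frozen={"weights"}, rng=rng)
```

Because the weight group never changes, whatever the first task stored there cannot be overwritten while the remaining groups are re-evolved for the new task.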
|
10 |
Continual Learning and Biomedical Image Data: Attempting to sequentially learn medical imaging datasets using continual learning approaches / Kontinuerligt lärande och Biomedicinsk bilddata: Försöker att sekventiellt lära sig medicinska bilddata genom att använda metoder för kontinuerligt lärande
Soselia, Davit, January 2022
While deep learning has proved to be useful in a large variety of tasks, one limitation remains: in supervised problems, all classes and samples must be present at the training stage. This is a major issue in the field of biomedical imaging, since keeping samples in the training sets consistently is often a liability. Furthermore, this issue prevents the simple updating of older models with only the new data when it is introduced, and prevents collaboration between companies. In this work, we examine an array of Continual Learning approaches to try to improve upon the baseline of the naive finetuning approach when retraining on new tasks, and to achieve accuracy levels similar to those seen when all the data is available at the same time. The continual learning approaches with which we attempt to mitigate the problem are EWC, UCB, EWC Online, SI, MAS, and CN-DPM. We explore some complex scenarios with varied classes being included in the tasks, as well as close-to-ideal scenarios where the sample size is balanced among the tasks. Overall, we focus on X-ray images, since they encompass a large variety of diseases, with new diseases requiring retraining. In the preferred setting, where classes are relatively balanced, we get an accuracy of 63.30 versus a baseline of 53.92 and the target score of 66.83. For the continued training on the same classes, we get an accuracy of 35.52 versus a baseline of 27.73. We also examine whether learning rate adjustments at task level improve accuracy, with some improvements for EWC Online. The preliminary results indicate that CL approaches such as EWC Online and SI could be integrated into radiography data learning pipelines to reduce catastrophic forgetting in situations where some level of sequential training ability justifies the significant computational overhead.
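One way to read the reported accuracies is to ask how much of the distance between the finetuning baseline and the all-data target the continual learning result closes. The helper names below and the "gap recovered" framing are assumptions introduced for this sketch; only the accuracy figures come from the abstract.

```python
def gap_recovered(score, baseline, target):
    """Fraction of the baseline-to-target gap closed by `score`."""
    return (score - baseline) / (target - baseline)


def relative_gain(score, baseline):
    """Improvement over the baseline, relative to the baseline."""
    return (score - baseline) / baseline


# Balanced-class setting: accuracy 63.30, baseline 53.92, target 66.83.
balanced = gap_recovered(63.30, 53.92, 66.83)

# Continued training on the same classes: accuracy 35.52, baseline 27.73.
# No target score is reported for this setting, so only a relative gain
# over the baseline can be computed.
continued = relative_gain(35.52, 27.73)
```

Under this reading, the balanced setting closes roughly 73% of the gap to the target, and the continued-training setting improves on its baseline by about 28% in relative terms.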
|