• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 91
  • 4
  • 3
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 124
  • 124
  • 124
  • 38
  • 25
  • 22
  • 22
  • 22
  • 21
  • 21
  • 20
  • 20
  • 20
  • 19
  • 18
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

[en] A SIMULATION STUDY OF TRANSFER LEARNING IN DEEP REINFORCEMENT LEARNING FOR ROBOTICS / [pt] UM ESTUDO DE TRANSFER LEARNING EM DEEP REINFORCEMENT LEARNING EM AMBIENTES ROBÓTICOS SIMULADOS

EVELYN CONCEICAO SANTOS BATISTA 05 August 2020 (has links)
[pt] Esta dissertação de mestrado consiste em um estudo avançado sobre aprendizado profundo por reforço visual para robôs autônomos através de técnicas de transferência de aprendizado. Os ambientes de simulação testados neste estudo são ambientes realistas complexos onde o robô tinha como desafio aprender e transferir conhecimento em diferentes contextos para aproveitar a experiência de ambientes anteriores em ambientes futuros. Este tipo de abordagem, além de agregar conhecimento ao robô autônomo, diminui o número de épocas de treinamento do algoritmo, mesmo em ambientes complexos, justificando o uso de técnicas de transferência de aprendizado. / [en] This master s thesis consists of an advanced study on deep learning by visual reinforcement for autonomous robots through transfer learning techniques. The simulation environments tested in this study are highly realistic environments where the challenge of the robot was to learn and tranfer knowledge in different contexts to take advantage of the experiencia of previous environments in future environments. This type of approach besides adding knowledge to the autonomous robot reduces the number of training epochs the algorithm, even in complex environments, justifying the use of transfer learning techniques.
112

Towards Novelty-Resilient AI: Learning in the Open World

Trevor A Bonjour (18423153) 22 April 2024 (has links)
<p dir="ltr">Current artificial intelligence (AI) systems are proficient at tasks in a closed-world setting where the rules are often rigid. However, in real-world applications, the environment is usually open and dynamic. In this work, we investigate the effects of such dynamic environments on AI systems and develop ways to mitigate those effects. Central to our exploration is the concept of \textit{novelties}. Novelties encompass structural changes, unanticipated events, and environmental shifts that can confound traditional AI systems. We categorize novelties based on their representation, anticipation, and impact on agents, laying the groundwork for systematic detection and adaptation strategies. We explore novelties in the context of stochastic games. Decision-making in stochastic games exercises many aspects of the same reasoning capabilities needed by AI agents acting in the real world. A multi-agent stochastic game allows for infinitely many ways to introduce novelty. We propose an extension of the deep reinforcement learning (DRL) paradigm to develop agents that can detect and adapt to novelties in these environments. To address the sample efficiency challenge in DRL, we introduce a hybrid approach that combines fixed-policy methods with traditional DRL techniques, offering enhanced performance in complex decision-making tasks. We present a novel method for detecting anticipated novelties in multi-agent games, leveraging information theory to discern patterns indicative of collusion among players. Finally, we introduce DABLER, a pioneering deep reinforcement learning architecture that dynamically adapts to changing environmental conditions through broad learning approaches and environment recognition. Our findings underscore the importance of developing AI systems equipped to navigate the uncertainties of the open world, offering promising pathways for advancing AI research and application in real-world settings.</p>
113

Optimizing the Fronthaul in C-RAN by Deep Reinforcement Learning : Latency Constrained Fronthaul optimization with Deep Reinforment Learning / Optimering av Fronthaul i C-RAN med Djup Förstärknings Inlärning : Latens begränsad Fronthaul Optimering med Djup Förstärknings Inlärning

Grönland, Axel January 2023 (has links)
Centralized Radio Access Networks or C-RAN for short is a type of network that aims to centralize perform some of it's computation at centralized locations. Since a lot of functionality is centralized we can show from multiplexing that the centralization leads to lower operating costs. The drawback with C-RAN are the huge bandwidth requirements over the fronthaul. We know that scenarios where all cells experience high load is a very low probability scenario. Since functions are centralized this also allows more adaptability, we can choose to change the communication standard for each cell depending on the load scenario. In this thesis we set out to create such a controller with the use of Deep Reinforcement Learning. The problem overall is difficult due to the complexity of modelling the problem, but also since C-RAN is a relatively new concept in the telecom world. We solved this problem with two traditional reinforcement learning algorithms, DQN and SAC. We define a constraint optimization problem and phrase it in such a way that the problem can be solved with a deep reinforcement learning algorithm. We found that the learning worked pretty well and we can show that our trained policies satisfy the constraint. With these results one could show that resource allocations problems can be solved pretty well by a deep reinforcement learning controller. / Centralized Radio Access Networks eller C-RAN som förkortning är en kommunications nätverk som siktar på att centralisera vissa funktioner i centrala platser. Eftersom mmånga funktioner är centraliserade så kan vi visa från statistisk multiplexing att hög trafik scenarion över många celler är av låg sannolikhet vilket leder till lägre service kostnader. Nackdelen med C-RAN är den höga bandbredds kravet över fronthaulen. Trafik scenarion där alla celler utsäts för hög last är väldigt låg sannolikhet så kan vi dimensionera fronthaulen för att klara mindre än det värsta trafik scenariot. Eftersom funktioner är centralizerade så tillåter det även att vi kan adaptivt anpassa resurser för trafiken. I denna uppsats så kommer vi att skapa en sådan kontroller med djup reinforcement learning. Problemet är komplext att modellera och C-RAN är ett relativt nytt concept i telecom världen. Vi löser detta problem med två traditionella algoritmer, deep Q networks(DQN) och soft actor critic(SAC). Vi definierar ett vilkorligt optimerings problem och visar hur det kan formuleras som ett inlärnings problem. Vi visar att denna metod funkar rätt bra som en lösning till problemet och att den uppfyller bivilkoren. Våra resultat visar att resurs allokerings problem kan lösas nära optimalitet med reinforcement learning.
114

Beyond the status quo in deep reinforcement learning

Agarwal, Rishabh 05 1900 (has links)
L’apprentissage par renforcement profond (RL) a connu d’énormes progrès ces dernières années, mais il est encore difficile d’appliquer le RL aux problèmes de prise de décision du monde réel. Cette thèse identifie trois défis clés avec la façon dont nous faisons la recherche RL elle-même qui entravent les progrès de la recherche RL. — Évaluation et comparaison peu fiables des algorithmes RL ; les méthodes d’évaluation actuelles conduisent souvent à des résultats peu fiables. — Manque d’informations préalables dans la recherche RL ; Les algorithmes RL sont souvent formés à partir de zéro, ce qui peut nécessiter de grandes quantités de données ou de ressources informatiques. — Manque de compréhension de la façon dont les réseaux de neurones profonds interagissent avec RL, ce qui rend difficile le développement de méthodes évolutives de RL. Pour relever ces défis susmentionnés, cette thèse apporte les contributions suivantes : — Une méthodologie plus rigoureuse pour évaluer les algorithmes RL. — Un flux de travail de recherche alternatif qui se concentre sur la réutilisation des progrès existants sur une tâche. — Identification d’un phénomène de perte de capacité implicite avec un entraînement RL hors ligne prolongé. Dans l’ensemble, cette thèse remet en question le statu quo dans le RL profond et montre comment cela peut conduire à des algorithmes de RL plus efficaces, fiables et mieux applicables dans le monde réel. / Deep reinforcement learning (RL) has seen tremendous progress in recent years, but it is still difficult to apply RL to real-world decision-making problems. This thesis identifies three key challenges with how we do RL research itself that hinder the progress of RL research. — Unreliable evaluation and comparison of RL algorithms; current evaluation methods often lead to unreliable results. — Lack of prior information in RL research; RL algorithms are often trained from scratch, which can require large amounts of data or computational resources. — Lack of understanding of how deep neural networks interact with RL, making it hard to develop scalable RL methods. To tackle these aforementioned challenges, this thesis makes the following contributions: — A more rigorous methodology for evaluating RL algorithms. — An alternative research workflow that focuses on reusing existing progress on a task. — Identifying an implicit capacity loss phenomenon with prolonged offline RL training. Overall, this thesis challenges the status quo in deep reinforcement learning and shows that doing so can make RL more efficient, reliable and improve its real-world applicability
115

On two sequential problems : the load planning and sequencing problem and the non-normal recurrent neural network

Goyette, Kyle 07 1900 (has links)
The work in this thesis is separated into two parts. The first part deals with the load planning and sequencing problem for double-stack intermodal railcars, an operational problem found at many rail container terminals. In this problem, containers must be assigned to a platform on which the container will be loaded, and the loading order must be determined. These decisions are made with the objective of minimizing the costs associated with handling the containers, as well as minimizing the cost of containers left behind. The deterministic version of the problem can be cast as a shortest path problem on an ordered graph. This problem is challenging to solve because of the large size of the graph. We propose a two-stage heuristic based on the Iterative Deepening A* algorithm to compute solutions to the load planning and sequencing problem within a five-minute time budget. Next, we also illustrate how a Deep Q-learning algorithm can be used to heuristically solve the same problem.The second part of this thesis considers sequential models in deep learning. A recent strategy to circumvent the exploding and vanishing gradient problem in recurrent neural networks (RNNs) is to enforce recurrent weight matrices to be orthogonal or unitary. While this ensures stable dynamics during training, it comes at the cost of reduced expressivity due to the limited variety of orthogonal transformations. We propose a parameterization of RNNs, based on the Schur decomposition, that mitigates the exploding and vanishing gradient problem, while allowing for non-orthogonal recurrent weight matrices in the model. / Le travail de cette thèse est divisé en deux parties. La première partie traite du problème de planification et de séquencement des chargements de conteneurs sur des wagons, un problème opérationnel rencontré dans de nombreux terminaux ferroviaires intermodaux. Dans ce problème, les conteneurs doivent être affectés à une plate-forme sur laquelle un ou deux conteneurs seront chargés et l'ordre de chargement doit être déterminé. Ces décisions sont prises dans le but de minimiser les coûts associés à la manutention des conteneurs, ainsi que de minimiser le coût des conteneurs non chargés. La version déterministe du problème peut être formulé comme un problème de plus court chemin sur un graphe ordonné. Ce problème est difficile à résoudre en raison de la grande taille du graphe. Nous proposons une heuristique en deux étapes basée sur l'algorithme Iterative Deepening A* pour calculer des solutions au problème de planification et de séquencement de la charge dans un budget de cinq minutes. Ensuite, nous illustrons également comment un algorithme d'apprentissage Deep Q peut être utilisé pour résoudre heuristiquement le même problème. La deuxième partie de cette thèse examine les modèles séquentiels en apprentissage profond. Une stratégie récente pour contourner le problème de gradient qui explose et disparaît dans les réseaux de neurones récurrents (RNN) consiste à imposer des matrices de poids récurrentes orthogonales ou unitaires. Bien que cela assure une dynamique stable pendant l'entraînement, cela se fait au prix d'une expressivité réduite en raison de la variété limitée des transformations orthogonales. Nous proposons une paramétrisation des RNN, basée sur la décomposition de Schur, qui atténue les problèmes de gradient, tout en permettant des matrices de poids récurrentes non orthogonales dans le modèle.
116

GAME-THEORETIC MODELING OF MULTI-AGENT SYSTEMS: APPLICATIONS IN SYSTEMS ENGINEERING AND ACQUISITION PROCESSES

Salar Safarkhani (9165011) 24 July 2020 (has links)
<div><div><div><p>The process of acquiring the large-scale complex systems is usually characterized with cost and schedule overruns. To investigate the causes of this problem, we may view the acquisition of a complex system in several different time scales. At finer time scales, one may study different stages of the acquisition process from the intricate details of the entire systems engineering process to communication between design teams to how individual designers solve problems. At the largest time scale one may consider the acquisition process as series of actions which are, request for bids, bidding and auctioning, contracting, and finally building and deploying the system, without resolving the fine details that occur within each step. In this work, we study the acquisition processes in multiple scales. First, we develop a game-theoretic model for engineering of the systems in the building and deploying stage. We model the interactions among the systems and subsystem engineers as a principal-agent problem. We develop a one-shot shallow systems engineering process and obtain the optimum transfer functions that best incentivize the subsystem engineers to maximize the expected system-level utility. The core of the principal-agent model is the quality function which maps the effort of the agent to the performance (quality) of the system. Therefore, we build the stochastic quality function by modeling the design process as a sequential decision-making problem. Second, we develop and evaluate a model of the acquisition process that accounts for the strategic behavior of different parties. We cast our model in terms of government-funded projects and assume the following steps. First, the government publishes a request for bids. Then, private firms offer their proposals in a bidding process and the winner bidder enters in a con- tract with the government. The contract describes the system requirements and the corresponding monetary transfers for meeting them. The winner firm devotes effort to deliver a system that fulfills the requirements. This can be assumed as a game that the government plays with the bidder firms. We study how different parameters in the acquisition procedure affect the bidders’ behaviors and therefore, the utility of the government. Using reinforcement learning, we seek to learn the optimal policies of involved actors in this game. In particular, we study how the requirements, contract types such as cost-plus and incentive-based contracts, number of bidders, problem complexity, etc., affect the acquisition procedure. Furthermore, we study the bidding strategy of the private firms and how the contract types affect their strategic behavior.</p></div></div></div>
117

[pt] ESTUDO DE TÉCNICAS DE APRENDIZADO POR REFORÇO APLICADAS AO CONTROLE DE PROCESSOS QUÍMICOS / [en] STUDY OF REINFORCEMENT LEARNING TECHNIQUES APPLIED TO THE CONTROL OF CHEMICAL PROCESSES

30 December 2021 (has links)
[pt] A indústria 4.0 impulsionou o desenvolvimento de novas tecnologias para atender as demandas atuais do mercado. Uma dessas novas tecnologias foi a incorporação de técnicas de inteligência computacional no cotidiano da indústria química. Neste âmbito, este trabalho avaliou o desempenho de controladores baseados em aprendizado por reforço em processos químicos industriais. A estratégia de controle interfere diretamente na segurança e no custo do processo. Quanto melhor for o desempenho dessa estrategia, menor será a produção de efluentes e o consumo de insumos e energia. Os algoritmos de aprendizado por reforço apresentaram excelentes resultados para o primeiro estudo de caso, o reator CSTR com a cinética de Van de Vusse. Entretanto, para implementação destes algoritmos na planta química do Tennessee Eastman Process mostrou-se que mais estudos são necessários. A fraca ou inexistente propriedade Markov, a alta dimensionalidade e as peculiaridades da planta foram fatores dificultadores para os controladores desenvolvidos obterem resultados satisfatórios. Foram avaliados para o estudo de caso 1, os algoritmos Q-Learning, Actor Critic TD, DQL, DDPG, SAC e TD3, e para o estudo de caso 2 foram avaliados os algoritmos CMA-ES, TRPO, PPO, DDPG, SAC e TD3. / [en] Industry 4.0 boosted the development of new technologies to meet current market demands. One of these new technologies was the incorporation of computational intelligence techniques into the daily life of the chemical industry. In this context, this present work evaluated the performance of controllers based on reinforcement learning in industrial chemical processes. The control strategy directly affects the safety and cost of the process. The better the performance of this strategy, the lower will be the production of effluents and the consumption of input and energy. The reinforcement learning algorithms showed excellent results for the first case study, the Van de Vusse s reactor. However, to implement these algorithms in the Tennessee Eastman Process chemical plant it was shown that more studies are needed. The weak Markov property, the high dimensionality and peculiarities of the plant were factors that made it difficult for the developed controllers to obtain satisfactory results. For case study 1, the algorithms Q-Learning, Actor Critic TD, DQL, DDPG, SAC and TD3 were evaluated, and for case study 2 the algorithms CMA-ES, TRPO, PPO, DDPG, SAC and TD3 were evaluated.
118

EXPANDING THE AUTONOMOUS SURFACE VEHICLE NAVIGATION PARADIGM THROUGH INLAND WATERWAY ROBOTIC DEPLOYMENT

Reeve David Lambert (13113279) 19 July 2022 (has links)
<p>This thesis presents solutions to some of the problems facing Autonomous Surface Vehicle (ASV) deployments in inland waterways through the development of navigational and control systems. Fluvial systems are one of the hardest inland waterways to navigate and are thus used as a use-case for system development. The systems are built to reduce the reliance on a-prioris during ASV operation. This is crucial for exceptionally dynamic environments such as fluvial bodies of water that have poorly defined routes and edges, can change course in short time spans, carry away and deposit obstacles, and expose or cover shoals and man-made structures as their water level changes. While navigation of fluvial systems is exceptionally difficult potential autonomous data collection can aid in important scientific missions in under studied environments.</p> <p><br></p> <p>The work has four contributions targeting solutions to four fundamental problems present in fluvial system navigation and control. To sense the course of fluvial systems for navigable path determination a fluvial segmentation study is done and a novel dataset detailed. To enable rapid path computations and augmentations in a fast moving environment a Dubins path generator and augmentation algorithm is presented ans is used in conjunction with an Integral Line-Of-Sight (ILOS) path following method. To rapidly avoid unseen/undetected obstacles present in fluvial environments a Deep Reinforcement Learning (DRL) agent is built and tested across domains to create dynamic local paths that can be rapidly affixed to for collision avoidance. Finally, a custom low-cost and deployable ASV, BREAM (Boat for Robotic Engineering and Applied Machine-Learning), capable of operating in fluvial environments is presented along with an autonomy package used in providing base level sensing and autonomy processing capability to varying platforms.</p> <p><br></p> <p>Each of these contributions form a part of a larger documented Fluvial Navigation Control Architecture (FNCA) that is proposed as a way to aid in a-priori free navigation of fluvial waterways. The architecture relates the navigational structures into high, mid, and low-level controller Guidance and Navigational Control (GNC) layers that are designed to increase cross vehicle and domain deployments. Each component of the architecture is documented, tested, and its application to the control architecture as a whole is reported.</p>
119

Deep Reinforcement Learning for Autonomous Highway Driving Scenario

Pradhan, Neil January 2021 (has links)
We present an autonomous driving agent on a simulated highway driving scenario with vehicles such as cars and trucks moving with stochastically variable velocity profiles. The focus of the simulated environment is to test tactical decision making in highway driving scenarios. When an agent (vehicle) maintains an optimal range of velocity it is beneficial both in terms of energy efficiency and greener environment. In order to maintain an optimal range of velocity, in this thesis work I proposed two novel reward structures: (a) gaussian reward structure and (b) exponential rise and fall reward structure. I trained respectively two deep reinforcement learning agents to study their differences and evaluate their performance based on a set of parameters that are most relevant in highway driving scenarios. The algorithm implemented in this thesis work is double-dueling deep-Q-network with prioritized experience replay buffer. Experiments were performed by adding noise to the inputs, simulating Partially Observable Markov Decision Process in order to obtain reliability comparison between different reward structures. Velocity occupancy grid was found to be better than binary occupancy grid as input for the algorithm. Furthermore, methodology for generating fuel efficient policies has been discussed and demonstrated with an example. / Vi presenterar ett autonomt körföretag på ett simulerat motorvägsscenario med fordon som bilar och lastbilar som rör sig med stokastiskt variabla hastighetsprofiler. Fokus för den simulerade miljön är att testa taktiskt beslutsfattande i motorvägsscenarier. När en agent (fordon) upprätthåller ett optimalt hastighetsområde är det fördelaktigt både när det gäller energieffektivitet och grönare miljö. För att upprätthålla ett optimalt hastighetsområde föreslog jag i detta avhandlingsarbete två nya belöningsstrukturer: (a) gaussisk belöningsstruktur och (b) exponentiell uppgång och nedgång belöningsstruktur. Jag utbildade respektive två djupförstärkande inlärningsagenter för att studera deras skillnader och utvärdera deras prestanda baserat på en uppsättning parametrar som är mest relevanta i motorvägsscenarier. Algoritmen som implementeras i detta avhandlingsarbete är dubbel-duell djupt Q- nätverk med prioriterad återuppspelningsbuffert. Experiment utfördes genom att lägga till brus i ingångarna, simulera delvis observerbar Markov-beslutsprocess för att erhålla tillförlitlighetsjämförelse mellan olika belöningsstrukturer. Hastighetsbeläggningsgaller visade sig vara bättre än binärt beläggningsgaller som inmatning för algoritmen. Dessutom har metodik för att generera bränsleeffektiv politik diskuterats och demonstrerats med ett exempel.
120

Apprentissage de stratégies de calcul adaptatives pour les réseaux neuronaux profonds

Kamanda, Aton 07 1900 (has links)
La théorie du processus dual stipule que la cognition humaine fonctionne selon deux modes distincts : l’un pour le traitement rapide, habituel et associatif, appelé communément "système 1" et le second, ayant un traitement plus lent, délibéré et contrôlé, que l’on nomme "système 2". Cette distinction indique une caractéristique sous-jacente importante de la cognition humaine : la possibilité de passer de manière adaptative à différentes stratégies de calcul selon la situation. Cette capacité est étudiée depuis longtemps dans différents domaines et de nombreux bénéfices hypothétiques semblent y être liés. Cependant, les réseaux neuronaux profonds sont souvent construits sans cette capacité à gérer leurs ressources calculatoires de manière optimale. Cette limitation des modèles actuels est d’autant plus préoccupante que de plus en plus de travaux récents semblent montrer une relation linéaire entre la capacité de calcul utilisé et les performances du modèle lors de la phase d’évaluation. Pour résoudre ce problème, ce mémoire propose différentes approches et étudie leurs impacts sur les modèles, tout d’abord, nous étudions un agent d’apprentissage par renforcement profond qui est capable d’allouer plus de calcul aux situations plus difficiles. Notre approche permet à l’agent d’adapter ses ressources computationnelles en fonction des exigences de la situation dans laquelle il se trouve, ce qui permet en plus d’améliorer le temps de calcul, améliore le transfert entre des tâches connexes et la capacité de généralisation. L’idée centrale commune à toutes nos approches est basée sur les théories du coût de l’effort venant de la littérature sur le contrôle cognitif qui stipule qu’en rendant l’utilisation de ressource cognitive couteuse pour l’agent et en lui laissant la possibilité de les allouer lors de ses décisions il va lui-même apprendre à déployer sa capacité de calcul de façon optimale. Ensuite, nous étudions des variations de la méthode sur une tâche référence d’apprentissage profond afin d’analyser précisément le comportement du modèle et quels sont précisément les bénéfices d’adopter une telle approche. Nous créons aussi notre propre tâche "Stroop MNIST" inspiré par le test de Stroop utilisé en psychologie afin de valider certaines hypothèses sur le comportement des réseaux neuronaux employant notre méthode. Nous finissons par mettre en lumière les liens forts qui existent entre apprentissage dual et les méthodes de distillation des connaissances. Notre approche a la particularité d’économiser des ressources computationnelles lors de la phase d’inférence. Enfin, dans la partie finale, nous concluons en mettant en lumière les contributions du mémoire, nous détaillons aussi des travaux futurs, nous approchons le problème avec les modèles basés sur l’énergie, en apprenant un paysage d’énergie lors de l’entrainement, le modèle peut ensuite lors de l’inférence employer une capacité de calcul dépendant de la difficulté de l’exemple auquel il fait face plutôt qu’une simple propagation avant fixe ayant systématiquement le même coût calculatoire. Bien qu’ayant eu des résultats expérimentaux infructueux, nous analysons les promesses que peuvent tenir une telle approche et nous émettons des hypothèses sur les améliorations potentielles à effectuer. Nous espérons, avec nos contributions, ouvrir la voie vers des algorithmes faisant un meilleur usage de leurs ressources computationnelles et devenant par conséquent plus efficace en termes de coût et de performance, ainsi que permettre une compréhension plus intime des liens qui existent entre certaines méthodes en apprentissage machine et la théorie du processus dual. / The dual-process theory states that human cognition operates in two distinct modes: one for rapid, habitual and associative processing, commonly referred to as "system 1", and the second, with slower, deliberate and controlled processing, which we call "system 2". This distinction points to an important underlying feature of human cognition: the ability to switch adaptively to different computational strategies depending on the situation. This ability has long been studied in various fields, and many hypothetical benefits seem to be linked to it. However, deep neural networks are often built without this ability to optimally manage their computational resources. This limitation of current models is all the more worrying as more and more recent work seems to show a linear relationship between the computational capacity used and model performance during the evaluation phase. To solve this problem, this thesis proposes different approaches and studies their impact on models. First, we study a deep reinforcement learning agent that is able to allocate more computation to more difficult situations. Our approach allows the agent to adapt its computational resources according to the demands of the situation in which it finds itself, which in addition to improving computation time, enhances transfer between related tasks and generalization capacity. The central idea common to all our approaches is based on cost-of-effort theories from the cognitive control literature, which stipulate that by making the use of cognitive resources costly for the agent, and allowing it to allocate them when making decisions, it will itself learn to deploy its computational capacity optimally. We then study variations of the method on a reference deep learning task, to analyze precisely how the model behaves and what the benefits of adopting such an approach are. We also create our own task "Stroop MNIST" inspired by the Stroop test used in psychology to validate certain hypotheses about the behavior of neural networks employing our method. We end by highlighting the strong links between dual learning and knowledge distillation methods. Finally, we approach the problem with energy-based models, by learning an energy landscape during training, the model can then during inference employ a computational capacity dependent on the difficulty of the example it is dealing with rather than a simple fixed forward propagation having systematically the same computational cost. Despite unsuccessful experimental results, we analyze the promise of such an approach and speculate on potential improvements. With our contributions, we hope to pave the way for algorithms that make better use of their computational resources, and thus become more efficient in terms of cost and performance, as well as providing a more intimate understanding of the links that exist between certain machine learning methods and dual process theory.

Page generated in 0.1094 seconds