141

Finding the QRS Complex in a Sampled ECG Signal Using AI Methods / Hitta QRS komplex in en samplad EKG signal med AI metoder

Skeppland Hole, Jeanette Marie Victoria January 2023 (has links)
This study aimed to explore the application of artificial intelligence (AI) and machine learning (ML) techniques in implementing a QRS detector for ambulatory electrocardiography (ECG) monitoring devices. Three ML models, namely long short-term memory (LSTM), convolutional neural network (CNN), and multilayer perceptron (MLP), were compared and evaluated using the MIT-BIH arrhythmia database (MITDB) and the MIT-BIH noise stress test database (NSTDB). The MLP model consistently outperformed the other models, achieving high accuracy in R-peak detection. However, when tested on noisy data, all models faced challenges in accurately predicting R-peaks, indicating the need for further improvement. To address this, the study emphasized the importance of iteratively refining the input data configurations for achieving accurate R-peak detection. By incorporating both the MITDB and NSTDB during training, the models demonstrated improved generalization to noisy signals. This iterative refinement process allowed for the identification of the best models and configurations, consistently surpassing existing ML-based implementations and outperforming the current ECG analysis system. The MLP model, without shifting segments and utilizing both datasets, achieved an outstanding accuracy of 99.73 % in R-peak detection. This accuracy exceeded values reported in the literature, demonstrating the superior performance of this approach. Furthermore, the shifted MLP model, which considered temporal dependencies by incorporating shifted segments, showed promising results with an accuracy of 99.75 %. It exhibited enhanced accuracy, precision, and F1-score compared to the other models, highlighting the effectiveness of incorporating shifted segments. For future research, it is important to address challenges such as overfitting and validate the models on independent datasets. Additionally, continuous refinement and optimization of the input data configurations will contribute to further advancements in ECG signal analysis and improve the accuracy of R-peak detection. This study underscores the potential of ML techniques in enhancing ECG analysis, ultimately leading to improved cardiac diagnostics and better patient care. / Syftet med denna studie var att utforska användningen av AI- och ML-tekniker för att implementera en QRS-detektor i EKG-övervakningsenheter. Tre olika ML-modeller, LSTM, CNN och MLP, jämfördes och utvärderades med hjälp av MITDB och NSTDB. Resultaten visade att MLP-modellen konsekvent presterade bättre än de andra modellerna och uppnådde hög noggrannhet vid detektion av R-toppar i EKG-signalen. Trots detta stötte alla modeller på utmaningar när de testades på brusig realtidsdata, vilket indikerade behovet av ytterligare förbättringar. För att hantera dessa utmaningar betonade studien vikten av att iterativt förbättra konfigurationen av indata för att uppnå noggrann detektering av R-toppar. Genom att inkludera både MITDB och NSTDB under träningen visade modellerna förbättrad förmåga att generalisera till brusiga signaler. Denna iterativa process möjliggjorde identifiering av de bästa modellerna och konfigurationerna, vilka konsekvent överträffade befintliga ML-baserade implementeringar och presterade bättre än det nuvarande EKG-analyssystemet. MLP-modellen, utan användning av skiftade segment och med båda databaserna, uppnådde en imponerande noggrannhet på 99,73 % vid detektion av R-toppar. Denna noggrannhet överträffade tidigare studier och visade på den överlägsna prestandan hos denna metod.
Dessutom visade den skiftade MLP-modellen, som inkluderade skiftade segment för att beakta tidsberoenden, lovande resultat med en noggrannhet på 99,75 %. Modellen uppvisade förbättrad noggrannhet, precision och F1-score jämfört med de andra modellerna, vilket betonar vikten av att inkludera skiftade segment. För framtida studier är det viktigt att hantera utmaningar som överanpassning och att validera modellerna med oberoende datamängder. Dessutom kommer en kontinuerlig förfining och optimering av konfigurationen av indata att bidra till ytterligare framsteg inom EKG-signalanalys och förbättrad noggrannhet vid detektion av R-toppar. Denna studie understryker potentialen hos ML-modeller för att förbättra EKG-analysen och därigenom bidra till förbättrad diagnostik av hjärtsjukdomar och högre kvalitet inom patientvården.
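The segment-based formulation described in this abstract can be sketched with a small scikit-learn MLP: slice the signal into fixed-length segments and classify whether a segment contains an R-peak. The window length, network size, and synthetic ECG-like signal below are illustrative assumptions, not the thesis's actual MITDB/NSTDB configuration.

```python
# Minimal sketch of segment-based R-peak detection with an MLP.
# The synthetic signal and all hyperparameters are placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

WINDOW = 40  # samples per segment (assumed; MITDB records are sampled at 360 Hz)

def make_segments(signal, peak_indices, window=WINDOW):
    """Slice the ECG into fixed-length segments; a segment is labeled
    positive if it contains an annotated R-peak."""
    peaks = set(int(p) for p in peak_indices)
    X, y = [], []
    for start in range(0, len(signal) - window, window):
        X.append(signal[start:start + window])
        y.append(int(any(p in peaks for p in range(start, start + window))))
    return np.array(X), np.array(y)

# Synthetic stand-in for an ECG record: noise plus periodic spikes.
rng = np.random.default_rng(0)
sig = rng.normal(0, 0.05, 36000)
true_peaks = np.arange(300, 36000, 300)  # ~72 bpm at 360 Hz
sig[true_peaks] += 1.0

X, y = make_segments(sig, true_peaks)
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
model.fit(X[:800], y[:800])
print("held-out segment accuracy:", model.score(X[800:], y[800:]))
```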
142

ENERGY EFFICIENT EDGE INFERENCE SYSTEMS

Soumendu Kumar Ghosh (14060094) 07 August 2023 (has links)
Deep Learning (DL)-based edge intelligence has garnered significant attention in recent years due to the rapid proliferation of the Internet of Things (IoT), embedded, and intelligent systems, collectively termed edge devices. Sensor data streams acquired by these edge devices are processed by a Deep Neural Network (DNN) application that runs on the device itself or in the cloud. However, the high computational complexity and energy consumption of processing DNNs often limit their deployment on these edge inference systems due to limited compute, memory and energy resources. Furthermore, high costs, strict application latency demands, data privacy, security constraints, and the absence of reliable edge-cloud network connectivity heavily impact edge application efficiency in the case of cloud-assisted DNN inference. Inevitably, performance and energy efficiency are of utmost importance in these edge inference systems, aside from the accuracy of the application. To facilitate energy-efficient edge inference systems running computationally complex DNNs, this dissertation makes three key contributions.

The first contribution adopts a full-system approach to Approximate Computing, a design paradigm that trades off a small degradation in application quality for significant energy savings. Within this context, we present the foundational concepts of AxIS, the first approximate edge inference system that jointly optimizes the constituent subsystems, leading to substantial energy benefits compared to optimization of the individual subsystems. To illustrate the efficacy of this approach, we demonstrate multiple versions of an approximate smart camera system that executes various DNN-based unimodal computer vision applications, showcasing how the sensor, memory, compute, and communication subsystems can all be synergistically approximated for energy-efficient edge inference.

Building on this foundation, the second contribution extends AxIS to multimodal AI, harnessing data from multiple sensor modalities to impart human-like cognitive and perceptual abilities to edge devices. By exploring optimization techniques for multiple sensor modalities and subsystems, this research reveals the impact of synergistic modality-aware optimizations on system-level accuracy-efficiency (AE) trade-offs, culminating in the introduction of SysteMMX, the first AE scalable cognitive system that allows efficient multimodal inference at the edge. To illustrate the practicality and effectiveness of this approach, we present an in-depth case study centered around a multimodal system that leverages RGB and Depth sensor modalities for image segmentation tasks.

The final contribution focuses on optimizing the performance of an edge-cloud collaborative inference system through intelligent DNN partitioning and computation offloading. We delve into the realm of distributed inference across edge devices and cloud servers, unveiling the challenges associated with finding the optimal partitioning point in DNNs for significant inference latency speedup. To address these challenges, we introduce PArtNNer, a platform-agnostic and adaptive DNN partitioning framework capable of dynamically adapting to changes in communication bandwidth and cloud server load. Unlike existing approaches, PArtNNer does not require pre-characterization of underlying edge computing platforms, making it a versatile and efficient solution for real-world edge-cloud scenarios.

Overall, this thesis provides novel insights, innovative techniques, and intelligent solutions to enable energy-efficient AI at the edge. The contributions presented herein serve as a solid foundation for future researchers to build upon, driving innovation and shaping the trajectory of research in edge AI.
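The partitioning problem described in the final contribution can be made concrete with a toy latency model: run a prefix of the network on the edge device, transmit the intermediate activation, and finish in the cloud. All numbers below are invented for illustration, and PArtNNer's online adaptation to bandwidth and server load is not modeled.

```python
# Toy latency model for edge-cloud DNN partitioning: run layers 1..k on
# the edge, ship the intermediate tensor, run the rest in the cloud.
edge_ms  = [4.0, 6.0, 9.0, 12.0, 3.0]   # per-layer edge compute time (assumed)
cloud_ms = [0.5, 0.8, 1.2, 1.5, 0.4]    # per-layer cloud compute time (assumed)
out_kb   = [512, 256, 64, 16, 4]        # activation size after each layer (assumed)
bw_kb_ms = 25.0                         # link bandwidth in KB per ms (assumed)

def total_latency(k):
    """End-to-end latency when splitting after layer k (k = 0: all cloud)."""
    edge = sum(edge_ms[:k])
    transfer = (out_kb[k - 1] if k > 0 else 512) / bw_kb_ms  # raw input: 512 KB
    cloud = sum(cloud_ms[k:])
    return edge + transfer + cloud

best = min(range(len(edge_ms) + 1), key=total_latency)
print("best split after layer", best, "->", round(total_latency(best), 2), "ms")
```

With these numbers the optimum lands at an intermediate layer: later activations are small enough that shipping them beats computing everything locally, but shipping the raw input is costlier than a few edge layers. An adaptive framework re-evaluates this trade-off as bandwidth and server load change.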
143

[pt] ENSAIOS EM MODELOS DE DOIS ESTÁGIOS EM SISTEMAS DE POTÊNCIAS: CONTRIBUIÇÕES EM MODELAGEM E APLICAÇÕES DO MÉTODO DE GERAÇÃO DE LINHAS E COLUNAS / [en] ESSAYS ON TWO-STAGE ROBUST MODELS FOR POWER SYSTEMS: MODELING CONTRIBUTIONS AND APPLICATIONS OF THE COLUMN-AND-CONSTRAINT-GENERATION ALGORITHM

ALEXANDRE VELLOSO PEREIRA RODRIGUES 07 December 2020 (has links)
[pt] Esta dissertação está estruturada como uma coleção de cinco artigos formatados em capítulos. Os quatro primeiros artigos apresentam contribuições em modelagem e metodológicas para problemas de operação ou investimento em sistemas de potência usando o arcabouço de otimização robusta adaptativa e modificações no algoritmo de geração de linhas e colunas (CCGA). O primeiro artigo aborda a programação de curto prazo com restrição de segurança, onde a resposta automática de geradores é considerada. Um modelo robusto de dois estágios é adotado, resultando em complexas instâncias de programação inteira mista, que apresentam variáveis binárias associadas às decisões de primeiro e segundo estágios. Um novo CCGA que explora a estrutura do problema é desenvolvido. O segundo artigo usa redes neurais profundas para aprender o mapeamento das demandas nodais aos pontos de ajuste dos geradores para o problema do primeiro artigo. O CCGA é usado para garantir a viabilidade da solução. Este método resulta em importantes ganhos computacionais em relação ao primeiro artigo. O terceiro artigo propõe uma abordagem adaptativa em dois estágios para um modelo robusto de programação diária no qual o conjunto de incerteza poliedral é caracterizado diretamente a partir dos dados de geração não despachável observados. O problema resultante é adequado ao CCGA. O quarto artigo propõe um modelo de dois estágios adaptativo, robusto em distribuição, para expansão de transmissão, incorporando incertezas de longo e curto prazo. Um novo CCGA é desenvolvido para lidar com os subproblemas. Finalmente, sob uma perspectiva diferente e generalista, o quinto artigo investiga a adequação de prêmios de incentivo para promover inovações em aspectos teóricos e computacionais para os desafios de sistemas de potência modernos. / [en] This dissertation is structured as a collection of five papers formatted as chapters. The first four papers provide modeling and methodological contributions in scheduling or investment problems in power systems using the adaptive robust optimization framework and modifications to the column-and-constraint-generation algorithm (CCGA). The first paper addresses the security-constrained short-term scheduling problem where automatic primary response is considered. A two-stage robust model is adopted, resulting in complex mixed-integer linear instances featuring binary variables associated with first- and second-stage decisions. A new tailored CCGA which exploits the structure of the problem is devised. The second paper uses deep neural networks for learning the mapping of nodal demands onto generators' set points for the first paper's model. Robust-based modeling approaches and the CCGA are used to enforce feasibility of the solution. This method results in important computational gains as compared to results of the first paper. The third paper proposes an adaptive data-driven approach for a two-stage robust unit commitment model, where the polyhedral uncertainty set is characterized directly from data, through the convex hull of a set of previously observed non-dispatchable generation profiles. The resulting problem is suitable for the exact CCGA. The fourth paper proposes an adaptive two-stage distributionally robust transmission expansion model incorporating long- and short-term uncertainties. A novel extended CCGA is devised to tackle distributionally robust subproblems.
Finally, under a different and higher-level perspective, the fifth paper investigates the adequacy of systematic inducement prizes for fostering innovations in theoretical and computational aspects for various modern power systems challenges.
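All four methodological papers lean on the column-and-constraint-generation loop, whose master/subproblem structure can be illustrated on an invented toy: choose capacity against a worst-case demand. The finite demand set stands in for the vertices of a polyhedral uncertainty set, and a single aggregated recourse variable eta replaces the per-scenario second-stage variables of the real models; this is far simpler than the papers' mixed-integer formulations.

```python
# Toy CCG loop for a two-stage robust problem: choose capacity x at unit
# cost 1; after demand u is revealed, unmet demand costs 10 per unit.
from scipy.optimize import linprog

U = [2.0, 5.0, 8.0]      # candidate worst-case demands (assumed scenario set)
enumerated = [U[0]]      # scenarios added to the master so far

for _ in range(10):
    # Master: min x + eta  s.t.  eta >= 10*(u - x) for each enumerated u,
    # written as A_ub @ (x, eta) <= b_ub.
    A = [[-10.0, -1.0] for _ in enumerated]
    b = [-10.0 * u for u in enumerated]
    res = linprog(c=[1.0, 1.0], A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
    x, eta = res.x
    # Subproblem: worst-case recourse cost for the current first stage x.
    worst_u = max(U, key=lambda u: 10.0 * max(0.0, u - x))
    worst_cost = 10.0 * max(0.0, worst_u - x)
    if worst_cost <= eta + 1e-6:   # no violated scenario: optimal
        break
    enumerated.append(worst_u)     # add the violated scenario's constraints

print(f"robust capacity x = {x:.2f}, total cost = {x + eta:.2f}")
```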
144

Deep Neural Networks for Context Aware Personalized Music Recommendation : A Vector of Curation / Djupa neurala nätverk för kontextberoende personaliserad musikrekommendation

Bahceci, Oktay January 2017 (has links)
Information Filtering and Recommender Systems have been used and implemented in various ways by various entities since the dawn of the Internet, and state-of-the-art approaches rely on Machine Learning and Deep Learning in order to create accurate and personalized recommendations for users in a given context. These models require large amounts of data with a variety of features such as time, location and user data in order to find correlations and patterns that classical models such as matrix factorization and collaborative filtering cannot. This thesis researches, implements and compares a variety of models, with a primary focus on Machine Learning and Deep Learning, for the task of music recommendation, and does so successfully by representing the task of recommendation as a multi-class extreme classification task with 100 000 distinct labels. Across fourteen different experiments, all implemented models successfully learn features such as time, location, user features and previous listening history in order to create context-aware personalized music predictions, and solve the cold-start problem by using user demographic information, with the best model capturing the intended label in its top-100 list of recommended items for more than 1/3 of the unseen data in an offline evaluation on randomly selected examples from the unseen following week. / Informationsfiltrering och rekommendationssystem har använts och implementerats på flera olika sätt av olika enheter sedan gryningen av Internet, och moderna tillvägagångssätt beror på maskininlärning samt djupinlärning för att kunna skapa precisa och personliga rekommendationer för användare i en given kontext. Dessa modeller kräver data i stora mängder med en varians av kännetecken såsom tid, plats och användardata för att kunna hitta korrelationer samt mönster som klassiska modeller såsom matrisfaktorisering samt samverkande filtrering inte kan. Detta examensarbete forskar, implementerar och jämför en mängd av modeller med fokus på maskininlärning samt djupinlärning för musikrekommendation och gör det med succé genom att representera rekommendationsproblemet som ett extremt multi-klass klassifikationsproblem med 100 000 unika klasser att välja utav. Genom att jämföra fjorton olika experiment, så lär alla modeller sig kännetecken såsom tid, plats, användarkännetecken och lyssningshistorik för att kunna skapa kontextberoende personaliserade musikprediktioner, och löser kallstartsproblemet genom användning av användares demografiska kännetecken, där den bästa modellen klarar av att fånga målklassen i sin rekommendationslista med längd 100 för mer än 1/3 av det osedda datat under en offline-evaluering, när slumpmässigt valda exempel från den osedda kommande veckan evalueras.
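The evaluation described above, asking whether the true label lands in the model's top-100 list out of 100 000 classes, can be sketched as follows. The random user/item embedding scorer is only a placeholder for the thesis's context-aware models; it makes the recall@K metric concrete.

```python
# Recall@K for extreme multi-class recommendation: does the true next
# item appear among the K highest-scoring labels? Scores here come from
# random embeddings, so recall should be near the chance level K/N.
import numpy as np

N_ITEMS, DIM, K = 100_000, 32, 100
rng = np.random.default_rng(1)
item_emb = rng.normal(size=(N_ITEMS, DIM))

def recall_at_k(user_vecs, true_items, k=K):
    hits = 0
    for u, target in zip(user_vecs, true_items):
        scores = item_emb @ u                    # one score per label
        topk = np.argpartition(scores, -k)[-k:]  # unordered top-k label ids
        hits += int(target in topk)
    return hits / len(true_items)

users = rng.normal(size=(50, DIM))
targets = rng.integers(0, N_ITEMS, size=50)
print("recall@100:", recall_at_k(users, targets))  # ~0.001 for random scores
```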
145

Finding duplicate offers in the online marketplace catalogue using transformer based methods : An exploration of transformer based methods for the task of entity resolution / Hitta dubbletter av erbjudanden i online marknadsplatskatalog med hjälp av transformer-baserade metoder : En utforskning av transformer-baserad metoder för uppgiften att deduplicera

Damian, Robert-Andrei January 2022 (has links)
The amount of data available on the web is constantly growing, and e-commerce websites are no exception. Considering the abundance of available information, finding offers for the same product in the catalogues of different retailers represents a challenge. This problem is an interesting one and addresses the needs of multiple actors. A customer is interested in finding the best deal for the product they want to buy. A retailer wants to keep up to date with the competition and adapt its pricing strategy accordingly. Various services already offer the possibility of finding duplicate products in the catalogues of e-commerce retailers, but their solutions are based on matching a Global Trade Identification Number (GTIN). This strategy is limited because a GTIN may not be made publicly available by a competitor, may be different for the same product exported by the manufacturer to different markets, or may not even exist for low-value products. The field of Entity Resolution (ER), a sub-branch of Natural Language Processing (NLP), focuses on solving the issue of matching duplicate database entries when a deterministic identifier is not available. We investigate various solutions from the field and present a new model called Spring R-SupCon that focuses on low-volume datasets. Our work builds upon the recently introduced model R-SupCon, introducing a new learning scheme that improves R-SupCon's performance by up to 74.47% F1 score, and surpasses Ditto by up to 12% F1 score for low-volume datasets. Moreover, our experiments show that smaller language models can be used for ER with minimal loss in performance. This has the potential to extend the adoption of Transformer-based solutions to companies and markets where datasets are difficult to create, as is the case for the Swedish marketplace Fyndiq. / Mängden data på internet växer konstant och e-handeln är inget undantag. Konsumenter har idag många valmöjligheter när det gäller varifrån de väljer att göra sina inköp. Detta gör att det blir svårare och svårare att hitta det bästa erbjudandet. Även för återförsäljare ökar svårigheten att veta vilken konkurrent som har lägst pris. Det finns tillgängliga lösningar på detta problem men de använder produktunika identifierare såsom Global Trade Identification Number (förkortat "GTIN"). Då det finns en rad utmaningar med att bara förlita sig på lösningar som baseras på GTIN behövs ett alternativt tillvägagångssätt. GTIN är exempelvis inte offentlig information och identifieraren kan dessutom vara en annan när samma produkt erbjuds på en annan marknad. Det här projektet undersöker alternativa lösningar som inte är baserade på en deterministisk identifierare. Projektet förlitar sig istället på text såsom produktens namn för att fastställa matchningar mellan olika erbjudanden. En rad olika implementeringar baserade på maskininlärning och djupinlärning studeras i detta projekt. Projektet har dock ett särskilt fokus på "Transformer"-baserade språkmodeller såsom BERT. Projektet visar även hur man genererar proprietär data, föreslår ett nytt inlärningsschema och bevisar dess fördelar. / Le volume des données qui se trouvent sur Internet est en augmentation constante et les commerces électroniques ne font pas exception. Le consommateur a aujourd'hui beaucoup d'options quand il décide d'où faire son achat. Trouver le meilleur prix devient de plus en plus difficile. Les entreprises qui gèrent ces plates-formes ont aussi la difficulté de savoir à tout moment lesquels de leurs concurrents ont le meilleur prix. Il y a déjà des solutions en ligne qui ont l'objectif de résoudre ce problème, mais elles utilisent un identifiant de produit unique qui s'appelle Global Trade Identification Number (ou GTIN). Plusieurs difficultés limitent cette solution. Par exemple, le GTIN n'est pas toujours public, ou des GTIN différents peuvent être assignés par le fabricant au même produit pour distinguer des marchés différents. Ce projet étudie des solutions alternatives qui ne sont pas basées sur un identifiant unique. On discute des méthodes qui prennent la décision en fonction du nom des produits, en utilisant des algorithmes d'apprentissage automatique ou d'apprentissage profond. Le projet se concentre sur des solutions avec des modèles de langage de type « Transformer », comme BERT. On voit aussi comment créer un ensemble de données propriétaire pour entraîner le modèle. Finalement, une nouvelle méthode d'apprentissage est proposée et analysée.
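A minimal sketch of the pairwise transformer formulation that Ditto and R-SupCon build on: serialize both offers into one tagged sequence and let an encoder classify the pair as match or non-match. The model checkpoint, attribute tags, and untrained classification head below are illustrative assumptions; R-SupCon's supervised-contrastive training stage is not shown.

```python
# Pairwise entity-resolution sketch: serialize two offers and classify.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

def serialize(offer: dict) -> str:
    """Flatten an offer's attributes into tagged text (Ditto-style)."""
    return " ".join(f"[COL] {k} [VAL] {v}" for k, v in offer.items())

a = {"title": "Apple iPhone 13 128GB Blue", "price": "699"}
b = {"title": "iPhone 13 (128 GB) - blue", "price": "689"}

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # untrained head: demo only

inputs = tok(serialize(a), serialize(b), truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print("match probability (untrained head):", float(probs[0, 1]))
```

In a real pipeline the head would be fine-tuned on labeled match/non-match pairs; the thesis's observation that smaller encoders lose little accuracy is what motivates the DistilBERT-sized choice here.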
146

Deep Reinforcement Learning Adaptive Traffic Signal Control / Reinforcement Learning Traffic Signal Control

Genders, Wade 22 November 2018 (has links)
Sub-optimal automated transportation control systems incur high mobility, human health and environmental costs. With society reliant on its transportation systems for the movement of individuals, goods and services, minimizing these costs benefits many. Intersection traffic signal controllers are an important element of modern transportation systems that govern how vehicles traverse road infrastructure. Many types of traffic signal controllers exist: fixed-time, actuated and adaptive. Adaptive traffic signal controllers seek to minimize transportation costs through dynamic control of the intersection. However, many existing adaptive traffic signal controllers rely on heuristic or expert knowledge and were not originally designed for scalability or for transportation's big data future. This research addresses the aforementioned challenges by developing a scalable system for adaptive traffic signal control model development using deep reinforcement learning in traffic simulation. Traffic signal control can be modelled as a sequential decision-making problem; reinforcement learning can solve sequential decision-making problems by learning an optimal policy. Deep reinforcement learning makes use of deep neural networks, powerful function approximators which benefit from large amounts of data. Distributed, parallel computing techniques are used to provide scalability, with the proposed methods validated on a simulation of the City of Luxembourg, Luxembourg, consisting of 196 intersections. This research contributes to the body of knowledge by successfully developing a scalable system for adaptive traffic signal control model development and validating it on the largest traffic microsimulation in the literature. The proposed system reduces delay, queues, vehicle stopped time and travel time compared to conventional traffic signal controllers. A key finding is that reinforcement learning methods which explicitly learn a policy offer improved performance over purely value-based methods. The developed methods are expected to mitigate the problems caused by sub-optimal automated traffic signal control systems, improving mobility and human health and reducing environmental costs. / Thesis / Doctor of Philosophy (PhD) / Inefficient transportation systems negatively impact mobility, human health and the environment. The goal of this research is to mitigate these negative impacts by improving automated transportation control systems, specifically intersection traffic signal controllers. This research presents a system for developing adaptive traffic signal controllers that can efficiently scale to the size of cities by using machine learning and parallel computation techniques. The proposed system is validated by developing adaptive traffic signal controllers for 196 intersections in a simulation of the City of Luxembourg, Luxembourg, successfully reducing delay, queues, vehicle stopped time and travel time.
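The sequential decision-making formulation can be made concrete with a toy one-intersection example: the state is the queue length on each approach, the action chooses which approach gets green, and the reward is the negative total queue. Everything below, including the arrival model, the linear softmax policy, and the bandit-style one-step policy-gradient update, is an invented miniature, not the deep, city-scale system developed in the thesis.

```python
# Toy policy-gradient traffic signal control for a single intersection.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros((2, 2))  # linear policy preferences: 2 actions x 2 state features

def step(queues, action):
    """Cars arrive randomly; the approach given green discharges vehicles."""
    arrivals = rng.poisson(0.4, size=2)
    served = np.array([3, 0]) if action == 0 else np.array([0, 3])
    new_q = np.maximum(queues + arrivals - served, 0)
    return new_q, -new_q.sum()  # reward: negative total queue length

for episode in range(200):
    q = rng.integers(0, 5, size=2).astype(float)
    for t in range(50):
        logits = w @ q
        p = np.exp(logits - logits.max()); p /= p.sum()  # softmax policy
        a = rng.choice(2, p=p)
        q_next, r = step(q, a)
        grad = -p[:, None] * q[None, :]   # d log pi(a|s) / dw for all actions
        grad[a] += q
        w += 0.001 * r * grad             # one-step reward-weighted update
        q = q_next

print("learned policy weights:\n", w.round(2))
```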
147

ACCELERATING SPARSE MACHINE LEARNING INFERENCE

Ashish Gondimalla (14214179) 17 May 2024 (has links)
Convolutional neural networks (CNNs) have become important workloads due to their impressive accuracy in tasks like image classification and recognition. Convolution operations are compute-intensive, and this cost profoundly increases with newer and better CNN models. However, convolutions come with characteristics such as sparsity which can be exploited. In this dissertation, we propose three different works to capture sparsity for faster performance and reduced energy.

The first work is an accelerator design called SparTen for improving two-sided sparsity (i.e., sparsity in both filters and feature maps) convolutions with fine-grained sparsity. SparTen identifies the efficient inner join as the key primitive for hardware acceleration of sparse convolution. In addition, SparTen proposes load balancing schemes for higher compute unit utilization. SparTen performs 4.7x, 1.8x and 3x better than a dense architecture, a one-sided architecture and SCNN, the previous state-of-the-art accelerator. The second work, BARISTA, scales up SparTen (and SparTen-like proposals) to a large-scale implementation with as many compute units as recent dense accelerators (e.g., Google's Tensor Processing Unit) to achieve the full speedups afforded by sparsity. However, at such large scales, buffering, on-chip bandwidth, and compute utilization are highly intertwined, where optimizing for one factor strains another and may invalidate some optimizations proposed in small-scale implementations. BARISTA proposes novel techniques to balance the three factors in large-scale accelerators. BARISTA performs 5.4x, 2.2x, 1.7x and 2.5x better than dense, one-sided, naively scaled two-sided and iso-area two-sided architectures, respectively. The last work, EUREKA, builds an efficient tensor core to execute dense, structured and unstructured sparsity without losing efficiency. EUREKA achieves this by proposing novel techniques to improve compute utilization by slightly tweaking operand stationarity. EUREKA achieves speedups of 5x and 2.5x, along with energy reductions of 3.2x and 1.7x, over dense and structured-sparse execution, respectively. EUREKA only incurs area and power overheads of 6% and 11.5%, respectively, over Ampere.
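The inner-join primitive SparTen identifies can be illustrated in a few lines: with filters and feature maps stored as a bitmask of nonzero positions plus packed values, only positions set in both bitmasks contribute multiplies. The encoding below is a software sketch of the idea, not SparTen's hardware design.

```python
# Inner join of two compressed sparse vectors: a bitmask marks nonzero
# positions, and values are packed densely in position order. Work is
# done only where BOTH bitmasks have a bit set.
def sparse_dot(mask_a, vals_a, mask_b, vals_b):
    both = mask_a & mask_b     # positions that contribute a multiply
    acc = 0.0
    ia = ib = 0                # cursors into the packed value arrays
    for pos in range(both.bit_length()):
        bit = 1 << pos
        if mask_a & bit and mask_b & bit:
            acc += vals_a[ia] * vals_b[ib]
        if mask_a & bit: ia += 1
        if mask_b & bit: ib += 1
    return acc

# vector a = [0, 2, 0, 3], vector b = [5, 4, 0, 1]  (bit i = position i)
print(sparse_dot(0b1010, [2, 3], 0b1011, [5, 4, 1]))  # 2*4 + 3*1 = 11.0
```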
148

Minds, Machines & Metaphors : Limits of AI Understanding

Másson, Mímir January 2024 (has links)
This essay critically examines the limitations of artificial intelligence (AI) in achieving human-like understanding and intelligence. Despite significant advancements in AI, such as the development of sophisticated machine learning algorithms and neural networks, current systems fall short in comprehending the cognitive depth and flexibility inherent in human intelligence. Through an exploration of historical and contemporary arguments, including Searle's Chinese Room thought experiment and Dennett's Frame Problem, this essay highlights the inherent differences between human cognition and AI. Central to this analysis is the role of metaphorical thinking and embodied cognition, as articulated by Lakoff and Johnson, which are fundamental to human understanding but absent in AI. Proponents of AGI, like Kurzweil and Bostrom, argue for the potential of AI to surpass human intelligence through recursive self-improvement and technological integration. However, this essay contends that these approaches do not address the core issues of experiential knowledge and contextual awareness. By integrating insights from contemporary scholars like Bender, Koller, Buckner, Thorstad, and Hoffmann, the essay ultimately concludes that AI, while a powerful computational framework, is fundamentally incapable of replicating the true intelligence and understanding unique to humans.
149

Reparametrization in deep learning

Dinh, Laurent 02 1900 (has links)
No description available.
150

Towards computationally efficient neural networks with adaptive and dynamic computations

Kim, Taesup 08 1900 (has links)
Ces dernières années, l'intelligence artificielle a été considérablement avancée et l'apprentissage en profondeur, où des réseaux de neurones profonds sont utilisés pour tenter d'imiter vaguement le cerveau humain, y a contribué de manière significative. Les réseaux de neurones profonds sont désormais capables d'obtenir un grand succès sur la base d'une grande quantité de données et de ressources de calcul suffisantes. Malgré leur succès, leur capacité à s'adapter rapidement à de nouveaux concepts, tâches et environnements est assez limitée voire inexistante. Dans cette thèse, nous nous intéressons à la façon dont les réseaux de neurones profonds peuvent s'adapter à des circonstances en constante évolution ou totalement nouvelles, de la même manière que l'intelligence humaine, et introduisons en outre des modules architecturaux adaptatifs et dynamiques ou des cadres de méta-apprentissage pour que cela se produise de manière efficace sur le plan informatique. Cette thèse consiste en une série d'études proposant des méthodes pour utiliser des calculs adaptatifs et dynamiques pour aborder les problèmes d'adaptation, étudiés sous différentes perspectives telles que les adaptations au niveau de la tâche, au niveau temporel et au niveau du contexte. Dans le premier article, nous nous concentrons sur l'adaptation rapide aux tâches basée sur un cadre de méta-apprentissage. Plus précisément, nous étudions l'incertitude du modèle induite par l'adaptation rapide à une nouvelle tâche avec quelques exemples. Ce problème est atténué en combinant un méta-apprentissage efficace basé sur les gradients avec une inférence variationnelle non paramétrique dans un cadre probabiliste fondé sur des principes. Le développement d'une méthode d'apprentissage bayésienne à quelques exemples pour éviter le surapprentissage au niveau des tâches constitue une étape importante vers un méta-apprentissage robuste. Dans le deuxième article, nous essayons d'améliorer les performances de la prédiction de séquences (c'est-à-dire du futur) en introduisant une prédiction du futur par sauts, basée sur une taille de pas adaptative. C'est une capacité critique pour un agent intelligent explorant un environnement, qui permet un apprentissage efficace des options et une imagination du futur par sauts. Nous rendons cela possible en introduisant le modèle hiérarchique d'espace d'état récurrent (HRSSM) qui peut découvrir la structure temporelle latente (par exemple, les sous-séquences) tout en modélisant ses transitions d'état stochastiques de manière hiérarchique. Enfin, dans le dernier article, nous étudions un cadre qui peut capturer le contexte global dans les données d'image de manière adaptative et traiter davantage les données en fonction de ces informations. Nous implémentons ce cadre en extrayant des concepts visuels de haut niveau à travers des modules d'attention et en utilisant un raisonnement basé sur des graphes pour en saisir le contexte global. De plus, des transformations au niveau des caractéristiques sont utilisées pour propager le contexte global à tous les descripteurs locaux de manière adaptative. / Over the past few years, artificial intelligence has been greatly advanced, and deep learning, where deep neural networks are used to attempt to loosely emulate the human brain, has significantly contributed to it. Deep neural networks are now able to achieve great success based on a large amount of data and sufficient computational resources.
Despite their success, their ability to quickly adapt to new concepts, tasks, and environments is quite limited or even non-existent. In this thesis, we are interested in how deep neural networks can become adaptive to continually changing or totally new circumstances, similarly to human intelligence, and further introduce adaptive and dynamic architectural modules or meta-learning frameworks to make it happen in computationally efficient ways. This thesis consists of a series of studies proposing methods to utilize adaptive and dynamic computations to tackle adaptation problems that are investigated from different perspectives such as task-level, temporal-level, and context-level adaptations. In the first article, we focus on task-level fast adaptation based on a meta-learning framework. More specifically, we investigate the inherent model uncertainty that is induced from quickly adapting to a new task with a few examples. This problem is alleviated by combining the efficient gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. It is an important step towards robust meta-learning that we develop a Bayesian few-shot learning method to prevent task-level overfitting. In the second article, we attempt to improve the performance of sequence (i.e. future) prediction by introducing a jumpy future prediction that is based on the adaptive step size. It is a critical ability for an intelligent agent to explore an environment that enables efficient option-learning and jumpy future imagination. We make this possible by introducing the Hierarchical Recurrent State Space Model (HRSSM) that can discover the latent temporal structure (e.g. subsequences) while also modeling its stochastic state transitions hierarchically. Finally, in the last article, we investigate a framework that can capture the global context in image data in an adaptive way and further process the data based on that information. We implement this framework by extracting high-level visual concepts through attention modules and using graph-based reasoning to capture the global context from them. In addition, feature-wise transformations are used to propagate the global context to all local descriptors in an adaptive way.
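The gradient-based meta-learning loop underlying the first article can be sketched as plain first-order MAML on invented 1-D regression tasks: an inner step adapts shared parameters to a task from a few support points, and the outer step updates the shared parameters using the post-adaptation query loss. The Bayesian extension with nonparametric variational inference is omitted here.

```python
# First-order MAML sketch on synthetic linear-regression tasks.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)          # shared init for linear model y = a*x + b
alpha, beta = 0.05, 0.01     # inner / outer learning rates (assumed)

def loss_grad(params, x, y):
    """Gradient of 0.5 * mean squared error for the linear model."""
    a, b = params
    err = a * x + b - y
    return np.array([(err * x).mean(), err.mean()])  # dL/da, dL/db

for step in range(2000):
    # Sample a task: a random line, with few support and query points.
    a_true = rng.uniform(0.5, 1.5)
    b_true = rng.uniform(-0.5, 0.5)
    xs, xq = rng.uniform(-1, 1, 5), rng.uniform(-1, 1, 5)
    ys, yq = a_true * xs + b_true, a_true * xq + b_true
    adapted = theta - alpha * loss_grad(theta, xs, ys)   # inner adaptation
    theta -= beta * loss_grad(adapted, xq, yq)           # outer meta-update

print("meta-learned init:", theta.round(2))  # drifts toward task-distribution center
```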
