Global ETD Search

11	A Systematic Literature Review on Meta Learning for Predictive Maintenance in Industry 4.0 Fisenkci, Ahmet January 2022 Recent refinements in Industry 4.0 and Machine Learning demonstrate the positive effects of using deep learning models for intelligent maintenance. The primary benefit of Deep Learning (DL) is its capability to extract attributes and make fast, accurate, and automated predictions without supervision. However, DL requires high computational power, significant data preprocessing, and vast amounts of data to make accurate predictions for intelligent maintenance. Given these considerable obstacles, meta-learning has been developed as a novel way to overcome these challenges. As a learning technique, meta-learning aims to acquire knowledge of new tasks quickly from minimal available data by learning meta-knowledge. Little research has applied meta-learning to Predictive Maintenance (PdM), and because the outcomes of this technique seem rather promising, we considered it necessary to conduct this review to understand how applicable meta-learning's capabilities and functions are to PdM. The review started with the development of a methodology and four research questions: (1) What is the taxonomy of meta-learning for PdM? (2) What are the current state-of-the-art methodologies? (3) Which datasets are available for meta-learning in PdM? (4) What are the open issues, challenges, and opportunities of meta-learning in PdM? To answer the first and second questions, a new taxonomy was proposed and meta-learning's role in predictive maintenance was identified from 55 selected papers. To answer the third question, we determined which types of datasets exist for this domain and what their characteristics are. Finally, the challenges, open issues, and opportunities of meta-learning in predictive maintenance were examined to answer the last question. 
The answers to the research questions suggest topics for future research. Meta-learning Few-Shot Predictive Maintenance Industry 4.0 Fault detection Fault Prognostics Fault Diagnosis Computer Sciences Datavetenskap (datalogi)
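Few-shot meta-learning is usually trained and evaluated on episodes: small N-way, K-shot tasks, each with a support set used for adaptation and a disjoint query set used for evaluation. The sketch below shows one common way to sample such an episode. It is not taken from the review, and the fault-class names and placeholder signal IDs are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Example = Tuple[object, int]  # (input, episode-local label)

def sample_episode(
    data_by_class: Dict[str, Sequence[object]],
    n_way: int,
    k_shot: int,
    q_queries: int,
    rng: random.Random,
) -> Tuple[List[Example], List[Example]]:
    """Draw one N-way, K-shot episode with disjoint support and query sets."""
    # Pick N classes; sorting the keys makes sampling reproducible for a seed.
    classes = rng.sample(sorted(data_by_class), n_way)
    support: List[Example] = []
    query: List[Example] = []
    for label, cls in enumerate(classes):
        # Sample without replacement so no example lands in both sets.
        items = rng.sample(list(data_by_class[cls]), k_shot + q_queries)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query

# Usage with hypothetical fault classes, ten placeholder signals each.
faults = {name: [f"{name}_{i}" for i in range(10)]
          for name in ["bearing", "gear", "imbalance", "misalignment", "normal"]}
support, query = sample_episode(faults, n_way=3, k_shot=2, q_queries=4,
                                rng=random.Random(0))
print(len(support), len(query))  # 6 12
```

Labels are re-indexed per episode (0 to N-1), so the meta-learner cannot simply memorize global class identities across episodes.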
12	Uncertainty Estimation on Natural Language Processing He, Jianfeng 15 May 2024 Text plays a pivotal role in our daily lives, encompassing various forms such as social media posts, news articles, books, reports, and more. Consequently, Natural Language Processing (NLP) has garnered widespread attention. This technology enables tasks such as text classification, entity recognition, and even crafting responses within a dialogue. However, despite the broad utility of NLP, it frequently necessitates a critical decision: whether to trust a model's predictions. For example, consider a state-of-the-art (SOTA) model entrusted with diagnosing a disease or assessing the veracity of a rumor. An incorrect prediction in such scenarios can have dire consequences, harming individuals' health or tarnishing their reputation. It is therefore imperative to establish a reliable method for evaluating the reliability of an NLP model's predictions, which is our focus: uncertainty estimation in NLP. Although many works have studied uncertainty estimation or NLP, work combining the two domains is rare. This is because most NLP research emphasizes predictive performance but tends to overlook the reliability of NLP model predictions. Additionally, current uncertainty estimation models may not suit NLP because of the unique characteristics of NLP tasks, such as the need for more fine-grained information in named entity recognition. This dissertation therefore proposes novel uncertainty estimation methods for different NLP tasks that take each task's distinct characteristics into account. NLP tasks are categorized into natural language understanding (NLU) and natural language generation (NLG, such as text summarization). NLU tasks can be viewed from two perspectives: a global view (e.g. text classification at the document level) and a local view (e.g. natural language inference at the sentence level and named entity recognition at the token level).
Accordingly, we research uncertainty estimation on three tasks: text classification, named entity recognition, and text summarization. In addition, because few-shot text classification has recently attracted much attention, we also research uncertainty estimation on few-shot text classification. For the first topic, uncertainty estimation on text classification, few uncertainty models focus on improving the performance of text classification where human resources are involved. In response to this gap, our research focuses on making uncertainty scores more accurate by refining the confidence associated with winning scores. We introduce MSD, a novel model comprising three distinct components: 'mix-up,' 'self-ensembling,' and 'distinctiveness score.' The primary objective of MSD is to make uncertainty scores more accurate by mitigating overconfidence in winning scores while simultaneously considering various categories of uncertainty. MSD can be seamlessly integrated with different Deep Neural Networks. Extensive experiments, including ablation settings, on four real-world datasets show consistently competitive improvements. Our second topic is uncertainty estimation on few-shot text classification (UEFTC), in which only a few, or even only one, support sample is available for each class. UEFTC is an underexplored research domain where, because data samples are limited, a UEFTC model predicts an uncertainty score to assess the likelihood of classification errors. However, traditional uncertainty estimation models in text classification are ill-suited for UEFTC since they demand extensive training data, while UEFTC operates in a few-shot scenario, typically providing just a few support samples, or even just one, per class. To tackle this challenge, we introduce Contrastive Learning from Uncertainty Relations (CLUR) as a solution tailored for UEFTC. 
CLUR can be trained effectively with only one support sample per class, aided by pseudo uncertainty scores. A distinguishing feature of CLUR is that it learns these pseudo uncertainty scores autonomously, whereas previous approaches relied on manual specification. Our investigation of CLUR covers four model structures, which lets us evaluate the performance of three commonly used contrastive learning components in the context of UEFTC. Our findings highlight the effectiveness of two of these components. Our third topic is uncertainty estimation on sequential labeling. Sequential labeling is the task of assigning labels to individual tokens in a sequence, as in Named Entity Recognition (NER). Despite significant advances in NER performance in prior research, uncertainty estimation for NER (UE-NER) remains relatively unexplored, although it is of great importance. This topic focuses on UE-NER, which aims to estimate uncertainty scores for NER predictions. Previous uncertainty estimation models often overlook two distinctive attributes of NER: the interrelation among entities (where the learning of one entity's embedding depends on others) and the challenges posed by incorrect span predictions in entity extraction. To address these issues, we introduce the Sequential Labeling Posterior Network (SLPN), designed to estimate uncertainty scores for the extracted entities while considering uncertainty propagation from other tokens. Additionally, we have devised an evaluation methodology tailored to the specific nuances of wrong-span cases. Our fourth topic addresses a persistent but overlooked question: how reliable is the evaluation of uncertainty estimation in text summarization (UE-TS)? 
Text summarization, a key task in natural language generation (NLG), holds significant importance, particularly in domains where inaccuracies can have serious consequences, such as healthcare. UE-TS has garnered attention due to the potential risks associated with erroneous summaries. However, the reliability of evaluating UE-TS methods raises concerns, stemming from the interdependence between uncertainty model metrics and the wide array of NLG metrics. To address these concerns, we introduce a comprehensive UE-TS benchmark incorporating twenty-six NLG metrics across four dimensions. This benchmark evaluates the uncertainty estimation capabilities of two large language models and one pre-trained language model across two datasets. Additionally, it assesses the effectiveness of fourteen common uncertainty estimation methods. Our study underscores the necessity of utilizing diverse, uncorrelated NLG metrics and uncertainty estimation techniques for a robust evaluation of UE-TS methods. / Doctor of Philosophy / Text is integral to our daily activities, appearing in various forms such as social media posts, news articles, books, and reports. We rely on text for communication, information dissemination, and decision-making. Given its ubiquity, the ability to process and understand text through Natural Language Processing (NLP) has become increasingly important. NLP technology enables us to perform tasks like text classification, which involves categorizing text into predefined labels, and named entity recognition (NER), which identifies specific entities such as names, dates, and locations within text. Additionally, NLP facilitates generating coherent and contextually appropriate responses in conversational agents, enhancing human-computer interaction. However, the reliability of NLP models is crucial, especially in sensitive applications like medical diagnoses, where errors can have severe consequences. 
This dissertation focuses on uncertainty estimation in NLP, a less explored but essential area. Uncertainty estimation helps evaluate the confidence of NLP model predictions. We propose new methods tailored to various NLP tasks, acknowledging their unique needs. NLP tasks are divided into natural language understanding (NLU) and natural language generation (NLG). Within NLU, we look at tasks from two perspectives: a global view (e.g., document-level text classification) and a local view (e.g., sentence-level inference and token-level entity recognition). Our research spans text classification, named entity recognition (NER), and text summarization, with a special focus on few-shot text classification due to its recent prominence. For text classification, we introduce the MSD model, which includes three components to enhance uncertainty score accuracy and address overconfidence issues. This model integrates seamlessly with different neural networks and shows consistent improvements in experiments. For few-shot text classification, we develop Contrastive Learning from Uncertainty Relations (CLUR), designed to work effectively with as few as one support sample per class. CLUR autonomously learns pseudo uncertainty scores, demonstrating effectiveness with various contrastive learning components. In NER, we address the unique challenges of entity interrelation and span prediction errors. We propose the Sequential Labeling Posterior Network (SLPN) to estimate uncertainty scores while considering uncertainty propagation from other tokens. For text summarization, we create a benchmark with tens of metrics to evaluate uncertainty estimation methods across two datasets. This benchmark helps assess the reliability of these methods, highlighting the need for diverse, uncorrelated metrics. Overall, our work advances the understanding and implementation of uncertainty estimation in NLP, providing more reliable and accurate predictions across different tasks. 
Uncertainty Estimation Bayesian Neural Network Evidential Neural Network Text Classification Few-Shot Named Entity Recognition Text Summarization
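Uncertainty scores built from several predictions for the same input (for example, members of a self-ensemble or Monte Carlo dropout samples in a Bayesian neural network) are often decomposed using entropies. The sketch below shows that standard decomposition as a generic baseline. It is not the MSD, CLUR, or SLPN model described above, and the probability values are made up.

```python
import numpy as np

def ensemble_uncertainty(member_probs, eps=1e-12):
    """Entropy-based uncertainty from M softmax outputs over C classes.

    member_probs: array-like of shape (M, C); each row sums to 1.
    Returns (mean_probs, total, aleatoric, epistemic).
    """
    p = np.asarray(member_probs, dtype=float)
    mean = p.mean(axis=0)
    # Total uncertainty: entropy of the averaged prediction.
    total = float(-np.sum(mean * np.log(mean + eps)))
    # Aleatoric part: average entropy of the individual members.
    aleatoric = float(-np.mean(np.sum(p * np.log(p + eps), axis=1)))
    # Epistemic part (mutual information): how much the members disagree.
    epistemic = total - aleatoric
    return mean, total, aleatoric, epistemic

# Members agree but are unsure: high total, (near-)zero epistemic.
_, total_a, _, epi_a = ensemble_uncertainty([[0.6, 0.4], [0.6, 0.4]])
# Members are each confident but disagree: epistemic close to log(2).
_, total_b, _, epi_b = ensemble_uncertainty([[1.0, 0.0], [0.0, 1.0]])
print(round(total_a, 3), round(total_b, 3))
```

A winning softmax score alone cannot distinguish these two cases, which is one reason ensemble-style estimates are used against overconfidence.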
13	Lost in Transcription : Evaluating Clustering and Few-Shot learning for transcription of historical ciphers Magnifico, Giacomo January 2021 While Optical Character Recognition (OCR) techniques for printed documents have developed steadily, Handwritten Text Recognition (HTR) methods and tools that produce good-quality transcriptions of handwritten manuscripts are still some steps behind. Focusing on historical ciphers (i.e. encrypted documents from the past that use various types of symbol sets), this thesis examines the performance of two machine learning architectures developed within the DECRYPT project framework: a clustering-based unsupervised algorithm and a semi-supervised few-shot deep-learning model. Both models are tested on seen and unseen scribes to evaluate differences in performance and the shortcomings of the two architectures, with the secondary goal of determining how the datasets influence performance. An in-depth analysis of the transcription results is performed with particular focus on the Alchemic and Zodiac symbol sets, analyzing model performance relative to character shape and size. The results show promising performance for the few-shot architecture compared to the clustering algorithm, with average symbol error rates (SER) of 0.336 (0.15 and 0.104 on seen data / 0.754 on unseen data) and 0.596 (0.638 and 0.350 on seen data / 0.8 on unseen data), respectively. Image Recognition Handwritten Text Recognition HTR Deep-learning K-means clustering NN Neural Network Few-Shot
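The symbol error rate (SER) is commonly computed as the edit (Levenshtein) distance between the predicted and reference symbol sequences, divided by the length of the reference. The sketch below assumes that definition; the thesis may normalize differently, and the zodiac symbol names are hypothetical placeholders.

```python
from typing import Sequence

def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute (or match)
        prev = cur
    return prev[-1]

def symbol_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference transcription is empty")
    return edit_distance(ref, hyp) / len(ref)

# Hypothetical transcription: one substitution and one deletion out of 4.
ref = ["aries", "taurus", "gemini", "cancer"]
hyp = ["aries", "leo", "gemini"]
print(symbol_error_rate(ref, hyp))  # 0.5
```

The averages in the abstract appear to be plain means over the three test sets: (0.15 + 0.104 + 0.754) / 3 = 0.336 for the few-shot model and (0.638 + 0.350 + 0.8) / 3 = 0.596 for clustering.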
14	FAZT: FEW AND ZERO-SHOT FRAMEWORK TO LEARN TEMPO-VISUAL EVENTS FROM LITTLE OR NO DATA Naveen Madapana (11613925) 20 December 2021 Supervised classification methods based on deep learning have achieved great success in many domains and tasks that were previously unimaginable. Such approaches build on learning paradigms that require hundreds of examples to learn to classify objects or events. Thus, their immediate application to domains with few or no observations is limited, because they lack the ability to generalize rapidly to new categories from a few examples or from high-level descriptions of categories. This can be attributed to the significant gap between the way machines represent knowledge and the way humans represent categories in their minds and learn to recognize them. In this context, this research represents categories as semantic trees in a high-level attribute space and proposes an approach that uses these representations to conduct N-Shot, Few-Shot, One-Shot, and Zero-Shot Learning (ZSL). This work refers to this paradigm as the general classification problem (GCP) and proposes a unified framework for GCP called the Few and Zero-Shot Technique (FAZT). The FAZT framework is an end-to-end approach that uses trainable 3D convolutional neural networks and recurrent neural networks to optimize simultaneously for both the semantic and the classification tasks. Lastly, the problem of systematically obtaining semantic attributes by utilizing domain-specific ontologies is presented. The proposed framework is validated in the domains of hand gesture and action/activity recognition; however, this research can also be applied to other domains such as video understanding, the study of human behavior, and emotion recognition. First, an attribute-based dataset for gestures is developed systematically, drawing on the literature on gestures and semantics and on crowdsourcing platforms such as Amazon Mechanical Turk.
To the best of our knowledge, this is the first ZSL dataset for hand gestures (ZSGL dataset). Next, our framework is evaluated in two experimental conditions: 1. within-category (to test attribute recognition power) and 2. across-category (to test the ability to recognize an unknown category). In addition, we conducted experiments in zero-shot, one-shot, few-shot, and continuous learning conditions in both open-set and closed-set scenarios. The results showed that our framework performs favorably on the ZSGL, Kinetics, UIUC Action, UCF101, and HMDB51 action datasets in all experimental conditions. Computer Engineering transfer learning machine learning deep learning zero-shot learning few-shot learning lifelong learning gesture recognition activity recognition semantic description agreement analysis
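Zero-shot recognition from attribute descriptions can be illustrated by mapping a sample to a vector of attribute scores and choosing the unseen class whose attribute signature is most similar. The sketch below uses cosine similarity over flat attribute vectors. FAZT itself uses semantic trees and trains the semantic and classification objectives end to end, so this is only a simplified illustration; the gesture attributes, class names, and scores are hypothetical.

```python
import numpy as np

def zero_shot_predict(pred_attrs, class_signatures):
    """Return the class whose attribute signature is most cosine-similar.

    pred_attrs: attribute scores predicted for one sample, shape (A,).
    class_signatures: dict mapping class name -> attribute vector, shape (A,).
    """
    names = list(class_signatures)
    S = np.array([class_signatures[n] for n in names], dtype=float)
    a = np.asarray(pred_attrs, dtype=float)
    # Cosine similarity between the prediction and every class signature.
    sims = S @ a / (np.linalg.norm(S, axis=1) * np.linalg.norm(a) + 1e-12)
    return names[int(np.argmax(sims))], dict(zip(names, sims.round(3)))

# Hypothetical attributes: [one_hand, circular_motion, fingers_spread]
signatures = {"wave": [1, 0, 1], "stir": [1, 1, 0], "open_palm": [0, 0, 1]}
# Scores an attribute recognizer might output for a clip of an unseen gesture.
label, sims = zero_shot_predict([0.9, 0.8, 0.1], signatures)
print(label)  # stir
```

Because classification happens in attribute space, a new gesture class only needs a signature, not training clips.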
15	Query By Example Keyword Spotting Sunde Valfridsson, Jonas January 2021 Voice user interfaces have been growing in popularity, and with them the interest in open-vocabulary keyword spotting. In this thesis we focus on one particular approach to open-vocabulary keyword spotting: query-by-example keyword spotting. Three types of query-by-example keyword spotting approaches are described and evaluated: sequence distances, speech to phonemes, and deep distance learning. Evaluation is done on a series of custom tasks designed to measure a variety of aspects. The Google Speech Commands benchmark is also used for evaluation, to make the results more comparable to existing work. The results suggest that the deep distance learning approach is the most promising in most environments, except when memory is very constrained, in which case sequence distances might be considered. The speech-to-phoneme methods fall short in the usability evaluation. / Voice interfaces have grown in popularity, and with them an interest in open-vocabulary keyword spotting. In this thesis we focus on a specific form of open-vocabulary keyword spotting, so-called query-by-example keyword spotting. Three types of query-by-example keyword spotting methods are described and evaluated: sequence distances, speech to phonemes, and deep distance learning. Evaluation is performed on constructed tasks designed to measure a range of aspects of the methods. Google Speech Commands data is also used for the evaluation, to make it more comparable with existing work. The results show that deep distance learning seems most promising, except in environments where resources are very limited; in those, sequence distances may be of interest. The speech-to-phoneme methods show shortcomings in the usability evaluation. 
Keyword Spotting Automatic Speech Recognition ASR Query By Example Deep Distance Learning Dynamic Time Warping Few-Shot Learning Nyckelords igenkänning automatisk taligenkänning fåförsöksinlärning Computer and Information Sciences Data- och informationsvetenskap
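Sequence-distance approaches to query by example are typically built on dynamic time warping (DTW), which aligns a spoken query with a candidate segment even when the two are spoken at different speeds. The sketch below is plain DTW over feature frames (for example MFCC vectors). The length normalization and the toy 1-D sequences are assumptions for illustration, not details taken from the thesis.

```python
import numpy as np

def dtw_distance(query, segment):
    """Length-normalized DTW distance between two feature sequences.

    query, segment: arrays of shape (T, D) or (T,), one row per frame.
    """
    q = np.asarray(query, dtype=float)
    s = np.asarray(segment, dtype=float)
    if q.ndim == 1:
        q = q[:, None]
    if s.ndim == 1:
        s = s[:, None]
    n, m = len(q), len(s)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(q[i - 1] - s[j - 1])  # local frame distance
            # Extend the cheapest of: diagonal match, step in query, step in segment.
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n, m] / (n + m)

# A time-stretched copy of the query aligns perfectly; a different pattern does not.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [3, 1, 0]) > 0)       # True
```

A detector would then flag a keyword when this distance falls below a threshold tuned on held-out data. The O(nm) table is also why memory and compute matter for this family of methods.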
16	Multilingual Zero-Shot and Few-Shot Causality Detection Reimann, Sebastian Michael January 2021 Relations that hold between causes and their effects are fundamental for a wide range of sectors. Automatically finding sentences that express such relations may, for example, be of great interest to the economy or to political institutions. However, for many languages other than English, a lack of training resources for this task needs to be dealt with. In recent years, large pretrained transformer-based model architectures have proven very effective for tasks involving cross-lingual transfer, such as cross-lingual natural language inference, as well as multilingual named entity recognition, POS tagging, and dependency parsing, which may hint at similar potential for causality detection. In this thesis, we define causality detection as a binary labelling problem and use cross-lingual transfer to alleviate data scarcity for German and Swedish, using three different classifiers that rely either on multilingual sentence embeddings obtained from a pretrained encoder or on pretrained multilingual language models. The source language in most of our experiments is English; for Swedish, however, we also use a small German training set and a combination of English and German training data. We try zero-shot transfer as well as using limited amounts of target-language data, either as a development set or as additional training data in a few-shot setting. In the latter scenario, we explore the impact of varying training-data sizes. Moreover, data scarcity in our situation also makes it necessary to work with data from different annotation projects, and we explore how much this affects our results. 
For German as a target language, our zero-shot results, as expected, fall short of monolingual experiments, but F1-macro scores between 60 and 65 in cases where the annotation did not differ drastically still indicate that at least some knowledge could be transferred. Introducing only small amounts of target-language data already brought notable improvements, and with the full German training data of about 3,000 sentences combined with the most suitable English data set, the performance for German in some scenarios almost matches the state of the art for monolingual experiments on English. The best zero-shot performance on the Swedish data even exceeded the scores achieved for German. However, because of problems with the additional Swedish training data, we were not able to improve on the zero-shot performance in a few-shot setting the way we did for German. classification causality causal relation multilingual cross-lingual zero-shot few-shot bert xlm-r laser
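F1-macro averages the per-class F1 scores of both classes in the binary causality task, so the (usually rarer) causal class counts as much as the non-causal one. A minimal sketch, scaled to 0 to 100 to match how the scores above are reported; the labels are made up.

```python
from typing import Iterable, Sequence

def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Iterable[int] = (0, 1)) -> float:
    """Unweighted mean of per-class F1 scores, on a 0-100 scale."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return 100 * sum(scores) / len(scores)

# 1 = sentence expresses a causal relation, 0 = it does not (made-up labels).
print(round(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]), 2))  # 73.33
```

Here the causal class gets F1 = 0.667 and the non-causal class 0.8; their mean is 73.33. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity on a 0 to 1 scale.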
17	Investigating Few-Shot Transfer Learning for Address Parsing : Fine-Tuning Multilingual Pre-Trained Language Models for Low-Resource Address Segmentation / En Undersökning av Överföringsinlärning för Adressavkodning med Få Exempel : Finjustering av För-Tränade Språkmodeller för Låg-Resurs Adress Segmentering Heimisdóttir, Hrafndís January 2022 Address parsing is the process of splitting an address string into its different address components, such as street name, street number, et cetera. Address parsing has been researched quite extensively, and some state-of-the-art address parsing solutions exist, mostly monolingual. In recent years, research focusing on multinational address parsing has emerged, and deep address-parsing architectures have been used to achieve state-of-the-art performance on multinational address data. However, training these deep architectures for address parsing requires a rather large amount of address data, which is not always accessible. Within Natural Language Processing (NLP) in general, data is difficult to come by, and most of the available NLP data covers only about 20 of the approximately 7,000 languages spoken around the world, the so-called high-resource languages. This also applies to address data, which can be difficult to obtain for some of the world's so-called low-resource languages, for which little or no NLP data exists. To address the lack of address data for some of the less widely spoken languages of the world, this project investigates the potential of Few-Shot Learning (FSL) for multinational address parsing. To investigate this, two few-shot transfer learning models are implemented, each consisting of a fine-tuned pre-trained language model (PTLM). The difference between the two models is the PTLM used: the multilingual language models mBERT and XLM-R, respectively. 
Each PTLM is fine-tuned with a linear classifier layer and then used as a multinational address parser. The two models are trained, and their results are compared with a state-of-the-art multinational address parser, Deepparse, as well as with each other. The results show that the two models do not outperform Deepparse, but they do show promising results, not far from what Deepparse achieves on the holdout and zero-shot datasets. On a mix of low- and high-resource language address data, both models perform well and achieve an overall F1-score above 96%. Of the two models, XLM-R achieves significantly better results than mBERT and can therefore be considered the more appropriate PTLM for multinational FSL address parsing. Based on these results, the conclusion is that there is great potential for FSL in the field of multinational address parsing and that general FSL methods can be used and perform well on multinational address parsing tasks. / Address parsing is the process of splitting an address string into its different address components, such as street name, street number, et cetera. Address parsing has been researched fairly extensively, and there are some state-of-the-art address parsing solutions, mostly monolingual. In recent years, research focusing on multinational address parsing has started to appear, and deep architectures for address parsing have been used to achieve state-of-the-art performance on multinational address data. Training these architectures, however, requires a fairly large amount of address data, which is not always available. It is generally difficult to obtain data for natural language processing, and the majority of the available data comes from only 20 of the approximately 7,000 languages used around the world, so-called high-resource languages. This also applies to address data, which can be difficult to obtain for some of the world's so-called low-resource languages, for which little or no data for natural language processing exists. 
To try to address this lack of address data for some of the world's less widely spoken languages, this project investigates whether few-shot learning has any potential for multinational address parsing. To this end, two few-shot transfer learning models are implemented by fine-tuning pre-trained language models. The difference between the two models is the pre-trained language model used, mBERT and XLM-R, respectively. Both models are fine-tuned with a linear classification layer and then used as multinational address parsers. The two models are trained and their results are compared with a state-of-the-art multinational address parser, Deepparse. The results show that both models perform worse than the Deepparse model, but they still show promising results, not far from what Deepparse achieves on both the holdout and zero-shot datasets. Furthermore, both models perform well on a mix of address data from low- and high-resource languages, and both achieve an overall F1-score above 96%. Of the two models, XLM-R achieves considerably better results than mBERT and can therefore be considered a more suitable pre-trained language model for multinational few-shot address parsing. From these results, the conclusion is drawn that there is great potential for few-shot learning in the field of multinational address parsing, and that general few-shot learning methods can be used and perform well on multinational address parsing tasks. Address Parsing Address Segmentation Few-Shot Learning Transfer Learning Named Entity Recognition Adressavkodning Adress Segmentering Inlärning med Få Exempel Överföringsinlärning Computer and Information Sciences Data- och informationsvetenskap
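Fine-tuning a pre-trained language model with a linear classifier layer for address parsing amounts to token classification: one linear layer maps each token's contextual embedding to scores over address-component tags. The sketch below shows only that head, with random stand-in embeddings and untrained weights. The tag set is hypothetical (not necessarily Deepparse's), one embedding per word is assumed instead of sub-word tokens, and in real fine-tuning W and b would be learned with cross-entropy jointly with mBERT or XLM-R. The hidden size 768 matches the base versions of both models.

```python
import numpy as np

# Hypothetical address-component tag set.
TAGS = ["StreetNumber", "StreetName", "Unit", "Municipality", "PostalCode"]

def linear_token_head(token_embeddings, W, b):
    """Map (T, H) token embeddings to per-token tag predictions.

    W has shape (H, K) and b has shape (K,), with K = len(TAGS).
    Returns (predicted tags, probability matrix of shape (T, K)).
    """
    logits = token_embeddings @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax over tags
    return [TAGS[k] for k in probs.argmax(axis=1)], probs

rng = np.random.default_rng(0)
tokens = ["221", "Baker", "Street", "London", "NW1"]
H = 768                                          # hidden size of mBERT / XLM-R base
emb = rng.normal(size=(len(tokens), H))          # stand-in for PTLM outputs
W = rng.normal(scale=0.02, size=(H, len(TAGS)))  # untrained head weights
b = np.zeros(len(TAGS))
pred_tags, probs = linear_token_head(emb, W, b)
print(list(zip(tokens, pred_tags)))  # untrained, so the tags are arbitrary
```

Keeping the head this small puts almost all of the transferable knowledge in the pre-trained encoder, which is what makes few-shot transfer plausible.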
18	Improving a Few-shot Named Entity Recognition Model Using Data Augmentation / Förbättring av en existerande försöksmodell för namnidentifiering med få exempel genom databerikande åtgärder Mellin, David January 2022 Labeling words of interest with one of a predefined set of named-entity types has traditionally required a large amount of labeled in-domain data. Recently, the availability of pre-trained transformer-based language models has enabled multiple natural language processing problems to use transfer learning techniques to build machine learning models with less task-specific labeled data. This thesis explores the impact of data augmentation when training a pre-trained transformer-based model to adapt to a named entity recognition task with few labeled sentences. The experimental results indicate that data augmentation increases the performance of the trained models, but that it has less impact when more labeled data is available. In conclusion, data augmentation was shown to improve the performance of pre-trained named entity recognition models when few labeled sentences are available for training. / Categorizing words that belong to one of a set of predefined entities has traditionally required large amounts of pre-labeled domain-specific data. In recent years, pre-trained language models have become available that make it possible to solve language processing problems with a smaller amount of labeled domain-specific data. This thesis explores the effect of data augmentation on a machine learning model for named entity recognition. The experimental results indicate that data augmentation improves the models, but that the effect diminishes when more labeled data is available. In summary, data augmentation can improve named entity recognition models when few labeled sentences are available for training. 
Named Entity Recognition Data Augmentation Self-training BERT Few-shot Learning Identifiering av namngivna entiteter Datautökning Självträning BERT Fåförsöksinlärning Computer Sciences Datavetenskap (datalogi)
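The abstract does not name the augmentation techniques used. One widely used option for few-shot NER is mention replacement: swapping an entity mention for another mention of the same type drawn from the labeled data, then rewriting the BIO tags to match. The sketch below shows that generic technique, not necessarily the thesis's method; the sentence and mention pool are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

def entity_spans(tags: Sequence[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) spans from BIO tags; end is exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # the current entity continues
        if start is not None:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag != "O":  # a B- tag (or a stray I- tag) opens a new span
            start, etype = i, tag[2:]
    return spans

def mention_replacement(tokens: List[str], tags: List[str],
                        pool: Dict[str, List[List[str]]], rng: random.Random):
    """Replace every entity with a random same-type mention from `pool`."""
    out_tokens, out_tags, prev = [], [], 0
    for start, end, etype in entity_spans(tags):
        out_tokens += tokens[prev:start]
        out_tags += tags[prev:start]
        # Fall back to the original mention if the pool lacks this type.
        new = rng.choice(pool.get(etype, [tokens[start:end]]))
        out_tokens += list(new)
        out_tags += ["B-" + etype] + ["I-" + etype] * (len(new) - 1)
        prev = end
    out_tokens += tokens[prev:]
    out_tags += tags[prev:]
    return out_tokens, out_tags

tokens = ["Alice", "moved", "to", "New", "York"]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC"]
pool = {"PER": [["Bob"], ["Maria", "Lind"]], "LOC": [["Uppsala"]]}
# The chosen PER mention depends on the seed.
print(mention_replacement(tokens, tags, pool, random.Random(0)))
```

Each augmented sentence keeps the original context words, so the model sees the same patterns with new entity surface forms, which matters most when only a few labeled sentences exist.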
19	Zero/Few-Shot Text Classification : A Study of Practical Aspects and Applications / Textklassificering med Zero/Few-Shot Learning : En Studie om Praktiska Aspekter och Applikationer Åslund, Jacob January 2021 State-of-the-art (SOTA) language models have demonstrated remarkable capabilities in tackling NLP tasks they have not been explicitly trained on, given a few demonstrations of the task (few-shot learning) or even none at all (zero-shot learning). The purpose of this Master's thesis has been to investigate practical aspects and potential applications of zero/few-shot learning in the context of text classification. This includes topics such as combined usage with active learning, automated data labeling, and interpretability. Two different methods for zero/few-shot learning have been investigated, and the results indicate that: • Active learning can be used to marginally improve few-shot performance, but it seems to be mostly beneficial in settings with very few samples (e.g. fewer than 10). • Zero-shot learning can be used to produce reasonable candidate labels for classes in a dataset, given knowledge of the classification task at hand. • It is difficult to trust the predictions of zero-shot text classification without access to a validation dataset, but interpretable machine learning (IML) methods such as saliency maps could be useful for debugging zero-shot models. / Leading language models have shown remarkable abilities in solving NLP problems they have not been explicitly trained on, given a few examples of the problem (few-shot learning) or even none at all (zero-shot learning). The aim of this degree project has been to investigate practical aspects and potential applications of zero/few-shot learning in the context of text classification. This includes combined use with active learning, automated data labeling, and interpretability. 
Two different methods for zero/few-shot learning have been investigated, and the results indicate that: • Active learning can be used to marginally improve text classification with few-shot learning, but this seems most beneficial in situations with very few data points (e.g. fewer than 10). • Zero-shot learning can be used to find suitable labels for classes in a dataset, given knowledge of the classification task of interest. • It is difficult to trust the robustness of zero-shot text classification without access to validation data, but interpretable machine learning methods such as saliency maps could be used to debug zero-shot models. zero-shot learning few-shot learning text classification active learning automated data labeling interpretable machine learning deep learning NLP NLU zero-shot learning few-shot learning textklassificering aktiv inlärning automatiserad datamärkning tolkningsbar maskininlärning djupinlärning NLP NLU Computer and Information Sciences Data- och informationsvetenskap
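The abstract does not state which query strategy was combined with few-shot learning. A common baseline is least-confidence uncertainty sampling: ask an annotator to label the unlabeled texts on which the current model's top predicted probability is lowest. The sketch assumes class probabilities from any zero/few-shot classifier; the numbers are made up.

```python
import numpy as np

def least_confidence_query(probs, k):
    """Indices of the k samples whose highest class probability is lowest.

    probs: array-like of shape (N, C) with predicted class probabilities.
    """
    confidence = np.asarray(probs, dtype=float).max(axis=1)
    # A stable sort keeps the original order among tied confidences.
    return np.argsort(confidence, kind="stable")[:k].tolist()

# Predicted probabilities for four unlabeled texts (made-up values).
pool_probs = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.99, 0.01]]
print(least_confidence_query(pool_probs, k=2))  # [1, 2]
```

In a full loop, the queried texts would be labeled, added to the few-shot training set, and the classifier refit before the next query. With only a handful of labels, each well-chosen example shifts the model more, which is consistent with the finding that active learning helps most below about 10 samples.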
20	Towards Understanding Generalization in Gradient-Based Meta-Learning Guiroy, Simon 08 1900 In this master's thesis, we study the generalization of neural networks in gradient-based meta-learning by analyzing various properties of their objective landscapes. Meta-learning is a challenging paradigm in which models must not only learn a task but are trained to "learn to learn": they must adapt to new tasks and environments from very limited data. Since research on the objective landscapes of neural networks in classical supervised learning has provided some answers about their ability to generalize to new data points, we propose similar analyses aimed at understanding generalization in meta-learning. We first introduce the literature on the objective landscapes of neural networks, then the literature on meta-learning, concluding our introduction with gradient-based meta-learning, a setup that closely resembles traditional supervised learning through stochastic gradient-based optimization. We then present our work on objective landscapes in meta-learning, which was submitted to NeurIPS 2019 as a scientific paper.
At the time of writing, and to the best of our knowledge, this is the first work to empirically study the objective landscapes in gradient-based meta-learning, especially in the context of deep learning. We provide insights into some properties of these landscapes that appear correlated with generalization to new tasks. We show experimentally that as meta-training progresses, the meta-test solutions, obtained by adapting the meta-train solution to new tasks via a few steps of gradient-based fine-tuning, become flatter, lower in loss, and further away from the meta-train solution. We also show that these meta-test solutions keep getting flatter even as generalization starts to degrade, which provides experimental evidence against a correlation between generalization and flat minima in gradient-based meta-learning. Furthermore, we provide empirical evidence that generalization to new tasks is correlated with the coherence between their adaptation trajectories in parameter space, measured by the average cosine similarity between task-specific trajectory directions starting from the same meta-train solution. We also show that the coherence of meta-test gradients, measured by the average inner product between the task-specific gradient vectors evaluated at the meta-train solution, is correlated with generalization. Based on these observations, we propose a novel regularizer for the Model-Agnostic Meta-Learning (MAML) algorithm and provide experimental evidence for its effectiveness.
Deep learning Meta-learning Objective landscapes Generalization Few-shot learning
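The two coherence measures in the abstract can be made concrete on a toy problem. The sketch below assumes hypothetical quadratic tasks L_i(θ) = ½‖θ − c_i‖², adapts a shared meta-train solution θ with a few plain gradient steps per task, and then computes (a) the average cosine similarity between adaptation directions φ_i − θ and (b) the average inner product between task gradients evaluated at θ. Averaging over distinct task pairs i ≠ j is an assumption, the function names are illustrative, and the thesis works with deep networks rather than this toy model; its MAML regularizer is not reproduced.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(u, v) / (nu * nv)


def task_grad(theta: Sequence[float], center: Sequence[float]) -> Vector:
    # Hypothetical toy task L_i(theta) = 0.5 * ||theta - c_i||^2, whose gradient is theta - c_i.
    return [t - c for t, c in zip(theta, center)]


def adapt(theta: Sequence[float], center: Sequence[float], steps: int, lr: float) -> Vector:
    """Fine-tune from the meta-train solution with a few plain gradient steps."""
    phi = list(theta)
    for _ in range(steps):
        g = task_grad(phi, center)
        phi = [p - lr * gi for p, gi in zip(phi, g)]
    return phi


def trajectory_coherence(theta: Sequence[float], centers: Sequence[Sequence[float]],
                         steps: int = 5, lr: float = 0.1) -> float:
    """Average cosine similarity between adaptation directions phi_i - theta (distinct pairs)."""
    dirs = [[p - t for p, t in zip(adapt(theta, c, steps, lr), theta)] for c in centers]
    pairs = list(combinations(dirs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def gradient_coherence(theta: Sequence[float], centers: Sequence[Sequence[float]]) -> float:
    """Average inner product between task gradients evaluated at theta (distinct pairs)."""
    grads = [task_grad(theta, c) for c in centers]
    pairs = list(combinations(grads, 2))
    return sum(dot(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    theta = [0.0, 0.0]
    print(trajectory_coherence(theta, [[1.0, 0.0], [2.0, 0.0]]))
    print(gradient_coherence(theta, [[1.0, 0.0], [2.0, 0.0]]))
```

Starting from θ = [0, 0], tasks whose optima lie in the same direction (centers [1, 0] and [2, 0]) give a trajectory coherence of 1.0 and a gradient coherence of 2.0; opposing tasks (centers [1, 0] and [-1, 0]) give -1.0 for both, and orthogonal tasks give 0.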
