• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 44
  • 3
  • 1
  • 1
  • 1
  • Tagged with
  • 56
  • 56
  • 52
  • 29
  • 22
  • 21
  • 17
  • 16
  • 14
  • 12
  • 11
  • 11
  • 10
  • 10
  • 10
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Self-Supervised Representation Learning for Content Based Image Retrieval

Govindarajan, Hariprasath January 2020 (has links)
Automotive technologies and fully autonomous driving have seen a tremendous growth in recent times and have benefitted from extensive deep learning research. State-of-the-art deep learning methods are largely supervised and require labelled data for training. However, the annotation process for image data is time-consuming and costly in terms of human efforts. It is of interest to find informative samples for labelling by Content Based Image Retrieval (CBIR). Generally, a CBIR method takes a query image as input and returns a set of images that are semantically similar to the query image. The image retrieval is achieved by transforming images to feature representations in a latent space, where it is possible to reason about image similarity in terms of image content. In this thesis, a self-supervised method is developed to learn feature representations of road scenes images. The self-supervised method learns feature representations for images by adapting intermediate convolutional features from an existing deep Convolutional Neural Network (CNN). A contrastive approach based on Noise Contrastive Estimation (NCE) is used to train the feature learning model. For complex images like road scenes where mutiple image aspects can occur simultaneously, it is important to embed all the salient image aspects in the feature representation. To achieve this, the output feature representation is obtained as an ensemble of feature embeddings which are learned by focusing on different image aspects. An attention mechanism is incorporated to encourage each ensemble member to focus on different image aspects. For comparison, a self-supervised model without attention is considered and a simple dimensionality reduction approach using SVD is treated as the baseline. The methods are evaluated on nine different evaluation datasets using CBIR performance metrics. The datasets correspond to different image aspects and concern the images at different spatial levels - global, semi-global and local. The feature representations learned by self-supervised methods are shown to perform better than the SVD approach. Taking into account that no labelled data is required for training, learning representations for road scenes images using self-supervised methods appear to be a promising direction. Usage of multiple query images to emphasize a query intention is investigated and a clear improvement in CBIR performance is observed. It is inconclusive whether the addition of an attentive mechanism impacts CBIR performance. The attention method shows some positive signs based on qualitative analysis and also performs better than other methods for one of the evaluation datasets containing a local aspect. This method for learning feature representations is promising but requires further research involving more diverse and complex image aspects.
42

Evaluating the effects of data augmentations for specific latent features : Using self-supervised learning / Utvärdering av effekterna av datamodifieringar på inlärda representationer : Vid självövervakande maskininlärning

Ingemarsson, Markus, Henningsson, Jacob January 2022 (has links)
Supervised learning requires labeled data which is cumbersome to produce, making it costly and time-consuming. SimCLR is a self-supervising framework that uses data augmentations to learn without labels. This thesis investigates how well cropping and color distorting augmentations work for two datasets, MPI3D and Causal3DIdent. The representations learned are evaluated using representation similarity analysis. The data augmentations were meant to make the model learn invariant representations of the object shape in the images regarding it as content while ignoring unnecessary features and regarding them as style. As a result, 8 models were created, models A-H. A and E were trained using supervised learning as a benchmark for the remaining self-supervised models. B and C learned invariant features of style instead of learning invariant representations of shape. Model D learned invariant representations of shape. Although, it also regarded style-related factors as content. Model F, G, and H managed to learn invariant representations of shape with varying intensities while regarding the rest of the features as style. The conclusion was that models can learn invariant representations of features related to content using self-supervised learning with the chosen augmentations. However, the augmentation settings must be suitable for the dataset. / Övervakad maskininlärning kräver annoterad data, vilket är dyrt och tidskrävande att producera. SimCLR är ett självövervakande maskininlärningsramverk som använder datamodifieringar för att lära sig utan annoteringar. Detta examensarbete utvärderar hur väl beskärning och färgförvrängande datamodifieringar fungerar för två dataset, MPI3D och Causal3DIdent. De inlärda representationerna utvärderas med hjälp av representativ likhetsanalys. Syftet med examensarbetet var att få de självövervakande maskininlärningsmodellerna att lära sig oföränderliga representationer av objektet i bilderna. Meningen med datamodifieringarna var att påverka modellens lärande så att modellen tolkar objektets form som relevant innehåll, men resterande egenskaper som icke-relevant innehåll. Åtta modeller skapades (A-H). A och E tränades med övervakad inlärning och användes som riktmärke för de självövervakade modellerna. B och C lärde sig oföränderliga representationer som bör ha betraktas som irrelevant istället för att lära sig form. Modell D lärde sig oföränderliga representationer av form men också irrelevanta representationer. Modellerna F, G och H lyckades lära sig oföränderliga representationer av form med varierande intensitet, samtidigt som de resterande egenskaperna betraktades som irrelevant. Beskärning och färgförvrängande datamodifieringarna gör således att självövervakande modeller kan lära sig oföränderliga representationer av egenskaper relaterade till relevant innehåll. Specifika inställningar för datamodifieringar måste dock vara lämpliga för datasetet.
43

Data-efficient reinforcement learning with self-predictive representations

Schwarzer, Max 08 1900 (has links)
L'efficacité des données reste un défi majeur dans l'apprentissage par renforcement profond. Bien que les techniques modernes soient capables d'atteindre des performances élevées dans des tâches extrêmement complexes, y compris les jeux de stratégie comme le StarCraft, les échecs, le shogi et le go, ainsi que dans des domaines visuels exigeants comme les jeux Atari, cela nécessite généralement d'énormes quantités de données interactives, limitant ainsi l'application pratique de l'apprentissage par renforcement. Dans ce mémoire, nous proposons la SPR, une méthode inspirée des récentes avancées en apprentissage auto-supervisé de représentations, conçue pour améliorer l'efficacité des données des agents d'apprentissage par renforcement profond. Nous évaluons cette méthode sur l'environement d'apprentissage Atari, et nous montrons qu'elle améliore considérablement les performances des agents avec un surcroît de calcul modéré. Lorsqu'on lui accorde à peu près le même temps d'apprentissage qu'aux testeurs humains, un agent d'apprentissage par renforcement augmenté de SPR atteint des performances surhumaines dans 7 des 26 jeux, une augmentation de 350% par rapport à l'état de l'art précédent, tout en améliorant fortement les performances moyennes et médianes. Nous évaluons également cette méthode sur un ensemble de tâches de contrôle continu, montrant des améliorations substantielles par rapport aux méthodes précédentes. Le chapitre 1 présente les concepts nécessaires à la compréhension du travail présenté, y compris des aperçus de l'apprentissage par renforcement profond et de l'apprentissage auto-supervisé de représentations. Le chapitre 2 contient une description détaillée de nos contributions à l'exploitation de l'apprentissage de représentation auto-supervisé pour améliorer l'efficacité des données dans l'apprentissage par renforcement. Le chapitre 3 présente quelques conclusions tirées de ces travaux, y compris des propositions pour les travaux futurs. / Data efficiency remains a key challenge in deep reinforcement learning. Although modern techniques have been shown to be capable of attaining high performance in extremely complex tasks, including strategy games such as StarCraft, Chess, Shogi, and Go as well as in challenging visual domains such as Atari games, doing so generally requires enormous amounts of interactional data, limiting how broadly reinforcement learning can be applied. In this thesis, we propose SPR, a method drawing from recent advances in self-supervised representation learning designed to enhance the data efficiency of deep reinforcement learning agents. We evaluate this method on the Atari Learning Environment, and show that it dramatically improves performance with limited computational overhead. When given roughly the same amount of learning time as human testers, a reinforcement learning agent augmented with SPR achieves super-human performance on 7 out of 26 games, an increase of 350% over the previous state of the art, while also strongly improving mean and median performance. We also evaluate this method on a set of continuous control tasks, showing substantial improvements over previous methods. Chapter 1 introduces concepts necessary to understand the work presented, including overviews of Deep Reinforcement Learning and Self-Supervised Representation learning. Chapter 2 contains a detailed description of our contributions towards leveraging self-supervised representation learning to improve data-efficiency in reinforcement learning. Chapter 3 provides some conclusions drawn from this work, including a number of proposals for future work.
44

Self-supervision for data interpretability in image classification and sample efficiency in reinforcement learning

Rajkumar, Nitarshan 06 1900 (has links)
L'apprentissage auto-surveillé (AAS), c'est-à-dire l'apprentissage de connaissances en exploitant la structure intrinsèque présente dans un ensemble de données non étiquettées, a beaucoup fait progresser l'apprentissage automatique dans la dernière décennie, et plus particulièrement dans les dernières deux années en vision informatique. Dans cet ouvrage, nous nous servons de l'AAS comme outil dans deux champs applicatifs: Pour interpréter efficacement les ensembles de données et les décisions prises par des modèles statistiques, et pour pré-entrainer un modèle d'apprentissage par renforcement pour grandement augmenter l'efficacité de son échantillonnage dans son contexte d'entraînement. Le Chapitre 1 présente les connaissances de fond nécessaires à la compréhension du reste du mémoire. Il offre un aperçu de l'apprentissage automatique, de l'apprentissage profond, de l'apprentissage auto-surveillé et de l'apprentissage par renforcement (profond). Le Chapitre 2 se détourne brièvement du sujet de l'auto-surveillance pour étudier comment le phénomène de la mémorisation se manifeste dans les réseaux de neurones profonds. Les observations que nous ferons seront alors utilisées comme pièces justificatives pour les travaux présentés dans le Chapitre 3. Ce chapitre aborde la manière dont l'auto-surveillance peut être utilisée pour découvrir efficacement les régularités structurelles présentes dans un ensemble de données d'entraînement, estimer le degré de mémorisation de celui-ci par le modèle, et l'influence d'un échantillon d'entraînement sur les résultats pour un échantillon-test. Nous passons aussi en revue de récents travaux touchant à l'importance de mémoriser la ``longue traîne'' d'un jeu de données. Le Chapitre 4 fait la démonstration d'une combinaison d'objectifs de pré-entraînement AAS axés sur les caractéristiques des données en apprentissage par renforcement, de ce fait élevant l'efficacité d'échantillonnage à un niveau comparable à celui d'un humain. De plus, nous montrons que l'AAS ouvre la porte à de plus grands modèles, ce qui a été par le passé un défi à surmonter en apprentissage par renforcement profond. Finalement, le Chapitre 5 conclut l'ouvrage avec un bref survol des contributions scientifiques et propose quelque avenues pour des recherches poussées dans le futur. / Self-Supervised Learning (SSL), or learning representations of data by exploiting inherent structure present in it without labels, has driven significant progress in machine learning over the past decade, and in computer vision in particular over the past two years. In this work, we explore applications of SSL towards two separate goals - first, as a tool for efficiently interpreting datasets and model decisions, and second, as a tool for pretraining in reinforcement learning (RL) to greatly advance sample efficiency in that setting. Chapter 1 introduces background material necessary to understand the remainder of this thesis. In particular, it provides an overview of Machine Learning, Deep Learning, Self-Supervised Representation Learning, and (Deep) Reinforcement Learning. Chapter 2 briefly detours away from this thesis' focus on self-supervision, to examine how the phenomena of memorization manifests in deep neural networks. These results are then used to partially justify work presented in Chapter 3, which examines how self-supervision can be used to efficiently uncover structural regularity in training datasets, and to estimate training memorization and the influence of training samples on test samples. Recent experimental work on understanding the importance of memorizing the long-tail of data is also revisited. Chapter 4 demonstrates how a combination of SSL pretraining objectives designed for the structure of data in RL can greatly improve sample efficiency to nearly human-level performance. Furthermore, it is shown that SSL enables the use of larger models, which has historically been a challenge in deep RL. Chapter 5 concludes by reviewing the contributions of this work, and discusses future directions.
45

Musical Instrument Activity Detection using Self-Supervised Learning and Domain Adaptation / Självövervakad inlärning och Domänadaption för Musikinstrumentsaktivitetsigenkänning

Nyströmer, Carl January 2020 (has links)
With the ever growing media and music catalogs, tools that search and navigate this data are important. For more complex search queries, meta-data is needed, but to manually label the vast amounts of new content is impossible. In this thesis, automatic labeling of musical instrument activities in song mixes is investigated, with a focus on ways to alleviate the lack of annotated data for instrument activity detection models. Two methods for alleviating the problem of small amounts of data are proposed and evaluated. Firstly, a self-supervised approach based on automatic labeling and mixing of randomized instrument stems is investigated. Secondly, a domain-adaptation approach that trains models on sampled MIDI files for instrument activity detection on recorded music is explored. The self-supervised approach yields better results compared to the baseline and points to the fact that deep learning models can learn instrument activity detection without an intrinsic musical structure in the audio mix. The domain-adaptation models trained solely on sampled MIDI files performed worse than the baseline, however using MIDI data in conjunction with recorded music boosted the performance. A hybrid model combining both self-supervised learning and domain adaptation by using both sampled MIDI data and recorded music produced the best results overall. / I och med de ständigt växande media- och musikkatalogerna krävs verktyg för att söka och navigera i dessa. För mer komplexa sökförfrågningar så behövs det metadata, men att manuellt annotera de enorma mängderna av ny data är omöjligt. I denna uppsats undersöks automatisk annotering utav instrumentsaktivitet inom musik, med ett fokus på bristen av annoterad data för modellerna för instrumentaktivitetsigenkänning. Två metoder för att komma runt bristen på data föreslås och undersöks. Den första metoden bygger på självövervakad inlärning baserad på automatisk annotering och slumpartad mixning av olika instrumentspår. Den andra metoden använder domänadaption genom att träna modeller på samplade MIDI-filer för detektering av instrument i inspelad musik. Metoden med självövervakning gav bättre resultat än baseline och pekar på att djupinlärningsmodeller kan lära sig instrumentigenkänning trots att ljudmixarna saknar musikalisk struktur. Domänadaptionsmodellerna som endast var tränade på samplad MIDI-data presterade sämre än baseline, men att använda MIDI-data tillsammans med data från inspelad musik gav förbättrade resultat. En hybridmodell som kombinerade både självövervakad inlärning och domänadaption genom att använda både samplad MIDI-data och inspelad musik gav de bästa resultaten totalt.
46

Remote sensing representation learning for a species distribution modeling case study

Elkafrawy, Sara 08 1900 (has links)
Les changements climatiques et les phénomènes météorologiques extrêmes sont devenus des moteurs importants de changements de la biodiversité, posant une menace pour la perte d’habitat et l’extinction d’espèces. Comprendre l’état actuel de la biodiversité et identifier les zones hautement adaptées (still strugling with this expression, high suitability for who or what?) sont essentiels afin de lutter contre la perte de biodiversité et guider les processus décisionnels en lien avec les études scientifiques (added scientifiques, as in scientific surveys), les mesures de protection et les efforts de restauration. Les modèles de distribution des espèces (MDE ou SDM en anglais) sont des outils statistiques permettant de prédire la distribution géographique potentielle d’une espèce en fonction de variables environnementales et des données recueillies à cet endroit. Cependant, les MDE conventionnels sont souvent confrontés à des limitations dues à la résolution spatiale et à la couverture restreinte des variables environnementales, lesquelles sont obtenues suite à des mesures au sol ou à l’aide de stations météorologiques. Pour mieux comprendre la distribution des espèces à des fins de conservation, le défi GeoLifeCLEF 2022 a été organisé. Cette compétiion comprend un vaste ensemble de données composé de 1,6 million géo-observations liées à la présence de 17 000 espèces végétales et animales. L’objectif principal de ce défi est d’explorer le potentiel des données de télédétection afin de prédire la présence d’espèces à des géolocalisations spécifiques. Dans ce mémoire, nous étudions diverses techniques d’apprentissage automatique et leur performance en lien avec le défi GeoLifeCLEF 2022. Nous explorons l’efficacité d’algorithmes bien connus en apprentissage par transfert, établissons un cadre d’apprentissage non supervisé et étudions les approches d’apprentissage auto-supervisé lors de la phase d’entraînement. Nos résultats démontrent qu’un ajustement fin des encodeurs pré-entraînés sur différents domaines présente les résultats les plus prometteurs lors de la phase de test. / Climate change and extreme weather events have emerged as significant drivers of biodiversity changes, posing a threat of habitat loss and species extinction. Understanding the current state of biodiversity and identifying areas with high suitability for different species are vital in combating biodiversity loss and guiding decision-making processes for protective measures and restoration efforts. Species distribution models (SDMs) are statistical tools for predicting a species' potential geographic distribution based on environmental variables and occurrence data. However, conventional SDMs often face limitations due to the restricted spatial resolution and coverage of environmental variables derived from ground-based measurements or weather station data. To better understand species distribution for conservation purposes, the GeoLifeCLEF 2022 challenge was introduced. This competition encompasses a large dataset of 1.6 million geo-observations linked to the presence of 17,000 plant and animal species. The primary objective of this challenge is to explore the potential of remote sensing data in forecasting species' presence at specific geolocations. In this thesis, we investigate various machine learning techniques and their performance on the GeoLifeCLEF 2022 challenge. We explore the effectiveness of standard transfer learning algorithms, establish an unsupervised learning framework, and investigate self-supervised learning approaches for training. Our findings demonstrate that fine-tuning pre-trained encoders on different domains yields the most promising test set performance results.
47

Sur l'élaboration de meilleures techniques pour l'apprentissage auto-supervisé des représentations du code

Maes, Lucas 07 1900 (has links)
Les représentations du code apprises par les modèles d’apprentissage profond sont une composante cruciale pour certaines applications en génie logiciel telles que la recherche de code ou la détection de clones. Les performances de ces applications dépendent de la qualité des représentations apprises par les modèles. De fait, des représentations possédant peu de bruit et contenant des informations avec un haut niveau d’abstraction, comme la sémantique fonctionnelle, facilitent la résolution de ces tâches. En effet, la recherche de code nécessite de comprendre les objectifs des morceaux de code pour les comparer avec une requête en langage naturel, tandis que la détection de clone exige de déterminer si deux morceaux de code ont la même sémantique fonctionnelle. La capacité des modèles à apprendre des représentations contenant de telles informations abstraites est donc cruciale pour la bonne résolution de ces tâches. Cependant, il est toujours difficile pour les modèles de code d’apprendre des représentations abstraites indépendantes de la syntaxe, par exemple la sémantique fonctionnelle. Ce mémoire se consacre donc à l’élaboration de meilleures techniques pour l’apprentissage des représentations du code via l’apprentissage auto-supervisé. Plus spécifiquement, nous nous sommes concentrés sur deux tâches centrales dans l’automatisation du génie logiciel nécessitant un minimum de compréhension de la sémantique fonctionnelle, à savoir, la recherche de code et la détection de clones de type 4. Ce mémoire propose différentes approches à différents degrés d’entraînement. Le premier degré est le pré-entraînement et consiste à apprendre des représentations génériques du code adaptables à n’importe quels problèmes. Le second est le peaufinage, modifiant les représentations apprises pour un problème spécifique. Tout d’abord, nous proposons un nouvel algorithme de pré-entraînement pour les modèles de code utilisant une méthode non contrastive régularisée adaptée de VICReg, permettant l’apprentissage de représentations génériques. Ensuite, nous proposons un nouvel objectif de peaufinage des modèles de code utilisant la distillation des connaissances d’un ensemble de modèles déjà peaufinés, appelés enseignants, sur un modèle étudiant, lui permettant ainsi l’apprentissage de représentations plus abstraites. L’ensemble des contributions vise à améliorer les représentations du code et à maximiser les performances des modèles d’apprentissage automatique pour le code, mais aussi à déterminer quel est le meilleur degré d’entraînement à adopter pour cela. Les résultats expérimentaux et les analyses menées dans ce mémoire sont préliminaires et ne permettent pas de tirer de conclusions définitives. Néanmoins, il est important de souligner que la deuxième contribution surpasse la méthode classique de peaufinage des modèles pour la recherche de code. De plus, les approches décrites proposent des pistes de directions de recherche innovantes et non conventionnelles. / Code representations learned by deep learning models are a crucial component for certain software engineering applications such as code search or clone detection. The performance of these applications depends on the quality of the representations learned by the models. In fact, low-noise representations containing highly abstract information, such as functional semantics, facilitate the resolution of these tasks. Indeed, code search requires understanding the objectives of code snippets in order to compare them with a natural language query, while clone detection requires determining whether two code snippets have the same functional semantics. The ability of models to learn representations containing such abstract information is therefore crucial to the successful resolution of these tasks. However, it is still difficult for code models to learn abstract representations that are independent of syntax, such as functional semantics. This thesis is therefore dedicated to developing better techniques for learning code representations via self-supervised learning. More specifically, we focus on two central tasks in software engineering automation requiring a minimum understanding of functional semantics, namely, code search and type 4 clone detection. This work proposes different approaches with different degrees of training. The first, pre-training, consists in learning generic code representations that can be adapted to any problem. The second is fine-tuning, modifying the representations learned for a specific problem. First, we propose a new pre-training algorithm for code models using a regularized non-contrastive method adapted from VICReg [14] enabling the learning of generic representations. Secondly, we propose a new code model refinement objective using knowledge distillation of a set of already refined models, called teachers, on a student model allowing it to learn more abstract representations. The aim of all these contributions is not only to improve code representations and maximize the performance of machine learning models for code, but also to determine the best degree of training to adopt for this purpose. The experimental results and analyses carried out in this thesis are preliminary and do not allow to draw formal conclusions. Nevertheless, it is important to underline that the second contribution outperforms the classical model refinement method for code search. Moreover, the approaches described suggest innovative and unconventional research directions.
48

Messing With The Gap: On The Modality Gap Phenomenon In Multimodal Contrastive Representation Learning

Al-Jaff, Mohammad January 2023 (has links)
In machine learning, a sub-field of computer science, a two-tower architecture model is a specialised type of neural network model that encodes paired data from different modalities (like text and images, sound and video, or proteomics and gene expression profiles) into a shared latent representation space. However, when training these models using a specific contrastive loss function, known as the multimodalinfoNCE loss, seems to often lead to a unique geometric phenomenon known as the modality gap. This gap is a clear geometric separation of the embeddings of the modalities in the joint contrastive latent space. This thesis investigates the modality gap in multimodal machine learning, specifically in two-tower neural networks trained with multimodal-infoNCE loss. We examine the adequacy of the current definition of the modality gap, the conditions under which the modality gap phenomenon manifests, and its impact on representation quality and downstream task performance. The approach to address these questions consists of a two-phase experimental strategy. Phase I involves a series of experiments, ranging from toy synthetic simulations to true multimodal machine learning with complex datasets, to explore and characterise the modality gap under varying conditions. Phase II focuses on modifying the modality gap and analysing representation quality, evaluating different loss functions and their impact on the modality gap. This methodical exploration allows us to systematically dissect the emergence and implications of the modality gap phenomenon, providing insights into its impact on downstream tasks, measured with proxy metrics based on semantic clustering in the shared latent representation space and modality-specific linear probe evaluation. Our findings reveal that the modality gap definition proposed by W. Liang et al. 2022, is insufficient. We demonstrate that similar modality gap magnitudes can exhibit varying linear separability between modality embeddings in the contrastive latent space and varying embedding topologies, indicating the need for additional metrics to capture the true essence of the gap. Furthermore, our experiments show that the temperature hyperparameter in the multimodal infoNCE loss function plays a crucial role in the emergence of the modality gap, and this effect varies with different data sets. This suggests that individual dataset characteristics significantly influence the modality gap's manifestation. A key finding is the consistent emergence of modality gaps with small temperature settings in the fixed temperature mode of the loss function and almost invariably under learned temperature mode settings, regardless of the initial temperature value. Additionally, we observe that the magnitude of the modality gap is influenced by distribution shifts, with the gap magnitude increasing progressively from the training set to the validation set, then to the test set, and finally to more distributionally shifted datasets. We discover that the choice of contrastive learning method, temperature settings, and temperature values is crucial in influencing the modality gap. However, reducing the gap does not consistently improve downstream task performance, suggesting its role may be more nuanced than previously understood. This insight indicates that the modality gap might be a geometric by-product of the learning methods rather than a critical determinant of representation quality. Our results encourage the need to reevaluate the modality gap's significance in multimodal contrastive learning, emphasising the importance of dataset characteristics and contrastive learning methodology.
49

[pt] APRENDIZADO SEMI E AUTO-SUPERVISIONADO APLICADO À CLASSIFICAÇÃO MULTI-LABEL DE IMAGENS DE INSPEÇÕES SUBMARINAS / [en] SEMI AND SELF-SUPERVISED LEARNING APPLIED TO THE MULTI-LABEL CLASSIFICATION OF UNDERWATER INSPECTION IMAGE

AMANDA LUCAS PEREIRA 11 July 2023 (has links)
[pt] O segmento offshore de produção de petróleo é o principal produtor nacional desse insumo. Nesse contexto, inspeções submarinas são cruciais para a manutenção preventiva dos equipamentos, que permanecem toda a vida útil em ambiente oceânico. A partir dos dados de imagem e sensor coletados nessas inspeções, especialistas são capazes de prevenir e reparar eventuais danos. Tal processo é profundamente complexo, demorado e custoso, já que profissionais especializados têm que assistir a horas de vídeos atentos a detalhes. Neste cenário, o presente trabalho explora o uso de modelos de classificação de imagens projetados para auxiliar os especialistas a encontrarem o(s) evento(s) de interesse nos vídeos de inspeções submarinas. Esses modelos podem ser embarcados no ROV ou na plataforma para realizar inferência em tempo real, o que pode acelerar o ROV, diminuindo o tempo de inspeção e gerando uma grande redução nos custos de inspeção. No entanto, existem alguns desafios inerentes ao problema de classificação de imagens de inspeção submarina, tais como: dados rotulados balanceados são caros e escassos; presença de ruído entre os dados; alta variância intraclasse; e características físicas da água que geram certas especificidades nas imagens capturadas. Portanto, modelos supervisionados tradicionais podem não ser capazes de cumprir a tarefa. Motivado por esses desafios, busca-se solucionar o problema de classificação de imagens submarinas a partir da utilização de modelos que requerem menos supervisão durante o seu treinamento. Neste trabalho, são explorados os métodos DINO (Self-DIstillation with NO labels, auto-supervisionado) e uma nova versão multi-label proposta para o PAWS (Predicting View Assignments With Support Samples, semi-supervisionado), que chamamos de mPAWS (multi-label PAWS). Os modelos são avaliados com base em sua performance como extratores de features para o treinamento de um classificador simples, formado por uma camada densa. Nos experimentos realizados, para uma mesma arquitetura, se obteve uma performance que supera em 2.7 por cento o f1-score do equivalente supervisionado. / [en] The offshore oil production segment is the main national producer of this input. In this context, underwater inspections are crucial for the preventive maintenance of equipment, which remains in the ocean environment for its entire useful life. From the image and sensor data collected in these inspections,experts are able to prevent and repair damage. Such a process is deeply complex, time-consuming and costly, as specialized professionals have to watch hours of videos attentive to details. In this scenario, the present work explores the use of image classification models designed to help experts to find the event(s) of interest in under water inspection videos. These models can be embedded in the ROV or on the platform to perform real-time inference,which can speed up the ROV, monitor notification time, and greatly reduce verification costs. However, there are some challenges inherent to the problem of classification of images of armored submarines, such as: balanced labeled data are expensive and scarce; the presence of noise among the data; high intraclass variance; and some physical characteristics of the water that achieved certain specificities in the captured images. Therefore, traditional supervised models may not be able to fulfill the task. Motivated by these challenges, we seek to solve the underwater image classification problem using models that require less supervision during their training. In this work, they are explorers of the DINO methods (Self-Distillation with NO labels, self-supervised) anda new multi-label version proposed for PAWS (Predicting View AssignmentsWith Support Samples, semi-supervised), which we propose as mPAWS (multi-label PAWS). The models are evaluated based on their performance as features extractors for training a simple classifier, formed by a dense layer. In the experiments carried out, for the same architecture, a performance was obtained that exceeds by 2.7 percent the f1-score of the supervised equivalent.
50

Multilingual Speech Emotion Recognition using pretrained models powered by Self-Supervised Learning / Flerspråkig känsloigenkänning från tal med hjälp av förtränade tal-modeller baserat på själv-övervakad Inlärning

Luthman, Felix January 2022 (has links)
Society is based on communication, for which speech is the most prevalent medium. In day to day interactions we talk to each other, but it is not only the words spoken that matters, but the emotional delivery as well. Extracting emotion from speech has therefore become a topic of research in the area of speech tasks. This area as a whole has in recent years adopted a Self- Supervised Learning approach for learning speech representations from raw speech audio, without the need for any supplementary labelling. These speech representations can be leveraged for solving tasks limited by the availability of annotated data, be it for low-resource language, or a general lack of data for the task itself. This thesis aims to evaluate the performances of a set of pre-trained speech models by fine-tuning them in different multilingual environments, and evaluating their performance thereafter. The model presented in this paper is based on wav2vec 2.0 and manages to correctly classify 86.58% of samples over eight different languages and four emotional classes when trained on those same languages. Experiments were conducted to garner how well a model trained on seven languages would perform on the one left out, which showed that there is quite a large margin of similarity in how different cultures express vocal emotions, and further investigations showed that as little as just a few minutes of in-domain data is able to increase the performance substantially. This shows promising results even for niche languages, as the amount of available data may not be as large of a hurdle as one might think. With that said, increasing the amount of data from minutes to hours does still garner substantial improvements, albeit to a lesser degree. / Hela vårt samhälle är byggt på kommunikation mellan olika människor, varav tal är det vanligaste mediet. På en daglig basis interagerar vi genom att prata med varandra, men det är inte bara orden som förmedlar våra intentioner, utan även hur vi uttrycker dem. Till exempel kan samma mening ge helt olika intryck beroende på ifall den sägs med ett argt eller glatt tonfall. Talbaserad forskning är ett stort vetenskapligt område i vilket talbaserad känsloigenkänning vuxit fram. Detta stora tal-område har under de senaste åren sett en tendens att utnyttja en teknik kallad själv-övervakad inlärning för att utnyttja omärkt ljuddata för att lära sig generella språkrepresentationer, vilket kan liknas vid att lära sig strukturen av tal. Dessa representationer, eller förtränade modeller, kan sedan utnyttjas som en bas för att lösa problem med begränsad tillgång till märkt data, vilket kan vara fallet för sällsynta språk eller unika uppgifter. Målet med denna rapport är att utvärdera olika applikationer av denna representations inlärning i en flerspråkig miljö genom att finjustera förtränade modeller för känsloigenkänning. I detta syfte presenterar vi en modell baserad på wav2vec 2.0 som lyckas klassifiera 86.58% av ljudklipp tagna från åtta olika språk över fyra olika känslo-klasser, efter att modellen tränats på dessa språk. För att avgöra hur bra en modell kan klassifiera data från ett språk den inte tränats på skapades modeller tränade på sju språk, och evaluerades sedan på det språk som var kvar. Dessa experiment visar att sättet vi uttrycker känslor mellan olika kulturer är tillräckligt lika för att modellen ska prestera acceptabelt även i det fall då modellen inte sett språket under träningsfasen. Den sista undersökningen utforskar hur olika mängd data från ett språk påverkar prestandan på det språket, och visar att så lite som endast ett par minuter data kan förbättra resultet nämnvärt, vilket är lovande för att utvidga modellen för fler språk i framtiden. Med det sagt är ytterligare data att föredra, då detta medför fortsatta förbättringar, om än i en lägre grad.

Page generated in 0.1335 seconds