281

A study of transfer learning on data-driven motion synthesis frameworks / En studie av kunskapsöverföring på datadriven rörelse syntetiseringsramverk

Chen, Nuo January 2022
Numerous studies have shown the potential and robustness of deep learning-based approaches for synthesising novel motions of 3D characters in virtual environments such as video games and films. The models are trained on motion data that is bound to the respective character skeleton (rig). This limits the scalability and applicability of the models, since they can only learn motions from one particular rig (domain) and produce motions in that domain only. Transfer learning techniques can be used to overcome this issue and allow the models to adapt better to other domains with limited data. This work presents a study of three transfer learning techniques for the proposed Objective-driven motion generation model (OMG), a neural network-based model for procedurally generating animations conditioned on positional and rotational objectives. Three transfer learning approaches for achieving rig-agnostic encoding (RAE) are proposed and evaluated: Feature encoding (FE), Feature clustering (FC) and Feature selection (FS), each intended to improve the learning of the model on new domains with limited data. All three approaches demonstrate significant improvement in both the performance and the visual quality of the generated animations compared to the vanilla performance. The empirical results indicate that the FE and FC approaches yield better transfer quality than the FS approach. It is inconclusive which of the two performs better, but the FE approach is more computationally efficient, which makes it the more favourable choice for real-time applications.
282

Transfer Learning for Soil Spectroscopy Based on Convolutional Neural Networks and Its Application in Soil Clay Content Mapping Using Hyperspectral Imagery

Liu, Lanfa, Ji, Min, Buchroithner, Manfred 14 December 2018
Soil spectra are often measured in the laboratory, and an increasing number of large-scale soil spectral libraries are being established across the world. However, calibration models developed from soil libraries are difficult to apply to spectral data acquired from the field or from space. Transfer learning has the potential to bridge this gap and make a calibration model transferable from one sensor to another. The objective of this study is to explore the potential of transfer learning for soil spectroscopy and its performance on soil clay content estimation using hyperspectral data. First, a one-dimensional convolutional neural network (1D-CNN) is trained on Land Use/Land Cover Area Frame Survey (LUCAS) mineral soils. To evaluate whether the pre-trained 1D-CNN model was transferable, LUCAS organic soils were used to fine-tune and validate the model. The fine-tuned model achieved good accuracy (coefficient of determination (R²) = 0.756, root-mean-square error (RMSE) = 7.07 and ratio of percent deviation (RPD) = 2.26) for the estimation of clay content. A spectral index, suggested as a simple transferable feature, was also explored on LUCAS data but did not perform well on the estimation of clay content. Then, the pre-trained 1D-CNN model was further fine-tuned with field samples collected in the study area, with spectra extracted from HyMap imagery, achieving an accuracy of R² = 0.601, RMSE = 8.62 and RPD = 1.54. Finally, the soil clay map was generated with the fine-tuned 1D-CNN model and hyperspectral data.
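The three accuracy figures quoted in this abstract (R², RMSE, RPD) can be computed from paired observed and predicted values. The sketch below is illustrative only: the abstract does not give its formulas, so it assumes the commonly used definitions, with RPD taken as the sample standard deviation of the observed values divided by the RMSE. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return (r2, rmse, rpd) for paired observed/predicted values.

    Assumed definitions (not stated in the abstract):
      r2   = 1 - SS_res / SS_tot
      rmse = sqrt(SS_res / n)
      rpd  = sample standard deviation of observed / rmse
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    # Sample standard deviation of the observed values (n - 1 denominator).
    sd_obs = math.sqrt(ss_tot / (n - 1))
    r2 = 1.0 - ss_res / ss_tot
    rpd = sd_obs / rmse
    return r2, rmse, rpd
```

Under these definitions a larger RPD means the prediction error is small relative to the natural spread of the property being estimated.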
283

Zero-shot, One Kill: BERT for Neural Information Retrieval

Efes, Stergios January 2021
[Background]: The advent of bidirectional encoder representations from transformers (BERT) language models (Devlin et al., 2018) and MS Marco, a large-scale human-annotated dataset for machine reading comprehension (Bajaj et al., 2016) that was made publicly available, led the field of information retrieval (IR) to experience a revolution (Lin et al., 2020). The BERT-based retrieval model of Nogueira and Cho (2019) became, at the time of publication, the top entry in the MS Marco passage-reranking leaderboard, surpassing the previous state of the art by 27% in MRR@10. However, training such neural IR models for domains other than MS Marco is still hard, because neural approaches often require a vast amount of training data to perform effectively, and this is not always available. To address the shortage of labelled data, a new line of research emerged: training neural models with weak supervision. In weak supervision, given an unlabelled dataset, labels are generated automatically using an existing model, and a machine learning model is then trained on the artificial “weak” data. In weak supervision for IR, the training dataset comes in the form of (query, passage) tuples. Dehghani et al. (2017) used the AOL query logs (Pass et al., 2006), a set of millions of real web queries, together with BM25 to retrieve the relevant passages for each user query. A drawback of this approach is that it is hard to obtain query logs for every domain. [Objective]: This thesis proposes an intuitive approach, based on transfer learning in the context of IR, for addressing the shortage of data in domains with limited or no data at all. We leverage Wikipedia’s structure to create a Wikipedia-based generic IR training dataset for zero-shot neural models.
[Method]: We create the “pseudo-queries” by concatenating the title of each Wikipedia article with each of its section titles, and we consider the associated section’s passage to be the relevant passage for the pseudo-query. All of our experiments are evaluated on a standard collection: MS Marco, a large-scale web collection. For our zero-shot experiments, our proposed model, called “Wiki”, is a BERT model trained on the artificial Wikipedia-based dataset, and the baseline is a default BERT model without any additional training. In our second line of experiments, we explore the benefits gained by pre-fine-tuning on the Wikipedia-based IR dataset and further fine-tuning on in-domain data. Our proposed model, “Wiki+Ma”, is a BERT model pre-fine-tuned on the Wikipedia-based dataset and further fine-tuned on MS Marco, while the baseline is a BERT model fine-tuned only on MS Marco. [Results]: In our first experiments, the “Wiki” model achieves 0.197 in MRR@10, about 10 points higher than a BERT model with default weights; in addition, results on the development set indicate that the “Wiki” model performs better than a BERT model trained on in-domain data when that data amounts to between 10k and 50k instances. In our second line of experiments, pre-fine-tuning on the Wikipedia-based IR dataset benefits later fine-tuning steps on in-domain data in terms of stability. [Conclusion]: Our findings suggest that transfer learning for IR tasks by leveraging the generic knowledge incorporated in Wikipedia is possible, though more experimentation is needed to understand its limitations in comparison with traditional approaches such as BM25.
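The pseudo-query construction and the MRR@10 metric described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the space used as the concatenation separator, and the data layout (dictionaries keyed by query id) are all assumptions made for the example.

```python
def build_pseudo_queries(article_title, sections):
    """Pair each pseudo-query with its relevant passage.

    sections: list of (section_title, passage_text) tuples.
    The pseudo-query is the article title joined to the section title;
    the single-space separator is an assumption of this sketch.
    """
    return [
        (f"{article_title} {section_title}", passage)
        for section_title, passage in sections
    ]


def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank with cutoff k.

    rankings: dict mapping query id -> list of passage ids, best first.
    relevant: dict mapping query id -> set of relevant passage ids.
    A query with no relevant passage in its top k contributes 0.
    """
    total = 0.0
    for query_id, ranking in rankings.items():
        for rank, passage_id in enumerate(ranking[:k], start=1):
            if passage_id in relevant[query_id]:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)
```

Because only the first relevant passage within the cutoff contributes, MRR@10 rewards placing one correct passage near the top rather than retrieving many relevant passages.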
284

Deep Learning Models for Human Activity Recognition

Albert Florea, George, Weilid, Filip January 2019
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus provides researchers with remote-controlled meetings and natural meetings in an office environment; the meeting scenario takes place in a four-person office room. To achieve group activity recognition, video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB colour images and the audio spectrograms had one colour channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features, both during model training and when predicting the behaviour of an activity, increases validation accuracy compared to models that use only spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the data extracted from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. The architecture types investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3 and RCNN (Recurrent Neural Network). ImageNet weights were used to initialize the weights of the neural network base models. The ImageNet weights are provided by the Keras API and are optimized for each base model [2]. The base models use ImageNet weights when extracting features from the input data. Feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the deep learning approach using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
285

Alzheimer prediction from connected speech extracts : assessment of generalisation to new data

Chafouleas, Geneviève 09 1900
Co-supervision: Simona Brambati / Many advances have been made in the early diagnosis of Alzheimer’s Disease (AD) using connected speech elicited from a picture description task. Hand-built linguistic and acoustic features as well as Deep Learning approaches have shown promising results in the classification of AD patients. In this research, we compare both approaches on the Cookie Theft scene from the Boston Aphasia Examination, using models trained with features derived from the text and audio extracts as well as a Deep Learning approach using BERT. We train our models on the newer ADReSS challenge dataset and evaluate on the CCNA dataset, and vice versa, in order to assess how well the trained models generalise to unseen examples from a different dataset. A thorough evaluation of the interpretability of the models is performed to see how well each model learns the representations related to the disease. We observe that the models do not perform well when evaluated on a different dataset from the same domain. The selected and learned representations from the models trained on either dataset are very different, which may explain the low performance in the evaluation step. While we demonstrate the importance of linguistic features in the classification of AD vs non-AD, we find that the best overall model is BERT, which achieves a test accuracy of 62.6% on the ADReSS challenge dataset and 66.7% on the CCNA dataset.
286

Dialogue systems based on pre-trained language models

Zeng, Yan 07 1900
Pre-trained language models (LMs) have been shown to be effective in many NLP tasks. They can capture general language regularities from a large amount of text, which are useful for most applications related to natural language. In this thesis, we study the problem of dialogue, i.e. generating a response to a user's utterance. We exploit pre-trained language models to deal with different aspects of dialogue systems. First, pre-trained language models have been trained and used in different ways in dialogue systems, and it is unclear which way is the most appropriate. For task-oriented dialogue systems, the state-of-the-art framework for Dialogue State Tracking (DST) uses BERT as the encoder and stacks a recurrent neural network (RNN) on the BERT outputs as the decoder, so pre-trained language models are leveraged only for the encoder. In the first part of the thesis, we investigate methods that use a single BERT model for both the encoder and the decoder, allowing for more effective parameter updating. Our method achieves new state-of-the-art performance. For the task of response generation in generative chatbot systems, we further compare 4 commonly used frameworks based on pre-trained LMs, which use different training objectives and attention mechanisms. Through extensive experiments, we observe the impact of two types of discrepancy that have been largely ignored in the literature: pretrain-finetune discrepancy and finetune-generation discrepancy (i.e. differences between pre-training and fine-tuning, and between fine-tuning and generation). We show that the impact of these discrepancies surfaces when only a limited amount of training data is available. To alleviate the problem, we propose two methods that reduce the discrepancies, yielding improved performance. Second, even though pre-training-based methods have shown excellent performance in general dialogue generation, we are increasingly faced with the problem of conditioned conversation, i.e. conversation in relation to some condition (persona, topic, etc.). Researchers are also interested in multi-skill chatbot systems, namely equipping a chatbot with the ability to handle different conditioned generation tasks. Therefore, in the second part of the thesis, we investigate the problem of conditioned dialogue generation. First, we propose a general method that leverages not only conditioned dialogue data but also conditioned non-dialogue text data, which are much easier to collect, in order to alleviate the data scarcity issue of conditioned dialogue generation. Second, the recently proposed concept of the Adapter adapts a general dialogue system to enhance a specific dialogue skill, and we investigate ways to learn such a skill. We show that an Adapter has enough capacity to model a dialogue skill for either loosely-conditioned or strictly-conditioned response generation, while using only 6% more parameters. This thesis contains 4 pieces of work relating to two general problems in dialogue systems: the inherent architecture of dialogue systems based on pre-trained LMs, and the enhancement of a general dialogue system with specific skills. The studies not only propose new approaches that outperform the current state of the art, but also stress the importance of carefully designing the model architecture to fit the task, instead of simply increasing the amount of training data and the raw computation power.
287

3D Object Detection Using Virtual Environment Assisted Deep Network Training

Dale, Ashley S. 12 1900
Indiana University-Purdue University Indianapolis (IUPUI) / An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score of all classes and all epochs for the networks trained with synthetic data is F1∗ = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ∗ = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
288

Neural Methods Towards Concept Discovery from Text via Knowledge Transfer

Das, Manirupa January 2019
No description available.
289

Improving Image Classification using Domain Adaptation for Autonomous Driving : A Master Thesis in Collaboration with Scania / Förbättring av Bildklassificering med hjälp av Domain Adaptation för Sjävkörande Fordon : Ett examensarbete i samarbete med Scania

Westlund, Mikael January 2023
Autonomous driving is a rapidly changing industry and has recently become a heavily researched topic for vehicle-producing companies and research organizations. Autonomous vehicles are typically equipped with sensors such as Light Detection and Ranging (LiDAR) in order to perceive their surroundings. The problem of detecting and classifying surrounding objects from the sensor data can be solved using different types of algorithms, and machine learning solutions have recently been investigated. One problem with the machine learning approach is that the models usually require a substantial amount of labeled data, and labeling LiDAR data is a time-consuming process. A promising solution to this problem is to use Domain Adaptation (DA) methods. DA methods can use labeled camera data, which are easier to label, in conjunction with unlabeled LiDAR data to improve the performance of machine learning models on LiDAR data. This thesis investigates and compares different DA methods that can be used for classification of LiDAR data. Two image classification datasets with data of humans and vehicles were created: one contains camera images and the other contains LiDAR intensity images. The datasets were used to train and test three methods: (1) a baseline method, which simply uses labeled camera images to train a model; (2) Correlation Alignment (CORAL), a DA method that aligns the covariance of camera features towards that of LiDAR features; and (3) Deep Adaptation Network (DAN), a DA method that includes a maximum mean discrepancy computation between camera and LiDAR features within the objective function of the model. These methods were then evaluated based on the resulting confusion matrices, accuracy, recall, precision and F1-score on LiDAR data. The results showed that DAN was the best of the three methods, reaching an accuracy of 87%, while the baseline and CORAL reached only 65% and 73%, respectively. The strong performance of DAN showed that there is potential for using DA methods within the field of autonomous vehicles.
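The covariance alignment that this abstract attributes to CORAL can be made concrete with a small distance measure between the feature covariances of two domains. The sketch below is illustrative only and is not taken from the thesis: it assumes the loss form commonly used for deep CORAL variants (squared Frobenius norm of the covariance difference, scaled by 1/(4d²)), and the function names are hypothetical.

```python
def covariance(features):
    """Sample covariance matrix (d x d) of n feature vectors of length d."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2).

    source, target: lists of feature vectors (e.g. camera and LiDAR
    features) with the same dimensionality d. The scaling is the commonly
    used deep CORAL convention and is an assumption of this sketch.
    """
    d = len(source[0])
    cov_s = covariance(source)
    cov_t = covariance(target)
    sq_frobenius = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return sq_frobenius / (4 * d * d)
```

The loss is zero when both domains have identical second-order statistics and grows as the covariances diverge, so adding it to a classification objective pushes the two feature distributions towards each other.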
290

Cross-Lingual and Genre-Supervised Parsing and Tagging for Low-Resource Spoken Data

Fosteri, Iliana January 2023
Dealing with low-resource languages is a challenging task because of the absence of sufficient data to train machine-learning models to make predictions on these languages. One way to deal with this problem is to use data from higher-resource languages, which enables the transfer of learning from these languages to the low-resource target ones. The present study focuses on dependency parsing and part-of-speech tagging of low-resource languages belonging to the spoken genre, i.e., languages whose treebank data is transcribed speech: Beja, Chukchi, Komi-Zyrian, Frisian-Dutch, and Cantonese. Our approach involves investigating different types of transfer languages, employing MACHAMP, a state-of-the-art parser and tagger that uses contextualized word embeddings, mBERT and XLM-R in particular. The main idea is to explore how genre, language similarity, neither of the two, or the combination of both affects model performance in the aforementioned downstream tasks for our selected target treebanks. Our findings suggest that, in order to capture speech-specific dependency relations, we need to incorporate at least some genre-matching source data, while language similarity-matching source data are a better candidate when the task at hand is part-of-speech tagging. We also explore the impact of multi-task learning in one of our proposed methods, but we observe only minor differences in model performance.
