341

Automatic Categorization of News Articles With Contextualized Language Models / Automatisk kategorisering av nyhetsartiklar med kontextualiserade språkmodeller

Borggren, Lukas January 2021 (has links)
This thesis investigates how pre-trained contextualized language models can be adapted for multi-label text classification of Swedish news articles. Various classifiers are built on pre-trained BERT and ELECTRA models, exploring global and local classifier approaches. Furthermore, the effects of domain specialization, the use of additional metadata features, and model compression are investigated. Several hundred thousand news articles are gathered to create unlabeled and labeled datasets for pre-training and fine-tuning, respectively. The findings show that a local classifier approach is superior to a global classifier approach and that BERT outperforms ELECTRA significantly. Notably, a baseline classifier built on SVMs yields competitive performance. The effect of further in-domain pre-training varies: ELECTRA's performance improves while BERT's is largely unaffected. Utilizing metadata features in combination with text representations is found to improve performance. Both BERT and ELECTRA exhibit robustness to quantization and pruning, allowing model sizes to be cut in half without any performance loss.
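Below is a minimal sketch, in Python, of the kind of classifier the abstract describes: a pre-trained BERT encoder whose text representation is concatenated with numeric metadata features and fed to a multi-label (sigmoid) head. The Swedish checkpoint name, label count, and metadata dimensionality are illustrative assumptions, not the thesis configuration.

```python
# Illustrative sketch only; checkpoint, label count and metadata size are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "KB/bert-base-swedish-cased"  # assumed Swedish BERT checkpoint

class MultiLabelNewsClassifier(nn.Module):
    def __init__(self, num_labels=20, num_metadata_features=8):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(CHECKPOINT)
        hidden = self.encoder.config.hidden_size
        # Concatenate the [CLS] text representation with the metadata vector.
        self.head = nn.Sequential(
            nn.Linear(hidden + num_metadata_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, input_ids, attention_mask, metadata):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]              # [CLS] token representation
        return self.head(torch.cat([cls, metadata], dim=-1))

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = MultiLabelNewsClassifier()
batch = tokenizer(["Regeringen presenterade en ny budget."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"], torch.zeros(1, 8))
probs = torch.sigmoid(logits)   # independent per-label probabilities; train with BCEWithLogitsLoss
```

Compression steps such as quantization and pruning, as studied in the thesis, would be applied to the trained encoder afterwards.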
342

Zero-shot, One Kill: BERT for Neural Information Retrieval

Efes, Stergios January 2021 (has links)
[Background]: The advent of Bidirectional Encoder Representations from Transformers (BERT) language models (Devlin et al., 2018) and MS Marco, a large-scale human-annotated dataset for machine reading comprehension (Bajaj et al., 2016) that was made publicly available, led the field of information retrieval (IR) to experience a revolution (Lin et al., 2020). The BERT-based retrieval model of Nogueira and Cho (2019) became, at the time of publication, the top entry in the MS Marco passage-reranking leaderboard, surpassing the previous state of the art by 27% in MRR@10. However, training such neural IR models for domains other than MS Marco is still hard, because neural approaches often require a vast amount of training data to perform effectively, which is not always available. To address the shortage of labelled data, a new line of research emerged: training neural models with weak supervision. In weak supervision, given an unlabelled dataset, labels are generated automatically using an existing model, and a machine learning model is then trained on the artificial "weak" data. In the case of weak supervision for IR, the training dataset comes in the form of (query, passage) tuples. Dehghani et al. (2017) used the AOL query logs (Pass et al., 2006), a set of millions of real web queries, and BM25 to retrieve the relevant passages for each of the user queries. A drawback of this approach is that it is hard to obtain query logs for every single domain. [Objective]: This thesis proposes an intuitive approach for addressing the shortage of data in domains with limited or no data at all, through transfer learning in the context of IR. We leverage Wikipedia's structure to create a Wikipedia-based generic IR training dataset for zero-shot neural models. [Method]: We create "pseudo-queries" by concatenating the titles of Wikipedia articles with each of their section titles, and we consider the associated section's passage as the relevant passage of the pseudo-query. All of our experiments are evaluated on a standard collection: MS Marco, a large-scale web collection. For our zero-shot experiments, our proposed model, called "Wiki", is a BERT model trained on the artificial Wikipedia-based dataset, and the baseline is a default BERT model without any additional training. In our second line of experiments, we explore the benefits gained by pre-fine-tuning on the Wikipedia-based IR dataset and further fine-tuning on in-domain data. Our proposed model, "Wiki+Ma", is a BERT model pre-fine-tuned on the Wikipedia-based dataset and further fine-tuned on MS Marco, while the baseline is a BERT model fine-tuned only on MS Marco. [Results]: Our first experiments show that the "Wiki" model, trained on the Wikipedia-based IR dataset, achieves 0.197 in MRR@10, about 10 points higher than a BERT model with default weights; in addition, results on the development set indicate that the "Wiki" model performs better than a BERT model trained on in-domain data when that data comprises between 10k and 50k instances. Our second line of experiments shows that pre-fine-tuning on the Wikipedia-based IR dataset benefits later fine-tuning steps on in-domain data in terms of stability.
[Conclusion]: Our findings suggest that transfer learning for IR tasks by leveraging the generic knowledge incorporated in Wikipedia is possible, though more experimentation is needed to understand its limitations in comparison with traditional approaches such as BM25.
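As a concrete illustration of the method described above, the following sketch builds weakly labelled (pseudo-query, passage) pairs from Wikipedia-style data by concatenating an article title with each of its section titles; the input format is a simplifying assumption, not the thesis code.

```python
# Illustrative sketch; the input structure (title -> {section: text}) is an assumption.
def build_pseudo_queries(articles):
    """articles: iterable of (article_title, {section_title: section_text}) pairs."""
    pairs = []
    for title, sections in articles:
        for section_title, passage in sections.items():
            query = f"{title} {section_title}"      # pseudo-query: title + section title
            pairs.append((query, passage))          # the section text is the relevant passage
    return pairs

wiki = [("Uppsala", {"History": "Uppsala is first mentioned in written sources ...",
                     "Climate": "Uppsala has a humid continental climate ..."})]
for query, passage in build_pseudo_queries(wiki):
    print(query, "->", passage[:40])
```

Such pairs can then serve as weak positive examples when fine-tuning a BERT re-ranker, with negatives sampled from other sections or retrieved with BM25.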
343

Hotspot Detection for Automatic Podcast Trailer Generation / Hotspot-detektering för automatisk generering av podcast-trailers

Zhu, Winstead Xingran January 2021 (has links)
With podcasts being a fast-growing audio-only form of media, an effective way of promoting different podcast shows becomes more and more vital to all the stakeholders concerned, including the podcast creators, the podcast streaming platforms, and the podcast listeners. This thesis investigates the relatively little-studied topic of automatic podcast trailer generation, with the purpose of enhancing the overall visibility and publicity of different podcast contents and generating more user engagement in podcast listening. The thesis takes a hotspot-based approach, specifically defining the vague concept of a “hotspot” and designing appropriate methods for hotspot detection. Different methods are analyzed and compared, and the best methods are selected. The selected methods are then used to construct an automatic podcast trailer generation system, which consists of four major components and one schema to coordinate the components. The system can take an arbitrary podcast episode's audio as input and generate a roughly one-minute-long trailer for it. The thesis also proposes two human-based podcast trailer evaluation approaches, and the evaluation results show that the proposed system outperforms the baseline by a large margin and achieves promising results in terms of both aesthetics and functionality.
344

Using a Character-Based Language Model for Caption Generation / Användning av teckenbaserad språkmodell för generering av bildtext

Keisala, Simon January 2019 (has links)
Using AI to automatically describe images is a challenging task. The aim of this study has been to compare the use of character-based language models with one of the current state-of-the-art token-based language models, im2txt, for generating image captions, with a focus on morphological correctness. Previous work has shown that character-based language models are able to outperform token-based language models in morphologically rich languages. Other studies show that simple multi-layered LSTM blocks are able to learn to replicate the syntax of their training data. To study the usability of character-based language models, an alternative model based on TensorFlow im2txt has been created. The model changes the token-generation architecture to handle character-sized tokens instead of word-sized tokens. The results suggest that a character-based language model could outperform the current token-based language models, although due to time and computing-power constraints this study cannot draw a clear conclusion. A problem with one of the methods, subsampling, is discussed: when the original method is applied to character-sized tokens, it removes characters (including special characters) instead of full words. To solve this issue, a two-phase approach is suggested, where the training data is first separated into word-sized tokens, on which subsampling is performed; the remaining tokens are then separated into character-sized tokens. Future work applying the modified subsampling and fine-tuning the hyperparameters is suggested to reach a clearer conclusion about the performance of character-based language models.
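The proposed two-phase subsampling can be sketched as follows, assuming word2vec-style frequency subsampling as the underlying method (the abstract does not specify the exact formula): frequent words are dropped at the word level first, and only the surviving words are split into character-sized tokens.

```python
# Sketch under the assumption of word2vec-style frequency subsampling; threshold and corpus are toy values.
import random
from collections import Counter

def two_phase_subsample(sentences, t=1e-3, seed=0):
    rng = random.Random(seed)
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    freq = {w: c / total for w, c in counts.items()}

    def keep(word):
        # Keep probability as in word2vec subsampling: frequent words are dropped more often.
        return rng.random() < min(1.0, (t / freq[word]) ** 0.5)

    char_sequences = []
    for s in sentences:
        kept = [w for w in s.split() if keep(w)]                     # phase 1: word-level subsampling
        char_sequences.append([c for w in kept for c in w + " "])    # phase 2: split survivors into characters
    return char_sequences

# Toy demo; t is raised so that the tiny corpus keeps most words.
print(two_phase_subsample(["a cat sits on a mat", "the cat sleeps"], t=0.5))
```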
345

En undersökning av AI-verktyget Whisper som potentiell ersättare till det manuella arbetssättet inom undertextframtagning / A Study of the AI-tool Whisper as a Potential Substitute to the Manual Process of Subtitling

Kaka, Mailad Waled Kider, Oummadi, Yassin January 2023 (has links)
Det manuella arbetssättet för undertextframtagning är en tidskrävande och kostsam process. Arbetet undersöker AI-verktyget Whisper och dess potential att ersätta processen som används idag. Processen innefattar både transkribering och översättning.  För att verktyget ska kunna göra denna transkribering och översättning behöver det i första hand kunna omvandla tal till text. Detta kallas för taligenkänning och är baserat på upptränade språkmodeller. Precisionen för transkriberingen kan mätas med ordfelfrekvens (Word Error Rate – WER) och för översättningen med COMET-22.  Resultaten visade sig klara av Microsofts krav för maximalt tillåten WER och anses därför vara tillräckligt bra för användning. Resultaten indikerade även att de maskinproducerade översättningarna uppnår tillfredsställande kvalitet. Undertextframtagning, som är det andra steget i processen, visade sig Whisper ha svårare för när det gäller skapandet av undertexter. Detta gällde både för transkriberingen i originalspråk samt den engelsköversatta versionen. Kvaliteten på undertexternas formatering, som mäts med SubER-metoden, kan tolkas som för låg för att anses vara användbar. Resultaten låg i intervallet 59 till 96 %, vilket anger hur stor del av den automatiskt tillverkade undertexten som behöver korrigeras för att matcha referensen.  Den övergripande slutsatsen man kan dra är att Whisper eventuellt kan ersätta den faktiska transkriberings- och översättningsprocessen, då den både är snabbare och kostar mindre resurser än det manuella tillvägagångssättet. Den är dock inte i skrivande stund tillräcklig för att ersätta undertextframtagningen. / The manual process of subtitle creation is time-consuming and costly. This study examines the AI tool Whisper and its potential to replace the process used today. The process consists of both speech recognition and speech translation.  For the tool to accomplish the transcription and translation, it first needs to be able to convert speech to text. This is called speech recognition and is based on trained speech models. The precision of the transcription can be measured using the Word Error Rate (WER), while the translation is evaluated with COMET-22.  The results met the requirements for the maximum allowed WER value and were therefore considered usable. The results also indicated that the machine-produced translations reached satisfactory quality. Subtitle creation, which is the second part of the process, turned out to be more of a challenge for Whisper. This applied both to the transcription in the original language and to the English-translated version.  The quality of the subtitle formatting, measured using the SubER method, can be interpreted as too low to be considered useful. The results were in the interval of 59 to 96%, which indicates how large a part of the automatically created subtitles needs to be corrected to match the reference.  The conclusion one can draw is that Whisper could eventually replace the actual transcription and translation process, since it is both faster and requires fewer resources than the manual process. At the time of writing, however, it is not good enough to replace subtitle creation.
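A minimal sketch of the two measurements described above, assuming the openai-whisper and jiwer packages, a local audio file named episode.mp3, and a short reference transcript (all illustrative, not the study's actual setup); COMET-22 and SubER scoring are omitted.

```python
# Illustrative sketch; assumes the openai-whisper and jiwer packages and a local file "episode.mp3".
import whisper
import jiwer

model = whisper.load_model("small")

# Transcribe in the original language, then translate the same audio to English.
transcript = model.transcribe("episode.mp3", language="sv", task="transcribe")
translation = model.transcribe("episode.mp3", language="sv", task="translate")

# Word Error Rate against a (hypothetical) human reference transcript.
reference = "det manuella arbetssättet för undertextframtagning är en tidskrävande process"
wer = jiwer.wer(reference, transcript["text"].lower())
print(f"WER: {wer:.2%}")
print(translation["text"][:80])
```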
346

Prompt-learning and Zero-shot Text Classification with Domain-specific Textual Data

Luo, Hengyu January 2023 (has links)
The rapid growth of textual data in the digital age presents unique challenges for domain-specific text classification, particularly the scarcity of labeled data for many applications due to the high cost of manual labeling. In this thesis, we explore the applicability of the prompt-learning method, which is well known for suiting few-shot scenarios and requiring far less data, as an emerging alternative to traditional fine-tuning methods for domain-specific text classification in the context of customer-agent interactions in the retail sector. Specifically, we implemented the entire prompt-learning pipeline for the classification task, and our investigation encompasses various prompt-learning strategies, including the fixed-prompt language model tuning strategy and the tuning-free prompting strategy, along with an examination of language model selection, few-shot sampling strategy, prompt template design, and verbalizer design. In this manner, we assessed the overall performance of the prompt-learning method on the classification task. Through a systematic evaluation, we demonstrate that with the fixed-prompt language model tuning strategy, based on relatively small language models (e.g. T5-base with around 220M parameters), prompt-learning can achieve competitive performance (close to 75% accuracy) even with limited labeled data (merely up to 15% of the full data). Moreover, with the tuning-free prompting strategy, based on a regular-size language model (e.g. FLAN-T5-large with around 770M parameters), performance can reach around 30% accuracy with detailed prompt templates in a zero-shot setting (no extra training data involved). These results offer valuable insights for researchers and practitioners working with domain-specific textual data, prompt-learning, and few-shot/zero-shot learning. The findings of this thesis highlight the potential of prompt-learning as a practical solution for classification problems across diverse domains and set the stage for future research in this area.
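A hedged sketch of the tuning-free prompting strategy mentioned above: a frozen FLAN-T5 model fills a classification prompt and a simple verbalizer maps the generated text back to a class label. The label set, template wording, verbalizer, and example message are illustrative assumptions, not the thesis setup.

```python
# Illustrative sketch; labels, template and verbalizer are assumptions, not the thesis setup.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

labels = ["delivery", "refund", "product question"]
# Verbalizer: map words the model may generate back to the label set.
verbalizer = {"delivery": "delivery", "shipping": "delivery",
              "refund": "refund", "return": "refund",
              "product": "product question"}

text = "My parcel still hasn't arrived, can you check where it is?"
prompt = (f"Classify the customer message into one of: {', '.join(labels)}.\n"
          f"Message: {text}\nCategory:")

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=5)     # zero-shot: no gradient updates
answer = tokenizer.decode(output[0], skip_special_tokens=True).strip().lower()
label = next((v for k, v in verbalizer.items() if k in answer), "unknown")
print(label)
```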
347

Cross-Lingual and Genre-Supervised Parsing and Tagging for Low-Resource Spoken Data

Fosteri, Iliana January 2023 (has links)
Dealing with low-resource languages is a challenging task because of the absence of sufficient data to train machine-learning models to make predictions for these languages. One way to deal with this problem is to use data from higher-resource languages, which enables the transfer of learning from those languages to the low-resource target ones. The present study focuses on dependency parsing and part-of-speech tagging of low-resource languages belonging to the spoken genre, i.e., languages whose treebank data is transcribed speech. These are the following: Beja, Chukchi, Komi-Zyrian, Frisian-Dutch, and Cantonese. Our approach involves investigating different types of transfer languages, employing MaChAmp, a state-of-the-art parser and tagger that uses contextualized word embeddings, in particular mBERT and XLM-R. The main idea is to explore how genre, language similarity, neither of the two, or their combination affects model performance in the aforementioned downstream tasks for our selected target treebanks. Our findings suggest that in order to capture speech-specific dependency relations, we need to incorporate at least some genre-matching source data, while language-similarity-matching source data are a better candidate when the task at hand is part-of-speech tagging. We also explore the impact of multi-task learning in one of our proposed methods, but we observe only minor differences in model performance.
348

Incorporating speaker’s role in classification of text-based dialogues

Stålhandske, Therese January 2020 (has links)
Dialogues are an interesting type of document, as they contain a speaker role feature not found in other types of texts. Previous work has incorporated speaker role dependency in text generation, but little has been done in the realm of text classification. In this thesis, we incorporate speaker role dependency into a classification model by creating different speaker-dependent word representations and simulating a conversation within neural networks. The results show a significant improvement in the performance of binary classification of dialogues when speaker role information is incorporated. Further, by extracting attention weights from the model, we gain insight into how the speaker's role affects the interpretation of utterances, giving an intuitive explanation of our model. / Konversationer är en speciell typ av text, då den innehåller information om talare som inte hittas i andra typer av dokument. Tidigare arbeten har inkluderat en talares roll i generering av text, men lite har gjorts inom textklassificering. I det här arbetet introducerar vi deltagarens roller i en klassifikationsmodell. Detta görs genom att skapa ordrepresentationer som är beroende av deltagaren i konversationen, samt simulering av en konversation inom ett neuralt nätverk. Resultaten visar en signifikant förbättring av prestandan i binär klassificering av dialoger, med talares roll inkluderat. Vidare, genom att extrahera attentionvikterna, kan vi få en bättre överblick över hur en talares roll påverkar tolkningen av yttranden, vilket i sin tur ger en mer intuitiv förklaring av vår modell.
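One possible way to make word representations speaker-dependent, sketched below under the assumption of a simple LSTM classifier (the abstract does not specify the architecture): a learned speaker-role embedding is added to every token embedding before the utterance sequence is encoded.

```python
# Illustrative sketch; the thesis architecture is not specified in the abstract, so an LSTM classifier is assumed.
import torch
import torch.nn as nn

class SpeakerAwareEncoder(nn.Module):
    def __init__(self, vocab_size=10000, num_roles=2, dim=128, num_classes=2):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.role_emb = nn.Embedding(num_roles, dim)    # e.g. 0 = agent, 1 = customer
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, token_ids, role_ids):
        # Each word representation carries the identity of the speaker who uttered it.
        x = self.word_emb(token_ids) + self.role_emb(role_ids)
        _, (h, _) = self.lstm(x)
        return self.classifier(h[-1])                   # dialogue-level logits

model = SpeakerAwareEncoder()
tokens = torch.randint(0, 10000, (1, 12))               # one 12-token dialogue
roles = torch.randint(0, 2, (1, 12))                    # speaker role of each token
print(model(tokens, roles).shape)                       # torch.Size([1, 2])
```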
349

Improvements of the syntax of the query language DQL / Förbättringar i syntax för query språket DQL

Diep, Mikael, Cheimonettos, Anestis January 2023 (has links)
This thesis focuses on improving the syntax of a query language named DQL (Dynamic Query Language) in order to enhance the user experience and productivity of its users. The study investigates the original state of the query language and identifies areas for improvement in terms of intuitiveness, efficiency, and consistency. Through an extensive review of existing literature and case studies, the thesis develops a set of guidelines for designing intuitive query languages that minimise the cognitive load for users. The thesis also proposes several modifications to the syntax of DQL that aim to simplify the structure and improve the readability of queries. Finally, the thesis evaluates the effectiveness of the proposed modifications through semi-structured interviews to compare the original syntax with the proposed new one.
350

Improving accuracy of speech recognition for low resource accents : Testing the performance of fine-tuned Wav2vec2 models on accented Swedish / Förbättrad taligenkänning för lågresurs-brytningar : Testning av prestandan för finjusterade Wav2vec2-modeller på bryten svenska

Dabiri, Arash January 2023 (has links)
While the field of speech recognition has recently advanced quickly, even the highest-performing models struggle with accents. There are several methods for improving performance on accents, but many are complex or require large amounts of data and are therefore costly to implement. It is therefore relevant to examine the performance of the Wav2vec2 architecture, which has previously performed well with small amounts of labeled data. Starting from a model trained on Swedish, this thesis fine-tunes it on small datasets of three Swedish accents, creating both accent-dependent specialized models and an accent-independent general model. The specialized models perform better than the original model, and the general model performs approximately as well as each specialized model without sacrificing performance on non-accented Swedish. This means that the Wav2vec2 framework offers a low-cost method of improving speech recognition that can be used to improve private and public services for larger parts of the population. / Trots att området för taligenkänning nyligen har avancerat snabbt, presterar även de bästa modellerna sämre vid språk med utländsk brytning. Det finns flera metoder för att förbättra prestandan på accenter, men många är komplexa eller behöver stora mängder data och är därför dyra att implementera. Därför blir det relevant att undersöka prestandan för Wav2vec2-arkitekturen, som tidigare har presterat väl med små mängder märkt träningsdata. En modell tränad i svenska finjusteras i denna avhandling på tre små datamängder bestående av olika svenska brytningar, för att skapa både brytningsberoende specialiserade modeller såväl som en brytningsoberoende generell modell. De specialiserade modellerna presterar bättre än originalmodellen, och den allmänna modellen presterar ungefär lika bra som varje specialiserad modell utan att ge avkall på prestanda på ickebruten svenska. Detta innebär att ramverket Wav2vec2 erbjuder en lågkostnadsmetod för att förbättra taligenkänning som kan användas för att förbättra privata och offentliga tjänster för större delar av befolkningen.
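A hedged illustration of the starting point described above: loading a pre-trained Swedish Wav2vec2 CTC checkpoint and transcribing an audio clip with greedy decoding; fine-tuning on accented data would continue from these weights. The checkpoint name and the 16 kHz mono file accented_sample.wav are assumptions.

```python
# Illustrative sketch; checkpoint name and the 16 kHz mono file "accented_sample.wav" are assumptions.
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

checkpoint = "KBLab/wav2vec2-large-voxrex-swedish"      # assumed Swedish Wav2vec2 checkpoint
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)

speech, sample_rate = sf.read("accented_sample.wav")
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)            # greedy CTC decoding
print(processor.batch_decode(predicted_ids)[0])
```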
