Global ETD Search

11	[en] ASSESSMENT OF FINE-TUNING ON END-TO-END SPEECH RECOGNITION MODELS / [pt] AVALIAÇÃO DE AJUSTE FINO EM MODELOS DE PONTA A PONTA PARA RECONHECIMENTO DE FALA JONATAS DOS SANTOS GROSMAN 04 November 2022 (has links) [pt] Utilizar representações fornecidas por um grande modelo pré-treinado tornou-se a principal estratégia para alcançar o estado da arte nas mais variadas tarefas. Um grande modelo pré-treinado recentemente proposto, wav2vec 2.0, foi seminal para vários outros trabalhos sobre pré-treinamento de grandes modelos em dados de fala. Muitos modelos estão sendo pré-treinados usando a mesma arquitetura baseada em transformer que o wav2vec 2.0 e estão obtendo o estado da arte em várias tarefas relacionadas à fala. No entanto, poucos trabalhos propuseram maiores análises sobre o comportamento desses modelos em diferentes cenários de fine-tuning. Nosso trabalho visa analisar esse modelo sobre dois aspectos diferentes. O primeiro é sobre a transferibilidade entre línguas desses modelos. Nossos experimentos nos mostraram que o tamanho dos dados usados durante o pré-treinamento desses modelos não é tão crucial para a transferibilidade quanto a diversidade. Percebemos que o desempenho das línguas indo-europeias é superior ao das línguas não indo-europeias nos modelos avaliados. Vimos uma transferência positiva de conhecimento entre línguas usando modelos monolinguais, o que foi percebido em todos os idiomas que usamos, mas foi mais evidente quando o idioma usado durante o pré-treinamento era mais semelhante ao idioma do fine-tuning. O segundo aspecto que investigamos em nosso trabalho é quão bem esses modelos se comportam em cenários de desbalanceamento de dados, onde há um subconjunto mais representativo no conjunto de dados do fine-tuning. Nossos resultados mostraram que o desbalanceamento dos dados no fine-tuning geralmente afeta o resultado final dos modelos, com melhor desempenho nos subconjuntos mais representativos. No entanto, uma maior variabilidade no conjunto de treinamento favorece o desempenhodo modelo para um subconjunto mais representativo. Porém essamaior variabilidade nos dados não favoreceu os idiomas não vistos durante o treinamento. Observamos também que os modelos parecem mais robustos em lidar com o desbalanceamento de gênero do que idade ou sotaque. Com esses achados, esperamos ajudar a comunidade científica na utilização de modelos pré-treinados existentes, bem como auxiliar no pré-treinamento de novosmodelos. / [en] Using representations given by a large pre-trained model has become the primary strategy to reach the state-of-the-art in the most varied tasks. A recently proposed large pre-trained model, wav2vec 2.0, was seminal for several other works on pre-training large models on speech data. Many models are being pre-trained using the same transformer-based architecture as wav2vec 2.0 and are getting state-of-the-art in various speech-related tasks. However, few works have proposed further analysis of these models in different finetuning scenarios. Our work investigates these models concerning two different aspects. The first is about the cross-lingual transferability of these models. Our experiments showed us that the size of data used during the pre-training of these models is not as crucial to the transferability as the diversity. We noticed that the performance of Indo-European languages is superior to non-Indo- European languages in the evaluated models. We have seen a positive crosslingual transfer of knowledge using monolingual models, which was noticed in all the languages we used but was more evident when the language used during the pre-training was more similar to the downstream task language. The second aspect we investigated in our work is how well these models perform in data imbalance scenarios, where there is a more representative subset in the fine-tuning dataset. Our results showed that data imbalance in fine-tuning generally affects the final result of the models, with better performance in the most representative subsets. However, greater variability in the training set favors model performance for a more representative subset. Nevertheless, this greater variability in the data did not favor languages not seen during training. We also observed that the models seem more robust in dealing with gender imbalance than age or accent. With these findings, we hope to help the scientific community in the use of existing pre-trained models, as well as assist in the pre-training of new models. [pt] AJUSTE FINO [pt] RECONHECIMENTO DE FALA [pt] MODELOS PRE-TREINADOS [en] FINE-TUNING [en] SPEECH RECOGNITION [en] PRE-TRAINED MODELS
12	Fine-tuning a LLM using Reinforcement Learning from Human Feedback for a Therapy Chatbot Application / Finjustering av en LLM med hjälp av förstärkande inlärning från mänsklig återkoppling (eng. RLHF) för en Psykolog-chatbot applikation Bill, Desirée, Eriksson, Theodor January 2023 (has links) The field of AI and machine learning has seen exponential growth in the last decade and even more so in the recent year with the considerable public interest in Large Language models (LLMs) such as chat-GPT. LLMs can be used for several purposes, but one possible application would be fine-tuning a model to perform a particular function in a specific field. The goal is therefore fine-tuning a LLM in the field of psychology using a new method called Reinforcement Learning from Human Feedback to determine if it is a viable method in such cases. The theory behind LLMs and RLHF as well as the ethical perspective on developing a psychological AI is presented. Previous studies on both RLHF and AI in psychology are presented, showing the goal is feasible. Then the method is explained for both training and evaluating the model which is done by comparing a pre-trained model with the fine-tuned one. The study is considered scientifically relevant as RLHF has been used to fine-tune LLMs earlier, but has not been done with the intent to make it more specified in a field. The result did not show any clear difference between the pre-trained and the fine-tuned model therefore, more tests are required. However, with the limitations regarding hardware, time to train, and available data, there is much improvement needed for future studies. An ethical framework applied to a digital psychology assistant is discussed and a suitable introduction to the market and division of responsibilities is proposed. / Området AI och maskininlärning har sett exponentiell tillväxt under det senaste decenniet och ännu mer under det senaste året med det stora allmänintresset för stora språkmodeller som chat-GPT. Stora språkmodeller kan användas till flera saker där en möjlig tillämpning är att finjustera en modell för att fylla en viss funktion inom ett specifikt yrke. Målet med arbetet är därför att finjustera en språkmodell inom området psykologi med hjälp av en ny metod kallad Reinforcement Learning from Human Feedback för att undersöka metodens tillämplighet. Teorin bakom stora språkmodeller och RLHF samt det etiska perspektivet på att utveckla en digital psykologi assistent förklaras. Därefter presenteras tidigare studier om både RLHF och AI inom psykologi som visar att målet är genomförbart. Metoden för att både träna och utvärdera modellen förklaras som görs genom att jämföra den förtränade modellen med den finjusterade. Studien bedöms som vetenskapligt relevant även fast RLHF har använts för att finjustera språkmodeller tidigare, har det inte gjorts med målet att finjustera en språkmodell till ett visst yrke. Resultatet visade inte på någon tydlig skillnad mellan den förtränade och den finjusterade modellen, därför krävs fler tester krävs. Men med de begräsningar som fanns gällande hårdvara, tid att träna och tillgänglig data är det mycket som kan förbättras i framtida studier. Det etiska ramverket applicerat på en digital psykologi assistent diskuteras och en lämplig introduktion till marknaden och ansvarsfördelning föreslås. Ethics Fine-tuning Large Language Models Machine learning Psychology Computer and Information Sciences Data- och informationsvetenskap
13	NLP-Assisted Workflow Improving Bug Ticket Handling Eriksson, Caroline, Kallis, Emilia January 2021 (has links) Software companies spend a lot of resources on debugging, a process where previous solutions can help in solving current problems. The bug tickets, containing this information, are often time-consuming to read. To minimize the time spent on debugging and to make sure that the knowledge from prior solutions is kept in the company, an evaluation was made to see if summaries could make this process more efficient. Abstractive and extractive summarization models were tested for this task and fine-tuning of the bert-extractive-summarizer was performed. The model-generated summaries were compared in terms of perceived quality, speed, similarity to each other, and summarization length. The average description summary contained part of the description needed and the found solution was either well documented or did not answer the problem at all. The fine-tuned extractive model and the abstractive model BART provided good conditions for generating summaries containing all the information needed. / Vid mjukvaruutveckling går mycket resurser åt till felsökning, en process där tidigare lösningar kan hjälpa till att lösa aktuella problem. Det är ofta tidskrävande att läsa felrapporterna som innehåller denna information. För att minimera tiden som läggs på felsökning och säkerställa att kunskap från tidigare lösningar bevaras inom företaget, utvärderades om sammanfattningar skulle kunna effektivisera detta. Abstrakta och extraherande sammanfattningsmodeller testades för uppgiften och en finjustering av bert-extractive- summarizer gjordes. De genererade sammanfattningarna jämfördes i avseende på upplevd kvalitet, genereringshastighet, likhet mellan varandra och sammanfattningslängd. Den genomsnittliga sammanfattningen innehöll delar av den viktigaste informationen och den föreslagna lösningen var antingen väldokumenterad eller besvarade inte problembeskrivningen alls. Den finjusterade BERT och den abstrakta modellen BART visade goda förutsättningar för att generera sammanfattningar innehållande all den viktigaste informationen. BERT Fine-tuning Machine Learning Natural Language Processing Pipeline Text Summarization Workflow Computer and Information Sciences Data- och informationsvetenskap
14	Identifying Sensitive Data using Named Entity Recognition with Large Language Models : A comparison of transformer models fine-tuned for Named Entity Recognition Ström Boman, Alfred January 2024 (has links) Utvecklingen av artificiell intelligens och språkmodeller har ökat drastiskt under de senaste åren vilket medfört både möjligheter såväl som risker. Med en större användning av AI-relaterade produkter och människolika chattbotar har det medfört ett intresse av att kontrollera vilken sorts data som delas med dessa verktyg. Under särskilda omständigheter kan det förekomma data som till exempel information relaterat till personer, som inte får delas. Detta projekt har av denna anledning kretsat kring att använda och jämföra olika system för automatisk namnigenkänning, med målet att förhindra sådan data från att bli delad. I projektet jämfördes tre stycken olika alternativ för att implementera system för namnigenkänning, innan det mest lämpliga alternativet valdes för implementationen. Fortsättningsvis användes de tre förtränade transformer-modellerna GPT-SW3, TinyLlama och Mistral för implementationen där dessa tre blev finjusterade på två olika dataset. Implementationsfasen involverade applicering av tekniker för att öka datastorleken, databearbetning samt modellkvantisering innan de finjusterades för namnigenkänning. En uppsättning av utvärderingsmått bestående av bland annat F1-mått användes därefter för att mäta de tränade modellernas prestanda. De tre modellerna utvärderades och jämfördes med varandra utifrån resultatet från mätningen och träningen. Modellerna uppvisade varierande resultat och prestanda där både över- och underanpassning förekom. Avslutningsvis drogs slutsatsen om att TinyLlama var den bäst presterande modellen utifrån resultatet och övriga kringliggande aspekter. / The development of artificial intelligence and large language models has increased rapidly in recent years, bringing both opportunities and risks. With a broader use of AI related products such as human-like chatbots there has been an increase in interest in controlling the data that is being shared with them. In some scenarios there is data, such as personal or proprietary information, which should not be shared. This project has therefore revolved around utilizing and comparing different Named Entity Recognition systems to prevent such data from being shared. Three different approaches to implement Named Entity Recognition systems were compared before selecting the most appropriate one to further use for the actual implementation. Furthermore, three pre-trained transformer models, GPT-SW3, TinyLlama and Mistral, were used for the implementation where these were fine-tuned on two different datasets. The implementation phase included applying data augmentation techniques, data processing and model quantization before fine-tuning the models on Named Entity Recognition. A set of metrics including precision, recall and F1-score was further used to measure the performances of the trained models. The three models were compared and evaluated against each other based on the results obtained from the measurements and the training. The models showed varying results and performances where both overfitting and underfitting occured. Finally, the TinyLlama model was concluded to be the best model based on the obtained results and other considered aspects. Named Entity Recognition Natural Language Processing Machine Learning Fine-tuning. Namnigenkänning Språkteknologi Maskininlärning Finjustering Software Engineering Programvaruteknik
15	A PLL Design Based on a Standing Wave Resonant Oscillator Karkala, Vinay 2010 August 1900 (has links) In this thesis, we present a new continuously variable high frequency standing wave oscillator and demonstrate its use in generating the phase locked clock signal of a digital IC. The ring based standing wave resonant oscillator is implemented with a plurality of wires connected in a mobius configuration, with a cross coupled inverter pair connected across the wires. The oscillation frequency can be modulated by coarse and fine tuning. Coarse modification is achieved by altering the number of wires in the ring that participate in the oscillation, by driving a digital word to a set of passgates which are connected to each wire in the ring. Fine tuning of the oscillation frequency is achieved by varying the body bias voltage of both the PMOS transistors in the cross coupled inverter pair which sustains the oscillations in the resonant ring. We validated our PLL design in a 90nm process technology. 3D parasitic RLCs for our oscillator ring were extracted with skin effect accounted for. Our PLL provides a frequency locking range from 6 GHz to 9 GHz, with a center frequency of 7.5 GHz. The oscillator alone consumes about 25 mW of power, and the complete PLL consumes a power of 28.5 mW. The observed jitter of the PLL is 2.56 percent. These numbers are significant improvements over the prior art in standing wave based PLLs. Voltage Controlled Oscillator VCO Phase Locked Loop PLL Standing Wave Resonant Oscillator Traveling Wave Resonant Oscillator Fine Tuning Coarse Tuning Transmission line Parasitic Extraction Locking Range Clock distribution
16	Deep Learning for Autonomous Collision Avoidance Strömgren, Oliver January 2018 (has links) Deep learning has been rapidly growing in recent years obtaining excellent results for many computer vision applications, such as image classification and object detection. One aspect for the increased popularity of deep learning is that it mitigates the need for hand-crafted features. This thesis work investigates deep learning as a methodology to solve the problem of autonomous collision avoidance for a small robotic car. To accomplish this, transfer learning is used with the VGG16 deep network pre-trained on ImageNet dataset. A dataset has been collected and then used to fine-tune and validate the network offline. The deep network has been used with the robotic car in a real-time manner. The robotic car sends images to an external computer, which is used for running the network. The predictions from the network is sent back to the robotic car which takes actions based on those predictions. The results show that deep learning has great potential in solving the collision avoidance problem. deep learning collision avoidance convolutional neural networks transfer learning fine-tuning Computer Sciences Datavetenskap (datalogi)
17	Context matters : Classifying Swedish texts using BERT's deep bidirectional word embeddings Holmer, Daniel January 2020 (has links) When classifying texts using a linear classifier, the texts are commonly represented as feature vectors. Previous methods to represent features as vectors have been unable to capture the context of individual words in the texts, in theory leading to a poor representation of natural language. Bidirectional Encoder Representations from Transformers (BERT), uses a multi-headed self-attention mechanism to create deep bidirectional feature representations, able to model the whole context of all words in a sequence. A BERT model uses a transfer learning approach, where it is pre-trained on a large amount of data and can be further fine-tuned for several down-stream tasks. This thesis uses one multilingual, and two dedicated Swedish BERT models, for the task of classifying Swedish texts as of either easy-to-read or standard complexity in their respective domains. The performance on the text classification task using the different models is then compared both with feature representation methods used in earlier studies, as well as with the other BERT models. The results show that all models performed better on the classification task than the previous methods of feature representation. Furthermore, the dedicated Swedish models show better performance than the multilingual model, with the Swedish model pre-trained on more diverse data outperforming the other. NLP text classification BERT feature representation pre-trained language models transformer networks fine-tuning
18	Evaluating Text Summarization Models on Resumes : Investigating the Quality of Generated Resume Summaries and their Suitability as Resume Introductions / Utvärdering av Textsammanfattningsmodeller för CV:n : Undersökning av Kvaliteten på Genererade CV-sammanfattningar och deras Lämplighet som CV-introduktioner Krohn, Amanda January 2023 (has links) This thesis aims to evaluate different abstractive text summarization models and techniques for summarizing resumes. It has two main objectives: investigate the models’ performance on resume summarization and assess the suitability of the generated summaries as resume introductions. Although automatic abstractive text summarization has gained traction in various areas, its application in the resume domain has not yet been explored. Resumes present a unique challenge for abstractive summarization due to their diverse style, content, and length. To address these challenges, three state-of-the-art pre-trained text generation models: BART, T5, and ProphetNet, were selected. Additionally, two approaches that can handle longer resumes were investigated. The first approach, named LongBART, modified the BART architecture by incorporating the Longformer’s self-attention into the encoder. The second approach, named HybridBART, used an extractive-then-abstractive summarization strategy. The models were fine-tuned on a dataset of 653 resume-introduction pairs and were evaluated using automatic metrics as well as two types of human evaluations: a survey and expert interviews. None of the models demonstrated superiority across all criteria and evaluation metrics. However, the survey responses indicated that LongBART showed promising results, receiving the highest scores in three out of five criteria. On the other hand, ProphetNet consistently received the lowest scores across all criteria in the survey, and across all automatic metrics. Expert interviews emphasized that the generated summaries cannot be considered correct summaries due to the presence of hallucinated personal attributes. However, there is potential for using the generated texts as resume introductions, given that measures are taken to ensure the hallucinated personal attributes are sufficiently generic. / Denna avhandling utvärderar olika modeller och tekniker för automatisk textsammanfattning för sammanfattning av CV:n. Avhandlingen har två mål: att undersöka modellernas prestanda på sammanfattning av CV:n och bedöma lämpligheten att använda de genererade sammanfattningar som CV-introduktioner. Även om automatisk abstrakt textsummering har fått fotfäste inom olika sammanhang är dess tillämpning inom CV-domänen ännu outforskad. CV:n utgör en unik utmaning för abstrakt textsammanfattning på grund av deras varierande stil, innehåll och längd. För att hantera dessa utmaningar valdes tre av de främsta förtränade modellerna inom textgenerering: BART, T5 och ProphetNet. Dessutom undersöktes två extra metoder som kan hantera längre CV:n. Det första tillvägagångssättet, kallat LongBART, modifierade BART-arkitekturen genom att inkludera självuppmärksamhet från Longformer-arkitekturen i kodaren. Det andra tillvägagångssättet, kallat HybridBART, använde en extraktiv-sen-abstraktiv sammanfattningsstrategi. Modellerna finjusterades med ett dataset med 653 CV-introduktionspar och utvärderades med hjälp av automatiska mått, samt två typer av mänsklig utvärdering: en enkätundersökning och intervjuer med experter. Ingen av modellerna visade överlägsenhet på alla kriterier och utvärderingsmått. Dock indikerade enkätsvaren att LongBART visade lovande resultat, genom att få högst poäng i tre av fem utvärderingskategorier. Å andra sidan fick ProphetNet lägst poäng i samtliga utvärderingskategorier, samt lägst poäng i alla automatiska mätningar. Expertintervjuer framhävde att de genererade sammanfattningarna inte kan anses vara pålitliga som fristående sammanfattningar på grund av förekomsten av hallucinerade personliga egenskaper. Trots detta finns det potential att använda dessa sammanfattningar som introduktioner, under förutsättningen att åtgärder vidtas för att säkerställa att hallucinerade personliga attribut är tillräckligt generiska. Natural language processing Abstractive text summarization Transformer architecture Fine-tuning Resumes Språkteknologi Abstrakt textsammanfattning Transformer-arkitektur Finjustering CV Computer Sciences Datavetenskap (datalogi)
19	Large Language Models as Advanced Data Preprocessors : Transforming Unstructured Text into Fine-Tuning Datasets Vangeli, Marius January 2024 (has links) The digital landscape increasingly generates vast amounts of unstructured textual data, valuable for analytics and various machine learning (ML) applications. These vast stores of data, often likened to digital gold, are often challenging to process and utilize. Traditional text processing methods, lacking the ability to generalize, typically struggle with unstructured and unlabeled data. For many complex data management workflows, the solution typically involves human intervention in the form of manual curation and labeling — a time-consuming process. Large Language Models (LLMs) are AI models trained on vast amounts of text data. They have remarkable Natural Language Processing (NLP) capabilities and offer a promising alternative. This thesis serves as an empirical case study of LLMs as advanced data preprocessing tools. It explores the effectiveness and limitations of using LLMs to automate and refine traditionally challenging data preprocessing tasks, highlighting a critical area of research in data management. An LLM-based preprocessing pipeline, designed to clean and prepare raw textual data for use in ML applications, is implemented and evaluated. This pipeline was applied to a corpus of unstructured text documents, extracted from PDFs, with the aim of transforming them into a fine-tuning dataset for LLMs. The efficacy of the LLM-based preprocessing pipeline was assessed by comparing the results against a manually curated benchmark dataset using two text similarity metrics: the Levenshtein distance and ROUGE score. The findings indicate that although LLMs are not yet capable of fully replacing human curation in complex data management workflows, they substantially improve the efficiency and manageability of preprocessing unstructured textual data. Large language models LLM Unstructured data fine-tuning preprocessing preparation data wrangling dataset curation Computer Sciences Datavetenskap (datalogi)
20	Optimisation du rendement de production de bioéthanol chez Saccharomyces cerevisiae par minimisation de la synthèse du glycérol : approche intégrée de génie métabolique et microbiologique / Improvement of Saccharomyces cerevisiae bioethanol yield through minimization of glycerol yield : microbiologic and Metabolic engineering integrative approach Pagliardini, Julien 09 July 2010 (has links) Ces travaux visaient à étudier la possibilité de réduire la production de glycérol chezSaccharomyces cerevisiae, afin d’améliorer le rendement éthanol, tout en préservant les capacités decroissance et de production des levures. La production minimale de glycérol nécessaire à la croissancea été déterminée à l'aide d'un modèle de calcul des flux métaboliques. Des souches présentant uneactivité des enzymes de la voie de production du glycérol modulée, afin de s'approcher au plus près del'activité minimale nécessaire estimée in silico, ont été utilisées.Cette stratégie d’ajustement de l’activité de la voie de synthèse du glycérol a permis, encondition aérobie, de réduire de 88 % le rendement glycérol et d'améliorer le rendement éthanol de4,7 % sans modifier la tolérance des mutants à l'éthanol, mais au détriment de la vitesse spécifique decroissance, légèrement réduite. En condition anaérobie, une diminution de 61 % du rendementglycérol et une amélioration de 7 % du rendement éthanol ont pu être obtenues, mais au détriment dela vitesse spécifique de croissance,qui subit une sévère diminution, et de la tolérance à l'éthanol,qui estréduite.L'analyse fine des résultats, grâce à un modèle métabolique, a permis de mettre en évidence,chez les souches mutantes, un besoin accru en énergie, interprété comme la traduction d'une plusgrande difficulté à gérer le stress du procédé et une réorganisation du métabolisme oxydo-réductif,interprétée comme l'impact de la réduction du glycérol sur les voies de réoxydation du cofacteurNADH dans les cellules.Ces résultats ont permis de valider la pertinence de la stratégie de réajustement des fluxmétaboliques, assistée par modélisation stoechiométrique pour l'amélioration des souches, mais aussid'accroître la compréhension du rôle physiologique du glycérol et son intégration au métabolismecellulaire. / This work aimed to assess the possibility of reducing Saccharomyces cerevisiae's glycerolproduction, in order to improve ethanol yield, without altering the abilities of yeasts to grow andproduce ethanol. Minimum glycerol production required for growth was found, thanks to a metabolicflux calculation model. Strains showing a fine tuned activity in the glycerol synthesis pathway enzymeswere used, to get close to the minimum activity established in silico.This fine tuning strategy lead, in aerobiosis, to a 88 % glycerol yield decrease together with a4.7 % ethanol yield increase, with no reduction of mutants'ethanol tolerance, but there is a slightdecrease of the growth rate. In anaerobiosis, a 61 % glycerol yield decrease, together with a 7 %ethanol yield increase were obtained, but mutant strains suffered of a sharp growth rate reduction anda decrease in their ethanol tolerance.A close analysis of the results, with the help of a metabolic model, highlighted both an increaseof mutants' energy requirements, interpreted as an increased difficulty to cope with osmotic stress,and a reorganisation of their oxydo-reductive metabolism, interpreted as glycerol reduction's impacton the NADH cofactor reoxydation pathway.These results validated the relevance of metabolic fine-tuning, assisted with in silicostoichiometric model for strains improvement and they increased the understanding of the integrationof glycerol in cell metabolism as well as its physiological role. Saccharomyces cerevisiae Acides Organiques Amélioration du rendement Glycérol Ethanol Ajustement Métabolique Modélisation Métabolique Ingénierie Métabolique Fed-batch Saccharomyces cerevisiae Fed-batch Organic Acids Yield Improvement Glycerol Ethanol Fine-Tuning Metabolic Modelling Metabolic Engineering

Search results