11 |
Information Extraction for Test Identification in Repair Reports in the Automotive Domain
Jie, Huang January 2023 (has links)
The knowledge of tests conducted on a problematic vehicle is essential for enhancing the efficiency of mechanics. Therefore, identifying the tests performed in each repair case is of utmost importance. This thesis explores techniques for extracting data from unstructured repair reports to identify component tests. The main emphasis is on developing a supervised multi-class classifier to categorize data and extract sentences that describe repair diagnoses and actions. It has been shown that incorporating a category-aware contrastive learning objective can improve the repair report classifier's performance. The proposed approach involves training a sentence representation model based on a pre-trained model using a category-aware contrastive learning objective. Subsequently, the sentence representation model is further trained on the classification task using a loss function that combines the cross-entropy and supervised contrastive learning losses. By applying this method, the macro F1-score on the test set increases from 90.45 to 90.73. The attempt to enhance the performance of the repair report classifier using a noisy data classifier proves unsuccessful. The noisy data classifier is trained using a prompt-based fine-tuning method, incorporating open-ended questions and two examples in the prompt. This approach achieves an F1-score of 91.09, and the resulting repair report classification datasets are found to be easier to classify; however, they do not lead to an improvement in the repair report classifier's performance. Ultimately, the repair report classifier is used to help create the input needed for identifying component tests, and an information retrieval method is used to conduct the test identification. Incorporating this classifier and the existing labels when creating queries improves the mean average precision at the top 3, 5, and 10 positions by 0.62, 0.81, and 0.35, respectively, although with a slight decrease of 0.14 at the top 1 position.
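To make the combined objective concrete, the sketch below shows one plausible way to mix a cross-entropy term with a supervised contrastive term over a batch of sentence embeddings. It is an illustrative PyTorch implementation under assumed settings (temperature 0.1, equal weighting of the two terms); the thesis does not publish its exact formulation or hyperparameters.

```python
# Illustrative sketch, not the thesis's code: cross-entropy combined with a
# supervised contrastive loss over a batch of sentence embeddings.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pulls same-label embeddings together and pushes different-label ones apart."""
    z = F.normalize(embeddings, dim=1)                      # (N, d) unit vectors
    sim = z @ z.T / temperature                             # pairwise similarities
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # ignore self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(pos_counts).mean()

def combined_loss(logits, embeddings, labels, alpha=0.5):
    """Weighted sum of cross-entropy and supervised contrastive losses (alpha assumed)."""
    return (1 - alpha) * F.cross_entropy(logits, labels) + \
           alpha * supervised_contrastive_loss(embeddings, labels)
```

In practice the embeddings would come from the sentence representation model and the logits from its classification head; the weighting between the two terms is a tunable design choice rather than a fixed value.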
|
12 |
[en] ASSESSMENT OF FINE-TUNING ON END-TO-END SPEECH RECOGNITION MODELS / [pt] AVALIAÇÃO DE AJUSTE FINO EM MODELOS DE PONTA A PONTA PARA RECONHECIMENTO DE FALA
Jonatas dos Santos Grosman 04 November 2022 (has links)
[en] Using representations given by a large pre-trained model has become the primary strategy to reach the state of the art in the most varied tasks. A recently proposed large pre-trained model, wav2vec 2.0, was seminal for several other works on pre-training large models on speech data. Many models are being pre-trained using the same transformer-based architecture as wav2vec 2.0 and are achieving state-of-the-art results in various speech-related tasks. However, few works have proposed further analysis of these models in different fine-tuning scenarios. Our work investigates these models concerning two different aspects. The first is the cross-lingual transferability of these models. Our experiments showed us that the size of the data used during the pre-training of these models is not as crucial to transferability as its diversity. We noticed that the performance of Indo-European languages is superior to that of non-Indo-European languages in the evaluated models. We have seen a positive cross-lingual transfer of knowledge using monolingual models, which was noticed in all the languages we used but was more evident when the language used during pre-training was more similar to the downstream task language. The second aspect we investigated in our work is how well these models perform in data imbalance scenarios, where there is a more representative subset in the fine-tuning dataset. Our results showed that data imbalance in fine-tuning generally affects the final result of the models, with better performance in the most representative subsets. However, greater variability in the training set favors model performance for a more representative subset. Nevertheless, this greater variability in the data did not favor languages not seen during training. We also observed that the models seem more robust in dealing with gender imbalance than with age or accent imbalance. With these findings, we hope to help the scientific community in the use of existing pre-trained models, as well as assist in the pre-training of new models.
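As a rough illustration of the fine-tuning setup discussed above, the sketch below runs one CTC fine-tuning step on a pre-trained wav2vec 2.0 checkpoint with Hugging Face Transformers. The checkpoint name, dummy audio, and transcription are placeholder assumptions; the thesis's actual datasets, languages, and hyperparameters are not reproduced here.

```python
# Hedged sketch: one CTC fine-tuning step for a pre-trained wav2vec 2.0 model.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

checkpoint = "facebook/wav2vec2-base-960h"       # assumed example checkpoint
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint)
model.freeze_feature_encoder()                   # keep the convolutional front-end fixed

audio = torch.randn(16000).numpy()               # placeholder: 1 s of 16 kHz audio
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids

outputs = model(input_values=inputs.input_values, labels=labels)
outputs.loss.backward()                          # CTC loss; an optimizer step would follow
```

Cross-lingual transfer in this framing amounts to starting from a checkpoint pre-trained on one language (or a multilingual mix) and fine-tuning the encoder and CTC head on labelled speech from another language.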
|
13 |
Fine-tuning a LLM using Reinforcement Learning from Human Feedback for a Therapy Chatbot Application / Finjustering av en LLM med hjälp av förstärkande inlärning från mänsklig återkoppling (eng. RLHF) för en Psykolog-chatbot applikation
Bill, Desirée, Eriksson, Theodor January 2023 (has links)
The field of AI and machine learning has seen exponential growth in the last decade, and even more so in the past year with the considerable public interest in Large Language Models (LLMs) such as ChatGPT. LLMs can be used for several purposes, and one possible application is fine-tuning a model to perform a particular function in a specific field. The goal is therefore to fine-tune an LLM in the field of psychology using a new method called Reinforcement Learning from Human Feedback (RLHF) to determine whether it is a viable method in such cases. The theory behind LLMs and RLHF, as well as the ethical perspective on developing a psychological AI, is presented. Previous studies on both RLHF and AI in psychology are presented, showing that the goal is feasible. The method for both training and evaluating the model is then explained, which is done by comparing a pre-trained model with the fine-tuned one. The study is considered scientifically relevant because, although RLHF has been used to fine-tune LLMs before, it has not been done with the intent of specializing a model for a particular field. The results did not show any clear difference between the pre-trained and the fine-tuned model; therefore, more tests are required. However, with the limitations regarding hardware, training time, and available data, there is much room for improvement in future studies. An ethical framework applied to a digital psychology assistant is discussed, and a suitable introduction to the market and division of responsibilities is proposed.
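For readers unfamiliar with RLHF, the sketch below shows the reward-model objective that typically sits at the core of the method: given a human-preferred and a rejected response to the same prompt, the reward model is trained so the preferred one scores higher. This is a generic PyTorch illustration, not the thesis's implementation, and the numbers are invented.

```python
# Hedged sketch of the pairwise reward-model loss commonly used in RLHF.
import torch
import torch.nn.functional as F

def reward_pairwise_loss(reward_chosen, reward_rejected):
    """-log sigmoid(r_chosen - r_rejected), averaged over the batch."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Scalar rewards for a batch of four (chosen, rejected) response pairs.
r_chosen = torch.tensor([1.2, 0.3, 0.8, -0.1])
r_rejected = torch.tensor([0.4, 0.5, -0.2, -0.6])
print(reward_pairwise_loss(r_chosen, r_rejected).item())
```

The subsequent fine-tuning step then uses this reward signal, usually via a policy-optimization algorithm such as PPO, to nudge the language model toward responses the reward model scores highly.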
|
14 |
NLP-Assisted Workflow Improving Bug Ticket Handling
Eriksson, Caroline, Kallis, Emilia January 2021 (has links)
Software companies spend a lot of resources on debugging, a process where previous solutions can help in solving current problems. The bug tickets, which contain this information, are often time-consuming to read. To minimize the time spent on debugging and to make sure that the knowledge from prior solutions is kept in the company, an evaluation was made to see whether summaries could make this process more efficient. Abstractive and extractive summarization models were tested for this task, and fine-tuning of the bert-extractive-summarizer was performed. The model-generated summaries were compared in terms of perceived quality, speed, similarity to each other, and summary length. The average description summary contained part of the needed description, and the identified solution was either well documented or did not answer the problem at all. The fine-tuned extractive model and the abstractive model BART provided good conditions for generating summaries containing all the information needed.
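As an illustration of the abstractive side of this comparison, the sketch below summarizes a made-up bug ticket with a pre-trained BART checkpoint through the Hugging Face summarization pipeline. The checkpoint and the ticket text are assumptions for demonstration; the thesis evaluated fine-tuned models on real company tickets.

```python
# Hedged sketch: abstractive summarization of a bug ticket with a BART checkpoint.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

ticket = (
    "Users report that the export button intermittently fails on large projects. "
    "Logs show a timeout in the report service after 30 seconds. Increasing the "
    "service timeout and adding pagination to the export query resolved the issue."
)
result = summarizer(ticket, max_length=60, min_length=15, do_sample=False)
print(result[0]["summary_text"])
```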
|
15 |
Identifying Sensitive Data using Named Entity Recognition with Large Language Models : A comparison of transformer models fine-tuned for Named Entity Recognition
Ström Boman, Alfred January 2024 (has links)
The development of artificial intelligence and large language models has increased rapidly in recent years, bringing both opportunities and risks. With a broader use of AI-related products such as human-like chatbots, there has been an increase in interest in controlling the data that is shared with them. In some scenarios there is data, such as personal or proprietary information, that should not be shared. This project has therefore revolved around utilizing and comparing different Named Entity Recognition systems to prevent such data from being shared. Three different approaches to implementing Named Entity Recognition systems were compared before selecting the most appropriate one for the actual implementation. Furthermore, three pre-trained transformer models, GPT-SW3, TinyLlama and Mistral, were used for the implementation, where they were fine-tuned on two different datasets. The implementation phase included applying data augmentation techniques, data processing and model quantization before fine-tuning the models for Named Entity Recognition. A set of metrics including precision, recall and F1-score was then used to measure the performance of the trained models. The three models were compared and evaluated against each other based on the results obtained from the measurements and the training. The models showed varying results and performance, with both overfitting and underfitting occurring. Finally, the TinyLlama model was concluded to be the best model based on the obtained results and other considered aspects.
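To make the evaluation concrete, the sketch below computes entity-level precision, recall, and F1 from predicted and gold entity spans, which is the kind of measurement used to compare the fine-tuned models. The span data is invented for illustration, and the exact scoring scheme in the thesis may differ.

```python
# Hedged sketch: entity-level precision/recall/F1 over (start, end, label) spans.
def entity_prf(gold: set, pred: set) -> tuple:
    tp = len(gold & pred)                                   # exact span + label matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(0, 11, "PERSON"), (25, 36, "ORG")}
pred = {(0, 11, "PERSON"), (40, 44, "LOC")}
print(entity_prf(gold, pred))                               # (0.5, 0.5, 0.5)
```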
|
16 |
A PLL Design Based on a Standing Wave Resonant Oscillator
Karkala, Vinay August 2010 (has links)
In this thesis, we present a new continuously variable high-frequency standing wave oscillator and demonstrate its use in generating the phase-locked clock signal of a digital IC. The ring-based standing wave resonant oscillator is implemented with a plurality of wires connected in a Möbius configuration, with a cross-coupled inverter pair connected across the wires. The oscillation frequency can be modulated by coarse and fine tuning. Coarse modification is achieved by altering the number of wires in the ring that participate in the oscillation, by driving a digital word to a set of passgates which are connected to each wire in the ring. Fine tuning of the oscillation frequency is achieved by varying the body bias voltage of both PMOS transistors in the cross-coupled inverter pair which sustains the oscillations in the resonant ring. We validated our PLL design in a 90 nm process technology. 3D parasitic RLCs for our oscillator ring were extracted with skin effect accounted for. Our PLL provides a frequency locking range from 6 GHz to 9 GHz, with a center frequency of 7.5 GHz. The oscillator alone consumes about 25 mW of power, and the complete PLL consumes 28.5 mW. The observed jitter of the PLL is 2.56 percent. These numbers are significant improvements over the prior art in standing-wave-based PLLs.
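As a rough back-of-envelope companion to the abstract, the sketch below estimates the fundamental frequency of such a ring resonator from per-unit-length parasitics, assuming ideal lossless transmission-line relations and the usual half-wavelength condition for a Möbius-connected ring. All numbers are illustrative placeholders, not the extracted 3D RLC values from the thesis.

```python
# Hedged estimate of a standing-wave ring oscillator's fundamental frequency.
import math

L_per_m = 4.0e-7       # assumed inductance per metre (H/m)
C_per_m = 1.6e-10      # assumed capacitance per metre (F/m)
ring_length = 8.0e-3   # assumed total ring length (m)

v_phase = 1.0 / math.sqrt(L_per_m * C_per_m)   # lossless-line phase velocity (m/s)
f0 = v_phase / (2.0 * ring_length)             # half-wavelength resonance
print(f"phase velocity ~ {v_phase:.3e} m/s, f0 ~ {f0 / 1e9:.2f} GHz")
```

Roughly speaking, the coarse control changes the effective loading of the ring (and hence the phase velocity), while the body-bias fine tuning nudges the frequency around that operating point.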
|
17 |
Deep Learning for Autonomous Collision Avoidance
Strömgren, Oliver January 2018 (has links)
Deep learning has been growing rapidly in recent years, obtaining excellent results for many computer vision applications, such as image classification and object detection. One reason for the increased popularity of deep learning is that it mitigates the need for hand-crafted features. This thesis work investigates deep learning as a methodology to solve the problem of autonomous collision avoidance for a small robotic car. To accomplish this, transfer learning is used with the VGG16 deep network pre-trained on the ImageNet dataset. A dataset has been collected and then used to fine-tune and validate the network offline. The deep network has been used with the robotic car in real time: the robotic car sends images to an external computer, which runs the network, and the predictions from the network are sent back to the robotic car, which takes actions based on those predictions. The results show that deep learning has great potential in solving the collision avoidance problem.
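The sketch below illustrates the transfer-learning recipe described above: a VGG16 network pre-trained on ImageNet, its convolutional layers frozen, and a new classification head fine-tuned on the collected driving images. The number of output classes and the training details are assumptions, not the thesis's exact configuration.

```python
# Hedged sketch: fine-tuning an ImageNet-pre-trained VGG16 for collision avoidance.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

for param in model.features.parameters():       # freeze the convolutional backbone
    param.requires_grad = False

num_classes = 2                                 # assumed: e.g. "clear" vs. "obstacle"
model.classifier[6] = nn.Linear(4096, num_classes)

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)            # placeholder batch of camera frames
labels = torch.randint(0, num_classes, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```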
|
18 |
Context matters : Classifying Swedish texts using BERT's deep bidirectional word embeddings
Holmer, Daniel January 2020 (has links)
When classifying texts using a linear classifier, the texts are commonly represented as feature vectors. Previous methods of representing features as vectors have been unable to capture the context of individual words in the texts, in theory leading to a poor representation of natural language. Bidirectional Encoder Representations from Transformers (BERT) uses a multi-headed self-attention mechanism to create deep bidirectional feature representations, able to model the whole context of all words in a sequence. A BERT model uses a transfer learning approach, where it is pre-trained on a large amount of data and can be further fine-tuned for several downstream tasks. This thesis uses one multilingual and two dedicated Swedish BERT models for the task of classifying Swedish texts as being of either easy-to-read or standard complexity in their respective domains. The performance on the text classification task using the different models is then compared both with feature representation methods used in earlier studies and with the other BERT models. The results show that all models performed better on the classification task than the previous methods of feature representation. Furthermore, the dedicated Swedish models show better performance than the multilingual model, with the Swedish model pre-trained on more diverse data outperforming the other.
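A minimal sketch of this fine-tuning setup, assuming a publicly available Swedish BERT checkpoint and two invented example sentences, is given below; the actual corpora, labels, and training procedure in the thesis differ.

```python
# Hedged sketch: fine-tuning a Swedish BERT model for binary text classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "KB/bert-base-swedish-cased"        # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["Det här är en enkel mening.",          # invented easy-to-read example
         "Texten uppvisar en avsevärt högre grad av komplexitet."]
labels = torch.tensor([0, 1])                    # 0 = easy-to-read, 1 = standard

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss
loss.backward()                                  # an optimizer step would follow
```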
|
19 |
Evaluating Text Summarization Models on Resumes : Investigating the Quality of Generated Resume Summaries and their Suitability as Resume Introductions / Utvärdering av Textsammanfattningsmodeller för CV:n : Undersökning av Kvaliteten på Genererade CV-sammanfattningar och deras Lämplighet som CV-introduktioner
Krohn, Amanda January 2023 (has links)
This thesis aims to evaluate different abstractive text summarization models and techniques for summarizing resumes. It has two main objectives: to investigate the models' performance on resume summarization and to assess the suitability of the generated summaries as resume introductions. Although automatic abstractive text summarization has gained traction in various areas, its application in the resume domain has not yet been explored. Resumes present a unique challenge for abstractive summarization due to their diverse style, content, and length. To address these challenges, three state-of-the-art pre-trained text generation models were selected: BART, T5, and ProphetNet. Additionally, two approaches that can handle longer resumes were investigated. The first approach, named LongBART, modified the BART architecture by incorporating the Longformer's self-attention into the encoder. The second approach, named HybridBART, used an extractive-then-abstractive summarization strategy. The models were fine-tuned on a dataset of 653 resume-introduction pairs and were evaluated using automatic metrics as well as two types of human evaluation: a survey and expert interviews. None of the models demonstrated superiority across all criteria and evaluation metrics. However, the survey responses indicated that LongBART showed promising results, receiving the highest scores in three out of five criteria. On the other hand, ProphetNet consistently received the lowest scores across all criteria in the survey, and across all automatic metrics. Expert interviews emphasized that the generated summaries cannot be considered correct summaries due to the presence of hallucinated personal attributes. However, there is potential for using the generated texts as resume introductions, provided that measures are taken to ensure the hallucinated personal attributes are sufficiently generic.
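To illustrate the automatic part of the evaluation, the sketch below scores a generated introduction against a reference with ROUGE using the rouge-score package; the texts are invented and the package is only one possible implementation of the metric mentioned above.

```python
# Hedged sketch: ROUGE scoring of a generated resume introduction.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = ("Experienced software engineer specialising in backend development "
             "and cloud infrastructure.")
generated = ("Software engineer with experience in backend development and "
             "cloud systems.")

for name, score in scorer.score(reference, generated).items():
    print(f"{name}: P={score.precision:.2f} R={score.recall:.2f} F1={score.fmeasure:.2f}")
```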
|
20 |
Large Language Models as Advanced Data Preprocessors : Transforming Unstructured Text into Fine-Tuning Datasets
Vangeli, Marius January 2024 (has links)
The digital landscape increasingly generates vast amounts of unstructured textual data, valuable for analytics and various machine learning (ML) applications. These vast stores of data, often likened to digital gold, are challenging to process and utilize. Traditional text processing methods, lacking the ability to generalize, typically struggle with unstructured and unlabeled data. For many complex data management workflows, the solution typically involves human intervention in the form of manual curation and labeling, a time-consuming process. Large Language Models (LLMs) are AI models trained on vast amounts of text data. They have remarkable Natural Language Processing (NLP) capabilities and offer a promising alternative. This thesis serves as an empirical case study of LLMs as advanced data preprocessing tools. It explores the effectiveness and limitations of using LLMs to automate and refine traditionally challenging data preprocessing tasks, highlighting a critical area of research in data management. An LLM-based preprocessing pipeline, designed to clean and prepare raw textual data for use in ML applications, is implemented and evaluated. This pipeline was applied to a corpus of unstructured text documents extracted from PDFs, with the aim of transforming them into a fine-tuning dataset for LLMs. The efficacy of the LLM-based preprocessing pipeline was assessed by comparing the results against a manually curated benchmark dataset using two text similarity metrics: the Levenshtein distance and the ROUGE score. The findings indicate that although LLMs are not yet capable of fully replacing human curation in complex data management workflows, they substantially improve the efficiency and manageability of preprocessing unstructured textual data.
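As a pointer to how the comparison against the benchmark can work, the sketch below implements the Levenshtein (edit) distance with plain dynamic programming and applies it to two short invented strings; the thesis's evaluation texts and tooling are not reproduced here.

```python
# Hedged sketch: Levenshtein distance between a pipeline output and its benchmark.
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("preprocessed text", "pre-processed text"))  # 1
```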
|