21 |
Optimisation du rendement de production de bioéthanol chez Saccharomyces cerevisiae par minimisation de la synthèse du glycérol : approche intégrée de génie métabolique et microbiologique / Improvement of Saccharomyces cerevisiae bioethanol yield through minimization of glycerol yield: microbiologic and metabolic engineering integrative approach
Pagliardini, Julien 09 July 2010
This work assessed whether glycerol production in Saccharomyces cerevisiae can be reduced in order to improve ethanol yield without impairing the yeast's ability to grow and produce ethanol. The minimum glycerol production required for growth was determined with a metabolic flux calculation model. Strains with finely tuned activity of the glycerol synthesis pathway enzymes were then used to approach the minimum activity estimated in silico.
Under aerobic conditions, this fine-tuning strategy led to an 88% decrease in glycerol yield together with a 4.7% increase in ethanol yield, with no reduction in the mutants' ethanol tolerance but with a slight decrease in specific growth rate. Under anaerobic conditions, a 61% decrease in glycerol yield and a 7% increase in ethanol yield were obtained, but the mutant strains suffered a sharp reduction in specific growth rate and a decrease in ethanol tolerance.
Close analysis of the results with a metabolic model highlighted two effects in the mutant strains: an increase in energy requirements, interpreted as greater difficulty in coping with osmotic stress, and a reorganisation of oxido-reductive metabolism, interpreted as the impact of reduced glycerol synthesis on the NADH cofactor reoxidation pathways.
These results validated the relevance of metabolic fine-tuning assisted by an in silico stoichiometric model for strain improvement, and they increased the understanding of the physiological role of glycerol and its integration into cell metabolism.
|
22 |
Extractive Multi-document Summarization of News Articles
Grant, Harald January 2019
Publicly available data grows exponentially through web services and technological advancements. Multi-document summarization (MDS) can be used to make such large data streams comprehensible, and this research investigates that area. Multiple systems for extractive multi-document summarization are implemented using modern techniques, in the form of the pre-trained BERT language model for word embeddings and sentence classification. These are combined with well-proven techniques: the TextRank ranking algorithm, the Waterfall architecture and anti-redundancy filtering. The systems are evaluated on the DUC-2002, 2006 and 2007 datasets using the ROUGE metric. The results show that the BM25 sentence representation, implemented in the TextRank model with the Waterfall architecture and an anti-redundancy technique, outperforms the other implementations and is competitive with other state-of-the-art systems. A cohesive model is derived from the leading system and tested in a user study built on a real-time news detection application, with users from the news domain. The study shows a clear preference for cohesive summaries in extractive multi-document summarization: the cohesive summary is preferred in the majority of cases.
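To make the combination of graph-based ranking and anti-redundancy filtering concrete, here is a minimal sketch in Python. It is not the thesis's system: it assumes plain word overlap as the sentence similarity (the abstract uses a BM25 representation instead), a Jaccard threshold as the redundancy filter, and hypothetical function names throughout.

```python
import math

PUNCT = ".,;:!?\"'()"

def tokenize(sentence):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]

def overlap_similarity(a, b):
    """Shared-word count normalised by log sentence lengths (classic TextRank)."""
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0

def textrank(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence similarity graph (power iteration)."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []
    weights = [[0.0 if i == j else overlap_similarity(tokens[i], tokens[j])
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] * scores[j] / out_sum[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def summarize(sentences, k=2, max_redundancy=0.5):
    """Take top-ranked sentences, skipping any too similar to one already chosen."""
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if all(jaccard(tokenize(sentences[i]), tokenize(sentences[j])) <= max_redundancy
               for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore document order
```

The redundancy check matters most in the multi-document setting, where several sources often report the same fact in nearly identical wording.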
|
23 |
Charakterizace chodců ve videu / Pedestrian Attribute Analysis
Studená, Zuzana January 2019
This work deals with obtaining information about pedestrians captured by static external cameras located in public outdoor or indoor spaces. The aim is to obtain as much information as possible. Attributes such as gender, age, type of clothing, accessories, fashion style, or overall personality are obtained using convolutional neural networks. One part of the work consists of creating a new dataset that captures pedestrians and includes information about each person's gender, age, and fashion style. Another part of the thesis is the design and implementation of convolutional neural networks that classify these pedestrian characteristics. The networks are evaluated on pedestrian images from the PETA, FashionStyle14 and BUT Pedestrian Attributes datasets. Experiments on the PETA and FashionStyle14 datasets compare the results with those of various convolutional neural networks described in the literature. Further experiments are reported on the newly created BUT Pedestrian Attributes dataset.
|
24 |
Dialogue systems based on pre-trained language models
Zeng, Yan 07 1900
Pre-trained language models (LMs) have been shown to be effective in many NLP tasks. They can capture general language regularities from a large amount of text, which are useful for most applications related to natural language. In this thesis, we study the problem of dialogue, i.e. generating a response to a user's utterance. We exploit pre-trained language models to deal with different aspects of dialogue systems.
First, pre-trained language models have been trained and used in dialogue systems in different ways, and it is unclear which way is the most appropriate. For task-oriented dialogue systems, the state-of-the-art framework for Dialogue State Tracking (DST) uses BERT as the encoder and stacks a recurrent neural network (RNN) on the BERT outputs as the decoder. In this setup, only the encoder benefits from the pre-trained language model. In the first part of the thesis, we propose a method that uses a single BERT model for both the encoder and the decoder, allowing for more effective parameter updating. Our method achieves new state-of-the-art performance.
For the task of response generation in generative chatbot systems, we further compare the 4 commonly used frameworks based on pre-trained LMs, which use different training objectives and attention mechanisms. Through extensive experiments, we observe the impact of two types of discrepancy that have been largely ignored in the literature: the pretrain-finetune discrepancy and the finetune-generation discrepancy (i.e. differences between pre-training and fine-tuning, and between fine-tuning and generation). We show that the impact of these discrepancies surfaces when only a limited amount of training data is available. To alleviate the problem, we propose two methods that reduce the discrepancies and yield improved performance.
Second, even though methods based on pre-training have shown excellent performance in general dialogue generation, we are increasingly faced with the problem of conditioned conversation, i.e. conversation in relation to some condition (persona, topic, etc.). Researchers are also interested in multi-skill chatbot systems, namely chatbots equipped to handle different conditioned generation tasks. Therefore, in the second part of the thesis, we investigate the problem of conditioned dialogue generation. First, we propose a general method that leverages not only conditioned dialogue data but also conditioned non-dialogue text data, which are much easier to collect, in order to alleviate the data scarcity issue of conditioned dialogue generation. Second, we propose methods built on the recently proposed concept of the Adapter, which extends a general dialogue system with a specific dialogue skill, and we investigate ways to learn such a skill. We show that an Adapter has enough capacity to model a dialogue skill for either loosely conditioned or strictly conditioned response generation, while using only 6% more parameters.
This thesis contains 4 pieces of work relating to two general problems in dialogue systems: the inherent architecture of dialogue systems based on pre-trained LMs, and the enhancement of a general dialogue system with specific skills. The studies not only propose new approaches that outperform the current state of the art, but also stress the importance of carefully designing the model architecture to fit the task, instead of simply increasing the amount of training data and the raw computation power.
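The abstract does not describe the Adapter layout, so the following Python sketch is only an illustration of the general idea, assuming the common bottleneck design: a small down-projection, a ReLU, an up-projection, and a residual connection around the frozen base layer. All names and sizes here are hypothetical.

```python
import random

def make_adapter(hidden, bottleneck, seed=0):
    """Create adapter weights; the up-projection starts at zero so the
    adapter initially leaves the base model's activations unchanged."""
    rng = random.Random(seed)
    return {
        "down": [[rng.gauss(0.0, 0.02) for _ in range(hidden)]
                 for _ in range(bottleneck)],
        "down_bias": [0.0] * bottleneck,
        "up": [[0.0] * bottleneck for _ in range(hidden)],
        "up_bias": [0.0] * hidden,
    }

def adapter_forward(adapter, h):
    """Return h + up(relu(down(h))) for one hidden-state vector h."""
    z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
         for row, b in zip(adapter["down"], adapter["down_bias"])]
    delta = [sum(w * v for w, v in zip(row, z)) + b
             for row, b in zip(adapter["up"], adapter["up_bias"])]
    return [x + d for x, d in zip(h, delta)]

def adapter_param_count(hidden, bottleneck):
    """Two weight matrices plus two bias vectors."""
    return 2 * hidden * bottleneck + hidden + bottleneck
```

Because only these few weights are trained while the base model stays frozen, the added parameter budget is controlled by the bottleneck width, which is how a figure such as a few percent of extra parameters can arise.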
|
25 |
Nivåbedömning i oktavband: Är det rimligt vid hörapparatanpassning? / Level evaluation in octave bands: Is it reasonable when fitting hearing aids?
Stolt, Petter, Wahlsten, Markus January 2023
Background: Fine-tuning of hearing aid amplification is done to validate the amplification. The patient's ability to describe the sound quality forms the basis for the adjustments made. Aim: To evaluate a practice-oriented method for fine-tuning hearing aids. Methods: The participants (N = 18) listened to and evaluated the sound quality of speech material with randomized level modifications in the 4 kHz octave band. The experimenter adjusted the sound quality according to the participants' level evaluations until the participants perceived the sound quality as natural. Halfway through the examination the participants, as an intervention, listened to a briefing that explained and compared the different sound qualities. Results: The participants' level evaluations led to adjustments in the octave band that were statistically significant, but a normalization of the octave band was not achieved. After the briefing, a larger number of level modifications were corrected, with a statistically significant difference. Level modifications that could be categorized as metallic/sharp led to a statistically significant correction more often than level modifications categorized as unclear/dull. Conclusions: If hearing aids are fine-tuned, the audiologist should be aware that larger level adjustments in broader frequency bands may be needed for the patient to be able to notice a difference in sound quality in a clinical setting.
|
26 |
Fine-Tuning Pre-Trained Language Models for CEFR-Level and Keyword Conditioned Text Generation : A comparison between Google’s T5 and OpenAI’s GPT-2 / Finjustering av förtränade språkmodeller för CEFR-nivå och nyckelordsbetingad textgenerering : En jämförelse mellan Googles T5 och OpenAIs GPT-2
Roos, Quintus January 2022
This thesis investigates the possibilities of conditionally generating English sentences based on keywords that frame the content and on different difficulty levels of vocabulary. It aims to contribute to the field of Conditional Text Generation (CTG), a type of Natural Language Generation (NLG) in which text is created according to a set of conditions. These conditions include words, topics, content or perceived sentiments. Specifically, the thesis compares the performance of two well-known model architectures: Sequence-to-Sequence (Seq2Seq) and Autoregressive (AR). These are applied to two different tasks, both individually and combined. The Common European Framework of Reference (CEFR) is used to assess the vocabulary level of the texts. In the absence of openly available CEFR-labelled datasets, the author has developed a new methodology with the host company to generate suitable datasets. The generated texts are evaluated on the accuracy of their vocabulary levels and on readability using readily available formulas. The analysis combines four established readability metrics and assesses classification accuracy. Both models show a high degree of accuracy when classifying texts into different CEFR levels. However, the same models are weaker when generating sentences based on a desired CEFR level. This study contributes empirical evidence suggesting that: (1) Seq2Seq models have a higher accuracy than AR models in generating English sentences based on a desired CEFR level and keywords; (2) combining Multi-Task Learning (MTL) with instruction tuning is an effective way to fine-tune models on text-classification tasks; and (3) it is difficult to assess the quality of computer-generated language using only readability metrics.
|
27 |
Stora språkmodeller för bedömning av applikationsrecensioner : Implementering och undersökning av stora språkmodeller för att sammanfatta, extrahera och analysera nyckelinformation från användarrecensioner / Large Language Models for application review data : Implementation survey of Large Language Models (LLM) to summarize, extract, and analyze key information from user reviews
von Reybekiel, Algot, Wennström, Emil January 2024
Manually reviewing user reviews to extract relevant information can be a time-consuming process. This report investigates whether large language models can be used to summarize, extract, and analyze key information from reviews, and how such an application can be constructed. Different models were found to perform differently depending on the metric and on the weighting between recall and precision. Furthermore, fine-tuning language models such as Llama 3 improved performance in classifying useful reviews and, according to some metrics, led to higher performance than larger language models such as Chat-Bison. Specifically, for reviews translated into English, Llama 3:8b:Instruct, Chat-Bison, and the fine-tuned version of Llama 3:8b had F4 macro scores of 0.89, 0.90, and 0.91 respectively. A further finding is that the larger models (Chat-Bison, Text-Bison, and Gemini) performed better at generating summary texts than the smaller models that were tested when multiple reviews were input at a time. In general, the language models also performed better if reviews were first translated into English before processing than when they were processed in their original language, which for the majority of reviews was Swedish. Another insight from the pre-processing of reviews is that the number of API calls to these language models can be minimized by filtering on word length and rating. Beyond language models, the results showed that vector databases and embeddings can provide a broader overview of useful reviews through the databases' built-in ability to identify semantic similarities and cluster similar reviews together.
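The "weighting between recall and precision" behind an F4 score can be illustrated with the standard F-beta formula, F_β = (1 + β²)·P·R / (β²·P + R), averaged over classes for the macro variant. The sketch below assumes that this standard definition is what the report means by "F4 macro score"; the function names are illustrative.

```python
def fbeta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def macro_fbeta(y_true, y_pred, beta=4.0):
    """Unweighted mean of per-class F-beta scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(fbeta(precision, recall, beta))
    return sum(scores) / len(scores)
```

With β = 4, a classifier with precision 0.5 and recall 1.0 scores far higher than one with precision 1.0 and recall 0.5, which suits a setting where missing a useful review is costlier than reading an irrelevant one.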
|
28 |
Automatic text summarization of French judicial data with pre-trained language models, evaluated by content and factuality metrics
Adler, Malo January 2024
During an investigation carried out by a police officer or a gendarme, hearing reports are written, which can run to several pages. The high-level goal of this thesis is to study various automatic and reliable text summarization methods to help with this time-consuming task. One challenge comes from the specific French judicial data that we wish to summarize; another comes from the need for reliable and factual models.
First, this thesis focuses on automatic summarization evaluation, in terms of both content (how well the summary captures the essential information of the source text) and factuality (to what extent the summary only includes information that comes from, or is consistent with, the source text). Factuality evaluation in particular is of crucial interest when using LLMs for judicial purposes, because of their hallucination risks. Notably, we propose a light variation of SelfCheckGPT, which has a stronger correlation with human judgment (0.743) than the widespread BARTScore (0.542) on our study dataset. Other paradigms, such as Question-Answering, are also studied in this thesis, but they underperform compared to these.
Then, extractive summarization methods are explored and compared, including one based on graphs via the TextRank algorithm and one based on greedy optimization. The latter (overlap rate: 0.190, semantic similarity: 0.513) clearly outperforms the base TextRank (overlap rate: 0.172, semantic similarity: 0.506). An extension of TextRank with a threshold mechanism is also proposed, leading to a non-negligible improvement (overlap rate: 0.180, semantic similarity: 0.513).
Finally, abstractive summarization with pre-trained LLMs based on the Transformer architecture is studied. In particular, several general-purpose and multilingual models (Llama-2, Mistral and Mixtral) were objectively compared on a summarization dataset of judicial procedures from the French police. The results show that the performance of these models is strongly related to their size: Llama-2 7B struggles to adapt to uncommon data (overlap rate: 0.083, BARTScore: -3.099), while Llama-2 13B (overlap rate: 0.159, BARTScore: -2.718) and Llama-2 70B (overlap rate: 0.191, BARTScore: -2.479) proved quite versatile and efficient. To improve the performance of the smallest models, empirical prompt engineering and parameter-efficient fine-tuning are explored. Notably, our fine-tuned version of Mistral 7B reaches performance comparable to that of much larger models (overlap rate: 0.185, BARTScore: -2.060), without the need for empirical prompt engineering, and with a linguistic style closer to what is expected.
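The abstract does not state the objective used by the greedy extractive method, so the following Python sketch only illustrates the general pattern: repeatedly add the sentence that most improves a score. Here the score is assumed to be the number of distinct source words covered, and the function names are hypothetical.

```python
PUNCT = ".,;:!?\"'()"

def word_set(sentence):
    """Distinct lowercase words with surrounding punctuation stripped."""
    return {w.strip(PUNCT).lower() for w in sentence.split()} - {""}

def greedy_summary(sentences, k):
    """Pick up to k sentences, each time taking the one that adds the most
    not-yet-covered words; stop early if nothing new can be added."""
    covered = set()
    chosen = []
    remaining = list(range(len(sentences)))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda i: len(word_set(sentences[i]) - covered))
        gain = word_set(sentences[best]) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return [sentences[i] for i in sorted(chosen)]  # restore document order
```

Unlike a pure ranking approach, each greedy step takes the already selected sentences into account, so near-duplicate sentences contribute no gain and are skipped without a separate redundancy filter.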
|
29 |
Event-Cap – Event Ranking and Transformer-based Video Captioning / Event-Cap – Event rankning och transformerbaserad video captioning
Cederqvist, Gabriel, Gustafsson, Henrik January 2024
In the field of video surveillance, vast amounts of data are gathered each day. To identify what occurred during a recorded session, a human annotator has to go through the footage and annotate the different events. This is a tedious, expensive and time-consuming process. With the rise of machine learning, and deep learning in particular, both image and video captioning have seen large improvements. Contrastive Language-Image Pre-training can efficiently learn a multimodal space that merges the understanding of text and images. This enables visual features to be extracted and processed into text describing the visual content. This thesis presents a system for extracting and ranking important events from surveillance videos, as well as a way of automatically generating a description of each event. By using the pre-trained models X-CLIP and GPT-2 to extract visual information from the videos and process it into text, a video captioning model was created that requires very little training. Additionally, a ranking system was implemented to extract the important parts of a video, using anomaly detection as well as polynomial regression. Captions were evaluated using the metrics BLEU, METEOR, ROUGE and CIDEr, and the model receives scores comparable to other video captioning models. Captions were also evaluated by experts in the field of video surveillance, who rated them on accuracy, reaching up to 62.9%, and semantic quality, reaching 99.2%. The ranking system was likewise evaluated by the experts, who agreed with it 78% of the time.
|
30 |
Direct Preference Optimization for Improved Technical Writing Assistance : A Study of How Language Models Can Support the Writing of Technical Documentation at Saab / En studie i hur språkmodeller kan stödja skrivandet av teknisk dokumentation på Saab
Bengtsson, Hannes, Habbe, Patrik January 2024
This thesis explores the potential of Large Language Models (LLMs) to assist in the technical documentation process at Saab. With the increasing complexity of such documentation and the regulatory demands placed on it, the objective is to investigate advanced natural language processing techniques as a means of streamlining its creation. Although many standards exist, this thesis focuses on ASD-STE100, Simplified Technical English (STE), a controlled language for technical documentation. STE's primary aim is to ensure that technical documents are understandable to individuals regardless of their native language or English proficiency. The study focuses on the implementation of Direct Preference Optimization (DPO) and Supervised Instruction Fine-Tuning (SIFT) to refine the capabilities of LLMs in producing clear and concise outputs that comply with STE. Through a series of experiments, we investigate the effectiveness of LLMs in interpreting and simplifying technical language, with particular emphasis on adherence to STE. The study uses a dataset consisting of target data paired with synthetic source data generated by an LLM. We compare several approaches: zero-shot prompting, supervised instruction fine-tuning, and direct preference optimization. We evaluate the models' output using established quantitative metrics for text simplification, and we substitute company-internal software for human evaluators when assessing adherence to company standards and STE. Our findings suggest that while LLMs can contribute significantly to the technical writing process, the choice of training method and the quality of the data play crucial roles in the model's performance. The study shows how LLMs can improve productivity and reduce manual work, discusses the remaining problems, and suggests ways to improve the automation of technical documentation in the future.
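For readers unfamiliar with Direct Preference Optimization, the per-example loss from the original DPO formulation can be written in a few lines of Python. This is a sketch of the published objective, not of the thesis's training code; the argument names are illustrative, and the inputs are assumed to be summed token log-probabilities of the preferred ("chosen") and dispreferred ("rejected") response under the trained policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log(sigmoid(beta * margin)), where the margin measures how much more
    the policy favours the chosen response than the reference model does."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy equals the reference model the margin is zero and the loss is log 2; it falls as the policy shifts probability toward the preferred response, for example toward an STE-compliant rewrite over a non-compliant one.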
|