311 |
Active Learning for Extractive Question Answering. Marti Roman, Salvador. January 2022
Data labelling for question answering (QA) tasks is a costly procedure that requires oracles to read lengthy excerpts of text and reason to extract an answer to a given question from within the text. QA is a task in natural language processing (NLP), where a majority of recent advancements have come from leveraging the vast corpora of unlabelled and unstructured text available online. This work aims to extend this trend in the efficient use of unlabelled text data to the problem of selecting which subset of samples to label in order to maximize performance. This practice of selective labelling is called active learning (AL). Recent developments in AL for NLP have introduced the use of self-supervised learning on large corpora of text in the labelling process of samples for classification problems. This work adapts this research to the task of question answering and performs an initial exploration of expected performance. The methods covered in this work use uncertainty estimates obtained from neural networks to guide an incremental labelling process. These estimates are obtained from transformer-based models, previously trained in a self-supervised manner, by calculating the entropy of the confidence scores or with an approximation of Bayesian uncertainty obtained through Monte Carlo dropout. These methods are evaluated on two benchmark QA datasets: SQuAD v1 and TriviaQA. Several factors are observed to influence the behaviour of these uncertainty-based acquisition functions, including the choice of language model, the presence of unanswerable questions and the acquisition size used in the incremental process. The study produces no evidence that averaging or selecting the maximal uncertainty value between the classification of an answer's starting and ending positions affects sample acquisition quality. However, language model choice, the presence of unanswerable questions and acquisition size are all identified as key factors affecting consistency between runs and degree of success.
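As a rough illustration of the acquisition functions described in this abstract, the sketch below scores an unlabelled question-context pair by the entropy of the start- and end-position confidence scores, with an optional Monte Carlo dropout variant that averages entropy over stochastic forward passes. It is a minimal sketch, assuming a Hugging Face-style extractive QA model; the model name, the averaging of start and end entropies, and the number of dropout samples are illustrative assumptions, not the thesis's exact configuration.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Illustrative model choice; the thesis compares several pre-trained transformers.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")

def entropy(logits):
    # Entropy of the softmax confidence scores over token positions.
    probs = torch.softmax(logits, dim=-1)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=-1)

def uncertainty(question, context, mc_dropout=False, n_samples=10):
    inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
    if not mc_dropout:
        model.eval()
        with torch.no_grad():
            out = model(**inputs)
        # Average the start- and end-position entropies (one of several pooling choices).
        return 0.5 * (entropy(out.start_logits) + entropy(out.end_logits)).item()
    # Monte Carlo dropout: keep dropout active and average entropy over stochastic passes.
    model.train()
    scores = []
    with torch.no_grad():
        for _ in range(n_samples):
            out = model(**inputs)
            scores.append(0.5 * (entropy(out.start_logits) + entropy(out.end_logits)).item())
    return sum(scores) / len(scores)

# In an active learning loop, unlabelled samples with the highest scores are sent for labelling.
```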
|
312 |
Persons with functional difficulties as resources in ICT design processes. Persson, Hans. January 2008
Denna avhandling har sin grund i mina erfarenheter av att arbeta med människor som har funktionsnedsättningar. Vanligtvis är denna grupp den sista en producent ser som sina kunder. Det är ganska vanligt att producenter gör olika produkter (produkter och tjänster) för personer med funktionsnedsättningar och en för andra. Om man istället, i designarbetet, utgår från synsättet att de flesta personer vid någon tidpunkt och/eller plats har funktionssvårigheter så blir den potentiella kundgruppen större för produkten. Ursprunget för avhandlingen är ett projekt, vilket drevs av PTS (Post och Telestyrelsen), med syfte att identifiera vilka typer av stöd eller anpassningar personer med intellektuella funktionsnedsättningar behöver för att använda bredbandsbaserade tjänster. Resultatet i projektet pekade ut ett antal svårighetsområden där flertalet av dessa svårighetsområden inte var unika för denna grupp. Utifrån resultat i ovanstående projekt togs det fram en test-, utvärderings- och designmodell (TED-modellen) där ett av stegen använde en ”indikatorgrupp”. Syftet med modellen är att identifiera och ge underlag för att prioritera vilka svårighetsområden det fortsatta designarbetet skall fokuseras på. Indikatorgruppen består av individer med funktionssvårigheter som är relevanta i sammanhanget. Modellen tar vara på möjligheterna i ”design för alla” för att göra bättre produkter för människorna. De empiriska studierna i denna uppsats är gjorda inom två områden. Den första är i ett designsammanhang, där fem olika hemsidor skulle tas fram, och den andra är en studie av tre olika affärsarbetsplatser, där kassafunktionen var i fokus för studien. Resultatet i denna uppsats pekar ut en möjlig inriktning för en designmetodologi, vars målsättning är att få fram bättre produkter för en större grupp. Utgångspunkten är att använda människors olikheter som en möjlighet och inte som ett problem. Individer med funktionella svårigheter är en resurs för att finna nya innovationer, vilket jag har benämnt ”the Lead of Need”. Med detta menar jag individer med funktionella svårigheter som har ett behov och en idé till en lösning, men inte har möjlighet att förverkliga denna. Om vi kan organisera en mötesplats för individer med ”the Lead of Need”, designers och utvecklare så har vi skapat ett ”Living lab” för nya innovationer. / This thesis has its roots in my experiences of working with people who have some form of disability. Usually this group is the last group producers consider as their customers. It is quite common that producers make one version of products (and services) for individuals with disabilities and another for everyone else. If one instead takes the position, in the design work, that most people have some functional difficulties at some point in time or in place, then the potential customer group becomes larger for the product in question. The origin of this thesis is a project run by the Swedish Post and Telecom Agency (PTS), aiming to identify what kind of support or adaptation people with intellectual disabilities need when using broadband-based services. The result of the project pointed out a number of areas of difficulty, most of which were not unique to this group. From the result of the PTS project, a test, evaluation and design model (TED model) was built, where one of the steps involved the use of an “indicator group”. The aim of this step is to identify and give a basis for prioritizing areas of difficulty that the continued design work should focus on.
The indicator group consists of individuals with functional difficulties relevant in a specified context. This method uses the possibilities of “design for all” as a facilitator to design better products for more people. The empirical studies in this thesis were carried out within two areas. The first study was carried out in a design project, where five different web sites were to be designed, and the second one dealt with three different business workplaces in which the cashier workstations were in focus. The results of this thesis point out a possible direction for a design methodology whose objective is to create better products for a larger group of people. The starting point is to use people's differences as a possibility for design, not a problem. Individuals with functional difficulties constitute a resource for finding new innovations, which I have termed “the Lead of Need”. By this I mean individuals with functional difficulties who have a need and an idea for a solution, but not the means to make it happen. If we can organise a meeting ground for individuals with “the Lead of Need”, designers, and developers, we will have created a “living lab” for new innovations. / QC 20101119
|
313 |
Multimodal Multi-label Classification with Small Foundation Models. Martin Björkdahl, Liv. January 2024
The use of electronic health records (EHR) from various sources such as text, images and time-series data to make predictions or diagnoses has been researched previously. Many previous methods have used separate models either for separate modalities or for distinct tasks. Recently, models trained to make medical predictions using multimodal input have emerged, as a unified approach would be beneficial for health practitioners. We present a single model that makes medical predictions for several tasks, using diverse input from different modalities. We demonstrate the effectiveness of using an autoencoder method to project EHR data from three different modalities – images, text and time-series data – into the small language model Gemma-2B. Six projector models are used together with the small language model to perform multi-label prediction for 12 different medical prediction tasks. Results show that a jointly trained model using asymmetric loss, a loss function that dynamically emphasises poorly predicted positives, performs well and predicts evenly across tasks.
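The asymmetric loss mentioned above down-weights easy negatives and focuses the gradient on poorly predicted positives. Below is a minimal PyTorch sketch of such a loss for multi-label prediction; the focusing parameters and the probability-shift term are common choices from the asymmetric-loss literature, not values reported in the thesis.

```python
import torch

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    # Multi-label asymmetric loss: one sigmoid per label.
    probs = torch.sigmoid(logits)
    # Probability shift: very easy negatives contribute nothing.
    probs_neg = (probs - clip).clamp(min=0)
    # Positive and negative log-likelihood terms.
    loss_pos = targets * torch.log(probs.clamp(min=1e-8))
    loss_neg = (1 - targets) * torch.log((1 - probs_neg).clamp(min=1e-8))
    # Asymmetric focusing: emphasise poorly predicted positives,
    # down-weight well-classified negatives.
    loss_pos = loss_pos * (1 - probs) ** gamma_pos
    loss_neg = loss_neg * probs_neg ** gamma_neg
    return -(loss_pos + loss_neg).mean()

# Example: 12 prediction tasks treated as independent binary labels.
logits = torch.randn(4, 12)
targets = torch.randint(0, 2, (4, 12)).float()
print(asymmetric_loss(logits, targets))
```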
|
314 |
Exploring source languages for Faroese in single-source and multi-source transfer learning using language-specific and multilingual language models. Fischer, Kristóf. January 2024
Cross-lingual transfer learning has been the driving force of low-resource natural language processing in recent years, relying on massively multilingual language models with hopes of solving the data scarcity issue for languages with a limited digital presence. However, this "one-size-fits-all" approach is not equally applicable to all low-resource languages, suggesting limitations of such models in cross-lingual transfer. Moreover, known similarities and phylogenetic relationships between source and target languages are often overlooked. In this work, the emphasis is placed on Faroese, a low-resource North Germanic language with several closely related resource-rich sibling languages. The cross-lingual transfer potential from these strong Scandinavian source candidates, as well as from additional genetically related, geographically proximate, and syntactically similar source languages, is studied in single-source and multi-source experiments, in terms of Faroese syntactic parsing and part-of-speech tagging. In addition, the effect of task-specific fine-tuning on monolingual, linguistically informed smaller multilingual, and massively multilingual pre-trained language models is explored. The results suggest Icelandic as a strong source candidate; however, this holds only when fine-tuning a monolingual model. With multilingual models, task-specific fine-tuning in Norwegian and Swedish seems even more beneficial. Although they do not surpass fully Scandinavian fine-tuning, models trained on genetically related and syntactically similar languages produce good results. Additionally, the findings indicate that multilingual models outperform models pre-trained on a single language, and that even better results can be achieved using a smaller, linguistically informed model, compared to a massively multilingual one.
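A sketch of the single-source transfer setup described above: fine-tune a multilingual encoder on part-of-speech tagging data from one source language and evaluate zero-shot on Faroese. The model name, the hypothetical load_ud_pos helper, and the hyperparameters are illustrative assumptions rather than the thesis's exact experimental setup.

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical helper: assumed to return a tokenised, label-aligned POS-tagging
# dataset for the named Universal Dependencies treebank.
from my_data_utils import load_ud_pos  # assumed local helper module

NUM_UPOS_TAGS = 17  # size of the Universal Dependencies UPOS tag set

model_name = "bert-base-multilingual-cased"  # one multilingual candidate among several
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=NUM_UPOS_TAGS)

# Single-source transfer: fine-tune on a related source language only.
source_train = load_ud_pos("icelandic")   # e.g. an Icelandic UD treebank
faroese_test = load_ud_pos("faroese")     # e.g. UD Faroese-OFT, used only for evaluation

args = TrainingArguments(output_dir="pos-transfer", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=source_train)
trainer.train()

# Zero-shot evaluation on the Faroese target treebank.
print(trainer.evaluate(eval_dataset=faroese_test))
```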
|
315 |
Technical Language Supervision for Intelligent Fault Diagnosis / Språkteknologi för intelligent diagnostik av maskinskador. Löwenmark, Karl. January 2023
Condition Monitoring (CM) is widely used in industry to meet sustainability, safety, and equipment efficiency requirements. Intelligent Fault Diagnosis (IFD) research focuses on automating CM data analysis tasks to detect and prevent machine faults and provide decision support. IFD enables trained analysts to focus their efforts on advanced tasks such as fault severity estimation and preventive maintenance optimization, instead of performing routine tasks. Industry datasets are rarely labelled, and IFD models are therefore typically trained on labelled data generated in laboratory environments with artificial or accelerated fault development. In the process industry, fault characteristics are often context-dependent and difficult to predict in sufficient detail due to the heterogeneous environment of machine parts. Furthermore, fault development is non-linear and measurements are subject to varying background noise. Thus, IFD models trained on lab data are not expected to transfer well to process industry environments, and require on-site pre-training or fine-tuning to facilitate accurate and advanced fault diagnosis. While ground truth labels are absent in industrial CM datasets, analysts sometimes write annotations of faults and maintenance work orders that describe the fault characteristics and required actions. These annotations deviate from typical natural language due to the technical language used, characterised by a high frequency of technical terms and abbreviations. Recent advances in natural language processing have enabled simultaneous learning from unlabelled pairs of images and captions through Natural Language Supervision (NLS). In this thesis, opportunities to enable weakly supervised IFD using annotated but otherwise unlabelled CM data are investigated. This thesis proposes novel machine learning methods for joint representation learning for IFD directly on annotated CM data. The main contributions are: (1) the introduction and implementation of technical language supervision to merge advances in natural language processing and intelligent fault diagnosis, including a literature survey; (2) a method to improve technical language processing by substituting out-of-vocabulary technical words with natural language descriptions, and to evaluate language model performance without explicit labels or downstream tasks; (3) a method for small-data language-based fault classification using human-centric visualisation and clustering. Preliminary results for sensor and cable fault detection show an accuracy of over 90%. These results imply a considerable increase in the value of annotated CM datasets through the implementation of IFD models directly on industry data, e.g. for improving decision support to avoid unplanned stops. / KnowIT FAST
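Natural language supervision of the kind adapted in this thesis is typically implemented as contrastive alignment of paired embeddings, in the spirit of CLIP. The sketch below shows such a joint embedding objective for condition monitoring spectra paired with fault annotations; the linear projection encoders, feature dimensions, and temperature are illustrative assumptions, not the models used in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Contrastive (CLIP-style) alignment of CM signal and annotation embeddings."""
    def __init__(self, signal_dim=1024, text_dim=768, embed_dim=256):
        super().__init__()
        # Placeholder encoders: a real system would use e.g. a CNN over spectra
        # and a pre-trained technical-language model over annotations.
        self.signal_proj = nn.Linear(signal_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.log_temp = nn.Parameter(torch.tensor(2.0))

    def forward(self, signal_feats, text_feats):
        s = F.normalize(self.signal_proj(signal_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        logits = s @ t.T * self.log_temp.exp()
        labels = torch.arange(len(s))
        # Symmetric InfoNCE: each spectrum should match its own annotation and vice versa.
        return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))

# One training step on a batch of paired spectrum and annotation features.
model = JointEmbedding()
loss = model(torch.randn(8, 1024), torch.randn(8, 768))
loss.backward()
```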
|
316 |
Balancing Performance and Usage Cost: A Comparative Study of Language Models for Scientific Text Classification / Balansera prestanda och användningskostnader: En jämförande undersökning av språkmodeller för klassificering av vetenskapliga texter. Engel, Eva. January 2023
The emergence of large language models, such as BERT and GPT-3, has revolutionized natural language processing tasks. However, the development and deployment of these models pose challenges, including concerns about computational resources and environmental impact. This study aims to compare discriminative language models for text classification based on their performance and usage cost. We evaluate the models using a hierarchical multi-label text classification task and assess their performance primarily using the F1-score. Additionally, we analyze the usage cost by calculating the Floating Point Operations (FLOPs) required for inference. We compare a baseline model, which consists of a classifier chain with logistic regression models, with fine-tuned discriminative language models, including BERT with two different sequence lengths and DistilBERT, a distilled version of BERT. Results show that the DistilBERT model performs best, achieving an F1-score of 0.56 averaged over all classification layers. The baseline model and BERT with a maximal sequence length of 128 achieve F1-scores of 0.51. However, the baseline model outperforms the transformers at the most specific classification level with an F1-score of 0.33. Regarding usage cost, the baseline model requires significantly fewer FLOPs than the transformers. Furthermore, restricting BERT to a maximum sequence length of 128 tokens instead of 512 sacrifices some performance but offers substantial gains in usage cost. The code and dataset are available on GitHub. / Uppkomsten av stora språkmodeller, som BERT och GPT-3, har revolutionerat språkteknologin. Dock ger utvecklingen och implementeringen av dessa modeller upphov till utmaningar, bland annat gällande beräkningsresurser och miljöpåverkan. Denna studie syftar till att jämföra diskriminativa språkmodeller för textklassificering baserat på deras prestanda och användningskostnad. Vi utvärderar modellerna genom att använda en hierarkisk textklassificeringsuppgift och bedöma deras prestanda primärt genom F1-score. Dessutom analyserar vi användningskostnaden genom att beräkna antalet flyttalsoperationer (FLOPs) som krävs för inferens. Vi jämför en grundläggande modell, som består av en klassifikationskedja med logistisk regression, med finjusterade diskriminativa språkmodeller, inklusive BERT med två olika sekvenslängder och DistilBERT, en destillerad version av BERT. Resultaten visar att DistilBERT-modellen presterar bäst och uppnår en genomsnittlig F1-score på 0,56 över alla klassificeringsnivåer. Den grundläggande modellen och BERT med en maximal sekvenslängd på 128 uppnår en F1-score på 0,51. Dock överträffar den grundläggande modellen transformermodellerna på den mest specifika klassificeringsnivån med en F1-score på 0,33. När det gäller användningskostnaden kräver den grundläggande modellen betydligt färre FLOPs jämfört med transformermodellerna. Att begränsa BERT till en maximal sekvenslängd av 128 tokens ger vissa prestandaförluster men erbjuder betydande besparingar i användningskostnaden. Koden och datamängden är tillgängliga på GitHub.
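A minimal sketch of the baseline described above, a classifier chain of logistic regression models over TF-IDF features for multi-label classification. The toy corpus and vectoriser settings are illustrative assumptions; the thesis's hierarchical label structure and preprocessing are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain
from sklearn.metrics import f1_score

# Toy corpus; in the study the documents are scientific texts with hierarchical labels.
texts = ["neural networks for protein folding", "survey of graph databases",
         "transformer models for translation", "query optimisation in relational systems"]
labels = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0]]  # binary indicator per class

X = TfidfVectorizer(max_features=5000).fit_transform(texts)

# Classifier chain: each logistic regression also sees the previous labels' predictions.
chain = ClassifierChain(LogisticRegression(max_iter=1000))
chain.fit(X, labels)

pred = chain.predict(X)
print("micro F1:", f1_score(labels, pred, average="micro"))
```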
|
317 |
Robust Code Generation using Large Language Models: Guiding and Evaluating Large Language Models for Static Verification. Al-Mashahedi, Ahmad; Ljung, Oliver. January 2024
Background: Generative AI has achieved rapid and widespread acclaim over a short period since the inception of recent models that have opened up opportunities that were not possible before. Large Language Models (LLMs), a subset of generative AI, have become an essential part of code generation for software development. However, there is always a risk that the generated code does not fulfill the programmer's intent and contains faults or bugs that can go unnoticed. To that end, we propose verification of the generated code, which should increase its quality and the trust placed in it. Objectives: This thesis aims to study the generation of code that is both functionally correct and verifiable by implementing and evaluating four prompting approaches and a reinforcement learning solution that increases robustness in code generation using unit-test and verification rewards. Methods: We used a Rapid Literature Review (RLR) and Design Science methodology to get a solid overview of the current state of robust code generation. From the RLR and related works, we evaluated the following four prompting approaches: Base prompt, Documentation prompting, In-context learning, and Documentation + In-context learning on two datasets: MBPP and HumanEval. Moreover, we fine-tuned one model using Proximal Policy Optimization (PPO) for the novel task. Results: We measured the functional correctness and static verification success rates, amongst other metrics, for the four proposed approaches on eight model configurations, including the PPO fine-tuned LLM. Our results show that for the MBPP dataset, on average, In-context learning had the highest functional correctness at 29.4% pass@1, Documentation prompting had the highest verifiability at 8.48% verifiable@1, and finally, In-context learning had the highest rate of functionally correct and verifiable code at 3.2% pass@1 & verifiable@1. Moreover, the PPO fine-tuned model showed an overall increase in performance across all approaches compared to the pre-trained base model. Conclusions: We found that In-context learning on the PPO fine-tuned model yielded the best overall results across most metrics compared to the other approaches. The PPO fine-tuned model with In-context learning resulted in 32.0% pass@1, 12.8% verifiable@1, and 5.0% pass@1 & verifiable@1. Documentation prompting was better for verifiable@1 on MBPP; however, it did not perform as well for the other metrics. Documentation + In-context learning fell between Documentation prompting and In-context learning in performance, while the Base prompt performed the worst overall. For future work, we envision several improvements to PPO training, including but not limited to training on Nagini documentation and utilizing expert iteration to create supervised fine-tuning datasets to improve the model iteratively. / Bakgrund: Generativ AI har uppnått snabb och utbredd popularitet under en kort tid sedan lanseringen av språk- och bildmodeller som har öppnat upp nya möjligheter. Large Language Models (LLMs), en del av generativ AI, har blivit en viktig del inom mjukvaruutveckling för kodgenerering. Det finns dock alltid en risk att den genererade koden inte uppfyller programmerarens avsikt och innehåller fel eller buggar som kan förbli oupptäckta. För att motverka detta föreslår vi formell verifiering av den genererade koden, vilket bör öka dess kvalitet och därmed förtroendet för den.
Syfte: Detta examensarbetes syfte är att undersöka generering av kod som är både funktionellt korrekt och verifierbar genom att implementera och utvärdera fyra prompt-metoder samt en ny lösning genom reinforcement learning. Detta för att öka robustheten inom kodgenerering genom unit-test- och verifieringsbelöningar. Metoder: Vi använde Rapid Literature Review (RLR) och Design Science-metodik för att få en solid översikt över det nuvarande tillståndet för robust kodgenerering. Från RLR:en och relaterade arbeten utvärderade vi följande fyra prompt-metoder: Base prompt, Documentation prompting, In-context learning och Documentation + In-context learning. Dessutom fine-tune:ade vi en modell med Proximal Policy Optimization (PPO) för denna uppgift. Resultat: Vi mätte statistik över funktionell korrekthet och verifieringsframgång samt andra mätvärden för de fyra föreslagna prompt-metoderna på åtta modellkonfigurationer, inklusive den PPO fine-tune:ade LLM:en. Våra resultat visar på MBPP-datasetet att i genomsnitt hade In-context learning den högsta funktionella korrektheten vid 29,4% pass@1, Documentation prompting hade den högsta verifierbarheten vid 8,48% verifiable@1, och slutligen hade In-context learning den högsta andelen funktionellt korrekt och verifierbar kod vid 3.2% pass@1 & verifiable@1. Utöver detta visade den PPO fine-tune:ade modellen konsekventa förbättringar gentemot den förtränade basmodellen. Slutsatser: Vi fann att In-context learning med den fine-tune:ade PPO-modellen gav de bästa övergripande resultaten över de flesta mätvärden jämfört med de andra metoderna. Den PPO fine-tune:ade modellen med In-context learning resulterade i 32.0% pass@1, 12.8% verifiable@1, och 5.0% pass@1 & verifiable@1. Documentation prompting var bättre för verifiable@1, men den fungerade inte lika bra för de andra mätvärdena. Documentation + In-context learning hamnade mellan Documentation prompting och In-context learning prestationsmässigt. Base prompt presterade sämst av de utvärderade metoderna. För framtida arbete ser vi flera förbättringar av träningen av PPO-modellen. Dessa innefattar, men är inte begränsade till, träning med Nagini-dokumentation samt användning av expert iteration för att bygga ett dataset i syfte att iterativt förbättra modellen.
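The reinforcement learning solution described above rewards generated code both for passing unit tests and for passing static verification. The sketch below shows one plausible shape of such a reward signal; the weights, the ad hoc sandbox, and the verifier command line (a Nagini-style tool is assumed) are illustrative assumptions rather than the thesis's actual reward design.

```python
import subprocess
import tempfile

def _write_tmp(source: str) -> str:
    # Write candidate source to a temporary file and return its path.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        return f.name

def unit_test_reward(code: str, test_code: str) -> float:
    # Ad hoc sandbox: runs the candidate code together with its unit tests.
    # A real setup would isolate execution far more carefully.
    path = _write_tmp(code + "\n\n" + test_code)
    result = subprocess.run(["python", path], capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0

def verification_reward(code: str) -> float:
    # Placeholder static-verifier invocation (a Nagini-style tool is assumed);
    # the exact command line is an assumption, not the thesis setup.
    path = _write_tmp(code)
    result = subprocess.run(["nagini", path], capture_output=True, timeout=60)
    return 1.0 if result.returncode == 0 else 0.0

def reward(code: str, test_code: str, w_test: float = 0.7, w_verify: float = 0.3) -> float:
    # Scalar reward for PPO: weight functional correctness and verifiability.
    test_score = unit_test_reward(code, test_code)
    verify_score = verification_reward(code) if test_score > 0 else 0.0
    return w_test * test_score + w_verify * verify_score
```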
|
318 |
Automatic Text Classification of Research Grant Applications / Automatisk textklassificering av forskningsbidragsansökningar. Lindqvist, Robin. January 2024
This study aims to construct a state-of-the-art classifier model and compare it against a large language model. A variation of SVM called LinearSVC was utilised, and the BERT model used bert-base-uncased. The data, provided by the Swedish Research Council, consisted of research grant applications. The research grant applications were divided into two groups, which were further divided into several subgroups. The subgroups represented research fields such as computer science and applied physics. Significant class imbalances were present, with some classes having only a tenth of the applications of the largest class. To address these imbalances, a new dataset was created using data that had been randomly oversampled. The models were trained and tested on their ability to correctly assign a subgroup to a research grant application. Results indicate that the BERT model outperformed the SVM model on the original dataset, but not on the balanced dataset. Furthermore, the BERT model's performance decreased when transitioning from the original to the balanced dataset, possibly due to overfitting or randomness. / Denna studie har som mål att bygga en state-of-the-art klassificeringsmodell och sedan jämföra den mot en stor språkmodell. SVM-modellen var en variation av SVM vid namn LinearSVC och för BERT användes bert-base-uncased. Data erhölls från Vetenskapsrådet och bestod av forskningsbidragsansökningar. Forskningsbidragsansökningarna var uppdelade i två grupper, som var ytterligare uppdelade i ett flertal undergrupper. Dessa undergrupper representerar forskningsfält såsom datavetenskap och tillämpad fysik. I den data som användes i studien fanns stora skillnader mellan klasserna, där somliga klasser hade en tiondel av ansökningarna som de stora klasserna hade. I syfte att lösa dessa klassbalanseringsproblem skapades en datamängd som undergått slumpmässig översampling. Modellerna tränades och testades på deras förmåga att korrekt klassificera en forskningsbidragsansökan in i rätt undergrupp. Studiens fynd visade att BERT-modellen presterade bättre än SVM-modellen på den ursprungliga datamängden, dock inte på den balanserade datamängden. Tilläggas kan att BERTs prestanda sjönk vid övergången från den ursprungliga datamängden till den balanserade, något som antingen beror på överanpassning eller slump.
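A minimal sketch of the SVM side of the comparison: TF-IDF features, random oversampling to balance the subgroups, and a LinearSVC classifier. The toy texts, labels, and vectoriser settings are illustrative assumptions standing in for the grant-application data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from imblearn.over_sampling import RandomOverSampler

# Toy data standing in for grant-application texts and their subgroup labels.
texts = ["deep learning for vision", "quantum materials study",
         "language models for parsing", "superconductivity experiments"]
labels = ["computer science", "applied physics", "computer science", "applied physics"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Random oversampling duplicates minority-class samples until classes are balanced.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, labels)

clf = LinearSVC()
clf.fit(X_res, y_res)
print(clf.predict(vectorizer.transform(["transformer models for classification"])))
```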
|
319 |
Large Language Models as Advanced Data Preprocessors: Transforming Unstructured Text into Fine-Tuning Datasets. Vangeli, Marius. January 2024
The digital landscape increasingly generates vast amounts of unstructured textual data, valuable for analytics and various machine learning (ML) applications. These vast stores of data, often likened to digital gold, are challenging to process and utilize. Traditional text processing methods, lacking the ability to generalize, typically struggle with unstructured and unlabeled data. For many complex data management workflows, the solution typically involves human intervention in the form of manual curation and labeling, a time-consuming process. Large Language Models (LLMs) are AI models trained on vast amounts of text data. They have remarkable Natural Language Processing (NLP) capabilities and offer a promising alternative. This thesis serves as an empirical case study of LLMs as advanced data preprocessing tools. It explores the effectiveness and limitations of using LLMs to automate and refine traditionally challenging data preprocessing tasks, highlighting a critical area of research in data management. An LLM-based preprocessing pipeline, designed to clean and prepare raw textual data for use in ML applications, is implemented and evaluated. This pipeline was applied to a corpus of unstructured text documents, extracted from PDFs, with the aim of transforming them into a fine-tuning dataset for LLMs. The efficacy of the LLM-based preprocessing pipeline was assessed by comparing the results against a manually curated benchmark dataset using two text similarity metrics: the Levenshtein distance and the ROUGE score. The findings indicate that although LLMs are not yet capable of fully replacing human curation in complex data management workflows, they substantially improve the efficiency and manageability of preprocessing unstructured textual data.
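The evaluation step described above, comparing pipeline output against a manually curated benchmark, can be sketched with two standard text similarity measures. The snippet assumes the python-Levenshtein and rouge-score packages; the aggregation and the example strings are illustrative, not the thesis's exact evaluation code.

```python
import Levenshtein
from rouge_score import rouge_scorer

def compare(generated: list[str], reference: list[str]):
    # Pairwise comparison of LLM-preprocessed records against curated benchmark records.
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    lev_ratios, rouge_f1s = [], []
    for gen, ref in zip(generated, reference):
        # Normalised Levenshtein similarity in [0, 1].
        lev_ratios.append(Levenshtein.ratio(gen, ref))
        rouge_f1s.append(scorer.score(ref, gen)["rougeL"].fmeasure)
    return sum(lev_ratios) / len(lev_ratios), sum(rouge_f1s) / len(rouge_f1s)

lev, rouge_l = compare(["cleaned passage about batteries"],
                       ["a cleaned passage about batteries"])
print(f"mean Levenshtein ratio: {lev:.2f}, mean ROUGE-L F1: {rouge_l:.2f}")
```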
|
320 |
RAG-based data extraction: Mining information from second-life battery documents. Edström, Jesper. January 2024
With the constant evolution of Large Language Models (LLMs), methods for minimizing hallucinations are being developed to provide more truthful answers. By using Retrieval-Augmented Generation (RAG), external data can be provided to the model on which its answers should be based. This project aims at using RAG in a data extraction pipeline tailored to second-life batteries. Because the prompts are pre-defined, the user only needs to provide the documents to be analyzed; this ensures that the answers are in the correct format for further data processing. To handle different document types, each document is first labelled with its type before extraction suited to that type is applied. The best performance is achieved by grouping questions so that the model can reason about which questions are relevant, avoiding hallucinations. The model performs equally well regardless of whether there are two or three document types, and it is clear that a pipeline of this type is well suited to today's models. Further improvements can be achieved by using models with larger context windows and by first applying Optical Character Recognition (OCR) to read text from the documents.
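A minimal sketch of the pipeline shape described above: label the document type first, then run a pre-defined group of extraction questions against retrieved context so that answers come back in a fixed format. The embedding and LLM calls are left as placeholders, and the question groups and document types are illustrative assumptions; no specific LLM API is implied.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    embedding: list[float]

def embed(text: str) -> list[float]:
    # Placeholder embedding function; a real pipeline would call an embedding model.
    raise NotImplementedError

def llm(prompt: str) -> str:
    # Placeholder LLM call; a real pipeline would call a chat-completion endpoint.
    raise NotImplementedError

def retrieve(query: str, chunks: list[Chunk], k: int = 4) -> list[Chunk]:
    # Rank chunks by dot-product similarity to the query embedding.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: -sum(a * b for a, b in zip(q, c.embedding)))
    return scored[:k]

# Pre-defined, grouped extraction questions per document type (illustrative wording).
QUESTION_GROUPS = {
    "datasheet": ["What is the rated capacity in Ah?", "What is the nominal voltage?"],
    "test_report": ["What is the measured state of health?", "How many cycles were run?"],
}

def extract(document_chunks: list[Chunk]) -> dict:
    # Step 1: label the document type so the right question group is used.
    context = " ".join(c.text for c in document_chunks[:3])
    doc_type = llm(f"Classify this document as 'datasheet' or 'test_report':\n{context}").strip()
    # Step 2: answer the grouped questions using retrieved context only.
    answers = {}
    for question in QUESTION_GROUPS.get(doc_type, []):
        ctx = " ".join(c.text for c in retrieve(question, document_chunks))
        answers[question] = llm(
            f"Answer strictly from the context, or reply 'not stated'.\n"
            f"Context: {ctx}\nQuestion: {question}")
    return answers
```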
|