Global ETD Search

1	Zero and Few-Shot Concept Learning with Pre-Trained Embeddings Moody, Jamison M. 21 April 2023 Neural networks typically struggle with reasoning tasks on out-of-domain data, something that humans adapt to more easily. Humans come with prior knowledge of concepts and can segment their environment into building blocks (such as objects) that allow them to reason effectively in unfamiliar situations. Using this intuition, we train a network that uses fixed embeddings from the CLIP (Contrastive Language-Image Pre-training) model to perform a simple task that the original CLIP model struggles with. The network learns concepts (such as "collide" and "avoid") in a supervised source domain in such a way that it can adapt and identify similar concepts in a target domain with never-before-seen objects. Without any training in the target domain, we show an 11% accuracy improvement in recognizing concepts compared to the baseline zero-shot CLIP model. When provided with a few labels, this accuracy gap widens to 20%. deep learning backpropagation transformer models artificial intelligence Physical Sciences and Mathematics
2	Readability Assessment with Pre-Trained Transformer Models : An Investigation with Neural Linguistic Features Ma, Chuchu January 2022 Readability assessment (RA) assigns a score or grade to a given document that measures how difficult the document is to read. RA originated in language education studies and was used to classify reading materials for language learners. Later, RA was applied to many other applications, such as aiding automatic text simplification. This thesis aims to improve how Transformers are used for RA. The motivation is the “pipeline” effect (Tenney et al., 2019) of pretrained Transformers: lexical, syntactic, and semantic features are best encoded by different layers of a Transformer model. After a preliminary test of a basic RA model that resembles previous work, we proposed several methods to enhance performance: using a Transformer layer that is not the last, concatenating or mixing the outputs of all layers, and using syntax-augmented Transformer layers. We examined these enhanced methods on three datasets: WeeBit, OneStopEnglish, and CommonLit. We observed that the improvements showed a clear correlation with the dataset characteristics. On the OneStopEnglish and CommonLit datasets, we achieved absolute improvements of 1.2% in F1 score and 0.6% in Pearson’s correlation coefficient, respectively. We also show that an n-gram frequency-based baseline, which is simple but was not reported in previous work, has superior performance on the classification datasets (WeeBit and OneStopEnglish), prompting further research on vocabulary-based lexical features for RA. Readability Assessment Pre-trained Transformer Models Neural Linguistic Features
3	Structural Self-Supervised Objectives for Transformers Di Liello, Luca 21 September 2023 In this thesis, we leverage unsupervised raw data to develop more efficient pre-training objectives and self-supervised tasks that align well with downstream applications. In the first part, we present three alternative objectives to BERT’s Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM). Unlike MLM, all of these proposals involve token swapping rather than replacing tokens with BERT’s [MASK]. RTS and C-RTS involve predicting the originality of tokens, while SLM tasks the model with predicting the original token values. Each objective is applied to several models, which are trained using the same computational budget and corpora. Evaluation results reveal that RTS and C-RTS require up to 45% less pre-training time while achieving performance on par with MLM. Notably, SLM outperforms MLM on several Answer Sentence Selection and GLUE tasks, despite using the same computational budget for pre-training. In the second part of the thesis, we propose self-supervised pre-training tasks that exhibit structural alignment with downstream applications, leading to improved performance and reduced reliance on labeled data to achieve comparable results. We exploit the weak supervision provided by large corpora like Wikipedia and CC-News, challenging the model to recognize whether spans of text originate from the same paragraph or document. To this end, we design (i) a pre-training objective that targets multi-sentence inference models by performing predictions over multiple spans of text simultaneously, (ii) self-supervised objectives tailored to enhance performance in Answer Sentence Selection and its Contextual version, and (iii) a pre-training objective aimed at performance improvements in Summarization.
Through continuous pre-training, starting from renowned checkpoints such as RoBERTa, ELECTRA, DeBERTa, BART, and T5, we demonstrate that our models achieve higher performance on Fact Verification, Answer Sentence Selection, and Summarization. We extensively evaluate our proposals on different benchmarks, revealing significant accuracy gains, particularly when annotation in the target dataset is limited. Notably, we achieve state-of-the-art results on the development set of the FEVER dataset and, on the test set, results close to those of state-of-the-art models that use many more parameters. Furthermore, our objectives enable us to attain state-of-the-art results on the ASNQ, WikiQA, and TREC-QA test sets across all evaluation metrics (MAP, MRR, and P@1). For Summarization, our objective enhances summary quality, as measured by metrics such as ROUGE and BLEURT. We maintain that our proposals can be seamlessly combined with other techniques from recently proposed works, as they do not require alterations to the internal structure of Transformer models but only involve modifications to the training tasks. Settore INF/01 - Informatica
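The Random Token Substitution objective described in entry 3 can be illustrated with a small data-preparation sketch. This is only an assumption-laden illustration, not the thesis implementation: the function name `random_token_substitution`, the `swap_prob` parameter, the toy vocabulary, and uniform sampling of replacements are all hypothetical, since the abstract does not state the substitution rate or sampling scheme.

```python
import random


def random_token_substitution(tokens, vocab, swap_prob=0.15, seed=0):
    """Corrupt a token sequence by swapping in random vocabulary tokens.

    Returns (corrupted, labels), where labels[i] is 1 if position i was
    replaced by a different token and 0 if it still holds the original.
    swap_prob is an assumed value, not one taken from the thesis.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < swap_prob:
            # Sample a replacement that differs from the original token,
            # so a label of 1 always means "not original".
            candidates = [v for v in vocab if v != tok]
            corrupted.append(rng.choice(candidates))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


# Toy vocabulary and sentence, purely illustrative.
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
sentence = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, labels = random_token_substitution(sentence, vocab, swap_prob=0.5, seed=1)
```

A model trained on such pairs would classify each position as original or substituted. An SLM-style variant would instead use the original token at each swapped position as the prediction target.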
4	Identifying New Fault Types Using Transformer Embeddings Karlsson, Mikael January 2021 Continuous integration/delivery and deployment involve many automated tests, some of which may fail, leading to faulty software. Similar faults may occur in different stages of the software production lifecycle, and it is necessary to identify similar faults and cluster them into fault types in order to minimize troubleshooting time. Pretrained transformer-based language models have been shown to achieve state-of-the-art results in many natural language processing tasks, such as measuring semantic textual similarity. This thesis investigates whether it is possible to cluster and identify new fault types by using a transformer-based model to create context-aware vector representations of fault records, which consist of numerical data and logs with domain-specific technical terms. The resulting clusters were compared against the clusters created by an existing system, in which log files are grouped by manually specified filters. Relying on already existing fault types with associated log data, this thesis shows that it is possible to fine-tune a transformer-based model for a classification task in order to improve the quality of text embeddings. The embeddings are clustered using density-based and hierarchical clustering algorithms with cosine distance. The results show that it is possible to cluster log data and obtain results comparable to the existing manual system, with cluster similarity assessed using V-measure and Adjusted Rand Index. / Continuous integration consists of automated tests, some of which may fail, which can lead to faulty software. Similar faults can occur during different phases of a software product's lifecycle, and it is important to identify and group different fault types in order to optimize the troubleshooting process.
It has been shown that language models based on the transformer architecture can achieve strong results in many natural language processing tasks, including measuring semantic similarity between two texts. This work investigates whether it is possible to group and identify new fault types by using a transformer-based language model to create numerical vectors of log text, which consists of domain-specific technical terms and numerical data. The clusters are compared against existing groupings created by an existing system in which fault types are identified with manually written filters. This work shows that the vector representations created by a transformer-based language model can be improved by fine-tuning the model for a classification task. The vectors are grouped using density-based and hierarchical clustering algorithms. The results show that it is possible to create vectors of log texts using a transformer-based language model and obtain results comparable to an existing manual system when the clusters are evaluated with V-measure and Adjusted Rand Index. Transformer Models Clustering Embeddings Deep Learning Fault Identification Transformatorbaserade modeller Klustering Djupinlärning Felidentifiering Computer Sciences Datavetenskap (datalogi)
5	Generating Terraform Configuration Files with Large Language Models / Att skapa Terraform-konfigurationsfiler med stora språkmodeller Bonde, Oskar January 2022 This thesis explores how large language models can be used to generate configuration files for Terraform from natural language descriptions. Few-shot and fine-tuning paradigms are evaluated on decoder-only models of varying size, including the state-of-the-art Codex model. The generated configuration files are evaluated for functional correctness on a custom dataset using Terraform, to account for the large space of functionally equivalent configuration files. Results show that the largest model, Codex, is highly capable of generating configuration files given an English description of network infrastructure, even without fine-tuning. The result could be a useful tool for engineers who know Terraform fundamentals and have experience with the cloud platforms AWS, GCP, or Azure. A future study could fine-tune Codex for Terraform using OpenAI's API or create an open-source Codex replication by fine-tuning the GPT-3 replication OPT, which in turn can be fine-tuned for Terraform. / This thesis investigates how large language models can be used to generate configuration files for Terraform from natural language descriptions. Both the few-shot and fine-tuning paradigms are evaluated on decoder-only models of different sizes, including Codex. To account for configuration files that look different but are functionally equivalent, the configuration files are evaluated based on their function. The results show that Codex, the largest model, is able to generate configuration files given an English description of network infrastructure, even though Codex has not undergone fine-tuning. The result could be a useful tool for engineers who have basic knowledge of Terraform and experience with the cloud platforms AWS, GCP, or Azure.
A future study could train Codex for Terraform with OpenAI's API or create a Codex replica by training the GPT-3 replica OPT, which in turn can be trained for Terraform. Terraform Transformer models Generating configuration files Large Language Models Codex Terraform Transformer-modeller Generera konfigurationsfiler Stora språkmodeller Codex Computer Sciences Datavetenskap (datalogi)
6	Explainable Antibiotics Prescriptions in NLP with Transformer Models Contreras Zaragoza, Omar Emilio January 2021 The overprescription of antibiotics has resulted in bacterial resistance, which is considered a threat to global health. Deciding whether antibiotics should be prescribed based on individual visits in patients’ medical records in Swedish can be considered a text classification task, one of the applications of Natural Language Processing (NLP). However, medical experts and patients cannot trust a model if explanations for its decisions are not provided. In this work, multilingual and monolingual Transformer models are evaluated for the medical classification task. Furthermore, local explanations are obtained with SHapley Additive exPlanations and Integrated Gradients to compare the models’ predictions and evaluate the explainability methods. Finally, the local explanations are also aggregated to obtain global explanations and understand the features that contributed the most to the prediction of each class. / Incorrect prescription of antibiotics has resulted in increased antibiotic resistance, which is considered a threat to global health. Deciding whether antibiotics should be prescribed based on patient records in Swedish can be regarded as a text classification problem, one of the applications of Natural Language Processing (NLP). However, medical experts and patients cannot trust a model if explanations for the model's decisions are not given. In this work, multilingual and monolingual Transformer models were evaluated for the medical text classification problem. In addition, local explanations were obtained with SHapley Additive exPlanations and Integrated Gradients to compare the models' predictions and evaluate the explainability of the methods. Finally, the local explanations were aggregated to obtain global explanations and understand the words that contributed most to the model's prediction for each class.
Transformer models NLP Explainable AI Medical domain Antibiotics prescription SHAP Integrated Gradients Transformatormodeller NLP förklarbar AI medicinsk domän antibiotikarecept SHAP Integrated Gradients Computer and Information Sciences Data- och informationsvetenskap
7	Employing a Transformer Language Model for Information Retrieval and Document Classification : Using OpenAI's generative pre-trained transformer, GPT-2 / Transformermodellers användbarhet inom informationssökning och dokumentklassificering Bjöörn, Anton January 2020 As the information flow on the Internet keeps growing, it becomes increasingly easy to miss important news that does not have mass appeal. Combating this problem calls for increasingly sophisticated information retrieval methods. Pre-trained transformer-based language models have shown great generalization performance on many natural language processing tasks. This work investigates how well such a language model, OpenAI’s Generative Pre-trained Transformer 2 (GPT-2), generalizes to information retrieval and classification of online news articles written in English, with the purpose of comparing this approach with the more traditional method of Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. The aim is to shed light on how useful state-of-the-art transformer-based language models are for the construction of personalized information retrieval systems. Using transfer learning, the smallest version of GPT-2 is trained to rank and classify news articles, achieving results similar to the purely TF-IDF-based approach. While the average Normalized Discounted Cumulative Gain (NDCG) achieved by the GPT-2-based model was about 0.74 percentage points higher, the sample size was too small to give these results high statistical certainty. / The flow of information on the Internet continues to grow, making it increasingly easy to miss important news that does not interest a large number of people. Combating this problem requires increasingly sophisticated information retrieval methods. Over the past few years, pre-trained transformer models have taken over as the most prominent neural networks for handling text.
This work investigates how well such a language model, OpenAI's Generative Pre-trained Transformer 2 (GPT-2), can generalize from generating text to being used for information retrieval and text classification. To evaluate this, a transformer-based model is compared with a more traditional Term Frequency-Inverse Document Frequency (TF-IDF) vectorization model. The aim is to clarify how useful pre-trained transformer models actually are in building specialized information retrieval systems. The smallest version of the GPT-2 language model is adapted and retrained to rank and classify news articles written in English, and achieves performance similar to the TF-IDF-based model. The GPT-2-based model had on average 0.74 percentage points higher Normalized Discounted Cumulative Gain (NDCG), but the sample size was not large enough to give these results high statistical certainty. Deep Learning Transformer Models Information Retrieval Ranking Generative Pre-training Document Classification djupinlärning transformermodeller informationssökning ranking generativ förträning dokumentklassificering Computer and Information Sciences Data- och informationsvetenskap
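Entry 7 reports its ranking results as Normalized Discounted Cumulative Gain (NDCG). The following is a minimal sketch of one common formulation of that metric, assuming linear gain and a log2 rank discount. The abstract does not say which NDCG variant the thesis used, so the function names and the example relevance grades are illustrative only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    # A ranking with no relevant items has no meaningful ideal; return 0.0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance grades listed in the order a system ranked the articles
# (hypothetical values).
perfect_ranking = ndcg([3, 2, 1])
reversed_ranking = ndcg([1, 2, 3])
```

A ranking that places the most relevant articles first scores 1.0, and any other ordering of the same items scores lower.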
8	Duplicate Detection and Text Classification on Simplified Technical English / Dublettdetektion och textklassificering på Förenklad Teknisk Engelska Lund, Max January 2019 This thesis investigates the most effective way of performing classification of text labels and clustering of duplicate texts in technical documentation written in Simplified Technical English. Pre-trained language models from transformers (BERT) were tested against traditional methods such as tf-idf with cosine similarity (kNN) and SVMs on the classification task. For detecting duplicate texts, vector representations from pre-trained transformer and LSTM models were tested against tf-idf using the density-based clustering algorithms DBSCAN and HDBSCAN. The results show that traditional methods are comparable to pre-trained models for classification, and that using tf-idf vectors with a low distance threshold in DBSCAN is preferable for duplicate detection. NLP CNL transformer models LSTM BERT document embeddings word embeddings text classification text clustering transfer learning machine learning Computer Sciences Datavetenskap (datalogi)
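The traditional baseline named in entry 8, tf-idf vectors compared by cosine similarity in a kNN classifier, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the thesis code: whitespace tokenization, the smoothed idf formula, computing idf over the training documents plus the query, and the example sentences and labels are all hypothetical choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed idf, so terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(train_docs, train_labels, query, k=3):
    """Label the query by majority vote among its k most similar documents."""
    # Simplification: idf is computed over the training docs plus the query.
    vectors = tfidf_vectors(train_docs + [query])
    query_vec, train_vecs = vectors[-1], vectors[:-1]
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query_vec, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled sentences in a technical-documentation style.
train_docs = [
    "remove the bolt from the panel",
    "install the bolt on the panel",
    "do not touch the hot surface",
    "keep away from the hot engine",
]
train_labels = ["procedure", "procedure", "warning", "warning"]
predicted = knn_predict(train_docs, train_labels, "do not touch the engine", k=3)
```

The same cosine function could serve as the basis for a distance (1 minus similarity) passed to a density-based clustering algorithm for duplicate detection, though that step is not shown here.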
9	Modelling Of Switched Mode Power Converters : A Bond Graph Approach Umarikar, Amod Chandrashekhar 08 1900 Modelling and simulation are essential ingredients of the analysis and design process in power electronics. They help a design engineer gain an increased understanding of circuit operation. Accordingly, for a given set of specifications, the designer will choose a particular topology, select component types and values, estimate circuit performance, and so on. Typically, hierarchical modelling, analysis, and simulation, rather than fully detailed simulation of the system, provide crucial insight and understanding. The combination of these insights with hardware prototyping and experiments constitutes a powerful and effective approach to design. Obtaining the mathematical model of a power electronic system is a major task before any analysis, synthesis, or simulation can be performed. There are circuit-oriented simulators that use built-in mathematical models for components. Simulation with an equation solver needs mathematical models that are tailored to user requirements. There are various methods in the literature to obtain these mathematical models. However, the issues of multi-domain system modelling and causality of the energy variables are not sufficiently addressed. Further, specific to power converter systems, the issue of switching power models with fixed causality is not addressed. Therefore, our research focuses on obtaining solutions to the above using the relatively unexplored bond graph method to obtain models for power electronic systems. The power electronic systems chosen for the present work are Switched Mode Power Converters (SMPCs), in particular PWM DC-DC converters. A bond graph is a labelled and directed graphical representation of a physical system. The basis of bond graph modelling is energy/power flow in a system.
As energy or power flow is the underlying principle of bond graph modelling, there is seamless integration across multiple domains. As a consequence, different domains (such as electrical, mechanical, thermal, fluid, and magnetic) can be represented in a unified way. The power or energy flow is represented by a half arrow, which is called the power bond or the energy bond. The causality of each bond is a significant issue that is inherently addressed in bond graph modelling. As every bond involves two power variables, the choice of cause variable and effect variable is determined by natural laws. This has a significant bearing on the resulting state equations of the system. Proper assignment of power direction resolves the sign-placing problem when connecting sub-model structures. The causality dictates whether a specific power variable is a cause or an effect. Causal bars at either end of the power bond graphically indicate the causality of every bond. Once causality is assigned, the bond graph displays the structure of the state space equations explicitly. The first problem we encountered in modelling power electronic systems with bond graphs is power switching. The essential part of any switched power electronic system is a switch. Switching in power electronic circuits changes the structure of the system. This changes the dynamic equations of the circuit according to the position of the switch. We have proposed switched power junctions (SPJs) to represent switching phenomena in power electronic systems. The switched power junctions are a generalization of the existing 0-junction and 1-junction concepts of the bond graph element set. SPJs model ideal switching. These elements maintain causality invariance for the whole system in any operational mode of the system. This means that the state vector of the resulting state equation of the system does not change for any operating mode.
Because SPJs model ideal power switching, the problem of stiff systems and the associated numerical stability problems when simulating the system are eliminated. Further, the representation maintains a one-to-one correspondence with the physical system, displaying all the feasible modes of operation at the same time on the same graph. Using these elements, the switched mode power converters (SMPCs) are modelled as bond graphs. The bond graph of the converter is the large signal model of the converter. A graphical procedure is proposed that gives the averaged large signal, steady state, and small signal ac models. The procedure is suitable for converters operating in both Continuous Conduction Mode (CCM) and Discontinuous Conduction Mode (DCM). For modelling in DCM, the concept of a virtual switch is used to model the converter using bond graphs. Using the proposed method, converters of any complexity can be modelled, incorporating all the advantages of bond graph modelling. Magnetic components are an essential part of power electronic systems. The most common parts are the inductor, transformer, and coupled inductors, which involve both the electric and magnetic domains. The Gyrator-Permeance approach is used to model the magnetic components. The gyrator acts as an interface between the electric and magnetic domains, and capacitors model the permeance of the magnetic circuits. Components such as the inductor, tapped inductor, transformer, and tapped transformer are modelled. Interleaved converters with coupled inductors, zero ripple phenomena in coupled inductor converters, and the integrated magnetic Cuk converter are also modelled. Modelling of integrated magnetic converters, such as the integrated magnetic forward converter and the integrated magnetic boost converter, is also explored. To carry out all the simulations of the proposed bond graph models, a bond graph toolbox is developed using MATLAB/SIMULINK. MATLAB/SIMULINK is chosen because it is a widely available general simulation platform.
Therefore, all the analysis and simulation can be carried out using the facilities available in MATLAB/SIMULINK. A symbolic equation extraction toolbox is also developed, which extracts state equations from the bond graph model in SIMULINK in symbolic form. Electric Converters Power Electronics Switching Modes Graph Theory Transformer Models Inductor Models Converters - Modelling Switching Systems - Modelling Switched Mode Power Converters (SMPC's) Bond Graph Modelling Bond Graphs Electrical Engineering
10	Classifying and Comparing Latent Space Representation of Unstructured Log Data. / Klassificering och jämförelse av latenta rymdrepresentationer av ostrukturerad loggdata. Sharma, Bharat January 2021 This thesis explores and compares various methods for producing vector representations of unstructured log data. Ericsson wanted to investigate machine learning methods for analyzing logs produced by their systems in order to reduce the cost and effort required for manual log analysis. Four NLP methods were used to produce vector embeddings for logs: Doc2Vec, DAN, XLNet, and RoBERTa. In addition, a Random Forest classifier was used to classify those embeddings. The experiments were performed on three different datasets, and the results showed that the performance of the models varied with the dataset being used. The results also show that, in the case of log data, fine-tuning makes the transformer models computationally heavy while the performance gain is very low. RoBERTa without fine-tuning produced optimal vector representations for the first and third datasets, whereas DAN performed better on the second dataset. The study also concluded that the NLP models were better able to understand and classify the third dataset, as it contained more plain-text information in contrast to the more technical and less human-readable datasets. / This thesis investigates and compares different methods for creating vector representations of unstructured log data. Ericsson wants to investigate whether it is possible to use machine learning techniques to analyze log data produced by their current systems and thereby simplify and reduce the cost of manual log analysis. Four different language technologies are investigated for creating vector representations of log data: Doc2Vec, DAN, XLNet, and RoBERTa. In addition, a Random Forest classifier is used to classify the vector representations.
The experiments were performed on three different datasets, and the results showed that the models' performance varied depending on the dataset used. The results also show that fine-tuning transformer models makes them computationally demanding and that the performance gain is small. RoBERTa without fine-tuning produced optimal vector representations for the first and third datasets used, while DAN performed better on the second dataset. The study also shows that the language models could classify the third dataset better because it contained more plain-text information compared with the more technical and less readable datasets. Machine learning Natural language processing Deep learning Classification Supervised learning Transformer models Sentence embeddings Doc2Vec Deep averaging networks. Maskininlärning naturligtspråkbehandling djupinlärning klassificering övervakad inlärning transformeringsmodeller meningsinbäddningar Doc2Vec djupa linjärkombinerande nätverk. Computer Sciences Datavetenskap (datalogi)
