  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
211

Chatbot : The future of customer feedback

Dinh, Kevin Hoang January 2020 (has links)
This study examines how to convert a survey into a chatbot and distribute it across various communication channels so that organisations can collect feedback and improve their goods and services. What would be the most convenient way to gather feedback? Our daily lives depend more and more on digital devices, and the rise of these devices brings a wider range of communication channels. Is it not a good opportunity to use these channels for several purposes? The study focuses on chatbots, survey systems, and communication channels, and on their ability to gather feedback from respondents and use it to increase the quality of goods, services, and perhaps life. Thanks to the chatbot's language capabilities, people can engage with the bot in a conversation and answer survey questions in a different way. Through a RESTful API, the chatbot can expose quantitative information to be analysed for the improvement of products and services. Although the prototype chatbot is rough and still requires many adjustments, the work demonstrates clear opportunities in running surveys, gathering feedback, and analysing it. This could inform future research on chatbots or offer a new way to make surveys better.
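The RESTful API mentioned above is said to expose quantitative information for analysis. As a loose illustration of the aggregation step (the field names and response format are assumptions, not details from the thesis), survey answers collected by the bot might be summarised like this:

```python
from collections import Counter

def summarise_responses(responses):
    """Aggregate chatbot survey answers into per-question statistics.

    `responses` is a list of dicts, each mapping a question id to an
    answer that is either a numeric rating or a free-text string.
    """
    ratings, texts = {}, {}
    for response in responses:
        for question, answer in response.items():
            if isinstance(answer, (int, float)):
                ratings.setdefault(question, []).append(answer)
            else:
                texts.setdefault(question, Counter())[answer.strip().lower()] += 1
    summary = {}
    for question, values in ratings.items():
        summary[question] = {"mean": sum(values) / len(values), "n": len(values)}
    for question, counts in texts.items():
        summary[question] = {"top_answer": counts.most_common(1)[0][0],
                             "n": sum(counts.values())}
    return summary
```

Numeric answers are averaged and free-text answers are tallied, which is roughly the kind of quantitative output a feedback dashboard would consume.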
212

Pre-training a knowledge enhanced model in biomedical domain for information extraction

Yan, Xi January 2022 (has links)
While recent years have seen a rise in research on knowledge-graph-enriched pre-trained language models (PLMs), few studies have tried to transfer this work to the biomedical domain. This thesis is a first attempt to pre-train a large-scale biological knowledge-enriched language model (KPLM). Under the framework of CoLAKE (T. Sun et al., 2020), a KPLM for the general domain, the model in this study is pre-trained on PubMed abstracts (a large-scale corpus of medical text) and BIKG (AstraZeneca's biological knowledge graph). We first obtain abstracts from PubMed together with their entity-linking results. The entities in each abstract are then connected to BIKG to form sub-graphs. These sub-graphs and the sentences from the PubMed abstracts are fed to the CoLAKE model for pre-training. By training the model on three objectives (masking word nodes, masking entity nodes, and masking relation nodes), this research aims not only to enhance the model's capacity for modelling natural language but also to infuse in-depth knowledge. The model is then fine-tuned on named entity recognition (NER) and relation extraction tasks on three benchmark datasets: ChemProt (Kringelum et al., 2016), DrugProt (from the Text mining drug-protein/gene interactions shared task), and DDI (Segura-Bedmar et al., 2013). Empirical results show that the model outperforms state-of-the-art models on the relation extraction task on the DDI dataset, with an F1 score of 91.2%. On DrugProt and ChemProt, the model also improves over the SciBERT baseline.
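The three pre-training objectives above (masking word, entity, and relation nodes) can be sketched as a per-node-type masking step. This is a simplified illustration, not CoLAKE's actual implementation; the 15% default rate and the (token, type) pair representation are assumptions made for the sketch:

```python
import random

MASK = "[MASK]"

def mask_nodes(nodes, node_type, rate=0.15, rng=None):
    """Replace a fraction of the nodes of one type with a mask token.

    `nodes` is a list of (token, type) pairs, where type is one of
    'word', 'entity', or 'relation'.  Returns the masked sequence and
    the indices that were masked (the prediction targets).
    """
    rng = rng or random.Random(0)
    candidates = [i for i, (_, t) in enumerate(nodes) if t == node_type]
    n_mask = max(1, int(len(candidates) * rate)) if candidates else 0
    picked = set(rng.sample(candidates, n_mask)) if n_mask else set()
    masked = [(MASK, t) if i in picked else (tok, t)
              for i, (tok, t) in enumerate(nodes)]
    return masked, sorted(picked)
```

Running the function once per node type yields the three masked views of a word-knowledge graph that the three training objectives would predict against.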
213

Text Prediction using Machine Learning

Khalid, Muhammad Faizan January 2022 (has links)
Language modeling is a broad field that has long been used to make people's lives easier; among other things, it powers text prediction in mobile keyboards to make the user experience smooth. Tobii has been working since 2001 for users suffering from ALS (Amyotrophic Lateral Sclerosis), a disease in which the weakening of voluntary muscles gradually leaves patients unable to talk, walk, or chew, and which worsens day by day. Tobii has designed an eye-tracker solution that lets people with ALS carry out their tasks more conveniently, and has also developed a keyboard for talking that is controlled by the eye-tracker device: users write sentences with the keyboard, which are then converted to speech and conveyed to other people. This thesis therefore concerns predicting text from the initial keyboard input to make the user experience fast, easy, and less hectic. The project was conducted at Tobii Dynavox with the objective of building a language model that offers an automatic, fast, and efficient approach to predicting text for a given text input. It explores how to predict sentences by applying deep learning models to users' initial text input, taking the user's specific writing style into account. The model developed in the thesis could be used by Tobii Dynavox to predict text for end users. Part of the objective is also to find out which approach is better for implementing the language models. The results show that federated learning performs better than centralized machine learning. After analysing the results, it can also be said that Gated Recurrent Units (GRUs) are a good choice for these models, since they show better accuracy and require less training and response time.
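The GRU cells recommended in the conclusion combine an update gate and a reset gate to decide how much of the previous hidden state survives each step. A single-step sketch in NumPy (a real predictor, as built with frameworks like PyTorch or TensorFlow, would stack such cells with embedding and output layers):

```python
import numpy as np

def gru_step(x, h, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde.

    x: input vector, h: previous hidden state, params: dict of weight
    matrices W_* (input), U_* (recurrent) and bias vectors b_*.
    """
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    # Gates decide what to overwrite (z) and what history to reuse (r).
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h + params["b_z"])
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h + params["b_r"])
    h_tilde = np.tanh(params["W_h"] @ x + params["U_h"] @ (r * h) + params["b_h"])
    # Interpolate between the old state and the candidate state.
    return (1 - z) * h + z * h_tilde
```

Because the output is a convex combination of the old state and a tanh-bounded candidate, the hidden state stays bounded, which is part of why GRUs train stably.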
214

Semi-automatic Segmentation & Alignment of Handwritten Historical Text Images with the use of Bayesian Optimisation

MacCormack, Philip January 2023 (has links)
Digitising historical documents effortlessly has been of great interest for some time. Part of the digitisation is the annotation of the data: such annotations are obtained in a process called alignment, which links words in an image to the transcript. Annotated data have many use cases, such as training handwritten text recognition models. With this application in mind, this project aimed to develop an interactive algorithm for the segmentation and alignment of historical document images. Two developed methods (referred to as method 1 and method 2) were evaluated and compared on two data sets, Labour's Memory and IAM. A method incorporating self-learning was also developed and evaluated, with Bayesian optimisation used to automatically set the algorithm's parameters. The results showed that the algorithms perform better on the IAM data set, which could partly be explained by differences in the quality of the ground truth used to calculate the performance metrics. Moreover, method 2 slightly outperformed method 1 on both data sets. Bayesian optimisation proved to be a reasonable and more time-efficient way of setting parameters compared to finding them manually for each document. The work done in this project could serve as the basis for the future development of a useful, interactive tool for the alignment of text documents.
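Bayesian optimisation, as used above to set the algorithm's parameters, fits a surrogate model to the observed (parameter, score) pairs and picks the next parameters where the surrogate looks most promising. A minimal one-dimensional sketch with a Gaussian-process surrogate and an upper-confidence-bound acquisition function (a generic illustration, not the thesis's implementation):

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def bayes_opt(f, bounds, n_init=3, n_iter=10, noise=1e-6, seed=0):
    """Minimal 1-D Bayesian optimisation (maximisation): GP surrogate
    plus an upper-confidence-bound acquisition over a fixed grid."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, n_init)          # initial random evaluations
    y = np.array([f(x) for x in X])
    grid = np.linspace(lo, hi, 200)          # candidate parameter values
    for _ in range(n_iter):
        K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
        k_star = rbf(grid, X)
        mu = k_star @ K_inv @ y               # GP posterior mean
        var = 1.0 - np.sum(k_star @ K_inv * k_star, axis=1)
        ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))
        x_next = grid[np.argmax(ucb)]        # most promising point
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()
```

The UCB rule trades exploration (high posterior variance) against exploitation (high posterior mean), which is what makes the approach more time-efficient than trying parameter settings manually per document.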
215

Invoice Line Item Extraction using Machine Learning SaaS Models

Kadir, Avin January 2022 (has links)
Manual invoice processing is a time-consuming and error-prone task that has proven to be done more efficiently by automation software that minimises the need for human input. Amazon Textract is a software-as-a-service offering from Amazon Web Services for that purpose: it uses machine learning models to extract data from both general and financial documents, such as receipts and invoices. The service is available in multiple widely spoken languages, but not, as of the time of writing this thesis, in Swedish. This thesis explores the potential and accuracy of Amazon Textract in extracting data from Swedish invoices using the English setting. Specifically, the accuracy of extracting line items as well as Swedish letters is examined, along with the potential for correcting incorrectly extracted data. This is achieved by testing defined categories on each invoice, comparing the Amazon Textract extractions with the correct labelled data. These categories include emptiness (no data was extracted), equality, missing and added line items, and missing and added characters in otherwise correct line-item strings. The invoices themselves are divided into two categories: structured and semi-structured. The tests are mainly conducted on the service's dedicated API method for data extraction from financial documents, but a comparison with the table-extraction API method is also made to gain more insight into Amazon Textract's capability. The results suggest that Amazon Textract is quite inaccurate when extracting line-item data from Swedish invoices, so manual post-processing of the data is generally needed to ensure its correctness. However, the service showed better results on structured invoices, where it scored 70% on equality and 100% on 2 out of 6 invoice layouts. The Swedish character accuracy was 66%.
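The comparison categories described above (emptiness, equality, missing and added line items, character-level differences) could be computed along these lines; this is a sketch of one plausible scoring scheme, and the exact definitions used in the thesis may differ:

```python
from collections import Counter
from difflib import SequenceMatcher

def line_item_report(extracted, labeled):
    """Compare extracted line items against labeled ground truth.

    Treats line items as a multiset so duplicates count; returns the
    per-invoice categories described in the abstract."""
    ext, lab = Counter(extracted), Counter(labeled)
    return {
        "empty": len(extracted) == 0,            # nothing extracted at all
        "equal": ext == lab,                     # exact multiset match
        "missing": sum((lab - ext).values()),    # ground-truth items not found
        "added": sum((ext - lab).values()),      # spurious extracted items
    }

def char_accuracy(extracted_text, labeled_text):
    """Character-level similarity via matching-block ratio (0.0 to 1.0)."""
    return SequenceMatcher(None, extracted_text, labeled_text).ratio()
```

A per-language character score, e.g. restricted to å, ä, and ö, would be one way to isolate the Swedish-letter accuracy figure reported above.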
216

Context-aware Swedish Lexical Simplification : Using pre-trained language models to propose contextually fitting synonyms

Graichen, Emil January 2023 (has links)
This thesis presents the development and evaluation of context-aware Lexical Simplification (LS) systems for the Swedish language. In total, three versions of LS models, LäsBERT, LäsBERT-baseline, and LäsGPT, were created and evaluated on a newly constructed Swedish LS evaluation dataset. The LS systems demonstrated promising potential in aiding audiences with reading difficulties by providing context-aware word replacements. While there were areas for improvement, particularly in complex word identification, the systems showed agreement with human annotators on word replacements. The effects of fine-tuning a BERT model for substitution generation on easy-to-read texts were explored, indicating no significant difference in the number of replacements between fine-tuned and non-fine-tuned versions. Both versions performed similarly in terms of synonymous and simplifying replacements, although the fine-tuned version exhibited slightly reduced performance compared to the baseline model. An important contribution of this thesis is the creation of an evaluation dataset for Lexical Simplification in Swedish. The dataset was automatically collected and manually annotated, and evaluators assessed its quality, coverage, and complexity. Results showed that the dataset had high quality and good perceived coverage. Although the complexity of the complex words was perceived to be low, the dataset provides a valuable resource for evaluating LS systems and advancing research in Swedish Lexical Simplification. Finally, a more transparent and reader-empowering approach to Lexical Simplification is proposed. This new approach embraces the challenges of contextual synonymy and reduces the number of failure points in the conventional LS pipeline, increasing the chances of developing a fully meaning-preserving LS system.
Links to the different parts of the project: the Lexical Simplification dataset at https://github.com/emilgraichen/SwedishLSdataset and the lexical simplification algorithm at https://github.com/emilgraichen/SwedishLexicalSimplifier
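At its core, the substitution step described above scores candidate replacements by cosine similarity between embeddings of the candidates in context. A minimal sketch with placeholder vectors (a real system would obtain them from a pre-trained model such as SBERT or the BERT/GPT models named in the abstract):

```python
import numpy as np

def rank_candidates(context_vec, candidate_vecs):
    """Rank candidate substitutions by cosine similarity to the embedded
    sentence context; returns (index, score) pairs, best first."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [(i, cos(context_vec, v)) for i, v in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda p: p[1], reverse=True)
```

Presenting the whole ranked list rather than silently substituting the top candidate is one way to realise the reader-empowering approach proposed above.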
217

Evaluation and Implementation of Code Search using Transformers to Enhance Developer Productivity

Fredrikson, Sara, Månsson, Clara January 2023 (has links)
With the rapid advancements in the fields of Natural Language Processing and Artificial Intelligence, several aspects of their use cases and impact on productivity remain largely unexplored. Many recent machine learning models are based on an architecture called the Transformer, which allows for faster computation and for more context to be preserved. At the same time, tech companies face the dilemma of how to navigate code bases spanning millions of lines of code. The aim of this thesis is to investigate whether implementing and fine-tuning a Transformer-based model can improve the code search process in a tech company, leading to improvements in developer productivity. Specifically, the thesis evaluates the effectiveness of such an implementation from a productivity perspective in terms of velocity, quality, and satisfaction. The research uses a mixed-methods design consisting of two distinct methodologies as well as analyses of quantitative and qualitative data. To assess the level of accuracy that can be obtained by optimising a Transformer-based model on internal data, an evaluative experiment with various internal datasets was conducted. The second methodology was a usability test, investigating potential impacts on velocity, quality, and satisfaction by testing a contextual code-search prototype with developers. Data from the tests was analysed through heat-map, trade-off, and template analyses. The results indicate that a Transformer-based model can be optimised for code search on internal data and has the potential to improve code search in terms of velocity, quality, and satisfaction.
218

Information Extraction for Test Identification in Repair Reports in the Automotive Domain

Jie, Huang January 2023 (has links)
Knowledge of the tests conducted on a problematic vehicle is essential for enhancing the efficiency of mechanics, so identifying the tests performed in each repair case is of utmost importance. This thesis explores techniques for extracting data from unstructured repair reports to identify component tests. The main emphasis is on developing a supervised multi-class classifier to categorise data and extract sentences that describe repair diagnoses and actions. It is shown that incorporating a category-aware contrastive learning objective can improve the repair report classifier's performance. The proposed approach trains a sentence representation model, based on a pre-trained model, with a category-aware contrastive learning objective; the sentence representation model is then further trained on the classification task using a loss function that combines the cross-entropy and supervised contrastive learning losses. Applying this method increases the macro F1-score on the test set from 90.45 to 90.73. An attempt to enhance the performance of the repair report classifier using a noisy-data classifier proves unsuccessful. The noisy-data classifier is trained with a prompt-based fine-tuning method, incorporating open-ended questions and two examples in the prompt. This approach achieves an F1-score of 91.09, and the resulting repair report classification datasets are found to be easier to classify; however, they do not improve the repair report classifier's performance. Ultimately, the repair report classifier is used to help create the input needed for identifying component tests, and an information retrieval method conducts the test identification. Incorporating this classifier and the existing labels when creating queries improves the mean average precision at the top 3, 5, and 10 positions by 0.62, 0.81, and 0.35 respectively, although with a slight decrease of 0.14 at the top 1 position.
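The mean average precision at top k reported above can be computed as follows; this is one common definition, and the thesis's exact normalisation may differ:

```python
def average_precision_at_k(retrieved, relevant, k):
    """Average precision at cutoff k for one query: the mean of the
    precision values at each rank where a relevant item appears."""
    hits, score = 0, 0.0
    for rank, item in enumerate(retrieved[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k) if relevant else 0.0

def mean_average_precision_at_k(results, k):
    """`results` is a list of (retrieved_list, relevant_set) pairs,
    one per query; returns the mean AP@k across queries."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in results) / len(results)
```

Under this definition, moving a relevant test higher in the ranking raises AP@k even when the set of retrieved tests is unchanged, which is why re-weighting queries with classifier output can shift MAP at different cutoffs in different directions.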
219

Automated Extraction of Insurance Policy Information : Natural Language Processing techniques to automate the process of extracting information about the insurance coverage from unstructured insurance policy documents.

Hedberg, Jacob, Furberg, Erik January 2023 (has links)
This thesis investigates Natural Language Processing (NLP) techniques for extracting relevant information from long, unstructured insurance policy documents. The goal is to reduce the time readers need to understand the coverage within the documents. The study uses predefined insurance policy coverage parameters, created by industry experts, to represent what is covered in the policy documents. Three NLP approaches are used to classify text sequences into insurance parameter classes. The thesis shows that using SBERT to create vector representations of text, enabling cosine-similarity calculations, is an effective approach: the top-scoring sequences for each parameter are assigned that parameter's class. This approach significantly reduces the number of sequences a user has to read, but misclassifies some positive examples. To improve the model, the parameter definitions and training data were combined into a support set, and similarity scores were calculated between all sequences and the support set for each parameter using different pooling strategies. This few-shot classification approach performed well for the use case, improving the model's performance significantly. In conclusion, this thesis demonstrates that NLP techniques can help readers understand unstructured insurance policy documents. The model developed in this study can be used to extract important information and reduce the time needed to understand the contents of an insurance policy document, although a human expert would still be required to interpret the extracted text. The balance between the amount of relevant information and the amount of text shown depends on how many of the top-scoring sequences are classified for each parameter. The study also identifies some limitations of the approach depending on the available data.
Overall, this research provides insight into the potential implications of NLP techniques for information extraction and the insurance industry.
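The support-set scoring with different pooling strategies described above can be sketched as pooling a sequence's cosine similarities to each support example. The embeddings here are placeholders (the thesis derives them with SBERT), and mean/max are just two plausible pooling choices:

```python
import numpy as np

def support_set_score(seq_vec, support_vecs, pooling="mean"):
    """Score a text sequence against one parameter's support set by
    pooling its cosine similarities to each support example."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = np.array([cos(seq_vec, s) for s in support_vecs])
    # Mean pooling rewards agreement with the whole support set;
    # max pooling rewards a strong match to any single example.
    return float(sims.mean() if pooling == "mean" else sims.max())
```

Classifying a sequence then amounts to computing this score against every parameter's support set and assigning the parameter whose score clears a threshold or ranks highest.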
220

Comparing performance of K-Means and DBSCAN on customer support queries

Kästel, Arne Morten, Vestergaard, Christian January 2019 (has links)
In customer support there are often many repeat questions, and questions that do not need novel answers. In a quest to increase productivity in the question-answering task within any business, there is clear room for automatic answering to take on some of the workload of customer support functions. We look at clustering corpora of older queries and texts as a method for identifying groups of semantically similar questions and texts, which would allow a system to match a new query to a specific cluster and return an associated automatic response. The approach compares the performance of K-means and density-based (DBSCAN) clustering algorithms on three different corpora, using document embeddings encoded with BERT. We also discuss the digital transformation process, why companies are unsuccessful in its implementation, and the possible room for a new, more iterative model.
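Of the two algorithms compared, K-means is simple enough to sketch directly: alternate between assigning each embedding to its nearest centroid and recomputing each centroid as the mean of its assigned points. A toy NumPy version (the study itself would use library implementations on BERT document embeddings):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal k-means on an (n, d) array of embeddings: returns the
    cluster label of each point and the final centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]  # init from data
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):           # skip empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

Unlike DBSCAN, this requires fixing k in advance and assigns every query to some cluster, which is exactly the kind of trade-off such a comparison on support corpora would surface.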
