Global ETD Search

21	Pretraining Deep Learning Models for Natural Language Understanding Shao, Han 18 May 2020 (has links) No description available. Computer Science Machine learning NLP Deep learning
22	Can artificial intelligence replace humans in programming? Ekedahl, Hampus, Helander, Vilma January 2023 (has links) The recent developments in artificial intelligence have brought forth natural language models like ChatGPT, which exhibits abilities in tasks such as language translation, text generation, and interactive conversation. Notably, ChatGPT's ability to generate code has sparked debates regarding the role of artificial intelligence in software engineering and its potential to replace human programmers. In this thesis, we investigate ChatGPT's potential to replace humans as programmers, focusing specifically on code correctness, run-time performance, and memory usage. We designed and conducted an experiment in which we prompted ChatGPT with a set of 90 common programming problems of diverse types and difficulty levels. Based on the results of our experiment, we observed that ChatGPT is proficient in solving programming problems at lower and medium difficulty levels. However, its ability to produce correct code declines when prompted with harder problems. In terms of run-time and memory usage, ChatGPT demonstrated above-average results for problems at lower and medium difficulty levels, but its performance declined when faced with more challenging tasks. While ChatGPT falls short of fully replacing human programmers, it exhibits potential as a programming assistant. Our study sheds light on the current capabilities of ChatGPT and other chatbots as code-generating tools and can serve as groundwork for future work in the area. AI ChatGPT NLP Computer Sciences Datavetenskap (datalogi)
23	Snort Rule Generation for Malware Detection Using the GPT2 Transformer Laryea, Ebenezer Nii Afotey 04 July 2022 (has links) Natural Language machine learning methods are applied to rules generated to identify malware at the network level. These rules use a computer-based signature specification "language" called Snort. Using Natural Language processing techniques and other machine learning methods, new rules are generated based on a training set of existing Snort rule signatures for a specific type of malware family. The performance is then measured, in terms of the detection of existing types of malware and the number of "false positive" triggering events. GPT-2 Snort malware detection NLP
24	Named Entity Recognition for Detecting Trends in Biomedical Literature Törnkvist, Betty January 2024 (has links) The number of publications in the biomedical field increases exponentially, which makes the task of keeping up with current research more and more difficult. However, rapid advances in the field of Natural Language Processing (NLP) offer possible solutions to this problem. In this thesis we investigate three main questions that matter when using NLP, or more specifically the two subfields Named Entity Recognition (NER) and Large Language Models (LLM), to help solve this problem. The questions concern how LLM performance compares with that of NER models on NER tasks, the importance of normalization, and how the analysis is affected by the availability of data. For the first question, we find that the two kinds of model offer reasonably comparable performance on the specific task we are looking at. For the second question, we find that normalization plays a substantial role in improving the results for tasks involving data synthesis and analysis. Lastly, for the third question, we find that it is important to have access to full papers in most cases, since important information can be hidden outside of the abstracts. NLP NER CHO Computer Sciences Datavetenskap (datalogi)
25	Who is Doing What: Tracing and Understanding the Contributions in Collaborative Software Development Projects Nimér, Ebba, Pesjak, Emma January 2024 (has links) Context - In the fast-paced world of software development, understanding and tracking contributions within project teams is crucial for efficient project management and collaboration. Git, a popular Version Control System, facilitates collaboration but lacks comprehensive tools for analyzing individual contributions in detail. Objective - This thesis proposes an approach to classify and analyze Git commit messages and the associated file paths of the changed files in the commits, using Natural Language Processing (NLP) techniques, aiming to improve project transparency and contributor recognition. Method - By employing Bidirectional Encoder Representations from Transformers (BERT) models, an NLP technique, this study categorizes data from multiple collected Git repositories. A tool named DevAnalyzer is developed to automate the classification and analysis process, enhancing the understanding of contribution patterns. Results - The Git commit message model demonstrated high accuracy with an average of 98.9%, and the file path model showed robust performance with an average accuracy of 99.8%. Thereby, both models provided detailed insights into the types and locations of contributions within projects. Conclusions - The findings validate the effectiveness of using BERT models for classifying and categorizing both Git commit messages and file paths with the DevAnalyzer. This approach provides a more comprehensive understanding of contributions, benefiting project management and team collaboration. Transformers BERT NLP Git Software Engineering Programvaruteknik
26	Enhancing Document Retrieval in the FinTech Domain : Applications of Advanced Language Models Hansen, Jesper January 2024 (has links) In this thesis, methods of creating an information retrieval (IR) model within the FinTech domain are explored. Given the domain-specific and data-scarce environment, methods of artificially generating data to train and evaluate IR models are implemented and their limitations are discussed. The generative model GPT-J 6B is used to generate pseudo-queries for a document corpus, resulting in a training set and a test set of 148 and 166 query-document pairs respectively. Transformer-based models, both fine-tuned and original versions, are tested against the baseline model BM25, which has historically been seen as an effective document retrieval model. The models are evaluated using mean reciprocal rank at k (MRR@k) and the time cost of retrieving relevant documents. The main finding is that BM25 performs well in comparison to the transformer alternatives: it reaches the highest score at k = 2, with MRR@2 = 0.612. For MRR@5 and MRR@10, a model combining BM25 with a cross-encoder slightly outperforms the baseline, reaching MRR@5 = 0.655 and MRR@10 = 0.672. However, the increase in performance is slim and may not be enough to motivate an implementation. Finally, further research using real-world data is required to argue that transformer-based models are more robust in a real-world setting. NLP Information retrieval Semantic similarity Mathematics Matematik
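For readers unfamiliar with the metric named in this record, MRR@k is the mean, over all queries, of 1/rank of the first relevant document within the top k results (0 when none appears). The sketch below is illustrative only and is not the thesis code: it assumes exactly one relevant document per query, as the query-document pairs suggest, and the function names and toy document IDs are hypothetical.

```python
from typing import Hashable, Sequence


def reciprocal_rank_at_k(ranked: Sequence[Hashable], relevant: Hashable, k: int) -> float:
    """Return 1/rank of the relevant document if it is in the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id == relevant:
            return 1.0 / rank
    return 0.0


def mrr_at_k(rankings: Sequence[Sequence[Hashable]],
             relevant_docs: Sequence[Hashable], k: int) -> float:
    """Average the reciprocal rank at k over all queries."""
    scores = [reciprocal_rank_at_k(ranked, rel, k)
              for ranked, rel in zip(rankings, relevant_docs)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): one ranked result list and one relevant ID per query.
rankings = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
relevant = ["d1", "d2", "d0"]

print(mrr_at_k(rankings, relevant, k=1))
print(mrr_at_k(rankings, relevant, k=2))
```

Because a larger k can only add hits, MRR@k never decreases as k grows, which is consistent with the rising scores reported from k = 2 to k = 10.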
27	[en] TRANSITION-BASED DEPENDENCY PARSING APPLIED ON UNIVERSAL DEPENDENCIES / [pt] ANÁLISE DE DEPENDÊNCIA BASEADA EM TRANSIÇÃO APLICADA A UNIVERSAL DEPENDENCIES CESAR DE SOUZA BOUCAS 11 February 2019 (has links) Dependency parsing is the task of transforming a sentence into a syntactic structure, usually a dependency tree, that represents hierarchical relations between words. These representations are computationally efficient and help with several tasks that arise from the growing volume of online text and the need for technologies that depend on NLP. They can be used, for example, to enable computers to infer the meaning of words in many natural languages. This work presents dependency parsing with a focus on one of its most popular formulations in machine learning: the transition-based method. A greedy implementation of this model with a simple neural network-based classifier is used to run experiments. Universal Dependencies treebanks are used to train and then test the system using the validation script published in the CoNLL-2017 shared task. The results indicate empirically that initializing the input layer of the network with word embeddings obtained through pre-training improves performance. The system reached 84.51 LAS on the Brazilian Portuguese test set and 75.19 LAS on the English test set, nearly 4 points behind the best result for transition-based parsers. MACHINE LEARNING DEPENDENCY PARSING NLP
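As a rough illustration of the transition-based method this record describes, the sketch below applies a sequence of parser actions to build a dependency tree and then scores it with a labeled attachment score (LAS). Several things here are assumptions rather than details from the thesis: the arc-standard transition system is assumed (the abstract does not name one), the action sequence is written by hand where a greedy parser would query its classifier at each step, and the LAS function is a simplified per-word accuracy, not the CoNLL-2017 evaluation script.

```python
from typing import List, Tuple

Arc = Tuple[int, int, str]  # (head index, dependent index, relation label)


def apply_transitions(n_words: int, transitions: List[Tuple[str, str]]) -> List[Arc]:
    """Run arc-standard transitions over words 1..n_words (0 is an artificial ROOT)."""
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n_words + 1))
    arcs: List[Arc] = []
    for action, label in transitions:
        if action == "SHIFT":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The second-from-top item becomes a dependent of the top item.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent, label))
        elif action == "RIGHT-ARC":
            # The top item becomes a dependent of the item below it.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent, label))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


def las(predicted: List[Arc], gold: List[Arc]) -> float:
    """Percentage of words whose predicted head and label both match the gold tree."""
    gold_by_dep = {dep: (head, label) for head, dep, label in gold}
    correct = sum(1 for head, dep, label in predicted
                  if gold_by_dep.get(dep) == (head, label))
    return 100.0 * correct / len(gold)


# Toy sentence (hypothetical): 1 = "She", 2 = "reads", 3 = "books".
gold = [(2, 1, "nsubj"), (2, 3, "obj"), (0, 2, "root")]
actions = [("SHIFT", ""), ("SHIFT", ""), ("LEFT-ARC", "nsubj"),
           ("SHIFT", ""), ("RIGHT-ARC", "obj"), ("RIGHT-ARC", "root")]
predicted = apply_transitions(3, actions)
print(predicted, las(predicted, gold))
```

Each word is shifted once and attached once, so a sentence of n words is parsed in 2n actions; this linear cost is the usual argument for greedy transition-based parsing.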
28	Nuomonių analizės taikymas komentarams lietuvių kalboje / Opinion analysis of comments in Lithuanian Kavaliauskas, Vytautas 15 June 2011 (has links) In the past few years, more and more people have started to express their views, beliefs and experiences on the Internet. This has led to the emergence of a new research field that includes opinion mining and sentiment analysis. Various businesses actively encourage and follow research in this field, seeing great practical potential in its steadily improving results. This Master's thesis reviews the theoretical and practical results of opinion mining and sentiment analysis and implements a prototype opinion analysis system for short comments written in Lithuanian. The thesis also describes problems related to applying opinion mining and sentiment analysis systems to the Lithuanian language. Finally, the last part formulates recommended steps for developing and improving opinion analysis systems. Informatics Opinion mining Sentiment analysis Sentiments Opinions NLP
29	Tool for linguistic quality evaluation of student texts / Verktyg för språklig utvärdering av studenttexter Kärde, Wilhelm January 2015 (has links) Spell checkers are nowadays common in most editors. A student writing an essay in school will often have access to a spell checker. However, the feedback from a spell checker seldom correlates with the feedback from a teacher. One reason is that the teacher evaluates a text on more aspects. The teacher will, as opposed to the spell checker, evaluate a text based on aspects such as genre adaptation, structure and word variation. This thesis evaluates how well those aspects translate to NLP (Natural Language Processing) and implements those that translate well in a rule-based solution called Granska. linguistic spell checker NLP natural language processing linguistics grammar checker Computer Sciences Datavetenskap (datalogi)
30	Textová klasifikace s limitovanými trénovacími daty / Text classification with limited training data Laitoch, Petr January 2021 (has links) The aim of this thesis is to minimize the manual work needed to create training data for text classification tasks. Various research areas, including weak supervision, interactive learning and transfer learning, explore how to minimize the effort of creating training data. We combine ideas from the available literature to design a comprehensive text classification framework that employs keyword-based labeling instead of traditional text annotation. Keyword-based labeling aims to label texts based on keywords contained in the texts that are highly correlated with individual classification labels. As noted repeatedly in previous work, coming up with many new keywords is challenging for humans. To address this issue, we propose an interactive keyword labeler that uses word similarity to guide a user in keyword labeling. To verify the effectiveness of our novel approach, we implement a minimum viable prototype of the designed framework and use it to perform a user study on a restaurant review multi-label classification problem.
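The keyword-based labeling idea in this record can be sketched in a few lines: a text receives every label for which it contains at least one associated keyword. This is a minimal illustration, not the thesis framework. The label names and keyword lists are hypothetical, the tokenizer is a simple regular expression, and the word-similarity guidance the abstract mentions is not modeled.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword lists for a restaurant-review task.
KEYWORDS: Dict[str, Set[str]] = {
    "food": {"pizza", "delicious", "tasty"},
    "service": {"waiter", "staff", "rude"},
    "price": {"cheap", "expensive", "overpriced"},
}


def label_text(text: str, keywords: Dict[str, Set[str]]) -> List[str]:
    """Assign every label whose keyword set overlaps the text's tokens."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(label for label, words in keywords.items() if tokens & words)


print(label_text("The pizza was delicious but the waiter was rude.", KEYWORDS))
print(label_text("Nice view from the terrace.", KEYWORDS))
```

Labels produced this way are noisy, which is why such output is typically treated as weak supervision for training a classifier instead of as final predictions.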
