221

Data Segmentation Using NLP: Gender and Age

Demmelmaier, Gustav, Westerberg, Carl January 2021 (has links)
Natural language processing (NLP) opens the possibility for a computer to read, decipher, and interpret human languages, and ultimately to use them in ways that deepen the understanding of the interaction and communication between humans and computers. When appropriate data is available, NLP makes it possible to determine not only the sentiment of a text but also information about the author behind an online post. Previous studies show that NLP can potentially go deeper into such subjective information, enabling author classification from text data. This thesis addresses the lack of demographic insights into online user data by studying language use in texts. It compares four popular yet diverse machine learning algorithms for gender and age segmentation. During the project, the age analysis was abandoned due to insufficient data. The online texts were analysed and quantified into 118 parameters based on linguistic differences. Using supervised learning, the gender was correctly predicted in 82% of the cases when analysing data from English online users. It is important to note that the training and test data may be correlated. Language is complex and, in this case, the more complex methods, SVM and neural networks, performed better than the less complex Naive Bayes and logistic regression.
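For readers who want a concrete picture of the kind of comparison this abstract describes, the following is a minimal, hypothetical sketch using scikit-learn: it trains the four model families mentioned above (Naive Bayes, logistic regression, SVM, and a small neural network) on a synthetic matrix of numeric linguistic features and compares their accuracy. The feature matrix, labels, and hyperparameters are placeholders, not the thesis's actual setup.

```python
# Hypothetical comparison of the four model families on synthetic
# "linguistic feature" vectors; not the thesis's actual features or data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_samples, n_features = 2000, 118          # 118 linguistic parameters, as in the abstract
X = rng.normal(size=(n_samples, n_features))
# Synthetic binary gender labels with some signal in the first ten features
y = (X[:, :10].sum(axis=1) + rng.normal(scale=2.0, size=n_samples) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "Neural network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")
```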
222

Automatic Dispatching of Issues using Machine Learning / Automatisk fördelning av ärenden genom maskininlärning

Bengtsson, Fredrik, Combler, Adam January 2019 (has links)
Many software companies use issue tracking systems to organize their work. However, when working on large projects across multiple teams, the problem arises of finding the correct team to solve a certain issue. One team might detect a problem that must be solved by another team. Finding the correct team takes time from employees, so automating the dispatching of these issues can bring large benefits to the company. In this thesis, machine learning methods, mainly convolutional neural networks (CNNs) for text classification, are applied to this problem. In natural language processing, both word- and character-level representations are commonly used. The results in this thesis suggest that the CNN learns different information depending on whether a word- or character-level representation is used. Furthermore, it was concluded that the CNN models performed at a similar level to the classical Support Vector Machine on this task. Compared to a human expert who works with dispatching issues, the best CNN model performed at a similar level when given the same information. The high throughput of a computer model therefore suggests that automating this task is very much possible.
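To illustrate the word- versus character-level distinction discussed in this abstract, the following hypothetical Keras sketch defines the same small 1D-CNN text classifier twice, once over a word vocabulary and once over a character vocabulary. The vocabulary sizes, sequence lengths, and number of teams are invented placeholders; the thesis's actual architectures are not reproduced here.

```python
# Illustrative 1D-CNN text classifier; vocabulary sizes and lengths are placeholders.
import tensorflow as tf

def build_cnn(vocab_size: int, seq_len: int, n_teams: int) -> tf.keras.Model:
    """Small CNN for text classification over token indices (words or characters)."""
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
    x = tf.keras.layers.Embedding(vocab_size, 128)(inputs)
    x = tf.keras.layers.Conv1D(filters=256, kernel_size=5, activation="relu")(x)
    x = tf.keras.layers.GlobalMaxPooling1D()(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    outputs = tf.keras.layers.Dense(n_teams, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Word-level: large vocabulary, shorter sequences.
word_model = build_cnn(vocab_size=30_000, seq_len=200, n_teams=12)
# Character-level: tiny vocabulary, longer sequences.
char_model = build_cnn(vocab_size=100, seq_len=1_000, n_teams=12)
word_model.summary()
char_model.summary()
```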
223

Design und Implementierung eines Algorithmus zum maschinellen Lernen der Flexion eines Korpus deutscher Sprache / Design and Implementation of an Algorithm for Machine Learning of the Inflection of a German-Language Corpus

Moritz, Julian 20 October 2017 (has links)
This thesis describes the design and implementation of an algorithm for inflection. A concrete implementation is developed using German as an example. To this end, a detailed analysis of German inflection is carried out first, before a method is developed that is language-independent and can therefore, in principle, be transferred to other languages. The practical feasibility of the method is demonstrated with examples. However, the high complexity of the task means that, in practice, the quality of the inflected word forms suffers. This is particularly the case because the developed system also inflects base forms that are unknown to it.
224

Automated Image Suggestions for News Articles : An Evaluation of Text and Image Representations in an Image Retrieval System / Automatiska bildförslag till nyhetsartiklar

Svensson, Pontus January 2020 (has links)
Multimodal machine learning is a subfield of machine learning that aims to relate data from different modalities, such as texts and images. One of the many applications that could be built upon this technique is an image retrieval system that, given a text query, retrieves suitable images from a database. In this thesis, a retrieval system based on canonical correlation analysis is used to suggest images for news articles. Different dense text representations produced by Word2vec and Doc2vec, and image representations produced by pre-trained convolutional neural networks, are explored to find out how they affect the suggestions. The study also examines which part of an article is best suited as a query to the system, and experiments are carried out to determine whether an article's date of publication can be used to improve the suggestions. The results show that Word2vec outperforms Doc2vec in the task, which indicates that the meaning of article texts is not as important as the individual words they consist of. Furthermore, the queries are improved by rewarding words that are particularly significant.
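To make the retrieval idea more concrete, here is a small, hypothetical sketch using scikit-learn's CCA: paired text and image vectors (stand-ins for Word2vec/Doc2vec embeddings and pre-trained CNN features) are projected into a shared space, and database images are ranked for a text query by cosine similarity. The dimensions and the random data are placeholders, not the thesis's setup.

```python
# Hypothetical CCA-based image retrieval; random vectors stand in for real
# Word2vec/Doc2vec text embeddings and CNN image features.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_pairs = 500
text_vecs = rng.normal(size=(n_pairs, 300))   # e.g. averaged Word2vec vectors
image_vecs = rng.normal(size=(n_pairs, 512))  # e.g. pooled CNN features

# Learn a shared space from article-image pairs.
cca = CCA(n_components=32, max_iter=1000)
cca.fit(text_vecs, image_vecs)
text_proj, image_proj = cca.transform(text_vecs, image_vecs)

def suggest_images(query_vec: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k database images for a text query vector."""
    q = cca.transform(query_vec.reshape(1, -1))[0]
    db = image_proj / np.linalg.norm(image_proj, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)
    scores = db @ q
    return np.argsort(-scores)[:top_k]

print(suggest_images(text_vecs[0]))
```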
225

Evaluation of text classification techniques for log file classification / Utvärdering av textklassificeringstekniker för klassificering av loggfiler

Olin, Per January 2020 (has links)
System log files are filled with logged events, status codes, and other messages. By analyzing the log files, the system's current state can be determined and it can be established whether something went wrong during execution. Log file analysis has been studied for some time, and recent studies have shown state-of-the-art performance using machine learning techniques. In this thesis, document classification solutions were tested on log files in order to distinguish regular system runs from abnormal system runs. To solve this task, supervised and unsupervised learning methods were combined: Doc2Vec was used to extract document features, and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based architectures were applied to the classification task. With these models and preprocessing techniques, the tested models yielded an F1-score and accuracy above 95% when classifying log files.
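The feature-extraction step described above can be sketched with gensim's Doc2Vec. In this hypothetical snippet, tokenized log files are embedded as fixed-length vectors that could then be fed to a downstream classifier; for brevity a logistic regression stands in for the CNN/LSTM architectures the thesis actually evaluates, and the toy logs and labels are invented.

```python
# Hypothetical Doc2Vec feature extraction for log files; the toy logs and the
# logistic-regression classifier are stand-ins, not the thesis's CNN/LSTM setup.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

raw_logs = [
    ("service started ok heartbeat ok shutdown clean", 0),        # normal run
    ("service started ok request handled ok shutdown clean", 0),  # normal run
    ("service started error timeout retry error stacktrace", 1),  # abnormal run
    ("service started ok error disk full abort", 1),              # abnormal run
]
docs = [TaggedDocument(words=text.split(), tags=[i])
        for i, (text, _) in enumerate(raw_logs)]

# Train a small Doc2Vec model on the log corpus.
d2v = Doc2Vec(vector_size=32, min_count=1, epochs=50, seed=0)
d2v.build_vocab(docs)
d2v.train(docs, total_examples=d2v.corpus_count, epochs=d2v.epochs)

X = [d2v.infer_vector(text.split()) for text, _ in raw_logs]
y = [label for _, label in raw_logs]

clf = LogisticRegression().fit(X, y)
print(clf.predict([d2v.infer_vector("service started error timeout abort".split())]))
```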
226

A Natural Language Interface for Querying Linked Data

Akrin, Christoffer, Tham, Simon January 2020 (has links)
The thesis introduces a proof-of-concept system that could be of interest to many industries: a remote Natural Language Interface (NLI) for querying Knowledge Bases (KBs). The system applies natural language processing tools provided by Stanford CoreNLP and queries KBs using the query language SPARQL. Natural Language Processing (NLP) is used to analyze the semantics of a question written in natural language and to generate relational information about the question. With correctly defined relations, the question can be run as a query against KBs containing relevant Linked Data. The Linked Data follows the Resource Description Framework (RDF) model, expressing relations in the form of semantic triples: subject-predicate-object. With this NLI, a KB can be understood semantically: given correct training data, the system can learn the semantics of the RDF data stored in the KB. This makes it possible to extract relational information from questions about the KB, translate the questions to SPARQL, and run them as queries against the KB.
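The final step of the pipeline, running a SPARQL query against RDF triples, can be illustrated with rdflib. The tiny graph and the query below are invented examples that mirror the subject-predicate-object structure described above; they are not taken from the thesis's knowledge base.

```python
# Hypothetical end of the pipeline: a generated SPARQL query run against a tiny
# RDF graph with rdflib. The data and query are invented examples.
from rdflib import Graph

turtle_data = """
@prefix ex: <http://example.org/> .
ex:Alice ex:wrote ex:Report1 .
ex:Bob   ex:wrote ex:Report2 .
ex:Report1 ex:title "Quarterly results" .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

# A query an NLI might generate for the question
# "Who wrote the report titled 'Quarterly results'?"
sparql = """
PREFIX ex: <http://example.org/>
SELECT ?author WHERE {
    ?author ex:wrote ?doc .
    ?doc ex:title "Quarterly results" .
}
"""
for row in g.query(sparql):
    print(row.author)
```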
227

Parafrasidentifiering med maskinklassificerad data : utvärdering av olika metoder / Paraphrase identification with computer classified paraphrases : An evaluation of different methods

Johansson, Oskar January 2020 (has links)
This work investigates how the language model BERT and a MaLSTM architecture perform at identifying paraphrases from the 'Microsoft Paraphrase Research Corpus' (MPRC) when trained on automatically identified paraphrases from the 'Paraphrase Database' (PPDB). The methods are compared against each other to see which performs best, and the approach of training on machine-classified data for use on human-classified data is evaluated in relation to other classifications of the same dataset. The sentence pairs used to train the models are taken from the highest-ranked paraphrases in PPDB and from a generation method that creates non-paraphrases from the same dataset. The results show that BERT is capable of identifying some paraphrases from MPRC, while the MaLSTM architecture did not manage this, despite being able to distinguish paraphrases from non-paraphrases during training. Both BERT and MaLSTM performed worse at identifying paraphrases from MPRC than models such as StructBERT, which has been trained and evaluated on the same dataset. Reasons why MaLSTM fails at the task are discussed; above all, the sentences in the non-paraphrase pairs of the training data are too dissimilar from each other compared to how the pairs look in MPRC. Finally, the importance of further research on how machine-generated paraphrases can be used in paraphrase-related research is discussed.
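As an illustration of the BERT setup described above, the following hypothetical sketch uses the Hugging Face transformers library to score a sentence pair as paraphrase or non-paraphrase. Note that the generic bert-base-uncased checkpoint loaded here has a randomly initialized classification head; in a setup like the thesis's, the model would first be fine-tuned on the machine-classified PPDB pairs before being evaluated.

```python
# Hypothetical BERT sentence-pair classifier; the classification head is untrained
# here and would need fine-tuning on PPDB-derived pairs to give meaningful scores.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = non-paraphrase, 1 = paraphrase

sent_a = "The company reported record profits this quarter."
sent_b = "Record profits were reported by the company this quarter."

inputs = tokenizer(sent_a, sent_b, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # meaningless until the head is fine-tuned on labelled pairs
```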
228

Value Creation From User Generated Content for Smart Tourism Destinations

Celen, Mustafa, Rojas, Maximiliano January 2020 (has links)
This paper aims to show how User Generated Content (UGC) can create value for Smart Tourism Destinations (STDs). The analysis is applied to five different cases in the Stockholm region to derive patterns and opportunities of value creation generated by UGC in tourism. The findings are also discussed in terms of improved decision making, possibilities for new business models, and the importance of technological improvements for STDs. Finally, thoughts on models are presented for researchers and practitioners who might be interested in exploiting UGC in information-intensive industries, mainly tourism.
229

Predicting Political Party Affiliation in the Swedish Parliament using Natural Language Processing

Zetterberg, Johannes January 2022 (has links)
Text classification is a fundamental part of natural language processing. In this thesis, text classification methods are used in an attempt to predict the political party affiliation of members of parliament (MPs). The objective is to evaluate the performance of Support Vector Machines (SVM), naive Bayes, and a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model in predicting MPs' political party affiliation based on speeches given in the Chamber of the Swedish Parliament. This study shows that BERT outperforms SVM and naive Bayes at correctly classifying MPs, and that SVM makes better predictions than naive Bayes and performs reasonably well compared to BERT. The results show that all models predict MPs representing the Sweden Democrats correctly to the highest degree. Both BERT and SVM classify roughly every other speech correctly, which is much better than random predictions. These results indicate the potential of such methods for automatically classifying political speeches.
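A minimal, hypothetical version of the classical baselines in this comparison can be written as scikit-learn pipelines: TF-IDF features feeding a linear SVM and a naive Bayes classifier. The toy speeches and party labels below are invented placeholders, and the fine-tuned BERT model is not sketched here.

```python
# Hypothetical TF-IDF baselines for party classification; the speeches and labels
# are invented placeholders, and the BERT model from the thesis is not shown.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB

speeches = [
    "We must lower taxes and support small businesses.",
    "Investments in welfare and public healthcare must increase.",
    "Stricter migration policy is necessary for our country.",
    "Climate action and green jobs should be our priority.",
]
parties = ["A", "B", "C", "D"]  # placeholder party labels

svm_clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(speeches, parties)
nb_clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(speeches, parties)

test = ["We propose tax cuts for small businesses."]
print("SVM:", svm_clf.predict(test))
print("Naive Bayes:", nb_clf.predict(test))
```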
230

Object Classification using Language Models

From, Gustav January 2022 (has links)
In today's digital world, ever more emails and messages must be sent, processed, and handled. Categorizing and classifying these texts can take a very long time and costs companies both time and money. If the classification could be done automatically by a computer, based on the content of the text or message, it would be a major gain for Easit AB and its customers. To facilitate the task of text classification, Easit needs a solution made up of one language model and one classifier model. The language model converts raw text to a vector that represents the text, and the classifier determines which predefined labels fit that vector. The end goal is not to create the best possible solution, but to build a general understanding of different language and classifier models and of how to design a system that is both fast and accurate. BERT was the primary language model during evaluation, but Doc2Vec and one-hot encoding were also tested. The classifiers consisted of boundary-condition models or dense neural networks, all trained without knowledge of which language model the text vectors came from. The validation accuracy reported for the IMDB comment dataset with BERT was between 75% and 94%, depending mostly on the language model rather than the classifier. Dense neural networks are the best fit as classifiers, mainly because of their scalability to multiple labels. The knowledge from the work resulted in a recommendation to Easit for an alternative-based system solution.
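The two-part architecture described above, a language model that produces a text vector and an independent classifier on top, can be sketched as follows. This hypothetical snippet uses the [CLS] embedding from a frozen bert-base-uncased model as the text vector and a small dense network as the classifier; it is not Easit's actual system, and the texts and labels are invented.

```python
# Hypothetical two-part solution: a frozen BERT model produces text vectors and a
# separate dense classifier maps vectors to labels. Texts and labels are invented.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.neural_network import MLPClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()

def embed(texts):
    """Return one [CLS] vector per text from the frozen language model."""
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0, :].numpy()  # [CLS] token embeddings

texts = ["Please reset my password.", "Invoice for order 1234 attached.",
         "The login page shows an error.", "Payment reminder for last month."]
labels = ["support", "billing", "support", "billing"]

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(embed(texts), labels)
print(clf.predict(embed(["I cannot log in to my account."])))
```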
