561. Extracting Customer Sentiments from Email Support Tickets: A case for email support ticket prioritisation
Fiati-Kumasenu, Albert. January 2019
Background: Every day, companies generate enormous numbers of customer support tickets. These are grouped by certain characteristics into specialised queues, from which customer support personnel (CSP) resolve them on a first-in-first-out basis. Because tickets differ in urgency, a logical next step towards making CSPs more effective is to prioritise tickets according to business policies. One of the heuristics that can be used to prioritise tickets is sentiment polarity. Objectives: This study investigates how machine learning methods and natural language processing techniques can be leveraged to automatically predict the sentiment polarity of customer support tickets. Methods: Using a formal experiment, the study examines how well Support Vector Machine (SVM), Naive Bayes (NB) and Logistic Regression (LR) sentiment polarity models built for product and movie reviews can make sentiment predictions on email support tickets. Because few annotated email support tickets are available, the Valence Aware Dictionary and sEntiment Reasoner (VADER) and a cluster ensemble (combining k-means, affinity propagation and spectral clustering) are also investigated for sentiment polarity prediction. Results: SVM performs better than NB and LR, with an average F1-score of .71, whereas NB scores lowest with .62. Combined with the presence vector, SVM outperformed the frequency and TF-IDF vectors with an F1-score of .73, while NB records an F1-score of .63. With an average F1-score of .23, the models transferred from movie and product reviews performed inadequately, even compared with a dummy classifier averaging .55. Finally, the cluster ensemble method outperformed VADER, with F1-scores of .61 and .53 respectively.
Conclusions: Given the results, SVM combined with a presence vector of bigrams and trigrams is a candidate solution for extracting sentiments from email support tickets. Additionally, transferring sentiment models from the movie and product review domain to email support tickets is not viable. Finally, given the limited data available for sentiment analysis studies in Swedish and in the customer support context, a cluster ensemble is recommended as a sample selection method for generating annotated data.
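The abstract contrasts presence, frequency and TF-IDF vectors over bigrams and trigrams. The sketch below, using only the Python standard library, shows how the three representations differ for the same document. The lowercase whitespace tokenisation and the unsmoothed IDF formula are illustrative assumptions, not the thesis's actual pre-processing; any of these vectors could then be passed to an SVM, NB or LR classifier.

```python
import math
from collections import Counter

def ngrams(text, ns=(2, 3)):
    """Word n-grams (bigrams and trigrams by default) from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for n in ns for i in range(len(tokens) - n + 1)]

def build_vocab(docs):
    # Sorted so that each n-gram always maps to the same vector position.
    return sorted({g for d in docs for g in ngrams(d)})

def frequency_vector(doc, vocab):
    # How many times each n-gram occurs in the document.
    counts = Counter(ngrams(doc))
    return [counts[g] for g in vocab]

def presence_vector(doc, vocab):
    # Binary: 1 if the n-gram occurs at least once, otherwise 0.
    present = set(ngrams(doc))
    return [1 if g in present else 0 for g in vocab]

def tfidf_vectors(docs, vocab):
    # Term frequency weighted by an unsmoothed inverse document frequency log(N / df).
    df = Counter(g for d in docs for g in set(ngrams(d)))
    idf = {g: math.log(len(docs) / df[g]) for g in vocab}
    return [[tf * idf[g] for tf, g in zip(frequency_vector(d, vocab), vocab)] for d in docs]
```

For the hypothetical tickets `["thank you so much", "thank you thank you", "still not working"]`, the bigram "thank you" has the value 2 in the second ticket's frequency vector but only 1 in its presence vector, which is the distinction the presence representation relies on.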
562. Intelligent chatbot assistant: A study of Natural Language Processing and Artificial Intelligence
Lerjebo, Linus; Hägglund, Johannes. January 2020
Research and development in Artificial Intelligence have surged in recent years, including in the medical field. Despite the new technology and tools available, medical staff still carry a heavy workload. The goal of this thesis is to analyze the possibilities of a chatbot that assists medical staff and improves patient safety by ensuring that patients are monitored. Using technologies such as Artificial Intelligence, Natural Language Processing and Voice over Internet Protocol (VoIP), the chatbot can communicate with patients through calls. It works as an assistant to the staff and passes the information gathered from the calls on to medical personnel. Because the answers from each call are already available, staff no longer need to ask routine questions every time and can provide help more quickly. The chatbot is administered through a web application where administrators can initiate calls and add patients to the database.
563. Automation of support service using Natural Language Processing: Automation of errands tagging
Haglund, Kristoffer. January 2020
In this paper, Natural Language Processing and classification algorithms were used to create a program that automatically tags errands submitted to the support service of Fortnox (an IT company based in Växjö). Controlled experiments were conducted to find the best classification algorithm, together with the Bag-of-Words pre-processing algorithms best suited to this problem. All data were provided by Fortnox, had been manually labeled with tags, and served as training and test data. Using all original data, the final algorithm correctly predicted 69.15% of errands. Inspection of the incorrectly predicted errands revealed a pattern: many errands had identical text attached to them. After most of these errands were removed, accuracy increased to 94.08%.
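The abstract attributes much of the error to errands that share identical text. A minimal sketch of such a clean-up step follows, assuming errands are `(text, tag)` pairs and that "identical" means equal after lowercasing and collapsing whitespace; the thesis does not state its exact criterion, and the helper names are hypothetical.

```python
from collections import Counter

def normalise(text):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())

def deduplicate(errands, max_copies=1):
    """Keep at most `max_copies` errands per identical (normalised) text, preserving order.

    `errands` is a list of (text, tag) pairs.
    """
    seen = Counter()
    kept = []
    for text, tag in errands:
        key = normalise(text)
        if seen[key] < max_copies:
            kept.append((text, tag))
        seen[key] += 1
    return kept

def duplicate_share(errands):
    """Fraction of errands whose normalised text occurs more than once in the data set."""
    counts = Counter(normalise(t) for t, _ in errands)
    return sum(1 for t, _ in errands if counts[normalise(t)] > 1) / len(errands)
```

Measuring `duplicate_share` before and after `deduplicate` gives a quick indication of how much of a data set consists of repeated text.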
564. Building a Medical Recommendation System: A case study on digitalizing evidence-based radiology
Persson, Fabian. January 2020
In this thesis, we show how a text-based recommendation system can benefit greatly from neural statistical language models, in particular BERT. We evaluate the framework on a digital, collaborative platform for radiologists by automatically suggesting scientific papers from the medical database PubMed as evidence in diagnostic radiology. The models use contextualized vectors to represent text, accounting for writing style, misspellings and jargon. By pre-computing representations of text passages, we can use compute-heavy statistical language models in production environments where supercomputers are not available during inference. The results suggest that pre-computed embeddings are very effective when the texts come from the same domain, and less effective (but still useful) at capturing the interaction between clinical and scientific text. Nonetheless, the suggested solutions hold promise in this and other areas of medicine, and the results may transfer to other domains, such as legal document processing and patent search.
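The key production idea in the abstract is that passage representations are computed ahead of time, so at request time only the query needs to be encoded and compared. The sketch below illustrates that split under stated assumptions: toy vectors stand in for BERT embeddings (no model is loaded), cosine similarity is assumed as the ranking score, and the class name and paper ids are hypothetical.

```python
import math

def unit(v):
    """Scale a non-zero vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

class PrecomputedIndex:
    """Holds passage vectors computed offline, e.g. by an expensive language model."""

    def __init__(self, passages):
        # passages: dict mapping a paper id to its pre-computed embedding vector
        self.ids = list(passages)
        self.vectors = [unit(passages[i]) for i in self.ids]

    def recommend(self, query_vector, k=3):
        """Return the k (paper id, cosine similarity) pairs most similar to the query."""
        q = unit(query_vector)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        ranked = sorted(zip(self.ids, scores), key=lambda pair: pair[1], reverse=True)
        return ranked[:k]
```

Because the stored vectors are already normalised, each request costs one query encoding plus one dot product per paper, which is what makes heavy encoders usable without dedicated inference hardware.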
565. Does Language Drive the Crowd? Case of Czech Reward-Based Crowdfunding
Hudcová, Tereza. January 2020
This thesis analyses the largest reward-based crowdfunding platform in the Czech Republic by applying textual tools to uniquely collected microdata. The research question asks which attributes of project campaigns (including the language style of project descriptions) have a significant impact on successful funding. The empirical analysis combines Bayesian Model Averaging and logistic regression. The results reveal, first, that the language style of project descriptions has no significant predictive power. Second, the use of a video, the size of the pledging goal and the number of contributors have a significant effect on a campaign's success, in line with the current literature. Third, project categorization also plays an important role. However, the findings do not support causal claims, such as whether these factors persuade contributors to donate money. JEL Classification: G23. Keywords: crowdfunding, rewards, success determinants, natural language processing. Author's e-mail: t.hudcova@gmail.com. Supervisor's e-mail: polakpet@gmail.com
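Bayesian Model Averaging weighs many candidate regressions instead of choosing one. A common approximation, sketched below, gives each model a weight proportional to exp(-BIC/2) under equal prior model probabilities, and summarises the evidence for a variable as its posterior inclusion probability. The abstract does not state which priors or software the thesis used, so this is an illustrative sketch only; the model sets, variable names and BIC values in the usage are hypothetical.

```python
import math

def bma_weights(bics):
    """Approximate posterior model probabilities from BIC values, assuming equal model priors.

    `bics` maps a model (a frozenset of variable names) to its BIC.
    """
    best = min(bics.values())
    # Subtracting the best BIC before exponentiating avoids underflow for large BICs.
    raw = {model: math.exp(-(bic - best) / 2) for model, bic in bics.items()}
    total = sum(raw.values())
    return {model: r / total for model, r in raw.items()}

def inclusion_probability(weights, variable):
    """Posterior inclusion probability: total weight of the models that contain `variable`."""
    return sum(w for model, w in weights.items() if variable in model)

# Hypothetical example: two models with a video indicator fit equally well,
# while a model with only a language-style feature fits worse.
example = bma_weights({
    frozenset({"video"}): 100.0,
    frozenset({"video", "goal"}): 100.0,
    frozenset({"style"}): 110.0,
})
```

In this toy case almost all posterior weight falls on models containing `"video"`, which is how BMA flags a variable as a robust determinant across specifications.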
566. Exploratory Search Using Vector Model and Linked Data
Yim, Daeun (9143660). 30 July 2020
The way people acquire knowledge has largely shifted from print to web resources, and search has become the main means of accessing information. Among various search behaviors, exploratory search represents a learning process that involves complex cognitive activities and knowledge acquisition. Research on exploratory search studies how search systems can help people seek information and develop intellectual skills. This research focuses on information retrieval and aims to build an exploratory search system with better clustering performance and more diverse search results. In this study, a new language model that integrates a state-of-the-art vector language model (BERT) with human knowledge from linked data is built to better understand and organize search results. The clustering performance of the new model (RDF+BERT) was similar to that of the original model, but a slight improvement was observed on conversational texts compared with the pre-trained language model and an exploratory search baseline. With an added enrichment phase that expands search results to related documents, the new system can also display more diverse search results.
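The abstract evaluates how well search results are clustered once they are represented as vectors, but it does not name the clustering algorithm. As an illustrative choice only, the sketch below applies plain k-means (Lloyd's algorithm) to result vectors, assuming each result has already been embedded, for example with BERT or RDF+BERT; the deterministic initialisation from the first k vectors is a simplification for reproducibility.

```python
def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; the first k vectors are used as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of the vectors assigned to it.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids
```

Each resulting cluster label can then be shown as a group of related results, which is the organising role clustering plays in exploratory search.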
567. A Document Recommender Based on Word Embedding
He, Binlai. January 2015
With the rapid development of information technology, text is no longer kept only on paper but also in digital form, distributed across the internet. The massive amount of information online offers many options, yet it also makes it hard to find the specific information we actually need. Media monitoring aims to change this situation and help solve the problem. Meltwater Group, a media monitoring company, provides enterprises with a service that tracks and sorts information to help them achieve business goals, such as finding the best time or place for a business campaign and keeping track of competitors. Meltwater has a recommender system: when a query is searched, the matching documents retrieved from the database are presented. The problem is that some of these documents have turned out to be misclassified, so the rate of correct recommendations is not very high. To address this and improve search, this paper introduces a new algorithm based on word embeddings and user supervision. The paper first describes Meltwater Group and the existing framework of its recommender system. It then explores background methods, including LSA (Latent Semantic Analysis), Random Indexing and Word2vec, together with supporting tools such as t-SNE, k-means clustering and hierarchical clustering. Next, the data sets used in the paper are described in detail, including how the data were processed. The algorithm is described step by step in the middle of the paper, followed by its evaluation.
The algorithm is evaluated on several different data sets, using the confusion matrix as the means of measurement. Finally, the paper summarizes the method and offers suggestions for future work.
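The evaluation relies on a confusion matrix. The sketch below shows how such a matrix and the precision, recall and accuracy derived from it can be computed, assuming each retrieved document is labeled either "relevant" or "irrelevant"; these labels and the helper names are hypothetical, since the abstract does not list the classes used.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels and columns are predicted labels, both in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def precision_recall(matrix, labels, positive):
    """Precision and recall for the class `positive`, read off the confusion matrix."""
    i = labels.index(positive)
    true_pos = matrix[i][i]
    predicted_pos = sum(row[i] for row in matrix)  # column total
    actual_pos = sum(matrix[i])                    # row total
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

def accuracy(matrix):
    """Share of all items on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total
```

Reading precision from the "relevant" column shows directly how many recommended documents were misclassified, which is the problem the abstract describes.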
568. Are Open-Source Systems Tested Enough? An analysis of open-source unit testing practices
Mainali, Vikrant. January 2022
It is of utmost importance for software developers to build products that work for their end users. One way to ensure this is to maintain continuous unit tests for the product. However, many developers consider software testing an unimportant part of the iteration process. The extent to which an open-source system has been tested can reveal current testing practices, or the lack thereof. The results could help future developers and product owners improve the quality of their software and identify bugs early in the iteration process. Previous studies have shown a tendency among developers to avoid testing their products, and this research helps determine whether that is the case. We created a tool that extracts the method names, class names, method bodies, test classes and test methods from a project in order to analyze and show how thoroughly it is tested. The results of this project agree with previous studies that developers tend to avoid testing, which can lead to many problems.
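The tool described above extracts method names, class names, test classes and test methods. The following is a rough sketch of that kind of extraction under several assumptions not found in the abstract: the analyzed projects are Java with JUnit-style `@Test` annotations, regular expressions stand in for a real parser, and a production method counts as "tested" if its name appears as a `.name(` call in test code. The patterns and the `analyse` helper are hypothetical and will miss many real-world cases.

```python
import re

# Heuristic patterns for Java-like source; a production tool would use a proper parser.
METHOD_RE = re.compile(r"\b(?:public|protected|private)\s+(?:static\s+)?[\w<>\[\]]+\s+(\w+)\s*\(")
CLASS_RE = re.compile(r"\bclass\s+(\w+)")
TEST_METHOD_RE = re.compile(r"@Test\s+(?:public\s+)?void\s+(\w+)\s*\(")
CALL_RE = re.compile(r"\.(\w+)\s*\(")

def analyse(production_src, test_src):
    """Summarise which production methods are called from test code."""
    methods = set(METHOD_RE.findall(production_src))
    called_from_tests = set(CALL_RE.findall(test_src))
    tested = methods & called_from_tests
    return {
        "classes": CLASS_RE.findall(production_src),
        "methods": sorted(methods),
        "test_classes": CLASS_RE.findall(test_src),
        "test_methods": TEST_METHOD_RE.findall(test_src),
        "untested": sorted(methods - tested),
        "tested_ratio": len(tested) / len(methods) if methods else 0.0,
    }
```

Calls made without a receiver (for example static imports or calls within the same class) are not detected by `CALL_RE`, so the ratio is a lower bound rather than a measured coverage figure.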
569. Fine-Tune a Language Model for Text Summarization (BERTSUM) on EDGAR-CORPUS
Niu, Yijie. January 2022
Financial reports contain much useful information for investors, but extracting it is time-consuming. We consider text summarization a feasible method for this task. In this thesis, we implement BERTSUM, a state-of-the-art language model for text summarization, and evaluate the results using ROUGE metrics. The experiment was carried out on EDGAR-CORPUS, a novel, large-scale financial dataset. The BERTSUM variant with a Transformer layer achieves the best performance, with a ROUGE-L F1 score of 9.26%. We also hand-picked some model-generated summaries that contained common errors and investigated their causes. The results were then compared with previous research. The ROUGE-L F1 score reported in the previous study was much higher than ours; we believe this is due to the length of the financial reports.
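ROUGE-L scores a generated summary by the longest common subsequence (LCS) of tokens it shares with a reference summary. The sketch below computes a sentence-level ROUGE-L F1 (β = 1) with simple whitespace tokenisation. Official ROUGE implementations add their own tokenisation, optional stemming and a summary-level union-LCS variant, so scores from this sketch will not match reported figures such as the 9.26% above.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match; otherwise carry the best neighbouring value.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 on lowercased whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For the candidate "the company reported revenue growth" against the reference "the company reported higher revenue this year", the LCS has four tokens, giving precision 4/5, recall 4/7 and an F1 of 2/3. Long reference texts lower recall sharply, which is consistent with the abstract's explanation based on report length.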
570. Will Svenska Akademiens Ordlista Improve Swedish Word Embeddings?
Ahlberg, Ellen. January 2022
Unsupervised word embedding methods are frequently used in natural language processing applications. However, these methods overlook known lexical relations that could help capture accurate semantic relations between words. This thesis explores whether Swedish word embeddings can benefit from prior linguistic knowledge. Four knowledge graphs extracted from Svenska Akademiens ordlista (SAOL) are incorporated into training using the Probabilistic Word Embeddings with Laplacian Priors (PELP) model. The four resulting PELP models are compared with baseline results to evaluate the use of this side information. The results suggest that several lexical relations in SAOL are useful for generating more accurate Swedish word embeddings.
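A Laplacian prior uses a knowledge graph to pull the embeddings of linked words towards each other. The sketch below shows the standard graph-Laplacian penalty: the sum of squared distances between the vectors of linked words, which equals trace(Eᵀ L E) for the Laplacian L = D − A. The exact scaling in PELP, and whether the penalty is combined with other prior terms, are not given in the abstract, so this only illustrates the general idea. The tiny graph and vectors are hypothetical stand-ins for SAOL relations.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted graph with n nodes."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1
        L[j][i] -= 1
        L[i][i] += 1
        L[j][j] += 1
    return L

def edge_penalty(emb, edges, lam=1.0):
    """lam times the summed squared Euclidean distance between vectors of linked words."""
    return lam * sum(sum((a - b) ** 2 for a, b in zip(emb[i], emb[j])) for i, j in edges)

def trace_penalty(emb, L, lam=1.0):
    """The same quantity written as lam * trace(E^T L E), computed one dimension at a time."""
    n, dim = len(emb), len(emb[0])
    return lam * sum(
        emb[i][d] * L[i][j] * emb[j][d] for d in range(dim) for i in range(n) for j in range(n)
    )
```

Adding this penalty to the training loss (or, equivalently, using it as a negative log-prior up to a constant) makes embeddings of related dictionary entries cheaper to keep close together, which is how side information from SAOL can shape the learned vectors.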