Global ETD Search

211	Análise de sentimentos em tíquetes para o suporte de TI / Sentiment Analysis in Tickets for IT Support
Blaz, Cássio Castaldi Araújo (January 2017)
Sentiment analysis (opinion mining) has been adopted in software engineering for problems such as software usability and the sentiment of developers in projects. This work proposes methods to evaluate the sentiment contained in tickets opened with IT (information technology) support. IT tickets are broad in coverage (e.g. infrastructure, software) and involve errors, incidents, requests, etc. The main challenge is to automatically distinguish the need itself, which is intrinsically negative (e.g. an error description), from the sentiment embedded in the description. Our approach automatically creates a domain dictionary containing terms that express sentiment in the IT context, which is used to filter the terms in a ticket for sentiment analysis. We created and evaluated three classification methods for calculating the polarity of tickets. Our study used 34,895 tickets from five organizations. For polarity, we randomly selected 2,333 tickets to compose a gold standard. Our best results show an average precision and recall of 82.83% and 88.42%, respectively, which outperforms the sentiment analysis solutions we compared against. In addition, emotions in tickets were studied using the Ekman and VAD models. One of the three classification methods was adapted to also identify emotions in tickets. Possible correlations between polarity and emotions were examined through association rules. The results correlate positive tickets with high valence and dominance and low arousal, along with the presence of joy and surprise and the absence of fear. Negative tickets correlate with neutral valence, arousal, and dominance, along with the absence of joy and the presence of fear. However, the results for negative polarity are not accurate.
Keywords: Data Mining; Information Technology; Sentiment Analysis; Domain Dictionary; IT Tickets; Opinion Mining
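The dictionary-filtering idea described in entry 211 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the dictionary entries, their weights, and the function name `ticket_polarity` are hypothetical, and the thesis builds its dictionary automatically rather than by hand. The point is only that factual words such as "error" never enter the score, so a plain problem report is not classified as negative.

```python
import re

# Hypothetical domain dictionary (term -> polarity weight), invented for this
# sketch; the thesis derives its dictionary automatically from ticket data.
DOMAIN_DICTIONARY = {
    "thanks": 1.0,
    "great": 1.0,
    "appreciate": 1.0,
    "unacceptable": -1.0,
    "frustrated": -1.0,
    "again": -0.5,
}


def ticket_polarity(text, dictionary=DOMAIN_DICTIONARY):
    """Classify a ticket using only the tokens found in the domain dictionary."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Factual words such as "error" or "crash" are absent from the dictionary,
    # so they are filtered out and cannot push the score negative.
    score = sum(dictionary[t] for t in tokens if t in dictionary)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy dictionary, a purely factual ticket such as "The server shows an error on login." is scored as neutral, while "Printer error again, this is unacceptable!" is scored as negative.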
212	Domain-specific lexicon generation for emotion detection from text
Bandhakavi, Anil (January 2018)
Emotions play a key role in effective and successful human communication. Text is widely used on the internet and social media websites to express and share emotions, feelings and sentiments. However, applications and services built to understand emotions from text are limited in effectiveness because they rely on general-purpose emotion lexicons that have a static vocabulary and on sentiment lexicons that can only interpret emotions coarsely. Emotion detection from text therefore calls for methods and knowledge resources that can deal with challenges such as dynamic and informal vocabulary, domain-level variations in emotional expression and other linguistic nuances. In this thesis we demonstrate how labelled (e.g. blogs, news headlines) and weakly labelled (e.g. tweets) emotional documents can be harnessed to learn word-emotion lexicons that account for dynamic and domain-specific emotional vocabulary. First, we model the characteristics of real-world emotional documents to propose a generative mixture model, which iteratively estimates the language models that best describe the emotional documents using expectation maximization (EM). The proposed mixture model can represent both emotionally charged words and emotion-neutral words. We then generate a word-emotion lexicon using the mixture model to quantify word-emotion associations in the form of probability vectors. Secondly, we introduce novel feature extraction methods to exploit the emotion-rich knowledge captured by our word-emotion lexicon. The extracted features are used to classify text into emotion classes using machine learning. We also propose hybrid text representations for emotion classification that use lexicon-based features in conjunction with other representations such as n-grams, part-of-speech and sentiment information.
Thirdly, we propose two methods that jointly use an emotion-labelled corpus of tweets and an emotion-sentiment mapping proposed in psychology to learn word-level numerical sentiment strengths over a positive-to-negative spectrum. Finally, we evaluate all the methods proposed in this thesis through a variety of emotion detection and sentiment analysis tasks on benchmark data sets covering domains from blogs to news articles to tweets and incident reports.
213	Asset price and volatility forecasting using news sentiment
Sadik, Zryan (January 2018)
The aim of this thesis is to show that news analytics data can be used to improve the predictive ability of existing models that play useful roles in a variety of financial applications. The modified models are computationally efficient and perform far better than the existing ones, offering a reasonable compromise between increased model complexity and prediction accuracy. I investigated the impact of news sentiment on the volatility of stock returns. The GARCH model is one of the most common models for predicting asset price volatility from the return time series. In this research, I treated quantified news sentiment as a second source of information and used it together with the asset time series data to predict the volatility of asset price returns. Comprehensive numerical experiments demonstrate that the proposed volatility models predict better than the "plain vanilla" GARCH, TGARCH and EGARCH models. This research presents evidence that including a news sentiment term as an exogenous variable in the GARCH framework improves the predictive power of the model. The analysis suggests that an exponential decay function works well when the news flow is frequent, whereas the Hill decay function works well only when there are scheduled announcements. The numerical results support some recent findings on the utility of news sentiment as a predictor of volatility, and also support the utility of the new models, which combine proxies for past news sentiment with past asset price returns. The empirical analysis suggests that news-augmented GARCH models can be very useful in estimating VaR and implementing risk management strategies.
Another direction of my research is a new approach to constructing a commodity futures pricing model. This study proposes a new method of incorporating macroeconomic news into a predictive model for forecasting the prices of crude oil futures contracts. Since these futures contracts are more liquid than the underlying commodity itself, accurate forecasting of their prices is of great value to multiple categories of market participants. The Kalman filtering framework for forecasting arbitrage-free (futures) prices is used, and the volatility of the oil (futures) price is assumed to be influenced by macroeconomic news. The impact of quantified news sentiment on price volatility is modelled through a parametrized, nonlinear functional map. This approach is motivated by the successful use of a similar model structure in my earlier work on predicting individual stock volatility using stock-specific news. Numerical experiments with real data illustrate that the new model performs better than the one-factor model in terms of both predictive accuracy and goodness of fit to the data. The proposed model structure for incorporating macroeconomic news together with historical (market) data is novel and improves the accuracy of price prediction quite significantly.
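As a rough illustration of the news-augmented GARCH idea in entry 213, the sketch below extends the standard GARCH(1,1) variance recursion with an exogenous news term driven by exponentially decayed sentiment scores. The abstract does not state the exact functional form, so the way the news term enters (a coefficient `gamma` times the absolute decayed impact) is an assumption, as are all function and parameter names. The Hill decay alternative mentioned in the abstract is not shown.

```python
import math


def decayed_news_impact(sentiments, decay_rate):
    """Exponentially decayed running sum of sentiment scores, one per period."""
    impact, out = 0.0, []
    for s in sentiments:
        # Older news fades by exp(-decay_rate) each period; new news adds on top.
        impact = impact * math.exp(-decay_rate) + s
        out.append(impact)
    return out


def news_garch_variance(returns, sentiments, omega, alpha, beta, gamma,
                        decay_rate, initial_variance):
    """Conditional variances from a GARCH(1,1) recursion plus a news term."""
    impact = decayed_news_impact(sentiments, decay_rate)
    variances = [initial_variance]
    for t in range(1, len(returns)):
        # Assumed form: the news term is gamma * |decayed impact|, which keeps
        # the variance non-negative for gamma >= 0.
        news_term = gamma * abs(impact[t - 1])
        variances.append(omega + alpha * returns[t - 1] ** 2
                         + beta * variances[t - 1] + news_term)
    return variances
```

Setting `gamma` to zero recovers the plain GARCH(1,1) recursion, which makes it easy to compare the two variance paths on the same return series.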
214	Sobre normalização e classificação de polaridade de textos opinativos na web / On normalization and polarity classification of opinion texts on the web
Avanço, Lucas Vinicius (25 August 2015)
One of the main goals of sentiment analysis, or opinion mining, is to computationally analyze the opinions, feelings and subjectivity expressed in texts. The growing number of opinion texts in social media, together with the interest of companies and governments in inputs that support decision-making, has made this a widely studied research topic. Classifying opinions posted on the web, usually expressed in user-generated content (UGC), is a very challenging task because it involves handling subjectivity. Moreover, the language used in UGC differs in many ways from the standard language, which makes processing even harder. This work describes the development of methods and systems aimed at (a) the normalization of UGC texts, that is, spell checking, substitution of web slang, and case and punctuation normalization, and (b) the classification of opinions at document level, particularly product reviews, in Brazilian Portuguese. The method proposed for normalization is predominantly symbolic, since it makes explicit use of linguistic knowledge. For the classification of opinions, which in this work consists of assigning a polarity value (positive or negative) to an opinion text, we used lexicon-based and machine learning approaches, as well as a combination of both in an original hybrid method. We found that text normalization improved the results of opinion classification, at least for lexicon-based methods. We also extrinsically evaluated the quality of sentiment lexicons for Portuguese.
We further ran experiments on the reliability of the ratings given by the opinion authors, since these ratings are used to label examples, and found that they significantly affect the performance of the opinion classifiers. Finally, we obtained opinion classifiers for Brazilian Portuguese with F1 values of up to 0.84 (lexicon-based approach) and 0.95 (machine learning approach), which are similar to those of state-of-the-art systems for other languages in the product review domain.
Keywords: Opinion classification; Sentiment analysis; UGC normalization
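The normalization steps listed in entry 214 (case, punctuation and web-slang handling) can be pictured with a toy sketch. This is not the author's system: the slang table below is a tiny hypothetical sample, the function name `normalize_ugc` is invented here, and spell checking is left out entirely.

```python
import re

# Hypothetical slang table; a real system would rely on a much larger lexicon.
SLANG = {"vc": "você", "q": "que", "tb": "também", "mto": "muito"}


def normalize_ugc(text):
    """Apply case, punctuation, and slang normalization to a review."""
    text = text.lower()                          # case normalization
    text = re.sub(r"([!?.,])\1+", r"\1", text)   # collapse repeated punctuation
    text = re.sub(r"\s+([!?.,])", r"\1", text)   # drop spaces before punctuation
    # Replace each word found in the slang table; \w+ also matches accented
    # letters when the pattern is applied to a Python 3 str.
    return re.sub(r"\w+", lambda m: SLANG.get(m.group(0), m.group(0)), text)
```

For example, `normalize_ugc("VC gostou?? Eu tb gostei mto !!!")` yields `"você gostou? eu também gostei muito!"`, which a lexicon-based classifier can match against dictionary entries more reliably than the raw text.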
216	Pragmatic Quotation Use in Online Yelp Reviews and its Connection to Author Sentiment
Wright, Mary Elisabeth (01 March 2016)
Previous research has established that punctuation can be used to communicate nuances of meaning in online writing (McAndrew & De Jonge, 2011). Punctuation, considered a computer-mediated communication (CMC) cue, expresses tone and emotion and disambiguates an author's intention (Vandergriff, 2013). Quotation marks as CMC cues can serve pragmatic functions and have been understudied. Some of these functions have been generally described (Predelli, 2003). However, no corpus study has specifically focused on the pragmatic uses of quotations in online text. Consumer reviews, a genre of online text, can directly impact business profits and influence customers' purchasing decisions (Floyd, Freling, Alhoqail, Cho & Freling, 2014). Businesses are investing in sentiment analysis to gauge their target market's opinions (Salehan & Kim, 2016). Sentiment analysis is the computerized appraisal of a text to determine whether its author is expressing a positive or negative opinion (Novak, Smailovic, Sluban & Mozetic, 2015). Sentiment analysis programs are still limited and could be improved in accuracy. Most programs rely on lexicons of words given a pre-determined polarity value (positive or negative) out of context (Novak et al., 2015). However, context is crucial to communication, and sentiment analysis programs could incorporate a wider variety of contextual linguistic features to improve their accuracy. Quotations used for pragmatic communication are one such feature. This study discovered seven pragmatic quotation uses in a 2014 Yelp review corpus: Collective Knowledge, Non-standard, Grammatical, Non-literal, Narrative, Idiolect, and Emphasis. An ANOVA and a Tukey HSD test were performed, and the results were significant. Pragmatic category accounted for 15% of the variance in review star rating.
The Collective Knowledge category differed significantly from the Narrative and Non-literal categories. The Collective Knowledge category showed a correlation with positive sentiment, while the Narrative and Non-literal categories displayed a correlation with negative sentiment. These three categories are likely present in several types of online text, making them valuable for further sentiment analysis research. If these pragmatic patterns could be detected automatically, they could be used in sentiment algorithms to give a more accurate picture of author opinion.
Keywords: quotations; quotes; pragmatics; CMC; online reviews; Yelp; sentiment analysis; opinion mining; computer-mediated communication; Linguistics
217	A Study on the Efficacy of Sentiment Analysis in Author Attribution
Schneider, Michael J (01 August 2015)
The field of authorship attribution seeks to characterize an author’s writing style well enough to determine whether he or she has written a text of interest. One subfield of authorship attribution, stylometry, seeks to find the necessary literary attributes to quantify an author’s writing style. The research presented here sought to determine the efficacy of sentiment analysis as a new stylometric feature, by comparing its performance in attributing authorship against the performance of traditional stylometric features. Experimentation, with a corpus of sci-fi texts, found sentiment analysis to have a much lower performance in assigning authorship than the traditional stylometric features.
Keywords: NLP; Sentiment Analysis; Authorship Attribution; Data Mining; Stylometry; Computational Linguistics
218	Classifying textual fast food restaurant reviews quantitatively using text mining and supervised machine learning algorithms
Wright, Lindsey (01 May 2018)
Companies continually seek to improve their business model through feedback and customer satisfaction surveys. Social media provides additional opportunities for this advanced exploration into the mind of the customer. By extracting customer feedback from social media platforms, companies may increase the sample size of their feedback and remove bias often found in questionnaires, resulting in better-informed decision making. However, simply using personnel to analyze thousands of relevant social media posts is financially expensive and time consuming. Thus, our study aims to establish a method to extract business intelligence from social media content by structuring opinionated textual data using text mining and classifying these reviews by the degree of customer satisfaction. By quantifying textual reviews, companies may perform statistical analysis to extract insight from the data as well as effectively address concerns. Specifically, we analyzed a subset of 56,000 Yelp reviews of fast food restaurants and attempted to predict a quantitative value reflecting the overall opinion of each review. We compare two predictive modeling techniques, bagged Decision Trees and Random Forest Classifiers. To simplify the problem, we train our model to accurately classify strongly negative and strongly positive reviews (1 and 5 stars). In addition, we identify the drivers behind strongly positive or negative reviews, allowing businesses to understand their strengths and weaknesses. This method gives companies an efficient and cost-effective way to process and understand customer satisfaction as it is discussed on social media.
Keywords: text mining; sentiment analysis; decision tree; random forest; Other Applied Mathematics
219	Sentiment analysis and transfer learning using recurrent neural networks: an investigation of the power of transfer learning / Sentimentanalys och överföringslärande med neuronnät
Pettersson, Harald (January 2019)
In the field of data mining, transfer learning is the method of transferring knowledge from one domain into another. Using reviews from prisjakt.se, a Swedish price comparison site, and hotels.com, this work investigates how the similarities between domains affect the results of transfer learning when using recurrent neural networks. We test several different domains with different characteristics, e.g. size and lexical similarity. In this work only relatively similar domains were used, the same target function was sought and all reviews were in Swedish. Regardless, the results are conclusive: transfer learning is often beneficial, but is highly dependent on the features of the domains and how they compare with each other.
Keywords: Machine Learning; Neural Networks; Transfer Learning; Domain Adaption; Sentiment Analysis; Computer Engineering
220	Sentiments, networks, literary biography: towards a mesoanalysis of Cicero's Corpus
Marley, Caitlin A. (01 May 2018)
In a field as old as Classics, it is difficult to find truly innovative approaches to literary works that have been studied for millennia, and it only becomes more difficult to find something new to explore in works as fundamental to the field as Marcus Tullius Cicero’s. However, in the burgeoning field of Digital Humanities, new avenues for textual exploration arise even among the over-picked rubble that is the Classical World. Through the use of computer software, we can search through and statistically analyze corpora of massive sizes. This project uses such techniques to perform a mesoanalysis of Cicero’s corpus. Through the use of R and Gephi, I will “read” Cicero’s works from a distance and see a much broader view of his character than I could through a traditional close reading of a few texts. This mesoanalysis includes a stylometric analysis of Cicero’s entire corpus, a sentiment analysis of his orations, and a network analysis of his letters. The stylometric analysis will explore Cicero as a literary figure. Through a hierarchical cluster analysis in R, I will assess how his style changes not only from genre to genre but also within a genre (orations). That analysis will close with an exploration of the lexical richness of his works and how it varies from genre to genre and over his lifetime. For the sentiment analysis, I built a lexicon based on Stoic theory, primarily as it is explained in the Tusculanae Disputationes, and on Robert Kaster’s work with emotional scripts. After the lexicon was built, I applied it to Cicero’s orations in a method similar to Matthew Jockers’ syuzhet package for R, and I traced his use of sentiment across each speech.
I then compared those trajectories to Latin rhetorical theory, especially the theories included in Cicero’s own treatises, in order to see whether Cicero had put his own advice into effect or had a few techniques that he kept hidden. The mesoanalysis closes with a network analysis of the Epistulae ad Familiares. I merged Cicero’s social network with a sentiment analysis in order to assess how Cicero felt about and interacted with his peers. From this analysis, one can gather an idea of Cicero as a person. At the end of the mesoanalysis, we attain a much broader sense of Cicero’s character. This project also has a second aim, which is to explain how these techniques could be applied to other literary corpora, beyond Cicero’s and beyond Latin. I have carefully detailed my process and provide further instruction in my appendices so that readers can attempt these analyses and be successful in them.
Keywords: Digital Humanities; Latin; Marcus Tullius Cicero; Network Analysis; Sentiment Analysis; Text Analysis; Classics
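The sentiment-trajectory step described in entry 220 (apply a lexicon sentence by sentence, then trace sentiment across a speech) can be sketched in a few lines. The author worked in R with a method similar to the syuzhet package; the Python below is not that package's API, and the function names and the sample lexicon passed in by the caller are hypothetical. Latin inflection is ignored here, so only exact word forms match.

```python
import re


def sentence_scores(text, lexicon):
    """Score each sentence as the sum of the lexicon values of its words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        sum(lexicon.get(w, 0.0) for w in re.findall(r"\w+", s.lower()))
        for s in sentences
    ]


def rolling_mean(values, window):
    """Centered rolling mean; the window shrinks at the edges of the text."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Plotting `rolling_mean(sentence_scores(speech, lexicon), window)` against sentence position gives a smoothed curve that can then be compared with the sections of a speech prescribed by rhetorical theory.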
