81 |
Análise de sentimentos em tíquetes para o suporte de TI / Sentiment Analysis in Tickets for IT Support
Blaz, Cássio Castaldi Araújo January 2017
Sentiment Analysis/Opinion Mining has been adopted in software engineering for problems such as software usability and the sentiment of developers in projects. This work proposes methods to evaluate the sentiment contained in tickets submitted to IT (Information Technology) support. IT tickets are broad in coverage (e.g. infrastructure, software) and involve errors, incidents, requests, etc. The main challenge is to automatically distinguish the need itself, which is intrinsically negative (e.g. an error description), from the sentiment embedded in the description. Our approach automatically creates a domain dictionary that contains terms expressing sentiment in the IT context, which is used to filter expressions in a ticket for sentiment analysis. We created and evaluated three classification methods for calculating the polarity of tickets. Our study used 34,895 tickets from five organizations. For polarity, we randomly selected 2,333 tickets to compose a gold standard. Our best results show an average precision and recall of 82.83% and 88.42%, respectively, which outperforms the compared sentiment analysis solutions. In addition, emotions in tickets were studied using the Ekman and VAD models. One of the three classification methods was adapted to also identify emotions in the tickets. Possible correlations between polarity and emotions were checked through association rules. The results correlate positive tickets with high valence and dominance and low arousal, along with the presence of joy and surprise and the absence of fear. Negative tickets correlate with neutral valence, arousal and dominance, along with the absence of joy and the presence of fear. However, the results for negative polarity are not accurate.
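The dictionary-based filtering described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the dictionary entries, weights, and function names below are hypothetical, and the thesis builds its dictionary automatically rather than by hand. The point is only that words describing the request itself (such as "error") are left out of the dictionary, so they do not drive the polarity.

```python
import re

# Hypothetical domain dictionary: term -> polarity weight in an IT-support context.
# Words such as "error" or "failure" are deliberately absent, because they
# describe the request itself rather than the author's sentiment.
DOMAIN_DICTIONARY = {
    "thanks": 1.0,
    "great": 1.0,
    "please": 0.5,
    "unacceptable": -1.0,
    "frustrated": -1.0,
    "again": -0.5,
}


def filter_sentiment_terms(ticket_text, dictionary=DOMAIN_DICTIONARY):
    """Keep only the tokens that the domain dictionary marks as sentiment-bearing."""
    tokens = re.findall(r"[a-z']+", ticket_text.lower())
    return [token for token in tokens if token in dictionary]


def ticket_polarity(ticket_text, dictionary=DOMAIN_DICTIONARY):
    """Sum the weights of the filtered terms and map the total to a label."""
    terms = filter_sentiment_terms(ticket_text, dictionary)
    score = sum(dictionary[term] for term in terms)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy dictionary, a ticket that only describes a fault ("Database error on login") comes out neutral, while the same fault reported with "this is unacceptable" comes out negative.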
|
82 |
Sobre normalização e classificação de polaridade de textos opinativos na web / On normalization and polarity classification of opinion texts on the web
Avanço, Lucas Vinicius 25 August 2015
Sentiment Analysis, or Opinion Mining, has as one of its main goals the computational analysis of opinions, feelings and subjectivity expressed in texts. The growing number of opinion texts on social media, together with the interest of companies and governments in inputs that support decision making, has made this a widely studied research topic. Classifying opinions posted on the web, usually expressed as user-generated content (UGC), is a very challenging task because it involves handling subjectivity. Moreover, the language of UGC texts differs in many ways from the standard language, which makes processing even harder. This work describes the development of methods and systems aimed at (a) the normalization of UGC texts, that is, spell checking, substitution of web slang, and case and punctuation normalization, and (b) the classification of opinions at document level, particularly product reviews, for Brazilian Portuguese. The method proposed for normalization is predominantly symbolic, since it makes explicit use of linguistic knowledge. For opinion classification, which in this work consists of assigning a polarity value (positive or negative) to a text, lexicon-based and machine learning approaches were implemented and evaluated, as well as a combination of both in an original hybrid method. We found that normalization improved the results of opinion classification, at least for lexicon-based methods. We also evaluated extrinsically the quality of sentiment lexicons for Portuguese.
We further ran experiments on the reliability of the ratings given by the opinion authors, since these ratings are used to label examples, and found that they significantly impact the performance of the opinion classifiers. Finally, we obtained opinion classifiers for Brazilian Portuguese whose F1 values reach 0.84 (lexicon-based approach) and 0.95 (machine learning approach), which are similar to those of systems for other languages that represent the state of the art in the product review domain.
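A rough sketch of how normalization can help a lexicon-based classifier is given below. It covers only case normalization, collapsing of repeated characters and punctuation, and slang substitution; spell checking is omitted. The slang table, the lexicon, and the "neutral" tie label are hypothetical assumptions for illustration, not resources or decisions taken from the thesis.

```python
import re

# Hypothetical slang table; a real system would rely on a much larger resource.
SLANG = {"vc": "você", "q": "que", "mto": "muito", "nao": "não"}

# Hypothetical polarity lexicon for Portuguese product reviews.
LEXICON = {"bom": 1, "ótimo": 1, "ruim": -1, "péssimo": -1}


def normalize(text):
    """Lowercase, collapse repeated characters and punctuation, replace slang."""
    text = text.lower()
    text = re.sub(r"([!?.])\1+", r"\1", text)   # "!!!" -> "!"
    text = re.sub(r"(\w)\1{2,}", r"\1", text)   # letters repeated 3+ times -> one
    tokens = re.findall(r"\w+|[!?.,]", text)
    return [SLANG.get(token, token) for token in tokens]


def classify(text):
    """Lexicon-based polarity of a review after normalization."""
    score = sum(LEXICON.get(token, 0) for token in normalize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: ties are reported instead of forced to a class
```

Without the normalization step, a token such as "PÉSSIMOOO" would not match the lexicon entry "péssimo", which is one plausible reason why lexicon-based methods benefit from normalization.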
|
84 |
Pragmatic Quotation Use in Online Yelp Reviews and its Connection to Author Sentiment
Wright, Mary Elisabeth 01 March 2016
Previous research has established that punctuation can be used to communicate nuances of meaning in online writing (McAndrew & De Jonge, 2011). Punctuation, considered a computer-mediated communication (CMC) cue, expresses tone and emotion and disambiguates an author's intention (Vandergriff, 2013). Quotation marks as CMC cues can serve pragmatic functions and have been understudied. Some of these functions have been generally described (Predelli, 2003). However, no corpus study has specifically focused on the pragmatic uses of quotations in online text. Consumer reviews, a genre of online text, can directly impact business profits and influence customers' purchasing decisions (Floyd, Freling, Alhoqail, Cho & Freling, 2014). Businesses are investing in sentiment analysis to gauge their target market's opinions (Salehan & Kim, 2016). Sentiment analysis is the computerized appraisal of a text to determine whether its author is expressing a positive or negative opinion (Novak, Smailovic, Sluban & Mozetic, 2015). Sentiment analysis programs are still limited and could be improved in accuracy. Most programs rely on lexicons of words given a pre-determined polarity value (positive or negative) out of context (Novak et al., 2015). However, context is crucial to communication, and sentiment analysis programs could incorporate a wider variety of contextual linguistic features to improve their accuracy. Quotations used for pragmatic communication are one such feature. This study discovered seven pragmatic quotation uses in a 2014 Yelp review corpus: Collective Knowledge, Non-standard, Grammatical, Non-literal, Narrative, Idiolect, and Emphasis. An ANOVA and a Tukey HSD test were performed, and the results were significant. Pragmatic category accounted for 15% of the variance in review star rating. The Collective Knowledge category differed significantly from the Narrative and Non-literal categories.
The Collective Knowledge category showed a correlation with positive sentiment, while the Narrative and Non-literal categories displayed a correlation with negative sentiment. These three categories are likely present in several types of online text, making them valuable for further sentiment analysis research. If these pragmatic patterns could be detected automatically, they could be used in sentiment algorithms to give a more accurate picture of author opinion.
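A statement such as "category accounted for 15% of the variance" is commonly read as an eta-squared effect size from a one-way ANOVA, that is, the between-group sum of squares divided by the total sum of squares. The abstract does not say which statistic was used, so that reading is an assumption. The sketch below computes the ratio on made-up star ratings; the numbers are illustrative only and are not data from the study.

```python
def eta_squared(groups):
    """Share of total variance explained by group membership (SS_between / SS_total)."""
    all_values = [value for values in groups.values() for value in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Made-up star ratings grouped by pragmatic category (not data from the study).
toy_ratings = {
    "Collective Knowledge": [5, 4, 5, 4],
    "Narrative": [2, 3, 1, 2],
    "Emphasis": [3, 4, 3, 4],
}
effect_size = eta_squared(toy_ratings)
```

A value near 0 means the categories barely separate the ratings, and a value near 1 means category membership explains almost all of the spread.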
|
85 |
A Study on the Efficacy of Sentiment Analysis in Author Attribution
Schneider, Michael J 01 August 2015
The field of authorship attribution seeks to characterize an author’s writing style well enough to determine whether he or she has written a text of interest. One subfield of authorship attribution, stylometry, seeks to find the necessary literary attributes to quantify an author’s writing style. The research presented here sought to determine the efficacy of sentiment analysis as a new stylometric feature, by comparing its performance in attributing authorship against the performance of traditional stylometric features. Experimentation, with a corpus of sci-fi texts, found sentiment analysis to have a much lower performance in assigning authorship than the traditional stylometric features.
|
86 |
Classifying textual fast food restaurant reviews quantitatively using text mining and supervised machine learning algorithms
Wright, Lindsey 01 May 2018
Companies continually seek to improve their business model through feedback and customer satisfaction surveys. Social media provides additional opportunities for this advanced exploration into the mind of the customer. By extracting customer feedback from social media platforms, companies may increase the sample size of their feedback and remove bias often found in questionnaires, resulting in better informed decision making. However, simply using personnel to analyze thousands of relevant social media posts is financially expensive and time consuming. Thus, our study aims to establish a method to extract business intelligence from social media content by structuring opinionated textual data using text mining and classifying these reviews by the degree of customer satisfaction. By quantifying textual reviews, companies may perform statistical analysis to extract insight from the data as well as effectively address concerns. Specifically, we analyze a subset of 56,000 Yelp reviews of fast food restaurants and attempt to predict a quantitative value reflecting the overall opinion of each review. We compare two predictive modeling techniques, bagged Decision Trees and Random Forest Classifiers. To simplify the problem, we train our model to accurately classify strongly negative and strongly positive reviews (1 and 5 stars). In addition, we identify drivers behind strongly positive or negative reviews, allowing businesses to understand their strengths and weaknesses. This method provides companies with an efficient and cost-effective way to process and understand customer satisfaction as it is discussed on social media.
|
87 |
Sentiment analysis and transfer learning using recurrent neural networks: an investigation of the power of transfer learning / Sentimentanalys och överföringslärande med neuronnät
Pettersson, Harald January 2019
In the field of data mining, transfer learning is the method of transferring knowledge from one domain into another. Using reviews from prisjakt.se, a Swedish price comparison site, and hotels.com, this work investigates how the similarities between domains affect the results of transfer learning when using recurrent neural networks. We test several different domains with different characteristics, e.g. size and lexical similarity. In this work only relatively similar domains were used, the same target function was sought and all reviews were in Swedish. Regardless, the results are conclusive: transfer learning is often beneficial, but is highly dependent on the features of the domains and how they compare with each other.
|
88 |
Sentiments, networks, literary biography: towards a mesoanalysis of Cicero's Corpus
Marley, Caitlin A. 01 May 2018
In a field as old as Classics, it is difficult to find truly innovative approaches to literary works that have been studied for millennia, and it only becomes more difficult to find something new to explore in works as fundamental to the field as Marcus Tullius Cicero’s. However, in the burgeoning field of Digital Humanities, new avenues for textual exploration arise even among the over-picked rubble that is the Classical World. Through the use of computer software, we can search through and statistically analyze corpora of massive sizes. This project uses such techniques to perform a mesoanalysis of Cicero’s corpus. Through the use of R and Gephi, I will “read” Cicero’s works from a distance and see a much broader view of his character than I could through a traditional close reading of a few texts.
This mesoanalysis includes a stylometric analysis of Cicero’s entire corpus, a sentiment analysis of his orations, and a network analysis of his letters. The stylometric analysis will explore Cicero as a literary figure. Through a hierarchical cluster analysis in R, I will assess not only how his style changes from genre to genre but within a genre (orations) as well. That analysis will close with an exploration of the lexical richness of his works, how it varies from genre to genre and over his lifetime. For the sentiment analysis, I built a lexicon based on Stoic theory, primarily as it is explained in the Tusculanae Disputationes, and Robert Kaster’s work with emotional scripts. After the lexicon was built, I applied it to Cicero’s orations in a method similar to Matthew Jockers’ syuzhet package for R, and I traced his use of sentiment across each speech. I then compared those trajectories to Latin rhetorical theory, especially the theories included in Cicero’s own treatises, in order to see if Cicero had put into effect his own advice or if he had a few techniques that he kept hidden. The mesoanalysis closes with a network analysis of the Epistulae ad Familiares. I merged Cicero’s social network with a sentiment analysis in order to assess how Cicero felt about and interacted with his peers. From this analysis, one could gather an idea of Cicero as a person. At the end of the mesoanalysis, we can attain a much broader sense of Cicero’s character.
This project also has a second aim, and that is to explain how these techniques could be applied to other literary corpora, outside of Cicero’s and Latin. I have carefully detailed my process and provide more instruction in my appendices so that readers could attempt these analyses and be successful in them.
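The idea of tracing sentiment across a speech with a custom lexicon can be sketched as follows. The thesis works in R with a method similar to the syuzhet package; the Python below does not reproduce that package or the author's lexicon. The lexicon entries, the toy sentences, and the function names are all hypothetical.

```python
# Hypothetical emotion lexicon: word -> signed weight.
TOY_LEXICON = {"gaudium": 1, "laetitia": 1, "metus": -1, "dolor": -1}


def sentence_scores(sentences, lexicon=TOY_LEXICON):
    """Score each sentence as the sum of the lexicon weights of its words."""
    scores = []
    for sentence in sentences:
        words = [word.strip(".,;:!?") for word in sentence.lower().split()]
        scores.append(sum(lexicon.get(word, 0) for word in words))
    return scores


def moving_average(values, window=3):
    """Smooth the raw scores so that the overall trajectory is easier to read."""
    half = window // 2
    smoothed = []
    for index in range(len(values)):
        chunk = values[max(0, index - half): index + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up sentences standing in for consecutive sentences of a speech.
toy_speech = ["Magnum gaudium est.", "Metus et dolor adsunt.", "Laetitia redit."]
trajectory = moving_average(sentence_scores(toy_speech))
```

The smoothed list can then be plotted against sentence position and compared with the sections that rhetorical theory prescribes for a speech.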
|
89 |
Deep active learning using Monte Carlo Dropout / Aprendizado ativo profundo usando Monte Carlo Dropout
Moura, Lucas Albuquerque Medeiros de 14 November 2018
Deep Learning models require a large amount of labeled data to be trained. However, in many areas labeling data is a costly process, which makes Deep Learning approaches unfeasible. One way to handle this situation is the Active Learning technique. It first creates a model with the available labeled data. It then incrementally chooses new unlabeled data that would potentially increase the model's accuracy if labeled and added to the training data. To select which data will be labeled next, this technique requires a measure of uncertainty for the model's predictions, which is usually not computed for Deep Learning methods. A new approach, called Monte Carlo Dropout, has been proposed to measure uncertainty in those models. This technique allowed Active Learning to be used together with Deep Learning for image classification. This research evaluates whether modeling uncertainty in Deep Learning models with Monte Carlo Dropout makes Active Learning feasible for the task of sentiment analysis, an area with a huge amount of data, little of which is labeled.
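The selection step described above can be illustrated with a small sketch. A toy logistic model with dropout applied to its inputs stands in for a deep network, which is a simplifying assumption; the weights, the number of passes, the drop rate, and all function names are hypothetical. The idea shown is only the general one: keep dropout active at prediction time, average several stochastic passes, and send the examples with the highest predictive entropy to be labeled.

```python
import math
import random


def stochastic_forward(features, weights, drop_rate, rng):
    """One forward pass of a toy logistic model with dropout left on at inference."""
    keep = 1.0 - drop_rate
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += x * w / keep  # inverted-dropout scaling
    return 1.0 / (1.0 + math.exp(-total))


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mc_dropout_uncertainty(features, weights, passes=100, drop_rate=0.5, seed=0):
    """Average several stochastic passes; return mean probability and its entropy."""
    rng = random.Random(seed)
    probs = [stochastic_forward(features, weights, drop_rate, rng) for _ in range(passes)]
    mean_p = sum(probs) / passes
    return mean_p, binary_entropy(mean_p)


def select_for_labeling(pool, weights, k):
    """Return the indices of the k pool examples with the highest entropy."""
    scored = [(mc_dropout_uncertainty(x, weights)[1], i) for i, x in enumerate(pool)]
    scored.sort(reverse=True)
    return [index for _, index in scored[:k]]
```

In an active learning loop, the selected examples would be labeled by an annotator, added to the training set, and the model retrained before the next round of selection.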
|
90 |
Use of social media to monitor and predict outbreaks and public opinion on health topics
Signorini, Alessio 01 December 2014
The world in which we live has changed rapidly over the last few decades. Threats of bioterrorism, influenza pandemics, and emerging infectious diseases coupled with unprecedented population mobility led to the development of public health surveillance systems. These systems are useful in detecting and responding to infectious disease outbreaks but often operate with a considerable delay and fail to provide the necessary lead time for optimal public health response.
In contrast, syndromic surveillance systems rely on clinical features (e.g., activities prompted by the onset of symptoms) that are discernible prior to diagnosis to warn of changes in disease activity. Although less precise, these systems can offer considerable lead time. Patient information may be acquired from multiple existing sources established for other purposes, including, for example, emergency department primary complaints, ambulance dispatch data, and over-the-counter medication sales. Unfortunately, these data are often expensive, sometimes difficult to obtain and almost always hard to integrate.
Fortunately, the proliferation of online social networks makes much more information about our daily habits and lifestyles freely available and easily accessible on the web. Twitter, Facebook and FourSquare are only a few examples of the many websites where people voluntarily post updates on their daily behaviors, health status, and physical location.
In this thesis we develop and apply methods to collect, filter and analyze the content of social media postings in order to make predictions. As a proof of concept we used Twitter data to predict public opinion in the form of the outcome of a popular television show. We then used the same methods to monitor and track public perception of influenza during the H1N1 epidemic, and even to predict disease burden in real time, which is a measurable advance over current public health practice. Finally, we used location-specific social media data to model human travel and show how these data can improve our prediction of disease burden.
|