Spelling suggestions: "subject:"pinion mining"" "subject:"0pinion mining""
11 |
Exploring the Correlation Between Ratings, Adjectives and Sentiment on Customer ReviewsSandström, Einar, Josefsson, Fredrik January 2022 (has links)
Customer reviews are important for both customers and companies. Customers want to find out if the product or service is what they need while companies want to figure out if their product is good enough for their customers. There is, however, an issue where customers very rarely write a product review. An example of a solution for this could be to let the customer choose between adjectives rather than write the entire review. To help future researchers find out if this could make customers more prone to write reviews, this study looks at the correlation between the sentiment and the rating, as well as the adjectives used when a rating and sentiment correlate. Other studies look at the correlation, or the precision of the tool used for sentiment analysis but do not go in-depth on what makes a review correlate with its rating. To study this, four datasets of reviews were used with a total of 105234 reviews. Then, using Stanford CoreNLP each review text got a predicted sentiment score. The Pearson coefficient was then used to find the correlation coefficient between ratings and sentiments. The conclusion is that there is a weak-moderate correlation between ratings and sentiment. Adjectives with a positive sentiment had a higher correlation than negative adjectives, however, most of them still had a low correlation. The sentiment correlates better when the reviews with only one sentence are omitted from the result.
|
12 |
Computational treatment of superlativesScheible, Silke January 2009 (has links)
The use of gradable adjectives and adverbs represents an important means of expressing comparison in English. The grammatical forms of comparatives and superlatives are used to express explicit orderings between objects with respect to the degree to which they possess some gradable property. While comparatives are commonly used to compare two entities (e.g., “The blue whale is larger than an African elephant”), superlatives such as “The blue whale is the largest mammal” are used to express a comparison between a target entity (here, the blue whale) and its comparison set (the set of mammals), with the target ranked higher or lower on a scale of comparison than members of the comparison set. Superlatives thus highlight the uniqueness of the target with respect to its comparison set. Although superlatives are frequently found in natural language, with the exception of recent work by (Bos and Nissim, 2006) and (Jindal and Liu, 2006b), they have not yet been investigated within a computational framework. And within the framework of theoretical linguistics, studies of superlatives have mainly focused on semantic properties that may only rarely occur in natural language (Szabolsci (1986), Heim (1999)). My PhD research aims to pave the way for a comprehensive computational treatment of superlatives. The initial question I am addressing is that of automatically extracting useful information about the target entity, its comparison set and their relationship from superlative constructions. One of the central claims of the thesis is that no unified computational treatment of superlatives is possible because of their great semantic complexity and the variety of syntactic structures in which they occur. I propose a classification of superlative surface forms, and initially focus on so-called “ISA superlatives”, which make explicit the IS-A relation that holds between target and comparison set. They are suitable for a computational approach because both their target and comparison set are usually explicitly realised in the text. I also aim to show that the findings of this thesis are of potential benefit for NLP applications such as Question Answering, Natural Language Generation, Ontology Learning, and Sentiment Analysis/Opinion Mining. In particular, I investigate the use of the “Superlative Relation Extractor“ implemented in this project in the area of Sentiment Analysis/Opinion Mining, and claim that a superlative analysis of the sort presented in this thesis, when applied to product evaluations and recommendations, can provide just the kind of information that Opinion Mining aims to identify.
|
13 |
Probabilistic topic models for sentiment analysis on the WebChenghua, Lin January 2011 (has links)
Sentiment analysis aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text, and has received a rapid growth of interest in natural language processing in recent years. Probabilistic topic models, on the other hand, are capable of discovering hidden thematic structure in large archives of documents, and have been an active research area in the field of information retrieval. The work in this thesis focuses on developing topic models for automatic sentiment analysis of web data, by combining the ideas from both research domains. One noticeable issue of most previous work in sentiment analysis is that the trained classifier is domain dependent, and the labelled corpora required for training could be difficult to acquire in real world applications. Another issue is that the dependencies between sentiment/subjectivity and topics are not taken into consideration. The main contribution of this thesis is therefore the introduction of three probabilistic topic models, which address the above concerns by modelling sentiment/subjectivity and topic simultaneously. The first model is called the joint sentiment-topic (JST) model based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. Unlike supervised approaches to sentiment classification which often fail to produce satisfactory performance when applied to new domains, the weakly-supervised nature of JST makes it highly portable to other domains, where the only supervision information required is a domain-independent sentiment lexicon. Apart from document-level sentiment classification results, JST can also extract sentiment-bearing topics automatically, which is a distinct feature compared to the existing sentiment analysis approaches. The second model is a dynamic version of JST called the dynamic joint sentiment-topic (dJST) model. dJST respects the ordering of documents, and allows the analysis of topic and sentiment evolution of document archives that are collected over a long time span. By accounting for the historical dependencies of documents from the past epochs in the generative process, dJST gives a richer posterior topical structure than JST, and can better respond to the permutations of topic prominence. We also derive online inference procedures based on a stochastic EM algorithm for efficiently updating the model parameters. The third model is called the subjectivity detection LDA (subjLDA) model for sentence-level subjectivity detection. Two sets of latent variables were introduced in subjLDA. One is the subjectivity label for each sentence; another is the sentiment label for each word token. By viewing the subjectivity detection problem as weakly-supervised generative model learning, subjLDA significantly outperforms the baseline and is comparable to the supervised approach which relies on much larger amounts of data for training. These models have been evaluated on real world datasets, demonstrating that joint sentiment topic modelling is indeed an important and useful research area with much to offer in the way of good results.
|
14 |
Diseño e implementación de un sistema para la clasificación de tweets según su polaridadTapia Caro, Pablo Andrés January 2014 (has links)
Ingeniero Civil Indusrial / La alta penetración de Twitter en Chile ha favorecido que esta red social sea utilizada por empresas, políticos y organizaciones como un medio para obtener información adicional de las opiniones de usuarios acerca de sus productos, servicios o ellos mismos. Al ser los comentarios en Twitter, por defecto, de carácter público, se pueden analizar con el fin de extraer información accionable. En particular las empresas además de estar interesadas en la información cuantitativa, les interesa saber bajo qué polaridad se efectúan estas menciones, por cuanto una variación positiva en el número de comentarios puede deberse a un mayor número de menciones tanto positivas como negativas.
Si bien existen un número considerable de softwares que vienen con la funcionalidad de detección de polaridad de sentimientos, estos no son de mucha utilidad ya que la forma en que interactúa el usuario chileno con esta plataforma está llena de modismos propios de nuestro lenguaje local y abreviaciones que se deben principalmente a la limitación de caracteres de Twitter.
Al ser esta una industria inmadura en Chile, la tarea de detección de polaridad de sentimientos, se está realizando de forma manual por agencias publicitarias y otro tipo de empresas, pero dado el gran número de comentarios que se producen minuto a minuto, esta tarea resulta muy demandante en tiempo y dinero. Para resolver este tipo de problemáticas se utilizan técnicas de aprendizaje automático con el fin de entrenar un algoritmo que luego pueda determinar si un comentario es positivo, negativo o neutro, campo que se conoce como sentiment analysis. Mientras más datos sean procesados para el entrenamiento del algoritmo, mejor es el desempeño del clasificador y como en Twitter es sencillo obtener comentarios mediante su API, a diferencia de la web, se han formulado técnicas para generar automáticamente la corpora que contiene los tweets de entrenamiento para cada una de las clases y así sacar provecho de esta propiedad.
En este trabajo se profundiza el uso de una metodología semiautomática basada en emoticons para la generación de una corpora de tweets para la detección de polaridad de sentimientos en Twitter. Esto se realiza introduciendo un nuevo enfoque para la consolidación de los datos de entrenamiento mediante filtros que mejoran el etiquetado automático. Esto permite prevenir la aparición de comentarios erráticos y que causan ruido en las fases de entrenamiento y clasificación. Además se introduce una nueva clase de tweets que no se había considerado anteriormente, que consiste de tweets que carecen de información suficiente para clasificarlos como positivos, negativos o neutros, por lo que clasificarlos en alguna de estas clases disminuye la precisión del sistema. Evaluaciones experimentales mostraron que el uso de esta cuarta clase denominada irrelevante con el criterio de filtros presentado para la generación de la corpora, mejora el desempeño del sistema. Además se comprobó experimentalmente que el uso de una corpora generada en base a tweets chilenos clasifican mejor a los comentarios originados por usuarios locales.
|
15 |
Automatic, adaptive, and applicative sentiment analysis / Analyse de sentiments automatique, adaptative et applicativePak, Alexander 13 June 2012 (has links)
L'analyse de sentiments est un des nouveaux défis apparus en traitement automatique des langues avec l'avènement des réseaux sociaux sur le WEB. Profitant de la quantité d'information maintenant disponible, la recherche et l'industrie se sont mises en quête de moyens pour analyser automatiquement les opinions exprimées dans les textes. Pour nos travaux, nous nous plaçons dans un contexte multilingue et multi-domaine afin d'explorer la classification automatique et adaptative de polarité.Nous proposons dans un premier temps de répondre au manque de ressources lexicales par une méthode de construction automatique de lexiques affectifs multilingues à partir de microblogs. Pour valider notre approche, nous avons collecté plus de 2 millions de messages de Twitter, la plus grande plate-forme de microblogging et avons construit à partir de ces données des lexiques affectifs pour l'anglais, le français, l'espagnol et le chinois.Pour une meilleure analyse des textes, nous proposons aussi de remplacer le traditionnel modèle n-gramme par une représentation à base d'arbres de dépendances syntaxiques. Dans notre modèles, les n-grammes ne sont plus construits à partir des mots mais des triplets constitutifs des dépendances syntaxiques. Cette manière de procéder permet d'éviter la perte d'information que l'on obtient avec les approches classiques à base de sacs de mots qui supposent que les mots sont indépendants.Finalement, nous étudions l'impact que les traits spécifiques aux entités nommées ont sur la classification des opinions minoritaires et proposons une méthode de normalisation des décomptes d'observables, qui améliore la classification de ce type d'opinion en renforçant le poids des termes affectifs.Nos propositions ont fait l'objet d'évaluations quantitatives pour différents domaines d'applications (les films, les revues de produits commerciaux, les nouvelles et les blogs) et pour plusieurs langues (anglais, français, russe, espagnol et chinois), avec en particulier une participation officielle à plusieurs campagnes d'évaluation internationales (SemEval 2010, ROMIP 2011, I2B2 2011). / Sentiment analysis is a challenging task today for computational linguistics. Because of the rise of the social Web, both the research and the industry are interested in automatic processing of opinions in text. In this work, we assume a multilingual and multidomain environment and aim at automatic and adaptive polarity classification.We propose a method for automatic construction of multilingual affective lexicons from microblogging to cover the lack of lexical resources. To test our method, we have collected over 2 million messages from Twitter, the largest microblogging platform, and have constructed affective resources in English, French, Spanish, and Chinese.We propose a text representation model based on dependency parse trees to replace a traditional n-grams model. In our model, we use dependency triples to form n-gram like features. We believe this representation covers the loss of information when assuming independence of words in the bag-of-words approach.Finally, we investigate the impact of entity-specific features on classification of minor opinions and propose normalization schemes for improving polarity classification. The proposed normalization schemes gives more weight to terms expressing sentiments and lower the importance of noisy features.The effectiveness of our approach has been proved in experimental evaluations that we have performed across multiple domains (movies, product reviews, news, blog posts) and multiple languages (English, French, Russian, Spanish, Chinese) including official participation in several international evaluation campaigns (SemEval'10, ROMIP'11, I2B2'11).
|
16 |
Modelo social de relevância para opiniões. / S.O.R.M.: Social Opinion Relevance Model.Lima, Allan Diego Silva 02 October 2014 (has links)
Esta tese apresenta um modelo de relevância de opinião genérico e independente de domínio para usuários de Redes Sociais. O Social Opinion Relevance Model (SORM) é capaz de estimar a relevância de uma opinião com base em doze parâmetros distintos. Comparado com outros modelos, a principal característica que distingue o SORM é a sua capacidade para fornecer resultados personalizados de relevância de uma opinião, de acordo com o perfil da pessoa para a qual ela está sendo estimada. Devido à falta de corpus de relevância de opiniões capazes de testar corretamente o SORM, fez-se necessária a criação de um novo corpus chamado Social Opinion Relevance Corpus (SORC). Usando o SORC, foram realizados experimentos no domínio de jogos eletrônicos que ilustram a importância da personalização da relevância para alcançar melhores resultados, baseados em métricas típicas de Recuperação de Informação. Também foi realizado um teste de significância estatística que reforça e confirma as vantagens que o SORM oferece. / This thesis presents a generic and domain independent opinion relevance model for Social Network users. The Social Opinion Relevance Model (SORM) is able to estimate an opinions relevance based on twelve different parameters. Compared to other models, SORMs main distinction is its ability to provide customized results, according to whom the opinion relevance is being estimated for. Due to the lack of opinion relevance corpora that are able to properly test our model, we have created a new one called Social Opinion Relevance Corpus (SORC). Using SORC, we carried out some experiments on the Electronic Games domain that illustrate the importance of customizing opinion relevance in order to achieve better results, based on typical Information Retrieval metrics, such as NDCG, QMeasure and MAP. We also performed a statistical significance test that reinforces and corroborates the advantages that SORM offers.
|
17 |
Aplicação de Deep Learning em dados refinados para Mineração de OpiniõesJost, Ingo 26 February 2015 (has links)
Submitted by Maicon Juliano Schmidt (maicons) on 2015-06-12T19:13:14Z
No. of bitstreams: 1
Ingo Jost.pdf: 1217467 bytes, checksum: bf67cd6724b1cd182a12a3cd7b5af1eb (MD5) / Made available in DSpace on 2015-06-12T19:13:14Z (GMT). No. of bitstreams: 1
Ingo Jost.pdf: 1217467 bytes, checksum: bf67cd6724b1cd182a12a3cd7b5af1eb (MD5)
Previous issue date: 2015-02-26 / CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Deep Learning é uma sub-área de Aprendizado de Máquina que tem obtido resultados sa- tisfatórios em várias áreas de aplicação, implementada por diferentes algoritmos, como Stacked Auto-encoders ou Deep Belief Networks. Este trabalho propõe uma modelagem que aplica uma implementação de um classificador que aborda técnicas de Deep Learning em Mineração de Opiniões, área que tem sido alvo de constantes estudos, dada a necessidade das corporações buscarem a compreensão que clientes possuem de seus produtos ou serviços. O favorecimento do crescimento de Mineração de Opiniões também se dá pelo ambiente colaborativo da Web 2.0, em que várias ferramentas propiciam a emissão de opiniões. Os dados utilizados passaram por um refinamento na etapa de pré-processamento com o intuito de aplicar Deep Learning, da qual uma das principais atribuições é a seleção de características, em dados refinados em vez de dados mais brutos. A promissora tecnologia de Deep Learning combinada com a estratégia de refinamento demonstrou nos experimentos a obtenção de resultados competitivos com outros estudos relacionados e abrem perspectiva de extensão deste trabalho. / Deep Learning is a Machine Learning’s sub-area that have achieved satisfactory results in different application areas, implemented by different algorithms, such as Stacked Auto- encoders or Deep Belief Networks. This work proposes a research that applies a classifier that implements Deep Learning concepts in Opinion Mining, area has been approached by con- stant researches, due the need of corporations seeking the understanding that customers have of your products or services. The Opinion Mining’s growth is favored also by the collaborative Web 2.0 environment, where multiple tools provide issuing opinions. The data used for exper- iments were refined in preprocessing step in order to apply Deep Learning, which it one of the main tasks the feature selection, in refined data, instead of applying Deep Learning in more raw data. The refinement strategy combined with the promising technology of Deep Learning has demonstrated in preliminary experiments the achievement of competitive results with other studies and opens the perspective for extension of this work.
|
18 |
Análise de sentimentos baseada em aspectos e atribuições de polaridade / Aspect-based sentiment analysis and polarity assignmentKauer, Anderson Uilian January 2016 (has links)
Com a crescente expansão da Web, cada vez mais usuários compartilham suas opiniões sobre experiências vividas. Essas opiniões estão, na maioria das vezes, representadas sob a forma de texto não estruturado. A Análise de Sentimentos (ou Mineração de Opinião) é a área dedicada ao estudo computacional das opiniões e sentimentos expressos em textos, tipicamente classificando-os de acordo com a sua polaridade (i.e., como positivos ou negativos). Ao mesmo tempo em que sites de vendas e redes sociais tornam-se grandes fontes de opiniões, cresce a busca por ferramentas que, de forma automática, classifiquem as opiniões e identifiquem a qual aspecto da entidade avaliada elas se referem. Neste trabalho, propomos métodos direcionados a dois pontos fundamentais para o tratamento dessas opiniões: (i) análise de sentimentos baseada em aspectos e (ii) atribuição de polaridade. Para a análise de sentimentos baseada em aspectos, desenvolvemos um método que identifica expressões que mencionem aspectos e entidades em um texto, utilizando ferramentas de processamento de linguagem natural combinadas com algoritmos de aprendizagem de máquina. Para a atribuição de polaridade, desenvolvemos um método que utiliza 24 atributos extraídos a partir do ranking gerado por um motor de busca e para gerar modelos de aprendizagem de máquina. Além disso, o método não depende de recursos linguísticos e pode ser aplicado sobre dados com ruídos. Experimentos realizados sobre datasets reais demonstram que, em ambas as contribuições, conseguimos resultados próximos aos dos baselines mesmo com um número pequeno de atributos. Ainda, para a atribuição de polaridade, os resultados são comparáveis aos de métodos do estado da arte que utilizam técnicas mais complexas. / With the growing expansion of the Web, more and more users share their views on experiences they have had. These views are, in most cases, represented in the form of unstructured text. The Sentiment Analysis (or Opinion Mining) is a research area dedicated to the computational study of the opinions and feelings expressed in texts, typically categorizing them according to their polarity (i.e., as positive or negative). As on-line sales and social networking sites become great sources of opinions, there is a growing need for tools that classify opinions and identify to which aspect of the evaluated entity they refer to. In this work, we propose methods aimed at two key points for the treatment of such opinions: (i) aspect-based sentiment analysis and (ii) polarity assignment. For aspect-based sentiment analysis, we developed a method that identifies expressions mentioning aspects and entities in text, using natural language processing tools combined with machine learning algorithms. For the identification of polarity, we developed a method that uses 24 attributes extracted from the ranking generated by a search engine to generate machine learning models. Furthermore, the method does not rely on linguistic resources and can be applied to noisy data. Experiments on real datasets show that, in both contributions, our results using a small number of attributes were similar to the baselines. Still, for assigning polarity, the results are comparable to prior art methods that use more complex techniques.
|
19 |
Análise de sentimentos em tíquetes para o suporte de TI / Sentiment Analysis in Tickets for IT SupportBlaz, Cássio Castaldi Araújo January 2017 (has links)
Análise de Sentimentos/Mineração de Opinião é adotada na engenharia de software para questões como usabilidade e sentimentos de desenvolvedores em projetos. Este trabalho propõe métodos para avaliar os sentimentos presentes em tíquetes abertos à área de suporte de TI. Há diversos tipos de tíquetes abertos à TI (e.g. infraestrutura, software), que envolvem erros, incidentes, requisições, etc. O maior desafio é automaticamente distinguir entre a necessidade em si, a qual é intrinsecamente negativa (por exemplo, a descrição de um erro), de um sentimento embutido na descrição. Nossa abordagem automaticamente cria um dicionário de domínio que contém termos que expressam sentimentos no contexto de TI, utilizados para filtrar expressões em um tíquete para análise de sentimentos. Nós criamos e avaliamos três métodos de classificação para calcular a polaridade em tíquetes. Nosso estudo utilizou 34.895 tíquetes de cinco organizações. Para polaridade, 2.333 tíquetes foram selecionados aleatoriamente para compor nosso gold standard. Nossos melhores resultados apresentam uma precisão e revocação de 82,83% e 88,42%, respectivamente, o que supera outras soluções de análise de sentimentos comparadas. De forma complementar, emoções em tíquetes foram estudadas considerando os modelos de Ekman e VAD. Um dos três métodos de classificação criados foi adaptado para também identificar emoções nos tíquetes. Possíveis correlações entre polaridade e emoções foram verificadas via regras de associação. Resultados correlacionam tíquetes positivos com valência e dominância altas e excitação baixa, além de presença de alegria e surpresa e ausência de medo. Tíquetes negativos correlacionam com valência, excitação e dominância neutras, além de ausência de alegria e presença de medo. Contudo os resultados para a polaridade negativa não são precisos. / Sentiment Analysis/Opinion Mining has been adopted in software engineering for problems such as software usability and sentiment of developers in projects. This work proposes methods to evaluate the sentiment contained in tickets for IT (Information Technology) support. IT tickets are broad in coverage (e.g. infrastructure, software), and involve errors, incidents, requests, etc. The main challenge is to automatically distinguish between factual information, which is intrinsically negative (e.g. error description), from the sentiment embedded in the description. Our approach is to automatically create a domain dictionary that contains terms with sentiment in IT context, used to filter terms in tickets for sentiment analysis. We created and evaluate three classification methods for calculating the polarity of terms in tickets. Our study was developed using 34,895 tickets from five organizations. For polarity, we randomly selected 2.333 tickets to compose a gold standard. Our best results display an average precision and recall of 82.83% and 88.42%, respectively, which outperforms the compared sentiment analysis solutions. Complementarily, emotions in tickets were studied considering the models of Ekman and VAD. One of the three classification methods created has been adapted to also identify emotions in the tickets. Possible correlations between polarity and emotions were verified through association rules. Results correlate positive tickets with valence and dominance high and low excitation, besides presence of joy and surprise and absence of fear. Negative tickets correlate with valence, neutral excitement and dominance, besides absence of joy and presence of fear. However the results for negative polarity are not accurate.
|
20 |
Aspect extraction in sentiment analysis for portuguese language / Extração de aspectos em análise de sentimentos para língua portuguesaBalage Filho, Pedro Paulo 29 August 2017 (has links)
Aspect-based sentiment analysis is the field of study which extracts and interpret the sentiment, usually classified as positive or negative, towards some target or aspect in an opinionated text. This doctoral dissertation details an empirical study of techniques and methods for aspect extraction in aspect-based sentiment analysis with the focus on Portuguese. Three different approaches were explored: frequency-based, relation-based and machine learning. In each one, this work shows a comparative study between a Portuguese and an English corpora and the differences found in applying the approaches. In addition, richer linguistic knowledge is also explored by using syntatic dependencies and semantic roles, leading to better results. This work lead to the establishment of new benchmarks for the aspect extraction in Portuguese. / A análise do sentimento orientada a aspectos é o campo de estudo que extrai e interpreta o sentimento, geralmente classificado como positivo ou negativo, em direção a algum alvo ou aspecto em um texto de opinião. Esta tese de doutorado detalha um estudo empírico de técnicas e métodos para extração de aspectos em análises de sentimentos baseadas em aspectos com foco na língua Portuguesa. Foram exploradas três diferentes abordagens: métodos baseados na frequências, métodos baseados na relação e métodos de aprendizagem de máquina. Em cada abordagem, este trabalho mostra um estudo comparativo entre um córpus para o Português e outro para o Inglês e as diferenças encontradas na aplicação destas abordagens. Além disso, o conhecimento linguístico mais rico também é explorado pelo uso de dependências sintáticas e papéis semânticos, levando a melhores resultados. Este trabalho resultou no estabelecimento de novos padrões de avaliação para a extração de aspectos em Português.
|
Page generated in 0.077 seconds