Global ETD Search

51	Experimentos comparativos combinando aprendizado supervisionado e tradução automática para mineração de emoçoes em textos multilíngues / Comparative experiments combining supervised learning and machine translation for multilingual emotion mining Santos, Aline Graciela Lermen dos January 2016 With the growth of the Internet around the world, people began to interact more and more with the Web, especially after the emergence of social networks, creating content that can be exploited in several ways. This increase in the number of users has been global, that is, people from different countries started producing texts in several languages. These texts comprise rich content for Multilingual Sentiment Analysis. Most work in the area focuses on Opinion Mining, analyzing sentiment through polarity. Another type of sentiment that has attracted attention is emotion, although it has not been extensively explored in Multilingual Sentiment Analysis. This work applies techniques commonly used for Opinion Mining and polarity to Multilingual Sentiment Analysis using emotion. The objective of this study is to compare different combinations of supervised machine learning and automatic translation to create corpora in different languages from existing annotated corpora. Two ways of using the translations are compared: creating separate emotion classifiers per language, called monolingual, and creating a single classifier composed of the original language and its translations, called multilingual. An experiment crossing two corpora is also carried out, to evaluate the use of the translation of one corpus with the original texts of the other. The results of the experiments show not only that emotion can be analysed successfully using supervised machine learning and automatic translation, but also that the multilingual classifier outperforms the monolingual classifiers. The experiment crossing the corpora shows that for some emotions the corpora are aligned, but that for others greater similarity between the texts is needed.
Texts : Analysis Mining : Data Emotions Texts for language learning Sentiment analysis Multilingual sentiment analysis Emotion Emotion mining
52	Inferring Aspect-Specific Opinion Structure in Product Reviews Carter, David January 2015 Identifying differing opinions on a given topic as expressed by multiple people (as in a set of written reviews for a given product, for example) presents challenges. Opinions about a particular subject are often nuanced: a person may have both negative and positive opinions about different aspects of the subject of interest, and these aspect-specific opinions can be independent of the overall opinion on the subject. Being able to identify, collect, and count these nuanced opinions in a large set of data offers more insight into the strengths and weaknesses of competing products and services than does aggregating the overall ratings of such products and services. I make two useful and useable contributions in working with opinionated text. First, I present my implementation of a semi-supervised co-training machine classification method for identifying both product aspects (features of products) and sentiments expressed about such aspects. It offers better precision than fully-supervised methods while requiring much less text to be manually tagged (a time-consuming process). This algorithm can also be run in a fully supervised manner when more data is available. Second, I apply this co-training approach to reviews of restaurants and various electronic devices; such text contains both factual statements and opinions about features/aspects of products. The algorithm automatically identifies the product aspects and the words that indicate aspect-specific opinion polarity, while largely avoiding the problem of misclassifying the products themselves as inherently positive or negative. This method performs well compared to other approaches. 
When run on a set of reviews of five technology products collected from Amazon, the system performed with some demonstrated competence (with an average precision of 0.83) at the difficult task of simultaneously identifying aspects and sentiments, though comparison to contemporaries' simpler rules-based approaches was difficult. When run on a set of opinionated sentences about laptops and restaurants that formed the basis of a shared challenge in the SemEval-2014 Task 4 competition, it was able to classify the sentiments expressed about aspects of laptops better than any team that competed in the task (achieving 0.72 accuracy). It was above the mean in its ability to identify the aspects of restaurants about which people expressed opinions, even when co-training using only half of the labelled training data at the outset. While the SemEval-2014 aspect-based sentiment extraction task considered only separately the tasks of identifying product aspects and determining their polarities, I take an extra step and evaluate sentences as a whole, inferring aspects and the aspect-specific sentiments expressed simultaneously, a more difficult task that seems more applicable to real-world tasks. I present first results of this sentence-level task. The algorithm uses both lexical and syntactic information in a manner that is shown to be able to handle new words that it has never before seen. It offers some demonstrated ability to adapt to new subject domains for which it has no training data. The system is characterizable by very high precision and weak-to-average recall and it estimates its own confidence in its predictions; this characteristic should make the algorithm suitable for use on its own or for combination in a confidence-based voting ensemble. The software created for and described in the course of this dissertation is made available online. 
machine learning co-training natural language processing semi-supervised learning sentiment analysis aspect-based sentiment analysis computational linguistics sentiment classification
53	[en] MACHINE LEARNING FOR SENTIMENT CLASSIFICATION / [pt] APRENDIZADO DE MÁQUINA PARA O PROBLEMA DE SENTIMENT CLASSIFICATION PEDRO OGURI 18 May 2007 Sentiment Analysis is a text categorization problem in which we want to identify favorable and unfavorable opinions towards a given topic. Examples of such topics are organizations and their products. In this problem, documents are classified according to their sentiment, connotation, attitudes and opinions instead of being limited to the facts described in them. The main challenge in Sentiment Classification is identifying how sentiments are expressed in texts and whether they indicate a positive (favorable) or negative (unfavorable) opinion towards a topic. Due to the growing volume of information available online, in an environment where we all tend to be content generators and express opinions on a variety of subjects, Machine Learning techniques have become more and more attractive. In this dissertation, we investigate Machine Learning methods applied to Sentiment Analysis. We present document representation models such as bag-of-words and N-grams. We compare the performance of the Naive Bayes and the Support Vector Machine (SVM) classifiers for each proposed model.
MACHINE LEARNING BAYESIAN CLASSIFIERS TEXT CLASSIFICATION SUPPORT VECTOR MACHINES SENTIMENT ANALYSIS
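The representation-plus-classifier setup described in entry 53 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the tokenizer, the add-one smoothing, the toy training sentences, and the names `ngram_features` and `NaiveBayes` are all assumptions made for illustration, and only the Naive Bayes side of the comparison is shown.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_max=2):
    """Return all 1..n_max-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels, n_max=2):
        self.n_max = n_max
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # feature counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text, n_max)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in ngram_features(text, self.n_max):
                if feat in self.vocab:  # skip features never seen in training
                    score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-written training data (hypothetical, not from the dissertation).
train_texts = ["great phone , love it", "terrible battery , not good",
               "love the screen", "not good at all"]
train_labels = ["positive", "negative", "positive", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Setting `n_max=1` gives a plain bag-of-words model, so the same class can be used to compare the two representations on a held-out set.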
54	Sentiment Analysis & Time Series Analysis on Stock Market Singh, Aniket Kumar 28 April 2023 No description available. Artificial Intelligence Computer Science Stock Market Sentiment Analysis Opinion Mining Twitter Sentiment Analysis Time Series Analysis Sentiment & Time Series Model
55	Graph-based approaches for semi-supervised and cross-domain sentiment analysis Ponomareva, Natalia January 2014 The rapid development of Internet technologies has resulted in a sharp increase in the number of Internet users who create content online. User-generated content often represents people's opinions, thoughts, speculations and sentiments and is a valuable source of information for companies, organisations and individual users. This has led to the emergence of the field of sentiment analysis, which deals with the automatic extraction and classification of sentiments expressed in texts. Sentiment analysis has been intensively researched over the last ten years, but there are still many issues to be addressed. One of the main problems is the lack of labelled data necessary to carry out precise supervised sentiment classification. In response, research has moved towards developing semi-supervised and cross-domain techniques. Semi-supervised approaches still need some labelled data and their effectiveness is largely determined by the amount of these data, whereas cross-domain approaches usually perform poorly if training data are very different from test data. The majority of research on sentiment classification deals with the binary classification problem, although for many practical applications this rather coarse sentiment scale is not sufficient. Therefore, it is crucial to design methods which are able to perform accurate multiclass sentiment classification. The aims of this thesis are to address the problem of limited availability of data in sentiment analysis and to advance research in semi-supervised and cross-domain approaches for sentiment classification, considering both binary and multiclass sentiment scales. We adopt graph-based learning as our main method and explore the most popular and widely used graph-based algorithm, label propagation. 
We investigate various ways of designing sentiment graphs and propose a new similarity measure which is unsupervised, easy to compute, does not require deep linguistic analysis and, most importantly, provides a good estimate for sentiment similarity as proved by intrinsic and extrinsic evaluations. The main contribution of this thesis is the development and evaluation of a graph-based sentiment analysis system that a) can cope with the challenges of limited data availability by using semi-supervised and cross-domain approaches b) is able to perform multiclass classification and c) achieves highly accurate results which are superior to those of most state-of-the-art semi-supervised and cross-domain systems. We systematically analyse and compare semi-supervised and cross-domain approaches in the graph-based framework and propose recommendations for selecting the most pertinent learning approach given the data available. Our recommendations are based on two domain characteristics, domain similarity and domain complexity, which were shown to have a significant impact on semi-supervised and cross-domain performance. 004.678
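Label propagation, the graph-based algorithm named in entry 55, can be sketched in a few lines. This is a generic textbook-style version under stated assumptions, not the thesis's system: the edge weights below are made-up similarities (the thesis proposes its own similarity measure, which is not reproduced here), and the function name and parameters are hypothetical.

```python
def label_propagation(weights, seeds, n_classes, iterations=50):
    """Propagate class distributions over a weighted graph.

    weights: dict mapping node -> {neighbour: similarity}; assumed symmetric.
    seeds:   dict mapping labelled node -> class index (clamped every round).
    Returns a dict mapping node -> list of class scores that sum to 1.
    """
    uniform = [1.0 / n_classes] * n_classes
    scores = {}
    for node in weights:
        if node in seeds:
            scores[node] = [1.0 if c == seeds[node] else 0.0 for c in range(n_classes)]
        else:
            scores[node] = list(uniform)

    for _ in range(iterations):
        updated = {}
        for node, neighbours in weights.items():
            total = sum(neighbours.values())
            if node in seeds or total == 0:
                updated[node] = scores[node]  # labelled or isolated: keep as is
                continue
            # Unlabelled node: similarity-weighted average of neighbour scores.
            updated[node] = [
                sum(w * scores[nb][c] for nb, w in neighbours.items()) / total
                for c in range(n_classes)
            ]
        scores = updated
    return scores


# Hypothetical document graph: "a" is labelled negative (0), "d" positive (1).
graph = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.2},
    "c": {"a": 0.1, "b": 0.2, "d": 0.8},
    "d": {"c": 0.8},
}
result = label_propagation(graph, seeds={"a": 0, "d": 1}, n_classes=2)
```

In a cross-domain setting the seeds would come from the source domain and the unlabelled nodes from the target domain; with more than two classes the same code applies by raising `n_classes`.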
56	Probabilistic topic models for sentiment analysis on the Web Chenghua, Lin January 2011 Sentiment analysis aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text, and has received a rapid growth of interest in natural language processing in recent years. Probabilistic topic models, on the other hand, are capable of discovering hidden thematic structure in large archives of documents, and have been an active research area in the field of information retrieval. The work in this thesis focuses on developing topic models for automatic sentiment analysis of web data, by combining the ideas from both research domains. One noticeable issue of most previous work in sentiment analysis is that the trained classifier is domain dependent, and the labelled corpora required for training could be difficult to acquire in real world applications. Another issue is that the dependencies between sentiment/subjectivity and topics are not taken into consideration. The main contribution of this thesis is therefore the introduction of three probabilistic topic models, which address the above concerns by modelling sentiment/subjectivity and topic simultaneously. The first model is called the joint sentiment-topic (JST) model based on latent Dirichlet allocation (LDA), which detects sentiment and topic simultaneously from text. Unlike supervised approaches to sentiment classification which often fail to produce satisfactory performance when applied to new domains, the weakly-supervised nature of JST makes it highly portable to other domains, where the only supervision information required is a domain-independent sentiment lexicon. Apart from document-level sentiment classification results, JST can also extract sentiment-bearing topics automatically, which is a distinct feature compared to the existing sentiment analysis approaches. 
The second model is a dynamic version of JST called the dynamic joint sentiment-topic (dJST) model. dJST respects the ordering of documents, and allows the analysis of topic and sentiment evolution of document archives that are collected over a long time span. By accounting for the historical dependencies of documents from the past epochs in the generative process, dJST gives a richer posterior topical structure than JST, and can better respond to the permutations of topic prominence. We also derive online inference procedures based on a stochastic EM algorithm for efficiently updating the model parameters. The third model is called the subjectivity detection LDA (subjLDA) model for sentence-level subjectivity detection. Two sets of latent variables were introduced in subjLDA. One is the subjectivity label for each sentence; another is the sentiment label for each word token. By viewing the subjectivity detection problem as weakly-supervised generative model learning, subjLDA significantly outperforms the baseline and is comparable to the supervised approach which relies on much larger amounts of data for training. These models have been evaluated on real world datasets, demonstrating that joint sentiment topic modelling is indeed an important and useful research area with much to offer in the way of good results. 004.01
57	Diseño e implementación de un sistema para la clasificación de tweets según su polaridad / Design and implementation of a system for classifying tweets by polarity Tapia Caro, Pablo Andrés January 2014 Industrial Civil Engineer / The high penetration of Twitter in Chile has led companies, politicians and organizations to use this social network as a means of obtaining additional information about users' opinions of their products, their services, or themselves. Because comments on Twitter are public by default, they can be analyzed to extract actionable information. Companies in particular, besides being interested in quantitative information, want to know the polarity of these mentions, since an increase in the number of comments may be due to a larger number of either positive or negative mentions. Although a considerable number of software packages include sentiment polarity detection, they are of little use here, because the way Chilean users interact with the platform is full of idioms specific to the local language and of abbreviations that stem mainly from Twitter's character limit. Because this is an immature industry in Chile, sentiment polarity detection is being done manually by advertising agencies and other kinds of companies, but given the large number of comments produced every minute, the task is very costly in time and money. To address this kind of problem, machine learning techniques are used to train an algorithm that can then determine whether a comment is positive, negative or neutral, a field known as sentiment analysis. 
The more data is processed to train the algorithm, the better the classifier performs, and since comments are easy to obtain from Twitter through its API, unlike from the web in general, techniques have been formulated to automatically generate the corpus containing the training tweets for each class and so take advantage of this property. This work develops further a semi-automatic, emoticon-based methodology for generating a corpus of tweets for sentiment polarity detection on Twitter. It does so by introducing a new approach to consolidating the training data through filters that improve the automatic labelling. This prevents erratic comments that cause noise in the training and classification phases. In addition, a new class of tweets not previously considered is introduced, consisting of tweets that lack sufficient information to be classified as positive, negative or neutral, so that assigning them to any of these classes reduces the precision of the system. Experimental evaluations showed that using this fourth class, called irrelevant, together with the filter criterion presented for generating the corpus, improves the performance of the system. It was also shown experimentally that a corpus generated from Chilean tweets classifies comments from local users better. Social networks - Chile Data mining Twitter Web opinion mining Sentiment analysis
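The emoticon-based corpus generation with filters described in entry 57 might look roughly like the sketch below. The abstract does not specify the actual filters, so everything here is an assumption for illustration: the emoticon lists, the `MIN_WORDS` threshold used to flag "irrelevant" tweets, the rule that discards tweets with contradictory emoticons, and the function names. The neutral class is left out because emoticons alone do not identify it.

```python
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS
MIN_WORDS = 3  # hypothetical threshold for "enough information"


def label_tweet(text):
    """Return 'positive', 'negative', 'irrelevant', or None (discard)."""
    tokens = text.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    # Content words: everything except emoticons, mentions and links.
    words = [t for t in tokens
             if t not in ALL_EMOTICONS and not t.startswith(("@", "http"))]
    if has_pos and has_neg:
        return None          # contradictory signal: filtered out as noise
    if not (has_pos or has_neg):
        return None          # no emoticon: cannot be labelled automatically
    if len(words) < MIN_WORDS:
        return "irrelevant"  # too little content to judge polarity
    return "positive" if has_pos else "negative"


def build_corpus(tweets):
    """Group automatically labelled tweets by class, dropping discarded ones."""
    corpus = {"positive": [], "negative": [], "irrelevant": []}
    for tweet in tweets:
        label = label_tweet(tweet)
        if label is not None:
            corpus[label].append(tweet)
    return corpus


sample = [
    "I love this service :)",
    "awful support again today :(",
    "@company :)",
    "nice :) but slow :(",
    "no emoticon in this one",
]
corpus = build_corpus(sample)
```

A real pipeline would normally strip the emoticons from the stored text before training, so that the classifier learns from the words rather than from the labelling signal itself.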
58	Towards a science of human stories: using sentiment analysis and emotional arcs to understand the building blocks of complex social systems Reagan, Andrew James 01 January 2017 We can leverage data and complex systems science to better understand society and human nature on a population scale through language, utilizing tools that include sentiment analysis, machine learning, and data visualization. Data-driven science and the sociotechnical systems that we use every day are enabling a transformation from hypothesis-driven, reductionist methodology to complex systems sciences. Namely, the emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, with profound implications for our understanding of human behavior. Advances in computing power, natural language processing, and digitization of text now make it possible to study a culture's evolution through its texts using a "big data" lens. Given the growing assortment of sentiment measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both their classification accuracy and their ability to provide richer understanding of texts. Here, we perform detailed, quantitative tests and qualitative assessments of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that while inappropriate for sentences, dictionary-based methods are generally robust in their classification accuracy for longer texts. Most importantly they can aid understanding of texts with reliable and meaningful word shift graphs if (1) the dictionary covers a sufficiently large portion of a given text's lexicon when weighted by word usage frequency; and (2) words are scored on a continuous scale. 
Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories, forming patterns that are meaningful to us. By classifying the emotional arcs for a filtered subset of 4,803 stories from Project Gutenberg's fiction collection, we find a set of six core trajectories which form the building blocks of complex narratives. We strengthen our findings by separately applying optimization, linear decomposition, supervised learning, and unsupervised learning. For each of these six core emotional arcs, we examine the closest characteristic stories in publication today and find that particular emotional arcs enjoy greater success, as measured by downloads. Within stories lie the core values of social behavior, rich with both strategies and proper protocol, which we can begin to study more broadly and systematically as a true reflection of culture. Of profound scientific interest will be the degree to which we can eventually understand the full landscape of human stories, and data driven approaches will play a crucial role. Finally, we utilize web-scale data from Twitter to study the limits of what social data can tell us about public health, mental illness, discourse around the protest movement of #BlackLivesMatter, discourse around climate change, and hidden networks. We conclude with a review of published works in complex systems that separately analyze charitable donations, the happiness of words in 10 languages, 100 years of daily temperature data across the United States, and Australian Rules Football games. Complex systems Emotion Narratives Sentiment analysis Stories Visualization Computer Sciences Mathematics
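The two conditions stated in entry 58 (frequency-weighted dictionary coverage and continuous word scores) and the idea of an emotional arc can be made concrete with a small sketch. The word scores below are invented values on an assumed 1 to 9 scale, and the sliding-window arc is a common simplification rather than the dissertation's exact procedure; all function names are hypothetical.

```python
from collections import Counter

# Hypothetical continuous word scores on a 1 (sad) to 9 (happy) scale.
WORD_SCORES = {"love": 8.4, "happy": 8.3, "win": 7.8,
               "lost": 2.8, "war": 1.8, "death": 1.5}


def text_score(tokens, scores=WORD_SCORES):
    """Frequency-weighted mean score of dictionary words; None if none are covered."""
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total


def coverage(tokens, scores=WORD_SCORES):
    """Fraction of token occurrences that the dictionary can score."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in scores) / len(tokens)


def emotional_arc(tokens, window, step):
    """Score overlapping windows of the token stream to trace sentiment over a text."""
    last_start = max(len(tokens) - window, 0)
    return [text_score(tokens[i:i + window]) for i in range(0, last_start + 1, step)]


story = ["happy", "love", "win", "the", "war", "death", "lost", "then", "love", "happy"]
arc = emotional_arc(story, window=4, step=3)
```

On this toy token stream the arc starts high, dips in the middle and partly recovers; on real books the window would span thousands of words, which is consistent with the abstract's point that dictionary methods suit longer texts better than single sentences.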
59	LARGE-SCALE NETWORK ANALYSIS FOR ONLINE SOCIAL BRAND ADVERTISING Zhang, Kunpeng, Bhattacharyya, Siddhartha, Ram, Sudha 12 1900 This paper proposes an audience selection framework for online brand advertising based on user activities on social media platforms. It is one of the first studies to our knowledge that develops and analyzes implicit brand-brand networks for online brand advertising. This paper makes several contributions. We first extract and analyze implicit weighted brand-brand networks, representing interactions among users and brands, from a large dataset. We examine network properties and community structures and propose a framework combining text and network analyses to find target audiences. As a part of this framework, we develop a hierarchical community detection algorithm to identify a set of brands that are closely related to a specific brand. This latter brand is referred to as the "focal brand." We also develop a global ranking algorithm to calculate brand influence and select influential brands from this set of closely related brands. This is then combined with sentiment analysis to identify target users from these selected brands. To process large-scale datasets and networks, we implement several MapReduce-based algorithms. Finally, we design a novel evaluation technique to test the effectiveness of our targeting framework. Experiments conducted with Facebook data show that our framework provides significant performance improvements in identifying target audiences for focal brands. Online advertising brand-brand networks community detection audience selection sentiment analysis
60	A SENTIMENT BASED AUTOMATIC QUESTION-ANSWERING FRAMEWORK Qiaofei Ye (6636317) 14 May 2019 With the rapid growth and maturity of the Question-Answering (QA) domain, non-factoid Question-Answering tasks are in high demand. However, existing Question-Answering systems are either fact-based, or highly keyword related and hard-coded. Moreover, if QA is to become more personable, the sentiment of the question and answer should be taken into account. However, little research has been done on non-factoid Question-Answering systems based on sentiment analysis that would enable a system to retrieve answers in a more emotionally intelligent way. This study investigates to what extent prediction of the best answer could be improved by adding an extended representation of sentiment information into non-factoid Question-Answering. Applied Computer Science Natural Language Processing Non-factoid Question-Answering Sentiment analysis Long Short-Term Memory
