A contribution to topological learning and its application in Social Networks / Une contribution à l'apprentissage topologique et son application dans les réseaux sociaux

Ezzeddine, Diala 01 October 2014
Supervised Learning is a popular field of Machine Learning that has been making steady progress for several years. Many techniques have been developed to solve the classification problem, but in most cases these methods rely on the presence and number of points of a given class in regions of the space that the classifier must define. As a result, the construction of the classifier depends on the density of the point cloud of the initial data. In this thesis, we show that using the topology of the data can be a good alternative when constructing classifiers. To this end, we propose using topological graphs such as the Gabriel Graph (GG) or the Relative Neighborhood Graph (RNG). These graphs represent the topology of the data because they are based on the notion of neighborhood and do not depend on density. To apply this concept, we create a new method called Random Neighborhood Classification (RNC). This method uses topological graphs to construct classifiers. Moreover, like an Ensemble Method (EM), it uses several classifiers to extract all the relevant information from the data. EMs are well known in Machine Learning. They generate many classifiers from the data and then aggregate them into a single one. The resulting global classifier is recognized as very effective, as many studies have shown. This is possible because it relies on information obtained from each of its component classifiers. We compared RNC with other well-known supervised classification methods on data from the UCI Irvine repository. We find that RNC performs well compared with the best of them, such as Random Forests (RF) and Support Vector Machines (SVM). 
Most of the time, RNC ranks among the top three methods in terms of effectiveness. This result encouraged us to study RNC on real data such as tweets. Twitter is a micro-blogging social network. It is particularly useful for studying opinion on current events and on any topic, politics in particular. However, extracting political opinion from Twitter poses particular challenges. Indeed, the length of the messages, the register of language used, and the ambiguity of the messages make it very difficult to use classical text-analysis tools based on word-frequency counts or in-depth sentence analysis. This is what motivated this study. We propose studying author/subject couples in order to classify a tweet according to its author's opinion of a politician (a subject of the tweet). We propose a procedure for identifying these opinions. We believe that tweets rarely express an objective opinion on a particular action of a politician, but more often a deep conviction of the author about a political movement. Detecting the opinion of a few authors then lets us use the similarity in the terms employed by the others to recover these convictions on a larger scale. The procedure has two steps: first, identify the opinion of a few couples semi-automatically in order to build a reference set; then use all the tweets of a couple (all the tweets by an author that mention a politician) to compare them with those of the reference set. Topological Learning seems to be a very interesting field to study, in particular for solving classification problems...... / Supervised Learning is a popular field of Machine Learning that has made recent progress. In particular, many methods and procedures have been developed to solve the classification problem. 
Most classical methods in Supervised Learning use density estimation of the data to construct their classifiers. In this dissertation, we show that the topology of data can be a good alternative in constructing classifiers. We propose using topological graphs such as Gabriel graphs (GG) and Relative Neighborhood Graphs (RNG), which capture the topology of data based on its neighborhood structure. To apply this concept, we create a new method called Random Neighborhood Classification (RNC). In this method, we use topological graphs to construct classifiers and then apply Ensemble Methods (EM) to get all relevant information from the data. EM, which is well known in Machine Learning, generates many classifiers from data and then aggregates these classifiers into one. Aggregate classifiers have been shown to be very efficient in many studies, because they leverage relevant and effective information from each generated classifier. We first compare RNC to other known classification methods using data from the UCI Irvine repository. We find that RNC works very well compared to very efficient methods such as Random Forests and Support Vector Machines. Most of the time, it ranks among the top three methods in efficiency. This result has encouraged us to study the efficiency of RNC on real data such as tweets. Twitter, a microblogging social network, is especially useful for mining opinion on current affairs and topics that span the range of human interest, including politics. Mining political opinion from Twitter poses peculiar challenges, such as the versatility of authors when they express their political views, that motivate this study. We define a new attribute, called a couple, that is very helpful in studying the opinion of tweets. A couple is an author who talks about a politician. We propose a new procedure that focuses on identifying the opinion of tweets using couples. 
We think that focusing on a couple's opinion, as expressed across several tweets, can overcome the problems of analysing each tweet in isolation. This approach can help avoid versatility, language ambiguity, and many other artifacts that are easy for a human being to understand but not for a machine. We use classical Machine Learning techniques such as KNN and Random Forests (RF), as well as our method RNC. We proceed in two steps. First, we build a reference set of classified couples using Naive Bayes. We also apply an alternative to Naive Bayes, a sampling plan procedure, to compare and evaluate the Naive Bayes results. Second, we evaluate the performance of this approach using proximity measures so that RNC, RF and KNN can be applied. The experiments are based on real tweets from the 2012 French presidential election. The results show that this approach works well and that RNC performs very well at classifying opinion in tweets. Topological Learning seems to be a very interesting field to study, in particular for addressing the classification problem. Many concepts for extracting information from topological graphs still need to be analysed, such as those described by Aupetit, M. in his work (2005). Our work shows that Topological Learning can be an effective way to solve classification problems.
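
As an illustration of the two topological graphs named in the abstract above, the following minimal sketch builds the Gabriel graph and the Relative Neighborhood Graph of a small point set from their standard geometric definitions. It is not the RNC method itself, which the abstract does not specify in detail; the function names `gabriel_edges` and `rng_edges` are hypothetical and chosen only for this example.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_edges(points):
    """Edges (i, j) such that no third point lies strictly inside the ball
    whose diameter is the segment between points i and j."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = any(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.add((i, j))
    return edges


def rng_edges(points):
    """Edges (i, j) such that no third point is strictly closer to both
    i and j than they are to each other."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = any(
            max(sq_dist(points[i], points[k]), sq_dist(points[j], points[k])) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.add((i, j))
    return edges


# Illustrative data: the third point sits between the first two,
# so it blocks the direct edge between them in both graphs.
sample = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
print(gabriel_edges(sample))
print(rng_edges(sample))
```

Both constructions depend only on which points are neighbors of which, not on how many points fall in a region, which is the density-independence property the abstract relies on. The brute-force check over every third point is cubic in the number of points and is meant for illustration only.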

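Worüber reden die Kunden? – Ein modelbasierter Ansatz für die Analyse von Kundenmeinungen in Microblogs / What Are Customers Talking About? A Model-Based Approach to Analysing Customer Opinions in Microblogs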

Schieber, Andreas, Sommer, Stefan, Heinrich, Kai, Hilbert, Andreas January 2011
In social commerce, customers are becoming an important source of information for companies. Customers use the communication platforms of Web 2.0 (e.g. Twitter) to express their opinions and experiences about products. These discussions can be very important for a company's product development. A model-based approach is intended to enable a company to examine opinions about its products in microblogs. The first step is to detect topics in a specific context. In a further step, the entries corresponding to those topics must be analysed with respect to the opinions expressed. Topic detection uses a method based on Latent Dirichlet Allocation. The method identified event-based topics related to Sony's 3D TV sets.

Comparative text summarization of product reviews

Singi Reddy, Dinesh Reddy January 1900
Master of Science / Department of Computing and Information Sciences / William H. Hsu / This thesis presents an approach to summarizing product reviews using comparative sentences and sentiment analysis. Specifically, we consider the problem of extracting and scoring features from natural language text for qualitative reviews in a particular domain. When shopping for a product, customers rarely have enough time to learn about all products on the market. Similarly, manufacturers do not have proper written sources from which to learn about customer opinions. The only available techniques involve gathering customer opinions, often in text form, from e-commerce and social networking web sites and analyzing them, which is a costly and time-consuming process. In this work I address these issues by applying sentiment analysis, an automated method of finding the opinion stated by an author about some entity in a text document. Here I first gather information about smartphones from many e-commerce web sites. I then present a method to differentiate comparative sentences from normal sentences, form feature sets for each domain, and assign a numerical score to each feature of a product and a weight coefficient obtained by statistical machine learning, to be used as a weight for that feature in ranking various products by linear combinations of their weighted feature scores. In this thesis I also explain what role comparative sentences play in summarizing the product. In order to find the polarity of each feature, a statistical algorithm is defined using a small-to-medium sized data set. Then I present my experimental environment and results, and conclude with a review of claims and hypotheses stated at the outset. The approach specified in this thesis is evaluated using manually annotated training data and also using data from domain experts. 
I also demonstrate empirically how different algorithms on this summarization can be derived from the technique provided by an annotator. Finally, I review diversified options for customers such as providing alternate products for each feature, top features of a product, and overall rankings for products.
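
The ranking step described above, ordering products by a linear combination of weighted feature scores, can be sketched as follows. This is only an illustration under assumed inputs: the product names, feature names, scores and weights are invented for the example, and the function name `rank_products` is hypothetical. In the thesis the weights are obtained by statistical machine learning, which is not reproduced here.

```python
def rank_products(feature_scores, weights):
    """Return (product, total) pairs sorted by the weighted sum of
    per-feature scores, highest first. Features without a weight count as 0."""
    totals = {
        product: sum(weights.get(feature, 0.0) * score
                     for feature, score in scores.items())
        for product, scores in feature_scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sentiment-derived scores per feature (illustrative values only).
scores = {
    "phone_a": {"battery": 0.8, "camera": 0.2},
    "phone_b": {"battery": 0.3, "camera": 0.9},
}
# Hypothetical learned weights expressing how much each feature matters.
weights = {"battery": 0.7, "camera": 0.3}

for product, total in rank_products(scores, weights):
    print(product, round(total, 2))
```

Keeping per-feature scores separate from the weights also makes it straightforward to report the top features of a single product or the best alternative for one feature, two of the customer-facing options the abstract mentions.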

基於意見探勘與主題模型之部落格食記剖析研究 / A Study of Opinion Mining and Topic Model Analysis on Food Diaries

賴柏帆, Lai, Po Fan Unknown Date
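With the rise of Web 2.0, social networking sites account for a large share of how information is transmitted and obtained. In the food domain, it is increasingly common for people to read food reviews before going to a restaurant, and blog posts, being rich in both text and pictures, are often used by consumers as a source for reference and comparison. Although the content of such food diaries is more complete than short food reviews, the opinions are scattered throughout the article and mostly come without a rating for reference, so readers find it hard to grasp the overall picture of the review at first glance and must spend considerable effort reading before they can assess the restaurant as a whole. This study proposes a food-diary analysis method based on opinion mining and topic models. It uses the amount of sentiment in blog posts about each restaurant to reflect positive and negative evaluations, groups the opinions mentioned into three rating aspects, "food", "service" and "environment", and then provides an overall recommendation score for the restaurant for readers' quick reference. The experimental corpus was selected from food posts on PIXNET (痞客邦) covering 添好運台灣-台北站前店, 京星港式飲茶PART2, 金泰日式料理-內湖店 and 喀佈貍(一店)大眾和風串燒居酒洋食堂, for a total of 4 restaurants and 200 documents. The LDA topic model is used to cluster food-diary statements by topic, so that sentences with similar topical concepts are grouped together and assigned to the aspects; for example, the corpus for 喀佈貍(一店) can be divided into 10 groups of topic sentences, 6 on the food aspect and 2 each on the service and environment aspects. In addition, to identify positive and negative sentiment in food diaries more effectively, this study uses the semantic orientation method (SO-PMI) to compute the polarity of sentiment words that frequently appear in food diaries, in order to build an opinion lexicon for the domain. As for the experimental results, the online restaurant review site iPeen (愛評網) was used for validation; the average amount of sentiment in its corpus is similar, and it tends to agree in terms of public perception and evaluation. Moreover, compared with general review sites, this study can approach restaurants from finer-grained aspects and use the amount of sentiment to reflect real restaurant evaluations. Finally, we propose issues to explore and improve in the future, for reference in subsequent research. / With the rise of Web 2.0, social media platforms play a crucial role in transmitting and receiving information. More and more people are used to reading related posts before going out for a meal. Because of their rich content and accompanying photographs, blog posts are most frequently used for reference. Although blog posts are more complete in content than other short reviews, the actual opinions are scattered among passages that are simply descriptive, and there is no rating scale to refer to. Together, these make it hard for readers to form an overview of the review efficiently and, therefore, to decide whether they should go to the restaurant. Our study offers a method of analyzing food diaries based on opinion mining and topic models. The amount of sentiment in a blog post about a restaurant is used to reflect whether its review is positive or negative. The comments are categorized into food, service and environment, and the restaurant is graded on these three aspects to provide the user with an overall recommendation score. We collected a total of 200 articles written about 4 restaurants on PIXNET, then categorized the contents by theme using the LDA (Latent Dirichlet Allocation) model. 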
Sentences with similar themes are put into a group and then further assigned to the three aspects mentioned earlier. In addition, to better distinguish whether the sentiment in a given food diary is positive or negative, our study calculated the polarity of common opinion words in food diaries using semantic orientation (SO-PMI) and built an opinion lexicon specifically for food diaries. As for the results, using iPeen, a restaurant rating website, as the test reference, the average sentiment scores we obtained for the restaurants with our method are close to those of iPeen, which suggests that they are close to public opinion and reviews. Furthermore, compared with common rating websites, our study addresses even fine-grained aspects and uses the cumulative opinion to reflect the blog authors' true evaluation of the restaurant. Lastly, we raise what we intend to discuss and improve in the future, for the reference of upcoming research.
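
The SO-PMI polarity score mentioned above can be illustrated with a simplified, document-level variant: a word counts as positive when it co-occurs with positive seed words more often than with negative ones. The sketch below is an assumption-laden illustration, not the study's implementation. The seed words, the toy English corpus, the smoothing constant `eps` and the function name `so_pmi` are all hypothetical.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, eps=0.01):
    """Simplified semantic orientation of `word`.

    docs is a list of token sets. The score is
    log2( co(word, pos) * hits(neg) / (co(word, neg) * hits(pos)) ),
    where co counts documents containing the word and at least one seed,
    and hits counts documents containing at least one seed.
    eps is an assumed smoothing constant that avoids division by zero.
    """
    def hits(seeds):
        return sum(1 for d in docs if d & seeds)

    def co(seeds):
        return sum(1 for d in docs if word in d and d & seeds)

    numerator = (co(pos_seeds) + eps) * (hits(neg_seeds) + eps)
    denominator = (co(neg_seeds) + eps) * (hits(pos_seeds) + eps)
    return math.log2(numerator / denominator)


# Toy corpus standing in for segmented food-diary sentences.
corpus = [
    {"soup", "delicious"},
    {"soup", "tasty", "delicious"},
    {"service", "slow", "bad"},
    {"noodles", "bad"},
]
positive = {"delicious"}
negative = {"bad"}

print(so_pmi("soup", corpus, positive, negative) > 0)     # co-occurs with positive seeds
print(so_pmi("service", corpus, positive, negative) < 0)  # co-occurs with negative seeds
```

Words whose score is clearly above or below zero could then be added to a domain opinion lexicon with the corresponding polarity. For Chinese text, a word-segmentation step would be needed before building the token sets.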


Hsiao, Shih-Hui 01 January 2016
Given its relative infancy, there is a dearth of research offering a comprehensive view of business social media analytics (SMA). This dissertation first examines current literature related to SMA and develops an integrated, unifying definition of business SMA, providing a nuanced starting point for future business SMA research. This dissertation identifies several benefits of business SMA and elaborates on some of them, while presenting recent empirical evidence in support of the foregoing observations. The dissertation also describes several challenges facing business SMA today, along with supporting evidence from the literature, some of which also offers mitigating solutions in particular contexts. The second part of this dissertation studies one SMA application, focusing on identifying social influencers. Growing social media usage, accompanied by explosive growth in SMA, has resulted in increasing interest in finding automated ways of discovering influencers in online social interactions. Beginning in 2008, many variants of multiple basic approaches have been proposed. Yet there is no comprehensive study investigating the relative efficacy of these methods in specific settings. This dissertation investigates and reports on the relative performance of multiple methods on Twitter datasets containing between them tens of thousands to hundreds of thousands of tweets. Accordingly, the second part of the dissertation helps further an understanding of business SMA and its many aspects, grounded in recent empirical work, and is a basis for further research and development. This dissertation provides a relatively comprehensive understanding of SMA and the implementation of SMA in influencer identification.

Leveraging User-Generated Content for Enhancing and Personalizing News Recommendation. / Analyse des opinions pour personnaliser la recommandation d’articles dans les portails d’informations

Meguebli, Youssef 27 March 2015
The main motivation of this thesis is to propose a personalized recommendation system for news platforms. To this end, we have shown that opinions can be an effective descriptor for improving recommendation quality. In this thesis, we addressed the problem through three main contributions. First, we proposed a profile model that accurately describes users' interests as well as the content of news articles. The proposed profile model relies on three elements: named entities, aspects and sentiments. We tested our profile model on three different applications: identifying users' political orientations, personalized recommendation of news articles, and diversification of the list of recommended articles. Second, we proposed an opinion-ranking approach for filtering and selecting only the relevant opinions. For this, we used a variation of the PageRank technique to define the score of each opinion. The results show that our approach outperforms two recently proposed approaches to opinion ranking. Third, we studied different ways of enriching the content of news articles with opinions: with all opinions, with only the top-k opinions, and with a set of diversified opinions. The results show that enriching the content of news articles / In this thesis, we have investigated how to exploit user-generated content for personalized news recommendation. The intuition behind this line of research is that the opinions provided by users on news websites are a strong indicator of their profiles. We have addressed this problem by proposing three main contributions. 
Firstly, we have proposed a profile model that accurately describes both users' interests and news article contents. The profile model was tested on three different applications, ranging from identifying the political orientation of users to news recommendation and the diversification of the list of recommended news articles. Results show that our profile model gives much better results than state-of-the-art models. Secondly, we have investigated the problem of noise in opinions and how we can retrieve only relevant opinions in response to a given query. The proposed opinion ranking strategy is based on features of users' debates. We have used a variation of the PageRank technique to define the score of each opinion. Results show that our approach outperforms two recently proposed opinion ranking strategies, particularly for controversial topics. Thirdly, we have investigated different ways of leveraging opinions on news article contents, including all opinions, the top-k opinions based on the opinion ranking strategy, and a set of diverse opinions. To extract a list of diverse opinions, we have employed a variation of an existing opinion diversification model. Results show that diverse opinions give the best performance over the other leveraging strategies.
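
The abstract above scores opinions with a variation of PageRank but does not describe the variation. As a point of reference only, the sketch below runs plain PageRank by power iteration on a small, hypothetical debate graph in which an edge from one opinion to another means the first replies to or endorses the second. That edge semantics, the opinion identifiers and the function name `pagerank` are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    links maps each node to the list of nodes it points to.
    Nodes without outgoing links spread their rank evenly over all nodes.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical debate: opinions o2 and o3 both respond to o1; o4 responds to o2.
debate = {"o2": ["o1"], "o3": ["o1"], "o4": ["o2"], "o1": []}
scores = pagerank(debate)
top_k = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_k)
```

Sorting by score and keeping the first k entries corresponds to the top-k selection strategy mentioned in the abstract. A thesis-specific variant would presumably weight edges using debate features, which this sketch leaves out.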

Estudo e avaliação de métodos de análise de sentimentos baseada em aspectos para textos opinativos em português / Study and evaluation of methods of aspect based sentiment analysis for opinative texts in Portuguese

Machado, Mateus Tarcinalli 05 September 2018
The object of study of this dissertation is aspect-based sentiment analysis, an application derived from sentiment analysis and the field of natural language processing. Aspect-based sentiment analysis focuses on analysing evaluative texts (texts containing opinions), seeking to identify and relate sentiments and aspects of a given entity (products, services, among others). The main stages of this work are aspect identification, which seeks to identify the characteristics of a given entity in the text, and sentiment identification, which seeks to find the sentiment expressed by the author about the aspect mentioned. The objective of this work is to implement, analyse, improve and create unsupervised methods of aspect-based sentiment analysis for texts in Portuguese. This exploration is carried out by implementing methods for identifying aspects and sentiments and by creating and combining sentiment lexicons. To achieve this objective, we ran experiments with an annotated data set, that is, one with the related aspects and sentiments already marked in its text. For processing, in addition to natural language processing techniques such as grammatical analysis, methods for statistical analysis of the texts and results were used. / This dissertation studies aspect-based sentiment analysis, an application derived from sentiment analysis and the field of natural language processing. Aspect-based sentiment analysis focuses on analyzing evaluative texts (texts containing opinions), seeking to identify and relate sentiments and aspects of a particular entity (products, services, among others). 
The main stages of this work are aspect identification, which seeks to identify the characteristics of a given entity in texts, and sentiment identification, which aims to identify the sentiments expressed by the author about the mentioned aspects. The purpose of this work is to implement, analyze, improve and create unsupervised methods of aspect-based sentiment analysis and apply them to Portuguese-language texts. This exploration is carried out through the implementation of methods for identifying aspects and sentiments and the creation and combination of sentiment lexicons. To achieve this goal, we performed experiments with an annotated data set, that is, texts with the related aspects and sentiments already marked. For processing, in addition to natural language processing techniques such as grammatical analysis, methods of statistical analysis of texts and results were used.

Aplicação da mineração de opinião no planejamento turístico do município de Gramado / Application of opinion mining to tourism planning in the municipality of Gramado

Endres, Marco Antonio Trois 28 April 2016
The purpose of this study is to explore the knowledge discovery process and analyse the opportunities created by Opinion Mining as a technique for obtaining feedback on the tourist's experience of the products and services offered by a tourist destination. Understanding tourists' buying behaviour and travel habits is essential for expanding the tourism market and improving the visitor's tourist experience. Web users have the opportunity to record and share their ideas and opinions through comments on social networks. These opinions are available to organizations in large volume. In this context, the question arises: what does Opinion Mining contribute to generating useful information for the management of tourism activity, in support of decision-making in planning and in improving its actions? The setting for this study was the municipality of Gramado/RS and the comments posted on social networks by the tourists who visit it. To achieve the purpose of this study, opinions were extracted from Twitter and Facebook and submitted to a sentiment analysis technique. As a result, the study presents and discusses the results of applying Opinion Mining, consolidated according to the competitiveness dimensions on which the municipality is assessed. 
/ The purpose of this study is to explore the knowledge discovery process and analyze the opportunities created by Opinion Mining as a technique for obtaining feedback on the tourist's experience of the products and services offered by a tourist destination. Understanding tourists' buying behavior and travel habits is essential to expanding the tourism market and improving the tourist experience. Web users have the opportunity to record and share their ideas and opinions through posts on social networks. These opinions are available to organizations in high volume. In this context, what does Opinion Mining contribute to generating useful information for the management of tourism activities, in support of decision-making in planning and in improving their actions? This study analyzes the comments posted on social networks by tourists who visit Gramado/RS. To achieve the purpose of this study, opinions were extracted from Twitter and Facebook and submitted to a sentiment analysis technique. The study presents and discusses the results, summarized according to the competitiveness dimensions on which the municipality is assessed.

Methods and resources for sentiment analysis in multilingual documents of different text types

Balahur Dobrescu, Alexandra 13 June 2011
No description available.

