21 |
Análise de sentimentos em textos curtos provenientes de redes sociais / Sentiment analysis in short texts from social networks
Silva, Nadia Felix Felipe da, 22 February 2016
Sentiment analysis has recently gained popularity with the growth of the Internet and of user-generated content, especially on social networks, where people post opinions in colloquial language and often use graphical devices to make their messages even more succinct. Twitter is a typical example: a communication tool that can easily serve as a source of information for automatic sentiment-inference tools. Research has mostly treated sentiment analysis in social networks as a classification problem, with little consensus on which classifier has the best predictive power or on which feature-engineering configuration best represents the texts. A further problem is that, in a supervised setting, training the classification model requires labeled examples, and labeling is an arduous task that demands human effort in most applications. This thesis investigates classifier ensembles, exploring the diversity and potential of several supervised approaches when they act together, and presents a detailed study of the phase that precedes the choice of classifier, known as feature engineering.
It also shows that unsupervised learning can provide useful complementary constraints that improve the generalization ability of sentiment classifiers, giving evidence that gains already observed in other fields can also be obtained in this domain. Building on the promising experimental results in the supervised setting, boosted by unsupervised techniques, an existing algorithm called C3E (Consensus between Classification and Clustering Ensembles) was adapted and extended to the semi-supervised setting. The algorithm refines the sentiment classification using additional information provided by clustering, in a self-training procedure. The approach yields promising results that are competitive with state-of-the-art approaches from other domains.
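The ensemble and self-training ideas in this abstract can be illustrated with a small sketch. It is not C3E: it assumes a plain majority vote over three toy keyword classifiers and performs a single pseudo-labeling round, leaving the retraining step out. All function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter


def majority_vote(classifiers, text):
    """Return the ensemble's most common label and the share of votes it got."""
    votes = Counter(clf(text) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    return label, count / len(classifiers)


def self_train_round(classifiers, unlabeled, threshold=1.0):
    """One pseudo-labeling round: keep items whose vote share reaches the threshold.

    Assumption: a full self-training loop would retrain the classifiers on the
    pseudo-labeled items and repeat; that step is omitted here.
    """
    pseudo_labeled, remaining = [], []
    for text in unlabeled:
        label, share = majority_vote(classifiers, text)
        if share >= threshold:
            pseudo_labeled.append((text, label))
        else:
            remaining.append(text)
    return pseudo_labeled, remaining


# Toy keyword "classifiers", purely illustrative.
def has_good(text):
    return "pos" if "good" in text else "neg"


def has_love(text):
    return "pos" if "love" in text else "neg"


def no_bad(text):
    return "neg" if "bad" in text else "pos"


ENSEMBLE = [has_good, has_love, no_bad]
```

Only unanimous predictions are pseudo-labeled with the default threshold, which trades coverage for label quality; lowering it admits more, noisier examples.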
|
23 |
Development of an online reputation monitor
Venter, Gerhardus Jacobus Christiaan, January 2015
Customers' opinions about companies are very important, as they can influence a company’s profit. Companies often gather customer feedback through surveys or other official channels in order to improve their services. However, some customers feel threatened when asked for their opinions publicly and prefer to voice them on the internet, where they take comfort in anonymity. This form of customer feedback is difficult to monitor, as the information can be found anywhere on the internet and new information is generated at an astonishing rate.
Companies such as Brandseye and Brand.Com currently provide online reputation management services. These services have various shortcomings, such as cost and the inability to access historical data. Companies also cannot purchase the software outright and can only use it on a subscription basis.
The design proposed in this document will be able to scan any number of user-defined websites and save all the information found on them in a series of index files, which can be queried for occurrences of user-defined keywords at any time. The software will also be able to scan Twitter and Facebook for any number of user-defined keywords and save any occurrences to a database. After scanning, the results will be passed through a similarity filter, which removes insignificant results as well as any duplicates. The remaining results will then be analysed by a sentiment analysis tool, which determines whether the sentence in which the keyword occurs is positive or negative. The analysed results will determine the overall reputation of the keyword that was used.
The proposed design has several advantages over current systems:
- By using the modular design, several tasks can execute at the same time without influencing each other. For example, information can be extracted from the internet while existing results are being analysed.
- By providing the keywords and websites that the system will use, the user will have full control over the online reputation management process.
- By saving all the information contained in a website, the user will be able to take historical information into account to determine how the keyword's reputation changes over time. Saving the information will also allow the user to search for any keyword without rescanning the internet.
The proposed system was tested and successfully used to determine the online reputation of many user defined keywords. / MIng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus, 2015
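The similarity filter described above could, for instance, drop near-duplicate results before sentiment analysis. The sketch below assumes Jaccard similarity over lowercase word sets and a hypothetical `threshold` of 0.8; the abstract does not say which similarity measure the thesis actually uses.

```python
def jaccard(a, b):
    """Jaccard similarity between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def filter_near_duplicates(results, threshold=0.8):
    """Keep a result only if it is not too similar to any result already kept."""
    kept = []
    for text in results:
        if all(jaccard(text, other) < threshold for other in kept):
            kept.append(text)
    return kept
```

The filter is order-dependent (the first of two similar results survives) and compares every new result with every kept one, so it suits small batches rather than a full crawl.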
|
25 |
Τεχνικές για την εξαγωγή γνώσης από την πλατφόρμα του Twitter / Techniques for knowledge extraction from the Twitter platform
Δήμας, Αναστάσιος, 12 October 2013
As more people join the “social web”, social media platforms are becoming an increasingly valuable source of subjective information. The volume of content is too large to process by hand, so automatic techniques are needed to extract valuable information from it. This need gave rise to the field of sentiment analysis, also known as opinion mining, whose goal is to identify the position of a user (or a group of users, a crowd) on a particular issue or topic. Existing sentiment analysis systems mainly extract patterns from formal documents in a particular language (most techniques target English). They either search for discriminative sequences of words or use dictionaries that assess the meaning and sentiment of specific words and phrases; these are known as word-based approaches. The limited length of Twitter posts, together with the non-standard vocabulary and shortened words their authors use, introduces a great deal of noise and makes word-based approaches ineffective. For these reasons, the literature recommends character-based approaches, which study sequences of consecutive characters rather than words.
In this work, we demonstrate the superiority of the character-based approach over the word-based one in determining political sentiment, using data from the U.S. political scene and supervised learning with a Naive Bayes classifier. We argue that this approach can be used to efficiently determine the political preference (e.g. Republican or Democrat) of voters, or to identify the importance that particular issues have for particular voters. This type of feedback can be useful in organizing political campaigns or policies.
We created a corpus of 7904 tweets collected from the official Twitter accounts of 48 U.S. senators. The corpus was split into a training set and a test set in order to measure the classification accuracy of each method (word-based and character-based). The experiments showed that the character-based method classified the tweets more accurately. In the next test, we used two new test sets, one from the official Twitter account of the Republican Party and one from the official Twitter account of the Democratic Party, and trained on the full set of senators' tweets. For the official Democratic account, close to 80% of the tweets were correctly classified as Democrat; for the official Republican account, accuracy was only 56.7%.
A closer study of the structure and content of these tweets showed that this was partly because the majority of the Republican account tweets were references to online articles or videos rather than personal opinions. Such tweets carry objective rather than subjective information, so they do not provide the features needed to classify a user as leaning towards one party or the other.
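To make the character-based idea concrete, here is a minimal sketch of a Naive Bayes classifier over character trigrams. The n-gram length, the add-one (Laplace) smoothing, and the class name `CharNgramNaiveBayes` are assumptions made for illustration; the abstract does not specify these details of the thesis implementation.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()               # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(self.ngram_counts[label].values())
            for gram in char_ngrams(text, self.n):
                count = self.ngram_counts[label][gram]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the features are character sequences, misspellings and abbreviations still share most of their n-grams with the standard form, which is the robustness to noisy tweets that the abstract attributes to character-based methods.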
|
26 |
Análisis de sentimientos y predicción de eventos en Twitter / Sentiment analysis and event prediction on Twitter
Montesinos García, Lucas, January 2014
Ingeniero Civil Eléctrico (Electrical Civil Engineer) / Sentiment analysis is the study that determines people's opinions on the Internet about a specific topic by predicting the polarity of users (in favor, against, neutral, etc.). It covers topics ranging from products, films and services to socio-cultural interests such as elections, wars and football.
This thesis studies the main methods used in the literature for sentiment analysis and develops a case study that applies some of these techniques, with their respective results. Twitter was chosen as the platform because of its heavy use in Chile. The case study concerns the presidential primary held within the Alianza por Chile between Andrés Allamand of Renovación Nacional (RN) and Pablo Longueira of the Unión Demócrata Independiente (UDI). The aim is to predict the result of the primary by identifying the people who favor Allamand and those who support Longueira, and likewise to identify users who oppose one or both candidates.
To predict users' opinions, a dictionary of positive and negative words with associated scores was designed, so that when these terms are found in a tweet the polarity of the message is determined as positive, neutral or negative. The resulting algorithm is close to 60% accurate when all three categories are used, and reaches 74% when it only has to distinguish positive from negative messages.
Once the tweets are classified, their scores are assigned to the respective users, summing the values for accounts with more than one tweet, so that the election result can be predicted per user.
Finally, the proposed algorithm picks Pablo Longueira (UDI) as the winner over Andrés Allamand (RN) with 53% of preferences, whereas in the actual vote held in Chile in July 2013 Longueira won by 51% to 49%. This is an error of 2 percentage points, which means the analysis was able to predict the election outcome within a certain margin of error.
As future work, the dictionary and algorithm could be applied to sentiment analysis on other topics of interest to check their effectiveness across different cases and platforms.
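The dictionary-based scoring and per-user aggregation described in this abstract can be sketched as follows. The lexicon entries, their scores, and the function names are hypothetical placeholders, not the dictionary built in the thesis.

```python
from collections import defaultdict

# Hypothetical lexicon: word -> sentiment score.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def score_tweet(text, lexicon=LEXICON):
    """Sum the scores of all lexicon words found in the tweet."""
    return sum(lexicon.get(word, 0) for word in text.lower().split())


def polarity(score):
    """Map a numeric score to one of the three polarity classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def score_users(tweets, lexicon=LEXICON):
    """Add up tweet scores per user; tweets is a list of (user, text) pairs."""
    totals = defaultdict(int)
    for user, text in tweets:
        totals[user] += score_tweet(text, lexicon)
    return dict(totals)
```

Summing per user means one account with many tweets still counts as a single opinion, which matches the abstract's goal of predicting the result per user rather than per message.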
|
27 |
The Quest for the Abnormal Return : A Study of Trading Strategies Based on Twitter Sentiment
Gustafsson, Peter; Granholm, Jonas, January 2017
Active investors are always trying to find new ways of systematically beating the market. Since the advent of social media, it has become one of the latest areas where investors look for untapped information to exploit, through a technique called sentiment analysis: the use of automatic text processing to discern the opinions of social media users. The purpose of this study is to investigate whether the sentiment of tweets directed at specific companies can be used to construct portfolios that generate abnormal returns. To this end, we collected company-specific tweets for 40 companies from the Nasdaq 100 list, selected by simple random sampling. Tweets were downloaded for 2014 to 2016, giving us three years of data. From these tweets we extracted the sentiment using a sentiment program called SentiStrength. The sentiment score for every company was then averaged per week, and these weekly averages were used for portfolio construction. The starting point for explaining the relationship between sentiment and stock returns was the following theories: the Efficient Market Hypothesis, Investor Attention and Signaling Theory. Tweets act as signals which direct investors' attention to particular stocks and, if our hypothesis is correct, this can be exploited to generate abnormal returns. To evaluate the performance of our portfolios, we first calculated the cumulative non-risk-adjusted return for all portfolios and then calculated the risk-adjusted return by regressing the portfolio returns (the dependent variables) on both the Fama-French Three-Factor model and Carhart’s Four-Factor model.
The results of these tests suggest that it might be possible to obtain abnormal returns by constructing portfolios based on the sentiment of tweets, using a few of the strategies tested in this study: no statistically significant negative results were found, and a few significant positive results were found. Our conclusion is that the results seem to contradict the strong form of the Efficient Market Hypothesis on the Nasdaq 100, as the information contained in the sentiment of tweets does not seem to be fully integrated into the share price. However, we cannot say this with confidence, as the EMH is not a testable hypothesis and any test of the EMH is also a test of the models used to measure the efficiency of the market.
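For reference, the two factor regressions named in this abstract are usually written as below. The notation is the standard textbook form and is an assumption here, since the abstract gives no formulas: $R_{p,t}$ is the portfolio return, $R_{f,t}$ the risk-free rate, $R_{m,t}$ the market return, and $\mathrm{SMB}_t$, $\mathrm{HML}_t$ and $\mathrm{MOM}_t$ the size, value and momentum factors.

```latex
% Fama-French three-factor model
R_{p,t} - R_{f,t} = \alpha_p + \beta_p \,(R_{m,t} - R_{f,t})
                  + s_p \,\mathrm{SMB}_t + h_p \,\mathrm{HML}_t + \varepsilon_{p,t}

% Carhart four-factor model (adds momentum)
R_{p,t} - R_{f,t} = \alpha_p + \beta_p \,(R_{m,t} - R_{f,t})
                  + s_p \,\mathrm{SMB}_t + h_p \,\mathrm{HML}_t
                  + m_p \,\mathrm{MOM}_t + \varepsilon_{p,t}
```

In both regressions the intercept $\alpha_p$ is the abnormal return: a statistically significant positive $\alpha_p$ indicates a return that the risk factors do not explain.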
|
28 |
AN ITERATIVE METHOD OF SENTIMENT ANALYSIS FOR RELIABLE USER EVALUATION
Jingyi Hui (7023500), 16 August 2019
With the boom in social networks, reading posts from other users over the internet is becoming one of the most common ways for people to take in information. One may also have noticed that we sometimes tend to focus on users who provide well-founded analysis rather than those who merely vent their emotions. This thesis aims to find a simple and efficient way to recognize reliable information sources among countless internet users by examining the sentiment of their past posts.
To achieve this goal, the research used a dataset of tweets about Apple's stock price retrieved from Twitter. The key features studied are the post date, the user name, the number of followers of that user, and the sentiment of the tweet. Before the dataset is used further, tweets from users without sufficient posts are filtered out. To compare user sentiment with the derivative of Apple's stock price, we use the Pearson correlation between them to describe how well each user performs. We then iteratively increase the weight of reliable users and lower the weight of untrustworthy users until the correlation between the overall sentiment and the derivative of the stock price converges. The final correlations for individual users are their performance scores. Because real-world data is noisy, manual segmentation via data visualization is also proposed as a denoising method to improve performance. Besides our method, other metrics can be considered as a user trust index, such as each user's number of followers. Experiments are conducted to show that our method outperforms the others. With simple input, this method can be applied to a wide range of topics including elections, the economy, and the job market.
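The iterative weighting described above might look roughly like the sketch below. The abstract does not give the update rule, so the multiplicative update, the `rate` and `rounds` parameters, and the function names are all assumptions; only the use of Pearson correlation between user sentiment and price change comes from the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def reweight_users(user_sentiment, price_change, rounds=10, rate=0.5):
    """Raise the weight of users whose sentiment tracks the price change.

    user_sentiment maps each user to a per-day sentiment series aligned with
    price_change. Assumed update rule: multiply each weight by
    (1 + rate * correlation), then renormalise so the weights sum to 1.
    """
    scores = {u: pearson(s, price_change) for u, s in user_sentiment.items()}
    weights = {u: 1.0 for u in user_sentiment}
    for _ in range(rounds):
        for u in weights:
            weights[u] *= 1.0 + rate * scores[u]
        total = sum(weights.values())
        weights = {u: w / total for u, w in weights.items()}
    days = len(price_change)
    overall = [sum(weights[u] * user_sentiment[u][d] for u in weights)
               for d in range(days)]
    return weights, pearson(overall, price_change)
```

With `rate` below 1 every multiplier stays positive, so weights shrink for poorly correlated users but never change sign.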
|
29 |
Análise de viés em notícias na língua portuguesa / Bias analysis on newswire in Portuguese
Arruda, Gabriel Domingos de, 02 December 2015
The project described here proposes a model for analysing bias in news texts, seeking to identify the bias of media outlets towards political entities. Three types of bias are analysed: selection bias, which measures how often an entity is referenced by a media outlet; coverage bias, which assesses how much prominence is given to an entity; and assertion bias, which assesses whether the news reports positively or negatively on an entity. To this end, a corpus was built from news systematically extracted from 5 news producers and manually classified by polarity and target entity. Machine-learning-based sentiment analysis techniques were validated on this corpus. A methodology for bias identification was created, based on the concept of outliers applied to indicator metrics.
Using the proposed methodology, bias was analysed on the corpus with respect to candidates for the government of the state of São Paulo and for the presidency; all three types of bias were identified in two news producers.
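As an illustration of outlier-based bias detection, the sketch below flags outlets whose value of an indicator metric (for example, the share of articles mentioning a candidate) lies far from the median of all outlets. It assumes the median-absolute-deviation rule with the commonly used cutoff of 3.5; the abstract does not say which outlier test the thesis applies, and the outlet names and numbers in the usage note are made up.

```python
import statistics


def flag_outliers(metric_by_outlet, cutoff=3.5):
    """Return outlets whose modified z-score exceeds the cutoff.

    Modified z-score: 0.6745 * |value - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    values = list(metric_by_outlet.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # assumption: with no spread, nothing is flagged
    return sorted(
        outlet
        for outlet, value in metric_by_outlet.items()
        if 0.6745 * abs(value - median) / mad > cutoff
    )
```

For instance, with shares of 0.20, 0.22, 0.19, 0.21 and 0.45 for five hypothetical outlets A to E, only E is flagged. A median-based rule is used because, with only five outlets, a single extreme value would inflate a mean and standard deviation enough to hide itself.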
|
30 |
Linking Arabic social media based on similarity and sentiment
Alhazmi, Samah, January 2016
A large proportion of World Wide Web (WWW) users treat it as a social medium, i.e. many of them use the WWW to express and communicate their opinions. Economic value or utility can be created if these utterances, reactions, or feedback are extracted from various social media platforms and their content analysed. Some of these benefits relate to e-commerce, marketing, product improvement, and improving machine learning algorithms. Moreover, establishing links between different social media platforms, based on shared topics and content, could provide access to the comments of users of different platforms. However, studies to date have generally tackled content extraction from each type of social media in isolation. There is a lack of research on some aspects of social media, namely linking the references in a blog post, for example, to information about the same issue on Twitter. In addition, while studies have been carried out on various languages, there has been little investigation into social media in the Arabic language. This thesis tackles opinion mining and sentiment analysis of Arabic-language social media, particularly blogs and Twitter. The thesis focuses on Arabic-language technology blogs in order to identify the expressed sentiments and then to link an issue within a blog post to relevant tweets on Twitter. This was done by assessing the similarity of content and measuring the sentiment scores. To extract the required data, text-mining techniques were used to build corpora of the raw blog data in Modern Standard Arabic (MSA) and to build the tools and lexicons required for this research. The results obtained through this research contribute to the field of computer science by furthering the employment of text-mining techniques, thus improving the process of information retrieval and knowledge accumulation.
Moreover, the study developed new approaches to working with Arabic opinion mining and the domain of sentiment analysis.
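Linking a blog post to related tweets by content similarity can be sketched with a bag-of-words cosine similarity. The whitespace tokenisation, the `threshold` value, and the function names are assumptions for illustration; the thesis builds its own Arabic tools and lexicons, which are not reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_tweets(post, tweets, threshold=0.3):
    """Return tweets similar enough to the post, most similar first."""
    scored = [(cosine_similarity(post, tweet), tweet) for tweet in tweets]
    return [tweet for score, tweet in sorted(scored, reverse=True)
            if score >= threshold]
```

A sentiment score could then be computed for the post and for each linked tweet so that the two platforms' reactions to the same issue can be compared.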
|