  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Continuous dimensional emotion tracking in music

Imbrasaite, Vaiva January 2015 (has links)
The size of easily accessible libraries of digital music recordings is growing every day, and people need new and more intuitive ways of managing them, searching through them and discovering new music. Musical emotion is a method of classification that people use without thinking, and it could therefore be used for enriching music libraries to make them more user-friendly, for evaluating new pieces, or even for discovering meaningful features for automatic composition. The field of Emotion in Music is not new: a lot of work has been done in musicology, psychology, and other fields. However, automatic emotion prediction in music is still in its infancy and often lacks that transfer of knowledge from the surrounding fields. This dissertation explores automatic continuous dimensional emotion prediction in music and shows how various findings from other areas of Emotion and Music and Affective Computing can be translated and used for this task. There are four main contributions. Firstly, I describe a study that I conducted on the evaluation metrics used to present the results of continuous emotion prediction. So far, the field lacks consensus on which metrics to use, making the comparison of different approaches nearly impossible. In this study, I investigated people's intuitively preferred evaluation metric and, on the basis of the results, suggested some guidelines for the analysis of the results of continuous emotion recognition algorithms. I discovered that root-mean-squared error (RMSE) is significantly preferable to the other metrics explored for the one-dimensional case, and that it has preference ratings similar to those of the correlation coefficient in the two-dimensional case. Secondly, I investigated how various findings from the field of Emotion in Music can be used when building feature vectors for machine learning solutions to the problem.
I suggest some novel feature vector representation techniques, testing them on several datasets and several machine learning models, and showing the advantage they can bring. Some of the suggested feature representations can reduce RMSE by up to 19% compared to the standard feature representation, and bring up to a 10-fold improvement in the non-squared correlation coefficient. Thirdly, I describe Continuous Conditional Random Fields and Continuous Conditional Neural Fields (CCNF) and introduce their use for continuous dimensional emotion recognition in music, comparing them with Support Vector Regression. These two models incorporate some of the temporal information that standard bag-of-frames approaches lack, and are therefore capable of improving the results. CCNF can reduce RMSE by up to 20% compared to Support Vector Regression, and can increase squared correlation for the valence axis by up to 40%. Finally, I describe a novel multi-modal approach to continuous dimensional music emotion recognition. The field has so far focused solely on acoustic analysis of songs; in this dissertation I show how the separation of vocals and music and the analysis of lyrics can be used to improve the performance of such systems. The separation of music and vocals can improve the results by up to 10%, with a stronger impact on arousal, compared to a system that uses only acoustic analysis of the whole signal, and the addition of lyrics analysis provides a similar improvement to the results of the valence model.
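The metrics at the center of the study above are easy to state precisely; a minimal sketch (with invented per-frame annotation values) contrasting what RMSE and the squared correlation coefficient reward in a one-dimensional emotion track:

```python
import math

def rmse(y_true, y_pred):
    # Root-mean-squared error: average per-frame deviation from the annotation.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def squared_correlation(y_true, y_pred):
    # Pearson correlation squared: how well the prediction tracks the trend,
    # regardless of scale and offset.
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    return cov ** 2 / (vt * vp)

# Hypothetical per-frame arousal annotations and two model outputs.
truth = [0.1, 0.2, 0.4, 0.5, 0.7]
model_a = [0.15, 0.25, 0.35, 0.55, 0.65]   # close in value
model_b = [0.6, 0.7, 0.9, 1.0, 1.2]        # right trend, constant offset

print(rmse(truth, model_a), rmse(truth, model_b))
print(squared_correlation(truth, model_a), squared_correlation(truth, model_b))
```

Note how the offset model scores perfectly on correlation while its RMSE is ten times worse: exactly the kind of disagreement that makes the choice of reported metric matter.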
112

Expansão de recursos para análise de sentimentos usando aprendizado semi-supervisionado / Extending sentiment analysis resources using semi-supervised learning

Henrico Bertini Brum 23 March 2018 (has links)
O grande volume de dados que temos disponíveis em ambientes virtuais pode ser excelente fonte de novos recursos para estudos em diversas tarefas de Processamento de Linguagem Natural, como a Análise de Sentimentos. Infelizmente é elevado o custo de anotação de novos córpus, que envolve desde investimentos financeiros até demorados processos de revisão. Nossa pesquisa propõe uma abordagem de anotação semissupervisionada, ou seja, anotação automática de um grande córpus não anotado partindo de um conjunto de dados anotados manualmente. Para tal, introduzimos o TweetSentBR, um córpus de tweets no domínio de programas televisivos que possui anotação em três classes e revisões parciais feitas por até sete anotadores. O córpus representa um importante recurso linguístico de português brasileiro, e fica entre os maiores córpus anotados na literatura para classificação de polaridades. Além da anotação manual do córpus, realizamos a implementação de um framework de aprendizado semissupervisionado que faz uso de dados anotados e, de maneira iterativa, expande o mesmo usando dados não anotados. O TweetSentBR, que possui 15.000 tweets anotados, é assim expandido cerca de oito vezes. Para a expansão, foram treinados modelos de classificação usando seis classificadores de polaridades, assim como foram avaliados diferentes parâmetros e representações a fim de obter um córpus confiável. Realizamos experimentos gerando córpus expandidos por cada classificador, tanto para a classificação em três polaridades (positiva, neutra e negativa) quanto para classificação binária. Avaliamos os córpus gerados usando um conjunto de held-out e comparamos a F-Measure da classificação usando como treinamento os córpus anotados manualmente e semiautomaticamente. O córpus semissupervisionado que obteve os melhores resultados para a classificação em três polaridades atingiu 62,14% de F-Measure média, superando a média obtida com as avaliações no córpus anotado manualmente (61,02%).
Na classificação binária, o melhor córpus expandido obteve 83,11% de F1-Measure média, superando a média obtida na avaliação do córpus anotado manualmente (79,80%). Além disso, simulamos nossa expansão em córpus anotados da literatura, medindo o quão corretas são as etiquetas anotadas semiautomaticamente. Nosso melhor resultado foi na expansão de um córpus de reviews de produtos que obteve F-Measure de 93,15% com dados binários. Por fim, comparamos um córpus da literatura obtido por meio de supervisão distante e nosso framework semissupervisionado superou o primeiro na classificação de polaridades binária em cross-domain. / The high volume of data available on the Internet can be a good source of new resources for studies of several Natural Language Processing tasks, such as Sentiment Analysis. Unfortunately, annotating new corpora is costly, involving financial investment and long revision processes. Our work proposes a semi-supervised labeling approach: the automatic annotation of a large unlabeled set of documents starting from a manually annotated corpus. To achieve that, we introduce TweetSentBR, a tweet corpus in the TV show domain with 3-point (positive, neutral and negative) sentiment annotation, partially reviewed by up to seven annotators. The corpus is an important linguistic resource for Brazilian Portuguese and is among the largest annotated corpora for polarity classification. Beyond the manual annotation, we implemented a semi-supervised learning framework that uses this labeled data and extends it with unlabeled data. The TweetSentBR corpus, containing 15,000 documents, was thus expanded roughly eightfold. For the expansion, we trained classification models using six polarity classifiers and evaluated different parameters and representation schemes in order to obtain the most reliable corpora.
We ran experiments generating extended corpora for each classifier, both for 3-point and binary classification. We evaluated the generated corpora using a held-out subset and compared the F-Measure obtained when training on the manually annotated and on the semi-automatically annotated corpora. The semi-supervised corpus that obtained the best values for 3-point classification achieved 62.14% average F-Measure, surpassing the result of the same classification trained on the manually annotated corpus (61.02%). On binary classification, the best extended corpus achieved 83.11% average F1-Measure, surpassing the result on the manually annotated corpus (79.80%). Furthermore, we simulated the extension of labeled corpora from the literature, measuring how accurate the semi-supervised annotation is. Our best result was in the extension of a product review corpus, achieving 93.15% F1-Measure on binary data. Finally, we compared our semi-supervised framework with a literature corpus labeled through distant supervision, and ours outperformed it in binary cross-domain polarity classification.
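The iterative expansion described above follows the general shape of self-training; the sketch below is a generic illustration of that loop, not the thesis's implementation — the seed data, the keyword classifier, and the confidence threshold are all invented stand-ins for the six real polarity classifiers:

```python
from collections import Counter

def self_train(labeled, unlabeled, train, predict, threshold=0.9, rounds=5):
    """Generic self-training: repeatedly train on the labeled pool and
    absorb high-confidence predictions made on the unlabeled pool."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for doc in unlabeled:
            label, conf = predict(model, doc)
            (confident if conf >= threshold else rest).append((doc, label))
        if not confident:
            break
        labeled.extend(confident)
        unlabeled = [doc for doc, _ in rest]
    return labeled

# Toy stand-in classifier: label by presence of seed words, fixed confidence.
POS, NEG = {"bom", "otimo"}, {"ruim", "pessimo"}

def train(labeled):          # a real system would fit a polarity classifier here
    return None

def predict(model, doc):
    words = set(doc.split())
    if words & POS:
        return "positive", 1.0
    if words & NEG:
        return "negative", 1.0
    return "neutral", 0.0    # low confidence: stays unlabeled

seed = [("programa muito bom", "positive")]
pool = ["episodio bom", "final ruim", "sem opiniao"]
expanded = self_train(seed, pool, train, predict)
print(Counter(label for _, label in expanded))
```

The document with no confident label is simply carried over to the next round, which is what keeps the expanded corpus reliable.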
113

Coronavirus public sentiment analysis with BERT deep learning

Ling, Jintao January 2020 (has links)
Microblog has become a central platform where people express their thoughts and opinions toward public events in China. With the sudden outbreak of coronavirus, posts related to it are usually followed by an immediate burst in microblog volume, which provides a great opportunity to explore public sentiment about the events. In this context, sentiment analysis is helpful for exploring how coronavirus affects public opinion. Deep learning has become a very popular technique for sentiment analysis. This thesis uses Bidirectional Encoder Representations from Transformers (BERT), a pre-trained unsupervised language representation model based on deep learning, to generate initial token embeddings; these are further tuned by a neural network model on a supervised corpus to construct a sentiment classifier. We utilize data recently made available by the government of Beijing, which contains 1 million blog posts from January 1 to February 20, 2020. The model developed in this thesis can also be used to track sentiment variation in Weibo microblog data in the future. At the final stage, the variation of public sentiment is analyzed and presented with visualization charts showing how sentiment evolved with the development of coronavirus in China. The results on labeled data and on all data are compared in order to explore how thoughts and opinions evolve over time. The result shows a significant growth of negative sentiment on January 20, when a lockdown started in Wuhan, after which the growth becomes slower. Negative sentiment reached its peak around February 7, when doctor Li Wenliang died.
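The pipeline described above — pre-trained token embeddings pooled into a sentence vector, then a classifier head tuned on a supervised corpus — can be sketched end to end. Everything below is a toy stand-in (deterministic pseudo-random "embeddings" and a logistic-regression head instead of BERT and a neural network, plus an invented four-post corpus), intended only to show the data flow:

```python
import math
import random

DIM = 16

def embed(token):
    # Stand-in for pre-trained token embeddings: a deterministic
    # pseudo-random vector per token (BERT would give contextual vectors).
    rng = random.Random(sum(ord(c) * 31 ** i for i, c in enumerate(token)))
    return [rng.uniform(-1, 1) for _ in range(DIM)]

def encode(text):
    # Mean-pool token embeddings into one fixed-size sentence vector.
    vecs = [embed(t) for t in text.split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def train_head(examples, epochs=500, lr=0.5):
    # Tune a logistic-regression "classifier head" on the supervised corpus.
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - label                       # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def classify(model, text):
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, encode(text))) + b
    return 1 if z > 0 else 0                    # 1 = positive, 0 = negative

# Invented toy corpus standing in for the labeled microblog posts.
corpus = [("lockdown fear sad", 0), ("hope recovery praise", 1),
          ("sad loss grief", 0), ("praise hope help", 1)]
model = train_head(corpus)
print([classify(model, t) for t, _ in corpus])
```

The design point is the split of responsibilities: the embedding stage is pre-trained and fixed here, while only the small head is fitted to the labeled data.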
114

Analýza recenzí výrobků / Analysis of Product Reviews

Klocok, Andrej January 2020 (has links)
Online store customers generate vast amounts of product and service information through reviews, which are an important source of feedback. This thesis deals with the creation of a system for the analysis of product and shop reviews in the Czech language. It describes current methods of sentiment analysis and builds on existing solutions. The resulting system implements automatic data download and indexing, followed by sentiment analysis together with text summarization in the form of clustering of similar sentences based on a vector representation of the text. A graphical user interface in the form of a web page is also included. A review dataset with a total of more than six million reviews was created during the semester, along with an interface for easy data export.
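Clustering similar sentences by vector representation, as the summarization component above does, can be illustrated with a toy greedy scheme; the bag-of-words vectors, the similarity threshold, and the sample reviews below are stand-ins for whatever representation the real system uses:

```python
import math
from collections import Counter

def vectorize(sentence):
    # Toy sentence vector: bag-of-words counts.
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def cluster(sentences, threshold=0.5):
    # Greedy clustering: attach each sentence to the first cluster whose
    # representative is similar enough, otherwise start a new cluster.
    clusters = []
    for s in sentences:
        v = vectorize(s)
        for rep, members in clusters:
            if cosine(rep, v) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((v, [s]))
    return [members for _, members in clusters]

reviews = ["battery lasts long", "the battery lasts really long",
           "delivery was slow", "slow delivery again"]
print(cluster(reviews))
```

Each cluster then only needs one representative sentence in the summary, which is how clustering doubles as summarization.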
115

EXPLORING PSEUDO-TOPIC-MODELING FOR CREATING AUTOMATED DISTANT-ANNOTATION SYSTEMS

Sommers, Alexander Mitchell 01 September 2021 (has links)
We explore the use of a Latent Dirichlet Allocation (LDA)-imitating pseudo-topic-model, based on our original relevance metric, as a tool to facilitate distant annotation of short documents (often one to two sentences or less). Our exploration manifests as annotating tweets for emotions, this being the current use-case of interest to us, but we believe the method could be extended to any multi-class labeling task over documents of similar length. Tweets are gathered via the Twitter API using "track" terms thought likely to capture tweets with a greater chance of exhibiting each emotional class: 3,000 tweets for each of 26 topics anticipated to elicit emotional discourse. Our pseudo-topic-model is used to produce relevance-ranked vocabularies for each corpus of tweets, and these are used to distribute emotional annotations to the tweets not manually annotated, magnifying the number of annotated tweets by a factor of 29. The vector labels the annotators produce for the topics are cascaded out to the tweets via three different schemes, which are compared for performance by proxy through the competition of bidirectional LSTMs trained on the distantly labeled tweets. An SVM and two emotionally annotated vocabularies are also tested on each task to provide context and comparison.
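Distributing labels through relevance-ranked vocabularies can be sketched as follows; the frequency-ratio relevance score below is a generic stand-in for the thesis's original metric, and the tiny corpora are invented:

```python
from collections import Counter

def relevance_vocab(topic_docs, background_docs, k=5):
    # Rank terms by how much more frequent they are in the topic corpus
    # than in a background corpus (a generic stand-in for the thesis's
    # original relevance metric).
    topic = Counter(w for d in topic_docs for w in d.split())
    background = Counter(w for d in background_docs for w in d.split())
    total_t = sum(topic.values())
    total_b = sum(background.values())
    score = {w: (topic[w] / total_t) / ((background[w] + 1) / (total_b + 1))
             for w in topic}
    return [w for w, _ in sorted(score.items(), key=lambda x: -x[1])[:k]]

def distant_annotate(tweet, vocab_by_label):
    # Cascade a label to an unannotated tweet via vocabulary overlap.
    words = set(tweet.split())
    overlaps = {label: len(words & set(vocab))
                for label, vocab in vocab_by_label.items()}
    return max(overlaps, key=overlaps.get)

joy_docs = ["so happy today", "happy happy joy", "what a joy"]
anger_docs = ["this is infuriating", "so angry right now", "angry and done"]
background = ["the weather today", "what is this", "so it goes"]

vocabs = {"joy": relevance_vocab(joy_docs, background),
          "anger": relevance_vocab(anger_docs, background)}
print(distant_annotate("feeling happy and full of joy", vocabs))
```

Only a small manually annotated seed per topic is needed; the vocabularies then carry the labels out to the rest of the collection.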
116

Genre and Domain Dependencies in Sentiment Analysis

Remus, Robert 23 April 2015 (has links)
Genre and domain influence an author's style of writing and therefore a text's characteristics. Natural language processing is prone to such variations in textual characteristics: it is said to be genre and domain dependent. This thesis investigates genre and domain dependencies in sentiment analysis. Its goal is to support the development of robust sentiment analysis approaches that work well and in a predictable manner under different conditions, i.e. for different genres and domains. Initially, we show that a prototypical approach to sentiment analysis -- viz. a supervised machine learning model based on word n-gram features -- performs differently on gold standards that originate from differing genres and domains, but performs similarly on gold standards that originate from resembling genres and domains. We show that these gold standards differ in certain textual characteristics, viz. their domain complexity. We find a strong linear relation between our approach's accuracy on a particular gold standard and its domain complexity, which we then use to estimate our approach's accuracy. Subsequently, we use certain textual characteristics -- viz. domain complexity, domain similarity, and readability -- in a variety of applications. Domain complexity and domain similarity measures are used to determine parameter settings in two tasks. Domain complexity guides us in model selection for in-domain polarity classification, viz. in decisions regarding word n-gram model order and word n-gram feature selection. Domain complexity and domain similarity guide us in domain adaptation. We propose a novel domain adaptation scheme and apply it to cross-domain polarity classification in semi- and unsupervised domain adaptation scenarios. Readability is used for feature engineering. We propose to adopt readability gradings, readability indicators as well as word and syntax distributions as features for subjectivity classification.
Moreover, we generalize a framework for modeling and representing negation in machine learning-based sentiment analysis. This framework is applied to in-domain and cross-domain polarity classification. We investigate the relation between implicit and explicit negation modeling, the influence of negation scope detection methods, and the efficiency of the framework in different domains. Finally, we carry out a case study in which we transfer the core methods of our thesis -- viz. domain complexity-based accuracy estimation, domain complexity-based model selection, and negation modeling -- to a gold standard that originates from a genre and domain hitherto not used in this thesis.
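The accuracy-estimation idea above — exploit a strong linear relation between a gold standard's domain complexity and the observed accuracy — amounts to an ordinary least-squares fit; the complexity and accuracy values below are illustrative, not taken from the thesis:

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a * x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Illustrative pairs: measured domain complexity vs. observed accuracy
# on gold standards where labeled evaluation data is available.
complexity = [0.2, 0.3, 0.4, 0.5, 0.6]
accuracy = [0.88, 0.84, 0.81, 0.77, 0.73]
a, b = fit_line(complexity, accuracy)

# Estimate accuracy on a new domain from its complexity alone,
# without any labeled data for that domain.
new_domain_complexity = 0.45
print(a * new_domain_complexity + b)
```

The payoff is that domain complexity is computable from unlabeled text, so the fitted line predicts accuracy on domains where no gold standard exists.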
117

Logické spojky v postojové analýze / Logical connectives in sentiment analysis

Přikrylová, Katrin January 2015 (has links)
No description available.
118

Umělé neuronové sítě a jejich využití při analýze textových dat / Artificial neural networks and their application in text analysis

Jankovič, Radovan January 2016 (has links)
This thesis is devoted to the area of sentiment analysis. Its goal is to discuss and compare various methods applicable to sentiment classification of short texts. When analyzing the described techniques, we orient ourselves towards the context of social networks. This type of media has recently become the source of vast amounts of data, and the demand for its automatic processing is high. Interesting results have been obtained for clustering used in combination with supervised learning and for convolution, which is primarily used for image data.
119

Predicting the Movement Direction of OMXS30 Stock Index Using XGBoost and Sentiment Analysis

Elena, Podasca January 2021 (has links)
Background. Stock market prediction is an active yet challenging research area. A lot of effort has been put in by both academia and practitioners to produce accurate stock market prediction models, in an attempt to maximize investment objectives. Tree-based ensemble machine learning methods such as XGBoost have proven successful in practice. At the same time, there is a growing trend to incorporate multiple data sources in prediction models, such as historical prices and text, in order to achieve superior forecasting performance. However, most applications and research have so far focused on the American or Asian stock markets, while the Swedish stock market has not been studied extensively from the perspective of hybrid models using both price- and text-derived features.  Objectives. The purpose of this thesis is to investigate whether augmenting a numerical dataset based on historical prices with sentiment features extracted from financial news improves classification performance when predicting the daily price trend of the Swedish stock market index, OMXS30. Methods. A dataset of 3,517 samples covering 2006 to 2020 was collected from two sources: historical prices and financial news. XGBoost was used as the classifier, and four different metrics were employed for model performance comparison across three complementary datasets: the dataset containing only the sentiment feature, the dataset with only price-derived features, and finally the price dataset augmented with the sentiment feature extracted from financial news.  Results. Results show that XGBoost performs well in classifying the daily trend of OMXS30 given historical price features, achieving an accuracy of 73% on the test set. A small improvement across all metrics is recorded on the test set when augmenting the numerical dataset with sentiment features extracted from financial news.  Conclusions.
XGBoost is a powerful ensemble method for stock market prediction, reflected in a satisfactory classification performance of the daily movement direction of OMXS30. However, augmenting the numerical input set with sentiment features extracted from text did not have a powerful impact on classification performance in this case, as the improvements across all employed metrics were small.
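The hybrid setup above — price-derived features plus a news-sentiment feature feeding a tree-based ensemble — can be sketched without the real XGBoost library; the random-stump ensemble below is a deliberately simplified stand-in for gradient-boosted trees, and the feature rows are invented:

```python
import random

def stump(feature_index, threshold):
    # A depth-1 decision tree: predict an up-move (1) if feature > threshold.
    return lambda row: 1 if row[feature_index] > threshold else 0

def fit_ensemble(rows, labels, n_stumps=25, seed=7):
    # Toy stand-in for a tree-based ensemble such as XGBoost: sample random
    # stumps and keep only those that beat chance on the training data.
    rng = random.Random(seed)
    kept = []
    n_features = len(rows[0])
    while len(kept) < n_stumps:
        s = stump(rng.randrange(n_features), rng.uniform(-1, 1))
        acc = sum(s(r) == y for r, y in zip(rows, labels)) / len(rows)
        if acc > 0.5:
            kept.append(s)
    return kept

def predict(ensemble, row):
    votes = sum(s(row) for s in ensemble)
    return 1 if votes * 2 > len(ensemble) else 0

# Invented daily features: [1-day return, 5-day momentum, news sentiment];
# label: did the index close up the next day?
rows = [[0.3, 0.5, 0.8], [-0.4, -0.2, -0.6], [0.1, 0.4, 0.5],
        [-0.2, -0.5, -0.9], [0.6, 0.1, 0.3], [-0.1, -0.3, -0.2]]
labels = [1, 0, 1, 0, 1, 0]

model = fit_ensemble(rows, labels)
print(predict(model, [0.9, 0.9, 0.9]), predict(model, [-0.9, -0.9, -0.9]))
```

Dropping or adding the third (sentiment) column in `rows` reproduces, in miniature, the dataset comparison the thesis performs.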
120

Fast Data Analysis Methods For Social Media Data

Nhlabano, Valentine Velaphi 07 August 2018 (has links)
The advent of Web 2.0 technologies, which support the creation and publishing of various social media content in a collaborative and participatory way by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide, in order to take advantage of this new way of communicating and engaging with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, which are normally accustomed to handling and analysing structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data called tweets. Twitter, one of the most widely adopted social network services, provides an API (Application Programming Interface) for researchers and software developers to connect and collect public data sets from the Twitter database. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume, efficiently storing this data in the Hadoop Distributed File System (HDFS). Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store such as HDFS. Apache Hadoop is an open source software library that runs on low-cost commodity hardware and has the ability to store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, and flexibly at low cost.
A lexicon-based sentiment analysis approach was taken, and the AFINN-111 lexicon was used for scoring. The Twitter data was analysed from HDFS using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that it is fast, efficient and economical to use this approach to analyse unstructured data from social media in real time. / Dissertation (MSc)--University of Pretoria, 2019. / National Research Foundation (NRF) - Scarce skills / Computer Science / MSc / Unrestricted
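The lexicon scoring step above is simple to state outside the Hadoop pipeline; below is a Python sketch of AFINN-style scoring plus a reduce-style tally (the word valences are illustrative, not the actual AFINN-111 values, and the thesis's real implementation is Java MapReduce):

```python
from collections import Counter

# Illustrative AFINN-style valences (the real AFINN-111 assigns each word
# an integer in [-5, 5]).
LEXICON = {"love": 3, "good": 3, "awesome": 4,
           "bad": -3, "terrible": -3, "hate": -3}

def tweet_score(text):
    # Lexicon-based scoring: sum the valence of every known word in the tweet.
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

def label(text):
    s = tweet_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"

tweets = ["I love this awesome phone",
          "terrible service bad support",
          "just landed"]

# Map each tweet to a label, then reduce to counts per label --
# the same shape of computation the MapReduce job performs at scale.
print(Counter(label(t) for t in tweets))
```

Because each tweet is scored independently, the map step parallelizes trivially across the cluster, which is what makes the approach fast on streaming volumes.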
