  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Combining Lexicon- and Learning-based Approaches for Improved Performance and Convenience in Sentiment Classification

Sommar, Fredrik, Wielondek, Milosz January 2015
Sentiment classification is the task of assigning data to categories according to its polarity, and it has applications across many industries. This report examines a combination of two prominent approaches to sentiment classification: one based on a lexicon of weighted words and one based on machine learning. Both are compared with the combined hybrid approach to give an account of their relative strengths and weaknesses. On a set of IMDb movie reviews, the learning-based approach performs best, followed by the hybrid model and then the lexicon-based approach. However, because the hybrid model eliminates the need for training data, it is an appealing alternative where preparing training data is infeasible or unreasonable, at the cost of a slight loss in performance.
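The abstract does not say how the lexicon and the learner are combined. One common way to avoid hand-labelled training data is to let the lexicon label the confident examples and then train a classifier on those pseudo-labels. The sketch below illustrates that idea only; the lexicon, the threshold, the reviews, and all function names are hypothetical and not taken from the report.

```python
import math
from collections import Counter

# Hypothetical toy lexicon of weighted words (an assumption, not the report's lexicon).
LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "awful": -2.0}

def lexicon_score(text):
    """Sum the lexicon weights of the words in the text."""
    return sum(LEXICON.get(word, 0.0) for word in text.lower().split())

def pseudo_label(texts, threshold=1.0):
    """Keep texts the lexicon scores confidently and label them by sign."""
    labeled = []
    for text in texts:
        score = lexicon_score(text)
        if abs(score) >= threshold:
            labeled.append((text, "pos" if score > 0 else "neg"))
    return labeled

def train_naive_bayes(labeled):
    """Collect per-class word counts for a multinomial Naive Bayes model."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the class with the highest add-one-smoothed log probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in ("pos", "neg"):
        logp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

reviews = [
    "great plot and great acting",
    "awful pacing and bad dialogue",
    "good soundtrack with great acting",
    "bad plot and awful dialogue",
]
model = train_naive_bayes(pseudo_label(reviews))
print(predict(model, "the acting was fine"))
```

The trained classifier can score words such as "acting" or "dialogue" that the lexicon does not contain, which is the kind of gain a hybrid might offer over a lexicon alone.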
12

Machine Learning Based Sentiment Classification of Text, with Application to Equity Research Reports / Maskininlärningsbaserad sentimentklassificering av text, med tillämpning på aktieanalysrapporte

Blomkvist, Oscar January 2019
In this thesis, we analyse the sentiment in equity research reports written by analysts at Skandinaviska Enskilda Banken (SEB). We describe established statistical and machine learning methods for classifying the sentiment of text documents as positive or negative, with particular interest in a form of recurrent neural network known as long short-term memory (LSTM). We investigate two different labelling regimes for generating training data from the reports. Benchmark classification accuracies are obtained using logistic regression models. Finally, two different word embedding models and bidirectional LSTMs of varying network size are implemented and compared to the benchmark results. We find that logistic regression works well for one of the labelling approaches, and that the best LSTM models outperform it slightly.
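A logistic regression benchmark of the kind mentioned above can be sketched in a few lines. This is a minimal illustration that assumes bag-of-words count features and plain stochastic gradient descent; the thesis's actual features, labels, and training setup are not given in the abstract, and the example sentences are invented.

```python
import math

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that the document is positive."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(X, y, lr=0.5, epochs=200):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = predict_proba(weights, bias, x) - label
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Invented toy data: 1 = positive sentiment, 0 = negative sentiment.
train_texts = [
    "earnings beat expectations",
    "strong growth outlook",
    "profit warning issued",
    "weak demand outlook",
]
labels = [1, 1, 0, 0]
vocab = sorted({word for text in train_texts for word in text.lower().split()})
X = [featurize(text, vocab) for text in train_texts]
weights, bias = train(X, labels)
print(predict_proba(weights, bias, featurize("strong earnings growth", vocab)))
```

A linear baseline like this is cheap to train and gives a reference accuracy against which larger models can be judged.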
13

Design an emotionally positive experience via sentiment classification for social media recommendation systems : A case study in TikTok / Skapa en emotionellt positiv upplevelse genom sentimentklassificering för rekommendationssystem för sociala medier : En fallstudie i TikTok

Deng, Yawen January 2023
Recommendation systems benefit social media by attracting users with the posts they prefer. The recommended posts, however, may not align with what users really need to browse, especially in terms of emotion. We therefore conducted a case study on TikTok to understand the emotional impact of a social application's post feed and to explore interactive solutions. The state of the art was reviewed on the topics of psychological issues caused by social media, related therapy, and product solutions. To empathise with the users' situation, a workshop was held, consisting of a card game, a presentation, and participatory design. An emotion reminder, built on a Naive Bayes text classifier and a facial-expression SVM, was then prototyped. With sentiment classification accuracies of 0.51 (text) and 0.69 (facial expression), the emotion reminder was tested by users.
Compared with the original TikTok interface, users of the prototype had higher emotion awareness, a greater sense of control over their browsing, and lower engagement with the interface, which aligned with the needs they described in the workshop. For privacy reasons, users preferred the prototype's content-based emotion detection to detection based on their biological data. They also embraced the reminder format over automatic filtering, because an emotionally positive experience meant not only browsing posts with positive feelings but also receiving negative posts.
14

Cross-domain sentiment classification using grams derived from syntax trees and an adapted naive Bayes approach

Cheeti, Srilaxmi January 1900
Master of Science / Department of Computing and Information Sciences / Doina Caragea / There is an increasing amount of user-generated information in online documents, including user opinions on various topics and products such as movies, DVDs, and kitchen appliances. To make use of such opinions, it is useful to identify their polarity, in other words, to perform sentiment classification. The goal of sentiment classification is to classify a given text or document as positive, negative, or neutral based on the words it contains. Supervised learning approaches have been used successfully for sentiment classification in domains that are rich in labeled data. Some of these approaches use features such as unigrams, bigrams, sentiment words, adjectives, and syntax trees (or variations of trees obtained using pruning strategies). For some domains, however, the amount of labeled data is relatively small, and an accurate classifier cannot be trained with the supervised approach. It is therefore useful to study domain adaptation techniques that transfer knowledge from a source domain that has labeled data to a target domain that has little or no labeled data but a large amount of unlabeled data. We address this problem in the context of product reviews, specifically reviews of movies, DVDs, and kitchen appliances. Our approach uses an Adapted Naive Bayes (ANB) classifier on top of the Expectation Maximization (EM) algorithm to predict the sentiment of a sentence. We use grams derived from complete syntax trees or from syntax subtrees as features when training the ANB classifier. More precisely, we extract grams from syntax trees corresponding to sentences in either the source or the target domain.
To transfer knowledge from source to target, we identify generalized features (grams) using the frequently co-occurring entropy (FCE) method and represent the source instances using these generalized features. The target instances are represented with all grams occurring in the target, or with a reduced gram set obtained by removing infrequent grams. We experiment with different types of grams in a supervised framework to identify the most predictive types, and then use those grams in the domain adaptation framework. Experimental results on several cross-domain tasks show that domain adaptation approaches that combine source data with target data (a small amount of labeled data and some unlabeled data) can help learn classifiers for the target that are better than those learned from the labeled target data alone.
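The interplay of Naive Bayes and EM described above can be illustrated with a simplified sketch. It is not the ANB classifier from the thesis: it uses plain word tokens instead of syntax-tree grams, omits FCE feature selection and any source/target weighting, and runs on invented sentences. It only shows the basic loop, in which a model trained on labeled source data assigns soft labels to unlabeled target data (E-step) and is then re-estimated from both (M-step).

```python
import math
from collections import defaultdict

LABELS = ("pos", "neg")

def m_step(weighted_docs, vocab):
    """Estimate smoothed Naive Bayes parameters from (tokens, {label: weight}) pairs."""
    prior = {c: 1.0 for c in LABELS}  # add-one smoothing on class priors
    counts = {c: defaultdict(float) for c in LABELS}
    for tokens, weights in weighted_docs:
        for c in LABELS:
            prior[c] += weights[c]
            for tok in tokens:
                counts[c][tok] += weights[c]
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in LABELS}
    log_like = {}
    for c in LABELS:
        denom = sum(counts[c].values()) + len(vocab)
        log_like[c] = {tok: math.log((counts[c][tok] + 1.0) / denom) for tok in vocab}
    return log_prior, log_like

def e_step(model, tokens):
    """Posterior class probabilities for one document under the current model."""
    log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens if t in log_like[c])
        for c in LABELS
    }
    top = max(scores.values())
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    norm = sum(expd.values())
    return {c: v / norm for c, v in expd.items()}

def nb_em(source, target, iterations=5):
    """Train on labeled source data, then refine with soft-labeled target data."""
    src = [(text.split(), {c: float(c == y) for c in LABELS}) for text, y in source]
    tgt = [text.split() for text in target]
    vocab = {t for toks, _ in src for t in toks} | {t for toks in tgt for t in toks}
    model = m_step(src, vocab)
    for _ in range(iterations):
        soft = [(toks, e_step(model, toks)) for toks in tgt]
        model = m_step(src + soft, vocab)
    return model

# Invented data: labeled movie sentences (source), unlabeled kitchen sentences (target).
source = [("great movie great story", "pos"), ("boring movie bad story", "neg")]
target = ["great blender sturdy", "bad blender flimsy", "sturdy kettle", "flimsy kettle"]
model = nb_em(source, target)
print(e_step(model, ["sturdy"]), e_step(model, ["flimsy"]))
```

In this toy run the target-only words "sturdy" and "flimsy" pick up polarity because they co-occur with words whose polarity was learned from the source domain.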
15

Análise de sentimentos em textos curtos provenientes de redes sociais / Sentiment analysis in short texts from social networks

Silva, Nadia Felix Felipe da 22 February 2016
Sentiment analysis is a field of study that has recently become popular owing to the growth of the Internet and of the content generated by its users, especially on social networks, where people post their opinions in colloquial language, often using graphical devices to make their messages even more succinct. This is the case on Twitter, a communication tool that can easily be used as a source of information for various automatic sentiment inference tools. Research efforts have addressed sentiment analysis on social networks as a classification problem, with little consensus on which classifier has the best predictive power or on which feature engineering configuration best represents the texts. Another problem is that in a supervised setting, training the classification model requires labeled examples, which are laborious to obtain and demand human effort in most applications. The objective of this thesis is to investigate the use of classifier ensembles, exploring the diversity and the potential of various supervised approaches when they work together, and to provide a detailed study of the phase that precedes the choice of classifier, known as feature engineering.
In addition, a study is carried out showing that unsupervised learning can provide useful complementary constraints that improve the generalization ability of sentiment classifiers, providing evidence that gains already observed in other areas can also be obtained in this domain. Based on the promising experimental results obtained in the supervised setting and boosted by unsupervised techniques, an existing algorithm called C3E (Consensus between Classification and Clustering Ensembles) was adapted and extended to the semi-supervised setting. This algorithm refines the sentiment classification using additional information provided by clusters of data, in a self-training procedure. The approach shows promising results that are competitive with state-of-the-art approaches in other domains.
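The classifier-ensemble idea can be illustrated with simple majority voting. This is only one of several ways to combine classifiers and is not claimed to be the thesis's method; the three rule-based base classifiers below are hypothetical stand-ins for trained models.

```python
from collections import Counter

# Hypothetical base classifiers; in practice these would be trained models.
def emoticon_rule(text):
    return "pos" if ":)" in text else "neg"

def keyword_rule(text):
    return "pos" if "love" in text.lower().split() else "neg"

def negation_rule(text):
    return "neg" if "not" in text.lower().split() else "pos"

def majority_vote(classifiers, text):
    """Return the label predicted by most base classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

ensemble = [emoticon_rule, keyword_rule, negation_rule]
print(majority_vote(ensemble, "love this phone :)"))
print(majority_vote(ensemble, "not good :)"))
```

With an odd number of voters and two labels no tie can occur, and a single misleading cue (the emoticon in the second tweet) is outvoted by the other two classifiers.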
16

Modelos de tópicos na classificação automática de resenhas de usuários. / Topic models in user review automatic classification.

Mauá, Denis Deratani 14 August 2009
There is a large number of user reviews on the internet containing valuable information on services, products, politics, and trends. The automatic understanding of these opinions is of both scientific and economic interest. Sentiment classification is concerned with the automatic extraction of opinions expressed in text documents. Unlike standard text categorization, in which documents are classified into subjects such as sports, economics, and tourism, sentiment classification tags documents according to the feelings they express. Compared with standard text classifiers, sentiment classifiers have shown poor performance. One possible cause is the lack of adequate representations that allow the expressed opinions to be discriminated in a concise, machine-readable form. Topic models are statistical models that seek to extract semantic information hidden in the large amount of data available in text collections. They represent a document as a mixture of topics, where each topic is a probability distribution over words that represents a semantic concept implicit in the data. In a topic model representation, words are replaced by topics that represent their meaning concisely. In effect, topic models perform a dimensionality reduction that can improve the performance of text categorization and information retrieval techniques. In sentiment classification, they can provide the necessary representation by extracting topics that represent the feelings expressed in the text.
This work studies the use of topic models for representing and classifying the sentiment of user reviews. In particular, the Latent Dirichlet Allocation (LDA) model and four extensions (two of them developed by the author) are evaluated on the task of multi-aspect sentiment classification. The extensions to the LDA model enable us to investigate the effects of incorporating additional information, such as context, aspect ratings, and multiple aspect ratings, into the original model.
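The phrase "a document as a mixture of topics" can be made concrete with a small calculation. The sketch below does not fit an LDA model; it assumes two already-known topics and a known document mixture (all values invented) and shows how the probability of a word in the document follows from them.

```python
# Hypothetical topics: each is a probability distribution over words.
topics = {
    "praise": {"excellent": 0.5, "friendly": 0.3, "slow": 0.1, "rude": 0.1},
    "complaint": {"excellent": 0.1, "friendly": 0.1, "slow": 0.4, "rude": 0.4},
}

# Hypothetical document: 75% "praise", 25% "complaint".
doc_mixture = {"praise": 0.75, "complaint": 0.25}

def word_probability(word, mixture, topics):
    """P(word | document) = sum over topics of P(topic | document) * P(word | topic)."""
    return sum(weight * topics[name].get(word, 0.0) for name, weight in mixture.items())

vocabulary = sorted(topics["praise"])
for word in vocabulary:
    print(word, round(word_probability(word, doc_mixture, topics), 3))
```

The document is described by two mixture weights rather than by counts over the whole vocabulary, which is the dimensionality reduction the abstract refers to.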
17

網路評價搜尋結果的正負意見分類系統 / A sentiment classification system on search results of web opinions

黃泓彰, Huang, Hung Chang Unknown Date
In this research, we built a system with two main functions: web opinion search and sentiment classification. For opinion search, the system uses Google to retrieve search results about portable smart devices (smartphones, tablets, and notebooks). For sentiment classification, it classifies the results by their opinion of the product in four ways: (1) positive, negative, or neutral, (2) positive or negative, (3) positive or non-positive, and (4) negative or non-negative. To build the system, we first collected documents and product names related to portable smart devices from two popular web forums, Mobile01 (mobile01.com) and PTT (ptt.cc). Each document, and the sentences in some of these documents, were then manually labelled with a sentiment (positive, negative, or neutral). We designed a two-level supervised sentiment classification experiment. At the sentence level, we trained classifiers that label sentences as positive, negative, or neutral. At the document level, the sentence-level opinions are aggregated, and classifiers are trained for the four classification problems listed above. The best model from each of the four experiments was used in the system. The best result is for positive vs. negative, with an average F-measure of 0.87, followed by negative vs. non-negative, with an F-measure of 0.83 on the negative class, and positive vs. non-positive, with an F-measure of 0.81 on the positive class.
The weakest is the three-class positive/negative/neutral classifier, with an average F-measure of 0.77. On positive vs. negative classification, our accuracy is no worse than that reported in earlier studies on English text. Finally, we repeated the experiment with a document-level-only classification method, and the results showed that two-level sentiment classification (first sentence level, then document level) outperforms document-level-only classification.
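Two pieces of the evaluation above lend themselves to a short sketch: turning sentence-level labels into document-level features, and computing the F-measure for one class. The feature design (label proportions) is an assumption for illustration, since the abstract does not say how sentence opinions are aggregated, and the example labels are invented.

```python
from collections import Counter

def sentence_label_features(sentence_labels):
    """Share of positive, negative, and neutral sentences in one document."""
    counts = Counter(sentence_labels)
    total = len(sentence_labels) or 1  # avoid division by zero for empty documents
    return [counts[label] / total for label in ("positive", "negative", "neutral")]

def f_measure(gold, predicted, target):
    """Harmonic mean of precision and recall for the target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
scores = [f_measure(gold, predicted, label) for label in ("positive", "negative")]
print(sentence_label_features(["positive", "positive", "neutral", "negative"]))
print(scores, sum(scores) / len(scores))
```

An "average F-measure" as reported above would be the mean of the per-class values, while the single-class figures correspond to one call with the relevant target label.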
20

Style Transfer Paraphrasing for Consistency Training in Sentiment Classification / Stilöverförande parafrasering för textklassificering med consistency training

Casals, Núria January 2021
Text data is easy to retrieve but often expensive to label, which is why labeled text is frequently in short supply. Labeled data is crucial in supervised tasks such as text classification, but semi-supervised learning algorithms have shown that using unlabeled data during training can improve model performance, even in comparison with a fully supervised setting. One approach to semi-supervised learning is consistency training, in which the difference between the prediction distribution of an original unlabeled example and that of its augmented version is minimized. This thesis explores the performance difference between two techniques for augmenting the unlabeled data used for detecting sentiment in movie reviews. The study examines whether data augmented through neural style transfer paraphrasing can achieve comparable or better performance than data augmented through back-translation. Five writing styles were used to generate the augmented datasets: Conversational Speech, Romantic Poetry, Shakespeare, Tweets, and Bible.
The results show that neural style transfer paraphrasing, applied as a data augmentation technique for unlabeled examples in a semi-supervised setting, does not improve sentiment classification performance over back-translation for any of the styles used in the study. However, the semi-supervised approach with style-transferred augmented data generally performs better than a model trained in a supervised scenario, where orders of magnitude more labeled data are needed and no augmentation is conducted. The study thus shows that the semi-supervised approach examined here is superior to the fully supervised setting but worse than the semi-supervised approach using back-translation.
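The consistency objective described above can be written down directly. The sketch below assumes the Kullback-Leibler divergence as the measure of difference between the two prediction distributions, which is a common choice but is not stated in the abstract; the predicted probabilities are invented. In actual training this term would be combined with a supervised loss and minimized with respect to the model parameters, whereas here it is only evaluated.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(preds_original, preds_augmented):
    """Mean divergence between predictions on originals and on their augmentations."""
    pairs = list(zip(preds_original, preds_augmented))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)

# Invented [positive, negative] predictions for one unlabeled review
# and for a paraphrased (augmented) version of it.
original = [[0.9, 0.1]]
augmented = [[0.6, 0.4]]
print(consistency_loss(original, augmented))
```

The loss is zero when the model predicts the same distribution for an example and its augmentation, and it grows as the two predictions drift apart.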
