21

The Impact of the Retrieval Text Set for Text Sentiment Classification With the Retrieval-Augmented Language Model REALM / Effekten av hämtningstextsetet för sentimenttextklassificering med den hämtningsförstärkta språkmodellen REALM

Blommegård, Oscar January 2023
Large Language Models (LLMs) have demonstrated impressive results across various language technology tasks. By training on large corpora of diverse text collected from the internet, these models learn to process text effectively and acquire comprehensive world knowledge. However, this knowledge is stored implicitly in the parameters of the model, and ever-larger networks must be trained to capture more information. Retrieval-augmented language models have been proposed as a way of improving the interpretability and adaptability of standard language models by consulting a separate retrieval text set at application time. These models have demonstrated state-of-the-art results on knowledge-intensive tasks such as question answering and fact checking, but their effectiveness for text classification remains unexplored. This study investigates the impact of the retrieval text set on the performance of the retrieval-augmented language model REALM for sentiment text classification. The results indicate that adding retrieval text data fails to improve REALM's predictions for sentiment text classification. This outcome is mainly due to the difference in how the retrieval mechanism functions during pre-training and fine-tuning. During pre-training, the neural knowledge retriever focuses on retrieving factual knowledge such as dates, cities, and names to enhance the model's predictions; during fine-tuning, the retriever aims to retrieve texts that can strengthen the prediction for the sentiment classification task. The findings suggest that retrieval-augmented models may hold limited potential for improving performance on text sentiment classification tasks.
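
To make the retrieve-then-predict idea concrete, the following is a minimal sketch of retrieval-augmented sentiment classification. It is not REALM itself (REALM learns a dense neural retriever jointly with the language model over a Wikipedia-scale corpus); here a TF-IDF retriever and a logistic-regression classifier stand in, and all texts, labels, and the augment() helper are invented for illustration.

```python
# Minimal sketch of the retrieve-then-predict pattern behind
# retrieval-augmented classification. REALM itself uses a learned dense
# retriever over a large corpus; a TF-IDF retriever and a logistic
# regression classifier stand in here, and all texts are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical retrieval text set (the knowledge the model can consult).
retrieval_texts = [
    "The film was praised for its warm, uplifting story.",
    "Critics called the plot dull and the acting wooden.",
    "The release date was moved to October 2021.",
]

# Hypothetical labelled sentiment data.
train_texts = ["An uplifting, warm experience.", "Dull plot, wooden acting."]
train_labels = [1, 0]  # 1 = positive, 0 = negative

vectorizer = TfidfVectorizer().fit(retrieval_texts + train_texts)
retrieval_vecs = vectorizer.transform(retrieval_texts)

def augment(text: str, k: int = 1) -> str:
    """Append the k most similar retrieval texts to the input."""
    sims = cosine_similarity(vectorizer.transform([text]), retrieval_vecs)[0]
    top = sims.argsort()[::-1][:k]
    return text + " " + " ".join(retrieval_texts[i] for i in top)

augmented_train = [augment(t) for t in train_texts]
clf = LogisticRegression().fit(vectorizer.transform(augmented_train), train_labels)

query = "A warm and uplifting film."
pred = clf.predict(vectorizer.transform([augment(query)]))[0]
print("augmented input:", augment(query))
print("prediction:", "positive" if pred else "negative")
```
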
22

Aspektbaserad Sentimentanalys för Business Intelligence inom E-handeln / Aspect-Based Sentiment Analysis for Business Intelligence in E-commerce

Eriksson, Albin, Mauritzon, Anton January 2022
Many companies strive to make data-driven decisions. To achieve this, they need to explore new tools for Business Intelligence. The aim of this study was to examine the performance and usability of aspect-based sentiment analysis as a tool for Business Intelligence in E-commerce. The study was conducted in collaboration with Ellos Group AB, which supplied anonymous customer feedback data. The implementation consists of two parts: aspect extraction and sentiment classification. The first part, aspect extraction, was implemented using dependency parsing and various aspect grouping techniques. The second part, sentiment classification, was implemented using the language model KB-BERT, a Swedish version of BERT. The method for aspect extraction achieved a satisfactory precision of 79.5% but only a recall of 27.2%. Moreover, the result for sentiment classification was unsatisfactory, with an accuracy of 68.2%. Although the results fall short of expectations, we conclude that aspect-based sentiment analysis in general is a useful tool for Business Intelligence, both as a means of generating customer insights from previously unused data and as a way to increase productivity. However, it should only be used as a supportive tool and not replace existing processes for decision-making.
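
As an illustration of the dependency-parsing step, the sketch below extracts (aspect, opinion) pairs from two common dependency patterns and scores them with a tiny polarity lexicon. It is only a sketch: the thesis worked on Swedish customer feedback and used KB-BERT for classification, whereas this example uses an English spaCy model and an invented lexicon as placeholders.

```python
# Sketch of dependency-based aspect extraction followed by a toy sentiment
# step. The thesis classified Swedish text with KB-BERT; here an English
# spaCy model and a tiny polarity lexicon are used purely for illustration
# (first run: python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical polarity lexicon standing in for a trained classifier.
LEXICON = {"fast": 1, "great": 1, "slow": -1, "poor": -1}

def aspect_opinions(text):
    """Yield (aspect, opinion, polarity) triples from two dependency patterns."""
    doc = nlp(text)
    for token in doc:
        # Pattern 1: adjectival modifier of a noun, e.g. "fast shipping".
        if token.dep_ == "amod" and token.head.pos_ == "NOUN":
            yield token.head.text, token.text, LEXICON.get(token.lemma_, 0)
        # Pattern 2: predicative adjective, e.g. "the fabric is poor".
        if token.dep_ == "acomp":
            subjects = [c for c in token.head.children if c.dep_ == "nsubj"]
            if subjects:
                yield subjects[0].text, token.text, LEXICON.get(token.lemma_, 0)

for triple in aspect_opinions("Fast shipping, but the fabric is poor."):
    print(triple)
```
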
23

透過圖片標籤觀察情緒字詞與事物概念之關聯 / An analysis on association between emotion words and concept words based on image tags

彭聲揚, Peng, Sheng-Yang Unknown Date
This study starts from psychology and asks how emotional states should be categorized. To link emotion with semantics, we treat images as stimuli for emotional states and sample and observe content created and shared by the Flickr community. Using the basic emotion words from psychological research together with their part-of-speech variants, we extracted 12,000 photos carrying word tags and computed the co-occurrence of tag words with emotion-category words, as well as association rules between them. In addition, using the semantic differential scale, we propose a new coordinate-based classification of polarity and intensity. Through frequency-threshold filtering, part-of-speech annotation, and merging words by stem, we obtained 272 concept words with emotional polarity from 65,983 unique text tags, together with association rules for positive and negative polarity. To verify through images whether these words are associated with the emotional states that image content evokes in people, we selected 42 photos for the final comparison through three query channels: single-word Flickr search, single-word Google Image search, and our own composite photo-tag indicators (the proportion of emotion words and community-filtering parameters). Using the semantic differential scale, we measured whether the three groups of photos, as judged by 136 users, fit the intensity-polarity model proposed earlier. The experimental results show that our method returns results similar to Google Image; the user survey supports our method's judgments of positive and negative polarity, and it separates strong from weak emotions better than Google.
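
The co-occurrence and association-rule step can be sketched in a few lines; the photo tag sets and the emotion word list below are invented, and only support and confidence for rules of the form tag → emotion are computed.

```python
# Toy sketch of the tag co-occurrence and association-rule computation: for
# each candidate concept tag, count how often it co-occurs with an emotion
# word across photos and derive support/confidence for "tag -> emotion".
# All tag sets below are invented for illustration.
from collections import Counter
from itertools import product

EMOTION_WORDS = {"happy", "joy", "sad", "fear"}

# Each set is the tag list of one hypothetical Flickr photo.
photos = [
    {"beach", "sunset", "happy"},
    {"beach", "friends", "joy"},
    {"rain", "alley", "sad"},
    {"beach", "surf"},
]

pair_counts = Counter()
tag_counts = Counter()
for tags in photos:
    concepts = tags - EMOTION_WORDS
    emotions = tags & EMOTION_WORDS
    tag_counts.update(concepts)
    pair_counts.update(product(concepts, emotions))

# Rule tag -> emotion: support = P(tag, emotion), confidence = P(emotion | tag).
for (tag, emo), n in pair_counts.items():
    support = n / len(photos)
    confidence = n / tag_counts[tag]
    print(f"{tag} -> {emo}: support={support:.2f}, confidence={confidence:.2f}")
```
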
24

Sentiment-Driven Topic Analysis Of Song Lyrics

Sharma, Govind 08 1900
Sentiment Analysis is an area of Computer Science that deals with the impact a document makes on a user. The field is further sub-divided into Opinion Mining and Emotion Analysis, the latter of which is the basis for the present work. Work on songs is aimed at building affective interactive applications such as music recommendation engines. Using song lyrics, we are interested in both supervised and unsupervised analyses, each of which has its own pros and cons. For an unsupervised analysis (clustering), we use a standard probabilistic topic model called Latent Dirichlet Allocation (LDA). It mines topics from songs, which are probability distributions over the vocabulary of words. Some of the topics appear sentiment-based, motivating us to continue with this approach. We evaluate our clusters against a gold dataset collected from a suitable website and obtain positive results. This approach is useful in the absence of a labelled dataset. In another part of our work, we argue that some supervision is unavoidable, in the sense that the returned topics must still be analysed manually. Further, we also use explicit supervision in the form of a training dataset for a classifier to learn sentiment-specific classes. This analysis helps reduce dimensionality and improve classification accuracy. We obtain excellent dimensionality reduction using Support Vector Machines (SVM) for feature selection. For re-classification, we use the Naive Bayes Classifier (NBC) and SVM, both of which perform well. We also use Non-negative Matrix Factorization (NMF) for classification, but observe that its results coincide with those of NBC, with no exceptions. This drives us towards establishing a theoretical equivalence between the two.
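
The unsupervised part of such a pipeline can be sketched with scikit-learn's LDA implementation as below; the lyric snippets and the choice of two topics are placeholders, not the thesis's actual corpus or settings.

```python
# Minimal sketch of topic mining over lyrics: fit LDA on a bag-of-words
# representation and print the top words per topic. The lyric snippets and
# the number of topics are invented for illustration.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

lyrics = [
    "love heart forever together tonight",
    "tears alone cry rain goodbye",
    "dance party night lights music",
    "broken heart tears goodbye alone",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(lyrics)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:5]
    print(f"topic {k}:", ", ".join(terms[i] for i in top))
```
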
25

All Negative on the Western Front: Analyzing the Sentiment of the Russian News Coverage of Sweden with Generic and Domain-Specific Multinomial Naive Bayes and Support Vector Machines Classifiers / På västfronten intet gott: attitydanalys av den ryska nyhetsrapporteringen om Sverige med generiska och domänspecifika Multinomial Naive Bayes- och Support Vector Machines-klassificerare

Michel, David January 2021
This thesis explores to what extent Multinomial Naive Bayes (MNB) and Support Vector Machines (SVM) classifiers can be used to determine the polarity of news, specifically the news coverage of Sweden by the Russian state-funded news outlets RT and Sputnik. Three experiments are conducted.  In the first experiment, an MNB and an SVM classifier are trained with the Large Movie Review Dataset (Maas et al., 2011) with a varying number of samples to determine how training data size affects classifier performance.  In the second experiment, the classifiers are trained with 300 positive, negative, and neutral news articles (Agarwal et al., 2019) and tested on 95 RT and Sputnik news articles about Sweden (Bengtsson, 2019) to determine if the domain specificity of the training data outweighs its limited size.  In the third experiment, the movie-trained classifiers are put up against the domain-specific classifiers to determine if well-trained classifiers from another domain perform better than relatively untrained, domain-specific classifiers.  Four different types of feature sets (unigrams, unigrams without stop words removal, bigrams, trigrams) were used in the experiments. Some of the model parameters (TF-IDF vs. feature count and SVM’s C parameter) were optimized with 10-fold cross-validation.  Other than the superior performance of SVM, the results highlight the need for comprehensive and domain-specific training data when conducting machine learning tasks, as well as the benefits of feature engineering, and to a limited extent, the removal of stop words. Interestingly, the classifiers performed the best on the negative news articles, which made up most of the test set (and possibly of Russian news coverage of Sweden in general).
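
The experimental setup lends itself to a short sketch: TF-IDF n-gram features feed either a Multinomial Naive Bayes or a linear SVM classifier, with the SVM's C parameter tuned by cross-validation. The texts and labels below are placeholders; the thesis trained on the Large Movie Review Dataset and on domain-specific news articles.

```python
# Sketch of the experimental setup: TF-IDF n-gram features feeding either
# Multinomial Naive Bayes or a linear SVM, with the SVM's C parameter tuned
# by cross-validation. The texts and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["great cooperation praised", "harsh criticism and failure",
         "successful visit welcomed", "scandal condemned strongly"] * 5
labels = [1, 0, 1, 0] * 5  # 1 = positive, 0 = negative

nb = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
               ("clf", MultinomialNB())]).fit(texts, labels)

svm = Pipeline([("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
                ("clf", LinearSVC())])
search = GridSearchCV(svm, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5).fit(texts, labels)

print("NB  train acc:", nb.score(texts, labels))
print("SVM best C:", search.best_params_["clf__C"],
      "cv acc:", round(search.best_score_, 3))
```
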
