Global ETD Search

141	Fast Data Analysis Methods For Social Media Data Nhlabano, Valentine Velaphi 07 August 2018 The advent of Web 2.0 technologies, which support the creation and publishing of various social media content in a collaborative and participatory way by all users in the form of user-generated content and social networks, has led to the creation of vast amounts of structured, semi-structured and unstructured data. The sudden rise of social media has led to its wide adoption by organisations of various sizes worldwide, which use this new means of communication to engage with their stakeholders in ways that were unimaginable before. Data generated from social media is highly unstructured, which makes it challenging for most organisations, since they are normally used to handling and analysing structured data from business transactions. The research reported in this dissertation was carried out to investigate fast and efficient methods for retrieving, storing and analysing unstructured data from social media in order to make crucial and informed business decisions on time. Sentiment analysis was conducted on Twitter data (tweets). Twitter, one of the most widely adopted social network services, provides an API (Application Programming Interface) that lets researchers and software developers connect and collect public data sets from the Twitter database. A Twitter application was created and used to collect streams of real-time public data via a Twitter source provided by Apache Flume and to store this data efficiently in the Hadoop Distributed File System (HDFS). Apache Flume is a distributed, reliable, and available system that is used to efficiently collect, aggregate and move large amounts of log data from many different sources to a centralized data store such as HDFS. 
Apache Hadoop is an open-source software library that runs on low-cost commodity hardware and can store, manage and analyse large amounts of both structured and unstructured data quickly, reliably, flexibly and at low cost. A lexicon-based sentiment analysis approach was taken, and the AFINN-111 lexicon was used for scoring. The Twitter data was analysed from HDFS using a Java MapReduce implementation. MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. The results demonstrate that this approach is a fast, efficient and economical way to analyse unstructured social media data in real time. / Dissertation (MSc)--University of Pretoria, 2019. / National Research Foundation (NRF) - Scarce skills / Computer Science / MSc / Unrestricted Big Data Machine Learning Sentiment Analysis Text Mining Apache Hadoop UCTD
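As a rough illustration of the lexicon-based scoring and MapReduce aggregation described in this abstract, the following Python sketch scores each tweet by summing word valences and then counts sentiment labels in a map/reduce style. It is a minimal single-machine stand-in for a Java Hadoop job, not the author's code. The `LEXICON` entries are a hypothetical toy excerpt in the AFINN style (integer valences from -5 to +5), not values copied from AFINN-111, and the tokenizer and the positive/negative/neutral cut-offs are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical toy excerpt in the AFINN style (word -> integer valence in [-5, 5]).
# These values are illustrative assumptions, not taken from the AFINN-111 file.
LEXICON = {"good": 3, "great": 3, "love": 3, "bad": -3, "hate": -3, "slow": -2}

def tokenize(text):
    """Lower-case the tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score_tweet(text):
    """Sum the lexicon valences of all tokens; unknown words contribute 0."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))

def label(score):
    """Assumed rule: >0 positive, <0 negative, 0 neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def map_phase(tweets):
    """Mapper: emit (label, 1) for every tweet, as a Hadoop mapper would."""
    for text in tweets:
        yield label(score_tweet(text)), 1

def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each label."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)
```

For example, `reduce_phase(map_phase(["I love this, great service", "bad and slow", "ok"]))` yields one positive, one negative and one neutral tweet. In a real Hadoop job the mapper output would be shuffled by key across the cluster before reaching the reducers; here the two phases are simply chained in memory.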
142	Extracting Customer Sentiments from Email Support Tickets : A case for email support ticket prioritisation Fiati-Kumasenu, Albert January 2019 Background Every day, companies generate enormous numbers of customer support tickets, which are grouped by certain characteristics into specialised queues, from which customer support personnel (CSP) resolve them on a first-in-first-out basis. Given that these tickets require different levels of urgency, a logical next step towards improving the effectiveness of the CSPs is to prioritise the tickets based on business policies. Among the several heuristics that can be used to prioritise tickets is sentiment polarity. Objectives This study investigates how machine learning methods and natural language processing techniques can be leveraged to automatically predict the sentiment polarity of customer support tickets. Methods Using a formal experiment, the study examines how well Support Vector Machine (SVM), Naive Bayes (NB) and Logistic Regression (LR) based sentiment polarity prediction models built for product and movie reviews can be used to make sentiment predictions on email support tickets. Because few annotated email support tickets are available, Valence Aware Dictionary and sEntiment Reasoner (VADER) and a cluster ensemble (using k-means, affinity propagation and spectral clustering) are also investigated for sentiment polarity prediction. Results Compared to NB and LR, SVM performs best, with an average f1-score of .71, whereas NB scores lowest with an f1-score of .62. SVM combined with the presence vector outperformed the frequency and TF-IDF vectors with an f1-score of .73, while NB records an f1-score of .63. With an average f1-score of .23, the models transferred from the movie and product reviews performed inadequately, even compared with a dummy classifier averaging an f1-score of .55. 
Finally, the cluster ensemble method outperformed VADER, with f1-scores of .61 and .53, respectively. Conclusions Given these results, SVM combined with a presence vector of bigrams and trigrams is a candidate solution for extracting sentiments from email support tickets. Additionally, transferring sentiment models from the movie and product review domain to email support tickets is not possible. Finally, given that only limited datasets exist for sentiment analysis studies in the Swedish and customer support context, a cluster ensemble is recommended as a sample selection method for generating annotated data. Machine Learning Natural Language Processing Sentiment Analysis Cluster Ensemble VADER Customer support Computer Systems Datorsystem
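The difference between a presence vector and a frequency vector, which this abstract identifies as decisive for the SVM, can be shown with a small sketch. The Python below builds a binary bigram/trigram presence vector: each feature is 1 if the n-gram occurs at least once in the ticket, regardless of how often. Whitespace tokenization and the function names are assumptions for illustration; the resulting vectors would then be fed to a linear classifier such as an SVM (for example scikit-learn's `LinearSVC`), which is not shown.

```python
def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocabulary(documents, ns=(2, 3)):
    """Collect every bigram and trigram seen in the training documents."""
    vocab = set()
    for doc in documents:
        tokens = doc.lower().split()  # assumed: simple whitespace tokenization
        for n in ns:
            vocab.update(ngrams(tokens, n))
    return sorted(vocab)

def presence_vector(doc, vocabulary, ns=(2, 3)):
    """1 if the n-gram occurs at least once in the document, else 0 (no counts)."""
    tokens = doc.lower().split()
    seen = set()
    for n in ns:
        seen.update(ngrams(tokens, n))
    return [1 if gram in seen else 0 for gram in vocabulary]
```

With a vocabulary built from `["not happy at all", "very happy"]`, the text `"very happy very happy"` maps to a vector with a single 1 for the bigram `("very", "happy")`; a frequency vector would instead record a 2 there.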
143	An Automated Digital Analysis of Depictions of Child Maltreatment in Ancient Roman Writings Browne, Alexander January 2019 Historians, mostly engaging with written evidence, have argued that the Christianisation of the Roman Empire changed both attitudes and behaviour towards children, reducing their maltreatment by society. I begin with the working hypothesis that this change in attitude was real, that it reduced the maltreatment of children, and that this reduction is evident in the literature. The approach to investigating this hypothesis belongs to the emerging field of digital humanities: using programming techniques developed in the field of sentiment analysis, I create two sentiment-analysis-like tools, one lexicon-based and the other applying a naive Bayes machine learning approach. The latter is favoured as more accurate. The tool is used to automatically tag sentences that mention children, extracted from a corpus of texts written between 100 B.C. and 600 A.D., according to whether or not they feature the maltreatment of children. The results are then quantitatively analysed with reference to the year in which each text was written; no statistically significant result was found. However, the tool's high accuracy in tagging sentences, above 88%, suggests that similar tools may be able to play an important role, alongside traditional research techniques, in future historical and social-science research. digital humanities child maltreatment ancient rome christianity sentiment analysis naive bayes Humanities and the Arts Humaniora och konst
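To make the naive Bayes tagging step more concrete, here is a minimal Python sketch of a multinomial naive Bayes sentence tagger with add-one (Laplace) smoothing. The abstract does not specify the features or smoothing used, so a bag-of-words model is an assumption, and the class name, labels and training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)      # sentences per class, for priors
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            tokens = sentence.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        tokens = sentence.lower().split()
        best_label, best_logp = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over tokens of smoothed log P(word | class)
            logp = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and add-one smoothing keeps an unseen word from driving a class probability to zero.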
144	Comparing LSTM and GRU for Multiclass Sentiment Analysis of Movie Reviews. Sarika, Pawan Kumar January 2020 Today, we live in a data-driven world. Due to a surge in data generation, efficient and accurate techniques are needed to analyze data. One kind of data that needs to be analyzed is the text reviews given for movies. Rather than classifying the reviews as positive or negative, we classify the sentiment of the reviews on a scale of one to ten. In doing so, we compare two recurrent neural network algorithms: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). The main objective of this study is to compare the accuracies of the LSTM and GRU models. To train the models, we collected data from two different sources. To filter the data, we used Porter stemming and stop-word removal. We coupled LSTM and GRU with convolutional neural networks to increase performance. In our experiments, LSTM performed better at predicting border values, whereas GRU predicted all classes equally. Overall, GRU predicted the multiclass text data of movie reviews slightly better than LSTM. GRU was more computationally expensive than LSTM. Gated recurrent unit Multiclass classification Movie reviews Sentiment Analysis Recurrent neural network Computer Systems Datorsystem
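For reference, the textbook GRU update (Cho et al., 2014) is given below, followed by a softmax output over the ten rating classes used here. The abstract does not state the exact formulation or output layer the thesis used, so this is a standard version rather than the author's model. Here $x_t$ is the input at step $t$, $h_t$ the hidden state, $\sigma$ the logistic sigmoid and $\odot$ element-wise multiplication. Some libraries swap the roles of $z_t$ and $1 - z_t$ in the last line.

```latex
\[
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big) && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \\
p(y = k \mid h_T) &= \operatorname{softmax}(W_o h_T + b_o)_k, \qquad k = 1, \dots, 10
\end{aligned}
\]
```

An LSTM cell uses an additional gate and a separate cell state, so per layer it has four weight blocks where the GRU has three.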
145	Using Sentiment Analysis of Twitter Discourse to Understand Sentiment Towards Salmon Aquaculture Among Stakeholders Over Time Glutting, Lisa 22 June 2022 The intersection of the environment, the economy and society creates a wicked problem in salmon aquaculture in Canada. To provide unique insight into the challenges facing the salmon aquaculture industry among key stakeholders, this thesis investigates the sentiment of several important stakeholder groups in the industry: academics, industry, ENGOs, government, Indigenous peoples, and the media. By scraping Twitter data from 2006 to 2021, it examines aquaculture sentiment from a global English-speaking view, as well as in a subset of Canadian data. This thesis addresses the following questions: How does public sentiment towards salmon aquaculture differ over time? How does public sentiment towards salmon aquaculture differ among stakeholder groups? Data is analyzed through a stakeholder management theory framework using sentiment analysis. Data is collected from Twitter because users prefer it to other social media sites for sharing their unprompted thoughts, ideas, and opinions. The data was scraped using the open-source Twitter scraper Twint and processed in Google Colab notebooks: the raw data is preprocessed into 273,319 tweets (rows) of clean data, which are analyzed using VADER's natural language processing tool, yielding a sentiment score between -1 and +1 for each tweet. This thesis explores the dependent variable of sentiment and the independent variable of time. Findings are examined through the lens of overall sentiment, sentiment from year to year (2006-2021), sentiment per stakeholder category, and sentiment per stakeholder category per year. Sentiment from 2007 to 2021 was expected to become increasingly negative because of significant negative events in the salmon aquaculture industry from 2006 to 2021. 
There have been many policy changes, lawsuits, fish escapes and concerns from ENGOs, Indigenous groups, and researchers about salmon aquaculture during this time. However, the data contradicts this hypothesis, trending positively over time. The overall dataset is consistent and clusters around a mean of 0.3 (slightly positive), with a median of 0.4 and a standard deviation of 0.4. The skewness of the general data is -0.994, meaning that the distribution has a moderate negative skew (most tweets have positive sentiment). The dataset has an R-squared value of 0.64, indicating a moderate model, and an R-squared value of 0.79 when outliers are removed, indicating a strong model. All eight stakeholder group categories display a moderately negative skewness value and a positive mean sentiment. The Academic / Researcher and Industry / Worker stakeholder groups show strong models, while the other stakeholder categories, with lower R-squared values, show weaker models. This thesis provides new insight into the growing salmon aquaculture industry. Further, understanding stakeholder sentiment can allow a government, individual, or group to be proactive rather than reactive in its decision-making. The data allows for open dialogue with all stakeholders and promotes future research, analysis, and collaboration within the salmon aquaculture industry. Salmon aquaculture Sentiment analysis Twitter VADER Stakeholder Stakeholder theory Data scraping
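The statistics reported in this abstract (a per-tweet score in [-1, +1], mean, median, standard deviation, skewness and R-squared) can be reproduced in outline with the Python standard library. The sketch below is an assumption-laden illustration, not the thesis's notebook code. `vader_normalize` mirrors the `x / sqrt(x^2 + alpha)` normalization with `alpha=15` found in the open-source vaderSentiment code, as we understand it. Skewness is the Fisher-Pearson coefficient g1 (other tools apply a bias correction). `r_squared` assumes an ordinary least-squares line, for example yearly mean sentiment regressed on year; the abstract does not say exactly which regression was used.

```python
import math
import statistics

def vader_normalize(raw_score, alpha=15):
    """Map an unbounded summed valence into [-1, 1], VADER-compound style."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    return max(-1.0, min(1.0, norm))

def skewness(values):
    """Fisher-Pearson moment coefficient g1 = m3 / m2**1.5 (population moments)."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

def r_squared(xs, ys):
    """R^2 of an ordinary least-squares line fitted to (xs, ys)."""
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def summarize(scores):
    """Descriptive statistics of the kind reported for the tweet scores."""
    return {
        "mean": statistics.fmean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),
        "skewness": skewness(scores),
    }
```

A negative `skewness` value means the longer tail lies on the negative side, which is consistent with most tweets scoring positive, as the abstract notes.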
146	Attitudes Towards Log4j : A Sentiment Analysis Study on Twitter Data Froissart, Isabelle, Ring, Julia January 2022 A major security risk in the Java logging library Log4j was discovered in November 2021. The vulnerability meant that all Java applications using Log4j could be exploited by attackers through remote code execution. The Log4j vulnerability came to the general public's attention and became a hot topic on various social media platforms on 9 December 2021; this is referred to in this paper as the Log4j incident. The aim of the study is to investigate what attitudes Twitter users have towards Log4j and how these attitudes have evolved over time in relation to the incident. Twitter data regarding Log4j was collected using the Twitter API, and sentiment analysis was performed on the data set using VADER. The gathered tweets were classified as positive, negative or neutral. The data was collected, sorted and analyzed following the CRISP-DM methodology. Tweets from two time periods were studied: 1) the five months prior to the incident and 2) the five months after the incident. The results showed that tweets posted before the incident were mostly positive, while tweets posted after the incident were mostly negative. An interesting pattern emerged within the five-month period directly following the incident: during the first month, negative sentiment regarding Log4j predominated, whereas April 2022, on the contrary, was predominantly positive. In conclusion, this study has presented the attitudes a large group of Twitter users have expressed towards Log4j and how these attitudes have evolved over time. 
A gap has been identified in research on how discussions circulate on social media when a high-impact security threat appears, and this study aims to provide new insights in this area. Sentiment attitude Twitter sentiment analysis Log4j opinion Information Systems
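The three-way classification and the month-by-month comparison described in this abstract can be sketched as follows. The cut-offs of ±0.05 on VADER's compound score are the convention suggested in the vaderSentiment README; whether the study used exactly these thresholds is an assumption. The input format of (ISO date, compound score) pairs and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def classify(compound, threshold=0.05):
    """Three-way label using the +/-0.05 compound cut-offs from the VADER README."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def predominant_by_month(tweets):
    """tweets: iterable of ('YYYY-MM-DD', compound score) pairs.
    Returns {'YYYY-MM': most common label} for each month, in month order."""
    per_month = defaultdict(Counter)
    for date, compound in tweets:
        per_month[date[:7]][classify(compound)] += 1  # group by 'YYYY-MM'
    return {month: counts.most_common(1)[0][0]
            for month, counts in sorted(per_month.items())}
```

If two labels tie in a month, `most_common` returns the one counted first, so a real analysis would probably report the full per-month counts rather than only the winning label.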
147	Ranking Aspect-Based Features in Restaurant Reviews Chan, Jacob Ling Hang 07 December 2020 Consumers continuously review products and services on the internet, and others frequently rely on those reviews when making purchasing decisions. Review texts are usually free-form and associated with a star rating on a 5-point scale. Since the majority of restaurants receive a 3.5- or 4-star rating on average, a standalone star rating does not give readers adequate information to make a decision. Many researchers have approached the problem with sentiment analysis, classifying a sentence or text as expressing a positive or a negative review. However, sentiment analysis, even at the fine-grained level, can only classify judgments on a particular aspect as positive or negative. The novel method proposed in this thesis provides insight into which aspects reviewers deem relevant when assigning star ratings to restaurants. This is accomplished with an interpretable star rating classification method that predicts the star rating from the aspects and their polarity scores in the review. The model first assigns a polarity score to each aspect in the review text, then predicts a star rating, and outputs a ranked list of aspect importance for a widely used restaurant reviews dataset. The results of this thesis suggest that the classification model is able to output a reliable ranking from review texts. Sentiment Analysis Star Rating Prediction Feature Importance Random Forest Arts and Humanities
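The pipeline described in this abstract (per-aspect polarity, then a star-rating classifier, then a ranked list of aspect importance) can be outlined in Python. The sketch assumes a fixed, hypothetical aspect set and averages polarities when an aspect is mentioned several times. The importances passed to `rank_aspects` would come from a fitted model, for instance the `feature_importances_` attribute of a scikit-learn random forest (the keywords mention Random Forest). Training that model is omitted, and none of this is the thesis's actual code.

```python
ASPECTS = ["food", "service", "price", "ambience"]  # hypothetical aspect set

def aspect_features(aspect_polarities):
    """Average polarity per aspect; aspects not mentioned in the review get 0.0.
    aspect_polarities: list of (aspect, polarity in [-1, 1]) pairs."""
    totals = {a: 0.0 for a in ASPECTS}
    counts = {a: 0 for a in ASPECTS}
    for aspect, polarity in aspect_polarities:
        if aspect in totals:  # ignore aspects outside the fixed set
            totals[aspect] += polarity
            counts[aspect] += 1
    return [totals[a] / counts[a] if counts[a] else 0.0 for a in ASPECTS]

def rank_aspects(importances):
    """Sort aspects by importance, highest first.
    importances: one value per entry in ASPECTS, e.g. a fitted model's
    feature importances over the aspect_features columns."""
    return sorted(zip(ASPECTS, importances), key=lambda pair: pair[1], reverse=True)
```

Using one feature per aspect keeps the model interpretable: each importance score maps directly back to a named aspect, which is what makes a ranked list possible.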
148	Sentiment Analysis of YouTube Public Videos based on their Comments Kvedaraite, Indre January 2021 With the rise of social media and publicly available data, opinion mining is more accessible than ever. It is valuable for content creators, companies and advertisers to gain insights into what users think and feel. This work examines comments on YouTube videos and builds a deep learning classifier to automatically determine their sentiment. Four Long Short-Term Memory (LSTM) based models are trained and evaluated. Experiments are performed to determine which deep learning model achieves the best accuracy, recall, precision, F1 score and ROC curve on a labelled YouTube comment dataset. The results indicate that a BiLSTM-based model has the best overall performance, with an accuracy of 89%. Furthermore, the four LSTM-based models are evaluated on an IMDB movie review dataset, achieving an average accuracy of 87%, which shows that the models can predict the sentiment of different kinds of textual data. Finally, a statistical analysis of the YouTube videos reveals that videos with positive sentiment have a significantly higher number of upvotes and views. However, videos with negative sentiment do not have a significantly higher number of downvotes. Sentiment analysis Sentiment classification LSTM BiLSTM Recurrent neural networks Convolutional neural networks Software Engineering Programvaruteknik
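The evaluation metrics named in this abstract follow directly from the confusion counts. The minimal Python sketch below computes accuracy, precision, recall and F1 for one positive class; the ROC curve, which needs ranked probability scores rather than hard labels, is left out. The function name and the choice of `1` as the positive label are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

For `y_true = [1, 1, 1, 0, 0]` and `y_pred = [1, 1, 0, 1, 0]` there are two true positives, one false positive and one false negative. Accuracy is 0.6, and precision, recall and F1 all equal 2/3. F1 is the harmonic mean of precision and recall, so it is dragged down by whichever of the two is weaker.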
149	Predictive model based on sentiment analysis for Peruvian SMEs in the sustainable tourist sector Zapata, Gianpierre, Murga, Javier, Raymundo, Carlos, Alvarez, Jose, Dominguez, Francisco 01 January 2017 In the sustainable tourist sector today, small and medium-sized enterprises (SMEs) suffer a wide margin of loss because of poor control of logistical expenses. In other words, acquired goods are not being sold, a scenario that is very common in tourism SMEs. These SMEs buy a number of travel packages from big companies, and because of the lack of demand for those packages, they expire and become an expense rather than the investment they were meant to be. To solve this problem, we propose a predictive model based on sentiment analysis of a social network to support sales decision making. Once the social network data has been analyzed, we also propose a prediction model of tourist destinations that uses this information as its data source to predict tourist interest. In addition, a case study was applied to a real Peruvian tourism enterprise, comparing its data before and after using the proposed model, in order to validate the model's feasibility. Big data Cloud computing Sentiment analysis Tourism sector Travel management process
150	Emotional Perception of Death in Animated Films : Sentiment Analysis of Coco and Soul’s Scripts and Reviews Hsu, Li-Hsin January 2021 This thesis aims to understand, through sentiment analysis, the emotions expressed by adults watching animated films that deal with death. The research is a quantitative sentiment analysis from the perspective of distant reading. Previous studies on death scenes in animated films have focused only on child audiences. However, the audience of animated films spans a wide range of ages, so it is also necessary to analyse the sentiments of adult audiences. This thesis collects material on two films produced by Pixar studio, Coco (2017) and Soul (2020), namely their scripts and a total of 600 audience reviews on IMDb, for cross-comparison. Additionally, it analyses the reviews that mention death to better understand adult audiences’ emotional expressions on the subject. The results show that the positive sentiment scores of reviews containing death are slightly lower than those of all the reviews, while the negative sentiment scores do not differ much. However, positive emotions still dominate the reviews that mention death. The emotional patterns of the scripts and the reviews are roughly similar, but the emotional intensity of the reviews is higher than that of the scripts, indicating that audiences are willing to show their emotions on a public online film platform. Future research is recommended to combine this analysis with other NLP methods or close reading to explore the content in more detail. Pixar IMDb Distant Reading Sentiment Analysis Emotions Death Other Humanities not elsewhere specified Övrig annan humaniora
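The comparison in this abstract, between the sentiment of all reviews and of the subset mentioning death, amounts to filtering and averaging. The Python sketch below assumes each review already carries a positive-sentiment score from some tool. The keyword list `DEATH_TERMS` is a hypothetical stand-in for however the thesis actually identified death-related reviews.

```python
import re
import statistics

# Hypothetical keyword list for spotting reviews that mention death.
DEATH_TERMS = {"death", "dead", "die", "dies", "died", "dying", "afterlife"}

def mentions_death(review):
    """True if any token of the review is in DEATH_TERMS."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return any(tok in DEATH_TERMS for tok in tokens)

def compare_positive_scores(reviews):
    """reviews: list of (text, positive_score) pairs.
    Returns the mean positive score over all reviews and over the death subset."""
    all_scores = [score for _, score in reviews]
    death_scores = [score for text, score in reviews if mentions_death(text)]
    return {
        "all": statistics.fmean(all_scores),
        "death": statistics.fmean(death_scores) if death_scores else None,
    }
```

Exact keyword matching misses paraphrases such as "passed away", so a study relying on it would likely need a broader term list or manual checking.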
