• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 184
  • 43
  • 17
  • 14
  • 9
  • 7
  • 7
  • 7
  • 3
  • 3
  • 2
  • 2
  • 2
  • 1
  • 1
  • Tagged with
  • 318
  • 318
  • 90
  • 84
  • 84
  • 72
  • 66
  • 61
  • 58
  • 56
  • 54
  • 54
  • 53
  • 52
  • 47
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
241

Sentiment Analysis of MOOC learner reviews : What motivates learners to complete a course?

Knöös, Johanna, Rääf, Siri Amanda January 2021 (has links)
In the last decade, development of Information and Communication Technology (ICT) thatsupports online learning has increased the demand for e-learning and Massive Open OnlineCourses (MOOCs). Despite their increased popularity, MOOCs are struggling with highdropout rates and only a small percentage of learners complete the courses they enrolled in. Thepurpose of this thesis is to gain knowledge about MOOC learner behaviour. The aim of thestudy is to identify the motivations of learners and how these differ between learners whocompleted a course and those who dropped out. Research on MOOC learners has mostly beencarried out using a quantitative approach. While quantitative methodologies are effective inhandling the large amount of data produced by MOOCs, qualitative methods can give deeperinsights into online learners’ motivations. Therefore, this thesis employs an explanatorysequential mixed methods research, in which sentiment analysis and topic modeling of learnerreviews from the platform Coursera are further explained by qualitative interviews with MOOClearners. In the study 28,000 reviews scraped from five courses within the fields of data sciencewere analyzed and ten interviews were held with learners who either completed, dropped outfrom or both completed and dropped out from a MOOC. In the quantitative analysis nine coursefactors were found that learners wrote about: content, delivery, assessment, learning experience,tools, video material, teaching style, instructor skills and course provider. In addition, eighteenthemes were yielded from the interviews: self-discipline, just for fun, certificates, personaldevelopment, knowledge, career, time, equipment, practical exercise, interaction, instructor,reality, structure, external material, cost, community, degree of difficulty and other. In thediscussion the empirical findings are reflected upon using the theoretical framework of theresearch and the literature review. The result does not reveal any differences in motivationsbetween learners who completed a course and those who dropped out, however, it does identifyfactors that caused learners’ to drop out and the topics that most negative learner reviews wereabout. This research contributes to the body of knowledge in the field of research on MOOClearner retention and motivations. The topic is relevant for research in education informaticsand for continued improvements in delivery of MOOCs.
242

Exploration of Knowledge Distillation Methods on Transformer Language Models for Sentiment Analysis / Utforskning av metoder för kunskapsdestillation på transformatoriska språkmodeller för analys av känslor

Liu, Haonan January 2022 (has links)
Despite the outstanding performances of the large Transformer-based language models, it proposes a challenge to compress the models and put them into the industrial environment. This degree project explores model compression methods called knowledge distillation in the sentiment classification task on Transformer models. Transformers are neural models having stacks of identical layers. In knowledge distillation for Transformer, a student model with fewer layers will learn to mimic intermediate layer vectors from a teacher model with more layers by designing and minimizing loss. We implement a framework to compare three knowledge distillation methods: MiniLM, TinyBERT, and Patient-KD. Student models produced by the three methods are evaluated by accuracy score on the SST-2 and SemEval sentiment classification dataset. The student models’ attention matrices are also compared with the teacher model to find the best student model for capturing dependencies in the input sentences. The comparison results show that the distillation method focusing on the Attention mechanism can produce student models with better performances and less variance. We also discover the over-fitting issue in Knowledge Distillation and propose a Two-Step Knowledge Distillation with Transformer Layer and Prediction Layer distillation to alleviate the problem. The experiment results prove that our method can produce robust, effective, and compact student models without introducing extra data. In the future, we would like to extend our framework to support more distillation methods on Transformer models and compare performances in tasks other than sentiment classification. / Trots de stora transformatorbaserade språkmodellernas enastående prestanda är det en utmaning att komprimera modellerna och använda dem i en industriell miljö. I detta examensarbete undersöks metoder för modellkomprimering som kallas kunskapsdestillation i uppgiften att klassificera känslor på Transformer-modeller. Transformers är neurala modeller med staplar av identiska lager. I kunskapsdestillation för Transformer lär sig en elevmodell med färre lager att efterlikna mellanliggande lagervektorer från en lärarmodell med fler lager genom att utforma och minimera förluster. Vi genomför en ram för att jämföra tre metoder för kunskapsdestillation: MiniLM, TinyBERT och Patient-KD. Elevmodeller som produceras av de tre metoderna utvärderas med hjälp av noggrannhetspoäng på datasetet för klassificering av känslor SST-2 och SemEval. Elevmodellernas uppmärksamhetsmatriser jämförs också med den från lärarmodellen för att ta reda på vilken elevmodell som är bäst för att fånga upp beroenden i de inmatade meningarna. Jämförelseresultaten visar att destillationsmetoden som fokuserar på uppmärksamhetsmekanismen kan ge studentmodeller med bättre prestanda och mindre varians. Vi upptäcker också problemet med överanpassning i kunskapsdestillation och föreslår en tvåstegs kunskapsdestillation med transformatorskikt och prediktionsskikt för att lindra problemet. Experimentresultaten visar att vår metod kan producera robusta, effektiva och kompakta elevmodeller utan att införa extra data. I framtiden vill vi utöka vårt ramverk för att stödja fler destillationmetoder på Transformer-modeller och jämföra prestanda i andra uppgifter än sentimentklassificering.
243

Covid-19 Related Conspiracy Theories on Social Media : How to identify misinformation through patterns in language usage on social media / Covid-19 relaterade konspirationsteorier på sociala medier

Savinainen, Oskar, Hvidbjerg Hansen, Thor January 2022 (has links)
Distinguishing between information and disinformation is an ever growing issue. The dramatic structure of a conspiracy theory easily captures a large audience and with the advent of social media, this disinformation can spread at an ever growing rate. This is especially true with the infodemic following the Covid-19 pandemic in early 2020, where there was a drastic increase in Covid-19 related misinformation on social media. When misinformation replaces fact, some people will inevitably follow borderline dangerous advice. This could unfortunately be seen in the ivermection issue where people injected this substance in hope of preventing/curing a Covid-19 infection. This is why finding patterns in disinformation that distinguishes it from facts would allow us to take measures against the spread of conspiracy theories. We have found patterns in our dataset suggesting that there is a significant difference in the language patterns for terms relating to conspiracy theories, and non-conspiratorial terms. We find that the sentiment of conspiracy theories is very volatile when compared to that of non-conspiratorial terms which follow a more neutral pattern in terms of sentiment. Suggesting that the language usage in a post can be used as a factor when determining the credibility of its content. We also find that conspiracy theories tend to see a drastic increase in mentions when previously being relatively lowin mentions. The result of this thesis could therefore be used as a start for developing tools and processes which would seek to combat the spread of conspiracy theories and limit the potential harm that could come from them.
244

Data Fusion and Text Mining for Supporting Journalistic Work

Zsombor, Vermes January 2022 (has links)
During the past several decades, journalists have been struggling with the ever growing amount of data on the internet. Investigating the validity of the sources or finding similar articles for a story can consume a lot of time and effort. These issues are even amplified by the declining size of the staff of news agencies. The solution is to empower the remaining professional journalists with digital tools created by computer scientists. This thesis project is inspired by an idea to provide software support for journalistic work with interactive visual interfaces and artificial intelligence. More specifically, within the scope of this thesis project, we created a backend module that supports several text mining methods such as keyword extraction, named entity recognition, sentiment analysis, fake news classification and also data collection from various data sources to help professionals in the field of journalism. To implement our system, first we gathered the requirements from several researchers and practitioners in journalism, media studies, and computer science, then acquired knowledge by reviewing literature on current approaches. Results are evaluated both with quantitative methods such as individual component benchmarks and also with qualitative methods by analyzing the outcomes of the semi-structured interviews with collaborating and external domain experts. Our results show that there is similarity between the domain experts' perceived value and the performance of the components on the individual evaluations. This shows us that there is potential in this research area and future work would be welcomed by the journalistic community.
245

Image Emotion Analysis: Facial Expressions vs. Perceived Expressions

Ayyalasomayajula, Meghana 20 December 2022 (has links)
No description available.
246

COVID-19: Анализ эмоциональной окраски сообщений в социальных сетях (на материале сети «Twitter») : магистерская диссертация / COVID-19: Social network sentiment analysis (based on the material of "Twitter" messages)

Денисова, П. А., Denisova, P. A. January 2021 (has links)
Работа посвящена изучению анализа тональности текстов в социальных сетях на примере сообщений-твитов из социальной сети Twitter. Материал исследования составили 818 224 сообщения по 17-ти ключевым словам, из которых 89 025 твитов содержали слова «COVID-19» и «Сoronavirus». В первой части работы рассматриваются общие теоретические и методологические вопросы: вводится понятие Sentiment Analysis, анализируются различные подходы к классификации тональности текстов. Особое внимание в задачах классификации текстов уделяется Байесовскому классификатору, который показывает высокую точность работы. Изучаются особенности анализа тональности текстов в социальных сетях во время эпидемий и вспышек болезней. Описывается процедура и алгоритм анализа тональности текста. Большое внимание уделяется анализу тональности текстов в Python с помощью библиотеки TextBlob, а также выбирается ещё один из инструментов «SaaS» - программное обеспечение как услуга, который позволяет реализовать анализ тональности текстов в режиме реального времени, где нет необходимости в большом опыте машинного обучения и обработке естественного языка, в сравнении с языком программирования Python. Вторая часть исследования начинается с построения выборок, т.е. определения ключевых слов, по которым в работе осуществляется поиск и экспорт необходимых твитов. Для этой цели используется корпус - Coronavirus Corpus, предназначенный для отражения социальных, культурных и экономических последствий коронавируса (COVID-19) в 2020 году и в последующий период. Анализируется динамика использования слов по изучаемой тематике в течение 2020 года и проводится аналогия между частотой их использования и происходящими событиями. Далее по выбранным ключевым словам осуществляется поиск твитов и, основываясь на полученных данных, реализуется анализ тональности cообщений с помощью библиотеки Python - TextBlob, созданной для обработки текстовых данных, и онлайн - сервиса Brand24. Сравнивая данные инструменты, отмечается схожесть полученных результатов. Исследование помогает быстро и в реальном времени понять общественные настроения по поводу вспышки COVID-19, способствуя тем самым пониманию развивающихся событий. Также данная работа может быть использована в качестве модели для определения эмоционального состояния интернет-пользователей в различных ситуациях. / The work is devoted to the sentiment analysis study of messages in Twitter social network. The research material consisted of 818,224 messages and 17 keywords, whereas 89,025 tweets contained the words "COVID-19" and "Coronavirus". In the first part, theoretical and methodological issues are considered: the concept of sentiment analysis is introduced, various approaches to text classification are analyzed. Particular attention in the problems of text classification is given to Naive Bayes classifier, which shows high accuracy of work. The features of sentiment analysis in social networks during epidemics and disease outbreaks are studied. The procedure and algorithm for analyzing the sentiment of the text are described. Much attention is paid to the analysis of sentiment of texts in Python using TextBlob library, and also one of the SaaS tools is chosen - software as a service, which allows real-time sentiment analysis of texts, where there is no need for extensive experience in machine learning and natural language processing against Python programming language. The second part of the study begins with sampling, i.e. definition of keywords by which the search and export of the necessary tweets is carried out. For this purpose, the Coronavirus Corpus is used, designed to reflect the social, cultural and economic consequences of the coronavirus (COVID-19) in 2020 and beyond. The dynamics of the topic words usage during 2020 is analyzed and an analogy is drawn between the frequency of their usage and the events in place. Next, the selected keywords are used to search for tweets and, based on the data obtained, the sentiment analysis of messages is carried out using the Python library - TextBlob, created for processing textual data, and the Brand24 online service. Comparing these tools, the results are similar. The study helps to understand quickly and in real-time public sentiments about the COVID-19 outbreak, thereby contributing to the understanding of developing events. Also, this work can be used as a model for determining the emotional state of Internet users in various situations.
247

Domain Knowledge and Representation Learning for Centroid Initialization in Text Clustering with k-Means : An exploratory study / Domänkunskap och Representationsinlärning för Centroidinitialisering vid Textklustering med k-Means : En utforskande studie

Yu, David January 2023 (has links)
Text clustering is a problem where texts are partitioned into homogeneous clusters, such as partitioning them based on their sentiment value. Two techniques to address the problem are representation learning, in particular language representation models, and clustering algorithms. The state-ofthe-art language models are based on neural networks, in particular the Transformer architecture, and the models are used to transform a text into a point in a high dimensional vector space. The texts are then clustered using a clustering algorithm, and a recognized partitional clustering algorithm is k-Means. Its goal is to find centroids that represent the clusters (partitions) by minimizing a distance measure. Two influential parameters of k-Means are the number of clusters and the initial centroids. Multiple heuristics exist to decide how the parameters are selected. The heuristic of using domain knowledge is commonly used when it is available, e.g., the number of clusters is set to the number of dataset labels. This project further explores this idea. The main contribution of the thesis is an investigation of domain knowledge and representation learning as a heuristic in centroid initialization applied to k-Means. Initial centroids were obtained by applying a representation learning technique on the dataset labels. The project analyzed a Swedish dataset with views towards different aspects of Swedish immigration and a Swedish translated movie review dataset using six Swedish compatible language models and two versions of k-Means. Clustering evaluation was measured using eight metrics related to cohesion, separation, external entropy and accuracy. The results show the proposed heuristic made a positive impact on the metrics. By employing the proposed heuristic, six out of eight metrics were improved compared to the baseline. The improvements were observed using six language models and k-Means on two datasets. Additionally, the evaluation metrics indicated that the proposed heuristic has opportunities for future improvements. / Textklustering är ett problem där texter partitioneras i homogena kluster, till exempel genom att gruppera dem baserat på dess sentimentala värde. Två tekniker för att undersöka problemet är representationsinlärning, i synnerhet språkrepresentationsmodeller, och klustringsalgoritmer. Moderna språkmodeller är baserade på neurala nätverk, framförallt på Transformer arkitekturen, och modellerna används för att omvandla texter till punkter i ett högdimensionellt vektorrum. Därefter klustras texterna med hjälp av en klusteringsalgoritm, och en erkänd partition klusteringalgorithm är kMeans. Målet med algoritmen är att finna centroider som representerar klustren (partitioner) genom att minimera ett avståndsmått. Två inflytelserika parametrar i k-Means är antalet kluster och initiala centroider. Många heuristiker existerar för att bestämma hur dessa parametrar skall väljas. En vanligt förekommande heuristik är att använda domänkunskap om det är tillgängligt, e.g., antalet kluster väljs som antalet datamängdsetiketter. Detta projekt genomför ytterligare utforskningar av idén. Avhandlingens huvudsakliga bidrag är en undersökning av att använda kunskaper om domänen för datamängden och representationsinlärning som heuristik för centroid initialisering applicerat på k-Means. Initiala centroider erhölls genom att applicera en representationsinlärningsteknik på datamängdsetiketter. Projektet analyserar en svensk datamängd med åsikter gentemot olika aspekter av svensk immigration och en svensk översatt datamängd om filmrecensioner med hjälp av sex svenskkompatibla språkmodeller och kMeans. Utvärdering av klustringen uppmättes med hjälp av åtta mätetal relaterade till sammanhållning, separation, entropi och ackuratess. Den föreslagna heuristiken hade en positiv effekt på mätetalen. Genom att använda den föreslagna heuristiken förbättrades sex av åtta mätetal jämfört med baslinjen. Förbättringarna observerades med användning av sex språkmodeller och k-Means på två datamängder. Evalueringsmätetalen indikerar också på att heuristiken har möjligheter till framtida förbättringar.
248

Biases in AI: An Experiment : Algorithmic Fairness in the World of Hateful Language Detection / Bias i AI: ett experiment : Algoritmisk rättvisa inom detektion av hatbudskap

Stozek, Anna January 2023 (has links)
Hateful language is a growing problem in digital spaces. Human moderators are not enough to eliminate the problem. Automated hateful language detection systems are used to aid the human moderators. One of the issues with the systems is that their performance can differ depending on who is the target of a hateful text. This project evaluated the performance of the two systems (Perspective and Hatescan) with respect to who is the target of hateful texts. The analysis showed, that the systems performed the worst for texts directed at women and immigrants. The analysis involved tools such as a synthetic dataset based on the HateCheck test suite, as well as wild datasets created from forum data. Improvements to the test suite HateCheck have also been proposed. / Hatiskt språk är ett växande problem i digitala miljöer. Datamängderna är för stora för att enbart hanteras av mänskliga moderatorer. Automatiska system för hatdetektion används därför som stöd. Ett problem med dessa system är att deras prestanda kan variera beroende på vem som är målet för en hatfull text. Det här projektet evaluerade prestandan av de två systemen Perspective och Hatescan med hänsyn till olika mål för hatet. Analysen visade att systemen presterade sämst för texter där hatet riktades mot kvinnor och invandrare. Analysen involverade verktyg som ett syntetiskt dataset baserat på testsviten HateCheck och vilda dataset med texter inhämtade från diskussionsforum på internet. Dessutom har projektet utvecklat förslag på förbättringar till testsviten HateCheck.
249

Sentiment Analysis Of IMDB Movie Reviews : A comparative study of Lexicon based approach and BERT Neural Network model

Domadula, Prashuna Sai Surya Vishwitha, Sayyaparaju, Sai Sumanwita January 2023 (has links)
Background: Movies have become an important marketing and advertising tool that can influence consumer behaviour and trends. Reading film reviews is an im- important part of watching a movie, as it can help viewers gain a general under- standing of the film. And also, provide filmmakers with feedback on how their work is being received. Sentiment analysis is a method of determining whether a review has positive or negative sentiment, and this study investigates a machine learning method for classifying sentiment from film reviews. Objectives: This thesis aims to perform comparative sentiment analysis on textual IMDb movie reviews using lexicon-based and BERT neural network models. Later different performance evaluation metrics are used to identify the most effective learning model. Methods: This thesis employs a quantitative research technique, with data analysed using traditional machine learning. The labelled data set comes from an online website called Kaggle (https://www.kaggle.com/datasets), which contains movie review information. Algorithms like the lexicon-based approach and the BERT neural networks are trained using the chosen IMDb movie reviews data set. To discover which model performs the best at predicting the sentiment analysis, the constructed models will be assessed on the test set using evaluation metrics such as accuracy, precision, recall and F1 score. Results: From the conducted experimentation the BERT neural network model is the most efficient algorithm in classifying the IMDb movie reviews into positive and negative sentiments. This model achieved the highest accuracy score of 90.67% over the trained data set, followed by the BoW model with an accuracy of 79.15%, whereas the TF-IDF model has 78.98% accuracy. BERT model has the better precision and recall with 0.88 and 0.92 respectively, followed by both BoW and TF-IDF models. The BoW model has a precision and recall of 0.79 and the TF-IDF has a precision of 0.79 and a recall of 0.78. And also the BERT model has the highest F1 score of 0.88, followed by the BoW model having a F1 score of 0.79 whereas, TF-IDF has 0.78. Conclusions: Among the two models evaluated, the lexicon-based approach and the BERT transformer neural network, the BERT neural network is the most efficient, having a good performance score based on the measured performance criteria.
250

Efficient Sentiment Analysis and Topic Modeling in NLP using Knowledge Distillation and Transfer Learning / Effektiv sentimentanalys och ämnesmodellering inom NLP med användning av kunskapsdestillation och överföringsinlärning

Malki, George January 2023 (has links)
This abstract presents a study in which knowledge distillation techniques were applied to a Large Language Model (LLM) to create smaller, more efficient models without sacrificing performance. Three configurations of the RoBERTa model were selected as ”student” models to gain knowledge from a pre-trained ”teacher” model. Multiple steps were used to improve the knowledge distillation process, such as copying some weights from the teacher to the student model and defining a custom loss function. The selected task for the knowledge distillation process was sentiment analysis on Amazon Reviews for Sentiment Analysis dataset. The resulting student models showed promising performance on the sentiment analysis task capturing sentiment-related information from text. The smallest of the student models managed to obtain 98% of the performance of the teacher model while being 45% lighter and taking less than a third of the time to analyze an entire the entire IMDB Dataset of 50K Movie Reviews dataset. However, the student models struggled to produce meaningful results on the topic modeling task. These results were consistent with the topic modeling results from the teacher model. In conclusion, the study showcases the efficacy of knowledge distillation techniques in enhancing the performance of LLMs on specific downstream tasks. While the model excelled in sentiment analysis, further improvements are needed to achieve desirable outcomes in topic modeling. These findings highlight the complexity of language understanding tasks and emphasize the importance of ongoing research and development to further advance the capabilities of NLP models. / Denna sammanfattning presenterar en studie där kunskapsdestilleringstekniker tillämpades på en stor språkmodell (Large Language Model, LLM) för att skapa mindre och mer effektiva modeller utan att kompremissa på prestandan. Tre konfigurationer av RoBERTa-modellen valdes som ”student”-modeller för att inhämta kunskap från en förtränad ”teacher”-modell. Studien mäter även modellernas prestanda på två ”DOWNSTREAM” uppgifter, sentimentanalys och ämnesmodellering. Flera steg användes för att förbättra kunskapsdestilleringsprocessen, såsom att kopiera vissa vikter från lärarmodellen till studentmodellen och definiera en anpassad förlustfunktion. Uppgiften som valdes för kunskapsdestilleringen var sentimentanalys på datamängden Amazon Reviews for Sentiment Analysis. De resulterande studentmodellerna visade lovande prestanda på sentimentanalysuppgiften genom att fånga upp information relaterad till sentiment från texten. Den minsta av studentmodellerna lyckades erhålla 98% av prestandan hos lärarmodellen samtidigt som den var 45% lättare och tog mindre än en tredjedel av tiden att analysera hela IMDB Dataset of 50K Movie Reviews datasettet.Dock hade studentmodellerna svårt att producera meningsfulla resultat på ämnesmodelleringsuppgiften. Dessa resultat överensstämde med ämnesmodelleringsresultaten från lärarmodellen. Dock hade studentmodellerna svårt att producera meningsfulla resultat på ämnesmodelleringsuppgiften. Dessa resultat överensstämde med ämnesmodelleringsresultaten från lärarmodellen.

Page generated in 0.0425 seconds