Global ETD Search

341	Sentiment Analysis Of IMDB Movie Reviews : A comparative study of Lexicon based approach and BERT Neural Network model Domadula, Prashuna Sai Surya Vishwitha, Sayyaparaju, Sai Sumanwita January 2023 (has links) Background: Movies have become an important marketing and advertising tool that can influence consumer behaviour and trends. Reading film reviews is an important part of watching a movie, as it can help viewers gain a general understanding of the film and can provide filmmakers with feedback on how their work is being received. Sentiment analysis is a method of determining whether a review has positive or negative sentiment, and this study investigates a machine learning method for classifying sentiment from film reviews. Objectives: This thesis aims to perform comparative sentiment analysis on textual IMDb movie reviews using lexicon-based and BERT neural network models. Different performance evaluation metrics are then used to identify the most effective learning model. Methods: This thesis employs a quantitative research technique, with data analysed using traditional machine learning. The labelled data set comes from the online platform Kaggle (https://www.kaggle.com/datasets) and contains movie review information. The lexicon-based approach and the BERT neural network are trained using the chosen IMDb movie reviews data set. To discover which model performs best at predicting sentiment, the constructed models are assessed on the test set using evaluation metrics such as accuracy, precision, recall and F1 score. Results: In the experiments conducted, the BERT neural network model is the most efficient algorithm for classifying the IMDb movie reviews into positive and negative sentiments. This model achieved the highest accuracy score of 90.67% over the trained data set, followed by the BoW model with an accuracy of 79.15%, whereas the TF-IDF model has 78.98% accuracy. 
The BERT model has the best precision and recall, 0.88 and 0.92 respectively, followed by the BoW and TF-IDF models. The BoW model has a precision and recall of 0.79, and the TF-IDF model has a precision of 0.79 and a recall of 0.78. The BERT model also has the highest F1 score, 0.88, followed by the BoW model with an F1 score of 0.79 and the TF-IDF model with 0.78. Conclusions: Of the two approaches evaluated, the lexicon-based approach and the BERT transformer neural network, the BERT neural network is the most efficient, performing well on the measured performance criteria. Bag of Words(BoW) Deep Learning IMDb Movie Reviews Machine Learning Natural Language Processing(NLP) Sentiment Analysis Computer Sciences Datavetenskap (datalogi)
342	Data Augmentation in Solving Data Imbalance Problems Gao, Jie January 2020 (has links) This project focuses on methods for solving data imbalance problems in the Natural Language Processing (NLP) field. Unbalanced text data is a common problem in many tasks, especially classification, where it leaves the model unable to predict the minority class well. Sometimes even switching to a more capable and complicated model does not improve performance, while simple data strategies aimed at data imbalance, such as over-sampling or down-sampling, have a positive effect on the result. Common data strategies include re-sampling methods that duplicate data from the original data or remove some original data to achieve balance. Beyond that, other methods such as word replacement, word swap, and word deletion have been used in previous work. At the same time, deep learning models such as BERT, GPT and fastText have a strong ability for general understanding of natural language, so we choose some of them to address the data imbalance problem. However, there has been no systematic comparison of these methods in practice. For example, over-sampling and down-sampling are fast and easy to use on the small datasets of previous work. As datasets grow, data newly generated by deep network models is more compatible with the original data. Therefore, our work focuses on how various data augmentation techniques perform when used to solve data imbalance problems, given the dataset and task. Both qualitative and quantitative experimental results demonstrate that different methods have their advantages for different datasets. In general, data augmentation can improve the performance of classification models. 
Specifically, BERT, and especially our fine-tuned BERT, performs very well in most usage scenarios (different scales and types of dataset). Still, other techniques such as back-translation perform better on long text data, even though they cost more time and involve a complicated model. In conclusion, a suitable choice of data augmentation method can help to solve data imbalance problems. / This project focuses mainly on the various methods for solving data imbalance problems in the field of Natural Language Processing (NLP). Imbalanced text data is a common problem in many tasks, especially classification, and it leaves the model unable to predict the minority class. Sometimes even switching to a more capable and complicated model does not improve performance, while some simple data strategies that target data imbalance, such as over-sampling or down-sampling, have positive effects on the result. Common data strategies include re-sampling methods that duplicate data from the original data or remove original data to achieve balance. In addition, other methods such as word replacement, word swap and word deletion have been used in previous work. At the same time, deep learning models such as BERT, GPT and fastText have a strong ability for general understanding of natural language, so we choose some of them to solve the data imbalance problem. However, there is no systematic comparison of these methods in practice. For example, over-sampling and down-sampling are fast and easy to use on the small datasets used previously. As datasets grow, the data newly generated by some deep network models is more compatible with the original data. Our work therefore focuses on how various data augmentation techniques perform when used to solve data imbalance problems, given the dataset and the task. 
Both qualitative and quantitative experimental results show that different methods have their advantages for different datasets. In general, data augmentation can improve the performance of classification models. Specifically, BERT, and especially our fine-tuned BERT, performs very well in most usage scenarios (different scales and types of dataset). Still, other techniques such as back-translation perform better on long text data, even though they cost more time and involve a complicated model. In summary, suitable choices of data augmentation method can help to solve data imbalance problems. Data augmentation Data imbalance NLP Deep learning Comparison. Dataförstoring Data obalans Textklassificering Naturlig språkbehandling Djup lärning. Computer and Information Sciences Data- och informationsvetenskap
343	Decentralized Large-Scale Natural Language Processing Using Gossip Learning / Decentraliserad Storskalig Naturlig Språkbehandling med Hjälp av Skvallerinlärning Alkathiri, Abdul Aziz January 2020 (has links) The field of Natural Language Processing in machine learning has seen rising popularity and use in recent years. The nature of Natural Language Processing, which deals with natural human language and computers, has led to the research and development of many algorithms that produce word embeddings. One of the most widely used of these algorithms is Word2Vec. With the abundance of data generated by users and organizations and the complexity of machine learning and deep learning models, performing training using a single machine becomes infeasible. The advancement in distributed machine learning offers a solution to this problem. Unfortunately, due to reasons concerning data privacy and regulations, in some real-life scenarios, the data must not leave its local machine. This limitation has led to the development of techniques and protocols that are massively parallel and data-private. The most popular of these protocols is federated learning. However, due to its centralized nature, it still poses some security and robustness risks. Consequently, this led to the development of massively parallel, data-private, decentralized approaches, such as gossip learning. In the gossip learning protocol, every once in a while each node in the network randomly chooses a peer for information exchange, which eliminates the need for a central node. This research intends to test the viability of gossip learning for large-scale, real-world applications. In particular, it focuses on implementation and evaluation for a Natural Language Processing application using gossip learning. 
The results show that applying Word2Vec in a gossip learning framework is viable and yields results comparable to its non-distributed, centralized counterpart for various scenarios, with an average quality loss of 6.904%. / The field of Natural Language Processing (NLP) in machine learning has seen growing popularity and use in recent years. The nature of Natural Language Processing, which deals with natural human language and computers, has led to the research and development of many algorithms that produce word embeddings. One of the most widely used of these algorithms is Word2Vec. With the abundance of data generated by users and organizations, and the complexity of machine learning and deep learning models, training on a single machine becomes infeasible. Advances in distributed machine learning offer a solution to this problem, but unfortunately, for reasons of privacy and data regulation, in some real-life scenarios the data must not leave its local machine. This limitation has led to the development of techniques and protocols that are massively parallel and data-private. The most popular of these protocols is federated learning, but because of its centralized nature it still poses certain security and robustness risks. This in turn led to the development of massively parallel, data-private and decentralized approaches such as gossip learning. In the gossip learning protocol, each node in the network randomly chooses a peer for information exchange, which eliminates the need for a central node. The purpose of this research is to test the viability of gossip learning in large-scale, real-world applications. In particular, the research focuses on the implementation and evaluation of an NLP application using gossip learning. 
The results show that applying Word2Vec in a gossip learning framework is viable and yields results comparable to its non-distributed, centralized counterpart for various scenarios, with an average quality loss of 6.904%. gossip learning decentralized machine learning distributed machine learning NLP Word2Vec data privacy skvallerinlärning decentraliserad maskininlärning distribuerad maskininlärning naturlig språkbehandling Word2Vec dataintegritet Computer and Information Sciences Data- och informationsvetenskap
344	A Cloud Computing-based Dashboard for the Visualization of Motivational Interviewing Metrics Heng, E Jinq January 2022 (has links) No description available. Computer Science Computer Engineering Behavioral Psychology Human-centered computing human-machine learning natural language processing (NLP) cloud computing visualization UI/UX development
345	Active Learning for Extractive Question Answering Marti Roman, Salvador January 2022 (has links) Data labelling for question answering tasks (QA) is a costly procedure that requires oracles to read lengthy excerpts of texts and reason to extract an answer for a given question from within the text. QA is a task in natural language processing (NLP), where a majority of recent advancements have come from leveraging the vast corpora of unlabelled and unstructured text available online. This work aims to extend this trend in the efficient use of unlabelled text data to the problem of selecting which subset of samples to label in order to maximize performance. This practice of selective labelling is called active learning (AL). Recent developments in AL for NLP have introduced the use of self-supervised learning on large corpora of text in the labelling process of samples for classification problems. This work adapts this research to the task of question answering and performs an initial exploration of expected performance. The methods covered in this work use uncertainty estimates obtained from neural networks to guide an incremental labelling process. These estimates are obtained from transformer-based models, previously trained in a self-supervised manner, by calculating the entropy of the confidence scores or with an approximation of Bayesian uncertainty obtained through Monte Carlo dropout. These methods are evaluated on two different benchmarking QA datasets: SQuAD v1 and TriviaQA. Several factors are observed to influence the behaviour of these uncertainty-based acquisition functions, including the choice of language model used, the presence of unanswered questions and the acquisition size used in the incremental process. The study produces no evidence to support that averaging or selecting maximal uncertainty values between the classification of an answer’s starting and ending positions affects sample acquisition quality. 
However, language model choice, the presence of unanswerable questions and acquisition size are all identified as key factors affecting consistency between runs and degree of success. Machine Learning Deep Learning Active Learning Natural Language Processing NLP Question Answering Transformers Uncertainty Language Models Probability Theory and Statistics Sannolikhetsteori och statistik
346	Framing and Voting / The German Immigration Debate and the Effects of News Coverage on Political Preferences Berk, Nicolai 03 April 2024 (has links) An extensive literature on framing effects suggests that citizens hold only limited political preferences. If public opinion is this open to influence, it is a shaky foundation for the democratic process. This dissertation therefore asks how earlier experimental findings carry over to complex, real-world situations and whether framing can also influence voting intentions. It develops a method for the automatic identification of news frames. The dissertation presents original and secondary data and examines the relationship between news framing, attitudes towards migration, and voting intentions. It offers an overview of how immigration is portrayed in the German news media and shows that neither the attention paid to migration nor its framing can explain the rise of the radical-right AfD. It then exploits a change in the migration coverage of Germany's largest tabloid, Bild, and shows limited effects on the political attitudes and voting intentions of its readers. The final empirical chapter presents experimental data showing that framing influences only the voting intentions of rather uninformed citizens. The results contribute to a better understanding of framing effects and suggest that citizens' attitudes cannot be manipulated so easily and that the power of the news media is more limited than often assumed. Instead, framing effects occur under very specific conditions, which are frequently not met. The emerging picture of public opinion is one of crystallized attitudes that respond exclusively to novel events. 
From this perspective, politics is a pattern of successive critical events, each of which offers a unique opportunity to change the prevailing understanding of an issue. / A large experimental literature on framing effects suggests that citizens form rather limited political preferences, open to severe manipulation. If citizens’ attitudes were always so easily malleable for media outlets and political actors, they would not constitute a very meaningful input for the democratic process. This dissertation asks how these experimental findings translate into complex, real-world news environments and whether news frames structure citizens’ voting intentions. It provides a clear conceptualization of frames, on which it builds a method to identify news frames automatically, and theorises a link between news frames and voting intentions. The dissertation presents original and secondary data, exploring the relationship of news framing, immigration attitudes, and voting intentions. Providing a broad overview of immigration framing in the German news media, it shows that neither immigration attention nor framing can explain the rise of the radical-right AfD. It then exploits a change in the immigration framing of Germany’s largest tabloid, Bild, showing that this shift had no effects on immigration attitudes or voting intentions among its readers. The final empirical chapter presents experimental evidence revealing that framing only affects voting intentions among rather uninformed citizens. The findings contribute to the study of framing and public opinion, suggesting that citizens’ attitudes are not as easily manipulated and the power of the news media more limited than often thought. Instead, framing effects take place under highly specific conditions, which are often not fulfilled. The emerging picture of public opinion is one of crystallized and resistant attitudes, which only respond to novel events. 
In other words: whoever gets to the voter first, wins. Politics, in this view, is a pattern of critical events following upon each other, each presenting a unique opportunity to change the dominant understanding of an issue. Framing Wahlverhalten Politische Einstellungen Textanalyse Einstellungsformierung Framing Voting Behaviour Political Attitudes Natural Language Processing (NLP) Attitude Formation MF 4600 MG 15460 ddc:320
347	Le diminutif chez Aristophane: une langue de femmes? : une analyse par TALN Bouchard, William 09 1900 (has links) A linguistic marker used heavily in comedy, the diminutive form is one of the peculiarities of Aristophanes' language. Comparable to the French suffix -ette (e.g. maison > maisonnette), the suffixes -ιον and -ισκος are used by characters of every gender and social class to express their diminutive evaluation. Used at times to denote a smaller object, at times to compliment and at times to express disdain, diminutives are hard to define and harder still to separate from other forms that may share their suffix. The first step of my research was therefore to create a radial schema able to explain the different semantic and pragmatic aspects of the diminutive in Aristophanes' Attic dialect. The second part of my research served to verify the proposed radial schema. Through a method built on the radial schema and the morphological definition of the Greek diminutive, I classified and verified the terms found by a natural language processing application created for this research. These data were also used to test certain hypotheses about the diminutive's frequency of occurrence and its variety of expression in the feminine sociolect in Aristophanes. Still a subject of debate among linguists, the relationship between gender and expression is central to current research in evaluative morphology. This research as a whole is therefore also meant as a description of a methodological framework suited to analysing ancient texts with computational methods. / A linguistic marker widely used in comedy, the diminutive form is one of the distinctive features of Aristophanes' language. Comparable to the suffix -ette in French (e.g. 
maison > maisonnette), the suffixes -ιον and -ισκος are used by characters of all genders and social classes to express their diminutive valuation. Sometimes used to represent a smaller object, sometimes to compliment and sometimes to express disdain, diminutives are difficult to define and even more complex to disentangle from other forms that may share their suffix. The first stage of my research therefore involved creating a radial scheme capable of explaining the various semantic and pragmatic aspects of the diminutive in Aristophanes’ Attic dialect. The second part of my research served to verify the proposed radial scheme. Using a method based on my radial scheme and the morphological definition of the Greek diminutive, I classified and verified the terms found by a natural language processing application created as part of this research. These data were also used to test certain hypotheses on the frequency of appearance of the diminutive and its variety of expression in Aristophanes' feminine sociolect. The relationship between gender and expression is still a hotly debated topic among linguists, and is at the heart of current research in evaluative morphology. The whole of this research is therefore also intended as a description of a methodological framework suitable for the analysis of ancient texts with computational methods. Diminutif TALN Comédie grecque Morphologie évaluative Aristophane Expression féminine Feminine expression Aristophanes Evaluative morphology Greek comedy NLP Diminutive
348	Optimering av beslutsstöd inom verksamhetsstyrning genom en undersökning av artificiell intelligens : En djupgående undersökning av effektiva AI-tekniker för bättre affärsbeslut / Optimizing decision support in business management through an artificial intelligence study : An in-depth survey of effective AI techniques for better business decisions Sakhai, Aram January 2024 (has links) This study examines how artificial intelligence (AI) can optimize decision support in business management through the analysis of unstructured data. By reviewing concepts such as business management, Business Intelligence (BI), AI and machine learning (ML), the study highlights how these technologies can improve organizations' decision processes. Business management aims to coordinate and optimize the parts of an organization in order to reach common goals. AI (NLP, ML), particularly through BI, plays a decisive role by improving efficiency and quality. BI collects and analyses business information, while ML enables automatic learning from data. The study's problem area identifies the challenge of handling large volumes of unstructured data. Despite AI's potential to improve decision-making, its full potential has not yet been realized. By examining the effective use of AI for unstructured data, the study contributes to a better understanding of how AI can improve decision support. The qualitative approach used semi-structured interviews with IT experts to gather insights into the use of AI in decision-making. The respondents described how AI analyses data, predicts trends, optimizes processes and personalizes customer experiences. AI also automates time-consuming tasks, which increases efficiency and frees up time for strategic work. This shows that AI can improve data quality, automate processes and provide deeper insights into customer behaviour and market trends. 
AI's ability to handle unstructured data makes it possible to identify trends and patterns that would otherwise be hard to detect. Challenges in AI implementation include system integration and the need for technical expertise. In summary, the study shows that AI has great potential to optimize decision support in business management through the analysis of unstructured data. Artificial intelligence (AI) business management business intelligence (BI) machine learning (ML) natural language processing (NLP) unstructured data Information Systems, Social aspects
349	SubRosa – Multi-Feature-Ähnlichkeitsvergleiche von Untertiteln Luhmann, Jan, Burghardt, Manuel, Tiepmar, Jochen 20 June 2024 (has links) No description available. info:eu-repo/classification/ddc/006 ddc:006 info:eu-repo/classification/ddc/770 ddc:770
350	„The Vectorian“ – Eine parametrisierbare Suchmaschine für intertextuelle Referenzen Liebl, Bernhard, Burghardt, Manuel 20 June 2024 (has links) No description available. info:eu-repo/classification/ddc/006 ddc:006 info:eu-repo/classification/ddc/800 ddc:800
