Global ETD Search

21	Automatic Identification of Duplicates in Literature in Multiple Languages Klasson Svensson, Emil January 2018 (has links) As the the amount of books available online the sizes of each these collections are at the same pace growing larger and more commonly in multiple languages. Many of these cor- pora contain duplicates in form of various editions or translations of books. The task of finding these duplicates is usually done manually but with the growing sizes making it time consuming and demanding. The thesis set out to find a method in the field of Text Mining and Natural Language Processing that can automatize the process of manually identifying these duplicates in a corpora mainly consisting of fiction in multiple languages provided by Storytel. The problem was approached using three different methods to compute distance measures between books. The first approach was comparing titles of the books using the Levenstein- distance. The second approach used extracting entities from each book using Named En- tity Recognition and represented them using tf-idf and cosine dissimilarity to compute distances. The third approach was using a Polylingual Topic Model to estimate the books distribution of topics and compare them using Jensen Shannon Distance. In order to es- timate the parameters of the Polylingual Topic Model 8000 books were translated from Swedish to English using Apache Joshua a statistical machine translation system. For each method every book written by an author was pairwise tested using a hypothesis test where the null hypothesis was that the two books compared is not an edition or translation of the others. Since there is no known distribution to assume as the null distribution for each book a null distribution was estimated using distance measures of books not written by the author. The methods were evaluated on two different sets of manually labeled data made by the author of the thesis. One randomly sampled using one-stage cluster sampling and one consisting of books from authors that the corpus provider prior to the thesis be considered more difficult to label using automated techniques. Of the three methods the Title Matching was the method that performed best in terms of accuracy and precision based of the sampled data. The entity matching approach was the method with the lowest accuracy and precision but with a almost constant recall at around 50 %. It was concluded that there seems to be a set of duplicates that are clearly distin- guished from the estimated null-distributions, with a higher significance level a better pre- cision and accuracy could have been made with a similar recall for the specific method. For topic matching the result was worse than the title matching and when studied the es- timated model was not able to create quality topics the cause of multiple factors. It was concluded that further research is needed for the topic matching approach. None of the three methods were deemed be complete solutions to automatize detection of book duplicates. Read more Probability Theory and Statistics Sannolikhetsteori och statistik
22	Automatic Text Ontological Representation and Classification via Fundamental to Specific Conceptual Elements (TOR-FUSE) Razavi, Amir Hossein January 2012 (has links) In this dissertation, we introduce a novel text representation method mainly used for text classification purpose. The presented representation method is initially based on a variety of closeness relationships between pairs of words in text passages within the entire corpus. This representation is then used as the basis for our multi-level lightweight ontological representation method (TOR-FUSE), in which documents are represented based on their contexts and the goal of the learning task. The method is unlike the traditional representation methods, in which all the documents are represented solely based on the constituent words of the documents, and are totally isolated from the goal that they are represented for. We believe choosing the correct granularity of representation features is an important aspect of text classification. Interpreting data in a more general dimensional space, with fewer dimensions, can convey more discriminative knowledge and decrease the level of learning perplexity. The multi-level model allows data interpretation in a more conceptual space, rather than only containing scattered words occurring in texts. It aims to perform the extraction of the knowledge tailored for the classification task by automatic creation of a lightweight ontological hierarchy of representations. In the last step, we will train a tailored ensemble learner over a stack of representations at different conceptual granularities. The final result is a mapping and a weighting of the targeted concept of the original learning task, over a stack of representations and granular conceptual elements of its different levels (hierarchical mapping instead of linear mapping over a vector). Finally the entire algorithm is applied to a variety of general text classification tasks, and the performance is evaluated in comparison with well-known algorithms. Read more Text representation Ontological representation Second order text representation Natural Language Processing (NLP) Machine Learning (ML) Concept mining Hierarchical Representation Multi-level text representation SOSCO TOR-FUSE
23	Automatic Speech Recognition System for Somali in the interest of reducing Maternal Morbidity and Mortality. Laryea, Joycelyn, Jayasundara, Nipunika January 2020 (has links) Developing an Automatic Speech Recognition (ASR) system for the Somali language, though not novel, is not actively explored; hence there has been no success in a model for conversational speech. Neither are related works accessible as open-source. The unavailability of digital data is what labels Somali as a low resource language and poses the greatest impediment to the development of an ASR for Somali. The incentive to develop an ASR system for the Somali language is to contribute to reducing the Maternal Mortality Rate (MMR) in Somalia. Researchers acquire interview audio data regarding maternal health and behaviour in the Somali language; to be able to engage the relevant stakeholders to bring about the needed change, these audios must be transcribed into text, which is an important step towards translation into any language. This work investigates available ASR for Somali and attempts to develop a prototype ASR system to convert Somali audios into Somali text. To achieve this target, we first identified the available open-source systems for speech recognition and selected the DeepSpeech engine for the implementation of the prototype. With three hours of audio data, the accuracy of transcription is not as required and cannot be deployed for use. This we attribute to insufficient training data and estimate that the effort towards an ASR for Somali will be more significant by acquiring about 1200 hours of audio to train the DeepSpeech engine Read more Automatic Speech Recognition (ASR) DeepSpeech Natural Language Processing (NLP) Word Error Rate (WER) Character Error Rate (CER) Social Sciences Samhällsvetenskap
24	Embodied Metarepresentations Hinrich, Nicolás, Foradi, Maryam, Yousef, Tariq, Hartmann, Elisa, Triesch, Susanne, Kaßel, Jan, Pein, Johannes 06 June 2023 (has links) Meaning has been established pervasively as a central concept throughout disciplines that were involved in cognitive revolution. Its metaphoric usage comes to be, first and foremost, through the interpreter’s constraint: representational relationships and contents are considered to be in the “eye” or mind of the observer and shared properties among observers themselves are knowable through interlinguistic phenomena, such as translation. Despite the instability of meaning in relation to its underdetermination by reference, it can be a tertium comparationis or “third comparator” for extended human cognition if gauged through invariants that exist in transfer processes such as translation, as all languages and cultures are rooted in pan-human experience and, thus, share and express species-specific ontology. Meaning, seen as a cognitive competence, does not stop outside of the body but extends, depends, and partners with other agents and the environment. A novel approach for exploring the transfer properties of some constituent items of the original natural semantic metalanguage in English, that is, semantic primitives, is presented: FrameNet’s semantic frames, evoked by the primes SEE and FEEL, were extracted from EuroParl, a parallel corpus that allows for the automatic word alignment of items with their synonyms. Large Ontology Multilingual Extraction was used. Afterward, following the Semantic Mirrors Method, a procedure that consists back-translating into source language, a translatological examination of translated and original versions of items was performed. A fully automated pipeline was designed and tested, with the purpose of exploring associated frame shifts and, thus, beginning a research agenda on their alleged universality as linguistic features of translation, which will be complemented with and contrasted against further massive feedback through a citizen science approach, as well as cognitive and neurophysiological examinations. Additionally, an embodied account of frame semantics is proposed. Read more info:eu-repo/classification/ddc/410 ddc:410
25	RECOMMENDATION SYSTEMS IN SOCIAL NETWORKS Behafarid Mohammad Jafari (15348268) 18 May 2023 (has links) <p> The dramatic improvement in information and communication technology (ICT) has made an evolution in learning management systems (LMS). The rapid growth in LMSs has caused users to demand more advanced, automated, and intelligent services. CourseNetworking is a next-generation LMS adopting machine learning to add personalization, gamification, and more dynamics to the system. This work tries to come up with two recommender systems that can help improve CourseNetworking services. The first one is a social recommender system helping CourseNetworking to track user interests and give more relevant recommendations. Recently, graph neural network (GNN) techniques have been employed in social recommender systems due to their high success in graph representation learning, including social network graphs. Despite the rapid advances in recommender systems performance, dealing with the dynamic property of the social network data is one of the key challenges that is remained to be addressed. In this research, a novel method is presented that provides social recommendations by incorporating the dynamic property of social network data in a heterogeneous graph by supplementing the graph with time span nodes that are used to define users long-term and short-term preferences over time. The second service that is proposed to add to Rumi services is a hashtag recommendation system that can help users label their posts quickly resulting in improved searchability of content. In recent years, several hashtag recommendation methods are proposed and developed to speed up processing of the texts and quickly find out the critical phrases. The methods use different approaches and techniques to obtain critical information from a large amount of data. This work investigates the efficiency of unsupervised keyword extraction methods for hashtag recommendation and recommends the one with the best performance to use in a hashtag recommender system. </p> Read more Knowledge representation and reasoning Natural language processing Data mining and knowledge discovery Graph, social and multimedia data Recommender systems Recommender Systems Graph neural networks (GNNs) Natural Language Processing (NLP) Machine Learning
26	Granskning av examensarbetesrapporter med IBM Watson molntjänster Eriksson, Patrik, Wester, Philip January 2018 (has links) Cloud services are one of the fast expanding fields of today. Companies such as Amazon, Google, Microsoft and IBM offer these cloud services in various forms. As this field progresses, the natural question occurs ”What can you do with the technology today?”. The technology offers scalability for hardware usage and user demands, that is attractive to developers and companies. This thesis tries to examine the applicability of cloud services, by combining it with the question: ”Is it possible to make an automated thesis examiner?” By narrowing down the services to IBM Watson web services, this thesis main question reads ”Is it possible to make an automated thesis examiner using IBM Watson?”. Hence the goal of this thesis was to create an automated thesis examiner. The project used a modified version of Bunge’s technological research method. Where amongst the first steps, a definition of an software thesis examiner for student theses was created. Then an empirical study of the Watson services, that seemed relevant from the literature study, proceeded. These empirical studies allowed a deeper understanding about the services’ practices and boundaries. From these implications and the definition of a software thesis examiner for student theses, an idea of how to build and implement an automated thesis examiner was created. Most of IBM Watson’s services were thoroughly evaluated, except for the service Machine Learning, that should have been studied further if the time resources would not have been depleted. This project found the Watson web services useful in many cases but did not find a service that was well suited for thesis examination. Although the goal was not reached, this thesis researched the Watson web services and can be used to improve understanding of its applicability, and for future implementations that face the provided definition. / Molntjänster är ett av de områden som utvecklas snabbast idag. Företag såsom Amazon, Google, Microsoft och IBM tillhandahåller dessa tjänster i flera former. Allteftersom utvecklingen tar fart, uppstår den naturliga frågan ”Vad kan man göra med den här tekniken idag?”. Tekniken erbjuder en skalbarhet mot använd hårdvara och antalet användare, som är attraktiv för utvecklare och företag. Det här examensarbetet försöker svara på hur molntjänster kan användas genom att kombinera det med frågan ”Är det möjligt att skapa en automatiserad examensarbetesrapportsgranskare?”. Genom att avgränsa undersökningen till IBM Watson molntjänster försöker arbetet huvudsakligen svara på huvudfrågan ”Är det möjligt att skapa en automatiserad examensarbetesrapportsgranskare med Watson molntjänster?”. Därmed var målet med arbetet att skapa en automatiserad examensarbetesrapportsgranskare. Projektet följde en modifierad version av Bunge’s teknologiska undersökningsmetod, där det första steget var att skapa en definition för en mjukvaruexamensarbetesrapportsgranskare följt av en utredning av de Watson molntjänster som ansågs relevanta från litteratur studien. Dessa undersöktes sedan vidare i empirisk studie. Genom de empiriska studierna skapades förståelse för tjänsternas tillämpligheter och begränsningar, för att kunna kartlägga hur de kan användas i en automatiserad examensarbetsrapportsgranskare. De flesta tjänster behandlades grundligt, förutom Machine Learning, som skulle behövt vidare undersökning om inte tidsresurserna tog slut. Projektet visar på att Watson molntjänster är användbara men inte perfekt anpassade för att granska examensarbetesrapporter. Även om inte målet uppnåddes, undersöktes Watson molntjänster, vilket kan ge förståelse för deras användbarhet och framtida implementationer för att möta den skapade definitionen. Read more Natural Language Processing; NLP Computer and Information Sciences Data- och informationsvetenskap
27	Large language models as an interface to interact with API tools in natural language Tesfagiorgis, Yohannes Gebreyohannes, Monteiro Silva, Bruno Miguel January 2023 (has links) In this research project, we aim to explore the use of Large Language Models (LLMs) as an interface to interact with API tools in natural language. Bubeck et al. [1] shed some light on how LLMs could be used to interact with API tools. Since then, new versions of LLMs have been launched and the question of how reliable a LLM can be in this task remains unanswered. The main goal of our thesis is to investigate the designs of the available system prompts for LLMs, identify the best-performing prompts, and evaluate the reliability of different LLMs when using the best-identified prompts. We will employ a multiple-stage controlled experiment: A literature review where we reveal the available system prompts used in the scientific community and open-source projects; then, using F1-score as a metric we will analyse the precision and recall of the system prompts aiming to select the best-performing system prompts in interacting with API tools; and in a latter stage, we compare a selection of LLMs with the best-performing prompts identified earlier. From these experiences, we realize that AI-generated system prompts perform better than the current prompts used in open-source and literature with GPT-4, zero-shot prompts have better performance in this specific task with GPT-4 and that a good system prompt in one model does not generalize well into other models. Read more Large language model (LLM) Natural Language Processing (NLP) GPT-4 Llama-2 Palm Application Programming Interface (API). Engineering and Technology Teknik och teknologier Computer Sciences Datavetenskap (datalogi)
28	Sentiment Analysis Of IMDB Movie Reviews : A comparative study of Lexicon based approach and BERT Neural Network model Domadula, Prashuna Sai Surya Vishwitha, Sayyaparaju, Sai Sumanwita January 2023 (has links) Background: Movies have become an important marketing and advertising tool that can influence consumer behaviour and trends. Reading film reviews is an im- important part of watching a movie, as it can help viewers gain a general under- standing of the film. And also, provide filmmakers with feedback on how their work is being received. Sentiment analysis is a method of determining whether a review has positive or negative sentiment, and this study investigates a machine learning method for classifying sentiment from film reviews. Objectives: This thesis aims to perform comparative sentiment analysis on textual IMDb movie reviews using lexicon-based and BERT neural network models. Later different performance evaluation metrics are used to identify the most effective learning model. Methods: This thesis employs a quantitative research technique, with data analysed using traditional machine learning. The labelled data set comes from an online website called Kaggle (https://www.kaggle.com/datasets), which contains movie review information. Algorithms like the lexicon-based approach and the BERT neural networks are trained using the chosen IMDb movie reviews data set. To discover which model performs the best at predicting the sentiment analysis, the constructed models will be assessed on the test set using evaluation metrics such as accuracy, precision, recall and F1 score. Results: From the conducted experimentation the BERT neural network model is the most efficient algorithm in classifying the IMDb movie reviews into positive and negative sentiments. This model achieved the highest accuracy score of 90.67% over the trained data set, followed by the BoW model with an accuracy of 79.15%, whereas the TF-IDF model has 78.98% accuracy. BERT model has the better precision and recall with 0.88 and 0.92 respectively, followed by both BoW and TF-IDF models. The BoW model has a precision and recall of 0.79 and the TF-IDF has a precision of 0.79 and a recall of 0.78. And also the BERT model has the highest F1 score of 0.88, followed by the BoW model having a F1 score of 0.79 whereas, TF-IDF has 0.78. Conclusions: Among the two models evaluated, the lexicon-based approach and the BERT transformer neural network, the BERT neural network is the most efficient, having a good performance score based on the measured performance criteria. Read more Bag of Words(BoW) Deep Learning IMDb Movie Reviews Machine Learning Natural Language Processing(NLP) Sentiment Analysis Computer Sciences Datavetenskap (datalogi)
29	A Cloud Computing-based Dashboard for the Visualization of Motivational Interviewing Metrics Heng, E Jinq January 2022 (has links) No description available. Computer Science Computer Engineering Behavioral Psychology Human-centered computing human-machine learning natural language processing (NLP) cloud computing visualization UI/UX development
30	Comparison of Machine Learning Models Used for Swedish Text Classification in Chat Messaging Karim, Mezbahul, Amanzadi, Amirtaha January 2022 (has links) The rise of social media and the use of mobile applications has led to increasing concerns regarding the content that is shared through these apps and whether they are being regulated or not. One of the problems that can arise due to a lack of regulation is that chat messages that are inappropriate or of profane nature can be allowed to be shared through these apps. Thus, it is vital to detect whenever these types of chat messages are shared through these mobile applications. In addition to that, there should also be detection of chat messages that can lead to the identity of the users being revealed as that is how the app in this thesis project was intended to be used. One of the most popular approaches to detect chat messages of this nature is to use machine learning techniques that can classify text. We were quick to discover that there were not many machine learning models that were built to classify short text messages in the Swedish language, thus the main problem of our thesis was the lack of evaluation and analysis of machine learning models for text classification in the context of the chat messages in Swedish. Thus, the purpose of our project was mainly to find the best performing models for text classification, implement these models and evaluate them to find the best among the ones we found. After the models were created, a hosting server, as well as an API, was required for the text classifying system to compute and communicate the prediction results to the mobile application in real-time. Therefore, the models were containerized and deployed as a REST API that serves requests upon arrival on a cloud server. The goal of this project was to help future work being done on text classification in the Swedish language by providing the results of this thesis to any parties that are interested in our line of work. From our own experience, we realized how challenging it can be to find and choose the best machine learning models when one has no previous data on which can be the best performing one. Thus, we believe that the results of this thesis project will greatly aid future projects in this area. The chosen research methodology was qualitative and dealt with quantitative data. The results we received showed that the BERT model was the best choice among the three models that we compared. With minute adjustments, this model should be more than capable of detecting the type of chat messages that it is required within the mobile application. / Uppkomsten av social media och användning av mobilapplikationer ledde till ökande oro om innehållet som är delad inom dessa appar och om dem är reglerad eller inte. Ett problem som uppstår på grund av bristande reglering kan vara att chatmeddelanden som är olämplig eller profan kan bli delad med dessa appar. Därför är det viktig att upptäcka när dessa typer av chatmeddelande är delad genom mobilapplikationer. Dessutom det måste finnas ett system som upptäcker chattmeddelanden som kan hjälpa att avslöja användarens identiteter, som den här appen i detta projekt avsedda att användas. En av mest populära sett att upptäcka den typen av chattmeddelanden är användning av mäskinlärning tekniker som kan klassificera text. Vi snart hittade att det finns inte så många mäskinlärning modeller som var byggt att klassificera texter på svenska, alltså huvudproblem med vår exam en var bistrande utvärdering och analys av mäskinlärning modeller för textklassificering i kontext av svenska språket. Så, syftet med vårt projekt var att hitta de bästa presenterande modeller för textklassifikation, genomföra dessa modeller själva och sedan utvärdera dem att hitta den bästa. Därtill, för att textklassificering ska beräkna och kommunicera den förutsägelseresultaten till mobila applikationer i realtid behövs en värdserver samt en API. Därför, modellerna containeriserades och distribuerad es som en REST API som betjänar begäran vid ankomst på en molnserver. Målet med det här projektet var att hjälpa framtidsarbete inom textklassifikation på svenska språket genom att tillhandahålla resultaten till partier som är intresserad i vår arbetslin je. Från vår egen erfarenhet, vi insåg att det var svårt att hitta och välja dem bästa mäskinlärning modeller, specifikt när man har inga data som tidigare visat den med bäst prestanda. Och därför vi anser att den resultaten av den har examen kommer att v ara stor hjälp till framtida projekt i det här området. Den valda forskningsmetodiken var kvalitativ och handlade om kvantitativ data. Resultaten visade att BERT modell var den bästa bland de tre modellerna som vi jämförde med. Med lite justeringen är mod ellen mer än kapable att detektera den typen av krävs inom mobilapplikationen. Read more Machine learning Natural language processing (NLP) Text classification Model deployment BERT Maskininlärning Naturlig språkbehandling (NLP) Textklassificering Modellinstallation BERT Computer and Information Sciences Data- och informationsvetenskap

Search results