Spelling suggestions: "subject:"batural language processing (NLP)"" "subject:"datural language processing (NLP)""
21 |
Προδιαγραφές μιας καινοτόμας πλατφόρμας ηλεκτρονικής μάθησης που ενσωματώνει τεχνικές επεξεργασίας φυσικής γλώσσαςΦερφυρή, Ναυσικά 04 September 2013 (has links)
Ζούμε σε μια κοινωνία στην οποία η χρήση της τεχνολογίας έχει εισβάλει δυναμικά στην καθημερινότητα.Η εκπαίδευση δεν θα μπορούσε να μην επηρεαστεί απο τις Νέες Τεχνολογίες.Ήδη,όροι όπως “Ηλεκτρονική Μάθηση” και ”Ασύγχρονη Τηλε-εκπαίδευση” έχουν δημιουργήσει νέα δεδομένα στην κλασική Εκπαίδευση. Με τον όρο ασύγχρονη τηλε-εκπαίδευση εννοούμε μια διαδικασία ανταλλαγής μάθησης μεταξύ εκπαιδευτή - εκπαιδευομένων,που πραγματοποιείται ανεξάρτητα χρόνου και τόπου. Ηλεκτρονική Μάθηση είναι η χρήση των νέων πολυμεσικών τεχνολογιών και του διαδικτύου για τη βελτίωση της ποιότητας της μάθησης,διευκολύνοντας την πρόσβαση σε πηγές πληροφοριών και σε υπηρεσίες καθώς και σε ανταλλαγές και εξ'αποστάσεως συνεργασίες.Ο όρος καλύπτει ένα ευρύ φάσμα εφαρμογών και διαδικασιών,όπως ηλεκτρονικές τάξεις και ψηφιακές συνεργασίες, μάθηση βασιζόμενη στους ηλεκτρονικούς υπολογιστές και στις τεχνολογίες του παγκόσμιου ιστού. Κάποιες απο τις βασικές απαιτήσεις που θα πρέπει να πληρούνται για την δημιουργία μιας πλατφόρμας ηλεκτρονικής μάθησης είναι: Να υποστηρίζει τη δημιουργία βημάτων συζήτησης (discussion forums) και “δωματίων συζήτησης”(chat rooms),να υλοποιεί ηλεκτρονικό ταχυδρομείο,να έχει φιλικό περιβάλλον τόσο για το χρήστη/μαθητή όσο και για το χρήστη/καθηγητή,να υποστηρίζει προσωποποίηση(customization)του περιβάλλοντος ανάλογα με το χρήστη.Επίσης να κρατάει πληροφορίες(δημιουργία profiles)για το χρήστη για να τον “βοηθάει”κατά την πλοήγηση,να υποστηρίζει την εύκολη δημιουργία διαγωνισμάτων(online tests), να υποστηρίζει την παρουσίαση πολυμεσικών υλικών. Ως επεξεργασία φυσικής γλώσσας (NLP) ορίζουμε την υπολογιστική ανάλυση αδόμητων δεδομένων σε κείμενα, με σκοπό την επίτευξη μηχανικής κατανόησης του κειμένου αυτού.Είναι η επεξεργασία προτάσεων που εισάγονται ή διαβάζονται από το σύστημα,το οποίο απαντά επίσης με προτάσεις με τρόπο τέτοιο που να θυμίζει απαντήσεις μορφωμένου ανθρώπου. Βασικό ρόλο παίζει η γραμματική,το συντακτικό,η ανάλυση των εννοιολογικών στοιχείων και γενικά της γνώσης, για να γίνει κατανοητή η ανθρώπινη γλώσσα από τη μηχανή. Οι βασικές τεχνικές επεξεργασίας φυσικού κειμένου βασίζονται στις γενικές γνώσεις σχετικά με τη φυσική γλώσσα.Χρησιμοποιούν ορισμένους απλούς ευρετικούς κανόνες οι οποίοι στηρίζονται στη συντακτική και σημασιολογική προσέγγιση και ανάλυση του κειμένου.Ορισμένες τεχνικές που αφορούν σε όλα τα πεδία εφαρμογής είναι: ο διαμερισμός στα συστατικά στοιχεία του κειμένου (tokenization), η χρήση της διάταξης του κειμένου (structural data mining), η απαλοιφή λέξεων που δεν φέρουν ουσιαστική πληροφορία (elimination of insignificant words),η γραμματική δεικτοδότηση (PoS tagging), η μορφολογική ανάλυση και η συντακτική ανάλυση. Στόχος της παρούσας διπλωματικής είναι να περιγράψει και να αξιολογήσει πως οι τεχνικές επεξεργασίας της φυσικής γλώσσας (NLP), θα μπορούσαν να αξιοποιηθούν για την ενσωμάτωση τους σε πλατφόρμες ηλεκτρονικής μάθησης.Ο μεγάλος όγκος δεδομένων που παρέχεται μέσω μιας ηλεκτρονικής πλατφόρμας μάθησης, θα πρέπει να μπορεί να διαχειριστεί , να διανεμηθεί και να ανακτηθεί σωστά.Κάνοντας χρήση των τεχνικών NLP θα παρουσιαστεί μια καινοτόμα πλατφόρμα ηλεκτρονικής μάθησης,εκμεταλεύοντας τις υψηλού επιπέδου τεχνικές εξατομίκευσης, την δυνατότητα εξαγωγής συμπερασμάτων επεξεργάζοντας την φυσική γλώσσα των χρηστών προσαρμόζοντας το προσφερόμενο εκπαιδευτικό υλικό στις ανάγκες του κάθε χρήστη. / We live in a society in which the use of technology has entered dynamically in our life,the education could not be influenced by new Technologies. Terms such as "e-Learning" and "Asynchronous e-learning" have created new standards in the classical Education.
By the term “asynchronous e-learning” we mean a process of exchange of learning between teacher & student, performed regardless of time and place.
E-learning is the use of new multimedia technologies and the Internet to improve the quality of learning by facilitating access to information resources and services as well as remote exchanges .The term covers a wide range of applications and processes, such electronic classrooms, and digital collaboration, learning based on computers and Web technologies.
Some of the basic requirements that must be met to establish a platform for e-learning are: To support the creation of forums and chat rooms, to deliver email, has friendly environment for both user / student and user / teacher, support personalization depending to the user . Holding information (creating profiles) for the user in order to provide help in the navigation, to support easy creating exams (online tests), to support multimedia presentation materials.
As natural language processing (NLP) define the computational analysis of unstructured data in text, to achieve mechanical understanding of the text. To elaborate proposals that imported or read by the system, which also responds by proposals in a manner that reminds answers of educated man. A key role is played by the grammar, syntax, semantic analysis of data and general knowledge to understand the human language of the machine.
The main natural text processing techniques based on general knowledge about natural language .This techniques use some simple heuristic rules based on syntactic and semantic analysis of the text. Some of the techniques pertaining to all fields of application are: tokenization, structural data mining, elimination of insignificant words, PoS tagging, analyzing the morphological and syntactic analysis.
The aim of this study is to describe and evaluate how the techniques of natural language processing (NLP), could be used for incorporation into e-learning platforms. The large growth of data delivered through an online learning platform, should be able to manage, distributed and retrieved. By the use of NLP techniques will be presented an innovative e-learning platform, using the high level personalization techniques, the ability to extract conclusions digesting the user's natural language by customizing the offered educational materials to the needs of each user .
|
22 |
Automatic Identification of Duplicates in Literature in Multiple LanguagesKlasson Svensson, Emil January 2018 (has links)
As the the amount of books available online the sizes of each these collections are at the same pace growing larger and more commonly in multiple languages. Many of these cor- pora contain duplicates in form of various editions or translations of books. The task of finding these duplicates is usually done manually but with the growing sizes making it time consuming and demanding. The thesis set out to find a method in the field of Text Mining and Natural Language Processing that can automatize the process of manually identifying these duplicates in a corpora mainly consisting of fiction in multiple languages provided by Storytel. The problem was approached using three different methods to compute distance measures between books. The first approach was comparing titles of the books using the Levenstein- distance. The second approach used extracting entities from each book using Named En- tity Recognition and represented them using tf-idf and cosine dissimilarity to compute distances. The third approach was using a Polylingual Topic Model to estimate the books distribution of topics and compare them using Jensen Shannon Distance. In order to es- timate the parameters of the Polylingual Topic Model 8000 books were translated from Swedish to English using Apache Joshua a statistical machine translation system. For each method every book written by an author was pairwise tested using a hypothesis test where the null hypothesis was that the two books compared is not an edition or translation of the others. Since there is no known distribution to assume as the null distribution for each book a null distribution was estimated using distance measures of books not written by the author. The methods were evaluated on two different sets of manually labeled data made by the author of the thesis. One randomly sampled using one-stage cluster sampling and one consisting of books from authors that the corpus provider prior to the thesis be considered more difficult to label using automated techniques. Of the three methods the Title Matching was the method that performed best in terms of accuracy and precision based of the sampled data. The entity matching approach was the method with the lowest accuracy and precision but with a almost constant recall at around 50 %. It was concluded that there seems to be a set of duplicates that are clearly distin- guished from the estimated null-distributions, with a higher significance level a better pre- cision and accuracy could have been made with a similar recall for the specific method. For topic matching the result was worse than the title matching and when studied the es- timated model was not able to create quality topics the cause of multiple factors. It was concluded that further research is needed for the topic matching approach. None of the three methods were deemed be complete solutions to automatize detection of book duplicates.
|
23 |
Automatic Text Ontological Representation and Classification via Fundamental to Specific Conceptual Elements (TOR-FUSE)Razavi, Amir Hossein January 2012 (has links)
In this dissertation, we introduce a novel text representation method mainly used for text classification purpose. The presented representation method is initially based on a variety of closeness relationships between pairs of words in text passages within the entire corpus. This representation is then used as the basis for our multi-level lightweight ontological representation method (TOR-FUSE), in which documents are represented based on their contexts and the goal of the learning task. The method is unlike the traditional representation methods, in which all the documents are represented solely based on the constituent words of the documents, and are totally isolated from the goal that they are represented for. We believe choosing the correct granularity of representation features is an important aspect of text classification. Interpreting data in a more general dimensional space, with fewer dimensions, can convey more discriminative knowledge and decrease the level of learning perplexity. The multi-level model allows data interpretation in a more conceptual space, rather than only containing scattered words occurring in texts. It aims to perform the extraction of the knowledge tailored for the classification task by automatic creation of a lightweight ontological hierarchy of representations. In the last step, we will train a tailored ensemble learner over a stack of representations at different conceptual granularities. The final result is a mapping and a weighting of the targeted concept of the original learning task, over a stack of representations and granular conceptual elements of its different levels (hierarchical mapping instead of linear mapping over a vector). Finally the entire algorithm is applied to a variety of general text classification tasks, and the performance is evaluated in comparison with well-known algorithms.
|
24 |
Automatic Speech Recognition System for Somali in the interest of reducing Maternal Morbidity and Mortality.Laryea, Joycelyn, Jayasundara, Nipunika January 2020 (has links)
Developing an Automatic Speech Recognition (ASR) system for the Somali language, though not novel, is not actively explored; hence there has been no success in a model for conversational speech. Neither are related works accessible as open-source. The unavailability of digital data is what labels Somali as a low resource language and poses the greatest impediment to the development of an ASR for Somali. The incentive to develop an ASR system for the Somali language is to contribute to reducing the Maternal Mortality Rate (MMR) in Somalia. Researchers acquire interview audio data regarding maternal health and behaviour in the Somali language; to be able to engage the relevant stakeholders to bring about the needed change, these audios must be transcribed into text, which is an important step towards translation into any language. This work investigates available ASR for Somali and attempts to develop a prototype ASR system to convert Somali audios into Somali text. To achieve this target, we first identified the available open-source systems for speech recognition and selected the DeepSpeech engine for the implementation of the prototype. With three hours of audio data, the accuracy of transcription is not as required and cannot be deployed for use. This we attribute to insufficient training data and estimate that the effort towards an ASR for Somali will be more significant by acquiring about 1200 hours of audio to train the DeepSpeech engine
|
25 |
Embodied MetarepresentationsHinrich, Nicolás, Foradi, Maryam, Yousef, Tariq, Hartmann, Elisa, Triesch, Susanne, Kaßel, Jan, Pein, Johannes 06 June 2023 (has links)
Meaning has been established pervasively as a central concept throughout disciplines
that were involved in cognitive revolution. Its metaphoric usage comes to be, first and
foremost, through the interpreter’s constraint: representational relationships and contents
are considered to be in the “eye” or mind of the observer and shared properties
among observers themselves are knowable through interlinguistic phenomena, such
as translation. Despite the instability of meaning in relation to its underdetermination
by reference, it can be a tertium comparationis or “third comparator” for extended
human cognition if gauged through invariants that exist in transfer processes such as
translation, as all languages and cultures are rooted in pan-human experience and, thus,
share and express species-specific ontology. Meaning, seen as a cognitive competence,
does not stop outside of the body but extends, depends, and partners with other
agents and the environment. A novel approach for exploring the transfer properties
of some constituent items of the original natural semantic metalanguage in English,
that is, semantic primitives, is presented: FrameNet’s semantic frames, evoked by the
primes SEE and FEEL, were extracted from EuroParl, a parallel corpus that allows for
the automatic word alignment of items with their synonyms. Large Ontology Multilingual
Extraction was used. Afterward, following the Semantic Mirrors Method, a procedure
that consists back-translating into source language, a translatological examination of
translated and original versions of items was performed. A fully automated pipeline
was designed and tested, with the purpose of exploring associated frame shifts and,
thus, beginning a research agenda on their alleged universality as linguistic features of
translation, which will be complemented with and contrasted against further massive
feedback through a citizen science approach, as well as cognitive and neurophysiological
examinations. Additionally, an embodied account of frame semantics is proposed.
|
26 |
RECOMMENDATION SYSTEMS IN SOCIAL NETWORKSBehafarid Mohammad Jafari (15348268) 18 May 2023 (has links)
<p> The dramatic improvement in information and communication technology (ICT) has made an evolution in learning management systems (LMS). The rapid growth in LMSs has caused users to demand more advanced, automated, and intelligent services. CourseNetworking is a next-generation LMS adopting machine learning to add personalization, gamification, and more dynamics to the system. This work tries to come up with two recommender systems that can help improve CourseNetworking services. The first one is a social recommender system helping CourseNetworking to track user interests and give more relevant recommendations. Recently, graph neural network (GNN) techniques have been employed in social recommender systems due to their high success in graph representation learning, including social network graphs. Despite the rapid advances in recommender systems performance, dealing with the dynamic property of the social network data is one of the key challenges that is remained to be addressed. In this research, a novel method is presented that provides social recommendations by incorporating the dynamic property of social network data in a heterogeneous graph by supplementing the graph with time span nodes that are used to define users long-term and short-term preferences over time. The second service that is proposed to add to Rumi services is a hashtag recommendation system that can help users label their posts quickly resulting in improved searchability of content. In recent years, several hashtag recommendation methods are proposed and developed to speed up processing of the texts and quickly find out the critical phrases. The methods use different approaches and techniques to obtain critical information from a large amount of data. This work investigates the efficiency of unsupervised keyword extraction methods for hashtag recommendation and recommends the one with the best performance to use in a hashtag recommender system. </p>
|
27 |
Granskning av examensarbetesrapporter med IBM Watson molntjänsterEriksson, Patrik, Wester, Philip January 2018 (has links)
Cloud services are one of the fast expanding fields of today. Companies such as Amazon, Google, Microsoft and IBM offer these cloud services in various forms. As this field progresses, the natural question occurs ”What can you do with the technology today?”. The technology offers scalability for hardware usage and user demands, that is attractive to developers and companies. This thesis tries to examine the applicability of cloud services, by combining it with the question: ”Is it possible to make an automated thesis examiner?” By narrowing down the services to IBM Watson web services, this thesis main question reads ”Is it possible to make an automated thesis examiner using IBM Watson?”. Hence the goal of this thesis was to create an automated thesis examiner. The project used a modified version of Bunge’s technological research method. Where amongst the first steps, a definition of an software thesis examiner for student theses was created. Then an empirical study of the Watson services, that seemed relevant from the literature study, proceeded. These empirical studies allowed a deeper understanding about the services’ practices and boundaries. From these implications and the definition of a software thesis examiner for student theses, an idea of how to build and implement an automated thesis examiner was created. Most of IBM Watson’s services were thoroughly evaluated, except for the service Machine Learning, that should have been studied further if the time resources would not have been depleted. This project found the Watson web services useful in many cases but did not find a service that was well suited for thesis examination. Although the goal was not reached, this thesis researched the Watson web services and can be used to improve understanding of its applicability, and for future implementations that face the provided definition. / Molntjänster är ett av de områden som utvecklas snabbast idag. Företag såsom Amazon, Google, Microsoft och IBM tillhandahåller dessa tjänster i flera former. Allteftersom utvecklingen tar fart, uppstår den naturliga frågan ”Vad kan man göra med den här tekniken idag?”. Tekniken erbjuder en skalbarhet mot använd hårdvara och antalet användare, som är attraktiv för utvecklare och företag. Det här examensarbetet försöker svara på hur molntjänster kan användas genom att kombinera det med frågan ”Är det möjligt att skapa en automatiserad examensarbetesrapportsgranskare?”. Genom att avgränsa undersökningen till IBM Watson molntjänster försöker arbetet huvudsakligen svara på huvudfrågan ”Är det möjligt att skapa en automatiserad examensarbetesrapportsgranskare med Watson molntjänster?”. Därmed var målet med arbetet att skapa en automatiserad examensarbetesrapportsgranskare. Projektet följde en modifierad version av Bunge’s teknologiska undersökningsmetod, där det första steget var att skapa en definition för en mjukvaruexamensarbetesrapportsgranskare följt av en utredning av de Watson molntjänster som ansågs relevanta från litteratur studien. Dessa undersöktes sedan vidare i empirisk studie. Genom de empiriska studierna skapades förståelse för tjänsternas tillämpligheter och begränsningar, för att kunna kartlägga hur de kan användas i en automatiserad examensarbetsrapportsgranskare. De flesta tjänster behandlades grundligt, förutom Machine Learning, som skulle behövt vidare undersökning om inte tidsresurserna tog slut. Projektet visar på att Watson molntjänster är användbara men inte perfekt anpassade för att granska examensarbetesrapporter. Även om inte målet uppnåddes, undersöktes Watson molntjänster, vilket kan ge förståelse för deras användbarhet och framtida implementationer för att möta den skapade definitionen.
|
28 |
Large language models as an interface to interact with API tools in natural languageTesfagiorgis, Yohannes Gebreyohannes, Monteiro Silva, Bruno Miguel January 2023 (has links)
In this research project, we aim to explore the use of Large Language Models (LLMs) as an interface to interact with API tools in natural language. Bubeck et al. [1] shed some light on how LLMs could be used to interact with API tools. Since then, new versions of LLMs have been launched and the question of how reliable a LLM can be in this task remains unanswered. The main goal of our thesis is to investigate the designs of the available system prompts for LLMs, identify the best-performing prompts, and evaluate the reliability of different LLMs when using the best-identified prompts. We will employ a multiple-stage controlled experiment: A literature review where we reveal the available system prompts used in the scientific community and open-source projects; then, using F1-score as a metric we will analyse the precision and recall of the system prompts aiming to select the best-performing system prompts in interacting with API tools; and in a latter stage, we compare a selection of LLMs with the best-performing prompts identified earlier. From these experiences, we realize that AI-generated system prompts perform better than the current prompts used in open-source and literature with GPT-4, zero-shot prompts have better performance in this specific task with GPT-4 and that a good system prompt in one model does not generalize well into other models.
|
29 |
Sentiment Analysis Of IMDB Movie Reviews : A comparative study of Lexicon based approach and BERT Neural Network modelDomadula, Prashuna Sai Surya Vishwitha, Sayyaparaju, Sai Sumanwita January 2023 (has links)
Background: Movies have become an important marketing and advertising tool that can influence consumer behaviour and trends. Reading film reviews is an im- important part of watching a movie, as it can help viewers gain a general under- standing of the film. And also, provide filmmakers with feedback on how their work is being received. Sentiment analysis is a method of determining whether a review has positive or negative sentiment, and this study investigates a machine learning method for classifying sentiment from film reviews. Objectives: This thesis aims to perform comparative sentiment analysis on textual IMDb movie reviews using lexicon-based and BERT neural network models. Later different performance evaluation metrics are used to identify the most effective learning model. Methods: This thesis employs a quantitative research technique, with data analysed using traditional machine learning. The labelled data set comes from an online website called Kaggle (https://www.kaggle.com/datasets), which contains movie review information. Algorithms like the lexicon-based approach and the BERT neural networks are trained using the chosen IMDb movie reviews data set. To discover which model performs the best at predicting the sentiment analysis, the constructed models will be assessed on the test set using evaluation metrics such as accuracy, precision, recall and F1 score. Results: From the conducted experimentation the BERT neural network model is the most efficient algorithm in classifying the IMDb movie reviews into positive and negative sentiments. This model achieved the highest accuracy score of 90.67% over the trained data set, followed by the BoW model with an accuracy of 79.15%, whereas the TF-IDF model has 78.98% accuracy. BERT model has the better precision and recall with 0.88 and 0.92 respectively, followed by both BoW and TF-IDF models. The BoW model has a precision and recall of 0.79 and the TF-IDF has a precision of 0.79 and a recall of 0.78. And also the BERT model has the highest F1 score of 0.88, followed by the BoW model having a F1 score of 0.79 whereas, TF-IDF has 0.78. Conclusions: Among the two models evaluated, the lexicon-based approach and the BERT transformer neural network, the BERT neural network is the most efficient, having a good performance score based on the measured performance criteria.
|
30 |
A Cloud Computing-based Dashboard for the Visualization of Motivational Interviewing MetricsHeng, E Jinq January 2022 (has links)
No description available.
|
Page generated in 0.1388 seconds