Global ETD Search

51	Grammars for generating isiXhosa and isiZulu weather bulletin verbs Mahlaza, Zola January 2018 (has links) The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa because there is no fast and large-scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is because of, among other things, the complexity of Nguni languages. The structure of these languages is very different from Indo-European languages, and therefore we cannot reuse existing technologies that were developed for the latter group. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are also not sufficient to generate weather text. To generate weather text in isiXhosa and isiZulu, we restricted our text to verbs only in order to ensure a manageable scope. In particular, we have developed a corpus of weather sentences in order to determine verb features. We then created context-free verbal grammar rules using an incremental approach. The quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs with their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can be used to produce correct verbs for both languages. The similarity analysis of the two languages was done through the developed rules' parse trees, and by applying binary similarity measures on the sets of verbs generated by the rules. The parse trees show that the differences between the verb's components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). 
We also examined the importance of the phonological conditioning process by developing functions that calculate the ratio of verbs that will require conditioning out of the total strings that can be generated. We have found that the phonological conditioning process affects at least 45% of strings for isiXhosa, and at least 67% of strings for isiZulu, depending on the type of verb root that is used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor; however, these similarities cannot be exploited to create a unified rule set for both languages without significant maintainability compromises, because there are dependencies between the verb's 'modules' that exist in one language and not the other. Furthermore, the phonological conditioning process should be implemented in order to improve generated text, due to the high ratio of verbs it affects. Natural Language Generation Computational Linguistics
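The abstract above reports a Driver-Kroeber similarity between the two generated verb sets but does not give the formula. The name is attached to more than one coefficient in the literature, so the sketch below shows two commonly cited forms; which one the thesis used is an assumption, and the function names and placeholder verb sets are hypothetical. Here a counts items in both sets, b items only in the first set, and c items only in the second.

```python
from math import sqrt


def overlap_counts(set_a, set_b):
    """Return (a, b, c): shared items, items only in set_a, items only in set_b."""
    a = len(set_a & set_b)
    b = len(set_a - set_b)
    c = len(set_b - set_a)
    return a, b, c


def driver_kroeber_arithmetic(set_a, set_b):
    """Assumed form 1: arithmetic mean of the two overlap ratios a/(a+b) and a/(a+c)."""
    a, b, c = overlap_counts(set_a, set_b)
    if a == 0:
        return 0.0
    return (a / 2) * (1 / (a + b) + 1 / (a + c))


def driver_kroeber_geometric(set_a, set_b):
    """Assumed form 2: geometric mean of the same two ratios (also called Ochiai)."""
    a, b, c = overlap_counts(set_a, set_b)
    if a == 0:
        return 0.0
    return a / sqrt((a + b) * (a + c))


# Placeholder strings stand in for generated verbs; they are not real data.
xhosa_verbs = {"v1", "v2", "v3", "v4"}
zulu_verbs = {"v3", "v4", "v5", "v6", "v7", "v8", "v9", "v10"}

arithmetic = driver_kroeber_arithmetic(xhosa_verbs, zulu_verbs)
geometric = driver_kroeber_geometric(xhosa_verbs, zulu_verbs)
```

Both forms equal 1.0 for identical sets and 0.0 for disjoint sets; they differ only when the two sets have different sizes.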
52	Coping with Missing and Incomplete Information in Natural Language Processing with Applications in Sentiment Analysis and Entity Matching Schneider, Andrew Thomas January 2020 (has links) Much work in Natural Language Processing (NLP) is broadly concerned with extracting useful information from unstructured text passages. In recent years there has been an increased focus on informal writing as is found in online venues such as Twitter and Yelp. Processing this text introduces additional difficulties for NLP techniques; for example, many of the terms may be unknown due to rapidly changing vocabulary usage. A straightforward NLP approach will not have any capability of using the information these terms provide. In such information-poor environments of missing and incomplete information, it is necessary to develop novel, clever methods for leveraging the information we have explicitly available to unlock key nuggets of implicitly available information. In this work we explore several such methods and how they can collectively help to improve NLP techniques in general, with a focus on Sentiment Analysis (SA) and Entity Matching (EM). The problem of SA is that of identifying the polarity (positive, negative, neutral) of a speaker or author towards the topic of a given piece of text. SA can focus on various levels of granularity. These include finding the overall sentiment of a long text document, finding the sentiment of individual sentences or phrases, or finding the sentiment directed toward specific entities and their aspects (attributes). The problem of EM, also known as Record Linkage, is the problem of determining records from independent and uncooperative data sources that refer to the same real-world entities. Traditional approaches to EM have used the record representation of entities to accomplish this task. 
With the nascence of social media, entities on the Web are now accompanied by user generated content, which allows us to apply NLP solutions to the problem. We investigate specifically the following aspects of NLP for missing and incomplete information: (1) Inferring a sentiment polarity (i.e., the positive, negative, and neutral composition) of new terms. (2) Inferring a representation of new vocabulary terms that allows us to compare these terms with known terms in regards to their meaning and sentiment orientation. This idea can be further expanded to derive the representation of larger chunks of text, such as multi-word phrases. (3) Identifying key attributes of highly salient sentiment bearing passages that allow us to identify such sections of a document, even when the complete text is not analyzable. (4) Using text based methods to match corresponding entities (e.g., restaurants or hotels) from independent data sources that may miss key identifying attributes such as names or addresses. / Computer and Information Science Computer Science Natural Language Processing
53	Generative Chatbot Framework for Cybergrooming Prevention Wang, Pei 20 December 2021 (has links) Cybergrooming refers to the crime of establishing personal close relationships with potential victims, commonly teens, for the purpose of sexual exploitation or abuse via online social media platforms. Cybergrooming has been recognized as a serious social problem. However, there have been insufficient programs to provide proactive prevention to protect the youth users from cybergrooming. In this thesis, we present a generative chatbot framework, called SERI (Stop cybERgroomIng), that can generate simulated conversations between a perpetrator chatbot and a potential victim chatbot. To simulate authentic conversations between a perpetrator and a potential victim in the context of cybergrooming, we use deep reinforcement learning (DRL)-based dialogue generation. The design and development of the SERI are motivated to provide a safe and authentic chatting environment to enhance the youth's precautionary awareness and sensitivity of cybergrooming while any unnecessary ethical issues (e.g., the potential misuse of the SERI) are removed or minimized. We developed the SERI as a preliminary platform in which the perpetrator chatbot can be deployed in social media environments to interact with human users (i.e., youth) and observe how the youth users respond to strangers or acquaintances when the perpetrator asks them for private or sensitive information. We evaluated the quality of conversations generated by the SERI based on open-source, referenced, and unreferenced metrics as well as human evaluation. The evaluation results show that, in perplexity and MaUde scores, the conversations the SERI generates between two chatbots are authentic when compared to the original conversations from the datasets used. 
/ Master of Science / Cybergrooming refers to the crime of building personal close relationships with potential victims, especially youth users such as children and teenagers, for the purpose of sexual exploitation or abuse via online social media platforms. Cybergrooming has been recognized as a serious social problem. However, there have been insufficient methods to provide proactive protection for the youth users from cybergrooming. In this thesis, we present a generative chatbot framework, called SERI (Stop cybERgroomIng), that can generate simulated authentic conversations between a perpetrator chatbot and a potential victim chatbot by applying advanced natural language generation models. The design and development of the SERI are motivated to ensure a safe and authentic environment to strengthen the youth's precautionary awareness and sensitivity of cybergrooming while any unnecessary ethical issues (e.g., the potential misuse of the SERI) are removed or minimized. We used different metrics and methods to evaluate the quality of conversations generated by the SERI. The evaluation results show that the SERI can generate authentic conversations between two chatbots compared to the original conversations from the used datasets. Cybergrooming Natural Language Processing Chatbot
54	Enhancing Accessibility in Black-Box Attack Research with BinarySelect.pdf Shatarupa Ghosh (18438924) 28 April 2024 (has links) Adversarial text attack research is crucial for evaluating NLP model robustness and addressing privacy concerns. However, the increasing complexity of transformer and pretrained language models has led to significant time and resource requirements for training and testing. This challenge is particularly pronounced in black-box attacks, where hundreds or thousands of queries may be needed to identify critical words leveraged by the target model. To overcome this, we introduce BinarySelect, a novel method combining binary search with adversarial attack techniques to reduce query numbers significantly while maintaining attack effectiveness. Our experiments show that BinarySelect requires far fewer queries than traditional methods, making adversarial attack research more accessible to researchers with limited resources. We demonstrate the efficacy of BinarySelect across multiple datasets and classifiers, showcasing its potential for efficient adversarial attack exploration and addressing related black-box challenges. Natural language processing binary algorithms
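The abstract above says only that BinarySelect combines binary search with adversarial attack techniques to cut the number of queries; it does not give the procedure. The sketch below is one plausible reading, not the author's method: at each step it removes each half of the current span from the text, queries a black-box scoring function, and keeps the half whose removal lowers the score more. The names `binary_select` and `toy_score`, and the toy classifier itself, are hypothetical.

```python
from typing import Callable, List, Tuple


def binary_select(tokens: List[str],
                  score: Callable[[List[str]], float]) -> Tuple[int, int]:
    """Return (index of the token judged most influential, number of queries).

    `score` is an assumed black-box function returning the target model's
    confidence in its original prediction for the given token list.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    lo, hi = 0, len(tokens)  # current candidate span is tokens[lo:hi]
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Remove the left half of the span, then the right half, and query each.
        score_without_left = score(tokens[:lo] + tokens[mid:])
        score_without_right = score(tokens[:mid] + tokens[hi:])
        queries += 2
        # The half whose removal hurts the prediction more holds the key word.
        if score_without_left <= score_without_right:
            hi = mid
        else:
            lo = mid
    return lo, queries


def toy_score(tokens: List[str]) -> float:
    """Hypothetical stand-in classifier: confident only if 'terrible' is present."""
    return 0.9 if "terrible" in tokens else 0.2


sentence = "the food was terrible and cold".split()
index, used_queries = binary_select(sentence, toy_score)
```

On this six-token sentence the search takes four queries, where scoring each single-word deletion would take six; the gap widens with longer inputs because the loop runs roughly log2(n) times.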
55	Use of Assembly Inspired Instructions in the Allowance of Natural Language Processing in ROS Kakusa, Takondwa Lisungu 08 August 2018 (has links) Natural language processing is a growing field and is widely used in both industrial and commercial cases. Though it is difficult to create a natural language system that can robustly react to and handle every situation, it is quite possible to design the system to react to a specific instruction or scenario. The problem with current natural language systems used in machines, though, is that they are focused on single instructions, working to complete the instruction given and then waiting for the next instruction. In this way they are not set to respond to possible conditions that are explained to them. In the system designed and explained in this thesis, the goal is to fix this problem by introducing a method of adjusting to these conditions. The contributions made in this thesis are, first, to design a set of instruction types that can be used to allow for conditional statements within natural language instructions, and second, to create a modular system using ROS in order to allow for more robust communication and integration. Finally, the goal is to also allow for an interconnection between the written text and derived instructions that will make the sentence construction more seamless and natural for the user. The work in this thesis is limited in its focus to the objective of obstacle traversal. The ideas and methodology, though, can be seen to extend into future work in the area. / Master of Science / With the growth of natural language processing and the development of artificial intelligence, it is important to take a look at how to best allow these to work together. The main goal of this project is to find a way of integrating natural language so that it can be used to program a robot and, in so doing, develop a method of translating that is not only efficient but also easy to understand. 
We have found we can accomplish this by creating a system that not only creates a direct correlation between the sentence and the instruction generated for the robot to understand, but also one that is able to break down complex sentences and paragraphs into multiple different instructions. This allows for a larger amount of robustness in the system. Natural Language Processing Robotics ROS
56	Natural Language Processing of Stories Kaley Rittichier (12474468) 28 April 2022 (has links) In this thesis, I deal with the task of computationally processing stories with a focus on multidisciplinary ends, specifically in Digital Humanities and Cultural Analytics. In the process, I collect, clean, investigate, and predict from two datasets. The first is a dataset of 2,302 open-source literary works categorized by the time period they are set in. These works were all collected from Project Gutenberg. The classification of the time period in which the work is set was discovered by collecting and inspecting Library of Congress subject classifications, Wikipedia Categories, and literary factsheets from SparkNotes. The second is a dataset of 6,991 open-source literary works categorized by the hierarchical location the work is set in; these labels were constructed from Library of Congress subject classifications and SparkNotes factsheets. These datasets are the first of their kind and can help move forward an understanding of 1) the presentation of settings in stories and 2) the effect the settings have on our understanding of the stories. Natural language processing Stories Natural Language Processing Story Setting Digital Humanities Cultural Analytics Natural Language Processing
57	Natural Language Interfaces to Databases Chandra, Yohan 12 1900 (has links) Natural language interfaces to databases (NLIDB) are systems that aim to bridge the gap between the languages used by humans and computers, and automatically translate natural language sentences to database queries. This thesis proposes a novel approach to NLIDB, using graph-based models. The system starts by collecting as much information as possible from existing databases and sentences, and transforms this information into a knowledge base for the system. Given a new question, the system will use this knowledge to analyze and translate the sentence into its corresponding database query statement. The graph-based NLIDB system uses English as the natural language, a relational database model, and SQL as the formal query language. In experiments performed with natural language questions run against a large database containing information about U.S. geography, the system showed good performance compared to the state-of-the-art in the field. User interfaces (Computer systems) Database searching. natural language database natural language interface information extraction
58	Understanding the Importance of Entities and Roles in Natural Language Inference : A Model and Datasets January 2019 (has links) abstract: In this thesis, I present two new datasets and a modification to the existing models in the form of a novel attention mechanism for Natural Language Inference (NLI). The new datasets have been carefully synthesized from various existing corpora released for different tasks. The task of NLI is to determine the possibility of a sentence referred to as “Hypothesis” being true given that another sentence referred to as “Premise” is true. In other words, the task is to identify whether the “Premise” entails, contradicts or remains neutral with regards to the “Hypothesis”. NLI is a precursor to solving many Natural Language Processing (NLP) tasks such as Question Answering and Semantic Search. For example, in Question Answering systems, the question is paraphrased to form a declarative statement which is treated as the hypothesis. The options are treated as the premise. The option with the maximum entailment score is considered as the answer. Considering the applications of NLI, the importance of having a strong NLI system can't be stressed enough. Many large-scale datasets and models have been released in order to advance the field of NLI. While all of these models do get good accuracy on the test sets of the datasets they were trained on, they fail to capture the basic understanding of “Entities” and “Roles”. They often make the mistake of inferring that “John went to the market.” from “Peter went to the market.” failing to capture the notion of “Entities”. In other cases, these models don't understand the difference in the “Roles” played by the same entities in “Premise” and “Hypothesis” sentences and end up wrongly inferring that “Peter drove John to the stadium.” from “John drove Peter to the stadium.” The lack of understanding of “Roles” can be attributed to the lack of such examples in the various existing datasets. 
The existing models' failure to capture the notion of “Entities” is not just due to the lack of such examples in the existing NLI datasets. It can also be attributed to the strict use of vector similarity in the “word-to-word” attention mechanism being used in the existing architectures. To overcome these issues, I present two new datasets to help make the NLI systems capture the notion of “Entities” and “Roles”. The “NER Changed” (NC) dataset and the “Role-Switched” (RS) dataset contain examples of Premise-Hypothesis pairs that require the understanding of “Entities” and “Roles” respectively in order to be able to make correct inferences. This work shows how the existing architectures perform poorly on the “NER Changed” (NC) dataset even after being trained on the new datasets. In order to help the existing architectures understand the notion of “Entities”, this work proposes a modification to the “word-to-word” attention mechanism. Instead of relying on vector similarity alone, the modified architectures learn to incorporate the “Symbolic Similarity” as well by using the Named-Entity features of the Premise and Hypothesis sentences. The new modified architectures not only perform significantly better than the unmodified architectures on the “NER Changed” (NC) dataset but also perform as well on the existing datasets. / Dissertation/Thesis / Masters Thesis Computer Science 2019 Artificial intelligence Computer science Information technology Artificial Intelligence Deep Learning Entailment Natural Language Inference Natural Language Processing Natural Language Understanding
59	Multiple knowledge sources for word sense disambiguation Stevenson, Robert Mark January 1999 (has links) No description available. 006.3
60	Applicability analysis of computation double entendre humor recognition with machine learning methods Johansson, David January 2016 (has links) No description available. Natural language processing computational humor machine learning
