  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
691

Functional linguistic based motivations for a conversational software agent

Panesar, Kulvinder 07 October 2020 (has links)
This chapter discusses a linguistically orientated model of a conversational software agent (CSA) framework (Panesar 2017) sensitive to natural language processing (NLP) concepts and the levels of adequacy of a functional linguistic theory (LT). We discuss the relationship between NLP and knowledge representation (KR), and connect this with the goals of a linguistic theory (Van Valin and LaPolla 1997), in particular Role and Reference Grammar (RRG) (Van Valin Jr 2005). We debate the advantages of RRG and consider its fitness and computational adequacy. We present a design of a computational model of the linking algorithm that utilises a speech act construction as a grammatical object (Nolan 2014a, Nolan 2014b) and the sub-model of belief, desire and intentions (BDI) (Rao and Georgeff 1995). This model has been successfully implemented in software, using the resource description framework (RDF), and we highlight some implementation issues that arose at the interface between language and knowledge representation (Panesar 2017). / The full-text of this article will be released for public view at the end of the publisher embargo on 27 Sep 2024.
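As one way to picture the RDF layer mentioned in the abstract, here is a minimal sketch, assuming a hypothetical namespace and invented property names, of how a speech-act construction and a derived BDI-style belief could be stored as triples with Python's rdflib. It illustrates the representation idea only and is not the CSA's actual schema.

```python
# Minimal sketch (not Panesar's implementation): a speech act treated as a
# grammatical object and a derived belief, both stored as RDF triples.
# The namespace, property names, and values are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, RDF

CSA = Namespace("http://example.org/csa#")  # hypothetical namespace

g = Graph()
g.bind("csa", CSA)

# The speech act as a single typed node with its core RRG-style roles.
utterance = CSA["utterance-001"]
g.add((utterance, RDF.type, CSA.SpeechAct))
g.add((utterance, CSA.illocutionaryForce, Literal("assertive")))
g.add((utterance, CSA.predicate, Literal("deliver")))
g.add((utterance, CSA.actor, Literal("courier")))
g.add((utterance, CSA.undergoer, Literal("parcel")))

# A belief added to the agent's BDI belief store, linked back to its source.
belief = CSA["belief-001"]
g.add((belief, RDF.type, CSA.Belief))
g.add((belief, CSA.derivedFrom, utterance))
g.add((belief, CSA.content, Literal("the courier delivers the parcel")))

print(g.serialize(format="turtle"))
```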
692

The Art of Deep Connection - Towards Natural and Pragmatic Conversational Agent Interactions

Ray, Arijit 12 July 2017 (has links)
As research in Artificial Intelligence (AI) advances, it is crucial to focus on having seamless communication between humans and machines in order to effectively accomplish tasks. Smooth human-machine communication requires the machine to be sensible and human-like while interacting with humans, while simultaneously being capable of extracting the maximum information it needs to accomplish the desired task. Since many of the tasks machines must solve today involve the understanding of images, training machines to have human-like and effective image-grounded conversations with humans is one important step towards achieving this goal. Although we now have agents that can answer questions asked about images, they are prone to failure on confusing inputs and cannot, in turn, ask clarification questions to extract the desired information from humans. Hence, as a first step, we direct our efforts towards making Visual Question Answering agents human-like by making them resilient to confusing inputs that otherwise do not confuse humans. Not only is it crucial for a machine to answer questions reasonably, but it should also know how to ask questions sequentially to extract the desired information it needs from a human. Hence, we introduce a novel game called the Visual 20 Questions Game, where a machine tries to figure out a secret image a human has picked by having a natural language conversation with the human. Using deep learning techniques such as recurrent neural networks and sequence-to-sequence learning, we demonstrate scalable and reasonable performance on both tasks. / Master of Science
693

Natural language interface to a VHDL modeling tool

Manek, Meenakshi 23 June 2009 (has links)
This thesis describes a Natural Language (NL) interface to a VHDL modeling tool called the Modeler's Assistant. The primary motivation for the interface developed in this research work is to permit VLSI modelers who are not proficient in VHDL to rapidly produce correct VHDL models from manufacturer's descriptions. This tool should also be useful in teaching the VHDL language. The Modeler's Assistant already supports graphical capture of behavioral models in the form of Process Model Graphs (PMGs) consisting of processes (nodes) interconnected by signals (arcs). The NL interface that has been constructed allows modelers to specify the behavior of the process nodes using a restricted form of English called ModelSpeak. A spell-checking routine (from the UNIX operating system) is invoked to reduce input errors. Also, the grammar employed accepts multi-sentence descriptions rather than just a single sentence. Correct VHDL for each process is synthesized automatically, but user interaction is solicited where needed to resolve ambiguities such as the scope of loops and the types of signals and variables. The Modeler's Assistant can then assemble the VHDL code for these processes, along with the interface description from the PMG, into a complete entity model. / Master of Science
694

Exploring Ensemble Models and GAN-Based Approaches for Automated Detection of Machine-Generated Text

Surbhi Sharma (18437877) 29 April 2024 (has links)
<p dir="ltr">Automated detection of machine-generated text has become increasingly crucial in various fields such as cybersecurity, journalism, and content moderation due to the proliferation of generated content, including fake news, spam, and bot-generated comments. Traditional methods for detecting such content often rely on rule-based systems or supervised learning approaches, which may struggle to adapt to evolving generation techniques and sophisticated manipulations. In this thesis, we explore the use of ensemble models and Generative Adversarial Networks (GANs) for the automated detection of machine-generated text. </p><p dir="ltr">Ensemble models combine the strengths of different approaches, such as utilizing both rule-based systems and machine learning algorithms, to enhance detection accuracy and robustness. We investigate the integration of linguistic features, syntactic patterns, and semantic cues into machine learning pipelines, leveraging the power of Natural Language Processing (NLP) techniques. By combining multiple modalities of information, Ensemble models can effectively capture the subtle characteristics and nuances inherent in machine-generated text, improving detection performance. </p><p dir="ltr">In my latest experiments, I examined the performance of a Random Forest classifier trained on TF-IDF representations in combination with RoBERTa embeddings to calculate probabilities for machine-generated text detection. Test1 results showed promising accuracy rates, indicating the effectiveness of combining TF-IDF with RoBERTa probabilities. Test2 further validated these findings, demonstrating improved detection performance compared to standalone approaches.<br></p><p dir="ltr">These results suggest that leveraging Random Forest TF-IDF representation with RoBERTa embeddings to calculate probabilities can enhance the detection accuracy of machine-generated text.<br></p><p dir="ltr">Furthermore, we delve into the application of GAN-RoBERTa, a class of deep learning models comprising a generator and a discriminator trained adversarially, for generating and detecting machine-generated text. GANs have demonstrated remarkable capabilities in generating realistic text, making them a potential tool for adversaries to produce deceptive content. However, this same adversarial nature can be harnessed for detection purposes,<br>where the discriminator is trained to distinguish between genuine and machine-generated text.<br></p><p dir="ltr">Overall, our findings suggest that the use of Ensemble models and GAN-RoBERTa architectures holds significant promise for the automated detection of machine-generated text. Through a combination of diverse approaches and adversarial training techniques, we have demonstrated improved detection accuracy and robustness, thereby addressing the challenges posed by the proliferation of generated content across various domains. Further research and refinement of these approaches will be essential to stay ahead of evolving generation techniques and ensure the integrity and trustworthiness of textual content in the digital landscape.</p>
695

Increasing Accessibility of Electronic Theses and Dissertations (ETDs) Through Chapter-level Classification

Jude, Palakh Mignonne 07 July 2020 (has links)
Great progress has been made to leverage the improvements made in natural language processing and machine learning to better mine data from journals, conference proceedings, and other digital library documents. However, these advances do not extend well to book-length documents such as electronic theses and dissertations (ETDs). ETDs contain extensive research data; stakeholders -- including researchers, librarians, students, and educators -- can benefit from increased access to this corpus. Challenges arise while working with this corpus owing to the varied nature of disciplines covered as well as the use of domain-specific language. Prior systems are not tuned to this corpus. This research aims to increase the accessibility of ETDs by the automatic classification of chapters of an ETD using machine learning and deep learning techniques. This work utilizes an ETD-centric target classification system. It demonstrates the use of custom trained word and document embeddings to generate better vector representations of this corpus. It also describes a methodology to leverage extractive summaries of chapters of an ETD to aid in the classification process. Our findings indicate that custom embeddings and the use of summarization techniques can increase the performance of the classifiers. The chapter-level labels generated by this research help to identify the level of interdisciplinarity in the corpus. The automatic classifiers can also be further used in a search engine interface that would help users to find the most appropriate chapters. / Master of Science / Electronic Theses and Dissertations (ETDs) are submitted by students at the end of their academic study. These works contain research information pertinent to a given field. Increasing the accessibility of such documents will be beneficial to many stakeholders including students, researchers, librarians, and educators. In recent years, a great deal of research has been conducted to better extract information from textual documents with the use of machine learning and natural language processing. However, these advances have not been applied to increase the accessibility of ETDs. This research aims to perform the automatic classification of chapters extracted from ETDs. That will reduce the human effort required to label the key parts of these book-length documents. Additionally, when considered by search engines, such categorization can aid users to more easily find the chapters that are most relevant to their research.
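As a rough illustration of the custom-embedding idea described above, the sketch below trains document vectors on the chapter text itself and feeds them to a simple classifier. The toy corpus, labels, Doc2Vec choice, and hyperparameters are assumptions; the thesis's actual embedding models and label scheme differ.

```python
# Hedged sketch: domain-specific document embeddings for chapter-level
# classification. Corpus, labels, and hyperparameters are placeholders.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.linear_model import LogisticRegression

chapters = [
    ("methods for protein folding simulation ...", "Biology"),
    ("finite element analysis of beam structures ...", "Engineering"),
    ("survey methodology and interview coding ...", "Social Science"),
]

# Train embeddings on the chapters themselves so the vectors reflect the
# ETD corpus's domain-heavy vocabulary rather than a general-purpose corpus.
tagged = [TaggedDocument(words=text.split(), tags=[str(i)])
          for i, (text, _) in enumerate(chapters)]
embedder = Doc2Vec(tagged, vector_size=64, min_count=1, epochs=40)

X = [embedder.dv[str(i)] for i in range(len(chapters))]
y = [label for _, label in chapters]
clf = LogisticRegression(max_iter=1000).fit(X, y)

new_chapter = "stress testing of welded truss joints".split()
print(clf.predict([embedder.infer_vector(new_chapter)]))
```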
696

Optimizing TEE Protection by Automatically Augmenting Requirements Specifications

Dhar, Siddharth 03 June 2020 (has links)
An increasing number of software systems must safeguard their confidential data and code, referred to as critical program information (CPI). Such safeguarding is commonly accomplished by isolating CPI in a trusted execution environment (TEE), with the isolated CPI becoming a trusted computing base (TCB). TEE protection incurs heavy performance costs, as TEE-based functionality is expensive to both invoke and execute. Despite these costs, projects that use TEEs tend to have unnecessarily large TCBs. Based on our analysis, developers often put code and data into a TEE for convenience rather than for protection reasons, thus not only compromising performance but also reducing the effectiveness of TEE protection. In order for TEEs to provide maximum benefits for protecting CPI, their usage must be systematically incorporated into the entire software engineering process, starting from Requirements Engineering. To address this problem, we present a novel approach that incorporates TEEs in the Requirements Engineering phase by using natural language processing (NLP) to classify those software requirements that are security critical and should be isolated in a TEE. Our approach takes as input a requirements specification and outputs a list of annotated software requirements. The annotations recommend to the developer which corresponding features comprise CPI that should be protected in a TEE. Our evaluation results indicate that our approach identifies CPI with a high degree of accuracy, enabling the safeguarding of CPI to be incorporated into Requirements Engineering. / Master of Science / An increasing number of software systems must safeguard confidential data such as passwords, payment information, and personal details. This confidential information is commonly protected using a Trusted Execution Environment (TEE), an isolated environment provided by either the existing processor or separate hardware that interacts with the operating system to secure sensitive data and code. Unfortunately, TEE protection incurs heavy performance costs: TEEs are slower than modern processors, and frequent communication between the system and the TEE incurs heavy performance overhead. We discovered that developers often put code and data into a TEE for convenience rather than for protection purposes, thus not only hurting performance but also reducing the effectiveness of TEE protection. By thoroughly examining a project's features in the Requirements Engineering phase, which defines the project's functionalities, developers would be able to understand which features handle confidential data. To that end, we present a novel approach that incorporates TEEs in the Requirements Engineering phase by means of Natural Language Processing (NLP) tools to categorize the project requirements that may warrant TEE protection. Our approach takes as input a project's requirements and outputs a list of categorized requirements defining which requirements are likely to make use of confidential information. Our evaluation results indicate that our approach performs this categorization with a high degree of accuracy, enabling confidentiality-related features to be safeguarded starting from the Requirements Engineering phase.
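To make the input/output contract concrete, here is a minimal sketch, assuming invented requirements, seed phrases, and an arbitrary threshold, of annotating a requirements list with TEE-candidate flags via lexical similarity. It illustrates the annotation flow only and is not the classifier evaluated in the thesis.

```python
# Hedged sketch of the annotation flow: score each requirement against seed
# phrases describing confidential-data handling and flag high scorers as
# candidates for TEE isolation. Requirements, seeds, and the threshold are
# illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirements = [
    "The system shall store user payment credentials for recurring billing.",
    "The system shall display the current weather on the dashboard.",
    "The system shall verify passwords against the stored credential database.",
]
seed_phrases = ["store payment credentials", "verify password credential",
                "encryption key storage"]

vectorizer = TfidfVectorizer().fit(requirements + seed_phrases)
req_vecs = vectorizer.transform(requirements)
seed_vecs = vectorizer.transform(seed_phrases)

# Best similarity to any seed phrase decides the annotation.
scores = cosine_similarity(req_vecs, seed_vecs).max(axis=1)
for requirement, score in zip(requirements, scores):
    tag = "TEE-candidate" if score > 0.2 else "normal"
    print(f"[{tag}] {requirement}")
```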
697

Building a Trustworthy Question Answering System for Covid-19 Tracking

Liu, Yiqing 02 September 2021 (has links)
During the unprecedented global pandemic of Covid-19, the general public is suffering from inaccurate Covid-19 related information, including outdated information and fake news. The most widely used media (TV, social media, newspapers, and radio) fall short in providing the reliable, up-to-the-minute updates that people are seeking. In order to cope with this challenge, several public data resources dedicated to providing Covid-19 information have emerged. They brought together experts from different fields to provide authoritative and up-to-date pandemic updates. However, the general public still cannot make full use of such resources, since the learning curve is too steep, especially for elderly and young users. To address this problem, in this thesis, we propose a question answering system that users can interact with through simple natural language sentences. While building this system, we investigate qualified public data resources and, based on the content they provide, collect a set of frequently asked questions for Covid-19 tracking. We further build a dedicated dataset named CovidQA for evaluating the performance of the question answering system with different models. Based on the new dataset, we assess multiple machine learning-based models built for retrieving relevant information from databases, and then propose two empirical models that utilize pre-defined templates to generate SQL queries. In our experiments, we present both quantitative and qualitative results and provide a comprehensive comparison between different types of methods. The results show that the proposed template-based methods are simple but effective in building question answering systems for specific domain problems. / Master of Science / During the unprecedented global pandemic of Covid-19, the general public is suffering from inaccurate Covid-19 related information, including outdated information and fake news. The most widely used media (TV, social media, newspapers, and radio) fall short in providing the reliable, up-to-the-minute updates that people are seeking. In order to cope with this challenge, several public data resources dedicated to providing Covid-19 information have emerged. They brought together experts from different fields to provide authoritative and up-to-date pandemic updates. However, there is room for improvement in terms of user experience. To address this problem, in this thesis, we propose a system that users can interact with through natural questions. While building this system, we evaluate and choose six qualified public data providers as the data sources. We further build a testing dataset for evaluating the performance of the system. We assess two Artificial Intelligence-powered models for the system, and then propose two rule-based models for the problem under study. In our experiments, we provide a comprehensive comparison between different types of methods. The results show that the proposed rule-based methods are simple but effective in building such systems.
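The template-based models the abstract refers to can be pictured with a short sketch: a question is matched against hand-written patterns, and the captured slots fill a parameterized SQL string. The patterns, table name, and column names below are hypothetical, not the CovidQA schema or the thesis's actual templates.

```python
# Hedged sketch of a template-based question-to-SQL step. Patterns and the
# covid_daily table/columns are invented for illustration.
import re

TEMPLATES = [
    (re.compile(r"how many new cases in (?P<place>[\w\s]+) on (?P<date>[\d-]+)", re.I),
     "SELECT new_cases FROM covid_daily "
     "WHERE location = '{place}' AND report_date = '{date}';"),
    (re.compile(r"total deaths in (?P<place>[\w\s]+)", re.I),
     "SELECT SUM(deaths) FROM covid_daily WHERE location = '{place}';"),
]

def question_to_sql(question):
    """Return SQL for the first matching template, or None if nothing matches."""
    for pattern, sql in TEMPLATES:
        match = pattern.search(question)
        if match:
            slots = {k: v.strip() for k, v in match.groupdict().items()}
            return sql.format(**slots)
    return None

print(question_to_sql("How many new cases in Virginia on 2021-08-01?"))
print(question_to_sql("Total deaths in Brazil?"))
```

A production system would bind the slot values as SQL parameters rather than interpolating them into the query string, to avoid injection.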
698

Translating Sensory Perceptions: Existing and Emerging Methods of Collecting and Analyzing Flavor Data

Hamilton, Leah Marie 28 April 2022 (has links)
Food flavor is hugely important in motivating food choice and eating behavior. Unfortunately for research and communication about flavor, many languages' flavor vocabularies are notoriously variable and must be aligned either before data collection (through training) or after the fact (by researchers). This dissertation demonstrates one example of each approach (conventional descriptive analysis (DA) and labeled free sorting, respectively) and compares their use to emerging computational natural language processing (NLP) methods that use large volumes of existing text data. Rapid methods that align flavor vocabulary after data collection are most similar to NLP, and with the development or improvement of some strategic tools, NLP is well poised to further accelerate the analysis of existing text data or unaligned vocabularies. DA, while much more time-consuming, ensures that the researchers, tasters, and readers have a shared definition of any flavor words used, an advantage that all existing rapid methods lack. With a greater understanding of how this differs from everyday communication about flavor, future researchers may be able to replicate this aspect of DA in novel descriptive methods. This dissertation investigates the flavors of specialty beverages, specifically American whiskeys and cold brew coffees. American whiskeys differ from other whiskeys in raw materials and aging practices, with the aging practices primarily setting them apart. While the most expensive American whiskeys are similar to Scotches and dominated by oaky, sultana-like flavors, only very rich consumers desire these flavors; chocolate and caramel are the flavors most widely preferred by consumers. Degree of roasting has more of an impact on cold brew coffee flavor than the origin of the beans, and the coffee consumers surveyed here preferred dark roast to light roast cold brews. / Doctor of Philosophy / Food flavor is hugely important in motivating food choice and eating behavior. Unfortunately for research and communication about flavor, many languages' flavor vocabularies are notoriously inconsistent: flavor words may have more than one meaning, multiple words may mean the same thing, and people regularly make mistakes when naming flavors. To get around this, researchers can either train human tasters to use a fixed set of flavor words, or they can attempt to identify the flavors that people are talking about from their own-words descriptions. In this dissertation, I give examples of both of these methods and compare them to approaches based on machine learning and other computational techniques. This dissertation investigates the flavors of specialty beverages, specifically American whiskeys and cold brew coffees. American whiskeys differ from other whiskeys in raw materials and aging practices, with the aging practices primarily setting them apart. Producers wanting to set their whiskeys apart with the use of specialty or heritage grains will likely need to work with breeders to develop new varieties that will impart special flavors to the whiskeys. While the most expensive American whiskeys are similar to Scotches and dominated by oaky, sultana-like flavors, only very rich consumers desire these flavors; chocolate and caramel are the flavors most widely preferred by consumers. For cold brew coffees, degree of roasting has more of an impact on flavor than the origin of the beans, although a subset of people sense and prioritize origin-related flavor differences when making flavor groups. The coffee consumers surveyed here preferred dark roast to light roast cold brews, which suggests that the beans best suited to well-liked cold brew coffee may differ from those best suited to traditional hot brew.
699

Generating Canonical Sentences from Question-Answer Pairs of Deposition Transcripts

Mehrotra, Maanav 15 September 2020 (has links)
In the legal domain, documents of various types are created in connection with a particular case, such as testimony, transcripts, depositions, memos, and emails. Deposition transcripts are one such type of legal document; they consist of conversations between the different parties in the legal proceedings, recorded by a court reporter. Court reporting has been traced back to 63 B.C. It has evolved from the early scripts of "Cuneiform", "Running Script", and "Grass Script" to Certified Access Real-time Translation (CART). Since the boom of digitization, there has been a shift to storing these transcripts in the PDF/A format. Deposition transcripts take the form of question-answer (QA) pairs and can be quite lengthy for the average reader. This creates a need for automatic text summarization of these documents. However, present-day summarization systems do not support this form of text, so the transcripts must first be processed: such documents need to be parsed to extract QA pairs as well as any relevant supporting information. These QA pairs can then be converted into complete canonical sentences, i.e., declarative form, from which insights can be extracted and which can be used for further downstream tasks. This work investigates this conversion, as well as the use of deep-learning techniques for such transformations. / Master of Science / In the legal domain, documents of various types are created in connection with a particular case, such as testimony, transcripts, memos, and emails. Deposition transcripts are one such type of legal document; they consist of conversations between a lawyer and one of the parties in the legal proceedings, captured by a court reporter. Since the boom of digitization, there has been a shift to storing these transcripts in the PDF/A format. Deposition transcripts take the form of question-answer (QA) pairs and can be quite lengthy. Though automatic summarization could help, present-day systems do not work well with such texts. This creates a need to parse these documents and extract QA pairs as well as any relevant supporting information. The QA pairs can then be converted into canonical sentences, i.e., a declarative form, from which we could extract insights and support downstream tasks. This work describes these conversions, as well as the use of deep-learning techniques for such transformations.
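A small sketch of the two pre-processing steps described above: pulling Q/A pairs out of deposition-style text and applying one naive rewrite rule to produce a declarative sentence. The transcript snippet and the single rule are invented for illustration; the learned models the work explores are not shown.

```python
# Hedged sketch: extract QA pairs from a "Q. / A." transcript and rewrite
# one simple pattern into a canonical (declarative) sentence.
import re

transcript = """
Q. Were you present at the March 3rd meeting?
A. Yes, I was.
Q. Who received the shipment?
A. The warehouse manager received it.
"""

# Step 1: capture (question, answer) pairs from "Q. ... / A. ..." lines.
qa_pairs = re.findall(r"Q\.\s*(.+?)\n\s*A\.\s*(.+?)(?=\nQ\.|\n*$)",
                      transcript.strip(), flags=re.S)

# Step 2: one toy rule -- "Were you X?" answered "Yes" becomes
# "The deponent was X."; anything else falls back to the raw pair.
def to_canonical(question, answer):
    m = re.match(r"Were you (.+)\?", question.strip(), flags=re.I)
    if m and answer.strip().lower().startswith("yes"):
        return f"The deponent was {m.group(1)}."
    return f"{question.strip()} -> {answer.strip()}"

for q, a in qa_pairs:
    print(to_canonical(q, a))
```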
700

Larger-first partial parsing

Van Delden, Sebastian Alexander 01 January 2003 (has links) (PDF)
Larger-first partial parsing is a primarily top-down approach to partial parsing that is opposite to current easy-first, or primarily bottom-up, strategies. A rich partial tree structure is captured by an algorithm that assigns a hierarchy of structural tags to each of the input tokens in a sentence. Part-of-speech tags are first assigned to the words in a sentence by a part-of-speech tagger. A cascade of Deterministic Finite State Automata then uses this part-of-speech information to identify syntactic relations primarily in a descending order of their size. The cascade is divided into four specialized sections: (1) a Comma Network, which identifies syntactic relations associated with commas; (2) a Conjunction Network, which partially disambiguates phrasal conjunctions and fully disambiguates clausal conjunctions; (3) a Clause Network, which identifies non-comma-delimited clauses; and (4) a Phrase Network, which identifies the remaining base phrases in the sentence. Each automaton is capable of adding one or more levels of structural tags to the tokens in a sentence. The larger-first approach is compared against a well-known easy-first approach. The results indicate that this larger-first approach is capable of (1) producing a more detailed partial parse than an easy-first approach; (2) providing better containment of attachment ambiguity; (3) handling overlapping syntactic relations; and (4) achieving a higher accuracy than the easy-first approach. The automata of each network were developed by an empirical analysis of several sources and are presented here in detail.
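The larger-first cascade can be pictured with a toy two-pass example: the first pass adds a clause-level tag to every token, the second adds a base-phrase tag, so each token ends up with a small hierarchy of structural tags. The POS tags, patterns, and tag names are invented for illustration and are far simpler than the dissertation's finite-state networks.

```python
# Hedged sketch of the larger-first idea: each pass appends one layer of
# structural tags per token, from larger units (clauses) to smaller ones
# (base phrases). Patterns and tag names are illustrative only.
tokens = [("the", "DT"), ("report", "NN"), ("arrived", "VBD"),
          ("after", "IN"), ("the", "DT"), ("deadline", "NN"), ("passed", "VBD")]
layers = [[] for _ in tokens]          # one growing tag stack per token

# Pass 1 (largest units): everything from the subordinator onward is one
# subordinate clause; everything before it belongs to the main clause.
split = next((i for i, (_, pos) in enumerate(tokens) if pos == "IN"), len(tokens))
for i in range(len(tokens)):
    layers[i].append("MAIN-CL" if i < split else "SUB-CL")

# Pass 2 (smaller units): mark simple base noun phrases (DT/JJ/NN runs).
i = 0
while i < len(tokens):
    if tokens[i][1] in ("DT", "JJ", "NN"):
        j = i
        while j < len(tokens) and tokens[j][1] in ("DT", "JJ", "NN"):
            layers[j].append("NP")
            j += 1
        i = j
    else:
        layers[i].append("-")
        i += 1

for (word, pos), tags in zip(tokens, layers):
    print(f"{word:10}{pos:5}{' / '.join(tags)}")
```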
