371

Flexible representation for genetic programming: lessons from natural language processing

Nguyen, Xuan Hoai, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW January 2004
This thesis principally addresses some problems in genetic programming (GP) and grammar-guided genetic programming (GGGP) arising from the lack of operators able to make small and bounded changes in both genotype and phenotype space. It proposes a new and flexible representation for genetic programming, using a state-of-the-art formalism from natural language processing, Tree Adjoining Grammars (TAGs). It demonstrates that the new TAG-based representation possesses two important properties: non-fixed arity and locality. The former facilitates the design of new operators, including some which are bio-inspired, and others able to make small and bounded changes. The latter ensures that bounded changes in genotype space are reflected in bounded changes in phenotype space. With these two properties, the thesis shows how some well-known difficulties in standard GP and GGGP tree-based representations can be solved in the new representation. These difficulties have been previously attributed to the tree-based nature of the representations; since the TAG representation is also tree-based, it has enabled a more precise delineation of the causes of the difficulties. Building on the new representation, a new grammar-guided GP system known as TAG3P has been developed, and shown to be competitive with other GP and GGGP systems. A new schema theorem, explaining the behaviour of TAG3P on syntactically constrained domains, is derived. Finally, the thesis proposes a new method for understanding performance differences between GP representations requiring different ways to bound the search space, eliminating the effects of the bounds through multi-objective approaches.
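To make the "small and bounded changes" point concrete, the sketch below shows single-element insertion and deletion on a generic variable-length (non-fixed-arity) genotype. It is only an illustrative analogy for operators whose effect is bounded to one element; it is not the TAG-based representation or TAG3P's actual operators.

```python
import random

# Illustrative only: single-element insertion/deletion on a variable-length genotype.
# The effect of each operator is bounded to exactly one element.

def insert_one(genotype, alphabet, rng):
    pos = rng.randrange(len(genotype) + 1)       # any position is legal: non-fixed arity
    return genotype[:pos] + [rng.choice(alphabet)] + genotype[pos:]

def delete_one(genotype, rng):
    if len(genotype) <= 1:                       # keep at least one element
        return genotype[:]
    pos = rng.randrange(len(genotype))
    return genotype[:pos] + genotype[pos + 1:]

rng = random.Random(42)
g = ["x", "sin", "+", "1"]
print(insert_one(g, ["x", "1", "*"], rng))       # differs from g in exactly one element
print(delete_one(g, rng))                        # likewise: a small, bounded change
```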
372

Incremental knowledge acquisition for natural language processing

Pham, Son Bao, Computer Science & Engineering, Faculty of Engineering, UNSW January 2006
Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather some statistics based on a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual way seems to be the only choice. One typical problem that occurs with manual approaches is that the combination of multiple patterns, possibly being used at different stages of processing, often causes unintended side effects. Existing approaches, however, do not focus on the practical problem of acquiring those patterns but rather on how to use linguistic patterns for processing text. A systematic way to support the process of manually acquiring linguistic patterns in an efficient manner is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks. KAFTIE addresses difficulties in manually constructing knowledge bases of linguistic patterns, or rules in general, often faced in existing approaches by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty in choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort in creating a new pattern, or modifying an existing pattern, independent of the knowledge base's size. KAFTIE, therefore, makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
373

Efficient computation of advanced skyline queries.

Yuan, Yidong, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007
Skyline has been proposed as an important operator for many applications, such as multi-criteria decision making, data mining and visualization, and user-preference queries. Due to its importance, skyline and its computation have received considerable attention from the database research community recently. All the existing techniques, however, focus on conventional databases; they are not applicable to online computation environments such as data streams. In addition, the existing studies consider only the efficiency of skyline computation, while the fundamental problem of the semantics of skylines remains open. In this thesis, we study three problems of skyline computation: (1) online skyline computation over data streams; (2) skyline cube computation and its analysis; and (3) the top-k most representative skyline. To tackle the problem of online skyline computation, we develop a novel framework which converts the more expensive multi-dimensional skyline computation into stabbing queries in 1-dimensional space. Based on this framework, a rigorous theoretical analysis of the time complexity of online skyline computation is provided. Then, efficient algorithms are proposed to support ad hoc and continuous skyline queries over data streams. Inspired by the idea of the data cube, we propose a novel concept of the skyline cube, which consists of the skylines of all possible non-empty subsets of a given full space. We identify the unique sharing strategies for skyline cube computation and develop two efficient algorithms which compute the skyline cube in a bottom-up and a top-down manner, respectively. Finally, a theoretical framework to answer the question about the semantics of the skyline and an analysis of multidimensional subspace skylines are presented. Motivated by the fact that the full skyline may be less informative because it generally consists of a large number of skyline points, we propose a novel skyline operator -- the top-k most representative skyline. The top-k most representative skyline operator selects the k skyline points so that the number of data points which are dominated by at least one of these k skyline points is maximized. To compute the top-k most representative skyline, two efficient algorithms and their theoretical analysis are presented.
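For readers unfamiliar with the operator itself, the following minimal sketch shows the textbook dominance test and a naive skyline computation that the abstract builds on (assuming smaller is better in every dimension). It is not the thesis's stream, cube, or top-k algorithms.

```python
# Naive skyline: keep the points not dominated by any other point.
# Convention assumed here: smaller is better in every dimension.

def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Example: hotels as (price, distance) pairs, both to be minimized.
hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (55, 6)]
print(skyline(hotels))  # (90, 9) is dominated; the rest trade off price against distance
```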
374

Lexical approaches to backoff in statistical parsing

Lakeland, Corrin, n/a January 2006
This thesis develops a new method for predicting probabilities in a statistical parser so that more sophisticated probabilistic grammars can be used. A statistical parser uses a probabilistic grammar derived from a training corpus of hand-parsed sentences. The grammar is represented as a set of constructions - in a simple case these might be context-free rules. The probability of each construction in the grammar is then estimated by counting its relative frequency in the corpus. A crucial problem when building a probabilistic grammar is to select an appropriate level of granularity for describing the constructions being learned. The more constructions we include in our grammar, the more sophisticated a model of the language we produce. However, if too many different constructions are included, then our corpus is unlikely to contain reliable information about the relative frequency of many constructions. In existing statistical parsers two main approaches have been taken to choosing an appropriate granularity. In a non-lexicalised parser constructions are specified as structures involving particular parts-of-speech, thereby abstracting over individual words. Thus, in the training corpus two syntactic structures involving the same parts-of-speech but different words would be treated as two instances of the same event. In a lexicalised grammar the assumption is that the individual words in a sentence carry information about its syntactic analysis over and above what is carried by its part-of-speech tags. Lexicalised grammars have the potential to provide extremely detailed syntactic analyses; however, Zipf's law makes it hard for such grammars to be learned. In this thesis, we propose a method for optimising the trade-off between informative and learnable constructions in statistical parsing. We implement a grammar which works at a level of granularity in between single words and parts-of-speech, by grouping words together using unsupervised clustering based on bigram statistics. We begin by implementing a statistical parser to serve as the basis for our experiments. The parser, based on that of Michael Collins (1999), contains a number of new features of general interest. We then implement a model of word clustering, which we believe is the first to deliver vector-based word representations for an arbitrarily large lexicon. Finally, we describe a series of experiments in which the statistical parser is trained using categories based on these word representations.
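The relative-frequency estimation mentioned in the abstract can be sketched in a few lines: each rule's probability is its count divided by the total count of rules sharing its left-hand side. The toy counts below are invented for illustration and are not taken from the thesis or any treebank.

```python
from collections import Counter, defaultdict

# Maximum-likelihood (relative-frequency) estimation of context-free rule probabilities.
rule_counts = Counter({
    ("NP", ("DT", "NN")): 700,
    ("NP", ("NNP",)): 200,
    ("NP", ("NP", "PP")): 100,
})

lhs_totals = defaultdict(int)
for (lhs, rhs), count in rule_counts.items():
    lhs_totals[lhs] += count

rule_prob = {rule: count / lhs_totals[rule[0]] for rule, count in rule_counts.items()}
print(rule_prob[("NP", ("DT", "NN"))])  # 0.7
```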
375

An agent-based approach to dialogue management in personal assistants

Nguyen, Thi Thuc Anh, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007
Personal assistants need to allow the user to interact with the system in a flexible and adaptive way such as through spoken language dialogue. This research is aimed at achieving robust and effective dialogue management in such applications. We focus on an application, the Smart Personal Assistant (SPA), in which the user can use a variety of devices to interact with a collection of personal assistants, each specializing in a task domain. The current implementation of the SPA contains an e-mail management agent and a calendar agent that the user can interact with through a spoken dialogue and a graphical interface on PDAs. The user-system interaction is handled by a Dialogue Manager agent. We propose an agent-based approach that makes use of a BDI agent architecture for dialogue modelling and control. The Dialogue Manager agent of the SPA acts as the central point for maintaining coherent user-system interaction and coordinating the activities of the assistants. The dialogue model consists of a set of complex but modular plans for handling communicative goals. The dialogue control flow emerges automatically as the result of the agent's plan selection by the BDI interpreter. In addition the Dialogue Manager maintains the conversational context, the domain-specific knowledge and the user model in its internal beliefs. We also consider the problem of dialogue adaptation in such agent-based dialogue systems. We present a novel way of integrating learning into a BDI architecture so that the agent can learn to select the most suitable plan among those applicable in the current context. This enables the Dialogue Manager agent to tailor its responses according to the conversational context and the user's physical context, devices and preferences. Finally, we report the evaluation results, which indicate the robustness and effectiveness of the dialogue model in handling a range of users.
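The BDI plan-selection mechanism described in the abstract can be sketched as follows: each plan declares the goal it handles and a context condition over the agent's beliefs, and the interpreter picks an applicable plan at run time. The plan names, beliefs, and first-applicable selection rule below are illustrative assumptions, not the SPA's actual plan library.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Plan:
    goal: str
    context: Callable[[dict], bool]   # is the plan applicable given current beliefs?
    body: Callable[[dict], str]       # what the plan does when selected

plans = [
    Plan("summarise_email",
         context=lambda b: b.get("device") == "pda",
         body=lambda b: "Read a short summary aloud."),
    Plan("summarise_email",
         context=lambda b: b.get("device") == "desktop",
         body=lambda b: "Display the full message list."),
]

def select_plan(goal, beliefs):
    applicable = [p for p in plans if p.goal == goal and p.context(beliefs)]
    return applicable[0] if applicable else None   # a learned policy could rank these instead

beliefs = {"device": "pda", "user": "alice"}
plan = select_plan("summarise_email", beliefs)
print(plan.body(beliefs))  # Read a short summary aloud.
```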
376

Natural language program analysis: combining natural language processing with program analysis to improve software maintenance tools

Shepherd, David. January 2007
Thesis (Ph.D.)--University of Delaware, 2007. Principal faculty advisors: Lori L. Pollock and Vijay K. Shanker, Dept. of Computer & Information Sciences. Includes bibliographical references.
378

UNITRAN: An Interlingual Machine Translation System

Dorr, Bonnie Jean 01 December 1987
This report describes the UNITRAN (UNIversal TRANslator) system, an implementation of a principle-based approach to natural language translation. The system is "interlingual", i.e., the model is based on universal principles that hold across all languages; the distinctions among languages are handled by settings of parameters associated with the universal principles. Interaction effects of linguistic principles are handled by the system so that the programmer does not need to specifically spell out the details of rule applications. Only a small set of principles covers all languages; thus, the unmanageable grammar size of alternative approaches is no longer a problem.
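A minimal sketch of the principles-and-parameters idea in the abstract: one universal phrase-building routine, with a per-language parameter (here, head direction) accounting for cross-language differences. The parameter table and toy phrase builder are illustrative assumptions, not UNITRAN's actual principle or parameter inventory.

```python
# Illustrative parameter settings: the same universal principle, different linear order.
PARAMETERS = {
    "english": {"head_initial": True},    # the verb precedes its object
    "japanese": {"head_initial": False},  # the verb follows its object
}

def verb_phrase(verb, obj, language):
    """One universal principle (a head combines with its complement);
    the parameter only fixes the linear order."""
    if PARAMETERS[language]["head_initial"]:
        return [verb, obj]
    return [obj, verb]

print(verb_phrase("read", "book", "english"))   # ['read', 'book']
print(verb_phrase("yomu", "hon", "japanese"))   # ['hon', 'yomu']
```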
379

Interactive Visualizations of Natural Language

Collins, Christopher 06 August 2010
While linguistic skill is a hallmark of humanity, the increasing volume of linguistic data each of us faces is causing individual and societal problems — ‘information overload’ is a commonly discussed condition. Tasks such as finding the most appropriate information online, understanding the contents of a personal email repository, and translating documents from another language are now commonplace. These tasks need not cause stress and feelings of overload: the human intellectual capacity is not the problem. Rather, the computational interfaces to linguistic data are problematic — there exists a Linguistic Visualization Divide in the current state-of-the-art. Through five design studies, this dissertation combines sophisticated natural language processing algorithms with information visualization techniques grounded in evidence of human visuospatial capabilities. The first design study, Uncertainty Lattices, augments real-time computer-mediated communication, such as cross-language instant messaging chat and automatic speech recognition. By providing explicit indications of algorithmic confidence, the visualization enables informed decisions about the quality of computational outputs. Two design studies explore the space of content analysis. DocuBurst is an interactive visualization of document content, which spatially organizes words using an expert-created ontology. Broadening from single documents to document collections, Parallel Tag Clouds combine keyword extraction and coordinated visualizations to provide comparative overviews across subsets of a faceted text corpus. Finally, two studies address visualization for natural language processing research. The Bubble Sets visualization draws secondary set relations around arbitrary collections of items, such as a linguistic parse tree. From this design study we propose a theory of spatial rights to consider when assigning visual encodings to data. Expanding considerations of spatial rights, we present a formalism to organize the variety of approaches to coordinated and linked visualization, and introduce VisLink, a new method to relate and explore multiple 2D visualizations in 3D space. Inter-visualization connections allow for cross-visualization queries and support high level comparison between visualizations. From the design studies we distill challenges common to visualizing language data, including maintaining legibility, supporting detailed reading, addressing data scale challenges, and managing problems arising from semantic ambiguity.
380

Exploiting Linguistic Knowledge to Infer Properties of Neologisms

Cook, C. Paul 14 February 2011
Neologisms, or newly-coined words, pose problems for natural language processing (NLP) systems. Due to the recency of their coinage, neologisms are typically not listed in computational lexicons---dictionary-like resources that many NLP applications depend on. Therefore when a neologism is encountered in a text being processed, the performance of an NLP system will likely suffer due to the missing word-level information. Identifying and documenting the usage of neologisms is also a challenge in lexicography, the making of dictionaries. The traditional approach to these tasks has been to manually read a lot of text. However, due to the vast quantities of text being produced nowadays, particularly in electronic media such as blogs, it is no longer possible to manually analyze it all in search of neologisms. Methods for automatically identifying and inferring syntactic and semantic properties of neologisms would therefore address problems encountered in both natural language processing and lexicography. Because neologisms are typically infrequent due to their recent addition to the language, approaches to automatically learning word-level information relying on statistical distributional information are in many cases inappropriate. Moreover, neologisms occur in many domains and genres, and therefore approaches relying on domain-specific resources are also inappropriate. The hypothesis of this thesis is that knowledge about etymology---including word formation processes and types of semantic change---can be exploited for the acquisition of aspects of the syntax and semantics of neologisms. Evidence supporting this hypothesis is found in three case studies: lexical blends (e.g., "webisode", a blend of "web" and "episode"), text messaging forms (e.g., "any1" for "anyone"), and ameliorations and pejorations (e.g., the use of "sick" to mean 'excellent', an amelioration). Moreover, this thesis presents the first computational work on lexical blends and ameliorations and pejorations, and the first unsupervised approach to text message normalization.
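As a concrete illustration of the lexical-blend example in the abstract ("webisode" from "web" and "episode"), the sketch below splits a candidate word into a prefix of one lexicon word and a suffix of another. The tiny lexicon and the simple split rule are assumptions for illustration, not the thesis's method.

```python
# Illustrative lexicon; a real system would use a large computational lexicon.
LEXICON = {"web", "episode", "breakfast", "lunch", "smoke", "fog"}

def blend_sources(word, min_part=2):
    """Return (source1, source2) pairs such that word = prefix(source1) + suffix(source2)."""
    for i in range(min_part, len(word) - min_part + 1):
        prefix, suffix = word[:i], word[i:]
        sources = [(w1, w2)
                   for w1 in LEXICON if w1.startswith(prefix) and w1 != word
                   for w2 in LEXICON if w2.endswith(suffix) and w2 != word]
        if sources:
            return sources
    return []

print(blend_sources("webisode"))  # [('web', 'episode')]
print(blend_sources("brunch"))    # [('breakfast', 'lunch')]
```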
