381

Flexible representation for genetic programming : lessons from natural language processing

Nguyen, Xuan Hoai, Information Technology & Electrical Engineering, Australian Defence Force Academy, UNSW January 2004 (has links)
This thesis principally addresses some problems in genetic programming (GP) and grammar-guided genetic programming (GGGP) arising from the lack of operators able to make small and bounded changes in both genotype and phenotype space. It proposes a new and flexible representation for genetic programming, using a state-of-the-art formalism from natural language processing: Tree Adjoining Grammars (TAGs). It demonstrates that the new TAG-based representation possesses two important properties: non-fixed arity and locality. The former facilitates the design of new operators, including some which are bio-inspired and others able to make small and bounded changes. The latter ensures that bounded changes in genotype space are reflected in bounded changes in phenotype space. With these two properties, the thesis shows how some well-known difficulties in standard GP and GGGP tree-based representations can be solved in the new representation. These difficulties have previously been attributed to the tree-based nature of the representations; since the TAG representation is also tree-based, it enables a more precise delineation of the causes of the difficulties. Building on the new representation, a new grammar-guided GP system known as TAG3P has been developed and shown to be competitive with other GP and GGGP systems. A new schema theorem, explaining the behaviour of TAG3P on syntactically constrained domains, is derived. Finally, the thesis proposes a new method for understanding performance differences between GP representations that require different ways to bound the search space, eliminating the effects of the bounds through multi-objective approaches.
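To make the two properties concrete, here is a minimal sketch, not taken from the thesis: a non-fixed-arity tree genotype with a one-node insertion operator. The class and function names are illustrative assumptions. Because each mutation adds exactly one node, the genotype change is bounded by construction, which is the kind of locality the TAG representation is designed to extend to the phenotype level as well.

```python
# Hedged sketch (not TAG3P itself): a non-fixed-arity tree genotype with a
# local insertion mutation that changes exactly one node, illustrating how
# bounded genotype edits become possible when arity is not fixed.
import random

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # non-fixed arity: any number of children

    def size(self):
        return 1 + sum(c.size() for c in self.children)

def collect(node):
    out = [node]
    for c in node.children:
        out.extend(collect(c))
    return out

def mutate_insert(root, labels, rng=random):
    """Insert a single leaf under a randomly chosen node: a one-node edit."""
    target = rng.choice(collect(root))
    target.children.append(Node(rng.choice(labels)))

# Example: each mutation changes tree size by exactly one (a bounded change).
tree = Node("S", [Node("a"), Node("b")])
before = tree.size()
mutate_insert(tree, labels=["x", "y"])
assert tree.size() == before + 1
```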
382

Incremental knowledge acquisition for natural language processing

Pham, Son Bao, Computer Science & Engineering, Faculty of Engineering, UNSW January 2006 (has links)
Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather statistics from a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual way seems to be the only choice. One typical problem with manual approaches is that the combination of multiple patterns, possibly used at different stages of processing, often causes unintended side effects. Existing approaches, however, do not focus on the practical problem of acquiring those patterns but rather on how to use linguistic patterns for processing text. A systematic way to support the process of manually acquiring linguistic patterns in an efficient manner is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks. KAFTIE addresses difficulties in manually constructing knowledge bases of linguistic patterns, or rules in general, often faced in existing approaches, by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty of choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort of creating a new pattern, or modifying an existing one, independent of the knowledge base's size. KAFTIE therefore makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
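The following is a hedged sketch of the incremental style of acquisition described here. The consistency check against stored "cornerstone" cases is an assumption inspired by incremental knowledge acquisition in general, not a description of KAFTIE's internals: each accepted pattern stores the case that motivated it, and a new pattern is admitted only if it does not change the verdict on any stored case.

```python
# Hedged sketch of incremental pattern acquisition (assumptions, not KAFTIE's
# actual design): every accepted rule stores its motivating "cornerstone"
# case, and a new rule is only admitted if it breaks no stored cornerstone.
import re

class RuleBase:
    def __init__(self):
        self.rules = []          # (pattern, label, cornerstone_text)

    def classify(self, text):
        for pattern, label, _ in self.rules:   # first matching rule wins
            if re.search(pattern, text):
                return label
        return None

    def add_rule(self, pattern, label, cornerstone):
        # Consistency check: the new rule must not change the verdict on any
        # case that justified an earlier rule, so the effort of adding a rule
        # stays independent of the knowledge base's size per case checked.
        candidate = [(pattern, label, cornerstone)] + self.rules
        for _, old_label, old_case in self.rules:
            verdict = next(
                (l for p, l, _ in candidate if re.search(p, old_case)), None)
            if verdict != old_label:
                raise ValueError(f"rule {pattern!r} breaks case {old_case!r}")
        self.rules.insert(0, (pattern, label, cornerstone))

kb = RuleBase()
kb.add_rule(r"\bnovel approach\b", "POSITIVE", "We present a novel approach.")
print(kb.classify("This novel approach improves recall."))  # POSITIVE
```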
383

Efficient computation of advanced skyline queries.

Yuan, Yidong, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
Skyline has been proposed as an important operator for many applications, such as multi-criteria decision making, data mining and visualization, and user-preference queries. Due to its importance, skyline and its computation have recently received considerable attention from the database research community. All the existing techniques, however, focus on conventional databases; they are not applicable to online computation environments such as data streams. In addition, the existing studies consider only the efficiency of skyline computation, while the fundamental problem of the semantics of skylines remains open. In this thesis, we study three problems of skyline computation: (1) online skyline computation over data streams; (2) skyline cube computation and its analysis; and (3) the top-k most representative skyline. To tackle the problem of online skyline computation, we develop a novel framework which converts the more expensive multi-dimensional skyline computation to stabbing queries in 1-dimensional space. Based on this framework, a rigorous theoretical analysis of the time complexity of online skyline computation is provided. Then, efficient algorithms are proposed to support ad hoc and continuous skyline queries over data streams. Inspired by the idea of the data cube, we propose a novel concept, the skyline cube, which consists of the skylines of all possible non-empty subsets of a given full space. We identify the unique sharing strategies for skyline cube computation and develop two efficient algorithms which compute the skyline cube in a bottom-up and a top-down manner, respectively. Finally, a theoretical framework to answer the question about the semantics of skylines, together with an analysis of multidimensional subspace skylines, is presented. Motivated by the fact that the full skyline may be less informative because it generally consists of a large number of skyline points, we propose a novel skyline operator -- the top-k most representative skyline. This operator selects the k skyline points such that the number of data points dominated by at least one of these k skyline points is maximized. To compute the top-k most representative skyline, two efficient algorithms and their theoretical analysis are presented.
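For readers unfamiliar with the operator, the following minimal sketch, written for this summary rather than taken from the thesis, shows the dominance test that defines a skyline and a simple greedy stand-in for the top-k most representative skyline. The thesis's actual algorithms are far more efficient than this quadratic toy.

```python
# Hedged sketch (not the thesis's algorithms): the dominance test defining a
# skyline, plus a greedy approximation of the "top-k most representative
# skyline" that repeatedly picks the skyline point covering the most
# not-yet-dominated data points.
def dominates(p, q):
    """p dominates q if p <= q in every dimension and p < q in at least one
    (assuming smaller is better on every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

def top_k_representative(points, k):
    sky = skyline(points)
    covered, chosen = set(), []
    for _ in range(min(k, len(sky))):
        best = max(
            (s for s in sky if s not in chosen),
            key=lambda s: len({q for q in points if dominates(s, q)} - covered),
        )
        chosen.append(best)
        covered |= {q for q in points if dominates(best, q)}
    return chosen

data = [(1, 9), (2, 4), (3, 3), (5, 1), (4, 6), (6, 5)]
print(skyline(data))                 # [(1, 9), (2, 4), (3, 3), (5, 1)]
print(top_k_representative(data, 2))
```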
384

Lexical approaches to backoff in statistical parsing

Lakeland, Corrin, n/a January 2006 (has links)
This thesis develops a new method for predicting probabilities in a statistical parser so that more sophisticated probabilistic grammars can be used. A statistical parser uses a probabilistic grammar derived from a training corpus of hand-parsed sentences. The grammar is represented as a set of constructions - in a simple case these might be context-free rules. The probability of each construction in the grammar is then estimated by counting its relative frequency in the corpus. A crucial problem when building a probabilistic grammar is to select an appropriate level of granularity for describing the constructions being learned. The more constructions we include in our grammar, the more sophisticated a model of the language we produce. However, if too many different constructions are included, then our corpus is unlikely to contain reliable information about the relative frequency of many constructions. In existing statistical parsers two main approaches have been taken to choosing an appropriate granularity. In a non-lexicalised parser constructions are specified as structures involving particular parts-of-speech, thereby abstracting over individual words. Thus, in the training corpus two syntactic structures involving the same parts-of-speech but different words would be treated as two instances of the same event. In a lexicalised grammar the assumption is that the individual words in a sentence carry information about its syntactic analysis over and above what is carried by its part-of-speech tags. Lexicalised grammars have the potential to provide extremely detailed syntactic analyses; however, Zipf's law makes it hard for such grammars to be learned. In this thesis, we propose a method for optimising the trade-off between informative and learnable constructions in statistical parsing. We implement a grammar which works at a level of granularity in between single words and parts-of-speech, by grouping words together using unsupervised clustering based on bigram statistics. We begin by implementing a statistical parser to serve as the basis for our experiments. The parser, based on that of Michael Collins (1999), contains a number of new features of general interest. We then implement a model of word clustering, which we believe is the first to deliver vector-based word representations for an arbitrarily large lexicon. Finally, we describe a series of experiments in which the statistical parser is trained using categories based on these word representations.
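The trade-off described above can be illustrated with a small, assumption-laden sketch, not the thesis's parser: rule probabilities are estimated by relative frequency, and when the count for a specific head word is too sparse, the estimate backs off to the word's cluster.

```python
# Hedged toy sketch of lexical backoff (assumptions, not the thesis's model):
# estimate rule probabilities by relative frequency, backing off from
# word-specific events to cluster-based events when word counts are sparse.
from collections import Counter

word_counts = Counter()      # (rule, head_word) -> count
cluster_counts = Counter()   # (rule, head_cluster) -> count
word_totals = Counter()      # head_word -> count
cluster_totals = Counter()   # head_cluster -> count
cluster_of = {"acquire": "C_verb_obtain", "purchase": "C_verb_obtain"}

def observe(rule, head_word):
    c = cluster_of.get(head_word, head_word)
    word_counts[(rule, head_word)] += 1
    cluster_counts[(rule, c)] += 1
    word_totals[head_word] += 1
    cluster_totals[c] += 1

def prob(rule, head_word, threshold=5):
    # Back off to the word's cluster when the lexical count is unreliable.
    if word_totals[head_word] >= threshold:
        return word_counts[(rule, head_word)] / word_totals[head_word]
    c = cluster_of.get(head_word, head_word)
    if cluster_totals[c] == 0:
        return 0.0
    return cluster_counts[(rule, c)] / cluster_totals[c]

observe("VP -> V NP", "acquire")
observe("VP -> V NP", "purchase")
print(prob("VP -> V NP", "acquire"))  # sparse word: falls back to its cluster
```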
385

An agent-based approach to dialogue management in personal assistants

Nguyen, Thi Thuc Anh, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
Personal assistants need to allow the user to interact with the system in a flexible and adaptive way, such as through spoken language dialogue. This research is aimed at achieving robust and effective dialogue management in such applications. We focus on an application, the Smart Personal Assistant (SPA), in which the user can use a variety of devices to interact with a collection of personal assistants, each specializing in a task domain. The current implementation of the SPA contains an e-mail management agent and a calendar agent that the user can interact with through a spoken dialogue and a graphical interface on PDAs. The user-system interaction is handled by a Dialogue Manager agent. We propose an agent-based approach that makes use of a BDI agent architecture for dialogue modelling and control. The Dialogue Manager agent of the SPA acts as the central point for maintaining coherent user-system interaction and coordinating the activities of the assistants. The dialogue model consists of a set of complex but modular plans for handling communicative goals. The dialogue control flow emerges automatically as the result of the agent's plan selection by the BDI interpreter. In addition, the Dialogue Manager maintains the conversational context, the domain-specific knowledge and the user model in its internal beliefs. We also consider the problem of dialogue adaptation in such agent-based dialogue systems. We present a novel way of integrating learning into a BDI architecture so that the agent can learn to select the most suitable plan among those applicable in the current context. This enables the Dialogue Manager agent to tailor its responses according to the conversational context and the user's physical context, devices and preferences. Finally, we report the evaluation results, which indicate the robustness and effectiveness of the dialogue model in handling a range of users.
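A minimal sketch of the idea, with names and structure that are assumptions rather than the SPA's actual code: among the plans applicable in the current context, the agent selects the one with the highest learned utility and updates that utility from the dialogue outcome.

```python
# Hedged sketch of BDI-style plan selection with learned plan preferences
# (an illustration, not the SPA's implementation).
import random

class Plan:
    def __init__(self, name, applicable):
        self.name = name
        self.applicable = applicable        # context -> bool

class DialogueManager:
    def __init__(self, plans, epsilon=0.1):
        self.plans = plans
        self.q = {p.name: 0.0 for p in plans}   # learned plan utilities
        self.epsilon = epsilon

    def select(self, context):
        options = [p for p in self.plans if p.applicable(context)]
        if random.random() < self.epsilon:      # explore occasionally
            return random.choice(options)
        return max(options, key=lambda p: self.q[p.name])

    def update(self, plan, reward, lr=0.2):
        # Move the plan's utility toward the observed outcome.
        self.q[plan.name] += lr * (reward - self.q[plan.name])

plans = [
    Plan("verbose_reply", lambda ctx: True),
    Plan("terse_reply", lambda ctx: ctx["device"] == "PDA"),
]
dm = DialogueManager(plans)
p = dm.select({"device": "PDA"})
dm.update(p, reward=1.0)   # e.g. the user accepted the response
```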
387

The Use of Case-Based Reasoning in a Human-Robot Dialog System

Eliasson, Karolina January 2006 (has links)
As long as there have been computers, one goal has been to be able to communicate with them using natural language. It has turned out to be very hard to implement a dialog system that performs as well as a human being in an unrestricted domain, hence most dialog systems today work in small, restricted domains where the permitted dialog is fully controlled by the system.

In this thesis we present two dialog systems for communicating with an autonomous agent.

The first system, the WITAS RDE, focuses on constructing a simple and failsafe dialog system including a graphical user interface with multimodality features, a dialog manager, a simulator, and development infrastructures that provide the services needed for the development, demonstration, and validation of the dialog system. The system has been tested during an actual flight connected to an unmanned aerial vehicle.

The second system, CEDERIC, is a successor of the dialog manager in the WITAS RDE. It is equipped with a built-in machine learning algorithm to be able to learn new phrases and dialogs over time using past experiences, hence the dialog is not necessarily fully controlled by the system. It also includes a discourse model to be able to keep track of the dialog history and topics, to resolve references and maintain subdialogs. CEDERIC has been evaluated through simulation tests and user tests with good results. / Report code: LiU-Tek-Lic-2006:29.
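A hedged sketch of the case-based flavour of such learning, an illustration rather than CEDERIC's implementation: past utterance-action pairs are stored as cases, the most similar case is retrieved for a new utterance, and resolved dialogues are retained as new cases.

```python
# Hedged sketch of case-based retrieval for dialogue (not CEDERIC's code):
# store (utterance, action) cases, reuse the action of the closest case,
# and retain resolved dialogues so the system learns new phrases over time.
def similarity(a, b):
    """Jaccard overlap of word sets: a deliberately simple stand-in."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

class CaseBase:
    def __init__(self):
        self.cases = []                      # (utterance, action)

    def retrieve(self, utterance, threshold=0.5):
        scored = [(similarity(utterance, u), a) for u, a in self.cases]
        best = max(scored, default=(0.0, None))
        return best[1] if best[0] >= threshold else None

    def retain(self, utterance, action):     # learn from a resolved dialogue
        self.cases.append((utterance, action))

cb = CaseBase()
cb.retain("fly to the tower", "goto(tower)")
print(cb.retrieve("fly over to the tower"))  # reuses goto(tower)
```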
388

Observations on Cognitive Judgments

McAllester, David 01 December 1991 (has links)
It is obvious to anyone familiar with the rules of the game of chess that a king on an empty board can reach every square. It is true, but not obvious, that a knight can reach every square. Why is the first fact obvious but the second fact not? This paper presents an analytic theory of a class of obviousness judgments of this type. Whether or not the specifics of this analysis are correct, it seems that the study of obviousness judgments can be used to construct integrated theories of linguistics, knowledge representation, and inference.
389

Designing and Evaluating Human-Robot Communication : Informing Design through Analysis of User Interaction

Green, Anders January 2009 (has links)
This thesis explores the design and evaluation of human-robot communication for service robots that use natural language to interact with people. The research is centred around three themes: the design of human-robot communication; the evaluation of miscommunication in human-robot communication; and the analysis of spatial influence as an empirical phenomenon and a design element. The method has been to put users in situations of future use by means of Hi-fi simulation. Several scenarios were enacted using the Wizard-of-Oz technique: a robot intended for fetch-and-carry services in an office environment; and a robot acting in what can be characterised as a home tour, where the user teaches objects and locations to the robot. Using these scenarios, a corpus of human-robot communication was developed and analysed. The analysis of the communicative behaviours led to the following observations: the users communicate with the robot in order to solve a main task goal. In order to fulfil this goal they take over service actions that the robot is incapable of performing. Once users have understood that the robot is capable of performing actions, they explore its capabilities. During the interactions the users continuously monitor the behaviour of the robot, attempting to elicit feedback or to draw its perceptual attention to their communicative behaviour. Information related to the communicative status of the robot seems to have a fundamental impact on the quality of interaction. Large portions of the miscommunication that occurs in the analysed scenarios can be attributed to ill-timed, lacking or irrelevant feedback from the robot. The analysis of the corpus data also showed that the users' spatial behaviour seemed to be influenced by the robot's communicative behaviour, embodiment and positioning. This means that in robot design we can consider using strategies for spatial prompting to influence the users' spatial behaviour. The understanding of the importance of continuously providing information about the communicative status of the robot to its users leaves us with an intriguing design challenge for the future: when designing communication for a service robot, we need to design communication for the robot's work tasks and, simultaneously, provide information based on the system's communicative status to continuously make users aware of the robot's communicative capability.
390

A Computational Approach to the Analysis and Generation of Emotion in Text

Keshtkar, Fazel 09 August 2011 (has links)
Sentiment analysis is a field of computational linguistics involving the identification, extraction, and classification of opinions, sentiments, and emotions expressed in natural language. Sentiment classification algorithms aim to identify whether the author of a text has a positive or a negative opinion about a topic. One of the main indicators which help to detect the opinion is the words used in the text. Needless to say, the sentiments expressed in a text also depend on its syntactic structure and discourse context. Supervised machine learning approaches to sentiment classification have been shown to achieve good results. Classifying texts by emotion requires finer-grained analysis than sentiment classification. In this thesis, we explore the task of emotion and mood classification for blog postings. We propose a novel approach that uses the hierarchy of possible moods to achieve better results than a standard flat classification approach. We also show that using sentiment orientation features improves the performance of classification. We used the LiveJournal blog corpus as a dataset to train and evaluate our method. Another contribution of this work is extracting paraphrases for emotion terms based on the six basic emotions proposed by Ekman (happiness, anger, sadness, disgust, surprise, fear). Paraphrases are different ways to express the same information. Algorithms to extract and automatically identify paraphrases are of interest from both linguistic and practical points of view. Our paraphrase extraction method is based on a bootstrapping algorithm that starts with seed words. Unlike in previous work, our algorithm does not need a parallel corpus. In Natural Language Generation (NLG), paraphrasing is employed to create more varied and natural text. In our research, we extract paraphrases for emotions, with the goal of using them to automatically generate emotional texts (such as friendly or hostile texts) for conversations between intelligent agents and characters in educational games. Nowadays, online services are popular in many disciplines, such as e-learning, interactive games, educational games, the stock market, and chat rooms. NLG methods can be used to generate more interesting and natural texts for such applications. Generating text with emotions is one of the contributions of our work. In the last part of this thesis, we give an overview of NLG from an applied system's point of view. We discuss when NLG techniques can be used; we explain the requirements analysis and specification of NLG systems. We also describe the main NLG tasks of content determination, discourse planning, sentence aggregation, lexicalization, referring expression generation, and linguistic realisation. Moreover, we describe the Authoring Tool that we developed to allow writers without programming skills to automatically generate texts for educational games. We develop an NLG system that can generate text with different emotions. To do this, we introduce our pattern-based model for generation. We show how our model starts with initial patterns, then constructs extended patterns from which we choose "final" patterns that are suitable for generating emotion sentences. A user can generate sentences expressing the desired emotions by using our patterns. Alternatively, the user can use our Authoring Tool to generate sentences with emotions. Our acquired paraphrases will be employed by the tool in order to generate more varied outputs.
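A toy sketch of seed-based bootstrapping for paraphrase-like lexicon growth, an illustration only (the thesis's algorithm, features and corpus handling differ): start from seed emotion words, collect the contexts they occur in, and harvest other words found in the same contexts.

```python
# Hedged sketch of bootstrapping from seed words (not the thesis's method):
# words sharing contexts with known emotion words become candidates.
def contexts(corpus, word, window=1):
    """Collect (left, right) context tuples around each occurrence of word."""
    out = set()
    for sent in corpus:
        toks = sent.lower().split()
        for i, t in enumerate(toks):
            if t == word:
                left = tuple(toks[max(0, i - window):i])
                right = tuple(toks[i + 1:i + 1 + window])
                out.add((left, right))
    return out

def bootstrap(corpus, seeds, iterations=2):
    lexicon = set(seeds)
    for _ in range(iterations):
        ctxs = set().union(*(contexts(corpus, w) for w in lexicon))
        for sent in corpus:
            toks = sent.lower().split()
            for i, t in enumerate(toks):
                c = (tuple(toks[max(0, i - 1):i]), tuple(toks[i + 1:i + 2]))
                if c in ctxs:
                    lexicon.add(t)        # same context as a known word
    return lexicon

corpus = ["I feel happy today", "I feel joyful today", "I feel glad today"]
print(bootstrap(corpus, seeds={"happy"}))  # picks up joyful, glad
```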
