231. On integrating apprentice learning and reinforcement learning

Clouse, Jeffery Allen, 01 January 1996
Apprentice learning and reinforcement learning are methods that have each been developed to endow computerized agents with the capacity to learn to perform multiple-step tasks, such as problem-solving tasks and control tasks. To achieve this end, each method takes a different approach, with disparate assumptions, objectives, and algorithms. In apprentice learning, the autonomous agent tries to mimic a training agent's problem-solving behavior, learning from examples of the trainer's action choices. In an attempt to learn to perform its task optimally, the learner in reinforcement learning changes its behavior based on scalar feedback about the consequences of its own actions. We demonstrate that a careful integration of the two learning methods can produce a more powerful method than either one alone. An argument based on the characteristics of the individual methods maintains that a hybrid will be an improvement because of the complementary strengths of its constituents. Although existing hybrids of apprentice learning and reinforcement learning perform better than their individual components, those hybrids have left many questions unanswered. We consider the following questions in this dissertation. How do the learner and trainer interact during training? How does the learner assimilate the trainer's expertise? How does the proficiency of the trainer affect the learner's ability to perform the task? And, when during training should the learner acquire information from the trainer? In our quest for answers, we develop the ASK FOR HELP integrated approach, and use it in our empirical study. With the new integrated approach, the learning agent is significantly faster at learning to perform optimally than learners employing either apprentice learning alone or reinforcement learning alone. The study indicates further that the learner can learn to perform optimally even when its trainer cannot; thus, the learner can outperform its trainer. Two strategies for determining when to acquire the trainer's aid show that simple approaches work well. The results of the study demonstrate that the ASK FOR HELP approach is effective for integrating apprentice learning and reinforcement learning, and support the conclusion that an integrated approach can be better than its individual components.
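
To make the integration concrete, here is a minimal sketch in the spirit of the approach described above, assuming a discrete Q-learning setting. The `trainer` callable, the Q-value-spread test for deciding when to ask, and all parameter values are illustrative assumptions, not Clouse's actual formulation:

```python
import random
from collections import defaultdict

class AskForHelpAgent:
    """Q-learner that may query a trainer instead of exploring on its own."""

    def __init__(self, actions, trainer=None, alpha=0.1, gamma=0.95,
                 epsilon=0.1, ask_threshold=0.05):
        self.q = defaultdict(float)    # (state, action) -> estimated value
        self.actions = actions
        self.trainer = trainer         # hypothetical callable: state -> action
        self.alpha, self.gamma = alpha, gamma
        self.epsilon = epsilon
        self.ask_threshold = ask_threshold

    def choose_action(self, state):
        values = [self.q[(state, a)] for a in self.actions]
        # Apprentice step: ask for help when the Q-values barely
        # discriminate among actions, i.e., the learner is uncertain.
        if self.trainer is not None and max(values) - min(values) < self.ask_threshold:
            return self.trainer(state)
        if random.random() < self.epsilon:
            return random.choice(self.actions)   # ordinary exploration
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Reinforcement step: standard Q-learning update from scalar feedback.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The sketch captures the key idea that trainer queries replace exploratory actions exactly where the learner's value estimates are least informative.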

232. A trainable approach to coreference resolution for information extraction

McCarthy, Joseph Francis, 01 January 1996
This dissertation presents a new approach to solving the coreference resolution problem for a natural language processing (NLP) task known as information extraction. It describes a new system, named RESOLVE, that uses machine learning techniques to determine when two phrases in a text co-refer, i.e., refer to the same thing. RESOLVE can be used as a component within an information extraction system--a system that extracts information automatically from a corpus of texts that all focus on the same topic area--or it can be used as a stand-alone system to evaluate the relative contribution of different types of knowledge to the coreference resolution process. RESOLVE represents an improvement over previous approaches to the coreference resolution problem, in that it uses a machine learning algorithm to handle some of the work that had previously been performed manually by a knowledge engineer. RESOLVE can achieve performance that is as good as a system that was manually constructed for the same task, when both systems are given access to the same knowledge and tested on the same data. The machine learning algorithm used by RESOLVE can be given access to different types of knowledge, some portions of which are very specific to a particular topic area or domain, while other portions are more general or domain-independent. An ablation experiment shows that domain-specific knowledge is very important to coreference resolution--the performance degradation when the domain-specific features are disabled is significantly worse than when a similarly-sized set of domain-independent features is disabled. However, even though domain-specific knowledge is important for coreference resolution, domain-independent features alone enable RESOLVE to achieve 80% of the performance it achieves when domain-specific features are available. One explanation for why domain-independent knowledge can be used so effectively is illustrated in another domain, where the machine learning algorithm discovers domain-specific knowledge by assembling the domain-independent features of knowledge into domain-specific patterns. This ability of RESOLVE to compensate for missing or insufficient domain-specific knowledge is a significant advantage for redeploying the system in new domains.
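
As a rough illustration of the trainable approach, the sketch below casts coreference resolution as pairwise classification over feature vectors, with a decision tree as the learning component. The feature names, the toy data, and the choice of scikit-learn are assumptions made for the example; they are not RESOLVE's actual feature set or implementation:

```python
from sklearn.tree import DecisionTreeClassifier

def features(phrase_a, phrase_b):
    # Mix of domain-independent cues (head match, number agreement) and a
    # domain-specific one (semantic class); names are placeholders.
    return [
        int(phrase_a["head"] == phrase_b["head"]),
        int(phrase_a["number"] == phrase_b["number"]),
        int(phrase_a["semclass"] == phrase_b["semclass"]),
    ]

# Training pairs labeled 1 (co-refer) or 0 (do not co-refer).
pairs = [
    ({"head": "company", "number": "sg", "semclass": "org"},
     {"head": "company", "number": "sg", "semclass": "org"}, 1),
    ({"head": "company", "number": "sg", "semclass": "org"},
     {"head": "price", "number": "sg", "semclass": "value"}, 0),
]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

# Induce a decision tree over the feature vectors; disabling a feature
# group before training supports ablation experiments like the one above.
classifier = DecisionTreeClassifier().fit(X, y)
```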

233. A knowledge-based system for the conceptual design of gas and liquid separation systems

Beilstein, James Ralph, 01 January 1996
Most recent research in the synthesis of separation systems has focused on the development of more rigorous procedures for the design and synthesis of specific unit operations, rather than the synthesis of these units as part of the overall flowsheet. Furthermore, many of the chemical processes used in the production of commodity and specialty chemicals are characterized by many reaction steps with separations required between each step. The coupling of the reaction and separation steps has a significant impact on the recycle and separation system structures. A completely integrated hierarchical procedure has been developed for the conceptual design of vapor-liquid separation systems found in multiple-reaction-step processes. A new decomposition procedure is presented for determining the general structure of the separation system for processes involving reactions which occur in vapor-liquid-liquid-solid phase mixtures. The interactions of the separation subsystems for vapor recovery, solid recovery, and liquid separations are identified, including process recycles between subsystems. The vapor recovery system and distillation sequence are then synthesized, the dominant process design variables are identified, the size and cost of the process units are determined, and the economic potential is calculated. Alternatives can then be quickly compared and ranked. Design procedures have been implemented in an expert system environment for the synthesis of gas membrane, condensation (high pressure or low temperature), and complex distillation column (sidestream, side rectifier, and side stripper) separations. Finally, a procedure is presented for assessing the incentive for combining distillation systems in multiple-step reaction processes. Dilute mixture separations generally represent the highest separation costs in the distillation system. The pooling of plant distillation separations can lead to better separations by reducing flow imbalances and dilute mixtures in the separation system feed. A hybrid of object-oriented and rule-based techniques has been used in the development and implementation of the procedure in PIPII, a computer-aided design tool which can rapidly generate process flowsheet alternatives and estimate the optimum range of the process conditions. The hierarchical nature of the procedure quickly prunes the number of viable alternatives which must be examined for a given process. The procedures, reasoning, and methods are thoroughly discussed within.
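
A toy sketch of the "compare and rank" step described above, with invented cost and value figures; the real procedure derives unit sizes, costs, and economic potential from detailed design rules rather than fixed numbers:

```python
# Invented flowsheet alternatives with rough annualized figures.
alternatives = [
    {"name": "membrane + distillation",     "capital": 2.1e6, "operating": 0.4e6, "revenue": 3.2e6},
    {"name": "condensation + distillation", "capital": 1.6e6, "operating": 0.7e6, "revenue": 3.0e6},
    {"name": "side-stripper column",        "capital": 1.2e6, "operating": 0.9e6, "revenue": 2.8e6},
]

def economic_potential(alt, horizon_years=3):
    # Simplified screening metric: revenue minus operating cost over a
    # horizon, minus capital. Real procedures size and cost each unit.
    return (alt["revenue"] - alt["operating"]) * horizon_years - alt["capital"]

# Rank the alternatives; hierarchical pruning discards poor ones early,
# before any detailed design effort is spent on them.
for alt in sorted(alternatives, key=economic_potential, reverse=True):
    print(f"{alt['name']}: EP = {economic_potential(alt):,.0f}")
```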

234. Learning text analysis rules for domain-specific natural language processing

Soderland, Stephen Glenn, 01 January 1997
An enormous amount of knowledge is needed to infer the meaning of unrestricted natural language. The problem can be reduced to a manageable size by restricting attention to a specific domain: a corpus of texts together with a predefined set of concepts that are of interest to that domain. Two widely different domains are used to illustrate this domain-specific approach. One domain is a collection of Wall Street Journal articles in which the target concept is management succession events: identifying persons moving into corporate management positions or moving out. A second domain is a collection of hospital discharge summaries in which the target concepts are various classes of diagnosis or symptom. The goal of an information extraction system is to identify references to the concept of interest for a particular domain. A key knowledge source for this purpose is a set of text analysis rules based on the vocabulary, semantic classes, and writing style peculiar to the domain. This thesis presents CRYSTAL, an implemented system that automatically induces domain-specific text analysis rules from training examples. CRYSTAL learns rules that approach the performance of hand-coded rules, are robust in the face of noise and inadequate features, and require only a modest amount of training data. CRYSTAL belongs to a class of machine learning algorithms called covering algorithms, and employs a novel control strategy with time and space complexities that are independent of the number of features. CRYSTAL navigates efficiently through an extremely large space of possible rules. CRYSTAL also demonstrates that expressive rule representation is essential for high-performance, robust text analysis rules. While simple rules are adequate to capture the most salient regularities in the training data, high performance can only be achieved when rules are expressive enough to reflect the subtlety and variability of unrestricted natural language.
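
The covering-algorithm control loop that the abstract describes can be sketched as follows; the dictionary-based rule representation, the drop-one-constraint generalization, and the sample instances are simplifications for illustration, not CRYSTAL's actual rule language:

```python
def matches(rule, constraints):
    # A rule covers an instance if every constraint it retains agrees.
    return all(constraints.get(k) == v for k, v in rule.items())

def error(rule, instances):
    covered = [label for cons, label in instances if matches(rule, cons)]
    return 0.0 if not covered else covered.count(0) / len(covered)

def generalize(rule, instances, tolerance):
    # Try dropping each constraint; keep the relaxation with the lowest
    # error, provided the error stays within the tolerance.
    best, best_err = None, tolerance
    for k in rule:
        candidate = {kk: vv for kk, vv in rule.items() if kk != k}
        if (err := error(candidate, instances)) <= best_err:
            best, best_err = candidate, err
    return best

def covering_induction(instances, tolerance=0.1):
    rules = []
    uncovered = [(c, l) for c, l in instances if l == 1]  # positive seeds
    while uncovered:
        rule = dict(uncovered[0][0])          # maximally specific seed rule
        while (candidate := generalize(rule, instances, tolerance)) is not None:
            rule = candidate                  # relax until error would grow
        rules.append(rule)
        uncovered = [(c, l) for c, l in uncovered if not matches(rule, c)]
    return rules

instances = [
    ({"verb": "succeed", "subj": "person", "obj": "position"}, 1),
    ({"verb": "name", "subj": "org", "obj": "person"}, 1),
    ({"verb": "report", "subj": "org", "obj": "value"}, 0),
]
print(covering_induction(instances))  # e.g. [{'verb': 'succeed'}, {'verb': 'name'}]
```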

235. The development of hierarchical knowledge in robot systems

Hart, Stephen W, 01 January 2009
This dissertation investigates two complementary ideas in the literature on machine learning and robotics—those of embodiment and intrinsic motivation—to develop a unified framework for skill learning and knowledge acquisition. "Embodied" systems make use of structure derived directly from sensory and motor configurations for learning behavior. Intrinsically motivated systems learn by searching for native, hedonic value through interaction with the world. Psychological theories of intrinsic motivation suggest that there exist internal drives favoring open-ended cognitive development and exploration. I argue that intrinsically motivated, embodied systems can learn generalizable skills, acquire control knowledge, and form an epistemological understanding of the world in terms of behavioral affordances. I propose that the development of behavior results from the assembly of an agent's sensory and motor resources into state and action spaces that can be explored autonomously. I introduce an intrinsic reward function that can lead to the open-ended learning of hierarchical behavior. This behavior is factored into declarative "recipes" for patterned activity and common sense procedural strategies for implementing them in a variety of run-time contexts. These skills form a categorical basis for the robot to interpret and model its world in terms of the behavior it affords. Experiments conducted on a bimanual robot illustrate a progression of cumulative manipulation behavior addressing manual and visual skills. Such accumulation of skill over the long term by a single robot is a novel contribution that has yet to be demonstrated in the literature.

236. Learning the structure of activities for a mobile robot

Schmill, Matthew D, 01 January 2004
At birth, the human infant has only a very rudimentary perceptual system and similarly rudimentary control over its musculature. As time goes on, a child develops. Its ability to control, perceive, and predict its own behavior improves as it interacts with its environment. We are interested in the process of development, in particular with respect to activity. How might an intelligent agent of our own design learn to represent and organize procedural knowledge so that over time it becomes more competent at achieving its goals in its own environment? In this dissertation, we present a system that allows an agent to learn models of activity and its environment and then use those models to create units of behavior of increasing sophistication for the purpose of achieving its own internally-generated goals.

237. Meta-level control in multi-agent systems

Raja, Anita, 01 January 2003
Sophisticated agents operating in open environments must make complex real-time control decisions on scheduling and coordination of domain activities. These decisions are made in the context of limited resources and uncertainty about the outcomes of activities. Many efficient architectures and algorithms that support these computation-intensive activities have been developed and studied. However, none of these architectures explicitly reason about the consumption of time and other resources by these activities, which may degrade an agent's performance. The problem of sequencing execution and computational activities without consuming too many resources in the process is the meta-level control problem for a resource-bounded rational agent. The focus of this research is to provide effective allocation of computation and improved performance of individual agents in a cooperative multi-agent system. This is done by approximating the ideal solution to meta-level decisions made by these agents using reinforcement learning methods. A meta-level agent control architecture for meta-level reasoning with bounded computational overhead is described. This architecture supports decisions on when to accept, delay or reject a new task, when it is appropriate to negotiate with another agent, whether to renegotiate when a negotiation task fails, how much effort to put into scheduling when reasoning about a new task, and whether to reschedule when actual execution performance deviates from expected performance. The major contributions of this work are: a resource-bounded framework that supports detailed reasoning about scheduling and coordination costs; an abstract representation of the agent state which is used by hand-generated heuristic strategies to make meta-level control decisions; and a reinforcement learning based approach which automatically learns efficient meta-level control policies.
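
A small sketch of the kind of hand-generated heuristic meta-level policy mentioned above, mapping an abstract agent state to accept/delay/reject decisions. The state features and thresholds are invented for illustration; the dissertation's contribution includes learning such policies with reinforcement learning rather than hand-coding them:

```python
def meta_level_decision(state):
    """Decide what to do with a newly arrived task, given abstract features."""
    if state["utilization"] > 0.9 and state["new_task_priority"] < 0.3:
        return "reject"            # too busy for a low-value task
    if state["schedule_uncertainty"] > 0.5:
        return "delay"             # wait for more information before committing
    if state["new_task_priority"] > 0.7:
        return "accept_detailed"   # worth a detailed scheduling pass
    return "accept_quick"          # accept with a cheap, approximate schedule

print(meta_level_decision(
    {"utilization": 0.4, "new_task_priority": 0.8, "schedule_uncertainty": 0.2}
))
```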

238. The Parameter Signature Isolation Method and Applications

McCusker, James R, 01 January 2011
The aim of this research was to develop a method of system identification that would draw inspiration from the approach taken by human experts for simulation model tuning and validation. Human experts are able to use their natural pattern recognition ability to identify the various shape attributes, or signatures, of a time series from simulation model outputs. They can also intelligently and effectively perform tasks ranging from system identification to model validation. However, the feature extraction approach they employ cannot be readily automated because of the difficulty of measuring shape attributes. To bridge the gap between the approach taken by human experts and traditional iterative approaches, a method to quantify the shape attributes was devised. The method presented in this dissertation, the Parameter Signature Isolation Method (PARSIM), uses continuous wavelet transformation to characterize specific aspects of the time-series shape through surfaces in the time-scale domain. A salient characteristic of these surfaces is their enhanced delineation of the model outputs and/or their sensitivities. One benefit of this enhanced delineation is the capacity to isolate regions of the time-scale plane, coined parameter signatures, wherein individual output sensitivities dominate all the others. The parameter signatures enable each model parameter's error to be estimated separately, making the method applicable to parameter estimation. The proposed parameter estimation method has unique features, one of them being noise suppression: relying entirely on the time-scale domain for parameter estimation offers direct noise compensation in that domain. Yet another utility of parameter signatures is in measurement selection, whereby the existence of parameter signatures is attributed to the identifiability of model parameters through various outputs. The effectiveness of PARSIM is demonstrated through an array of theoretical models, such as the Lorenz system and the Van der Pol oscillator, as well as through real-world simulation models of an injection molding process and a jet engine.
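
A minimal sketch of signature extraction under stated assumptions: the toy sensitivity signals stand in for real model-output sensitivities, PyWavelets provides the continuous wavelet transform, and the 2x dominance margin is an arbitrary illustrative threshold rather than PARSIM's actual criterion:

```python
import numpy as np
import pywt

# Toy stand-ins for the sensitivities of one model output to two parameters.
t = np.linspace(0.0, 10.0, 500)
sensitivities = {
    "k": np.sin(2 * np.pi * 0.5 * t),
    "c": np.exp(-0.3 * t) * np.cos(2 * np.pi * t),
}

# Map each sensitivity into the time-scale plane with a continuous
# wavelet transform (Morlet wavelet), keeping the magnitude surface.
scales = np.arange(1, 64)
surfaces = np.stack([
    np.abs(pywt.cwt(s, scales, "morl")[0]) for s in sensitivities.values()
])

# A point belongs to a parameter's signature where its surface dominates
# every other surface by a margin (2x here, an arbitrary choice).
dominant = surfaces.argmax(axis=0)
second_largest = np.sort(surfaces, axis=0)[-2]
dominated_enough = surfaces.max(axis=0) >= 2.0 * second_largest

for i, name in enumerate(sensitivities):
    signature = (dominant == i) & dominated_enough
    print(f"parameter {name}: signature covers {signature.mean():.1%} of the plane")
```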

239. Optimization-based approximate dynamic programming

Petrik, Marek, 01 January 2010
Reinforcement learning algorithms hold promise in many complex domains, such as resource management and planning under uncertainty. Most reinforcement learning algorithms are iterative--they successively approximate the solution based on a set of samples and features. Although these iterative algorithms can achieve impressive results in some domains, they are not sufficiently reliable for wide applicability; they often require extensive parameter tweaking to work well and provide only weak guarantees of solution quality. Some of the most interesting reinforcement learning algorithms are based on approximate dynamic programming (ADP). ADP, also known as value function approximation, approximates the value of being in each state. This thesis presents new reliable algorithms for ADP that use optimization instead of iterative improvement. Because these optimization-based algorithms explicitly seek solutions with favorable properties, they are easy to analyze, offer much stronger guarantees than iterative algorithms, and have few or no parameters to tweak. In particular, we improve on approximate linear programming, an existing method, and derive approximate bilinear programming, a new robust approximate method. The strong guarantees of optimization-based algorithms not only increase confidence in the solution quality, but also make it easier to combine the algorithms with other ADP components. The other components of ADP are the samples and features used to approximate the value function. Relying on the simplified analysis of optimization-based methods, we derive new bounds on the error due to missing samples. These bounds are simpler, tighter, and more practical than the existing bounds for iterative algorithms and can be used to evaluate solution quality in practical settings. Finally, we propose homotopy methods that use the sampling bounds to automatically select good approximation features for optimization-based algorithms. Automatic feature selection significantly increases the flexibility and applicability of the proposed ADP methods. The methods presented in this thesis can potentially be used in many practical applications in artificial intelligence, operations research, and engineering. Our experimental results show that optimization-based methods may perform well on resource-management problems and standard benchmark problems and therefore represent an attractive alternative to traditional iterative methods.
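
As a sketch of the approximate-linear-programming idea on a toy MDP (the numbers, features, and single-action dynamics are invented for the example, not drawn from the thesis), the value-function weights can be obtained from one linear program:

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
Phi = np.array([[1.0, 0.0],    # features of state 0
                [1.0, 0.5],    # features of state 1
                [1.0, 1.0]])   # features of state 2
r = np.array([0.0, 1.0, 2.0])          # reward for the single action
P = np.array([[0.9, 0.1, 0.0],         # transition probabilities
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])

# Minimize c^T Phi w (c = uniform state-relevance weights) subject to the
# Bellman inequalities Phi w >= r + gamma P Phi w, rewritten for linprog
# as -(Phi - gamma P Phi) w <= -r.
c_vec = Phi.mean(axis=0)
A_ub = -(Phi - gamma * P @ Phi)
b_ub = -r

result = linprog(c_vec, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
print("weights:", result.x, "value estimates:", Phi @ result.x)
```

Because the approximation is constrained to overestimate the Bellman backup everywhere, the resulting value estimates come with the kind of one-sided guarantee that iterative methods typically lack.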

240. Design-to-time real-time scheduling

Garvey, Alan James, 01 January 1996
Design-to-time real-time scheduling is an approach to solving time-sensitive problems where multiple solution methods are available for many subproblems. The design-to-time approach involves designing a solution plan (i.e., an ordered schedule of solution methods) dynamically at runtime such that the solution plan uses the time available as productively as possible to try to maximize solution quality. The problem to be solved is modeled as a set of interrelated computational tasks, with alternative ways of accomplishing the overall task. There is not a single "right" answer, but a range of possible solution plans of different qualities, where the overall quality of a problem solution is a function of the quality of individual subtasks. The act of scheduling such pre-specified task structures that contain alternatives requires both deciding "what" to do and deciding "when" to do it. One major focus of our design-to-time work is on taking interactions among subproblems into account when building solution plans, both "hard" interactions that must be satisfied to find correct solutions (e.g., hard precedence constraints), and "soft" interactions that can improve (or hinder) performance. Another recent focus of our work has been on adding to the problem model the notion of uncertainty in the duration and quality of methods, and in the presence and power of soft interactions. Scheduling with uncertain information requires additions to the scheduling algorithm and the monitoring of method performance to allow dynamic reaction to unexpected situations.
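
The "design a solution plan under a deadline" idea can be sketched as a toy selection problem; the durations, qualities, the min-quality aggregation, and the exhaustive search are illustrative stand-ins for the dissertation's task structures and heuristic scheduler:

```python
from itertools import product

# Each subtask offers alternative methods as (duration, quality) pairs.
tasks = {
    "sense": [(2, 0.40), (5, 0.90)],
    "plan":  [(3, 0.50), (6, 0.95)],
    "act":   [(1, 0.60), (4, 0.85)],
}
deadline = 10

# Overall quality here is the weakest subtask's quality (one possible
# quality-combination function); search all method combinations.
best_plan, best_quality = None, -1.0
for choice in product(*tasks.values()):
    duration = sum(d for d, _ in choice)
    quality = min(q for _, q in choice)
    if duration <= deadline and quality > best_quality:
        best_plan, best_quality = choice, quality

print("chosen methods:", dict(zip(tasks, best_plan)), "quality:", best_quality)
```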
