311 |
Reinforcement learning for intelligent assembly automation / Lee, Siu-keung, 李少強. January 2002 (has links)
published_or_final_version / Industrial and Manufacturing Systems Engineering / Doctoral / Doctor of Philosophy
|
312 |
Induction of environment and goal models by an adaptive agent in deterministic environment / Adaptyvaus agento aplinkos ir tikslo modelių indukcija deterministinėje aplinkoje / Kapočiūtė-Dzikienė, Jurgita 01 February 2011 (has links)
If the observable or partially observable environment is governed by laws of percept change that can be described by a deterministic Markov decision process, then an agent interacting with the environment with no initial knowledge can discover those laws using the methods of logical and constructive induction (i.e., it can learn environment and goal models); it can learn to predict the consequences of its own actions precisely and apply this learned knowledge to achieve its goals in new, unseen situations.
The adaptive agent proposed in this dissertation differs from similar work presented in the literature in three novel capabilities: it can solve the problem of transferring knowledge learned in one environment to new environments when the same laws hold in both; the problem of generalizing goal percepts; and the problem of perceptual aliasing in a partially observable environment.
The investigations showed that the adaptive agent, using the induced environment model, solves knowledge-transfer tasks in new environments better than alternative agents (based on Q-learning and ADP methods); using the induced goal model, it solves goal-percept generalization tasks by correctly approximating the reward function and predicting its values in new environments; and it solves the perceptual aliasing problem by transforming the deterministic nth-order Markov decision process into a first-order one and building for it an environment model corresponding to a finite Moore automaton.
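The transformation mentioned above can be illustrated with a small sketch: an environment whose dynamics depend on the last n percepts is made first-order by treating the n-percept history itself as the state, so that a learned deterministic transition table over these augmented states, with the newest percept as output, behaves like a finite Moore automaton. The `env` interface below (`reset`/`step` returning percept, reward, done) is a hypothetical stand-in, not the dissertation's implementation.

```python
from collections import deque

class FirstOrderWrapper:
    """Exposes the last n percepts of an n-th order deterministic environment
    as a single tuple, so the wrapped process is first-order (Markov)."""

    def __init__(self, env, n):
        self.env, self.n = env, n
        self.history = deque(maxlen=n)

    def reset(self):
        self.history.clear()
        self.history.append(self.env.reset())
        return tuple(self.history)

    def step(self, action):
        percept, reward, done = self.env.step(action)
        self.history.append(percept)
        # A learned deterministic transition table over these history states,
        # with the newest percept as its output, is a finite Moore automaton.
        return tuple(self.history), reward, done
```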
|
313 |
Adaptyvaus agento aplinkos ir tikslo modelių indukcija deterministinėje aplinkoje / Induction of environment and goal models by an adaptive agent in deterministic environment / Kapočiūtė-Dzikienė, Jurgita 01 February 2011 (has links)
If the observable or partially observable environment is governed by laws of percept change that can be described by a deterministic Markov decision process, then an agent interacting with the environment with no initial knowledge can discover those laws using the methods of logical and constructive induction (i.e., it can learn environment and goal models); it can learn to predict the consequences of its own actions precisely and apply this learned knowledge to achieve its goals in new, unseen situations.
The adaptive agent proposed in this dissertation differs from similar work presented in the literature in three novel capabilities: it can solve the problem of transferring knowledge learned in one environment to new environments when the same laws hold in both; the problem of generalizing goal percepts; and the problem of perceptual aliasing in a partially observable environment.
The investigations showed that the adaptive agent, using the induced environment model, solves knowledge-transfer tasks in new environments better than alternative agents (based on Q-learning and ADP methods); using the induced goal model, it solves goal-percept generalization tasks by correctly approximating the reward function and predicting its values in new environments; and it solves the perceptual aliasing problem by transforming the deterministic nth-order Markov decision process into a first-order one and building for it an environment model corresponding to a finite Moore automaton.
|
314 |
Complex question answering : minimizing the gaps and beyond / Hasan, Sheikh Sadid Al. January 2013 (has links)
Current Question Answering (QA) systems have advanced significantly in answering simple factoid and list questions. Such questions are easier to process because they require only small snippets of text as answers. However, there is a category of questions that represents a more complex information need, one that cannot be satisfied simply by extracting a single entity or a single sentence. For example, the question "How was Japan affected by the earthquake?" suggests that the inquirer is looking for information from a wider perspective. We call these "complex questions" and focus on the task of answering them, with the intention of minimizing the existing gaps in the literature.
The major limitation of available search and QA systems is that they lack a way of measuring whether a user is satisfied with the information provided. This motivated us to propose a reinforcement learning formulation of the complex question answering problem. Next, we presented an integer linear programming formulation in which sentence compression models were applied to the query-focused multi-document summarization task, in order to investigate whether sentence compression improves overall performance. Both compression and summarization were treated as global optimization problems. We also investigated the impact of syntactic and semantic information in a graph-based random walk method for answering complex questions. Decomposing a complex question into a series of simple questions and then reusing the techniques developed for answering simple questions is an effective means of answering complex questions; in this work we proposed a supervised approach for automatically learning good decompositions of complex questions. A complex question often asks about a topic of the user's interest, so the problem of complex question decomposition is closely related to the problem of topic-to-question generation. We addressed this challenge and proposed a topic-to-question generation approach to enhance the scope of our problem domain. / xi, 192 leaves : ill. ; 29 cm
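The query-focused summarization step described in the abstract can be framed as sentence selection under a length budget solved globally. A minimal sketch of such an integer linear program, written with the PuLP library, is shown below; the sentences, relevance scores, lengths, and budget are illustrative assumptions, and the thesis's actual formulation additionally couples selection with sentence compression.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

# Toy inputs: query-relevance scores and lengths of candidate sentences.
sentences = ["Japan's coastal towns were devastated by the tsunami.",
             "The earthquake struck on a Friday afternoon.",
             "Power shortages followed the shutdown of several reactors."]
relevance = [0.9, 0.4, 0.7]   # assumed relevance to the query
lengths   = [9, 7, 9]         # sentence lengths in words
budget    = 18                # summary length limit in words

prob = LpProblem("query_focused_summary", LpMaximize)
x = [LpVariable(f"x{i}", cat="Binary") for i in range(len(sentences))]
prob += lpSum(relevance[i] * x[i] for i in range(len(sentences)))           # objective
prob += lpSum(lengths[i] * x[i] for i in range(len(sentences))) <= budget   # length cap
prob.solve()

summary = [s for s, var in zip(sentences, x) if var.value() == 1]
print(" ".join(summary))
```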
|
315 |
Computational model-based functional magnetic resonance imaging of reinforcement learning in humans / Erdeniz, Burak. January 2013 (has links)
The aim of this thesis is to determine the changes in the BOLD signal of the human brain during various stages of reinforcement learning. To accomplish that goal, two probabilistic reinforcement-learning tasks were developed and assessed with healthy participants using functional magnetic resonance imaging (fMRI). For both experiments the participants' brain imaging data were analysed using a combination of univariate and model-based techniques. In Experiment 1 there were three types of stimulus-response pairs, each predicting a reward, a neutral outcome, or a monetary loss with a certain probability. Experiment 1 tested the following research questions: Where in the brain does activity occur when expecting and receiving a monetary reward or punishment? Does avoiding a loss outcome activate similar brain regions to gain outcomes and, vice versa, does avoiding a reward outcome activate similar brain regions to loss outcomes? Where in the brain are prediction errors and predictions for rewards and losses calculated? What are the neural correlates of reward and loss predictions during early and late phases of learning? The results of Experiment 1 showed that expectation of rewards and losses activates overlapping brain areas, mainly in the anterior cingulate cortex and basal ganglia, but that reward and loss outcomes activate separate brain regions: loss outcomes mainly activate the insula and amygdala, whereas reward outcomes activate the bilateral medial frontal gyrus. The model-based analysis also revealed early versus late learning-related changes: the predicted value in early trials is coded in the ventromedial orbitofrontal cortex, but later in learning the activation for the predicted value was found in the putamen. The second experiment was designed to find the differences in processing novel versus familiar reward-predictive stimuli. The results revealed that the dorsolateral prefrontal cortex and several regions in the parietal cortex showed greater activation for novel stimuli than for familiar stimuli. As an extension of the fourth research question of Experiment 1, reward predicted values of the conditional stimuli and prediction errors of the unconditional stimuli were also assessed in Experiment 2. The results revealed that during learning there is significant prediction-error activation, mainly in the ventral striatum with extension to various cortical regions, but that for familiar stimuli no prediction-error activity is observed. Moreover, predicted values for novel stimuli mainly activate the ventromedial orbitofrontal cortex and precuneus, whereas the predicted value of familiar stimuli activates the putamen. The results of Experiment 2 for the predicted values, viewed together with the early versus late predicted values in Experiment 1, suggest that during learning of CS-US pairs activation in the brain shifts from ventromedial orbitofrontal structures to sensorimotor parts of the striatum.
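The predicted-value and prediction-error regressors used in model-based fMRI analyses of this kind are commonly derived from a simple trial-by-trial learning rule. A minimal sketch of a Rescorla-Wagner-style update follows; the learning rate and outcome coding are illustrative assumptions, not the fitted model from the thesis.

```python
def rescorla_wagner(outcomes, alpha=0.2):
    """Trial-by-trial predicted values and prediction errors for one cue."""
    V = 0.0
    values, errors = [], []
    for r in outcomes:        # r = 1.0 on rewarded trials, 0.0 otherwise
        delta = r - V         # prediction error (outcome-phase regressor)
        values.append(V)      # predicted value (cue-phase regressor)
        errors.append(delta)
        V += alpha * delta    # update the prediction toward the observed outcome
    return values, errors

# e.g. a cue rewarded on 75% of trials:
vals, errs = rescorla_wagner([1, 1, 0, 1, 1, 1, 0, 1])
```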
|
316 |
Adaptive modelling and planning for learning intelligent behaviour / Kochenderfer, Mykel J. January 2006 (has links)
An intelligent agent must be capable of using its past experience to develop an understanding of how its actions affect the world in which it is situated. Given some objective, the agent must be able to effectively use its understanding of the world to produce a plan that is robust to the uncertainty present in the world. This thesis presents a novel computational framework called the Adaptive Modelling and Planning System (AMPS) that aims to meet these requirements for intelligence. The challenge of the agent is to use its experience in the world to generate a model. In problems with large state and action spaces, the agent can generalise from limited experience by grouping together similar states and actions, effectively partitioning the state and action spaces into finite sets of regions. This process is called abstraction. Several different abstraction approaches have been proposed in the literature, but the existing algorithms have many limitations. They generally only increase resolution, require a large amount of data before changing the abstraction, do not generalise over actions, and are computationally expensive. AMPS aims to solve these problems using a new kind of approach. AMPS splits and merges existing regions in its abstraction according to a set of heuristics. The system introduces splits using a mechanism related to supervised learning and is defined in a general way, allowing AMPS to leverage a wide variety of representations. The system merges existing regions when an analysis of the current plan indicates that doing so could be useful. Because several different regions may require revision at any given time, AMPS prioritises revision to best utilise whatever computational resources are available. Changes in the abstraction lead to changes in the model, requiring changes to the plan. AMPS prioritises the planning process, and when the agent has time, it replans in high-priority regions. This thesis demonstrates the flexibility and strength of this approach in learning intelligent behaviour from limited experience.
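As a rough illustration of the split/merge idea only (not the AMPS algorithm itself, which drives splits with a supervised-learning mechanism and prioritises revisions and replanning), a toy refinement of a 1-D state-space partition might split regions whose recorded outcomes disagree and merge adjacent regions whose models agree:

```python
import statistics

class Region:
    """A contiguous interval of a 1-D state space with recorded outcomes."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.samples = []                    # (state, observed outcome) pairs

    def record(self, state, outcome):
        self.samples.append((state, outcome))

    def disagreement(self):
        outcomes = [o for _, o in self.samples]
        return statistics.pvariance(outcomes) if len(outcomes) > 1 else 0.0

def refine(regions, split_threshold=0.5, merge_tolerance=0.1):
    """Split regions whose recorded outcomes disagree, then merge adjacent
    regions whose average outcomes agree (a stand-in for AMPS's heuristics)."""
    refined = []
    for reg in regions:
        if len(reg.samples) > 4 and reg.disagreement() > split_threshold:
            mid = (reg.lo + reg.hi) / 2      # naive midpoint split
            left, right = Region(reg.lo, mid), Region(mid, reg.hi)
            for s, o in reg.samples:
                (left if s < mid else right).record(s, o)
            refined += [left, right]
        else:
            refined.append(reg)
    merged = [refined[0]]
    for reg in refined[1:]:
        prev = merged[-1]
        if prev.samples and reg.samples and abs(
                statistics.mean(o for _, o in prev.samples)
                - statistics.mean(o for _, o in reg.samples)) < merge_tolerance:
            prev.hi = reg.hi                 # absorb the neighbouring region
            prev.samples += reg.samples
        else:
            merged.append(reg)
    return merged
```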
|
317 |
Learning in a state of confusion : employing active perception and reinforcement learning in partially observable worlds / Crook, Paul A. January 2007 (has links)
In applying reinforcement learning to agents acting in the real world we are often faced with tasks that are non-Markovian in nature. Much work has been done using state estimation algorithms to try to uncover Markovian models of tasks in order to allow the learning of optimal solutions using reinforcement learning. Unfortunately these algorithms, which attempt to simultaneously learn a Markov model of the world and how to act, have proved very brittle. Our focus differs. In considering embodied, embedded and situated agents we have a preference for simple learning algorithms which reliably learn satisficing policies. The learning algorithms we consider do not try to uncover the underlying Markovian states; instead they aim to learn successful deterministic reactive policies such that agents' actions are based directly upon the observations provided by their sensors. Existing results have shown that such reactive policies can be arbitrarily worse than a policy that has access to the underlying Markov process, and in some cases no satisficing reactive policy can exist. Our first contribution is to show that providing agents with alternative actions and viewpoints on the task through the addition of active perception can provide a practical solution in such circumstances. We demonstrate empirically that: (i) adding arbitrary active perception actions to agents which can only learn deterministic reactive policies can allow the learning of satisficing policies where none were originally possible; (ii) active perception actions allow the learning of better satisficing policies than those that existed previously; and (iii) our approach converges more reliably to satisficing solutions than existing state estimation algorithms such as U-Tree and the Lion Algorithm. Our other contributions focus on issues which affect the reliability with which deterministic reactive satisficing policies can be learnt in non-Markovian environments. We show that greedy action selection may be a necessary condition for the existence of stable deterministic reactive policies on partially observable Markov decision processes (POMDPs). We also set out the concept of Consistent Exploration: the idea of estimating state-action values by acting as though the policy has been changed to incorporate the action being explored. We demonstrate that this concept can be used to develop better algorithms for learning reactive policies for POMDPs by presenting a new reinforcement learning algorithm, the Consistent Exploration Q(λ) algorithm (CEQ(λ)). We demonstrate on a significant number of problems that CEQ(λ) is more reliable at learning satisficing solutions than the algorithm currently regarded as the best for learning deterministic reactive policies, SARSA(λ).
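For context, the reactive-policy setting described in the abstract can be sketched as tabular SARSA(λ) whose value table is keyed on raw observations rather than underlying states. The `env` interface below is a hypothetical stand-in, and the sketch uses plain ε-greedy exploration; CEQ(λ) itself differs by estimating values as though the policy had been changed to include the explored action.

```python
import random
from collections import defaultdict

def sarsa_lambda(env, actions, episodes=500, alpha=0.1, gamma=0.95,
                 lam=0.9, epsilon=0.1):
    """Tabular SARSA(lambda) whose policy is keyed on raw observations."""
    Q = defaultdict(float)  # Q[(observation, action)]

    def policy(obs):
        # Plain epsilon-greedy exploration; CEQ(lambda) would additionally
        # update values as though the policy had been changed to include
        # the explored action.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(obs, a)])

    for _ in range(episodes):
        traces = defaultdict(float)  # eligibility traces
        obs = env.reset()
        action = policy(obs)
        done = False
        while not done:
            next_obs, reward, done = env.step(action)
            next_action = policy(next_obs)
            target = reward + (0.0 if done else gamma * Q[(next_obs, next_action)])
            delta = target - Q[(obs, action)]
            traces[(obs, action)] += 1.0
            for key in list(traces):
                Q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam
            obs, action = next_obs, next_action
    return Q
```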
|
318 |
Towards Controlling Latency in Wireless Networks / Bouacida, Nader 24 April 2017 (has links)
Wireless networks have undergone an unprecedented revolution over the last decade. With the explosion of delay-sensitive applications on the Internet (e.g., online gaming and VoIP), latency has become a major issue for the development of wireless technology. Taking advantage of the significant decline in memory prices, manufacturers equip network devices with larger buffers to improve network throughput by limiting packet drops. Over-buffering increases the time that packets spend in queues and thus introduces more latency into networks. This phenomenon is known as “bufferbloat”. While throughput is the dominant performance metric, latency also has a huge impact on user experience, not only for real-time applications but also for common applications like web browsing, which is sensitive to latencies on the order of hundreds of milliseconds.
Concerns have arisen about designing sophisticated queue management schemes to mitigate the effects of this phenomenon. My thesis research aims to solve the bufferbloat problem in both traditional half-duplex and cutting-edge full-duplex wireless systems by reducing delay while maximizing wireless link utilization and fairness. Our work sheds light on the behavior of buffer management algorithms in wireless networks and their ability to reduce the latency caused by excessive queuing delays inside oversized static network buffers without a significant loss in other network metrics.
First of all, we address the problem of buffer management in wireless full-duplex networks by using Wireless Queue Management (WQM), an active queue management technique for wireless networks. Our solution is based on Relay Full-Duplex MAC (RFD-MAC), an asynchronous media access control protocol designed for relay full-duplexing. Compared to the default case, our solution reduces the end-to-end delay by two orders of magnitude while achieving similar throughput in most cases.
In the second part of this thesis, we propose a novel design called “LearnQueue” based on reinforcement learning that can effectively control the latency in wireless networks. LearnQueue adapts quickly and intelligently to changes in the wireless environment using a sophisticated reward structure. Testbed results prove that LearnQueue can guarantee low latency while preserving throughput.
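As a toy sketch of the general idea of learning a buffer size with reinforcement learning (the candidate limits, reward weights, and bandit-style single-state update below are illustrative assumptions, not LearnQueue's actual reward structure or algorithm):

```python
import random

# Hypothetical candidate queue limits (in packets) and toy reward weights.
QUEUE_LIMITS = [16, 32, 64, 128, 256]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
q_values = [0.0] * len(QUEUE_LIMITS)

def reward(avg_delay_ms, drop_rate):
    # Penalise both queuing delay and packet drops.
    return -(avg_delay_ms / 100.0) - 5.0 * drop_rate

def choose_limit():
    if random.random() < EPSILON:
        return random.randrange(len(QUEUE_LIMITS))   # explore a different limit
    return max(range(len(QUEUE_LIMITS)), key=lambda i: q_values[i])

def update(i, r):
    # Single-state (bandit-style) Q-learning update for the chosen limit.
    q_values[i] += ALPHA * (r + GAMMA * max(q_values) - q_values[i])
```

Each control interval, such a controller would measure the average queuing delay and drop rate, compute the reward, update the value estimate for the limit in use, and install QUEUE_LIMITS[choose_limit()] as the new buffer size.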
|
319 |
An Evaluation of Negative Reinforcement During Error Correction Procedures / Maillard, Gloria Nicole 12 1900 (has links)
This study evaluated the effects of error correction procedures on sight word acquisition. Participants were four typically developing children in kindergarten and first grade. We used an adapted alternating treatment design embedded within a multiple baseline design to evaluate the instructional efficacy of two error correction procedures, one with preferred items plus error correction and one with error correction only, and a concurrent chains schedule to evaluate participant preference for instructional procedure. The results show that there was no difference in acquisition rates between the procedures. The evaluation also showed that children prefer procedures that include a positive reinforcement component.
|
320 |
Využití opakovaně posilovaného učení pro řízení čtyřnohého robotu / Using of Reinforcement Learning for Four Legged Robot Control / Ondroušek, Vít January 2011 (has links)
The Ph.D. thesis is focused on using reinforcement learning for four-legged robot control. The main aim is to create an adaptive control system for the walking robot that is able to plan the walking gait with the Q-learning algorithm. This aim is achieved using a complex three-layered architecture based on the DEDS paradigm. A small set of elementary reactive behaviors forms the basis of the proposed solution. A set of composite control laws is designed using simultaneous activations of these behaviors. Both types of controllers are able to operate on plain terrain as well as on rugged terrain. The model of all possible behaviors that can be achieved by activating these controllers is built using an appropriate discretization of the continuous state space. This model is used by the Q-learning algorithm to find optimal robot control strategies. The capabilities of the control unit are demonstrated on three complex tasks: rotating the robot, walking in a straight line, and walking on an inclined plane. These tasks are solved using spatial dynamic simulations of the four-legged robot with three degrees of freedom on each leg. The resulting walking gaits are evaluated using standardized quantitative indicators. Video files showing the behavior of the elementary and composite controllers as well as the resulting walking gaits of the robot are an integral part of this thesis.
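As a rough sketch of how a planning layer might apply Q-learning over a discretized behavior model of this kind (the `env`/`actions` interface below is a hypothetical stand-in for the simulated robot and the set of controller activations, not the thesis's actual implementation):

```python
import random
from collections import defaultdict

def q_learning_gait(env, actions, episodes=1000,
                    alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning over a discretised state space; `actions` stands in
    for activations of the elementary/composite behaviour controllers."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:
                action = random.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: Q[(state, a)])   # exploit
            next_state, reward, done = env.step(action)  # e.g. reward forward progress
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```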
|