  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

The hardware implementation of an artificial neural network using stochastic pulse rate encoding principles

Glover, John Sigsworth January 1995 (has links)
In this thesis the development of a hardware artificial neuron device and artificial neural network using stochastic pulse rate encoding principles is considered. After a review of neural network architectures and algorithmic approaches suitable for hardware implementation, a critical review of hardware techniques which have been considered in analogue and digital systems is presented. New results are presented demonstrating the potential of two learning schemes which adapt by the use of a single reinforcement signal. The techniques for computation using stochastic pulse rate encoding are presented and extended with novel circuits relevant to the hardware implementation of an artificial neural network. The generation of random numbers is the key to encoding data into the stochastic pulse rate domain. The formation of random numbers and multiple random bit sequences from a single PRBS generator has been investigated. Two techniques, Simulated Annealing and Genetic Algorithms, have been applied successfully to the problem of optimising the configuration of a PRBS random number generator for the formation of multiple random bit sequences and hence random numbers. A complete hardware design for an artificial neuron using stochastic pulse rate encoded signals has been described, designed, simulated, fabricated and tested before configuration of the device into a network to perform simple test problems. The implementation has shown that the processing elements of the artificial neuron are small and simple, but that there can be a significant overhead for encoding information into the stochastic pulse rate domain. The stochastic artificial neuron has the capability of on-line weight adaptation. The implementation of reinforcement schemes using the stochastic neuron as a basic element is discussed.
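The arithmetic behind stochastic pulse rate encoding can be illustrated in a few lines: in the standard unipolar representation, a value in [0, 1] is encoded as the probability of a 1 in a random bit stream, and multiplying two independent streams needs only a bitwise AND (the hardware equivalent is a single AND gate). The sketch below is a software illustration of that principle, not the thesis's hardware design; all names are mine.

```python
import random

def encode(value, length, rng):
    # Unipolar stochastic encoding: a value in [0, 1] becomes a bit
    # stream whose probability of a 1 equals the value.
    return [1 if rng.random() < value else 0 for _ in range(length)]

def decode(stream):
    # Decoding is just the observed pulse rate.
    return sum(stream) / len(stream)

def multiply(a_stream, b_stream):
    # For independent streams, bitwise AND yields a stream whose
    # 1-probability is the product of the two input values.
    return [a & b for a, b in zip(a_stream, b_stream)]

rng = random.Random(42)
a = encode(0.5, 10000, rng)
b = encode(0.4, 10000, rng)
prod = decode(multiply(a, b))  # close to 0.5 * 0.4 = 0.2
```

The accuracy of the result grows only with the square root of the stream length, which is why the thesis notes a significant overhead for encoding into the pulse rate domain.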
22

Efficient Mobile Sensing for Large-Scale Spatial Data Acquisition

Wei, Yongyong January 2021 (has links)
Large-scale spatial data such as the air quality of a city, the biomass content of a lake, or Wi-Fi Received Signal Strengths (RSS, also referred to as fingerprints) in indoor spaces often play vital roles in applications like indoor localization. However, it is extremely labor-intensive and time-consuming to collect those data manually. In this thesis, the main goal is to develop efficient means for large-scale spatial data collection. Robotic technologies nowadays offer an opportunity for mobile sensing, where data are collected by a robot traveling in target areas. However, since robots usually have a limited travel budget depending on battery capacity, one important problem is to schedule a data collection path that best utilizes the budget. Inspired by existing literature, we consider collecting data along informative paths. The process of searching for the most informative path given a limited budget is known as the informative path planning (IPP) problem, which is NP-hard. Thus, we propose two heuristic approaches, namely a greedy algorithm and a genetic algorithm. Experiments on Wi-Fi RSS based localization show that data collected along informative paths tend to achieve lower errors than data collected opportunistically. In practice, the budget of a mobile robot can vary due to insufficient charging or battery degradation. Although it is possible to apply the same path planning algorithm repetitively whenever the budget changes, it is more efficient and desirable to avoid solving the problem from scratch. This is possible because informative paths for the same area share common characteristics. Based on this intuition, we propose and design a reinforcement learning based IPP solution, which is able to predict informative paths given any budget. In addition, it is common to have multiple robots conduct sensing tasks cooperatively. Therefore, we also investigate the multi-robot IPP problem and present two solutions based on multi-agent reinforcement learning.
Mobile crowdsourcing (MCS) offers another opportunity to lower the cost of data collection. In MCS, data are collected by individual contributors, which can accumulate a large amount of data when there are sufficient participants. As an example, we consider the collection of a specific type of spatial data, namely Wi-Fi RSS, for indoor localization purposes. The process of collecting RSS is also known as site survey in the localization community. Though MCS based site survey was suggested a decade ago [park2010growing], so far there has not been any published large-scale fingerprint MCS campaign. The main issue is that it depends on users' participation, and users may be reluctant to make a contribution. To investigate user behavior in a real-world site survey, we design an indoor fingerprint MCS system and organize a data collection campaign on the McMaster University campus for five months. Although we focus on Wi-Fi fingerprints, the design choices and campaign experience are beneficial to the MCS of other types of spatial data as well. The contribution of this thesis is two-fold. For applications where robots are available for large-scale spatial sensing, efficient path planning solutions are investigated so as to maximize data utility. Meanwhile, for MCS based data acquisition, our real-world campaign experience and user behavior study reveal essential design factors that need to be considered and aspects for further improvement. / Thesis / Doctor of Philosophy (PhD) / A variety of applications such as environmental monitoring require collecting large-scale spatial data like air quality, temperature and humidity. However, obtaining those data usually incurs dramatic costs in time, which impedes the deployment of those applications. To reduce the data collection effort, we consider two mobile sensing schemes, i.e., mobile robotic sensing and mobile crowdsourcing.
For the former scheme, we investigate how to plan paths for mobile robots given limited travel budgets. For the latter scheme, we design a crowdsourcing platform and study user behavior through a real-world data collection campaign. The proposed solutions in this thesis can benefit large-scale spatial data collection tasks.
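A greedy IPP heuristic of the kind the abstract mentions can be sketched as: at each step, move to the unvisited waypoint with the best information-gain-per-distance ratio that still fits in the remaining travel budget. This is a hypothetical illustration, not the thesis's algorithm; `info_gain`, `dist`, and the toy one-dimensional instance are all assumptions of mine.

```python
import math

def greedy_ipp(start, candidates, info_gain, dist, budget):
    """Greedy informative path planning sketch: repeatedly pick the
    reachable unvisited waypoint maximizing gain per unit distance."""
    path, pos, remaining = [start], start, budget
    unvisited = set(candidates) - {start}
    while unvisited:
        best, best_score = None, -math.inf
        for w in sorted(unvisited):  # sorted for deterministic tie-breaking
            d = dist(pos, w)
            if 0 < d <= remaining:
                score = info_gain(w, path) / d
                if score > best_score:
                    best, best_score = w, score
        if best is None:  # no waypoint fits in the remaining budget
            break
        remaining -= dist(pos, best)
        pos = best
        path.append(best)
        unvisited.discard(best)
    return path

# Toy instance on a line: information gain is the waypoint value itself.
dist = lambda a, b: abs(a - b)
gain = lambda w, path: w
path = greedy_ipp(0, [1, 2, 3, 8], gain, dist, budget=5)  # 8 is out of reach
```

Like all greedy heuristics for an NP-hard problem, this trades optimality for speed, which is one motivation for the genetic-algorithm and reinforcement-learning alternatives the thesis develops.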
23

Relationship of cognitive style and reinforcement learning in counseling /

Riemer, Helmut Herbert January 1967 (has links)
No description available.
24

Successive discrimination and reversal learning as a function of differential sensory reinforcement and discriminative cues in two sensory modalities /

Duckmanton, Robert Antony. January 1971 (has links) (PDF)
Thesis (B.A. (Hons.)), Department of Psychology, University of Adelaide, 1971.
25

Q-learning for robot control /

Gaskett, Chris. January 2002 (has links)
Thesis (Ph.D.)--Australian National University, 2002. / CD contains "Examples of continuous state and action Q-learning"
26

Reinforcement learning for job-shop scheduling /

Zhang, Wei, January 1900 (has links)
Thesis (Ph. D.)--Oregon State University, 1996. / Typescript (photocopy). Includes bibliographical references (leaves 159-170). Also available on the World Wide Web.
27

Knowledge discovery for time series /

Saffell, Matthew John. January 2005 (has links)
Thesis (Ph.D.)--OGI School of Science & Engineering at OHSU, Oct. 2005. / Includes bibliographical references (leaves 132-142).
28

Adaptive representations for reinforcement learning

Whiteson, Shimon Azariah. January 1900 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
29

Task Offloading and Resource Allocation Using Deep Reinforcement Learning

Zhang, Kaiyi 01 December 2020 (has links)
Rapid urbanization poses huge challenges to people's daily lives, such as traffic congestion, environmental pollution, and public safety. Mobile Internet of Things (MIoT) applications serving smart cities bring the promise of innovative and enhanced public services such as air pollution monitoring, enhanced road safety, and city resource metering and management. These applications rely on a number of energy-constrained MIoT units (MUs) (e.g., robots and drones) to continuously sense, capture and process data and images from their environments to produce immediate adaptive actions (e.g., triggering alarms, controlling machinery and communicating with citizens). In this thesis, we consider a scenario where a battery-constrained MU executes a number of time-sensitive data processing tasks whose arrival times and sizes are stochastic in nature. These tasks can be executed locally on the device, offloaded to one of the nearby edge servers, or offloaded to a cloud data center within a mobile edge computing (MEC) infrastructure. We first formulate the problem of making optimal offloading decisions that minimize the cost of current and future tasks as a constrained Markov decision process (CMDP) that accounts for the constraints of the MU battery and the limited resources reserved on the MEC infrastructure by the application providers. Then, we relax the CMDP problem into a regular Markov decision process (MDP) using Lagrangian primal-dual optimization. We then develop an advantage actor-critic (A2C) algorithm, a model-free deep reinforcement learning (DRL) method, to train the MU to solve the relaxed problem. The training of the MU can be carried out once to learn optimal offloading policies that are repeatedly employed as long as there are no large changes in the MU environment. Simulation results are presented to show that the proposed algorithm can achieve performance improvement over offloading decision schemes that aim at optimizing instantaneous costs.
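The Lagrangian primal-dual relaxation described above can be sketched in two pieces: the constrained cost (battery, reserved resources) is folded into the reward with a multiplier lambda, turning the CMDP into an ordinary MDP, and lambda itself is adjusted by dual ascent on the constraint violation. A minimal sketch under names of my own choosing, not the thesis's implementation:

```python
def relaxed_reward(reward, cost, lam):
    # Lagrangian relaxation: the constrained cost is folded into the
    # objective, weighted by the multiplier lambda. The RL agent (e.g.
    # A2C) then maximizes this unconstrained quantity.
    return reward - lam * cost

def dual_update(lam, avg_cost, budget, lr=0.1):
    # Dual ascent: raise lambda when the constraint is violated
    # (average cost exceeds the budget), lower it otherwise,
    # keeping lambda non-negative.
    return max(0.0, lam + lr * (avg_cost - budget))

lam = 0.5
r = relaxed_reward(reward=1.0, cost=2.0, lam=lam)  # 1.0 - 0.5 * 2.0 = 0.0
lam = dual_update(lam, avg_cost=2.0, budget=1.5)   # 0.5 + 0.1 * 0.5 = 0.55
```

Alternating policy updates on the relaxed reward with dual updates on lambda drives the learned policy toward satisfying the original CMDP constraints.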
30

Nonparametric Inverse Reinforcement Learning and Approximate Optimal Control with Temporal Logic Tasks

Perundurai Rajasekaran, Siddharthan 30 August 2017 (has links)
"This thesis focuses on two key problems in reinforcement learning: How can reward functions be designed to obtain intended behaviors in autonomous systems using learning-based control? Given a complex mission specification, how can the reward function be shaped to achieve fast convergence and reduce sample complexity while learning the optimal policy? To answer these questions, the first part of this thesis investigates an inverse reinforcement learning (IRL) method with the purpose of learning a reward function from expert demonstrations. However, existing algorithms often assume that the expert demonstrations are generated by the same reward function. Such an assumption may be invalid, as one may need to aggregate data from multiple experts to obtain a sufficient set of demonstrations. In the first and major part of the thesis, we develop a novel method called Non-parametric Behavior Clustering IRL. This algorithm allows one to simultaneously cluster behaviors while learning their reward functions from demonstrations that are generated by more than one expert/behavior. Our approach is built upon the expectation-maximization formulation and non-parametric clustering in the IRL setting. We apply the algorithm to learn, from driving demonstrations, multiple driver behaviors (e.g., aggressive vs. evasive driving behaviors). In the second task, we study whether reinforcement learning can be used to generate complex behaviors specified in a formal logic, Linear Temporal Logic (LTL). Such LTL tasks may specify temporally extended goals, safety, surveillance, and reactive behaviors in a dynamic environment. We introduce reward shaping under LTL constraints to improve the rate of convergence in learning the optimal and probably correct policies. Our approach exploits the relation between reward shaping and actor-critic methods to speed up convergence and, as a consequence, reduce the number of training samples.
We integrate compositional reasoning in formal methods with actor-critic reinforcement learning algorithms to initialize a heuristic value function for reward shaping. This initialization can direct the agent towards efficient planning subject to more complex behavior specifications in LTL. The investigation takes the initial step to integrate machine learning with formal methods and contributes to building highly autonomous and self-adaptive robots under complex missions."
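Reward shaping of the kind described is commonly implemented as potential-based shaping, where the shaped reward is r + gamma * Phi(s') - Phi(s); this form is known to leave the optimal policy unchanged while providing denser feedback, and the heuristic value function derived from compositional reasoning would play the role of the potential Phi. A minimal sketch, with names and the toy values being my own assumptions:

```python
def shaped_reward(r, phi_s, phi_s_next, gamma=0.99):
    # Potential-based reward shaping: r' = r + gamma * Phi(s') - Phi(s).
    # The potential difference rewards progress without altering which
    # policy is optimal.
    return r + gamma * phi_s_next - phi_s

# Example: a transition that makes progress toward the LTL objective
# (moving to a higher-potential state) earns a positive bonus even
# when the environment reward r is zero.
bonus = shaped_reward(0.0, phi_s=1.0, phi_s_next=2.0, gamma=0.9)  # 0.9*2.0 - 1.0 = 0.8
```

Initializing Phi from the structure of the LTL specification, as the abstract suggests, is what directs the agent toward the specified behavior early in training.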
