21 |
Q-learning for robot control /Gaskett, Chris. January 2002 (has links)
Thesis (Ph.D.)--Australian National University, 2002. / CD contains "Examples of continuous state and action Q-learning"
|
22 |
Reinforcement learning for job-shop scheduling /Zhang, Wei, January 1900 (has links)
Thesis (Ph. D.)--Oregon State University, 1996. / Typescript (photocopy). Includes bibliographical references (leaves 159-170). Also available on the World Wide Web.
|
23 |
Knowledge discovery for time series /Saffell, Matthew John. January 2005 (has links)
Thesis (Ph.D.)--OGI School of Science & Engineering at OHSU, Oct. 2005. / Includes bibliographical references (leaves 132-142).
|
24 |
Adaptive representations for reinforcement learning /Whiteson, Shimon Azariah. January 1900 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
|
25 |
Task Offloading and Resource Allocation Using Deep Reinforcement Learning /Zhang, Kaiyi 01 December 2020 (has links)
Rapid urbanization poses huge challenges to people's daily lives, such as traffic congestion, environmental pollution, and public safety. Mobile Internet of things (MIoT) applications serving smart cities bring the promise of innovative and enhanced public services such as air pollution monitoring, enhanced road safety, and city resources metering and management. These applications rely on a number of energy-constrained MIoT units (MUs) (e.g., robots and drones) to continuously sense, capture, and process data and images from their environments to produce immediate adaptive actions (e.g., triggering alarms, controlling machinery, and communicating with citizens). In this thesis, we consider a scenario where a battery-constrained MU executes a number of time-sensitive data processing tasks whose arrival times and sizes are stochastic in nature. These tasks can be executed locally on the device, or offloaded to one of the nearby edge servers or to a cloud data center within a mobile edge computing (MEC) infrastructure. We first formulate the problem of making optimal offloading decisions that minimize the cost of current and future tasks as a constrained Markov decision process (CMDP) that accounts for the constraints of the MU battery and the limited resources reserved on the MEC infrastructure by the application providers. Then, we relax the CMDP problem into a regular Markov decision process (MDP) using Lagrangian primal-dual optimization. We then develop an advantage actor-critic (A2C) algorithm, a model-free deep reinforcement learning (DRL) method, to train the MU to solve the relaxed problem. The training of the MU can be carried out once to learn optimal offloading policies that are repeatedly employed as long as there are no large changes in the MU environment. Simulation results are presented to show that the proposed algorithm can achieve performance improvement over offloading decision schemes that aim at optimizing instantaneous costs.
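The Lagrangian relaxation described in this abstract can be sketched in a few lines. The snippet below is an illustrative toy, not the thesis's implementation: the reward/cost model, learning rate, and budget values are all hypothetical, and the actual work trains an A2C policy rather than this bare primal-dual loop.

```python
# Illustrative sketch: relaxing a constrained MDP into a regular MDP via
# Lagrangian primal-dual optimization. All numbers are hypothetical.

def lagrangian_reward(reward, cost, lam):
    """Scalarized reward r - lambda * c: the standard Lagrangian relaxation."""
    return reward - lam * cost

def dual_update(lam, avg_cost, budget, lr=0.1):
    """Projected subgradient ascent on the dual variable, keeping lambda >= 0."""
    return max(0.0, lam + lr * (avg_cost - budget))

# Toy usage: the penalty multiplier grows while the policy's average
# battery cost exceeds the budget, discouraging constraint violation.
lam = 0.0
for _ in range(50):
    avg_cost = 1.5  # pretend the current policy's average cost per episode
    lam = dual_update(lam, avg_cost, budget=1.0)
print(round(lam, 2))
```

In the full method, the relaxed reward would feed the A2C critic's target, while the dual update runs in an outer loop over training episodes.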
|
26 |
Reinforcement learning in commercial computer games /Coggan, Melanie. January 2008 (has links)
No description available.
|
27 |
State-similarity metrics for continuous Markov decision processes /Ferns, Norman Francis January 2007 (has links)
No description available.
|
28 |
Efficient Mobile Sensing for Large-Scale Spatial Data Acquisition /Wei, Yongyong January 2021 (has links)
Large-scale spatial data, such as the air quality of a city, biomass content in a lake, or Wi-Fi Received Signal Strengths (RSS, also referred to as fingerprints) in indoor spaces, often play vital roles in applications like indoor localization. However, it is extremely labor-intensive and time-consuming to collect such data manually. In this thesis, the main goal is to develop efficient means for large-scale spatial data collection.
Robotic technologies nowadays offer an opportunity for mobile sensing, where data are collected by a robot traveling in target areas. However, since robots usually have a limited travel budget depending on battery capacity, one important problem is to schedule a data collection path that best utilizes the budget. Inspired by existing literature, we consider collecting data along informative paths. The process of searching for the most informative path given a limited budget is known as the informative path planning (IPP) problem, which is NP-hard. Thus, we propose two heuristic approaches, namely a greedy algorithm and a genetic algorithm. Experiments on Wi-Fi RSS based localization show that data collected along informative paths tend to achieve lower errors than data collected opportunistically.
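The greedy heuristic mentioned above can be sketched as follows. This is a minimal toy under assumed simplifications (a discrete grid, unit travel cost, a known per-cell gain function); the thesis's actual objective and neighborhood structure may differ.

```python
# A hedged sketch of greedy informative path planning (IPP): at each step,
# move to an unvisited neighbor with the highest information gain, until the
# travel budget is spent. Grid, gains, and unit step cost are hypothetical.

def greedy_ipp(start, neighbors, gain, budget, step_cost=1.0):
    path, visited, pos = [start], {start}, start
    while budget >= step_cost:
        candidates = [n for n in neighbors(pos) if n not in visited]
        if not candidates:
            break  # dead end: no unvisited neighbors remain
        pos = max(candidates, key=gain)  # greedy choice
        path.append(pos)
        visited.add(pos)
        budget -= step_cost
    return path

# Toy 1-D corridor: cells 0..4, information gain increases to the right.
path = greedy_ipp(0, lambda p: [q for q in (p - 1, p + 1) if 0 <= q <= 4],
                  gain=lambda c: c, budget=3)
print(path)  # [0, 1, 2, 3]
```

A genetic-algorithm variant would instead evolve whole candidate paths, trading this myopic step-by-step choice for a global search over the budget-feasible path space.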
In practice, the budget of a mobile robot can vary due to insufficient charging or battery degradation. Although it is possible to apply the same path planning algorithm repeatedly whenever the budget changes, it is more efficient and desirable to avoid solving the problem from scratch. This is possible because informative paths for the same area share common characteristics. Based on this intuition, we propose and design a reinforcement learning based IPP solution, which is able to predict informative paths given any budget. In addition, it is common to deploy multiple robots to conduct sensing tasks cooperatively. Therefore, we also investigate the multi-robot IPP problem and present two solutions based on multi-agent reinforcement learning.
Mobile crowdsourcing (MCS) offers another opportunity to lower the cost of data collection. In MCS, data are collected by individual contributors, which can accumulate a large amount of data when there are sufficient participants. As an example, we consider the collection of a specific type of spatial data, namely Wi-Fi RSS, for indoor localization purposes. The process of collecting RSS is also known as site survey in the localization community. Though MCS based site survey was suggested a decade ago (Park et al., 2010), so far there has not been any published large-scale fingerprint MCS campaign. The main issue is that it depends on users' participation, and users may be reluctant to make a contribution. To investigate user behavior in a real-world site survey, we design an indoor fingerprint MCS system and organize a data collection campaign on the McMaster University campus for five months. Although we focus on Wi-Fi fingerprints, the design choices and campaign experience are beneficial to the MCS of other types of spatial data as well.
The contribution of this thesis is two-fold. For applications where robots are available for large-scale spatial sensing, efficient path planning solutions are investigated so as to maximize data utility. Meanwhile, for MCS based data acquisition, our real-world campaign experience and user behavior study reveal essential design factors that need to be considered and aspects for further improvement. / Thesis / Doctor of Philosophy (PhD) / A variety of applications such as environmental monitoring require collecting large-scale spatial data like air quality, temperature, and humidity. However, obtaining those data usually incurs substantial costs, such as time, which impedes the deployment of those applications. To reduce the data collection effort, we consider two mobile sensing schemes, i.e., mobile robotic sensing and mobile crowdsourcing. For the former scheme, we investigate how to plan paths for mobile robots given limited travel budgets. For the latter scheme, we design a crowdsourcing platform and study user behavior through a real-world data collection campaign. The proposed solutions in this thesis can benefit large-scale spatial data collection tasks.
|
29 |
Automating Network Operation Centers using Reinforcement Learning /Altamimi, Sadi 18 May 2023 (has links)
Reinforcement learning (RL) has been at the core of recent advances in fulfilling the AI promise of general intelligence. Unlike other machine learning (ML) paradigms, such as supervised learning (SL), which learn to mimic how humans act, RL tries to mimic how humans learn, and in many tasks it has managed to discover new strategies and achieve super-human performance. This is possible mainly because RL algorithms are allowed to interact with the world to collect the data they need for training by themselves. This is not possible in SL, where the ML model is limited to a dataset collected by humans, which can be biased towards sub-optimal solutions.
The downside of RL is its high cost when trained on real systems. This cost stems from the fact that the actions taken by an RL model during the initial phase of training are merely random. To overcome this issue, it is common to train RL models using simulators before deploying them in production. However, designing a realistic simulator that faithfully resembles the real environment is far from easy. Furthermore, simulator-based approaches do not utilize the sheer amount of field data available at their disposal.
This work investigates new ways to bridge the gap between SL and RL through an offline pre-training phase. The idea is to utilize field data to pre-train RL models in an offline setting (similar to SL), and then allow them to safely explore and improve their performance beyond human level. The proposed training pipeline includes: (i) a process to convert static datasets into an RL environment, (ii) an MDP-aware data augmentation process for the offline dataset, and (iii) a pre-training step that improves the RL exploration phase. We show how to apply this approach to design an action recommendation engine (ARE) that automates network operation centers (NOCs), a task that is still tackled by teams of network professionals using hand-crafted rules. Our RL algorithm learns to maximize the Quality of Experience (QoE) of NOC users and minimize operational costs (OPEX) compared to traditional algorithms. Furthermore, our algorithm is scalable, and can be used to control large-scale networks of arbitrary size.
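The offline pre-training step of such a pipeline can be sketched in miniature. The snippet below is a hedged illustration, not the thesis's method: it uses a simple tabular imitation of logged operator actions (in the spirit of behavior cloning), and the "NOC-style" states and actions in the toy log are entirely hypothetical.

```python
# A minimal sketch of offline pre-training from a static dataset of logged
# (state, action) pairs, before any online RL exploration. The tabular
# policy and the toy NOC-style log below are hypothetical.

from collections import Counter, defaultdict

def pretrain_policy(dataset):
    """Estimate a greedy policy from field data: pick the action
    most frequently taken by human operators in each state."""
    counts = defaultdict(Counter)
    for state, action in dataset:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# Toy field-data log: (alarm_level, operator_action)
log = [("high", "restart_node"), ("high", "restart_node"),
       ("high", "ignore"), ("low", "ignore")]
policy = pretrain_policy(log)
print(policy["high"], policy["low"])  # restart_node ignore
```

Starting online RL from such a policy, rather than from random actions, is what makes the subsequent exploration phase safer and cheaper on a live network.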
|
30 |
Reliable deep reinforcement learning: stable training and robust deployment /Queeney, James 30 August 2023 (has links)
Deep reinforcement learning (RL) represents a data-driven framework for sequential decision making that has demonstrated the ability to solve challenging control tasks. This data-driven, learning-based approach offers the potential to improve operations in complex systems, but only if it can be trusted to produce reliable performance both during training and upon deployment. These requirements have hindered the adoption of deep RL in many real-world applications. In order to overcome the limitations of existing methods, this dissertation introduces reliable deep RL algorithms that deliver (i) stable training from limited data and (ii) robust, safe deployment in the presence of uncertainty.
The first part of the dissertation addresses the interactive nature of deep RL, where learning requires data collection from the environment. This interactive process can be expensive, time-consuming, and dangerous in many real-world settings, which motivates the need for reliable and efficient learning. We develop deep RL algorithms that guarantee stable performance throughout training, while also directly considering data efficiency in their design. These algorithms are supported by novel policy improvement lower bounds that account for finite-sample estimation error and sample reuse.
The second part of the dissertation focuses on the uncertainty present in real-world applications, which can impact the performance and safety of learned control policies. In order to reliably deploy deep RL in the presence of uncertainty, we introduce frameworks that incorporate safety constraints and provide robustness to general disturbances in the environment. Importantly, these frameworks make limited assumptions on the training process, and can be implemented in settings that require real-world interaction for training. This motivates deep RL algorithms that deliver robust, safe performance at deployment time, while only using standard data collection from a single training environment.
Overall, this dissertation contributes new techniques to overcome key limitations of deep RL for real-world decision making and control. Experiments across a variety of continuous control tasks demonstrate the effectiveness of our algorithms.
|