571

Performance Enhancement of Aerial Base Stations via Reinforcement Learning-Based 3D Placement Techniques

Parvaresh, Nahid 21 December 2022 (has links)
Deploying unmanned aerial vehicles (UAVs) as aerial base stations (BSs) to assist terrestrial connectivity has drawn significant attention in recent years. UAV-BSs can quickly take over as service providers during natural disasters and many other emergency situations in which ground BSs fail unexpectedly. They can also provide cost-effective Internet connectivity to users who are outside the reach of existing infrastructure. UAV-BSs benefit from their inherent mobility, which enables them to change their 3D locations as the demand of ground users changes. To make effective use of this mobility in a dynamic network and maximize performance, the 3D location of UAV-BSs should be continuously optimized. However, the location optimization problem of UAV-BSs is NP-hard, with no optimal solution obtainable in polynomial time, so near-optimal solutions have to be exploited. Besides conventional heuristic solutions, machine learning (ML), and specifically reinforcement learning (RL), has emerged as a promising approach for tackling the positioning problem of UAV-BSs. The common practice when optimizing the 3D location of UAV-BSs with RL algorithms is to assume fixed locations for the ground users (i.e., UEs) and to define a discrete, limited action space for the RL agent. In this thesis, we focus on improving the location optimization of UAV-BSs in two ways: (1) taking the mobility of users into account in the design of the RL algorithm, and (2) extending the action space of RL to a continuous action space so that the UAV-BS agent can flexibly change its location by any distance (limited by the maximum speed of the UAV-BS). Three types of RL algorithms, i.e. Q-learning (QL), deep Q-learning (DQL) and actor-critic deep Q-learning (ACDQL), are employed in this thesis to improve, step by step, the performance of a UAV-assisted cellular network. QL is the first RL algorithm we use for the autonomous movement of the UAV-BS in the presence of mobile users. As a solution to the limitations of QL, we next propose a DQL-based strategy for the location optimization of the UAV-BS, which largely improves the performance of the network compared to the QL-based model. Third, we propose an ACDQL-based solution for autonomously moving the UAV-BS in a continuous action space, which significantly outperforms both the QL and DQL strategies.
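A minimal sketch of the first, tabular stage of such a pipeline, assuming a UAV-BS that moves on a discretized 3D grid and a hypothetical sum-rate reward; the grid size, reward model, and hyperparameters here are illustrative and not the thesis's actual setup:

```python
import numpy as np
from collections import defaultdict

# Discrete 3D moves: one grid step along +/- x, y, z, plus "stay".
ACTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1), (0, 0, 0)]

def sum_rate_reward(uav_pos, user_positions):
    """Hypothetical reward: sum of per-user log-rates, decaying with UAV-user distance."""
    d = np.linalg.norm(user_positions - np.asarray(uav_pos, dtype=float), axis=1) + 1.0
    return float(np.sum(np.log2(1.0 + 1.0 / d**2)))

def q_learning_placement(user_traj, grid_max=(10, 10, 5), episodes=200,
                         alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning for UAV-BS placement with mobile users (illustrative only)."""
    Q = defaultdict(lambda: np.zeros(len(ACTIONS)))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        pos = (grid_max[0] // 2, grid_max[1] // 2, grid_max[2] // 2)  # start at grid centre
        for users in user_traj:                                       # users move each step
            a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[pos]))
            nxt = tuple(int(v) for v in np.clip(np.add(pos, ACTIONS[a]), 0, grid_max))
            r = sum_rate_reward(nxt, users)
            Q[pos][a] += alpha * (r + gamma * np.max(Q[nxt]) - Q[pos][a])  # Q-learning update
            pos = nxt
    return Q

# Example: 5 mobile users performing a random walk over 50 time steps (hypothetical data).
rng = np.random.default_rng(1)
traj = np.cumsum(rng.normal(scale=0.2, size=(50, 5, 3)), axis=0) + np.array([5.0, 5.0, 2.0])
Q = q_learning_placement(traj)
```

The DQL and ACDQL stages described above would replace the table Q with a neural approximator and, in the actor-critic case, output a continuous displacement rather than one of the discrete moves.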
572

The interaction of working memory and uncertainty (mis)estimation in context-dependent outcome estimation

Li Xin Lim (9230078) 13 November 2023 (has links)
In the context of reinforcement learning, extensive research has shown how reinforcement learning is facilitated by the estimation of uncertainty, which improves the ability to make decisions. However, the constraints imposed by limited observation of the process of forming environment representations have seldom been a subject of discussion. Thus, this study set out to demonstrate that when a limited memory is incorporated into uncertainty estimation, individuals potentially misestimate outcomes and environmental statistics. The study introduced a computational model incorporating active working memory and lateral inhibition in working memory (WM) to describe how relevant information is selected and stored to form estimates of uncertainty when forming outcome expectations. Active working memory maintained relevant information not only according to recency, but also according to utility. With the relevant information stored in WM, the model was able to estimate expected uncertainty and perceived volatility and to detect contextual changes or dynamics in the outcome structure. Two experiments were carried out to investigate limitations in information availability and uncertainty estimation. The first experiment investigated the impact of cognitive load on the reliance on memories when forming outcome estimates. The findings revealed that introducing cognitive load diminished the reliance on memory for uncertainty estimation and lowered the expected uncertainty, leading to an increased perception of environmental volatility. The second experiment investigated the ability to detect changes in outcome noise under different conditions of outcome exposure. The study found differences in the mechanisms used for detecting environmental changes across these conditions. Through the experiments and model fitting, the study showed that the misestimation of uncertainties depends on individual experiences and on the relevant information stored in WM under a limited capacity.
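A heavily hedged sketch of one way a capacity-limited, utility-weighted working memory could feed an uncertainty estimate, in the spirit of the model described above; the capacity, decay factor, and eviction rule are assumptions for illustration, not the thesis's actual model:

```python
import numpy as np

class UtilityWeightedWM:
    """Capacity-limited working-memory buffer (illustrative sketch, not the thesis's model).

    Items are retained according to a strength that decays with recency and is boosted by
    utility, loosely mirroring active maintenance with lateral inhibition; the remembered
    outcomes then provide an estimate of expected uncertainty (spread of stored outcomes).
    """
    def __init__(self, capacity=4, decay=0.9):
        self.capacity, self.decay = capacity, decay
        self.items = []                                   # list of (outcome, strength) pairs

    def store(self, outcome, utility):
        self.items = [(o, s * self.decay) for o, s in self.items]    # older items weaken
        self.items.append((outcome, utility))
        if len(self.items) > self.capacity:                           # "lateral inhibition":
            self.items.remove(min(self.items, key=lambda x: x[1]))    # weakest item is evicted

    def expected_uncertainty(self):
        outcomes = np.array([o for o, _ in self.items])
        return float(outcomes.std()) if len(outcomes) > 1 else 0.0

# Example: a surprising outcome arrives after several similar ones.
wm = UtilityWeightedWM(capacity=3)
for outcome in (1.0, 0.8, 1.2, 5.0):
    wm.store(outcome, utility=abs(outcome))
print(wm.expected_uncertainty())
```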
573

REINFORCEMENT LEARNING FOR CONCAVE OBJECTIVES AND CONVEX CONSTRAINTS

Mridul Agarwal (13171941) 29 July 2022 (has links)
Formulating RL with MDPs typically works for a single objective; hence, such formulations are not readily applicable where policies need to optimize multiple objectives, or to satisfy certain constraints while maximizing one or more objectives, which can often be conflicting. Further, many applications such as robotics or autonomous driving do not allow constraints to be violated even during the training process. Existing algorithms do not simultaneously provide handling of multiple objectives, zero constraint violations, sample efficiency, and low computational complexity. To this end, we study sample-efficient reinforcement learning with a concave objective and convex constraints, where an agent maximizes a concave, Lipschitz continuous function of multiple objectives while satisfying a convex cost constraint. For this setup, we provide a posterior sampling algorithm that solves a convex optimization problem for the stationary distribution of the states and actions. Further, using a Bellman-error-based analysis, we show that the algorithm obtains a near-optimal Bayesian regret bound in the number of interactions with the environment. Moreover, under the assumption that slack policies exist, we design an algorithm that solves for conservative policies that do not violate the constraints while still achieving the near-optimal regret bound. We also show that the algorithm performs significantly better than existing algorithms for MDPs with finite states and finite actions.
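A hedged sketch of the optimization problem such a setup suggests, in notation not taken from the thesis: ρ is a stationary state-action occupancy measure, f is the concave Lipschitz objective over K reward signals, g is a convex function of the expected cost, and P is the (sampled) transition kernel. The posterior sampling algorithm described above would solve a program of roughly this form for each sampled MDP and play the induced policy:

```latex
\begin{aligned}
\max_{\rho \,\ge\, 0}\quad
  & f\!\Big(\textstyle\sum_{s,a}\rho(s,a)\,r_1(s,a),\;\dots,\;\sum_{s,a}\rho(s,a)\,r_K(s,a)\Big) \\
\text{s.t.}\quad
  & g\!\Big(\textstyle\sum_{s,a}\rho(s,a)\,c(s,a)\Big) \le 0
      &&\text{(convex cost constraint)}\\
  & \textstyle\sum_{a}\rho(s',a) \;=\; \sum_{s,a}\rho(s,a)\,P(s'\mid s,a)\quad\forall s'
      &&\text{(stationarity / flow conservation)}\\
  & \textstyle\sum_{s,a}\rho(s,a) \;=\; 1
      &&\text{(valid distribution)}
\end{aligned}
```

The policy is recovered as $\pi(a \mid s) = \rho(s,a) / \sum_{a'} \rho(s,a')$; because $f$ is concave and the remaining constraints are convex or linear, the whole program is convex.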
574

Specification-guided imitation learning

Zhou, Weichao 13 September 2024 (has links)
Imitation learning is a powerful data-driven paradigm that enables machines to acquire advanced skills at a human-level proficiency by learning from demonstrations provided by humans or other agents. This approach has found applications in various domains such as robotics, autonomous driving, and text generation. However, the effectiveness of imitation learning depends heavily on the quality of the demonstrations it receives. Human demonstrations can often be inadequate, partial, environment-specific, and sub-optimal. For example, experts may only demonstrate successful task completion in ideal conditions, neglecting potential failure scenarios and important aspects of system safety considerations. The lack of diversity in the demonstrations can introduce bias in the learning process and compromise the safety and robustness of the learning systems. Additionally, current imitation learning algorithms primarily focus on replicating expert behaviors and are thus limited to learning from successful demonstrations alone. This inherent inability to learn to avoid failure is a significant limitation of existing methodologies. As a result, when faced with real-world uncertainties, imitation learning systems encounter challenges in ensuring safety, particularly in critical domains such as autonomous vehicles, healthcare, and finance, where system failures can have serious consequences. Therefore, it is crucial to develop mechanisms that ensure safety, reliability, and transparency in the decision-making process within imitation learning systems. To address these challenges, this thesis proposes innovative approaches that go beyond traditional imitation learning methodologies by enabling imitation learning systems to incorporate explicit task specifications provided by human designers. Inspired by the idea that humans acquire skills not only by learning from demonstrations but also by following explicit rules, our approach aims to complement expert demonstrations with rule-based specifications. We show that in machine learning tasks, experts can use specifications to convey information that can be difficult to express through demonstrations alone. For instance, in safety-critical scenarios where demonstrations are infeasible, explicitly specifying safety requirements for the learner can be highly effective. We also show that experts can introduce well-structured biases into the learning model, ensuring that the learning process adheres to correct-by-construction principles from its inception. Our approach, called ‘specification-guided imitation learning’, seamlessly integrates formal specifications into the data-driven learning process, laying the theoretical foundations for this framework and developing algorithms to incorporate formal specifications at various stages of imitation learning. We explore the use of different types of specifications in various types of imitation learning tasks and envision that this framework will significantly advance the applicability of imitation learning and create new connections between formal methods and machine learning. Additionally, we anticipate significant impacts across a range of domains, including robotics, autonomous driving, and gaming, by enhancing core machine learning components in future autonomous systems and improving their performance, safety, and reliability.
575

Game Players Using Distributional Reinforcement Learning

Pettersson, Adam, Pei Purroy, Francesc January 2024 (has links)
Reinforcement learning (RL) algorithms aim to identify optimal action sequences for an agent in a given environment, traditionally maximizing the expected rewards received from the environment by taking each action and transitioning between states. This thesis explores a distributional approach to RL, replacing the expected reward function with the full distribution over the possible rewards received, known as the value distribution. We focus on the quantile regression distributional RL algorithm (QR-DQN) introduced by Dabney et al. (2017), which models the value distribution by representing its quantiles. With this information about the value distribution, we modify the QR-DQN algorithm to enhance the agent's risk sensitivity. Our risk-averse algorithm is evaluated against the original QR-DQN in the Atari 2600 and Gymnasium environments, specifically in the games Breakout, Pong, Lunar Lander and Cartpole. Results indicate that the risk-averse variant performs comparably in terms of rewards while exhibiting increased robustness and risk aversion. Potential refinements of the risk-averse algorithm are presented.
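A small sketch of one way quantile estimates can be turned into risk-averse behaviour, assuming a CVaR-style criterion over the lower tail; the abstract does not specify the exact risk measure used in the thesis, so the criterion, the fraction alpha, and the array shapes here are assumptions:

```python
import numpy as np

def risk_averse_action(quantiles, alpha=0.25):
    """Pick an action from per-action quantile estimates of the return distribution.

    `quantiles` has shape (n_actions, n_quantiles), as a QR-DQN-style network
    (Dabney et al., 2017) would output. Risk-neutral QR-DQN ranks actions by the
    mean over all quantiles; here actions are ranked by the mean of the lowest
    alpha-fraction of quantiles, a CVaR-style risk-averse criterion.
    """
    q = np.sort(np.asarray(quantiles, dtype=float), axis=1)  # quantiles in increasing order
    k = max(1, int(alpha * q.shape[1]))                      # number of lower-tail quantiles
    cvar = q[:, :k].mean(axis=1)                             # expected return in the lower tail
    return int(np.argmax(cvar))

# Example: 4 actions with 8 quantile estimates each (random stand-ins for network outputs).
rng = np.random.default_rng(0)
print(risk_averse_action(rng.normal(size=(4, 8))))
```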
576

Influence of inclined web reinforcement on reinforced concrete deep beams with web openings.

Yang, Keun-Hyeok, Chung, H-S., Ashour, Ashraf 09 1900 (has links)
This paper reports the testing of fifteen reinforced concrete deep beams with openings. All beams tested had the same overall geometrical dimensions. The main variables considered were the opening size and the amount of inclined reinforcement. An effective inclined reinforcement factor, combining the influence of the amount of inclined reinforcement and the opening size on the structural behaviour of the beams tested, is proposed. It was observed that the diagonal crack width and shear strength of the beams tested were significantly dependent on the effective inclined reinforcement factor, which ranged from 0 to 0.318 for the test specimens. As this factor increased, the diagonal crack width and its development rate decreased, and the shear strength of the beams tested improved. Beams having an effective inclined reinforcement factor greater than 0.15 had higher shear strength than the corresponding solid beams. A numerical procedure based on the upper-bound analysis of plasticity theory is proposed to estimate the shear strength and the load transfer capacity of reinforcement in deep beams with openings. Predictions obtained from the proposed formulas show consistent agreement with test results.
577

The Application of Group Contingent Reinforcement to Retarded Adults

Newman, Jan 05 1900 (has links)
Two groups of eleven retarded adults each were used as subjects. An individually consequated token economy was in effect during baseline-1 for both groups. The treatment phase of the experiment consisted of group consequation, the first group receiving a high rate of reinforcement and the second group receiving a low rate. The individual token system was reinstated for both groups during baseline-2 measures. Attending behavior and work output were measured during each phase of the experiment. Significant differences were found between group versus individually contingent reinforcement treatments on attending behaviors, and between high and low contingency groups on performance behaviors. Differences between the high contingency and low contingency groups were found to be non-significant in regard to attending behaviors.
578

Lateral Resistance of H-Piles and Square Piles Behind an MSE Wall with Ribbed Strip and Welded Wire Reinforcements

Luna, Andrew I. 01 May 2016 (has links)
Bridges often use pile foundations behind MSE walls to help resist lateral loading from seismic loads and from thermal expansion and contraction. Overdesign of pile spacing and sizes occurs owing to a lack of design-code guidance for piles behind an MSE wall, while space constraints necessitate the installation of piles near the wall. Full-scale lateral load tests were conducted on piles behind an MSE wall. This study involves the testing of four HP12X74 H-piles and four HSS12X12X5/16 square piles. The H-piles were tested with ribbed strip soil reinforcement at a wall height of 15 feet, and the square piles were tested with welded wire reinforcement at a wall height of 20 feet. The H-piles were spaced from the back face of the MSE wall at 4.5, 3.2, 2.5, and 2.2 pile diameters. The square piles were spaced at 5.7, 4.2, 3.1, and 2.1 pile diameters. Testing was based on a displacement-control method in which load was applied in 0.25-inch deflection increments up to three inches of pile deflection. It was concluded that piles placed closer to the wall than 3.9 pile diameters have reduced lateral resistance. P-multipliers were back-calculated in LPILE from the load-deflection curves obtained from the tests. The p-multipliers were found to be 1.0, 0.85, 0.60, and 0.73 for the H-piles spaced at 4.5, 3.2, 2.5, and 2.2 pile diameters, respectively. The p-multipliers for the square piles were found to be 1.0, 0.77, 0.63, and 0.57 for piles spaced at 5.7, 4.2, 3.1, and 2.1 pile diameters, respectively. An equation was developed to estimate p-multipliers as a function of pile distance behind the wall. These p-multipliers account for reduced soil resistance and decrease linearly with distance for piles placed closer than 3.9 pile diameters. Measurements were also taken of the force induced in the soil reinforcement, and a statistical analysis was performed to develop an equation to predict the maximum induced reinforcement load. The main parameters in this equation were the lateral pile load, the transverse distance from the reinforcement to the pile center normalized by the pile diameter, the spacing from the pile center to the wall normalized by the pile diameter, the vertical stress, and the reinforcement length-to-height ratio, where the height includes the equivalent height of the surcharge. The multiple regression equations account for 76% of the variation in observed tensile force for the ribbed strip reinforcement and 77% of the variation for the welded wire reinforcement. The tensile force in the reinforcement was found to increase as the pile spacing decreased, as the transverse spacing from the pile decreased, and as the lateral load increased.
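A small illustrative sketch that turns the back-calculated p-multipliers reported above into a lookup by linear interpolation; this is a stand-in for the thesis's fitted equation (which is not reproduced in the abstract), assuming no reduction beyond roughly 3.9 pile diameters:

```python
import numpy as np

# Back-calculated p-multipliers reported above (pile spacing in pile diameters).
H_PILE = {4.5: 1.00, 3.2: 0.85, 2.5: 0.60, 2.2: 0.73}   # ribbed strip reinforcement, 15 ft wall
SQUARE = {5.7: 1.00, 4.2: 0.77, 3.1: 0.63, 2.1: 0.57}   # welded wire reinforcement, 20 ft wall

def p_multiplier(spacing_d, data):
    """Estimate a p-multiplier by linear interpolation of the reported test values.

    Illustrative stand-in for the thesis's fitted equation (not reproduced in the
    abstract): no reduction beyond about 3.9 pile diameters, reduced resistance closer in.
    """
    if spacing_d >= 3.9:
        return 1.0
    x = np.array(sorted(data))
    y = np.array([data[k] for k in sorted(data)])
    return float(np.interp(spacing_d, x, y))

print(p_multiplier(3.0, H_PILE))   # rough estimate for an H-pile 3.0 diameters behind the wall
```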
579

Hluboké posilovaná učení a řešení pohybu robotu typu had / Deep reinforcement learning and snake-like robot locomotion design

Kočí, Jakub January 2020 (has links)
This master's thesis discusses the application of reinforcement learning to deep learning tasks. The theoretical part covers the basics of artificial neural networks and reinforcement learning. The thesis describes the theoretical model underlying the reinforcement learning process, Markov processes. Some interesting techniques are demonstrated on conventional reinforcement learning algorithms, and some widely used deep reinforcement learning algorithms are described as well. The practical part consists of implementing a model of the robot and its environment, and the deep reinforcement learning system itself.
580

Increasing Policy Network Size Does Not Guarantee Better Performance in Deep Reinforcement Learning

Zachery Peter Berg (12455928) 25 April 2022 (has links)
The capacity of deep reinforcement learning policy networks has been found to affect the performance of trained agents. It has been observed that policy networks with more parameters have better training performance and generalization ability than smaller networks. In this work, we find cases where this does not hold true. We observe unimodal variance in the zero-shot test return of policies of varying width, which accompanies a drop in both train and test return. Empirically, we demonstrate mostly monotonically increasing performance, or mostly optimal performance, as the width of deep policy networks increases, except near the variance mode. Finally, we find a scenario where larger networks have increasing performance up to a point, followed by decreasing performance. We hypothesize that these observations align with the theory of double descent in supervised learning, although with specific differences.
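A minimal sketch of the kind of width sweep such a study implies, assuming simple two-hidden-layer policies built in NumPy; the observation and action dimensions, the widths, and the initialization are illustrative, and the training and evaluation loops are omitted:

```python
import numpy as np

def make_policy(obs_dim, act_dim, width, rng):
    """Build a two-hidden-layer MLP policy of a given width (NumPy-only illustration)."""
    dims = [obs_dim, width, width, act_dim]
    params = [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
              for m, n in zip(dims[:-1], dims[1:])]
    def policy(obs):
        h = np.asarray(obs, dtype=float)
        for i, (W, b) in enumerate(params):
            h = h @ W + b
            if i < len(params) - 1:
                h = np.tanh(h)                      # hidden-layer nonlinearity
        return int(np.argmax(h))                    # greedy action from the final logits
    return policy

# Sweep widths as in a width-versus-performance study; training/evaluation loops omitted.
rng = np.random.default_rng(0)
policies = {w: make_policy(obs_dim=8, act_dim=4, width=w, rng=rng) for w in (16, 64, 256, 1024)}
print(policies[64](np.zeros(8)))
```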
