  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
521

Reinforcement Learning for Self-adapting Time Discretizations of Complex Systems

Gallagher, Conor Dietrich 27 August 2021 (has links)
The overarching goal of this project is to develop intelligent, self-adapting numerical algorithms for the time discretization of complex real-world problems with Q-Learning methodologies. The specific application is ordinary differential equations, which arise throughout mathematics and the social and natural sciences but usually require approximate solutions because direct analytical solutions are rare. Using the traditional Brusselator and Lorenz differential equations as test beds, this research develops models to determine reward functions and dynamically tunes controller parameters that minimize both the error and the number of steps required for approximate mathematical solutions. Our best reward function is based on an error measure that does not overly punish rejected states. The Alpha-Beta Adjustment and Safety Factor Adjustment Model is the most efficient and accurate method for solving these mathematical problems. Allowing the model to change the alpha/beta value and safety factor by small amounts provides better results than if the model chose values from discrete lists. This method shows potential for training dynamic controllers with Reinforcement Learning. / Master of Science / This research applies Q-Learning, a subset of Reinforcement Learning and Machine Learning, to solve complex mathematical problems that cannot be solved analytically and therefore require approximate solutions. Specifically, this research applies mathematical modeling of ordinary differential equations, which are used in many fields: theoretical sciences such as physics and chemistry, applied technical fields such as medicine and engineering, social and consumer-oriented fields such as finance and consumer purchasing habits, and the realms of national and international security and communications.
Q-Learning develops mathematical models that make decisions, and depending on the outcome, learns if the decision is good or bad, and uses this information to make the next decision. The research develops approaches to determine reward functions and controller parameters that minimize the error and number of steps associated with approximate mathematical solutions to ordinary differential equations. Error is how far the model's answer is from the true answer, and the number of steps is related to how long it takes and how much computational time and cost is associated with the solution. The Alpha-Beta Adjustment and Safety Factor Adjustment Model is the most efficient and accurate method for solving these mathematical problems and has potential for solving complex mathematical and societal problems.
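The setup the abstract describes (a Q-learning controller nudging a safety factor by small multiplicative amounts, with a reward that penalizes error and step count but does not overly punish rejections) can be sketched as a toy. This is an assumption-laden illustration on a single ODE with a Heun/Euler error estimate, not the thesis's actual model; all names and constants here are invented for the sketch.

```python
import math
import random

def q_learn_stepsize(f, y0, t0, t1, tol=1e-3, seed=0):
    """Toy tabular Q-learning controller for the safety factor of an
    adaptive step-size rule (illustrative only)."""
    rng = random.Random(seed)
    actions = (0.95, 1.0, 1.05)      # small multiplicative tweaks, as in the abstract
    Q = {}                           # state = (last step accepted?, error bucket)
    lr, gamma, eps = 0.2, 0.9, 0.1
    y, t, h, safety = y0, t0, 0.01, 0.9
    state, steps = (True, 0), 0
    while t < t1:
        if rng.random() < eps:                       # epsilon-greedy action choice
            a = rng.randrange(len(actions))
        else:
            a = max(range(len(actions)), key=lambda i: Q.get((state, i), 0.0))
        safety = min(1.0, max(0.5, safety * actions[a]))
        h = min(h, t1 - t)
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        err = abs(h * (k2 - k1) / 2)                 # Euler-vs-Heun local error estimate
        accepted = err < tol
        if accepted:                                  # advance with the Heun step
            y, t = y + h * (k1 + k2) / 2, t + h
        # reward: penalize error and per-step cost, only mildly punish rejections
        reward = -(err / tol) - 0.1 - (0.5 if not accepted else 0.0)
        h = safety * h * math.sqrt(tol / max(err, 1e-12))
        nxt = (accepted, min(int(4 * err / tol), 4))
        best = max(Q.get((nxt, i), 0.0) for i in range(len(actions)))
        old = Q.get((state, a), 0.0)
        Q[(state, a)] = old + lr * (reward + gamma * best - old)
        state, steps = nxt, steps + 1
    return y, steps, Q
```

Run on dy/dt = -y, the controller integrates to t = 1 while learning which safety-factor tweaks keep the error-vs-effort reward high.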
522

Bond of Reinforcing Bars to Steel Fiber Reinforced Concrete (SFRC)

García Taengua, Emilio José 21 October 2013 (has links)
The use of steel fiber reinforced concrete (SFRC hereafter) is becoming more and more common. Building codes and recommendations are gradually including the positive effect of fibers on mechanical properties of concrete. How to take advantage of the higher ductility and energy absorption capacity of SFRC to reduce anchorage lengths when using fibers is not a straightforward issue. Fibers improve bond performance because they confine reinforcement (playing a similar role to that of transverse reinforcement). Their impact on bond performance of concrete is especially important in terms of toughness/ductility. The study of previous literature has revealed important points of ongoing discussion regarding different issues, especially the following: a) whether the effect of fibers on bond strength is negligible or not, b) whether the effect of fibers on bond strength is dependent on any other factors such as concrete compressive strength or concrete cover, c) quantifying the effect of fibers on the ductility of bond failure (bond toughness). These issues have defined the objectives of this thesis. A modified version of the Pull Out Test (POT hereafter) has been selected as the most appropriate test for the purposes of this research. The effect of a number of factors on bond stress–slip curves has been analyzed. The factors considered are: concrete compressive strength (between 30 MPa and 50 MPa), rebar diameter (between 8 mm and 20 mm), concrete cover (between 30 mm and 5 times rebar diameter), fiber content (up to 70 kg/m3), and fiber slenderness and length. The experimental program has been designed relying on the principles of statistical Design Of Experiments. This has allowed us to select a reduced number of combinations to be tested without any bias or loss of accuracy. A total of 81 POT specimens have been produced and tested. An accurate model for predicting the mode of bond failure has been developed. It relates splitting probability to the factors considered.
It has been proved that increasing fiber content restrains the risk of splitting failure. The favorable effect of fibers in preventing splitting failures has been revealed to be more important for higher concrete compressive strength values. Higher compressive strength values require higher concrete cover/diameter ratios for splitting failure to be prevented. Fiber slenderness and fiber length modify the effect of fiber content on splitting probability and therefore on minimum cover/diameter ratios required to prevent splitting failures. Two charts have been developed for estimating the minimum cover/diameter ratio required to prevent splitting. Predictive equations have been obtained for estimating bond strength and areas under the bond stress–slip curve as a function of the factors considered. Increasing fiber content has a slightly positive impact on bond strength, which is mainly determined by concrete compressive strength. On the contrary, fibers have a very important effect on the ductility of bond failure, as does concrete cover, as long as no splitting occurs. Multivariate analysis has proved that bond stress corresponding to the onset of slippage behaves independently from the rest of the bond stress–slip curve. The effect of fibers and concrete compressive strength on bond stress values corresponding to the onset of slips is mainly attributable to their influence on the material mechanical properties. On the contrary, the effect of fibers and concrete cover on the rest of the bond stress–slip curve is due to their structural role. / García Taengua, EJ. (2013). Bond of Reinforcing Bars to Steel Fiber Reinforced Concrete (SFRC) [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/32952
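A model of the kind described (relating splitting probability to cover/diameter ratio, fiber content, and compressive strength) is typically a logistic regression. The sketch below uses hypothetical placeholder coefficients, not the fitted values from the thesis's experimental program; only the signs are chosen to match the abstract's findings (fibers and cover reduce splitting risk, higher compressive strength increases it at fixed cover).

```python
import math

def splitting_probability(cover_ratio, fiber_kg_m3, fc_mpa,
                          b0=4.0, b1=-2.0, b2=-0.03, b3=0.04):
    """Illustrative logistic model for the probability of splitting failure.
    Coefficients b0..b3 are hypothetical placeholders, not fitted values."""
    z = b0 + b1 * cover_ratio + b2 * fiber_kg_m3 + b3 * fc_mpa
    return 1.0 / (1.0 + math.exp(-z))
```

With such a model, the minimum cover/diameter ratio that keeps splitting probability below a target can be read off by inverting the logistic, which is essentially what the thesis's design charts provide.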
523

Optimizing Urban Traffic Management Through AI with Digital Twin Simulation and Validation

Sioldea, Daniel 08 1900 (has links)
The number of vehicles on the road continuously increases, revealing a lack of robust and effective traffic management systems in urban settings. Urban traffic makes up a substantial portion of the total traffic problem, and current traffic light architecture noticeably limits traffic flow. This thesis focuses on developing an artificial intelligence-based smart traffic management system using a double duelling deep Q network (DDDQN), validated through a user-controlled 3D simulation, determining the system’s effectiveness. This work leverages current fisheye camera architecture to present a system that can be deployed into existing infrastructure with little intrusion. The challenges surrounding large computer vision datasets, and the challenges and limitations surrounding fisheye cameras, are discussed. The data and conditions required to replicate these features in a simulated environment are identified. Finally, a baseline traffic flow and traffic light phase model is created using camera data from the City of Hamilton. A DDDQN optimization algorithm used to reduce individual traffic light queue length and wait times is developed using the SUMO traffic simulator. The algorithm is trained over different maps and is then deployed onto a large map of various streets in the City of Hamilton. The algorithm is tested through a user-controlled driving simulator, observing excellent performance results over long routes. / Thesis / Master of Applied Science (MASc)
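The two ingredients of a double duelling deep Q network can be shown compactly: the dueling head aggregates a state value and mean-centered action advantages, and the double-Q target selects the action with the online network but values it with the target network. This is a generic sketch of the DDDQN mechanics with NumPy arrays standing in for network outputs, not the thesis's implementation.

```python
import numpy as np

def dueling_q(value, advantage):
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    return value + advantage - advantage.mean(axis=-1, keepdims=True)

def double_q_target(r, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: action chosen by the online net, valued by the
    target net, which reduces overestimation of Q-values."""
    a_star = np.argmax(q_online_next, axis=-1)
    bootstrapped = q_target_next[np.arange(len(a_star)), a_star]
    return r + gamma * (1.0 - done) * bootstrapped
```

In a traffic-light setting, actions would correspond to signal phase choices and rewards to negative queue lengths or waiting times.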
524

Enhancing Capabilities of Assistive Robotic Arms: Learning, Control, and Object Manipulation

Mehta, Shaunak A. 11 November 2024 (has links)
In this thesis, we explore methods to enable assistive robotic arms mounted on wheelchairs to assist disabled users with their daily activities. To effectively aid users, these robots must recognize a variety of tasks and provide intuitive control mechanisms. We focus on developing techniques that allow these assistive robots to learn diverse tasks, manipulate different types of objects, and simplify user control of these complex, high-dimensional systems. This thesis is structured around three key contributions. First, we introduce a method for assistive robots to autonomously learn complex, high-dimensional behaviors in a given environment and map them to a low-dimensional joystick interface without human demonstrations. Through controlled experiments and a user study, we show that this approach outperforms systems based on human-demonstrated actions, leading to faster task completion compared to industry-standard baselines. Second, we improve the efficiency of reinforcement learning for robotic manipulation tasks by introducing a waypoint-based algorithm. This approach frames task learning as a sequence of multi-armed bandit problems, where each bandit problem corresponds to a waypoint in the robot's trajectory. We introduce an approximate posterior sampling solution that builds the robot's motion one waypoint at a time. Our simulations and real-world experiments show that this approach achieves faster learning than state-of-the-art baselines. Finally, to address the challenge of manipulating a variety of objects, we introduce RIgid-SOft (RISO) grippers that combine soft-switchable adhesives with standard rigid grippers and propose a shared control framework that automates part of the grasping process. The RISO grippers allow users to manipulate objects using either rigid or soft grasps, depending on the task. 
Our user study reveals that, with the shared control framework and RISO grippers, users were able to grasp and manipulate a wide range of household objects effectively. The findings from this research emphasize the importance of integrating advanced learning algorithms and control strategies to improve the capabilities of assistive robots in helping users with their daily activities. By exploring different directions within the domain of assistive robotics, this thesis contributes to the development of methods that enhance the overall functionality of assistive robotic arms. / Master of Science / In this thesis, we explore ways to make robotic arms attached to wheelchairs more helpful for people with disabilities in their everyday lives. To be truly useful, these robots need to understand a variety of tasks and be easy for users to control. Our focus is on developing techniques that help these robots learn different tasks, handle different types of objects, and make controlling them simpler. The thesis is built around three main contributions. First, we introduce a way for robots to learn how to perform complex tasks on their own and then simplify controlling robots for those tasks so users can control the robot to perform different tasks using just a joystick. We show through experiments that this approach helps people complete tasks faster than systems that rely on human-taught actions. Second, we improve how robots learn to perform tasks using a more efficient learning method. This method breaks down tasks into smaller steps, and the robot learns how to move toward each step more quickly. Our tests show that this approach speeds up the learning process compared to other methods. Finally, we address the challenge of handling different types of objects by developing a new type of robotic gripper that combines soft and rigid gripping options. 
This gripper allows users to pick up and manipulate a wide variety of household objects more easily, thanks to a control system that helps automate part of the process. In our user study, people found it easier to use the new gripper to handle different items. Overall, this research highlights the importance of combining learning algorithms and user-friendly controls to make assistive robots better at helping people with their daily tasks. These contributions advance the development of robotic arms that can more effectively assist users.
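The second contribution frames each waypoint choice as a multi-armed bandit solved by approximate posterior sampling. A minimal version of that idea is Gaussian Thompson sampling over a discrete set of candidate waypoints, one bandit per waypoint in the trajectory. This sketch conveys the mechanism only; the prior and noise variances are assumed values, and the thesis's actual algorithm differs in its details.

```python
import random

class GaussianThompson:
    """Posterior (Thompson) sampling over discrete candidate waypoints,
    with a conjugate Gaussian reward model and zero-mean prior."""
    def __init__(self, n_arms, prior_var=1.0, noise_var=0.25):
        self.n = [0] * n_arms
        self.total = [0.0] * n_arms
        self.prior_var, self.noise_var = prior_var, noise_var

    def _posterior(self, i):
        # conjugate Gaussian posterior mean and variance for arm i
        precision = 1.0 / self.prior_var + self.n[i] / self.noise_var
        return (self.total[i] / self.noise_var) / precision, 1.0 / precision

    def sample(self, rng):
        # draw one value per arm from its posterior, pick the best draw
        draws = [rng.gauss(*self._posterior(i)) for i in range(len(self.n))]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        self.n[arm] += 1
        self.total[arm] += reward
```

Building the motion "one waypoint at a time" then means running one such bandit per waypoint, with the reward for a candidate reflecting task progress after the robot moves there.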
525

Altered Neural and Behavioral Associability-Based Learning in Posttraumatic Stress Disorder

Brown, Vanessa 24 April 2015 (has links)
Posttraumatic stress disorder (PTSD) is accompanied by marked alterations in cognition and behavior, particularly when negative, high-value information is present (Aupperle, Melrose, Stein, & Paulus, 2012; Hayes, Vanelzakker, & Shin, 2012). However, the underlying processes are unclear; such alterations could result from differences in how this high-value information is updated or in its effects on processing future information. To untangle the effects of different aspects of behavior, we used a computational psychiatry approach to disambiguate the roles of increased learning from previously surprising outcomes (i.e. associability; Li, Schiller, Schoenbaum, Phelps, & Daw, 2011) and from large value differences (i.e. prediction error; Montague, 1996; Schultz, Dayan, & Montague, 1997) in PTSD. Combat-deployed military veterans with varying levels of PTSD symptoms completed a learning task while undergoing fMRI; behavioral choices and neural activation were modeled using reinforcement learning. We found that associability-based loss learning at a neural and behavioral level increased with PTSD severity, particularly with hyperarousal symptoms, and that the interaction of PTSD severity and neural markers of associability-based learning predicted behavior. In contrast, PTSD severity did not modulate prediction error neural signal or behavioral learning rate. These results suggest that increased associability-based learning underlies neurobehavioral alterations in PTSD. / Master of Science
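The associability construct cited here (Li et al., 2011) is usually formalized as a hybrid Pearce-Hall / Rescorla-Wagner model: the prediction error updates value, and the absolute prediction error updates associability, which gates future learning. A one-trial sketch, with illustrative parameter values:

```python
def hybrid_update(V, assoc, reward, kappa=0.3, eta=0.5):
    """One trial of the hybrid associability model: surprising outcomes
    raise associability, which scales the size of future value updates."""
    delta = reward - V                             # prediction error
    V = V + kappa * assoc * delta                  # value update gated by associability
    assoc = (1 - eta) * assoc + eta * abs(delta)   # associability tracks |surprise|
    return V, assoc, delta
```

The study's finding maps onto this model as a steeper associability influence (higher effective kappa·assoc weighting) in participants with more severe hyperarousal symptoms, with delta itself unchanged.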
526

Cocaine Use Modulates Neural Prediction Error During Aversive Learning

Wang, John Mujia 08 June 2015 (has links)
Cocaine use has contributed to 5 million individuals falling into the cycle of addiction. Prior research in cocaine dependence mainly focused on rewards. Losses also play a critical role in cocaine dependence, as dependent individuals fail to avoid social, health, and economic losses even when they acknowledge them. However, dependent individuals are extremely adept at escaping negative states like withdrawal. To further understand whether cocaine use may contribute to dysfunctions in aversive learning, this paper uses fMRI and an aversive learning task to examine cocaine dependent individuals abstinent from cocaine use (C-) and using as usual (C+). Specifically of interest is the neural signal representing actual loss compared to the expected loss, better known as prediction error (δ), which individuals use to update future expectations. When abstinent (C-), dependent individuals exhibited higher positive prediction error (δ+) signal in their striatum than when they were using as usual. Furthermore, their striatal δ+ signal enhancements from drug abstinence were predicted by higher positive learning rate (α+) enhancements. However, no relationships were found between drug abstinence enhancements to negative learning rates (α−) and negative prediction error (δ-) striatal signals. Abstinent (C-) individuals' striatal δ+ signal was predicted by longer drug use history, signifying possible relief learning adaptations with time. Lastly, craving measures, especially the desire to use cocaine and positive effects of cocaine, also positively correlated with C- individuals' striatal δ+ signal. This suggests possible relief learning adaptations in response to higher craving and withdrawal symptoms.
Taken together, enhanced striatal δ+ signal when abstinent and adaptations in relief learning provide evidence supporting dependent individuals' lack of aversive learning ability while using as usual and their enhanced relief learning ability for avoiding negative situations such as withdrawal, suggesting a neurocomputational mechanism that pushes the dependent individual to maintain dependence. / Master of Science
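The α+/α− distinction in this abstract corresponds to a value update rule with separate learning rates for positive and negative prediction errors, a standard construction in computational-psychiatry models. A minimal sketch with illustrative default rates:

```python
def asymmetric_update(V, outcome, lr_pos=0.3, lr_neg=0.1):
    """Value update with separate learning rates for positive (lr_pos)
    and negative (lr_neg) prediction errors."""
    delta = outcome - V                       # prediction error
    lr = lr_pos if delta >= 0 else lr_neg     # valence-dependent learning rate
    return V + lr * delta, delta
```

In this framework, the study's result is that abstinence raised the effective lr_pos (and the associated striatal δ+ signal) while leaving lr_neg and δ- signals unrelated to abstinence.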
527

Three-Dimensional Finite Difference Analysis of Geosynthetic Reinforcement Used in Column-Supported Embankments

Jones, Brenton Michael 14 January 2008 (has links)
Column-supported, geosynthetic-reinforced embankments provide effective geotechnical foundations for applications in areas of weak subgrade soils. The system consists of a soil bridging layer with one or more embedded layers of geosynthetic reinforcement supported by driven or deep mixed columnar piles. The geosynthetic promotes load transfer within the bridging layer to the columns, allowing for larger column spacings and varied alignments. This technique is generally used when differential settlements of the embankment or adjacent structures are a concern and to minimize construction time. Recent increase in the popularity of this composite system has generated the need to further investigate its behavior and soil-structure interaction. Current models of geosynthetics are oversimplified and do not represent the true three-dimensional nature of the material. Such simplifications include treating the geosynthetic as a one-dimensional cable as well as neglecting stress concentrations and pile orientations. In this thesis, a complete three-dimensional analysis of the geosynthetic is performed. The geosynthetic was modeled as a thin flexible plate in a single square unit cell of the embankment. The principle of minimum potential energy was then applied, utilizing central finite difference equations. Energy components from vertical loading, soil and column support, as well as bending and membrane stiffness of the geosynthetic are considered. Three pile orientation types were implemented: square piles, circular piles, and square piles rotated 45° to the edges of the unit cell. Each of the pile orientations was analyzed using two distinct parameter sets that are investigated in previously published and ongoing research. Vertical and in-plane deflections, stress resultants, and strains were determined and compared to other geosynthetic models and design guides.
Results of each parameter set and pile orientation were also compared to provide design recommendations for geosynthetic-reinforced column-supported embankments. / Master of Science
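The one-dimensional cable simplification the thesis moves beyond makes the central-finite-difference idea concrete in a few lines: minimizing the potential energy of a tensioned cable under a uniform load is equivalent to solving -T u'' = q with fixed ends, which discretizes to a tridiagonal system. This is a 1-D analogue for illustration only; the thesis's model is a full 3-D thin plate with bending and membrane terms.

```python
import numpy as np

def cable_deflection(n=99, tension=10.0, load=1.0, length=1.0):
    """Central finite-difference solve of the 1-D cable model:
    -T u'' = q on (0, L) with u = 0 at both ends."""
    h = length / (n + 1)
    # central difference -T (u[i-1] - 2 u[i] + u[i+1]) / h^2 = q
    # gives a tridiagonal system A u = b at the n interior nodes
    A = (np.diag([2.0] * n)
         + np.diag([-1.0] * (n - 1), 1)
         + np.diag([-1.0] * (n - 1), -1))
    b = np.full(n, load * h * h / tension)
    return np.linalg.solve(A, b)
```

Because the exact solution u(x) = q x (L - x) / (2T) is quadratic, the central difference reproduces it to machine precision, with midspan deflection q L² / (8T).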
528

Sample Complexity of Incremental Policy Gradient Methods for Solving Multi-Task Reinforcement Learning

Bai, Yitao 05 April 2024 (has links)
We consider a multi-task learning problem, where an agent is presented with N reinforcement learning tasks. To solve this problem, we are interested in studying the gradient approach, which iteratively updates an estimate of the optimal policy using the gradients of the value functions. The classic policy gradient method, however, may be expensive to implement in the multi-task setting as it requires access to the gradients of all the tasks at every iteration. To circumvent this issue, in this thesis we propose to study an incremental policy gradient method, where the agent uses the gradient of only one task at each iteration. Our main contribution is to provide theoretical results to characterize the performance of the proposed method. In particular, we show that incremental policy gradient methods converge to the optimal value of the multi-task reinforcement learning objectives at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments. / Master of Science / First, we introduce a popular machine learning technique called Reinforcement Learning (RL), where an agent, such as a robot, uses a policy to choose an action, like moving forward, based on observations from sensors like cameras. The agent receives a reward that helps judge if the policy is good or bad. The objective of the agent is to find a policy that maximizes the cumulative reward it receives by repeating the above process. RL has many applications, including Cruise autonomous cars, Google industry automation, training ChatGPT language models, and Walmart inventory management. However, RL suffers from task sensitivity and requires a lot of training data. For example, if the task changes slightly, the agent needs to train the policy from the beginning.
This motivates the technique called Multi-Task Reinforcement Learning (MTRL), where different tasks give different rewards and the agent maximizes the sum of cumulative rewards of all the tasks. We focus on the incremental setting where the agent can only access the tasks one by one randomly. In this case, we only need one agent and it is not required to know which task it is performing. We show that the incremental policy gradient methods we proposed converge to the optimal value of the MTRL objectives at a sublinear rate O(1/√k), where k is the number of iterations. To illustrate its performance, we apply the proposed method to solve a simple multi-task variant of GridWorld problems, where an agent seeks to find a policy to navigate effectively in different environments.
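The incremental setting described above can be sketched generically: at each iteration the agent samples one task and follows only that task's gradient, with a diminishing step size in the O(1/√k) spirit. The quadratic demo tasks below are stand-ins for per-task policy value gradients, chosen so the optimum of the summed objective is known; this is an illustration of the update pattern, not the thesis's algorithm or analysis.

```python
import random

def incremental_policy_gradient(task_grads, theta0, iters=2000, step0=0.5, seed=0):
    """Incremental gradient ascent: one randomly sampled task's gradient
    per iteration, with a diminishing step size step0 / sqrt(k)."""
    rng = random.Random(seed)
    theta = theta0
    for k in range(1, iters + 1):
        g = task_grads[rng.randrange(len(task_grads))](theta)  # one task only
        theta += (step0 / k ** 0.5) * g
    return theta
```

For two concave tasks f_i(θ) = -(θ - c_i)², the summed objective is maximized at the mean of the c_i, which the iterates approach despite never seeing both gradients at once.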
529

Reinforcing Reachable Routes

Thirunavukkarasu, Muthukumar 13 May 2004 (has links)
Reachability routing is a newly emerging paradigm in networking, where the goal is to determine all paths between a sender and a receiver. It is becoming relevant with the changing dynamics of the Internet and the emergence of low-bandwidth wireless/ad hoc networks. This thesis presents the case for reinforcement learning (RL) as the framework of choice to realize reachability routing, within the confines of the current Internet backbone infrastructure. The setting of the reinforcement learning problem offers several advantages, including loop resolution, multi-path forwarding capability, cost-sensitive routing, and minimizing state overhead, while maintaining the incremental spirit of the current backbone routing algorithms. We present the design and implementation of a new reachability algorithm that uses a model-based approach to achieve cost-sensitive multi-path forwarding. Performance assessment of the algorithm in various troublesome topologies shows consistently superior performance over classical reinforcement learning algorithms. Evaluations of the algorithm based on different criteria on many types of randomly generated networks as well as realistic topologies are presented. / Master of Science
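One classical RL baseline for this kind of adaptive routing, of the sort the thesis's model-based algorithm is compared against, is Q-routing (Boyan & Littman style): each node keeps an estimated delivery time to each destination via each neighbor and refines it from observed transit times. A single-update sketch, with the nested-dict layout an assumption of this illustration:

```python
def q_routing_update(Q, x, y, dest, transit, lr=0.5):
    """One Q-routing update: node x revises its estimated delivery time
    to dest via neighbor y, bootstrapping from y's best estimate.
    Q is laid out as Q[node][dest][neighbor] -> estimated time."""
    best_from_y = min(Q[y][dest].values()) if Q[y][dest] else 0.0
    old = Q[x][dest][y]
    Q[x][dest][y] = old + lr * (transit + best_from_y - old)
    return Q[x][dest][y]
```

Because every neighbor retains its own estimate rather than collapsing to a single next hop, such tables naturally support the multi-path forwarding that reachability routing requires.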
530

Advances in Non-Stationary Sequential Decision-Making

Suk, Joseph January 2024 (has links)
We study the problem of sequential decision-making (e.g. multi-armed bandits, contextual bandits, reinforcement learning) under changing environments, or distribution shifts. Ideally, one aims to automatically adapt/self-tune to unknown changes in distribution, and restart exploration as needed. While recent theoretical breakthroughs show this is possible in a broad sense, such works contend that the learner should restart procedures upon experiencing any change, leading to worst-case (regret) rates. This leaves open whether faster rates are possible, adaptively, if few changes in distribution are actually severe, e.g., involve no change in best action. This thesis initiates a broad research program giving positive answers to these open questions across several instances. In particular, we begin at non-stationary bandits and show a much weaker notion of change can be adapted to, which can yield significantly faster rates than previously known, whether expressed in terms of the number of best-action switches (for which no adaptive procedure was known) or in terms of previously studied variation or smoothness measures. We then generalize these results to non-parametric contextual bandits and dueling bandits. As a result, we substantially improve the theoretical state-of-the-art performance guarantees for these problems and, in many cases, tightly characterize the statistical limits of sequential decision-making under changing environments.
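The restart-on-any-change baseline the abstract contrasts against can be made concrete with UCB1 plus a crude change detector: reset all statistics whenever the played arm's sliding-window mean drifts far from its long-run mean. This naive sketch (window and threshold are assumed values) restarts on large mean shifts regardless of whether the best action actually changed, which is exactly the conservatism the thesis improves on.

```python
import math

def ucb_with_restarts(rewards, window=50, thresh=0.5):
    """UCB1 with a naive restart rule for non-stationary bandits.
    rewards: list of per-round reward vectors, one entry per arm."""
    n_arms = len(rewards[0])
    counts, sums = [0] * n_arms, [0.0] * n_arms
    recent = {i: [] for i in range(n_arms)}
    total, picks, restarts = 0, [], 0
    for rs in rewards:
        total += 1
        if 0 in counts:                  # play each arm once first
            a = counts.index(0)
        else:                            # then pick the highest UCB1 index
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(total) / counts[i]))
        r = rs[a]
        counts[a] += 1
        sums[a] += r
        recent[a] = (recent[a] + [r])[-window:]
        picks.append(a)
        # restart when the window mean departs from the long-run mean
        if counts[a] >= window and abs(
                sum(recent[a]) / len(recent[a]) - sums[a] / counts[a]) > thresh:
            counts, sums = [0] * n_arms, [0.0] * n_arms
            recent = {i: [] for i in range(n_arms)}
            total, restarts = 0, restarts + 1
    return picks, restarts
```

On a stationary stream this never restarts and concentrates on the best arm; after a large mean shift it pays a detection delay and then relearns from scratch, the regret cost the thesis's weaker notion of change avoids when the best action is unchanged.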
