About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
211

Reinforcement Learning of Repetitive Tasks for Autonomous Heavy-Duty Vehicles

Lindesvik Warma, Simon January 2020 (has links)
Many industrial applications of heavy-duty autonomous vehicles involve repetitive manoeuvres, such as vehicle parking and hub-to-hub transportation. This thesis explores the possibility of using information from previous executions of specific manoeuvres, via reinforcement learning, to improve performance in future iterations. The manoeuvres are a straight-line path and a constantly curved path. A proportional-integral control strategy is designed to control the vehicle, and the controller is updated between iterations using a policy gradient method. A rejection sampling procedure is introduced to enforce the stability of the control system; this is necessary since the general reinforcement learning and policy gradient frameworks do not consider stability. The performance of the rejection sampling procedure is improved using ideas from simulated annealing. The performance improvement of the vehicle is evaluated through simulations: linear and nonlinear vehicle models are evaluated on a straight-line path and a constantly curved path. The simulations show that the vehicle improves its ability to track the reference path for all evaluation models and scenarios. Finally, the simulations also show that the controlled system remains stable throughout the learning process. / Autonomous vehicles are an important piece of the puzzle in future transport solutions and industrial environments, both in terms of climate and safety. Many manoeuvres that industrial vehicles perform are repetitive, for example parking. This work explores the possibility of learning from previous attempts at a manoeuvre in order to improve the vehicle's ability to perform it. A proportional-integral control structure is used to steer the vehicle; it is a state feedback in which the controller consists of two proportional-integral controllers. The control system is initialized to be stable, and the vehicle is allowed to perform one iteration of the manoeuvre. The controller is updated between each iteration using reinforcement learning, meaning that information from previous attempts at the manoeuvre is used to improve the vehicle's ability to follow the reference path; the reinforcement learning thus provides instructions on how the controller should be updated based on how the vehicle performed during the previous iteration. A sampling procedure is implemented to ensure the stability of the control system, since the reinforcement learning does not take this into account; the sampling procedure is also designed to minimize negative effects on the learning process. The algorithm is analysed by simulating the vehicle with both linear and nonlinear evaluation models in two scenarios: a straight path and a path with constant curvature. The simulations show that the vehicle improves its ability to follow the reference paths for all evaluation models, and that the control system remains stable throughout the learning process.
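To make the approach concrete, here is a minimal sketch of the kind of iteration-wise update the abstract describes: PI gains adjusted by a policy-gradient step, with a rejection step that discards candidates failing a closed-loop stability check. The scalar error model, gain values, and learning rates below are illustrative assumptions, not the thesis's vehicle model or exact algorithm.

```python
import numpy as np

# Toy scalar lateral-error model (an assumption, not the thesis's model):
# e[k+1] = a*e[k] + b*u[k], with an integrator state z for the PI controller.
a, b, dt = 0.98, 0.1, 0.05

def is_stable(kp, ki):
    # Spectral radius of the augmented (error, integrator) closed loop.
    M = np.array([[a - b * kp, -b * ki],
                  [dt,          1.0  ]])
    return np.max(np.abs(np.linalg.eigvals(M))) < 1.0

def episode_cost(kp, ki, steps=200):
    e, z, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        u = -kp * e - ki * z          # PI feedback on the tracking error
        e = a * e + b * u
        z += dt * e
        cost += e * e
    return cost

theta = np.array([0.5, 0.1])          # initial stabilizing gains [kp, ki]
sigma, lr = 0.05, 1e-3                # exploration noise, gradient step size
rng = np.random.default_rng(0)
for it in range(50):
    # Rejection sampling: resample perturbed gains until a stable candidate
    # is found. The thesis additionally anneals this step (simulated
    # annealing); that refinement is omitted here.
    while True:
        cand = theta + sigma * rng.standard_normal(2)
        if is_stable(*cand):
            break
    # Finite-difference / score-function style gradient estimate.
    grad = (episode_cost(*cand) - episode_cost(*theta)) * (cand - theta) / sigma**2
    new_theta = theta - lr * grad
    if is_stable(*new_theta):         # accept the update only if it stays stable
        theta = new_theta
print("learned gains:", theta)
```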
212

Intelligent Device Selection in Federated Edge Learning with Energy Efficiency

Peng, Cheng 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Due to the increasing demand from mobile devices for real-time response from cloud computing services, federated edge learning (FEL) has emerged as a new computing paradigm that utilizes edge devices to achieve efficient machine learning while protecting their data privacy. Implementing efficient FEL is challenged by devices' limited computing and communication resources, as well as unevenly distributed datasets, which has inspired several studies focusing on device selection to optimize time consumption and data diversity. However, these studies fail to consider the energy consumption of edge devices given their limited power supply, which can seriously affect the cost-efficiency of FEL through unexpected device dropouts. To fill this gap, we propose a device selection model capturing both energy consumption and data diversity optimization, under constraints on time consumption and training data amount. We then solve the optimization problem by reformulating the original model and designing a novel algorithm, named E2DS, which greatly reduces the time complexity. Comparing against two classical FEL schemes, we validate the superiority of our proposed device selection mechanism with extensive experimental results. Furthermore, in a real FEL environment, multiple tasks occupy each device's CPU at the same time, so the CPU frequency available for training fluctuates constantly, which may lead to large errors in estimating energy consumption. To solve this problem, we deploy reinforcement learning to learn the frequency so that the estimate approaches its real value. And rather than increasing data diversity, we consider a more direct way to improve convergence speed, using loss values. We then formulate an optimization problem that minimizes energy consumption and maximizes loss values to select an appropriate set of devices. After reformulating the problem, we design a new algorithm, FCE2DS, which achieves better convergence speed and accuracy. Finally, we compare the proposed scheme with the previous and traditional schemes to verify its improvement in multiple aspects.
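The abstract does not spell out E2DS itself, so the following is only a toy sketch of energy- and diversity-aware device selection under a synchronous round deadline: devices slower than the deadline are excluded, and the rest are added greedily by marginal label-coverage gain per joule until an energy budget is exhausted. All names, scores, and budgets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20
energy = rng.uniform(1.0, 5.0, N)      # estimated training energy per device (toy)
latency = rng.uniform(0.5, 2.0, N)     # estimated local round time per device (toy)
labels = [set(rng.choice(10, size=rng.integers(2, 6), replace=False))
          for _ in range(N)]           # label classes held by each device

def coverage(chosen):
    # Toy data-diversity proxy: number of distinct label classes covered.
    return len(set().union(*(labels[i] for i in chosen))) if chosen else 0

def select_devices(deadline=1.5, energy_budget=12.0):
    chosen, spent = [], 0.0
    candidates = [i for i in range(N) if latency[i] <= deadline]
    while True:
        best, best_score = None, 0.0
        for i in candidates:
            if i in chosen or spent + energy[i] > energy_budget:
                continue
            gain = coverage(chosen + [i]) - coverage(chosen)
            score = (gain + 1e-9) / energy[i]   # diversity gain per joule
            if score > best_score:
                best, best_score = i, score
        if best is None:
            return chosen
        chosen.append(best)
        spent += energy[best]

print("selected devices:", select_devices())
```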
213

Multi-criteria decision making using reinforcement learning and its application to food, energy, and water systems (FEWS) problem

Deshpande, Aishwarya 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Multi-criteria decision making (MCDM) methods have evolved over the past several decades. In today's world of rapidly growing industries, MCDM has proven significant in many application areas. In this study, a decision-making model is devised using reinforcement learning to solve multi-criteria optimization problems. A learning automata algorithm is used to identify an optimal solution in the presence of single and multiple environments (criteria) using Pareto optimality. An application of this model is also discussed, in which it provides an optimal solution to the food, energy, and water systems (FEWS) problem.
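As a concrete illustration of the learning automata component (single environment only; the multi-environment Pareto layer is beyond a short sketch), here is the classical linear reward-inaction (L_RI) update on a toy stochastic environment. The reward probabilities and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions = 4
p = np.full(n_actions, 1.0 / n_actions)   # action-probability vector
step_size = 0.05

# Toy environment: each action is rewarded with an unknown fixed probability.
reward_prob = np.array([0.2, 0.8, 0.5, 0.4])

for _ in range(5000):
    a = rng.choice(n_actions, p=p)
    if rng.random() < reward_prob[a]:
        # Linear reward-inaction: shift probability mass toward the
        # rewarded action; do nothing on penalty.
        p = (1 - step_size) * p
        p[a] += step_size

print("final action probabilities:", np.round(p, 3))  # mass concentrates on action 1
```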
214

Optimal Control and Reinforcement Learning of Switched Systems

Chen, Hua January 2018 (has links)
No description available.
215

Reinforcement Learning For Multiple Time Series

Singh, Isha January 2019 (has links)
No description available.
216

A Classification and Visualization System for Lower-Limb Activities Analysis With Musculoskeletal Modeling

Zheng, Jianian 01 June 2020 (has links)
No description available.
217

Market Timing strategy through Reinforcement Learning

HE, Xuezhong January 2021 (has links)
This dissertation implements an optimal trading strategy based on machine learning and extreme value theory (EVT) to obtain an excess return on investments in the capital market. The trading strategy outperforms the benchmark S&P 500 index with higher returns and lower volatility through effective market timing. The dissertation begins by modeling market tail risk using EVT and reinforcement learning, distinguishing the approach from the traditional value-at-risk method. EVT is used to extract the characteristics of the tail risk, which serve as inputs for reinforcement learning. This process proves effective for market timing: the trading strategy avoids market crashes and achieves a long-term excess return. In sum, this study makes several contributions. First, it takes a new approach to analyzing stock prices (the S&P 500 index serves as the stock throughout), combining EVT and reinforcement learning to study price tail risk and predict stock crashes efficiently, which is a new method for tail risk research; the model can predict a crash or provide the probability of risk, and a trading strategy can then be built on it. Second, the dissertation provides a dynamic market-timing trading strategy that significantly outperforms the market index with lower volatility and a higher Sharpe ratio; moreover, the dynamic trading process gives investors an intuitive sense of the stock market and helps in decision-making. Third, the success of the strategy shows that the combination of EVT and reinforcement learning can predict stock crashes well, which is a substantial improvement in extreme-event research and deserves further study. / Business Administration/Finance
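A hedged sketch of the EVT step the abstract alludes to: fitting a generalized Pareto distribution to losses beyond a high threshold (peaks-over-threshold) and deriving tail features that an RL agent could consume as state inputs. The synthetic returns, threshold choice, and feature set are assumptions, not the dissertation's actual data or features.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(2)
returns = 0.01 * rng.standard_t(df=4, size=2000)   # synthetic heavy-tailed returns

# Peaks-over-threshold: model losses beyond a high quantile with a
# generalized Pareto distribution (standard EVT practice).
losses = -returns
u = np.quantile(losses, 0.95)                      # threshold (an assumption)
exceedances = losses[losses > u] - u
xi, _, beta = genpareto.fit(exceedances, floc=0)   # tail shape and scale

# Tail features of the kind an RL agent's state could include:
zeta = float((losses > u).mean())                  # exceedance frequency
var99 = u + beta / xi * ((0.01 / zeta) ** (-xi) - 1)  # 99% VaR via the POT formula
print(f"shape xi={xi:.3f}, scale beta={beta:.4f}, VaR(99%)={var99:.4f}")
```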
218

Enhancing Traffic Efficiency of Mixed Traffic Using Control-based and Learning-based Connected and Autonomous Systems

Young Joun Ha (8065802) 15 August 2023 (has links)
Inefficient traffic operations have far-reaching consequences not just for travel mobility but also for public health, social equity, and economic prosperity. Congestion, a key symptom of inefficient operations, can lead to increased emissions, accidents, and productivity loss. Therefore, advancements and policies in transportation engineering require careful scrutiny, not only to prevent unintended adverse consequences but also to capitalize on opportunities for improvement. In spite of significant efforts to enhance traffic mobility and safety, human control of vehicles remains prone to errors such as inattention, impatience, and intoxication. Connected and autonomous vehicles (CAVs) are seen as a great opportunity to address human-related inefficiencies. This dissertation focuses on connectivity and automation and investigates the synergies between these technologies. First, a deep reinforcement learning (DRL) based strategy is proposed to enable agents to address the dynamic nature of inputs in traffic environments, to capture proximal and distant information, and to facilitate learning in rapidly changing traffic. The strategy is applied to alleviate congestion at highway bottlenecks by training a small number of CAVs to cooperatively reduce congestion through DRL. Second, to address congestion at intersections, the dissertation introduces a fog-based graphic RL (FG-RL) framework. This approach allows traffic signals across multiple intersections to form a cooperative coalition, sharing information for signal-timing prescriptions. Large-scale traffic signal optimization is computationally inefficient, so the proposed FG-RL approach breaks networks down into smaller fog nodes that function as local centralization points within a decentralized system. Doing so allows a bottom-up solution approach for decomposing large traffic networks. Furthermore, the dissertation pioneers the notion of using a small CAV fleet, selected from any existing shared autonomous mobility service (SAMS), to assist city road agencies in achieving string-stable driving in locally congested urban traffic. These vehicles are dispersed throughout the network to perform their primary function of providing ride-share services. However, when they encounter severe congestion, they cooperate to be rerouted and to undertake traffic-stabilizing maneuvers that smooth the traffic and reduce congestion. The smoothing behavior is learned through DRL, while the rerouting is facilitated through the proposed constrained entropy-based dynamic AV routing algorithm (CEDAR).
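The FG-RL decomposition can be pictured with a toy sketch: a grid of signalized intersections partitioned into "fog nodes", each acting as a local centralization point for one RL agent. The grid layout and group size below are assumptions, not the dissertation's network.

```python
# Toy bottom-up decomposition of a grid of signalized intersections into
# fog nodes (local coordination groups), in the spirit of FG-RL. The 6x4
# grid and 2x2 grouping are illustrative assumptions.
GRID_W, GRID_H, BLOCK = 6, 4, 2

def fog_node_of(x, y):
    # Each fog node centralizes a BLOCK x BLOCK patch of intersections.
    return (x // BLOCK, y // BLOCK)

fog_members = {}
for x in range(GRID_W):
    for y in range(GRID_H):
        fog_members.setdefault(fog_node_of(x, y), []).append((x, y))

# One RL agent per fog node would prescribe signal timings for its members
# and share boundary information with neighboring fog nodes.
for node, members in sorted(fog_members.items()):
    print(f"fog node {node}: {members}")
```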
219

Reinforcement Learning assisted Adaptive difficulty of Proof of Work (PoW) in Blockchain-enabled Federated Learning

Sethi, Prateek 10 August 2023 (has links)
This work addresses the challenge of heterogeneity in blockchain mining, particularly in the context of consortium and private blockchains. The motivation stems from the need to ensure fairness and efficiency in the Proof of Work (PoW) consensus mechanism of blockchain technology. Existing consensus algorithms, such as PoW, PoS, and PoB, have succeeded in public blockchains but face challenges when miners are heterogeneous. This thesis highlights the significance of considering miners' computing power and resources in PoW consensus mechanisms to enhance efficiency and fairness. It explores the implications of heterogeneity in blockchain mining in various applications, such as Federated Learning (FL), which aims to train machine learning models collaboratively across distributed devices. The research objectives of this work involve developing novel RL-based techniques to address the heterogeneity problem in consortium blockchains. Two proposed approaches, RL-based Miner Selection (RL-MS) and RL-based Miner and Difficulty Selection (RL-MDS), focus on selecting miners and dynamically adapting the difficulty of PoW based on the computing power of the chosen miners. The contributions of this research include the proposed RL-based techniques, modifications to the Ethereum code for dynamic adaptation of Proof of Work difficulty (PoW-D), integration of the Commonwealth Cyber Initiative (CCI) xG testbed with an AI/ML framework, implementation of a simulator for experimentation, and evaluation of different RL algorithms. The research also includes additional contributions in Open Radio Access Network (O-RAN) and smart cities. The proposed research has significant implications for achieving fairness and efficiency in blockchain mining in consortium and private blockchains. By leveraging reinforcement learning techniques and accounting for the heterogeneity of miners, this work contributes to improving the consensus mechanisms and performance of blockchain-based systems. / Master of Science / Technological advancement has led to devices with powerful yet heterogeneous computational resources. Due to heterogeneity in the computing power of miner nodes in a blockchain, the PoW consensus mechanism is unfair: more powerful devices have a higher chance of mining a block and gaining from the mining process. Additionally, PoW consensus introduces delay due to mining time and block propagation time. This work uses reinforcement learning to address the challenge of heterogeneity in a private Ethereum blockchain. It also introduces a time constraint to ensure efficient blockchain performance for time-critical applications.
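To make the difficulty-adaptation idea concrete, here is a contextual-bandit-style sketch: given the selected miner's compute class, an agent learns which difficulty level keeps block times near a target, one plausible fairness criterion. The mining-time model, reward, and all constants are assumptions, not the thesis's RL-MDS design.

```python
import numpy as np

rng = np.random.default_rng(3)
hash_rates = np.array([1.0, 2.0, 4.0, 8.0])    # miner compute classes (toy units)
difficulties = np.array([1.0, 2.0, 4.0, 8.0])  # selectable PoW difficulty levels
TARGET = 1.0                                    # desired block time (toy units)

Q = np.zeros((len(hash_rates), len(difficulties)))
eps, lr = 0.1, 0.2

for _ in range(20000):
    s = rng.integers(len(hash_rates))           # compute class of the chosen miner
    a = rng.integers(len(difficulties)) if rng.random() < eps else int(np.argmax(Q[s]))
    # Toy model: mining time is exponential with mean difficulty / hash rate.
    block_time = rng.exponential(difficulties[a] / hash_rates[s])
    reward = -abs(block_time - TARGET)          # fairness: equalize block times
    Q[s, a] += lr * (reward - Q[s, a])          # contextual-bandit Q update

print("learned difficulty per compute class:", difficulties[np.argmax(Q, axis=1)])
```

Under this toy model, the learned mapping assigns higher difficulty to more powerful miners, which is exactly the equalizing behavior the thesis aims for.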
220

Online Model-Free Distributed Reinforcement Learning Approach for Networked Systems of Self-organizing Agents

Chen, Yiqing 22 December 2021 (has links)
Control of large groups of robotic agents is driven by applications in the military, aeronautics and astronautics, transportation networks, and environmental monitoring. Cooperative control of networked multi-agent systems aims at driving the behavior of the group via feedback control inputs that encode the group's dynamics based on information sharing, with inter-agent communications that can be time-varying and spatially non-uniform. Notably, local interaction rules can induce coordinated behaviour, provided the network topology is suitable. Distributed learning paradigms are often necessary for this class of systems to operate autonomously and robustly, without the need for external units providing centralized information. In contrast to model-based protocols, which can be computationally prohibitive due to their mathematical complexity and feedback-information requirements, we present an online model-free algorithm for a class of nonlinear tracking problems with unknown system dynamics. This method prescribes the actuation forces of agents so that they follow the time-varying trajectory of a moving target. The tracking problem is addressed by an online value iteration process that uses measurements collected along the trajectories. A set of simulations illustrates that the presented algorithm functions well in various reference-tracking scenarios.
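As a loose single-agent illustration of learning to track from trajectory measurements alone (the thesis's method is a distributed, networked value iteration; this tabular sketch shows only the model-free ingredient), consider a double-integrator agent tracking a sinusoidal reference via Q-learning. The dynamics, discretization, and reference signal are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
dt, forces = 0.1, np.array([-1.0, 0.0, 1.0])   # discrete actuation forces (toy)
NBINS, LIM = 15, 3.0

def bin_state(err, vel):
    # Discretize (tracking error, velocity) into a tabular state.
    f = lambda v: int(np.clip((v + LIM) / (2 * LIM) * NBINS, 0, NBINS - 1))
    return f(err), f(vel)

Q = np.zeros((NBINS, NBINS, len(forces)))
eps, lr, gamma = 0.2, 0.1, 0.95

for episode in range(300):
    pos, vel, t = rng.uniform(-1, 1), 0.0, 0.0
    for _ in range(200):
        ref = np.sin(0.5 * t)                   # moving target's trajectory
        s = bin_state(pos - ref, vel)
        a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q[s]))
        vel += dt * forces[a]                   # double-integrator agent (assumed)
        pos += dt * vel
        t += dt
        err = pos - np.sin(0.5 * t)
        s2 = bin_state(err, vel)
        r = -err * err                          # cost measured along the trajectory
        Q[s][a] += lr * (r + gamma * np.max(Q[s2]) - Q[s][a])

print("greedy value at zero error:", np.max(Q[bin_state(0.0, 0.0)]))
```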
