About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.

Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
231

Stochastic Game Theory Applications for Power Management in Cognitive Networks

Fung, Sham 24 April 2014 (has links)
No description available.
232

Mobile robot navigation in hilly terrains

Tennety, Srinivas 23 September 2011 (has links)
No description available.
233

Hierarchical Sampling for Least-Squares Policy Iteration

Schwab, Devin 26 January 2016 (has links)
No description available.
234

Reinforcement Learning Based Generation of Highlighted Map for Mobile Robot Localization and Its Generalization to Particle Filter Design

Yoshimura, Ryota 23 May 2022 (has links)
Kyoto University / New-system doctoral course / Doctor of Engineering / Kō No. 24103 / Engineering Doctorate No. 5025 / 新制||工||1784 (University Library) / Department of Aeronautics and Astronautics, Graduate School of Engineering, Kyoto University / (Chief examiner) Professor Kenji Fujimoto, Professor Yoshito Ohta, Associate Professor Ichiro Maruta, Professor Kei Senda / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM
235

Evaluating the effects of hyperparameter optimization in VizDoom

Olsson, Markus, Malm, Simon, Witt, Kasper January 2022 (has links)
Reinforcement learning is a machine learning technique in which an artificial intelligence agent is guided by positive and negative rewards to learn strategies. In addition to the reward, the agent's behavior is shaped by its hyperparameters, the values that control how the agent learns. These hyperparameters are rarely disclosed in contemporary research, making it hard to estimate the value of optimizing them. This study compares three popular reinforcement learning algorithms, Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C) and Deep Q Network (DQN), and investigates the effects of optimizing several hyperparameters for each algorithm. All three algorithms performed significantly better after hyperparameter optimization. A2C showed the largest performance increase after hyperparameter optimization, and PPO performed the best of the three algorithms both with default and with optimized hyperparameters.
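The abstract does not specify the optimization setup; the sketch below illustrates the general approach of tuning PPO hyperparameters with a search library. It assumes stable-baselines3, Optuna, and Gymnasium, substitutes CartPole-v1 for a VizDoom scenario, and uses illustrative parameter ranges that are not the study's.

```python
# Sketch: tuning PPO hyperparameters with Optuna (assumed libraries:
# stable-baselines3, optuna, gymnasium). CartPole-v1 stands in for a
# VizDoom scenario; ranges below are illustrative.
import gymnasium as gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "gamma": trial.suggest_float("gamma", 0.9, 0.9999),
        "n_steps": trial.suggest_categorical("n_steps", [128, 256, 512]),
        "ent_coef": trial.suggest_float("ent_coef", 1e-8, 1e-2, log=True),
    }
    env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=0, **params)
    model.learn(total_timesteps=20_000)
    # Score the trial by mean evaluation reward over a handful of episodes.
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
    env.close()
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best hyperparameters:", study.best_params)
```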
236

Cooperative Perception for Connected Vehicles

Mehr, Goodarz 31 May 2024 (has links)
Doctor of Philosophy / Self-driving cars promise a future with safer roads and fewer traffic incidents and fatalities. This future hinges on the car's accurate understanding of its surrounding environment; however, the reliability of the algorithms that form this perception is not always guaranteed, and adverse traffic and environmental conditions can significantly diminish their performance. To solve this problem, this research builds on the idea that enabling cars to share and exchange information via communication allows them to extend the range and quality of their perception beyond their individual capabilities. To that end, this research formulates a robust and flexible framework for cooperative perception, explores how connected vehicles can learn to collaborate to improve their perception, and introduces an affordable, experimental vehicle platform for connected autonomy research.
237

Hierarchical Bayesian Dataset Selection

Zhou, Xiaona 05 1900 (has links)
Despite the profound impact of deep learning across various domains, supervised model training critically depends on access to large, high-quality datasets, which are often challenging to identify. To address this, we introduce Hierarchical Bayesian Dataset Selection (HBDS), the first dataset selection algorithm that utilizes hierarchical Bayesian modeling, designed for collaborative data-sharing ecosystems. The proposed method efficiently decomposes the contributions of dataset groups and individual datasets to local model performance using Bayesian updates with small data samples. Our experiments on two benchmark datasets demonstrate that HBDS not only offers a computationally lightweight solution but also enhances interpretability compared to existing data selection methods, by revealing deep insights into dataset interrelationships through learned posterior distributions. HBDS outperforms traditional non-hierarchical methods by correctly identifying all relevant datasets, achieving optimal accuracy with fewer computational steps, even when initial model accuracy is low. Specifically, HBDS surpasses its non-hierarchical counterpart by 1.8% on DIGIT-FIVE and 0.7% on DOMAINNET, on average. In settings with limited resources, HBDS achieves a 6.9% higher accuracy than its non-hierarchical counterpart. These results confirm HBDS's effectiveness in identifying datasets that improve the accuracy and efficiency of deep learning models when collaborative data utilization is essential. / Master of Science / Deep learning technologies have revolutionized many domains and applications, from voice recognition in smartphones to automated recommendations on streaming services. However, the success of these technologies heavily relies on having access to large and high-quality datasets. In many cases, selecting the right datasets can be a daunting challenge. To tackle this, we have developed a new method that can quickly figure out which datasets or groups of datasets contribute most to improving the performance of a model with only a small amount of data needed. Our tests show that this method is not only effective and light on computation but also helps us understand better how different datasets relate to each other.
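The thesis's exact model is not given in the abstract; the following is a toy sketch of the two-level idea it describes, scoring each dataset's contribution with conjugate Normal-Normal updates that pool evidence at the group level. All data, variances, and names below are invented for illustration.

```python
# Toy sketch (not the thesis's model): two-level Normal-Normal hierarchy
# scoring datasets by estimated contribution to model performance,
# pooling evidence at the group level. All numbers are invented.
import numpy as np

# Hypothetical accuracy gains measured from small probe runs:
# gains[group][dataset] -> list of observed improvements.
gains = {
    "digits": {"mnist_like": [0.021, 0.018], "svhn_like": [0.012, 0.015]},
    "web":    {"scraped_a": [0.001, -0.002], "scraped_b": [0.003, 0.000]},
}

sigma2 = 1e-4   # assumed observation noise variance
tau2 = 1e-4     # assumed within-group variance of dataset effects
kappa2 = 1e-3   # assumed prior variance of group means

for group, datasets in gains.items():
    # Posterior mean of the group effect, treating each observation as
    # marginally Normal(mu_g, sigma2 + tau2) -- a simplification.
    obs = np.concatenate([np.asarray(v) for v in datasets.values()])
    prec_g = 1.0 / kappa2 + len(obs) / (sigma2 + tau2)
    mu_g = (obs.sum() / (sigma2 + tau2)) / prec_g
    print(f"group {group}: posterior mean effect = {mu_g:.4f}")
    for name, v in datasets.items():
        v = np.asarray(v)
        # Shrink each dataset's effect toward its group mean (conjugate update).
        prec_d = 1.0 / tau2 + len(v) / sigma2
        theta_d = (mu_g / tau2 + v.sum() / sigma2) / prec_d
        print(f"  dataset {name}: posterior mean effect = {theta_d:.4f}")
```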
238

Use of Reinforcement Learning for Interference Avoidance or Efficient Jamming in Wireless Communications

Schutz, Zachary Alexander 05 June 2024 (has links)
We implement reinforcement learning in the context of wireless communications in two very different settings. In the first setting, we study the use of reinforcement learning in an underwater acoustic communications network to adapt its transmission frequencies to avoid interference and potential malicious jammers. To that end, we implement a reinforcement learning algorithm called contextual bandits. The harsh environment of an underwater channel poses a challenging problem. The channel may induce multipath and time delays, which lead to time-varying, frequency-selective attenuation. These factors are also influenced by the distance between the transmitter and receiver, the subbands the interference is located within, and the power of the transmitter. We show that the agent is effectively able to avoid frequency bands that have degraded channel quality or that contain interference, both of which are dynamic or time-varying. In the second setting, we study the use of reinforcement learning to adapt the modulation and power scheme of a jammer seeking to disrupt a wireless communications system. To achieve this, we make use of a linear contextual bandit to learn to jam the victim system. Prior work has shown that with the use of linear bandits, improved convergence is achieved when jamming a single-carrier system using time-domain jamming schemes. However, communications systems today typically employ orthogonal frequency division multiplexing (OFDM) to transmit data, particularly in 4G/5G networks. This work explores the use of linear Thompson Sampling (TS) to jam OFDM-modulated signals. The jammer may select from both time-domain and frequency-domain jamming schemes. We demonstrate that the linear TS algorithm performs better than a traditional reinforcement learning algorithm, upper confidence bound-1 (UCB-1), in terms of maximizing the victim's symbol error rate. We also draw novel insights by observing the action states to which the reinforcement learning algorithm converges. We then investigate the design and modification of the context vector in the hope of increasing the overall performance of the bandit, such as a shorter learning period and a higher symbol error rate inflicted on the victim. This includes running experiments on particular features and examining how the bandit weights the importance of the features in the context vector. Lastly, we study how to jam an OFDM-modulated signal that employs forward error correction coding. We extend this to leverage reinforcement learning to jam a 5G-based system implementing some aspects of the 5G protocol. This model is then modified to introduce unreliable reward feedback, in the form of ACK/NACK observations, to the jammer, to understand how imperfect observations of errors can affect the jammer's ability to learn. We gain insights into the convergence time of the jammer and its ability to jam the victim, as well as into improvements to the algorithm and the vulnerabilities of wireless communications to reinforcement learning based jamming. / Master of Science / In this thesis we implement a class of reinforcement learning known as contextual bandits in two different applications of communications systems and jamming. In the first setting, we study the use of reinforcement learning in an underwater acoustic communications network to adapt its transmission frequencies to avoid interference and potential malicious jammers.
We show that the agent is effectively able to avoid frequency bands that have degraded channel quality or that contain interference, both of which are dynamic or time-varying. In the second setting, we study the use of reinforcement learning to adapt the jamming type, such as additive white Gaussian noise, and the power scheme of a jammer seeking to disrupt a wireless communications system. To achieve this, we make use of a linear contextual bandit, which assumes the reward has a linear relationship with the context features the jammer observes for each arm. We demonstrate that the linear algorithm outperforms a traditional reinforcement learning algorithm in terms of maximizing the victim's symbol error rate. We extend this work by examining the impact of the context feature vector design, LTE/5G-based protocol specifics (such as error correction coding), and imperfect reward feedback information. We gain insights into the convergence time of the jammer and its ability to jam the victim, as well as into improvements to the algorithm and the vulnerabilities of wireless communications to reinforcement learning based jamming.
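Linear Thompson Sampling itself is standard; below is a minimal sketch of how a jammer-style agent could apply it. The context features, jamming schemes, and reward signal (the victim's symbol error rate) are placeholders, not values from the thesis.

```python
# Minimal linear Thompson Sampling loop for arm selection. Context
# features, jamming schemes, and the reward signal are placeholders.
import numpy as np

rng = np.random.default_rng(1)
d = 4                                               # context dimension (assumed)
actions = ["tone", "awgn", "pulsed", "subcarrier"]  # hypothetical schemes
v2 = 0.25                                           # exploration scale

# One Bayesian linear model per action: B is the precision matrix,
# f accumulates reward-weighted contexts.
B = {a: np.eye(d) for a in actions}
f = {a: np.zeros(d) for a in actions}

def observe_context() -> np.ndarray:
    return rng.normal(size=d)    # placeholder for channel/victim features

def observe_ser(action: str, x: np.ndarray) -> float:
    return rng.random()          # placeholder for the measured symbol error rate

for t in range(1000):
    x = observe_context()
    # Sample a parameter vector from each action's posterior; act greedily.
    scores = {}
    for a in actions:
        B_inv = np.linalg.inv(B[a])
        theta = rng.multivariate_normal(B_inv @ f[a], v2 * B_inv)
        scores[a] = float(theta @ x)
    a_star = max(scores, key=scores.get)
    r = observe_ser(a_star, x)
    # Rank-one Bayesian update for the chosen action.
    B[a_star] += np.outer(x, x)
    f[a_star] += r * x
```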
239

Derivative-Free Meta-Blackbox Optimization on Manifold

Sel, Bilgehan 06 1900 (has links)
Solving a sequence of high-dimensional, nonconvex, but potentially similar optimization problems poses a significant computational challenge in various engineering applications. This thesis presents the first meta-learning framework that leverages the shared structure among sequential tasks to improve the computational efficiency and sample complexity of derivative-free optimization. Based on the observation that most practical high-dimensional functions lie on a latent low-dimensional manifold, which can be further shared among problem instances, the proposed method jointly learns the meta-initialization of a search point and a meta-manifold. This novel approach enables the efficient adaptation of the optimization process to new tasks by exploiting the learned meta-knowledge. Theoretically, the benefit of meta-learning in this challenging setting is established by proving that the proposed method achieves improved convergence rates and reduced sample complexity compared to traditional derivative-free optimization techniques. Empirically, the effectiveness of the proposed algorithm is demonstrated in two high-dimensional reinforcement learning tasks, showcasing its ability to accelerate learning and improve performance across multiple domains. Furthermore, the robustness and generalization capabilities of the meta-learning framework are explored through extensive ablation studies and sensitivity analyses. The thesis highlights the potential of meta-learning in tackling complex optimization problems and opens up new avenues for future research in this area. / Master of Science / Optimization problems are ubiquitous in various fields, from engineering to finance, where the goal is to find the best solution among a vast number of possibilities. However, solving these problems can be computationally challenging, especially when the search space is high-dimensional and the problem is nonconvex, meaning that there may be multiple locally optimal solutions. This thesis introduces a novel approach to tackle these challenges by leveraging the power of meta-learning, a technique that allows algorithms to learn from previous experiences and adapt to new tasks more efficiently. The proposed framework is based on the observation that many real-world optimization problems share similar underlying structures, even though they may appear different on the surface. By exploiting this shared structure, the meta-learning algorithm can learn a low-dimensional representation of the problem space, which serves as a guide for efficiently searching for optimal solutions in new, unseen problems. This approach is particularly useful when dealing with a sequence of related optimization tasks, as it allows the algorithm to transfer knowledge from one task to another, thereby reducing the computational burden and improving the overall performance. The effectiveness of the proposed meta-learning framework is demonstrated through rigorous theoretical analysis and empirical evaluations on challenging reinforcement learning tasks. These tasks involve high-dimensional search spaces and require the algorithm to adapt to changing environments. The results show that the meta-learning approach can significantly accelerate the learning process and improve the quality of the solutions compared to traditional optimization methods.
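As a rough illustration of the search-in-a-learned-subspace idea (not the thesis's algorithm), the sketch below runs derivative-free random search in a low-dimensional latent space mapped into the ambient space by a basis that would, in the thesis's setting, be meta-learned across tasks; here the basis, initialization, and objective are stand-ins.

```python
# Illustrative only: derivative-free random search restricted to a
# low-dimensional subspace. In the thesis's setting, the basis A and
# the initialization x0 would be meta-learned; here they are random
# stand-ins, and the objective is a toy function.
import numpy as np

rng = np.random.default_rng(2)
D, k = 100, 5                                   # ambient / latent dimensions
A = np.linalg.qr(rng.normal(size=(D, k)))[0]    # stand-in "meta-manifold" basis
x0 = np.zeros(D)                                # stand-in meta-initialization

def blackbox(x: np.ndarray) -> float:
    return -float(np.sum((x - 3.0) ** 2))       # toy objective; no gradients used

# Simple (1+1)-style random search in the latent coordinates z.
z, best = np.zeros(k), blackbox(x0)
step = 0.5
for t in range(2000):
    z_new = z + step * rng.normal(size=k)
    val = blackbox(x0 + A @ z_new)              # map latent point to ambient space
    if val > best:
        z, best = z_new, val
print("best value found:", best)
```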
240

Reliable Low Latency Machine Learning for Resource Management in Wireless Networks

Taleb Zadeh Kasgari, Ali 30 March 2022 (has links)
Next-generation wireless networks must support a plethora of new applications ranging from the Internet of Things to virtual reality. Each of these emerging applications has unique rate, reliability, and latency requirements that substantially differ from those of traditional services such as video streaming. Hence, there is a need to design an efficient resource management framework that takes into account the different components that affect resource usage, including less obvious factors, such as human behavior, that contribute to the resource usage of the system. The use of machine learning to model these components in a resource management system is a promising solution. This is because many hidden factors might contribute to the resource usage pattern of users or machine-type devices that can only be modeled using an end-to-end machine learning solution. Therefore, machine learning algorithms can be used either for modeling a complex factor such as the human brain's delay perception or for designing an end-to-end resource management system. The overarching goal of this dissertation is to develop and deploy machine learning frameworks that are suitable for modeling the various components of a wireless resource management system that must provide reliable and low latency service to its users. First, by explicitly modeling the limitations of the human brain, a concrete measure of the delay perception of human users in a wireless network is introduced. Then, a new probabilistic model for this delay perception is learned from the brain features of a human user. Given the learned model for the delay perception of the human brain, a brain-aware resource management algorithm is proposed for allocating radio resources to human users while minimizing the transmit power and taking into account the reliability of both machine-type devices and human users. Next, a novel experienced deep reinforcement learning (deep-RL) framework is proposed to provide model-free resource allocation for ultra reliable low latency communication (URLLC) in the downlink of a wireless network. The proposed experienced deep-RL framework can guarantee high end-to-end reliability and low end-to-end latency, under explicit data rate constraints, for each wireless user without any models of, or assumptions on, the users' traffic. In particular, in order to enable the deep-RL framework to account for extreme network conditions and operate in highly reliable systems, a new approach based on generative adversarial networks (GANs) is proposed. After that, the problem of network slicing is studied in the context of a wireless system having a time-varying number of users that require two types of slices: reliable low latency (RLL) and self-managed (capacity limited) slices. To address this problem, a novel control framework for stochastic optimization is proposed based on the Lyapunov drift-plus-penalty method. This new framework enables the system to minimize power, maintain slice isolation, and provide reliable and low latency end-to-end communication for RLL slices. Then, a novel concept of three-dimensional (3D) cellular networks, which integrate drone base stations (drone-BSs) and cellular-connected drone users (drone-UEs), is introduced. For this new 3D cellular architecture, a novel framework for network planning for drone-BSs as well as latency-minimal cell association for drone-UEs is proposed.
For network planning, a tractable method for drone-BS deployment based on the notion of truncated octahedron shapes is proposed that ensures full coverage of a given space with a minimum number of drone-BSs. In addition, to characterize frequency planning in such 3D wireless networks, an analytical expression for the feasible integer frequency reuse factors is derived. Subsequently, an optimal 3D cell association scheme is developed for which the drone-UEs' latency, considering transmission, computation, and backhaul delays, is minimized. Finally, the concept of super environments is introduced. After formulating this concept mathematically, it is shown that any two Markov decision processes (MDPs) can be members of a super environment if sufficient additional state space is added. Then the effect of this additional state space on model-free and model-based deep-RL algorithms is investigated. Next, the tradeoff that adding the extra state space creates between the speed of convergence and the optimality of the solution is discussed. In summary, this dissertation led to the development of machine learning algorithms for statistically modeling complex components of a resource management system, as well as a model-free controller that can manage resources reliably, with low latency, and optimally. / Doctor of Philosophy / Next-generation wireless networks must support a plethora of new applications ranging from the Internet of Things to virtual reality. Each of these emerging applications has unique requirements that substantially differ from those of traditional services such as video streaming. Hence, there is a need to design a new and efficient resource management framework that takes into account the different components that affect resource usage, including less obvious factors, such as human behavior, that contribute to the resource usage of the system. The use of machine learning to model these components in a resource management system is a promising solution. This is because the data-driven nature of machine learning algorithms can help us model many hidden factors that might contribute to the resource usage pattern of users or devices. These hidden factors can only be modeled using an end-to-end machine learning solution. By end-to-end, we mean the system relies only on its observation of the quality of service (QoS) for users. Therefore, machine learning algorithms can be used either for modeling a complex factor such as the human brain's delay perception or for designing an end-to-end resource management system. The overarching goal of this dissertation is to develop and deploy machine learning frameworks that are suitable for modeling the various components of a wireless resource management system that must provide reliable and low latency service to its users.
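For the network-slicing part, the Lyapunov drift-plus-penalty method has a standard per-slot form: greedily minimize V times the penalty (here, transmit power) plus the queue-weighted drift. The toy sketch below uses invented arrival and rate models, not the dissertation's.

```python
# Toy drift-plus-penalty controller: each slot, choose the transmit power
# minimizing V * power + Q . (arrivals - service(power)), then update the
# queues. Arrival and rate models below are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
n_slices, V = 3, 50.0                  # V trades power cost against backlog
Q = np.zeros(n_slices)                 # per-slice queues (untransmitted load)
power_levels = np.linspace(0.0, 2.0, 21)

def service(p: float) -> np.ndarray:
    # Placeholder rate model: a log-shaped capacity split across slices.
    return np.full(n_slices, np.log1p(5.0 * p) / n_slices)

for t in range(10_000):
    a = rng.poisson(0.2, size=n_slices)           # random arrivals per slice
    # Greedily minimize the per-slot drift-plus-penalty bound.
    costs = [V * p + float(Q @ (a - service(p))) for p in power_levels]
    p_star = power_levels[int(np.argmin(costs))]
    Q = np.maximum(Q + a - service(p_star), 0.0)  # queue update

print("mean backlog per slice:", float(Q.mean()))
```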
