Global ETD Search

101	Reinforcement Learning for Grid Voltage Stability with FACTS Oldeen, Joakim, Sharma, Vishnu January 2020 With increased penetration of renewable energy sources, maintaining equilibrium between production and consumption in the world’s electrical power systems (EPS) becomes increasingly challenging. One way to increase stability and efficiency in an EPS is to use flexible alternating current transmission systems (FACTS). However, an EPS containing multiple FACTS devices with overlapping areas of influence can suffer negative effects if the reference values they operate around are not updated with sufficient temporal resolution. The reference values are usually set manually by a system operator. This master's thesis investigates how three different reinforcement learning (RL) algorithms can be used to set reference values automatically, with higher temporal resolution than a system operator can achieve, with the aim of increasing voltage stability. The three RL algorithms, Q-learning, Deep Q-learning (DQN), and Twin-delayed deep deterministic policy gradient (TD3), were implemented in Python together with a 2-bus EPS test network acting as the environment. The 2-bus EPS test network contains two FACTS devices: one for shunt compensation and one for series compensation. The results show that, with respect to reward, DQN performed equally well or better than the non-RL cases 98.3 % of the time on the simulation test set, while the corresponding values for TD3 and Q-learning were 87.3 % and 78.5 %, respectively. DQN achieved increased voltage stability on the test network, and TD3 showed similar results except at lower loading levels. Q-learning decreased voltage stability on a substantial portion of the test set, even compared to a case without FACTS devices. To support continued research and possible future real-life implementation, a list of suggestions for future work has been established.
Reinforcement learning Machine learning Q-learning DQN TD3 Electrical power systems Voltage stability FACTS Computer Sciences Other Electrical Engineering, Electronic Engineering, Information Engineering Engineering and Technology
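The tabular Q-learning variant compared in entry 101 can be illustrated with a minimal sketch. It assumes that each action selects one of a few discrete FACTS reference set-points and that states are discretised labels (for example a loading level); the set-point values, state names, and hyperparameters are hypothetical and not taken from the thesis, whose 2-bus test network is not reproduced here.

```python
import random

# Hypothetical action set: each action selects one discrete reference
# set-point (per-unit voltage) for a FACTS device. Values are illustrative.
ACTIONS = [0.95, 1.00, 1.05]


def epsilon_greedy(q_table, state, epsilon, rng):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q_table.get(state, [0.0] * len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Tabular update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    q_s = q_table.setdefault(state, [0.0] * len(ACTIONS))
    q_next = q_table.get(next_state, [0.0] * len(ACTIONS))
    td_target = reward + gamma * max(q_next)
    q_s[action] += alpha * (td_target - q_s[action])
    return q_s[action]


if __name__ == "__main__":
    rng = random.Random(0)
    q = {}
    # A made-up transition whose reward is a negative voltage deviation.
    a = epsilon_greedy(q, "low_load", epsilon=0.1, rng=rng)
    q_learning_update(q, "low_load", a, reward=-0.02, next_state="high_load")
    print(a, ACTIONS[a], q)
```

A lookup table like this only scales to coarse state discretisations, which is one reason DQN and TD3 replace it with function approximators.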
102	Not All Goals Are Created Equal : Evaluating Hockey Players in the NHL Using Q-Learning with a Contextual Reward Function Vik, Jon January 2021 Not all goals in the game of ice hockey are created equal: some goals increase the chances of winning more than others. This thesis investigates the result of constructing and using a reward function that takes this fact into account, instead of the common binary reward function. The two reward functions are used in a Markov Game model with value iteration. The data used to evaluate the hockey players is play-by-play data from the 2013-2014 season of the National Hockey League (NHL); overtime events, goalkeepers, and playoff games are excluded from the dataset. This study finds that the constructed reward is, in general, less correlated than the binary reward with the metrics points, time on ice, and star points. However, an increased correlation was found between the evaluated impact and time on ice for center players. Much of the discussion is devoted to the difficulty of validating the results of a player evaluation given the lack of ground truth. One conclusion from this discussion is that future efforts must be made to establish consensus on how the success of a hockey player should be defined. Sports Analytics Markov Game Machine Learning Reinforcement Learning Q-Learning Data Mining National Hockey League Ice Hockey Reward Function Player Evaluation Other Computer and Information Science Computer and Information Sciences
103	Stabilizing Q-Learning for continuous control Hui, David Yu-Tung 12 1900 Deep Reinforcement Learning has produced decision makers that play Chess, Go, Shogi, Atari, and Starcraft with superhuman ability. However, unlike animals and humans, these algorithms struggle to navigate and control physical environments. Manipulating the physical world requires controlling continuous action spaces such as position, velocity, and acceleration, unlike the discrete action spaces of board and video games. Training deep neural networks for continuous control is unstable: agents struggle to learn and retain good behaviors, performance has high variance across hyperparameters, random seeds, and even multiple runs of the same task, and algorithms struggle to perform well outside the domains they were developed in. This thesis finds principles behind the success of deep neural networks in other learning paradigms and examines their impact on reinforcement learning for continuous control. Chapter 1 explains how the maximum-entropy principle produces supervised and unsupervised learning loss functions and derives, from the training dynamics of deep learning, some regularizers used to stabilize deep networks. Chapter 2 provides a maximum-entropy justification for the form of actor-critic algorithms and finds the configuration of an actor-critic algorithm that trains most stably. Finally, Chapter 3 considers the training dynamics of deep reinforcement learning to propose two improvements to target and twin networks that improve stability and convergence. Experiments are performed within the DeepMind Control, MuJoCo, and Box2D ideal-physics simulators. Computer Science Artificial Intelligence Deep Learning Reinforcement Learning Deep Reinforcement Learning Control Continuous Control Q-Learning MuJoCo
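Entry 103 proposes improvements to target and twin networks but does not spell them out in the abstract. For orientation only, the sketch below shows the standard TD3 baseline such work builds on: a clipped double-Q target with target-policy smoothing, and a Polyak (soft) target update. The function names, the scalar action, and the default hyperparameters are illustrative assumptions, and the networks are stand-in callables.

```python
import random


def clipped_double_q_target(reward, done, next_state, target_actor,
                            target_q1, target_q2, gamma=0.99,
                            noise_std=0.2, noise_clip=0.5,
                            action_low=-1.0, action_high=1.0, rng=None):
    """TD3-style target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a~)."""
    rng = rng or random.Random()
    # Target-policy smoothing: clipped Gaussian noise on the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = max(action_low,
                      min(action_high, target_actor(next_state) + noise))
    # Twin critics: taking the smaller estimate curbs overestimation bias.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * q_next


def polyak_update(target_params, online_params, tau=0.005):
    """Soft target update: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


if __name__ == "__main__":
    y = clipped_double_q_target(
        reward=1.0, done=False, next_state=None,
        target_actor=lambda s: 0.3,
        target_q1=lambda s, a: 2.0 + a,
        target_q2=lambda s, a: 1.5 + a,
        rng=random.Random(0))
    print(y, polyak_update([0.0, 2.0], [1.0, 0.0], tau=0.1))
```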
104	Generation and Detection of Adversarial Attacks for Reinforcement Learning Policies Drotz, Axel, Hector, Markus January 2021 In this project we investigate the susceptibility of reinforcement learning (RL) algorithms to adversarial attacks. Adversarial attacks have been proven to be very effective at reducing the performance of deep learning classifiers and, more recently, have also been shown to reduce the performance of RL agents. The goal of this project is to evaluate adversarial attacks on agents trained using deep reinforcement learning (DRL), as well as to investigate how to detect these types of attacks. We first use DRL to solve two environments from OpenAI’s Gym module, namely CartPole-v0 and a continuous-action version of LunarLander, using DQN and DDPG (DRL techniques). We then evaluate the performance of the attacks and finally train neural networks to detect them. The attacks successfully reduced performance in both the LunarLander and CartPole environments. The attack detector was very successful at detecting attacks in the CartPole environment but did not perform quite as well in LunarLander. We hypothesize that continuous action space environments may make it more difficult for attack detectors to identify potential adversarial attacks. Bachelor's degree project in electrical engineering 2021, KTH, Stockholm Deep Reinforcement Learning Adversarial Attacks Adversarial Attack Detection Fast Gradient Sign Method Deep Deterministic Policy Gradient Deep Q-Learning Likelihood Ratio Test CUSUM Electrical Engineering, Electronic Engineering, Information Engineering
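The two techniques named in entry 104's keywords, the Fast Gradient Sign Method and CUSUM, can be sketched without a deep-learning framework. As a simplifying assumption, the Q-network is replaced by a linear Q-function so that its input gradient is available in closed form, and the attack objective is to lower the Q-value of the currently greedy action. The detector is a one-sided CUSUM over a generic per-step statistic. The abstract does not state which objective or monitored statistic the project actually used.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def q_values(weights, state):
    """Linear stand-in for a Q-network: Q(s, a) = w_a . s."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def fgsm_perturb(weights, state, epsilon):
    """One FGSM step that lowers the Q-value of the currently greedy action.

    With loss L(s) = -Q(s, a*), grad_s L = -w_{a*}, so
    s_adv = s + epsilon * sign(-w_{a*}) = s - epsilon * sign(w_{a*}).
    """
    q = q_values(weights, state)
    greedy = max(range(len(q)), key=q.__getitem__)
    return [x - epsilon * sign(w) for x, w in zip(state, weights[greedy])]


def cusum_alarm(observations, target_mean, slack, threshold):
    """One-sided CUSUM: index of the first alarm, or None if none is raised.

    S_t = max(0, S_{t-1} + x_t - target_mean - slack); alarm when S_t > threshold.
    """
    s = 0.0
    for t, x in enumerate(observations):
        s = max(0.0, s + x - target_mean - slack)
        if s > threshold:
            return t
    return None


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    clean = [0.6, 0.5]
    adv = fgsm_perturb(W, clean, epsilon=0.2)
    print(q_values(W, clean), q_values(W, adv))  # greedy action flips
    print(cusum_alarm([0, 0, 0, 1, 1, 1], 0.0, 0.25, 1.0))
```

In a real DQN the gradient would come from automatic differentiation rather than the closed form used here.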
105	An empirical study of stability and variance reduction in Deep Reinforcement Learning Lindström, Alexander January 2024 Reinforcement Learning (RL) is a branch of AI that deals with solving complex sequential decision-making problems such as training robots, trading by following patterns and trends, optimal control of industrial processes, and more. These applications span various fields, including data science, factories, finance, and others [1]. The most popular RL algorithm today is Deep Q-Learning (DQL), developed by a team at DeepMind, which successfully combines RL with neural networks (NNs). However, combining RL and NNs introduces challenges such as numerical instability and unstable learning due to high variance. Among other causes, these issues stem from the “moving target problem”. The target network was introduced to mitigate this problem. However, using a target network slows down learning, vastly increases memory requirements, and adds overhead when running the code. In this thesis, we conduct an empirical study of the importance of target networks in three scenarios. In the first scenario, we train agents with online learning. The aim here is to demonstrate that the target network can be removed after some point in time without negatively affecting performance. To evaluate this scenario, we introduce the concept of the stabilization point. In the second scenario, we pre-train agents before continuing to train them with online learning. For this scenario, we demonstrate the redundancy of the target network by showing that it can be omitted completely. In the third scenario, we evaluate a newly developed activation function called the Truncated Gaussian Error Linear Unit (TGeLU). For this scenario, we train an agent with online learning and show that by using TGeLU as the activation function, we can remove the target network completely.
Through the empirical study of these scenarios, we conjecture and verify that a target network has only transient benefits concerning stability. We show that it has no influence on the quality of the policy found. We also observed that variance was generally higher in the later stages of training when using a target network than when the target network had been removed. Additionally, while investigating the second scenario, we observed that the number of training iterations during pre-training affected the agent’s performance in the online learning phase. This thesis provides a deeper understanding of how the target network affects the training process of DQL; some of these findings, concerning variance reduction, are contrary to popular belief. The results also point to future work, including further exploring the benefits of the lower variance observed when removing the target network and conducting more efficient convergence analyses for the pre-training part of the second scenario. Reinforcement Learning Markov Decision Processes Neural Network Deep Q Learning Deep Q Network Sigmoid Truncated Gaussian Error Linear Unit Target network Stable learning Online learning Offline learning Computer Engineering
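A minimal sketch of the quantity entry 105 studies: the DQN temporal-difference target, computed either from a periodically synchronised target copy or, once the target network has been dropped, from the online parameters. The linear Q-function, `sync_every`, and `stabilization_step` are hypothetical stand-ins; the abstract does not give the thesis's actual criterion for the stabilization point.

```python
import copy


def q_max(params, state):
    """max_a Q(s, a) for a linear Q-function with one weight row per action."""
    return max(sum(w * x for w, x in zip(row, state)) for row in params)


def td_target(reward, next_state, done, params, target_params, gamma=0.99):
    """y = r + gamma * max_a' Q(s', a'); bootstrap from target_params when given."""
    bootstrap = target_params if target_params is not None else params
    return reward + (0.0 if done else gamma * q_max(bootstrap, next_state))


class TargetNetworkSchedule:
    """Hard-sync a target copy every `sync_every` steps, then drop it.

    From `stabilization_step` onward (a hypothetical knob), step() returns
    None, so td_target() bootstraps from the online parameters instead.
    """

    def __init__(self, params, sync_every=100, stabilization_step=None):
        self.sync_every = sync_every
        self.stabilization_step = stabilization_step
        self.target = copy.deepcopy(params)

    def step(self, t, params):
        if self.stabilization_step is not None and t >= self.stabilization_step:
            self.target = None
        elif t % self.sync_every == 0:
            self.target = copy.deepcopy(params)
        return self.target


if __name__ == "__main__":
    online = [[1.0, 0.0], [0.0, 2.0]]
    schedule = TargetNetworkSchedule(online, sync_every=10, stabilization_step=50)
    for t in (5, 10, 50):
        target = schedule.step(t, online)
        print(t, td_target(1.0, [1.0, 1.0], False, online, target))
```

Dropping the target copy also frees the memory it occupies, which is one of the costs the abstract attributes to target networks.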
106	Resource Allocation for Sequential Decision Making Under Uncertainty : Studies in Vehicular Traffic Control, Service Systems, Sensor Networks and Mechanism Design Prashanth, L A January 2013 A fundamental question in sequential decision making under uncertainty is “how to allocate resources amongst competing entities so as to maximize the rewards accumulated in the long run?”. The resources allocated may be either abstract quantities such as time or concrete quantities such as manpower. The sequential decision-making setting involves one or more agents interacting with an environment to procure rewards at every time instant, and the goal is to find an optimal policy for choosing actions. Most of these problems involve multiple (possibly infinitely many) stages, and the objective function is usually a long-run performance objective. The problem is further complicated by uncertainties in the system, for instance, stochastic noise and partial observability in a single-agent setting, or private information of the agents in a multi-agent setting. The dimensionality of the problem also plays an important role in the choice of solution methodology. Most real-world problems involve high-dimensional state and action spaces, and an important design aspect of the solution is the choice of knowledge representation. The aim of this thesis is to answer important resource allocation questions in different real-world application contexts and, in the process, to contribute novel algorithms to the theory as well. The resource allocation algorithms considered include those from stochastic optimization, stochastic control, and reinforcement learning, and a number of new algorithms are developed. The application contexts selected encompass both single- and multi-agent systems and both abstract and concrete resources, and they contain high-dimensional state and control spaces.
The empirical results from the various studies indicate that the algorithms presented here perform significantly better than those previously proposed in the literature. Further, the algorithms presented here are also shown to converge theoretically, hence guaranteeing optimal performance. We now briefly describe the studies conducted here to investigate resource allocation problems under different kinds of uncertainty: Vehicular Traffic Control The aim here is to optimize the ‘green time’ resource of the individual lanes in road networks so as to maximize a certain long-term performance objective. We develop several reinforcement learning based algorithms for solving this problem. In the infinite-horizon discounted Markov decision process setting, a Q-learning based traffic light control (TLC) algorithm that incorporates feature-based representations and function approximation to handle large road networks is proposed; see Prashanth and Bhatnagar [2011b]. This TLC algorithm works with coarse information about the congestion level on the lanes of the road network, obtained via graded thresholds. However, the graded threshold values used in this Q-learning based TLC algorithm, as well as in several other graded-threshold-based TLC algorithms that we propose, may not be optimal for all traffic conditions. We therefore also develop a new algorithm based on simultaneous perturbation stochastic approximation (SPSA) to tune the associated thresholds to ‘optimal’ values (Prashanth and Bhatnagar [2012]). Our threshold tuning algorithm is online and incremental, with proven convergence to the optimal threshold values. Further, we study average-cost traffic signal control and develop two novel reinforcement learning based TLC algorithms with function approximation (Prashanth and Bhatnagar [2011c]). Lastly, we develop a feature adaptation method for ‘optimal’ feature selection (Bhatnagar et al. [2012a]).
This algorithm adapts the features so as to converge to an optimal set of features, which can then be used in the algorithm. Service Systems The aim here is to optimize the ‘workforce’, the critical resource of any service system. However, adapting the staffing levels to the workloads in such systems is nontrivial, as queue stability and aggregate service level agreement (SLA) constraints have to be complied with. We formulate this problem as a constrained hidden Markov process with a (discrete) worker parameter and propose simultaneous-perturbation-based simulation optimization algorithms for this purpose. The algorithms include both first-order and second-order methods and incorporate SPSA-based gradient estimates in the primal, with dual ascent for the Lagrange multipliers. All the algorithms that we propose are online, incremental, and easy to implement. Further, they involve a certain generalized smooth projection operator, which is essential to project the continuous-valued worker parameter updates obtained from the SASOC algorithms onto the discrete set. We validate our algorithms on five real-life service systems and compare their performance with the state-of-the-art optimization toolkit OptQuest. Being … times faster than OptQuest, our scheme is particularly suitable for adaptive labor staffing. We also observe that it guarantees convergence and finds better solutions than OptQuest in many cases. Wireless Sensor Networks The aim here is to allocate the ‘sleep time’ (resource) of the individual sensors in an intrusion detection application such that the energy consumption of the sensors is reduced while the tracking error is kept to a minimum. We model this sleep–wake scheduling problem as a partially observed Markov decision process (POMDP) and propose novel RL-based algorithms, with both long-run discounted and average cost objectives, for solving this problem.
All our algorithms incorporate function approximation and feature-based representations to handle the curse of dimensionality. Further, the feature selection scheme used in each of the proposed algorithms intelligently manages the energy cost and tracking cost factors, which in turn assists the search for the optimal sleeping policy. The results from the simulation experiments suggest that our proposed algorithms perform better than a recently proposed algorithm from Fuemmeler and Veeravalli [2008], Fuemmeler et al. [2011]. Mechanism Design The setting here is one of multiple self-interested agents with limited capacities, attempting to maximize their individual utilities, often at the expense of the group’s utility. The aim of the resource allocator is then to efficiently allocate the resource being contended for by the agents and also to maximize social welfare via the ‘right’ transfer of payments. In other words, the problem is to find an incentive-compatible transfer scheme following a socially efficient allocation. We present two novel mechanisms with progressively more realistic assumptions about agent types, aimed at economic scenarios where agents have limited capacities. For the simplest case, where agent types consist of a unit cost of production and a capacity that does not change with time, we provide an enhancement to the static mechanism of Dash et al. [2007] that effectively deters an agent from misreporting the capacity element of its type to receive an allocation beyond its capacity, which would damage other agents. Our model incorporates an agent’s preference to harm other agents through an additive factor in its utility function, and the mechanism we propose achieves strategy-proofness by means of a novel penalty scheme. Next, we consider a dynamic setting where agent types evolve and the individual agents again have a preference to harm others via capacity misreports.
We show via a counterexample that the dynamic pivot mechanism of Bergemann and Välimäki [2010] cannot be directly applied in our setting with capacity-limited agents. We propose an enhancement to the mechanism of Bergemann and Välimäki [2010] that ensures truth-telling with respect to the capacity type element through a variable penalty scheme (in the spirit of the static mechanism). We show that each of our mechanisms is ex-post incentive compatible, ex-post individually rational, and socially efficient. Vehicular Traffic Control Service Systems Sensor Networks Mechanism Design Traffic Signal Control - Q-Learning Traffic Signal Control Signal Control - Threshold Tuning Traffic Light Control Algorithm Adaptive Labor Staffing Sleep-Wake Scheduling Algorithms Reinforcement Learning Vehicular Control Graded Signal Control Adaptive Sleep–wake Control Computer Science
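Entry 106 relies repeatedly on SPSA (simultaneous perturbation stochastic approximation), which estimates a gradient from only two function evaluations regardless of dimension. The sketch below shows the textbook two-sided estimator with Rademacher (±1) perturbations on a toy quadratic. The constant gains `a` and `c` are a simplification, since convergence analyses normally use decaying gain sequences, and the SASOC projection onto a discrete worker set is not reproduced.

```python
import random


def spsa_gradient(f, theta, c, rng):
    """Two-evaluation SPSA gradient estimate of f at theta.

    Every coordinate is perturbed simultaneously by +/- c; component i of the
    estimate is (f(theta + c*delta) - f(theta - c*delta)) / (2 * c * delta_i).
    """
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = f(plus) - f(minus)
    return [diff / (2.0 * c * d) for d in delta]


def spsa_minimize(f, theta0, a=0.1, c=0.1, iterations=500, seed=0):
    """Plain gradient descent driven by SPSA estimates (constant gains)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(iterations):
        g = spsa_gradient(f, theta, c, rng)
        theta = [t - a * gi for t, gi in zip(theta, g)]
    return theta


if __name__ == "__main__":
    # Toy objective with its minimum at (3, -1); not from the thesis.
    def f(th):
        return (th[0] - 3.0) ** 2 + (th[1] + 1.0) ** 2

    print(spsa_minimize(f, [0.0, 0.0]))
```

Because only two evaluations are needed per step, the approach suits simulation-based objectives such as queue performance or traffic delay, where each evaluation is expensive.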
107	Theseus : a 3D virtual reality orientation game with real-time guidance system for cognitive training Jha, Manish Kumar 10 1900 Studies support cognitive training as an efficient method to slow cognitive decline in older adults. Virtual reality (VR) based serious games have found application in this field due to the high level of immersion and interactivity offered by virtual environments (VE). This project implements a fully immersive 3D virtual reality orientation game with a real-time guidance system, to be used as an exercise for cognitive training. The immediate aftereffects of playing the orientation game on memory and attention abilities were studied in fifteen older adults with subjective cognitive decline (SCD). It was observed that while there was no significant improvement in attention exercises, the participants performed better in specific memory exercises after playing the orientation game. Failure to achieve a required objective can increase negative emotions in humans, and more so in people who suffer from cognitive decline. Hence, the game was equipped with a real-time guidance system with location hints to control negative emotions and help participants complete the tasks. The guidance system is based on logical rules; each hint is delivered if a specific condition is met. Changes in the participants' emotions showed that hints are effective in reducing frustration, given that the hints are easily comprehensible and designed to give positive feedback. The final part of the project focuses on the guidance system and implements a way to activate it based entirely on a person’s emotions. The problem calls for identifying the emotional state that should trigger the guidance system’s activation. This problem takes the form of a Markov decision process (MDP), which can be solved by casting it in a reinforcement learning framework. A Deep Q-Learning network (DQN) with experience replay (ER), one of the state-of-the-art reinforcement learning algorithms for predicting actions in a discrete action space, was used in this context. The algorithm was trained on simulated emotion data and tested on the data of fifteen older adults acquired in the experiments conducted in the first part of the project. It is observed that the RL-based method performs better than the rule-based method in identifying the mental state of a person in order to provide hints. Emotions Immersive Virtual Reality Alzheimer’s Disease Reinforcement Learning Deep Q-Learning Networks Spatial Orientation Hints
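Experience replay, the ER component of the DQN used in entry 107, can be sketched as a fixed-capacity buffer sampled uniformly at random. The transition fields, encoding the state as emotion features, and the actions "no hint" / "give hint" are assumptions for illustration; the abstract does not describe the project's actual state and action encoding.

```python
import random
from collections import deque, namedtuple

# One environment step; what "state" holds (e.g. emotion features) is assumed.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform minibatch sampling."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw batch_size distinct transitions uniformly at random."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=10_000, seed=0)
    # Hypothetical actions: 0 = no hint, 1 = show a location hint.
    buffer.push(state=(0.7, 0.1), action=1, reward=0.5,
                next_state=(0.4, 0.3), done=False)
    buffer.push(state=(0.4, 0.3), action=0, reward=0.0,
                next_state=(0.4, 0.2), done=True)
    print(len(buffer), buffer.sample(2))
```

Sampling uniformly from past transitions breaks the temporal correlation between consecutive updates, which is the usual motivation for pairing ER with DQN.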
108	Prediction of Protein-Protein Interactions Using Deep Learning Techniques Soleymani, Farzan 24 April 2023 Proteins are considered the primary actors in living organisms, and they mainly perform their functions by interacting with other proteins. Protein-protein interactions (PPIs) underpin various biological activities such as metabolic cycles, signal transduction, and immune response. PPI identification has been addressed by various experimental methods such as the yeast two-hybrid system, mass spectrometry, and protein microarrays, to mention a few. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. Therefore, a sequence-based framework called ProtInteract is developed to predict protein-protein interactions. ProtInteract comprises two components. The first is a novel autoencoder architecture that encodes each protein's primary structure into a lower-dimensional vector while preserving its underlying sequential pattern by extracting uncorrelated attributes and more expressive descriptors. This leads to faster training of the second component, a deep convolutional neural network (CNN) that receives the encoded proteins and predicts their interaction. The prediction task is formulated in three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair, where each class corresponds to a different range of confidence scores for the probability that the predicted interaction occurs. The proposed framework features low computational complexity and a relatively fast response. The present study makes two significant contributions to the field of PPI prediction. Firstly, it addresses the computational challenges posed by the high dimensionality of protein datasets through dimensionality reduction techniques that extract highly informative sequence attributes.
Secondly, the proposed framework, ProtInteract, utilises this information to identify the interaction characteristics of a protein based on its amino acid configuration. ProtInteract encodes the protein's primary structure into a lower-dimensional vector space, thereby reducing the computational complexity of PPI prediction. Our results provide evidence of the proposed framework's accuracy and efficiency in predicting protein-protein interactions. Long Short-Term Memory Recurrent Neural Networks Protein-Protein Interaction Temporal Convolutional Network Convolutional Neural Network Autoencoder Reinforcement learning actor-critic portfolio management stock market prediction coverage control multi-agent system SARSA Q-learning Graph convolutional neural network GCN state-action-reward-state-action
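Entry 108 feeds each protein's primary structure into an autoencoder, but the abstract does not say how residues are represented at the input. One common choice, assumed here purely for illustration, is a one-hot encoding over the 20 standard amino acids, padded or truncated to a fixed length; `max_len` and the all-zero rows for padding and non-standard residues are hypothetical design choices, not ProtInteract's.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Encode a protein sequence as a max_len x 20 list of 0/1 rows.

    Sequences longer than max_len are truncated; shorter ones are padded with
    all-zero rows. Non-standard residues (e.g. 'X') also map to all-zero rows.
    """
    rows = []
    for aa in sequence.upper()[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows)))
    return rows


def encode_pair(seq_a, seq_b, max_len=1000):
    """Encode both members of a candidate interacting pair the same way."""
    return one_hot_encode(seq_a, max_len), one_hot_encode(seq_b, max_len)


if __name__ == "__main__":
    a, b = encode_pair("MKTAYIAKQR", "GSHMX", max_len=12)
    print(len(a), len(a[0]), sum(map(sum, b)))  # 12 rows, 20 columns, 4 known residues
```

An encoder would then compress each such matrix into the lower-dimensional vector that the CNN consumes.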
109	Modelling closed-loop receptive fields: On the formation and utility of receptive fields in closed-loop behavioural systems / Development of receptive fields in autonomously acting, closed-loop systems Kulvicius, Tomas 20 April 2010 No description available. 500 Natural sciences (general) Mathematics and Computer Science Computational Neuroscience Closed-Loop Learning Systems Sensorimotor Loop Visual Receptive Fields Temporal Sequence Learning Driving Robot Entropy Input/Output Ratio Energy Optimal Agents Place Fields Place Field Remapping Q-learning Self-marking Goal Directed Navigation 30.03 RA 000: General natural sciences
110	Design and Performance Analysis of Access Control Mechanisms for Massive Machine-to-Machine Communications in Wireless Cellular Networks Tello Oquendo, Luis Patricio 10 September 2018 Nowadays, the Internet of Things (IoT) is an essential technology for the next generation of wireless systems. Connectivity is the foundation of IoT, and the type of access required depends on the nature of the application. One of the leading enablers of the IoT environment is machine-to-machine (M2M) communication, particularly its tremendous potential to offer ubiquitous connectivity among intelligent devices. Cellular networks are the natural choice for emerging IoT and M2M applications. A major challenge in cellular networks is making the network capable of handling massive access scenarios in which myriad devices use M2M communications. Moreover, cellular systems have developed tremendously in recent decades; they incorporate sophisticated technology and algorithms to offer a broad range of services. Modeling and analyzing the performance of these large multi-service networks is also a challenging task that may require high computational effort. To address these challenges, we first concentrate on the design and performance evaluation of novel access control schemes to deal with massive M2M communications. Then, we focus on the performance evaluation of large multi-service networks and propose a novel analytical technique that is both accurate and computationally efficient. Our main objective is to provide solutions that ease congestion in the radio access or core network when massive numbers of M2M devices try to connect to the network.
We consider two types of scenarios: (i) massive numbers of M2M devices connect directly to cellular base stations, and (ii) the devices form clusters and their data is forwarded to gateways that provide them with access to the infrastructure. In the first scenario, because the number of devices added to the network is constantly increasing, the network must handle a considerable increase in access requests. Access class barring (ACB) has been proposed by the 3rd Generation Partnership Project (3GPP) as a practical congestion control solution in the radio access and core network. Properly tuning the ACB parameters to the traffic intensity is critical, but how to do so dynamically and autonomously is a challenging problem that the 3GPP specifications leave open. Thus, this dissertation contributes to the performance analysis and optimal design of novel algorithms that effectively implement this barring scheme and overcome the challenges introduced by massive M2M communications. In the second scenario, the heterogeneity of IoT devices and hardware-based cellular architectures make flexible and efficient communication in 5G wireless systems even harder. This dissertation therefore also contributes to the design of software-defined gateways (SD-GWs) in a new architecture for wireless software-defined networks called SoftAir. Deploying these SD-GWs is an alternative way to handle both the vast number of devices and the volume of data they will pour into the network. Another contribution of this dissertation is a novel technique for the performance analysis of large multi-service networks. The underlying complexity of the network, particularly its size and the wide range of configuration options, makes solving the analytical models computationally costly.
However, these networks typically support multiple types of traffic flows operating at different time scales. This time-scale separation can be exploited to considerably reduce the computational cost of determining the key performance indicators. Thus, we propose a novel analytical modeling approach based on the transient-regime analysis of a series of absorbing subchains, which we name absorbing Markov chain approximation (AMCA). For a given computational cost, AMCA computes common performance indicators with greater accuracy than other approximate methods proposed in the literature. / Tello Oquendo, LP. (2018). Design and Performance Analysis of Access Control Mechanisms for Massive Machine-to-Machine Communications in Wireless Cellular Networks [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/107946 Access class barring (ACB) Cellular systems Cognitive radio networks Congestion control Internet of Things (IoT) Machine-to-machine communications Markov chains Massive machine-type communications Mobile traffic analysis Network modeling Performance analysis Phase-type distribution Probability theory Q-learning Random access channel Random access protocols Reinforcement learning SoftAir architecture Software-defined gateways Stochastic processes Time-scale separation Wireless software-defined networks 5G and beyond systems Telematics Engineering
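The Tello Oquendo abstract discusses the access class barring (ACB) check. The minimal Python sketch below illustrates it. It assumes the barring test described in 3GPP TS 36.331. A device draws rand uniformly from [0, 1) and may proceed only if rand is below the broadcast barring factor. Otherwise it is barred for (0.7 + 0.6 * rand) times the barring time, using a fresh draw. The sketch also assumes a simplified random-access model. Every device that passes ACB picks one of M preambles uniformly and succeeds only if no other device picks the same preamble. The function names and the static rule p = min(1, M/N) are illustrative assumptions, not the dynamic tuning algorithms developed in the thesis. The rule also presumes that the number of contending devices N is known.

```python
from __future__ import annotations

import random


def acb_check(barring_factor: float, barring_time_s: float,
              rng: random.Random) -> tuple[bool, float]:
    """One ACB attempt: return (allowed, barring_duration_s).

    Sketch of the TS 36.331-style test: draw rand ~ U[0, 1); the device is
    not barred if rand < barring_factor. Otherwise it is barred for
    (0.7 + 0.6 * rand') * barring_time_s, where rand' is a fresh draw.
    """
    if rng.random() < barring_factor:
        return True, 0.0
    return False, (0.7 + 0.6 * rng.random()) * barring_time_s


def expected_singletons(n_devices: int, p: float, n_preambles: int) -> float:
    """Expected number of preambles picked by exactly one device.

    Assumption: each of n_devices passes ACB independently with probability p
    and then picks one of n_preambles uniformly at random. A preamble with a
    single device counts as a successful access.
    """
    return n_devices * p * (1.0 - p / n_preambles) ** (n_devices - 1)


def heuristic_barring_factor(n_devices: int, n_preambles: int) -> float:
    """Hypothetical static rule p = min(1, M/N).

    This maximises expected_singletons when the backlog N is known.
    """
    if n_devices <= 0:
        return 1.0
    return min(1.0, n_preambles / n_devices)


if __name__ == "__main__":
    rng = random.Random(42)
    n, m = 540, 54  # hypothetical backlog and preamble count
    p = heuristic_barring_factor(n, m)
    passed = sum(acb_check(p, 4.0, rng)[0] for _ in range(n))
    print(f"p = {p:.2f}, devices passing ACB: {passed}, "
          f"expected successes: {expected_singletons(n, p, m):.1f}")
```

Under this model the expected number of successful accesses is N * p * (1 - p/M)^(N-1). Setting its derivative with respect to p to zero gives p = M/N. In a real cell N is generally unknown and varies over time, which is why tuning the barring factor dynamically and autonomously is the hard part. The 3GPP specification also restricts the barring factor to a set of discrete values, which this sketch ignores.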
