
Regularized Greedy Gradient Q-Learning with Mobile Health Applications

Lu, Xiaoqi January 2021 (has links)
Recent advances in health and technology have made mobile apps a viable approach to delivering behavioral interventions in areas including physical activity encouragement, smoking cessation, substance abuse prevention, and mental health management. Due to the chronic nature of most of these disorders and the heterogeneity among mobile users, delivery of the interventions needs to be sequential and tailored to individual needs. We operationalize this sequential decision making via a policy that takes a mobile user's past usage pattern and health status as input and outputs an app/intervention recommendation, with the goal of optimizing a cumulative reward of interest in an indefinite-horizon setting. A plethora of reinforcement learning methods exist for deriving optimal policies in this setting. However, the vast majority of the literature, rooted in the computer science domain, focuses on the convergence of these algorithms given an unlimited amount of data. Their performance in health applications, with limited data and high noise, is yet to be explored. Technically, the nature of sequential decision making yields an objective function that is non-smooth (not even Lipschitz continuous) and non-convex in the model parameters. This poses theoretical challenges for characterizing the asymptotic properties of the optimizer of the objective function, as well as computational challenges for optimization. The problem is further exacerbated by the high-dimensional data present in mobile health applications. In this dissertation we propose a regularized greedy gradient Q-learning (RGGQ) method to tackle this estimation problem. The optimal policy is estimated via an algorithm which synthesizes the PGM and GGQ algorithms in the presence of an L₁ regularization, and its asymptotic properties are established. The theoretical framework initiated in this work can be applied to tackle other non-smooth high-dimensional problems in reinforcement learning.
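As a rough, hypothetical illustration of the kind of update involved (not the dissertation's actual PGM/GGQ synthesis, which is more involved), a linear Q-learning TD step combined with an L1 proximal (soft-thresholding) step can be sketched as:

```python
import numpy as np

def soft_threshold(w, lam):
    # Proximal operator of the L1 norm: shrinks each weight toward zero.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def q_values(w, phi_actions):
    # phi_actions: (n_actions, d) feature matrix, one row per candidate action.
    return phi_actions @ w

def regularized_q_step(w, phi, r, phi_next, alpha=0.1, gamma=0.9, lam=0.01):
    """One illustrative regularized greedy Q-learning step: a semi-gradient
    TD update on linear weights, followed by an L1 proximal shrinkage step."""
    td_target = r + gamma * np.max(q_values(w, phi_next))
    td_error = td_target - phi @ w
    w = w + alpha * td_error * phi          # TD update toward greedy target
    return soft_threshold(w, alpha * lam)   # sparsity-inducing proximal step

# Toy run on random transitions (all quantities invented for illustration).
rng = np.random.default_rng(0)
w = np.zeros(5)
for _ in range(100):
    phi = rng.normal(size=5)                # current state-action features
    phi_next = rng.normal(size=(3, 5))      # features of next-state actions
    w = regularized_q_step(w, phi, rng.normal(), phi_next)
```

The soft-thresholding step is what induces sparsity in the estimated policy parameters, which is the role the L₁ regularization plays in the high-dimensional setting described above.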

[pt] COORDENAÇÃO INTELIGENTE PARA MULTIAGENTES BASEADOS EM MODELOS NEURO-FUZZY HIERÁRQUICOS COM APRENDIZADO POR REFORÇO / [en] INTELLIGENT COORDINATION FOR MULTIAGENT BASED MODELS HIERARCHICAL NEURO-FUZZY WITH REINFORCEMENT LEARNING

08 November 2018 (has links)
This thesis investigates and develops intelligent coordination strategies that can be integrated into hierarchical neuro-fuzzy models for multi-agent systems in complex environments. In dynamic or complex environments, the organization of the agents must adapt to changes in the system's objectives, in the availability of resources, in the relationships among agents, and so on. This flexibility is a key problem in multi-agent systems. The main objective of the proposed models is to make multiple agents interact intelligently with one another in complex systems. Two new intelligent hierarchical neuro-fuzzy models with coordination mechanisms for multi-agent systems were developed: the Reinforcement-Learning Hierarchical Neuro-Fuzzy model with a Market-Driven coordination mechanism (RL-NFHP-MA-MD), and the Reinforcement-Learning Hierarchical Neuro-Fuzzy model with coordination-graph-based coordination (RL-NFHP-MA-CG). The addition of coordination models to the Reinforcement-Learning Hierarchical Neuro-Fuzzy model (RL-NFHP-MA) was motivated chiefly by the importance of optimizing how the agents work together, improving the model's results and targeting more complex applications. The models were conceived from a study of the limitations of existing models and of the desirable characteristics of RL-based learning systems, particularly when applied to continuous and/or high-dimensional environments. The developed models were tested in two main case studies: the pursuit-game (predator-prey) benchmark and robot soccer (both simulated and with robotic agents). The results obtained in both case studies with the new RL-NFHP-MA-MD and RL-NFHP-MA-CG models for multiple agents proved quite promising. The tests showed that the new system can coordinate the actions of agents with a convergence speed nearly 30 percent higher than the original version. In robot soccer, both models performed well in full matches as well as in specific plays, beating teams developed with other, similar models.

A Reward-based Algorithm for Hyperparameter Optimization of Neural Networks / En Belöningsbaserad Algoritm för Hyperparameteroptimering av Neurala Nätverk

Larsson, Olov January 2020 (has links)
Machine learning and its wide range of applications is becoming increasingly prevalent in both academia and industry. This thesis focuses on two machine learning methods: convolutional neural networks and reinforcement learning. Convolutional neural networks have seen great success in classification and regression applications across a diverse range of fields, e.g. vision for self-driving cars or facial recognition. These networks are built on a set of trainable weights optimized on data, and a set of hyperparameters set by the designer of the network that remain constant during training. For the network to perform well, the hyperparameters have to be optimized separately. The goal of this thesis is to investigate the use of reinforcement learning as a method for optimizing hyperparameters in convolutional neural networks built for classification problems. The reinforcement learning methods used are tabular Q-learning and a new Q-learning-inspired algorithm denominated max-table. These algorithms have been tested with different exploration policies based on each hyperparameter value's covariance, precision, or relevance to the performance metric. The reinforcement learning algorithms were mostly tested on the CIFAR10 and Fashion-MNIST datasets against a baseline set by random search. While the Q-learning algorithm was not able to perform better than random search, max-table performed better than random search about 50% of the time on both datasets. Hyperparameter-based exploration policies using covariance or relevance were shown to decrease the optimizers' performance. No significant difference was found between a hyperparameter-based exploration policy using precision and an equally distributed exploration policy.
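To make the tabular Q-learning formulation concrete, here is a minimal, hypothetical sketch (the thesis's actual state/action design, exploration policies, and max-table variant differ): hyperparameters are chosen one at a time, each partial assignment is a state, and validation accuracy — faked here by a toy `evaluate` function — is the terminal reward backed up along the visited path.

```python
import random

# Invented toy search space: one choice per hyperparameter, made in sequence.
GRID = {"lr": [1e-3, 1e-2, 1e-1], "batch": [32, 64, 128]}

def evaluate(config):
    # Stand-in for training a CNN and returning validation accuracy;
    # peaks at lr=1e-2, batch=64.
    return 1.0 - abs(config["lr"] - 1e-2) - abs(config["batch"] - 64) / 640

def q_learn(episodes=200, alpha=0.5, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {}  # (state, action) -> value; state = tuple of values chosen so far
    names = list(GRID)
    best = (None, float("-inf"))
    for _ in range(episodes):
        state, config = (), {}
        for name in names:
            acts = GRID[name]
            if rng.random() < eps:                 # epsilon-greedy exploration
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda v: q.get((state, v), 0.0))
            config[name] = a
            state = state + (a,)
        r = evaluate(config)                       # terminal reward only
        if r > best[1]:
            best = (dict(config), r)
        s = ()                                     # back up reward along path
        for name in names:
            a = config[name]
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r - old)
            s = s + (a,)
    return best[0]

best = q_learn()
```

Random search, the baseline mentioned above, would simply sample configurations uniformly and keep the best one seen.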

Analysis of Design Artifacts in Platform-Based Markets

Vandith Pamuru Subramanya Rama (9180506) 31 July 2020 (has links)
Digitization has led to the emergence of many platform-based markets. In this dissertation I focus on three different design problems in these markets. The first essay relates to augmented-reality platforms. Pokémon Go, an augmented-reality game, garnered tremendous public interest upon release, with an average of 20 million daily active users. The game combines geo-spatial elements with gamification practices to incentivize user movement in the physical world. This work examines the potential externalities that such incentives may have on associated businesses. In particular, we study the impact of Pokémon Go on local restaurants, using online reviews as a proxy for consumer engagement and perception. We treat the release of Pokémon Go as a natural experiment and study the post-release impact on the associated restaurants. We find that restaurants located near an in-game artifact do indeed observe a higher level of consumer engagement and a more positive consumer perception compared with those that have no in-game artifacts nearby. In addition, we find that the heterogeneous characteristics of the restaurants moderate the effect significantly. To the best of our knowledge, this study is the first to examine the economic implications of augmented-reality applications. Our research thereby lays the foundations for how augmented-reality games affect consumer economic behavior, and builds insights into the potential value of such associations for business owners and policymakers.

The second essay focuses on the platform design problem in sponsored search ad markets. Recent advances in technology have reduced frictions in various markets. In this research, we specifically investigate the role of frictions in determining efficiency and bidding behavior in a generalized second price auction (GSP), the most preferred mechanism for sponsored search advertisements. First, we simulate computational agents in the GSP setting and obtain predictions for the metrics of interest. Second, we test these predictions by conducting a human-subject experiment. We find that, contrary to the theoretical prediction, the lower-valued advertisers (who do not win the auction) substantially overbid. Moreover, we find that the presence of market frictions moderates this phenomenon and results in higher allocative efficiency. These results have implications for policymakers and auction platform managers in designing incentives for more efficient auctions.

The third essay is about user-generated content platforms. These platforms utilize various gamification strategies to incentivize user contributions. One of the most popular strategies is to offer platform sponsorship, such as a special elite status. Previous literature has extensively studied the impact of holding such sponsorships on user contributions. We focus specifically on the impact of losing elite status, which happens once a user's contributions to the platform decline in volume. Using a unique empirical strategy, we show that users continue to contribute high-quality reviews even after losing their status. We use natural language processing to extract various review characteristics, including sentiment and topics, and find that losing status does not, on average, change the topics of the reviews users write.
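The generalized second-price rule studied in the second essay is simple to state: advertisers are ranked by bid, and each winner pays the bid of the advertiser ranked just below them. A minimal sketch (bidder names and values invented for illustration):

```python
def gsp(bids, slots):
    """Generalized second-price auction: rank advertisers by bid and charge
    each winner the next-highest bid. bids: {advertiser: bid}."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i in range(min(slots, len(ranked))):
        bidder, _ = ranked[i]
        # Payment is the bid one rank below, or 0 if no lower bid exists.
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results

print(gsp({"a": 3.0, "b": 5.0, "c": 1.0}, slots=2))
# → [('b', 3.0), ('a', 1.0)]: b wins slot 1 paying a's bid, a wins slot 2 paying c's bid
```

Note that a truthful bid is not generally a dominant strategy under GSP, which is why the bidding behavior of lower-valued advertisers is an empirical question.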

Multi-modal Simulation and Calibration for OSU Campus Mobility

Kalra, Vikhyat January 2021 (has links)
No description available.

Radio Resource Allocation and Beam Management under Location Uncertainty in 5G mmWave Networks

Yao, Yujie 16 June 2022 (has links)
Millimeter wave (mmWave) plays a critical role in Fifth-generation (5G) New Radio due to the rich bandwidth it provides. However, one shortcoming of mmWave is the substantial path loss caused by poor diffraction at high frequencies; consequently, highly directional beams are applied to mitigate this problem. A typical approach to beam management is to cluster users based on their locations. However, localization uncertainty is unavoidable due to limited measurement accuracy, system performance fluctuations, and so on. Meanwhile, traffic demand may change dynamically in wireless environments, which increases the complexity of network management. Therefore, a scheme that can handle both localization uncertainty and dynamic radio resource allocation is required. Moreover, since localization uncertainty influences network performance, more advanced localization methods, such as vision-aided localization, are needed to reduce the localization error. In this thesis, we propose two algorithms for joint radio resource allocation and beam management in 5G mmWave networks: UK-means-based Clustering and Deep Reinforcement Learning-based resource allocation (UK-DRL) and UK-medoids-based Clustering and Deep Reinforcement Learning-based resource allocation (UKM-DRL). Specifically, we deploy the UK-means and UK-medoids clustering methods in UK-DRL and UKM-DRL, respectively, which are designed to handle clustering under location uncertainty. Meanwhile, we apply Deep Reinforcement Learning (DRL) for intra-beam radio resource allocation in both algorithms. Moreover, to improve localization accuracy, we develop a vision-aided localization scheme in which pixel-characteristic-based features are extracted from satellite images as additional input features for location prediction. Simulations show that UK-DRL and UKM-DRL improve network performance, in both data rate and delay, over baseline algorithms. When the traffic load is 4 Mbps, UK-DRL achieves a 172.4% higher sum rate and 64.1% lower latency than K-means-based Clustering and Deep Reinforcement Learning-based resource allocation (K-DRL). UKM-DRL has 17.2% higher throughput and 7.7% lower latency than UK-DRL, and 231% higher throughput and 55.8% lower latency than K-DRL. In addition, the vision-aided localization scheme significantly reduces the localization error, from 17.11 meters to 3.6 meters.
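The idea of clustering users under location uncertainty can be sketched with a toy Monte-Carlo variant of k-means (all details here are invented for illustration; the thesis's UK-means/UK-medoids formulations and the DRL resource-allocation stage are more involved): each user's uncertain position is represented by samples, and users are assigned to the centroid minimizing the expected squared distance.

```python
import numpy as np

def uk_means(samples, k, iters=20, seed=0):
    """Toy uncertain-k-means sketch. samples: (n_users, n_draws, d) array of
    Monte Carlo draws from each user's location distribution."""
    rng = np.random.default_rng(seed)
    means = samples.mean(axis=1)                       # (n, d) expected positions
    centers = means[rng.choice(len(means), size=k, replace=False)]
    for _ in range(iters):
        diff = samples[:, :, None, :] - centers[None, None, :, :]
        cost = (diff ** 2).sum(-1).mean(1)             # (n, k) expected sq. dist.
        labels = cost.argmin(1)                        # assign to cheapest center
        for j in range(k):
            if (labels == j).any():
                centers[j] = means[labels == j].mean(0)
    return labels, centers

# Two well-separated user groups with Gaussian location noise.
rng = np.random.default_rng(1)
true_pos = np.repeat(np.array([[0.0, 0.0], [10.0, 10.0]]), 25, axis=0)
samples = true_pos[:, None, :] + rng.normal(0.0, 0.5, size=(50, 8, 2))
labels, centers = uk_means(samples, k=2)
```

With isotropic per-user noise the expected-distance assignment reduces to clustering the mean positions; the formulation matters when uncertainty differs across users or dimensions.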

Navigation of Mobile Robots in Human Environments with Deep Reinforcement Learning / Navigering av mobila robotar i mänskliga miljöer med deep reinforcement learning

Coors, Benjamin January 2016 (has links)
For mobile robots which operate in human environments it is not sufficient to simply travel to their target destination as quickly as possible. Instead, mobile robots in human environments need to travel to their destination safely, keeping a comfortable distance from humans and not colliding with any obstacles along the way. As the number of possible human-robot interactions is very large, defining a rule-based navigation approach is difficult in such highly dynamic environments. Current approaches solve this task by predicting the trajectories of humans in the scene and then planning a collision-free path. However, this requires separate components for detecting and predicting human motion and does not scale well to densely populated environments. Therefore, this work investigates the use of deep reinforcement learning for the navigation of mobile robots in human environments. The approach is based on recent research on utilizing deep neural networks in reinforcement learning to successfully play Atari 2600 video games at human level. A deep convolutional neural network is trained end-to-end, mapping one-dimensional laser scan data to command velocities. Discrete and continuous action space implementations are evaluated in simulation and are shown to outperform a Social Force Model baseline on the navigation problem for mobile robots in human environments.
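A heavily simplified sketch of the discrete-action setup (a linear Q-function standing in for the thesis's convolutional network, with an invented action set of velocity commands):

```python
import numpy as np

# Hypothetical discrete action set: (linear velocity, angular velocity) pairs.
ACTIONS = [(0.5, 0.0), (0.3, 0.5), (0.3, -0.5), (0.0, 1.0), (0.0, -1.0)]

class LinearQ:
    """Minimal stand-in for a Q-network: a linear Q-function over raw 1-D
    laser ranges, trained with the standard DQN-style TD target."""
    def __init__(self, scan_dim, n_actions, lr=0.01, gamma=0.99, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, (n_actions, scan_dim))
        self.lr, self.gamma = lr, gamma

    def q(self, scan):
        return self.W @ scan                     # one Q-value per action

    def act(self, scan, eps=0.1, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        if rng.random() < eps:                   # epsilon-greedy exploration
            return int(rng.integers(len(ACTIONS)))
        return int(np.argmax(self.q(scan)))

    def update(self, scan, a, reward, next_scan, done):
        target = reward if done else reward + self.gamma * np.max(self.q(next_scan))
        td_error = target - self.q(scan)[a]
        self.W[a] += self.lr * td_error * scan   # semi-gradient TD step
        return td_error

agent = LinearQ(scan_dim=10, n_actions=len(ACTIONS))
scan = np.ones(10)                               # fake laser scan
action = agent.act(scan, eps=0.0)
```

The continuous-action variant evaluated in the thesis would instead output velocity commands directly rather than selecting from a discrete set.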

Offline Reinforcement Learning for Scheduling Live Video Events in Large Enterprises

Franzén, Jonathan January 2022 (has links)
In modern times, live video streaming events in companies have become an increasingly relevant method for communications. As a platform provider for these events, being able to deliver relevant recommendations for event scheduling times to users is an important feature. A system providing relevant recommendations to users can be described as a recommender system. Recommender systems usually face issues such as having to be trained purely offline, as training the system online can be costly or time-consuming, requiring manual user feedback. While many solutions and advancements have been made in recommender systems over the years, such as contributions in the Netflix Prize, it continues to be an active research topic. This work aims at designing a recommender system which observes users' past sequential scheduling behavior to provide relevant recommendations for scheduling upcoming live video events. The developed recommender system uses reinforcement learning as a model, with components such as a generative model to help it learn from offline data.

Information and Self-Organization in Complex Networks

Culbreth, Garland 12 1900 (has links)
Networks that self-organize in response to information are among the central objects of study in complex systems theory. A new time series analysis tool for studying self-organizing systems is developed and demonstrated. The method is applied to interacting complex swarms to explore the connection between information transport and group size, providing evidence that Dunbar's numbers have a foundation in network dynamics. A complex network model of information spread is also developed. This network infodemic model uses reinforcement learning to simulate connection and opinion adaptation resulting from interactions between units. The model is applied to study polarized populations and echo chamber formation, exploring strategies for network resilience and weakening. The model is straightforward to extend to multilayer networks and to networks generated from real-world data. By unifying explanation and prediction, the network infodemic model offers a timely step toward understanding global collective behavior.

Hyperparameter Tuning for Reinforcement Learning with Bandits and Off-Policy Sampling

Hauser, Kristen 21 June 2021 (has links)
No description available.
