Global ETD Search

1	Conditioning behavior styles of Reinforcement Learning policies
Mysore Sthaneshwar, Siddharth
19 September 2023
Reinforcement Learning (RL) algorithms may learn any of an arbitrary set of behaviors that satisfy a reward-based objective, and this lack of consistency can limit the reliability and practical utility of RL. By considering how RL policies are trained, aspects of the core optimization loop are identified that significantly impact what behaviors are learned and how. The work presented in this thesis develops frameworks for manipulating these aspects to define and train desirable behavior in practical and more user-friendly ways. Smoothness in RL-based control was found to be a common issue among existing applications of RL in real-world controls. Our initial work on REinforcement-based transferable Agents through Learning (RE+AL) demonstrates that, through principled reward engineering and training-environment tuning, it is possible to learn effective and smooth control. However, this would still be tedious to extend to new tasks. Conditioning for Action Policy Smoothness (CAPS) introduces simple regularization terms directly to the policy optimization and serves as a generalized solution to smooth control that is more easily extensible across tasks. Looking next at how neural network architectural choices impact policy learning, it was noted that the burden of complexity in learning and representation often fell disproportionately on the value function approximations learned during training. Building on this knowledge, Multi-Critic Actor Learning (MultiCriticAL) was developed for multi-task RL, drawing from the intuition that, if value functions to estimate policy quality are difficult to learn, having distinct functions to evaluate each task would ease this representational burden. MultiCriticAL provides an effective tool for learning policies that can smoothly transition between multiple behavior styles and demonstrates superior performance over commonly used single-critic techniques, in both reward-based performance metrics and data efficiency, even enabling learning in cases where baseline methods would otherwise fail. When considering user-friendliness for non-expert practitioners, demonstrations of desirable behavior can often be easier to provide than fine-tuned heuristics, making imitation learning an attractive avenue of exploration for user-friendly tools in policy design. Where heuristic-based rewards can guide RL in general learning, imitation can be used to condition optimization for specific behaviors, though this requires balancing possibly conflicting RL and imitation policy optimization signals. We overcome this challenge by extending MultiCriticAL to learning behavior from demonstrations. The Split-Critic Imitation Learning (SCIL) framework allows specific behaviors to be defined in the parts of the state space where they matter, and allows policies to learn any other compatible, generally useful behavior over the rest of the states, using a more standard RL reward-based training loop. Inheriting the strengths of MultiCriticAL, SCIL is able to better separate and balance reinforcement- and imitation-based policy optimization signals to adequately handle both, where contemporary state-of-the-art imitation learning frameworks may fail, while enabling improved imitation performance and data efficiency.
2024-09-18T00:00:00Z
Artificial intelligence; Control; Game AI; Multi-style learning; Reinforcement learning; Robotics

2	Reducing fabric consumption : by improving marker efficiency
Widanalage, Varuna Lasantha Kumara; Kizilirmak, Serkan
January 2020
Resource degradation is a significant global problem that is directly linked to the textile and fashion industry. Efficient use of material has been identified as an essential aspect that must be addressed seriously. It is a critical topic that has attracted the attention of people and companies in recent years and has become a fundamental sustainability issue. This research study is based on UN Sustainable Development Goals 12 and 8, which focus on resource efficiency. The research considers fabric consumption, which has a significant impact on the textile and clothing industry, in order to contribute to a brighter future and a more sustainable life. The purpose of this study is to reduce fabric consumption by improving marker efficiency. The research investigates how marker efficiency behaves with respect to usable fabric widths, markers with different sizes, and markers with style combinations, in order to reduce fabric consumption. Improving existing markers reduces fabric wastage during the cutting process while improving resource efficiency in consumption and production. The study employs an explanatory sequential mixed-methods design: experiments are carried out to collect and analyze quantitative data, which are then explained and elaborated with qualitative findings from expert interviews, in a deductive approach. Marker efficiency varies significantly with the combination of sizes and styles and with the usable fabric width. Improvements in marker efficiency reduce fabric consumption per garment and increase resource efficiency while preventing waste generation. A saving of 1% of a material that is consumed in millions of tons per year significantly reduces resource depletion and environmental pollution. This study is limited to five usable fabric widths, four size marker combinations and two style combinations. Moreover, it focuses on material efficiency; cost efficiency is not considered. Clothing manufacturers can improve resource efficiency by improving marker efficiency when planning demand, considering multi-size and multi-style markers. During material purchasing, they can also take into account the usable fabric widths that provide higher marker efficiencies.
Marker efficiency; Sustainability; Resource efficiency; Waste prevention; Fabric consumption; Usable fabric width; Multi-size and Multi-style markers; Social Sciences (Samhällsvetenskap)

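3	An Empirical Analysis of Size and Value Multi-Style Investment Strategies: The Case of the Taiwan Stock Market
周忠樑
Unknown Date
Since roughly the early 1980s, efficient market theory has been increasingly challenged by market anomalies uncovered in empirical work, the most representative of which are the size effect and the value effect (the book-to-market phenomenon). Prompted by these findings, the question of how to conduct style investing across different segments of the stock market has long been a core issue in asset management. Academics and practitioners have also gradually begun to study multi-style portfolios, hoping to uncover more investment opportunities and strategic implications. This study takes non-financial common stocks listed on the Taiwan Stock Exchange from October 1993 to 2002 as its sample. Following the classification and simulation approach of Ahmed, Lockwood and Nanda (2002), it examines the differences in returns among the single-style and multi-style portfolios formed by a two-way classification on size and value, and simulates the terminal investment value of each style portfolio. To verify the value of rotation strategies, the study further pairs different style portfolios and runs perfect-forecasting rotation simulations. The empirical results show that, in the Taiwan stock market, value-type portfolios (value stocks, large-cap value stocks, small-cap value stocks) earn significantly higher returns than growth-type portfolios (growth stocks, large-cap growth stocks, small-cap growth stocks). There is no significant return difference between large-cap and small-cap portfolios, but the variation in the return spread between the two helps improve the performance of rotation strategies. The perfect-forecasting rotation simulations across pairs of style portfolios show that rotation strategies do have considerable potential investment value, and that running rotation strategies with multi-style portfolios yields better simulation results.
Multi-Style; Style Investing; Rotation Strategy; Firm Size; Value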
