81 |
Smart Tracking for Edge-assisted Object Detection : Deep Reinforcement Learning for Multi-objective Optimization of Tracking-based Detection Process / Smart Spårning för Edge-assisterad Objektdetektering : Djup Förstärkningsinlärning för Flermålsoptimering av Spårningsbaserad Detekteringsprocess / Zhou, Shihang January 2023 (has links)
Detecting generic objects is an important sensing task for applications that need to understand the environment, such as eXtended Reality (XR) and drone navigation. However, Object Detection algorithms are particularly computationally heavy for real-time video analysis on resource-constrained mobile devices. Object Tracking, a much lighter process, is therefore introduced under the Tracking-By-Detection (TBD) paradigm to alleviate the computational overhead. Still, it is common that the configurations of the TBD process remain unchanged, which results in unnecessary computation and/or performance loss in many cases.
This Master's Thesis presents a novel approach for multi-objective optimization of the TBD process over precision and latency, targeting power-constrained devices. We propose a Deep Reinforcement Learning based scheduling architecture that selects appropriate TBD actions in video sequences to achieve the desired goals. Specifically, we develop a simulation environment providing Markovian state information as input for the scheduler neural network, justified options of TBD actions, and a scalarized reward function to combine the multiple objectives. Our results demonstrate that the trained policies can learn to utilize content information from the current and previous frames, thus optimally controlling the TBD process at each frame. The proposed approach outperforms both baselines with fixed TBD configurations and recent research works, achieving precision close to that of pure detection while keeping latency much lower. Both tuneable configurations show positive and synergistic contributions to the optimization objectives. We also show that our policies are generalizable, evaluated with a 50% train/test split, with the scheduler's inference and action time adding minimal latency overhead. This makes our scheduling design highly practical in real XR or similar applications on power-constrained devices.
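The scalarized reward mentioned above is the piece that turns the precision/latency trade-off into a single training signal. A minimal sketch of such a reward follows; the weights, the latency budget, and the penalty shape are illustrative assumptions, not the thesis's actual design.

```python
def scalarized_reward(precision, latency_ms, w_precision=1.0,
                      w_latency=0.5, latency_budget_ms=33.0):
    """Combine per-frame precision and latency into one scalar reward.

    Linear scalarization: reward rises with detection/tracking precision
    and falls as frame latency exceeds a real-time budget. The weights
    and the 33 ms budget are illustrative, not the thesis's values.
    """
    latency_penalty = max(0.0, latency_ms - latency_budget_ms) / latency_budget_ms
    return w_precision * precision - w_latency * latency_penalty

# A fast but slightly less precise tracking step ...
print(scalarized_reward(precision=0.82, latency_ms=8.0))   # no latency penalty
# ... versus a precise but slow full-detection step.
print(scalarized_reward(precision=0.95, latency_ms=90.0))  # penalized
```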
|
82 |
Autonomous Cyber Defense for Resilient Cyber-Physical Systems / Zhang, Qisheng 09 January 2024 (has links)
In this dissertation research, we design and analyze resilient cyber-physical systems (CPSs) under high network dynamics, adversarial attacks, and various uncertainties. We focus on three key system attributes to build resilient CPSs by developing a suite of autonomous cyber defense mechanisms. First, we consider network adaptability to achieve the resilience of a CPS. Network adaptability represents the network's ability to maintain its security and connectivity level when faced with incoming attacks. We address this by network topology adaptation, which can quickly identify and update the network topology to confuse attacks by changing attack paths. We leverage deep reinforcement learning (DRL) to develop CPSs using network topology adaptation. Second, we consider the fault tolerance of a CPS as another attribute to ensure system resilience. We aim to build a resilient CPS under severe resource constraints, adversarial attacks, and various uncertainties. We choose a solar sensor-based smart farm as one example CPS application and develop a resource-aware monitoring system for smart farms. We leverage DRL and uncertainty quantification using a belief theory called Subjective Logic to optimize critical tradeoffs between system performance and security under contested CPS environments. Lastly, we study system resilience in terms of system recoverability, which refers to the system's ability to recover from performance degradation or failure. In this task, we mainly focus on developing an automated intrusion response system (IRS) for CPSs. We aim to design the IRS with effective and efficient responses by reducing the false alarm rate and the defense cost, respectively. Specifically, we build a lightweight IRS for an in-vehicle controller area network (CAN) bus system operating with DRL-based autonomous driving. / Doctor of Philosophy
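The Subjective Logic component used for uncertainty quantification works on opinions that carry an explicit uncertainty mass. The sketch below shows the standard binomial-opinion projection and cumulative fusion operators from Subjective Logic; the sensor scenario in the comments is assumed for illustration, since the dissertation's exact use is not spelled out in the abstract.

```python
def project(b, d, u, a=0.5):
    """Projected probability of a binomial opinion; b + d + u must sum to 1."""
    return b + a * u

def cumulative_fuse(o1, o2):
    """Cumulatively fuse two independent binomial opinions (b, d, u).

    Standard Subjective Logic operator; undefined when both opinions are
    dogmatic (u1 = u2 = 0), a case excluded here for simplicity.
    """
    b1, d1, u1 = o1
    b2, d2, u2 = o2
    k = u1 + u2 - u1 * u2
    return ((b1 * u2 + b2 * u1) / k,
            (d1 * u2 + d2 * u1) / k,
            (u1 * u2) / k)

# Hypothetical scenario: two sensors report on whether a node is compromised;
# fusing them shrinks the uncertainty mass u relative to either opinion alone.
fused = cumulative_fuse((0.6, 0.1, 0.3), (0.5, 0.2, 0.3))
print(fused, project(*fused))
```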
|
83 |
Deep Reinforcement Learning for Temperature Control in Buildings and Adversarial Attacks / Ammouri, Kevin January 2021 (has links)
Heating, Ventilation and Air Conditioning (HVAC) systems in buildings are energy consuming, and the traditional methods used for building control result in energy losses because they cannot account for non-linear dependencies in the thermal behaviour. Deep Reinforcement Learning (DRL) is a powerful method for reaching optimal control in many different control environments. DRL utilizes neural networks to approximate the optimal action to take given that the system is in a particular state; it is therefore a promising method for building control, a fact highlighted by several studies. However, neural network policies are known to be vulnerable to adversarial attacks: small, indistinguishable changes to the input that make the network choose a sub-optimal action. Two of the main approaches to attack DRL policies are: (1) the Fast Gradient Sign Method (FGSM), which uses the gradients of the control agent's network to conduct the attack; and (2) training a DRL agent with the goal of minimizing the performance of the control agents. The aim of this thesis is to investigate different strategies for solving the building control problem with DRL using the building simulator IDA ICE. This thesis also uses the concept of adversarial machine learning by applying the attacks on the agents controlling the temperature inside the building. We first built a DRL architecture to learn how to efficiently control the temperature in a building. Experiments demonstrate that exploration of the agent plays a crucial role in the training of the building control agent, and one needs to fine-tune the exploration strategy in order to achieve satisfactory performance. Finally, we tested the susceptibility of the trained DRL controllers to adversarial attacks. These tests showed, on average, that attacks trained using DRL methods have a larger impact on building control than those using FGSM, while random perturbations have almost no impact.
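Of the two attack families, FGSM is compact enough to sketch: perturb the observation by the sign of the loss gradient so the policy network is nudged toward a worse action. The PyTorch sketch below is generic; the loss function, target, and epsilon used against the IDA ICE temperature controllers are placeholders.

```python
import torch

def fgsm_perturb(model, loss_fn, obs, target, eps=0.01):
    """Fast Gradient Sign Method: one-step perturbation of an observation.

    Moves the input in the direction that increases the agent's loss,
    pushing the policy toward a sub-optimal action.
    """
    obs = obs.clone().detach().requires_grad_(True)
    loss = loss_fn(model(obs), target)
    loss.backward()
    return (obs + eps * obs.grad.sign()).detach()
```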
|
84 |
Understanding and Combating Online Social Deception / Guo, Zhen 02 May 2023 (has links)
In today's world, online communication through social network services (SNSs) has become an essential aspect of people's daily lives. As SNSs have become more sophisticated, cyber attackers have found ways to exploit them for harmful activities such as financial fraud, privacy violations, and sexual or labor exploitation. It is therefore imperative to understand these activities and develop effective countermeasures to build SNSs that can be trusted. Existing approaches have focused on detection mechanisms for particular types of online social deception (OSD) using various artificial intelligence (AI) techniques, including machine/deep learning (ML/DL) and text mining. However, fewer studies exist on prevention and response (or mitigation) mechanisms for effective defense against OSD attacks, and there has been insufficient effort to investigate the underlying intents and tactics of OSD attackers. This dissertation takes defense approaches to combat OSD attacks through an in-depth understanding of the psychological and social behaviors of attackers and potential victims, which can guide more proactive action against OSD attacks, minimizing potential damage to victims and saving recovery costs.
In this dissertation, we examine OSD attacks mainly through two tasks: understanding their causes and combating them in terms of prevention, detection, and mitigation. In the OSD understanding task, we investigate the intent and tactics of false informers (e.g., fake news spreaders) in propagating fake news or false information. We infer false informers' intent more accurately from intent-related phrases in fake news contexts in order to decide on effective and efficient defenses (or interventions) against them. In the OSD combating task, we develop defense systems through two sub-tasks: (1) a social capital-based friending recommendation system that guides OSN users to choose trustworthy users and proactively defend against phishing attackers; and (2) a defensive opinion update framework for OSN users to process their opinions by filtering out false information. The schemes proposed for combating OSD attacks contribute to the prevention, detection, and mitigation of OSD attacks. / Doctor of Philosophy / This Ph.D. dissertation explores the issue of online social deception (OSD) in the context of social networking services (SNSs). With the increasing sophistication of SNSs, cyber attackers have found ways to exploit them for harmful activities, such as financial fraud and privacy violations. While previous studies have focused on detection mechanisms using artificial intelligence (AI) techniques, this dissertation takes a defense approach by investigating the underlying psychological and social behaviors of attackers and potential victims. Through the two tasks of understanding OSD causes and combating them with various AI approaches, this dissertation proposes a social capital-based friending recommendation system, a defensive opinion update framework, and a fake news spreaders' intent analysis framework to guide SNS users in choosing trustworthy users and filtering out phishing attackers or false information. The proposed schemes contribute to the prevention, detection, and mitigation of OSD attacks, minimizing potential damage to victims and saving recovery costs.
|
85 |
Autonomous robot car with Deep Reinforcement Learning / Feng, Yi 02 January 2025 (has links)
Autonomous driving (AD) aims to achieve fully autonomous operation in various complex traffic environments. Deep reinforcement learning (DRL) integrates the perception capabilities of deep learning with the decision-making capabilities of reinforcement learning (RL), providing efficient solutions for autonomous driving through non-end-to-end pretraining methods and end-to-end direct control methods. This study uses the Gym-Duckietown simulation platform and applies the Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) algorithms to compare and evaluate the performance of non-end-to-end and end-to-end DRL models across tasks of varying complexity. The main results highlight significant performance differences based on input dimensionality, RL algorithm selection, and reward structure. Notably, for complex non-end-to-end tasks, directly loading pretrained weights from models trained on simpler tasks significantly enhances generalization, and for complex end-to-end models, phased reinforcement learning strategies demonstrate advantages over standard training methods. The experimental results reveal the critical influence of reward design, task complexity, and RL algorithm choice on model performance, and demonstrate the potential of phased reinforcement learning for improving training efficiency and adaptability. These findings validate the applicability of DRL to AD and provide insights for optimizing training strategies, reward structures, and algorithm selection in future AD research.
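For orientation, an end-to-end PPO setup of the kind compared in this study can be assembled in a few lines with Stable-Baselines3. This is a workflow sketch, not the author's configuration: the environment id is a placeholder, and the Gym-Duckietown wrappers and reward shaping are omitted.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Placeholder id: assumes a Gymnasium-compatible Duckietown registration.
env = gym.make("Duckietown-loop-v0")

# End-to-end: raw camera pixels in, steering/throttle actions out.
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```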
|
86 |
Searching for Q* / Piché, Alexandre 04 1900 (has links)
The research in this thesis can be seen through the common lens of “Searching for Q*” and aims to highlight the effectiveness of combining deep Reinforcement Learning (RL) systems and search. Deep RL allows us to learn: 1) rich policies from which we can sample potential future actions, and 2) accurate Q-functions allowing the agent to evaluate the potential impact of its actions before taking them. Search allows the agent to use computation to improve its policy by evaluating multiple potential future sequences of actions and selecting the most promising one. In this thesis, we explore different ways to combine these two components, so they improve one another and allow us to obtain stronger agents.
The first contribution of this thesis frames RL and planning as an inference problem. This framing enables us to leverage Sequential Monte Carlo techniques to approximate a distribution over the optimal planned trajectories. The second contribution highlights a connection between Target Networks used in Q-learning and functional regularization, leading us to a more flexible and “proper” regularization of Q-functions. The third contribution simplifies the RL via supervised learning (RvS) problem by directly modeling future return as a distribution, allowing the agent to sample returns on the fly instead of having it be a hyperparameter dependent on the environment. Finally, the fourth contribution proposes a novel iterative optimization algorithm based on self-evaluation and self-prompting for large language models, which reduces the hallucination rates of the model without compromising its helpfulness.
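The planning-as-inference framing of the first contribution can be illustrated with a toy Sequential Monte Carlo planner: candidate action sequences are particles, exponentiated rewards act as importance weights, and resampling concentrates computation on promising plans. Everything below (the uniform proposal, discrete actions, the env_step model) is a simplification assumed for illustration; the thesis's approach samples from learned policies and uses learned Q-functions.

```python
import numpy as np

def smc_plan(env_step, s0, n_actions=4, n_particles=64, horizon=10, seed=0):
    """Toy SMC planner: particles are action sequences, weighted by
    exp(step reward) and resampled every step, bootstrap-filter style."""
    rng = np.random.default_rng(seed)
    states = [s0] * n_particles
    plans = [[] for _ in range(n_particles)]
    for _ in range(horizon):
        log_w = np.zeros(n_particles)
        for i in range(n_particles):
            a = int(rng.integers(n_actions))       # uniform proposal over actions
            states[i], r = env_step(states[i], a)  # assumed simulator/model
            log_w[i] = r                           # weight proportional to exp(reward)
            plans[i].append(a)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resample
        states = [states[j] for j in idx]
        plans = [plans[j][:] for j in idx]
    return plans[0]  # one sample from the approximate optimal-plan posterior
```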
|
87 |
Optimizing vertical farming : control and scheduling algorithms for enhanced plant growth / Vu, Cong Vinh 10 1900 (has links)
Vertical farming provides a way to have almost total control over agriculture, whether it be controlling weather conditions, nutrients necessary for plant growth, or even pest control. As such, it is possible to find and set parameters that increase crop yield and quality and minimize energy consumption where possible. To that end, this thesis presents optimization algorithms, such as an enhanced version of Simulated Annealing, that can be used to find and give guidelines for those parameters. We also present work on how real-time control algorithms such as Actor-Critic methods can be made to perform better through more efficient exploration, by taking epistemic uncertainty into account during action selection; this can also benefit control systems made for vertical farming. We show that our work is able to outperform some algorithms used for optimization and continuous control.
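Plain simulated annealing, the starting point for the enhanced variant mentioned above, is short enough to sketch. The growth objective and neighborhood function below are hypothetical stand-ins; the thesis's enhancements are not reproduced.

```python
import math
import random

def simulated_annealing(score, initial, neighbor, t0=1.0, cooling=0.995,
                        steps=5_000):
    """Plain simulated annealing over farm parameters (maximization).

    score: objective to maximize (e.g., a growth/energy trade-off).
    neighbor: proposes a nearby parameter setting.
    """
    x, fx, t = initial, score(initial), t0
    best, fbest = x, fx
    for _ in range(steps):
        y = neighbor(x)
        fy = score(y)
        # Always accept uphill moves; accept downhill with Boltzmann probability.
        if fy >= fx or random.random() < math.exp((fy - fx) / t):
            x, fx = y, fy
            if fx > fbest:
                best, fbest = x, fx
        t *= cooling  # geometric cooling schedule
    return best, fbest

# Toy usage: tune a single "light hours" parameter against a made-up score
# with a hypothetical optimum at 16 hours.
best, val = simulated_annealing(
    score=lambda x: -(x - 16.0) ** 2,
    initial=8.0,
    neighbor=lambda x: x + random.uniform(-1.0, 1.0))
print(best, val)
```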
|
88 |
Optimization Methods for Distribution Systems: Market Design and Resiliency Enhancement / Bedoya Ceballos, Juan Carlos 05 August 2020 (has links)
The increasing penetration of proactive agents in distribution systems (DS) has opened new possibilities to make the grid more resilient and to increase participation of responsive loads (RL) and non-conventional generation resources. On the resiliency side, plug-in hybrid electric vehicles (PHEV), energy storage systems (ESS), microgrids (MG), and distributed energy resources (DER), can be leveraged to restore critical load in the system when the utility system is not available for extended periods of time. Critical load restoration is a key factor to achieve a resilient distribution system. On the other hand, existing DERs and responsive loads can be coordinated in a market environment to contribute to efficiency of electricity consumption and fair electricity tariffs, incentivizing proactive agents' participation in the distribution system.
Resiliency and market applications for distribution systems are highly complex decision-making problems that can be addressed using modern optimization techniques. The complexity of these problems arises from non-linear relations, integer decision variables, scalability, and asynchronous information. On the resiliency side, existing models include optimization approaches that consider the system's available information but neglect the asynchrony of data arrival. As a consequence, these models can lead to underutilization of critical resources during system restoration; they can also become computationally intractable for large-scale systems. On the market design side, existing approaches rely on centralized or distributed computational schemes that are not only limited by hardware requirements but also restrictive for active participation of the market agents.
In this context, the work of this dissertation results in major contributions regarding new optimization algorithms for market design and resiliency improvement in distribution systems. On the DS market side, two novel contributions are presented: 1) a computationally distributed coordination framework based on bilateral transactions, where social welfare is maximized; and 2) a fully decentralized transactive framework where power suppliers, in a simultaneous auction environment, bid strategically using a Markowitz portfolio optimization approach. On the resiliency side, this research proposes a system restoration approach, taking into account uncertain devices and the associated asynchronous information, by means of a two-module optimization model based on binary programming and three-phase unbalanced optimal power flow. Furthermore, a Reinforcement Learning (RL) method along with a Monte Carlo tree search algorithm is proposed to solve the scalability problem for resiliency enhancement. / Doctor of Philosophy / Distribution systems (DS) are evolving from traditional centralized, fossil-fuel generation resources to networks with large-scale deployment of responsive loads and distributed energy resources. Optimization-based decision-making methods to improve resiliency and coordinate DS participants are required. The prohibitive costs of extended power outages demand efficient mechanisms to avoid interruption of service to critical load during catastrophic power outages, and coordination mechanisms for the various generation resources and proactive loads are greatly needed.
Existing optimization-based approaches either neglect the asynchronous nature of information arrival or are computationally intractable for large-scale systems. The work of this dissertation results in major contributions regarding new optimization methods for market design, coordination of DS participants, and improvement of DS resiliency. Four contributions toward the application of optimization approaches for DS are made: 1) a distributed optimization algorithm based on decomposition and best approximation techniques to maximize social welfare in a market environment; 2) a simultaneous auction mechanism and portfolio optimization method in a fully decentralized market framework; 3) binary programming and nonlinear unbalanced power flow, considering asynchronous information, to enhance resiliency in a DS; and 4) a reinforcement learning method together with an efficient search algorithm to support large-scale resiliency improvement models incorporating asynchronous information.
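The Markowitz portfolio step in contribution 2) can be illustrated as a mean-variance program over a supplier's bid allocation. The sketch below maximizes mu'w - lambda * w' Sigma w on the probability simplex; the return and covariance numbers are toy placeholders, not data from the dissertation.

```python
import numpy as np
from scipy.optimize import minimize

def markowitz_weights(mu, sigma, risk_aversion=2.0):
    """Mean-variance allocation a supplier could use to weight its bids.

    Maximizes expected return minus a risk penalty, with weights
    constrained to the probability simplex (sum to 1, non-negative).
    """
    n = len(mu)
    objective = lambda w: -(mu @ w - risk_aversion * (w @ sigma @ w))
    constraints = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * n
    res = minimize(objective, np.full(n, 1.0 / n),
                   bounds=bounds, constraints=constraints)
    return res.x

mu = np.array([0.08, 0.05, 0.11])    # hypothetical expected revenues per segment
sigma = np.diag([0.04, 0.01, 0.09])  # toy covariance of those revenues
print(markowitz_weights(mu, sigma))
```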
|
89 |
ENHANCING POLICY OPTIMIZATION FOR IMPROVED SAMPLE EFFICIENCY AND GENERALIZATION IN DEEP REINFORCEMENT LEARNING / Md Masudur Rahman (19818171) 08 October 2024
The field of reinforcement learning has made significant progress in recent years, with deep reinforcement learning (RL) being a major contributor. However, there are still challenges associated with the effective training of RL algorithms, particularly with respect to sample efficiency and generalization. This thesis aims to address these challenges by developing RL algorithms capable of generalizing to unseen environments and adapting to dynamic conditions, thereby expanding the practical applicability of RL in real-world tasks. The first contribution of this thesis is the development of novel policy optimization techniques that enhance the generalization capabilities of RL agents. These techniques include the Thinker method, which employs style transfer to diversify observation trajectories, and Bootstrap Advantage Estimation, which improves policy and value function learning through augmented data. These methods have demonstrated superior performance in standard benchmarks, outperforming existing data augmentation and policy optimization techniques. Additionally, this thesis introduces Robust Policy Optimization, a method that enhances exploration in policy gradient-based RL by perturbing action distributions. This method addresses the limitations of traditional methods, such as entropy collapse and primacy bias, resulting in improved sample efficiency and adaptability in continuous action spaces. The thesis further explores the potential of natural language descriptions as an alternative to image-based state representations in RL. This approach enhances interpretability and generalization in tasks involving complex visual observations by leveraging large language models. Furthermore, this work contributes to the field of semi-autonomous teleoperated robotic surgery by developing systems capable of performing complex surgical tasks remotely, even under challenging conditions such as communication delays and data scarcity. The creation of the DESK dataset supports knowledge transfer across different robotic platforms, further enhancing the capabilities of these systems. Overall, the advancements presented in this thesis represent significant steps toward developing more robust, adaptable, and efficient autonomous agents. These contributions have broad implications for various real-world applications, including autonomous systems, robotics, and safety-critical tasks such as medical surgery.
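Robust Policy Optimization is described above as perturbing the action distribution to sustain exploration. One plausible minimal reading, assumed here rather than quoted from the thesis, is to jitter a Gaussian policy's mean with uniform noise at sampling time; the half-width alpha is an assumed hyperparameter.

```python
import torch
from torch.distributions import Normal

def perturbed_gaussian_sample(mean, log_std, alpha=0.5):
    """Sample an action from a Gaussian policy whose mean is perturbed
    with uniform noise, keeping the action distribution from collapsing
    early in training (a sketch in the spirit of Robust Policy
    Optimization, not the thesis's exact algorithm)."""
    noise = torch.empty_like(mean).uniform_(-alpha, alpha)
    dist = Normal(mean + noise, log_std.exp())
    action = dist.sample()
    return action, dist.log_prob(action).sum(-1)
```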
|
90 |
Reinforcement Learning From Human Feedback For Ethically Robust AI Decision-Making / Plasencia, Marco M 01 January 2024 (has links) (PDF)
The emergence of reinforcement learning from human feedback (RLHF) has made great strides toward giving AI decision-making the ability to learn from external human advice. In general, this machine learning technique is concerned with producing agents that learn to optimize and achieve some goal, driven by interactions with the environment and by feedback given in the form of a quantifiable reward. In the scope of this project, we seek to merge the intricate realms of AI robustness, ethical decision-making, and RLHF. With no way to truly quantify human values, human feedback is an essential bridge in the learning process, allowing AI models to better reflect ethical principles rather than merely replicate human behavior. By exploring the transformative potential of RLHF in AI-human interactions, acknowledging the dynamic nature of human behavior beyond simplistic models, and emphasizing the necessity for ethically framed AI systems, this thesis constructs a deep reinforcement learning framework that is not only robust but also well aligned with human ethical standards. Through a methodology that incorporates simulated ethical dilemmas and evaluates AI decisions against established ethical frameworks, the focus is to contribute significantly to the understanding and application of RLHF in creating AI systems that embody robustness and ethical integrity.
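One standard way to operationalize the human-feedback loop described here is to fit a reward model on pairwise preferences with a Bradley-Terry objective. The sketch below assumes that framing; how the thesis encodes its simulated ethical dilemmas into the chosen/rejected inputs is left abstract.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """Bradley-Terry loss commonly used to fit RLHF reward models.

    Human feedback arrives as pairwise preferences; the reward model is
    trained so the preferred trajectory scores higher than the rejected
    one. `chosen` and `rejected` are batched encodings of the two options.
    """
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```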
|