21 |
Pose Imitation Constraints For Kinematic Structures
Glebys T Gonzalez (14486934) 09 February 2023 (has links)
The use of robots has increased in different areas of society and human work, including medicine, transportation, education, space exploration, and the service industry. This phenomenon has generated a sudden enthusiasm for developing more intelligent robots that are better equipped to perform tasks as well as humans do. Such jobs require human involvement as operators or teammates, since robots struggle with automation in everyday settings. Soon, the role of humans will extend far beyond users or stakeholders to include those responsible for training such robots. A popular form of teaching is to allow robots to mimic human behavior. This method is intuitive and natural and does not require specialized knowledge of robotics. While there are other methods for robots to complete tasks effectively, collaborative tasks require mutual understanding and coordination that is best achieved by mimicking human motion. This mimicking problem has been tackled through skill imitation, which reproduces human-like motion during a task shown by a trainer. Skill imitation builds on faithfully replicating the human pose and requires two steps. In the first step, an expert's demonstration is captured and pre-processed, and motion features are obtained; in the second step, a learning algorithm is used to optimize for the task. The learning algorithms are often paired with traditional control systems to transfer the demonstration to the robot successfully. However, this methodology currently faces a generalization issue, as most solutions are formulated for specific robots or tasks. The lack of generalization presents a problem, especially as the frequency at which robots are replaced and improved in collaborative environments is much higher than in traditional manufacturing. As with humans, we expect robots to have more than one skill, and we expect the same skill to be performed by more than one type of robot. Thus, we address this issue by proposing a human motion imitation framework that can be efficiently computed and generalized to different kinematic structures (e.g., different robots).
This framework is developed by training an algorithm to augment collaborative demonstrations, facilitating generalization to unseen scenarios. We then create a model for pose imitation that converts human motion into a flexible constraint space. This space can be directly mapped to different kinematic structures by specifying a correspondence between the main human joints (i.e., shoulder, elbow, wrist) and robot joints. The model permits an unlimited number of robotic links between two assigned human joints, allowing different robots to mimic the demonstrated task and human pose. Finally, we incorporate the constraint model into a reward that informs a Reinforcement Learning algorithm during optimization. We tested the proposed methodology in different collaborative scenarios and assessed the task success rate, pose imitation accuracy, the occlusion that the robot produces in the environment, the number of collisions, and the learning efficiency of the algorithm.
The results show that the proposed framework enables effective collaboration across different robots and tasks.
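A constraint-based reward of this kind can be illustrated with a minimal sketch. The polyline-distance constraint, the joint layout, and the weight beta below are our own illustrative assumptions, not the thesis's exact model: the reward penalizes robot link positions that stray from the region spanned by the mapped shoulder-elbow-wrist segments.

```python
import numpy as np

def _point_to_segment(p, a, b):
    # Distance from point p to the line segment a-b.
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / (np.dot(ab, ab) + 1e-12), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def pose_constraint_reward(robot_points, human_joints, task_reward, beta=0.5):
    """Task reward minus a penalty for violating the pose constraints.

    robot_points : (N, 3) positions of the robot links to be constrained.
    human_joints : (3, 3) demonstrated shoulder, elbow, and wrist positions.
    """
    shoulder, elbow, wrist = human_joints
    violation = np.mean([
        min(_point_to_segment(p, shoulder, elbow),
            _point_to_segment(p, elbow, wrist))
        for p in robot_points
    ])
    return task_reward - beta * violation

# Example with placeholder data: five robot links, one demonstrated arm pose.
r = pose_constraint_reward(np.random.rand(5, 3), np.random.rand(3, 3), task_reward=1.0)
```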
|
22 |
Playstyle Generation with Multimodal Generative Adversarial Imitation Learning : Style-reward from Human Demonstration for Playtesting Agents / Spelstilsgenerering med Multimodal Generativ Motståndarimitationsinlärning : Spelstilsbelöning från Demonstrationer för Playtesting-Agenter
Ahlberg, William January 2023 (has links)
Playtesting plays a crucial role in video game production. The presence of gameplay issues and faulty design choices can be of great detriment to the overall player experience. Machine learning has the potential to be applied to automated playtesting solutions, removing mundane and repetitive testing and allowing game designers and playtesters to focus their efforts on rewarding tasks. It is important in playtesting to consider the different playstyles players might use, so that game design choices can be adapted accordingly. With reinforcement learning, it is possible to create high-quality agents able to play and traverse complex game environments with fairly simple task-rewards. However, an automated playtesting solution must also be able to incorporate unique behaviours which mimic human playstyles. It can often be difficult to handcraft a quantitative style-reward to drive agent learning, especially for those with limited reinforcement learning experience, like game developers. MultiGAIL, Multimodal Generative Adversarial Imitation Learning, is a proposed learning algorithm able to generate autonomous agents imbued with human playstyles from recorded playstyle demonstrations. The proposed method requires no handcrafted style-reward and can generate novel intermediate playstyles from demonstrated ones. MultiGAIL is evaluated in game environments resembling complex 3D games with both discrete and continuous action spaces. The playstyle the agent exhibits is easily controllable at inference with an auxiliary input parameter. Evaluation shows the agent is able to successfully replicate the underlying playstyles in human demonstrations, and that novel playstyles generate explainable action distributions indicative of the level of blending the auxiliary input declares. The results indicate that MultiGAIL could be a suitable solution for incorporating style behaviours in playtesting autonomous agents, and can easily be used by those with limited domain knowledge of reinforcement learning. / Playtesting plays an important role in video game development. Flaws in games, such as bugs and poor game design, can drastically degrade the gameplay experience. Machine learning can be used to automate game testing, removing the need for humans to perform repetitive and tedious tests; game developers and playtesters can then focus on more worthwhile tasks. Playtesting needs to account for the diverse playstyles players may have, so that game developers can adapt the game accordingly. Reinforcement learning has been used to create high-quality agents that can play and navigate complex game environments given relatively simple reward functions. However, creating a reward function that shapes the agent to follow specific playstyles is a much harder task, and expecting those without prior knowledge of machine learning and reinforcement learning, such as game developers, to create such reward functions is unrealistic. MultiGAIL, Multimodal Generative Adversarial Imitation Learning, is a machine learning algorithm that can generate autonomous agents which follow playstyles, given access to recorded playstyle demonstrations. The method requires no hard-coded style-rewards and can interpolate between the playstyles found in the demonstrations, thereby creating new behaviours for the agents. MultiGAIL is evaluated in game environments resembling complex 3D games and handles both discrete and continuous action spaces. The playstyle the agent exhibits can easily be controlled at inference through a variable parameter. Our evaluation shows that the method can teach the agent to correctly imitate the playstyles defined by the recorded demonstrations. New playstyles generated by MultiGAIL show predictable behaviour according to the value of the variable parameter. MultiGAIL can very likely be used to create playtesting autonomous agents that behave according to specific playstyles without the need to define a reward function.
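The core mechanic described above, blending per-style adversarial rewards through an auxiliary input, can be reduced to a short sketch. This is our own illustrative formulation, assuming two already-trained discriminators that output the probability a state-action pair came from the corresponding demonstration set:

```python
import numpy as np

def gail_reward(d_prob):
    # Standard GAIL-style reward: high when the discriminator believes
    # the agent's behaviour came from the demonstrations.
    d_prob = np.clip(d_prob, 1e-6, 1.0 - 1e-6)
    return -np.log(1.0 - d_prob)

def blended_style_reward(disc_a, disc_b, state, action, alpha):
    """alpha in [0, 1] is the auxiliary input: 1.0 gives pure style A,
    0.0 gives pure style B, intermediate values give novel blended styles."""
    return (alpha * gail_reward(disc_a(state, action))
            + (1.0 - alpha) * gail_reward(disc_b(state, action)))

# Example with stand-in discriminators.
reward = blended_style_reward(lambda s, a: 0.7, lambda s, a: 0.4,
                              state=None, action=None, alpha=0.5)
```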
|
23 |
Using Imitation Learning for Human Motion Control in a Virtual Simulation
Akrin, Christoffer January 2022 (has links)
Test Automation is becoming a more vital part of the software development cycle, as it aims to lower the cost of testing and allow for higher test frequency. However, automating manual tests can be difficult, as they tend to require complex human interaction. In this thesis, we aim to solve this by using Imitation Learning as a tool for automating manual software tests. The software under test consists of a virtual simulation connected to a physical input device in the form of a sight. The sight can rotate on two axes, yaw and pitch, which requires human motion control. Based on this, we use a Behavioral Cloning approach with a k-NN regressor trained on human demonstrations. The model's resemblance to the human is evaluated by comparing the state paths taken by the model and the human. Task performance is measured with a score based on the time taken to stabilize the sight pointing at a given object in the virtual world. The results show that a simple k-NN regression model using high-level states and actions, and with limited data, can imitate the human motion well. The model tends to be slightly faster than the human on the task while keeping realistic motion. It also shows signs of human errors, such as overshooting the object at higher angular velocities. Based on the results, we conclude that using Imitation Learning for Test Automation can be practical for specific tasks where capturing human factors is of importance. However, further exploration is needed to identify the full potential of Imitation Learning in Test Automation.
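A behavioral-cloning policy of the kind described, a k-NN regressor mapping high-level states to yaw and pitch control, fits in a few lines. The state layout and hyperparameters below are illustrative assumptions, not the thesis's exact configuration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Human demonstrations: each row is a high-level state (e.g. angular offsets
# to the target and current angular velocities, a hypothetical layout), and
# the recorded action is a (yaw_rate, pitch_rate) pair.
states = np.random.rand(500, 4)
actions = np.random.rand(500, 2)

# Distance-weighted k-NN: the predicted action interpolates the actions the
# human took in the k most similar demonstrated states.
policy = KNeighborsRegressor(n_neighbors=5, weights="distance")
policy.fit(states, actions)

yaw_rate, pitch_rate = policy.predict(states[:1])[0]
```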
|
24 |
Transformer enhanced affordance learning for autonomous driving
Sankar, Rajasekar 30 October 2024
Most existing autonomous driving perception approaches rely on the direct perception method with camera sensors, yet they often overlook the valuable 3D spatial data provided by other sensors, such as LiDAR. This Master's thesis investigates enhancing affordance learning through a multimodal fusion transformer, aiming to refine AV perception and scene interpretation by effectively integrating multi-sensor data. Our approach introduces a two-stage network architecture: the first stage employs a backbone to fuse sensor data and extract features, while the second stage uses a Task-Block MLP network to predict both classification affordances (junction, red light, pedestrian, and vehicle hazards) and regression affordances (relative angle, lateral distance, and target vehicle distance). We utilized the TransFuser backbone, based on Imitation Learning, to integrate image and LiDAR BEV data using a self-attention mechanism and to extract the feature map. Our results are compared against image-only architectures like Latent TransFuser and other sensor fusion backbones. Integration with the OmniOpt 2 tool, developed by ScaDS.AI, facilitates hyperparameter optimization, enhancing model performance. We assessed our model's effectiveness using the CARLA Town02 dataset as well as the real-world KITTI-360 dataset, demonstrating significant improvements in affordance prediction accuracy and reliability. This advancement underscores the potential of combining LiDAR and image data via transformer-based fusion to create safer and more efficient autonomous driving systems. (A sketch of the two-stage prediction heads follows the chapter outline below.)
Contents:
List of Figures
List of Tables
Abbreviations
1 Introduction
1.1 Autonomous Driving: Overview
1.1.1 From highly automated to autonomous
1.1.2 Autonomy levels
1.1.3 Perception systems
1.2 Three Paradigms for autonomous driving
1.3 Sensor Fusion: Global context capture
1.4 Research Questions and Methods
1.4.1 Research Questions (RQ)
1.4.2 Research Methods (RM)
1.5 Structure of the work
2 Research Background
2.1 Affordance Learning
2.2 Multi-Modal Autonomous Driving
2.3 Sensor Fusion Methods for Object Detection and Motion Forecasting
2.4 Attention for Autonomous Driving
3 Methodology
3.1 Problem Formulation
3.1.1 Problem setting A
3.1.2 Problem setting B
3.2 Input and Output parametrization
3.2.1 Input Representation
3.2.2 Output Representation
3.3 Definition of affordances
3.4 Proposed Methodology
3.5 Detailed overview of the Proposed Architecture
3.5.1 Stage 1: TransFuser Backbone - Multimodal fusion transformer
3.5.2 Fused Feature extraction
3.5.3 Annotations extraction
3.5.4 Stage 2: Task-Block MLP Network architecture
3.6 Loss Functions
3.6.1 Stage 1: Loss Function
3.6.2 Stage 2: Loss Function
3.6.3 Total Loss Function
3.7 Other Backbone Architectures
3.7.1 Latent TransFuser
3.7.2 Geometric Fusion
3.7.3 Late Fusion
3.8 Hyperparameter Optimization: OmniOpt 2
4 Training and Validation
4.1 Dataset definition
4.1.1 Types of Data
4.1.2 Overview of Dataset Distribution
4.2 Implementation Details
4.3 Training
4.3.1 Stage 1: Backbone architecture training
4.3.2 Stage 2: Task-Block MLP training
4.3.3 Training Parameter Study
4.4 Loss curves
4.4.1 Stage 1 Loss curve
4.4.2 Stage 2 Loss curve
4.5 Validation
4.5.1 Preparation of an optimization project
5 Experimental Results
5.1 Quantitative Insights into Regression-Based Affordance Predictions
5.1.1 Comparative Analysis of Error Metrics against each Backbone
5.1.2 Graphical Analysis of error metrics performance for TransFuser
5.2 Quantitative Insights into Classification-Based Affordance Predictions
5.2.1 Comparative Analysis of Classification Performance Metrics against each Backbone
5.2.2 Graphical Analysis of classification performance for TransFuser
5.3 OmniOpt 2 Hyper-optimization results
5.4 Affordance Prediction Dashboard
6 Evaluation
6.1 Evaluation with CARLA Test dataset
6.1.1 Results
6.2 Evaluation with real world: The KITTI Dataset
6.2.1 Dataset
6.2.2 Results
7 Conclusion
Appendix
A Ablation Study
A.1 Latent TransFuser with MLP
A.2 Results
A.2.1 Comparative Analysis of Error Metrics in Latent TransFuser with Transformer and MLP
A.2.2 Comparative Analysis of Classification Performance Metrics in Latent TransFuser with Transformer and MLP
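A minimal sketch of the second-stage task block described in the abstract: a small MLP over the fused backbone features, with a classification head for the four hazard affordances and a regression head for the three geometric affordances. The feature and hidden dimensions are illustrative assumptions, not the thesis's actual configuration:

```python
import torch
import torch.nn as nn

class TaskBlockMLP(nn.Module):
    """Second stage: fused backbone features -> affordance predictions."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        # junction, red light, pedestrian hazard, vehicle hazard
        self.cls_head = nn.Linear(hidden, 4)
        # relative angle, lateral distance, target vehicle distance
        self.reg_head = nn.Linear(hidden, 3)

    def forward(self, fused_features):
        h = self.trunk(fused_features)
        return torch.sigmoid(self.cls_head(h)), self.reg_head(h)

# Example: a batch of 8 fused feature vectors from the (frozen) backbone.
cls_affordances, reg_affordances = TaskBlockMLP()(torch.randn(8, 512))
```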
|
25 |
Learning Preference Models for Autonomous Mobile Robots in Complex Domains
Silver, David 01 December 2010 (has links)
Achieving robust and reliable autonomous operation even in complex unstructured environments is a central goal of field robotics. As the environments and scenarios to which robots are applied have continued to grow in complexity, so has the challenge of properly defining preferences and tradeoffs among the various actions a robot can take and the terrain those actions result in traversing. These definitions and parameters encode the desired behavior of the robot; therefore their correctness is of the utmost importance. Current manual approaches to creating and adjusting these preference models and cost functions have proven to be incredibly tedious and time-consuming, while typically not producing optimal results except in the simplest of circumstances.
This thesis presents the development and application of machine learning techniques that automate the construction and tuning of preference models within complex mobile robotic systems. Utilizing the framework of inverse optimal control, expert examples of robot behavior can be used to construct models that generalize demonstrated preferences and reproduce similar behavior. Novel learning from demonstration approaches are developed that offer the possibility of significantly reducing the amount of human interaction necessary to tune a system, while also improving its final performance. Techniques to account for the inevitability of noisy and imperfect demonstration are presented, along with additional methods for improving the efficiency of expert demonstration and feedback.
The effectiveness of these approaches is confirmed through application to several real world domains, such as the interpretation of static and dynamic perceptual data in unstructured environments and the learning of human driving styles and maneuver preferences. Extensive testing and experimentation both in simulation and in the field with multiple mobile robotic systems provides empirical confirmation of superior autonomous performance, with less expert interaction and no hand tuning. These experiments validate the potential applicability of the developed algorithms to a large variety of future mobile robotic systems.
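The inverse optimal control framework above can be illustrated with a sketch of a margin-based cost-weight update, in the spirit of LEARCH-style methods. The linear cost model, learning rate, and feature-count interface are our assumptions, not the thesis's exact algorithm:

```python
import numpy as np

def ioc_weight_update(w, expert_features, planner_features, lr=0.01):
    """One subgradient step on the margin loss max(0, w.f_expert - w.f_planner):
    decrease the cost of terrain features the expert's demonstrated path uses,
    increase the cost of features the planner's current best path prefers.

    expert_features, planner_features : accumulated terrain-feature counts
    along the demonstrated path and the currently optimal planned path.
    """
    if w @ expert_features > w @ planner_features:  # expert path still too costly
        w = w - lr * (expert_features - planner_features)
    return np.maximum(w, 1e-6)  # keep traversal costs strictly positive

# Example with placeholder feature counts over five terrain features.
w = ioc_weight_update(np.ones(5),
                      np.array([1.0, 2.0, 0.0, 1.0, 3.0]),
                      np.array([2.0, 1.0, 1.0, 0.0, 3.0]))
```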
|
26 |
Learning Multi-step Dual-arm Tasks From Demonstrations
Natalia S Sanchez Tamayo (9156518) 29 July 2020 (has links)
Surgeon expertise can be difficult to capture through direct robot programming. Deep imitation learning (DIL) is a popular method for teaching robots to autonomously execute tasks through learning from demonstrations. DIL approaches have previously been applied to surgical automation. However, previous approaches do not consider the full range of dexterous robot motion required in general surgical tasks, leaving out tooltip rotation changes or modeling only one robotic arm. Hence, they are not directly applicable to tasks that require rotation and dual-arm collaboration, such as debridement. We propose to address this limitation by formulating a DIL approach for the execution of dual-arm surgical tasks including changes in tooltip orientation and position as well as gripper actions.

In this thesis, a framework for multi-step surgical task automation is designed and implemented by leveraging deep imitation learning. The framework optimizes Recurrent Neural Networks (RNNs) for the execution of whole surgical tasks while considering tooltip translations and rotations as well as gripper actions. The proposed network architecture implicitly optimizes for the interaction between the two robotic arms, as opposed to modeling each arm independently. The networks were trained directly from the human demonstrations and require neither task-specific hand-crafted models nor manual segmentation of the demonstrations.

The proposed framework was implemented and evaluated in simulation for two relevant surgical tasks: the peg transfer task and surgical debridement. The tasks were tested under random initial conditions to challenge the robustness of the networks in generalizing to variable settings. The performance of the framework was assessed using task and subtask success as well as a set of quantitative metrics. Experimental evaluation showed favorable results for automating surgical tasks under variable conditions for surgical debridement, which obtained a task success rate comparable to the human task success. For the peg transfer task, the framework displayed moderate overall task success. Quantitative metrics indicate that the robot-generated trajectories possess similar or better motion economy than the human demonstrations.
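The joint dual-arm formulation, one recurrent network emitting both arms' tooltip translations, rotations, and gripper actions at every step, can be sketched as follows. The observation dimension, hidden size, and quaternion output parametrization are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DualArmPolicy(nn.Module):
    """One RNN for both arms, so inter-arm coordination is learned
    implicitly rather than modeling each arm independently."""
    PER_ARM = 3 + 4 + 1  # tooltip translation, rotation quaternion, gripper

    def __init__(self, obs_dim=32, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * self.PER_ARM)  # both arms at once

    def forward(self, obs_seq):        # obs_seq: (batch, time, obs_dim)
        out, _ = self.rnn(obs_seq)
        return self.head(out)          # (batch, time, 16) action values

# Example: predict both arms' actions along a 50-step demonstration.
actions = DualArmPolicy()(torch.randn(1, 50, 32))
```

The single shared output head is the design point the abstract emphasizes: both arms are predicted from one hidden state, so their interaction is optimized jointly.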
|
27 |
On the Efficiency of Transfer Learning in a Fighter Pilot Behavior Modelling Context / Effektiviteten av överföringsinlärning vid beteendemodellering av stridspiloter
Sandström, Viktor January 2021 (has links)
Creating realistic models of human fighter pilot behavior is made possible with recent deep learning techniques. However, these techniques are often highly dependent on large datasets, which are unavailable in many settings or expensive to produce. Transfer learning is an active research field where the idea is to leverage the knowledge gained from studying a problem for which large amounts of training data are readily available when considering a different, related problem. The related problem is called the target task and the initial problem is called the source task. Given a successful transfer scenario, a smaller amount of data, or less training, is required to reach high-quality results on the target task. The first part of this thesis focuses on the development of a fighter pilot model using behavior cloning, a method for reducing an imitation learning problem to standard supervised learning. The resulting model, called a policy, is capable of imitating a human pilot controlling a fighter jet in the military combat simulator Virtual BattleSpace 3. In this simulator, the forces acting on the aircraft can be modelled using one of several flight dynamic models (FDMs). In the second part, the efficiency of transfer learning is measured. This is done by replacing the built-in FDM with one that has a significantly different input response, and subsequently training two policies on successively larger amounts of data. One policy was trained using only the latter FDM, whereas the other exploits the knowledge gained from the first part of the thesis, using a technique called fine-tuning. The results indicate that a model already capable of handling one FDM adapts to a different FDM with less data compared to a previously untrained policy. / Realistic models of human pilot behavior can potentially be created with deep learning techniques. This often requires large datasets that are missing in many applications, or are expensive to produce. Transfer learning is an active research field whose basic idea is to exploit knowledge already learned on a problem for which large amounts of training data are available when investigating a related problem. With successful transfer learning, a smaller amount of data, or less training, is needed to achieve a desirable result on this target task. The first part of this thesis concerns the development of a pilot model using behavior cloning, a method that reduces imitation learning to ordinary supervised learning. The resulting pilot model is able to imitate a human pilot flying a fighter jet in the military simulation environment Virtual BattleSpace 3, where the forces acting on the aircraft are modelled with a simple built-in flight dynamics model. In the second part of the work, transferability between different flight dynamics models is evaluated. This was done by replacing the built-in dynamics with one that models a different aircraft and responds to control inputs in a widely different manner. Two fighter pilot models were then trained on successively larger amounts of data. One pilot model was trained only with the new dynamics, while the other exploits the behavior already learned in the first part of the work, using a technique called fine-tuning. The results show that a pilot model that has already learned to fly with one specific flight dynamics has an easier time learning a new dynamics, compared to a pilot model that has not been pretrained.
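The fine-tuning step in the second part can be sketched as follows: reuse the source-FDM policy's weights and continue supervised behavior-cloning updates on the smaller target-FDM dataset, typically with a reduced learning rate. The network shape and the data below are stand-ins, not the thesis's actual setup:

```python
import torch
import torch.nn as nn

# Stand-in policy: flight state -> control commands. In the thesis's setting,
# these weights would be loaded from the model trained on the source FDM.
policy = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 4))

# Fine-tuning: same behavior-cloning loss, smaller learning rate,
# and a (small) dataset of demonstrations recorded on the target FDM.
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
states = torch.randn(256, 16)          # placeholder target-FDM demonstrations
expert_actions = torch.randn(256, 4)

for _ in range(100):
    opt.zero_grad()
    loss_fn(policy(states), expert_actions).backward()
    opt.step()
```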
|
28 |
Flying High: Deep Imitation Learning of Optimal Control for Unmanned Aerial Vehicles / Far & Flyg: Djup Imitationsinlärning av Optimal Kontroll för Obemannade Luftfarkoster
Ericson, Ludvig January 2018 (has links)
Optimal control for multicopters is difficult, in part due to the low processing power available and the instability inherent to multicopters. Deep imitation learning is a method for approximating an expert control policy with a neural network, and has the potential to improve control for multicopters. We investigate the performance and reliability of deep imitation learning with trajectory optimization as the expert policy by first defining a dynamics model for multicopters and applying a trajectory optimization algorithm to it. Our investigation shows that network architecture plays an important role in the characteristics of both the learning process and the resulting control policy, and in particular that trajectory optimization can be leveraged to improve convergence times for imitation learning. Finally, we identify some limitations and future areas of study and development for the technology. / Optimal control for multicopters is a hard problem, partly because of the typically low processing power of the flight computer, and because multicopters are highly unstable systems. Deep imitation learning is a method in which a computationally heavy expert is approximated by a neural network, thereby making it possible to run such heavy experts as real-time control for multicopters. This work investigates the performance and reliability of deep imitation learning with trajectory optimization as the expert by first defining a dynamics model for multicopters, then applying a well-known trajectory optimization method to this model, and finally approximating this expert with imitation learning. Our investigation shows that the network architecture plays a decisive role in the characteristics of both the learning process's convergence time and the resulting control policy, and that trajectory optimization in particular can be exploited to improve the convergence time of the imitation learning. Finally, we point out some limitations of the method and identify particularly interesting areas for future study.
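The core of the method, supervised regression onto actions produced by an expensive trajectory-optimization expert, can be sketched like this. The expert below is a placeholder linear feedback law standing in for a real solver (e.g. an iLQR-style optimizer), and all dimensions are assumptions:

```python
import numpy as np
import torch
import torch.nn as nn

def trajectory_optimization_expert(state):
    # Placeholder for an expensive optimal-control solve; a real expert
    # would run a trajectory optimizer on the multicopter dynamics model.
    K = 0.1 * np.ones((4, 12))
    return K @ state

# Label sampled states with the expert's (slow) actions once, offline...
states = np.random.randn(1024, 12).astype(np.float32)
labels = np.stack([trajectory_optimization_expert(s) for s in states]).astype(np.float32)

# ...then distill them into a network fast enough for onboard control.
net = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.from_numpy(states), torch.from_numpy(labels)
for _ in range(200):
    opt.zero_grad()
    nn.functional.mse_loss(net(x), y).backward()
    opt.step()
```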
|
29 |
Performance Evaluation of Imitation Learning Algorithms with Human Experts
Båvenstrand, Erik, Berggren, Jakob January 2019 (has links)
The purpose of this thesis was to compare the performance of three different imitation learning algorithms with human experts, with limited expert time. The central question was, ”How should one implement imitation learning in a simulated car racing environment, using human experts, to achieve the best performance when access to the experts is limited?”. We limited the work to the three algorithms Behavior Cloning, DAGGER, and HG-DAGGER, and limited the implementation to the car racing simulator TORCS. The agents consisted of the same type of feedforward neural network that utilized sensor data provided by TORCS. Through comparing the performance of the different algorithms at different amounts of expert time, we can conclude that HG-DAGGER performed the best. In this case, performance is regarded as distance covered in a given time. Its performance also seemed to scale well with more expert time, which the others' did not. This result confirmed previously published results comparing these algorithms. / The goal of this thesis was to compare the performance of three different imitation learning algorithms with human experts, where expert time is limited. The research question was, ”How should one implement imitation learning in a car simulator, using human experts with limited expert time, to achieve the best performance?”. We restricted the work to the three algorithms Behavior Cloning, DAGGER, and HG-DAGGER, and restricted the implementation environment to the car racing simulator TORCS. All agents consisted of the same kind of feedforward neural network using sensor data from TORCS. By comparing performance at different amounts of expert time, we can conclude that HG-DAGGER gave the best results. In this case, performance corresponds to distance covered in a given time. Its performance also appeared to develop well with additional expert time, which the others' did not. This result confirms previously published results comparing the three algorithms.
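For reference, the distinguishing mechanic of DAGGER, having the expert relabel every state the learner visits, fits in a few lines. The environment and training interfaces below are assumed, simplified signatures; HG-DAGGER differs mainly in that the human expert only labels states during episodes where they choose to intervene, which is what makes it economical with expert time:

```python
def dagger(env, expert, fit, n_iters=10, horizon=500):
    """Minimal DAGGER: roll out the current policy, have the expert label
    every visited state, then retrain on the aggregated dataset.

    Assumed interfaces: env.reset() -> state, env.step(a) -> (state, done),
    expert(state) -> action, fit(dataset) -> policy.
    """
    dataset, policy = [], expert   # first iteration effectively follows the expert
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            dataset.append((state, expert(state)))  # expert relabels the visited state
            state, done = env.step(policy(state))   # the learner's action drives the rollout
            if done:
                break
        policy = fit(dataset)      # supervised training on all data gathered so far
    return policy
```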
|
30 |
Training an Adversarial Non-Player Character with an AI Demonstrator : Applying Unity ML-Agents
Jlali, Yousra Ramdhana January 2022 (has links)
Background. Game developers are continuously searching for new ways of populating their vast game worlds with competent and engaging Non-Player Characters (NPCs), and researchers believe Deep Reinforcement Learning (DRL) might be the solution for emergent behavior. Consequently, fusing NPCs with DRL practices has surged in recent years; however, proposed solutions rarely outperform traditional script-based NPCs. Objectives. This thesis explores a novel method of developing an adversarial DRL NPC by combining Reinforcement Learning (RL) algorithms. Our goal is to produce an agent that surpasses its script-based opponents by first mimicking their actions. Methods. The experiment commences with Imitation Learning (IL) before proceeding with supplementary DRL training, where the agent is expected to improve its strategies. Lastly, we make all agents participate in 100-deathmatch tournaments to statistically evaluate and differentiate their deathmatch performances. Results. Statistical tests reveal that the agents reliably differ from one another and that our learning agent performed poorly in comparison to its script-based opponents. Conclusions. Based on our computed statistics, we conclude that our solution was unsuccessful in developing a talented hostile DRL agent, as it was unable to convey any form of proficiency in deathmatches. No further improvements could be applied to our ML agent due to time constraints. However, we believe our outcome can be used as a stepping-stone for future experiments within this branch of research.
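The tournament evaluation step can be sketched as below. The thesis does not name its exact statistical test here, so the non-parametric Mann-Whitney U test is our assumed choice for comparing two agents' per-match scores over a 100-deathmatch tournament; the score values are placeholders:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
ml_agent_scores = rng.normal(8.0, 3.0, size=100)   # placeholder per-match scores
scripted_scores = rng.normal(12.0, 3.0, size=100)  # over the 100-deathmatch tournament

# Do the two agents' score distributions reliably differ?
stat, p_value = mannwhitneyu(ml_agent_scores, scripted_scores)
print(f"U = {stat:.1f}, p = {p_value:.4f}",
      "-> reliably different" if p_value < 0.05 else "-> no reliable difference")
```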
|