1

Representing and learning affordance-based behaviors

Hermans, Tucker Ryer. 22 May 2014.
Autonomous robots deployed in complex, natural human environments such as homes and offices need to manipulate numerous objects throughout their deployment. For an autonomous robot to operate effectively in such a setting and not require excessive training from a human operator, it should be capable of discovering how to reliably manipulate novel objects it encounters. We characterize the possible methods by which a robot can act on an object using the concept of affordances. We define affordance-based behaviors as object manipulation strategies available to a robot, which correspond to specific semantic actions over which a task-level planner or end user of the robot can operate. This thesis concerns itself with developing the representation of these affordance-based behaviors along with associated learning algorithms. We identify three specific learning problems. The first asks which affordance-based behaviors a robot can successfully apply to a given object, including ones seen for the first time. Second, we examine how a robot can learn to best apply a specific behavior as a function of an object's shape. Third, we investigate how learned affordance knowledge can be transferred between different objects and different behaviors. We claim that decomposing affordance-based behaviors into three separate factors (a control policy, a perceptual proxy, and a behavior primitive) aids an autonomous robot in learning to manipulate. Having a varied set of affordance-based behaviors available allows a robot to learn which behaviors perform most effectively as a function of an object's identity or pose in the workspace. For a specific behavior, a robot can use interactions with previously encountered objects to learn to robustly manipulate a novel object when first encountered. Finally, our factored representation allows a robot to transfer knowledge learned with one behavior to effectively manipulate an object in a qualitatively different manner by using a distinct controller or behavior primitive. We evaluate all work on a bimanual, mobile-manipulator robot. In all experiments the robot interacts with real-world objects sensed by an RGB-D camera.
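As a reading aid, here is a minimal Python sketch of the three-factor decomposition the abstract claims (control policy, perceptual proxy, behavior primitive). Every name and signature in it is hypothetical; the thesis presents the factoring conceptually, not as published code.

```python
from dataclasses import dataclass
from typing import Any, Callable

# All names below are hypothetical illustrations of the factoring the
# abstract describes; the thesis does not publish this interface.

State = Any      # task-relevant state extracted from RGB-D input
Command = Any    # low-level motor command sent to the manipulator

@dataclass
class AffordanceBehavior:
    """One affordance-based behavior, factored as in the thesis."""
    behavior_primitive: str                      # semantic action a task-level planner can invoke, e.g. "push"
    perceptual_proxy: Callable[[Any], State]     # reduces raw sensor data to the state the policy consumes
    control_policy: Callable[[State], Command]   # feedback law that executes the behavior

def transfer(behavior: AffordanceBehavior,
             new_policy: Callable[[State], Command]) -> AffordanceBehavior:
    """Swap in a different controller while reusing the learned perceptual
    proxy, manipulating the object in a qualitatively different manner."""
    return AffordanceBehavior(behavior.behavior_primitive,
                              behavior.perceptual_proxy,
                              new_policy)
```

The transfer helper mirrors the third learning problem above: knowledge learned with one behavior is reused to drive a qualitatively different manipulation by substituting a distinct controller.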
2

Transformer enhanced affordance learning for autonomous driving

Sankar, Rajasekar. 30 October 2024.
Most existing autonomous driving perception approaches rely on the direct perception method with camera sensors, yet they often overlook the valuable 3D spatial data provided by other sensors, such as LiDAR. This Master's thesis investigates enhancing affordance learning through a multimodal fusion transformer, aiming to refine autonomous vehicle (AV) perception and scene interpretation by effectively integrating multi-sensor data. Our approach introduces a two-stage network architecture: the first stage employs a backbone to fuse sensor data and extract features, while the second stage uses a TaskBlock MLP network to predict both classification affordances (junction, red light, pedestrian, and vehicle hazards) and regression affordances (relative angle, lateral distance, and target vehicle distance). We utilized the TransFuser backbone, based on imitation learning, to integrate image and LiDAR BEV data using a self-attention mechanism and to extract the feature map. Our results are compared against image-only architectures such as Latent TransFuser and against other sensor-fusion backbones. Integration with the OmniOpt 2 tool, developed by ScaDS.AI, facilitates hyperparameter optimization, enhancing model performance. We assessed our model's effectiveness on the CARLA Town02 dataset as well as the real-world KITTI-360 dataset, demonstrating significant improvements in affordance prediction accuracy and reliability. This advancement underscores the potential of combining LiDAR and image data via transformer-based fusion to create safer and more efficient autonomous driving systems.

Table of contents (front matter: List of Figures, List of Tables, Abbreviations):

1 Introduction
  1.1 Autonomous Driving: Overview
    1.1.1 From highly automated to autonomous
    1.1.2 Autonomy levels
    1.1.3 Perception systems
  1.2 Three paradigms for autonomous driving
  1.3 Sensor fusion: global context capture
  1.4 Research Questions and Methods
    1.4.1 Research Questions (RQ)
    1.4.2 Research Methods (RM)
  1.5 Structure of the work
2 Research Background
  2.1 Affordance Learning
  2.2 Multi-Modal Autonomous Driving
  2.3 Sensor Fusion Methods for Object Detection and Motion Forecasting
  2.4 Attention for Autonomous Driving
3 Methodology
  3.1 Problem Formulation
    3.1.1 Problem setting A
    3.1.2 Problem setting B
  3.2 Input and Output Parametrization
    3.2.1 Input Representation
    3.2.2 Output Representation
  3.3 Definition of Affordances
  3.4 Proposed Methodology
  3.5 Detailed Overview of the Proposed Architecture
    3.5.1 Stage 1: TransFuser Backbone, the multimodal fusion transformer
    3.5.2 Fused Feature Extraction
    3.5.3 Annotations Extraction
    3.5.4 Stage 2: TaskBlock MLP Network Architecture
  3.6 Loss Functions
    3.6.1 Stage 1 Loss Function
    3.6.2 Stage 2 Loss Function
    3.6.3 Total Loss Function
  3.7 Other Backbone Architectures
    3.7.1 Latent TransFuser
    3.7.2 Geometric Fusion
    3.7.3 Late Fusion
  3.8 Hyperparameter Optimization: OmniOpt 2
4 Training and Validation
  4.1 Dataset Definition
    4.1.1 Types of Data
    4.1.2 Overview of Dataset Distribution
  4.2 Implementation Details
  4.3 Training
    4.3.1 Stage 1: Backbone Architecture Training
    4.3.2 Stage 2: TaskBlock MLP Training
    4.3.3 Training Parameter Study
  4.4 Loss Curves
    4.4.1 Stage 1 Loss Curve
    4.4.2 Stage 2 Loss Curve
  4.5 Validation
    4.5.1 Preparation of an Optimization Project
5 Experimental Results
  5.1 Quantitative Insights into Regression-Based Affordance Predictions
    5.1.1 Comparative Analysis of Error Metrics against Each Backbone
    5.1.2 Graphical Analysis of Error-Metric Performance for TransFuser
  5.2 Quantitative Insights into Classification-Based Affordance Predictions
    5.2.1 Comparative Analysis of Classification Performance Metrics against Each Backbone
    5.2.2 Graphical Analysis of Classification Performance for TransFuser
  5.3 OmniOpt 2 Hyperparameter-Optimization Results
  5.4 Affordance Prediction Dashboard
6 Evaluation
  6.1 Evaluation with the CARLA Test Dataset
    6.1.1 Results
  6.2 Evaluation with the Real World: The KITTI Dataset
    6.2.1 Dataset
    6.2.2 Results
7 Conclusion
Appendix
  A Ablation Study
    A.1 Latent TransFuser with MLP
    A.2 Results
      A.2.1 Comparative Analysis of Error Metrics in Latent TransFuser with Transformer and MLP
      A.2.2 Comparative Analysis of Classification Performance Metrics in Latent TransFuser with Transformer and MLP
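To make the two-stage design above concrete, here is a minimal PyTorch sketch. It is an assumption-laden illustration, not the author's code: the backbone is a one-layer stand-in for the TransFuser fusion transformer, and the layer sizes (feat_dim, hidden) are invented; only the output heads (four classification affordances, three regression affordances) follow the abstract.

```python
import torch
import torch.nn as nn

# Illustrative sketch only. The real Stage 1 is the TransFuser backbone
# fusing camera and LiDAR BEV features via self-attention; here it is a
# placeholder, and every layer size is an assumption, not from the thesis.

class TaskBlockMLP(nn.Module):
    """Stage 2: maps a fused feature vector to the affordances named in the abstract."""
    def __init__(self, feat_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        # Classification affordances: junction, red light, pedestrian hazard, vehicle hazard.
        self.cls_head = nn.Linear(hidden, 4)
        # Regression affordances: relative angle, lateral distance, target vehicle distance.
        self.reg_head = nn.Linear(hidden, 3)

    def forward(self, fused: torch.Tensor):
        h = self.trunk(fused)
        return torch.sigmoid(self.cls_head(h)), self.reg_head(h)

backbone = nn.Linear(1000, 512)   # stand-in for the Stage 1 fusion backbone
head = TaskBlockMLP(feat_dim=512)

x = torch.randn(8, 1000)                  # stand-in for fused image + LiDAR input
cls_probs, reg_vals = head(backbone(x))   # shapes: (8, 4) and (8, 3)
```

The table of contents indicates the two stages are trained separately (Sections 4.3.1 and 4.3.2); the sketch shows only the forward pass.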
