Recent advances in Computer Vision are a by-product of breakthroughs in Artificial Intelligence. Object detection in monocular images is now realized by combining Computer Vision and Deep Learning. While most approaches detect an object as a mere two-dimensional (2D) bounding box, a few exploit a rather traditional representation of the 3D object. Such approaches either detect an object as a 3D bounding box or exploit its shape primitives using active shape models, which results in a wireframe-like detection. Such a wireframe detection is represented as a combination of detected keypoints (or landmarks) of the desired object. Apart from faithfully retrieving the object’s true shape, wireframe-based approaches are relatively robust in handling occlusions. The central task of this thesis was to find such an approach and to implement it in order to evaluate its performance. The objects of interest belong to the vehicle class (cars, minivans, trucks, etc.), and the evaluation data consists of monocular traffic surveillance videos collected by the supervising chair. A wireframe-type detection can aid several facets of traffic analysis by improving, compared to a 2D bounding box, the estimation of the detected object’s ground plane. The thesis covers the implementation of the chosen approach, called Occlusion-Net [40], including its design details and a qualitative evaluation on traffic surveillance videos. Occlusion-Net employs three instances of Graph Neural Networks for occlusion reasoning and localization. The implementation reproduces most of the published results across several occlusion categories, except for the truncated car category. Occlusion-Net’s erratic detections are mostly caused by incorrect detection of the initial region of interest.
The thesis also provides a didactic introduction to the field of Machine and Deep Learning, including intuitions for the mathematical concepts required to understand the two disciplines and the implemented approach.
Contents
1 Introduction
2 Technical Background
2.1 AI, Machine Learning and Deep Learning
2.1.1 But what is AI?
2.1.2 Representational composition by Deep Learning
2.2 Essential Mathematics for ML
2.2.1 Linear Algebra
2.2.2 Probability and Statistics
2.2.3 Calculus
2.3 Mathematical Introduction to ML
2.3.1 Ingredients of a Machine Learning Problem
2.3.2 The Perceptron
2.3.3 Feature Transformation
2.3.4 Logistic Regression
2.3.5 Artificial Neural Networks: ANN
2.3.6 Convolutional Neural Network: CNN
2.3.7 Graph Neural Networks
2.4 Specific Topics in Computer Vision
2.5 Previous work
3 Design of Implemented Approach
3.1 Training Dataset
3.2 Keypoint Detection: MaskRCNN
3.3 Occluded Edge Prediction: 2D-KGNN Encoder
3.4 Occluded Keypoint Localization: 2D-KGNN Decoder
3.5 3D Shape Estimation: 3D-KGNN Encoder
4 Implementation
4.1 Open-Source Tools and Libraries
4.1.1 Code Packaging: NVIDIA-Docker
4.1.2 Data Processing Libraries
4.1.3 Libraries for Neural Networks
4.1.4 Computer Vision Library
4.2 Dataset Acquisition and Training
4.2.1 Acquiring Dataset
4.2.2 Training Occlusion-Net
4.3 Refactoring
4.3.1 Error in Docker File
4.3.2 Image Directories as Input
4.3.3 Frame Extraction in Parallel
4.3.4 Video as Input
4.4 Functional changes
4.4.1 Keypoints In Output
4.4.2 Mismatched BB and Keypoints
4.4.3 Incorrect Class Labels
4.4.4 Bounding Box Overlay
5 Evaluation
5.1 Qualitative Evaluation
5.1.1 Evaluation Across Occlusion Categories
5.1.2 Performance on Moderate and Heavy Vehicles
5.2 Verification of Failure Analysis
5.2.1 Truncated Cars
5.2.2 Overlapping Cars
5.3 Analysis of Missing Frames
5.4 Test Performance
6 Conclusion
7 Future Work
Bibliography
Identifier | oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:73973 |
Date | 19 February 2021 |
Creators | Mishra, Abhinav |
Contributors | Gerike, Regine, Bärwolff, Martin, Technische Universität Dresden |
Source Sets | Hochschulschriftenserver (HSSS) der SLUB Dresden |
Language | English |
Detected Language | English |
Type | info:eu-repo/semantics/publishedVersion, doc-type:masterThesis, info:eu-repo/semantics/masterThesis, doc-type:Text |
Rights | info:eu-repo/semantics/openAccess |