Global ETD Search

1	Incorporating Metadata Into the Active Learning Cycle for 2D Object Detection / Inkorporera metadata i aktiv inlärning för 2D objektdetektering
Stadler, Karsten, January 2021
In recent years, Deep Convolutional Neural Networks have proven very useful for 2D Object Detection in many applications. These networks require large amounts of labeled data, which becomes increasingly costly for companies deploying the detectors in practice if the data quality is lacking. Pool-based Active Learning is an iterative process in which subsets of data are collected, labeled by a human annotator, and used for training, with the aim of optimizing performance per labeled image. The detectors used in Active Learning cycles are conventionally pre-trained on a small subset, approximately 2% of the available data, chosen uniformly at random for labeling. I challenged this convention in this thesis by using image metadata. Many Machine Learning models are a "jack of all trades, master of none": it is hard to train a model that generalizes to the entire data domain, so it can be interesting to develop a detector for a specific target metadata domain. A simple Monte Carlo method, Rejection Sampling, can be used to sample according to a target metadata domain. It requires a target metadata distribution and a proposal metadata distribution. The proposal distribution is a parametric model, a Gaussian Mixture Model, learned from the training metadata. The parametric model for the target distribution can be learned in the same way, but from a target dataset. In this way, only the training images whose metadata is most similar to the target metadata distribution are sampled. This sampling approach was implemented and tested with a 2D object detector: Faster R-CNN with a ResNet-50 backbone. It was compared against conventional uniform random sampling and a classical Active Learning baseline, Min Entropy Sampling. Performance was measured and compared on two different target metadata distributions, each inferred from a specific target dataset. With a labeling budget of 2% per cycle, the maximum Mean Average Precision at an Intersection over Union threshold of 0.5 on the target set was calculated for each cycle. My proposed approach has a 40% relative performance advantage over uniform random sampling in the first cycle, and 10% after 9 cycles. Overall, my approach required only 37% of the labeled data to beat the next-best tested sampler, conventional uniform random sampling.
Active learning Deep Learning Object detection Metadata Nuscenes Nuimages Gaussian mixture model Rejection sampling Monte-Carlo methods Computer and Information Sciences
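The rejection-sampling step described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes one-dimensional metadata, hand-specified mixture parameters instead of fitted ones, and hypothetical helper names (`mixture_pdf`, `rejection_sample_pool`). Each pool item is accepted with probability p_target(x) / (M · q_proposal(x)), where M is the largest density ratio observed in the pool.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components = [(weight, mean, std), ...]."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def rejection_sample_pool(pool, target, proposal, budget, rng):
    """Pick up to `budget` pool indices whose metadata follows the target density.

    `pool` holds one scalar metadata value per unlabeled image (an assumption
    made for this sketch; real metadata would usually be multi-dimensional).
    """
    ratios = [mixture_pdf(x, target) / mixture_pdf(x, proposal) for x in pool]
    m = max(ratios)  # envelope constant M, estimated from the pool itself
    order = list(range(len(pool)))
    rng.shuffle(order)  # visit pool items in random order, each at most once
    selected = []
    for i in order:
        if len(selected) == budget:
            break
        if rng.random() < ratios[i] / m:  # accept with probability p / (M * q)
            selected.append(i)
    return selected


# Hypothetical example: the pool mixes two metadata regimes, the target is one of them.
rng = random.Random(0)
proposal = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
target = [(1.0, 10.0, 1.0)]
pool = [rng.gauss(0.0, 1.0) for _ in range(500)] + [rng.gauss(10.0, 1.0) for _ in range(500)]
chosen = rejection_sample_pool(pool, target, proposal, budget=20, rng=rng)
print(len(chosen), sum(pool[i] for i in chosen) / len(chosen))
```

Because the pool is finite and each item is visited once, a single pass can return fewer than `budget` items when the target and proposal densities overlap poorly; how the thesis handles that case is not stated in the abstract.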
2	SENSOR FUSION IN NEURAL NETWORKS FOR OBJECT DETECTION
Sheetal Prasanna (12447189), 12 July 2022
Object detection is an increasingly popular tool used in many fields, especially in the development of autonomous vehicles. The task of object detection involves localizing objects in an image, constructing a bounding box to determine the presence and location of each object, and classifying each object into its appropriate class. Object detection applications are commonly implemented using convolutional neural networks along with the construction of feature pyramid networks to extract data. Another commonly used technique in the automotive industry is sensor fusion. Each automotive sensor (camera, radar, and lidar) has its own advantages and disadvantages. Fusing two or more sensors together and using the combined information is a popular method of balancing the strengths and weaknesses of each independent sensor. Using sensor fusion within an object detection network has been found to be an effective method of obtaining accurate models. Accurate detection and classification of objects in images is a vital step in the development of autonomous vehicles or self-driving cars. Many studies have proposed methods to improve neural networks or object detection networks. Some of these techniques involve data augmentation and hyperparameter optimization. This thesis achieves the goal of improving a camera and radar fusion network by implementing various techniques within these areas. Additionally, a novel idea of integrating a third sensor, the lidar, into an existing camera and radar fusion network is explored in this research work. The models were trained on the Nuscenes dataset, one of the biggest automotive datasets available today. Using the concepts of augmentation, hyperparameter optimization, sensor fusion, and annotation filters, the CRF-Net was trained to achieve an accuracy score that was 69.13% higher than the baseline.
Digital processor architectures Computer vision Object detection Sensor Fusion Nuscenes Machine Learning Autonomous Vehicles Radar Lidar Camera Computer Engineering Computer Vision
3	Handling Occlusion using Trajectory Prediction in Autonomous Vehicles / Ocklusionshantering med hjälp av banprediktion för självkörande fordon
Ljung, Mattias, Nagy, Bence, January 2022
Occlusion is a frequently occurring challenge in vision systems for autonomous driving. The density of objects in the vehicle's field of view may be so high that some objects are only visible intermittently. It is therefore beneficial to investigate ways to predict the paths of objects under occlusion. In this thesis, we investigate whether trajectory prediction methods can be used to solve the occlusion prediction problem. We investigate two types of approaches, one based on motion models and one based on machine learning models. Furthermore, we investigate whether these two approaches can be fused to produce an even more reliable model. We evaluate our models on a pedestrian trajectory prediction dataset, an autonomous driving dataset, and a subset of the autonomous driving dataset that only includes validation examples of occlusion. The comparison shows that pure motion model-based methods perform the worst of the three. Machine learning-based models perform better, but they require additional computing resources for training. The fused method performs best on both the driving dataset and the occlusion data. Our results also indicate that trajectory prediction methods, both motion model-based and learning-based, can accurately predict the path of occluded objects up to at least 3 seconds ahead in the autonomous driving scenario.
occlusion prediction trajectory prediction computer vision machine learning autonomous driving LSTM motion models graph attention pooling ethucy nuscenes
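As a rough illustration of what a motion-model-based predictor can look like, here is a constant-velocity extrapolation. The abstract does not say which motion models the authors used, so constant velocity is an assumption made for this sketch, and the function name and the `(t, x, y)` track layout are hypothetical.

```python
def constant_velocity_predict(track, horizon_s, step_s):
    """Extrapolate a track of (t, x, y) observations at constant velocity.

    The velocity is estimated from the last two observations only, which is
    the simplest possible choice and ignores measurement noise.
    """
    (t0, x0, y0), (t1, x1, y1) = track[-2], track[-1]
    dt = t1 - t0
    vx = (x1 - x0) / dt
    vy = (y1 - y0) / dt
    steps = int(round(horizon_s / step_s))
    # One predicted (t, x, y) point per step after the last observation.
    return [
        (t1 + k * step_s, x1 + vx * k * step_s, y1 + vy * k * step_s)
        for k in range(1, steps + 1)
    ]


# Hypothetical track: the object was last seen at t = 0.5 s, then became occluded.
observed = [(0.0, 0.0, 0.0), (0.5, 1.0, 0.5)]
for point in constant_velocity_predict(observed, horizon_s=3.0, step_s=0.5):
    print(point)
```

The 3-second horizon in the usage example mirrors the horizon mentioned in the abstract; the 0.5-second step is an arbitrary choice.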

