391 |
Dataset Evaluation Method for Vehicle Detection Using TensorFlow Object Detection API / Utvärderingsmetod för dataset inom fordonsigenkänning med användning av TensorFlow Object Detection API
Furundzic, Bojan, Mathisson, Fabian January 2021
Recent developments in object detection have highlighted significant variation in quality between visual datasets. As a result, there is a need for a standardized approach to validating visual dataset features and their contribution to performance. Focusing on vehicle detection, this thesis develops an evaluation method for comparing visual datasets. The method was used to determine which dataset yielded the detection model with the greatest ability to detect vehicles. The datasets compared were BDD100K, KITTI and Udacity, each used to train a separate model. Applying the evaluation method gave a strong indication that BDD100K performs best. The datasets were further analyzed with respect to dataset size, label distribution and average number of labels per image. In addition, a real-world experiment was conducted to validate the evaluation method. All features and experimental results pointed to BDD100K's superiority over the other datasets, validating the developed evaluation method. Furthermore, the TensorFlow Object Detection API's ability to improve the performance gained from a visual dataset was studied. Through the use of augmentations, it was concluded that the TensorFlow Object Detection API is a useful tool for increasing the performance gained from visual datasets. / In the field of object detection, recent developments have demonstrated large variation in quality between visual datasets. Consequently, there is a need for standardized validation methods for comparing visual datasets and their performance. With a focus on vehicle detection, this thesis aims to develop a reliable validation method that can be used to compare visual datasets. This validation method was then used to determine the dataset that contributed to the system with the best ability to detect vehicles. The datasets used in this study were BDD100K, KITTI and Udacity, which were trained on individual detection models. By applying this validation method, it was determined that BDD100K was the dataset that contributed to the system with the best-performing detection ability. An analysis of dataset size, label distribution and the average number of labels per image was also carried out. Together with an experiment conducted to test the models in real-world settings, it could be determined that the validation method agreed with the established results. Finally, the TensorFlow Object Detection API's ability to improve the performance obtained from a visual dataset was studied. Using a modified dataset, it could be established that the TensorFlow Object Detection API is a suitable modification tool that can be used to increase the performance of a visual dataset.
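The three dataset features named in the abstract (dataset size, label distribution, average labels per image) can be illustrated with a minimal sketch. The input layout (a dict from image id to a list of class labels) and the function name `dataset_stats` are assumptions made for illustration; the thesis does not state how its annotations were stored or processed.

```python
from collections import Counter


def dataset_stats(annotations):
    """Summarize a detection dataset.

    annotations: dict mapping an image id to the list of class labels
    annotated in that image (hypothetical layout, chosen for illustration).
    """
    num_images = len(annotations)
    # Count how often each class label occurs across all images.
    label_counts = Counter(
        label for labels in annotations.values() for label in labels
    )
    total_labels = sum(label_counts.values())
    # Relative frequency of each class among all labels.
    distribution = (
        {name: count / total_labels for name, count in label_counts.items()}
        if total_labels
        else {}
    )
    avg_labels = total_labels / num_images if num_images else 0.0
    return {
        "num_images": num_images,
        "label_distribution": distribution,
        "avg_labels_per_image": avg_labels,
    }
```

Running the same function over each dataset's annotations would give directly comparable numbers, which is the kind of side-by-side comparison the abstract describes.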
|
392 |
Training of Object Detection Spiking Neural Networks for Event-Based Vision
Johansson, Olof January 2021
Event-based vision offers higher dynamic range and time resolution, and lower latency, than conventional frame-based vision sensors. These attributes are useful in varying light conditions and under fast motion. However, there are no neural network models and training protocols optimized for object detection with event data, and conventional artificial neural networks for frame-based data are not directly suitable for that task. Spiking neural networks are natural candidates, but further work is required to develop an efficient object detection architecture and an end-to-end training protocol. For example, object detection in varying light conditions is identified as a challenging problem for the automation of construction equipment such as earth-moving machines, which aims to increase operator safety and make repetitive processes less tedious. This work focuses on the development and evaluation of a neural network for object detection with data from an event-based sensor. Furthermore, the strengths and weaknesses of an event-based vision solution are discussed in relation to the known challenges described in earlier work on the automation of earth-moving machines. A solution for object detection with event data is implemented as a modified YOLOv3 network with spiking convolutional layers, trained with a backpropagation algorithm adapted for spiking neural networks. The performance is evaluated on the N-Caltech101 dataset with classes for airplanes and motorbikes, resulting in a mAP of 95.8% for the combined network and 98.8% for the original YOLOv3 network with the same architecture. The solution is investigated as a proof of concept, and suggestions for further work based on a recurrent spiking neural network are described.
|
393 |
Advanced Data Augmentation : With Generative Adversarial Networks and Computer-Aided Design
Thaung, Ludwig January 2020
CNN-based (Convolutional Neural Network) visual object detectors often reach human-level accuracy but need to be trained with large amounts of manually annotated data. Collecting and annotating this data is frequently time-consuming and expensive. Using generative models to augment the data can help minimize the amount of data required and increase detection performance. Many state-of-the-art generative models are Generative Adversarial Networks (GANs). This thesis investigates if and how image data can be used to generate new data through GANs to train a YOLO-based (You Only Look Once) object detector, and how CAD (Computer-Aided Design) models can aid in this process. In the experiments, different GAN models are trained and evaluated by visual inspection or with the Fréchet Inception Distance (FID) metric. The data provided by Ericsson Research consists of images of antenna and baseband equipment along with annotations and segmentations. Ericsson Research supplied the YOLO detector, and no modifications are made to this detector. Finally, the YOLO detector is trained on data generated by the chosen model and evaluated by Average Precision (AP). The results show that the generative models designed in this work can produce RGB images of high quality. However, the quality drops if binary segmentation masks are to be generated as well. The experiments with CAD input data did not result in images that could be used for training the detector. The GAN designed in this work is able to successfully replace objects in images with the style of other objects. The results show that training the YOLO detector with GAN-modified data leads to the same detection performance as training with real data. The results also show that the shapes and backgrounds of the antennas contributed more to detection performance than their style and colour.
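Since the detector is scored by Average Precision, a small sketch of one common way to compute AP may help. It assumes detections have already been matched to ground truth (each carries a confidence and a true-positive flag) and uses all-point interpolation of the precision-recall curve; the abstract does not say which AP variant or matching rule was used, so the function name and input format are hypothetical.

```python
def average_precision(scored_hits, num_ground_truth):
    """Area under the interpolated precision-recall curve.

    scored_hits: list of (confidence, is_true_positive) pairs, one per
    detection (assumed to be matched against ground truth beforehand).
    num_ground_truth: total number of ground-truth objects.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda pair: pair[0], reverse=True)
    true_pos = false_pos = 0
    precisions, recalls = [], []
    for _, is_hit in ranked:
        if is_hit:
            true_pos += 1
        else:
            false_pos += 1
        precisions.append(true_pos / (true_pos + false_pos))
        recalls.append(true_pos / num_ground_truth)
    # Interpolate: precision at each rank becomes the maximum precision
    # found at that rank or any later (higher-recall) rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas over each increase in recall.
    area = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area
```

Comparing this value for a detector trained on real data and one trained on GAN-modified data is the kind of comparison the abstract reports.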
|
394 |
Object representation in local feature spaces : application to real-time tracking and detection / Représentation d'objets dans des espaces de caractéristiques locales : application à la poursuite de cibles temps-réel et à la détection
Tran, Antoine 25 October 2017
Visual representation is a fundamental problem in computer vision. The goal is to reduce the information to what is strictly necessary for a desired task. Several types of representation exist, such as color features (histograms, color attributes...), shape features (derivatives, interest points...) or others, such as filter banks. Low-level (local) features are fast to compute. Their representational power is limited, but their genericity is of interest for autonomous and multi-task systems, since high-level features derive from them. The goal of this thesis is to build and then study the impact of representations based only on low-level local features (colors, spatial derivatives) for two tasks: generic object tracking, which requires features robust to changes in the appearance of the object and its context over time; and object detection, where the representation must describe a class of objects while accounting for intra-class variations. Rather than building dedicated global object descriptors, we rely entirely on local features and on flexible statistical mechanisms that estimate their distribution (histograms) and their co-occurrences (Generalized Hough Transform). The Generalized Hough Transform (GHT), created for detecting arbitrary shapes, consists of building a data structure representing an object, a class... This structure, initially indexed by gradient orientation, has been extended to other features. Working on local features, we want to stay close to the original GHT. In object tracking, after presenting our first work, which combines the GHT with a particle filter (using a color histogram), we present a lighter and faster (100 fps) algorithm that is more accurate and robust. We present a qualitative evaluation and study the impact of the features used (color space, formulation of the partial derivatives...). In detection, we used Gall's algorithm known as Hough forests. Our goal is to reduce the feature space used by Gall by removing HOG-type features, keeping only partial derivatives and color features. To compensate for this reduction, we improved two training steps: the support of the local descriptors (patches) is partially produced according to a geometric measure, and node training is done by generating a specific probability map that takes into account the patches used for this step. With the reduced feature space, the detector is not more accurate. With the same features as Gall, for the same training time, our work gave identical results, but with lower variance and therefore better repeatability. / Visual representation is a fundamental problem in computer vision. The aim is to reduce the information to what is strictly necessary for a given task. Many types of representation exist, such as color features (histograms, color attributes...), shape features (derivatives, keypoints...) or filter banks. Low-level (and local) features are fast to compute. Their representational power is limited, but their genericity is of interest for autonomous or multi-task systems, as higher-level features derive from them. We aim to build, then study the impact of, low-level and local feature spaces (color and derivatives only) for two tasks: generic object tracking, which requires features robust to changes in the appearance of the object and its environment over time; and object detection, for which the representation should describe an object class and cope with intra-class variations. Rather than using global object descriptors, we rely entirely on local features and statistical mechanisms to estimate their distribution (histograms) and their co-occurrences (Generalized Hough Transform). The Generalized Hough Transform (GHT), created for the detection of arbitrary shapes, consists of building a codebook that models an object or a class; it was originally indexed by gradient orientation and later extended to other features. As we work on local features, we aim to remain close to the original GHT. In tracking, after presenting preliminary work combining the GHT with a particle filter (using color histograms), we present a lighter and faster (100 fps) tracker that is more accurate and robust. We present a qualitative evaluation and study the impact of the features used (color space, spatial derivative formulation). In detection, we used Gall's Hough Forest. We aim to reduce Gall's feature space by discarding HOG features, keeping only derivatives and color features. To compensate for the reduction, we enhanced two steps: the support of the local descriptors (patches) is partially chosen using a geometrical measure, and node training uses a specific probability map based on the patches used at this step. With the reduced feature space, the detector is less accurate than with Gall's feature space; with Gall's features and the same training time, our work leads to identical results but with higher stability and therefore better repeatability.
|
395 |
Vytěžování snímků z panoramatické kamery mobilního mapování / Exploitation of images from panoramatic camera of mobile mapping system
Belanis, Pavel January 2019
This diploma thesis deals with the automated detection of vertical traffic signs in images from the Ladybug5 panoramic camera. From the signs detected with the help of a classifier, a GIS data set is created automatically; it can be used, for example, for traffic sign inventories (passportization). The first part of the thesis describes the theoretical basis needed to understand the problem. The second part is devoted to the specific procedure leading to a reliable classifier, its testing on an independent set of images, and the automated creation of the GIS data set. The outputs of the work are the trained classifiers and the GIS data sets containing vertical traffic signs.
|
396 |
Towards Condition-Based Maintenance of Catenary wires using computer vision : Deep Learning applications on eMaintenance & Industrial AI for railway industry
Moussallik, Laila January 2021
Railways are a main element of sustainable transport policy in several countries, as they are considered a safe, efficient and green mode of transportation. Owing to these advantages, there is growing demand on the railway industry to increase performance, capacity and availability while safely transporting goods and people at higher speeds. Meeting this demand requires large adjustments to the infrastructure and improvements to the maintenance process. Inspection activities are essential for establishing the required maintenance, and they must be carried out periodically to reduce unexpected failures and prevent dangerous consequences. Maintenance of railway catenary systems is a critical task for ensuring the safety of electrical railway operation. Usually, catenary inspection is performed manually by trained personnel. However, like all human-based inspection, it is slow and lacks objectivity, which brings a number of crucial disadvantages and can lead to dangerous consequences. With the rapid progress of artificial intelligence, computer vision detection approaches are well placed to replace the traditional manual methods during inspections. In this thesis, a strategy for monitoring the health of catenary wires is developed, which includes the various steps needed to detect anomalies in this component. Moreover, a solution for detecting different types of wires in the railway catenary system was implemented, in which a deep learning framework is developed by combining a Convolutional Neural Network (CNN) and a Region Proposal Network (RPN).
|
397 |
Implementation of an Approach for 3D Vehicle Detection in Monocular Traffic Surveillance Videos
Mishra, Abhinav 19 February 2021
Recent advancements in Computer Vision are a by-product of breakthroughs in Artificial Intelligence. Object detection in monocular images is now realized by combining Computer Vision and Deep Learning. While most approaches detect objects as a mere two-dimensional (2D) bounding box, a few exploit a more traditional representation of the 3D object. Such approaches detect an object either as a 3D bounding box or exploit its shape primitives using active shape models, which results in a wireframe-like detection. Such a wireframe detection is represented as a combination of detected keypoints (or landmarks) of the desired object. Apart from faithfully retrieving the object's true shape, wireframe-based approaches are relatively robust in handling occlusions. The central task of this thesis was to find such an approach and implement it in order to evaluate its performance. The object of interest is the vehicle class (cars, minivans, trucks, etc.) and the evaluation data are monocular traffic surveillance videos collected by the supervising chair. A wireframe-type detection can aid several facets of traffic analysis through improved (compared to a 2D bounding box) estimation of the detected object's ground plane. The thesis covers the implementation of the chosen approach, Occlusion-Net [40], including its design details and a qualitative evaluation on traffic surveillance videos. The implementation reproduces most of the published results across several occlusion categories, except the truncated car category. Occlusion-Net's erratic detections are mostly caused by incorrect detection of the initial region of interest. It employs three instances of Graph Neural Networks for occlusion reasoning and localization. The thesis also provides a didactic introduction to Machine Learning and Deep Learning, including intuitions for the mathematical concepts required to understand the two disciplines and the implemented approach.
Contents
1 Introduction
2 Technical Background
2.1 AI, Machine Learning and Deep Learning
2.1.1 But what is AI?
2.1.2 Representational composition by Deep Learning
2.2 Essential Mathematics for ML
2.2.1 Linear Algebra
2.2.2 Probability and Statistics
2.2.3 Calculus
2.3 Mathematical Introduction to ML
2.3.1 Ingredients of a Machine Learning Problem
2.3.2 The Perceptron
2.3.3 Feature Transformation
2.3.4 Logistic Regression
2.3.5 Artificial Neural Networks: ANN
2.3.6 Convolutional Neural Network: CNN
2.3.7 Graph Neural Networks
2.4 Specific Topics in Computer Vision
2.5 Previous work
3 Design of Implemented Approach
3.1 Training Dataset
3.2 Keypoint Detection: MaskRCNN
3.3 Occluded Edge Prediction: 2D-KGNN Encoder
3.4 Occluded Keypoint Localization: 2D-KGNN Decoder
3.5 3D Shape Estimation: 3D-KGNN Encoder
4 Implementation
4.1 Open-Source Tools and Libraries
4.1.1 Code Packaging: NVIDIA-Docker
4.1.2 Data Processing Libraries
4.1.3 Libraries for Neural Networks
4.1.4 Computer Vision Library
4.2 Dataset Acquisition and Training
4.2.1 Acquiring Dataset
4.2.2 Training Occlusion-Net
4.3 Refactoring
4.3.1 Error in Docker File
4.3.2 Image Directories as Input
4.3.3 Frame Extraction in Parallel
4.3.4 Video as Input
4.4 Functional changes
4.4.1 Keypoints In Output
4.4.2 Mismatched BB and Keypoints
4.4.3 Incorrect Class Labels
4.4.4 Bounding Box Overlay
5 Evaluation
5.1 Qualitative Evaluation
5.1.1 Evaluation Across Occlusion Categories
5.1.2 Performance on Moderate and Heavy Vehicles
5.2 Verification of Failure Analysis
5.2.1 Truncated Cars
5.2.2 Overlapping Cars
5.3 Analysis of Missing Frames
5.4 Test Performance
6 Conclusion
7 Future Work
Bibliography
|
398 |
Design and implementation of an affordable reversing camera system with object detection and OBD-2 integration for commercial vehicles / Design och implementering av ett prisvärt backkamerasystem med objektdetektering och OBD-2-integration för kommersiella fordon
Ebrahimi, Alireza, Akbari, Esmatullah January 2023
This thesis is about designing and implementing an affordable reversing camera system with object detection and OBD-2 integration for commercial vehicles. The aim is to improve the safety and efficiency of these vehicles by giving drivers a clear view of the area behind the vehicle and alerting them to nearby obstacles. Ultrasonic sensors are used for object detection; they give the driver control over the environment behind the vehicle and warn of obstacles present. The system is also integrated with the vehicle's on-board diagnostics system (OBD-2), which provides important information on speed and engine performance, among other things. This project contributes to making safety systems more accessible for commercial vehicles and reduces the risk of accidents and collisions. / This thesis concerns designing and implementing an affordable reversing camera system with object detection and On-Board Diagnostics 2 integration for commercial vehicles. The purpose is to improve the safety and efficiency of these vehicles by giving drivers a clear view of their surroundings behind the vehicle and warning them of the presence of nearby obstacles. Ultrasonic sensors are used for object detection; they give the driver control over the surroundings behind the vehicle and warn of obstacles present. The system is also integrated with the vehicle's on-board diagnostics system (OBD-2), which provides important information about, among other things, speed and engine performance. This project contributes to making safety systems more accessible for commercial vehicles and reduces the risk of accidents and collisions.
|
399 |
Multi-site Organ Detection in CT Images using Deep Learning / Regionsoberoende organdetektion i CT-bilder med djupinlärning
Jacobzon, Gustaf January 2020
When optimizing a controlled dose in radiotherapy, high-resolution spatial information about healthy organs in close proximity to the malignant cells is necessary in order to mitigate dispersion into these organs-at-risk. This information can be provided by deep volumetric segmentation networks, such as 3D U-Net. However, due to memory limitations in modern graphical processing units, it is not feasible to train a volumetric segmentation network on full image volumes, and subsampling the volume gives too coarse a segmentation. An alternative is to sample a region of interest from the image volume and train an organ-specific network. This approach requires knowledge of which region in the image volume should be sampled, which can be provided by a 3D object detection network. Typically the detection network will also be region-specific, albeit for a larger region such as the thorax, and requires human assistance in choosing the appropriate network for a certain region of the body. Instead, we propose a multi-site object detection network based on YOLOv3 trained on 43 different organs, which can operate on arbitrarily chosen axial patches in the body. Our model identifies the organs present (whole or truncated) in the image volume and can automatically sample a region from the input and feed it to the appropriate volumetric segmentation network. We train our model on four small (as few as 20 images) site-specific datasets in a weakly supervised manner in order to handle the partially unlabeled nature of site-specific datasets. Our model is able to generate organ-specific regions of interest that enclose 92% of the organs present in the test set. / When optimizing a controlled dose in radiotherapy, information about healthy organs, so-called organs-at-risk, near the malignant cells is required in order to minimize the radiation in these organs. This information can be provided by deep volumetric segmentation networks, for example 3D U-Net. Limitations in the memory size of modern graphics cards mean that it is not possible to train a volumetric segmentation network on the entire image volume without first downsampling the volume. However, this leads to a low-resolution segmentation of the organs that is not precise enough to be used in the optimization. An alternative is to process only a region of interest that encloses one or a few organs from the image volume and train a region-specific network on this smaller volume. This approach, however, requires information about which area in the image volume should be sent to the region-specific segmentation network. This information can be provided by a 3D object detection network. As a rule, this network is also region-specific, for example for the thorax region, and requires human assistance to select the right network for a given region of the body. We instead propose a multi-region detection network based on YOLOv3 that can detect 43 different organs and works on arbitrarily chosen axial windows in the body. Our model identifies the organs present (whole or truncated) in the image and can automatically provide information about which region should be processed by each region-specific segmentation network. We train our model on four small (as few as 20 images) site-specific datasets with weak supervision in order to handle the partially unannotated nature of the datasets. Our model generates an organ-specific region of interest for 92% of the organs present in the test set.
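The reported figure (regions of interest enclosing 92% of the organs in the test set) can be read as a simple containment rate. The sketch below assumes axis-aligned 3D boxes given as (zmin, ymin, xmin, zmax, ymax, xmax) and counts an organ as enclosed only when the predicted region fully contains its ground-truth box; the box layout, the function names and the strict containment rule are assumptions, since the abstract does not define how enclosure was measured.

```python
def encloses(roi, organ):
    """True if the axis-aligned box `roi` fully contains `organ`.

    Both boxes are (zmin, ymin, xmin, zmax, ymax, xmax) tuples
    (a hypothetical layout chosen for this sketch).
    """
    lower_ok = all(roi[i] <= organ[i] for i in range(3))
    upper_ok = all(roi[i + 3] >= organ[i + 3] for i in range(3))
    return lower_ok and upper_ok


def enclosure_rate(pairs):
    """Fraction of organs whose predicted region encloses them.

    pairs: list of (predicted_roi_or_None, ground_truth_box); None means
    the detector produced no region for that organ.
    """
    if not pairs:
        return 0.0
    enclosed = sum(
        1 for roi, organ in pairs if roi is not None and encloses(roi, organ)
    )
    return enclosed / len(pairs)
```

A missed detection (no region at all) counts against the rate in this sketch, which matches the reading that the percentage is taken over all organs present in the test set.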
|
400 |
Supervision : Object motion interpretation using hyperdimensional computing based on object detection run on the edge
Andersson Svensson, Albin January 2022
This thesis demonstrates a technique for developing efficient applications that interpret spatial deep learning output using Hyperdimensional Computing (HDC), also known as Vector Symbolic Architecture (VSA). As part of the application demonstration, a novel preprocessing technique for motion using state machines and spatial semantic pointers is explained. The application is evaluated and run on a Google Coral Edge TPU, interpreting real-time inference from a compressed object detection model.
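For readers unfamiliar with HDC/VSA, the basic operations can be sketched with random bipolar hypervectors: binding (elementwise multiplication) attaches a role to a value, bundling (elementwise majority) superimposes several bound pairs, and a normalized dot product measures similarity. This is a generic illustration only; the thesis uses spatial semantic pointers, whose encoding differs, and every name below (roles, fillers, dimension) is hypothetical.

```python
import random

DIM = 10_000  # hypervector dimensionality (illustrative choice)


def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication (self-inverse)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Superimpose hypervectors by elementwise majority; ties map to +1."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product: near 0 for unrelated vectors, 1 for identical."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
role_object, role_motion = random_hv(rng), random_hv(rng)
car, person, left, right = (random_hv(rng) for _ in range(4))

# Encode "a car moving left" as one hypervector.
scene = bundle(bind(role_object, car), bind(role_motion, left))

# Unbinding with the motion role yields a noisy copy of `left`.
recovered = bind(scene, role_motion)
best = max(
    {"left": left, "right": right}.items(),
    key=lambda item: similarity(recovered, item[1]),
)[0]
```

Here `best` names the stored motion whose hypervector is most similar to the recovered one. Because these operations are elementwise and need no training, they are cheap enough to run alongside a detector on an edge device, which is the setting the abstract describes.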
|