401 |
SENSOR FUSION IN NEURAL NETWORKS FOR OBJECT DETECTION
Sheetal Prasanna (12447189) 12 July 2022
Object detection is an increasingly popular tool used in many fields, especially in the development of autonomous vehicles. The task of object detection involves localizing objects in an image, constructing a bounding box to determine the presence and location of each object, and classifying each object into its appropriate class. Object detection applications are commonly implemented using convolutional neural networks along with feature pyramid networks to extract data.
Another commonly used technique in the automotive industry is sensor fusion. Each automotive sensor (camera, radar, and lidar) has its own advantages and disadvantages. Fusing two or more sensors and using the combined information is a popular way to balance the strengths and weaknesses of each individual sensor. Using sensor fusion within an object detection network has been found to be an effective way to obtain accurate models. Accurate detection and classification of objects in images is a vital step in the development of autonomous vehicles, or self-driving cars.
Many studies have proposed methods to improve neural networks or object detection networks. Some of these techniques involve data augmentation and hyperparameter optimization. This thesis improves a camera and radar fusion network by implementing various techniques within these areas. Additionally, the novel idea of integrating a third sensor, the lidar, into an existing camera and radar fusion network is explored.
The models were trained on the nuScenes dataset, one of the biggest automotive datasets available today. Using augmentation, hyperparameter optimization, sensor fusion, and annotation filters, the CRF-Net was trained to achieve an accuracy score that was 69.13% higher than the baseline.
|
402 |
Evaluation and Analysis of Perception Systems for Autonomous Driving
Sharma, Devendra January 2020
For safe mobility, an autonomous vehicle must perceive its surroundings accurately. Many perception tasks are associated with understanding the local environment, such as object detection, localization, and lane analysis. Object detection, in particular, plays a vital role in determining an object’s location and classifying it correctly, and it is one of the challenging tasks in self-driving research. Before employing an object detection module in autonomous vehicle testing, an organization needs a precise analysis of the module. Hence, it is crucial for a company to have a framework for evaluating an object detection algorithm’s performance. This thesis develops a comprehensive framework for evaluating and analyzing object detection algorithms, both 2D (based on camera images) and 3D (based on LiDAR point clouds). The pipeline developed in this thesis makes it easy to evaluate multiple models using the key performance metrics Average Precision, F-score, and Mean Average Precision. The 40-point interpolation method is used to calculate the Average Precision.
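As an illustration of the metrics named in this abstract, the following is a minimal sketch of interpolated Average Precision over 40 recall points, together with the F-score. It assumes the common convention of sampling recall at 1/40, 2/40, ..., 1 and taking the highest precision at any recall at or above each sample; the abstract does not spell out the exact sampling, and the function names are hypothetical rather than taken from the thesis.

```python
def average_precision_40(recalls, precisions):
    """Interpolated AP over 40 recall points (assumed: 1/40, 2/40, ..., 1).

    recalls and precisions are parallel lists describing points on a
    precision-recall curve.
    """
    total = 0.0
    for i in range(1, 41):
        r = i / 40
        # Interpolated precision: best precision at any recall >= r.
        candidates = [p for rc, p in zip(recalls, precisions) if rc >= r]
        total += max(candidates) if candidates else 0.0
    return total / 40


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Mean Average Precision would then be the mean of `average_precision_40` over all object classes.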
|
403 |
Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation
Dickens, James 01 September 2021
The rise of convolutional neural networks (CNNs) in computer vision has occurred in tandem with the advancement of depth-sensing technology. Depth cameras yield two-dimensional arrays that store, at each pixel, the distance from the sensor to objects and surfaces in the scene, aligned with a regular color image to form so-called RGBD images. Inspired by prior models in the literature, this work develops a suite of RGBD CNN models to tackle the challenging tasks of object detection, instance segmentation, and semantic segmentation. Prominent architectures for object detection and image segmentation are modified to incorporate dual-backbone approaches that take RGB and depth images as input, combining features from both modalities through novel fusion modules. For each task, the models developed are competitive with state-of-the-art RGBD architectures. In particular, the proposed RGBD object detection approach achieves 53.5% mAP on the SUN RGBD 19-class object detection benchmark, while the proposed RGBD semantic segmentation architecture yields 69.4% accuracy on the SUN RGBD 37-class semantic segmentation benchmark. An original 13-class RGBD instance segmentation benchmark is introduced for the SUN RGBD dataset, on which the proposed model achieves 38.4% mAP. Additionally, an original depth-aware panoptic segmentation model is developed, trained, and tested on new benchmarks conceived for the NYUDv2 and SUN RGBD datasets. These benchmarks offer researchers a baseline for the task of RGBD panoptic segmentation on these datasets, where the novel depth-aware model outperforms a comparable RGB counterpart.
|
404 |
Detecting small and fast objects using image processing techniques : A project study within sport analysis
Gustafsson, Simon, Persson, Andreas January 2021
This study puts three different object detection techniques to the test. The goal was to investigate small and fast-moving objects and to see which technique performs best for the sport of padel. The study aims to cover and explain the conditions that can improve or degrade performance in detecting small and fast objects. The three techniques use different approaches for detecting one or multiple objects and could serve as a guideline for future object detection development. The proposed techniques use background histogram calculation, HSV masking with edge detection, and DNN frameworks together with the COCO dataset. All techniques are tested on outdoor video footage to generate data, which indicates that Canny edge detection is a promising candidate for further research given its high detection rate. However, YOLO shows excellent potential for multiple object detection at a very high confidence level, providing reliable and accurate detection of a targeted object. The study concludes that, depending on the end purpose, both Canny and YOLO have potential for future small and fast object detection.
|
405 |
Benchmarking Object Detection Algorithms for Optical Character Recognition of Odometer Mileage
Hjelm, Mandus, Andersson, Eric January 2022
Machine learning algorithms have had breakthroughs in many areas in the last decades. The hardest tasks to solve with machine learning have been those that humans solve intuitively, e.g. understanding natural language or recognizing specific objects in images. The way to overcome these problems is to allow the computer to learn from experience instead of implementing a pre-written program to solve the problem at hand; that is how neural networks came to be. Neural networks are widely used in image analysis, and object detection algorithms have evolved considerably in recent years. Two of these algorithms are Faster Region-based Convolutional Neural Networks (Faster R-CNN) and You Only Look Once (YOLO). The purpose of this thesis is to evaluate and benchmark state-of-the-art object detection methods and then analyze their performance in reading information from images. The information that we aim to extract is digital and analog digits from the odometer of a car. This will be done through object recognition and region-based image analysis. Our models will be compared to the open-source Optical Character Recognition (OCR) model Tesseract, which is used in production by the Stockholm-based company Greater Than. In this project we take a more modern approach and focus on two object detection models, Faster R-CNN and YOLO. When training these models, we use transfer learning. This means that we use models that are pre-trained, in our case on a dataset called ImageNet, specifically for object detection. We then use the TRODO dataset, which consists of 2,389 images of car odometers, to train these models further. The models are then evaluated using mean average precision (mAP), prediction accuracy, and Levenshtein distance. Our findings are that the object detection models outperform Tesseract on all measurements. The highest mAP and accuracy are attained by Faster R-CNN, while the best results regarding Levenshtein distance are achieved by a YOLO model. The final result is clear: both of our approaches have more diversity and are far better than Tesseract for solving this specific problem.
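To make the Levenshtein distance measure concrete, here is a minimal sketch of the standard dynamic-programming computation, applied to a predicted and a true odometer reading. The function name and the example strings are hypothetical; the abstract does not describe the thesis's own implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical example: one digit of the predicted reading is wrong.
distance = levenshtein("123456", "123466")
```

A distance of 0 means the predicted mileage string matches the label exactly, and each additional unit corresponds to one wrong, missing, or extra character.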
|
406 |
Transformer Based Object Detection and Semantic Segmentation for Autonomous Driving
Hardebro, Mikaela, Jirskog, Elin January 2022
The development of autonomous driving systems has been one of the most popular research areas in the 21st century. One key component of these systems is the ability to perceive and comprehend the physical world. Two techniques that address this are object detection and semantic segmentation. During the last decade, CNN-based models have dominated these types of tasks. However, in 2021, transformer-based networks were able to outperform the existing CNN approach, indicating a paradigm shift in the domain. This thesis aims to explore the use of a vision transformer, particularly a Swin Transformer, in an object detection and semantic segmentation framework, and to compare it to a classical CNN on road scenes. In addition, since real-time execution is crucial for autonomous driving systems, the possibility of reducing the number of parameters in the transformer-based network is investigated. The results appear to favor the Swin Transformer over the convolution-based network for both object detection and semantic segmentation. Furthermore, the analysis indicates that it is possible to reduce the computational complexity while retaining the performance.
|
407 |
Railway Fastener Fault Detection using YOLOv5
Efraimsson, Alva, Lemón, Elin January 2022
The railway system is an important part of the sociotechnical society, as it enables efficient, reliable, and sustainable transportation of both people and goods. Despite increasing investments, the Swedish railway has encountered structural and technical problems due to worn-out infrastructure as a result of insufficient maintenance. Two important technical aspects of the rail are its stability and robustness. To prevent transversal and longitudinal deviations, the rail is attached to sleepers by fasteners. The condition of the fasteners is therefore crucial for the stability of the track and the safety of the railway. Automatic fastener inspections enable efficient and objective inspections, which are a prerequisite for more adequate maintenance of the railway. This master's thesis aims to investigate how machine learning can be applied to the problem of automatic fastener fault detection. The thesis includes the complete process of applying and evaluating machine learning algorithms on the given problem, including data gathering, data preprocessing, model training, and model evaluation. The chosen model was the state-of-the-art object detector YOLOv5s. To assess the model’s performance and robustness on the given problem, different settings regarding both the dataset and the model’s architecture, in terms of transfer learning and hyperparameters, were tested. The results indicate that YOLOv5s is an appropriate machine learning algorithm for fastener fault detection. The models that achieved the highest performance reached an mAP[0.5:0.95] above 0.744 during training and 0.692 during testing. Furthermore, several combinations of different settings had a positive effect on the different models’ performances. In conclusion, YOLOv5s is in general a suitable model for detecting fasteners. Closer analysis of the results shows that the models failed when both fasteners and missing fasteners were only partly visible in the lower and upper parts of the image. These cases were not annotated in the dataset and therefore resulted in misclassification. In production, the number of cropped fasteners can be reduced by accurately synchronizing the frequency of capturing data with the distance between the sleepers, so that only one sleeper and its corresponding fasteners are visible per image, leading to more accurate results. To conclude, machine learning can be applied as an effective and robust technique to the problem of automatic fastener fault detection.
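The mAP[0.5:0.95] figure reported above is conventionally an average of AP over Intersection-over-Union (IoU) thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows the IoU computation and the threshold sweep under that assumption; the box format `(x1, y1, x2, y2)`, the names, and the example boxes are illustrative and not taken from the thesis.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed threshold sweep: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, gt_box):
    """Thresholds at which a prediction would count as a true positive."""
    overlap = iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


# Hypothetical fastener box that covers 78% of the ground-truth box.
passed = thresholds_passed((0, 0, 10, 7.8), (0, 0, 10, 10))
```

A loosely localized box still counts at the low thresholds but not at the strict ones, which is why mAP[0.5:0.95] is typically lower than mAP at 0.5 alone.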
|
408 |
An empirical study on synthetic image generation techniques for object detectors
Arcidiacono, Claudio Salvatore January 2018
Convolutional neural networks are a very powerful machine learning tool that has outperformed other techniques in image recognition tasks. The biggest drawback of this method is the massive amount of training data required, since producing training data for image recognition tasks is very labor intensive. To tackle this issue, different techniques have been proposed to generate synthetic training data automatically. These synthetic data generation techniques can be grouped into two categories: the first generates synthetic images using computer graphics software and CAD models of the objects to recognize; the second generates synthetic images by cutting the object from one image and pasting it onto another. Since both techniques have their pros and cons, it would be interesting for industry to investigate the two approaches in more depth. A common use case in industrial scenarios is detecting and classifying objects inside an image. Different objects belonging to classes relevant in industrial scenarios are often indistinguishable (for example, they are all the same component). For these reasons, this thesis aims to answer the research question “Among the CAD model generation techniques, the Cut-Paste generation techniques, and a combination of the two, which technique is most suitable for generating images for training object detectors in industrial scenarios?” To answer the research question, two synthetic image generation techniques, one from each category, are proposed. The proposed techniques are tailored for applications where all objects belonging to the same class are indistinguishable, but they can also be extended to other applications. The two techniques are compared by measuring the performance of an object detector trained on synthetic images and evaluated on a test dataset of real images. The performance of the two techniques when used for data augmentation has also been measured. The empirical results show that the CAD model generation technique works significantly better than the Cut-Paste generation technique when synthetic images are the only source of training data (61% better), whereas the two techniques perform equally well as data augmentation techniques. Moreover, the empirical results show that the model trained using only synthetic images performs almost as well as the model trained using real images (7.4% worse) and that augmenting the dataset of real images with synthetic images improves the performance of the model (9.5% better).
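The cut-and-paste category described in this abstract can be sketched in a few lines. The example below is only an illustration of the general idea, not the technique proposed in the thesis: images are plain nested lists of pixel values, the object patch comes with a binary mask, and the patch is assumed to fit inside the background at the chosen position. All names are hypothetical.

```python
def cut_paste(background, patch, mask, top, left):
    """Paste the masked pixels of patch onto a copy of background.

    Returns the synthetic image and the bounding box (x1, y1, x2, y2)
    of the pasted object, which serves as a free annotation.
    Assumes the patch fits entirely inside the background.
    """
    out = [row[:] for row in background]  # leave the original untouched
    for r, (patch_row, mask_row) in enumerate(zip(patch, mask)):
        for c, (value, keep) in enumerate(zip(patch_row, mask_row)):
            if keep:  # copy only pixels that belong to the object
                out[top + r][left + c] = value
    box = (left, top, left + len(patch[0]), top + len(patch))
    return out, box


# Hypothetical 4x4 background and 2x2 object with one masked-out corner.
background = [[0] * 4 for _ in range(4)]
patch = [[9, 9], [9, 9]]
mask = [[1, 0], [1, 1]]
image, box = cut_paste(background, patch, mask, top=1, left=2)
```

Because the paste position is chosen by the generator, the bounding-box label comes for free, which is what makes this family of techniques attractive for training object detectors.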
|
409 |
Data Augmentations for Improving Vision-Based Damage Detection : in Land Transport Infrastructure / Dataökningar för att förbättra bildbaserade sprickdetektering : i landtransportinfrastruktur
Siripatthiti, Punnawat January 2023
Cracks, a term most people know, are a common form of distress or damage in road pavements and railway sleepers. They pose significant challenges to structural integrity, safety, and longevity. Over the years, researchers have developed various data-driven technologies for image-based crack detection in road and sleeper applications, and image-based crack detection has become a promising field. Many researchers use ensemble learning to win the Road Damage Detection Challenge. The challenge provides a street-view dataset from several countries, captured from different perspectives. The 2020 version of the dataset contains images from Japan, India, and the Czech Republic, so the dataset inherits a domain shift problem. Current solutions use ensemble learning to deal with this problem, but they require much computational power and are difficult to adapt to real-time applications. To mitigate the problem, the thesis experiments with various data augmentation techniques that could improve the base model's performance. The main focuses are erasing a crack from an image using generative AI (Erase), implementing road segmentation using panoptic segmentation (RS), and injecting a perspective-aware synthetic crack (InjectPa) into the segmented road surface in the image. The results show that, compared to the base model, the Erase + RS techniques improve the model's F1 score when it is trained only on the Japan portion of the dataset rather than on all three countries simultaneously. Moreover, the InjectPa technique does not improve the base model in either scenario. The experiment then moves to the SBB dataset, which contains close-up images of sleepers from cameras mounted at the front of a diagnostic vehicle. This part follows the same techniques but changes the segmentation model to the Segment Anything Model (SAM), because the previous segmentation model was trained on a street-view dataset, making it unreliable on close-up images. The Erase + SAM techniques show improvement in bbox/AP and validation loss. Nevertheless, they do not improve the F1 score significantly compared to the base model. This thesis also applies the explainable AI method D-RISE to determine which features most influence the model's decisions. D-RISE shows that the augmented model pays more attention to the pothole damage type for road pavements and the spalling defect type for sleepers than to other types. Finally, the thesis discusses the results and suggests a strategy for future study.
|
410 |
Incorporating Sparse Attention Mechanism into Transformer for Object Detection in Images / Inkludering av gles attention i en transformer för objektdetektering i bilder
Duc Dao, Cuong January 2022
DEtection TRansformer (DETR) introduces an innovative design for object detection based on softmax attention. However, the softmax operation produces dense attention patterns, i.e., all entries in the attention matrix receive a non-zero weight regardless of their relevance for detection. In this work, we explore several alternatives to softmax to incorporate sparsity into the architecture of DETR. Specifically, we replace softmax with a sparse transformation from the α-entmax family: sparsemax and entmax-1.5, which induce a set amount of sparsity, and α-entmax, which treats sparsity as a learnable parameter of each attention head. In addition to evaluating the effect on detection performance, we examine the resulting attention maps from the perspective of explainability. To this end, we introduce three evaluation metrics to quantify the sparsity, complementing the qualitative observations. Although our experimental results on the COCO detection dataset do not show an increase in detection performance, we find that learnable sparsity gives the model more flexibility and produces more explainable attention maps. To the best of our knowledge, we are the first to introduce learnable sparsity into the architecture of transformer-based object detectors.
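The contrast between dense softmax weights and sparse alternatives can be seen on a single row of attention scores. The sketch below implements softmax and sparsemax (the Euclidean projection of the scores onto the probability simplex) in plain Python; it is a standalone illustration with hypothetical names and scores, not the DETR integration or the learnable α-entmax variant studied in the thesis.

```python
import math


def softmax(scores):
    """Dense normalization: every entry receives a strictly positive weight."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Sparse normalization: low-scoring entries receive exactly zero."""
    sorted_scores = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, s in enumerate(sorted_scores, start=1):
        cumsum += s
        # The k largest scores stay in the support while this holds;
        # the last k that satisfies it fixes the threshold tau.
        if 1 + k * s > cumsum:
            tau = (cumsum - 1) / k
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical attention scores for one query over three keys.
scores = [1.0, 0.8, 0.1]
dense = softmax(scores)     # all three weights are positive
sparse = sparsemax(scores)  # the weakest key gets weight 0
```

Both outputs sum to one, but only sparsemax can assign an exact zero, which is the property the abstract associates with sparser and more readable attention maps.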
|