  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
501

Convolutional neural network based object detection in a fish ladder : Positional and class imbalance problems using YOLOv3 / Objektdetektering i en fisktrappa baserat på convolutional neural networks : Positionell och kategorisk obalans vid användning av YOLOv3

Ekman, Patrik January 2021 (has links)
Hydropower plants create blockages in fish migration routes. Fish ladders can serve as alternative routes, but they are complex to install and require follow-up to adapt and develop them further. In this study, computer vision tools are considered in this regard. More specifically, object detection is applied to images collected in a hydropower plant fish ladder to localise and classify wild, farmed and unknown fish, labelled according to the presence, absence or uncertainty of an adipose fin. Fish migration patterns are not deterministic, making it a challenge to collect representative and balanced data to train a model that is resilient to changing conditions. In this study, two data imbalances are addressed by modifying a YOLOv3 baseline model: foreground-foreground class imbalance is targeted using hard and soft resampling, and positional imbalance using translation augmentation. YOLOv3 is a convolutional neural network predicting bounding box coordinates, class probabilities and confidence scores simultaneously. It divides images into grids and makes predictions based on grid cell locations and anchor box offsets. Performance is estimated across 10 random data splits and different bounding box overlap thresholds, using (mean) average precision as well as recall, precision and F1 score estimated at optimal validation set confidence thresholds. The Wilcoxon signed-ranks test is used to determine statistical significance. In the experiments, the best performance was observed on wild and farmed fish, with F1 scores reaching 94.8 and 89.0 percent respectively. The inconsistent appearance of unknown fish appears harder to generalise to, with a corresponding F1 score of 65.7 percent. Soft sampling, and especially translation augmentation, contributed to enhanced performance and reduced variance, implying that the baseline model is particularly sensitive to positional imbalance. Spatial dependencies introduced by YOLOv3’s grid cell strategy likely produce local bias or overfitting.
An experimental evaluation highlights the importance of not relying on a single data split when evaluating performance on a moderately sized or custom dataset. A key challenge observed in the experiments is the choice of a suitable confidence threshold, which influences the dynamics of the results.
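The thesis itself contains no code, but the translation augmentation it credits with the largest gains can be sketched as below. This is a minimal NumPy illustration; the function and parameter names are my own, not taken from the study, and a real pipeline would also handle padding values and normalized coordinates.

```python
import numpy as np

def translate_sample(image, boxes, dx, dy):
    """Shift an image by (dx, dy) pixels and move its bounding boxes with it.

    image: (H, W, C) array; boxes: (N, 4) array of [x_min, y_min, x_max, y_max].
    Regions exposed by the shift are zero-padded; boxes pushed fully
    outside the image are dropped.
    """
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    # source/destination slices for the shifted copy
    x0_src, x1_src = max(0, -dx), min(w, w - dx)
    y0_src, y1_src = max(0, -dy), min(h, h - dy)
    x0_dst, y0_dst = max(0, dx), max(0, dy)
    shifted[y0_dst:y0_dst + (y1_src - y0_src),
            x0_dst:x0_dst + (x1_src - x0_src)] = image[y0_src:y1_src, x0_src:x1_src]

    # shift boxes, clip to the image, drop degenerate boxes
    moved = boxes + np.array([dx, dy, dx, dy], dtype=boxes.dtype)
    moved[:, [0, 2]] = moved[:, [0, 2]].clip(0, w)
    moved[:, [1, 3]] = moved[:, [1, 3]].clip(0, h)
    keep = (moved[:, 2] > moved[:, 0]) & (moved[:, 3] > moved[:, 1])
    return shifted, moved[keep]
```

Sampling dx and dy uniformly per training image moves objects between grid cells, which is exactly the positional variation a grid-based detector like the baseline YOLOv3 model would otherwise not see.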
502

Incorporating Metadata Into the Active Learning Cycle for 2D Object Detection / Inkorporera metadata i aktiv inlärning för 2D objektdetektering

Stadler, Karsten January 2021 (has links)
In the past years, Deep Convolutional Neural Networks have proven to be very useful for 2D Object Detection in many applications. These types of networks require large amounts of labeled data, which can be increasingly costly for companies deploying these detectors in practice if the data quality is lacking. Pool-based Active Learning is an iterative process of collecting subsets of data to be labeled by a human annotator and used for training, in order to optimize performance per labeled image. The detectors used in Active Learning cycles are conventionally pre-trained with a small subset, approximately 2% of the available data, labeled uniformly at random. This is something I challenged in this thesis by using image metadata. Because many Machine Learning models are a "jack of all trades, master of none", and it is hard to train models that generalize to the whole data domain, it can be interesting to develop a detector for a specific target metadata domain. A simple Monte Carlo method, Rejection Sampling, can be implemented to sample according to a metadata target domain. This requires a target and a proposal metadata distribution. The proposal metadata distribution is a parametric model, in the form of a Gaussian Mixture Model, learned from the training metadata. The parametric model for the target distribution can be learned in a similar manner, but from a target dataset. In this way, only the training images with metadata most similar to the target metadata distribution are sampled. This sampling approach was employed and tested with a 2D Object Detector: Faster R-CNN with a ResNet-50 backbone. The Rejection Sampling approach was tested against conventional random uniform sampling and a classical Active Learning baseline: Min Entropy Sampling. The performance was measured and compared on two different target metadata distributions inferred from a specific target dataset.
With a labeling budget of 2% per cycle, the maximum Mean Average Precision at 0.5 Intersection over Union on the target set was calculated for each cycle. My proposed approach has a 40% relative performance advantage over random uniform sampling in the first cycle, and 10% after 9 cycles. Overall, my approach required only 37% of the labeled data to beat the next-best sampler tested: conventional uniform random sampling.
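The rejection-sampling step described above can be sketched in a few lines. In the thesis the proposal density is a Gaussian Mixture Model fitted to the training metadata; for brevity this hypothetical sketch uses single univariate Gaussians for both target and proposal, with `m` an upper bound on the density ratio.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density (a stand-in for the thesis's GMMs)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def rejection_sample(metadata, target_mu, target_sigma, prop_mu, prop_sigma, m, rng):
    """Keep training samples whose metadata looks drawn from the target density.

    Each sample x is accepted with probability p_target(x) / (m * q_proposal(x)),
    which requires p_target(x) <= m * q_proposal(x) everywhere.
    """
    p = gauss_pdf(metadata, target_mu, target_sigma)
    q = gauss_pdf(metadata, prop_mu, prop_sigma)
    accept = rng.uniform(size=metadata.shape) < p / (m * q)
    return metadata[accept]
```

Run on metadata drawn from the proposal, the accepted subset is distributed approximately according to the target, so only images resembling the target domain reach the annotator.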
503

Performance Assessment of a 77 GHz Automotive Radar for Various Obstacle Avoidance Application

Komarabathuni, Ravi V. 26 July 2011 (has links)
No description available.
504

Artificial data for Image classification in industrial applications

Yonan, Yonan, Baaz, August January 2022 (has links)
Machine learning and AI are growing rapidly, and they are being implemented more often than before due to their high accuracy and performance. One of the biggest challenges in machine learning is data collection. The training data is the most important part of any machine learning project, since it determines how the trained model will behave. In the case of object classification and detection, capturing a large number of images per object is not always possible and can be a very time-consuming and tedious process. This thesis explores options specific to image classification that help reduce the need to capture many images per object while still maintaining accuracy. In this thesis, experiments have been performed with the goal of achieving high classification accuracy with a limited dataset. One method that is explored is to create artificial training images using a game engine. Ways to expand a small dataset, such as different data augmentation methods and regularization methods, are also employed.
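As a rough illustration of the dataset-expansion idea, the sketch below generates flipped and rotated variants of each training image. The names are illustrative, not from the thesis, and a production pipeline would typically use a dedicated augmentation library instead.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and rotated copy of a training image."""
    out = image
    if rng.random() < 0.5:
        out = np.fliplr(out)
    # rotate by a random multiple of 90 degrees
    return np.rot90(out, k=int(rng.integers(0, 4)))

def expand_dataset(images, factor, seed=0):
    """Expand a small dataset with `factor` augmented variants per image."""
    rng = np.random.default_rng(seed)
    return [augment(img, rng) for img in images for _ in range(factor)]
```

Every variant contains the same pixel values rearranged, so labels carry over unchanged for whole-image classification.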
505

3D Object Detection Using Sidescan Sonar Images

Georgiev, Ivaylo January 2024 (has links)
Sidescan sonars are tools used in seabed inspection and imagery. Smaller and cheaper than the alternatives, they have attracted attention, and many studies have sought to extract seabed elevation information from the images they produce. The main issue is that sidescan sonars do not provide elevation angle information, so a 3D map of the seabed cannot be inferred directly. One of the most recent techniques to tackle this problem is neural rendering [1], in which the seabed bathymetry is implicitly represented by a neural network. The purpose of this thesis is (1) to find the minimum altitude change that can be detected using this technique, (2) to check whether the position of the sonar ensonification has any effect on these results, and (3) to check from how many sides it is sufficient to ensonify a region with an altitude change in order to detect it confidently. To conduct this research, missions by an autonomous underwater vehicle with sidescan sonar heads on both sides are simulated on a map onto which objects of various sizes and shapes are placed. Neural rendering is then used to reconstruct the bathymetry of the map before and after object insertion from the sidescan sonar data. The reconstructed seabed elevations are compared, and the smallest objects or altitude changes that are detected (meaning that the height predicted by the model trained on the map with the objects is significantly larger than that of the model trained on the initial map) answer the first question. Those smallest objects are then placed on the same map again, and smaller autonomous underwater vehicle missions are used to check how many sides are needed for the objects to remain detectable.
The conducted experiments suggest that objects with bathymetry elevations in the range of centimeters can be detected, and that in some cases ensonification from two sides is sufficient to detect an object with confidence.
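The final comparison step of this pipeline can be sketched as a simple significance test on the two reconstructed height maps. The thesis obtains these grids via neural rendering; the noise level and the 3-sigma threshold below are assumptions for illustration only.

```python
import numpy as np

def detect_change(height_before, height_after, noise_sigma, k=3.0):
    """Flag grid cells whose reconstructed elevation change exceeds
    k standard deviations of the assumed reconstruction noise."""
    diff = np.abs(height_after - height_before)
    return diff > k * noise_sigma
```

An object would be considered detected when a connected cluster of cells is flagged; with the threshold tied to the reconstruction noise, centimeter-scale elevation changes sit right at the edge of detectability.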
506

Enhanced 3D Object Detection And Tracking In Autonomous Vehicles: An Efficient Multi-modal Deep Fusion Approach

Priyank Kalgaonkar (10911822) 03 September 2024 (has links)
<p dir="ltr">This dissertation delves into a significant challenge for Autonomous Vehicles (AVs): achieving efficient and robust perception under adverse weather and lighting conditions. Systems that rely solely on cameras face difficulties with visibility over long distances, while radar-only systems struggle to recognize features like stop signs, which are crucial for safe navigation in such scenarios.</p><p dir="ltr">To overcome this limitation, this research introduces a novel deep camera-radar fusion approach using neural networks. This method ensures reliable AV perception regardless of weather or lighting conditions. Cameras, similar to human vision, are adept at capturing rich semantic information, whereas radars can penetrate obstacles like fog and darkness, similar to X-ray vision.</p><p dir="ltr">The thesis presents NeXtFusion, an innovative and efficient camera-radar fusion network designed specifically for robust AV perception. Building on the efficient single-sensor NeXtDet neural network, NeXtFusion significantly enhances object detection accuracy and tracking. A notable feature of NeXtFusion is its attention module, which refines critical feature representation for object detection, minimizing information loss when processing data from both cameras and radars.</p><p dir="ltr">Extensive experiments conducted on large-scale datasets such as Argoverse, Microsoft COCO, and nuScenes thoroughly evaluate the capabilities of NeXtDet and NeXtFusion. The results show that NeXtFusion excels in detecting small and distant objects compared to existing methods. Notably, NeXtFusion achieves a state-of-the-art mAP score of 0.473 on the nuScenes validation set, outperforming competitors like OFT by 35.1% and MonoDIS by 9.5%.</p><p dir="ltr">NeXtFusion’s excellence extends beyond mAP scores. It also performs well in other crucial metrics, including mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. 
Visualizations of real-world scenarios from the nuScenes dataset processed by NeXtFusion provide compelling evidence of its capability to handle diverse and challenging environments.</p>
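As a rough, hypothetical illustration of the camera-radar fusion idea (NeXtFusion's actual attention module is more elaborate than this), a learned per-channel gate can decide how much to trust each sensor:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(cam_feat, rad_feat, gate_logits):
    """Fuse camera and radar feature maps of the same shape.

    gate_logits: learned per-channel logits; g near 1 trusts the camera,
    g near 0 trusts the radar (e.g. in fog or darkness).
    """
    g = sigmoid(gate_logits)
    return g * cam_feat + (1.0 - g) * rad_feat
```

The gate is the simplest form of the attention-style weighting that lets a fusion network downweight whichever sensor is unreliable in the current conditions.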
507

<b>LIDAR BASED 3D OBJECT DETECTION USING YOLOV8</b>

Swetha Suresh Menon (18813667) 03 September 2024 (has links)
<p dir="ltr">Autonomous vehicles have gained substantial traction as the future of transportation, necessitating continuous research and innovation. While 2D object detection and instance segmentation methods have made significant strides, 3D object detection offers unparalleled precision. Deep neural network-based 3D object detection, coupled with sensor fusion, has become indispensable for self-driving vehicles, enabling a comprehensive grasp of the spatial geometry of physical objects. In our study of a Lidar-based 3D object detection network using point clouds, we propose a novel architectural model based on the You Only Look Once (YOLO) framework. This model combines the efficiency and accuracy of the YOLOv8 network, a fast, state-of-the-art 2D object detector, with the real-time 3D object detection capability of the Complex-YOLO model. By integrating the YOLOv8 model as the backbone network and employing the Euler Region Proposal (ERP) method, our approach achieves rapid inference speeds, surpassing other object detection models while upholding high accuracy standards. Our experiments, conducted on the KITTI dataset, demonstrate the superior efficiency of our new architectural model. It outperforms its predecessors, showcasing its prowess in advancing the field of 3D object detection in autonomous vehicles.</p>
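Complex-YOLO's Euler Region Proposal avoids the discontinuity of regressing a raw yaw angle by predicting the angle's (cosine, sine) components instead. A minimal encode/decode sketch:

```python
import numpy as np

def encode_yaw(theta):
    """Map a yaw angle to a point on the unit circle, avoiding the +/-pi wrap."""
    return np.cos(theta), np.sin(theta)

def decode_yaw(re, im):
    """Recover the yaw angle from its (cos, sin) encoding."""
    return np.arctan2(im, re)
```

The network regresses the two components with a smooth loss, and decoding the predicted orientation is a single arctan2 call.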
508

A MULTI-HEAD ATTENTION APPROACH WITH COMPLEMENTARY MULTIMODAL FUSION FOR VEHICLE DETECTION

Nujhat Tabassum (18010969) 03 June 2024 (has links)
<p dir="ltr">In the realm of autonomous vehicle technology, the Multimodal Vehicle Detection Network (MVDNet) represents a significant leap forward, particularly in the context of challenging weather conditions. This paper focuses on the enhancement of MVDNet through the integration of a multi-head attention layer, aimed at refining its performance. The integrated multi-head attention layer in the MVDNet model is a pivotal modification, advancing the network's ability to process and fuse multimodal sensor information more efficiently. The paper validates the improved performance of MVDNet with multi-head attention through comprehensive testing, which includes a training dataset derived from the Oxford Radar RobotCar dataset. The results clearly demonstrate that the Multi-Head MVDNet outperforms related conventional models, particularly in Average Precision (AP) estimation, under challenging environmental conditions. The proposed Multi-Head MVDNet not only contributes significantly to the field of autonomous vehicle detection but also underscores the potential of sophisticated sensor fusion techniques in overcoming environmental limitations.</p>
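The multi-head attention layer added to MVDNet follows the standard scaled dot-product formulation. A stripped-down NumPy sketch, omitting the learned query/key/value and output projections of a real layer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """q, k, v: (seq, d_model). Split d_model across heads, apply scaled
    dot-product attention per head, and concatenate the results."""
    seq_len, d_model = q.shape
    d_head = d_model // num_heads
    out = np.empty_like(q)
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = q[:, s] @ k[:, s].T / np.sqrt(d_head)
        out[:, s] = softmax(scores) @ v[:, s]
    return out
```

Each head attends over a different slice of the feature dimension, which is what lets the fused lidar/radar representation weight different aspects of the sensor streams independently.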
509

Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation

Schennings, Jacob January 2017 (has links)
Vision-based active safety systems have become increasingly common in modern vehicles, estimating the depth of objects ahead for autonomous driving (AD) and advanced driver-assistance systems (ADAS). In this thesis a lightweight deep convolutional neural network performing real-time depth estimation on single monocular images is implemented and evaluated. Many of the vision-based automatic brake systems in modern vehicles only detect pre-trained object types such as pedestrians and vehicles. These systems fail to detect general objects such as road debris and roadside obstacles. In stereo vision systems the problem is resolved by calculating a disparity image from the stereo image pair to extract depth information. The distance to an object can also be determined using radar and LiDAR systems. Using this depth information, the system performs the actions necessary to avoid collisions with objects that are determined to be too close. However, these systems are also more expensive than a regular mono camera system and are therefore not very common in the average consumer car. By implementing robust depth estimation in mono vision systems, the benefits of active safety systems could be extended to a larger segment of the vehicle fleet. This could drastically reduce traffic accidents caused by human error and possibly save many lives. The network architecture evaluated in this thesis is more lightweight than other CNN architectures previously used for monocular depth estimation, and is therefore preferable on computationally constrained systems. The network solves a supervised regression problem during training in order to produce a pixel-wise depth estimation map. The network was trained using sparse ground truth images with spatially incoherent and discontinuous data, and outputs a dense, spatially coherent and continuous depth map prediction.
The spatially incoherent ground truth posed a problem of discontinuity that was addressed by a masked loss function with regularization. The network was able to predict a dense depth estimation on the KITTI dataset with close to state-of-the-art performance.
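The masked loss described above can be sketched directly: the L1 error is evaluated only at pixels where the sparse ground truth exists, with an optional regularization term. Variable names and the validity convention (target > 0) are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def masked_l1_loss(pred, target, weights=None, reg=0.0):
    """Mean L1 error over pixels with valid ground truth (target > 0),
    plus an optional L2 penalty on the model weights."""
    mask = target > 0
    loss = np.abs(pred[mask] - target[mask]).mean()
    if weights is not None and reg > 0.0:
        loss += reg * float(np.sum(weights ** 2))
    return loss
```

Because invalid pixels contribute no gradient, the network is free to interpolate a dense, continuous depth map between the sparse supervised points.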
510

Development and Evaluation of a Machine Vision System for Digital Thread Data Traceability in a Manufacturing Assembly Environment

Alexander W Meredith (15305698) 29 April 2023 (has links)
<p>A thesis study investigating the development and evaluation of a computer vision (CV) system for a manufacturing assembly task is reported. The CV inference results are compared to a Manufacturing Process Plan, and an automation method completes a buyoff in the software Solumina. Research questions were created and three hypotheses were tested. A literature review was conducted, recognizing little consensus on Industry 4.0 technology adoption in manufacturing industries. Furthermore, the literature review uncovered the need for additional research within the topic of CV; specifically, it points towards more research regarding the cognitive capabilities of CV in manufacturing. A CV system was developed and evaluated to test for 90% or greater confidence in part detection. A CV dataset was developed, and the system was trained and validated with it. Dataset contextualization was leveraged and evaluated, as per the literature. The CV system was trained on custom datasets containing six classes of parts. The pre-contextualization and post-contextualization datasets were compared by a Two-Sample T-Test, and statistical significance was noted for three classes. A Python script was developed to compare as-assembled locations with as-defined positions of components, per the Manufacturing Process Plan. A test comparing the yields of CV-based True Positives (TPs) and human-based TPs was conducted with the system operating at a 2σ level. An automation method utilizing Microsoft Power Automate was developed to complete the cognitive functionality of the CV system testing by completing a buyoff in Solumina if CV-based TPs were equal to or greater than human-based TPs.</p>
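The as-assembled versus as-defined comparison performed by the thesis's Python script might look like the sketch below. The tolerance, dictionary format, and part names are assumptions for illustration, not taken from the study.

```python
import math

def within_tolerance(detected, defined, tol):
    """Compare detected (as-assembled) component centers against the
    as-defined positions from the process plan; return per-part pass/fail."""
    results = {}
    for part, (x, y) in defined.items():
        if part not in detected:
            results[part] = False  # a part missing entirely counts as a fail
            continue
        dx, dy = detected[part][0] - x, detected[part][1] - y
        results[part] = math.hypot(dx, dy) <= tol
    return results
```

A buyoff step would then be triggered only if every part in the plan passes.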
