  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
471

You Only Gesture Once (YouGo): American Sign Language Translation using YOLOv3

Mehul Nanda (8786558) 01 May 2020 (has links)
The study focused on creating and proposing a model that could accurately and precisely predict the occurrence of an American Sign Language gesture for an alphabet in the English language, using the You Only Look Once (YOLOv3) algorithm. The training dataset used for this study was custom created and was further divided into clusters based on the uniqueness of the ASL sign. Three diverse clusters were created. Each cluster was trained with the network known as Darknet. Testing was conducted using images and videos for the fully trained models of each cluster, and the Average Precision for each alphabet in each cluster and the Mean Average Precision for each cluster were recorded. In addition, a Word Builder script was created. This script combined the trained models of all three clusters into a comprehensive system that builds words when the trained models are supplied with images of English-language alphabets as depicted in ASL.
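The Word Builder idea — turning per-frame letter detections into a word — can be sketched as follows. This is an illustrative reconstruction, not the thesis script: the frame format, the `best_letter` helper, and the confidence threshold are all assumptions.

```python
# Hypothetical sketch of a "Word Builder": given per-frame letter
# detections (class label, confidence, bounding-box x-centre), keep the
# most confident letter per frame and concatenate frames into a word.
# Names and thresholds here are illustrative, not from the thesis.

def best_letter(detections, min_conf=0.5):
    """Pick the highest-confidence letter detection in one frame."""
    candidates = [d for d in detections if d[1] >= min_conf]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d[1])[0]

def build_word(frames):
    """Concatenate the best letter of each frame, skipping empty frames
    and collapsing immediate repeats (the same sign held over frames)."""
    word = []
    for detections in frames:
        letter = best_letter(detections)
        if letter is not None and (not word or word[-1] != letter):
            word.append(letter)
    return "".join(word)

frames = [
    [("C", 0.91, 120)],
    [("C", 0.88, 122)],                   # same sign held -> collapsed
    [("A", 0.75, 119), ("O", 0.40, 119)],
    [],                                   # no confident detection
    [("T", 0.83, 121)],
]
print(build_word(frames))  # -> CAT
```

Collapsing immediate repeats is one plausible way to handle a sign held across consecutive frames; the real script may use a different de-duplication rule.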
472

Rozpoznávání ručně psaného notopisu / Optical Recognition of Handwritten Music Notation

Hajič, Jan January 2019 (has links)
Optical Music Recognition (OMR) is the field of computationally reading music notation. This thesis presents, in the form of a dissertation by publication, contributions to the theory, resources, and methods of OMR, especially for handwritten notation. The main contributions are (1) the Music Notation Graph (MuNG) formalism for describing arbitrarily complex music notation using an oriented graph that can be unambiguously interpreted in terms of musical semantics, (2) the MUSCIMA++ dataset of musical manuscripts with MuNG as ground truth, which can be used to train and evaluate OMR systems and subsystems from the image all the way to extracting the musical semantics encoded therein, and (3) a pipeline for performing OMR on musical manuscripts that relies on machine learning both for notation symbol detection and for the notation assembly stage, and on properties of the inferred MuNG representation to deterministically extract the musical semantics. While the OMR pipeline does not perform flawlessly, it is the first OMR system to perform basic useful tasks over musical semantics extracted from handwritten music notation of arbitrary complexity.
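The notation-graph idea — primitives as nodes, syntactic relationships as edges, semantics read off deterministically — can be illustrated with a toy graph. The node types and edge conventions below are simplified illustrations, not the actual MuNG schema.

```python
# Minimal sketch of the Music Notation Graph idea: notation primitives
# as nodes, syntactic relationships as directed edges, and a
# deterministic pass that reads pitch-bearing notes off the graph.
# Node/edge names are illustrative simplifications, not MuNG's schema.

nodes = {
    1: {"type": "notehead-full"},
    2: {"type": "stem"},
    3: {"type": "sharp"},
    4: {"type": "staff_line", "pitch": "F4"},
}
# directed edges: (src, dst) meaning "src attaches to / modifies dst",
# except notehead -> staff_line, which locates the notehead's pitch
edges = [(2, 1), (3, 1), (1, 4)]

def incoming(node_id, node_type):
    """Nodes of a given type with an edge pointing at node_id."""
    return [src for src, dst in edges
            if dst == node_id and nodes[src]["type"] == node_type]

def read_notes():
    """Deterministically extract pitches from the graph."""
    notes = []
    for nid, attrs in nodes.items():
        if attrs["type"] != "notehead-full":
            continue
        # pitch comes from the staff object the notehead attaches to
        staff = [dst for src, dst in edges
                 if src == nid and nodes[dst]["type"] == "staff_line"]
        pitch = nodes[staff[0]]["pitch"] if staff else "?"
        accidental = "#" if incoming(nid, "sharp") else ""
        notes.append(pitch + accidental)
    return notes

print(read_notes())  # -> ['F4#']
```

The point of the oriented-graph representation is exactly this: once detection and assembly have produced the graph, the semantic read-out needs no further learning.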
473

Forest Growth And Volume Estimation Using Machine Learning

Dahmén, Gustav, Strand, Erica January 2022 (has links)
Estimating forest parameters from remote sensing data could streamline the forest industry from both a time and an economic perspective. This thesis utilizes object detection and semantic segmentation to detect and classify individual trees in images of 3D models reconstructed from satellite images. Two methods were investigated, showing different strengths in detecting and classifying trees in deciduous, evergreen, or mixed forests. These methods are valuable not only for forest inventory but also for telecommunication companies and for defense and intelligence applications. The thesis also presents methods for estimating tree volume and tree growth in 3D models, and the results show their potential for use in forest management. Finally, the thesis points out several benefits of managing a digitalized forest: economic, environmental, and social.
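Volume and growth estimation from a 3D reconstruction typically means feeding per-tree measurements (height, crown size) into an allometric model and differencing two acquisition dates. The power-law form and the coefficients below are placeholder assumptions, not values fitted in the thesis.

```python
# Illustrative sketch of estimating stem volume from measurements
# extracted from a 3D model, and growth from two reconstructions.
# The allometric form V = a * d^b * h^c and its coefficients are
# placeholders for a species-specific fitted model.
import math

def stem_volume(height_m, crown_diameter_m, a=0.02, b=1.8, c=1.1):
    """Power-law allometric volume model: V = a * d^b * h^c (m^3)."""
    return a * crown_diameter_m ** b * height_m ** c

def estimate_growth(volume_t0, volume_t1, years):
    """Mean annual volume increment between two reconstructions."""
    return (volume_t1 - volume_t0) / years

v0 = stem_volume(height_m=12.0, crown_diameter_m=4.0)   # first survey
v1 = stem_volume(height_m=13.5, crown_diameter_m=4.4)   # later survey
print(round(estimate_growth(v0, v1, years=3), 4))       # m^3 per year
```

In practice the per-tree height and crown diameter would come from the detection and segmentation stages described above, applied to both acquisition dates.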
474

Object Detection with Deep Convolutional Neural Networks in Images with Various Lighting Conditions and Limited Resolution / Detektion av objekt med Convolutional Neural Networks (CNN) i bilder med dåliga belysningförhållanden och lågupplösning

Landin, Roman January 2021 (has links)
Computer vision is a key component of any autonomous system. Real-world computer vision applications rely on proper and accurate detection and classification of objects. A detection algorithm that cannot guarantee reasonable detection accuracy is not applicable in real-time scenarios where safety is the main objective. Factors that impact detection accuracy include illumination conditions and image resolution; both contribute to degradation of objects and lead to low classification and detection accuracy. Recent development of Convolutional Neural Network (CNN) based algorithms offers possibilities for low-light (LL) image enhancement and super-resolution (SR) image generation, making it possible to combine such models to improve image quality and increase detection accuracy. This thesis evaluates different CNN models for SR generation and LL enhancement by comparing generated images against ground-truth images. To quantify the impact of each model on detection accuracy, a detection procedure was evaluated on the generated images. Experimental results on images selected from the NightOwls and Caltech Pedestrian datasets showed that super-resolution image generation and low-light image enhancement improve detection accuracy by a substantial margin. Additionally, a cascade of SR generation and LL enhancement was shown to boost detection accuracy further. However, the main drawback of such cascades is the increased computational time, which limits their use in a range of real-time applications.
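The cascade evaluated here is a straightforward function composition: enhance, then upscale, then detect. In the toy sketch below the CNN models are stood in for by trivial array operations so that only the composition itself is shown; all function names are illustrative.

```python
# Toy sketch of the evaluated cascade: low-light enhancement, then
# super-resolution, then detection. Real CNN models are replaced by
# trivial stand-ins; only the pipeline structure mirrors the thesis.
import numpy as np

def enhance_low_light(img):
    """Stand-in for an LL-enhancement CNN: simple gamma correction."""
    return np.clip(img ** 0.5, 0.0, 1.0)

def super_resolve(img, scale=2):
    """Stand-in for an SR CNN: nearest-neighbour upsampling."""
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def detect(img, threshold=0.6):
    """Stand-in detector: count bright pixels above a threshold."""
    return int((img > threshold).sum())

dark = np.full((4, 4), 0.4)              # under-exposed image
baseline = detect(dark)                  # nothing found: all pixels 0.4
cascade = detect(super_resolve(enhance_low_light(dark)))
print(baseline, cascade)                 # 0 vs. 64: the cascade helps
```

The drawback noted in the abstract is visible even here: each extra stage adds latency, and on real CNNs that cost is what limits real-time use.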
475

Computer Vision for Camera Trap Footage : Comparing classification with object detection

Örn, Fredrik January 2021 (has links)
Monitoring wildlife is of great interest to ecologists and is arguably even more important in the Arctic, the region in focus for the research network INTERACT, where the effects of climate change are greater than on the rest of the planet. This master thesis studies how artificial intelligence (AI) and computer vision can be used together with camera traps to achieve an effective way to monitor populations. The study uses an image dataset containing both humans and animals. The images were taken by camera traps from ECN Cairngorms, a station in the INTERACT network. The goal of the project is to classify these images into one of three categories: "Empty", "Animal" and "Human". Three different methods are compared: a DenseNet201 classifier, a YOLOv3 object detector, and the pre-trained MegaDetector developed by Microsoft. No sufficient results were achieved with the classifier, but YOLOv3 performed well on human detection, with an average precision (AP) of 0.8 on both training and validation data. The animal detections for YOLOv3 did not reach as high an AP, likely because of the smaller number of training examples. The best results were achieved by MegaDetector in combination with an added method to determine whether the detected animals were dogs, reaching an average precision of 0.85 for animals and 0.99 for humans. This is the method recommended for future use, but there is potential to improve all the models and reach even better results.
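The winning configuration is a two-stage structure: a generic detector proposes "animal"/"human" boxes, and a secondary check decides whether an animal is a dog. In the sketch below both models are faked with simple rules; only the structure mirrors the MegaDetector-plus-dog-classifier combination, and all names are illustrative.

```python
# Structural sketch of the recommended method: generic detection first,
# then a secondary per-crop decision for the "dog" case. Both models
# are stand-ins; the image is a plain dict pretending to hold results.

def mega_detect(image):
    """Stand-in for MegaDetector: returns (label, confidence, box)."""
    return image["detections"]            # pretend inference happened

def is_dog(image, box):
    """Stand-in secondary classifier applied to an animal crop."""
    return image.get("dog_crops", {}).get(box, False)

def classify_image(image, min_conf=0.5):
    labels = set()
    for label, conf, box in mega_detect(image):
        if conf < min_conf:
            continue
        if label == "animal" and is_dog(image, box):
            labels.add("dog")
        else:
            labels.add(label)
    return labels or {"empty"}

img = {"detections": [("animal", 0.9, "b1"), ("human", 0.97, "b2")],
       "dog_crops": {"b1": True}}
print(classify_image(img))  # -> {'dog', 'human'}
```

Keeping the dog decision as a separate stage means the pre-trained detector never has to be retrained when the station's definition of "interesting animal" changes.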
476

Radar-based Application of Pedestrian and Cyclist Micro-Doppler Signatures for Automotive Safety Systems

Held, Patrick 12 May 2022 (has links)
Sensor-based detection of the near field in the context of highly automated driving is experiencing a noticeable trend toward the integration of radar sensor technology. Advances in microelectronics allow the use of high-resolution radar sensors whose measurement accuracy continuously increases through efficient processing in angle as well as in range and Doppler. This opens up novel possibilities for determining the geometric and kinematic nature of extended targets in the vehicle environment, which can be used for the targeted development of automotive safety systems. In this work, vulnerable road users such as pedestrians and cyclists are analyzed using a high-resolution automotive radar. The focus is on the micro-Doppler effect, caused by the objects' high number of kinematic degrees of freedom. The characteristic radar signatures produced by the micro-Doppler effect allow a more detailed perception of the objects and can be directly related to their current state of motion. Novel methods are presented that consider the geometric and kinematic extents of the objects and realize real-time approaches to classification and behavioral indication.
When a radar sensor detects an extended target (e.g., a bicyclist), fundamental properties of its motion state can be captured from its micro-Doppler signature within a single measurement cycle. The velocity distributions of the spinning wheels allow an adaptive containment of the pedaling motion, whose behavior exhibits essential features for anticipatory accident prediction. Furthermore, extended radar targets are subject to an orientation dependence that directly affects their geometric and kinematic profiles. This can negatively affect both classification performance and the usability of parameters that constitute the radar target's declaration of intent. Using the cyclist as an example, a method is presented that normalizes the orientation-dependent parameters in range and Doppler and compensates for the measured ambiguities. Furthermore, this work presents a methodology that estimates a pedestrian's leg motion over time (tracking) based on the pedestrian's micro-Doppler profile and reveals valuable object information about the pedestrian's motion behavior. To this end, a motion model is developed that approximates the leg's nonlinear locomotion and captures its high degree of biomechanical variability. By incorporating probabilistic data association, radar detections are assigned to the sources that evoke them (left and right leg), and a separation of the limbs is realized. In contrast to previous tracking methods, the presented methodology achieves an increase in the accuracy of the object information.
It thus represents a decisive advantage for future driver assistance systems, enabling significantly faster reactions to critical traffic situations.
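The probabilistic state estimation underlying the leg tracker can be illustrated with a minimal constant-velocity Kalman filter in one dimension. The thesis uses an extended Kalman filter with a nonlinear leg-motion model and joint probabilistic data association; this linear toy only shows the predict/update cycle, and all matrix values are illustrative.

```python
# Minimal 1-D constant-velocity Kalman filter: the predict/update
# cycle at the core of the pedestrian leg tracker, in toy form.
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (pos, vel)
H = np.array([[1.0, 0.0]])               # we measure position only
Q = np.eye(2) * 0.01                     # process noise
R = np.array([[0.25]])                   # measurement noise

def kf_step(x, P, z):
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with measurement z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P

x = np.array([[0.0], [0.0]])             # initial position / velocity
P = np.eye(2)
for z in [1.0, 2.0, 3.0, 4.0]:           # a leg moving ~1 unit/step
    x, P = kf_step(x, P, np.array([[z]]))
print(float(x[1, 0]))                    # velocity estimate near 1
```

The thesis's contribution sits on top of this cycle: a nonlinear motion model replaces F, and the data-association stage decides which detections feed which leg's update.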
477

Segmentation and structuring of video documents for indexing applications / Segmentation et structuration de documents video pour l'indexation

Tapu, Ruxandra Georgina 07 December 2012 (has links)
Recent advances in telecommunications, together with the development of image and video processing and acquisition devices, have led to spectacular growth in the amount of visual content stored, transmitted and exchanged over the Internet.
Within this context, elaborating efficient tools to access, browse and retrieve video content has become a crucial challenge. In Chapter 2 we introduce and validate a novel shot boundary detection algorithm able to identify abrupt and gradual transitions. The technique is based on an enhanced graph partition model, combined with multi-resolution analysis and a non-linear filtering operation. The global computational complexity is reduced by implementing a two-pass strategy. In Chapter 3 the video abstraction problem is considered. In our case, we have developed a keyframe representation system that extracts a variable number of images from each detected shot, depending on the visual content variation. Chapter 4 deals with the issue of high-level semantic segmentation into scenes. Here, a novel scene/DVD chapter detection method is introduced and validated. Spatio-temporally coherent shots are clustered into the same scene based on a set of temporal constraints, adaptive thresholds and neutralized shots. Chapter 5 considers the issue of object detection and segmentation. Here we introduce a novel spatio-temporal visual saliency system based on region contrast, interest point correspondence, geometric transforms, motion class estimation and the temporal consistency of regions. The proposed technique is extended to 3D videos by representing the stereoscopic perception as a 2D video and its associated depth
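The signal driving shot-boundary detection can be shown with a greatly simplified version: compare successive frame histograms and declare a cut when their distance spikes. The thesis uses a graph-partition model with multi-resolution analysis and two passes; this threshold-on-histogram-distance sketch only conveys the underlying idea, and the frames are faked as flat grey images.

```python
# Simplified shot-boundary detection: a cut is declared wherever the
# total-variation distance between successive frame histograms exceeds
# a threshold. Stand-in for the thesis's graph-partition approach.
import numpy as np

def histogram(frame, bins=8):
    h, _ = np.histogram(frame, bins=bins, range=(0.0, 1.0))
    return h / h.sum()

def detect_cuts(frames, threshold=0.5):
    """Indices i where frame i starts a new shot."""
    cuts = []
    for i in range(1, len(frames)):
        d = 0.5 * np.abs(histogram(frames[i]) - histogram(frames[i - 1])).sum()
        if d > threshold:          # total-variation distance in [0, 1]
            cuts.append(i)
    return cuts

shot_a = [np.full((16, 16), 0.2)] * 3     # dark shot
shot_b = [np.full((16, 16), 0.8)] * 3     # bright shot
print(detect_cuts(shot_a + shot_b))  # -> [3]
```

A fixed threshold like this is exactly what fails on gradual transitions; handling those is what motivates the enhanced graph-partition model and the multi-resolution analysis.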
478

3D Object Detection based on Unsupervised Depth Estimation

Manoharan, Shanmugapriyan 25 January 2022 (has links)
Estimating depth and detecting object instances in 3D space are fundamental to autonomous navigation, localization and mapping, robotic object manipulation, and augmented reality. RGB-D images and LiDAR point clouds are the most illustrative formats of depth information. However, depth sensors have many shortcomings, such as low effective spatial resolution and capturing a scene from a single perspective. This thesis focuses on reproducing a denser and more comprehensive 3D scene structure for given monocular RGB images using depth estimation and 3D object detection. The first contribution of this thesis is a pipeline for depth estimation based on an unsupervised learning framework. This thesis proposes two architectures to analyze structure-from-motion and 3D geometric constraint methods. The proposed architectures are trained and evaluated using only RGB images and no ground-truth depth data, and achieve better results than the state-of-the-art methods. The second contribution of this thesis is the application of the estimated depth map, which includes two algorithms: point cloud generation and collision avoidance. The predicted depth map and the RGB image are used to generate point cloud data using the proposed point cloud algorithm. The collision avoidance algorithm predicts the possibility of collision and provides a collision warning message based on decoding the color in the estimated depth map. This algorithm design is adaptable to different color maps with slight changes and perceives collision information across sequences of frames. The third contribution is a two-stage pipeline to detect 3D objects from a monocular image. The first stage detects the 2D objects and crops the corresponding image patches, which are provided as input to the second stage. In the second stage, a 3D regression network is trained to estimate the 3D bounding boxes of the target objects. Two architectures are proposed for this 3D regression network model. This approach achieves better average precision than the state of the art for truncation of 15% or fully visible objects, and lower but comparable results for truncation of more than 30% or partly/fully occluded objects.
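The point-cloud-generation step amounts to back-projecting each pixel of the predicted depth map through a pinhole camera model. The sketch below shows that standard back-projection; the intrinsics are made-up illustrative values, not the thesis's calibration, and the exact algorithm proposed there may differ.

```python
# Back-project a depth map into an (H*W, 3) camera-frame point cloud
# using pinhole intrinsics (fx, fy, cx, cy). Standard formulation,
# shown as a sketch of the thesis's point-cloud-generation stage.
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Return an (H*W, 3) array of XYZ points in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.full((4, 4), 2.0)                  # flat wall 2 m away
cloud = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(cloud.shape)         # (16, 3)
print(cloud[0])            # pixel (0,0): [(0-2)*2/100, (0-2)*2/100, 2]
```

Pairing each XYZ point with the RGB value at the same pixel then yields the colored cloud described in the abstract.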
479

3D Object Detection Using Virtual Environment Assisted Deep Network Training

Dale, Ashley S. 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1 score over all classes and epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1 score for synthetically trained networks is σ* = 0.015, compared to σ = 0.020 for the networks trained exclusively with real data. Varying the backgrounds in the synthetic data was shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network; the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
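The networks above are compared by their maximum F1 score. As a reminder of the metric, a small sketch computing F1 from detection counts; the counts themselves are invented for illustration.

```python
# F1 is the harmonic mean of precision and recall, computed here from
# true-positive, false-positive and false-negative counts. The counts
# in the example are illustrative, not from the thesis.

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 90 correct detections, 10 spurious ones, 12 missed objects
print(round(f1_score(tp=90, fp=10, fn=12), 3))  # -> 0.891
```

Because F1 penalizes whichever of precision or recall is worse, a 0.02 gap such as the 0.91 vs. 0.89 reported above reflects a genuine difference in both error types, not just one.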
480

Využití GPU pro algoritmy grafiky a zpracování obrazu / Exploitation of GPU in graphics and image processing algorithms

Jošth, Radovan January 2015 (has links)
This thesis describes several selected algorithms that were originally developed for CPUs but which, given the high demand for their improvement, we decided to adapt for GPGPU (general-purpose computing on graphics processing units). Modifying these algorithms was the goal of our research, which was carried out using the CUDA interface. The thesis is organized around the three groups of algorithms we addressed: real-time object detection, spectral image analysis, and real-time line detection. For the real-time object detection research we chose LRD and LRP features. The spectral image analysis research was carried out using the PCA and NTF algorithms. For the study of real-time line detection we used two different modified accumulation schemes for the Hough transform. Before the main part of the thesis, which deals with the specific algorithms and subjects of study, the introductory chapters, immediately following the chapter motivating the chosen topics, give a brief overview of GPU and GPGPU architecture. The concluding chapters specify the author's own contribution, its focus, the results achieved, and the approach chosen to achieve them. The results include several developed products.
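The Hough-transform accumulation scheme mentioned above works by having every edge pixel vote for all (theta, rho) lines passing through it; maxima in the accumulator correspond to detected lines. The sketch below is the plain CPU formulation, not either of the GPU-modified variants studied in the thesis, and its parameters are illustrative.

```python
# Plain Hough line transform: each point votes for all (theta, rho)
# lines through it; the accumulator maximum is the detected line.
# CPU baseline of the accumulation scheme the thesis modifies for GPU.
import numpy as np

def hough_lines(points, img_diag, n_theta=180, n_rho=200):
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        # map rho in [-diag, diag] onto accumulator rows
        rows = np.round((rho + img_diag) / (2 * img_diag) * (n_rho - 1))
        acc[rows.astype(int), np.arange(n_theta)] += 1
    return acc, thetas

pts = [(x, 5) for x in range(20)]         # points on the line y = 5
acc, thetas = hough_lines(pts, img_diag=30.0)
rho_idx, theta_idx = np.unravel_index(acc.argmax(), acc.shape)
print(round(np.degrees(thetas[theta_idx])))  # strongest line: theta = 90
```

The per-point, per-theta votes are independent, which is what makes the accumulation step a natural candidate for GPU parallelization, with the accumulator writes being the contention point the modified schemes address.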
