  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
431

Data Augmentation for Safe 3D Object Detection for Autonomous Volvo Construction Vehicles

Zhao, Xun January 2021
Point cloud data can express the 3D features of objects and is an important data type in the field of 3D object detection. Since point cloud data is more difficult to collect than image data and existing datasets are smaller in scale, point cloud data augmentation is introduced to allow more features to be discovered from existing data. In this thesis, we propose a novel method to enhance point cloud scenes, based on a generative adversarial network (GAN), that augments the objects and then integrates them into existing scenes. Good fidelity and coverage are achieved between the fake samples and the real samples, with JSD equal to 0.027, MMD equal to 0.00064, and coverage equal to 0.376. In addition, we investigated functional data annotation tools and completed the data labeling task. The 3D object detection task is carried out on the point cloud data, and we achieved relatively good detection results with a short processing time of around 22 ms. Quantitative and qualitative analysis is carried out on different models.
432

Utvärdering av noggrannheten av kastparablar på en iPad / Accuracy evaluation of trajectories on an iPad

Waninger, Mikael, Rothman, Sofia January 2022
Measuring and analyzing player performance in sports helps improve a player's results in their respective sport. Specialized labs and/or expensive equipment are used for such analysis but are difficult for the average player to access. Decreasing the cost of measurement tools would help level the playing field for players regardless of age or economic background. This study evaluates whether a smartphone or tablet can be used for measurement and sports analysis. To this end, an application was developed with a focus on projectile sports such as soccer, tennis, and golf. The application visualizes an object's parabola, measures its speed, and visualizes the point where it hits a vertical plane. The application was developed for iOS and tested on an iPad 12 pro. Tests to validate the application's accuracy were performed with a soccer ball, a tennis ball, and a golf ball. Tests for visualizing a parabola produced results for the soccer ball and the tennis ball, but the application could not handle the golf ball's smaller size. Speed could be measured for all three balls, with an average percentage deviation of 76% for the soccer ball, 21% for the tennis ball, and 43% for the golf ball. Visualizing the hit on a vertical target plane produced results for the soccer ball and the tennis ball, but not for the golf ball, with an average percentage deviation of 89% for the soccer ball and 23% for the tennis ball.
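As an illustration of the kind of computation such an application involves, the following is a minimal Python sketch and not the authors' implementation. It fits a parabola to tracked ball positions by least squares and computes a percentage deviation between a measured and a reference value. The function names `fit_parabola` and `percent_deviation` and the sample data are hypothetical, and the deviation formula (absolute difference divided by the reference) is an assumption, since the abstract does not define it.

```python
def fit_parabola(points):
    """Least-squares fit of y = a*x**2 + b*x + c to (x, y) samples."""
    # Sums needed for the normal equations with the basis [x^2, x, 1].
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # sum of x^0 .. x^4
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # sum of y*x^0 .. y*x^2
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))  # (a, b, c)


def percent_deviation(measured, reference):
    """Assumed definition: absolute error relative to the reference, in percent."""
    return abs(measured - reference) / abs(reference) * 100.0


# Hypothetical tracked positions lying on y = -0.5*x^2 + 2*x + 1.
samples = [(x, -0.5 * x * x + 2 * x + 1) for x in range(5)]
a, b, c = fit_parabola(samples)
```

At least three samples with distinct x values are needed, otherwise the normal equations are singular.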
433

Towards a Comprehensive Bicycle Motion Behavior Model and Naturalistic Cycling Dataset

Alazemi, Fahd 25 May 2022
Most existing bicycle traffic flow research is limited to characterizing the longitudinal motion of bicyclists, based on the assumption that there are no significant differences between the dynamics of single-file bicycle traffic and the longitudinal motion behavior of cars. This research reparametrizes an existing car-following model to describe bicycle-following and motion behavior. Furthermore, the lack of naturalistic data has limited the validation of this model. This research aims at developing a descriptive model that is capable of capturing the inherent non-lane-based behavior of bicycle traffic, and provides a methodology for extracting naturalistic cycling data from video feeds for use in safety and mobility applications. In this study, the Fadhloun-Rakha (FR) bicycle-following longitudinal motion model was extended by complementing it with a lateral motion strategy, thus allowing for overtaking maneuvers and lateral bicycle movements. For the most part, the following strategy of the FR model remains valid for modeling the longitudinal motion of bicycles, except for the activation conditions of the collision avoidance strategy, which are modified to allow for overtaking when possible. The proposed methodology is innovative in that it uses the intersection of certain pre-defined regions around the bicycles to decide on the feasibility of angular motion along with its direction and magnitude. The resulting model is the first point-mass dynamics-based model for describing the longitudinal and lateral behavior of bicycles in both constrained and unconstrained conditions, and it is the only existing model that is sensitive to the bicyclist's physical characteristics and to the bicycle and roadway surface conditions, given that the longitudinal logic used was previously validated against experimental cycling data. 
In relation to the development of the naturalistic cycling dataset, the videos used come from a dataset collected in a previous Virginia Tech Transportation Institute study, in collaboration with SPIN, in which continuous video data was recorded at a non-signalized intersection on the Virginia Tech campus. The research applied computer vision and machine learning techniques to develop a comprehensive framework for extracting naturalistic cycling trajectories. In total, this study resulted in the collection and classification of 619 bicycle trajectories based on their type of interaction with other road users. The results confirm the success of the proposed methodology in extracting the locations, speeds, and accelerations of the bicycles with a high level of precision. Furthermore, preliminary insights into the acceleration and speed behavior of bicyclists around motorists are obtained. / Master of Science / The behavior of bicycle traffic differs from that of cars. Bicycle traffic flow is unconstrained in lateral motion and overtaking compared with car traffic flow. Because of this inherent behavior, existing car-following models can only describe the longitudinal motion of bicycle traffic and do not capture the non-lane-based behavior that characterizes bicycle traffic dynamics. Furthermore, the existing controlled experimental dataset used for validating bicycle traffic flow models does not capture the naturalistic behavior of cyclists. Therefore, this research aims to develop a descriptive model that is capable of capturing the inherent non-lane-based behavior of bicycle traffic, and provides a methodology for extracting naturalistic cycling data from a video dataset for use in safety and mobility applications. 
434

Image Approximation using Triangulation

Trisiripisal, Phichet 11 July 2003
An image is a set of quantized intensity values sampled at a finite set of points on a two-dimensional plane. Images are crucial to many application areas, such as computer graphics and pattern recognition, because they discretely represent the information that the human eye interprets. This thesis considers the use of triangular meshes for approximating intensity images. With the help of wavelet-based analysis, triangular meshes can be efficiently constructed to approximate the image data. The study focuses on local image enhancement and mesh simplification operations, which try to minimize both the total error of the reconstructed image and the number of triangles used to represent it. The study also presents an optimal procedure for selecting the triangle types used to represent the intensity image. Besides its applications to image and video compression, this triangular representation is potentially very useful for data storage and retrieval, and for processing tasks such as image segmentation and object recognition. / Master of Science
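To make the idea of approximating intensities with a triangle concrete, here is a small Python sketch. It assumes the common approach of linearly interpolating vertex intensities with barycentric coordinates; the abstract does not state which interpolation the thesis uses, and the function names `barycentric` and `interpolate_intensity` are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def interpolate_intensity(p, vertices, intensities):
    """Linearly interpolate the vertex intensities at point p inside the triangle."""
    weights = barycentric(p, *vertices)
    return sum(w * i for w, i in zip(weights, intensities))


# Hypothetical triangle with one intensity value stored per vertex.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
values = [10.0, 50.0, 90.0]
approx = interpolate_intensity((1.0, 1.0), triangle, values)
```

The approximation error of a mesh can then be measured by comparing interpolated values with the original pixel intensities covered by each triangle.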
435

Soccer Data Analysis Based on Computer Vision : Master Thesis at KTH Royal Institute of Technology / Fotbollsdataanalys baserad på datorseende : Masteruppsats vid Kungliga Tekniska Högskolan

Pan, Rongfei January 2024
As the world's most popular sport, soccer has a wide influence on human society. Soccer tactics have developed continuously since the beginning of modern soccer. Soccer analysis requires data, including not only match results between teams but also the performance of players on the pitch. Playmaker.ai, where this degree project was carried out, is a company that provides soccer analysis services. The main purpose of this project is to create a system that can generate player positions by analyzing video data without bird's-eye-view information. Besides player position generation, some progress was made on expected goals (xG) calculation, and some data preprocessing tools were implemented. The goal is accomplished in the following steps: 1. Detect players in camera-view images using the YOLO (You Only Look Once) network, specifically YOLOv5. 2. Use the Strong-Sort method to track the positions of the players and the ball over a long video. 3. Assign teams to the detected objects; methods including K-means are used in this step. 4. Generate bird's-eye-view positions using a perspective transformation. The results show that all the machine learning models converged successfully and achieve good performance in practical use, although limitations and problems remain. Using this system, a 2D map with player positions can be generated, and the data preprocessing tools can also be used by the company. Admittedly, because of several limitations in practical development, the system has problems and disadvantages. It can be considered a prototype of a complete method for solving multiple issues in soccer data analysis based on machine learning and computer vision. Future developers can iterate on this project for further improvement.
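The last step in the abstract above, mapping camera-view positions to a bird's-eye view with a perspective transformation, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions and not the thesis code: it assumes four known image points (for example pitch corners) with known pitch coordinates, and the names `solve_linear`, `homography_from_points`, and `to_pitch`, the pixel coordinates, and the 105 m by 68 m pitch size are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(src, dst):
    """3x3 homography (last entry fixed to 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, from u = (h1 . p) / (h3 . p) and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def to_pitch(h, point):
    """Apply the homography to one image point and return pitch coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical pixel positions of the four pitch corners and their pitch coordinates in metres.
image_corners = [(100, 400), (540, 400), (420, 100), (220, 100)]
pitch_corners = [(0, 0), (105, 0), (105, 68), (0, 68)]
H = homography_from_points(image_corners, pitch_corners)
player_on_pitch = to_pitch(H, (320, 250))
```

In practice the point fed to `to_pitch` would be the bottom centre of a detected player's bounding box, since that is the point assumed to lie on the pitch plane.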
436

A Composite Field-Based Learning Framework for Pose Estimation and Object Detection : Exploring Scale Variation Adaptations in Composite Field-Based Pose Estimation and Extending the Framework for Object Detection / En sammansatt fältbaserad inlärningsramverk för posuppskattning och objektdetektering : Utforskning av skalvariationsanpassningar i sammansatt fältbaserad posuppskattning och utvidgning av ramverket för objektdetektering

Guo, Jianting January 2024
This thesis aims to address the concurrent challenges of multi-person 2D pose estimation and object detection within a unified bottom-up framework. Our solutions build on a recently proposed pose estimation framework named OpenPifPaf, which is grounded in composite fields. OpenPifPaf employs the Composite Intensity Field (CIF) for precise joint localization and the Composite Association Field (CAF) for joint connectivity. To assess the model's robustness against scale variation, a Feature Pyramid Network (FPN) is incorporated into the baseline. Additionally, we present a variant of OpenPifPaf known as CifDet. CifDet uses the Composite Intensity Field to classify and detect object centers, and then regresses bounding boxes from these identified centers. Furthermore, we introduce an extended version of CifDet specifically tailored for enhanced object detection capabilities, CifCafDet. This augmented framework is designed to tackle the challenges inherent in object detection tasks more effectively. The baseline OpenPifPaf model outperforms most existing bottom-up pose estimation methods and achieves results comparable to some state-of-the-art top-down methods on the COCO keypoint dataset. Its variant, CifDet, adapts OpenPifPaf's composite field-based architecture for object detection tasks. Further modifications result in CifCafDet, which demonstrates better performance than CifDet on the MS COCO detection dataset, suggesting its viability as a multi-task framework.
437

Knowledge Transfer for Person Detection in Event-Based Vision

Suihko, Gabriel January 2024
This thesis investigates the application of knowledge transfer techniques to process event-based data for person detection in area surveillance. A teacher-student model setup is employed, where both models are pretrained on conventional visual data. The teacher model processes visual images to generate target labels for the student model trained on event-based data, forming the baseline model. Building on this, the project incorporates feature-based knowledge transfer, specifically transferring features from the Feature Pyramid Network (FPN) component of the Faster R-CNN ResNet-50 FPN network. Results indicate that response-based knowledge transfer can effectively fine-tune models for event-based data. However, feature-based knowledge transfer yields mixed results, requiring more refined techniques for consistent improvement. The study identifies limitations, including the need for a more diverse dataset, improved preprocessing methods, labeling techniques, and refined feature-based knowledge transfer methods. This research bridges the gap between conventional object detection methods and event-based data, enhancing the applicability of event cameras in surveillance applications.
438

Near Realtime Object Detection : Optimizing YOLO Models for Efficiency and Accuracy for Computer Vision Applications

Abo Khalaf, Mulham January 2024
The objective of this study is to improve the efficiency and accuracy of YOLO models by optimizing them, particularly under limited computing resources. The urgent need for near-real-time object recognition in applications such as surveillance systems and autonomous driving underscores the importance of both processing speed and high accuracy. The thesis focuses on the difficulties of deploying complex object detection models on low-capacity devices, namely the Jetson Orin Nano, and proposes several optimization methods to overcome these obstacles. We performed several experiments and made methodological improvements to reduce processing requirements while maintaining strong object detection performance. Key components of the research include careful model training, the use of evaluation criteria, and an investigation of the effects of optimization on model performance in real-life settings. The study demonstrates the feasibility of achieving optimal performance in YOLO models despite limited resources, contributing substantial advances to computer vision and machine learning.
439

Estimation de cartes d'énergie du bruit apériodique de la marche humaine avec une caméra de profondeur pour la détection de pathologies et modèles légers de détection d'objets saillants basés sur l'opposition de couleurs

Ndayikengurukiye, Didier 06 1900
The purpose of this thesis is to study three problems: the estimation of saliency maps of the aperiodic noise energy of human gait using depth perception for pathology detection; salient object detection models in general; and lightweight salient object detection models based on color opposition in particular. As our first contribution, we propose a system based on a depth camera and a treadmill that analyzes the parts of the patient's body that move irregularly, in terms of periodicity, during walking. We assume that the gait of a healthy subject presents, anywhere on the body and throughout the gait cycles, a depth signal with a noise-free periodic pattern. The presence of noise and its magnitude can be used to indicate the presence and extent of pathologies in the subject. From each video sequence, our system estimates a color saliency map showing the areas of strong gait irregularity, in terms of periodicity, called aperiodic noise energy, for each subject. Our system also makes it possible to automatically detect the saliency maps of healthy and sick subjects. We then present two approaches to salient object detection. Although it has been the subject of much research, salient object detection remains a challenge. Most models treat color and texture separately and therefore implicitly, and erroneously, consider them to be independent features. As a second contribution, we propose a new strategy, through a simple model with almost no internal parameters, that generates a robust saliency map for a natural image. This strategy consists of integrating color into texture patterns to characterize a colored micro-texture, using the local ternary pattern (LTP), a simple but powerful texture descriptor, applied to color pairs. The dissimilarity between each pair of colored micro-textures is computed by taking into account the non-linearity of the colored micro-textures and preserving their distances. This gives an intermediate saliency map for each color space. The final saliency map is the combination of these intermediate maps, which yields a robust result. The development of deep neural networks has recently enabled high performance. However, it remains a challenge to develop models with the same performance for devices with limited resources. As a third contribution, we propose a new approach for a lightweight deep neural network model for salient object detection, inspired by the double-opponent processes of the primary visual cortex, which inextricably link color and shape in human color perception. Our proposed model, CoSOV1Net, is trained from scratch, without using backbones from image classification or other tasks. Experiments on the most widely used and most challenging datasets for salient object detection show that CoSOV1Net achieves performance competitive with state-of-the-art models while being a lightweight salient object detection model that can be adapted to mobile environments and resource-constrained devices.
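For readers unfamiliar with the local ternary pattern mentioned in the second contribution, the following Python sketch shows the basic descriptor on a single-channel 3x3 patch. It is a simplified illustration and not the thesis method, which applies LTP to pairs of colors; the function names, the clockwise neighbor ordering, and the sample patch and threshold are assumptions made for the example.

```python
def local_ternary_pattern(patch, threshold):
    """Ternary codes (+1, 0, -1) for the 8 neighbours of the centre of a 3x3 patch."""
    centre = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left pixel.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = []
    for r, c in order:
        value = patch[r][c]
        if value >= centre + threshold:
            codes.append(1)       # clearly brighter than the centre
        elif value <= centre - threshold:
            codes.append(-1)      # clearly darker than the centre
        else:
            codes.append(0)       # within the tolerance band around the centre
    return codes


def split_ltp(codes):
    """Split ternary codes into the usual 'upper' and 'lower' binary patterns."""
    upper = [1 if code == 1 else 0 for code in codes]
    lower = [1 if code == -1 else 0 for code in codes]
    return upper, lower


# Hypothetical grey-level patch and tolerance.
patch = [[54, 57, 60],
         [45, 50, 55],
         [44, 50, 62]]
codes = local_ternary_pattern(patch, threshold=5)
upper, lower = split_ltp(codes)
```

The tolerance band is what distinguishes LTP from the local binary pattern: small fluctuations around the center value map to 0 instead of flipping a bit.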
440

Object Based Image Retrieval Using Feature Maps of a YOLOv5 Network / Objektbaserad bildhämtning med hjälp av feature maps från ett YOLOv5-nätverk

Essinger, Hugo, Kivelä, Alexander January 2022
As Machine Learning (ML) methods have gained traction in recent years, some problems regarding the construction of such methods have arisen. One such problem is the collection and labeling of data sets. Specifically, many applications of Computer Vision (CV) need a set of images, each labeled as either belonging to some class or not. Creating such data sets can be very time consuming. This project sets out to tackle this problem by constructing an end-to-end system for searching for objects in images (i.e. an Object Based Image Retrieval (OBIR) method) using an object detection framework (You Only Look Once (YOLO) [16]). The goal of the project was to create a method that, given an image q of an object of interest, searches for the same or similar objects in a set of other images S. The core idea is to pass the image q through an object detection model (in this case YOLOv5 [16]), create a "fingerprint" (which can be seen as a sort of identity for an object) from a set of feature maps extracted from the YOLOv5 [16] model, and look for corresponding similar parts of a set of feature maps extracted from other images. An investigation regarding which values to select for a few different parameters was conducted, including a comparison of performance for a couple of different similarity metrics. The parameter combination that resulted in the highest F_Top_300 score (a measure indicating the proportion of relevant images retrieved among the top 300 recommended images) in the parameter selection phase was:

Layer: 23
Pool Methd: max
Sim. Mtrc: euc
FP Kern. Sz: 4

Evaluation of the method with these parameters resulted in the following F_Top_300 scores:

Mouse: 0.820
Duck: 0.640
Coin: 0.770
Jet ski: 0.443
Handgun: 0.807
Average: 0.696
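The fingerprint idea in the abstract above can be illustrated with a short Python sketch. It is a guess at the general shape of the computation and not the authors' method: it assumes that "Pool Methd: max" and "FP Kern. Sz: 4" mean non-overlapping max pooling with a 4x4 window, and that "euc" means Euclidean distance between flattened fingerprints. All function names and the toy data are hypothetical, and real feature maps would come from a YOLOv5 layer instead of nested lists.

```python
import math


def max_pool(feature_map, kernel):
    """Non-overlapping max pooling over a 2D list; ragged borders are dropped."""
    rows = len(feature_map) // kernel
    cols = len(feature_map[0]) // kernel
    return [
        [max(feature_map[r * kernel + i][c * kernel + j]
             for i in range(kernel) for j in range(kernel))
         for c in range(cols)]
        for r in range(rows)
    ]


def fingerprint(feature_maps, kernel=4):
    """Pool every channel and flatten the results into one vector."""
    return [v for fm in feature_maps for row in max_pool(fm, kernel) for v in row]


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query_fp, library):
    """Return library keys ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: euclidean(query_fp, library[name]))


# Toy 4x4 single-channel "feature map" standing in for a real network activation.
toy_map = [[r * 4 + c for c in range(4)] for r in range(4)]
query = fingerprint([toy_map])
```

Under these assumptions, the top-ranked names returned by `rank_by_similarity` would play the role of the recommended images that the F_Top_300 score evaluates.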
