Spelling suggestions: "subject:"abject detection"" "subject:"6bject detection""
441 |
Knowledge Transfer for Person Detection in Event-Based VisionSuihko, Gabriel January 2024 (has links)
This thesis investigates the application of knowledge transfer techniques to process event-based data forperson detection in area surveillance. A teacher-student model setup is employed, where both modelsare pretrained on conventional visual data. The teacher model processes visual images to generate targetlabels for the student model trained on event-based data, forming the baseline model. Building onthis, the project incorporates feature-based knowledge transfer, specifically transferring features fromthe Feature Pyramid Network (FPN) component of the Faster R-CNN ResNet-50 FPN network. Resultsindicate that response-based knowledge transfer can effectively finetune models for event-based data.However, feature-based knowledge transfer yields mixed results, requiring more refined techniques forconsistent improvement. The study identifies limitations, including the need for a more diverse dataset,improved preprocessing methods, labeling techniques, and refined feature-based knowledge transfermethods. This research bridges the gap between conventional object detection methods and event-baseddata, enhancing the applicability of event cameras in surveillance applications.
|
442 |
Estimation de cartes d'énergie du bruit apériodique de la marche humaine avec une caméra de profondeur pour la détection de pathologies et modèles légers de détection d'objets saillants basés sur l'opposition de couleursNdayikengurukiye, Didier 06 1900 (has links)
Cette thèse a pour objectif l’étude de trois problèmes : l’estimation de cartes de saillance de l’énergie du bruit apériodique de la marche humaine par la perception de profondeur pour la détection de pathologies, les modèles de détection d’objets saillants en général et les modèles légers en particulier par l’opposition de couleurs.
Comme première contribution, nous proposons un système basé sur une caméra de profondeur et un tapis roulant, qui analyse les parties du corps du patient ayant un mouvement irrégulier, en termes de périodicité, pendant la marche. Nous supposons que la marche d'un sujet sain présente n'importe où dans son corps, pendant les cycles de marche, un signal de profondeur avec un motif périodique sans bruit. La présence de bruit et son importance peuvent être utilisées pour signaler la présence et l'étendue de pathologies chez le sujet. Notre système estime, à partir de chaque séquence vidéo, une carte couleur de saillance montrant les zones de fortes irrégularités de marche, en termes de périodicité, appelées énergie de bruit apériodique, de chaque sujet. Notre système permet aussi de détecter automatiquement les cartes des individus sains et ceux malades.
Nous présentons ensuite deux approches pour la détection d’objets saillants. Bien qu’ayant fait l’objet de plusieurs travaux de recherche, la détection d'objets saillants reste un défi. La plupart des modèles traitent la couleur et la texture séparément et les considèrent donc implicitement comme des caractéristiques indépendantes, à tort.
Comme deuxième contribution, nous proposons une nouvelle stratégie, à travers un modèle simple, presque sans paramètres internes, générant une carte de saillance robuste pour une image naturelle. Cette stratégie consiste à intégrer la couleur dans les motifs de texture pour caractériser une micro-texture colorée, ceci grâce au motif ternaire local (LTP) (descripteur de texture simple mais puissant) appliqué aux paires de couleurs. La dissemblance entre chaque paire de micro-textures colorées est calculée en tenant compte de la non-linéarité des micro-textures colorées et en préservant leurs distances, donnant une carte de saillance intermédiaire pour chaque espace de couleur. La carte de saillance finale est leur combinaison pour avoir des cartes robustes.
Le développement des réseaux de neurones profonds a récemment permis des performances élevées. Cependant, il reste un défi de développer des modèles de même performance pour des appareils avec des ressources limitées.
Comme troisième contribution, nous proposons une nouvelle approche pour un modèle léger de réseau neuronal profond de détection d'objets saillants, inspiré par les processus de double opposition du cortex visuel primaire, qui lient inextricablement la couleur et la forme dans la perception humaine des couleurs. Notre modèle proposé, CoSOV1net, est entraîné à partir de zéro, sans utiliser de ``backbones'' de classification d'images ou d'autres tâches. Les expériences sur les ensembles de données les plus utilisés et les plus complexes pour la détection d'objets saillants montrent que CoSOV1Net atteint des performances compétitives avec des modèles de l’état-de-l’art, tout en étant un modèle léger de détection d'objets saillants et pouvant être adapté aux environnements mobiles et aux appareils à ressources limitées. / The purpose of this thesis is to study three problems: the estimation of saliency maps of the aperiodic noise energy of human gait using depth perception for pathology detection, and to study models for salient objects detection in general and lightweight models in particular by color opposition.
As our first contribution, we propose a system based on a depth camera and a treadmill, which analyzes the parts of the patient's body with irregular movement, in terms of periodicity, during walking. We assume that a healthy subject gait presents anywhere in his (her) body, during gait cycles, a depth signal with a periodic pattern without noise. The presence of noise and its importance can be used to point out presence and extent of the subject’s pathologies. Our system estimates, from each video sequence, a saliency map showing the areas of strong gait irregularities, in terms of periodicity, called aperiodic noise energy, of each subject. Our system also makes it possible to automatically detect the saliency map of healthy and sick subjects.
We then present two approaches for salient objects detection. Although having been the subject of many research works, salient objects detection remains a challenge. Most models treat color and texture separately and therefore implicitly consider them as independent feature, erroneously.
As a second contribution, we propose a new strategy through a simple model, almost without internal parameters, generating a robust saliency map for a natural image. This strategy consists in integrating color in texture patterns to characterize a colored micro-texture thanks to the local ternary pattern (LTP) (simple but powerful texture descriptor) applied to the color pairs. The dissimilarity between each colored micro-textures pair is computed considering non-linearity from colored micro-textures and preserving their distances. This gives an intermediate saliency map for each color space. The final saliency map is their combination to have robust saliency map.
The development of deep neural networks has recently enabled high performance. However, it remains a challenge to develop models of the same performance for devices with limited resources.
As a third contribution, we propose a new approach for a lightweight salient objects detection deep neural network model, inspired by the double opponent process in the primary visual cortex, which inextricably links color and shape in human color perception. Our proposed model, namely CoSOV1net, is trained from scratch, without using any image classification backbones or other tasks. Experiments on the most used and challenging datasets for salient objects detection show that CoSOV1Net achieves competitive performance with state-of-the-art models, yet it is a lightweight detection model and it is a salient objects detection that can be adapted to mobile environments and resource-constrained devices.
|
443 |
Object Based Image Retrieval Using Feature Maps of a YOLOv5 Network / Objektbaserad bildhämtning med hjälp av feature maps från ett YOLOv5-nätverkEssinger, Hugo, Kivelä, Alexander January 2022 (has links)
As Machine Learning (ML) methods have gained traction in recent years, someproblems regarding the construction of such methods have arisen. One such problem isthe collection and labeling of data sets. Specifically when it comes to many applicationsof Computer Vision (CV), one needs a set of images, labeled as either being of someclass or not. Creating such data sets can be very time consuming. This project setsout to tackle this problem by constructing an end-to-end system for searching forobjects in images (i.e. an Object Based Image Retrieval (OBIR) method) using an objectdetection framework (You Only Look Once (YOLO) [16]). The goal of the project wasto create a method that; given an image of an object of interest q, search for that sameor similar objects in a set of other images S. The core concept of the idea is to passthe image q through an object detection model (in this case YOLOv5 [16]), create a”fingerprint” (can be seen as a sort of identity for an object) from a set of feature mapsextracted from the YOLOv5 [16] model and look for corresponding similar parts of aset of feature maps extracted from other images. An investigation regarding whichvalues to select for a few different parameters was conducted, including a comparisonof performance for a couple of different similarity metrics. In the table below,the parameter combination which resulted in the highest F_Top_300-score (a measureindicating the amount of relevant images retrieved among the top 300 recommendedimages) in the parameter selection phase is presented. Layer: 23Pool Methd: maxSim. Mtrc: eucFP Kern. Sz: 4 Evaluation of the method resulted in F_Top_300-scores as can be seen in the table below. Mouse: 0.820Duck: 0.640Coin: 0.770Jet ski: 0.443Handgun: 0.807Average: 0.696 / Medan ML-metoder har blivit mer populära under senare år har det uppstått endel problem gällande konstruktionen av sådana metoder. Ett sådant problem ärinsamling och annotering av data. Mer specifikt när det kommer till många metoderför datorseende behövs ett set av bilder, annoterande att antingen vara eller inte varaav en särskild klass. Att skapa sådana dataset kan vara väldigt tidskonsumerande.Metoden som konstruerades för detta projekt avser att bekämpa detta problem genomatt konstruera ett end-to-end-system för att söka efter objekt i bilder (alltså en OBIR-metod) med hjälp av en objektdetekteringsalgoritm (YOLO). Målet med projektet varatt skapa en metod som; givet en bild q av ett objekt, söka efter samma eller liknandeobjekt i ett bibliotek av bilder S. Huvudkonceptet bakom idén är att köra bilden qgenom objektdetekteringsmodellen (i detta fall YOLOv5 [16]), skapa ett ”fingerprint”(kan ses som en sorts identitet för ett objekt) från en samling feature maps extraheradefrån YOLOv5-modellen [16] och leta efter liknande delar av samlingar feature maps iandra bilder. En utredning angående vilka värden som skulle användas för ett antalolika parametrar utfördes, inklusive en jämförelse av prestandan som resultat av olikalikhetsmått. I tabellen nedan visas den parameterkombination som gav högst F_Top_300(ett mått som indikerar andelen relevanta bilder bland de 300 högst rekommenderadebilderna). Layer: 23Pool Methd: maxSim. Mtrc: eucFP Kern. Sz: 4 Evaluering av metoden med parameterval enligt tabellen ovan resulterade i F_Top_300enligt tabellen nedan. Mouse: 0.820Duck: 0.640Coin: 0.770Jet ski: 0.443Handgun: 0.807Average: 0.696
|
444 |
Optimierung von Algorithmen zur Videoanalyse / Optimization of algorithms for video analysis : A framework to fit the demands of local television stationsRitter, Marc 02 February 2015 (has links) (PDF)
Die Datenbestände lokaler Fernsehsender umfassen oftmals mehrere zehntausend Videokassetten. Moderne Verfahren werden benötigt, um derartige Datenkollektionen inhaltlich automatisiert zu erschließen. Das Auffinden relevanter Objekte spielt dabei eine übergeordnete Rolle, wobei gesteigerte Anforderungen wie niedrige Fehler- und hohe Detektionsraten notwendig sind, um eine Korruption des Suchindex zu verhindern und erfolgreiche Recherchen zu ermöglichen. Zugleich müssen genügend Objekte indiziert werden, um Aussagen über den tatsächlichen Inhalt zu treffen.
Diese Arbeit befasst sich mit der Anpassung und Optimierung bestehender Detektionsverfahren. Dazu wird ein auf die hohen Leistungsbedürfnisse der Videoanalyse zugeschnittenes holistisches Workflow- und Prozesssystem mit der Zielstellung implementiert, die Entwicklung von Bilderkennungsalgorithmen, die Visualisierung von Zwischenschritten sowie deren Evaluation zu ermöglichen. Im Fokus stehen Verfahren zur strukturellen Zerlegung von Videomaterialien und zur inhaltlichen Analyse im Bereich der Gesichtsdetektion und Fußgängererkennung. / The data collections of local television stations often consist of multiples of ten thousand video tapes. Modern methods are needed to exploit the content of such archives. While the retrieval of objects plays a fundamental role, essential requirements incorporate low false and high detection rates in order to prevent the corruption of the search index. However, a sufficient number of objects need to be found to make assumptions about the content explored.
This work focuses on the adjustment and optimization of existing detection techniques. Therefor, the author develops a holistic framework that directly reflects on the high demands of video analysis with the aim to facilitate the development of image processing algorithms, the visualization of intermediate results, and their evaluation and optimization. The effectiveness of the system is demonstrated on the structural decomposition of video footage and on content-based detection of faces and pedestrians.
|
445 |
基於方向性邊緣特徵之即時物件偵測與追蹤 / Real-Time Object Detection and Tracking using Directional Edge Maps王財得, Wang, Tsai-Te Unknown Date (has links)
在電腦視覺的研究之中,有關物件的偵測與追蹤應用在速度及可靠性上的追求一直是相當具有挑戰性的問題,而現階段發展以視覺為基礎互動式的應用,所使用到技術諸如:類神經網路、SVM及貝氏網路等。
本論文中我們持續深入此領域,並提出及發展一個方向性邊緣特徵集(DEM)與修正後的AdaBoost訓練演算法相互結合,期能有效提高物件偵測與識別的速度及準確性,在實際驗證中,我們將之應用於多種角度之人臉偵測,以及臉部表情識別等兩個主要問題之上;在人臉偵測的應用中,我們使用CMU的臉部資料庫並與Viola & Jones方法進行分析比較,在準確率上,我們的方法擁有79% 的recall及90% 的precision,而Viola & ones的方法則分別為81%及77%;在運算速度上,同樣處理512x384的影像,相較於Viola & Jones需時132ms,我們提出的方法則有較佳的82ms。
此外,於表情識別的應用中,我們結合運用Component-based及Action-unit model 兩種方法。前者的優勢在於提供臉部細節特徵的定位及追蹤變化,後者主要功用則為進行情緒表情的分類。我們對於四種不同情緒表情的辨識準確度如下:高興(83.6%)、傷心(72.7%)、驚訝(80%) 、生氣(78.1%)。在實驗中,可以發現生氣及傷心兩種情緒較難區分,而高興與驚訝則較易識別。 / Rapid and robust detection and tracking of objects is a challenging problem in computer vision research. Techniques such as artificial neural networks, support vector machine and Bayesian networks have been developed to enable interactive vision-based applications. In this thesis, we tackle this issue by devising a novel feature descriptor named directional edge maps (DEM). When combined with a modified AdaBoost training algorithm, the proposed descriptor can produce effective results in many object detection and recognition tasks.
We have applied the newly developed method to two important object recognition problems, namely, face detection and facial expression recognition. The DEM-based methodology conceived in this thesis is capable of detecting faces of multiple views. To test the efficacy of our face detection mechanism, we have performed a comparative analysis with the Viola and Jones algorithm using Carnegie Mellon University face database. The recall and precision using our approach is 79% and 90%, respectively, compared to 81% and 77% using Viola and Jones algorithm. Our algorithm is also more efficient, requiring only 82 ms (compared to 132 ms by Viola and Jones) for processing a 512x384 image.
To achieve robust facial expression recognition, we have combined component-based methods and action-unit model-based approaches. The component-based method is mainly utilized to locate important facial features and track their deformations. Action-unit model-based approach is then employed to carry out expression recognition. The accuracy of classifying different emotion type is as follows: happiness 83.6%, sadness 72.7%, surprise 80%, and anger 78.1%. It turns out that anger and sadness are more difficult to distinguish, whereas happiness and surprise expression have higher recognition rates.
|
446 |
Table tennis event detection and classificationOldham, Kevin M. January 2015 (has links)
It is well understood that multiple video cameras and computer vision (CV) technology can be used in sport for match officiating, statistics and player performance analysis. A review of the literature reveals a number of existing solutions, both commercial and theoretical, within this domain. However, these solutions are expensive and often complex in their installation. The hypothesis for this research states that by considering only changes in ball motion, automatic event classification is achievable with low-cost monocular video recording devices, without the need for 3-dimensional (3D) positional ball data and representation. The focus of this research is a rigorous empirical study of low cost single consumer-grade video camera solutions applied to table tennis, confirming that monocular CV based detected ball location data contains sufficient information to enable key match-play events to be recognised and measured. In total a library of 276 event-based video sequences, using a range of recording hardware, were produced for this research. The research has four key considerations: i) an investigation into an effective recording environment with minimum configuration and calibration, ii) the selection and optimisation of a CV algorithm to detect the ball from the resulting single source video data, iii) validation of the accuracy of the 2-dimensional (2D) CV data for motion change detection, and iv) the data requirements and processing techniques necessary to automatically detect changes in ball motion and match those to match-play events. Throughout the thesis, table tennis has been chosen as the example sport for observational and experimental analysis since it offers a number of specific CV challenges due to the relatively high ball speed (in excess of 100kph) and small ball size (40mm in diameter). Furthermore, the inherent rules of table tennis show potential for a monocular based event classification vision system. As the initial stage, a proposed optimum location and configuration of the single camera is defined. Next, the selection of a CV algorithm is critical in obtaining usable ball motion data. It is shown in this research that segmentation processes vary in their ball detection capabilities and location out-puts, which ultimately affects the ability of automated event detection and decision making solutions. Therefore, a comparison of CV algorithms is necessary to establish confidence in the accuracy of the derived location of the ball. As part of the research, a CV software environment has been developed to allow robust, repeatable and direct comparisons between different CV algorithms. An event based method of evaluating the success of a CV algorithm is proposed. Comparison of CV algorithms is made against the novel Efficacy Metric Set (EMS), producing a measurable Relative Efficacy Index (REI). Within the context of this low cost, single camera ball trajectory and event investigation, experimental results provided show that the Horn-Schunck Optical Flow algorithm, with a REI of 163.5 is the most successful method when compared to a discrete selection of CV detection and extraction techniques gathered from the literature review. Furthermore, evidence based data from the REI also suggests switching to the Canny edge detector (a REI of 186.4) for segmentation of the ball when in close proximity to the net. In addition to and in support of the data generated from the CV software environment, a novel method is presented for producing simultaneous data from 3D marker based recordings, reduced to 2D and compared directly to the CV output to establish comparative time-resolved data for the ball location. It is proposed here that a continuous scale factor, based on the known dimensions of the ball, is incorporated at every frame. Using this method, comparison results show a mean accuracy of 3.01mm when applied to a selection of nineteen video sequences and events. This tolerance is within 10% of the diameter of the ball and accountable by the limits of image resolution. Further experimental results demonstrate the ability to identify a number of match-play events from a monocular image sequence using a combination of the suggested optimum algorithm and ball motion analysis methods. The results show a promising application of 2D based CV processing to match-play event classification with an overall success rate of 95.9%. The majority of failures occur when the ball, during returns and services, is partially occluded by either the player or racket, due to the inherent problem of using a monocular recording device. Finally, the thesis proposes further research and extensions for developing and implementing monocular based CV processing of motion based event analysis and classification in a wider range of applications.
|
447 |
Multilevel Datenfusion konkurrierender Sensoren in der FahrzeugumfelderfassungHaberjahn, Mathias 21 November 2013 (has links)
Mit der vorliegenden Dissertation soll ein Beitrag zur Steigerung der Genauigkeit und Zuverlässigkeit einer sensorgestützten Objekterkennung im Fahrzeugumfeld geleistet werden. Aufbauend auf einem Erfassungssystem, bestehend aus einer Stereokamera und einem Mehrzeilen-Laserscanner, werden teils neu entwickelte Verfahren für die gesamte Verarbeitungskette vorgestellt. Zusätzlich wird ein neuartiges Framework zur Fusion heterogener Sensordaten eingeführt, welches über eine Zusammenführung der Fusionsergebnisse aus den unterschiedlichen Verarbeitungsebenen in der Lage ist, die Objektbestimmung zu verbessern. Nach einer Beschreibung des verwendeten Sensoraufbaus werden die entwickelten Verfahren zur Kalibrierung des Sensorpaares vorgestellt. Bei der Segmentierung der räumlichen Punktdaten werden bestehende Verfahren durch die Einbeziehung von Messgenauigkeit und Messspezifik des Sensors erweitert. In der anschließenden Objektverfolgung wird neben einem neuartigen berechnungsoptimierten Ansatz zur Objektassoziierung ein Modell zur adaptiven Referenzpunktbestimmung und –Verfolgung beschrieben. Durch das vorgestellte Fusions-Framework ist es möglich, die Sensordaten wahlweise auf drei unterschiedlichen Verarbeitungsebenen (Punkt-, Objekt- und Track-Ebene) zu vereinen. Hierzu wird ein sensorunabhängiger Ansatz zur Fusion der Punktdaten dargelegt, der im Vergleich zu den anderen Fusionsebenen und den Einzelsensoren die genaueste Objektbeschreibung liefert. Für die oberen Fusionsebenen wurden unter Ausnutzung der konkurrierenden Sensorinformationen neuartige Verfahren zur Bestimmung und Reduzierung der Detektions- und Verarbeitungsfehler entwickelt. Abschließend wird beschrieben, wie die fehlerreduzierenden Verfahren der oberen Fusionsebenen mit der optimalen Objektbeschreibung der unteren Fusionsebene für eine optimale Objektbestimmung zusammengeführt werden können. Die Effektivität der entwickelten Verfahren wurde durch Simulation oder in realen Messszenarien überprüft. / With the present thesis a contribution to the increase of the accuracy and reliability of a sensor-supported recognition and tracking of objects in a vehicle’s surroundings should be made. Based on a detection system, consisting of a stereo camera and a laser scanner, novel developed procedures are introduced for the whole processing chain of the sensor data. In addition, a new framework is introduced for the fusion of heterogeneous sensor data. By combining the data fusion results from the different processing levels the object detection can be improved. After a short description of the used sensor setup the developed procedures for the calibration and mutual orientation are introduced. With the segmentation of the spatial point data existing procedures are extended by the inclusion of measuring accuracy and specificity of the sensor. In the subsequent object tracking a new computation-optimized approach for the association of the related object hypotheses is presented. In addition, a model for a dynamic determination and tracking of an object reference point is described which exceeds the classical tracking of the object center in the track accuracy. By the introduced fusion framework it is possible to merge the sensor data at three different processing levels (point, object and track level). A sensor independent approach for the low fusion of point data is demonstrated which delivers the most precise object description in comparison to the other fusion levels and the single sensors. For the higher fusion levels new procedures were developed to discover and clean up the detection and processing mistakes benefiting from the competing sensor information. Finally it is described how the fusion results of the upper and lower levels can be brought together for an ideal object description. The effectiveness of the newly developed methods was checked either by simulation or in real measurement scenarios.
|
448 |
Weakly supervised learning of deformable part models and convolutional neural networks for object detection / Détection d'objets faiblement supervisée par modèles de pièces déformables et réseaux de neurones convolutionnelsTang, Yuxing 14 December 2016 (has links)
Dans cette thèse, nous nous intéressons au problème de la détection d’objets faiblement supervisée. Le but est de reconnaître et de localiser des objets dans les images, n’ayant à notre disposition durant la phase d’apprentissage que des images partiellement annotées au niveau des objets. Pour cela, nous avons proposé deux méthodes basées sur des modèles différents. Pour la première méthode, nous avons proposé une amélioration de l’approche ”Deformable Part-based Models” (DPM) faiblement supervisée, en insistant sur l’importance de la position et de la taille du filtre racine initial spécifique à la classe. Tout d’abord, un ensemble de candidats est calculé, ceux-ci représentant les positions possibles de l’objet pour le filtre racine initial, en se basant sur une mesure générique d’objectness (par region proposals) pour combiner les régions les plus saillantes et potentiellement de bonne qualité. Ensuite, nous avons proposé l’apprentissage du label des classes latentes de chaque candidat comme un problème de classification binaire, en entrainant des classifieurs spécifiques pour chaque catégorie afin de prédire si les candidats sont potentiellement des objets cible ou non. De plus, nous avons amélioré la détection en incorporant l’information contextuelle à partir des scores de classification de l’image. Enfin, nous avons élaboré une procédure de post-traitement permettant d’élargir et de contracter les régions fournies par le DPM afin de les adapter efficacement à la taille de l’objet, augmentant ainsi la précision finale de la détection. Pour la seconde approche, nous avons étudié dans quelle mesure l’information tirée des objets similaires d’un point de vue visuel et sémantique pouvait être utilisée pour transformer un classifieur d’images en détecteur d’objets d’une manière semi-supervisée sur un large ensemble de données, pour lequel seul un sous-ensemble des catégories d’objets est annoté avec des boîtes englobantes nécessaires pour l’apprentissage des détecteurs. Nous avons proposé de transformer des classifieurs d’images basés sur des réseaux convolutionnels profonds (Deep CNN) en détecteurs d’objets en modélisant les différences entre les deux en considérant des catégories disposant à la fois de l’annotation au niveau de l’image globale et l’annotation au niveau des boîtes englobantes. Cette information de différence est ensuite transférée aux catégories sans annotation au niveau des boîtes englobantes, permettant ainsi la conversion de classifieurs d’images en détecteurs d’objets. Nos approches ont été évaluées sur plusieurs jeux de données tels que PASCAL VOC, ImageNet ILSVRC et Microsoft COCO. Ces expérimentations ont démontré que nos approches permettent d’obtenir des résultats comparables à ceux de l’état de l’art et qu’une amélioration significative a pu être obtenue par rapport à des méthodes récentes de détection d’objets faiblement supervisées. / In this dissertation we address the problem of weakly supervised object detection, wherein the goal is to recognize and localize objects in weakly-labeled images where object-level annotations are incomplete during training. To this end, we propose two methods which learn two different models for the objects of interest. In our first method, we propose a model enhancing the weakly supervised Deformable Part-based Models (DPMs) by emphasizing the importance of location and size of the initial class-specific root filter. We first compute a candidate pool that represents the potential locations of the object as this root filter estimate, by exploring the generic objectness measurement (region proposals) to combine the most salient regions and “good” region proposals. We then propose learning of the latent class label of each candidate window as a binary classification problem, by training category-specific classifiers used to coarsely classify a candidate window into either a target object or a non-target class. Furthermore, we improve detection by incorporating the contextual information from image classification scores. Finally, we design a flexible enlarging-and-shrinking post-processing procedure to modify the DPMs outputs, which can effectively match the approximate object aspect ratios and further improve final accuracy. Second, we investigate how knowledge about object similarities from both visual and semantic domains can be transferred to adapt an image classifier to an object detector in a semi-supervised setting on a large-scale database, where a subset of object categories are annotated with bounding boxes. We propose to transform deep Convolutional Neural Networks (CNN)-based image-level classifiers into object detectors by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We have evaluated both our approaches extensively on several challenging detection benchmarks, e.g. , PASCAL VOC, ImageNet ILSVRC and Microsoft COCO. Both our approaches compare favorably to the state-of-the-art and show significant improvement over several other recent weakly supervised detection methods.
|
449 |
Compressed Convolutional Neural Network for Autonomous SystemsDurvesh Pathak (5931110) 17 January 2019 (has links)
The word “Perception” seems to be intuitive and maybe the most straightforward
problem for the human brain because as a child we have been trained to classify
images, detect objects, but for computers, it can be a daunting task. Giving intuition
and reasoning to a computer which has mere capabilities to accept commands
and process those commands is a big challenge. However, recent leaps in hardware
development, sophisticated software frameworks, and mathematical techniques have
made it a little less daunting if not easy. There are various applications built around
to the concept of “Perception”. These applications require substantial computational
resources, expensive hardware, and some sophisticated software frameworks. Building
an application for perception for the embedded system is an entirely different
ballgame. Embedded system is a culmination of hardware, software and peripherals
developed for specific tasks with imposed constraints on memory and power.
Therefore, the applications developed should keep in mind the memory and power
constraints imposed due to the nature of these systems.Before 2012, the problems related to “Perception” such as classification, object
detection were solved using algorithms with manually engineered features. However,
in recent years, instead of manually engineering the features, these features are learned
through learning algorithms. The game-changing architecture of Convolution Neural
Networks proposed in 2012 by Alex K, provided a tremendous momentum in the
direction of pushing Neural networks for perception. This thesis is an attempt to
develop a convolution neural network architecture for embedded systems, i.e. an architecture that has a small model size and competitive accuracy. Recreate state-of-the-art
architectures using fire module’s concept to reduce the model size of the
architecture. The proposed compact models are feasible for deployment on embedded
devices such as the Bluebox 2.0. Furthermore, attempts are made to integrate the
compact Convolution Neural Network with object detection pipelines.
|
450 |
Knowledge-based 3D point clouds processingTruong, Quoc Hung 15 November 2013 (has links) (PDF)
The modeling of real-world scenes through capturing 3D digital data has proven to be both useful andapplicable in a variety of industrial and surveying applications. Entire scenes are generally capturedby laser scanners and represented by large unorganized point clouds possibly along with additionalphotogrammetric data. A typical challenge in processing such point clouds and data lies in detectingand classifying objects that are present in the scene. In addition to the presence of noise, occlusionsand missing data, such tasks are often hindered by the irregularity of the capturing conditions bothwithin the same dataset and from one data set to another. Given the complexity of the underlyingproblems, recent processing approaches attempt to exploit semantic knowledge for identifying andclassifying objects. In the present thesis, we propose a novel approach that makes use of intelligentknowledge management strategies for processing of 3D point clouds as well as identifying andclassifying objects in digitized scenes. Our approach extends the use of semantic knowledge to allstages of the processing, including the guidance of the individual data-driven processing algorithms.The complete solution consists in a multi-stage iterative concept based on three factors: the modeledknowledge, the package of algorithms, and a classification engine. The goal of the present work isto select and guide algorithms following an adaptive and intelligent strategy for detecting objects inpoint clouds. Experiments with two case studies demonstrate the applicability of our approach. Thestudies were carried out on scans of the waiting area of an airport and along the tracks of a railway.In both cases the goal was to detect and identify objects within a defined area. Results show that ourapproach succeeded in identifying the objects of interest while using various data types
|
Page generated in 0.1019 seconds