451 |
Optimierung von Algorithmen zur Videoanalyse / Optimization of algorithms for video analysis : A framework to fit the demands of local television stationsRitter, Marc 02 February 2015 (has links) (PDF)
Die Datenbestände lokaler Fernsehsender umfassen oftmals mehrere zehntausend Videokassetten. Moderne Verfahren werden benötigt, um derartige Datenkollektionen inhaltlich automatisiert zu erschließen. Das Auffinden relevanter Objekte spielt dabei eine übergeordnete Rolle, wobei gesteigerte Anforderungen wie niedrige Fehler- und hohe Detektionsraten notwendig sind, um eine Korruption des Suchindex zu verhindern und erfolgreiche Recherchen zu ermöglichen. Zugleich müssen genügend Objekte indiziert werden, um Aussagen über den tatsächlichen Inhalt zu treffen.
Diese Arbeit befasst sich mit der Anpassung und Optimierung bestehender Detektionsverfahren. Dazu wird ein auf die hohen Leistungsbedürfnisse der Videoanalyse zugeschnittenes holistisches Workflow- und Prozesssystem mit der Zielstellung implementiert, die Entwicklung von Bilderkennungsalgorithmen, die Visualisierung von Zwischenschritten sowie deren Evaluation zu ermöglichen. Im Fokus stehen Verfahren zur strukturellen Zerlegung von Videomaterialien und zur inhaltlichen Analyse im Bereich der Gesichtsdetektion und Fußgängererkennung. / The data collections of local television stations often consist of multiples of ten thousand video tapes. Modern methods are needed to exploit the content of such archives. While the retrieval of objects plays a fundamental role, essential requirements incorporate low false and high detection rates in order to prevent the corruption of the search index. However, a sufficient number of objects need to be found to make assumptions about the content explored.
This work focuses on the adjustment and optimization of existing detection techniques. Therefor, the author develops a holistic framework that directly reflects on the high demands of video analysis with the aim to facilitate the development of image processing algorithms, the visualization of intermediate results, and their evaluation and optimization. The effectiveness of the system is demonstrated on the structural decomposition of video footage and on content-based detection of faces and pedestrians.
|
452 |
基於方向性邊緣特徵之即時物件偵測與追蹤 / Real-Time Object Detection and Tracking using Directional Edge Maps王財得, Wang, Tsai-Te Unknown Date (has links)
在電腦視覺的研究之中,有關物件的偵測與追蹤應用在速度及可靠性上的追求一直是相當具有挑戰性的問題,而現階段發展以視覺為基礎互動式的應用,所使用到技術諸如:類神經網路、SVM及貝氏網路等。
本論文中我們持續深入此領域,並提出及發展一個方向性邊緣特徵集(DEM)與修正後的AdaBoost訓練演算法相互結合,期能有效提高物件偵測與識別的速度及準確性,在實際驗證中,我們將之應用於多種角度之人臉偵測,以及臉部表情識別等兩個主要問題之上;在人臉偵測的應用中,我們使用CMU的臉部資料庫並與Viola & Jones方法進行分析比較,在準確率上,我們的方法擁有79% 的recall及90% 的precision,而Viola & ones的方法則分別為81%及77%;在運算速度上,同樣處理512x384的影像,相較於Viola & Jones需時132ms,我們提出的方法則有較佳的82ms。
此外,於表情識別的應用中,我們結合運用Component-based及Action-unit model 兩種方法。前者的優勢在於提供臉部細節特徵的定位及追蹤變化,後者主要功用則為進行情緒表情的分類。我們對於四種不同情緒表情的辨識準確度如下:高興(83.6%)、傷心(72.7%)、驚訝(80%) 、生氣(78.1%)。在實驗中,可以發現生氣及傷心兩種情緒較難區分,而高興與驚訝則較易識別。 / Rapid and robust detection and tracking of objects is a challenging problem in computer vision research. Techniques such as artificial neural networks, support vector machine and Bayesian networks have been developed to enable interactive vision-based applications. In this thesis, we tackle this issue by devising a novel feature descriptor named directional edge maps (DEM). When combined with a modified AdaBoost training algorithm, the proposed descriptor can produce effective results in many object detection and recognition tasks.
We have applied the newly developed method to two important object recognition problems, namely, face detection and facial expression recognition. The DEM-based methodology conceived in this thesis is capable of detecting faces of multiple views. To test the efficacy of our face detection mechanism, we have performed a comparative analysis with the Viola and Jones algorithm using Carnegie Mellon University face database. The recall and precision using our approach is 79% and 90%, respectively, compared to 81% and 77% using Viola and Jones algorithm. Our algorithm is also more efficient, requiring only 82 ms (compared to 132 ms by Viola and Jones) for processing a 512x384 image.
To achieve robust facial expression recognition, we have combined component-based methods and action-unit model-based approaches. The component-based method is mainly utilized to locate important facial features and track their deformations. Action-unit model-based approach is then employed to carry out expression recognition. The accuracy of classifying different emotion type is as follows: happiness 83.6%, sadness 72.7%, surprise 80%, and anger 78.1%. It turns out that anger and sadness are more difficult to distinguish, whereas happiness and surprise expression have higher recognition rates.
|
453 |
Table tennis event detection and classificationOldham, Kevin M. January 2015 (has links)
It is well understood that multiple video cameras and computer vision (CV) technology can be used in sport for match officiating, statistics and player performance analysis. A review of the literature reveals a number of existing solutions, both commercial and theoretical, within this domain. However, these solutions are expensive and often complex in their installation. The hypothesis for this research states that by considering only changes in ball motion, automatic event classification is achievable with low-cost monocular video recording devices, without the need for 3-dimensional (3D) positional ball data and representation. The focus of this research is a rigorous empirical study of low cost single consumer-grade video camera solutions applied to table tennis, confirming that monocular CV based detected ball location data contains sufficient information to enable key match-play events to be recognised and measured. In total a library of 276 event-based video sequences, using a range of recording hardware, were produced for this research. The research has four key considerations: i) an investigation into an effective recording environment with minimum configuration and calibration, ii) the selection and optimisation of a CV algorithm to detect the ball from the resulting single source video data, iii) validation of the accuracy of the 2-dimensional (2D) CV data for motion change detection, and iv) the data requirements and processing techniques necessary to automatically detect changes in ball motion and match those to match-play events. Throughout the thesis, table tennis has been chosen as the example sport for observational and experimental analysis since it offers a number of specific CV challenges due to the relatively high ball speed (in excess of 100kph) and small ball size (40mm in diameter). Furthermore, the inherent rules of table tennis show potential for a monocular based event classification vision system. As the initial stage, a proposed optimum location and configuration of the single camera is defined. Next, the selection of a CV algorithm is critical in obtaining usable ball motion data. It is shown in this research that segmentation processes vary in their ball detection capabilities and location out-puts, which ultimately affects the ability of automated event detection and decision making solutions. Therefore, a comparison of CV algorithms is necessary to establish confidence in the accuracy of the derived location of the ball. As part of the research, a CV software environment has been developed to allow robust, repeatable and direct comparisons between different CV algorithms. An event based method of evaluating the success of a CV algorithm is proposed. Comparison of CV algorithms is made against the novel Efficacy Metric Set (EMS), producing a measurable Relative Efficacy Index (REI). Within the context of this low cost, single camera ball trajectory and event investigation, experimental results provided show that the Horn-Schunck Optical Flow algorithm, with a REI of 163.5 is the most successful method when compared to a discrete selection of CV detection and extraction techniques gathered from the literature review. Furthermore, evidence based data from the REI also suggests switching to the Canny edge detector (a REI of 186.4) for segmentation of the ball when in close proximity to the net. In addition to and in support of the data generated from the CV software environment, a novel method is presented for producing simultaneous data from 3D marker based recordings, reduced to 2D and compared directly to the CV output to establish comparative time-resolved data for the ball location. It is proposed here that a continuous scale factor, based on the known dimensions of the ball, is incorporated at every frame. Using this method, comparison results show a mean accuracy of 3.01mm when applied to a selection of nineteen video sequences and events. This tolerance is within 10% of the diameter of the ball and accountable by the limits of image resolution. Further experimental results demonstrate the ability to identify a number of match-play events from a monocular image sequence using a combination of the suggested optimum algorithm and ball motion analysis methods. The results show a promising application of 2D based CV processing to match-play event classification with an overall success rate of 95.9%. The majority of failures occur when the ball, during returns and services, is partially occluded by either the player or racket, due to the inherent problem of using a monocular recording device. Finally, the thesis proposes further research and extensions for developing and implementing monocular based CV processing of motion based event analysis and classification in a wider range of applications.
|
454 |
Weakly supervised learning of deformable part models and convolutional neural networks for object detection / Détection d'objets faiblement supervisée par modèles de pièces déformables et réseaux de neurones convolutionnelsTang, Yuxing 14 December 2016 (has links)
Dans cette thèse, nous nous intéressons au problème de la détection d’objets faiblement supervisée. Le but est de reconnaître et de localiser des objets dans les images, n’ayant à notre disposition durant la phase d’apprentissage que des images partiellement annotées au niveau des objets. Pour cela, nous avons proposé deux méthodes basées sur des modèles différents. Pour la première méthode, nous avons proposé une amélioration de l’approche ”Deformable Part-based Models” (DPM) faiblement supervisée, en insistant sur l’importance de la position et de la taille du filtre racine initial spécifique à la classe. Tout d’abord, un ensemble de candidats est calculé, ceux-ci représentant les positions possibles de l’objet pour le filtre racine initial, en se basant sur une mesure générique d’objectness (par region proposals) pour combiner les régions les plus saillantes et potentiellement de bonne qualité. Ensuite, nous avons proposé l’apprentissage du label des classes latentes de chaque candidat comme un problème de classification binaire, en entrainant des classifieurs spécifiques pour chaque catégorie afin de prédire si les candidats sont potentiellement des objets cible ou non. De plus, nous avons amélioré la détection en incorporant l’information contextuelle à partir des scores de classification de l’image. Enfin, nous avons élaboré une procédure de post-traitement permettant d’élargir et de contracter les régions fournies par le DPM afin de les adapter efficacement à la taille de l’objet, augmentant ainsi la précision finale de la détection. Pour la seconde approche, nous avons étudié dans quelle mesure l’information tirée des objets similaires d’un point de vue visuel et sémantique pouvait être utilisée pour transformer un classifieur d’images en détecteur d’objets d’une manière semi-supervisée sur un large ensemble de données, pour lequel seul un sous-ensemble des catégories d’objets est annoté avec des boîtes englobantes nécessaires pour l’apprentissage des détecteurs. Nous avons proposé de transformer des classifieurs d’images basés sur des réseaux convolutionnels profonds (Deep CNN) en détecteurs d’objets en modélisant les différences entre les deux en considérant des catégories disposant à la fois de l’annotation au niveau de l’image globale et l’annotation au niveau des boîtes englobantes. Cette information de différence est ensuite transférée aux catégories sans annotation au niveau des boîtes englobantes, permettant ainsi la conversion de classifieurs d’images en détecteurs d’objets. Nos approches ont été évaluées sur plusieurs jeux de données tels que PASCAL VOC, ImageNet ILSVRC et Microsoft COCO. Ces expérimentations ont démontré que nos approches permettent d’obtenir des résultats comparables à ceux de l’état de l’art et qu’une amélioration significative a pu être obtenue par rapport à des méthodes récentes de détection d’objets faiblement supervisées. / In this dissertation we address the problem of weakly supervised object detection, wherein the goal is to recognize and localize objects in weakly-labeled images where object-level annotations are incomplete during training. To this end, we propose two methods which learn two different models for the objects of interest. In our first method, we propose a model enhancing the weakly supervised Deformable Part-based Models (DPMs) by emphasizing the importance of location and size of the initial class-specific root filter. We first compute a candidate pool that represents the potential locations of the object as this root filter estimate, by exploring the generic objectness measurement (region proposals) to combine the most salient regions and “good” region proposals. We then propose learning of the latent class label of each candidate window as a binary classification problem, by training category-specific classifiers used to coarsely classify a candidate window into either a target object or a non-target class. Furthermore, we improve detection by incorporating the contextual information from image classification scores. Finally, we design a flexible enlarging-and-shrinking post-processing procedure to modify the DPMs outputs, which can effectively match the approximate object aspect ratios and further improve final accuracy. Second, we investigate how knowledge about object similarities from both visual and semantic domains can be transferred to adapt an image classifier to an object detector in a semi-supervised setting on a large-scale database, where a subset of object categories are annotated with bounding boxes. We propose to transform deep Convolutional Neural Networks (CNN)-based image-level classifiers into object detectors by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We have evaluated both our approaches extensively on several challenging detection benchmarks, e.g. , PASCAL VOC, ImageNet ILSVRC and Microsoft COCO. Both our approaches compare favorably to the state-of-the-art and show significant improvement over several other recent weakly supervised detection methods.
|
455 |
Compressed Convolutional Neural Network for Autonomous SystemsDurvesh Pathak (5931110) 17 January 2019 (has links)
The word “Perception” seems to be intuitive and maybe the most straightforward
problem for the human brain because as a child we have been trained to classify
images, detect objects, but for computers, it can be a daunting task. Giving intuition
and reasoning to a computer which has mere capabilities to accept commands
and process those commands is a big challenge. However, recent leaps in hardware
development, sophisticated software frameworks, and mathematical techniques have
made it a little less daunting if not easy. There are various applications built around
to the concept of “Perception”. These applications require substantial computational
resources, expensive hardware, and some sophisticated software frameworks. Building
an application for perception for the embedded system is an entirely different
ballgame. Embedded system is a culmination of hardware, software and peripherals
developed for specific tasks with imposed constraints on memory and power.
Therefore, the applications developed should keep in mind the memory and power
constraints imposed due to the nature of these systems.Before 2012, the problems related to “Perception” such as classification, object
detection were solved using algorithms with manually engineered features. However,
in recent years, instead of manually engineering the features, these features are learned
through learning algorithms. The game-changing architecture of Convolution Neural
Networks proposed in 2012 by Alex K, provided a tremendous momentum in the
direction of pushing Neural networks for perception. This thesis is an attempt to
develop a convolution neural network architecture for embedded systems, i.e. an architecture that has a small model size and competitive accuracy. Recreate state-of-the-art
architectures using fire module’s concept to reduce the model size of the
architecture. The proposed compact models are feasible for deployment on embedded
devices such as the Bluebox 2.0. Furthermore, attempts are made to integrate the
compact Convolution Neural Network with object detection pipelines.
|
456 |
Knowledge-based 3D point clouds processingTruong, Quoc Hung 15 November 2013 (has links) (PDF)
The modeling of real-world scenes through capturing 3D digital data has proven to be both useful andapplicable in a variety of industrial and surveying applications. Entire scenes are generally capturedby laser scanners and represented by large unorganized point clouds possibly along with additionalphotogrammetric data. A typical challenge in processing such point clouds and data lies in detectingand classifying objects that are present in the scene. In addition to the presence of noise, occlusionsand missing data, such tasks are often hindered by the irregularity of the capturing conditions bothwithin the same dataset and from one data set to another. Given the complexity of the underlyingproblems, recent processing approaches attempt to exploit semantic knowledge for identifying andclassifying objects. In the present thesis, we propose a novel approach that makes use of intelligentknowledge management strategies for processing of 3D point clouds as well as identifying andclassifying objects in digitized scenes. Our approach extends the use of semantic knowledge to allstages of the processing, including the guidance of the individual data-driven processing algorithms.The complete solution consists in a multi-stage iterative concept based on three factors: the modeledknowledge, the package of algorithms, and a classification engine. The goal of the present work isto select and guide algorithms following an adaptive and intelligent strategy for detecting objects inpoint clouds. Experiments with two case studies demonstrate the applicability of our approach. Thestudies were carried out on scans of the waiting area of an airport and along the tracks of a railway.In both cases the goal was to detect and identify objects within a defined area. Results show that ourapproach succeeded in identifying the objects of interest while using various data types
|
457 |
Robust visual detection and tracking of complex objects : applications to space autonomous rendez-vous and proximity operationsPetit, Antoine 19 December 2013 (has links) (PDF)
In this thesis, we address the issue of fully localizing a known object through computer vision, using a monocular camera, what is a central problem in robotics. A particular attention is here paid on space robotics applications, with the aims of providing a unified visual localization system for autonomous navigation purposes for space rendezvous and proximity operations. Two main challenges of the problem are tackled: initially detecting the targeted object and then tracking it frame-by-frame, providing the complete pose between the camera and the object, knowing the 3D CAD model of the object. For detection, the pose estimation process is based on the segmentation of the moving object and on an efficient probabilistic edge-based matching and alignment procedure of a set of synthetic views of the object with a sequence of initial images. For the tracking phase, pose estimation is handled through a 3D model-based tracking algorithm, for which we propose three different types of visual features, pertinently representing the object with its edges, its silhouette and with a set of interest points. The reliability of the localization process is evaluated by propagating the uncertainty from the errors of the visual features. This uncertainty besides feeds a linear Kalman filter on the camera velocity parameters. Qualitative and quantitative experiments have been performed on various synthetic and real data, with challenging imaging conditions, showing the efficiency and the benefits of the different contributions, and their compliance with space rendezvous applications.
|
458 |
An intuitive motion-based input model for mobile devicesRichards, Mark Andrew January 2006 (has links)
Traditional methods of input on mobile devices are cumbersome and difficult to use. Devices have become smaller, while their operating systems have become more complex, to the extent that they are approaching the level of functionality found on desktop computer operating systems. The buttons and toggle-sticks currently employed by mobile devices are a relatively poor replacement for the keyboard and mouse style user interfaces used on their desktop computer counterparts. For example, when looking at a screen image on a device, we should be able to move the device to the left to indicate we wish the image to be panned in the same direction.
This research investigates a new input model based on the natural hand motions and reactions of users. The model developed by this work uses the generic embedded video cameras available on almost all current-generation mobile devices to determine how the device is being moved and maps this movement to an appropriate action.
Surveys using mobile devices were undertaken to determine both the appropriateness and efficacy of such a model as well as to collect the foundational data with which to build the model. Direct mappings between motions and inputs were achieved by analysing users' motions and reactions in response to different tasks.
Upon the framework being completed, a proof of concept was created upon the Windows Mobile Platform. This proof of concept leverages both DirectShow and Direct3D to track objects in the video stream, maps these objects to a three-dimensional plane, and determines device movements from this data.
This input model holds the promise of being a simpler and more intuitive method for users to interact with their mobile devices, and has the added advantage that no hardware additions or modifications are required the existing mobile devices.
|
459 |
Novel image processing algorithms and methods for improving their robustness and operational performanceRomanenko, Ilya January 2014 (has links)
Image processing algorithms have developed rapidly in recent years. Imaging functions are becoming more common in electronic devices, demanding better image quality, and more robust image capture in challenging conditions. Increasingly more complicated algorithms are being developed in order to achieve better signal to noise characteristics, more accurate colours, and wider dynamic range, in order to approach the human visual system performance levels.
|
460 |
Robot Goalkeeper : A robotic goalkeeper based on machine vision and motor controlAdeboye, Taiyelolu January 2018 (has links)
This report shows a robust and efficient implementation of a speed-optimized algorithm for object recognition, 3D real world location and tracking in real time. It details a design that was focused on detecting and following objects in flight as applied to a football in motion. An overall goal of the design was to develop a system capable of recognizing an object and its present and near future location while also actuating a robotic arm in response to the motion of the ball in flight. The implementation made use of image processing functions in C++, NVIDIA Jetson TX1, Sterolabs’ ZED stereoscopic camera setup in connection to an embedded system controller for the robot arm. The image processing was done with a textured background and the 3D location coordinates were applied to the correction of a Kalman filter model that was used for estimating and predicting the ball location. A capture and processing speed of 59.4 frames per second was obtained with good accuracy in depth detection while the ball was well tracked in the tests carried out.
|
Page generated in 0.3452 seconds