381

50,000 Tiny Videos: A Large Dataset for Non-parametric Content-based Retrieval and Recognition

Karpenko, Alexandre 22 September 2009
This work extends the tiny image data-mining techniques developed by Torralba et al. to videos. A large dataset of over 50,000 videos was collected from YouTube. This is the largest user-labeled research database of videos available to date. We demonstrate that a large dataset of tiny videos achieves high classification precision in a variety of content-based retrieval and recognition tasks using very simple similarity metrics. Content-based copy detection (CBCD) is evaluated on a standardized dataset, and the results are applied to related video retrieval within tiny videos. We use our similarity metrics to improve text-only video retrieval results. Finally, we apply our large labeled video dataset to various classification tasks. We show that tiny videos are better suited for classifying activities than tiny images. Furthermore, we demonstrate that classification can be improved by combining the tiny images and tiny videos datasets.
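The abstract's core claim is that very simple similarity metrics over tiny, fixed-size video descriptors suffice for non-parametric (nearest-neighbour) retrieval. A minimal sketch of that idea, with toy 4-dimensional features and labels standing in for downsampled video frames (the actual features, metric, and dataset in the thesis may differ):

```python
# Hedged sketch: non-parametric retrieval over "tiny" video descriptors.
# Each video is reduced to a small fixed-size feature vector; retrieval is a
# nearest-neighbour search under a very simple metric (sum of squared
# differences). Features and labels below are illustrative only.

def ssd(a, b):
    """Sum of squared differences: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve(query, database, k=3):
    """Return the labels of the k most similar tiny videos."""
    ranked = sorted(database, key=lambda item: ssd(query, item["feature"]))
    return [item["label"] for item in ranked[:k]]

# Toy database: 4-dimensional "tiny" features standing in for video frames.
db = [
    {"label": "running",  "feature": [0.9, 0.8, 0.1, 0.0]},
    {"label": "swimming", "feature": [0.1, 0.2, 0.9, 0.8]},
    {"label": "running",  "feature": [0.8, 0.9, 0.2, 0.1]},
    {"label": "dancing",  "feature": [0.5, 0.5, 0.5, 0.5]},
]

print(retrieve([0.85, 0.85, 0.1, 0.05], db, k=2))  # ['running', 'running']
```

Classification in this setting is just a vote over the retrieved labels, which is why a larger dataset directly improves precision.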
382

Facial Feature Point Detection

Chen, Fang 06 December 2011
Facial feature point detection is a key issue in facial image processing. One main challenge is the variation of facial structure due to expressions. This thesis aims to explore more accurate and robust facial feature point detection algorithms that can facilitate research on facial image processing, in particular facial expression analysis. The thesis introduces a facial feature point detection system in which Multilinear Principal Component Analysis is applied to extract highly descriptive features of facial feature points. In addition, to improve the accuracy and efficiency of the system, a skin-color-based face detection algorithm is studied. Experimental results indicate that the system is effective in detecting 20 facial feature points in frontal faces with different expressions, and that it achieves higher accuracy than the state-of-the-art method BoRMaN.
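The skin-colour face detection step mentioned above can be illustrated with a common explicit RGB skin rule. The thresholds below are one widely cited rule for skin pixels under daylight illumination, not necessarily the one used in the thesis:

```python
# Hedged sketch of skin-colour-based face localisation: classify pixels with
# an explicit RGB rule, then take the bounding box of skin pixels as the face
# search region. Thresholds are a commonly cited rule, used here for
# illustration only.

def is_skin(r, g, b):
    """Classify an RGB pixel (0-255 channels) as skin via explicit thresholds."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)

def skin_bounding_box(pixels):
    """Bounding box (top, left, bottom, right) of skin pixels in a 2-D RGB grid."""
    coords = [(y, x) for y, row in enumerate(pixels)
              for x, rgb in enumerate(row) if is_skin(*rgb)]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))

skin = (200, 150, 120)   # plausible skin tone
bg = (30, 80, 200)       # blue background
image = [[bg, bg, bg, bg],
         [bg, skin, skin, bg],
         [bg, skin, skin, bg],
         [bg, bg, bg, bg]]
print(skin_bounding_box(image))  # (1, 1, 2, 2)
```

Restricting feature-point search to this box is what buys the efficiency gain the abstract refers to.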
385

Astrometry.net: Automatic Recognition and Calibration of Astronomical Images

Lang, Dustin 03 March 2010
We present Astrometry.net, a system for automatically recognizing and astrometrically calibrating astronomical images using the information in the image pixels alone. The system is based on the geometric hashing approach in computer vision: we use the geometric relationships between low-level features (stars and galaxies), which are relatively indistinctive, to create geometric features that are distinctive enough that we can recognize images covering less than one-millionth of the area of the sky. The geometric features are used to rapidly generate hypotheses about the location (the pointing, scale, and rotation) of an image on the sky. Each hypothesis is then evaluated in a Bayesian decision theory framework to ensure that most correct hypotheses are accepted while false hypotheses are almost never accepted. The feature-matching process is accelerated by a new fast and space-efficient kd-tree implementation. The Astrometry.net system is available via a web interface, and the software is released under an open-source license. It is being used by hundreds of individual astronomers and several large-scale projects, so we have at least partially achieved our goal of helping "to organize, annotate and make searchable all the world's astronomical information."
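The geometric-hashing idea can be sketched concretely. In the scheme described for Astrometry.net, four stars (a "quad") are encoded by the positions of two of them in a coordinate frame defined by the other two, which makes the code invariant to translation, rotation, and scale. The simplified version below (which omits the real system's symmetry-breaking constraints and index structure) is an illustration, not the production algorithm:

```python
# Hedged sketch of a quad's geometric hash code: the two most widely
# separated stars A, B define a frame mapping A -> (0,0) and B -> (1,1);
# the frame coordinates of the remaining two stars form the code, which is
# invariant to translation, rotation, and scale of the image.

import itertools

def quad_code(stars):
    """Return a similarity-invariant code for four (x, y) star positions."""
    pts = [complex(x, y) for x, y in stars]
    # pick the most widely separated pair as the frame (A, B)
    i, j = max(itertools.combinations(range(4), 2),
               key=lambda ij: abs(pts[ij[0]] - pts[ij[1]]))
    a, b = pts[i], pts[j]
    rest = [p for k, p in enumerate(pts) if k not in (i, j)]
    # similarity transform sending A -> 0 and B -> 1+1j, i.e. the point (1, 1)
    code = [(p - a) / (b - a) * (1 + 1j) for p in rest]
    # canonical ordering so the code does not depend on star labelling
    code.sort(key=lambda z: (z.real, z.imag))
    return [round(v, 6) for z in code for v in (z.real, z.imag)]

stars = [(0, 0), (10, 0), (3, 2), (7, 5)]
# same field translated by (5, 7), rotated 90 degrees, and scaled by 2
moved = [(5 - 2 * y, 7 + 2 * x) for x, y in stars]
print(quad_code(stars) == quad_code(moved))  # True
```

Because the code survives pointing, rotation, and scale changes, it can be used as a lookup key into a precomputed index of sky quads, and each hit becomes a location hypothesis to verify.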
386

Statistical Independence for Classification of High Dimensional Data

Bressan, Marco José Miguel 26 March 2003
No description available.
387

Computational framework for the white point interpretation based on nameability

Tous Terrades, Francesc 28 July 2006
In this work we present a framework for white point estimation of images under uncalibrated conditions, where multiple interpreted solutions can be considered. We propose to use a visual cue that has been shown to be related to colour constancy: colour matching. The colour matching process is guided by the introduction of semantic information about the image content; that is, we introduce high-level information about the colours we expect to find in the images. Combining these two ideas, colour matching and semantic information, with existing computational colour constancy approaches, we propose a white point estimation method for uncalibrated conditions that delivers multiple solutions, each corresponding to a different interpretation of the colours in a scene. Selecting multiple solutions can recover more information about the scene than classical colour constancy algorithms, which normally commit to a single solution. The multiple solutions are weighted by the degree of matching between the colours in the image and the semantic information introduced. Finally, we show that the feasible set of solutions can be reduced to a smaller, more significant set with a semantic interpretation.
Our study is framed within a global image annotation project that aims to obtain descriptors representing an image; in this work we focus on descriptors of the scene illuminant. We define two sets of conditions for this project: (a) calibrated conditions, when we have some information about the acquisition process, and (b) uncalibrated conditions, when we know nothing about the acquisition process. Although we focus on the uncalibrated case, for calibrated conditions we also propose a colour constancy method that introduces a relaxed grey-world assumption to produce a reduced feasible set of solutions. This method performs comparably to existing methods while reducing the size of the feasible set obtained.
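The grey-world assumption the abstract builds on has a very compact form: if the average scene reflectance is achromatic, the per-channel mean of the image estimates the illuminant (white point). A minimal sketch of that baseline, with a von Kries-style correction step (the thesis's relaxed variant keeps a set of candidate illuminants near this estimate rather than committing to one; only the classic point estimate is shown):

```python
# Hedged sketch of the classic grey-world white point estimate: the mean RGB
# of the image is taken as the illuminant, and each channel is rescaled so
# that this estimated white maps to neutral grey. Pixel values are
# illustrative.

def grey_world_white_point(pixels):
    """Estimate the illuminant RGB as the per-channel mean of the image."""
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    return (r, g, b)

def correct(pixels, white):
    """Von Kries-style correction: scale channels so the white point
    becomes neutral grey."""
    wr, wg, wb = white
    m = (wr + wg + wb) / 3.0
    return [(p[0] * m / wr, p[1] * m / wg, p[2] * m / wb) for p in pixels]

# A grey scene under a reddish illuminant: every pixel's red channel doubled.
scene = [(200, 100, 100), (160, 80, 80), (100, 50, 50)]
wp = grey_world_white_point(scene)
corrected = correct(scene, wp)
print(wp)  # the red component is twice the green/blue: a reddish estimate
```

After correction every pixel in this toy scene is neutral, which is exactly the failure-prone commitment the relaxed version avoids by retaining multiple weighted candidates.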
388

Vision-Inertial SLAM using Natural Features in Outdoor Environments

Asmar, Daniel January 2006
Simultaneous Localization and Mapping (SLAM) is a recursive probabilistic inferencing process used for robot navigation when Global Positioning Systems (GPS) are unavailable. SLAM operates by building a map of the robot's environment while concurrently localizing the robot within this map. The ultimate goal of SLAM is to operate anywhere using the environment's natural features as landmarks. Such a goal is difficult to achieve for several reasons. Firstly, different environments contain different types of natural features, each exhibiting large variance in shape and appearance. Secondly, objects look different from different viewpoints, and it is therefore difficult to always recognize them. Thirdly, in most outdoor environments it is not possible to predict the motion of a vehicle using wheel encoders because of errors caused by slippage. Finally, the design of a SLAM system to operate in a large-scale outdoor setting is in itself a challenge.

The above issues are addressed as follows. Firstly, a camera is used to recognize the environmental context (e.g., indoor office, outdoor park) by analyzing the holistic spectral content of images of the robot's surroundings. A type of feature (e.g., trees for a park) is then chosen for SLAM that is likely observable in the recognized setting. A novel tree detection system is introduced, based on perceptually organizing the content of images into quasi-vertical structures and marking those structures that intersect ground level as tree trunks. Secondly, a new tree recognition system is proposed, based on extracting Scale Invariant Feature Transform (SIFT) features on each tree trunk region and matching trees in feature space. Thirdly, dead-reckoning is performed via an Inertial Navigation System (INS), bounded by non-holonomic constraints; an INS is insensitive to slippage and varying ground conditions. Finally, the developed computer vision and inertial systems are integrated within the framework of an Extended Kalman Filter into a working Vision-INS SLAM system, named VisSLAM.

VisSLAM is tested on data collected during a real test run in an outdoor unstructured environment. Three test scenarios are proposed, ranging from semi-automatic detection, recognition, and initialization to a fully automated SLAM system. The first two scenarios verify the presented inertial and computer vision algorithms in the context of localization, where results indicate accurate vehicle pose estimation for the majority of the journey. The final scenario evaluates the application of the proposed systems for SLAM, where results indicate successful operation for a long portion of the vehicle's journey. Although the scope of this thesis is an outdoor park setting using tree trunks as landmarks, the developed techniques lend themselves to other environments using different natural objects as landmarks.
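The Extended Kalman Filter cycle at the heart of a system like VisSLAM alternates dead-reckoned prediction with landmark-based correction. A one-dimensional linear sketch of that cycle (the real filter is multi-dimensional and nonlinear, and all numbers here are illustrative):

```python
# Hedged 1-D sketch of the Kalman predict/update cycle that an EKF-based
# SLAM system generalises: inertial dead-reckoning grows the pose
# uncertainty; a landmark observation shrinks it.

def predict(x, p, velocity, dt, q):
    """Propagate state x and variance p with dead-reckoned motion."""
    x = x + velocity * dt   # motion model: constant velocity
    p = p + q               # process noise inflates uncertainty
    return x, p

def update(x, p, z, r):
    """Correct the state with a direct observation z of the position."""
    k = p / (p + r)         # Kalman gain: trust in prediction vs. sensor
    x = x + k * (z - x)     # move the estimate toward the measurement
    p = (1 - k) * p         # uncertainty shrinks after the update
    return x, p

x, p = 0.0, 1.0
x, p = predict(x, p, velocity=1.0, dt=1.0, q=0.5)   # uncertainty grows
x, p = update(x, p, z=1.2, r=0.5)                   # observation corrects it
print(round(x, 3), round(p, 3))  # 1.15 0.375
```

In full SLAM the state vector also contains every landmark position, so each tree observation corrects both the vehicle pose and the map.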
389

Simultaneous Pose and Correspondence Problem for Visual Servoing

Chiu, Raymond January 2010
Pose estimation is a common problem in computer vision. The pose is the combination of the position and orientation of an object relative to some reference coordinate system, and the pose estimation problem involves determining the pose of an object from one or more images of it. This problem often arises in robotics: the pose of an object must be determined before a robot can manipulate it. In particular, this research focuses on pose estimation for initialization of position-based visual servoing. A closely related problem is the correspondence problem: finding features in the image of an object that can be identified with the same features in a model of the object. Solving for pose without known correspondence is also referred to as the simultaneous pose and correspondence problem, and it is much more difficult than solving for pose with known correspondence. This thesis explores a number of methods to solve the simultaneous pose and correspondence problem, with a focus on a method called SoftPOSIT. SoftPOSIT exploits the fact that the pose is easily determined once correspondence is known. It first produces an initial guess of the pose and uses it to determine a correspondence; with that correspondence, it determines a new pose. Because the new pose is assumed to be a better estimate, a better correspondence can in turn be determined, and the process repeats until the algorithm converges to a pose and correspondence estimate. If this estimate is not good enough, the algorithm is restarted with a new initial guess. An improvement is made to this algorithm: an early termination condition is added to detect cases where the algorithm is unlikely to converge to a good pose. This reduces the runtime by as much as 50% and improves the success rate of the algorithm by approximately 5%.
The proposed solution is tested and compared with the RANSAC method and simulated annealing in a simulation environment. It is shown that the proposed solution has the potential for use in commercial environments for pose estimation.
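The alternating scheme described above can be illustrated with a deliberately simplified toy, not the actual SoftPOSIT algorithm: "pose" is reduced to a 2-D translation, correspondence is nearest-neighbour matching under the current pose, and the two estimates refine each other. The early-exit check mirrors the thesis's improvement of abandoning a run whose error stops improving:

```python
# Hedged toy version of the pose/correspondence alternation (real SoftPOSIT
# uses softassign plus the POSIT pose algorithm for full 6-DOF pose; here the
# pose is just a translation, which keeps the loop structure visible).

def estimate_pose_and_matches(model, image, guess, max_iter=50, tol=1e-6):
    tx, ty = guess
    prev_err = float("inf")
    for _ in range(max_iter):
        # correspondence step: match each image point to the nearest
        # model point under the current pose estimate
        matches = []
        for ix, iy in image:
            m = min(model,
                    key=lambda p: (p[0] + tx - ix) ** 2 + (p[1] + ty - iy) ** 2)
            matches.append(m)
        # pose step: translation best aligning matched pairs (mean offset)
        tx = sum(ix - mx for (ix, _), (mx, _) in zip(image, matches)) / len(image)
        ty = sum(iy - my for (_, iy), (_, my) in zip(image, matches)) / len(image)
        err = sum((mx + tx - ix) ** 2 + (my + ty - iy) ** 2
                  for (ix, iy), (mx, my) in zip(image, matches))
        if prev_err - err < tol:   # early termination: no longer improving
            break
        prev_err = err
    return (tx, ty), err

model = [(0, 0), (1, 0), (0, 1)]
image = [(2, 3), (3, 3), (2, 4)]        # the model translated by (2, 3)
pose, err = estimate_pose_and_matches(model, image, guess=(1.8, 2.6))
print(pose)  # recovers (2.0, 3.0)
```

With a poor initial guess the loop can stall on a wrong correspondence, which is exactly when the early-termination test pays off: the run is abandoned cheaply and restarted from a new guess.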
390

Novel Skeletal Representation for Articulated Creatures

Brostow, Gabriel Julian 12 April 2004
This research examines an approach for capturing 3D surface and structural data of moving articulated creatures. Given the task of non-invasively and automatically capturing such data, a methodology and the associated experiments are presented that apply to multi-view videos of the subject's motion. Our thesis states: a functional structure and the time-varying surface of an articulated creature subject are contained in a sequence of its 3D data. A functional structure is one example of the possible arrangements of internal mechanisms (kinematic joints, springs, etc.) that is capable of performing the motions observed in the input data. Volumetric structures are frequently used as shape descriptors for 3D data, and the capture of such data is being facilitated by developments in multi-view video and range scanning, extending to subjects that are alive and moving. In this research, we examine vision-based modeling and the related representation of moving articulated creatures using Spines. We define a Spine as a branching axial structure representing the shape and topology of a 3D object's limbs, and capturing the limbs' correspondence and motion over time. The Spine concept builds on skeletal representations often used to describe the internal structure of an articulated object and its significant protrusions. Our representation provides enhancements over a 3D skeleton: temporally consistent limb hierarchies that contain correspondence information about real motion data. We present a practical implementation that approximates a Spine's joint probability function to reconstruct Spines for synthetic and real subjects that move. In general, our approach combines the objectives of generalized cylinders, 3D scanning, and markerless motion capture to generate baseline models from real puppets, animals, and human subjects.
