61 
Human detection and pose estimation for motion picture logging and visualisation. Wu, Z. January 2012
This thesis contributes to the research area of Computer-Vision-based human motion analysis. It investigates the techniques associated with this area and proposes a human motion analysis system that parses images or videos (image sequences) to estimate human poses. A human motion analysis system combining a novel colour-to-greyscale converter, an optimised Histogram of Oriented Gradients (HOG) human body detector, and an improved Generalised Distance Transform and Orientation Maps (GDT&OM) pose estimator is built to perform keyframe extraction. The novel colour-to-greyscale conversion method, which converts RGB images to chroma-edge-enhanced greyscale images by employing density-based colour clustering and spring-system-based multidimensional scaling, is shown to be superior to other methods such as Color2Grey and Ren's method. Its weakness is that it remains parameter-dependent and does not perform well on some images. We improve the Histogram of Oriented Gradients detector by employing a modified training scheme and pre-processed data; compared with the original HOG scheme, the improved detector achieves a similar true-detection rate at a much lower false-detection rate. We discuss the GDT&OM method and extend the original GDT&OM human detector into a human pose estimator that uses the results of human detection. We also investigate a pose estimation method based on the locations and orientations of human body parts, under the assumption that body parts can be accurately located. We then integrate all these methods to build a keyframe extraction system that is more intelligent than conventional approaches, as it is designed to select frames that represent the content of a video. Finally, we apply our methods to build a video logging system that automatically records the actions in gymnastics videos according to the actions displayed. Both systems perform well for a small set of motion categories.
However, they are object-dependent systems that require users to manually select target objects, and their performance is limited by the human body detector and pose estimator.
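The core of a HOG-style detector is a magnitude-weighted histogram of unsigned gradient orientations, computed per image cell. The sketch below is a minimal illustration of that single-cell step, not the thesis's optimised detector; the 8x8 cell size, 9-bin count and hard bin assignment (rather than bilinear vote interpolation) are simplifying assumptions.

```python
import numpy as np

def cell_hog(cell, n_bins=9):
    """Histogram of gradient orientations for one image cell.

    Uses unsigned orientations (0-180 degrees), each pixel voting
    with its gradient magnitude, then L2-normalises the histogram.
    """
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    hist = np.zeros(n_bins)
    bin_width = 180.0 / n_bins
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[int(a // bin_width) % n_bins] += m    # hard bin assignment
    return hist / (np.linalg.norm(hist) + 1e-12)

# A horizontal intensity ramp has a purely horizontal gradient,
# so all the mass lands in the first orientation bin.
cell = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
h = cell_hog(cell)
```

In the full descriptor, such cell histograms are concatenated over overlapping blocks and normalised per block before being fed to a linear classifier.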

62 
Spectral representation for matching and recognition. Haseeb, Muhammad. January 2013
In this thesis, we aim to use spectral graph theory to develop a framework for solving problems in computer vision. Graph spectral methods use the eigenvalues and eigenvectors of the adjacency matrix, or of the closely related Laplacian matrix. We develop four methods using spectral techniques: (1) we use a Hermitian property matrix for the point-pattern matching problem; (2) we use the coefficients of symmetric polynomials to cluster similar human poses, using the skeletal representation acquired from the Microsoft Kinect; (3) we use the coefficients of the elementary symmetric polynomials to make the directions of the eigenvectors of a pair of proximity matrices consistent with each other for the problem of correspondence matching; (4) we use the commute-time embedding to construct a 3D shape descriptor for the purpose of 3D shape classification. In Chapter 3 we address the problem of correspondence matching. We extend the Laplacian matrix to the complex domain by constructing a Hermitian property matrix from the spatial locations of the 2D feature points extracted from a pair of images and the angular information associated with these feature points. The Hermitian property matrix is constructed in a way that reflects the Laplacian matrix. The complex eigenvectors of the Hermitian matrix are then used to find the correspondences between pairs of points across the two images. We embed the complex eigenvectors of the Hermitian property matrix in the iterative alignment EM algorithm developed by Carcassoni and Hancock to make it robust to rotation, noise and point-position jitter. Experimental results on both synthetic and real-world data are presented. Chapter 4 develops a clustering method using four different types of feature vectors constructed from the complex coefficients of the elementary symmetric polynomials.
These polynomials are computed from the eigenvalues and the complex eigenvectors of a Hermitian property matrix. The feature vectors are embedded into a pattern-space using Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) to cluster similar human poses acquired using the Microsoft Kinect device for the Xbox 360. The Hermitian property matrix is constructed from the lengths of the limbs and the angles subtended by each pair of limbs, using the three-dimensional skeletal data produced by the Kinect device. The given skeleton is converted to its equivalent line graph to compute the angles between pairs of limbs, and the joint locations are used to compute the limb lengths. In Chapter 5, we describe a method to correct the signs of the eigenvectors of the proximity matrix for the problem of correspondence matching. The signs of the eigenvectors of a proximity matrix are not unique, and they play an important role in computing the correspondences between two sets of feature points. We use the coefficients of the elementary symmetric polynomials to make the directions of the eigenvectors of the two proximity matrices consistent with each other. Chapter 6 describes a 3D shape descriptor that is robust to changes in pose and topology. The descriptor is based on the D2 shape descriptor developed by Osada et al., which is essentially the frequency distribution of the Euclidean distance between randomly selected points on the surface of the 3D shape; we use the commute-time distance instead of the Euclidean distance. A new and completely unsupervised mesh segmentation algorithm is also proposed, based on the commute-time embedding of the mesh and k-means clustering of the embedded mesh vertices.
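The commute-time distance used in Chapter 6 can be computed from the spectrum of the combinatorial graph Laplacian: CT(u, v) = vol(G) * sum over non-zero eigenvalues of (phi_k(u) - phi_k(v))^2 / lambda_k. A minimal sketch on a toy unweighted graph (not the thesis's mesh pipeline, which samples surface points):

```python
import numpy as np

def commute_times(A):
    """Pairwise commute-time distances from an adjacency matrix.

    Uses the eigendecomposition of the combinatorial Laplacian
    L = D - A; the graph is assumed connected, so exactly one
    eigenvalue is zero and is skipped.
    """
    d = A.sum(axis=1)
    L = np.diag(d) - A
    lam, phi = np.linalg.eigh(L)
    vol = d.sum()                       # volume = sum of degrees
    n = len(d)
    ct = np.zeros((n, n))
    for k in range(n):
        if lam[k] > 1e-10:              # skip the zero eigenvalue
            diff = phi[:, k][:, None] - phi[:, k][None, :]
            ct += diff ** 2 / lam[k]
    return vol * ct

# Path graph 1-2-3: commuting between the two end points costs most.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
ct = commute_times(A)
```

Unlike the Euclidean distance, the commute time averages over all paths between two vertices, which is what makes the resulting descriptor robust to pose and topology changes.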

63 
Inverse rendering of faces with a 3D morphable model. Aldrian, Oswald. January 2012
In this thesis, we present a complete framework for inverse rendering of faces with a 3D Morphable Model. By decomposing the image formation process into geometric and photometric parts, we are able to state the problem as a multilinear system which can be solved accurately and efficiently. As we treat each contribution as independent, the objective function is convex in the parameters and a globally optimal solution can be found. We start by recovering 3D shape using a novel algorithm which incorporates generalisation errors of the model obtained from empirical measurements. The algorithm is extended so that it can efficiently deal with mixture distributions. We then describe three methods to recover facial texture and, for the second and third, diffuse lighting, specular reflectance and camera properties from a single image. These methods make increasingly weak assumptions and can all be solved in a linear fashion. We further modify our framework so that it accounts for global illumination effects. This is achieved by incorporating statistical models for ambient occlusion and bent normals into the image formation model. We show that solving for the ambient occlusion and bent normal parameters as part of the fitting process improves the accuracy of the estimated texture map and illumination environment. We present results on challenging data, rendered under complex natural illumination with both specular reflectance and occlusion of the illumination environment. We evaluate our findings on publicly available datasets, where we obtain state-of-the-art results. Finally, we present a practical method to synthesise a larger population from a small training set, and show how the new instances can be used to build a flexible PCA model.
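The core linear step in this kind of convex fitting is recovering model coefficients by least squares regularised against the statistical prior of the morphable model. The sketch below is a generic ridge-style formulation, not the thesis's exact cost function; the names B (basis of model modes), sigma (per-mode standard deviations) and reg are illustrative.

```python
import numpy as np

def fit_coefficients(B, sigma, y, reg=1.0):
    """Solve  min_c ||B c - y||^2 + reg * ||c / sigma||^2  in closed form.

    B     : basis matrix (observations x modes), e.g. PCA shape modes
    sigma : per-mode standard deviations from the statistical model
    y     : observed measurements (mean-subtracted)

    The prior term keeps the coefficients statistically plausible,
    and the problem stays convex with a unique global optimum.
    """
    P = np.diag(reg / sigma ** 2)
    return np.linalg.solve(B.T @ B + P, B.T @ y)

# Toy check: with a very weak prior and noise-free data,
# the true coefficients are recovered.
rng = np.random.default_rng(1)
B = rng.standard_normal((20, 4))
c_true = np.array([1.0, -2.0, 0.5, 3.0])
y = B @ c_true
c_hat = fit_coefficients(B, np.ones(4), y, reg=1e-9)
```

Because each photometric contribution is treated independently, a sequence of such linear solves replaces the non-convex joint optimisation used by analysis-by-synthesis fitters.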

64 
Graph generative models from information theory. Han, Lin. January 2012
Generative models are commonly used in statistical pattern recognition to describe the probability distributions of patterns in a vector space. In recent years, sustained by the wide range of mathematical tools available in vector spaces, many algorithms for constructing such generative models have been developed. Compared with the advanced state of generative models for vectors, generative models for graphs have seen much less progress. In this thesis, we aim to solve the problem of constructing generative models for graphs using information theory. Given a set of sample graphs, the generative model we aim to construct should not only capture the structural variation of the sample graphs, but also allow new graphs to be generated that share similar properties with the originals. We pose the problem of constructing a generative model for graphs as that of constructing a supergraph structure for the graphs. In Chapter 3, we describe a method of constructing a supergraph-based generative model from a set of sample graphs. By adopting the a posteriori probability developed for the graph matching problem, we obtain a probabilistic framework which measures the likelihood of the sample graphs given the structure of the supergraph and the correspondence information between the nodes of the sample graphs and those of the supergraph. The supergraph we seek is the one which maximizes the likelihood of the sample graphs. The supergraph is represented by its adjacency matrix, and we develop a variant of the EM algorithm to locate the adjacency matrix that maximizes this likelihood. Experimental evaluations demonstrate that the constructed supergraph performs well in classifying graphs. In Chapter 4, we develop graph characterizations that can be used to measure the complexity of graphs.
The first is the von Neumann entropy of a graph associated with its normalized Laplacian matrix. This characterization is defined by the eigenvalues of the normalized Laplacian matrix, and is therefore a graph-invariant characterization. By applying some transformations, we also develop a simplified form of the von Neumann entropy which can be expressed in terms of the node degree statistics of the graph. Experimental results reveal the effectiveness of the two graph characterizations. Our third contribution is presented in Chapter 5, where we use the graph characterizations developed in Chapter 4 to measure supergraph complexity, and develop a novel framework for learning a supergraph using the minimum description length criterion. We combine the Jensen-Shannon kernel with our supergraph construction, and this provides us with a way of measuring graph similarity. Moreover, we develop a method of sampling new graphs from the supergraph. The supergraph presented in this chapter is a generative model which can fulfil the tasks of graph classification, graph clustering and generating new graphs. We experiment with both the COIL and "Toy" datasets to illustrate the utility of our generative model. Finally, in Chapter 6, we propose a method of selecting prototype graphs of the most appropriate size from a set of candidate prototypes. The method works by partitioning the sample graphs into two parts and approximating their hypothesis spaces using partition functions. From the partition functions, the mutual information between the two sets is defined, and the prototype which gives the highest mutual information is selected.
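The von Neumann entropy characterization can be sketched directly: scale the normalized Laplacian so its eigenvalues sum to one and apply the Shannon formula, with 0 ln 0 taken as 0. The degree-based simplification shown below follows from the quadratic approximation -x ln x ~ x(1 - x); the exact scaling conventions here are illustrative and may differ from the thesis's derivation.

```python
import numpy as np

def von_neumann_entropy(A):
    """Entropy -sum l ln l over scaled eigenvalues l of L_norm / n,
    where L_norm = I - D^{-1/2} A D^{-1/2} has trace n, so the
    scaled eigenvalues sum to one."""
    d = A.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(d)) - D_isqrt @ A @ D_isqrt
    lam = np.linalg.eigvalsh(L_norm) / len(d)
    lam = lam[lam > 1e-12]              # 0 ln 0 := 0
    return float(-(lam * np.log(lam)).sum())

def entropy_degree_approx(A):
    """Quadratic approximation in node-degree form:
    1 - 1/n - (2/n^2) * sum over edges {u,v} of 1/(d_u d_v)."""
    d = A.sum(axis=1)
    n = len(d)
    us, vs = np.triu_indices(n, k=1)
    edge_term = sum(1.0 / (d[u] * d[v])
                    for u, v in zip(us, vs) if A[u, v] > 0)
    return 1.0 - 1.0 / n - 2.0 * edge_term / n ** 2

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # path graph
```

The degree form needs only local degree statistics, which is what makes it cheap enough to use repeatedly inside the minimum description length supergraph learning of Chapter 5.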

65 
High-level activity learning and recognition in structured environments. Greenall, John Patrick. January 2012
Automatic recognition of events in video is an immensely challenging problem. If solved, the number of potential domains in which such a system could be deployed is vast and growing, including traffic monitoring, surveillance, security, elderly care and semantic video search, to name but a few. Much prior research in the area has focused on producing a solution tailored to one of these applications, applying the methods most appropriate given the constraints of the target domain. For the moment, this remains to some extent the only practical way to approach the problem. The aim of this thesis is to build a high-level framework for event recognition which is in the main generic and widely transferable, yet allows domain-appropriate elements to be incorporated. A detector is constructed for low-level events based on dense extraction of Histograms of Optical Flow. This descriptor has only recently been adopted by the event detection community, and as such there are aspects of the features which have not been optimized. This thesis performs extensive experimentation on normalization schemes and finds that the strategy most widely in use is suboptimal compared with one of the proposed alternatives. The detector is then trained on a challenging real-world domain to run in a sliding-window fashion on continuous video input. A high-level model which exploits temporal relations between different event types is constructed. The model is designed with transferability and computational tractability in mind. Several methods are benchmarked for learning the distributions over time differences between pairs of events. Three different connection strategies are proposed and evaluated for creating a tree-structured prior that permits fast, exact inference. An efficient iterative optimization scheme is presented for handling scenarios which contain unknown numbers of event instances.
Finally, the model is extended in a Conditional Random Field framework that allows weights to be learned to balance the response from independent detectors with the pairwise temporal relationships.
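The normalization schemes compared in the experiments are, in the HOG/HOF literature, variants applied to each descriptor block; the thesis's specific finding is not reproduced here, but the standard candidates can be sketched as follows.

```python
import numpy as np

def l2_norm(v, eps=1e-12):
    """Plain L2 normalisation of a descriptor block."""
    return v / np.sqrt((v ** 2).sum() + eps)

def l2_hys(v, clip=0.2, eps=1e-12):
    """L2-Hys: L2-normalise, clip large components, renormalise.
    Clipping limits the influence of any single dominant bin."""
    v = np.clip(l2_norm(v, eps), 0.0, clip)
    return l2_norm(v, eps)

def l1_sqrt(v, eps=1e-12):
    """L1-sqrt: L1-normalise, then take the element-wise square root,
    which compresses large values (a Hellinger-style mapping)."""
    return np.sqrt(v / (np.abs(v).sum() + eps))

v = np.array([3.0, 4.0])   # toy two-bin histogram block
```

All three produce unit-L2-norm vectors for non-negative histograms; they differ only in how strongly large bins are suppressed, which is exactly the axis along which the benchmarked strategies trade off.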

66 
Intelligent interface agents for biometric applications. Mavity, Nick Jeremy. January 2005
No description available.

67 
Gait recognition based on 2D and 3D imaging features. Zulcaffle, Tengku Mohd Afendi. January 2016
This thesis focuses on person identification using gait features. The gait features applied in this thesis are acquired from both 2D RGB and 3D Time of Flight (ToF) camera systems. The research has three main parts: (i) lateral-view gait period estimation using a single RGB camera system; (ii) the development of a foundational research framework and novel features for frontal-view gait recognition using a ToF camera system; and (iii) the development of a novel classification method using the proposed 3D depth features. In the lateral-view gait period estimation algorithm, a new gait cycle feature and new minimum- and maximum-point detection methods are proposed. The experimental results show that the proposed method outperforms the previous features and methods in the literature. The second part of the research deals with the development of a novel framework for frontal-view gait recognition using a 3D ToF camera. The 3D framework involves: the development of a new dataset of 3D gait image sequences acquired from a frontal-view ToF camera system; a new human silhouette extraction algorithm; a frame selection method based on a new gait cycle detection algorithm; and eight gait depth image representations. Overall, the experimental results show that the proposed gait depth image representations produce better results than the previous methods. In the third part of the research, a novel classification method is proposed based on the above gait depth image representations. The proposed classification method enhances the novel gait depth image representations and outperforms its counterparts. It can be concluded that the proposed method, based on the depth information acquired from the Time of Flight camera, is suitable for gait recognition over a short period of time.
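Gait period estimation of the kind described in part (i) is commonly posed as finding the dominant repetition rate of a per-frame silhouette signal, such as the silhouette width over time. The sketch below uses autocorrelation on synthetic data; it is a generic baseline, not the thesis's minimum/maximum-point method, and the 25-frame cycle is an invented example.

```python
import numpy as np

def estimate_period(signal):
    """Estimate the period of a roughly periodic 1D signal as the lag
    maximising its autocorrelation, skipping the zero-lag peak by
    starting the search after the autocorrelation first goes negative."""
    x = np.asarray(signal, float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    neg = np.where(ac < 0)[0]
    start = int(neg[0]) if len(neg) else 1
    return int(np.argmax(ac[start:]) + start)

# Synthetic silhouette-width signal with a 25-frame gait cycle.
t = np.arange(200)
widths = 10.0 + np.sin(2 * np.pi * t / 25)
period = estimate_period(widths)
```

A detector based on explicit minimum and maximum points, as proposed in the thesis, can localise individual cycle boundaries rather than only the average period, which matters for the frame selection step in part (ii).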

68 
Visual-only person and word recognition: from lip motion dynamics. Brown, Paul C. January 2017
The thesis presents novel contributions to the use of lip motion dynamics as a standalone modality for robust person identification and word recognition. The contributions target the key areas of visual feature extraction, video temporal dynamics and training. The novel feature contribution applies the magnitude spectra of the two-dimensional Fast Fourier Transform (Mag2DFFT) as a robust visual feature by virtue of its phase invariance. It outperforms benchmark two-dimensional Discrete Cosine Transform (2D-DCT), two-dimensional Discrete Wavelet Transform (2D-DWT) and multichannel Gabor image-based techniques. It delivers over 3% person identification improvement on the CMU PIE, VidTIMIT and XM2VTS audio-visual databases, and up to 22% relative improvement in visual-only word recognition on the GRID corpus. The novel temporal-dynamics contribution uses the Longest Matching Segment (LMS) method to encode the full video dynamics of a training video, delivering person identification on a Vector Quantization (VQ) model comparable with full-face recognition when combined with a dynamic version of the novel feature set, and over 7% word recognition accuracy on a Hidden Markov Model (HMM). The training contribution combines Gabor feature compression based on a modulus response with a novel formulation of video-based spatial sub-banding using the Posterior Union Model (VPUM), to tackle weakly constrained face recognition of partially occluded and multi-view images as a prelude to a lip-only application.
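The phase invariance underlying the Mag2DFFT feature can be verified directly: the magnitude of the 2D DFT is unchanged by circular translation of the image, since a shift only rotates the phase of each frequency component. A toy demonstration (the random image stands in for a cropped lip region; this is not the thesis's full feature pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 16))                       # stand-in for a lip region
shifted = np.roll(img, shift=(3, 5), axis=(0, 1))  # circular translation

mag = np.abs(np.fft.fft2(img))                   # magnitude spectrum feature
mag_shifted = np.abs(np.fft.fft2(shifted))
# The shift changes only the phase of each component,
# so the two magnitude spectra are identical.
```

This robustness to small registration errors in the mouth region is what makes a magnitude-only feature attractive compared with transforms whose coefficients mix magnitude and position.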

69 
Speech scrambling and synchronization. French, R. C. January 1973
No description available.
