  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Contrastive topographic models

Osindero, Simon Kayode January 2004
No description available.

Multi-objective genetic programming optimal search for feature extraction

Zhang, Yang January 2006
No description available.

Optimising multimodal fusion for biometric identification systems

John George, Jacqueline January 2004
No description available.

Tone mapping for high dynamic range images

Duan, Jiang January 2006
No description available.

Biologically inspired speaker verification

Tashan, T. January 2012
Speaker verification is an active research problem that has been addressed using a variety of different classification techniques. However, in general, methods inspired by the human auditory system tend to show better verification performance than other methods. In this thesis three biologically inspired speaker verification algorithms are presented. The first is a vowel-dependent speaker verification method that uses a modified Self Organising Map (SOM) algorithm. For each speaker, a seeded SOM is trained to produce representative Discrete Fourier Transform (DFT) models of three vowels from a spoken input using positive samples only. This SOM training is performed both during a registration phase and during each subsequent verification attempt. Speaker verification is achieved by computing the Euclidean distance between the registration and verification SOM trained weight sets. An analysis of the comparative system performance when using DFT input vectors, as well as Linear Prediction Code (LPC) spectrum and Mel Frequency Cepstrum Coefficients (MFCC) alternative input features indicates that the DFT spectrum outperforms both MFCC and LPC features. The algorithm was evaluated using 50 speakers from the Centre for Spoken Language Understanding (CSLU2002) speaker verification database. The second method consists of two neural network stages. The first stage is the modified SOM which now operates as a vowel clustering stage that filters the input speech data and separates it into three sets of vowel information. The second stage then contains three Multi Layer Perceptron (MLP) networks; each acting as a distinct vowel verifier. Adding this second stage allows the use of negative sample training. The input of each MLP network is the respective filtered output vowel data from the first stage. The DFT spectrum is again used as the input feature vector due to its optimal performance in the first algorithm. 
The overall system was evaluated using the same dataset as used in the first algorithm, showing improved verification performance when compared to the algorithm without using the MLP stage. The third biologically plausible method is a speaker verification algorithm that uses a positive-sample-only trained self organising map composed of spiking neurons. The architecture of the system is inspired by the biomechanical mechanism of the human auditory system which converts speech into electrical spikes inside the cochlea. A spike-based rank order coding input feature vector is proposed that is designed to be representative of the real biological spike trains found within the human auditory nerve. The Spiking Self Organising Map (SSOM) updates its winner neuron only when its activity exceeds a specified threshold. The algorithm is evaluated using the same 50 speaker dataset from the CSLU2002 speaker verification database and the results indicate that the SSOM verification performance is comparable to the non-spike based SOM. Finally, a new speech detection technique to detect speech activity within speech signals is also proposed. This novel technique uses the linear correlation coefficient (Pearson coefficient). The correlation is calculated in the frequency domain between neighbouring frames of DFT spectrum feature vectors. By summing the correlation coefficients within a sliding window over time, a correlation envelope is produced, which can be used to identify speech activity. The proposed technique is compared with a conventional energy frame analysis method and shows greater robustness against changes in speech volume level. A comparison of the two techniques, in terms of speaker verification application performance, is presented in Appendix A using 240 speech waveforms from the CSLU2002 speaker verification database.
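The speech detection idea in this abstract (correlating neighbouring DFT frames and summing the coefficients over a sliding window) can be illustrated with a minimal sketch. The function names, the window length and the toy spectra below are assumptions made for illustration; they are not taken from the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat spectrum carries no shape to correlate
    return cov / math.sqrt(var_x * var_y)


def correlation_envelope(frames, window=3):
    """Sum neighbouring-frame correlations inside a sliding window.

    `frames` is a list of magnitude spectra, one per time frame
    (hypothetical input layout). `window` is an assumed parameter.
    """
    corr = [pearson(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    return [sum(corr[i:i + window]) for i in range(len(corr) - window + 1)]


# Toy spectra with the same shape but different loudness.
frames = [[1, 2, 3, 2], [2, 4, 6, 4], [3, 6, 9, 6], [1, 2, 3, 2]]
envelope = correlation_envelope(frames, window=3)
```

Because the Pearson coefficient is unchanged when a spectrum is multiplied by a positive constant, the envelope in this sketch does not depend on the overall signal level, which is consistent with the robustness to volume changes reported in the abstract.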

Mathematical techniques for shape modelling in computer graphics : a distance-based approach

Tsoubelis, Dimitrios January 1995
This research is concerned with shape modelling in computer graphics. The dissertation provides a review of the main research topics and developments in shape modelling and discusses current visualisation techniques required for the display of the models produced. In computer graphics, surfaces are normally defined using analytic functions. Geometry, however, supplies many shapes without providing their analytic descriptions. These are defined implicitly through fundamental relationships between primitive geometrical objects. Transferring this approach to computer graphics opens new directions in shape modelling by enabling the definition of new objects or supplying a rigorous alternative to analytical definitions of objects with complex analytical descriptions. We review, in this dissertation, relevant works in the area of implicit modelling. Based on our observations on the shortcomings of these works, we develop an implicit modelling approach which draws on a seminal technique in this area: the distance-based object definition. We investigate the principles, potential and applications of this technique both in conceptual terms (modelling aspects) and on technical merit (visualisation issues). This is the context of this PhD research. The conceptual and technological frameworks developed are presented in terms of a comprehensive investigation of an object's constituent primitives and modelling constraints on the one hand, and software visualisation platforms on the other. Finally, we adopt a critical perspective on our work to discuss possible directions for further improvement and exploitation of the modelling approach we have developed.
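As a generic illustration of a distance-based object definition, the sketch below defines a surface as the set of points at a fixed distance from a primitive, here a line segment. The choice of primitive, the function names and the sign convention are assumptions made for this example and are not drawn from the dissertation.

```python
import math


def distance_to_segment(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a and b (3D)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Project p onto the line through a and b, then clamp to the segment.
        t = sum(ap[i] * ab[i] for i in range(3)) / length_sq
        t = max(0.0, min(1.0, t))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def implicit_value(p, a, b, radius):
    """f(p) = distance(p, segment) - radius.

    f(p) == 0 on the surface, f(p) < 0 inside, f(p) > 0 outside.
    """
    return distance_to_segment(p, a, b) - radius


a, b = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
on_surface = implicit_value((0.0, 1.0, 0.0), a, b, radius=1.0)
inside = implicit_value((0.0, 0.0, 0.0), a, b, radius=1.0)
outside = implicit_value((3.0, 0.0, 0.0), a, b, radius=1.0)
```

No analytic surface equation is written down here: the shape (a capsule) follows entirely from the distance relationship between a point and the primitive.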

View synthesis for depth from motion 3D X-ray imaging

Liu, Yong January 2009
The depth from motion or kinetic depth X-ray imaging (KDEX) technique is designed to enhance the luggage screening at airport checkpoints. The technique requires multiple views of the luggage to be obtained from an arrangement of linear X-ray detector arrays. This research investigated a solution to the unique problems defined when considering the possibility of replacing some of the X-ray sensor views with synthetic images. If sufficiently high quality synthetic images can be generated then intermediary X-ray sensors can be removed to minimise the hardware requirements and improve the commercial viability of the KDEX technique. Existing image synthesis algorithms are developed for visible light images. Due to fundamental differences between visible light and X-ray images, those algorithms are not directly applicable to the X-ray scenario. The conditions imposed by the X-ray images have instigated the original research and novel algorithm development and experimentation that form the body of this work. A voting based dual criteria multiple X-ray images synthesis algorithm (V-DMX) is proposed to exploit the potential of two matching criteria and information contained in a sequence of images. The V-DMX algorithm is divided into four stages. The first stage is to aggregate matching cost among input images. Subsequently, a novel voting approach is developed for electing the 'best' disparity prior to generation of synthetic pixels. A void filling routine is applied to complete the synthetic image generation. A series of experiments, using real acquired images, investigated the fidelity of the synthesised images resulting from application of the V-DMX algorithm as a function of several parameters: number of input images, matching criterion, method of handling multiple images and X-ray beam separation. The performance measure is based on counting the number of pixel errors in the synthetic images relative to the ground truth images. 
The V-DMX employs the widely adopted sum of squared differences (SSD) criterion and a novel criterion, which is derived from the laminography technique, termed laminography intensity (LamI). SSD is shown experimentally to have poor performance when the image contains repeating features, discontinuities and overlapping regions. While the overall performance of the LamI is found to be weaker than SSD, LamI consistently outperformed SSD in discontinuity and overlapping regions. This has spurred the use of LamI as a complement to SSD. Integration of the two criteria has demonstrably produced better results than using either criterion alone. Limitations of the algorithm are assessed by increasing the angular separation between X-ray beams used to produce the perspective X-ray images. The resultant image fidelity degraded as the angular separation increased. This result was expected because the increase in angular separation meant a concomitant increase in images' dissimilarity and disparity window. Empirical evidence demonstrated that synthetic images may be satisfactorily produced by processing images produced by X-ray beams separated by angular increments up to 6°. This result is based on comparing the algorithm performance for four beam separations, which are 4°, 6°, 8° and 10°. This finding reveals that, for example, a 32-view X-ray scanner with 1° beam separation may be scaled down to a 7-view system with at least the same angular coverage. The encouraging result has formed a basis for further research to extend the current algorithmic approach to the use of dual-energy X-ray data. The practical performance of the algorithm will be evaluated by conducting human factors investigation in collaboration with the US Department of Homeland Security.
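The SSD criterion mentioned in this abstract can be sketched as a one-dimensional disparity search along a scanline. This is a generic illustration of SSD matching only; it is not the V-DMX algorithm and includes neither the LamI criterion nor the voting stage. The function names, window size and pixel values are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length pixel windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_disparity(left, right, x, half_window, max_disparity):
    """Find the shift d that minimises SSD between the window centred on
    left[x] and the window centred on right[x - d].

    Returns (disparity, cost). Assumes x + half_window < len(left).
    """
    size = 2 * half_window + 1
    ref = left[x - half_window:x + half_window + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        start = x - d - half_window
        if start < 0:
            break  # candidate window would fall off the scanline
        cost = ssd(ref, right[start:start + size])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


# Toy scanlines: the feature in `left` appears two pixels earlier in `right`.
left = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
right = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]
disparity, cost = best_disparity(left, right, x=5, half_window=1, max_disparity=3)
```

If the scanline contained a repeated copy of the same feature, several shifts would give equally low costs, which is one way to see why SSD alone struggles with repeating features.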

Driver assistance using automated symbol and text recognition

Greenhalgh, Jack January 2015
This thesis introduces several novel methods for the detection and recognition of both text and symbols in road signs and road markings. Firstly, a method for the automatic detection and recognition of symbol-based road signs is presented. This algorithm detects candidate regions as maximally stable extremal regions (MSER), due to their robustness to lighting variations. Candidate regions are verified and classified during a recognition stage, which uses a cascade of Random Forests trained on histogram of oriented gradient (HOG) features. All training data used in this process is synthetically generated from template images available from an online database, eliminating the need for real footage data. The method retains a high accuracy, even at high vehicle speeds, and can operate under a range of weather conditions. The algorithm runs in real-time, at a processing rate of 20 frames per second, and recognises all road signs currently in use in the UK. Comparative results are provided to validate the performance. Secondly, a method is proposed for the automatic detection and recognition of text in road signs. Search regions for road sign candidates are defined through exploitation of scene structure. A large number of candidate regions are located through a combination of MSER and hue, saturation, value (HSV) thresholding, which are then reduced through the analysis of temporal and structural features. The recognition stage of the algorithm then aims to interpret the text contained within the candidate regions. Text characters are first detected as MSERs, which are then grouped into lines and interpreted using optical character recognition (OCR). Temporal fusion is applied to the text results across consecutive frames, which vastly improves performance. Comparative analysis is provided to validate the performance of the method, and an overall F-measure of 0.87 is achieved.
Finally, a method for the automatic detection and recognition of symbols and text painted on the road surface is presented. Candidates for symbols and text characters are detected in an inverse perspective mapping (IPM) transformed version of the frame, to remove the effect of perspective distortion. Detected candidate regions are then divided into symbols and words, so that they can be recognised using separate classification stages. Temporal fusion is applied to both words and symbols in order to improve performance. The performance of the proposed method is validated using a challenging dataset of videos, and provides overall F-measures of 0.85 and 0.91 for text characters and symbols, respectively.
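The F-measures quoted in this abstract are presumably the usual balanced F-measure, the harmonic mean of precision and recall; the abstract does not state the exact definition, so that is an assumption. A minimal sketch follows, with made-up detection counts that are not results from the thesis.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    if detected == 0 or actual == 0:
        return 0.0  # nothing detected, or nothing to detect
    precision = true_positives / detected
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
score = f_measure(true_positives=8, false_positives=2, false_negatives=2)
```

With these illustrative counts, precision and recall are both 0.8, so the F-measure is also 0.8. A single score of this kind penalises a detector that trades many false alarms for few misses, or the reverse.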

Own-group biases in face and voice recognition : perceptual and social-cognitive influences

Cooper, R. E. January 2015
Own-race faces are generally recognised more accurately than other-race faces (Meissner & Brigham, 2001). Two major theories attempt to explain the own-race bias (ORB) and similar own-group biases: social-cognitive and perceptual expertise theories. Perceptual theories expect that increased experience recognising own-race faces leads to a more effective processing style tuned to these faces (e.g., Stahl, Wiese & Schweinberger, 2008). Social-cognitive theories point to categorisation of other-group members at the expense of processing their individual identity, and greater motivation to attend to in-group members (Bernstein, Young & Hugenberg, 2007; Levin, 2001). Both of these theoretical accounts can be used to predict own-group biases in voice processing. An own-sex bias in voice processing was tested (experiment 2.1), and an own-accent bias was found in recognition memory (experiment 3.1). Contributions from perceptual expertise and social-cognitive mechanisms to this bias were then studied. By manipulating the supposed social power of speakers, support for the social-cognitive view was found (experiment 3.2). Event-related potentials (ERPs) revealed further support. This was because an own-accent bias was found in ERP measures of voice discrimination, but not in the ability to discriminate between voices (while ignoring them). Support for the social-cognitive view was limited when studying faces, however. There was no evidence of own-group bias for physically similar face groups (experiments 3.3, 5.1 and 6.1). Evidence from eye-tracking found that attention was directed towards the most diagnostic face areas for individual recognition (experiment 5.2). Knowledge of diagnostic areas is best explained by perceptual expertise. Importantly, however, unbiased participants adjusted their viewing behaviour according to the most diagnostic areas of each race.
Finally, analysis of saccades revealed greater difficulty ignoring own-race faces (experiment 6.2), although there was no such bias for physically similar social groups (experiment 6.1). The implications of these findings and directions for future research are discussed.

Speaker verification using voice source parameters

Neocleous, Andreas January 2000
No description available.
