
On-device mobile speech recognition

Mustafa, M. K. January 2016
Despite many years of research, speech recognition remains an active area of research in artificial intelligence. Currently, the most common commercial application of this technology on mobile devices uses a wireless client-server approach to meet the computational and memory demands of the speech recognition process. Unfortunately, such an approach is unlikely to remain viable when fully applied over the approximately 7.22 billion mobile phones currently in circulation. In this thesis we present an on-device speech recognition system. Such a system has the potential to completely eliminate the wireless client-server bottleneck. For the Voice Activity Detection (VAD) part of this work, this thesis presents two novel algorithms used to detect speech activity within an audio signal. The first algorithm is based on the Log Linear Predictive Cepstral Coefficients Residual signal (LLPCCRS). These LLPCCRS feature vectors are classified into voice signal and non-voice signal segments using a modified K-means clustering algorithm. This VAD algorithm is shown to perform better than a conventional approach based on energy frame analysis. The second algorithm is based on the Linear Predictive Cepstral Coefficients. It takes the frames of the speech signal with the minimum and maximum standard deviation as candidates for a linear cross-correlation against the rest of the frames in the audio signal. The cross-correlated frames are then classified using the same modified K-means clustering algorithm, yielding one cluster of speech frames and another of non-speech frames. This novel application of linear cross-correlation to linear predictive cepstral coefficient feature vectors provides a fast computation method for use on the mobile platform, as shown by the results presented in this thesis.
The speech recognition part of this thesis presents two novel neural network approaches to mobile speech recognition. Firstly, a recurrent neural network architecture is developed to accommodate the output of the VAD stage. Specifically, an Echo State Network (ESN) is used for phoneme-level recognition. The drawbacks and advantages of this method are explained further within the thesis. Secondly, a dynamic Multi-Layer Perceptron (MLP) approach is developed. This addresses the drawbacks of the ESN and provides a dynamic way of handling variability in speech signal length within its architecture. This novel dynamic MLP uses both the Linear Predictive Cepstral Coefficients (LPC) and the Mel Frequency Cepstral Coefficients (MFCC) as input features. A speaker-dependent approach is presented using the Centre for Spoken Language and Understanding (CSLU) database. The results differ markedly from those of conventional speech recognition approaches, because the LPC features achieve performance figures very close to those of the MFCC. A speaker-independent system, using the standard TIMIT dataset, is then implemented on the dynamic MLP for further confirmation of this. In this mode of operation the MFCC outperforms the LPC. Finally, all the results, with emphasis on the computation time of both novel neural network approaches, are compared directly to a conventional hidden Markov model on the CSLU and TIMIT standard datasets.
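
As a rough illustration of the kind of baseline the abstract compares against, the sketch below labels frames as speech or non-speech from their log energy using plain two-cluster k-means. It is not the thesis's method: the LLPCCRS features and the modified K-means algorithm are not described in the abstract, so ordinary log energy and standard k-means stand in for them. The function names `frame_log_energy` and `two_means_vad`, the frame length and the hop size are all assumptions made for this example.

```python
import math

def frame_log_energy(signal, frame_len=256, hop=128):
    """Return the log energy of each overlapping frame of a 1-D signal."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Small constant avoids log(0) on all-zero frames.
        energies.append(math.log(sum(x * x for x in frame) + 1e-12))
    return energies

def two_means_vad(features, n_iter=20):
    """Split scalar frame features into two clusters with plain k-means (k=2).

    Returns one boolean per frame. True marks the higher-valued cluster,
    which this sketch interprets as speech (an assumption, not a guarantee).
    """
    lo, hi = min(features), max(features)  # initial centroids
    labels = [False] * len(features)
    for _ in range(n_iter):
        # Assignment step: each frame joins the nearer centroid.
        labels = [abs(f - hi) < abs(f - lo) for f in features]
        speech = [f for f, s in zip(features, labels) if s]
        quiet = [f for f, s in zip(features, labels) if not s]
        if not speech or not quiet:
            break  # degenerate case: everything fell into one cluster
        # Update step: recompute both centroids.
        new_lo = sum(quiet) / len(quiet)
        new_hi = sum(speech) / len(speech)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return labels
```

Calling `two_means_vad(frame_log_energy(samples))` on a list of audio samples gives one label per frame. A feature-based variant would replace the scalar energy with a distance between feature vectors, but the assignment and update steps would keep the same shape.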

Cancellable biometric using matrix approaches

Mukhaiyar, Riki January 2015
Cancellable biometrics endeavour to hide the appearance of a biometric image in a transformed template, which prevents an outsider from recognising whom the biometric belongs to. Current research into cancellable biometric methodologies concentrates on the details of biometric traits. A drawback of this approach is that it cannot readily be implemented with other biometric technologies. To address this problem, this thesis contributes to the development of a novel concept for the feature transformation of biometric technology, especially for fingerprints, by utilizing several matrix operations to provide an alternative algorithm that allows multiple implementations of the cancellable system. The matrix operations transform the feature elements of the input fingerprint image into an irrevocable output fingerprint template while ignoring the biometric traits unique to fingerprints; thus, the cancellable algorithm can be implemented in different biometric technologies. The implementation offers a new concept in generating a cancellable template by considering a sequential procedure for the fingerprint processing, in order to allow the authentication process to succeed in authenticating an enquired input. For example, a region of interest (RoI) step is required to provide a square input so that the system can work in a matrix domain, whereas the input fingerprints are mostly rectangular. This thesis contributes a new approach to selecting a certain area of a fingerprint by utilizing the density of ridge frequency and orientation. The implementation of these two enhancement steps reduces the excision process of this significant region of the fingerprint by avoiding the involvement of a non-feature area.
Meanwhile, to avoid obtaining an unclassified fingerprint, this thesis offers a new approach to fingerprint image classification that relies on three requirements: the core point and its number, ridge frequency, and ridge direction; the tented arch (TA) is only an additional requirement. The proposed idea improves both the classification accuracy and the time consumption of the system. For example, the accuracy of fingerprint classification improves from less than 41 per cent to 86.48 per cent on average across all databases.

Maximum entropy covariance estimate for statistical pattern recognition

Thomaz, Carlos Eduardo January 2004
No description available.

Development and application of pattern recognition techniques

Kittler, J. January 1974
No description available.

Biologically-inspired motion detection and classification : human and machine perception

Laxmi, Vijay January 2003
No description available.

Visual analysis of viseme dynamics

Turkmani, Aseel January 2008
Face-to-face dialogue is the most natural mode of communication between humans. The combination of human visual perception of expression and perception of changes in intonation provides semantic information that communicates ideas, feelings and concepts. The realistic modelling of speech movements through automatic facial animation, while maintaining audio-visual coherence, is still a challenge in both the computer graphics and film industries.

On restricting the ambiguity in morphic images of words

Day, Joel D. January 2016
For alphabets $\Delta_1, \Delta_2$, a morphism $g : \Delta_1^* \to \Delta_2^*$ is ambiguous with respect to a word $u \in \Delta_1^*$ if there exists a second morphism $h : \Delta_1^* \to \Delta_2^*$ such that $g(u) = h(u)$ and $g \neq h$. Otherwise $g$ is unambiguous. Hence unambiguous morphisms are those whose structure is fully preserved in their morphic images. Since this concept has so far been considered only in the free monoid, the first part of this thesis considers natural extensions of ambiguity of morphisms to free groups. It is shown that, while the most straightforward generalization of ambiguity to a free group results in a trivial situation, in which all morphisms are (always) ambiguous, there exist meaningful extensions of (un)ambiguity which are non-trivial - most notably the concepts of (un)ambiguity up to inner automorphism and up to automorphism. A characterization is given of words in a free group for which there exists an injective morphism which is unambiguous up to inner automorphism in terms of fixed points of morphisms, replicating an existing result for words in the free monoid. A conjecture is presented which, if correct, is sufficient to show an equivalent characterization for unambiguity up to automorphism. A rather counterintuitive statement is also established: for some words, the only unambiguous (up to automorphism) morphisms are non-injective (or even periodic). The second part of the thesis addresses words for which all non-periodic morphisms are unambiguous. In the free monoid, these take the form of periodicity forcing words. It is shown using morphisms that there exist ratio-primitive periodicity forcing words over arbitrary alphabets, and furthermore that it is possible to establish large and varied classes in this way. It is observed that the set of periodicity forcing words is spanned by chains of words, where each word is a morphic image of its predecessor.
It is shown that the chains terminate in exactly one direction, meaning not all periodicity forcing words may be reached as the (non-trivial) morphic image of another. Such words are called prime periodicity forcing words, and some alternative methods for finding them are given. The free-group equivalent to periodicity forcing words - a special class of C-test words - is also considered, as well as the ambiguity of terminal-preserving morphisms with respect to words containing terminal symbols, or constants. Moreover, some applications to pattern languages and group pattern languages are discussed.
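
The definition of ambiguity in the free monoid can be made concrete with a small brute-force check. The sketch below is only an illustration of that definition, not a method from the thesis. It represents a morphism as a Python dictionary from symbols to strings, allows erasing morphisms (empty images), and compares morphisms only on the symbols that occur in the word `u`, which is an assumption made so that unused letters do not make every morphism trivially ambiguous. The names `morphisms_onto`, `apply_morphism` and `is_ambiguous` are hypothetical.

```python
def apply_morphism(g, u):
    """Return the morphic image g(u) by concatenating the images of the symbols of u."""
    return "".join(g[x] for x in u)

def morphisms_onto(u, w):
    """Yield every morphism h (dict: symbol -> image) with h(u) == w.

    Images may be empty strings, since free-monoid morphisms may erase symbols.
    """
    def extend(i, pos, h):
        if i == len(u):
            if pos == len(w):
                yield dict(h)
            return
        x = u[i]
        if x in h:
            # Image already chosen: it must occur at the current position.
            if w.startswith(h[x], pos):
                yield from extend(i + 1, pos + len(h[x]), h)
            return
        for end in range(pos, len(w) + 1):
            h[x] = w[pos:end]  # try each candidate image for x
            yield from extend(i + 1, end, h)
        del h[x]

    yield from extend(0, 0, {})

def is_ambiguous(g, u):
    """True if some morphism h != g (on the symbols of u) satisfies h(u) == g(u)."""
    g_on_u = {x: g[x] for x in set(u)}
    w = apply_morphism(g_on_u, u)
    return any(h != g_on_u for h in morphisms_onto(u, w))
```

For instance, with `g = {"a": "x", "b": "y"}` the call `is_ambiguous(g, "abab")` returns `True`, because the morphism sending `a` to the empty word and `b` to `xy` produces the same image `xyxy`, whereas `is_ambiguous(g, "abba")` returns `False`. The search is exponential in the worst case, so it is suitable only for short words.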

Intelligent methods for pattern recognition and optimisation

Petrov, Nedyalko January 2015
This dissertation presents and discusses the processes of investigation, implementation, testing, validation and evaluation of several computational intelligence-based systems for solving four large-scale real-world problems. In particular, two industrial problems from pattern recognition and two from process optimisation are studied, and intelligent methods to address them are proposed, developed and tested using real-world data. The first problem investigated is the application of an intelligent visual inspection system for classification of texture images. Two major approaches, incorporating supervised and unsupervised (without a priori knowledge) learning techniques, are considered and neural network based classifiers are trained. The focus is kept on the application of unsupervised non-linear dimensionality reduction techniques in combination with unsupervised classification methods. A number of experiments and simulations are performed to evaluate the proposed approaches and the results are critically compared. Next, a classification problem for timely and reliable identification of emitters of radar signals is investigated. A large data set containing a considerable amount of missing data is used. Several techniques for dealing with the incomplete data values are employed, including listwise deletion and multiple imputation. Methods incorporating neural network classifiers are studied and the proposed approaches are tested and validated over a number of simulations in the MATLAB environment. The third large-scale problem presented in this work addresses the need for optimisation of a thermodynamics first principle-based prediction model for simulation of a major purifying process used in British Petroleum (BP) refineries. A technique incorporating genetic algorithms is applied for optimising a number of the model parameters and for closing the gaps between the predicted and measured data.
Several functions and a graphical user interface (GUI) tool are implemented in MATLAB to assist the analysis, optimisation, testing and validation of the investigated model. Significant overall improvement in its prediction capabilities is achieved. The final problem, covered in this research work, is the need to improve the convergence rate of a computationally very expensive aerodynamic optimisation process. It is addressed by exploring some physics-grounded heuristics and presenting a novel intelligent approach for automated shape optimisation. A set of basis functions (for spanning the design space) is derived in such a way that they facilitate the work of a time-consuming and expensive computational fluid dynamics (CFD) optimisation process. Two MATLAB-based GUI tools are developed to support the calculation, exploration, testing and validation of the studied approach. Experiments for optimising real aircraft geometry are run on supercomputers through an industrial partner (AIRBUS Operations Ltd). The initial results show very promising opportunities for improving the convergence rate of the slow optimisation process.

Contributions to 3D-shape matching, retrieval and classification / Classification et recherche d’objets 3D

Tabia, Hedi 27 September 2011
Three-dimensional object representations have become an integral part of modern computer graphics applications such as computer-aided design, game development and audio-visual production. Meanwhile, 3D data has also become extremely common in fields such as computer vision, computational geometry, molecular biology and medicine. This is due to the rapid evolution of graphics hardware and software development, particularly the availability of low-cost 3D scanners, which has greatly facilitated 3D model acquisition, creation and manipulation.
Content-based search is a necessary solution for structuring and managing these multimedia data, and for browsing within these data collections. In this context, we are looking for a system that can automatically retrieve the 3D models visually similar to a requested 3D object. Existing solutions for 3D-shape retrieval and classification are highly sensitive to the large variability of shapes and are not robust to the affine or isometric transformations that an object can undergo. In this context, the aim of my research is to develop a system that can quickly and precisely retrieve 3D models visually similar to a 3D-object query. The system has to be robust to the non-rigid transformations that a shape can undergo. During my PhD thesis, we developed a novel approach to match 3D objects in the presence of non-rigid transformations and partially similar models. We proposed a new representation of 3D surfaces using 3D curves extracted around feature points. Tools from shape analysis of curves are applied to analyze and compare the curves of two 3D surfaces. We used belief functions as a fusion technique to define a global distance between 3D objects. We also experimented with this technique in retrieval and classification tasks. Finally, we proposed the use of Bag-of-Features techniques in 3D-object retrieval and classification.

A general state-based temporal pattern recognition

Zheng, Aihua January 2012
Time-series and state-sequences are ubiquitous patterns in temporal logic and are widely used to represent temporal data in data mining. Generally speaking, there are three known choices for the time primitive: points, intervals, or both points and intervals. In this thesis, a formal characterization of time-series and state-sequences is presented for both complete and incomplete situations, where a state-sequence is defined as a list of sequential data validated on the corresponding time-series. In addition, subsequence matching is addressed to associate the state-sequences, where both non-temporal aspects and rich temporal aspects, including temporal order, temporal duration and temporal gap, should be taken into account. Firstly, based on the typed point-based time-elements and time-series, a formal characterization of time-series and state-sequences is introduced for both complete and incomplete situations, where a state-sequence is defined as a list of sequential data validated on the corresponding time-series. A time-series is formalized as a tetrad (T, R, Tdur, Tgap), which denotes, respectively: the temporal order of time-elements; the temporal relationship between time-elements; the temporal duration of each time-element; and the temporal gap between each adjacent pair of time-elements. Secondly, benefiting from the formal characterization of time-series and state-sequences, a general similarity measurement (GSM) that takes into account both non-temporal and rich temporal information, including temporal order as well as temporal duration and temporal gap, is introduced for subsequence matching. This measurement is general enough to subsume most of the popular existing measurements as special cases. In particular, a new conception of temporal common subsequence is proposed. Furthermore, a new LCS-based algorithm named Optimal Temporal Common Subsequence (OTCS), which takes into account rich temporal information, is designed.
The experimental results on 6 benchmark datasets demonstrate the effectiveness and robustness of GSM and its new case OTCS. Compared with binary-value distance measurements, GSM can distinguish between the distance caused by different states in the same operation; compared with the real-penalty distance measurements, it can filter out the noise that may push the similarity into abnormal levels. Finally, two case studies are investigated for temporal pattern recognition: basketball zone-defence detection and video copy detection. In the case of basketball zone-defence detection, the computational technique and algorithm for detecting zone-defence patterns from basketball videos is introduced, where the Laplacian Matrix-based algorithm is extended to take into account the effects of zoom and of a single defender's translation in zone-defence graph matching, and a set of character-angle based features is proposed to describe the zone-defence graph. The experimental results show that the approach explored is useful in helping the coach of the defensive side check whether the players are keeping to the correct zone-defence strategy, as well as in detecting the strategy of the opposing side. It can describe the structural relationship between defender-lines for basketball zone-defence, and has a robust performance in both simulation and real-life applications, especially when disturbances exist. In the case of video copy detection, a framework for subsequence matching is introduced. A hybrid similarity framework addressing both non-temporal and temporal relationships between state-sequences, represented by bipartite graphs, is proposed. The experimental results using real-life video databases demonstrate that the proposed similarity framework is robust to state alignment with different numbers and different values, and to various reorderings including inversion and crossover.
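
Since OTCS is described as LCS-based, a sketch of the classical longest common subsequence (LCS) computation may help show the starting point. The code below is the standard textbook dynamic programme applied to sequences of state labels. It deliberately ignores temporal duration and temporal gap, so it is neither the GSM nor the OTCS algorithm from the thesis. The function names and the normalisation by the longer sequence length are assumptions chosen for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two state sequences.

    Classical dynamic programme using two rows of the table at a time.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one element
        prev = curr
    return prev[len(b)]

def lcs_similarity(a, b):
    """LCS length normalised by the longer sequence, giving a value in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))
```

For the state sequences `["idle", "run", "jump", "run"]` and `["run", "jump", "idle", "run"]` the common subsequence `run, jump, run` has length 3, so `lcs_similarity` returns 0.75. A temporal variant along the lines the abstract describes would also have to weigh how long each state lasts and how far apart matched states are, which this order-only measure cannot see.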
