Global ETD Search

111	Bird song recognition with hidden Markov models Van der Merwe, Hugo Jacobus 03 1900 Thesis (MScEng (Electrical and Electronic Engineering))--Stellenbosch University, 2008. / Automatic bird song recognition and transcription is a relatively new field. Reliable automatic recognition systems would be of great benefit to further research in ornithology and conservation, as well as commercially in the very large birdwatching subculture. This study investigated the use of Hidden Markov Models and duration modelling for bird call recognition. Through use of more accurate duration modelling, very promising results were achieved with feature vectors consisting of only pitch and volume. An accuracy of 51% was achieved for 47 calls from 39 birds, with the models typically trained from only one or two specimens. The ALS pitch tracking algorithm was adapted to bird song to extract the pitch. Bird song synthesis was employed to subjectively evaluate the features. Compounded Selfloop Duration Modelling was developed as an alternative duration modelling technique. For long durations, this technique can be more computationally efficient than Ferguson stacks. The application of approximate string matching to bird song was also briefly considered. Keywords: Bird song recognition; Hidden Markov models; Computer sound processing; Tune recognition; Dissertations -- Electronic engineering; Theses -- Electronic engineering; Electrical and Electronic Engineering
112	An HMM-based automatic singing transcription platform for a sight-singing tutor Krige, Willie 03 1900 Thesis (MScEng (Electrical and Electronic Engineering))--Stellenbosch University, 2008. / A singing transcription system transforming acoustic input into MIDI note sequences is presented. The transcription system is incorporated into a pronunciation-independent sight-singing tutor system, which provides note-level feedback on the accuracy with which each note in a sequence has been sung. Notes are individually modeled with hidden Markov models (HMMs) using untuned pitch and delta-pitch as feature vectors. A database consisting of annotated passages sung by 26 soprano subjects was compiled for the development of the system, since no existing data was available. Various techniques that allow efficient use of a limited dataset are proposed and evaluated. Several HMM topologies are also compared, in analogy with approaches often used in the field of automatic speech recognition. Context-independent note models are evaluated first, followed by the use of explicit transition models to better identify boundaries between notes. A non-repetitive grammar is used to reduce the number of insertions. Context-dependent note models are then introduced, followed by context-dependent transition models. The aim in introducing context-dependency is to improve transition region modeling, which in turn should increase note transcription accuracy, but also improve the time-alignment of the notes and the transition regions. The final system is found to be able to transcribe sung passages with around 86% accuracy. Finally, a note-level sight-singing tutor system based on the singing transcription system is presented and a number of note sequence scoring approaches are evaluated. Keywords: Signal processing -- Digital techniques; Sight-singing; Hidden Markov models; Dissertations -- Electronic engineering; Theses -- Electronic engineering; Electrical and Electronic Engineering
113	Deep Neural Networks for Large Vocabulary Handwritten Text Recognition / Réseaux de Neurones Profonds pour la Reconnaissance de Texte Manuscrit à Large Vocabulaire Bluche, Théodore 13 May 2015 The automatic transcription of text in handwritten documents has many applications, from automatic document processing to indexing and document understanding. One of the most popular approaches nowadays consists in scanning the text line image with a sliding window, from which features are extracted and modeled by Hidden Markov Models (HMMs). Associated with neural networks, such as Multi-Layer Perceptrons (MLPs) or Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs), and with a language model, these models yield good transcriptions. On the other hand, in many machine learning applications, including speech recognition and computer vision, deep neural networks consisting of several hidden layers have recently produced a significant reduction of error rates. In this thesis, we have conducted a thorough study of different aspects of optical models based on deep neural networks in the hybrid neural network / HMM scheme, in order to better understand and evaluate their relative importance. First, we show that deep neural networks produce consistent and significant improvements over networks with one or two hidden layers, independently of the kind of neural network (MLP or RNN) and of input (handcrafted features or pixels). Then, we show that deep neural networks with pixel inputs compete with those using handcrafted features, and that depth plays an important role in reducing the performance gap between the two kinds of inputs, supporting the idea that deep neural networks effectively build hierarchical and relevant representations of their inputs, and that features are automatically learnt on the way. Despite the dominance of LSTM-RNNs in the recent literature on handwriting recognition, we show that deep MLPs achieve comparable results. Moreover, we evaluated different training criteria. With sequence-discriminative training, we report improvements for MLP/HMMs similar to those observed in speech recognition.
We also show that the Connectionist Temporal Classification framework is especially suited to RNNs. Finally, the dropout technique for regularizing neural networks was recently applied to LSTM-RNNs. We tested its effect at different positions in LSTM-RNNs, thus extending previous work, and we show that its position relative to the recurrent connections is important. We conducted the experiments on three public databases, representing two languages (English and French) and two periods, using different kinds of neural network inputs: handcrafted features and pixels. We validated our approach by taking part in the HTRtS contest in 2014, where we placed second. The results of the final systems presented in this thesis, namely MLPs and RNNs with handcrafted feature or pixel inputs, are comparable to the state of the art on Rimes and IAM. Moreover, the combination of these systems outperformed all published results on the three databases considered. Keywords: Pattern Recognition; Hidden Markov Models; Neural Networks; Handwriting Recognition
114	RASTREAMENTO DE AGROBOTS EM ESTUFAS AGRÍCOLAS USANDO MODELOS OCULTOS DE MARKOV: Comparação do desempenho e da correção dos algoritmos de Viterbi e Viterbi com janela de observações deslizante [Tracking agrobots in agricultural greenhouses using hidden Markov models: a comparison of the performance and correctness of the Viterbi algorithm and Viterbi with a sliding observation window] Alves, Roberson Junior Fernandes 17 September 2015 Developing mobile, autonomous agrobots for greenhouses requires procedures that allow the robot to localize itself and be tracked. The tracking problem can be modeled as finding the most likely sequence of states in a hidden Markov model whose states indicate positions on an occupancy grid. This sequence can be estimated with Viterbi's algorithm. However, the processing time and memory consumed by this algorithm grow with the dimensions of the grid and the duration of tracking, which can limit its use for tracking agrobots in greenhouses.
With this in mind, this work presents a tracking procedure that uses two approximate implementations of Viterbi's algorithm, called Viterbi-JD (Viterbi's algorithm with a sliding window) and Viterbi-JD-MTE (Viterbi's algorithm with a sliding window over a hidden Markov model with a sparse transition matrix). The experimental results show that the time and memory performance of tracking with these two approximate implementations is significantly better than that of tracking with the original Viterbi algorithm. The reported tracking hypothesis is suboptimal compared with the hypothesis generated by Viterbi, but the error does not grow substantially. The experiments were performed using simulated RSSI (Received Signal Strength Indicator) data. Keywords: mobile robot; greenhouse; hidden Markov models
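Entry 114 frames tracking as finding the most likely state sequence of a hidden Markov model over an occupancy grid. As a rough illustration only, the sketch below implements the standard full-history Viterbi algorithm in Python. It is not the thesis's code and does not implement the Viterbi-JD or Viterbi-JD-MTE approximations; the function name, the three-cell corridor and all probabilities are assumptions made up for the example.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observation list obs.

    start_p[s]    : probability of starting in state s
    trans_p[r][s] : probability of moving from state r to state s
    emit_p[s][o]  : probability of observing symbol o while in state s
    """
    def log(p):
        # Work in log space to avoid underflow; log(0) becomes -inf.
        return math.log(p) if p > 0 else float("-inf")

    n = len(start_p)
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n)]
    back = []  # one list of back-pointers per time step after the first

    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            # Best predecessor of state s at this time step.
            best = max(range(n), key=lambda r: prev[r] + log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][o]))
        back.append(ptr)

    # Trace the back-pointers from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical 1-D corridor with three grid cells; the robot stays or moves right.
start = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.6, 0.4, 0.0],
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]]
# A noisy sensor reports the true cell with probability 0.8.
emit = [[0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8]]

print(viterbi([0, 0, 1, 2, 2], start, trans, emit))
```

The `back` table holds one row of pointers per observation and each step compares every pair of states, which is the growth in memory and time with grid size and tracking duration that the abstract cites as the motivation for windowed and sparse-matrix variants.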
115	Dynamic Programming with Multiple Candidates and its Applications to Sign Language and Hand Gesture Recognition Yang, Ruiduo 07 March 2008 Dynamic programming has been widely used to solve various kinds of optimization problems. In this work, we show that two crucial problems in video-based sign language and gesture recognition systems can be attacked by dynamic programming with additional multiple observations. The first problem occurs at the higher (sentence) level. Movement epenthesis [1] (me), i.e., the necessary but meaningless movement between signs, can result in difficulties in modeling and scalability as the number of signs increases. The second problem occurs at the lower (feature) level. Ambiguity of hand detection and occlusion will propagate errors to the higher level. We construct a novel framework that can handle both of these problems based on a dynamic programming approach. In the past, me has only been modeled explicitly. Our proposed method handles me in a dynamic programming framework where we model it implicitly. We call this the enhanced Level Building (eLB) algorithm. This formulation also allows the incorporation of statistical grammar models such as bigrams and trigrams. Another dynamic programming process, which handles the problem of selecting among multiple hand candidates, is included at the feature level. This differs from most previous approaches, where a single observation is used. We also propose a grouping process that can generate multiple, overlapping hand candidates. We demonstrate our ideas on three continuous American Sign Language (ASL) data sets and one hand gesture data set. The ASL data sets include one with a simple background, one with a simple background but with the signer wearing short-sleeved clothes, and one with a complex and changing background. The gesture data set contains color-gloved gestures with a complex background.
With the automatically chosen me score, the performance loss is within 5% of that obtained with the manually chosen me score. At the low level, we first over-segment each frame to get a list of segments. Then we use a greedy method to group the segments based on different grouping cues. We also show that the performance loss is within 5% when we compare this method with manually selected feature vectors. Keywords: Sign Language Recognition; Movement Epenthesis; Hand Segmentation; Hidden Markov Models; Dynamic Time Warping; Level Building; American Studies; Arts and Humanities
116	Text Augmentation: Inserting markup into natural language text with PPM Models Yeates, Stuart Andrew January 2006 This thesis describes a new optimisation and new heuristics for automatically marking up XML documents using PPM models, and CEM, a Java implementation of them. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods. Four corpora are discussed, including the bibliography corpus of 14682 bibliographies laid out in seven standard styles using the BibTeX system and marked up in XML with every field from the original BibTeX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists' Communique corpus and the Reuters' corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory. A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation is examined using the four corpora. Keywords: Markup; Text Augmentation; Textual Analysis; Hidden Markov Models; HMM; PPM; Viterbi Search; Part-Of-Speech Tagging; XML; Metadata
117	Video Analysis of Mouth Movement Using Motion Templates for Computer-based Lip-Reading Yau, Wai Chee, waichee@ieee.org January 2008 This thesis presents a novel lip-reading approach to classifying utterances from video data, without evaluating voice signals. This work addresses two important issues: the efficient representation of mouth movement for visual speech recognition and the temporal segmentation of utterances from video. The first part of the thesis describes a robust movement-based technique used to identify mouth movement patterns while uttering phonemes. This method temporally integrates the video data of each phoneme into a 2-D grayscale image called a motion template (MT). This is a view-based approach that implicitly encodes the temporal component of an image sequence into a scalar-valued MT. The data size was reduced by extracting image descriptors such as Zernike moments (ZM) and discrete cosine transform (DCT) coefficients from the MT. Support vector machines (SVMs) and hidden Markov models (HMMs) were used to classify the feature descriptors. A video speech corpus of 2800 utterances was collected for evaluating the efficacy of MT for lip-reading. The experimental results demonstrate the promising performance of MT in mouth movement representation. The advantages and limitations of MT for visual speech recognition were identified and validated through experiments. A comparison between ZM and DCT features indicates that the classification accuracy of the two methods is comparable when there is no relative motion between the camera and the mouth. However, ZM is resilient to rotation of the camera and continues to give good results under rotation, whereas DCT is sensitive to rotation. DCT features are demonstrated to have better tolerance to image noise than ZM. The results also demonstrate a slight improvement of 5% using SVM as compared to HMM.
The second part of this thesis describes a video-based temporal segmentation framework that detects key frames corresponding to the start and stop of utterances in an image sequence, without using the acoustic signals. This segmentation technique integrates mouth movement and appearance information. The efficacy of this technique was tested through experimental evaluation and satisfactory performance was achieved. This segmentation method has been demonstrated to perform efficiently for utterances separated by short pauses. Potential applications for lip-reading technologies include human computer interface (HCI) for mobility-impaired users, defense applications that require voice-less communication, lip-reading mobile phones, in-vehicle systems, and improvement of speech-based computer control in noisy environments. Keywords: video analysis; visual speech recognition; motion template; Zernike moments; discrete cosine transform; support vector machines; hidden Markov models
118	Face Recognition : A Single View Based HMM Approach Le, Hung Son January 2008 This dissertation addresses the challenges of giving computers the ability to perform face recognition, i.e. to discriminate between different faces. Face recognition systems are commonly trained with a database of face images, becoming “familiar” with the given faces. Many reported methods rely heavily on the size and representativeness of the training database. But collecting training images covering, for instance, a wide range of viewpoints, different expressions and illumination conditions is difficult and costly. Moreover, there may be only one face image per person, at low image resolution or quality. In these situations, face recognition techniques usually suffer a serious performance drop. Here we present effective algorithms that deal with single-image-per-person databases, despite issues with illumination, facial expression and pose variation. Illumination changes the appearance of a face in images. Thus, we use a new pyramid-based fusion method for face recognition under arbitrary unknown lighting. This extended approach with logarithmic transform works efficiently with a single image. The produced image has better contrast at both low and high ranges, i.e. has more visible details than the original one. An improved method works with high dynamic range images, useful for outdoor face images. Facial expressions also modify the images’ appearance. An extended Hidden Markov Model (HMM) with a flexible encoding scheme treats images as an ensemble of horizontal and vertical strips. Each person is modeled by Joint Multiple Hidden Markov Models (JM-HMMs). This approach offers computational advantages and good learning ability from just a single sample per class. A fast method simulating JM-HMM functionality is then derived. The new method, with abstract observations and a simplified similarity measurement, does not require retraining HMMs for new images or subjects.
The problem of pose-invariant recognition from a single sample image per person was overcome by using the wire frame Candide face model for the synthesis of virtual views. This is one of the support functions of our face recognition system, WAWO. The extensive experiments clearly show that WAWO outperforms the state-of-the-art systems in FERET tests. Keywords: face recognition; pattern recognition; computer vision; HMM; Hidden Markov Models; contrast enhancement; pyramid fusion; image processing; Computer science
119	Missile approach warning using multi-spectral imagery / Missilvarning med hjälp av multispektrala bilder Holm Ovrén, Hannes, Emilsson, Erika January 2010 Man portable air defence systems, MANPADS, pose a big threat to civilian and military aircraft. This thesis aims to find methods that could be used in a missile approach warning system based on infrared cameras. The two main tasks of the completed system are to classify the type of missile, and also to estimate its position and velocity from a sequence of images. The classification is based on hidden Markov models, one-class classifiers, and multi-class classifiers. Position and velocity estimation uses a model of the observed intensity as a function of real intensity, image coordinates, distance and missile orientation. The estimation is made by an extended Kalman filter. We show that fast classification of missiles based on radiometric data and a hidden Markov model is possible and works well, although more data would be needed to verify the results. Estimating the position and velocity works fairly well if the initial parameters are known. Unfortunately, some of these parameters cannot be computed using the available sensor data. Keywords: missile approach warning; classification; target tracking; hidden Markov models; Kalman filtering; threshold model; multispectral; infrared; Signal processing
