141 |
Deep Neural Networks for Large Vocabulary Handwritten Text Recognition / Réseaux de Neurones Profonds pour la Reconnaissance de Texte Manuscrit à Large Vocabulaire
Bluche, Théodore 13 May 2015
The automatic transcription of text in handwritten documents has many applications, from automatic document processing to indexing and document understanding. One of the most popular approaches today scans the text line image with a sliding window, extracts features from each window position, and models them with Hidden Markov Models (HMMs). Combined with neural networks, such as Multi-Layer Perceptrons (MLPs) or Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs), and with a language model, these models yield good transcriptions. Meanwhile, in many machine learning applications, including speech recognition and computer vision, deep neural networks with several hidden layers have recently produced significant reductions in error rates.
In this thesis, we conduct a thorough study of different aspects of optical models based on deep neural networks in the hybrid neural network / HMM scheme, in order to better understand and evaluate their relative importance. First, we show that deep neural networks produce consistent and significant improvements over networks with one or two hidden layers, independently of the kind of neural network (MLP or RNN) and of input (handcrafted features or pixels). Then, we show that deep neural networks with pixel inputs compete with those using handcrafted features, and that depth plays an important role in reducing the performance gap between the two kinds of inputs. This supports the idea that deep neural networks build hierarchical, relevant representations of their inputs and learn features automatically along the way. Despite the dominance of LSTM-RNNs in the recent handwriting recognition literature, we show that deep MLPs achieve comparable results. We also evaluated different training criteria. With sequence-discriminative training, we report improvements for MLP/HMMs similar to those observed in speech recognition. We also show that the Connectionist Temporal Classification framework is especially suited to RNNs. Finally, the dropout technique for regularizing neural networks was recently applied to LSTM-RNNs. We tested its effect at different positions in LSTM-RNNs, extending previous work, and we show that its position relative to the recurrent connections is important.
We conducted the experiments on three public databases, representing two languages (English and French) and two epochs, using different kinds of neural network inputs: handcrafted features and pixels. We validated our approach by taking part in the HTRtS contest in 2014, where we obtained second place. The results of the final systems presented in this thesis, namely MLPs and RNNs with handcrafted feature or pixel inputs, are comparable to the state of the art on Rimes and IAM. Moreover, the combination of these systems outperformed all published results on the three databases considered.
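As a rough illustration of the Connectionist Temporal Classification framework mentioned in this abstract, the sketch below shows only the standard CTC collapsing rule: repeated frame labels are merged, then blanks are dropped. The blank symbol `-`, the function name `ctc_collapse` and the sample input are assumptions made for illustration; they are not taken from the thesis, and the sketch does not cover CTC training.

```python
def ctc_collapse(frame_labels, blank="-"):
    """Map a per-frame label sequence to a transcription using the CTC rule."""
    output = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return "".join(output)


if __name__ == "__main__":
    # A blank between the two runs of "l" keeps the double letter.
    print(ctc_collapse("hh-e-ll-lo"))  # prints "hello"
```

Because a blank separates repeated characters, the network can emit one label per window position without an explicit segmentation of the text line.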
|
142 |
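RASTREAMENTO DE AGROBOTS EM ESTUFAS AGRÍCOLAS USANDO MODELOS OCULTOS DE MARKOV: Comparação do desempenho e da correção dos algoritmos de Viterbi e Viterbi com janela de observações deslizante [Tracking agrobots in agricultural greenhouses using hidden Markov models: a comparison of the performance and correctness of the Viterbi algorithm and Viterbi with a sliding observation window]
Alves, Roberson Junior Fernandes 17 September 2015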
Developing mobile, autonomous agrobots for greenhouses requires procedures that allow the robot to localize itself and be tracked. The tracking problem can be modeled as finding the most likely sequence of states in a hidden Markov model whose states indicate the positions of an occupancy grid. This sequence can be estimated with Viterbi's algorithm. However, the processing time and memory consumed by this algorithm grow with the dimensions of the grid and the duration of tracking, which can constrain its use for tracking agrobots. This work therefore presents a tracking procedure that uses two approximate implementations of Viterbi's algorithm, called Viterbi-JD (Viterbi's algorithm with a sliding window; JD abbreviates the Portuguese "janela deslizante") and Viterbi-JD-MTE (Viterbi's algorithm with a sliding window over a hidden Markov model with a sparse transition matrix; MTE abbreviates "matriz de transição esparsa"). The experimental results show that the time and memory performance of tracking with these two approximate implementations is significantly better than that of tracking based on the original Viterbi algorithm. The reported tracking hypothesis is suboptimal compared with the hypothesis generated by Viterbi, but the error does not grow substantially. The experiments were performed using simulated RSSI (Received Signal Strength Indicator) data.
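To make the cost argument in this abstract concrete, here is a minimal sketch of the standard Viterbi algorithm in the log domain. It is not the Viterbi-JD or Viterbi-JD-MTE variant from the thesis. The two-cell grid (`A`, `B`), the `strong`/`weak` observation symbols standing in for RSSI readings, and all probabilities are hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy model: two grid cells and two coarse signal levels.
STATES = ["A", "B"]
LOG_START = {"A": math.log(0.5), "B": math.log(0.5)}
LOG_TRANS = {
    "A": {"A": math.log(0.8), "B": math.log(0.2)},
    "B": {"A": math.log(0.2), "B": math.log(0.8)},
}
LOG_EMIT = {
    "A": {"strong": math.log(0.9), "weak": math.log(0.1)},
    "B": {"strong": math.log(0.2), "weak": math.log(0.8)},
}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden state sequence and its log-probability."""
    # best[t][s]: log-probability of the best path ending in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    obs = ["strong", "strong", "weak", "weak"]
    path, log_prob = viterbi(obs, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    print(path, math.exp(log_prob))
```

The `best` and `back` tables hold one entry per state per time step, and each step compares every pair of states. This is the growth with grid size and tracking duration that the abstract describes, and that a sliding window or a sparse transition matrix would aim to limit.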
|
143 |
Dynamic Programming with Multiple Candidates and its Applications to Sign Language and Hand Gesture Recognition
Yang, Ruiduo 07 March 2008
Dynamic programming has been widely used to solve various kinds of optimization problems. In this work, we show that two crucial problems in video-based sign language and gesture recognition systems can be attacked by dynamic programming with additional multiple observations. The first problem occurs at the higher (sentence) level. Movement epenthesis [1] (me), i.e., the necessary but meaningless movement between signs, can cause difficulties in modeling and scalability as the number of signs increases. The second problem occurs at the lower (feature) level. Ambiguity in hand detection and occlusion propagates errors to the higher level. We construct a novel framework based on dynamic programming that can handle both of these problems.
In the past, me has only been modeled explicitly. Our proposed method handles me in a dynamic programming framework in which it is modeled implicitly. We call this the enhanced Level Building (eLB) algorithm. This formulation also allows the incorporation of statistical grammar models such as bigrams and trigrams. The feature level includes another dynamic programming process that handles the problem of selecting among multiple hand candidates. This differs from most previous approaches, where a single observation is used. We also propose a grouping process that can generate multiple, overlapping hand candidates.
We demonstrate our ideas on three continuous American Sign Language (ASL) data sets and one hand gesture data set. The ASL data sets include one with a simple background, one with a simple background but with the signer wearing short-sleeved clothes, and one with a complex and changing background. The gesture data set contains color-gloved gestures with a complex background. With the automatically chosen me score, the performance loss is within 5% compared with the manually chosen me score. At the low level, we first over-segment each frame to get a list of segments. Then we use a greedy method to group the segments based on different grouping cues. We also show that the performance loss is within 5% when we compare this method with manually selected feature vectors.
|
144 |
Text Augmentation: Inserting markup into natural language text with PPM Models
Yeates, Stuart Andrew January 2006
This thesis describes a new optimisation and new heuristics for automatically marking up XML documents using PPM models, and CEM, a Java implementation of them. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods. Four corpora are discussed, including the bibliography corpus of 14682 bibliographies laid out in seven standard styles using the BibTeX system and marked up in XML with every field from the original BibTeX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists' Communique corpus and the Reuters' corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory. A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation is examined using the four corpora.
|
145 |
Video Analysis of Mouth Movement Using Motion Templates for Computer-based Lip-Reading
Yau, Wai Chee, waichee@ieee.org January 2008
This thesis presents a novel lip-reading approach to classifying utterances from video data, without evaluating voice signals. This work addresses two important issues: the efficient representation of mouth movement for visual speech recognition, and the temporal segmentation of utterances from video. The first part of the thesis describes a robust movement-based technique used to identify mouth movement patterns while uttering phonemes. This method temporally integrates the video data of each phoneme into a 2-D grayscale image called a motion template (MT). This is a view-based approach that implicitly encodes the temporal component of an image sequence into a scalar-valued MT. The data size was reduced by extracting image descriptors such as Zernike moments (ZM) and discrete cosine transform (DCT) coefficients from the MT. Support vector machines (SVM) and hidden Markov models (HMM) were used to classify the feature descriptors. A video speech corpus of 2800 utterances was collected for evaluating the efficacy of MT for lip-reading. The experimental results demonstrate the promising performance of MT in mouth movement representation. The advantages and limitations of MT for visual speech recognition were identified and validated through experiments. A comparison between ZM and DCT features indicates that the classification accuracy of the two methods is comparable when there is no relative motion between the camera and the mouth. However, ZM is resilient to rotation of the camera and continues to give good results despite rotation, whereas DCT is sensitive to rotation. DCT features are demonstrated to have better tolerance to image noise than ZM. The results also demonstrate a slight improvement of 5% using SVM as compared to HMM. The second part of this thesis describes a video-based temporal segmentation framework that detects key frames corresponding to the start and stop of utterances from an image sequence, without using the acoustic signals.
This segmentation technique integrates mouth movement and appearance information. The efficacy of this technique was tested through experimental evaluation and satisfactory performance was achieved. This segmentation method has been demonstrated to perform efficiently for utterances separated by short pauses. Potential applications for lip-reading technologies include human-computer interfaces (HCI) for mobility-impaired users, defense applications that require voiceless communication, lip-reading mobile phones, in-vehicle systems, and improvement of speech-based computer control in noisy environments.
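The idea of integrating a frame sequence into one scalar-valued image can be sketched as follows. This is only one plausible reading, assuming the motion template behaves like a motion history image built from thresholded frame differences; the abstract does not give the exact formulation. The function name, the `threshold` parameter and the tiny sample frames are hypothetical.

```python
def motion_template(frames, threshold=10):
    """Collapse a grayscale frame sequence into one 2-D image.

    Pixels that moved more recently receive brighter values (0..255).
    Each frame is a list of rows, each row a list of integer intensities.
    """
    height, width = len(frames[0]), len(frames[0][0])
    steps = len(frames) - 1
    template = [[0] * width for _ in range(height)]
    for t in range(1, len(frames)):
        for y in range(height):
            for x in range(width):
                if abs(frames[t][y][x] - frames[t - 1][y][x]) > threshold:
                    # Later motion overwrites earlier motion with a higher value.
                    template[y][x] = (255 * t) // steps
    return template


if __name__ == "__main__":
    frames = [
        [[0, 0, 0]],
        [[100, 0, 0]],    # first pixel changes at step 1
        [[100, 100, 0]],  # second pixel changes at step 2
    ]
    print(motion_template(frames))  # prints [[127, 255, 0]]
```

Descriptors such as Zernike moments or DCT coefficients would then be computed from the resulting single image rather than from every frame, which is how the data size is reduced.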
|
146 |
Face Recognition: A Single View Based HMM Approach
Le, Hung Son January 2008
This dissertation addresses the challenges of giving computers the ability to perform face recognition, i.e. to discriminate between different faces. Face recognition systems are commonly trained with a database of face images, becoming "familiar" with the given faces. Many reported methods rely heavily on the size and representativeness of the training database. But collecting training images covering, for instance, a wide range of viewpoints, different expressions and illumination conditions is difficult and costly. Moreover, there may be only one face image per person, at low image resolution or quality. In these situations, face recognition techniques usually suffer a serious drop in performance. Here we present effective algorithms that work with databases containing a single image per person, despite issues with illumination, facial expression and pose variation.
Illumination changes the appearance of a face in images. Thus, we use a new pyramid-based fusion method for face recognition under arbitrary unknown lighting. This extended approach with a logarithmic transform works efficiently with a single image. The produced image has better contrast at both low and high ranges, i.e. it has more visible details than the original one. An improved method works with high dynamic range images, which is useful for outdoor face images.
Facial expressions also modify the appearance of images. An extended Hidden Markov Model (HMM) with a flexible encoding scheme treats images as an ensemble of horizontal and vertical strips. Each person is modeled by Joint Multiple Hidden Markov Models (JM-HMMs). This approach offers computational advantages and good learning ability from just a single sample per class. A fast method that simulates JM-HMM functionality is then derived. The new method, with abstract observations and a simplified similarity measurement, does not require retraining HMMs for new images or subjects.
Pose-invariant recognition from a single sample image per person was addressed by using the wire-frame Candide face model to synthesize virtual views. This is one of the support functions of our face recognition system, WAWO. The extensive experiments clearly show that WAWO outperforms the state-of-the-art systems in FERET tests.
|
147 |
Missile approach warning using multi-spectral imagery / Missilvarning med hjälp av multispektrala bilder
Holm Ovrén, Hannes, Emilsson, Erika January 2010
Man-portable air defence systems, MANPADS, pose a big threat to civilian and military aircraft. This thesis aims to find methods that could be used in a missile approach warning system based on infrared cameras.
The two main tasks of the completed system are to classify the type of missile, and to estimate its position and velocity from a sequence of images.
The classification is based on hidden Markov models, one-class classifiers, and multi-class classifiers.
Position and velocity estimation uses a model of the observed intensity as a function of real intensity, image coordinates, distance and missile orientation. The estimation is made by an extended Kalman filter.
We show that fast classification of missiles based on radiometric data and a hidden Markov model is possible and works well, although more data would be needed to verify the results.
Estimating the position and velocity works fairly well if the initial parameters are known. Unfortunately, some of these parameters cannot be computed using the available sensor data.
|
149 |
Algorithmic Trading: Hidden Markov Models on Foreign Exchange Data
Idvall, Patrik, Jonsson, Conny January 2008
In this master's thesis, hidden Markov models (HMM) are evaluated as a tool for forecasting movements in a currency cross. With an ever-increasing electronic market making way for more automated trading, or so-called algorithmic trading, there is a constant need for new trading strategies that try to find alpha, the excess return, in the market. HMMs are based on the well-known theory of Markov chains, but the states are assumed to be hidden and to govern some observable output. HMMs have mainly been used for speech recognition and communication systems, but have lately also been applied to financial time series with encouraging results. Both discrete and continuous versions of the model are tested, as well as single- and multivariate input data. In addition to the basic framework, two extensions are implemented in the belief that they will further improve the prediction capabilities of the HMM. The first is a Gaussian mixture model (GMM), in which each state is assigned a set of single Gaussians that are weighted together to replicate the density function of the stochastic process. This makes it possible to model non-normal distributions, which are often assumed for foreign exchange data. The second is an exponentially weighted expectation maximization (EWEM) algorithm, which takes time attenuation into consideration when re-estimating the parameters of the model. This allows old trends to be kept in mind while more recent patterns are given more attention. Empirical results show that the HMM using continuous emission probabilities can, for some model settings, generate acceptable returns with Sharpe ratios well over one, whilst the discrete version in general performs poorly. The GMM therefore seems to be a highly needed complement to the HMM. The EWEM, however, does not improve results as one might have expected.
Our general impression is that the HMM-based predictor we have developed and tested is too unstable to be adopted as a trading tool on foreign exchange data, with too many factors influencing the results. More research and development is called for.
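The Gaussian mixture extension described in this abstract can be illustrated with a short sketch of a per-state emission density, assuming univariate inputs. The weights, means and standard deviations below are hypothetical values, not parameters from the thesis, and the function names are chosen for illustration only.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_density(x, weights, means, stds):
    """Emission density of one hidden state: a weighted sum of Gaussians."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


if __name__ == "__main__":
    # Hypothetical state: a narrow component plus a small, wide one.
    weights, means, stds = [0.9, 0.1], [0.0, 0.0], [1.0, 3.0]
    single = gaussian_pdf(4.0, 0.0, 1.0)
    mixture = gmm_density(4.0, weights, means, stds)
    # The mixture puts more density on a large move than the single Gaussian.
    print(mixture > single)  # prints True
```

Adding a wide, low-weight component is one simple way for a mixture to represent heavier tails than a single Gaussian, which is the kind of non-normal behaviour the abstract refers to.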
|
150 |
Retroviral Long Terminal Repeats; Structure, Detection and Phylogeny
Benachenhou, Farid January 2010
Long terminal repeats (LTRs) are non-coding repeats flanking the protein-coding genes of LTR retrotransposons. The variability of LTRs poses a challenge in studying them. Hidden Markov models (HMMs), probabilistic models widely used in pattern recognition, are useful in dealing with this variability. The aim of this work was mainly to study LTRs of retroviruses and LTR retrotransposons using HMMs. Paper I describes the methodology of HMM modelling applied to different groups of LTRs from exogenous retroviruses (XRVs) and endogenous retroviruses (ERVs). The detection capabilities of HMMs were assessed and were found to be high for homogeneous groups of LTRs. The alignments generated by the HMMs displayed conserved motifs, some of which could be related to known functions of XRVs. The common features of the different groups of retroviral LTRs were investigated by combining them into a single alignment. These features were the short inverted terminal repeats TG and CA and three AT-rich stretches which provide retroviruses with TATA boxes and AATAAA polyadenylation signals. In Paper II, phylogenetic trees of three groups of retroviral LTRs were constructed using HMM-based alignments. The LTR trees were consistent with trees based on other retroviral genes, suggesting co-evolution between LTRs and these genes. In Paper III, the methods of Papers I and II were extended to LTRs from other retrotransposon groups, covering much of the diversity of all known LTRs. For the first time, an LTR phylogeny could be achieved. There was no major disagreement between the LTR tree and trees based on three different domains of the Pol gene. The conserved LTR structure of Paper I was found to apply to all LTRs. Putative integrase recognition motifs extended up to 12 bp beyond the short inverted repeats TG/CA. Paper IV is a review article describing the use of sequence similarity and structural markers for the taxonomy of ERVs.
ERVs were originally classified into three classes according to the length of the target site duplication. While this classification is useful, it does not include all ERVs. A naming convention based on previous ERV and XRV nomenclature, but taking newer information into account, is advocated in order to provide a practical yet coherent scheme for dealing with new unclassified ERV sequences. Paper V gives an overview of bioinformatics tools for studies of ERVs and of retroviral evolution before and after endogenization. It gives some examples of recent integrations in vertebrate genomes and discusses the pathogenicity of human ERVs, including their possible relation to cancers. In conclusion, HMMs were able to detect and align LTRs successfully. Progress was made in understanding their conserved structure and phylogeny. The methods developed in this thesis could be applied to different kinds of non-coding DNA sequence elements.
|