261 |
Analyse mixte de protéines basée sur la séquence et la structure - applications à l'annotation fonctionnelle / Mixed sequence-structure based analysis of proteins, with applications to functional annotations
Tetley, Romain 21 November 2018 (has links)
Dans cette thèse, l'emphase est mise sur la réconciliation de l'analyse de structure et de séquence pour les protéines. L'analyse de séquence brille lorsqu'il s'agit de comparer des protéines présentant une forte identité de séquence (≥ 30%) mais laisse à désirer pour identifier des homologues lointains. L'analyse de structure est une alternative intéressante. Cependant, les méthodes de résolution de structures sont coûteuses et complexes - lorsque toutefois elles produisent des résultats. Ces observations rendent évidente la nécessité de développer des méthodes hybrides, exploitant l'information extraite des structures disponibles pour l'injecter dans des modèles de séquence. Cette thèse produit quatre contributions principales dans ce domaine. Premièrement, nous présentons une nouvelle distance structurale, le RMSDcomb, basée sur des patterns de conservation structurale locale, les motifs structuraux. Deuxièmement, nous avons développé une méthode pour identifier des motifs structuraux entre deux structures exploitant un bootstrap dépendant de filtrations. Notre approche n'est pas un compétiteur direct des aligneurs flexibles mais permet plutôt de produire des analyses multi-échelles de similarités structurales. Troisièmement, nous exploitons les méthodes suscitées pour construire des modèles de Markov cachés hybrides biaisés vers des régions mieux conservées structurellement. Nous utilisons un tel modèle pour caractériser les protéines de fusion virales de classe II, une tâche particulièrement ardue du fait de leur faible identité de séquence et leur conservation structurale moyenne. Ce faisant, nous parvenons à trouver un certain nombre d'homologues distants connus des protéines virales, notamment chez la Drosophile. Enfin, en formalisant un sous-problème rencontré lors de la comparaison de filtrations, nous présentons un nouveau problème théorique - le D-family matching - sur lequel nous démontrons des résultats algorithmiques variés. Nous montrons - d'une façon analogue à la comparaison de régions de deux conformations d'une protéine - comment exploiter ce modèle théorique pour comparer deux clusterings d'un même jeu de données. / In this thesis, the focus is set on reconciling the realms of structure and sequence for protein analysis. Sequence analysis tools shine when faced with proteins presenting high sequence identity (≥ 30%), but are lackluster when it comes to remote homolog detection. Structural analysis tools present an interesting alternative, but solving structures - when at all possible - is a tedious and expensive process. These observations make the need for hybrid methods - which inject information obtained from available structures into a sequence model - quite clear. This thesis makes four main contributions toward this goal. First, we present a novel structural measure, the RMSDcomb, based on local structural conservation patterns - the so-called structural motifs. Second, we developed a method to identify structural motifs between two structures using a bootstrap method which relies on filtrations. Our approach is not a direct competitor to flexible aligners but can prove useful for performing a multiscale analysis of structural similarities. Third, we build upon the previous methods to design hybrid Hidden Markov Models which are biased towards regions of increased structural conservation between sets of proteins. We test this tool on the class II fusion viral proteins - particularly challenging because of their low sequence identity and mild structural homology. We find that we are able to recover known remote homologs of the viral proteins in Drosophila and other organisms. Finally, formalizing a sub-problem encountered when comparing filtrations, we present a new theoretical problem - the D-family matching - on which we present various algorithmic results. We show - in a manner that is analogous to comparing parts of two protein conformations - how it is possible to compare two clusterings of the same data set using such a theoretical model.
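As background for the structural distance discussed above, the sketch below computes the classical coordinate RMSD after optimal rigid superposition (the Kabsch algorithm). It is only the standard baseline that motif-based measures such as RMSDcomb refine, not the RMSDcomb measure itself; the function name and the toy coordinates are illustrative.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Classical RMSD between two (N, 3) coordinate sets after optimal
    rigid superposition (Kabsch algorithm)."""
    P = P - P.mean(axis=0)                   # center both point clouds
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation
    P_aligned = P @ R.T
    return float(np.sqrt(((P_aligned - Q) ** 2).sum() / len(P)))

# Toy example: a fragment and a rotated copy should give an RMSD near zero.
frag = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                 [1.5, 1.5, 0.0], [0.0, 1.5, 1.5]])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
print(kabsch_rmsd(frag, frag @ rot.T))
```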
|
262 |
Spatio-temporal dynamics in land use and habitat fragmentation in the Sandveld, South Africa
Magidi, James Takawira January 2010 (has links)
Magister Scientiae - MSc / The Cape Floristic Region (CFR) in South Africa is one of the world's five Mediterranean hotspots, and is also one of the 34 global biodiversity hotspots. It has rich biological diversity, a high level of species endemism in flora and fauna, and an unusually high level of human-induced threats. The Sandveld forms part of the CFR and is also highly threatened by intensive agriculture (potato, rooibos and wheat farming), proliferation of tourism facilities, coastal development, and alien invasions. These biodiversity threats have led to habitat loss and are threatening the long-term security of surface and ground water resources. In order to understand trends in such biodiversity loss and improve the management of these ecosystems, earth-orbiting observation satellite data were used. This research assessed land-use changes and trends in vegetation cover in the Sandveld using remote sensing images. Landsat TM satellite images of 1990, 2004 and 2007 were classified using the maximum likelihood classifier into seven land-use classes, namely water, agriculture, fire patches, natural vegetation, wetlands, disturbed veld, and open sands. Change detection using remote sensing algorithms and landscape metrics was performed on these multi-temporal land-use maps using the Land Change Modeler and Patch Analyst respectively. Markov stochastic modelling techniques were used to predict future scenarios of land-use change based on the classified images and their transition probabilities. MODIS NDVI multi-temporal datasets with a 16-day temporal resolution were used to assess seasonal and annual trends in vegetation cover using time series analysis (PCA and time profiling). Results indicated that natural vegetation decreased from 46% to 31% of the total landscape between 1990 and 2007, and these biodiversity losses were attributed to an increasing agricultural footprint. The predicted future scenario based on transition probabilities revealed a continued loss of natural habitat and an increase in the agricultural footprint. Time series analysis results (principal components and temporal profiles) suggested that the landscape has a high degree of overall dynamic change with pronounced inter- and intra-annual changes, and that there was an overall increase in greenness associated with the increase in agricultural activity. The study concluded that without future conservation interventions natural habitats would continue to disappear, a condition that will impact heavily on biodiversity and significant water-dependent ecosystems such as wetlands. This has significant implications for the long-term provision of water from groundwater reserves and for the overall sustainability of current agricultural practices.
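To make the Markov prediction step concrete, here is a minimal sketch of a first-order Markov projection of landscape composition from a transition matrix. The three classes and every probability are invented placeholders, not the seven-class matrices estimated from the classified Landsat maps in the study.

```python
import numpy as np

# Illustrative 3-class example (the study used seven land-use classes).
classes = ["natural vegetation", "agriculture", "other"]

# P[i, j] = probability that a pixel in class i moves to class j per period.
P = np.array([
    [0.88, 0.10, 0.02],
    [0.01, 0.97, 0.02],
    [0.05, 0.10, 0.85],
])
assert np.allclose(P.sum(axis=1), 1.0)   # rows must be proper distributions

# Current landscape composition (fractions of total area, illustrative).
state = np.array([0.31, 0.50, 0.19])

# Project the composition forward; with these numbers agriculture grows
# while natural cover shrinks, mirroring the trend described above.
for step in range(1, 4):
    state = state @ P
    print(step, dict(zip(classes, np.round(state, 3))))
```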
|
263 |
Fixed-point implementace rozpoznávače řeči / Fixed-Point Implementation of a Speech Recognizer
Král, Tomáš January 2007 (has links)
This master's thesis addresses the problem of automatic speech recognition on systems with restricted hardware resources - embedded systems. The objective of this work was to design and implement a speech recognition system on embedded systems that do not contain floating-point processing units. The first step was to choose a proper hardware architecture. Based on the knowledge of the available HW resources, the recognition system was designed. During system development, the constituent elements were optimized so that they could run on the chosen HW. The result of the project is successful recognition of Czech numerals on an embedded system.
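As an illustration of the kind of arithmetic such a port relies on, the sketch below shows Q15 fixed-point multiplication, the usual substitute for floating-point operations on processors without an FPU. It is a generic example written for this listing, not code from the thesis.

```python
# Q15 fixed point: values in [-1, 1) stored as 16-bit signed integers.
Q = 15
SCALE = 1 << Q          # 32768

def to_q15(x: float) -> int:
    """Quantise a float in [-1, 1) to Q15, saturating at the range ends."""
    return max(-SCALE, min(SCALE - 1, int(round(x * SCALE))))

def q15_mul(a: int, b: int) -> int:
    """16x16 -> 32-bit multiply, then shift back to Q15 with rounding."""
    return (a * b + (1 << (Q - 1))) >> Q

def to_float(x: int) -> float:
    return x / SCALE

# 0.25 * -0.5 = -0.125, computed entirely with integer operations.
print(to_float(q15_mul(to_q15(0.25), to_q15(-0.5))))
```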
|
264 |
From single decisions to sequential choice patterns: Extending the dynamics of value-based decision-making
Scherbaum, Stefan, Lade, Steven J., Siegmund, Stefan, Goschke, Thomas, Dshemuchadse, Maja 04 June 2024 (has links)
Every day, we make many value-based decisions in which we weigh the value of options against other properties, e.g. their time of delivery. In the laboratory, such value-based decision-making is usually studied on a trial-by-trial basis, and each decision is assumed to represent an isolated choice process. Real-life decisions, however, are usually embedded in a rich context of previous choices at different time scales. A fundamental question is therefore how the dynamics of value-based decision processes unfold on a time scale across several decisions. Indeed, findings from perceptual decision making suggest that sequential decision patterns might also be present in value-based decision making. Here, we use a neural-inspired attractor model as an instance of dynamic models from perceptual decision making, as such models incorporate inherent activation dynamics across decisions. We use the model to predict sequential patterns, namely oscillatory switching, perseveration and the dependence of perseveration on the delay between decisions. Furthermore, we predict reaction time (RT) effects for specific sequences of trials. We validate the predictions in two new studies and a reanalysis of existing data from a novel decision game in which participants have to perform delay discounting decisions. Applying the validated reasoning to a well-established choice questionnaire, we illustrate and discuss that taking sequential choice patterns into account may be necessary to accurately analyse and model value-based decision processes, especially when considering differences between individuals.
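A rough sketch of the kind of dynamics involved is given below: two leaky, mutually inhibiting units race to a decision, and activation carried over from one trial biases the next, producing perseveration-like behaviour. This is a generic mutual-inhibition accumulator written for illustration; all parameters are invented and it is not the attractor model used in the paper.

```python
import numpy as np

def decide(inputs, start, rng, steps=2000, dt=0.01,
           leak=1.0, inhibition=2.0, noise=0.1):
    """Two leaky, mutually inhibiting units; the more active unit at the
    end of the trial is taken as the choice."""
    x = np.array(start, dtype=float)
    for _ in range(steps):
        drive = inputs - leak * x - inhibition * x[::-1]   # x[::-1] swaps the units
        x = x + dt * drive + np.sqrt(dt) * noise * rng.standard_normal(2)
        x = np.clip(x, 0.0, None)
    return int(np.argmax(x)), x

rng = np.random.default_rng(1)
inputs = np.array([1.0, 1.0])                 # two equally attractive options
choice1, final = decide(inputs, [0.0, 0.0], rng)
# Residual activation from trial 1 biases trial 2 towards repeating the choice.
choice2, _ = decide(inputs, 0.5 * final, rng)
print(choice1, choice2)
```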
|
265 |
Röststyrning i industriella miljöer : En undersökning av ordfelsfrekvens för olika kombinationer mellan modellarkitekturer, kommandon och brusreduceringstekniker / Voice command in industrial environments : An investigation of Word Error Rate for different combinations of model architectures, commands and noise reduction techniques
Eriksson, Ulrika, Hultström, Vilma January 2024 (has links)
Röststyrning som användargränssnitt kan erbjuda flera fördelar jämfört med mer traditionella styrmetoder. Det saknas dock färdiga lösningar för specifika industriella miljöer, vilka ställer särskilda krav på att korta kommandon tolkas korrekt i olika grad av buller och med begränsad eller ingen internetuppkoppling. Detta arbete ämnade undersöka potentialen för röststyrning i industriella miljöer. Ett koncepttest genomfördes där ordfelsfrekvens (på engelska Word Error Rate eller kortare WER) användes för att utvärdera träffsäkerheten för olika kombinationer av taligenkänningsarkitekturer, brusreduceringstekniker samt kommandolängder i verkliga bullriga miljöer. Undersökningen tog dessutom hänsyn till Lombard-effekten. Resultaten visar att det för samtliga testade miljöer finns god potential för röststyrning med avseende på träffsäkerheten. Framför allt visade DeepSpeech, en djupinlärd taligenkänningsmodell med rekurrent lagerstruktur, kompletterad med domänspecifika språkmodeller och en riktad kardioid-mikrofon en ordfelsfrekvens på noll procent i vissa scenarier och sällan över fem procent. Resultaten visar även att utformningen av kommandon påverkar ordfelsfrekvensen. För en verklig implementation i industriell miljö behövs ytterligare studier om säkerhetslösningar, inkluderande autentisering och hantering av risker med falskt positivt tolkade kommandon. / Voice command as a user interface can offer several advantages over more traditional control methods. However, there is a lack of ready-made solutions for specific industrial environments, which place particular demands on short commands being interpreted correctly in varying degrees of noise and with limited or no internet connection. This work aimed to investigate the potential for voice command in industrial environments. A proof of concept was conducted where Word Error Rate (WER) was used to evaluate the accuracy of various combinations of speech recognition architectures, noise reduction techniques, and command lengths in authentic noisy environments. The investigation also took into account the Lombard effect. The results indicate that for all tested environments there is good potential for voice command with regard to accuracy. In particular, DeepSpeech, a deep-learned speech recognition model with recurrent layer structure, complemented with domain-specific language models and a directional cardioid microphone, showed WER values of zero percent in certain scenarios and rarely above five percent. The results also demonstrate that the design of commands influences WER. For a real implementation in an industrial environment, further studies are needed on security solutions, including authentication and management of risks with false positive interpreted commands.
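As a reference point for the metric used above, the sketch below computes WER as the word-level edit distance between a reference transcript and the recogniser hypothesis, divided by the number of reference words. It is a standard textbook implementation, not the evaluation code from the thesis, and the example command strings are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference
    words, computed with the usual dynamic-programming edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words into
    # the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = i
    for j in range(1, len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substituted word in a four-word command gives 25% WER.
print(word_error_rate("starta pump nummer tre", "starta pumpen nummer tre"))
```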
|
266 |
Multiple sequence analysis in the presence of alignment uncertainty
Herman, Joseph L. January 2014 (has links)
Sequence alignment is one of the most intensely studied problems in bioinformatics, and is an important step in a wide range of analyses. An issue that has gained much attention in recent years is the fact that downstream analyses are often highly sensitive to the specific choice of alignment. One way to address this is to jointly sample alignments along with other parameters of interest. In order to extend the range of applicability of this approach, the first chapter of this thesis introduces a probabilistic evolutionary model for protein structures on a phylogenetic tree; since protein structures typically diverge much more slowly than sequences, this allows for more reliable detection of remote homologies, improving the accuracy of the resulting alignments and trees, and reducing sensitivity of the results to the choice of dataset. In order to carry out inference under such a model, a number of new Markov chain Monte Carlo approaches are developed, allowing for more efficient convergence and mixing on the high-dimensional parameter space. The second part of the thesis presents a directed acyclic graph (DAG)-based approach for representing a collection of sampled alignments. This DAG representation allows the initial collection of samples to be used to generate a larger set of alignments under the same approximate distribution, enabling posterior alignment probabilities to be estimated reliably from a reasonable number of samples. If desired, summary alignments can then be generated as maximum-weight paths through the DAG, under various types of loss or scoring functions. The acyclic nature of the graph also permits various other types of algorithms to be easily adapted to operate on the entire set of alignments in the DAG. In the final part of this work, methodology is introduced for alignment-DAG-based sequence annotation using hidden Markov models, and RNA secondary structure prediction using stochastic context-free grammars. Results on test datasets indicate that the additional information contained within the DAG allows for improved predictions, resulting in substantial gains over simply analysing a set of alignments one by one.
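To illustrate the summary-alignment idea, the sketch below extracts a maximum-weight path from a DAG whose edges carry, say, estimated posterior probabilities of alignment columns. The graph structure, node names and weights are invented placeholders, and the code is not taken from the software described in the thesis.

```python
def max_weight_path(nodes, edges, source, sink):
    """Maximum-weight path in a DAG.
    nodes: node ids listed in a topological order.
    edges: dict mapping (u, v) -> weight (e.g. a column's posterior probability).
    Returns (total weight, path) or None if the sink is unreachable."""
    best = {source: (0.0, [source])}
    for u in nodes:
        if u not in best:
            continue
        score_u, path_u = best[u]
        for (a, b), w in edges.items():
            if a == u and (b not in best or score_u + w > best[b][0]):
                best[b] = (score_u + w, path_u + [b])
    return best.get(sink)

# Tiny hypothetical alignment DAG: two alternative routes from start to end.
nodes = ["start", "c1", "c2", "c3", "end"]
edges = {("start", "c1"): 0.9, ("start", "c2"): 0.6,
         ("c1", "c3"): 0.8, ("c2", "c3"): 0.95, ("c3", "end"): 1.0}
print(max_weight_path(nodes, edges, "start", "end"))
```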
|
267 |
Risques extrêmes en finance : analyse et modélisation / Financial extreme risks : analysis and modeling
Salhi, Khaled 05 December 2016 (has links)
Cette thèse étudie la gestion et la couverture du risque en s'appuyant sur la Value-at-Risk (VaR) et la Value-at-Risk Conditionnelle (CVaR), comme mesures de risque. La première partie propose un modèle d'évolution de prix que nous confrontons à des données réelles issues de la bourse de Paris (Euronext PARIS). Notre modèle prend en compte les probabilités d'occurrence des pertes extrêmes et les changements de régimes observés sur les données. Notre approche consiste à détecter les différentes périodes de chaque régime par la construction d'une chaîne de Markov cachée et à estimer la queue de distribution de chaque régime par des lois puissances. Nous montrons empiriquement que ces dernières sont plus adaptées que les lois normales et les lois stables. L'estimation de la VaR est validée par plusieurs backtests et comparée aux résultats d'autres modèles classiques sur une base de 56 actifs boursiers. Dans la deuxième partie, nous supposons que les prix boursiers sont modélisés par des exponentielles de processus de Lévy. Dans un premier temps, nous développons une méthode numérique pour le calcul de la VaR et la CVaR cumulatives. Ce problème est résolu en utilisant la formalisation de Rockafellar et Uryasev, que nous évaluons numériquement par inversion de Fourier. Dans un deuxième temps, nous nous intéressons à la minimisation du risque de couverture des options européennes, sous une contrainte budgétaire sur le capital initial. En mesurant ce risque par la CVaR, nous établissons une équivalence entre ce problème et un problème de type Neyman-Pearson, pour lequel nous proposons une approximation numérique s'appuyant sur la relaxation de la contrainte. / This thesis studies risk management and hedging, based on the Value-at-Risk (VaR) and the Conditional Value-at-Risk (CVaR) as risk measures. The first part offers a stock return model that we test on real data from NYSE Euronext. Our model takes into account the probability of occurrence of extreme losses and the regime switching observed in the data. Our approach is to detect the different periods of each regime by constructing a hidden Markov chain and to estimate the tail of each regime's distribution by power laws. We empirically show that power laws are more suitable than the Gaussian law and stable laws. The estimated VaR is validated by several backtests and compared to the results of other conventional models on a basis of 56 stock market assets. In the second part, we assume that stock prices are modeled by exponentials of a Lévy process. First, we develop a numerical method to compute the cumulative VaR and CVaR. This problem is solved by using the formalization of Rockafellar and Uryasev, which we numerically evaluate by Fourier inversion techniques. Secondly, we are interested in minimizing the hedging risk of European options under a budget constraint on the initial capital. By measuring this risk by CVaR, we establish an equivalence between this problem and a problem of Neyman-Pearson type, for which we propose a numerical approximation based on the relaxation of the constraint.
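The Rockafellar-Uryasev representation mentioned above writes CVaR_a(L) = min_c { c + E[(L - c)+] / (1 - a) }, the minimiser c being the VaR. The thesis evaluates this by Fourier inversion for exponential-Lévy models; the sketch below is only the simpler empirical (sample-based) version of the same formula, run on a made-up heavy-tailed loss sample rather than market data.

```python
import numpy as np

def var_cvar(losses, alpha=0.99):
    """Empirical VaR and CVaR at level alpha for a 1-D loss sample, using
    CVaR_a = c + E[(L - c)+] / (1 - a) evaluated at c = VaR_a."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)
    excess = np.clip(losses - var, 0.0, None)   # (L - c)+ part of the formula
    cvar = var + excess.mean() / (1.0 - alpha)
    return var, cvar

# Illustrative heavy-tailed losses (Student-t with 3 degrees of freedom).
rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=100_000)
print(var_cvar(sample, alpha=0.99))
```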
|
268 |
Uma arquitetura de Agentes BDI para auto-regulação de Trocas Sociais em Sistemas Multiagentes Abertos / SELF-REGULATION OF PERSONALITY-BASED SOCIAL EXCHANGES IN OPEN MULTIAGENT SYSTEMS
Gonçalves, Luciano Vargas 31 March 2009 (has links)
Previous issue date: 2009-03-31 / The study and development of systems to control interactions in multiagent systems
is an open problem in Artificial Intelligence. The system of social exchange values
of Piaget is a social approach that allows for the foundations of the modeling of interactions
between agents, where the interactions are seen as service exchanges between
pairs of agents, with the evaluation of the realized or received services, thats is, the investments
and profits in the exchange, and credits and debits to be charged or received,
respectively, in future exchanges. This evaluation may be performed in different ways
by the agents, considering that they may have different exchange personality traits. In an
exchange process along the time, the different ways in the evaluation of profits and losses
may cause disequilibrium in the exchange balances, where some agents may accumulate
profits and others accumulate losses. To solve the exchange equilibrium problem, we use
the Partially Observable Markov Decision Processes (POMDP) to help the agent decision
of actions that can lead to the equilibrium of the social exchanges. Then, each agent has
its own internal process to evaluate its current balance of the results of the exchange process
between the other agents, observing its internal state, and with the observation of its
partner s exchange behavior, it is able to deliberate on the best action it should perform
in order to get the equilibrium of the exchanges. Considering an open multiagent system,
it is necessary a mechanism to recognize the different personality traits, to build the
POMDPs to manage the exchanges between the pairs of agents. This recognizing task
is done by Hidden Markov Models (HMM), which, from models of known personality
traits, can approximate the personality traits of the new partners, just by analyzing observations
done on the agent behaviors in exchanges. The aim of this work is to develop an
hybrid agent architecture for the self-regulation of social exchanges between personalitybased
agents in a open multiagent system, based in the BDI (Beliefs, Desires, Intentions)
architecture, where the agent plans are obtained from optimal policies of POMDPs, which
model personality traits that are recognized by HMMs. To evaluate the proposed approach
some simulations were done considering (known or new) different personality traits / O estudo e desenvolvimento de sistemas para o controle de interações em sistemas
multiagentes é um tema em aberto dentro da Inteligência Artificial. O sistema de valores
de trocas sociais de Piaget é uma abordagem social que possibilita fundamentar a modelagem
de interações de agentes, onde as interações são vistas como trocas de serviços entre
pares de agentes, com a valorização dos serviços realizados e recebidos, ou seja, investimentos
e ganhos na troca realizada, e, também os créditos e débitos a serem cobrados
ou recebidos, respectivamente, em trocas futuras. Esta avaliação pode ser realizada de
maneira diferenciada pelos agentes envolvidos, considerando que estes apresentam traços
de personalidade distintos. No decorrer de processo de trocas sociais a forma diferenciada
de avaliar os ganhos e perdas nas interações pode causar desequilíbrio nos balanços
de trocas dos agentes, onde alguns agentes acumulam ganhos e outros acumulam perdas.
Para resolver a questão do equilíbrio das trocas, encontrou-se nos Processos de Decisão
de Markov Parcialmente Observáveis (POMDP) uma metodologia capaz de auxiliar a tomada
de decisões de cursos de ações na busca do equilíbrio interno dos agentes. Assim,
cada agente conta com um mecanismo próprio para avaliar o seu estado interno, e, de
posse das observações sobre o comportamento de troca dos parceiros, torna-se apto para
deliberar sobre as melhores ações a seguir na busca do equilíbrio interno para o par de
agentes. Com objetivo de operar em sistema multiagentes aberto, torna-se necessário um
mecanismo para reconhecer os diferentes traços de personalidade, viabilizando o uso de
POMDPs nestes ambientes. Esta tarefa de reconhecimento é desempenhada pelos Modelos
de Estados Ocultos de Markov (HMM), que, a partir de modelos de traços de personalidade
conhecidos, podem inferir os traços aproximados de novos parceiros de interações,
através das observações sobre seus comportamentos nas trocas. O objetivo deste trabalho
é desenvolver uma arquitetura de agentes híbrida para a auto-regulação de trocas sociais
entre agentes baseados em traços de personalidade em sistemas multiagentes abertos. A
arquitetura proposta é baseada na arquitetura BDI (Beliefs, Desires, Intentions), onde os
planos dos agentes são obtidos através de políticas ótimas de POMDPs, que modelam
traços de personalidade reconhecidos através de HMMs. Para avaliar a proposta, foram
realizadas simulações envolvendo traços de personalidade conhecidos e novos traços
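As a sketch of the recognition step, the code below scores an observed exchange-behaviour sequence under candidate HMMs (one per known personality trait) with the forward algorithm and picks the most likely trait. The two trait models, the three observation symbols and all probabilities are invented for illustration and do not come from the dissertation.

```python
import numpy as np
from scipy.special import logsumexp

def sequence_loglik(pi, A, B, obs):
    """Forward algorithm in log space: log P(obs | HMM = (pi, A, B)).
    pi: (S,) initial probs, A: (S, S) transitions, B: (S, V) emissions,
    obs: list of observation symbol indices."""
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(logsumexp(alpha))

# Hypothetical 2-state HMMs for two known traits; observations encode an
# agent's moves in exchanges as 0 = altruistic, 1 = fair, 2 = selfish.
traits = {
    "egoism":   (np.array([0.5, 0.5]),
                 np.array([[0.8, 0.2], [0.3, 0.7]]),
                 np.array([[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]])),
    "altruism": (np.array([0.5, 0.5]),
                 np.array([[0.7, 0.3], [0.2, 0.8]]),
                 np.array([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]])),
}
observed = [2, 2, 1, 2, 0, 2]
print(max(traits, key=lambda t: sequence_loglik(*traits[t], observed)))
```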
|
269 |
Explicit Segmentation Of Speech For Indian Languages
Ranjani, H G 03 1900 (has links)
Speech segmentation is the process of identifying the boundaries between words, syllables or phones in the recorded waveforms of spoken natural languages. The lowest level of speech segmentation is the breakup and classification of the sound signal into a string of phones. The difficulty of this problem is compounded by the phenomenon of co-articulation of speech sounds.
The classical solution to this problem is to manually label and segment spectrograms. In the first step of this two step process, a trained person listens to a speech signal, recognizes the word and phone sequence, and roughly determines the position of each phonetic boundary. The second step involves examining several features of the speech signal to place a boundary mark at the point where these features best satisfy a certain set of conditions specific for that kind of phonetic boundary. Manual segmentation of speech into phones is a highly time-consuming and painstaking process. Required for a variety of applications, such as acoustic analysis, or building speech synthesis databases for high-quality speech output systems, the time required to carry out this process for even relatively small speech databases can rapidly accumulate to prohibitive levels. This calls for automating the segmentation process.
The state-of-the-art segmentation techniques use Hidden Markov Models (HMMs) for phone states. They give an average accuracy of over 95% within 20 ms of manually obtained boundaries. However, HMM-based methods require large amounts of training data for good performance. Another major disadvantage of such speech-recognition-based segmentation techniques is that they cannot handle very long utterances, which are necessary for prosody modeling in speech synthesis applications.
Development of Text to Speech (TTS) systems in Indian languages has been difficult till date owing to the non-availability of sizeable segmented speech databases of good quality. Further, no prosody models exist for most of the Indian languages. Therefore, long utterances (at the paragraph level and monologues) have been recorded, as part of this work, for creating the databases.
This thesis aims at automating segmentation of very long speech sentences recorded for the application of corpus-based TTS synthesis for multiple Indian languages. In this explicit segmentation problem, we need to force align boundaries in any utterance from its known phonetic transcription.
The major disadvantage of forcing boundary alignments on the entire speech waveform of a long utterance is the accumulation of boundary errors. To overcome this, we force boundaries between 2 known phones (here, 2 successive stop consonants are chosen) at a time. Here, the approach used is silence detection as a marker for stop consonants. This method gives around 89% accuracy (for the Hindi database) and is language-independent and training-free. These stop consonants act as anchor points for the next stage.
Two methods for explicit segmentation have been proposed. Both the methods rely on the accuracy of the above stop consonant detection stage.
Another common stage is the recently proposed implicit method, which uses a Bach scale filter bank to obtain the feature vectors. The Euclidean Distance of the Mean of the Logarithm (EDML) of these feature vectors shows peaks at the points where the spectrum changes. The method performs with an accuracy of 87% within 20 ms of manually obtained boundaries and also achieves low deletion and insertion rates of 3.2% and 21.4% respectively, for 100 sentences of the Hindi database.
The first method is a three-stage approach. The first stage is stop consonant detection; the second uses Quatieri's sinusoidal model to classify sounds as voiced/unvoiced between 2 successive stop consonants. The final stage uses the EDML function of Bach scale feature vectors to further obtain boundaries within the voiced and unvoiced regions. It gives a Frame Error Rate (FER) of 26.1% for the Hindi database.
The second proposed method uses duration statistics of the phones of the language. It again uses the EDML function of the Bach scale filter bank to obtain peaks at the phone transitions and uses the duration statistics to assign a probability to each peak being a boundary. With this method, the FER improves to 22.8% for the Hindi database.
Both the methods are equally promising for the fact that they give low frame error rates. Results show that the second method outperforms the first, because it incorporates the knowledge of durations.
For the proposed approaches to be useful, manual interventions are required at the output of each stage. However, this intervention is less tedious and reduces the time taken to segment each sentence by around 60% as compared to the time taken for manual segmentation. The approaches have been successfully tested on 3 different languages, with 100 sentences each - Kannada, Tamil and English (we have used the TIMIT database for validating the algorithms).
In conclusion, a practical solution to the segmentation problem is proposed. Moreover, since the algorithm is training-free, language-independent (ES-SABSF method) and speaker-independent, it is useful for developing TTS systems in multiple languages while reducing the segmentation overhead. This method is currently being used in the lab for segmenting long Kannada utterances, spoken by reading a set of 1115 phonetically rich sentences.
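To give an idea of how an EDML-style boundary function can be computed, the sketch below takes a (frames x bands) matrix of positive filter-bank energies, compares mean log-feature vectors on either side of each frame, and treats local maxima of the resulting curve as candidate boundaries. The window size, threshold and exact formulation are simplifications for illustration, not the procedure used in the thesis.

```python
import numpy as np

def edml_curve(features, w=5):
    """For each frame, Euclidean distance between the mean log-feature
    vector of the w frames to its left and the w frames to its right.
    features: (T, D) array of positive filter-bank energies."""
    logf = np.log(features + 1e-12)
    T = logf.shape[0]
    curve = np.zeros(T)
    for t in range(w, T - w):
        left = logf[t - w:t].mean(axis=0)
        right = logf[t:t + w].mean(axis=0)
        curve[t] = np.linalg.norm(left - right)
    return curve

def candidate_boundaries(curve, threshold):
    """Local maxima of the curve above a threshold are candidate
    phone-boundary frames."""
    return [t for t in range(1, len(curve) - 1)
            if curve[t] > threshold
            and curve[t] >= curve[t - 1] and curve[t] >= curve[t + 1]]

# Usage (filterbank_energies is a hypothetical (T, D) array):
# boundaries = candidate_boundaries(edml_curve(filterbank_energies), 1.0)
```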
|
270 |
Probabilistic Sequence Models with Speech and Language Applications
Henter, Gustav Eje January 2013 (has links)
Series data, sequences of measured values, are ubiquitous. Whenever observations are made along a path in space or time, a data sequence results. To comprehend nature and shape it to our will, or to make informed decisions based on what we know, we need methods to make sense of such data. Of particular interest are probabilistic descriptions, which enable us to represent uncertainty and random variation inherent to the world around us. This thesis presents and expands upon some tools for creating probabilistic models of sequences, with an eye towards applications involving speech and language. Modelling speech and language is not only of use for creating listening, reading, talking, and writing machines---for instance allowing human-friendly interfaces to future computational intelligences and smart devices of today---but probabilistic models may also ultimately tell us something about ourselves and the world we occupy. The central theme of the thesis is the creation of new or improved models more appropriate for our intended applications, by weakening limiting and questionable assumptions made by standard modelling techniques. One contribution of this thesis examines causal-state splitting reconstruction (CSSR), an algorithm for learning discrete-valued sequence models whose states are minimal sufficient statistics for prediction. Unlike many traditional techniques, CSSR does not require the number of process states to be specified a priori, but builds a pattern vocabulary from data alone, making it applicable for language acquisition and the identification of stochastic grammars. A paper in the thesis shows that CSSR handles noise and errors expected in natural data poorly, but that the learner can be extended in a simple manner to yield more robust and stable results also in the presence of corruptions. Even when the complexities of language are put aside, challenges remain. The seemingly simple task of accurately describing human speech signals, so that natural synthetic speech can be generated, has proved difficult, as humans are highly attuned to what speech should sound like. Two papers in the thesis therefore study nonparametric techniques suitable for improved acoustic modelling of speech for synthesis applications. Each of the two papers targets a known-incorrect assumption of established methods, based on the hypothesis that nonparametric techniques can better represent and recreate essential characteristics of natural speech. In the first paper of the pair, Gaussian process dynamical models (GPDMs), nonlinear, continuous state-space dynamical models based on Gaussian processes, are shown to better replicate voiced speech, without traditional dynamical features or assumptions that cepstral parameters follow linear autoregressive processes. Additional dimensions of the state-space are able to represent other salient signal aspects such as prosodic variation. The second paper, meanwhile, introduces KDE-HMMs, asymptotically-consistent Markov models for continuous-valued data based on kernel density estimation, that additionally have been extended with a fixed-cardinality discrete hidden state. This construction is shown to provide improved probabilistic descriptions of nonlinear time series, compared to reference models from different paradigms. The hidden state can be used to control process output, making KDE-HMMs compelling as a probabilistic alternative to hybrid speech-synthesis approaches. 
A final paper of the thesis discusses how models can be improved even when one is restricted to a fundamentally imperfect model class. Minimum entropy rate simplification (MERS), an information-theoretic scheme for postprocessing models for generative applications involving both speech and text, is introduced. MERS reduces the entropy rate of a model while remaining as close as possible to the starting model. This is shown to produce simplified models that concentrate on the most common and characteristic behaviours, and provides a continuum of simplifications between the original model and zero-entropy, completely predictable output. As the tails of fitted distributions may be inflated by noise or empirical variability that a model has failed to capture, MERS's ability to concentrate on high-probability output is also demonstrated to be useful for denoising models trained on disturbed data. / ACORNS: Acquisition of Communication and Recognition Skills / LISTA – The Listening Talker
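Since MERS is defined in terms of the entropy rate, a small sketch may help: for a finite-state Markov chain the entropy rate is the stationary-weighted average of the per-state transition entropies, and a "simplified" chain is one for which this number is lower. The two transition matrices below are invented toy examples, not models from the thesis.

```python
import numpy as np

def markov_entropy_rate(P):
    """Entropy rate (bits per symbol) of an ergodic Markov chain with
    row-stochastic transition matrix P: sum_i pi_i * H(P[i, :]), where
    pi is the stationary distribution."""
    vals, vecs = np.linalg.eig(P.T)                        # left eigenvectors of P
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])   # eigenvalue-1 eigenvector
    pi = pi / pi.sum()
    safe = np.where(P > 0, P, 1.0)                         # zero entries contribute 0
    row_entropy = -(P * np.log2(safe)).sum(axis=1)
    return float(pi @ row_entropy)

# A chain that concentrates on its characteristic transitions has a lower
# entropy rate than a more uniform one.
P_original   = np.array([[0.6, 0.4], [0.5, 0.5]])
P_simplified = np.array([[0.9, 0.1], [0.2, 0.8]])
print(markov_entropy_rate(P_original), markov_entropy_rate(P_simplified))
```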
|