  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Combining Probabilistic and Discrete Methods for Sequence Modelling

Gudjonsen, Ludvik January 1999 (has links)
Sequence modelling is used for analysing newly sequenced proteins, giving an indication of their 3-D structure and functionality. Current approaches to the modelling of protein families are based on either discrete or probabilistic methods. Here we present a hybrid model that combines the two: discrete patterns model the conserved regions, while probabilistic models handle the variable regions. When hidden Markov models are used to model the variable regions, the hybrid method gives increased classification accuracy compared to pure discrete or probabilistic models.
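The hybrid idea can be sketched in a few lines: a discrete pattern (here a regular expression standing in for a conserved motif) anchors the match, and a per-position probability table (a stand-in for a full HMM) scores the variable region that follows. The motif and all probabilities below are invented for illustration, not taken from the thesis.

```python
import math
import re

CONSERVED = re.compile(r"C..C")            # hypothetical motif: C-x-x-C

# Hypothetical position-specific residue probabilities for a
# 3-residue variable region (uniform background = 0.05 over 20 residues).
VARIABLE_MODEL = [
    {"A": 0.6, "G": 0.3},
    {"L": 0.5, "I": 0.4},
    {"K": 0.7},
]
BACKGROUND = 0.05

def score(seq):
    """Best log-odds score over all occurrences of the conserved
    pattern followed by a scored variable region, or None if the
    discrete pattern never matches."""
    best = None
    for m in CONSERVED.finditer(seq):
        region = seq[m.end():m.end() + len(VARIABLE_MODEL)]
        if len(region) < len(VARIABLE_MODEL):
            continue
        s = 0.0
        for pos, residue in zip(VARIABLE_MODEL, region):
            s += math.log(pos.get(residue, 0.01) / BACKGROUND)
        best = s if best is None else max(best, s)
    return best

print(score("MCAVCALK"))   # motif CAVC found, variable region "ALK"
print(score("MAAAAAAA"))   # no conserved motif -> None
```

The discrete part acts as a hard filter (no motif, no match), while the probabilistic part ranks the surviving candidates, which mirrors the division of labour described in the abstract.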
22

Normal Factor Graphs

Al-Bashabsheh, Ali January 2014 (has links)
This thesis introduces normal factor graphs under a new semantics, namely, the exterior function semantics. Initially, this work was motivated by two distinct lines of research. One line is ``holographic algorithms,'' a powerful approach introduced by Valiant for solving various counting problems in computer science; the other is ``normal graphs,'' an elegant framework proposed by Forney for representing codes defined on graphs. The nonrestrictive normality constraint enables the notion of holographic transformations for normal factor graphs. We establish a theorem, called the generalized Holant theorem, which relates a normal factor graph to its holographic transformation. We show that the generalized Holant theorem on one hand underlies the principle of holographic algorithms, and on the other reduces to a general duality theorem for normal factor graphs, a special case of which was first proved by Forney. As an application beyond Forney's duality, we show that normal factor graph duality facilitates the approximation of the partition function for the two-dimensional nearest-neighbor Potts model. In the course of our development, we formalize a new semantics for normal factor graphs, which highlights various linear algebraic properties that enable the use of normal factor graphs as a linear algebraic tool. Indeed, we demonstrate the ability of normal factor graphs to encode several concepts from linear algebra and present normal factor graphs as a generalization of ``trace diagrams.'' We illustrate, with examples, the workings of this framework and how several identities from linear algebra may be obtained using a simple graphical manipulation procedure called ``vertex merging/splitting.'' We also discuss translation association schemes with the aid of normal factor graphs, which we believe provides a simple approach to understanding the subject.
Further, under the new semantics, normal factor graphs provide a probabilistic model that unifies several graphical models such as factor graphs, convolutional factor graphs, and cumulative distribution networks.
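A minimal sketch of the exterior-function semantics, using an invented three-factor graph: variables live on edges, each internal edge joins exactly two factors (the normality constraint), and the exterior function is the sum of products of the local factors over all internal edge variables. With no dangling edges the result is a scalar, the partition function.

```python
from itertools import product

# Hypothetical binary factors for the chain graph  f --x-- g --y-- h.
f = {0: 1.0, 1: 2.0}                      # f(x)
g = {(0, 0): 1.0, (0, 1): 0.5,
     (1, 0): 0.5, (1, 1): 1.0}            # g(x, y)
h = {0: 3.0, 1: 1.0}                      # h(y)

def exterior_function():
    """Sum the product of local factors over the internal edge
    variables x and y; with no dangling edges this scalar is the
    partition function Z of the graph."""
    return sum(f[x] * g[(x, y)] * h[y]
               for x, y in product((0, 1), repeat=2))

Z = exterior_function()
print(Z)   # -> 8.5
```

Because every edge touches at most two factors, the same sum can be read as a matrix product, which is the linear-algebraic viewpoint the thesis develops.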
23

Computer-Aided Synthesis of Probabilistic Models

Andriushchenko, Roman January 2020 (has links)
This thesis addresses the problem of automated synthesis of probabilistic systems: given a family of Markov chains, how can we efficiently identify the member that satisfies a given specification? Such families often arise in various areas of engineering when modelling systems under uncertainty, and even the simplest synthesis questions are NP-hard. In this work we examine existing techniques based on counterexample-guided inductive synthesis (CEGIS) and counterexample-guided abstraction refinement (CEGAR), and we propose a novel integrated method for probabilistic synthesis. Experiments on relevant models demonstrate that the proposed technique is not only comparable with state-of-the-art methods but in most cases significantly outperforms existing approaches, sometimes by several orders of magnitude.
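The synthesis problem can be illustrated with a deliberately tiny family: a single "hole" selects one of several transition probabilities in a three-state Markov chain, and a brute-force search returns the members meeting a reachability specification. The chain and the numbers are invented; the thesis's contribution is the integrated CEGIS/CEGAR method, not this enumeration.

```python
def reach_probability(p):
    """DTMC with states 0 (start), 1 (target, absorbing), 2 (fail,
    absorbing): from state 0 go to the target with probability p,
    fail with 0.2, and stay with 1 - p - 0.2.  Solving
    x0 = p + (1 - p - 0.2) * x0 gives x0 = p / (p + 0.2)."""
    stay = 1.0 - p - 0.2
    return p / (1.0 - stay)

def synthesize(candidates, threshold):
    """Return all family members with P(reach target) >= threshold."""
    return [p for p in candidates if reach_probability(p) >= threshold]

family = [0.1, 0.3, 0.5, 0.7]             # possible values of the hole
print(synthesize(family, 0.7))            # -> [0.5, 0.7]
```

CEGIS and CEGAR both aim to avoid exactly this exhaustive check: a counterexample for one member, or an abstraction of the whole family, lets many members be rejected at once.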
24

Can Knowledge Rich Sentences Help Language Models To Solve Common Sense Reasoning Problems?

January 2019 (has links)
abstract: The significance of real-world knowledge for Natural Language Understanding (NLU) has been recognized for decades. With advances in technology, challenging tasks like question answering, text summarization, and machine translation have been made possible by continuous efforts in the field of Natural Language Processing (NLP). Yet knowledge integration to answer common-sense questions is still a daunting task. Logical reasoning has been a resort for many problems in NLP and has achieved considerable results, but it is difficult to resolve the ambiguities of natural language with it. Co-reference resolution is one problem where ambiguity arises from the semantics of the sentence; another is cause-and-result statements, which require causal common-sense reasoning to resolve the ambiguity. Modeling these types of problems is not a simple task with rules or logic. State-of-the-art systems addressing these problems use trained neural network models, which are claimed to capture broad knowledge from a huge training corpus; they answer questions using the knowledge embedded in their trained language model. Although language models embed knowledge from the data, they rely on word occurrences and co-occurrence frequencies to resolve the prevailing ambiguity. This limits their performance on common-sense reasoning tasks, as they generalize the concept rather than answering the problem specific to its context. For example, "The painting in Mark's living room shows an oak tree. It is to the right of a house" is a co-reference resolution problem that requires knowledge. Language models can resolve whether "it" refers to "painting" or "tree": since "house" and "tree" commonly co-occur, the models resolve "tree" as the co-referent. On the other hand, resolving "it" in "The large ball crashed right through the table. Because it was made of Styrofoam." to either "table" or "ball" is difficult for a language model, as it requires more information about the problem. In this work, I have built an end-to-end framework that uses knowledge automatically extracted for the problem at hand. This knowledge is combined with language models through an explicit reasoning module to resolve the ambiguity. The system is built to improve the accuracy of language-model-based approaches to common-sense reasoning, and it achieves state-of-the-art accuracy on the Winograd Schema Challenge. / Dissertation/Thesis / Masters Thesis Computer Science 2019
25

Drought Characterization Using Probabilistic Models

Ganeshchandra Mallya (5930027) 23 June 2020 (has links)
Droughts are complex natural disasters caused by a deficit in water availability over a region. Water availability is strongly linked to precipitation in many parts of the world that rely on monsoonal rains. Recent studies indicate that the choice of precipitation dataset and drought index can influence drought analysis. Therefore, drought characteristics for the Indian monsoon region were reassessed for the period 1901-2004 using two different datasets and the standardized precipitation index (SPI), the standardized precipitation-evapotranspiration index (SPEI), a Gaussian mixture model-based drought index (GMM-DI), and a hidden Markov model-based drought index (HMM-DI). Drought trends and variability were analyzed for three epochs: 1901-1935, 1936-1970 and 1971-2004. Irrespective of the dataset and methodology used, the results indicate an increasing trend in drought severity and frequency during the recent decades (1971-2004). Droughts are becoming more regional and show a general shift towards the agriculturally important coastal south India, central Maharashtra, and the Indo-Gangetic plains, indicating food-security challenges and socioeconomic vulnerability in the region.

Drought severities are commonly reported using drought classes obtained by applying pre-defined thresholds to drought indices. Current drought classification methods ignore modeling uncertainties and provide only discrete classes; however, users of drought classifications are often interested in the inherent uncertainties so that they can make informed decisions. A probabilistic gamma mixture model (Gamma-MM)-based drought index is proposed as an alternative to deterministic classification by SPI. The Bayesian framework of the proposed model avoids over-specification and overfitting by choosing the optimum number of mixture components required to model the data, a problem often encountered in other probabilistic drought indices (e.g., HMM-DI). When a sufficient number of components is used, the Gamma-MM can approximate any continuous distribution on the interval (0, infinity), thus addressing the problem of choosing an appropriate distribution for SPI analysis. The Gamma-MM propagates model uncertainties through to drought classification. The method is tested on rainfall data over India, and a comparison with the standard SPI shows significant differences, particularly when the SPI's assumptions about the data distribution are violated.

Finding regions with similar drought characteristics is useful for policy-makers and water-resources planners in allocating resources optimally, developing drought management plans, and taking timely actions to mitigate negative impacts during droughts. Drought characteristics such as intensity, frequency, and duration, along with land-use and geographic information, were used as input features for clustering algorithms. Three methods, namely (i) a Bayesian graph-cuts algorithm combining a Gaussian mixture model (GMM) and Markov random fields (MRF), (ii) k-means, and (iii) hierarchical agglomerative clustering, were used to find homogeneous drought regions that are spatially contiguous and share similar drought characteristics. The number of homogeneous clusters and their shapes were found to be sensitive to the choice of drought index, the time window of drought, the period of analysis, the dimensionality of the input datasets, the clustering method, and the model parameters of the clustering algorithms. Regionalization for different epochs provided useful insight into the space-time evolution of homogeneous drought regions over the study area. Strategies to combine the results from multiple clustering methods are also presented.
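The probabilistic-classification idea can be sketched with a two-component gamma mixture whose parameters are assumed rather than fitted: instead of a hard class obtained from a threshold, each observation gets posterior membership probabilities, so the uncertainty of the classification is explicit.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of the gamma distribution with the given shape/scale."""
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))

# Component parameters below are assumed for illustration, not fitted.
COMPONENTS = [
    {"name": "dry", "weight": 0.4, "shape": 2.0, "scale": 10.0},
    {"name": "wet", "weight": 0.6, "shape": 6.0, "scale": 15.0},
]

def class_probabilities(x):
    """Posterior membership probabilities P(component | x)."""
    joint = {c["name"]: c["weight"] * gamma_pdf(x, c["shape"], c["scale"])
             for c in COMPONENTS}
    total = sum(joint.values())
    return {name: p / total for name, p in joint.items()}

probs = class_probabilities(25.0)   # e.g. 25 mm of monthly rainfall
print(probs)
```

A low observation yields a high "dry" posterior rather than a bare class label, which is the kind of graded output the abstract argues decision-makers need.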
26

Combining machine learning and evolution for the annotation of metagenomics data / La combinaison de l'apprentissage statistique et de l'évolution pour l'annotation des données métagénomiques

Ugarte, Ari 16 December 2016 (has links)
Metagenomics studies microbial communities by analysing DNA extracted directly from environmental samples, and it makes it possible to establish a very extensive catalogue of the genes present in those communities. This catalogue must be compared against the genes already referenced in databases in order to find similar sequences and thus determine the function of its members. In the course of this thesis, we developed MetaCLADE, a new methodology that improves the detection of already-referenced protein domains in metagenomic and metatranscriptomic sequences. For the development of MetaCLADE, we modified a protein-domain annotation system developed within the Laboratoire de Biologie Computationnelle et Quantitative, called CLADE (CLoser sequences for Annotations Directed by Evolution) [17]. In general, methods for protein-domain annotation characterize known domains with probabilistic models. These probabilistic models, called Sequence Consensus Models (SCMs), are built from an alignment of homologous sequences belonging to different phylogenetic clades, and they represent the consensus at each position of the alignment. However, when the sequences forming the set of homologues are very divergent, the signals of the SCMs become too weak to be identified and the annotation fails. To solve this problem of annotating highly divergent domains, we used an approach based on the observation that many functional and structural constraints of a protein are not conserved globally across all species, but may be conserved locally within clades. The approach therefore enlarges the catalogue of probabilistic models by creating new models that emphasize the characteristics specific to each clade. MetaCLADE, a tool designed to annotate sequences from metagenomic and metatranscriptomic experiments with precision, uses this library to find matches between the models and a database of metagenomic or metatranscriptomic sequences. It then applies a pre-computed filtering step that determines the probability that a prediction is a true hit; this pre-computed step is a learning process that takes the fragmentation of metagenomic sequences into account when classifying them. We have shown that the multi-source approach, combined with a fragmentation-aware meta-learning strategy, outperforms current methods.
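The clade-specific modelling idea can be sketched with invented per-position probabilities: one consensus model per clade, with annotation taking the best-scoring clade model, so a fragment that diverges from one clade's consensus can still be caught by another's.

```python
import math

BACKGROUND = 0.05   # uniform background over 20 residues

# Hypothetical per-position residue probabilities, one model per clade.
CLADE_MODELS = {
    "clade_A": [{"M": 0.9}, {"K": 0.8}, {"L": 0.7}],
    "clade_B": [{"M": 0.9}, {"R": 0.8}, {"V": 0.7}],
}

def log_odds(model, fragment):
    """Log-odds of the fragment under one clade model vs background."""
    return sum(math.log(pos.get(res, 0.01) / BACKGROUND)
               for pos, res in zip(model, fragment))

def annotate(fragment):
    """Best clade-specific model and its score for a 3-residue fragment."""
    scores = {name: log_odds(m, fragment)
              for name, m in CLADE_MODELS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

print(annotate("MRV"))   # matched by the clade_B model, not clade_A
```

A single global consensus built from both clades would dilute both signals; keeping the models separate is the key observation behind the library of clade-centered models.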
27

A Probabilistic Morphological Analyzer for Syriac

McClanahan, Peter J. 08 July 2010 (has links) (PDF)
We show that a carefully crafted probabilistic morphological analyzer significantly outperforms a reasonable, naive baseline for Syriac. Syriac is an under-resourced Semitic language for which there are no available language tools such as morphological analyzers. Such tools are widely used to contribute to the process of annotating morphologically complex languages. We introduce and connect novel data-driven models for segmentation, dictionary linkage, and morphological tagging in a joint pipeline to create a probabilistic morphological analyzer requiring only labeled data. We explore the performance of this model with varying amounts of training data and find that with about 34,500 tokens, it can outperform the baseline trained on over 99,000 tokens and achieve an accuracy of just over 80%. When trained on all available training data, this joint model achieves 86.47% accuracy — a 29.7% reduction in error rate over the baseline.
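The reported figures are mutually consistent, as a quick back-of-the-envelope check shows: 86.47% accuracy is a 13.53% error rate, and dividing by (1 - 0.297) recovers a baseline error near 19.25%, i.e. a baseline accuracy just over 80%.

```python
# Consistency check of the reported numbers: from the joint model's
# accuracy and the stated 29.7% error-rate reduction, recover the
# implied baseline accuracy.

joint_accuracy = 86.47
reduction = 0.297

joint_error = 100.0 - joint_accuracy                 # 13.53
baseline_error = joint_error / (1.0 - reduction)     # about 19.25
baseline_accuracy = 100.0 - baseline_error           # about 80.75

print(round(baseline_accuracy, 2))   # -> 80.75
```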
28

Analytical Probabilistic Models for Evaluating the Hydrologic Performance of Structural Low Impact Development Practices

Zhang, Shouhong 04 April 2015 (has links)
Low Impact Development (LID) practices have been increasingly used to mitigate the adverse impacts of urbanization, and reliable methods are needed to assess the hydrologic performance of different types of LID practices. The purpose of this thesis is to develop a set of analytical models that can assist in the planning and design of commonly used structural LID practices such as green roofs, rain gardens, bioretention systems, and permeable pavement systems.

The analytical LID models are derived from exponential probability density functions (PDFs) of local rainfall characteristics together with mathematical representations of the hydraulic and hydrologic processes associated with the operation of LID practices. Exponential PDFs are found to provide good fits to the histograms of rainfall characteristics for five cities located in different climatic zones. The mathematical representations are all physically based, and most of their input parameters are the same as those required by commonly used numerical models.

The overall reliability of the analytical LID models is tested by comparing their results with those determined from long-term continuous simulations; in addition, the accuracy of the analytical model for green roofs is verified against observations from a real case study. Long-term rainfall data from the five cities and a variety of LID design configurations are used in the comparisons. The relative differences between the results calculated using the analytical LID models and those of the corresponding SWMM simulations are all less than 10%.

Howard's conservative assumption is adopted in the development of the analytical models for rain gardens and permeable pavement systems; this assumption yields conservative estimates of the stormwater management performance of these LID practices. For bioretention systems, instead of adopting Howard's conservative assumption, an approximate expected value of the surface depression water content at the end of a random rainfall event [denoted as ] is derived and used in the development of the analytical model, and its use is shown to be advantageous over Howard's conservative assumption.

The analytical LID models consist of closed-form mathematical expressions, so their application is easy and efficient, as illustrated in the application examples. For a specific location of interest, after a goodness-of-fit examination of the exponential PDFs against local rainfall data and verification of the accuracy of the analytical LID models, these models can serve as a convenient planning, design, and management tool for LID practices. / Doctor of Philosophy (PhD)
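The role of the exponential PDF can be illustrated with the simplest possible case, which is not the thesis's model: if per-event rainfall depth is exponential with mean mu, the probability that an event exceeds an available storage depth S is exp(-S/mu) in closed form, which a Monte Carlo run confirms. The depths below are assumed numbers.

```python
import math
import random

MU = 10.0        # mean event rainfall depth (mm), assumed
STORAGE = 15.0   # available storage depth (mm), assumed

# Closed-form exceedance probability under the exponential PDF:
# P(v > S) = exp(-S / mu).  The actual thesis models also account for
# infiltration, drainage, and inter-event recovery.
analytical = math.exp(-STORAGE / MU)

# Monte Carlo check of the same quantity.
random.seed(0)
events = [random.expovariate(1.0 / MU) for _ in range(200_000)]
simulated = sum(v > STORAGE for v in events) / len(events)

print(analytical, simulated)   # the two estimates should be close
```

Closed forms like this are what make the analytical models cheap enough to use for screening many design configurations before running a continuous simulation.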
29

Energy Distance-Based Loss Functions in Normalizing Flow Models

Inge, André January 2024 (has links)
No description available.
30

Probabilistic modelling of the evolution of ecological interaction networks

Minoarivelo, Henintsoa Onivola 12 1900 (has links)
Thesis (MSc)--Stellenbosch University, 2011. / ENGLISH ABSTRACT: In any ecological system, organisms need to interact with each other for their survival. Such interactions form ecological networks, which are usually very complex. Nevertheless, they exhibit well-defined patterns, and these regularities are often interpreted as products of meaningful ecological processes. As networks evolve through time, biological evolution is one of the factors that shapes ecological network architecture. In this work, we develop a mathematical model that represents the evolution through time of such ecological interaction networks. The problem is approached by modelling network evolution as a continuous-time Markov process, in such a way that the interactions in which a parent species is involved are potentially inherited by its descendant species. This approach allows us to infer ecological parameters and ecological network histories from real-world network data, as well as to simulate ecological networks under our model. While ecologists have long been aware of the influence of evolutionary processes in shaping ecological networks, we are now able to evaluate the importance of that influence.
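The continuous-time Markov process can be sketched with Gillespie's direct method on a toy bipartite network where interactions appear and disappear at fixed rates; inheritance of interactions at speciation, which is central to the thesis's model, is omitted, and all species names and rates are invented.

```python
import random

GAIN, LOSS = 0.3, 1.0                     # assumed gain/loss rates
SPECIES_A = ["a1", "a2"]
SPECIES_B = ["b1", "b2"]
PAIRS = [(a, b) for a in SPECIES_A for b in SPECIES_B]

def simulate(t_end, seed=1):
    """Return the set of interactions present at time t_end,
    simulated with Gillespie's direct method."""
    random.seed(seed)
    present, t = set(), 0.0
    while True:
        rates = {p: (LOSS if p in present else GAIN) for p in PAIRS}
        total = sum(rates.values())
        t += random.expovariate(total)    # waiting time to next event
        if t > t_end:
            return present
        r = random.uniform(0.0, total)    # pick one event by its rate
        for pair, rate in rates.items():
            r -= rate
            if r <= 0.0:
                break
        present ^= {pair}                 # toggle the chosen interaction

network = simulate(50.0)
print(sorted(network))
```

At stationarity each interaction is present with probability GAIN / (GAIN + LOSS); adding inheritance would correlate the interaction sets of parent and descendant species, which is what lets the model's likelihood inform phylogenetic inference.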
