31 |
A Probabilistic Morphological Analyzer for Syriac
McClanahan, Peter J. 08 July 2010
We show that a carefully crafted probabilistic morphological analyzer significantly outperforms a reasonable, naive baseline for Syriac. Syriac is an under-resourced Semitic language for which there are no available language tools such as morphological analyzers. Such tools are widely used to contribute to the process of annotating morphologically complex languages. We introduce and connect novel data-driven models for segmentation, dictionary linkage, and morphological tagging in a joint pipeline to create a probabilistic morphological analyzer requiring only labeled data. We explore the performance of this model with varying amounts of training data and find that with about 34,500 tokens, it can outperform the baseline trained on over 99,000 tokens and achieve an accuracy of just over 80%. When trained on all available training data, this joint model achieves 86.47% accuracy — a 29.7% reduction in error rate over the baseline.
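The abstract reports the joint model's accuracy and its error-rate reduction but not the baseline accuracy itself. A minimal sketch of the arithmetic, assuming the 29.7% figure is a relative reduction in error rate (the function name is hypothetical and the result is only what that assumption implies, not a number reported in the abstract):

```python
def implied_baseline_accuracy(model_accuracy: float, relative_error_reduction: float) -> float:
    """Back out the baseline accuracy implied by a relative error-rate reduction.

    Assumes: model_error = baseline_error * (1 - relative_error_reduction).
    """
    model_error = 1.0 - model_accuracy
    baseline_error = model_error / (1.0 - relative_error_reduction)
    return 1.0 - baseline_error


# Figures quoted in the abstract: 86.47% accuracy, 29.7% error-rate reduction.
implied = implied_baseline_accuracy(0.8647, 0.297)
print(f"implied baseline accuracy: {implied:.4f}")
```

Under that assumption the baseline would sit at roughly 80.8% accuracy, which is consistent with the statement that the smaller training set already reaches "just over 80%" while outperforming the baseline.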
|
32 |
Analytical Probabilistic Models for Evaluating the Hydrologic Performance of Structural Low Impact Development Practices
Zhang, Shouhong 04 April 2015
Low Impact Development (LID) practices are increasingly used to mitigate the adverse impacts of urbanization, and reliable methods are needed to assess the hydrologic performance of different types of LID practices. The purpose of this thesis is to develop a set of analytical models that can assist the planning and design of commonly used structural LID practices such as green roofs, rain gardens, bioretention systems, and permeable pavement systems.
The analytical LID models are derived from exponential probability density functions (PDFs) of local rainfall characteristics and mathematical representations of the hydraulic and hydrologic processes associated with the operation of LID practices. Exponential PDFs are found to fit the histograms of rainfall characteristics well for five cities located in different climatic zones. The mathematical representations are all physically based, and most of their input parameters are the same as those required by commonly used numerical models.
The overall reliability of the analytical LID models is tested by comparing their results with results from long-term continuous simulations. In addition, the accuracy of the analytical model for green roofs is verified against observations from a real case study. Long-term rainfall data from the five cities and a variety of LID design configurations are used in the comparisons. The relative differences between the results of the analytical LID models and those of the corresponding SWMM simulations are all less than 10%.
Howard's conservative assumption is adopted in developing the analytical models for rain gardens and permeable pavement systems. This assumption yields conservative estimates of the stormwater management performance of these LID practices. For bioretention systems, instead of adopting Howard's conservative assumption, an approximate expected value of the surface depression water content at the end of a random rainfall event is derived and used in the analytical model. Using this expected value is shown to be advantageous over using Howard's conservative assumption.
The analytical LID models consist of closed-form mathematical expressions, so applying them is easy and efficient, as the application examples illustrate. For a specific location of interest, once the goodness of fit of the exponential PDFs to local rainfall data has been examined and the accuracy of the analytical LID models verified, these models can serve as a convenient planning, design, and management tool for LID practices. / Doctor of Philosophy (PhD)
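The role of the exponential PDFs can be illustrated with a small sketch. It is not one of the thesis's closed-form LID expressions, which the abstract does not give. It only shows, under the assumption that rainfall event volume follows an exponential distribution, how the rate is estimated by maximum likelihood and how an exceedance probability follows in closed form. The function names, sample volumes, and 10 mm storage capacity are all hypothetical.

```python
import math


def fit_exponential_rate(samples):
    """Maximum-likelihood rate of an exponential PDF: 1 / sample mean."""
    if not samples or any(s < 0 for s in samples):
        raise ValueError("need a non-empty list of non-negative values")
    mean = sum(samples) / len(samples)
    if mean == 0:
        raise ValueError("sample mean must be positive")
    return 1.0 / mean


def exceedance_probability(rate, threshold):
    """P(V > threshold) for V ~ Exponential(rate), i.e. exp(-rate * threshold)."""
    return math.exp(-rate * threshold)


# Hypothetical rainfall event volumes in mm (illustrative values, not thesis data).
event_volumes_mm = [5.0, 10.0, 15.0]
rate = fit_exponential_rate(event_volumes_mm)

# Chance that a single event exceeds a hypothetical 10 mm storage capacity.
p_overflow = exceedance_probability(rate, 10.0)
print(f"rate = {rate:.3f} per mm, P(volume > 10 mm) = {p_overflow:.3f}")
```

Because the exceedance probability is a closed-form expression, it can be evaluated without running a long-term continuous simulation, which is the practical appeal the abstract attributes to the analytical approach.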
|
33 |
Energy Distance-Based Loss Functions in Normalizing Flow Models
Inge, André January 2024
No description available.
|
34 |
Probabilistic modelling of the evolution of ecological interaction networks
Minoarivelo, Henintsoa Onivola 12 1900
Thesis (MSc)--Stellenbosch University, 2011. / In any ecological system, organisms need to interact with each other to survive. Such interactions form ecological networks, which are usually very complex. Nevertheless, they exhibit well-defined patterns, and these regularities are often interpreted as products of meaningful ecological processes. Because the networks evolve through time, biological evolution is one of the factors that affect ecological network architecture. In this work, we develop a mathematical model that represents the evolution through time of such ecological interaction networks. The problem is approached by modelling network evolution as a continuous-time Markov process in which the interactions of a parent species are potentially inherited by its descendant species. This approach allows us to infer ecological parameters and ecological network histories from real-world network data, as well as to simulate ecological networks under our model. While ecologists have long been aware of the influence of evolutionary processes in shaping ecological networks, we are now able to evaluate the importance of that influence.
|
35 |
Bayesian Generative Modeling of Complex Dynamical Systems
Guan, Jinyan January 2016
This dissertation presents a Bayesian generative modeling approach for complex dynamical systems, applied to emotion-interaction patterns in multivariate data collected in social psychology studies. While social psychologists have used dynamical models to study complex psychological and behavioral patterns in recent years, most of these studies have been limited by their use of regression methods to fit the model parameters from noisy observations. These regression methods mostly rely on estimates of the derivatives from the noisy observations, and thus easily overfit and fail to predict future outcomes. A Bayesian generative model addresses this problem by integrating prior knowledge of where the data come from with the observed data through posterior distributions. It allows theoretical ideas and mathematical models to be developed independently of inference concerns. In addition, Bayesian generative statistical modeling evaluates a model by its predictive power rather than by the reduction in residual error used in regression methods, which helps prevent overfitting in social psychology data analysis. In the proposed approach, this dissertation uses the State Space Model (SSM) to model the dynamics of emotion interactions. Specifically, it tests the approach on a class of psychological models aimed at explaining the emotional dynamics of interacting couples in committed relationships. The latent states of the SSM are continuous real numbers that represent the levels of the true emotional states of both partners. One can obtain the latent states at all subsequent time points by evolving a differential equation (typically a coupled linear oscillator (CLO)) forward in time from a known initial state at the starting time. The multivariate observed states include self-reported emotional experiences and physiological measurements of both partners during the interactions.
To test whether well-being factors, such as body weight, can help to predict emotion-interaction patterns, we construct functions that determine the prior distributions of the CLO parameters of individual couples based on existing emotion theories. Besides, we allow a single latent state to generate multivariate observations and learn the group-shared coefficients that specify the relationship between the latent states and the multivariate observations. Furthermore, we model the nonlinearity of the emotional interaction by allowing smooth changes (drift) in the model parameters. By restricting the stochasticity to the parameter level, the proposed approach models the dynamics in longer periods of social interactions assuming that the interaction dynamics slowly and smoothly vary over time. The proposed approach achieves this by applying Gaussian Process (GP) priors with smooth covariance functions to the CLO parameters. Also, we propose to model the emotion regulation patterns as clusters of the dynamical parameters. To infer the parameters of the proposed Bayesian generative model from noisy experimental data, we develop a Gibbs sampler to learn the parameters of the patterns using a set of training couples. To evaluate the fitted model, we develop a multi-level cross-validation procedure for learning the group-shared parameters and distributions from training data and testing the learned models on held-out testing data. During testing, we use the learned shared model parameters to fit the individual CLO parameters to the first 80% of the time points of the testing data by Monte Carlo sampling and then predict the states of the last 20% of the time points. By evaluating models with cross-validation, one can estimate whether complex models are overfitted to noisy observations and fail to generalize to unseen data. 
We test our approach on both synthetic data generated by the generative model and real data collected in multiple social psychology experiments. The proposed approach has the potential to model other complex behaviors, since the generative model is not restricted to a particular form of the underlying dynamics.
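The idea of obtaining latent states by evolving a coupled linear oscillator forward from a known initial state, then holding out the last 20% of time points, can be sketched as follows. The abstract does not give the CLO equations, so the form used here (each partner's acceleration is linear in their own level, their own rate of change, and the other partner's level) is an assumption, as are all parameter names and values. The fixed-step Runge-Kutta integrator is an illustrative choice, and no Bayesian inference is performed.

```python
import math


def clo_derivative(state, params):
    """Time derivative of an assumed two-partner coupled linear oscillator.

    state  = (x1, v1, x2, v2): emotion level and its rate of change per partner.
    params = (f1, d1, c1, f2, d2, c2): hypothetical frequency, damping, and
             coupling coefficients for partner 1 and partner 2.
    """
    x1, v1, x2, v2 = state
    f1, d1, c1, f2, d2, c2 = params
    a1 = f1 * x1 + d1 * v1 + c1 * x2
    a2 = f2 * x2 + d2 * v2 + c2 * x1
    return (v1, a1, v2, a2)


def rk4_step(state, params, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = clo_derivative(state, params)
    k2 = clo_derivative(shifted(state, k1, dt / 2), params)
    k3 = clo_derivative(shifted(state, k2, dt / 2), params)
    k4 = clo_derivative(shifted(state, k3, dt), params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial_state, params, dt, n_steps):
    """Return the latent trajectory, including the initial state."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(rk4_step(states[-1], params, dt))
    return states


# Uncoupled, undamped check case: partner 1 should follow cos(t).
trajectory = simulate((1.0, 0.0, 0.5, 0.0), (-1.0, 0.0, 0.0, -1.0, 0.0, 0.0), 0.01, 100)

# Split mirroring the evaluation described above: fit on the first 80%, predict the rest.
split = int(0.8 * len(trajectory))
fit_part, held_out_part = trajectory[:split], trajectory[split:]
```

In the dissertation's setting the parameters would be inferred from noisy observations rather than fixed by hand. The sketch covers only the deterministic forward evolution and the time-based split.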
|
36 |
Métodos Bayesianos aplicados em taxonomia molecular / Bayesian methods applied in molecular taxonomy
Villanueva Talavera, Edwin Rafael 31 August 2007
This work presents two data clustering methods intended for applications in molecular taxonomy. The methods are based on probabilistic models, which overcome some problems of existing non-probabilistic clustering methods, such as the difficulty of choosing a distance metric and the lack of any treatment or use of available prior knowledge. The methods use Bayes' theorem to combine the information extracted from the data with the available prior knowledge, which is why they are called Bayesian methods. The first method, Bayesian hierarchical clustering, is based on the HBC (Hierarchical Bayesian Clustering) algorithm. It is an agglomerative hierarchical method that constructs a hierarchy of partitions (a dendrogram) guided by the criterion of maximum posterior probability of each partition. The second method is based on a type of probabilistic graphical model known as a conditional Gaussian network, adapted here for clustering problems. Both methods were evaluated on three datasets with known class labels. The methods were also applied to a real problem: the taxonomy of a Brazilian collection of bacterial strains of the genus Bradyrhizobium, known for their capacity to transform atmospheric nitrogen (N₂) into nitrogen compounds useful to the host plants. This dataset consists of genotypic data obtained from the analysis of ribosomal RNA. The results showed that the hierarchical Bayesian clustering method produces dendrograms of good quality, in some cases better than the best of the hierarchical algorithms analyzed. The method based on conditional Gaussian networks also gave acceptable results, making adequate use of prior knowledge about the classes both to determine the optimal number of clusters and to improve the quality of the clusters.
|
38 |
Risk based life management of offshore structures and equipment
Bharadwaj, Ujjwal R. January 2010
Risk based approaches are gaining currency as industry looks for rational, efficient and flexible ways to manage its structures and equipment. When applied to the inspection and maintenance of industrial assets, risk based approaches differ from other approaches mainly in that they assess failure in its wider context and ramifications. These advanced techniques provide more insight into the causes and avoidance of structural failure and competing risks, as well as the resources needed to manage them. Measuring risk is a challenge that is being met with state-of-the-art technology, skills, knowledge and experience. The thesis presents risk based approaches to solving two specific types of problem in the management of offshore structures and equipment. The first type is finding the optimum timing of an asset life management action such that financial benefit is maximised, considering the cost of the action and the risk (quantified in monetary terms) of not undertaking that action. The approach presented here is applied to managing remedial action in offshore wind farms, specifically for corroded wind turbine tower structures. The second type of problem is how to optimise resources using risk based criteria for managing competing demands. The approach presented here is applied to stocking spares in the shipping sector, where the cost of holding spares is balanced against the risk of failing to meet demands for spares. Risk is the leitmotiv running through this thesis. The approaches discussed here will find application in a variety of situations where competing risks are being managed within constraints.
|
39 |
Mapeamento semântico com aprendizado estatístico relacional para representação de conhecimento em robótica móvel. / Semantic mapping with statistical relational learning for knowledge representation in mobile robotics.
Corrêa, Fabiano Rogério 30 March 2009
Most maps used by mobile robots in navigation tasks represent only spatial information about the environment. Other kinds of information, which could be obtained from the robot's sensors and incorporated into the representation, are neglected. Nowadays it is common for a mobile robot to have distance sensors and a vision system, which could in principle allow it to carry out complex and general tasks autonomously, given an adequate representation and a way to extract the necessary knowledge directly from the sensors. One possible representation in this context adds semantic information to metric maps, for example by segmenting the environment and then labeling each of its parts. This work proposes a way to structure the spatial information by creating a semantic map of the environment that represents, in addition to obstacles, a link between them and the corresponding segmented images obtained by an omnidirectional vision system. The representation is implemented as a relational description of the domain which, when instantiated, produces a conditional random field on which inference is performed. Models that combine probability and first-order logic are more expressive and better suited to structuring spatial information into semantic information.
|
40 |
Probabilistic Models for the Analysis of Gene Expression Profiles
Quon, Gerald 16 August 2013
Gene expression profiles are some of the most abundant sources of data about the cellular state of a collection of cells in an organism. Comparison of the expression profiles of multiple samples allows biologists to find associations between observations at the molecular level and the phenotype of the samples. A key challenge is to distinguish variation in expression due to biological factors of interest from variation due to confounding factors that can arise for unrelated technical or biological reasons. This thesis presents models that can explicitly adjust the comparison of expression profiles to account for specific types of confounding factors.
One such confounding factor arises when comparing tissue-specific expression profiles across multiple organisms to identify differences in expression that are indicative of changes in gene function. When the organisms are separated by long evolutionary distances, tissue functions may be re-distributed and introduce expression changes unrelated to changes in gene function. We developed Brownian Factor Phylogenetic Analysis, a model that can account for such re-distribution of function, and demonstrate that removing this confounding factor improves tasks such as predicting gene function.
Another confounding factor arises because current protocols for expression profiling require RNA extracts from multiple cells. Often biological samples are heterogeneous mixtures of multiple cell types, so the measured expression profile is an average of the RNA levels of the constituent cells. When the biological sample contains both cells of interest and nuisance cells, the confounding expression from the nuisance cells can mask the expression of the cells of interest. We developed ISOLATE and ISOpure, two models for addressing the heterogeneity of tumor samples. We demonstrated that modeling tumor heterogeneity leads to an improvement in two tasks: identifying the site of origin of metastatic tumors, and predicting the risk of death of lung cancer patients.
|