Global ETD Search

11	Calibrated Bayes Factor and Bayesian Model Averaging Zheng, Jiayin 14 August 2018 No description available. Statistics
12	Computational Inference of Genome-Wide Protein-DNA Interactions Using High-Throughput Genomic Data Zhong, Jianling January 2015 Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been developed to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often provides only partial and noisy information about protein-DNA interactions. The overarching goal of this dissertation is therefore to develop computational methods that jointly model different experimental datasets to achieve a holistic inference of the protein-DNA interaction landscape.
We first present a computational framework that incorporates the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. The framework is flexible and can easily incorporate other data sources; to demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has been used mainly to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach uses several effective strategies to extract nucleosome positioning signals from noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps, and that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape, together with an efficient expectation-maximization algorithm to learn the model parameters from data. We first show in simulation studies that the SSM can effectively recover the underlying true protein binding configurations. We then apply the SSM to real genomic data (both DNase-seq and MNase-seq). By incrementally increasing the types of genomic data in the SSM, we show that different data types contribute complementary information to the inference of the protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward genome-wide inference of the protein-DNA interaction landscape through data integration. / Dissertation Bioinformatics Statistics Computer science Bayes factor Genomic data integration Protein-DNA interactions state-space models statistical inference transcriptional regulation
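The idea of a Bayes-factor nucleosome score can be illustrated with a deliberately simplified sketch. It assumes (this is not the dissertation's model) that DNase I cut counts in a window are Poisson, with one rate over a hypothetical 147 bp nucleosome core and a separate rate over the flanks under the nucleosome hypothesis, versus a single rate for the whole window under the null. Gamma priors on the rates make both marginal likelihoods closed-form. The quadratic and oscillatory digestion pattern that the author models explicitly is not captured, and the sketch does not force the core rate to be lower than the flank rate. `nucleosome_log_bf`, `core_start`, `a` and `b` are hypothetical names.

```python
import math

def log_marginal_poisson_gamma(counts, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts that share one rate with a
    Gamma(shape=a, rate=b) prior. The -sum(log(x_i!)) term is omitted because
    it is identical under both hypotheses and cancels in the Bayes factor."""
    n, s = len(counts), sum(counts)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + n))

def nucleosome_log_bf(window, core_start, core_len=147, a=1.0, b=1.0):
    """log Bayes factor of 'core and flanks have different cut rates'
    versus 'one cut rate across the whole window'.

    window: list of DNase I cut counts per base pair (hypothetical input).
    core_start: hypothetical start of the nucleosome core inside the window.
    """
    core = window[core_start:core_start + core_len]
    flanks = window[:core_start] + window[core_start + core_len:]
    if len(core) != core_len or not flanks:
        raise ValueError("window must contain the full core plus flanking bases")
    return (log_marginal_poisson_gamma(core, a, b)
            + log_marginal_poisson_gamma(flanks, a, b)
            - log_marginal_poisson_gamma(window, a, b))

flank = [5] * 50
depleted = flank + [0] * 147 + flank   # protected core, cut-rich flanks
uniform = [2] * 247                    # no positioning signal
print(nucleosome_log_bf(depleted, 50))  # positive: supports a distinct core rate
print(nucleosome_log_bf(uniform, 50))   # negative: the extra rate is not supported
```

Sliding `core_start` along a longer region and keeping high-scoring, non-overlapping positions would turn this score into a crude nucleosome caller; the negative score on the flat window reflects the Bayes factor's built-in penalty for the extra rate parameter.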
13	Cluster analysis of manual planters according to the distribution of the number of seeds Araripe, Patricia Peres 10 December 2015 The manual planter is a tool that still plays an important role today in many countries that practice family and conservation agriculture. Its use is important because it minimizes soil disturbance and field labor requirements and supports more sustainable productivity, among other factors. Several studies have evaluated and/or compared the manual planters available on the market, but only with measures of location and dispersion. As an alternative, this work uses a methodology for comparing the performance of manual planters: the probabilities associated with each response category were estimated, and the hypothesis that these probabilities do not differ between planters compared in pairs was tested with the likelihood ratio test in the classical paradigm and with the Bayes factor in the Bayesian paradigm. Finally, the planters were grouped by cluster analysis, using the J-divergence as the distance measure.
To illustrate the methodology, the data of Molin, Menegatti and Gimenez (2001) were used, comparing fifteen manual planters from different manufacturers that were set to deposit exactly two seeds per stroke. Initially, in the classical approach, only the planters with no zero counts in the response categories were compared; of these, planters 3, 8 and 14 performed best. Next, all planters were compared in pairs, either by merging categories or by adding a constant of 0.5 or 1 to each response category. With merged categories, the likelihood ratio test was hard to interpret and showed only that planter 15 differs from the others. Adding 0.5 or 1 to each category did not appear to produce distinct groups; because planter 1 differed from the others according to the test and most often deposited two seeds, as the agronomic experiment required, it is the planter recommended in this work. In the Bayesian approach, the Bayes factor was used to compare the planters in pairs, and the conclusions were similar to those of the classical approach. Finally, cluster analysis gave a clearer view of the groups of similar planters under both approaches, confirming the earlier results. Bayes factor Cluster analysis Likelihood ratio test Manual planter
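The pairwise comparison described in this abstract can be sketched for one pair of planters. This is a minimal sketch under assumptions not stated in the abstract: each planter's counts per response category (for example strokes that deposited 0, 1, 2, 3 or 4+ seeds; the numbers below are hypothetical) are multinomial, the Bayes factor uses a symmetric Dirichlet prior with parameter `alpha` (an assumed choice), and the J-divergence is the Jeffreys divergence computed after adding a constant to every category, as the abstract does with 0.5 or 1. All function names are hypothetical.

```python
import math

def g_statistic(counts_a, counts_b):
    """Likelihood ratio (G) statistic for H0: two planters share the same
    category probabilities. Returns (G, degrees of freedom); the degrees of
    freedom assume every category is observed for at least one planter."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    n = n_a + n_b
    g = 0.0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        for obs, row_total in ((a, n_a), (b, n_b)):
            if obs > 0:  # terms with obs == 0 contribute 0
                expected = row_total * col / n
                g += 2.0 * obs * math.log(obs / expected)
    return g, len(counts_a) - 1

def log_dm_marginal(counts, alpha=1.0):
    """Log marginal likelihood of counts under a symmetric Dirichlet(alpha)
    prior; the multinomial coefficient cancels in the Bayes factor."""
    k, n = len(counts), sum(counts)
    return (math.lgamma(k * alpha) - k * math.lgamma(alpha)
            + sum(math.lgamma(alpha + x) for x in counts)
            - math.lgamma(k * alpha + n))

def log_bayes_factor(counts_a, counts_b, alpha=1.0):
    """log BF10: separate probability vectors (H1) vs one shared vector (H0)."""
    pooled = [a + b for a, b in zip(counts_a, counts_b)]
    return (log_dm_marginal(counts_a, alpha) + log_dm_marginal(counts_b, alpha)
            - log_dm_marginal(pooled, alpha))

def j_divergence(counts_a, counts_b, add=0.5):
    """Jeffreys J-divergence sum((p - q) * log(p / q)) between estimated
    category probabilities after adding `add` to each count."""
    pa = [c + add for c in counts_a]
    pb = [c + add for c in counts_b]
    sa, sb = sum(pa), sum(pb)
    return sum((x / sa - y / sb) * math.log((x / sa) / (y / sb))
               for x, y in zip(pa, pb))

def distance_matrix(planters, add=0.5):
    """Symmetric matrix of pairwise J-divergences with a zero diagonal."""
    k = len(planters)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = j_divergence(planters[i], planters[j], add)
    return d

planter_x = [4, 18, 60, 12, 6]    # hypothetical counts of 0, 1, 2, 3, 4+ seeds
planter_y = [10, 30, 40, 15, 5]   # hypothetical
g, df = g_statistic(planter_x, planter_y)
print(g, df, log_bayes_factor(planter_x, planter_y), j_divergence(planter_x, planter_y))
```

A chi-square p-value for G can be obtained with `scipy.stats.chi2.sf(g, df)`, and the matrix from `distance_matrix` can be converted with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` for the cluster analysis; the matrix is filled symmetrically so that `squareform` accepts it.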
14	Stochastic Modeling and Statistical Inference of Geological Fault Populations and Patterns Borgos, Hilde Grude January 2000 This work focuses on faults, and the main issue is the statistical analysis and stochastic modeling of faults and fault patterns in petroleum reservoirs. The thesis consists of Parts I-V and Appendices A-C, which can be read independently. Part III is written for a geophysical audience and concerns fault and fracture size-frequency distributions. The remaining parts are written for a statistical audience but can also be read by people with an interest in quantitative geology. Parts I and II address statistical model choice for fault size distributions, with a sampling algorithm for estimating Bayes factors. Part IV describes work on spatial modeling of fault geometry, and Part V is a short note on line partitioning. Parts I, II and III constitute the main part of the thesis. The appendices are conference abstracts and papers based on Parts I and IV. / Paper III: reprinted with kind permission of the American Geophysical Union. An edited version of this paper was published by AGU. Copyright [2000] American Geophysical Union Mathematical statistics Stochastic modeling Fault size Fault population Fault pattern Bayes factor Sampling algorithm
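Bayesian model choice between fault size distributions can be illustrated in a case where no sampling is needed. As a sketch only, assume two hypothetical candidate models for sizes above a known truncation point `x_min`, a Pareto (power-law) model and a shifted exponential model, each with a Gamma prior on its single parameter; both marginal likelihoods then have closed forms. The thesis develops a sampling algorithm for estimating Bayes factors, which matters precisely when such closed forms are unavailable; that algorithm is not reproduced here, and the prior settings `a` and `b` are assumptions that Bayes factors are known to be sensitive to.

```python
import math

def _check(sizes, x_min):
    if not sizes or min(sizes) < x_min:
        raise ValueError("need at least one size, all >= x_min")

def log_marginal_pareto(sizes, x_min, a=1.0, b=1.0):
    """Pareto density alpha * x_min**alpha / x**(alpha + 1) for x >= x_min,
    with alpha ~ Gamma(shape=a, rate=b) integrated out analytically."""
    _check(sizes, x_min)
    n = len(sizes)
    t = sum(math.log(x / x_min) for x in sizes)
    return (-sum(math.log(x) for x in sizes)
            + a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + n) - (a + n) * math.log(b + t))

def log_marginal_exponential(sizes, x_min, a=1.0, b=1.0):
    """Shifted exponential density lam * exp(-lam * (x - x_min)) for x >= x_min,
    with lam ~ Gamma(shape=a, rate=b) integrated out analytically."""
    _check(sizes, x_min)
    n = len(sizes)
    s = sum(x - x_min for x in sizes)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + n) - (a + n) * math.log(b + s))

def log_bf_pareto_vs_exponential(sizes, x_min, a=1.0, b=1.0):
    """Positive values favour the power law, negative values the exponential."""
    return (log_marginal_pareto(sizes, x_min, a, b)
            - log_marginal_exponential(sizes, x_min, a, b))

# Deterministic quantile "samples" (hypothetical data, x_min = 1).
n = 200
u = [(i + 0.5) / n for i in range(n)]
power_law = [(1 - ui) ** (-1 / 1.5) for ui in u]   # Pareto with alpha = 1.5
exponential = [1 - math.log(1 - ui) for ui in u]   # shifted exponential, rate 1
print(log_bf_pareto_vs_exponential(power_law, 1.0))
print(log_bf_pareto_vs_exponential(exponential, 1.0))
```

Both models are densities of the same size variable, so their marginal likelihoods are directly comparable; the -sum(log x) term in the Pareto marginal comes from the 1/x factor in the power-law density.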
16	Detecting and Substituting Discontinuities in Minute-by-Minute Load Data via Bayes Factor Sandra Canton Cardoso 09 November 2005 At the National Center for System Operation (CNOS) in Brasília, the Eletrobrás body that controls the Brazilian electrical system, load demand is read every 20 seconds and then aggregated over each minute to provide minute-by-minute data, which are radio-transmitted via satellite. Many measurement errors occur, caused either by problems in data transmission or by physical problems with the measurement itself, so the series contain many missing values that appear as visible discontinuities in the graph of the series. This dissertation proposes a system that automatically detects and corrects these discontinuities in the CNOS minute-by-minute load series by means of a Bayesian approach using the Bayes factor. Time series Bayes factor Minute-by-minute load data
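The abstract does not describe the detection model, so the following is only a sketch of one way a Bayes factor can flag and replace bad readings, under assumptions made here: each reading is predicted by linear extrapolation from the two previous cleaned readings, a valid reading deviates from that prediction with standard deviation `sigma`, an erroneous one with an inflated standard deviation (`inflation` times `sigma`), and a flagged reading is replaced by its prediction. `detect_and_substitute`, `sigma`, `inflation` and `prior_error` are hypothetical names, and the load values are made up.

```python
import math

def normal_logpdf(y, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (y - mean) ** 2 / (2 * sd * sd)

def detect_and_substitute(series, sigma, inflation=10.0, prior_error=0.01):
    """Flag and replace isolated bad readings in a minute-by-minute series.

    For each reading from the third onward, the Bayes factor compares
      M1: reading is an error, y ~ N(pred, (inflation * sigma)**2)
      M0: reading is valid,    y ~ N(pred, sigma**2)
    and the reading is replaced by pred when the posterior odds of M1 exceed 1.
    """
    cleaned = list(series)
    flagged = []
    log_prior_odds = math.log(prior_error / (1 - prior_error))
    for t in range(2, len(cleaned)):
        pred = 2 * cleaned[t - 1] - cleaned[t - 2]   # linear extrapolation
        log_bf = (normal_logpdf(cleaned[t], pred, inflation * sigma)
                  - normal_logpdf(cleaned[t], pred, sigma))
        if log_prior_odds + log_bf > 0:   # compared in log space to avoid overflow
            flagged.append(t)
            cleaned[t] = pred
    return cleaned, flagged

clean = [100 + 0.5 * t for t in range(20)]   # hypothetical load readings
noisy = list(clean)
noisy[10] = 0.0                               # simulated transmission dropout
repaired, flagged = detect_and_substitute(noisy, sigma=1.0)
print(flagged)   # [10]
```

Because each flagged reading is replaced before the next prediction is made, an isolated dropout does not contaminate its neighbours. The first two readings are never checked, and a genuine, persistent level shift would be replaced reading after reading, so a real system would need a changepoint model for that case.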
18	Bayesian Model Selections for Log-binomial Regression Zhou, Wei January 2018 No description available. Statistics Log-binomial Regression Bayesian Model Selection Bayesian Variable Selection Monte Carlo methods Bayes factor Relative Risk
19	Frequentist-Bayesian Hybrid Tests in Semi-parametric and Non-parametric Models with Low/High-Dimensional Covariate Xu, Yangyi 03 December 2014 In this dissertation we propose Frequentist-Bayesian hybrid test statistics for two testing problems. The first is to test for significant differences between non-parametric functions; the second is to test for any departure from a constant in the effect of high-dimensional predictors X. For both problems we also describe how the proposed test statistics are implemented. For the first problem, testing for statistical differences among massive outcomes or signals is of interest in many fields, including neurophysiology, imaging and engineering. However, such data often have nonlinear structure, including row/column patterns, non-normal distributions and other hard-to-identify internal relationships; these unknown relationships, together with high dimensionality, make it difficult to test whether two such systems differ significantly. We propose an Adaptive Bayes Sum Test that tests the significance of the difference between two nonlinear systems based on universal non-parametric decomposition/smoothing components. Our approach adapts the Bayes sum test statistic of Hart (2009). Internal patterns are handled through a Fourier transformation, and resampling is used to construct the empirical distribution of the test statistic, reducing the effect of non-normality. A simulation study suggests that our approach performs better than the alternative method, the Adaptive Neyman Test of Fan and Lin (1998). Its usefulness is demonstrated with an application to the identification of electronic chips and an application testing for changes in precipitation patterns.
For the second problem, numerous statistical methods have been developed for analyzing high-dimensional data. They focus mainly on variable selection, are of limited use for hypothesis testing with high-dimensional data, and often require an explicitly derived likelihood function. We propose a "Hybrid Omnibus Test" for high-dimensional data that has far fewer requirements. It is developed in a semi-parametric framework in which a likelihood function is no longer necessary. The Hybrid Omnibus Test is a Frequentist-Bayesian hybrid score-type test for a functional generalized partial linear single index model, in which the link is a function of the predictors through a generalized partially linear single index. To avoid the mathematical difficulty of deriving the likelihood, we propose an efficient score based on an estimating equation and use it to construct the test. In a simulation study, we compare our approach with an empirical likelihood ratio test and with Bayesian inference based on the Bayes factor in terms of false positive rate and true positive rate. The results suggest that our approach outperforms both alternatives in false positive rate, true positive rate and computational cost, in both high-dimensional and low-dimensional settings. The advantage of our approach is also supported by published biological results in an application to genetic pathway data for type II diabetes. / Ph. D. Bayes Factor Bayes Sum Test Discrete Fourier Transform Hybrid Laplace approximation Neyman Test Omnibus Resampling Score Single index Spline Approximation
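The first problem's ingredients, a Fourier transformation of each curve plus a resampling null, can be sketched with the comparison method named in the abstract: an adaptive Neyman-type statistic in the spirit of Fan and Lin (1998). This is not the author's Adaptive Bayes Sum Test, whose exact form is not given here. The sketch assumes two groups of replicate curves sampled on a common grid, standardizes each Fourier coefficient's mean difference Welch-style, and calibrates the maximum by permuting group labels; these choices, the function names and the simulated data are illustrative assumptions.

```python
import numpy as np

def fourier_features(curves):
    """Real and imaginary parts of each curve's real FFT, ordered from low to
    high frequency: [Re f0, Im f0, Re f1, Im f1, ...]."""
    coefs = np.fft.rfft(np.asarray(curves, dtype=float), axis=1)
    return np.stack([coefs.real, coefs.imag], axis=2).reshape(len(coefs), -1)

def adaptive_neyman(feat_a, feat_b):
    """max over m of sum_{j<=m} (z_j**2 - 1) / sqrt(2m), where z_j is the
    Welch-standardized difference in means of the j-th Fourier coefficient."""
    diff = feat_a.mean(axis=0) - feat_b.mean(axis=0)
    se2 = (feat_a.var(axis=0, ddof=1) / len(feat_a)
           + feat_b.var(axis=0, ddof=1) / len(feat_b))
    keep = se2 > 1e-12          # drop coefficients that are always zero, e.g. Im f0
    z2 = diff[keep] ** 2 / se2[keep]
    m = np.arange(1, z2.size + 1)
    return float(np.max(np.cumsum(z2 - 1) / np.sqrt(2 * m)))

def permutation_test(group_a, group_b, n_perm=199, seed=0):
    """Permutation p-value for the statistic above; rows are replicate curves."""
    feats = np.vstack([fourier_features(group_a), fourier_features(group_b)])
    n_a = len(group_a)
    observed = adaptive_neyman(feats[:n_a], feats[n_a:])
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(feats))
        exceed += int(adaptive_neyman(feats[idx[:n_a]], feats[idx[n_a:]]) >= observed)
    return observed, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
grid = np.arange(64)
curves_a = rng.normal(size=(10, 64))                                   # hypothetical
curves_b = rng.normal(size=(10, 64)) + 3 * np.sin(2 * np.pi * grid / 64)
print(permutation_test(curves_a, curves_b))
```

Ordering coefficients from low to high frequency lets the maximum over m adapt to smooth differences concentrated in the first few coefficients, while the permutation null avoids relying on normality of the standardized coefficients.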
20	Bayesian and frequentist methods and analyses of genome-wide association studies Vukcevic, Damjan January 2009 Recent technological advances and remarkable successes have led to genome-wide association studies (GWAS) becoming a tool of choice for investigating the genetic basis of common complex human diseases. These studies typically involve samples from thousands of individuals, scanning their DNA at up to a million loci along the genome to discover genetic variants that affect disease risk. Hundreds of such variants are now known for common diseases, nearly all discovered by GWAS over the last three years. As a result, many new studies are planned for the future or are already underway. In this thesis, I present analysis results from actual studies and some developments in theory and methodology. The Wellcome Trust Case Control Consortium (WTCCC) published one of the first large-scale GWAS in 2007. I describe my contribution to this study and present the results from some of my follow-up analyses. I also present results from a GWAS of a bipolar disorder sub-phenotype, and a recent and on-going fine mapping experiment. Building on methods developed as part of the WTCCC, I describe a Bayesian approach to GWAS analysis and compare it to widely used frequentist approaches. I do so both theoretically, by interpreting each approach from the perspective of the other, and empirically, by comparing their performance in the context of replicated GWAS findings. I discuss the implications of these comparisons for the interpretation and analysis of GWAS generally, highlighting the advantages of the Bayesian approach. Finally, I examine the effect of linkage disequilibrium on the detection and estimation of various types of genetic effects, particularly non-additive effects. I derive a theoretical result showing that the power to detect a departure from an additive model at a marker locus decays faster than the power to detect an association. 572.8
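One way to make the Bayesian-versus-frequentist comparison concrete, assumed here rather than taken from the thesis, is an approximate Bayes factor computed from an estimated effect (for example a log-odds ratio) and its standard error with a normal prior on the effect, the form popularized by Wakefield. If the estimate is approximately N(beta, V) and the prior under the alternative is beta ~ N(0, W), the Bayes factor is the ratio of N(0, V + W) to N(0, V) densities at the estimate. The prior standard deviation `prior_sd` and the example numbers are hypothetical.

```python
import math

def approx_log_bf(beta_hat, se, prior_sd):
    """log BF10 for H1: beta ~ N(0, prior_sd**2) vs H0: beta = 0, assuming
    beta_hat ~ N(beta, se**2) (a normal approximation to the likelihood)."""
    v, w = se ** 2, prior_sd ** 2
    z2 = (beta_hat / se) ** 2
    return 0.5 * math.log(v / (v + w)) + 0.5 * z2 * w / (v + w)

def two_sided_p(beta_hat, se):
    """Two-sided p-value of the Wald z-test of beta = 0."""
    return math.erfc(abs(beta_hat / se) / math.sqrt(2))

# Same z-score (4.5, hence the same p-value) from a small and a large study.
for se in (0.2, 0.02):
    beta_hat = 4.5 * se
    print(se, two_sided_p(beta_hat, se), approx_log_bf(beta_hat, se, prior_sd=0.2))
```

Identical p-values can therefore carry different evidence once the standard error and the assumed effect-size prior are taken into account, and with a very small standard error a result with p just under 0.05 (for example `approx_log_bf(0.004, 0.002, 0.2)`) gives a negative log Bayes factor, that is, evidence favouring the null.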
