Global ETD Search

1	Tests of Independence in a Single 2x2 Contingency Table with Random Margins Yu, Yuan 01 May 2014 In the analysis of contingency tables, Fisher's exact test is an important and commonly used statistical significance test of independence between two variables. However, Fisher's exact test assumes fixed margins. That is, it uses information beyond the table, which makes it conservative. To solve this problem, we allow the margins to be random. Instead of fitting the count data to the hypergeometric distribution, as in Fisher's exact test, we model the margins and one cell using a multinomial distribution and then use the likelihood ratio to test the hypothesis of independence. Furthermore, using Bayesian inference, we consider the Bayes factor as another test statistic. To judge test performance, we compare the power of the likelihood ratio test, the Bayes factor test and Fisher's exact test. In addition, we use our methodology to analyse data from the Worcester Heart Attack Study to assess gender differences in the therapeutic management of patients with acute myocardial infarction (AMI) by selected demographic and clinical characteristics. likelihood ratio test Bayes factor
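The two test statistics named in this abstract can be illustrated with a minimal sketch. It assumes a 2x2 table under full multinomial sampling and, for the Bayes factor, symmetric Dirichlet priors with parameter `alpha` (uniform when `alpha=1`). The function names and the prior choice are assumptions made for illustration; the thesis may use a different parameterization or prior.

```python
import math


def lrt_independence(table):
    """Likelihood ratio statistic G^2 and its asymptotic p-value (1 df)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # a zero cell contributes nothing to G^2
                expected = row_totals[i] * col_totals[j] / n
                g2 += 2.0 * obs * math.log(obs / expected)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(g2, 0.0) / 2.0))
    return g2, p_value


def log_marginal(counts, alpha=1.0):
    """Log of the integral of prod(p_k ** n_k) under a symmetric Dirichlet prior."""
    k, n = len(counts), sum(counts)
    return (math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
            + sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in counts))


def log_bayes_factor(table, alpha=1.0):
    """Log Bayes factor of the saturated model against independence.

    The multinomial coefficient is common to both models and cancels.
    """
    cells = [c for row in table for c in row]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    log_saturated = log_marginal(cells, alpha)
    log_independent = log_marginal(row_totals, alpha) + log_marginal(col_totals, alpha)
    return log_saturated - log_independent


if __name__ == "__main__":
    example = [[12, 5], [4, 11]]  # hypothetical counts, not data from the thesis
    print(lrt_independence(example))
    print(log_bayes_factor(example))
```

A positive log Bayes factor favours association and a negative one favours independence. The chi-square p-value is a large-sample approximation and can be unreliable for small cell counts.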
2	A Bayesian Test of Independence for Two-way Contingency Tables Under Cluster Sampling Bhatta, Dilli 19 April 2013 We consider a Bayesian approach to the study of independence in a two-way contingency table obtained from a two-stage cluster sampling design. We study the association between two categorical variables when (a) there are no covariates and (b) there are covariates at both unit and cluster levels. Our main idea for the Bayesian test of independence is to convert the cluster sample into an equivalent simple random sample, which provides a surrogate of the original sample. This surrogate sample is then used to compute the Bayes factor to make an inference about independence. For the test of independence without covariates, the Rao-Scott corrections to the standard chi-squared (or likelihood ratio) statistic were developed. They are "large sample" methods and provide appropriate inference when there are large cell counts. However, they are less successful when there are small cell counts. We have developed a methodology to overcome the limitations of the Rao-Scott corrections. We have used a hierarchical Bayesian model to convert the observed cluster samples to simple random samples. This provides the surrogate samples, which can be used to derive the distribution of the Bayes factor to make an inference about independence. We have used a sampling-based method to fit the model. For the test of independence with covariates, we first convert the cluster sample with covariates to a cluster sample without covariates. We use a multinomial logistic regression model with random effects to accommodate the cluster effects. Our idea is to fit the cluster samples to the random effects models and predict new samples by adjusting for the covariates. This provides the cluster sample without covariates. 
We then use a hierarchical Bayesian model to convert this cluster sample to a simple random sample, which allows us to calculate the Bayes factor to make an inference about independence. We use Markov chain Monte Carlo methods to fit our models. We apply our first method to the Third International Mathematics and Science Study (1995) for third grade U.S. students, in which we study the association between the mathematics test scores and the communities the students come from, and between the science test scores and the communities the students come from. We also provide a simulation study which establishes our methodology as a viable alternative to the Rao-Scott approximations for relatively small two-stage cluster samples. We apply our second method to data from the Trends in International Mathematics and Science Study (2007) for fourth grade U.S. students to assess the association between the mathematics and science scores represented as categorical variables, and we also provide a simulation study. The results show that if there is a strong association between two categorical variables, there is no difference between the significance of the test using the model (a) with covariates and (b) without covariates. However, in simulation studies, there is a noticeable difference in the significance of the test between the two models in borderline cases (i.e., situations where there is marginal significance). Surrogate samples Bayes factor Hierarchical Baye
3	Bayesian Model Checking in Multivariate Discrete Regression Problems Dong, Fanglong 03 November 2008 No description available. Statistics Bayesian statistics ordinal data bayes factor deviance posterior distribution
4	Uma abordagem bayesiana para mapeamento de QTLs em populações experimentais / A Bayesian approach for mapping QTL in experimental populations Andréia da Silva Meyer 03 April 2009 Many traits in plants and animals are quantitative in nature and influenced by multiple genes. With the advent of new molecular techniques, it has become possible to map the loci that control quantitative traits, called QTLs (Quantitative Trait Loci). Mapping a QTL means identifying its position in the genome and estimating its genetic effects. The main difficulty in QTL mapping is that the number of QTLs is unknown. Bayesian methods combined with Markov chain Monte Carlo (MCMC) have been used to jointly infer the number of QTLs, their positions in the genome and their genetic effects. The challenge is to sample from the joint posterior distribution of these parameters, since the number of QTLs is treated as unknown and the dimension of the parameter space changes with the number of QTLs in the model. In this work, a Bayesian approach to QTL mapping that accounts for multiple QTLs and epistatic effects was implemented in the statistical software R. Models with increasing numbers of QTLs were fitted, and the Bayes factor was used to select the most suitable model and thereby estimate the number of QTLs controlling the phenotypes of interest. To assess the efficiency of the implemented methodology, a simulation study was carried out for two experimental populations, backcross and F2, in each case considering models with and without epistasis. The approach proved very efficient: in every scenario considered, the selected model was the one containing the true number of QTLs used to simulate the data. In addition, QTLs were mapped for three phenotypes of tropical maize (plant height, ear height and grain yield) using the implemented methodology, and the results were compared with those obtained by the CIM method. Statistical genetics Bayesian inference Genetic mapping Monte Carlo method Bayes factor MCMC QTL mapping
5	Summarizing FLARE assay images in colon carcinogenesis Leyk Williams, Malgorzata 12 April 2006 Intestinal tract cancer is one of the more common cancers in the United States. While in some individuals a genetic component causes the cancer, the rate of cancer in the remainder of the population is believed to be affected by diet. Since cancer usually develops slowly, the amount of oxidative damage to DNA can be used as a cancer biomarker. This dissertation examines effective ways of analyzing FLARE assay data, which quantifies oxidative damage. The statistical methods are applied to data from a FLARE assay experiment that examines cells from the duodenum and the colon to see whether the risk of cancer differs between corn oil and fish oil diets. Treatments with the oxidizing agent dextran sodium sulfate (DSS), DSS with a recovery period, and a control are also used. Previous methods in the literature examined FLARE data by summarizing the DNA damage of each cell with a single number, such as the relative tail moment (RTM). Variable skewness is proposed as an alternative measure and is shown to be as effective as the RTM in detecting diet and treatment differences in the standard analysis. The RTM and skewness data are then analyzed using a hierarchical model, with both the skewness and the RTM showing diet/treatment differences. Simulated data for this model are also considered and show that a Bayes Factor (BF) for higher dimensional models does not follow the guidelines presented by Kass and Raftery (1995). It is hypothesized that more information is obtained by describing the DNA damage functions instead of summarizing them with a single number. From each function, seven points are picked. First, they are modeled independently, and only diet effects are found. 
However, when the correlation between points at the cell and rat level is modeled, much stronger diet and treatment differences are shown both in the colon and the duodenum than for any of the previous methods. These results are also easier to interpret and represent graphically, showing that the latter is an effective method of analyzing the FLARE data. FLARE assay hierarchical models Bayes Factor comet assay corn oil fish oil
6	Bayesian Model Checking Strategies for Dichotomous Item Response Theory Models Toribio, Sherwin G. 16 June 2006 No description available. Item Response Theory Bayesian Model Checking Gibbs Sampling Predictive Distributions Bayes Factor
7	Calibrated Bayes Factor and Bayesian Model Averaging Zheng, Jiayin 14 August 2018 No description available. Statistics
8	Computational Inference of Genome-Wide Protein-DNA Interactions Using High-Throughput Genomic Data Zhong, Jianling January 2015 Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape. We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations. We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. 
We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites. Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets. This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration. / Dissertation Bioinformatics Statistics Computer science Bayes factor Genomic data integration Protein-DNA interactions state-space models statistical inference transcriptional regulation
9	Análise de agrupamento de semeadoras manuais quanto à distribuição do número de sementes / Cluster analysis of manual planters according to the distribution of the number of seeds Araripe, Patricia Peres 10 December 2015 The manual planter is a tool that still plays an important role in many countries that practice family and conservation agriculture. Its use matters because it minimizes soil disturbance and field labor requirements and supports higher sustainable productivity, among other factors. To evaluate and compare the manual planters on the market, several studies have been carried out, but they considered only measures of position and dispersion. This work uses an alternative methodology for comparing the performance of manual planters. The probabilities associated with each response category were estimated, and the hypothesis that these probabilities do not differ between planters, compared in pairs, was tested using the likelihood ratio test and the Bayes factor in the classical and Bayesian paradigms, respectively. Finally, the planters were grouped by cluster analysis, using the J-divergence as the distance measure. To illustrate the methodology, we consider data for fifteen manual planters from different manufacturers, analyzed by Molin, Menegatti and Gimenez (2001), in which the planters were adjusted to deposit exactly two seeds per stroke. Initially, in the classical approach, the planters with no zero counts in the response categories were compared, and planters 3, 8 and 14 performed best. Then all planters were compared in pairs, grouping categories and adding the constant 0.5 or 1 to each response category. When categories were grouped, it was difficult to draw conclusions from the likelihood ratio test, which showed only that planter 15 differs from the others. Adding 0.5 or 1 to each category apparently did not produce distinct groups; because planter 1 differed from the others according to the test and most frequently deposited two seeds, as required by the agronomic experiment, it was the one recommended in this work. In the Bayesian approach, the Bayes factor was used to compare the planters in pairs, and the conclusions were similar to those of the classical approach. Finally, the cluster analysis gave a clearer view of the groups of similar planters under both approaches, confirming the earlier results. Bayes factor Cluster analysis Likelihood ratio test Manual planter
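The distance measure used for the clustering step in this abstract can be sketched as follows. The sketch assumes the usual definition of the J-divergence as the symmetrized Kullback-Leibler divergence, J(p, q) = sum_k (p_k - q_k) ln(p_k / q_k), and applies the additive constant (0.5 or 1) mentioned in the abstract so that empty categories do not produce a logarithm of zero. The function names and the counts are hypothetical and are not taken from the thesis.

```python
import math


def smoothed_proportions(counts, constant=0.5):
    """Category proportions after adding a constant to every count."""
    total = sum(counts) + constant * len(counts)
    return [(c + constant) / total for c in counts]


def j_divergence(p, q):
    """Symmetrized Kullback-Leibler divergence between two discrete distributions."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Hypothetical seeds-per-stroke counts for two planters (categories 0, 1, 2, 3+).
    planter_a = smoothed_proportions([0, 5, 40, 5])
    planter_b = smoothed_proportions([2, 10, 30, 8])
    print(j_divergence(planter_a, planter_b))
```

Computing this quantity for every pair of planters gives a symmetric dissimilarity matrix that a hierarchical clustering routine can take as input.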
10	Stochastic Modeling and Statistical Inference of Geological Fault Populations and Patterns Borgos, Hilde Grude January 2000 The focus of this work is on faults, and the main issue is statistical analysis and stochastic modeling of faults and fault patterns in petroleum reservoirs. The thesis consists of Parts I-V and Appendices A-C. The units can be read independently. Part III is written for a geophysical audience, and its topic is fault and fracture size-frequency distributions. The remaining parts are written for a statistical audience but can also be read by people with an interest in quantitative geology. The topic of Parts I and II is statistical model choice for fault size distributions, with a sampling algorithm for estimating the Bayes factor. Part IV describes work on spatial modeling of fault geometry, and Part V is a short note on line partitioning. Parts I, II and III constitute the main part of the thesis. The appendices are conference abstracts and papers based on Parts I and IV. / Paper III: reprinted with kind permission of the American Geophysical Union. An edited version of this paper was published by AGU. Copyright [2000] American Geophysical Union Mathematical statistics Stochastic modeling Fault size Fault population Fault pattern Bayes factor Sampling algorithm
