41 |
Dark energy and the inhomogeneous universe / Bull, Philip J. January 2013
In this thesis, I study the relativistic effects of matter inhomogeneities on the accelerating expansion of the Universe. The acceleration is often taken to be caused by the presence of an exotic fluid called Dark Energy, or else a non-zero 'cosmological constant' term in the field equations of General Relativity. I consider whether this result could instead be an artefact caused by using an incorrect model to interpret observations. The standard 'concordance' cosmological model assumes the Cosmological Principle, which states that the matter distribution on large scales is homogeneous. One possibility is that correction terms appear in the field equations when small-scale inhomogeneities are smoothed over to produce this homogeneous model. These 'backreaction' effects could affect the dynamics of the spacetime, causing an apparent acceleration. I clarify the relationship between acceleration of the averaged spacetime and acceleration inferred from observable quantities, and show that they are closely related in statistically homogeneous spacetimes. Another possibility is that the Universe could be inhomogeneous on large scales. If there were a large 'void', with us at the centre, the lensing of light by the void could reproduce the observations that imply cosmic acceleration. I show that a popular class of void models, based on spherically symmetric Lemaitre-Tolman-Bondi spacetimes, is unable to simultaneously fit a selection of observational data, thus effectively ruling out this possibility. These data include the Kinematic Sunyaev-Zel'dovich (KSZ) effect, which is a distortion/shift of the Cosmic Microwave Background (CMB) frequency spectrum caused by the Compton scattering of photons by hot gas in galaxy clusters. This, and other distortions of the CMB frequency spectrum, are sensitive to the degree of anisotropy in the CMB about a scattering cluster. I suggest tests involving these observables that exploit the strong link between isotropy and homogeneity to (a) distinguish between different causes of a deviation from spatial flatness on the horizon scale, and (b) potentially confirm the Cosmological Principle using observations. Finally, I describe a novel Bayesian CMB component separation method for extracting the Sunyaev-Zel'dovich signal of clusters from CMB sky maps.
|
42 |
Analysis of Spatial Data / Zhang, Xiang. 01 January 2013
In many areas of the agricultural, biological, physical and social sciences, spatial lattice data are becoming increasingly common. In addition, a large amount of lattice data shows not only a visible spatial pattern but also a temporal pattern (see Zhu et al. 2005). An interesting problem is to develop a model that systematically describes the relationship between the response variable and possible explanatory variables while accounting for space and time effects simultaneously.
Spatial-temporal linear models and the corresponding likelihood-based statistical inference are important tools for the analysis of spatial-temporal lattice data. We propose a general asymptotic framework for spatial-temporal linear models and investigate the properties of maximum likelihood estimates under this framework. Mild regularity conditions are imposed on the spatial-temporal weight matrices in order to derive the asymptotic properties (consistency and asymptotic normality) of the maximum likelihood estimates. A simulation study is conducted to examine the finite-sample properties of the maximum likelihood estimates.
For spatial data, aside from traditional likelihood-based methods, a range of literature has discussed Bayesian approaches to estimating the correlation (auto-covariance function) among spatial data. In particular, Zheng et al. (2010) proposed a nonparametric Bayesian approach to estimating a spectral density. We also discuss a nonparametric Bayesian approach to analyzing spatial data. We propose a general procedure for constructing a multivariate Feller prior and establish its theoretical properties as a nonparametric prior. A blocked Gibbs sampling algorithm is also proposed for computation, since the posterior distribution is analytically manageable.
|
43 |
Multivariate Models and Algorithms for Systems Biology / Acharya, Lipi Rani. 17 December 2011
Rapid advances in high-throughput data acquisition technologies, such as microarrays and next-generation sequencing, have enabled scientists to interrogate the expression levels of tens of thousands of genes simultaneously. However, challenges remain in developing effective computational methods for analyzing data generated from such platforms. In this dissertation, we address some of these challenges. We divide our work into two parts. In the first part, we present a suite of multivariate approaches for the reliable discovery of gene clusters, often interpreted as pathway components, from molecular profiling data with replicated measurements. We translate our goal into learning an optimal correlation structure from replicated complete and incomplete measurements. In the second part, we focus on the reconstruction of signal transduction mechanisms in the signaling pathway components. We propose gene set based approaches for inferring the structure of a signaling pathway.
First, we present a constrained multivariate Gaussian model, referred to as the informed-case model, for estimating the correlation structure from replicated and complete molecular profiling data. The informed-case model generalizes the previously known blind-case model by accommodating prior knowledge of replication mechanisms. Second, we generalize the blind-case model by designing a two-component mixture model. Our idea is to strike an optimal balance between a fully constrained correlation structure and an unconstrained one. Third, we develop an Expectation-Maximization algorithm to infer the underlying correlation structure from replicated molecular profiling data with missing (incomplete) measurements. We utilize our correlation estimators for clustering real-world replicated complete and incomplete molecular profiling data sets. The above three components constitute the first part of the dissertation.
For the structural inference of signaling pathways, we hypothesize a directed signal pathway structure as an ensemble of overlapping and linear signal transduction events. We then propose two algorithms to reverse engineer the underlying signaling pathway structure using unordered gene sets corresponding to signal transduction events. Throughout, we treat gene sets as variables and the associated gene orderings as random. The first algorithm has been developed under the Gibbs sampling framework and the second algorithm utilizes the framework of simulated annealing. Finally, we summarize our findings and discuss possible future directions.
|
44 |
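Análise Bayesiana da área de olho do lombo e da espessura de gordura obtidas por ultrassom e suas associações com outras características de importância econômica na raça Nelore / Bayesian analysis of ribeye area and fat thickness measured by ultrasound and their associations with other economically important traits in the Nelore breed / Yokoo, Marcos Jun Iti. January 2009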
Abstract: The objective of this work was to estimate genetic parameters for the traits longissimus muscle area (LMA), backfat thickness (BF) and rump fat thickness (RF) measured by real-time ultrasound at 12 (Y) and 18 (S) months of age. In addition, this study aimed to estimate the genetic correlations among these carcass traits measured by real-time ultrasound (CTUS), and between them and other economically important traits in beef cattle, i.e., weight (W), hip height (HH) and scrotal circumference (SC450) at 18 months of age, age at first calving (AFC) and first calving interval (FCI). The genetic parameters were estimated in multi-trait analyses, with animal models, by Bayesian inference using the Gibbs sampling algorithm. The posterior heritability estimates for LMA (Y and S), BF (Y and S) and RF (Y and S) were 0.46 and 0.33, 0.42 and 0.59, and 0.60 and 0.55, respectively, showing that if these traits are used as selection criteria, they should respond quickly to individual selection, without causing antagonism in the selection for SC450, W (Y and S) and AFC. The posterior heritability estimates for AFC and FCI were moderate to low, 0.26 and 0.11, respectively. HH showed negative genetic correlations (rg) with BF_S (-0.38) and RF_S (-0.32), suggesting that long-term selection for taller animals would tend to produce animals with less subcutaneous fat, i.e., later-maturing in terms of carcass finishing. Selection to improve CTUS, FCI and SC450 will not affect AFC; however, heavier and taller animals tend to be more sexually precocious (rg ranged between -0.22 and -0.44). Except for BF_S (rg = 0.40), selection for CTUS and growth traits will not affect FCI through correlated response.
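For readers less familiar with the two quantities reported in this abstract, the usual textbook definitions are sketched below. The symbols are assumptions introduced here, not notation from the thesis: \sigma_a^2 is the additive genetic variance of a trait, \sigma_e^2 its residual variance, and \sigma_{a,xy} the additive genetic covariance between traits x and y. The two-component variance decomposition is a simplification and may not match every effect included in the multi-trait animal model that was actually fitted.

```latex
% Heritability of one trait and genetic correlation between traits x and y
% (simplified decomposition: phenotypic variance = additive + residual)
h^2 = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2},
\qquad
r_g = \frac{\sigma_{a,xy}}{\sqrt{\sigma_{a,x}^2 \, \sigma_{a,y}^2}}
```

In a Gibbs sampling analysis, these ratios are typically evaluated at every retained draw of the variance components, which yields posterior samples of h^2 and r_g whose means can be reported as point estimates.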
Advisor: Lucia Galvão de Albuquerque / Co-advisor: Guilherme Jordão de Magalhães / Co-advisor: Cláudio Ulhôa Magnabosco / Committee: Maria Eugênia Zerlotti Mercadante / Committee: Raysildo Barbosa Lôbo / Committee: Humberto Tonhati / Committee: Henrique Nunes de Oliveira / Doctorate
|
45 |
Estimation et sélection de modèle pour le modèle des blocs latents / Estimation and model selection for the latent block model / Brault, Vincent. 30 September 2014
The goal of classification is to partition data sets into subsets that are as homogeneous as possible, meaning that the members of a class should resemble one another more than they resemble members of other classes. The problem becomes harder when the statistician wants to define groups on both the individuals and the variables. The latent block model defines a distribution for each crossing of an object class and a variable class, and the observations are assumed to be independent conditionally on the choice of these classes. However, the joint distribution of the labels cannot be factorized, which prevents computation of the log-likelihood and use of the EM algorithm. Several methods and criteria exist for recovering these partitions: some frequentist, others Bayesian; some stochastic, others not. In this thesis, we first propose sufficient conditions for the identifiability of the model. Second, we study two algorithms proposed to work around the problem with the EM algorithm: the VEM algorithm of Govaert and Nadif (2008) and the SEM-Gibbs algorithm of Keribin, Celeux and Govaert (2010). In particular, we analyze the combination of the two and identify reasons why the algorithms degenerate (that is, return empty classes). By choosing suitable priors, we then propose a Bayesian adaptation that limits this phenomenon. In particular, we use a Gibbs sampler, for which we propose a stopping criterion based on the Brooks-Gelman statistic (1998). We also propose an adaptation of the Largest Gaps algorithm (Channarond et al. (2012)). Building on their proofs, we show that the resulting estimators of the labels and parameters are consistent as the numbers of rows and columns tend to infinity.
Furthermore, we propose a method for selecting the numbers of row and column classes, whose estimate is also consistent provided that the numbers of rows and columns are very large. To estimate the number of classes, we study the ICL (Integrated Completed Likelihood) criterion, for which we give an exact expression. After studying its asymptotic approximation, we propose a BIC (Bayesian Information Criterion) criterion, and we conjecture that the two criteria select the same results and that these estimates are consistent; this conjecture is supported by theoretical and empirical results. Finally, we compare the different combinations and propose a methodology for co-clustering.
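A stopping criterion of the kind mentioned above can be illustrated with the basic potential scale reduction factor that Brooks and Gelman (1998) build on. The following is a minimal sketch, not the criterion implemented in the thesis: the function names `psrf` and `should_stop` and the threshold of 1.1 are assumptions made here, and the statistic is computed for a single scalar quantity monitored across several parallel Gibbs chains.

```python
import random
import statistics


def psrf(chains):
    """Potential scale reduction factor for equal-length scalar chains.

    `chains` is a list of lists, one list of draws per parallel chain.
    Values close to 1 suggest the chains are sampling the same distribution.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


def should_stop(chains, threshold=1.1):
    """Hypothetical stopping rule: stop once the factor drops below threshold."""
    return psrf(chains) < threshold


if __name__ == "__main__":
    rng = random.Random(0)
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
    stuck = [[rng.gauss(float(j), 1.0) for _ in range(1000)] for j in range(4)]
    print("well-mixed chains:", round(psrf(mixed), 3), should_stop(mixed))
    print("separated chains: ", round(psrf(stuck), 3), should_stop(stuck))
```

In practice the factor would be recomputed periodically on the retained draws of each monitored parameter, and sampling would continue until every monitored quantity falls below the chosen threshold.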
|
46 |
Modelo bayesiano para dados de sobrevivência com riscos semicompetitivos baseado em cópulas / Bayesian model for survival data with semicompeting risks based on copulas / Patiño, Elizabeth González. 23 March 2018
Motivated by a dataset of patients with chronic kidney disease (CKD), we propose a new Bayesian model that combines copulas from the Archimedean family with a mixed model for survival data with semicompeting risks. The semicompeting risks structure is common in clinical studies in which two events are of interest, one intermediate (nonterminal) and one terminal, such that the occurrence of the terminal event precludes the occurrence of the intermediate event but not vice versa. We prove that the posterior distribution is proper under the Clayton copula. We implement the data augmentation algorithm and the Gibbs sampler for Bayesian inference, as well as the model comparison criteria LPML, DIC and BIC. We carry out a simulation study to assess the performance of the model and, finally, apply the proposed methodology to the CKD patient data, as well as to data from patients who received a bone marrow transplant.
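To make the semicompeting risks structure concrete, the sketch below draws dependent event times from a Clayton copula, C(u, v) = (u^(-theta) + v^(-theta) - 1)^(-1/theta) with theta > 0, using conditional inversion. It is an illustration under stated assumptions rather than the model of the thesis: the exponential margins, the rates, the function names, and the choice to apply the copula to the pair of survival probabilities are all hypothetical, and no covariates, random effects, or independent censoring are included.

```python
import math
import random


def sample_clayton(theta, rng):
    """One draw (u, v) from a Clayton copula with theta > 0."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
    w = 1.0 - rng.random()
    # Invert the conditional distribution of v given u at level w.
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v


def semicompeting_pair(theta, rate1, rate2, rng):
    """Draw (intermediate time or None, terminal time) with exponential margins."""
    u, v = sample_clayton(theta, rng)
    t1 = -math.log(u) / rate1  # latent time to the intermediate event
    t2 = -math.log(v) / rate2  # time to the terminal event
    if t1 < t2:
        return t1, t2          # both events are observed
    return None, t2            # terminal event came first and blocks the other


def kendall_tau(pairs):
    """Sample Kendall's tau by direct pair counting (quadratic time)."""
    concordant = discordant = 0
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_clayton(2.0, rng) for _ in range(600)]
    print("sample tau:", kendall_tau(draws), "theory:", 2.0 / (2.0 + 2.0))
    print(semicompeting_pair(2.0, 1.0, 0.5, rng))
```

For the Clayton family, Kendall's tau equals theta / (theta + 2), so the sample value gives a quick check that the sampler produces the intended strength of dependence.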
|
48 |
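兩個二段式指數分配比較之研究 / Comparison of two exponential distributions with a change point / Lai, Wu Chih (賴武志). Unknown date
In survival analysis, the exponential model with a change point (also called the two-stage exponential distribution) is often used to study the recurrence rate of certain diseases in order to decide whether a treatment is effective. In the literature, however, discussion of this model is mostly confined to a single population and addresses essentially two questions: testing whether the change point exists, and estimating the change point.
This thesis extends the problem from one population to two, comparing whether the two populations share the same change point, initial hazard rate, or transition rate. We use both Bayesian and classical methods for the analysis.
Using the Bayesian approach, we derive the posterior distributions of the ratios or differences of the parameters of the two populations under various known conditions. Their forms are almost all very complicated, however, which makes further analysis difficult. We therefore introduce Gibbs sampling, using the full conditional posterior distributions to extract the marginal posterior distributions for inference.
In the classical analysis, we use the likelihood ratio test. The main difficulty is that, when the change point is unknown, the distribution of the log-likelihood ratio is unknown. In addition to introducing two methods from the literature for estimating the change point, we use the bootstrap to estimate the distribution of the log-likelihood ratio for use in testing.
Comparisons of two populations of this kind are meaningful in medicine and in industry. This thesis not only derives the statistical framework for the comparison but also provides concrete and practical sampling methods, and thus contributes to the analysis of this problem.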
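The Gibbs step described in this abstract, cycling through full conditional posteriors to obtain marginal posteriors, can be sketched for one simple version of the change-point model. Everything specific below is an assumption made for illustration and is not taken from the thesis: the hazard is lam1 up to the change point tau and lam2 afterwards, there is no censoring, the hazards have Gamma(a, b) priors, tau has a uniform prior on a fixed grid, and the function names are hypothetical. Running the sampler separately on two populations gives posterior draws of a ratio of parameters.

```python
import math
import random


def simulate_times(n, lam1, lam2, tau, rng):
    """Draw n survival times with hazard lam1 up to tau and lam2 afterwards."""
    times = []
    for _ in range(n):
        e = rng.expovariate(1.0)  # cumulative hazard reached at the event
        if e <= lam1 * tau:
            times.append(e / lam1)
        else:
            times.append(tau + (e - lam1 * tau) / lam2)
    return times


def gibbs_change_point(times, grid, n_iter, burn_in, rng, a=1.0, b=1.0):
    """Return retained draws of (lam1, lam2, tau) from a Gibbs sampler."""
    # Sufficient statistics for every candidate change point on the grid.
    stats = []
    for tau in grid:
        d1 = sum(1 for t in times if t <= tau)      # events before tau
        e1 = sum(min(t, tau) for t in times)        # exposure before tau
        e2 = sum(max(t - tau, 0.0) for t in times)  # exposure after tau
        stats.append((d1, len(times) - d1, e1, e2))

    k = len(grid) // 2  # index of the initial change point
    draws = []
    for it in range(n_iter):
        d1, d2, e1, e2 = stats[k]
        # Full conditionals of the hazards are Gamma (second argument is scale).
        lam1 = rng.gammavariate(a + d1, 1.0 / (b + e1))
        lam2 = rng.gammavariate(a + d2, 1.0 / (b + e2))
        # Full conditional of tau is a discrete distribution on the grid.
        logp = [s[0] * math.log(lam1) + s[1] * math.log(lam2)
                - lam1 * s[2] - lam2 * s[3] for s in stats]
        top = max(logp)
        weights = [math.exp(v - top) for v in logp]
        k = rng.choices(range(len(grid)), weights=weights)[0]
        if it >= burn_in:
            draws.append((lam1, lam2, grid[k]))
    return draws


if __name__ == "__main__":
    rng = random.Random(1)
    grid = [0.1 + 0.05 * i for i in range(59)]
    pop_a = simulate_times(1500, 1.0, 0.3, 1.0, rng)
    pop_b = simulate_times(1500, 0.5, 0.3, 1.0, rng)
    draws_a = gibbs_change_point(pop_a, grid, 600, 100, rng)
    draws_b = gibbs_change_point(pop_b, grid, 600, 100, rng)
    ratios = [da[0] / db[0] for da, db in zip(draws_a, draws_b)]
    print("posterior mean of initial hazard ratio:", sum(ratios) / len(ratios))
```

Because the two populations are sampled independently here, pairing their draws gives samples from the posterior of the ratio of initial hazards; a posterior interval for that ratio that excludes 1 would suggest the initial hazards differ.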
|
49 |
Protein-DNA Binding: Discovering Motifs and Distinguishing Direct from Indirect Interactions / Gordan, Raluca Mihaela. January 2009
The initiation of two major processes in the eukaryotic cell, gene transcription and DNA replication, is regulated largely through interactions between proteins or protein complexes and DNA. Although a lot is known about the interacting proteins and their role in regulating transcription and replication, the specific DNA binding motifs of many regulatory proteins and complexes are still to be determined. For this purpose, many computational tools for DNA motif discovery have been developed in the last two decades. These tools employ a variety of strategies, from exhaustive search to sampling techniques, with the hope of finding over-represented motifs in sets of co-regulated or co-bound sequences. Despite the variety of computational tools aimed at solving the problem of motif discovery, their ability to correctly detect known DNA motifs is still limited. The motifs are usually short and often degenerate, which makes them difficult to distinguish from genomic background. We believe the most efficient strategy for improving the performance of motif discovery is not to use increasingly complex computational and statistical methods and models, but to incorporate more of the biology into the computational techniques, in a principled manner. To this end, we propose a novel motif discovery algorithm: PRIORITY. Based on a general Gibbs sampling framework, PRIORITY has a major advantage over other motif discovery tools: it can incorporate different types of biological information (e.g., nucleosome positioning information) to guide the search for DNA binding sites toward regions where these sites are more likely to occur (e.g., nucleosome-free regions).
We use transcription factor (TF) binding data from yeast chromatin immunoprecipitation (ChIP-chip) experiments to test the performance of our motif discovery algorithm when incorporating three types of biological information: information about nucleosome positioning, information about DNA double-helical stability, and evolutionary conservation information. In each case, incorporating additional biological information has proven very useful in increasing the accuracy of motif finding, with the number of correctly identified motifs increasing by up to 52%. PRIORITY is not restricted to TF binding data. In this work, we also analyze origin recognition complex (ORC) binding data and show that PRIORITY can utilize DNA structural information to predict the binding specificity of the yeast ORC.
Despite the improvement obtained using additional biological information, the success of motif discovery algorithms in identifying known motifs is still limited, especially when applied to sequences bound in vivo (such as those of ChIP-chip), because the observed protein-DNA interactions are not necessarily direct. Some TFs associate with DNA only indirectly via protein partners, while others exhibit both direct and indirect binding. We propose a novel method to distinguish between direct and indirect TF-DNA interactions, integrating in vivo TF binding data, in vivo nucleosome occupancy data, and in vitro motifs from protein binding microarrays. When applied to yeast ChIP-chip data, our method reveals that only 48% of the ChIP-chip data sets can be readily explained by direct binding of the profiled TF, while 16% can be explained by indirect DNA binding. In the remaining 36%, we found that none of the motifs used in our analysis was able to explain the ChIP-chip data, either because the data were too noisy or because the set of motifs was incomplete. As more in vitro motifs become available, our method can be used to build a complete catalog of direct and indirect TF-DNA interactions. / Dissertation
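The idea of letting biological information guide a Gibbs motif sampler can be illustrated with the single step in which a site position is resampled in one sequence. The sketch below is not the PRIORITY implementation: the function name `site_posterior`, the toy sequence, the motif matrix and the prior values are all hypothetical. It shows only how a positional prior (for example, lower weight where a nucleosome is assumed to sit) reweights the likelihood ratio of motif versus background at each candidate start.

```python
import random


def site_posterior(seq, pwm, background, prior):
    """Posterior probability that the motif starts at each position of seq.

    pwm[j][base] is the motif probability of `base` in column j,
    background[base] is the background probability of `base`, and
    prior[i] is the prior weight of a site starting at position i.
    """
    width = len(pwm)
    weights = []
    for i in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            base = seq[i + j]
            ratio *= pwm[j][base] / background[base]
        weights.append(prior[i] * ratio)
    total = sum(weights)
    return [w / total for w in weights]


def column(consensus_base):
    """Toy motif column: 0.85 on the consensus base, 0.05 on each other base."""
    return {b: (0.85 if b == consensus_base else 0.05) for b in "ACGT"}


if __name__ == "__main__":
    seq = "AAAATGACAAAA"                       # toy sequence, site planted at 4
    pwm = [column(b) for b in "TGAC"]
    background = {b: 0.25 for b in "ACGT"}
    n_starts = len(seq) - len(pwm) + 1
    flat_prior = [1.0 / n_starts] * n_starts
    posterior = site_posterior(seq, pwm, background, flat_prior)
    # The Gibbs step draws the new site position from this distribution.
    start = random.Random(0).choices(range(n_starts), weights=posterior)[0]
    print("sampled start:", start)
```

Replacing the flat prior with position-specific weights changes which sites the sampler visits without altering the motif model itself, which is the sense in which the prior steers the search toward more plausible regions.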
|
50 |
A Simulation Study On Marginalized Transition Random Effects Models For Multivariate Longitudinal Binary Data / Yalcinoz, Zerrin. 01 May 2008
In this thesis, a simulation study is conducted and a statistical model is fitted to the simulated data. The data are assumed to represent the satisfaction of customers who withdraw their salary from a particular bank. They are longitudinal data with a bivariate binary response, assumed to be collected from 200 individuals at four time points. In such data sets, two types of dependence are important, the dependence among measurements within a subject and the dependence between responses, and both are accounted for in the model. The model is a Marginalized Transition Random Effects Model, which has three levels. The first level measures the effect of covariates on the responses, the second level accounts for temporal changes, and the third level measures the differences between individuals. Markov chain Monte Carlo methods are used to fit the model. In the simulation study, the differences between the estimated values and the true parameters are examined under two conditions: when the model is correctly specified and when it is not. The results suggest that better convergence is obtained with the full model. The third level, which captures individual differences, is more sensitive to model misspecification than the other levels of the model.
|