Global ETD Search

1	Bayesian Analysis of Transposon Mutagenesis Data DeJesus, Michael A. 2012 May 1900 Determining which genes are essential for the growth of a bacterial organism is an important question, because the answer is useful for discovering drugs that inhibit critical biological functions of a pathogen. To evaluate essentiality, biologists often use transposon mutagenesis to disrupt genomic regions within an organism, revealing which genes are able to withstand disruption and are therefore not required for growth. The development of next-generation sequencing technology augments transposon mutagenesis by providing high-resolution sequence data that identifies the exact location of transposon insertions in the genome. Although this high-resolution information has already been used to assess essentiality at a genome-wide scale, no formal statistical model capable of quantifying significance has been developed. This thesis presents a formal Bayesian framework for analyzing sequence information obtained from transposon mutagenesis experiments. Our method assesses the statistical significance of gaps in transposon coverage that are indicative of essential regions through a Gumbel distribution, and utilizes a Metropolis-Hastings sampling procedure to obtain posterior estimates of the probability of essentiality for each gene. We apply our method to libraries of M. tuberculosis transposon mutants to identify genes essential for growth in vitro, and show concordance with previous essentiality results based on hybridization. Furthermore, we show how our method is capable of identifying essential domains within genes, by detecting significant sub-regions of open reading frames unable to withstand disruption. We show that several genes involved in PG biosynthesis have essential domains. Bioinformatics Bayesian Analysis Gumbel Metropolis Hastings Sampling
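The abstract above scores gaps in transposon coverage with a Gumbel distribution. A minimal sketch of that idea follows, assuming each of `n` candidate sites independently carries an insertion with probability `p_insert`; under that assumption the longest run of sites without insertions follows, approximately, the standard extreme-value (Gumbel) law for longest runs in Bernoulli trials. The function name, parameters, and independence assumption are illustrative and are not taken from the thesis, whose exact parameterization may differ.

```python
import math

def gumbel_run_tail(r, n, p_insert):
    """Approximate P(longest run of non-insertions >= r) among n sites.

    Uses the classical longest-run approximation: the number of runs of
    length >= r is roughly Poisson with mean n * p_insert * q**r, where
    q = 1 - p_insert. Written in Gumbel form with location mu, scale beta.
    Hypothetical helper; requires 0 < p_insert < 1 and n >= 1.
    """
    q = 1.0 - p_insert                                  # P(site has no insertion)
    beta = 1.0 / math.log(1.0 / q)                      # Gumbel scale
    mu = math.log(n * p_insert) / math.log(1.0 / q)     # Gumbel location
    return 1.0 - math.exp(-math.exp(-(r - mu) / beta))

# Example: 1000 sites, half of them hit on average.
n_sites, p = 1000, 0.5
location = math.log(n_sites * p) / math.log(1.0 / (1.0 - p))
tail_at_location = gumbel_run_tail(location, n_sites, p)
tail_long_gap = gumbel_run_tail(30, n_sites, p)
```

A gap much longer than the location parameter receives a very small tail probability, which is the sense in which such a gap would be flagged as a candidate essential region.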
2	Fitting distances and dimension reduction methods with applications / Méthodes d’ajustement et de réduction de dimension avec applications Alawieh, Hiba 13 March 2017 In many studies the number of variables can be large, which makes their analysis and visualization difficult. Several statistical methods have been developed to reduce the complexity of such data and to allow a better understanding of the information they contain. In this thesis, our aim is to propose two new methods of multivariate data analysis: "Multidimensional Fitting" and "Projection under pairwise distance control". The first method is derived from multidimensional scaling (MDS). Its application requires two matrices describing the same population, a coordinate matrix and a distance matrix, and the objective is to modify the coordinate matrix so that the distances calculated on the modified matrix are as close as possible to the distances observed in the distance matrix. We developed two extensions of this method: the first penalizes the modification vectors of the coordinates, and the second takes into account the random effects that may occur during the modification. The second method is a new dimensionality reduction method based on a nonlinear projection of the points into a reduced space, which takes into account the projection quality of each individual point in the reduced space. The projection is carried out by introducing additional variables, called "radii", which indicate to what extent the projection of each point is accurate. Penalization (statistics) Metropolis-Hastings method 519.535
3	Modelos de sobrevivência com fração de cura e efeitos aleatórios / Cure rate models with random effects Lopes, Célia Mendes Carvalho 29 April 2008 This work presents two survival models with a cure fraction and random effects, one based on the Chen-Ibrahim-Sinha cure fraction model and the other on the mixture model. Classical and Bayesian approaches are studied. Classical inference uses REML estimators; Bayesian inference uses Metropolis-Hastings. Simulation studies are carried out to evaluate the accuracy of the parameter estimates and their standard deviations. The use of the models is illustrated with an analysis of oropharyngeal cancer data. cure fraction Metropolis-Hastings random effects REML surviving fraction
4	[en] BAYESIAN INFERENCE ON MULTIVARIATE ARCH MODELS / [es] MODELOS BAYESIANOS MCMC PARA UN PROCESO ARCH MULTIVARIADO / [pt] MODELAGEM BAYESIANA MCMC PARA UM PROCESSO ARCH MULTIVARIADO LUIS ALBERTO NAVARRO HUAMANI 20 August 2001 The objective of this work is to develop a Metropolis-Hastings strategy for Bayesian inference, based on a multivariate ARCH model with BEKK representation. In complex problems, such as the generalization of univariate ARCH/GARCH structures to multivariate ones, the inference process is complicated by the large number of parameters involved and by the restrictions they must satisfy. We propose a Metropolis-Hastings strategy to provide inference, in a Bayesian framework, for a multivariate ARCH model with BEKK representation. [en] METROPOLIS-HASTINGS [en] ARCH [en] BAYESIAN INFERENCE
5	Modelos de sobrevivência com fração de cura e efeitos aleatórios / Cure rate models with random effects Célia Mendes Carvalho Lopes 29 April 2008 This work presents two survival models with a cure fraction and random effects, one based on the Chen-Ibrahim-Sinha cure fraction model and the other on the mixture model. Classical and Bayesian approaches are studied. Classical inference uses REML estimators; Bayesian inference uses Metropolis-Hastings. Simulation studies are carried out to evaluate the accuracy of the parameter estimates and their standard deviations. The use of the models is illustrated with an analysis of oropharyngeal cancer data. cure fraction Metropolis-Hastings random effects REML surviving fraction
6	Importance Sampling of Rare Events in Chaotic Systems Leitão, Jorge C. 30 August 2016 Rare events play a crucial role in our society, and a great effort has been dedicated to studying them numerically in different contexts. This thesis proposes a numerical methodology based on the Monte Carlo Metropolis-Hastings algorithm to efficiently sample rare events in chaotic systems. It starts by reviewing the relevance of rare events in chaotic systems, focusing on two types of rare events: states in closed systems with rare chaoticities, characterised by a finite-time Lyapunov exponent on a tail of its distribution, and states in transiently chaotic systems, characterised by an escape time on the tail of its distribution. This thesis argues that these two problems can be interpreted as a traditional problem of statistical physics: sampling exponentially rare states in the phase-space - states in the tail of the density of states - with an increasing parameter - the system size. This is used as the starting point to review the Metropolis-Hastings algorithm, a traditional and flexible methodology of importance sampling in statistical physics. By an analytical argument, it is shown that the chaoticity of the system hinders direct application of Metropolis-Hastings techniques to efficiently sample these states because the acceptance rate is low. It is argued that a crucial step to overcome the low acceptance rate is to construct a proposal distribution that uses information about the system to bound the acceptance rate. Using generic properties of chaotic systems, such as exponential divergence of initial conditions and fractals embedded in their phase-spaces, a proposal distribution that guarantees a bounded acceptance rate is derived for each type of rare event. This proposal is numerically tested in simple chaotic systems, and the efficiency of the resulting algorithm is measured in numerous examples for both types of rare events. The results confirm the dramatic improvement of Monte Carlo importance sampling with the derived proposals over traditional methodologies: the number of samples required to sample an exponentially rare state increases polynomially, as opposed to the exponential increase observed in uniform sampling. This thesis then analyses the sub-optimal (polynomial) efficiency of this algorithm in a simple system and shows analytically how the correlations induced by the proposal distribution can be detrimental to the efficiency of the algorithm. This thesis also analyses the effect of high-dimensional chaos on the proposal distribution and concludes that an anisotropic proposal, which takes advantage of the different rates of expansion along the different unstable directions, is able to efficiently find rare states. The applicability of this methodology to sampling rare states in non-hyperbolic systems is also discussed, with focus on three systems: the logistic map, the Pomeau-Manneville map, and the standard map. Here, it is argued that the different origins of non-hyperbolicity require different proposal distributions. Overall, the results show that by incorporating specific information about the system in the proposal distribution of the Metropolis-Hastings algorithm, it is possible to efficiently find and sample rare events of chaotic systems. This improved methodology should be useful to a large class of problems where the numerical characterisation of rare events is important. Chaos Metropolis-Hastings rare-events ddc:530 rvk:SK 820
7	Capacity Proportional Unstructured Peer-to-Peer Networks Reddy, Chandan Rama 2009 August 1900 Existing methods to utilize capacity-heterogeneity in a P2P system either rely on constructing special overlays with capacity-proportional node degree or use topology adaptation to match a node's capacity with that of its neighbors. In existing P2P networks, which are often characterized by diverse node capacities and high churn, these methods may require large node degree or continuous topology adaptation, potentially making them infeasible due to their high overhead. In this thesis, we propose an unstructured P2P system that attempts to address these issues. We first prove that the overall throughput of search queries in a heterogeneous network is maximized if and only if traffic load through each node is proportional to its capacity. Our proposed system achieves this traffic distribution by biasing search walks using the Metropolis-Hastings algorithm, without requiring any special underlying topology. We then define two saturation metrics for measuring the performance of overlay networks: one for quantifying their ability to support random walks and the second for measuring their potential to handle the overhead caused by churn. Using simulations, we finally compare our proposed method with Gia, an existing system which uses topology adaptation, and find that the former performs better under all studied conditions, both saturation metrics, and such end-to-end parameters as query success rate, latency, and query-hits for various file replication schemes. P2P heterogeneous networks random walks Metropolis-Hastings algorithm
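The abstract above describes biasing random search walks with Metropolis-Hastings so that the load on each node is proportional to its capacity. The sketch below illustrates one way this can work, assuming the walk proposes a uniformly random neighbor and the target stationary distribution is proportional to capacity; the toy graph, the `capacity` values, and the function name are hypothetical and do not reproduce the thesis's protocol.

```python
def mh_transition_matrix(adj, capacity):
    """Transition matrix of a Metropolis-Hastings walk on a graph.

    Proposal: uniform over neighbors, q(i -> j) = 1 / deg(i).
    Target:   pi(i) proportional to capacity[i].
    Acceptance: min(1, (capacity[j] * deg(i)) / (capacity[i] * deg(j))).
    """
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        deg_i = len(adj[i])
        for j in adj[i]:
            deg_j = len(adj[j])
            accept = min(1.0, (capacity[j] * deg_i) / (capacity[i] * deg_j))
            P[i][j] = accept / deg_i
        # Rejected proposals keep the walk at node i.
        P[i][i] = 1.0 - sum(P[i])
    return P

# Toy undirected overlay with four nodes (illustrative values only).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
capacity = [1.0, 2.0, 4.0, 8.0]

P = mh_transition_matrix(adj, capacity)
total = sum(capacity)
pi = [c / total for c in capacity]
# One step of the chain applied to pi; stationarity means pi is unchanged.
pi_next = [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(pi))]
```

Each node needs only its own degree and capacity and those of its neighbors to compute the acceptance probability, which is why no special overlay topology is required in this kind of scheme.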
8	Μελετώντας τον αλγόριθμο Metropolis-Hastings / Studying the Metropolis-Hastings algorithm Γιαννόπουλος, Νικόλαος 27 March 2013 This thesis belongs to the research area of Computational Statistics: it studies methods for simulating from a distribution π (the target distribution) and for computing complicated integrals. In many real problems, where the form of π is particularly complex and/or the dimension of the state space is large, simulation from π cannot be done with simple techniques, and the integrals are very difficult, if not impossible, to compute analytically. We therefore resort to Monte Carlo (MC) and Markov Chain Monte Carlo (MCMC) techniques, which simulate values of random variables and estimate the integrals through suitable functions of the simulated values. MC techniques produce independent observations, either directly from the target distribution π or from a different proposal distribution g. MCMC techniques simulate Markov chains whose stationary distribution is π, so the observations are dependent. In this work we deal mainly with the Metropolis-Hastings algorithm, one of the most important, if not the most important, MCMC algorithms. More specifically, Chapter 2 briefly reviews well-known MC techniques, such as the Acceptance-Rejection method, the Inversion method and Importance Sampling, as well as MCMC techniques, such as the Metropolis-Hastings algorithm, the Gibbs sampler and the Metropolis-within-Gibbs method. Chapter 3 covers the Metropolis-Hastings algorithm in detail. We first give a brief historical overview and then a detailed description of the algorithm. We present some of its special forms as well as its basic properties. The chapter concludes with some applications to simulated and real data. Chapter 4 deals with methods for estimating the variance of the ergodic average obtained from MCMC techniques, with particular reference to the Batch Means and Spectral Variance estimators. Finally, Chapter 5 deals with finding a suitable proposal distribution for the Metropolis-Hastings algorithm. Although the Metropolis-Hastings algorithm can converge for any proposal distribution that satisfies some basic assumptions, it is known that an appropriate choice of proposal distribution improves the convergence of the algorithm. Determining the optimal proposal distribution for a given target distribution is a very important but equally difficult problem. It has been approached both with very simple (trial-and-error) techniques and with adaptive algorithms that find a "good" proposal distribution automatically. Algorithms Simulation methods 519.502 85 Metropolis-Hastings Monte Carlo
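The abstract above covers the Metropolis-Hastings algorithm, the role of the proposal distribution, and the Batch Means estimator for the variance of the ergodic average. A minimal sketch of both pieces follows, assuming a one-dimensional target and a Gaussian random-walk proposal (a symmetric special case, so the Hastings correction cancels). The function names, the standard normal target, and the step size are illustrative choices, not taken from the thesis.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)      # symmetric proposal
        logp_new = log_target(proposal)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
            accepted += 1
        samples.append(x)                        # a rejection repeats the old state
    return samples, accepted / n_samples

def batch_means_se(samples, n_batches=20):
    """Batch-means standard error of the ergodic average."""
    size = len(samples) // n_batches
    means = [sum(samples[b * size:(b + 1) * size]) / size for b in range(n_batches)]
    grand = sum(means) / n_batches
    var_of_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_means / n_batches)

# Target: standard normal, known only up to a constant.
samples, rate = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, step=2.4)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
se = batch_means_se(samples)
```

Changing `step` is the simplest trial-and-error form of proposal tuning: very small steps are almost always accepted but explore slowly, while very large steps are mostly rejected, and both inflate the batch-means standard error.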
9	Modeling the Performance of a Baseball Player's Offensive Production Smith, Michael Ross 09 March 2006 This project addresses the problem of comparing the offensive abilities of players from different eras in Major League Baseball (MLB). We will study players from the perspective of an overall offensive summary statistic that is highly linked with scoring runs, or the Berry Value. We will build an additive model to estimate the innate ability of the player, the effect of the relative level of competition of each season, and the effect of age on performance using piecewise age curves. Using Hierarchical Bayes methodology with Gibbs sampling, we model each of these effects for each individual. The results of the Hierarchical Bayes model permit us to link players from different eras and to rank the players across the modern era of baseball (1900-2004) on the basis of their innate overall offensive ability. The top of the rankings, of which the top three were Babe Ruth, Lou Gehrig, and Stan Musial, includes many Hall of Famers and some of the most productive offensive players in the history of the game. We also determine that trends in overall offensive ability in Major League Baseball exist based on different rule and cultural changes. Based on the model, MLB is currently at a high level of run production compared to the different levels of run production over the last century. baseball Hierarchical Bayes Gibbs sampler Metropolis-Hastings Statistics and Probability
10	Análise bayesiana do modelo fatorial dinâmico para um vetor de séries temporais usando distribuições elípticas. / Bayesian Analysis of the dynamic factorial models for a time series vector using elliptical distribuitions. Borges, Livia Costa 27 May 2008 Factor analysis is an important statistical tool with wide practical applications; it explains the correlation among a large number of observable variables in terms of a small number of unobservable variables, known as latent variables. This work proposes a Bayesian analysis, which incorporates into the analysis the knowledge available about the parameters before the data are collected, of the dynamic factor model in the class of multivariate elliptical models. It assumes that a vector of q time series can be fitted by a factor model with k < q factors plus white noise, and that the latent part follows a vector autoregressive model. The class of elliptical models is rich in symmetric distributions with heavier tails than the normal distribution, an important characteristic in the analysis of financial series. This class includes the Student's t, power exponential and contaminated normal distributions, among others. Inference on the parameters was carried out using Markov chain Monte Carlo methods, with the Metropolis-Hastings and Griddy-Gibbs algorithms, by obtaining the posterior distributions of the parameters and factors. Convergence was assessed with graphical techniques and with the methods of Geweke (1992), Heidelberger and Welch (1983) and Half-Width. The method was illustrated using real and simulated data. Bayesian analysis elliptical distributions factorial model Metropolis-Hastings time series
