11 |
Fitting distances and dimension reduction methods with applications / Méthodes d’ajustement et de réduction de dimension avec applications
Alawieh, Hiba, 13 March 2017 (has links)
Dans la plupart des études, le nombre de variables peut prendre des valeurs élevées, ce qui rend leur analyse et leur visualisation assez difficiles. Cependant, plusieurs méthodes statistiques ont été conçues pour réduire la complexité de ces données, permettant ainsi une meilleure compréhension des connaissances qu’elles contiennent. Dans cette thèse, notre objectif est de proposer deux nouvelles méthodes d’analyse des données multivariées, intitulées en anglais "Multidimensional Fitting" et "Projection under pairwise distance control". La première méthode est une dérivée de la méthode de positionnement multidimensionnel, dont l’application nécessite la disponibilité de deux matrices décrivant la même population : une matrice de coordonnées et une matrice de distances. L’objectif est de modifier la matrice des coordonnées de telle sorte que les distances calculées sur la matrice modifiée soient les plus proches possible des distances observées dans la matrice de distances. Nous avons développé deux extensions de cette méthode : la première en pénalisant les vecteurs de modification des coordonnées et la deuxième en prenant en compte les effets aléatoires qui peuvent intervenir lors de la modification. La deuxième méthode est une nouvelle méthode de réduction de dimension basée sur la projection non linéaire des données dans un espace de dimension réduite, qui tient compte de la qualité de chaque point projeté pris individuellement dans l’espace réduit. La projection des points s’effectue en introduisant des variables supplémentaires, appelées "rayons", qui indiquent dans quelle mesure la projection d’un point donné est précise. / In various studies the number of variables can be large, which makes their analysis and visualization quite difficult. However, several statistical methods have been developed to reduce the complexity of these data, allowing a better comprehension of the knowledge they contain. In this thesis, our aim is to propose two new methods of multivariate data analysis called "Multidimensional Fitting" and "Projection under pairwise distance control". The first method is a derivative of the multidimensional scaling method (MDS), whose application requires the availability of two matrices describing the same population: a coordinate matrix and a distance matrix. The objective is to modify the coordinate matrix such that the distances calculated on the modified matrix are as close as possible to the distances observed in the distance matrix. We developed two extensions of this method: the first by penalizing the modification vectors of the coordinates and the second by taking into account the random effects that may occur during the modification. The second method is a new dimensionality reduction technique based on the non-linear projection of the points into a reduced space, taking into account the projection quality of each projected point considered individually in the reduced space. The projection of the points is done by introducing additional variables, called "radii", which indicate to what extent the projection of each point is accurate.
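The fitting objective just described (modify the coordinates so that recomputed pairwise distances approach a target distance matrix, with a penalty on the modification vectors) can be sketched as follows. This is a hedged illustration: the function name `fit_coordinates`, the least-squares stress and the L2 penalty weight `lam` are our own simplifications, not the thesis's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist, squareform

def fit_coordinates(X, D, lam=0.1):
    """Modify coordinates X so that their pairwise distances approach the
    target distance matrix D, penalizing large modification vectors V."""
    n, p = X.shape

    def objective(v):
        V = v.reshape(n, p)
        d = squareform(pdist(X + V))        # distances of the modified matrix
        return np.sum((d - D) ** 2) + lam * np.sum(V ** 2)

    res = minimize(objective, np.zeros(n * p), method="L-BFGS-B")
    return X + res.x.reshape(n, p)
```

Setting `lam=0` recovers an unpenalized fit; larger `lam` keeps the modified coordinates closer to the originals.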
|
12 |
Modelos de sobrevivência com fração de cura e efeitos aleatórios / Cure rate models with random effects
Lopes, Célia Mendes Carvalho, 29 April 2008 (has links)
Neste trabalho são apresentados dois modelos de sobrevivência com fração de cura e efeitos aleatórios, um baseado no modelo de Chen-Ibrahim-Sinha para fração de cura e o outro, no modelo de mistura. São estudadas abordagens clássica e bayesiana. Na inferência clássica são utilizados estimadores REML. Para a bayesiana foi utilizado Metropolis-Hastings. Estudos de simulação são feitos para avaliar a acurácia das estimativas dos parâmetros e seus respectivos desvios-padrão. O uso dos modelos é ilustrado com uma análise de dados de câncer na orofaringe. / In this work we present two survival models with a cure fraction and random effects, one based on the Chen-Ibrahim-Sinha model for the surviving fraction and the other on the mixture model. We study both classical and Bayesian approaches. For the Bayesian one, we use Metropolis-Hastings; for the classical one, we use REML estimators. A simulation study is carried out to evaluate the accuracy of the parameter estimates and their standard deviations. The use of the models is illustrated with an analysis of oropharyngeal cancer data.
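As a brief, hedged illustration of the mixture cure model mentioned above (the exponential baseline survival and the parameter names are placeholders, not the fitted model from the thesis): a cured fraction `pi` never experiences the event, so the population survival function plateaus at `pi` instead of decaying to zero.

```python
import numpy as np

def mixture_cure_survival(t, pi, rate):
    """Population survival under the standard mixture cure model:
    S(t) = pi + (1 - pi) * S0(t), here with an exponential baseline S0."""
    return pi + (1.0 - pi) * np.exp(-rate * np.asarray(t, dtype=float))
```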
|
13 |
[en] BAYESIAN INFERENCE ON MULTIVARIATE ARCH MODELS / [es] MODELOS BAYESIANOS MCMC PARA UN PROCESO ARCH MULTIVARIADO / [pt] MODELAGEM BAYESIANA MCMC PARA UM PROCESSO ARCH MULTIVARIADO
LUIS ALBERTO NAVARRO HUAMANI, 20 August 2001 (has links)
[pt] O objetivo deste trabalho é desenvolver uma estratégia Metropolis-Hastings para inferência Bayesiana, usando a estrutura ARCH multivariada com representação BEKK. Em problemas complexos, como a generalização ARCH/GARCH univariada para estruturas multivariadas, o processo de inferência é dificultado por causa do número de parâmetros envolvidos e das restrições a que eles estão sujeitos. Neste trabalho desenvolvemos uma estratégia Metropolis-Hastings para inferência Bayesiana, usando uma estrutura ARCH multivariada com representação BEKK. / [en] The objective of this work is to develop a Metropolis-Hastings strategy for Bayesian inference, based on a multivariate ARCH model with BEKK representation. In complex problems, such as the multivariate generalization of ARCH/GARCH structures, the inference process is complicated due to the large number of parameters involved and the restrictions they must satisfy. We propose a Metropolis-Hastings structure to provide inference, in a Bayesian framework, for a multivariate ARCH model with BEKK representation. / [es] El objetivo de este trabajo es desarrollar una estrategia Metropolis-Hastings para inferencia Bayesiana, usando la estructura ARCH multivariada con representación BEKK. En problemas complejos, como la generalización ARCH/GARCH univariada para estructuras multivariadas, el proceso de inferencia se hace difícil por causa del número de parámetros involucrados y de las restricciones a que ellos están sujetos. En este trabajo desarrollamos una estrategia Metropolis-Hastings para inferencia Bayesiana, usando una estructura ARCH multivariada con representación BEKK.
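A hedged sketch of the BEKK idea for a bivariate ARCH(1) process (the matrices `C` and `A` below are arbitrary illustrative values; the thesis's Bayesian estimation of these parameters is not reproduced): writing the conditional covariance as H_t = C C' + A' e_{t-1} e_{t-1}' A keeps H_t positive semidefinite by construction, which is the main appeal of the BEKK representation.

```python
import numpy as np

def simulate_bekk_arch1(C, A, steps, seed=0):
    """Simulate e_t ~ N(0, H_t) with H_t = C C' + A' (e_{t-1} e_{t-1}') A."""
    rng = np.random.default_rng(seed)
    d = C.shape[0]
    e = np.zeros(d)
    out = np.empty((steps, d))
    for t in range(steps):
        H = C @ C.T + A.T @ np.outer(e, e) @ A   # conditional covariance
        e = np.linalg.cholesky(H) @ rng.normal(size=d)
        out[t] = e
    return out
```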
|
15 |
Importance Sampling of Rare Events in Chaotic Systems
Leitão, Jorge C., 30 August 2016 (has links) (PDF)
Rare events play a crucial role in our society, and a great effort has been dedicated to studying them numerically in different contexts. This thesis proposes a numerical methodology based on the Metropolis-Hastings Monte Carlo algorithm to efficiently sample rare events in chaotic systems. It starts by reviewing the relevance of rare events in chaotic systems, focusing on two types of rare events: states in closed systems with rare chaoticities, characterised by a finite-time Lyapunov exponent in the tail of its distribution, and states in transiently chaotic systems, characterised by an escape time in the tail of its distribution.
This thesis argues that these two problems can be interpreted as a traditional problem of statistical physics: sampling exponentially rare states in the phase space - states in the tail of the density of states - with an increasing parameter, the system size. This is used as the starting point to review the Metropolis-Hastings algorithm, a traditional and flexible methodology of importance sampling in statistical physics. By an analytical argument, it is shown that the chaoticity of the system hinders direct application of Metropolis-Hastings techniques to efficiently sample these states because the acceptance rate is low. It is argued that a crucial step to overcome the low acceptance rate is to construct a proposal distribution that uses information about the system to bound the acceptance rate. Using generic properties of chaotic systems, such as the exponential divergence of initial conditions and the fractals embedded in their phase spaces, a proposal distribution that guarantees a bounded acceptance rate is derived for each type of rare event. This proposal is numerically tested in simple chaotic systems, and the efficiency of the resulting algorithm is measured in numerous examples of both types of rare events.
The results confirm the dramatic improvement of Monte Carlo importance sampling with the derived proposals over traditional methodologies: the number of samples required to sample an exponentially rare state increases polynomially, as opposed to the exponential increase observed in uniform sampling. This thesis then analyses the sub-optimal (polynomial) efficiency of this algorithm in a simple system and shows analytically how the correlations induced by the proposal distribution can be detrimental to the efficiency of the algorithm. This thesis also analyses the effect of high-dimensional chaos on the proposal distribution and concludes that an anisotropic proposal, which takes advantage of the different rates of expansion along the different unstable directions, is able to efficiently find rare states.
The applicability of this methodology to sampling rare states in non-hyperbolic systems is also discussed, with focus on three systems: the logistic map, the Pomeau-Manneville map, and the standard map. Here, it is argued that the different origins of non-hyperbolicity require different proposal distributions. Overall, the results show that by incorporating specific information about the system into the proposal distribution of the Metropolis-Hastings algorithm, it is possible to efficiently find and sample rare events in chaotic systems. This improved methodology should be useful for a large class of problems where the numerical characterisation of rare events is important.
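The general idea of importance-sampling rare chaoticities can be sketched with a plain random-walk Metropolis-Hastings chain over initial conditions of the logistic map, with target weight exp(-beta * FTLE). This is a hedged toy illustration only: the map, the horizon `n`, the step size `sigma` and the inverse temperature `beta` are our own choices, and the thesis's bounded-acceptance proposal distributions are not reproduced here.

```python
import numpy as np

def ftle(x0, n=20, r=4.0):
    """Finite-time Lyapunov exponent of the logistic map x -> r x (1 - x)."""
    x, s = x0, 0.0
    for _ in range(n):
        s += np.log(abs(r * (1.0 - 2.0 * x)) + 1e-300)
        x = r * x * (1.0 - x)
    return s / n

def sample_low_ftle(beta, steps=4000, sigma=0.01, seed=0):
    """Random-walk Metropolis-Hastings over initial conditions with
    target weight exp(-beta * FTLE): beta > 0 favours rarely chaotic states."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0)
    e = ftle(x)
    samples = np.empty(steps)
    for t in range(steps):
        xp = (x + sigma * rng.normal()) % 1.0    # proposal stays in [0, 1)
        ep = ftle(xp)
        if ep < e or rng.random() < np.exp(-beta * (ep - e)):
            x, e = xp, ep
        samples[t] = e
    return samples
```

With `beta = 0` every proposal is accepted and the chain samples typical states; increasing `beta` biases it toward the low-FTLE tail.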
|
16 |
Simulations and applications of large-scale k-determinantal point processes / Simulations et applications des k-processus ponctuels déterminantaux
Wehbe, Diala, 3 April 2019 (links)
Avec la croissance exponentielle de la quantité de données, l’échantillonnage est une méthode pertinente pour étudier les populations. Parfois, nous avons besoin d’échantillonner un grand nombre d’objets, d’une part pour exclure la possibilité d’un manque d’informations clés et d’autre part pour générer des résultats plus précis. Le problème réside dans le fait que l’échantillonnage d’un trop grand nombre d’individus peut constituer une perte de temps. Dans cette thèse, notre objectif est d’établir des ponts entre la statistique et le k-processus ponctuel déterminantal (k-DPP), qui est défini via un noyau. Nous proposons trois projets complémentaires pour l’échantillonnage de grands ensembles de données en nous basant sur les k-DPP. Le but est de sélectionner des ensembles variés qui couvrent un ensemble d’objets beaucoup plus grand, en temps polynomial. Cela peut être réalisé en construisant différentes chaînes de Markov dont les k-DPP sont les lois stationnaires. Le premier projet consiste à appliquer les processus déterminantaux à la sélection d’espèces diverses dans un ensemble d’espèces décrites par un arbre phylogénétique. En définissant le noyau du k-DPP comme un noyau d’intersection, les résultats fournissent une borne polynomiale sur le temps de mélange, qui dépend de la hauteur de l’arbre phylogénétique. Le second projet vise à utiliser le k-DPP dans un problème d’échantillonnage de sommets sur un graphe connecté de grande taille. La pseudo-inverse de la matrice laplacienne normalisée est choisie pour étudier la vitesse de convergence de la chaîne de Markov créée pour l’échantillonnage de la loi stationnaire k-DPP. Le temps de mélange résultant est borné sous certaines conditions sur les valeurs propres de la matrice laplacienne. Le troisième sujet porte sur l’utilisation des k-DPP dans la planification d’expériences, avec comme objets d’étude plus spécifiques les hypercubes latins d’ordre n et de dimension d. La clé est de trouver un noyau positif qui préserve la contrainte de ce plan, c’est-à-dire le fait que chaque point se trouve exactement une fois dans chaque hyperplan. Ensuite, en créant une nouvelle chaîne de Markov dont le n-DPP est la loi stationnaire, nous déterminons le nombre d’étapes nécessaires pour construire un hypercube latin d’ordre n selon le n-DPP. / With the exponentially growing amount of data, sampling remains a relevant method to learn about populations. Sometimes a larger sample is needed to generate more precise results and to exclude the possibility of missing key information. The problem lies in the fact that sampling a large number of objects can be very time-consuming. In this thesis, our aim is to build bridges between applications of statistics and the k-Determinantal Point Process (k-DPP), which is defined through a matrix kernel and is a conditional DPP that models only sets of cardinality k. We propose three complementary projects for sampling large data sets based on k-DPPs. The goal is to select diverse sets that cover a much larger set of objects, in polynomial time. This can be achieved by constructing different Markov chains which have the k-DPPs as their stationary distributions. The first application consists in sampling a subset of species in a phylogenetic tree while avoiding redundancy. By defining the k-DPP via an intersection kernel, the results provide a fast-mixing sampler for the k-DPP, for which a polynomial bound on the mixing time is presented that depends on the height of the phylogenetic tree. The second application shows how k-DPPs offer a powerful approach to finding a diverse subset of nodes in a large connected graph, which provides an overview of the different types of information related to the ground set. A polynomial bound on the mixing time of the proposed Markov chain is given, where the kernel used is the Moore-Penrose pseudo-inverse of the normalized Laplacian matrix. The resulting mixing time is attained under certain conditions on the eigenvalues of the Laplacian matrix. The third project proposes to use the fixed-cardinality DPP in experimental design as a tool to study Latin hypercube sampling (LHS) of order n. The key is to propose a DPP kernel that establishes negative correlations between the selected points and preserves the constraint of the design, namely that each point occurs exactly once in each hyperplane. Then, by creating a new Markov chain which has the n-DPP as its stationary distribution, we determine the number of steps required to build an LHS in accordance with the n-DPP.
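The Markov chains mentioned here are built on the basis-exchange idea, which can be sketched as follows (a hedged toy illustration: the random kernel and chain length are our own choices, and the thesis's intersection and Laplacian pseudo-inverse kernels are not reproduced): from the current k-set, propose swapping one selected item for one outside item and accept with the Metropolis ratio of the corresponding principal minors.

```python
import numpy as np

def kdpp_exchange_chain(L, k, steps=500, seed=0):
    """Basis-exchange MCMC targeting P(S) proportional to det(L_S), |S| = k.
    L must be symmetric positive definite so all principal minors are positive."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    S = list(rng.choice(n, size=k, replace=False))
    det_S = np.linalg.det(L[np.ix_(S, S)])
    for _ in range(steps):
        i = int(rng.integers(k))                          # slot to swap out
        j = int(rng.choice([x for x in range(n) if x not in S]))
        T = S.copy()
        T[i] = j
        det_T = np.linalg.det(L[np.ix_(T, T)])
        if rng.random() < min(1.0, det_T / det_S):        # Metropolis ratio
            S, det_S = T, det_T
    return sorted(S)
```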
|
17 |
Capacity Proportional Unstructured Peer-to-Peer Networks
Reddy, Chandan Rama, 2009 August 1900 (has links)
Existing methods to utilize capacity-heterogeneity in a P2P system either rely on constructing special overlays with capacity-proportional node degree or use topology adaptation to match a node's capacity with that of its neighbors. In existing P2P networks, which are often characterized by diverse node capacities and high churn, these methods may require large node degree or continuous topology adaptation, potentially making them infeasible due to their high overhead. In this thesis, we propose an unstructured P2P system that attempts to address these issues. We first prove that the overall throughput of search queries in a heterogeneous network is maximized if and only if the traffic load through each node is proportional to its capacity. Our proposed system achieves this traffic distribution by biasing search walks using the Metropolis-Hastings algorithm, without requiring any special underlying topology. We then define two saturation metrics for measuring the performance of overlay networks: one for quantifying their ability to support random walks and the second for measuring their potential to handle the overhead caused by churn. Using simulations, we finally compare our proposed method with Gia, an existing system which uses topology adaptation, and find that the former performs better under all studied conditions, both saturation metrics, and such end-to-end parameters as query success rate, latency, and query-hits for various file replication schemes.
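The Metropolis-Hastings biasing of search walks can be sketched as follows (a toy illustration, not the thesis's protocol): from the current node, propose a uniformly random neighbour and accept with a ratio that corrects for both degree and capacity, so that the walk's stationary visit distribution is proportional to node capacity on any connected topology.

```python
import numpy as np

def mh_capacity_walk(adj, cap, start, steps, seed=0):
    """Random walk whose stationary visit distribution is proportional to
    node capacity, via a Metropolis-Hastings acceptance correction."""
    rng = np.random.default_rng(seed)
    u = start
    visits = np.zeros(len(adj))
    for _ in range(steps):
        v = adj[u][rng.integers(len(adj[u]))]            # uniform neighbour
        # acceptance corrects for both capacity and degree bias
        a = (cap[v] * len(adj[u])) / (cap[u] * len(adj[v]))
        if rng.random() < a:
            u = v
        visits[u] += 1
    return visits / steps
```

On a path graph 0 - 1 - 2 with capacities (1, 1, 4), the walk spends about 4/6 of its time at node 2, matching the capacity-proportional target.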
|
18 |
The transition of a typical frontier with illustrations from the life of Henry Hastings Sibley.
Shortridge, Wilson Porter, January 1919 (has links)
Thesis--University of Minnesota. / "Bibliography," p. 174-182.
|
19 |
Μελετώντας τον αλγόριθμο Metropolis-Hastings / Studying the Metropolis-Hastings algorithm
Γιαννόπουλος, Νικόλαος, 27 March 2013 (links)
Η παρούσα διπλωματική διατριβή εντάσσεται ερευνητικά στην περιοχή της Υπολογιστικής Στατιστικής, καθώς ασχολούμαστε με τη μελέτη μεθόδων προσομοίωσης από κάποια κατανομή π (κατανομή στόχο) και τον υπολογισμό σύνθετων ολοκληρωμάτων. Σε πολλά πραγματικά προβλήματα, όπου η μορφή της π είναι ιδιαίτερα πολύπλοκή ή/και η διάσταση του χώρου καταστάσεων μεγάλη, η προσομοίωση από την π δεν μπορεί να γίνει με απλές τεχνικές καθώς επίσης και ο υπολογισμός των ολοκληρωμάτων είναι πάρα πολύ δύσκολο αν όχι αδύνατο να γίνει αναλυτικά. Γι’ αυτό, καταφεύγουμε σε τεχνικές Monte Carlo (MC) και Markov Chain Monte Carlo (MCMC), οι οποίες προσομοιώνουν τιμές τυχαίων μεταβλητών και εκτιμούν τα ολοκληρώματα μέσω κατάλληλων συναρτήσεων των προσομοιωμένων τιμών. Οι τεχνικές MC παράγουν ανεξάρτητες παρατηρήσεις είτε απ’ ευθείας από την κατανομή-στόχο π είτε από κάποια διαφορετική κατανομή-πρότασης g. Οι τεχνικές MCMC προσομοιώνουν αλυσίδες Markov με στάσιμη κατανομή την και επομένως οι παρατηρήσεις είναι εξαρτημένες.
Στα πλαίσια αυτής της εργασίας θα ασχοληθούμε κυρίως με τον αλγόριθμο Metropolis-Hastings που είναι ένας από τους σημαντικότερους, αν όχι ο σημαντικότερος, MCMC αλγόριθμους.
Πιο συγκεκριμένα, στο Κεφάλαιο 2 γίνεται μια σύντομη αναφορά σε γνωστές τεχνικές MC, όπως η μέθοδος Αποδοχής-Απόρριψης, η μέθοδος Αντιστροφής και η μέθοδος Δειγματοληψίας σπουδαιότητας καθώς επίσης και σε τεχνικές MCMC, όπως ο αλγόριθμός Metropolis-Hastings, o Δειγματολήπτης Gibbs και η μέθοδος Metropolis Within Gibbs.
Στο Κεφάλαιο 3 γίνεται αναλυτική αναφορά στον αλγόριθμο Metropolis-Hastings. Αρχικά, παραθέτουμε μια σύντομη ιστορική αναδρομή και στη συνέχεια δίνουμε μια αναλυτική περιγραφή του. Παρουσιάζουμε κάποιες ειδικές μορφές τού καθώς και τις βασικές ιδιότητες που τον χαρακτηρίζουν. Το κεφάλαιο ολοκληρώνεται με την παρουσίαση κάποιων εφαρμογών σε προσομοιωμένα καθώς και σε πραγματικά δεδομένα.
Το τέταρτο κεφάλαιο ασχολείται με μεθόδους εκτίμησης της διασποράς του εργοδικού μέσου ο οποίος προκύπτει από τις MCMC τεχνικές. Ιδιαίτερη αναφορά γίνεται στις μεθόδους Batch means και Spectral Variance Estimators.
Τέλος, το Κεφάλαιο 5 ασχολείται με την εύρεση μιας κατάλληλης κατανομή πρότασης για τον αλγόριθμό Metropolis-Hastings. Παρόλο που ο αλγόριθμος Metropolis-Hastings μπορεί να συγκλίνει για οποιαδήποτε κατανομή πρότασης αρκεί να ικανοποιεί κάποιες βασικές υποθέσεις, είναι γνωστό ότι μία κατάλληλη επιλογή της κατανομής πρότασης βελτιώνει τη σύγκλιση του αλγόριθμου. Ο προσδιορισμός της βέλτιστής κατανομής πρότασης για μια συγκεκριμένη κατανομή στόχο είναι ένα πολύ σημαντικό αλλά εξίσου δύσκολο πρόβλημα. Το πρόβλημα αυτό έχει προσεγγιστεί με πολύ απλοϊκές τεχνικές (trial-and-error τεχνικές) αλλά και με adaptive αλγόριθμούς που βρίσκουν μια "καλή" κατανομή πρότασης αυτόματα. / This thesis is part of research in Computational Statistics, as we deal with the study of methods of modeling some distribution π (target distribution) and calculate complex integrals. In many real problems, where the form of π is very complex and / or the size of large state space, simulation of π can not be done with simple techniques as well as the calculation of the integrals is very difficult if not impossible to done analytically. So we resort to techniques Monte Carlo (MC) and Markov Chain Monte Carlo (MCMC), which simulate values of random variables and estimate the integrals by appropriate functions of the simulated values. These techniques produce MC independent observations either directly from the distribution n target or a different distribution motion-g. MCMC techniques simulate Markov chains with stationary distribution and therefore the observations are dependent.
As part of this work we deal mainly with the Metropolis-Hastings algorithm, which is one of the most important, if not the most important, MCMC algorithms.
More specifically, Chapter 2 gives a brief overview of well-known MC techniques, such as the acceptance-rejection method, the inversion method and importance sampling, as well as MCMC techniques, such as the Metropolis-Hastings algorithm, the Gibbs sampler and the Metropolis-within-Gibbs method.
Chapter 3 presents the Metropolis-Hastings algorithm in detail. First, we give a brief history and then a detailed description. We present some of its special forms as well as the basic properties that characterize it. The chapter concludes with a presentation of some applications to simulated and real data.
The fourth chapter deals with methods for estimating the variance of the ergodic average obtained from MCMC techniques. Particular reference is made to the batch-means method and to spectral variance estimators.
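The batch-means idea mentioned here can be sketched as follows (a standard textbook construction, not necessarily the thesis's exact estimator): split the chain into b batches and estimate the variance of the ergodic average from the sample variance of the batch means, which absorbs the chain's autocorrelation.

```python
import numpy as np

def batch_means_variance(chain, n_batches=30):
    """Estimate Var(ergodic average) of a correlated chain via batch means."""
    n = len(chain) // n_batches * n_batches     # drop any trailing remainder
    batches = chain[:n].reshape(n_batches, -1).mean(axis=1)
    return batches.var(ddof=1) / n_batches
```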
Finally, Chapter 5 deals with finding a suitable proposal distribution for the Metropolis-Hastings algorithm. Although the Metropolis-Hastings algorithm converges for any proposal distribution satisfying some basic assumptions, it is known that an appropriate choice of proposal distribution improves the convergence of the algorithm. Determining the optimal proposal distribution for a specific target distribution is a very important but equally difficult problem. This problem has been approached with very simplistic techniques (trial and error) but also with adaptive algorithms that find a "good" proposal distribution automatically.
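As a self-contained illustration of the random-walk Metropolis-Hastings algorithm discussed throughout this entry (the Gaussian proposal, the standard-normal target and the step size `sigma` are arbitrary choices of ours):

```python
import numpy as np

def metropolis_hastings(log_target, x0, steps, sigma=1.0, seed=0):
    """Random-walk Metropolis-Hastings: propose x' ~ N(x, sigma^2) and
    accept with probability min(1, pi(x') / pi(x))."""
    rng = np.random.default_rng(seed)
    x = x0
    lp = log_target(x)
    chain = np.empty(steps)
    for t in range(steps):
        xp = x + sigma * rng.normal()
        lpp = log_target(xp)
        if np.log(rng.random()) < lpp - lp:   # symmetric proposal cancels
            x, lp = xp, lpp
        chain[t] = x
    return chain

# Target: standard normal, log density up to an additive constant.
chain = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
```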
|
20 |
Modeling the Performance of a Baseball Player's Offensive Production
Smith, Michael Ross, 09 March 2006 (links) (PDF)
This project addresses the problem of comparing the offensive abilities of players from different eras in Major League Baseball (MLB). We study players from the perspective of an overall offensive summary statistic that is highly linked with scoring runs, known as the Berry Value. We build an additive model to estimate the innate ability of the player, the effect of the relative level of competition of each season, and the effect of age on performance using piecewise age curves. Using hierarchical Bayes methodology with Gibbs sampling, we model each of these effects for each individual. The results of the hierarchical Bayes model permit us to link players from different eras and to rank players across the modern era of baseball (1900-2004) on the basis of their innate overall offensive ability. The top of the rankings, led by Babe Ruth, Lou Gehrig, and Stan Musial, includes many Hall of Famers and some of the most productive offensive players in the history of the game. We also determine that trends in overall offensive ability in Major League Baseball exist, driven by various rule and cultural changes. Based on the model, MLB is currently at a high level of run production compared to the different levels of run production over the last century.
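The Gibbs-sampling machinery behind the hierarchical model can be illustrated on a deliberately simplified one-level normal model (a generic textbook sampler with semi-conjugate full conditionals, not the paper's piecewise-age-curve model):

```python
import numpy as np

def gibbs_normal(y, iters=2000, seed=0):
    """Gibbs sampler for a normal model, alternating full conditionals:
    mu | tau ~ N(ybar, 1 / (n * tau)),
    tau | mu ~ Gamma(n / 2, rate = sum((y - mu)^2) / 2)."""
    rng = np.random.default_rng(seed)
    n, ybar = len(y), np.mean(y)
    mu, tau = ybar, 1.0
    mus = []
    for _ in range(iters):
        mu = rng.normal(ybar, 1.0 / np.sqrt(n * tau))
        tau = rng.gamma(n / 2.0, 2.0 / np.sum((y - mu) ** 2))  # scale = 2/SS
        mus.append(mu)
    return np.array(mus)
```

After discarding a burn-in, the draws of `mu` concentrate around the sample mean, as the full conditionals imply.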
|