Global ETD Search

1	Hypothesis Testing in GWAS and Statistical Issues with Compensation in Clinical Trials Swanson, David Michael 27 September 2013 (has links) We first show theoretically and in simulation how power varies as a function of SNP correlation structure with currently-implemented gene-based testing methods. We propose alternative testing methods whose power does not vary with the correlation structure. We then propose hypothesis tests for detecting prevalence-incidence bias in case-control studies, a bias perhaps overrepresented in GWAS due to currently used study designs. Lastly, we hypothesize how different incentive structures used to keep clinical trial participants in studies may interact with a background of dependent censoring and result in variation in the bias of the Kaplan-Meier survival curve estimator. Biostatistics Epidemiology bias compensation dependent censoring dimension-reduction GWAS U-statistic
2	Empirical Likelihood Inference for the Accelerated Failure Time Model via Kendall Estimating Equation Lu, Yinghua 17 July 2010 (has links) In this thesis, we study two methods for inference on the parameters of the accelerated failure time model with right-censored data. One is the Wald-type method, which involves parameter estimation. The other is the empirical likelihood method, which is based on the asymptotic distribution of the likelihood ratio. We employ a monotone censored-data version of the Kendall estimating equation and construct confidence intervals with both methods. In the simulation studies, we compare the empirical likelihood (EL) and the Wald-type procedure in terms of coverage accuracy and average length of confidence intervals. We conclude that the empirical likelihood method performs better. We also compare the EL for Kendall’s rank regression estimator with the EL for other well-known estimators and find that the EL for the Kendall estimator has advantages for small sample sizes. Finally, a real clinical trial data set is used for illustration. Confidence interval Coverage probability Average length U-statistic Empirical likelihood Kendall’s rank regression Censored data Mathematics
3	Adaptation des méthodes d’apprentissage aux U-statistiques / Adapting machine learning methods to U-statistics Colin, Igor 24 November 2016 (has links) With the increasing availability of large amounts of data, computational complexity has become a keystone of many machine learning algorithms. Stochastic optimization algorithms and distributed/decentralized methods have been widely studied over the last decade and provide increased scalability for optimizing an empirical risk that is separable in the data sample. Yet, in a wide range of statistical learning problems, the risk is accurately estimated by U-statistics, i.e., functionals of the training data with low variance that take the form of averages over d-tuples. We first tackle the problem of sampling for empirical risk minimization. We show that empirical risks can be replaced by computationally much simpler Monte-Carlo estimates based on only O(n) terms, usually referred to as incomplete U-statistics, without damaging the learning rate. We establish uniform deviation results, and numerical examples show that such an approach surpasses more naive subsampling techniques. We then focus on decentralized estimation, where the data sample is distributed over a connected network. We introduce new synchronous and asynchronous randomized gossip algorithms that simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest. We establish convergence rate bounds with explicit data-dependent and network-dependent terms. Finally, we deal with the decentralized optimization of functions that depend on pairs of observations. As in the estimation case, we introduce a method based on concurrent local updates and data propagation. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. Our simulations illustrate the practical interest of our approach. U-statistic Gossip Decentralized optimization Graph
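The idea of an incomplete U-statistic mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the kernel (half the squared difference, whose complete U-statistic is the unbiased sample variance), the function names, and the choice of sampling pairs uniformly with replacement are all assumptions made for illustration.

```python
import itertools
import random


def kernel(x, y):
    # Degree-2 kernel whose complete U-statistic is the unbiased sample variance.
    return 0.5 * (x - y) ** 2


def complete_u_statistic(data):
    # Average of the kernel over all n*(n-1)/2 unordered pairs.
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def incomplete_u_statistic(data, num_pairs, rng):
    # Monte-Carlo version: average over num_pairs pairs of distinct indices,
    # drawn uniformly with replacement (an assumed sampling scheme).
    total = 0.0
    for _ in range(num_pairs):
        i, j = rng.sample(range(len(data)), 2)
        total += kernel(data[i], data[j])
    return total / num_pairs


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
full = complete_u_statistic(data)
approx = incomplete_u_statistic(data, num_pairs=len(data), rng=rng)
print(f"complete: {full:.4f}  incomplete (B = n): {approx:.4f}")
```

Here the incomplete estimate uses n kernel evaluations instead of n(n-1)/2, which is the trade-off the abstract refers to.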
4	Jackknife Empirical Likelihood for the Accelerated Failure Time Model with Censored Data Bouadoumou, Maxime K 15 July 2011 (has links) Kendall and Gehan estimating functions are used to estimate the regression parameter in the accelerated failure time (AFT) model with censored observations. The accelerated failure time model is the preferred survival analysis method because it maintains a consistent association between the covariate and the survival time. The jackknife empirical likelihood method is used because it overcomes computational difficulty by circumventing the construction of the nonlinear constraint. Jackknife empirical likelihood turns the statistic of interest into a sample mean based on jackknife pseudo-values. A U-statistic approach is used to construct confidence intervals for the regression parameter. We conduct a simulation study to compare the Wald-type procedure, the empirical likelihood, and the jackknife empirical likelihood in terms of coverage probability and average length of confidence intervals. The jackknife empirical likelihood method performs better and overcomes the under-coverage problem of the Wald-type method. A real data set is also used to illustrate the proposed methods. Confidence interval Coverage probability Jackknife empirical likelihood Right-censoring U-statistic Kendall’s estimating equation Gehan Logrank Mathematics
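The step of turning a statistic into a sample mean of jackknife pseudo-values can be sketched as follows. This is an illustration under assumptions, not the thesis's procedure: the kernel is a simple absolute difference rather than a Kendall or Gehan estimating function, there is no censoring, and the function names are hypothetical.

```python
import itertools


def u_statistic(data, kernel):
    # Average of a symmetric degree-2 kernel over all unordered pairs.
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def jackknife_pseudo_values(data, kernel):
    # V_i = n * T_n - (n - 1) * T_{n-1}^{(-i)}, where T_{n-1}^{(-i)} is the
    # statistic recomputed with observation i left out.
    n = len(data)
    full = u_statistic(data, kernel)
    values = []
    for i in range(n):
        reduced = data[:i] + data[i + 1:]
        values.append(n * full - (n - 1) * u_statistic(reduced, kernel))
    return values


def abs_diff(x, y):
    # Illustrative kernel (Gini mean difference); an assumption, not the
    # estimating function used in the thesis.
    return abs(x - y)


sample = [2.1, 0.4, 3.3, 1.8, 2.9, 0.7]
pseudo = jackknife_pseudo_values(sample, abs_diff)
print(sum(pseudo) / len(pseudo), u_statistic(sample, abs_diff))
```

For a U-statistic the mean of the pseudo-values equals the statistic itself, so standard empirical likelihood for a mean can then be applied to the pseudo-values.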
5	Inference for the K-sample problem based on precedence probabilities Dey, Rajarshi January 1900 (has links) Doctor of Philosophy / Department of Statistics / Paul I. Nelson / Rank-based inference using independent random samples to compare K>1 continuous distributions, called the K-sample problem, is developed and explored on the basis of precedence probabilities. There are many parametric and nonparametric approaches to this important, classical problem, most dealing with hypothesis testing. Most existing tests are designed to detect differences among the location parameters of different distributions. The best known and most widely used of these is the F-test, which assumes normality. A comparable nonparametric test was developed by Kruskal and Wallis (1952). When dealing with location-scale families of distributions, both of these tests can perform poorly if the distributions differ in their scale parameters rather than in their location parameters. Overall, existing tests are not effective in detecting changes in both location and scale. In this dissertation, I propose a new class of rank-based, asymptotically distribution-free tests, based on precedence probabilities, that are effective in detecting changes in both location and scale. Let X_{i} be a random variable with distribution function F_{i}, and let π be the set of all permutations of the numbers (1,2,...,K). Then P(X_{i_{1}}<...<X_{i_{K}}) is a precedence probability if (i_{1},...,i_{K}) belongs to π. Properties of these tests are developed using the theory of U-statistics (Hoeffding, 1948). Some of these new tests are related to volumes under ROC (Receiver Operating Characteristic) surfaces, which are of particular interest in clinical trials whose goal is to use a score to separate subjects into diagnostic groups. Motivated by this goal, I propose three new index measures of the separation or similarity among two or more distributions. These indices may be used as “effect sizes”. In a related problem, properties of precedence probabilities are obtained and a bootstrap algorithm is used to estimate an interval for them. Precedence probabilities Nonlinear rank-based statistic U-statistic K-sample problem Hypervolume under ROC manifold Statistics (0463)
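A precedence probability such as P(X_{1}<...<X_{K}) has a natural empirical estimate: the fraction of tuples, one observation per sample, that are strictly increasing. The sketch below assumes this plain estimator; the function name and the toy data are hypothetical, and the dissertation's test statistics are not reproduced here.

```python
import itertools


def precedence_probability(samples):
    # Fraction of tuples (one observation per sample, in the given order)
    # that are strictly increasing; estimates P(X_1 < ... < X_K).
    hits = 0
    total = 0
    for combo in itertools.product(*samples):
        total += 1
        if all(a < b for a, b in zip(combo, combo[1:])):
            hits += 1
    return hits / total


# Toy data for K = 3 samples (made up for illustration).
x1 = [0.2, 1.1, 0.7]
x2 = [1.5, 0.9, 2.0]
x3 = [2.4, 1.8, 3.1]
print(precedence_probability([x1, x2, x3]))
```

The estimator is a K-sample U-statistic with an indicator kernel, which is why U-statistic theory applies to it.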
