1. Hypothesis Testing in GWAS and Statistical Issues with Compensation in Clinical Trials
Swanson, David Michael, 27 September 2013
We first show, theoretically and in simulation, how the power of currently implemented gene-based testing methods varies as a function of SNP correlation structure, and we propose alternative testing methods whose power does not vary with that structure. We then propose hypothesis tests for detecting prevalence-incidence bias in case-control studies, a bias perhaps overrepresented in GWAS because of currently used study designs. Lastly, we hypothesize how different incentive structures used to keep clinical trial participants in studies may interact with a background of dependent censoring and produce varying degrees of bias in the Kaplan-Meier survival curve estimator.
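For reference, the Kaplan-Meier estimator at issue in the last point is the standard product-limit estimator. A minimal Python sketch (our illustration, not the thesis's code) follows; each factor in the product is valid only if censoring is independent of the event process, which is exactly the assumption that dependent censoring breaks.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate of the survival function S(t).

    time  : observed times (event or censoring), one per subject.
    event : 1 if the event was observed, 0 if the subject was censored.
    Validity rests on censoring being independent of the event process
    (the assumption that dependent censoring violates).
    """
    time, event = np.asarray(time, float), np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                    # still under observation at t
        deaths = np.sum((time == t) & (event == 1))    # events exactly at t
        s *= 1.0 - deaths / at_risk                    # product-limit update
        surv.append(s)
    return event_times, np.array(surv)

# Toy usage: six subjects, two of them censored.
t_hat, s_hat = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 0, 1, 1, 0, 1])
print(dict(zip(t_hat, s_hat)))
```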
2. Empirical Likelihood Inference for the Accelerated Failure Time Model via Kendall Estimating Equation
Lu, Yinghua, 17 July 2010
In this thesis, we study two methods of inference for the parameters of the accelerated failure time model with right-censored data. One is the Wald-type method, which relies on explicit parameter estimation; the other is the empirical likelihood method, which is based on the asymptotic distribution of the likelihood ratio. We employ a monotone, censored-data version of the Kendall estimating equation and construct confidence intervals with both methods. In simulation studies, we compare the empirical likelihood (EL) and the Wald-type procedure in terms of coverage accuracy and average length of the confidence intervals, and conclude that the empirical likelihood method performs better. We also compare the EL for Kendall's rank regression estimator with the EL for other well-known estimators and find that the Kendall version has advantages for small sample sizes. Finally, a real clinical trial dataset is used for illustration.
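To fix ideas, the AFT model and one standard monotone, censored-data Kendall-type estimating function (after Fygenson and Ritov, 1994) can be written as below. The notation is ours, and the thesis's exact variant may differ.

```latex
% AFT model with right censoring; e_i(\beta) = Y_i - \beta^\top X_i are
% the residuals. Notation is ours; the thesis's exact equation may differ.
\[
  \log T_i = \beta^\top X_i + \varepsilon_i, \qquad
  Y_i = \min(\log T_i, \log C_i), \qquad
  \delta_i = \mathbf{1}\{T_i \le C_i\},
\]
\[
  U(\beta) = \binom{n}{2}^{-1} \sum_{i < j} (X_i - X_j)\,
  \bigl[ \delta_i \mathbf{1}\{ e_i(\beta) < e_j(\beta) \}
       - \delta_j \mathbf{1}\{ e_j(\beta) < e_i(\beta) \} \bigr].
\]
% A pair contributes only when the smaller residual is uncensored, so the
% comparison is always observable; U is monotone in each coordinate of
% beta, and the estimator solves U(beta) = 0 (approximately).
```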
3. Adaptation des méthodes d'apprentissage aux U-statistiques (Adapting machine learning methods to U-statistics)
Colin, Igor, 24 November 2016
With the increasing availability of large amounts of data, computational complexity has become a keystone of many machine learning algorithms. Stochastic optimization algorithms and distributed/decentralized methods have been widely studied over the last decade and provide increased scalability for optimizing an empirical risk that is separable in the data sample. Yet, in a wide range of statistical learning problems, the risk is accurately estimated by U-statistics, i.e., low-variance functionals of the training data that take the form of averages over d-tuples. We first tackle the problem of sampling for empirical risk minimization. We show that the empirical risk can be replaced by a drastically simpler Monte-Carlo estimate based on O(n) terms only, usually referred to as an incomplete U-statistic, without damaging the learning rate. We establish uniform deviation results, and numerical examples show that this approach surpasses more naive subsampling techniques. We then focus on decentralized estimation, where the data sample is distributed over a connected network. We introduce new synchronous and asynchronous randomized gossip algorithms that simultaneously propagate data across the network and maintain local estimates of the U-statistic of interest, and we establish convergence-rate bounds with explicit data- and network-dependent terms. Finally, we deal with the decentralized optimization of functions that depend on pairs of observations. As in the estimation case, our method is based on concurrent local updates and data propagation. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term, and our simulations illustrate the practical interest of the approach.
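The incomplete U-statistic construction is simple enough to state in a few lines. The sketch below (our illustration for a degree-2 kernel, not the thesis's implementation) replaces the average over all O(n^2) pairs with an average over O(n) uniformly sampled pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

def incomplete_u_stat(x, h, n_pairs):
    """Monte-Carlo (incomplete) U-statistic of degree 2.

    Averages the kernel h over n_pairs index pairs drawn uniformly with
    replacement, instead of all O(n^2) pairs; the estimate stays unbiased
    for the mean of the complete U-statistic.
    """
    n = len(x)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j                        # discard degenerate pairs
    return np.mean(h(x[i[keep]], x[j[keep]]))

# Example: the kernel h(x, y) = (x - y)^2 / 2 has the unbiased sample
# variance as its complete U-statistic.
x = rng.normal(size=10_000)
approx = incomplete_u_stat(x, lambda a, b: 0.5 * (a - b) ** 2, n_pairs=len(x))
exact = x.var(ddof=1)
print(approx, exact)   # close, with O(n) rather than O(n^2) kernel evaluations
```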
4. Jackknife Empirical Likelihood for the Accelerated Failure Time Model with Censored Data
Bouadoumou, Maxime K, 15 July 2011
Kendall and Gehan estimating functions are used to estimate the regression parameter in the accelerated failure time (AFT) model with censored observations. The AFT model is an attractive survival analysis method because it posits a direct association between the covariates and the survival time. The jackknife empirical likelihood method overcomes the computational difficulty of standard empirical likelihood by circumventing the construction of nonlinear constraints: it turns the statistic of interest into a sample mean based on jackknife pseudo-values, and a U-statistic approach is used to construct confidence intervals for the regression parameter. We conduct a simulation study comparing the Wald-type procedure, the empirical likelihood, and the jackknife empirical likelihood in terms of coverage probability and average length of confidence intervals. The jackknife empirical likelihood method performs best and overcomes the under-coverage problem of the Wald-type method. A real dataset is also used to illustrate the proposed methods.
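The pseudo-value device at the heart of jackknife empirical likelihood is easy to illustrate. In the sketch below (ours, with Gini's mean difference standing in for the Kendall and Gehan estimating functions), the pseudo-values of a U-statistic average back to the statistic itself, which is what allows ordinary empirical likelihood for a mean to be applied to them:

```python
import numpy as np

def jackknife_pseudo_values(x, stat):
    """Turn a statistic into approximately i.i.d. pseudo-values.

    V_i = n * stat(x) - (n - 1) * stat(x with observation i removed).
    Jackknife empirical likelihood then treats the V_i as a plain sample
    and applies empirical likelihood for their mean.
    """
    n = len(x)
    full = stat(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return n * full - (n - 1) * loo

def gini_mean_difference(x):
    """Degree-2 U-statistic: average of |x_i - x_j| over distinct pairs."""
    n = len(x)
    return np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1))

rng = np.random.default_rng(1)
x = rng.exponential(size=200)
v = jackknife_pseudo_values(x, gini_mean_difference)
# For a U-statistic the pseudo-value mean equals the statistic exactly.
print(v.mean(), gini_mean_difference(x))
```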
5. Inference for the K-sample problem based on precedence probabilities
Dey, Rajarshi, January 1900
Doctor of Philosophy / Department of Statistics / Paul I. Nelson
Rank-based inference using independent random samples to compare K > 1 continuous distributions, called the K-sample problem, is developed and explored based on precedence probabilities. There are many parametric and nonparametric approaches, most dealing with hypothesis testing, to this important, classical problem. Most existing tests are designed to detect differences among the location parameters of the distributions. The best known and most widely used of these is the F-test, which assumes normality; a comparable nonparametric test was developed by Kruskal and Wallis (1952). When dealing with location-scale families of distributions, both of these tests can perform poorly if the distributions differ in their scale parameters rather than their location parameters, and overall, existing tests are not effective at detecting changes in both location and scale. In this dissertation, I propose a new class of rank-based, asymptotically distribution-free tests, built on precedence probabilities, that are effective in detecting changes in both location and scale. Let $X_i$ be a random variable with distribution function $F_i$, and let $\Pi$ be the set of all permutations of $(1, 2, \ldots, K)$; then $P(X_{i_1} < \cdots < X_{i_K})$ is a precedence probability for each $(i_1, \ldots, i_K) \in \Pi$. Properties of these tests are developed using the theory of U-statistics (Hoeffding, 1948). Some of the new tests are related to volumes under ROC (Receiver Operating Characteristic) surfaces, which are of particular interest in clinical trials whose goal is to use a score to separate subjects into diagnostic groups. Motivated by this goal, I propose three new index measures of the separation or similarity among two or more distributions; these indices may be used as "effect sizes". In a related problem, properties of precedence probabilities are obtained, and a bootstrap algorithm is used to construct interval estimates for them.
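As a concrete illustration of the basic building block (our sketch, not the dissertation's code), the plug-in estimate of a precedence probability averages the indicator of an increasing tuple over all cross-sample tuples, a K-sample U-statistic:

```python
import numpy as np
from itertools import product

def precedence_probability(samples):
    """Plug-in estimate of P(X_1 < X_2 < ... < X_K).

    samples: list of K one-dimensional arrays, one per distribution.
    Averages the indicator of an increasing tuple over all cross-sample
    tuples; fine for small samples, where the full product is feasible.
    """
    count, total = 0, 0
    for tup in product(*samples):
        count += all(a < b for a, b in zip(tup, tup[1:]))
        total += 1
    return count / total

rng = np.random.default_rng(2)
groups = [rng.normal(loc=m, size=30) for m in (0.0, 0.5, 1.0)]
print(precedence_probability(groups))
# Under identical continuous distributions every ordering of the K
# observations is equally likely, so the null value is 1/K! = 1/6 for K = 3.
```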