Spelling suggestions: "subject:"1gsample deproblem"" "subject:"1gsample 3dproblem""
1 |
The k-Sample Problem When k is Large and n SmallZhan, Dongling 2012 May 1900 (has links)
The k-sample problem, i.e., testing whether two or more data sets come from the same population, is a classic one in statistics. Instead of having a small number of k groups of
samples, this dissertation works on a large number of p groups of samples, where within each group, the sample size, n, is a fixed, small number. We call this as a "Large p, but Small n" setting. The primary goal of the research is to provide a test statistic based on kernel density estimation (KDE) that has an asymptotic normal distribution when p goes to infinity with n fixed.
In this dissertation, we propose a test statistic called Tp(S) and its standardized version, T(S). By using T(S), we conduct our test based on the critical values of the standard normal distribution. Theoretically, we show that our test is invariant to a location and scale transformation of the data. We also find conditions under which our test is consistent. Simulation studies show that our test has good power against a variety of alternatives. The real data analyses show that our test finds differences between gene distributions that are not due simply to location.
|
2 |
Survival analysis issues with interval-censored dataOller Piqué, Ramon 30 June 2006 (has links)
L'anàlisi de la supervivència s'utilitza en diversos àmbits per tal d'analitzar dades que mesuren el temps transcorregut entre dos successos. També s'anomena anàlisi de la història dels esdeveniments, anàlisi de temps de vida, anàlisi de fiabilitat o anàlisi del temps fins a l'esdeveniment. Una de les dificultats que té aquesta àrea de l'estadística és la presència de dades censurades. El temps de vida d'un individu és censurat quan només és possible mesurar-lo de manera parcial o inexacta. Hi ha diverses circumstàncies que donen lloc a diversos tipus de censura. La censura en un interval fa referència a una situació on el succés d'interès no es pot observar directament i només tenim coneixement que ha tingut lloc en un interval de temps aleatori. Aquest tipus de censura ha generat molta recerca en els darrers anys i usualment té lloc en estudis on els individus són inspeccionats o observats de manera intermitent. En aquesta situació només tenim coneixement que el temps de vida de l'individu es troba entre dos temps d'inspecció consecutius.Aquesta tesi doctoral es divideix en dues parts que tracten dues qüestions importants que fan referència a dades amb censura en un interval. La primera part la formen els capítols 2 i 3 els quals tracten sobre condicions formals que asseguren que la versemblança simplificada pot ser utilitzada en l'estimació de la distribució del temps de vida. La segona part la formen els capítols 4 i 5 que es dediquen a l'estudi de procediments estadístics pel problema de k mostres. El treball que reproduïm conté diversos materials que ja s'han publicat o ja s'han presentat per ser considerats com objecte de publicació.En el capítol 1 introduïm la notació bàsica que s'utilitza en la tesi doctoral. També fem una descripció de l'enfocament no paramètric en l'estimació de la funció de distribució del temps de vida. Peto (1973) i Turnbull (1976) van ser els primers autors que van proposar un mètode d'estimació basat en la versió simplificada de la funció de versemblança. Altres autors han estudiat la unicitat de la solució obtinguda en aquest mètode (Gentleman i Geyer, 1994) o han millorat el mètode amb noves propostes (Wellner i Zhan, 1997).El capítol 2 reprodueix l'article d'Oller et al. (2004). Demostrem l'equivalència entre les diferents caracteritzacions de censura no informativa que podem trobar a la bibliografia i definim una condició de suma constant anàloga a l'obtinguda en el context de censura per la dreta. També demostrem que si la condició de no informació o la condició de suma constant són certes, la versemblança simplificada es pot utilitzar per obtenir l'estimador de màxima versemblança no paramètric (NPMLE) de la funció de distribució del temps de vida. Finalment, caracteritzem la propietat de suma constant d'acord amb diversos tipus de censura. En el capítol 3 estudiem quina relació té la propietat de suma constant en la identificació de la distribució del temps de vida. Demostrem que la distribució del temps de vida no és identificable fora de la classe dels models de suma constant. També demostrem que la probabilitat del temps de vida en cadascun dels intervals observables és identificable dins la classe dels models de suma constant. Tots aquests conceptes elsil·lustrem amb diversos exemples.El capítol 4 s'ha publicat parcialment en l'article de revisió metodològica de Gómez et al. (2004). Proporciona una visió general d'aquelles tècniques que s'han aplicat en el problema no paramètric de comparació de dues o més mostres amb dades censurades en un interval. També hem desenvolupat algunes rutines amb S-Plus que implementen la versió permutacional del tests de Wilcoxon, Logrank i de la t de Student per a dades censurades en un interval (Fay and Shih, 1998). Aquesta part de la tesi doctoral es complementa en el capítol 5 amb diverses propostes d'extensió del test de Jonckeere. Amb l'objectiu de provar una tendència en el problema de k mostres, Abel (1986) va realitzar una de les poques generalitzacions del test de Jonckheere per a dades censurades en un interval. Nosaltres proposem altres generalitzacions d'acord amb els resultats presentats en el capítol 4. Utilitzem enfocaments permutacionals i de Monte Carlo. Proporcionem programes informàtics per a cada proposta i realitzem un estudi de simulació per tal de comparar la potència de cada proposta sota diferents models paramètrics i supòsits de tendència. Com a motivació de la metodologia, en els dos capítols s'analitza un conjunt de dades d'un estudi sobre els beneficis de la zidovudina en pacients en els primers estadis de la infecció del virus VIH (Volberding et al., 1995).Finalment, el capítol 6 resumeix els resultats i destaca aquells aspectes que s'han de completar en el futur. / Survival analysis is used in various fields for analyzing data involving the duration between two events. It is also known as event history analysis, lifetime data analysis, reliability analysis or time to event analysis. One of the difficulties which arise in this area is the presence of censored data. The lifetime of an individual is censored when it cannot be exactly measured but partial information is available. Different circumstances can produce different types of censoring. Interval censoring refers to the situation when the event of interest cannot be directly observed and it is only known to have occurred during a random interval of time. This kind of censoring has produced a lot of work in the last years and typically occurs for individuals in a study being inspected or observed intermittently, so that an individual's lifetime is known only to lie between two successive observation times.This PhD thesis is divided into two parts which handle two important issues of interval censored data. The first part is composed by Chapter 2 and Chapter 3 and it is about formal conditions which allow estimation of the lifetime distribution to be based on a well known simplified likelihood. The second part is composed by Chapter 4 and Chapter 5 and it is devoted to the study of test procedures for the k-sample problem. The present work reproduces several material which has already been published or has been already submitted.In Chapter 1 we give the basic notation used in this PhD thesis. We also describe the nonparametric approach to estimate the distribution function of the lifetime variable. Peto (1973) and Turnbull (1976) were the first authors to propose an estimation method which is based on a simplified version of the likelihood function. Other authors have studied the uniqueness of the solution given by this method (Gentleman and Geyer, 1994) or have improved it with new proposals (Wellner and Zhan, 1997).Chapter 2 reproduces the paper of Oller et al. (2004). We prove the equivalence between different characterizations of noninformative censoring appeared in the literature and we define an analogous constant-sum condition to the one derived in the context of right censoring. We prove as well that when the noninformative condition or the constant-sum condition holds, the simplified likelihood can be used to obtain the nonparametric maximum likelihood estimator (NPMLE) of the failure time distribution function. Finally, we characterize the constant-sum property according to different types of censoring. In Chapter 3 we study the relevance of the constant-sum property in the identifiability of the lifetime distribution. We show that the lifetime distribution is not identifiable outside the class of constant-sum models. We also show that the lifetime probabilities assigned to the observable intervals are identifiable inside the class of constant-sum models. We illustrate all these notions with several examples.Chapter 4 has partially been published in the survey paper of Gómez et al. (2004). It gives a general view of those procedures which have been applied in the nonparametric problem of the comparison of two or more interval-censored samples. We also develop some S-Plus routines which implement the permutational version of the Wilcoxon test, the Logrank test and the t-test for interval censored data (Fay and Shih, 1998). This part of the PhD thesis is completed in Chapter 5 by different proposals of extension of the Jonckeere's test. In order to test for an increasing trend in the k-sample problem, Abel (1986) gives one of the few generalizations of the Jonckheree's test for interval-censored data. We also suggest different Jonckheere-type tests according to the tests presented in Chapter 4. We use permutational and Monte Carlo approaches. We give computer programs for each proposal and perform a simulation study in order compare the power of each proposal under different parametric assumptions and different alternatives. We motivate both chapters with the analysis of a set of data from a study of the benefits of zidovudine in patients in the early stages of the HIV infection (Volberding et al., 1995).Finally, Chapter 6 summarizes results and address those aspects which remain to be completed.
|
3 |
Multivariate permutation tests for the k-sample problem with clustered dataRahnenführer, Jörg January 1999 (has links) (PDF)
The present paper deals with the choice of clustering algorithms before treating a k-sample problem. We investigate multivariate data sets that are quantized by algorithms that define partitions by maximal support planes (MSP) of a convex function. These algorithms belong to a wide class containing as special cases both the well known k-means algorithm and the Kohonen (1985) algorithm and have been profoundly investigated by Pötzelberger and Strasser (1999). For computing the test statistics for the k-sample problem we replace the data points by their conditional expections with respect to the MSP-partition. We present Monte Carlo simulations of power functions of different tests for the k-sample problem whereas the tests are carried out as multivariate permutation tests to ensure that they hold the level. The results presented show that there seems to be a vital and decisive connection between the optimal choice of the clustering algorithm and the tails of the probability distribution of the data. Especially for distributions with heavy tails like the exponential distribution the performance of tests based on a quadratic convex function with k-means type partitions totally breaks down. (author's abstract) / Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
|
4 |
Analýza funkcionálních dat / Functional data analysisJurica, Tomáš January 2021 (has links)
The aim of the master thesis is to review of reconstruction techniques of func- tional data and existing one-way functional ANOVA (FANOVA) tests. Specif- ically, the work deals with L2 -norm based and F-type mean functions equality tests, L2 -norm based covariance functions equality tests and tests for distri- bution equality. Furthermore, for each type of the test, it is introduced test based on reconstructed functional data, using orthornormal basis functions of L2 space. Finally, simulation study was conducted for comparing properties of tests using orthonormal basis representation of functional data and tests applied on non-reconstructed data. 1
|
5 |
A New Nonparametric Procedure for the k-sample ProblemWilcock, Samuel Phillip 18 September 2001 (has links)
The k-sample data setting is one of the most common data settings used today. The null hypothesis that is most generally of interest for these methods is that the k-samples have the same location. Currently there are several procedures available for the individual who has data of this type. The most often used method is commonly called the ANOVA F-test. This test assumes that all of the underlying distributions are normal, with equal variances. Thus the only allowable difference in the distributions is a possible shift, under the alternative hypothesis. Under the null hypothesis, it is assumed that all k distributions are identical, not just equally located.
Current nonparametric methods for the k-sample setting require a variety of restrictions on the distribution of the data. The most commonly used method is that due to Kruskal and Wallis (1952). The method, commonly called the Kruskal-Wallis test, does not assume that the data come from normal populations, though they must still be continuous, but maintains the requirement that the populations must be identical under the null, and may differ only by a possible shift under the alternative.
In this work a new procedure is developed which is exactly distribution free when the distributions are equivalent and continuous under the null hypothesis, and simulations are used to study the properties of the test when the distributions are continuous and have the same medians under the null. The power of the statistic under alternatives is also studied. The test bears a resemblance to the two sample sign type tests, which will be pointed out as the development is shown. / Ph. D.
|
6 |
Inference for the K-sample problem based on precedence probabilitiesDey, Rajarshi January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Paul I. Nelson / Rank based inference using independent random samples to compare K>1 continuous distributions, called the K-sample problem, based on precedence probabilities is developed and explored. There are many parametric and nonparametric approaches, most dealing with hypothesis testing, to this important, classical problem. Most existing tests are designed to detect differences among the location parameters of different distributions. Best known and most widely used of these is the F- test, which assumes normality. A comparable nonparametric test was developed by Kruskal and Wallis (1952). When dealing with location-scale families of distributions, both of these tests can perform poorly if the differences among the distributions are among their scale parameters and not in their location parameters. Overall, existing tests are not effective in detecting changes in both location and scale. In this dissertation, I propose a new class of rank-based, asymptotically distribution- free tests that are effective in detecting changes in both location and scale based on precedence probabilities. Let X_{i} be a random variable with distribution function F_{i} ; Also, let _pi_ be the set of all permutations of the numbers (1,2,...,K) . Then P(X_{i_{1}}<...<X_{i_{K}}) is a precedence probability if (i_{1},...,i_{K}) belongs to _pi_. Properties of these of tests are developed using the theory of U-statistics (Hoeffding, 1948). Some of these new tests are related to volumes under ROC (Receiver Operating Characteristic) surfaces, which are of particular interest in clinical trials whose goal is to use a score to separate subjects into diagnostic groups. Motivated by this goal, I propose three new index measures of the separation or similarity among two or more distributions. These indices may be used as “effect sizes”. In a related problem, Properties of precedence probabilities are obtained and a bootstrap algorithm is used to estimate an interval for them.
|
Page generated in 0.0524 seconds