111

Robust mixture regression modeling with Pearson type VII distribution

Zhang, Jingyi January 1900 (has links)
Master of Science / Department of Statistics / Weixing Song / A robust estimation procedure for parametric regression models is proposed in this paper, assuming that the error terms follow a Pearson type VII distribution. The estimation procedure is implemented by an EM algorithm, based on the fact that the Pearson type VII distribution can be written as a scale mixture of a normal distribution with a Gamma mixing distribution. A trimmed version of the proposed procedure, which can successfully trim high-leverage points away from the data, is also discussed in this paper. Finite sample performance of the proposed algorithm is evaluated through extensive simulation studies, together with comparisons with other existing procedures in the literature.
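For illustration only, the scale-mixture representation mentioned above suggests an EM scheme whose E-step produces observation weights and whose M-step is weighted least squares. The Python sketch below assumes a Student-t style special case of the Pearson type VII family with the tail parameter nu held fixed; the function name, the fixed nu, and the toy data are assumptions for illustration, not the thesis's code.

```python
import numpy as np

def em_heavy_tail_regression(X, y, nu=4.0, n_iter=50):
    """EM sketch for linear regression with heavy-tailed errors, using the
    normal scale-mixture representation (a Student-t style special case of
    the Pearson type VII family, with the tail parameter nu held fixed)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # ordinary least squares start
    sigma2 = np.var(y - X @ beta)
    for _ in range(n_iter):
        r = y - X @ beta
        w = (nu + 1.0) / (nu + r**2 / sigma2)              # E-step: expected precision weights
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # M-step: weighted least squares
        sigma2 = np.mean(w * (y - X @ beta)**2)            # M-step: weighted scale update
    return beta, sigma2

# toy usage with heavy-tailed noise; large residuals are automatically down-weighted
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=200)
print(em_heavy_tail_regression(X, y))
```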
112

Robust mixture regression model fitting by Laplace distribution

Xing, Yanru January 1900 (has links)
Master of Science / Department of Statistics / Weixing Song / A robust estimation procedure for mixture linear regression models is proposed in this report, assuming that the error terms follow a Laplace distribution. An EM algorithm is implemented to carry out the estimation, treating the latent scale variables as missing information, based on the fact that the Laplace distribution is a scale mixture of a normal distribution and a latent mixing distribution. Finite sample performance of the proposed algorithm is evaluated through extensive simulation studies, together with comparisons with other existing procedures in the literature. A sensitivity study is also conducted on a real data example to illustrate the application of the proposed method.
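A minimal sketch of the same scale-mixture idea for Laplace errors, shown here for a single-component regression rather than the mixture model of the report: using the normal scale-mixture representation, the E-step weight is proportional to 1/|residual| (constants cancel in the weighted least-squares M-step), so the iterations converge to the least-absolute-deviations fit, which is the Laplace maximum likelihood estimate of the coefficients. The function name and tolerance are illustrative assumptions.

```python
import numpy as np

def laplace_regression_em(X, y, n_iter=100, eps=1e-6):
    """EM/IRLS sketch for linear regression with Laplace errors: weights
    proportional to 1/|residual| drive the fit toward the
    least-absolute-deviations solution, i.e. the Laplace MLE of beta."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)               # guard against zero residuals
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # weighted least squares step
    return beta
```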
113

Conditional variance function checking in heteroscedastic regression models

Samarakoon, Nishantha Anura January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Weixing Song / The regression model has received a considerable amount of attention and plays a significant role in data analysis. The usual assumption in regression analysis is that the variances of the error terms are constant across the data. Occasionally, this assumption of homoscedasticity is violated, and data generated from real world applications exhibit heteroscedasticity. The practical importance of detecting heteroscedasticity in regression analysis is widely recognized in many applications, because efficient inference for the regression function requires unequal variances to be taken into account. The goal of this thesis is to propose new testing procedures to assess the adequacy of fitting a parametric variance function in heteroscedastic regression models. The proposed tests are established in Chapter 2 using a certain minimized L2 distance between a nonparametric and a parametric variance function estimator. The asymptotic distribution of the test statistic corresponding to the minimum distance estimator under the fixed model, and that of the corresponding minimum distance estimators, are shown to be normal. These estimators turn out to be √n consistent. The asymptotic power of the proposed test against some local nonparametric alternatives is also investigated. Numerical simulation studies are employed to evaluate the finite sample performance of the test in the one-dimensional and two-dimensional cases. The minimum distance method in Chapter 2 requires the calculation of the integrals in the test statistics. These integrals usually do not have a tractable form, so numerical integration methods are needed to approximate them. Chapter 3 discusses a nonparametric empirical smoothing lack-of-fit test for the functional form of the variance in regression models that does not involve the evaluation of integrals. The empirical smoothing lack-of-fit test can be treated as a nontrivial modification of Zheng (1996)'s nonparametric smoothing test and Koul and Ni (2004)'s minimum distance test for the mean function in classic regression models. The asymptotic normality of the proposed test under the null hypothesis is established. Consistency at some fixed alternatives and asymptotic power under some local alternatives are also discussed. Simulation studies are conducted to assess the finite sample performance of the test. The simulation studies show that the proposed empirical smoothing test is more powerful and computationally more efficient than the minimum distance test and Wang and Zhou (2006)'s test.
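As a rough illustration of the kind of kernel-smoothing lack-of-fit statistic described for Chapter 3, the sketch below accumulates kernel-weighted cross products of e_i = (y_i - m(x_i))^2 - sigma^2(x_i), which should have conditional mean zero under the null. It is a guess at the general construction in the spirit of Zheng (1996)-type tests; the exact studentization and the asymptotic normal null law derived in the thesis are not reproduced, and the function names and kernel choice are assumptions.

```python
import numpy as np

def smoothing_lof_variance_stat(x, y, mean_fn, var_fn, h):
    """Kernel-smoothing lack-of-fit statistic for a fitted variance function:
    e_i = (y_i - m(x_i))^2 - sigma^2(x_i) has conditional mean zero under the
    null, and the statistic accumulates kernel-weighted cross products e_i*e_j
    over i != j, avoiding the numerical integration needed by the Chapter 2 tests."""
    e = (y - mean_fn(x))**2 - var_fn(x)
    n = len(x)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h)**2)   # Gaussian kernel weights
    np.fill_diagonal(K, 0.0)                                # drop i == j terms
    return float((K * np.outer(e, e)).sum() / (n * (n - 1) * h))
```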
114

Empirical minimum distance lack-of-fit tests for Tobit regression models

Zhang, Yi January 1900 (has links)
Master of Science / Department of Statistics / Weixing Song / The purpose of this report is to propose and evaluate two lack-of-fit test procedures to check the adequacy of the regression functional forms in standard Tobit regression models. It is shown that testing the null hypothesis for the standard Tobit regression models amounts to testing a new, equivalent null hypothesis for classic regression models. Both procedures are constructed from empirical variants of a minimum distance, which measures the squared difference between a nonparametric estimator and a parametric estimator of the regression functions fitted under the null hypothesis for the new regression models. The asymptotic null distributions of the test statistics are investigated, as well as the power against some fixed alternatives and some local hypotheses. Simulation studies are conducted to assess the finite sample power performance and the robustness of the tests. Comparisons between the two test procedures are also made.
115

Comparison of background correction in tiling arrays and a spatial model

Maurer, Dustin January 1900 (has links)
Master of Science / Department of Statistics / Susan J. Brown / Haiyan Wang / DNA hybridization microarray technologies have made it possible to gain an unbiased perspective of whole genome transcriptional activity on a scale that continues to grow rapidly. However, because biologically irrelevant bias is introduced by the experimental process and the machinery involved, correction methods are needed to restore the data to its true, biologically meaningful state. It is therefore important that the algorithms developed to remove such technical biases are accurate and robust. This report explores the concept of background correction in microarrays using a real data set of five replicates of whole genome tiling arrays hybridized with genetic material from Tribolium castaneum. It reviews the literature surrounding such correction techniques and explores some of the more traditional methods through implementation on the data set. Finally, it introduces an alternative approach, implements it, and compares it to the traditional approaches for the correction of such errors.
116

Confidence intervals for population size based on a capture-recapture design

Hua, Jianjun January 1900 (has links)
Master of Science / Department of Statistics / Paul I. Nelson / Capture-Recapture (CR) experiments originated in wildlife studies and are widely used in areas such as ecology, epidemiology, evaluation of census undercounts, and software testing, to estimate population size, survival rate, and other population parameters. The basic idea of the design is to use “overlapping” information contained in multiple samples from the population. In this report, we focus on the simplest form of Capture-Recapture experiments, namely a two-sample Capture-Recapture design, conventionally called the “Petersen Method.” We study and compare the performance of three methods of constructing confidence intervals for the population size based on a Capture-Recapture design — asymptotic normality estimation, Chapman estimation, and “inverting a chi-square test” estimation — in terms of coverage rate and mean interval width. Simulation studies are carried out and analyzed using R and SAS. It turns out that the “inverting a chi-square test” estimation is better than the other two methods. A possible solution to the “zero recapture” problem is put forward. We find that if the population size is at least a few thousand, two-sample CR estimation provides reasonable estimates of the population size.
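A small sketch of the two-sample estimators discussed above: the Petersen estimator n1*n2/m, the Chapman estimator (n1+1)(n2+1)/(m+1) - 1, and a Wald-type (asymptotic normality) interval using one commonly cited variance formula for the Chapman estimate. The “inverting a chi-square test” interval studied in the report is not reproduced, and the numbers in the usage line are made up.

```python
import numpy as np
from scipy.stats import norm

def two_sample_cr_estimates(n1, n2, m, alpha=0.05):
    """Two-sample capture-recapture sketch: n1 animals marked in the first
    sample, n2 caught in the second sample, m of them recaptures.  Returns the
    Petersen and Chapman point estimates of population size and a Wald-type
    (asymptotic normality) interval for the Chapman estimate."""
    petersen = n1 * n2 / m                          # undefined when m == 0 (the "zero recapture" problem)
    chapman = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var_chapman = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)
                   / ((m + 1) ** 2 * (m + 2)))
    half = norm.ppf(1 - alpha / 2) * np.sqrt(var_chapman)
    return petersen, chapman, (chapman - half, chapman + half)

# toy usage: 200 marked, 250 in the second sample, 25 recaptured
print(two_sample_cr_estimates(200, 250, 25))
```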
117

A study of the robustness of Cox's proportional hazards model used in testing for covariate effects

Fei, Mingwei January 1900 (has links)
Master of Arts / Department of Statistics / Paul Nelson / There are two important statistical models for multivariate survival analysis: proportional hazards (PH) models and accelerated failure time (AFT) models. PH analysis is the most commonly used multivariate approach for analyzing survival time data. For example, in clinical investigations where several known quantities or covariates potentially affect patient prognosis, it is often desirable to investigate the effect of one factor while adjusting for the impact of others. This report offers guidance on choosing an appropriate model for testing covariate effects under different situations. In practice, we often have only a limited sample size and appreciable censoring (subjects dropping out), which complicates statistical analysis. In this report, 1000 datasets are randomly generated from each of three different distributions (Weibull, Lognormal, and Loglogistic) under combinations of sample sizes and censoring rates. Both models are then evaluated by hypothesis testing of the covariate effect on the simulated data, using power, type I error rate, and convergence rate for each situation. We would recommend the PH method when the sample size is small (n < 20) and the censoring rate is high (p > 0.8); in this case, neither PH nor AFT analysis may be suitable for hypothesis testing, but PH analysis is more robust and consistent than AFT analysis. When the sample size is 20 or above and the censoring rate is 0.8 or below, AFT analysis has a slightly higher convergence rate and power than PH, but offers little improvement in the type I error rate when the sample size is large (n > 50) and the censoring rate is low (p < 0.3). Considering that PH analysis does not require knowledge of the error distribution, we conclude that PH analysis is robust in hypothesis testing for covariate effects using data generated from an AFT model.
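For illustration of the simulation design described above, the sketch below generates one replicate of censored Weibull survival data with a binary covariate effect (in accelerated failure time form) and reports the observed censoring rate. The parameter values are assumptions; in a study like this, the censoring scale would be tuned to hit each target censoring level before fitting the PH and AFT models.

```python
import numpy as np

def simulate_aft_weibull(n, beta=0.5, shape=1.5, cens_scale=2.0, seed=None):
    """One replicate of censored survival data: Weibull event times with a
    binary covariate acting multiplicatively on time (AFT form), plus
    independent exponential censoring.  Returns covariate, observed time,
    event indicator, and the realized censoring proportion."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)                    # binary covariate
    t = rng.weibull(shape, n) * np.exp(beta * x)   # AFT: covariate scales event time
    c = rng.exponential(cens_scale, n)             # censoring times
    time = np.minimum(t, c)
    event = (t <= c).astype(int)                   # 1 = event observed, 0 = censored
    return x, time, event, 1.0 - event.mean()

x, time, event, cens_rate = simulate_aft_weibull(50, seed=1)
print(f"observed censoring rate: {cens_rate:.2f}")
```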
118

Minimum Hellinger distance estimation in a semiparametric mixture model

Xiang, Sijia January 1900 (has links)
Master of Science / Department of Statistics / Weixin Yao / In this report, we introduce the minimum Hellinger distance (MHD) estimation method and review its history. We examine the use of the Hellinger distance to obtain a new efficient and robust estimator for a class of semiparametric mixture models where one component has a known distribution while the other component and the mixing proportion are unknown. Such semiparametric mixture models have been used in biology and in sequential clustering algorithms. Our new estimate is based on the MHD, which has been shown to have good efficiency and robustness properties. We use simulation studies to illustrate the finite sample performance of the proposed estimate and compare it to some other existing approaches. Our empirical studies demonstrate that the proposed minimum Hellinger distance estimator (MHDE) works at least as well as some existing estimators for most of the examples considered and outperforms the existing estimators when the data are under contamination. A real data application is also provided to illustrate the effectiveness of our proposed methodology.
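To show only the principle being minimized, the toy sketch below applies minimum Hellinger distance estimation to a simple one-parameter normal location model: a kernel density estimate is compared with the model density on a grid and the squared Hellinger distance is minimized numerically. The semiparametric mixture setting of the report is more involved; the grid, bandwidth choice, and contaminated toy data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm
from scipy.optimize import minimize_scalar

def mhd_location(data, scale=1.0):
    """Toy minimum Hellinger distance estimation for a N(mu, scale^2) model:
    minimize the squared Hellinger distance between a kernel density estimate
    and the model density, approximated on a grid."""
    kde = gaussian_kde(data)
    grid = np.linspace(data.min() - 3, data.max() + 3, 800)
    f_hat = kde(grid)

    def hellinger_sq(mu):
        f_mod = norm.pdf(grid, loc=mu, scale=scale)
        return np.trapz((np.sqrt(f_hat) - np.sqrt(f_mod))**2, grid)

    return minimize_scalar(hellinger_sq, bounds=(grid[0], grid[-1]),
                           method="bounded").x

# robustness flavour: a few gross outliers barely move the MHD location estimate
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 95), np.full(5, 12.0)])
print(mhd_location(data))
```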
119

Statistical analysis of pyrosequence data

Keating, Karen January 1900 (has links)
Doctor of Philosophy / Department of Statistics / Gary L. Gadbury / Since their commercial introduction in 2005, DNA sequencing technologies have become widely available and are now cost-effective tools for determining the genetic characteristics of organisms. While the biomedical applications of DNA sequencing are apparent, these technologies have been applied to many other research areas. One such area is community ecology, in which DNA sequence data are used to identify the presence and abundance of microscopic organisms that inhabit an environment. This is currently an active area of research, since it is generally believed that a change in the composition of microscopic species in a geographic area may signal a change in the overall health of the environment. An overview of DNA pyrosequencing, as implemented by the Roche/Life Science 454 platform, is presented, and aspects of the process that can introduce variability in data are identified. Four ecological data sets that were generated by the 454 platform are used for illustration. Characteristics of these data include high dimensionality, a large proportion of zeros (usually in excess of 90%), and nonzero values that are strongly right-skewed. A nonparametric method to standardize these data is presented and the effects of standardization on outliers and skewness are examined. Traditional statistical methods for analyzing macroscopic species abundance data are discussed, and the applicability of these methods to microscopic species data is examined. One objective that receives focus is the classification of microscopic species as either rare or common. This is an important distinction, since there is much evidence to suggest that the biological and environmental mechanisms that govern common species are distinctly different from the mechanisms that govern rare species. This indicates that the abundance patterns for common and rare species may follow different probability models, and the suitability of the Pareto distribution for rare species is examined. Techniques for classifying macroscopic species are shown to be ill-suited for microscopic species, and an alternative technique is presented. Recognizing that the structure of the data is similar to that of financial applications (such as insurance claims and the distribution of wealth), the Gini index and other statistics based on the Lorenz curve are explored as potential test statistics for distinguishing rare versus common species.
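As an illustration of the Lorenz-curve statistics mentioned above, here is a small sketch of an area-based Gini index for a vector of species abundances (zeros allowed); the classification rule for rare versus common species developed in the thesis is not reproduced, and the toy counts are made up.

```python
import numpy as np

def gini_index(abundances):
    """Area-based Gini index computed from the empirical Lorenz curve of
    species abundances (zeros allowed): G = 1 - 2 * (area under the Lorenz curve)."""
    x = np.sort(np.asarray(abundances, dtype=float))
    n = len(x)
    lorenz = np.concatenate([[0.0], np.cumsum(x) / x.sum()])   # Lorenz curve ordinates
    area = np.trapz(lorenz, dx=1.0 / n)                        # trapezoid area under the curve
    return 1.0 - 2.0 * area

# toy usage: a strongly right-skewed abundance vector with many zeros gives a Gini near 1
counts = np.array([0] * 90 + [1, 1, 2, 3, 5, 8, 20, 60, 150, 400])
print(round(gini_index(counts), 3))
```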
120

A simulation study of the robustness of prediction intervals for an independent observation obtained from a random sample from an assumed location-scale family of distributions

Makarova, Natalya January 1900 (has links)
Master of Science / Department of Statistics / Paul I. Nelson / Suppose that, based on data consisting of independent repetitions of an experiment, a researcher wants to predict the outcome of the next independent repetition of the experiment. The researcher models the data as realizations of independent, identically distributed random variables {X_i, i = 1, 2, ..., n} having density f(·), and the next outcome as the value of an independent random variable Y, also having density f(·). We assume that the density f(·) lies in one of three location-scale families: standard normal (symmetric); Cauchy (symmetric, heavy-tailed); extreme value (asymmetric). The researcher does not know the values of the location and scale parameters. For f(·) = f0(·) lying in one of these families, an exact prediction interval for Y can be constructed using equivariant estimators of the location and scale parameters to form a pivotal quantity based on {X_i, i = 1, 2, ..., n} and Y. This report investigates, via a simulation study, the performance of these prediction intervals in terms of coverage rate and length when the assumption that f(·) = f0(·) is correct and when it is not. The simulation results indicate that prediction intervals based on the assumption of normality perform quite well with normal and extreme value data, and reasonably well with Cauchy data when the sample sizes are large. The heavy-tailed Cauchy assumption leads to prediction intervals that perform well only with Cauchy data and is not robust when the data are normal or extreme value. Similarly, the asymmetric extreme value model leads to prediction intervals that perform well only with extreme value data. Overall, this study indicates robustness with respect to a mismatch between the assumed and actual distributions in some cases and a lack of robustness in others.
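For the normal member of the location-scale families above, the pivotal-quantity construction has a familiar closed form: (Y - xbar) / (s * sqrt(1 + 1/n)) follows a t distribution with n - 1 degrees of freedom, giving the exact prediction interval sketched below. The Cauchy and extreme value intervals in the report use the same idea with different equivariant estimators and simulated pivot quantiles; the example data are illustrative.

```python
import numpy as np
from scipy.stats import t

def normal_prediction_interval(x, alpha=0.05):
    """Exact prediction interval for one future observation under the normal
    location-scale model: (Y - xbar)/(s*sqrt(1 + 1/n)) is a t_{n-1} pivot,
    so the interval is xbar +/- t_{n-1, 1-alpha/2} * s * sqrt(1 + 1/n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    half = t.ppf(1 - alpha / 2, df=n - 1) * s * np.sqrt(1.0 + 1.0 / n)
    return xbar - half, xbar + half

# toy usage with a simulated normal sample
rng = np.random.default_rng(0)
print(normal_prediction_interval(rng.normal(10.0, 2.0, size=30)))
```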
