1 |
Estimating the Variance of the Sample Median. Price, Robert M., Bonett, Douglas G. 01 January 2001
The small-sample bias and root mean squared error of several distribution-free estimators of the variance of the sample median are examined. A new estimator is proposed that is easy to compute and tends to have the smallest bias and root mean squared error.
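The abstract does not list the estimators it compares or give the form of the new one, so they are not reproduced here. As a point of reference only, the sketch below shows one familiar distribution-free estimator of the variance of the sample median: the nonparametric bootstrap. It resamples with replacement, takes the median of each resample, and returns the variance of those medians. This is not the estimator proposed by Price and Bonett, and the function name `bootstrap_median_variance` and its defaults are illustrative assumptions.

```python
import random
import statistics


def bootstrap_median_variance(data, n_boot=2000, seed=0):
    """Bootstrap estimate of Var(sample median) (illustrative sketch).

    Draws `n_boot` resamples of `data` with replacement, computes the
    median of each, and returns the sample variance of those medians.
    """
    rng = random.Random(seed)  # seeded for reproducible results
    n = len(data)
    medians = [statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.variance(medians)
```

For example, `bootstrap_median_variance([2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8])` returns a positive estimate. Since small-sample bias is exactly what the paper examines, such a bootstrap value is best treated as one candidate among several.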
|
2 |
Mixture Modeling and Outlier Detection in Microarray Data Analysis. George, Nysia I. 16 January 2010
Microarray technology has become a dynamic tool in gene expression analysis
because it allows thousands of gene expression levels to be measured simultaneously.
The uniqueness of experimental units and microarray data platforms, coupled with the way
gene expression measurements are obtained, makes the field rich in interesting research questions.
In this dissertation, we present our investigations of two independent studies related
to microarray data analysis.
First, we study a recent platform in biology and bioinformatics that compares
the quality of genetic information from exfoliated colonocytes in fecal matter with
genetic material from mucosa cells within the colon. Using the intraclass correlation
coefficient (ICC) as a measure of reproducibility, we assess the reliability of density
estimates obtained from a preliminary analysis of the fecal and mucosa data sets. Numerical findings clearly show that the distribution consists of two components.
For measurements between 0 and 1, it is natural to assume that the data points are
from a beta-mixture distribution. We explore whether ICC values should be modeled
with a beta mixture or transformed first and fit with a normal mixture. We find that
a mixture of normals on the inverse-probit-transformed scale is less sensitive to model misspecification, which could otherwise lead to a biased conclusion.
Using the normal mixture approach to compare the ICC distributions of fecal and
mucosa samples, we find the quality of reproducible genes in fecal array data to
be comparable to that in mucosa arrays.
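A minimal sketch of the transformed-scale approach follows. It maps ICC-like values in (0, 1) to the real line with the standard normal quantile function Φ⁻¹ (our reading of the "inverse-probit transformed scale"; this is an assumption) and fits a two-component normal mixture by expectation-maximization (EM). The function names, the clipping constant `eps`, the standard-deviation floor, and the iteration count are illustrative choices. The dissertation's actual fitting procedure is not described in the abstract.

```python
import math
import statistics

STD_NORMAL = statistics.NormalDist()


def probit(values, eps=1e-6):
    # Clip into (0, 1) so inv_cdf is defined, then map to the real line.
    return [STD_NORMAL.inv_cdf(min(max(v, eps), 1 - eps)) for v in values]


def fit_two_normal_mixture(z, n_iter=200, sd_floor=1e-3):
    """EM for a two-component univariate normal mixture (illustrative sketch).

    Returns (weight of component 1, (mu1, sd1), (mu2, sd2)).
    """
    zs = sorted(z)
    half = len(zs) // 2
    # Initialise the components from the lower and upper halves of the data.
    w = 0.5
    mu1, mu2 = statistics.fmean(zs[:half]), statistics.fmean(zs[half:])
    sd1 = sd2 = max(statistics.pstdev(zs), sd_floor)
    for _ in range(n_iter):
        comp1 = statistics.NormalDist(mu1, sd1)
        comp2 = statistics.NormalDist(mu2, sd2)
        # E-step: posterior probability that each point belongs to component 1.
        r = []
        for x in z:
            p1 = w * comp1.pdf(x)
            p2 = (1 - w) * comp2.pdf(x)
            total = p1 + p2
            r.append(p1 / total if total > 0 else 0.5)
        # M-step: update weight, means and standard deviations.
        n1 = min(max(sum(r), 1e-9), len(z) - 1e-9)
        n2 = len(z) - n1
        w = n1 / len(z)
        mu1 = sum(ri * x for ri, x in zip(r, z)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, z)) / n2
        sd1 = max(math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, z)) / n1), sd_floor)
        sd2 = max(math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, z)) / n2), sd_floor)
    return w, (mu1, sd1), (mu2, sd2)
```

The floors on the standard deviations and component sizes only keep the sketch from failing on degenerate data. A real analysis would also compare fits across starting values and check convergence.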
For microarray data, within-gene variance estimation is often challenging due
to the high frequency of low replication studies. Several methodologies have been
developed to strengthen variance terms by borrowing information across genes. However, even with such accommodations, variance estimates may still be distorted by the presence of
outliers. In our second study, we propose a robust modification of optimal shrinkage variance estimation to improve outlier detection. To increase power, we
suggest grouping standardized data so that the information shared across genes is similar
in distribution. Simulation studies and an analysis of real colon cancer microarray data
reveal that our methodology is insensitive to outliers, free of distributional assumptions, effective for small sample sizes, and data adaptive.
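To illustrate what "borrowing information across genes" means, the sketch below shows a standard, non-robust James-Stein-type shrinkage of gene-wise sample variances toward their median. The shrinkage intensity is estimated from the data as the sum of the estimated variances of the s²_k divided by the sum of squared distances from the median, clipped to [0, 1]. We assume this is the kind of "optimal shrinkage" the abstract refers to. The robust modification and the grouping of standardized data proposed in the dissertation are not reproduced, and `shrink_variances` is a hypothetical name.

```python
import statistics


def shrink_variances(data):
    """Shrink gene-wise sample variances toward their median (illustrative sketch).

    `data` is a list of rows, one per gene, each holding n >= 2 replicates.
    Returns (lambda_star, shrunken_variances).
    """
    variances = []
    var_of_variances = []
    for row in data:
        n = len(row)
        m = statistics.fmean(row)
        w = [(x - m) ** 2 for x in row]
        w_bar = statistics.fmean(w)
        # Unbiased sample variance s_k^2 = n / (n - 1) * mean of squared deviations.
        variances.append(n / (n - 1) * w_bar)
        # Estimated sampling variance of s_k^2.
        var_of_variances.append(n / (n - 1) ** 3 * sum((wi - w_bar) ** 2 for wi in w))
    target = statistics.median(variances)
    denom = sum((v - target) ** 2 for v in variances)
    # Data-driven shrinkage intensity, clipped to [0, 1].
    lam = 1.0 if denom == 0 else min(1.0, max(0.0, sum(var_of_variances) / denom))
    shrunk = [lam * target + (1 - lam) * v for v in variances]
    return lam, shrunk
```

Because the target is the median variance and every gene shares one intensity, a single outlying replicate inflates that gene's s²_k and its estimated sampling variance. That sensitivity is the kind of problem a robust modification would address.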
|
3 |
Calibration Adjustment for Nonresponse in Sample Surveys. Rota, Bernardo João January 2016
In this thesis, we discuss calibration estimation in the presence of nonresponse, focusing on the linear calibration estimator and the propensity calibration estimator, along with the use of different levels of auxiliary information, namely the sample and population levels. The thesis is based on four papers, two of which discuss estimation in two steps. The two-step estimator suggested here is an improved compromise between the linear calibration and propensity calibration estimators mentioned above. Assuming that the functional form of the response model is known, the model is estimated in the first step using a calibration approach. In the second step, the linear calibration estimator is constructed by replacing each design weight with its product with the inverse of the response probability estimated in the first step. The first step uses sample-level auxiliary information, and we demonstrate that this yields more efficient estimates of the response probabilities than using population-level information, as suggested earlier. The variance expression for the two-step estimator is derived, and an estimator of this variance is suggested. The other two papers address the use of auxiliary variables in estimation. One of them introduces principal components theory into calibration for nonresponse adjustment and suggests selecting components using the theory of canonical correlation. Principal components are used as a means of handling estimation in the presence of large sets of candidate auxiliary variables. In addition to the use of auxiliary variables, the last paper also discusses the use of explicit models representing the true response behavior. Simple models such as logistic, probit, linear or log-linear models are usually used for this purpose. However, given the possible complexity of the structure of the true response probability, one may question whether these simple models are effective.
We use an example of a telephone-based survey data collection process to demonstrate that the logistic model is generally not appropriate.
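The linear calibration step described above can be sketched with the standard textbook form of linear calibration weights for the respondent set r: w_k = d_k(1 + λ'x_k), with λ chosen so that Σ_r w_k x_k equals the target totals. The targets are either the sample-level totals Σ_s d_k x_k or known population totals, and the estimator of the total is Σ_r w_k y_k. For the two-step variant, each d_k is divided by an estimated response probability. How the thesis estimates those probabilities in the first step is not reproduced: `p_hat` is simply taken as given. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def linear_calibration_weights(d, x, target, p_hat=None):
    """Linear calibration weights w_k = d_k * (1 + lambda' x_k) for respondents.

    d:      design weights of the respondents
    x:      auxiliary vectors x_k of the respondents (list of lists)
    target: calibration totals (sample-level or population-level)
    p_hat:  optional estimated response probabilities; if given, d_k is
            replaced by d_k / p_hat_k, as in a two-step estimator.
    """
    if p_hat is not None:
        d = [dk / pk for dk, pk in zip(d, p_hat)]
    q = len(target)
    # Current weighted totals and the matrix T = sum_r d_k x_k x_k'.
    totals = [sum(dk * xk[j] for dk, xk in zip(d, x)) for j in range(q)]
    t = [[sum(dk * xk[i] * xk[j] for dk, xk in zip(d, x)) for j in range(q)] for i in range(q)]
    # Solve T lambda = target - totals, so that sum_r w_k x_k = target.
    lam = solve(t, [target[j] - totals[j] for j in range(q)])
    return [dk * (1 + sum(l * xj for l, xj in zip(lam, xk))) for dk, xk in zip(d, x)]
```

Including a constant 1 in each x_k, as in the usage below, makes the weights also reproduce the estimated population size.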
|
4 |
Inférence doublement robuste en présence de données imputées dans les enquêtes (Doubly Robust Inference in the Presence of Imputed Data in Surveys). Picard, Frédéric 02 1900
Imputation is often used in surveys to treat item nonresponse. It
is well known that treating the imputed values as observed values
may lead to substantial underestimation of the variance of the
point estimators. To overcome the problem, a number of variance
estimation methods have been proposed in the literature, including
appropriate versions of resampling methods such as the jackknife
and the bootstrap. We define the concept of doubly robust point
and variance estimation under the so-called nonresponse and
imputation model approaches. We focus on jackknife variance
estimation, which is widely used in practice. We study the
properties of several jackknife variance estimators under both
deterministic and random regression imputation. We first consider
the case of simple random sampling without replacement. The cases
of stratified simple random sampling and unequal probability
sampling are also considered. A limited simulation study compares
various jackknife variance estimators in terms of bias and
relative stability when the sampling fraction is not negligible.
Finally, the asymptotic normality of the imputed estimators is
established under both deterministic and random regression
imputation.
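To make jackknife variance estimation under deterministic regression imputation concrete, the sketch below uses a plain re-imputation delete-one jackknife for simple random sampling without replacement. In every replicate, the regression of y on x is re-fitted on the remaining respondents and the missing values are re-imputed, following the idea behind the Rao-Shao adjusted jackknife. The crude factor (1 − n/N) for a non-negligible sampling fraction is an assumption; handling that fraction correctly is part of what the thesis studies. This is not the doubly robust estimator of the thesis, and all names are hypothetical.

```python
import statistics


def _fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def impute_mean(x, y):
    """Mean of y after deterministic regression imputation of y on x.

    y[k] is None for nonrespondents; x is observed for every unit.
    """
    obs = [(xk, yk) for xk, yk in zip(x, y) if yk is not None]
    slope, intercept = _fit_line([o[0] for o in obs], [o[1] for o in obs])
    completed = [yk if yk is not None else intercept + slope * xk for xk, yk in zip(x, y)]
    return statistics.fmean(completed)


def jackknife_variance(x, y, N):
    """Delete-one jackknife variance of the imputed mean, re-imputing in each replicate.

    N is the population size; (1 - n/N) is a simple finite-population correction.
    """
    n = len(x)
    reps = [impute_mean(x[:j] + x[j + 1:], y[:j] + y[j + 1:]) for j in range(n)]
    rep_mean = statistics.fmean(reps)
    return (1 - n / N) * (n - 1) / n * sum((r - rep_mean) ** 2 for r in reps)
```

With no missing values, the imputed mean is the sample mean, and this estimator reduces exactly to (1 − n/N)s²/n. Treating imputed values as observed instead (imputing once, outside the replicates) would ignore the imputation step's contribution to the variance, which is the underestimation the abstract describes.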
|