1. Small Area Estimation in a Survey of Governments. Dumbacher, Brian Arthur, 01 April 2016
Small area composite estimators are weighted averages that attempt to balance the variability of the direct survey estimator against the bias of the synthetic estimator. Direct and synthetic estimators have competing properties, and finding an optimal weighted average can be challenging.

One example of a survey that utilizes small area estimation is the Annual Survey of Public Employment & Payroll (ASPEP), which is conducted by the U.S. Census Bureau to collect data on the number and pay of federal, state, and local government civilian employees. Estimates of local government totals are calculated for domains created by crossing state and government function. To calculate estimates at such a detailed level, the Census Bureau uses small area methods that take advantage of auxiliary information from the most recent Census of Governments (CoG). During ASPEP's 2009 sample design, a composite estimator was used, and it was observed that the direct estimator has the desirable property of being greater than the corresponding raw sum of the data, whereas the synthetic estimator has the desirable property of being close to the most recent CoG total.

In this research, the design-based properties of various estimators and quantities in the composite methodology are studied via a large Monte Carlo simulation using CoG data. New estimators are constructed based on the underlying ideas of limited translation and James-Stein shrinkage. The simulation provides estimates of the design-based variance and mean squared error of every estimator under consideration, and improved domain-level composite weights are calculated. Based on the simulation results, several limitations of the composite methodology are identified.

Explicit area-level models are developed that try to capture the spirit of the composite methodology and address its limitations in a unified and generalizable way. The models consist of hierarchical Bayesian extensions of the Fay-Herriot model and are characterized by novel combinations of components allowing for correlated sampling errors, multilevel structure, and t-distributed errors. Estimated variances and covariances from the Monte Carlo simulation are incorporated to take ASPEP's complex sample design into account. Posterior predictive checks and cross-validated posterior predictive checks based on selective discrepancy measures are used to help assess model fit.

It is observed that the t-normal models, which have t-distributed sampling errors, protect against unreasonable direct estimates and provide over-shrinkage towards the regression synthetic estimates. Also, the proportion of model estimates less than the corresponding raw sums is close to optimal. These empirical findings motivate a theoretical study of the shrinkage provided by the t-normal model. Another simulation is conducted to compare the shrinkage properties of this model and the Fay-Herriot model.

The methods in this research apply not just to ASPEP, but also to other surveys of governments, surveys of business establishments, and surveys of agriculture, which are similar in terms of sample design and the availability of auxiliary data from a quinquennial census. Ideas for future research include investigating alternative data transformations and discrepancy measures and developing hierarchical Bayesian models for time series and count data.
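For reference, the composite estimator and the Fay-Herriot model mentioned above take the following standard forms (generic notation, not necessarily the dissertation's):

\[
\hat{\theta}_i^{\,C} \;=\; w_i\,\hat{\theta}_i^{\,D} \;+\; (1 - w_i)\,\hat{\theta}_i^{\,S}, \qquad 0 \le w_i \le 1,
\]

where \(\hat{\theta}_i^{\,D}\) is the direct estimator and \(\hat{\theta}_i^{\,S}\) the synthetic estimator for domain \(i\). The Fay-Herriot area-level model couples a sampling model with a linking model,

\[
\hat{\theta}_i^{\,D} = \theta_i + e_i, \quad e_i \sim N(0, \sigma_i^2),
\qquad\qquad
\theta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i, \quad u_i \sim N(0, \sigma_u^2),
\]

and the hierarchical Bayesian models described above extend this basic structure with correlated sampling errors, multilevel structure, and t-distributed errors.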
2. On two-color monotonic self-equilibrium urn models. Gao, Shuyang, 08 June 2016
In this study, we focus on a class of two-color balanced urns with multiple drawings that has the property of monotonic self-equilibrium. We give the definition of a monotonic self-equilibrium urn model by specifying the form of its replacement matrix. At each step, a sample of size m ≥ 1 is drawn from the urn, and the replacement rule prespecified by a matrix is applied. The idea is to support whichever color has fewer counts in the sample. Intuitively, for any urn scheme within this class, the proportions of white and blue balls in the urn tend to be equal asymptotically. We observe by simulation that, when the number of steps n is large, the number of white balls in an urn within this class is around half of the total number of balls in the urn on average and is approximately normally distributed. Within the class of affine urn schemes, we specify subclasses that have the property of monotonic self-equilibrium and derive the limiting behavior of the number of white balls using existing results. The class of non-affine urn schemes is not yet well developed in the literature; we work on a subclass of non-affine urn models that has the property of monotonic self-equilibrium. For the special case in which one ball is added to the urn at each step, we derive the limiting behavior of the expectation and the variance and prove convergence in probability for the proportion of white balls in the urn. An optimal strategy for urn balancing and applications of monotonic self-equilibrium urn models are also briefly discussed.
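As a rough illustration of the behavior described above (a sketch, not code or a rule taken from the thesis), the following simulation implements one concrete monotonic self-equilibrium scheme: draw a sample of size m, return it, and add one ball of the color that appeared less often in the sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_urn(n_steps, m=3, white=5, blue=5):
    """One illustrative monotonic self-equilibrium rule: at each step a sample
    of size m is drawn without replacement, the sample is returned, and one
    ball of the color that appeared less often in the sample is added
    (ties broken at random)."""
    for _ in range(n_steps):
        sample_white = rng.hypergeometric(white, blue, m)  # white balls in the sample
        sample_blue = m - sample_white
        if sample_white < sample_blue:
            white += 1        # support the minority color in the sample
        elif sample_blue < sample_white:
            blue += 1
        else:
            white, blue = (white + 1, blue) if rng.random() < 0.5 else (white, blue + 1)
    return white / (white + blue)

# For large n the proportion of white balls concentrates near 1/2,
# consistent with the simulations summarized in the abstract.
proportions = [simulate_urn(10_000) for _ in range(20)]
print(round(np.mean(proportions), 3), round(np.std(proportions), 4))
```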
3. On the sequential test per MIL-STD-781 and new, more efficient test plans. Li, Dingjun, January 1990
The sequential probability ratio test is an efficient test procedure compared to the fixed sample size test in the sense that it minimizes the average sample size needed to terminate the experiment at the two specified hypotheses, i.e., at H₀: θ = θ₀ and H₁: θ = θ₁. However, this optimum property does not hold for values of the testing parameter other than these two, especially for values between them. Estimation following a sequential test is also considered difficult, and the usual maximum likelihood estimate is in general biased. Furthermore, the sequential test plans given in MIL-STD-781 do not meet their nominal test risk requirements, and their truncation is determined by the theory for a fixed sample size test. The contributions of this dissertation are: (1) The distribution of the successive sums of samples from a generalized sequential probability ratio test in the exponential case has been obtained, and an exact analysis method for the generalized sequential probability ratio test, together with FORTRAN programs implementing it, has been developed based on this distribution. (2) A set of improved sequential probability ratio test plans for testing the mean of the exponential distribution has been established; the improved plans meet the test risk requirements exactly and approximately minimize the maximum average waiting time. (3) The properties of estimates obtained after a sequential test have been investigated and a bias-reduced estimate has been recommended; a general method for constructing the confidence interval after a sequential test has been studied, and its existence and uniqueness have been proved in the exponential case. (4) Two modifications of Wald's sequential probability ratio test, the triangular test and the repeated significance test, have also been studied in the exponential case. The results show that the triangular test is very close to the optimal test in terms of minimizing the maximum average sample size, and a method for constructing the triangular test plan has been developed.
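For reference, the standard Wald sequential probability ratio test for the exponential mean, which the plans above modify: with H₀: θ = θ₀ versus H₁: θ = θ₁ (θ₁ < θ₀) and cumulative test time \(S_n = \sum_{i=1}^{n} x_i\), the log likelihood ratio after n observations is

\[
\ln \Lambda_n \;=\; n \ln\!\frac{\theta_0}{\theta_1} \;-\; \left(\frac{1}{\theta_1} - \frac{1}{\theta_0}\right) S_n ,
\]

and sampling continues while \(\ln B < \ln \Lambda_n < \ln A\), with Wald's approximations \(A \approx (1-\beta)/\alpha\) and \(B \approx \beta/(1-\alpha)\); H₁ is accepted if the upper boundary is crossed and H₀ if the lower boundary is crossed. The generalized and improved plans studied in this dissertation adjust these boundaries and their truncation so that the stated risks α and β are met exactly.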
4. Joint Modelling of Longitudinal Quality of Life Measurements and Survival Data in Cancer Clinical Trials. Song, Hui, 23 January 2013
In cancer clinical trials, longitudinal Quality of Life (QoL) measurements on a patient may be analyzed by classical linear mixed models, but some patients may drop out of the study due to recurrence or death, which causes problems in the application of classical methods. Joint modelling of longitudinal QoL measurements and survival times may be employed to explain the dropout information of the longitudinal QoL measurements and provide more efficient estimation, especially when there is strong association between the longitudinal measurements and survival times. Most joint models in the literature assume a classical linear mixed model for the longitudinal measurements and Cox's proportional hazards model for the survival times. The linear mixed model with normally distributed random effects may not be sufficient to model longitudinal QoL measurements. Moreover, with advances in medical research, long-term survivors may exist, which makes the proportional hazards assumption unsuitable for the survival times when some censoring times are due to potentially cured patients.

In this thesis, we propose new models to analyze longitudinal QoL measurements and survival times jointly. In the first part of this thesis, we develop a joint model which assumes a linear mixed model for the longitudinal measurements and a promotion time cure model for the survival data. We link these two models through a latent variable and develop a semiparametric inference procedure. The second part of this thesis considers a special feature of the QoL measurements: they are constrained to the interval (0,1). We propose to take this feature into account with a simplex-distribution model for the QoL measurements. Classical proportional hazards and promotion time cure models are used separately, depending on whether or not a cure fraction is assumed in the data. In both cases, we characterize the correlation between the longitudinal measurements and survival times by a shared random effect, and derive a semiparametric penalized joint partial likelihood to estimate the parameters. The proposed joint models and estimation procedures are evaluated in simulation studies and applied to the QoL measurements and recurrence times from a clinical trial on women with early breast cancer. / Thesis (Ph.D., Mathematics & Statistics) -- Queen's University, 2013
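For orientation, one schematic form of the shared random effect joint model described above (illustrative notation; the exact specification in the thesis may differ): a linear mixed model for the QoL trajectory and a promotion time cure model for survival, linked through a shared random effect \(b_i\),

\[
Y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij},
\qquad
S(t \mid \mathbf{w}_i, b_i) = \exp\!\bigl\{-\theta_i F(t)\bigr\},
\qquad
\theta_i = \exp\bigl(\boldsymbol{\gamma}^{\top}\mathbf{w}_i + \alpha b_i\bigr),
\]

where \(b_i \sim N(0,\sigma_b^2)\), \(F(\cdot)\) is a proper baseline distribution function, \(\exp(-\theta_i)\) is the cure fraction, and \(\alpha\) measures the strength of the association between the longitudinal and survival processes.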
5. An Introduction to the Cox Proportional Hazards Model and Its Applications to Survival Analysis. Thompson, Kristina, 29 January 2015
Statistical modeling of lifetime data, or survival analysis, is studied in many fields, including medicine, information technology, and economics. This type of data gives the time to a certain event, such as death in studies of cancer treatment or the time until a computer program crashes. Researchers are often interested in how covariates affect the time to event and wish to determine ways of incorporating such covariates into statistical models. Covariates are explanatory variables that are suspected to affect the lifetime of interest. Lifetime data are typically subject to censoring, and this fact needs to be taken into account when choosing the statistical model.

D.R. Cox (1972) proposed a statistical model that can be used to explore the relationship between survival and various covariates and that takes censoring into account. This is called the Cox proportional hazards (PH) model. In particular, the model will be presented and estimation procedures for parameters and functions of interest will be developed. Statistical properties of the resulting estimators will be derived and used in developing inference procedures.
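In its standard form, the model discussed above specifies the hazard for a subject with covariate vector \(\mathbf{x}\) as

\[
h(t \mid \mathbf{x}) = h_0(t)\,\exp(\boldsymbol{\beta}^{\top}\mathbf{x}),
\]

with the baseline hazard \(h_0(t)\) left unspecified, and \(\boldsymbol{\beta}\) is estimated by maximizing the partial likelihood

\[
L(\boldsymbol{\beta}) \;=\; \prod_{i:\,\delta_i = 1} \frac{\exp(\boldsymbol{\beta}^{\top}\mathbf{x}_i)}{\sum_{j \in R(t_i)} \exp(\boldsymbol{\beta}^{\top}\mathbf{x}_j)},
\]

where \(\delta_i\) indicates an observed (uncensored) event and \(R(t_i)\) is the risk set at time \(t_i\); right-censored subjects contribute only through the risk sets, which is how censoring is accommodated.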
6. A Clinical Decision Support System for the Prevention of Genetic-Related Heart Disease. Saguilig, Lauren G., 13 June 2017
Drug-induced long QT syndrome (diLQTS) is a common adverse drug reaction characterized by rapid and erratic heartbeats that may instigate fainting or seizures. The onset of diLQTS can lead to torsades de pointes (TdP), a specific form of abnormal heart rhythm that often leads to sudden cardiac arrest and death. This study aims to understand the genetic similarities between diLQTS and TdP in order to develop a clinical decision support system (CDSS) to aid physicians in the prevention of TdP. Highly accurate classification algorithms, including random forests, shrunken centroids, and diagonal linear discriminant analysis, are considered to build a prediction model for TdP. With a feasible set of markers, we predict TdP classifications with an accuracy above 90%. The methodology used in this study can be extended to other high-dimensional biomedical data.
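As a sketch of the modeling workflow only (hypothetical data and parameters, not the study's markers, sample, or reported results), one of the classifiers mentioned above can be fit and assessed with cross-validation as follows:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical high-dimensional data: 200 patients, 500 genetic markers,
# binary TdP outcome. Random noise stands in for the real marker matrix.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 500))
y = rng.integers(0, 2, size=200)

# Random forest is one of the algorithms considered; accuracy is estimated
# with 5-fold cross-validation rather than a single train/test split.
clf = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

On informative markers, rather than the random noise used here, the same pipeline produces the kind of cross-validated accuracy estimate the abstract reports.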
7. Modeling the Correlation Structure of RNA Sequencing Data Using a Multivariate Poisson-Lognormal Model. Jia, Liyi, 02 September 2016
High-throughput sequencing technologies have been widely used in biomedical research, especially in human genomic studies. RNA sequencing (RNA-seq) applies high-throughput sequencing technologies to quantify gene expression, study alternatively spliced genes, and discover novel isoforms.

Poisson-distribution-based methods have been widely used to model RNA-seq data in practice, and differential expression analysis of RNA-seq data has been well studied. However, the correlation structure of RNA-seq data has not been studied extensively.

This dissertation proposes a multivariate Poisson-lognormal model for the correlation structure of RNA-seq data. This approach enables us to estimate both positive and negative correlations for count-type RNA-seq data. Three general scenarios are discussed. In scenario 1, one exon with one isoform, we propose a bivariate Poisson-lognormal model. In scenario 2, multiple exons with one isoform, we propose a multivariate Poisson-lognormal model; because the number of pairwise correlations increases accordingly at the multiple-exon level, a block compound symmetry correlation structure is introduced to reduce the parameter space. In scenario 3, multiple exons with multiple isoforms, we propose a mixture of multivariate Poisson-lognormal models.

Correlation coefficients are estimated by the method of moments. At the multiple-exon level, we apply an average weighting strategy to reduce the number of moment equations. Simulation studies demonstrate the advantage of our moment estimator of the correlation coefficient compared to the Pearson and Spearman's rank correlation coefficient estimators.

As an application, we apply our methods to RNA-seq data from The Cancer Genome Atlas (TCGA) breast cancer study. We estimate the correlation coefficient between gene TP53 and gene CDKN1A in normal subjects. The results show that TP53 and CDKN1A are slightly negatively correlated.
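One standard bivariate Poisson-lognormal formulation consistent with the description above (the dissertation's exact parameterization may differ): counts are conditionally Poisson with log-normally distributed rates,

\[
Y_j \mid \lambda_j \sim \mathrm{Poisson}(\lambda_j), \qquad
(\log\lambda_1, \log\lambda_2)^{\top} \sim N(\boldsymbol{\mu}, \Sigma), \qquad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}.
\]

Writing \(m_j = \exp(\mu_j + \sigma_{jj}/2)\), the marginal moments are

\[
E(Y_j) = m_j, \qquad
\operatorname{Var}(Y_j) = m_j + m_j^2\bigl(e^{\sigma_{jj}} - 1\bigr), \qquad
\operatorname{Cov}(Y_1, Y_2) = m_1 m_2\bigl(e^{\sigma_{12}} - 1\bigr),
\]

so the covariance (and hence the correlation) of the counts is negative whenever \(\sigma_{12} < 0\); equating these moments to their sample counterparts gives a method-of-moments correlation estimator.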
8. Construction of Optimal Foldover Designs with the General Minimum Lower-Order Confounding. Atakora, Faisal, 09 September 2016
Fractional factorial designs are widely used in industry and agriculture, and over the years much research has been done to study them. Foldover fractional factorial designs can de-alias effects of interest so that those effects can be estimated without ambiguity. We consider optimal foldover designs under the general minimum lower-order confounding criterion. Some properties of such designs are investigated. A catalogue of 16- and 32-run optimal foldover designs is constructed and tabulated for practical use. A comparison is made between the general minimum lower-order confounding optimal foldover designs and other optimal foldover designs obtained using the minimum aberration and clear effects criteria. / October 2016
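To make the foldover operation concrete, here is a toy 2^(3-1) example (illustration only; the 16- and 32-run plans in the thesis are selected by the general minimum lower-order confounding criterion, which is not shown here):

```python
import numpy as np

# 2^(3-1) resolution III base design with generator C = AB.
A = np.array([-1, -1, 1, 1])
B = np.array([-1, 1, -1, 1])
C = A * B                       # C is fully aliased with the AB interaction

base = np.column_stack([A, B, C])
foldover = -base                # full foldover: reverse the sign of every column
combined = np.vstack([base, foldover])

# In the base design, the C column equals the AB column (complete aliasing).
print(np.array_equal(base[:, 2], base[:, 0] * base[:, 1]))         # True
# In the combined 8-run design, C is orthogonal to AB: the alias is broken.
print(int(combined[:, 2] @ (combined[:, 0] * combined[:, 1])))     # 0
```

Adding the foldover runs de-aliases the main effects from the two-factor interactions, which is the basic property that optimal foldover plans exploit.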
9. Optimal Designs for Minimizing Variances of Parameter Estimates in Linear Regression Models. Chen, Manqiong, 19 September 2016
In statistical inference, it is important to estimate the parameters of a regression model in such a way that the variances of the estimates are as small as possible. Motivated by this fact, we address this problem using optimal design theory.

We start with some optimal design theory and determine the optimality conditions in terms of a directional derivative. We construct the optimal designs for minimizing variances of the parameter estimates in two ways. The first is an analytic approach, in which we derive the derivatives of our criterion and solve the resulting equations. In the second approach, we construct the designs using a class of algorithms.

We also construct designs for minimizing the total variance of some parameter estimates, motivated by a practical problem in chemistry. We attempt to improve the convergence of the algorithm by using properties of the directional derivatives. / October 2016
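As a reminder of the form such conditions take (standard optimal design theory; notation not specific to this thesis): to minimize the variance of \(\mathbf{c}^{\top}\hat{\boldsymbol{\beta}}\), i.e. the criterion \(\Phi(\xi) = \mathbf{c}^{\top} M(\xi)^{-1}\mathbf{c}\) with information matrix \(M(\xi) = \int f(x) f(x)^{\top}\,\xi(dx)\), the directional derivative of \(\Phi\) at \(\xi\) towards the one-point design at \(x\) is

\[
F_{\Phi}(\xi, x) \;=\; \mathbf{c}^{\top} M(\xi)^{-1}\mathbf{c} \;-\; \bigl(f(x)^{\top} M(\xi)^{-1}\mathbf{c}\bigr)^{2},
\]

and the general equivalence theorem states that \(\xi^{*}\) is optimal exactly when \(F_{\Phi}(\xi^{*}, x) \ge 0\) for every \(x\) in the design space, with equality at the support points of \(\xi^{*}\). Algorithms of the kind mentioned above iterate by moving design weight towards points where this derivative is most negative.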
10. Combining Information from Multiple Sources in Bayesian Modeling. Schifeling, Tracy Anne, January 2016
Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges. In particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend lots of resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage the information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use the information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.

This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.

The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.

The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data. We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.

The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey. / Dissertation
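A minimal sketch of the record-augmentation idea in the second method (schematic only; the variable names, margins, and sample sizes are hypothetical, and the Bayesian latent class MCMC step is not shown):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical categorical survey extract.
survey = pd.DataFrame({
    "education": rng.choice(["HS", "BA", "Grad"], size=500),
    "employed":  rng.choice(["yes", "no"], size=500),
})

# Prior belief about the marginal distribution of education, encoded as
# n_aug synthetic records; more augmented records means a tighter prior.
prior_margin = {"HS": 0.40, "BA": 0.45, "Grad": 0.15}   # assumed values
n_aug = 200
counts = [int(round(p * n_aug)) for p in prior_margin.values()]
augmented = pd.DataFrame({
    "education": np.repeat(list(prior_margin.keys()), counts),
    "employed":  np.nan,          # remaining variables left missing
})

# The concatenated data are what the latent class sampler would be fit to,
# treating the missing 'employed' values as ordinary missing data.
combined = pd.concat([survey, augmented], ignore_index=True)
print(combined["education"].value_counts(normalize=True).round(3))
```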