281

Improving coverage of rectangular confidence interval

Gogtas, Hakan 23 September 2004 (has links)
To find a better confidence region is always of interest in statistics. One way to find better confidence regions is to uniformly improve coverage probability over the usual confidence region while maintaining the same volume. Thus, the classical spherical confidence regions for the mean vector of a multivariate normal distribution have been improved by changing the point estimator for the parameter. In 1961, James and Stein found a shrinkage estimator having total mean square error, TMSE, smaller than that of the usual estimator. In 1982, Casella and Hwang gave an analytical proof of the dominance of the confidence sphere which uses the James-Stein estimator as its center over the usual confidence sphere centered at the sample mean vector. This opened up new possibilities in multiple comparisons. This dissertation will focus on simultaneous confidence intervals for treatment means and for the differences between treatment means and the mean of a control in one-way and two-way Analysis of Variance, ANOVA, studies. We make use of Stein-type shrinkage estimators as centers to improve the simultaneous coverage of those confidence intervals. The main obstacle to an analytic study is that the rectangular confidence regions are not rotation invariant like the spherical confidence regions. Therefore, we primarily use simulation to show dominance of the rectangular confidence intervals centered around a shrinkage estimator over the usual rectangular confidence regions centered about the sample means. For the one-way ANOVA model, our simulation results indicate that our confidence procedure has higher coverage probability than the usual confidence procedure if the number of means is sufficiently large. We develop a lower bound for the coverage probability of our rectangular confidence region, which is a decreasing function of the shrinkage constant for the estimator used as center, and use this bound to prove that the rectangular confidence intervals centered around a shrinkage estimator have coverage probability uniformly exceeding that of the usual rectangular confidence regions up to an arbitrarily small epsilon when the number of means is sufficiently large. We show that these intervals have strictly greater coverage probability when all the parameters are zero, and that the coverage probabilities of the two procedures converge to one another when at least one of the parameters becomes arbitrarily large. To check the reliability of our simulations for the one-way ANOVA model, we use numerical integration to calculate the coverage probability for the rectangular confidence regions. Gaussian quadrature making use of Hermite polynomials is used to approximate the coverage probability of our rectangular confidence regions for n = 2, 3, 4. The difference in results between numerical integration and simulations is negligible. However, numerical integration yields values slightly higher than the simulations. A similar approach is applied to develop improved simultaneous confidence intervals for the comparison of treatment means with the mean of a control. We again develop a lower bound for the coverage probability of our confidence procedure and prove results similar to those that we proved for the one-way ANOVA model. We also apply our approach to develop improved simultaneous confidence intervals for the cell means in a two-way ANOVA model.
We again primarily use simulation to show dominance of the rectangular confidence intervals centered around an appropriate shrinkage estimator over the usual rectangular confidence regions. We again develop a lower bound for the coverage probability of our confidence procedure and prove the same results that we proved for the one-way model.
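
A minimal sketch of the idea behind shrinkage-centered intervals, assuming independent unit-variance means and made-up settings (this is not the dissertation's actual procedure): the simultaneous coverage of rectangular intervals of fixed half-width is compared when the intervals are centered at the sample means versus at the positive-part James-Stein estimator.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p, sigma, n_sim = 10, 1.0, 50_000        # illustrative settings, not from the thesis
theta = np.zeros(p)                       # true means; shrinkage helps most near 0

# Half-width giving ~95% simultaneous coverage for independent coordinates
c = norm.ppf(0.5 * (1 + 0.95 ** (1.0 / p))) * sigma

cover_mean, cover_js = 0, 0
for _ in range(n_sim):
    x = rng.normal(theta, sigma)
    # positive-part James-Stein shrinkage toward the origin
    shrink = max(0.0, 1.0 - (p - 2) * sigma**2 / np.sum(x**2))
    js = shrink * x
    cover_mean += np.all(np.abs(x - theta) <= c)
    cover_js += np.all(np.abs(js - theta) <= c)

print("coverage centered at sample means :", cover_mean / n_sim)
print("coverage centered at James-Stein  :", cover_js / n_sim)
```

With all true means at zero, the shrinkage-centered rectangles cover more often at the same half-width, which is the kind of dominance the dissertation studies by simulation.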
282

Use of Simultaneous Inference Under Order Restriction, Stepdown Testing Procedure and Stage-wise Sequential Optimal Design in Clinical Dose Study

Jia, Gang 31 January 2005 (has links)
This dissertation discusses design approaches for adaptive dose escalation studies, analysis methods for dose study data, and the relationship between the study design approach and the data analysis methods. A general max-min approach to construct simultaneous confidence intervals for the monotone means of correlated and normally distributed random samples is proposed to analyze correlated dose response data. The approach provides an accurate, flexible and computationally easy way to obtain critical values of simultaneous confidence intervals under a monotone order restriction. A stepdown testing procedure for analyzing dose study data is examined, and a modified stepdown testing approach is proposed to incorporate the adaptive sampling nature of the study data. An approximate mixture normal distribution of the dose response is proposed to analyze the binary outcome with small sample size at the first stage of the adaptive design. Finally, an optimal stage-wise adaptive clinical dose study design is proposed for dose escalation studies with binary outcomes and correlated dose responses. The study design criterion is defined as a weighted average power to identify all effective dose levels. A backward induction algorithm is used to obtain the design parameters. The values of the optimal design parameters vary when different analysis methods are used to analyze the study data. Simulation studies are performed to illustrate the two proposed analysis methods and the proposed optimal design approach.
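
As a hedged illustration of one ingredient of such procedures (not the max-min order-restricted method itself), the sketch below estimates by simulation the critical value for simultaneous two-sided intervals around k correlated dose means, assuming an equicorrelated covariance; all settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
k, rho, n_sim = 5, 0.3, 100_000          # doses, equicorrelation, replications (illustrative)

# Equicorrelated covariance for the k standardized dose-mean estimates
cov = rho * np.ones((k, k)) + (1 - rho) * np.eye(k)
z = rng.multivariate_normal(np.zeros(k), cov, size=n_sim)

# Critical value c with P(max_i |Z_i| <= c) = 0.95
c = np.quantile(np.abs(z).max(axis=1), 0.95)
print("simultaneous 95% critical value:", round(c, 3))
```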
283

PARAMETER ESTIMATION FOR LATENT MIXTURE MODELS WITH APPLICATIONS TO PSYCHIATRY

Ren, Lulu 06 July 2006 (has links)
Longitudinal and repeated measurement data commonly arise in many scientific research areas. Traditional methods have focused on estimating a single mean response as a function of a time-related variable and other covariates in a homogeneous population. However, in many situations the homogeneity assumption may not be appropriate. Latent mixture models combine latent class modeling and conventional mixture modeling. They accommodate population heterogeneity by modeling each subpopulation with a mixing component. In this thesis, we developed a hybrid Markov Chain Monte Carlo algorithm to estimate the parameters of the latent mixture model. We show through simulation studies that the MCMC algorithm is superior to the EM algorithm when the percentage of missing values is large. As an extension of latent mixture models, we also propose the use of cubic splines as a curve fitting technique instead of classic polynomial fitting. We show that this method gives better fits to the data, and that our MCMC algorithm estimates the model efficiently. We apply the cubic spline technique to a data set collected in a study of alcoholism. Our MCMC algorithm reveals several different P300 amplitude trajectory patterns among children and adolescents. Other topics covered in this thesis include the identifiability of the latent mixture model and the use of such a model to predict a binary outcome. We propose a bivariate version of the latent mixture model, in which two courses of longitudinal responses can be modeled at the same time. Computational aspects of such models remain to be addressed in future work.
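
A minimal sketch of MCMC estimation for a mixture model, assuming a simple two-component normal mixture with known variance rather than the full latent mixture model of the dissertation; the data, priors, and settings are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data from a two-component normal mixture (illustrative, not the P300 data)
y = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 100)])
n, sigma2 = len(y), 1.0

# Gibbs sampler for the mixing weight and component means (known variance)
mu = np.array([-1.0, 1.0])
pi1 = 0.5
draws = []
for _ in range(2000):
    # 1) sample latent class indicators given current parameters
    like1 = pi1 * np.exp(-(y - mu[0]) ** 2 / (2 * sigma2))
    like2 = (1 - pi1) * np.exp(-(y - mu[1]) ** 2 / (2 * sigma2))
    z = rng.random(n) < like1 / (like1 + like2)
    # 2) sample component means given assignments (N(0, 100) prior)
    for k, idx in enumerate([z, ~z]):
        nk = idx.sum()
        post_var = 1.0 / (nk / sigma2 + 1.0 / 100.0)
        post_mean = post_var * y[idx].sum() / sigma2
        mu[k] = rng.normal(post_mean, np.sqrt(post_var))
    # 3) sample the mixing weight from its Beta posterior
    pi1 = rng.beta(1 + z.sum(), 1 + n - z.sum())
    draws.append(pi1)

print("posterior mean of mixing weight:", np.mean(draws[500:]).round(3))
```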
284

Bounded Influence Approaches to Constrained Mixed Vector Autoregressive Models

Gamalo, Mark Amper 28 September 2006 (has links)
Many clinical studies now obtain multiple biophysical signals from several individuals repeatedly over time, a development that has generated growth in statistical models that analyze cross-sectional time series data. In general, these statistical models try to answer two questions: (i) what are the intra-individual dynamics of the response and their relation to some covariates; and (ii) how can these dynamics be aggregated consistently in a group. In response to the first question, we propose a covariate-adjusted constrained Vector Autoregressive model, a technique similar to the STARMAX model (Stoffer, JASA 81, 762-772), to describe serial dependence of observations. In this way, the number of parameters to be estimated is kept minimal while offering flexibility for the model to explore higher order dependence. In response to (ii), we use mixed effects analysis that accommodates modelling of heterogeneity among cross-sections arising from covariate effects that vary from one cross-section to another. Although estimation of the model can proceed using standard maximum likelihood techniques, we believe it is advantageous to use bounded influence procedures in the modelling (such as choosing constraints) and parameter estimation so that the effects of outliers can be controlled. In particular, we use M-estimation with a redescending bounding function because its influence function is always bounded. Furthermore, assuming consistency, this influence function is useful for obtaining the limiting distribution of the estimates. However, this distribution may not necessarily yield accurate inference in the presence of contamination, as the actual asymptotic distribution might have wider tails. This led us to investigate bootstrap approximation techniques. A sampling scheme based on IID innovations is modified to accommodate the cross-sectional structure of the data. The M-estimation is then applied naively to each bootstrap sample to obtain the asymptotic distribution of the estimates. We apply these strategies to the extracted BOLD activation from several regions of the brain in a group of individuals to describe joint dynamic behavior between these locations. We use simulated data with both innovation and additive outliers to test whether the estimation procedure is accurate despite contamination.
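
The following is a rough sketch of bounded-influence (M-) estimation for an autoregressive model, assuming an unconstrained bivariate VAR(1) fit equation by equation with Tukey bisquare weights; it is not the constrained, covariate-adjusted mixed model of the dissertation, and all settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate a bivariate VAR(1) and contaminate it with a few additive outliers
A_true = np.array([[0.5, 0.2], [0.1, 0.4]])
T = 300
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.normal(0, 1, 2)
y[rng.choice(T, 5, replace=False)] += 8.0        # additive outliers

X, Y = y[:-1], y[1:]

def tukey_weights(r, c=4.685):
    """Redescending (Tukey bisquare) weights: zero influence for large residuals."""
    u = np.abs(r) / c
    w = (1 - u**2) ** 2
    w[u > 1] = 0.0
    return w

# IRLS: equation-by-equation weighted least squares with bisquare weights
A_hat = np.zeros((2, 2))
for j in range(2):
    beta = np.linalg.lstsq(X, Y[:, j], rcond=None)[0]
    for _ in range(20):
        r = Y[:, j] - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12   # MAD scale
        W = np.diag(tukey_weights(r / s))
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y[:, j])
    A_hat[j] = beta

print("robust VAR(1) coefficient estimate:\n", A_hat.round(2))
```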
285

DOSE FINDING STRATEGIES FOR SINGLE DRUG AND COMBINATION DRUG TRIALS

Soulakova, Julia 02 October 2006 (has links)
A key component of drug development is to establish the compound's dose-response relationship and to identify all effective doses of the drug, with the general goal of selecting the minimum effective dose (MED). A new closed testing procedure is proposed for identifying the MED for a single-component drug. This procedure is based on constructing simultaneous one-sided confidence bands for the response surface of each dose's effect relative to placebo. Our methodology utilizes stepwise closed testing to test the ordered hypotheses of equality of mean dose responses. The pattern of the rejected and accepted null hypotheses provides the estimate of the MED, if it exists. In the case of a combination drug, in addition to demonstrating safety and efficacy, the FDA requires demonstrating that each component makes a contribution to the claimed effects. A combination which satisfies the last requirement is called an efficacious combination. In the most common case both single drugs are approved ones, and therefore the efficacious combinations are effective, that is, they produce a therapeutic effect which is superior to placebo. We propose a closed testing procedure for estimating the minimum efficacious combinations (MeDs) in a two-drug study and introduce the notion of the MeD-set. The main advantage of a closed testing procedure is strong control of the familywise error rate at significance level α, while allowing individual hypotheses to be tested at the same significance level α without multiplicity adjustments. The proposed procedure is based on two main steps. In the first step, all possible structures of the population MeD-set are identified, the related closed family of hypotheses is constructed, and the proper step-down testing partial order is established. The second step is the α-testing step. Using the closed testing principle, we test the hypotheses by constructing the AVE-test statistic. The pattern of the rejected null hypotheses identifies the MeD-set. In order to assess the performance of our procedure, we define several statistical measures. These measures are used in a large simulation study to examine the goodness of the estimation procedures and to identify the population configurations for which the procedure performs best.
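
A hedged sketch of the generic closed testing principle on which such procedures rest, using simple Bonferroni intersection tests rather than the dissertation's AVE statistic; the dose-versus-placebo p-values are made up.

```python
from itertools import combinations

def closed_testing(pvals, alpha=0.05):
    """Generic closed testing with Bonferroni intersection tests.

    Elementary hypothesis H_i is rejected only if every intersection hypothesis
    containing it is rejected at level alpha; this controls the familywise
    error rate strongly.
    """
    m = len(pvals)
    rejected = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        reject_i = True
        for r in range(m):
            for subset in combinations(others, r):
                idx = (i,) + subset
                # Bonferroni test of the intersection hypothesis over idx
                if min(pvals[j] for j in idx) > alpha / len(idx):
                    reject_i = False
                    break
            if not reject_i:
                break
        if reject_i:
            rejected.append(i)
    return rejected

# Doses whose null hypotheses are rejected estimate the set of effective doses
print(closed_testing([0.001, 0.004, 0.03, 0.20]))
```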
286

Variable Selection when Confronted with Missing Data

Ziegler, Melissa Lynn 02 October 2006 (has links)
Variable selection is a common problem in linear regression. Stepwise methods, such as forward selection, are popular and are easily available in most statistical packages. The models selected by these methods have a number of drawbacks: they are often unstable, with changes in the set of variables selected due to small changes in the data, and they provide upwardly biased regression coefficient estimates. Recently proposed methods, such as the lasso, provide accurate predictions via a parsimonious, interpretable model. Missing data values are also a common problem, especially in longitudinal studies. One approach to account for missing data is multiple imputation. Simulation studies were conducted comparing the lasso to standard variable selection methods under different missing data conditions, including the percentage of missing values and the missing data mechanism. Under missing at random mechanisms, missing data were created at the 25 and 50 percent levels with two types of regression parameters, one containing large effects and one containing several small, but nonzero, effects. Five correlation structures were used in generating the data: independent, autoregressive with correlation 0.25 and 0.50, and equicorrelated, again with correlation 0.25 and 0.50. Three different missing data mechanisms were used to create the missing data: linear, convex and sinister. Least angle regression performed well under all conditions when the true regression parameter vector contained large effects, with its dominance increasing as the correlation between the predictor variables increased. This is consistent with complete data simulation studies suggesting the lasso performs poorly in situations where the true beta vector contains small, nonzero effects. When the true beta vector contained small, nonzero effects, the performance of the variable selection methods considered was situation dependent. Ordinary least squares had superior performance in terms of confidence interval coverage under the independent correlation structure, and with correlated data when the true regression parameter vector consisted of small, nonzero effects. A variety of methods performed well when the regression parameter vector consisted of large effects and the predictor variables were correlated, depending on the missing data situation.
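
A rough sketch, under assumed data and a crude stochastic imputation scheme, of combining the lasso with multiple imputation: fit the lasso on each completed data set and pool the coefficients. The penalty value, missingness rate, and imputation model are illustrative, not those of the study.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)

# Simulated regression with a few large effects and MCAR missingness (illustrative)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0, 0, 0])
y = X @ beta + rng.normal(size=n)
X_miss = X.copy()
X_miss[rng.random((n, p)) < 0.25] = np.nan       # ~25% missing completely at random

# Crude multiple imputation: draw each missing value from N(column mean, column sd),
# fit the lasso on every completed data set, then pool the coefficients.
M, coefs = 20, []
col_mean = np.nanmean(X_miss, axis=0)
col_sd = np.nanstd(X_miss, axis=0)
for _ in range(M):
    Xi = X_miss.copy()
    miss = np.isnan(Xi)
    Xi[miss] = (col_mean + col_sd * rng.normal(size=Xi.shape))[miss]
    coefs.append(Lasso(alpha=0.1).fit(Xi, y).coef_)

print("pooled lasso coefficients:", np.mean(coefs, axis=0).round(2))
```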
287

REPORTING UNCERTAINTY BY SPLINE FUNCTION APPROXIMATION OF LOG-LIKELIHOOD

Sezer, Ahmet 30 January 2007 (has links)
Reporting uncertainty is one of the most important tasks in any statistical paradigm. Likelihood functions from independent studies can be easily combined, and the combined likelihood function serves as a meaningful indication of the support the observed data give to the various parameter values. This fact has led us to suggest using the likelihood function as a summary of post-data uncertainty concerning the parameter. However, a serious difficulty arises because likelihood functions may not be expressible in a compact, easily understood mathematical form suitable for communication or publication. To overcome this difficulty, we propose to approximate log-likelihood functions using piecewise polynomials governed by a minimal number of parameters. Our goal is to find the function of the parameter(s) that approximates the log-likelihood function with the minimum integrated squared error over the parameter space. We achieve several things by approximating the log-likelihood: first, we significantly reduce the numerical difficulty associated with finding the maximum likelihood estimator. Second, in order to combine likelihoods that come from independent studies, it is important that the approximation of the log-likelihood depend only upon a few parameters so that the results can be communicated compactly. Through simulation studies, we compared the natural cubic spline approximation with conventional modified likelihood methods in terms of the coverage probability and interval length of the highest density region obtained from the likelihood, and the mean squared error of the maximum likelihood estimator.
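
A minimal sketch of the general idea, assuming a normal mean so the log-likelihood is simple: evaluate each study's log-likelihood on a coarse grid, approximate it with a natural cubic spline, and combine studies by adding the approximations. Grid size and data are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(5)

def loglik_grid(data, grid, sigma=1.0):
    """Normal log-likelihood in the mean, evaluated on a grid (up to a constant)."""
    return np.array([-0.5 * np.sum((data - mu) ** 2) / sigma**2 for mu in grid])

# Two independent "studies" of the same mean, each summarized by a spline approximation
study1 = rng.normal(1.0, 1.0, 30)
study2 = rng.normal(1.2, 1.0, 50)
grid = np.linspace(-1, 3, 15)                     # only 15 knot values need to be reported

spl1 = CubicSpline(grid, loglik_grid(study1, grid), bc_type="natural")
spl2 = CubicSpline(grid, loglik_grid(study2, grid), bc_type="natural")

# Combine studies by adding the approximated log-likelihoods, then maximize numerically
fine = np.linspace(-1, 3, 2001)
combined = spl1(fine) + spl2(fine)
print("approximate combined MLE:", fine[np.argmax(combined)].round(3))
print("pooled sample mean      :", np.concatenate([study1, study2]).mean().round(3))
```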
288

Clustering Methodologies with Applications to Integrative Analyses of Post-mortem Tissue Studies in Schizophrenia

Wu, Qiang 27 September 2007 (has links)
There is an enormous amount of research devoted to understanding the neurobiology of schizophrenia. Basic neurobiological studies have focused on identifying possible abnormal neurobiological markers in subjects with schizophrenia. However, due to the many possible combinations of symptoms, schizophrenia is clinically thought not to be a homogeneous disease, and this possible heterogeneity might be explained neurobiologically by markers in various brain regions. Statistically, the interesting problem is to cluster the subjects with schizophrenia using these neurobiological markers. But in attempting to combine the neurobiological measurements from multiple studies, several experimental specifics arise that lead to difficulties in developing statistical methodologies for the clustering analysis. The main difficulties are differing control subjects, effects of covariates, and the existence of missing data. We develop new parametric models to successively deal with these difficulties. First, assuming no missing data and no clusters, we construct multivariate normal models with structured means and covariance matrices to deal with the differing control subjects and the effects of covariates. We obtain several parameter estimation algorithms for these models and the asymptotic properties of the resulting estimators. Using these newly obtained results, we then develop model-based clustering algorithms to cluster the subjects with schizophrenia into two possible subpopulations, still assuming no missing data. We obtain a new, more effective algorithm for clustering and show by simulations that our new algorithm provides the same results faster than direct applications of some existing algorithms. Finally, for actual data obtained from three studies conducted in the Conte Center for the Neuroscience of Mental Disorders in the Department of Psychiatry at the University of Pittsburgh, we handle the missingness by conducting imputations to create multiply imputed data sets using certain regression methods. The new complete-data clustering algorithm is then applied to the multiply imputed data sets, and the resulting multiple clustering results are integrated to form one single clustering of the subjects with schizophrenia that represents the uncertainty due to the missingness. The results suggest the existence of two possible clusters of the subjects with schizophrenia.
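
A hedged sketch of model-based clustering applied to multiply imputed data, assuming a plain two-component Gaussian mixture, a crude stochastic imputation, and a majority vote to integrate the clusterings; it stands in for, but is not, the dissertation's structured models and regression-based imputations.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)

# Two illustrative subpopulations of "marker" measurements with some values missing
X = np.vstack([rng.normal(0, 1, (60, 3)), rng.normal(2.5, 1, (40, 3))])
X[rng.random(X.shape) < 0.15] = np.nan

col_mean, col_sd = np.nanmean(X, axis=0), np.nanstd(X, axis=0)
M, label_votes = 10, np.zeros((len(X), 2))
for _ in range(M):
    Xi = X.copy()
    miss = np.isnan(Xi)
    Xi[miss] = (col_mean + col_sd * rng.normal(size=Xi.shape))[miss]  # crude stochastic imputation
    gmm = GaussianMixture(n_components=2, random_state=0).fit(Xi)
    labels = gmm.predict(Xi)
    # Align labels across imputations: call the component with the larger mean "cluster 1"
    if gmm.means_[0, 0] > gmm.means_[1, 0]:
        labels = 1 - labels
    label_votes[np.arange(len(X)), labels] += 1

final = label_votes.argmax(axis=1)     # majority vote over the imputed data sets
print("cluster sizes:", np.bincount(final))
```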
289

Analysis of Longitudinal Random Length Data

Iosif, Ana-Maria 25 January 2008 (has links)
In some clinical trials, data are gathered longitudinally on both the frequency of an event and its severity. Oftentimes, it is not feasible to obtain the exact time of the events, and the events are collected over fixed follow-up intervals. We refer to this type of data as longitudinal random length data, since the subjects are observed repeatedly and, at each assessment time, the data can be viewed as vectors of severities with lengths determined by the number of events experienced during the assessment. Suppose the interest is in comparing two treatments, and the treatments are evaluated at multiple points in time. The treatment effect is reflected in simultaneous changes in both the number of events and the severity of each event. Consequently, one needs to jointly model the two outcomes to better evaluate treatment effects. The main objective of this dissertation is to introduce a framework for longitudinal random length data. We propose two multiple population models for such data. We parameterize the models such that, at each measurement time, both the distribution of the random lengths and the distributional mean of each component of the severity vectors depend on the underlying parameter reflecting the treatment effect at that time. Given the random lengths, we assume the distribution of the severities to be multivariate normal. Conditional on the number of events, the dependence in the vector of severities recorded at a single measurement time is modeled using compound symmetry. The first model assumes the numbers of events for a subject at different time points to be independent Poisson random variables, and dependence over time is built into the severity measures. The second model generalizes the first by adding another layer of dependence over time: we assume the numbers of events experienced by a subject across time to be dependent and use a multivariate Poisson distribution to model them. For each model we describe the maximum likelihood estimation procedure and provide the asymptotic properties of the estimators. We apply both models to analyze a data set containing stressful life events in adolescents with major depressive disorder.
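
A small simulation sketch of the data structure described above, assuming made-up rates, means, and a compound-symmetry correlation; it only illustrates what longitudinal random length data look like, not the proposed estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_subject(n_times, lam, mu, sigma2, rho):
    """One subject's random-length data: a Poisson event count at each assessment
    and, given the count, a severity vector that is multivariate normal with
    common mean mu[t] and compound-symmetry correlation rho (illustrative)."""
    data = []
    for t in range(n_times):
        n_events = rng.poisson(lam[t])
        if n_events == 0:
            data.append(np.array([]))
            continue
        cov = sigma2 * (rho * np.ones((n_events, n_events)) + (1 - rho) * np.eye(n_events))
        data.append(rng.multivariate_normal(np.full(n_events, mu[t]), cov))
    return data

# A treatment effect would show up jointly as more events and higher severities over time
subject = simulate_subject(n_times=3, lam=[2, 3, 4], mu=[1.0, 1.5, 2.0], sigma2=1.0, rho=0.4)
for t, sev in enumerate(subject):
    print(f"time {t}: {len(sev)} events, severities {np.round(sev, 2)}")
```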
290

MARKOV MODELS FOR LONGITUDINAL COURSE OF YOUTH BIPOLAR DISORDER

Lopez, Adriana 13 June 2008 (has links)
In this dissertation, mixtures of first-order Markov chains and hidden Markov models were used to model variable length sequences in order to find longitudinal patterns. Data from the Course and Outcome of Bipolar Youth (COBY) study were used to estimate these models. A mixture of four first-order Markov chains found patterns of movers and stayers: Cluster 4 consists of the stayers; Cluster 3 consists of movers among the depression, well and submania states; Cluster 2 consists of movers that tend to stay in the well state; and Cluster 1 consists of movers that tend to go to the submania/subdepression state. On the other hand, a hidden Markov model with ten hidden states justifies the use of a scale with syndromal, subsyndromal and asymptomatic episodes defined by psychiatrists. The inclusion of covariates in hidden Markov models showed that males move more than females, children move more than teenagers, and patients who live in another situation move more than patients who live with both natural parents. For bipolar diagnosis, BPII and BPNOS patients show similar transition patterns. Age of bipolar onset sheds light on the stability of patients with a childhood and an early adolescence onset. Thus, the possibility of an early diagnosis of the disorder could lead to appropriate treatment, which would lessen the impairment of bipolar youth. Socio-economic status mattered as well: patients with low socio-economic status stayed more weeks with subsyndromal submanic and mixed episodes, and fewer weeks with subsyndromal depression and asymptomatic episodes; quite the opposite behavior was observed for their counterparts with high socio-economic status. This is the first research using these two Markov models to analyze the longitudinal course of bipolar disorder in children and adolescents. No previous study has modeled the longitudinal course of bipolar disorder using Markov models that estimate the transitions among the different episodes of the disorder. Furthermore, no previous study has modeled the effects of covariates in a manner consistent with the longitudinal nature of the disease.
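
A minimal sketch of the mover-stayer idea in a mixture of first-order Markov chains, assuming two made-up transition matrices and a toy weekly state sequence; classification assigns a sequence to the chain under which its likelihood is highest.

```python
import numpy as np

# Toy weekly clinical states: 0 = asymptomatic, 1 = subsyndromal, 2 = syndromal (illustrative)
# Two candidate cluster-specific transition matrices (a "stayer" and a "mover" profile)
P_stayer = np.array([[0.90, 0.08, 0.02],
                     [0.10, 0.85, 0.05],
                     [0.05, 0.15, 0.80]])
P_mover = np.array([[0.5, 0.3, 0.2],
                    [0.3, 0.4, 0.3],
                    [0.2, 0.4, 0.4]])
init = np.array([1 / 3, 1 / 3, 1 / 3])

def markov_loglik(seq, P, init):
    """Log-likelihood of a state sequence under a first-order Markov chain."""
    ll = np.log(init[seq[0]])
    for a, b in zip(seq[:-1], seq[1:]):
        ll += np.log(P[a, b])
    return ll

# Assign a variable-length sequence to the chain (cluster) giving it the higher likelihood,
# as in the classification step of a mover-stayer mixture of Markov chains
seq = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
lls = [markov_loglik(seq, P, init) for P in (P_stayer, P_mover)]
print("cluster assignment:", ["stayer", "mover"][int(np.argmax(lls))])
```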
