281. Improving coverage of rectangular confidence intervals
Gogtas, Hakan (23 September 2004)
Finding a better confidence region is always of interest in statistics. One way to find better confidence regions is to uniformly improve the coverage probability over that of the usual confidence region while maintaining the same volume. Thus, the classical spherical confidence regions for the mean vector of a multivariate normal distribution have been improved by changing the point estimator for the parameter.
In 1961, James and Stein found a shrinkage estimator whose total mean square error (TMSE) is smaller than that of the usual estimator. In 1982, Casella and Hwang gave an analytical proof that the confidence sphere centered at the James-Stein estimator dominates the usual confidence sphere centered at the sample mean vector. This opened up new possibilities in multiple comparisons.
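For a single observation X ~ N_p(theta, I_p), the James-Stein estimator and the dominance result take their standard form:

```latex
\hat{\theta}_{\mathrm{JS}}
  = \left(1 - \frac{p-2}{\lVert X \rVert^{2}}\right) X,
\qquad
\mathbb{E}\,\lVert \hat{\theta}_{\mathrm{JS}} - \theta \rVert^{2}
  < \mathbb{E}\,\lVert X - \theta \rVert^{2}
\quad \text{for all } \theta \text{ whenever } p \ge 3 .
```

The recentered confidence sphere keeps the usual radius and simply replaces the sample mean by this shrinkage estimator as its center.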
This dissertation focuses on simultaneous confidence intervals for treatment means, and for the differences between treatment means and the mean of a control, in one-way and two-way analysis of variance (ANOVA) studies. We use Stein-type shrinkage estimators as centers to improve the simultaneous coverage of these confidence intervals. The main obstacle to an analytic study is that rectangular confidence regions, unlike spherical ones, are not rotation invariant.
Therefore, we rely primarily on simulation to show dominance of the rectangular confidence intervals centered at a shrinkage estimator over the usual rectangular confidence regions centered at the sample means. For the one-way ANOVA model, our simulation results indicate that our confidence procedure has higher coverage probability than the usual procedure when the number of means is sufficiently large. We develop a lower bound for the coverage probability of our rectangular confidence region, a decreasing function of the shrinkage constant of the estimator used as center, and use this bound to prove that the rectangular confidence intervals centered at a shrinkage estimator have coverage probability uniformly exceeding that of the usual rectangular confidence regions, up to an arbitrarily small epsilon, when the number of means is sufficiently large. We show that these intervals have strictly greater coverage probability when all the parameters are zero, and that the coverage probabilities of the two procedures converge to one another as at least one of the parameters becomes arbitrarily large.
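A minimal sketch of this kind of coverage simulation, not the dissertation's actual code: it compares rectangular regions centered at the sample vector and at a positive-part Stein-type shrinkage of it, with identity covariance and illustrative choices of half-width and shrinkage constant.

```python
import numpy as np

rng = np.random.default_rng(0)

def coverage(theta, half_width, shrink, n_rep=200_000):
    """Monte Carlo coverage of two rectangular confidence regions for
    theta, given X ~ N_p(theta, I_p): one centered at X itself and one
    centered at a positive-part shrinkage of X (assumed form)."""
    p = len(theta)
    x = theta + rng.standard_normal((n_rep, p))
    # usual region: every coordinate of theta within half_width of X
    usual = np.all(np.abs(x - theta) <= half_width, axis=1)
    # shrinkage center: (1 - shrink / ||X||^2)_+ X
    factor = np.clip(1.0 - shrink / np.sum(x**2, axis=1), 0.0, None)
    shrunk = np.all(np.abs(factor[:, None] * x - theta) <= half_width, axis=1)
    return usual.mean(), shrunk.mean()

theta = np.zeros(8)                     # the coverage gain is largest here
print(coverage(theta, half_width=2.0, shrink=6.0))  # shrink ~ p - 2, illustrative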
To check the reliability of our simulations for the one-way ANOVA model, we use numerical integration to calculate the coverage probability of the rectangular confidence regions. Gaussian quadrature based on Hermite polynomials is used to approximate the coverage probability of our rectangular confidence regions for n = 2, 3, 4. The difference between the numerical integration and simulation results is negligible, although numerical integration yields values slightly higher than the simulations.
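A sketch of the quadrature check under the same assumed shrinkage center: tensor-product Gauss-Hermite nodes approximate the p-dimensional normal integral of the coverage indicator. The indicator makes convergence slow, so a large node count is used; all constants are illustrative.

```python
import numpy as np
from itertools import product

def coverage_gauss_hermite(theta, half_width, shrink, n_nodes=80):
    """Approximate P(all |c_i(X) - theta_i| <= half_width) for
    X ~ N_p(theta, I_p), where c(X) is a positive-part shrinkage
    center (assumed form). Uses the identity
    E g(X) = pi^{-p/2} * integral of g(theta + sqrt(2) z) exp(-|z|^2) dz,
    approximated by Gauss-Hermite quadrature in each coordinate."""
    p = len(theta)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    total = 0.0
    for idx in product(range(n_nodes), repeat=p):
        z = nodes[list(idx)]
        x = theta + np.sqrt(2.0) * z
        factor = max(0.0, 1.0 - shrink / (x @ x))
        if np.all(np.abs(factor * x - theta) <= half_width):
            total += np.prod(weights[list(idx)])
    return total / np.pi ** (p / 2)

print(coverage_gauss_hermite(np.zeros(2), half_width=2.0, shrink=0.5))
```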
A similar approach is applied to develop improved simultaneous confidence intervals for the comparison of treatment means with the mean of a control. We again develop a lower bound for the coverage probability of our confidence procedure and prove results similar to those that we proved for the one-way ANOVA model.
We also apply our approach to develop improved simultaneous confidence intervals for the cell means in a two-way ANOVA model. We again rely primarily on simulation to show dominance of the rectangular confidence intervals centered at an appropriate shrinkage estimator over the usual rectangular confidence regions, develop a lower bound for the coverage probability of our confidence procedure, and prove the same results that we proved for the one-way model.
282. Use of Simultaneous Inference Under Order Restriction, Stepdown Testing Procedure and Stage-wise Sequential Optimal Design in Clinical Dose Study
Jia, Gang (31 January 2005)
This dissertation discusses design approaches for adaptive dose escalation studies, methods for analyzing dose study data, and the relationship between the study design approach and the data analysis methods.
A general max-min approach is proposed for constructing simultaneous confidence intervals for the monotone means of correlated, normally distributed random samples, and is used to analyze correlated dose-response data. The approach provides an accurate, flexible, and computationally easy way to obtain critical values for simultaneous confidence intervals under a monotone order restriction.
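The max-min construction itself is specific to the order restriction; as a generic baseline, a simultaneous critical value for correlated normal means can be obtained as a Monte Carlo quantile of the maximum absolute standardized deviate. A minimal sketch, assuming known variance and an illustrative equicorrelated structure:

```python
import numpy as np

rng = np.random.default_rng(1)

def critical_value(corr, alpha=0.05, n_rep=200_000):
    # (1 - alpha) Monte Carlo quantile of max_i |Z_i| for Z ~ N(0, corr),
    # the usual route to a two-sided simultaneous critical value for
    # correlated normal means with known variance.
    L = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_rep, corr.shape[0])) @ L.T
    return np.quantile(np.abs(z).max(axis=1), 1 - alpha)

p, rho = 4, 0.5
corr = np.full((p, p), rho) + (1 - rho) * np.eye(p)
print(critical_value(corr))  # lies below the Bonferroni cutoff z_{1-alpha/(2p)}
```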
The stepdown testing procedure for analyzing dose study data is examined, and a modified stepdown testing approach is proposed to incorporate the adaptive sampling nature of the study data. An approximate mixture normal distribution of the dose response is proposed to analyze binary outcomes with small sample sizes at the first stage of the adaptive design.
Finally, an optimal stage-wise adaptive clinical dose study design is proposed for dose escalation studies with binary outcomes and correlated dose responses. The design criterion is defined as a weighted average power to identify all effective dose levels, and a backward induction algorithm is used to obtain the design parameters. The values of the optimal design parameters vary when different analysis methods are used to analyze the study data.
Simulation studies are performed to illustrate the two proposed analysis methods and the proposed optimal design approach.
283. PARAMETER ESTIMATION FOR LATENT MIXTURE MODELS WITH APPLICATIONS TO PSYCHIATRY
Ren, Lulu (6 July 2006)
Longitudinal and repeated measurement data commonly arise in many scientific research areas. Traditional methods have focused on estimating a single mean response as a function of a time-related variable and other covariates in a homogeneous population. In many situations, however, the homogeneity assumption is not appropriate. Latent mixture models combine latent class modeling and conventional mixture modeling; they accommodate population heterogeneity by modeling each subpopulation with a mixing component. In this work, we develop a hybrid Markov chain Monte Carlo (MCMC) algorithm to estimate the parameters of the latent mixture model. We show through simulation studies that the MCMC algorithm is superior to the EM algorithm when the percentage of missing values is large.
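As a toy stand-in for such an MCMC scheme (assumed setup: a two-component univariate normal mixture with known variance, equal weights, and conjugate N(0, tau^2) priors on the component means), a Gibbs sampler alternates between latent class indicators and the means:

```python
import numpy as np

rng = np.random.default_rng(2)

def gibbs_mixture(y, n_iter=5000, sigma=1.0, tau=10.0):
    """Gibbs sampler for a two-component N(mu_k, sigma^2) mixture with
    fixed equal weights; label switching is ignored in this sketch."""
    n = len(y)
    mu = np.array([y.min(), y.max()])       # dispersed start
    draws = np.empty((n_iter, 2))
    for it in range(n_iter):
        # sample latent class indicators given the current means
        logp = -0.5 * ((y[:, None] - mu[None, :]) / sigma) ** 2
        prob = np.exp(logp - logp.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        z = rng.random(n) < prob[:, 1]
        # sample each mean from its conjugate normal full conditional
        for k, idx in enumerate([~z, z]):
            var = 1.0 / (idx.sum() / sigma**2 + 1.0 / tau**2)
            mu[k] = rng.normal(var * y[idx].sum() / sigma**2, np.sqrt(var))
        draws[it] = mu
    return draws

y = np.concatenate([rng.normal(-2, 1, 80), rng.normal(2, 1, 120)])
print(gibbs_mixture(y)[2500:].mean(axis=0))   # posterior means near (-2, 2)
```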
As an extension of latent mixture models, we also propose the use of cubic splines as a curve-fitting technique in place of classic polynomial fitting. We show that this method gives better fits to the data and that our MCMC algorithm estimates the model efficiently. We apply the cubic spline technique to a data set collected in a study of alcoholism; our MCMC algorithm reveals several distinct P300 amplitude trajectory patterns among children and adolescents.
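A sketch of cubic-spline curve fitting via a truncated power basis; the trajectory data, knot locations, and curve shape below are hypothetical stand-ins for the P300 amplitude trajectories.

```python
import numpy as np

def cubic_spline_basis(t, knots):
    # Truncated power basis for a cubic spline: 1, t, t^2, t^3,
    # plus (t - k)_+^3 at each interior knot.
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# hypothetical data: age (years) vs. P300 amplitude (arbitrary units)
rng = np.random.default_rng(3)
age = np.linspace(8, 18, 40)
amp = 12 - 0.08 * (age - 13) ** 2 + rng.normal(0, 0.5, 40)

X = cubic_spline_basis(age, knots=[11.0, 14.0])   # assumed knot placement
beta, *_ = np.linalg.lstsq(X, amp, rcond=None)    # least-squares spline fit
fitted = X @ beta
print(f"residual SD: {np.std(amp - fitted):.2f}")
```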
Other topics covered in this thesis include the identifiability of the latent mixture model and the use of such a model to predict a binary outcome. We propose a bivariate version of the latent mixture model, in which two courses of longitudinal responses can be modeled at the same time. Computational aspects of such models remain future work.
284. Bounded Influence Approaches to Constrained Mixed Vector Autoregressive Models
Gamalo, Mark Amper (28 September 2006)
Many clinical studies now collect multiple biophysical signals from several individuals repeatedly over time, and this has generated growing interest in statistical models for cross-sectional time series data. In general, these models try to answer two questions: (i) what are the intra-individual dynamics of the response, and how do they relate to covariates; and (ii) how can these dynamics be aggregated consistently across a group? In response to the first question, we propose a covariate-adjusted constrained vector autoregressive model, a technique similar to the STARMAX model (Stoffer, JASA 81, 762-772), to describe the serial dependence of observations. In this way the number of parameters to be estimated is kept minimal while the model retains the flexibility to explore higher-order dependence. In response to (ii), we use mixed effects analysis, which accommodates heterogeneity among cross-sections arising from covariate effects that vary from one cross-section to another.
Although estimation of the model can proceed by standard maximum likelihood techniques, we believe it is advantageous to use bounded influence procedures in the modelling (such as choosing constraints) and in parameter estimation, so that the effects of outliers can be controlled. In particular, we use M-estimation with a redescending bounding function, because its influence function is always bounded. Furthermore, assuming consistency, this influence function yields the limiting distribution of the estimates. However, that distribution may not give accurate inference in the presence of contamination, since the actual asymptotic distribution might have wider tails. This led us to investigate bootstrap approximation techniques. A sampling scheme based on IID innovations is modified to accommodate the cross-sectional structure of the data, and the M-estimation is then applied naively to each bootstrap sample to approximate the distribution of the estimates.
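A scalar sketch of M-estimation with a redescending function, applied to an AR(1) coefficient by iteratively reweighted least squares; the constrained vector case in the text is far more elaborate, and the tuning constant 4.685 is simply the conventional choice for Tukey's bisquare.

```python
import numpy as np

def tukey_weights(r, c=4.685):
    # Tukey bisquare weights: redescending, zero beyond |r| > c,
    # so gross outliers are dropped from the fit entirely.
    w = (1 - (r / c) ** 2) ** 2
    w[np.abs(r) > c] = 0.0
    return w

def robust_ar1(y, n_iter=20):
    """M-estimate of an AR(1) coefficient via IRLS with bisquare weights."""
    x, z = y[:-1], y[1:]
    phi = np.dot(x, z) / np.dot(x, x)                     # OLS start
    for _ in range(n_iter):
        r = z - phi * x
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust scale (MAD)
        w = tukey_weights(r / s)
        phi = np.sum(w * x * z) / np.sum(w * x * x)
    return phi

rng = np.random.default_rng(4)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()
y[150] += 15.0                       # an additive outlier
print(robust_ar1(y))                 # close to 0.6 despite the spike
```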
We apply these strategies to BOLD activation extracted from several regions of the brain in a group of individuals to describe the joint dynamic behavior of these locations. We also use simulated data with both innovation and additive outliers to test whether the estimation procedure remains accurate despite contamination.
285. DOSE FINDING STRATEGIES FOR SINGLE DRUG AND COMBINATION DRUG TRIALS
Soulakova, Julia (2 October 2006)
A key component of drug development is to establish the compound's dose-response relationship and to identify all effective doses of the drug, with the general goal of selecting the minimum effective dose (MED).
A new closed testing procedure is proposed for identifying the MED for a single-component drug. This procedure is based on constructing simultaneous one-sided confidence bands for the response surface of each dose's effect relative to placebo. Our methodology uses stepwise closed testing of the ordered hypotheses of equality of mean dose responses; the pattern of rejected and accepted null hypotheses provides the estimate of the MED, if it exists.
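A simplified fixed-sequence analogue of such a step-down scheme (known variance, plain z-tests; the dissertation's procedure uses simultaneous confidence bands rather than this bare version): test the highest dose against placebo first and keep stepping down while rejecting. Testing in this pre-specified order controls the familywise error rate without multiplicity adjustment.

```python
import numpy as np
from scipy import stats

def estimate_med(means, n, sd, alpha=0.05):
    """Fixed-sequence step-down estimate of the MED.
    means[0] is the placebo mean; means[1:] are increasing doses,
    each arm with n subjects and common known SD."""
    z_crit = stats.norm.ppf(1 - alpha)
    se = sd * np.sqrt(2.0 / n)
    med = None
    for d in range(len(means) - 1, 0, -1):
        if (means[d] - means[0]) / se > z_crit:
            med = d          # dose d effective; keep stepping down
        else:
            break            # stop at the first non-rejection
    return med               # None if even the highest dose fails

print(estimate_med(np.array([0.0, 0.2, 0.8, 1.1]), n=50, sd=1.0))  # -> 2
```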
In the case of a combination drug, in addition to demonstrating safety and efficacy, the FDA requires demonstrating that each component makes a contribution to the claimed effects. A combination that satisfies this last requirement is called an efficacious combination.
In the most common case both single drugs are already approved; the efficacious combinations are therefore effective, that is, they produce a therapeutic effect superior to placebo.
We propose a closed testing procedure for estimating the minimum efficacious combinations (MeDs) in a two-drug study and introduce a notion of the MeD-set.
The main advantage of a closed testing procedure is strong control of the familywise error rate at significance level α, while allowing individual hypotheses to be tested at the same significance level α without multiplicity adjustments.
The proposed procedure is based on two main steps. In the first step, all possible structures of the population MeD-set are identified, the related closed family of hypotheses is constructed, and the proper step-down testing partial order is established. The second step is the α-testing step: using the closed testing principle, we test the hypotheses with the AVE-test statistic, and the pattern of rejected null hypotheses identifies the MeD-set.
To assess the performance of our procedure, we define several statistical measures. These are used in a large simulation study to examine the goodness of the estimation procedures and to identify the population configurations in which the procedure performs best.
286. Variable Selection when Confronted with Missing Data
Ziegler, Melissa Lynn (2 October 2006)
Variable selection is a common problem in linear regression. Stepwise methods, such as forward selection, are popular and easily available in most statistical packages. The models selected by these methods have a number of drawbacks: they are often unstable, with the set of variables selected changing in response to small changes in the data, and they provide upwardly biased regression coefficient estimates. Recently proposed methods, such as the lasso, provide accurate predictions via a parsimonious, interpretable model.
Missing data values are also a common problem, especially in longitudinal studies. One approach to account for missing data is multiple imputation. Simulation studies were conducted comparing the lasso to standard variable selection methods under different missing data conditions, including the percentage of missing values and the missing data mechanism. Under missing at random mechanisms, missing data were created at the 25 and 50 percent levels with two types of regression parameters, one containing large effects and one containing several small, but nonzero, effects. Five correlation structures were used in generating the data: independent, autoregressive with correlation 0.25 and 0.50, and equicorrelated, again with correlation 0.25 and 0.50. Three different missing data mechanisms were used to create the missing data: linear, convex, and sinister.
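A hedged sketch of pairing multiple imputation with the lasso, loosely mirroring the simulation design above; scikit-learn's IterativeImputer with posterior draws serves as the imputation engine, and the data, missingness pattern, and simple averaging of coefficients across imputations are illustrative simplifications.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)

# hypothetical data: sparse "large effects" truth, MAR-style holes in X
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0, 0, 0])
y = X @ beta + rng.standard_normal(n)
X_miss = X.copy()
X_miss[rng.random((n, p)) < 0.25] = np.nan     # ~25 percent missing

# multiple imputation: m completed data sets, lasso fit on each,
# point estimates pooled by averaging across imputations
m, coefs = 5, []
for seed in range(m):
    imp = IterativeImputer(random_state=seed, sample_posterior=True)
    coefs.append(LassoCV(cv=5).fit(imp.fit_transform(X_miss), y).coef_)
print(np.mean(coefs, axis=0).round(2))
```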
Least angle regression performed well under all conditions when the true regression parameter vector contained large effects, with its dominance increasing as the correlation between the predictor variables increased. This is consistent with complete-data simulation studies suggesting that the lasso performs poorly when the true beta vector contains small, nonzero effects. Indeed, when the true beta vector contained small, nonzero effects, the performance of the variable selection methods considered was situation dependent.
Ordinary least squares had superior performance in terms of confidence interval coverage under the independent correlation structure, and with correlated data when the true regression parameter vector consists of small, nonzero effects. When the regression parameter vector consisted of large effects and the predictor variables were correlated, a variety of methods performed well, depending on the missing data situation.
287. REPORTING UNCERTAINTY BY SPLINE FUNCTION APPROXIMATION OF LOG-LIKELIHOOD
Sezer, Ahmet (30 January 2007)
Reporting uncertainty is one of the most important tasks in any statistical paradigm.
Likelihood functions from independent studies can be easily combined, and the combined
likelihood function serves as a meaningful indication of the support the observed data
give to the various parameter values. This fact has led us to suggest using the likelihood
function as a summary of post-data uncertainty concerning the parameter.
However, a serious difficulty arises because likelihood functions may not be expressible in a compact, easily understood mathematical form suitable for communication or publication. To overcome this difficulty, we propose to approximate log-likelihood functions by piecewise polynomials governed by a minimal number of parameters. Our goal is to find the function of the parameter(s) that approximates the log-likelihood function with minimum integrated squared error over the parameter space. Approximating the log-likelihood achieves several things. First, it significantly reduces the numerical difficulty of finding the maximum likelihood estimator. Second, in order to combine likelihoods that come from independent studies, it is important that the approximation depend on only a few parameters, so that the results can be communicated compactly.
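A small sketch of the idea using a natural cubic spline (scipy's CubicSpline with natural boundary conditions); the binomial likelihood and knot grid are illustrative choices. A handful of knot values is enough to transmit the whole curve, and splines from independent studies can then be combined by adding the fitted functions on a common grid.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import CubicSpline

# log-likelihood of a binomial success probability from one study
x, n = 37, 120
theta = np.linspace(0.15, 0.50, 9)              # nine knots only
loglik = stats.binom.logpmf(x, n, theta)

# compact summary: a natural cubic spline through the knot values
spline = CubicSpline(theta, loglik, bc_type='natural')

# check the approximation against the exact log-likelihood
grid = np.linspace(0.15, 0.50, 200)
err = np.max(np.abs(spline(grid) - stats.binom.logpmf(x, n, grid)))
print(f"max abs error over the grid: {err:.4f}")
```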
In simulation studies, we compared the natural cubic spline approximation with conventional modified likelihood methods in terms of the coverage probability and interval length of the highest density region obtained from the likelihood, and the mean squared error of the maximum likelihood estimator.
288. Clustering Methodologies with Applications to Integrative Analyses of Post-mortem Tissue Studies in Schizophrenia
Wu, Qiang (27 September 2007)
There is an enormous amount of research devoted to understanding the neurobiology of schizophrenia. Basic neurobiological studies have focused on identifying possible abnormal neurobiological markers in subjects with schizophrenia. However, because of the many possible combinations of symptoms, schizophrenia is clinically thought not to be a homogeneous disease, and this possible heterogeneity might be explained neurobiologically in various brain regions. Statistically, the interesting problem is to cluster the subjects with schizophrenia using these neurobiological markers. In attempting to combine the neurobiological measurements from multiple studies, however, several experimental specifics lead to difficulties in developing statistical methodology for the clustering analysis. The main difficulties are differing control subjects, effects of covariates, and missing data. We develop new parametric models to deal with these difficulties in turn. First, assuming no missing data and no clusters, we construct multivariate normal models with structured means and covariance matrices to handle the differing control subjects and the effects of covariates. We obtain several parameter estimation algorithms for these models and the asymptotic properties of the resulting estimators. Using these results, we then develop model-based clustering algorithms to cluster the subjects with schizophrenia into two possible subpopulations, still assuming no missing data. We obtain a new, more effective clustering algorithm and show by simulations that it provides the same results faster than direct application of some existing algorithms.
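A compressed illustration of model-based clustering into two subpopulations with off-the-shelf tools: a plain two-component Gaussian mixture fit to synthetic markers. The dissertation's models add structured means, covariates, and differing controls that this sketch omits.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)

# hypothetical neurobiological markers for two latent subgroups
grp1 = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3), 40)
grp2 = rng.multivariate_normal([1.5, -1.0, 0.8], np.eye(3), 35)
markers = np.vstack([grp1, grp2])

# model-based clustering into two possible subpopulations via EM
gm = GaussianMixture(n_components=2, covariance_type='full',
                     n_init=10, random_state=0).fit(markers)
labels = gm.predict(markers)
print(np.bincount(labels))          # cluster sizes
print(gm.means_.round(2))           # recovered subgroup centers
```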
Finally, for actual data from three studies conducted in the Conte Center for the Neuroscience of Mental Disorders in the Department of Psychiatry at the University of Pittsburgh, we handle the missingness by creating multiply imputed data sets using certain regression methods. The new complete-data clustering algorithm is then applied to the multiply imputed data sets, and the resulting clusterings are integrated into a single clustering of the subjects with schizophrenia that reflects the uncertainty due to the missingness. The results suggest the existence of two possible clusters of the subjects with schizophrenia.
289. Analysis of Longitudinal Random Length Data
Iosif, Ana-Maria (25 January 2008)
In some clinical trials, data are gathered longitudinally on both the frequency of an event and its severity. Oftentimes, it is not feasible to obtain the exact time of the events, and the events are collected over fixed follow-up intervals. We refer to this type of data as longitudinal random length data, since the subjects are observed repeatedly and, at each assessment time, the data can be viewed as vectors of severities with lengths determined by the number of events experienced during the assessment.
Suppose the interest is in comparing two treatments, and the treatments are evaluated at multiple points in time. Treatment effect is reflected in simultaneous changes in both the number of events and the severity of each event. Consequently, one needs to jointly model the two outcomes to better evaluate treatment effects. The main objective of this dissertation is to introduce a framework for longitudinal random length data.
We propose two multiple population models for such data. We parameterize the models such that, at each measurement time, both the distribution of the random lengths and the distributional mean of each component of the severity vectors depend on the underlying parameter reflecting the treatment effect at that time. Given the random lengths, we assume the distribution of the severities to be multivariate normal. Conditional on the number of events, the dependence in the vector of severities recorded at a single measurement time is modeled using compound symmetry.
The first model assumes that the numbers of events for a subject at different time points are independent Poisson random variables, with dependence over time built into the severity measures. The second model generalizes the first by adding another layer of dependence over time: we assume the numbers of events experienced by a subject across time are dependent and model them with a multivariate Poisson distribution. For each model we describe the maximum likelihood estimation procedure and provide the asymptotic properties of the estimators. We apply both models to analyze a data set containing stressful life events in adolescents with major depressive disorder.
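A sketch of generating one subject's longitudinal random length data under assumed link functions: at each assessment a Poisson count of events whose mean, like the severity mean, shifts with the treatment-effect parameter, and severities that are exchangeable normals, with compound symmetry induced by a shared random effect.

```python
import numpy as np

rng = np.random.default_rng(7)

def sim_random_length(theta, lam_base=2.0, rho=0.4, sigma=1.0):
    """One subject's data: a list over assessment times of severity
    vectors with Poisson-random lengths. The exponential link for the
    count mean and the linear link for severity are assumed forms."""
    data = []
    for th in theta:                       # one entry per assessment time
        n_events = rng.poisson(lam_base * np.exp(0.3 * th))
        if n_events == 0:
            data.append(np.array([]))
            continue
        # compound symmetry: shared effect + independent noise gives
        # Var = sigma^2 and Cov = rho * sigma^2 within an assessment
        shared = rng.normal(0.0, sigma * np.sqrt(rho))
        sev = th + shared + rng.normal(0.0, sigma * np.sqrt(1 - rho), n_events)
        data.append(sev)
    return data

subject = sim_random_length(np.array([0.5, 1.0, 1.5]))
print([len(v) for v in subject])           # event counts per assessment
```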
290. MARKOV MODELS FOR LONGITUDINAL COURSE OF YOUTH BIPOLAR DISORDER
Lopez, Adriana (13 June 2008)
In this dissertation, mixtures of first-order Markov chains and hidden Markov models were used to model variable-length sequences in order to find longitudinal patterns. Data from the Course and Outcome of Bipolar Youth (COBY) study were used to estimate these models. A mixture of four first-order Markov chains found patterns of movers and stayers: cluster 4 is the stayers; cluster 3 is movers among the depression, well, and submania states; cluster 2 is movers who tend to stay in the well state; and cluster 1 is movers who tend to go to the submania/subdepression state. A hidden Markov model with ten hidden states, on the other hand, justifies the use of a scale with syndromal, subsyndromal, and asymptomatic episodes as defined by psychiatrists.

The inclusion of covariates in the hidden Markov models showed that males move more than females, children move more than teenagers, and patients who live in another situation move more than patients who live with both natural parents. For bipolar diagnosis, BPII and BPNOS patients show similar transition patterns. Age of bipolar onset sheds light on the stability of patients with childhood and early-adolescence onset; thus, early diagnosis of the disorder could lead to appropriate treatment and lessen the impairment of bipolar youth. Patients with low socio-economic status stayed more weeks with subsyndromal submanic and mixed episodes and fewer weeks with subsyndromal depression and asymptomatic episodes; quite the opposite was observed for their counterparts with high socio-economic status.

This is the first study to use these two Markov models to analyze the longitudinal course of bipolar disorder in children and adolescents. No previous study has modeled the longitudinal course of bipolar disorder using Markov models that estimate the transitions among its different episodes, and none has modeled the effects of covariates in a way consistent with the longitudinal nature of the disease.
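A sketch of the first model class under simplifying assumptions (no covariates, initial-state probabilities ignored): EM for a mixture of K first-order Markov chains fit to variable-length state sequences, the mechanism behind a movers/stayers analysis.

```python
import numpy as np

def em_markov_mixture(seqs, n_states, K, n_iter=200, seed=0):
    """EM for a mixture of K first-order Markov chains.
    seqs: list of integer state sequences of varying lengths."""
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)                              # mixing weights
    P = rng.dirichlet(np.ones(n_states), size=(K, n_states))
    for _ in range(n_iter):
        # E-step: posterior probability that each sequence came from k
        logr = np.empty((len(seqs), K))
        for i, s in enumerate(seqs):
            logr[i] = np.log(w) + np.log(P[:, s[:-1], s[1:]]).sum(axis=1)
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: mixing weights and responsibility-weighted transitions
        w = r.mean(axis=0)
        C = np.full((K, n_states, n_states), 1e-8)
        for i, s in enumerate(seqs):
            for a, b in zip(s[:-1], s[1:]):
                C[:, a, b] += r[i]
        P = C / C.sum(axis=2, keepdims=True)
    return w, P, r

# toy sequences: two "stayers" and two "movers" over three states
seqs = [np.array([0] * 8), np.array([1] * 6),
        np.array([0, 2, 1, 0, 2, 1, 2, 0]), np.array([2, 0, 1, 2, 0, 1])]
w, P, r = em_markov_mixture(seqs, n_states=3, K=2)
print(w.round(2))                    # cluster proportions
```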