391. A comparison of approaches to analysis of clustered binary data when cluster sizes are large. Tan, Ye, January 2004.
Several methods can be used in cluster-randomized studies with a binary outcome, including GLMM, GEE, and ordinary logistic regression. In this thesis, we study cluster-randomized data in which cluster sizes are large relative to the number of clusters (for example, the PROBIT study). We compared the GLMM, GEE, and ordinary logistic regression approaches in terms of parameter interpretation and the magnitude and standard errors of the model parameters. A simulation study was performed to evaluate the performance of these methods. GLMM implemented with penalized quasi-likelihood performed well, giving the highest empirical coverage of nominal 95% confidence intervals for the true coefficients. GEE was robust, with the smallest MSE when the within-cluster correlation parameter σu was 0.5 or larger. Ordinary logistic regression performed well when the correlation was very weak but poorly when it was stronger. When correlations are quite low, logistic models may be acceptable for clustered data; when the correlation is elevated, however, they give inappropriate inference.
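As a rough sketch of the contrast described above (not the thesis's actual simulation design or the PROBIT data; all parameter values are arbitrary), the following Python snippet generates clustered binary data with a shared cluster-level random effect and compares the standard error of a cluster-level treatment effect from ordinary logistic regression with that from GEE with an exchangeable working correlation.

```python
# Illustrative sketch only: simulated clustered binary data, not the PROBIT data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_clusters, cluster_size, sigma_u = 20, 500, 1.0   # few, large clusters

rows = []
for c in range(n_clusters):
    treat = c % 2                        # cluster-level treatment assignment
    u = rng.normal(0.0, sigma_u)         # shared cluster random effect
    eta = -1.0 + 0.5 * treat + u         # conditional log-odds
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)), size=cluster_size)
    rows.append(pd.DataFrame({"y": y, "treat": treat, "cluster": c}))
df = pd.concat(rows, ignore_index=True)

# Ordinary logistic regression ignores the clustering entirely.
logit = smf.logit("y ~ treat", data=df).fit(disp=False)

# GEE with an exchangeable working correlation accounts for it.
gee = smf.gee("y ~ treat", groups="cluster", data=df,
              family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()

print("logistic SE(treat):", logit.bse["treat"])
print("GEE      SE(treat):", gee.bse["treat"])
```

With few, large, correlated clusters the naive logistic standard error is typically much smaller than the GEE one, which is the inferential failure noted above.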
392. Applications of Markov chains to reliability of long-haul communication systems. Huang, Yilan, January 1995.
This thesis applies Markov chain methods to evaluate the reliability of multipath switching networks and optically amplified systems in long-haul communication. When conventional methods proved impractical for these systems, practical methods based on Markov chains were developed by Whitmore and others (1987, 1988, 1991) and used successfully to evaluate their reliability. This work describes the details of the Markov chain methods for reliability calculation and demonstrates their application to multipath switching networks and optically amplified systems.
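As a generic illustration of this kind of calculation (the states, failure and repair rates below are hypothetical and not taken from the thesis), a small redundant system can be modelled as a continuous-time Markov chain, with its availability obtained from the matrix exponential of the generator.

```python
# Generic CTMC availability sketch; rates and states are hypothetical.
import numpy as np
from scipy.linalg import expm

lam, mu = 0.002, 0.1   # assumed failure and repair rates (per hour)

# States: 0 = both paths up, 1 = one path failed, 2 = both paths failed (system down).
Q = np.array([
    [-2 * lam,      2 * lam,     0.0],
    [      mu, -(mu + lam),      lam],
    [     0.0,       2 * mu, -2 * mu],
])

p0 = np.array([1.0, 0.0, 0.0])          # start with both paths working
for t in (100.0, 1000.0, 10000.0):      # hours
    pt = p0 @ expm(Q * t)               # state distribution at time t
    print(f"t={t:>7.0f} h  availability={pt[0] + pt[1]:.4f}")
```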
393. Analysis of prostate specific antigen "trajectories": statistical challenges. Turner, Elizabeth, January 2002.
Prostate specific antigen (PSA) is a biochemical marker used to monitor prostate cancer following treatment. We have analyzed serial PSA data for a cohort of men who underwent radical surgery for prostate cancer in the early 1990s. We first describe the development of a statistical model that reflects the characteristics of these data. Via a mixture model with constrained parameters, it accommodates two sub-populations of men, those who are and those who are not cured, along with the biologic pattern in the latter. We then describe how we fit this model using Gibbs sampling, and how the output can be converted into parameters of interest to patients and their physicians.
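The sketch below shows the general mechanics of Gibbs sampling for a two-component mixture, alternately drawing the latent group labels and the parameters. It is a deliberately simplified stand-in (a plain normal mixture with known variance and simulated data), not the constrained PSA trajectory model developed in the thesis.

```python
# Minimal Gibbs sampler for a two-component normal mixture (known unit variance);
# label switching is ignored, and the data are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 150),     # e.g. low, stable values
                    rng.normal(3.0, 1.0, 50)])     # elevated values
n = x.size

mu = np.array([-1.0, 1.0])      # initial component means
w0 = 0.5                        # initial mixing proportion for component 0
draws = []

for it in range(2000):
    # 1) Sample latent component labels given current parameters.
    like0 = w0 * np.exp(-0.5 * (x - mu[0]) ** 2)
    like1 = (1 - w0) * np.exp(-0.5 * (x - mu[1]) ** 2)
    z = rng.random(n) < like1 / (like0 + like1)       # True -> component 1

    # 2) Sample parameters given the labels (N(0, 10^2) prior on each mean).
    for k, idx in enumerate([~z, z]):
        nk = idx.sum()
        post_var = 1.0 / (nk + 1.0 / 100.0)
        post_mean = post_var * x[idx].sum()
        mu[k] = rng.normal(post_mean, np.sqrt(post_var))
    w0 = rng.beta(1 + (~z).sum(), 1 + z.sum())        # Beta(1, 1) prior

    if it >= 500:                                     # discard burn-in
        draws.append((mu[0], mu[1], w0))

print("posterior means of (mu0, mu1, w0):", np.mean(draws, axis=0))
```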
394. Omitting a strong covariate from Cox's proportional hazards model. Ishak, Khajak, January 2001.
The primary objective of this thesis is to explore the effect of omitting a strong prognostic factor from Cox's proportional hazards model when analyzing data from randomized trials. The secondary objective is to provide an overview of the Cox model. We first present the properties of the model, the method of maximum partial likelihood, and the elements available for drawing inferences about the parameters. We then discuss methods for assessing the tenability of the proportional hazards assumption, as well as ways to incorporate non-proportional hazards in the model.

In the third and final chapter, we address the primary objective of the thesis. In linear regression analysis, unbiased estimates of the effect of the intervention can be obtained even when important but balanced determinants of the outcome are omitted from the model; the precision of the estimates is improved, however, by including strong covariates. The logistic and Cox regression (and other non-linear) models do not share this property. We discuss the literature on this topic and provide examples to illustrate the problem. We examine the situation for the Cox model in more detail through an analysis of data from an experiment on the effect of increased sexual activity on the longevity of male fruitflies.
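The attenuation described here is easy to reproduce in a small simulation. The sketch below (simulated data, not the fruitfly experiment, and using the lifelines package rather than any software from the thesis) fits a Cox model with and without a strong, balanced covariate and compares the estimated treatment coefficients.

```python
# Omitting a strong, balanced covariate from a Cox model attenuates the treatment
# coefficient toward zero; simulated data only.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 5000
treat = rng.integers(0, 2, n)               # randomized, so balanced by design
z = rng.normal(size=n)                      # strong prognostic covariate
rate = 0.1 * np.exp(0.5 * treat + 1.5 * z)  # true hazard rates
time = rng.exponential(1.0 / rate)
event = (time < 20).astype(int)             # administrative censoring at t = 20
time = np.minimum(time, 20)

df = pd.DataFrame({"T": time, "E": event, "treat": treat, "z": z})

full = CoxPHFitter().fit(df, duration_col="T", event_col="E")
omit = CoxPHFitter().fit(df.drop(columns="z"), duration_col="T", event_col="E")

print("treat coefficient, z included:", full.params_["treat"])
print("treat coefficient, z omitted: ", omit.params_["treat"])   # attenuated
```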
395. Modeling heterogeneity of capture probabilities in capture-recapture studies. Melocco, Marie, January 2002.
In capture-recapture studies, the estimate of the population size is biased when members differ in their probability of capture by any source. Log-linear models that use covariate information can account for this heterogeneity among individuals. We compare different methods of modeling this observable heterogeneity using data from the Auckland Leg Ulcer Study, 1997-1998. One method is based on the Coefficient of Source Dependence (CSD), the other on logistic regression. We find that both methods reduce the bias of the estimate and we recommend their use.
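For a flavour of the covariate-adjusted log-linear approach (a generic sketch with invented counts, not the Auckland Leg Ulcer Study data, and a standard Poisson log-linear formulation rather than the CSD or logistic-regression methods compared in the thesis), one can model the observable capture histories within strata of a covariate and predict the unobservable "missed by both sources" cell.

```python
# Generic two-source log-linear capture-recapture sketch with one covariate;
# all counts are hypothetical.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Observed capture-history counts by age stratum; the (A=0, B=0) cell is unobservable.
obs = pd.DataFrame({
    "A":     [1, 1, 0, 1, 1, 0],
    "B":     [1, 0, 1, 1, 0, 1],
    "young": [1, 1, 1, 0, 0, 0],
    "count": [30, 45, 25, 50, 60, 40],
})

model = smf.glm("count ~ A + B + young", data=obs,
                family=sm.families.Poisson()).fit()

# Predict the missed cell in each stratum and add it to the observed total.
missed = pd.DataFrame({"A": [0, 0], "B": [0, 0], "young": [1, 0]})
n_hat = obs["count"].sum() + model.predict(missed).sum()
print("estimated population size:", round(n_hat))
```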
396. Dynamic Documents for Data Analytic Science. Becker, Gabriel, 26 March 2015.
The need for reproducibility in computational research has been highlighted by a number of recent failures to replicate published data analytic findings. Most efforts to ensure reproducibility involve providing guarantees that reported results can be generated from the data via the reported methods, with dynamic documents being a popular avenue. Such guarantees are necessary but not sufficient for full validation, as inappropriately chosen methods will simply reproduce questionable results. To fully verify computational research we must replicate analysts' research processes, including the choice of and response to exploratory or intermediate results, the identification of potential analysis strategies and statistical methods, the selection of a single strategy from among those considered, and finally, the generation of reported results using the chosen method.

We present the concept of comprehensive dynamic documents. These documents represent the full breadth of an analyst's work during computational research, including code and text describing intermediate and exploratory computations, alternate methods, and even ideas the analyst had which were not fully pursued. Furthermore, additional information can be embedded in the documents, such as data provenance, experimental design, or details of the computing system on which the work was originally performed. We also propose computational models for representing, processing, and programmatically operating on such documents within R.

These comprehensive documents act as databases, encompassing both the work that the analyst has performed and the relationships among specific pieces of that work. This allows us to investigate research in ways that are difficult or impossible given only a description of the final strategy. We can explore the choice of methods and whether due diligence was performed during an analysis. We can compare alternative strategies either side by side or interactively. Finally, we can treat these complex documents as data about the research process and analyze them programmatically.

We also present a proof-of-concept set of software tools for working with comprehensive dynamic documents. This includes an R package which implements a framework for comprehensive documents in R, an extension of the IPython Notebook platform which allows users to author and interactively view them, and a caching mechanism which provides the efficiency necessary for interactive, self-updating views of such documents.
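Purely as a hypothetical illustration of the idea, and not the R framework or IPython Notebook extension actually built in this work, a comprehensive document can be pictured as a tree of analysis nodes in which abandoned alternatives are stored next to the chosen strategy and can be queried later. The class and field names below are invented.

```python
# Hypothetical sketch: analysis steps stored as nodes, with alternative strategies
# kept alongside the chosen one so the document can be queried like a database.
from dataclasses import dataclass, field

@dataclass
class AnalysisNode:
    label: str                      # e.g. "fit linear trend"
    code: str                       # the code the analyst ran (or drafted)
    narrative: str = ""             # the analyst's reasoning at this step
    chosen: bool = True             # False for alternatives that were not pursued
    alternatives: list["AnalysisNode"] = field(default_factory=list)

    def walk(self):
        """Yield every node, chosen or not."""
        yield self
        for alt in self.alternatives:
            yield from alt.walk()

root = AnalysisNode(
    label="fit linear trend",
    code="fit1 = smf.ols('y ~ x', data=df).fit()",
    narrative="Started with a simple linear model.",
    alternatives=[AnalysisNode(label="fit spline model",
                               code="fit2 = smf.ols('y ~ bs(x, df=4)', data=df).fit()",
                               chosen=False,
                               narrative="Considered a more flexible fit; not pursued.")],
)
print([n.label for n in root.walk() if not n.chosen])   # inspect roads not taken
```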
397. Modeling Minimal Spanning Trees with Beta Vectors. Liu, Haigang, 28 June 2014.
We examine methods of generating beta random vectors to model the normalized interpoint distances on the minimal spanning tree (MST). Using properties of the univariate beta distribution, we propose three methods for generating multivariate beta vectors. We use overlapping sums of the components of a Dirichlet distribution to construct beta vectors. We investigate the products of beta variables that follow an ordered Dirichlet distribution. The geometric mean of beta variables is explored to produce a multivariate beta distribution. We define a multivariate Gini index for the normalized distances on the MST to measure the amount of scatter in a multivariate sample and the inequity among the interpoint distances. An example shows the MST of 11 European languages with respect to the first 10 numerals. A simulation study compares the parametric bootstrap of the Gini index, the maximum and the range of the interpoint distances with results from modeling the distances on the MST.

Keywords: Minimal Spanning Tree; Multivariate Beta; Dirichlet; Gini index; Lorenz Curve.
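A minimal sketch of the underlying quantities (the exact normalization and the multivariate Gini index defined in the thesis may differ from this simple version) computes the MST of a sample, normalizes its edge lengths, and evaluates a Gini coefficient of those lengths.

```python
# Normalized MST edge lengths of a multivariate sample and a Gini coefficient of them.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(3)
x = rng.normal(size=(50, 4))                    # 50 points in R^4

d = squareform(pdist(x))                        # pairwise Euclidean distances
mst = minimum_spanning_tree(d).toarray()
edges = mst[mst > 0]                            # the n-1 MST edge lengths
w = np.sort(edges / edges.sum())                # normalized interpoint distances

n = w.size
gini = np.sum((2 * np.arange(1, n + 1) - n - 1) * w) / n   # valid since w sums to 1
print(f"{n} edges, Gini of normalized MST distances = {gini:.3f}")
```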
398. Depth Functions, Multidimensional Medians and Tests of Uniformity on Proximity Graphs. Yang, Mengta, 20 May 2014.
We represent d-dimensional random vectors as vertices of a complete weighted graph and propose depth functions that are applicable to distributions in d-dimensional spaces and to data on graphs. We explore proximity graphs and examine their connection to existing depth functions. We define a family of depth functions on the β-skeleton graph, suggest four types of depth functions on the minimal spanning tree (MST), and define depth functions including path depth, path depth of path length at most δ, all-paths probability depth, eccentricity depth, peeling depth and RUNT depth. We study their properties, including affine invariance, maximality at the center, monotonicity and vanishing at infinity. We show that the β-skeleton depth is a family of statistical depth functions and define the sample β-skeleton depth function, which has desirable asymptotic properties, including uniform consistency and asymptotic normality. We consider the corresponding multidimensional medians, investigate their robustness and computational complexity, compare them in a simulation study under different distributions and sample sizes, and explore the asymptotic properties of the β-skeleton median. We generalize the univariate Greenwood and Hegazy-Green statistics using depth-induced order statistics and propose two new test statistics, based on a normal copula and on interpoint distances, for testing multivariate uniformity. We also generalize the path depth to the two-sample setting and propose a new multivariate test of equality of distribution functions. We study their empirical power against several copula and multivariate beta alternatives. The topic is complemented with a discussion of the distribution and moments of the interpoint distances (IDs) between bivariate uniform random vectors and between FGM copula random vectors.
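As one concrete, simplified example of an MST-based notion of centrality (the 1/(1 + eccentricity) form below is only an illustrative assumption, not necessarily the eccentricity depth as defined in the thesis), each point's eccentricity along the tree can be turned into a depth-like score whose maximizer serves as a median candidate.

```python
# Toy MST eccentricity score; the specific depth formula is an assumption for illustration.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path

rng = np.random.default_rng(4)
x = rng.normal(size=(60, 3))

d = squareform(pdist(x))
mst = minimum_spanning_tree(d)                   # sparse representation of the MST
paths = shortest_path(mst, directed=False)       # path lengths along the tree
ecc = paths.max(axis=1)                          # eccentricity of each vertex

depth = 1.0 / (1.0 + ecc)                        # central points get larger depth
deepest = int(np.argmax(depth))
print("most central point (a multidimensional median candidate):", x[deepest])
```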
399. Data mining techniques for constructing jury selection models. Espy, John, 23 April 2014.
Jury selection can determine a case before it even begins. The goal of this work is to predict whether a juror will rule for the plaintiff or the defense in medical malpractice trials, and to identify which variables are significant in predicting this. The data for the analysis were obtained from mock trials that simulated actual trials, with possible arguments from the defense and the plaintiff and ample discussion time. These mock trials were supplemented by surveys that attempted to capture the characteristics and attitudes of each mock juror and the case at hand. The data were modeled using logistic regression as well as decision tree and neural network techniques.
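A sketch of the modeling step with scikit-learn is shown below; the features, outcome, and data are invented stand-ins for the survey variables, not the actual mock-trial data.

```python
# Hypothetical mock-juror features; compares a logistic regression and a decision tree.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
n = 400
X = np.column_stack([
    rng.integers(18, 80, n),          # age
    rng.integers(0, 2, n),            # prior jury service (0/1)
    rng.normal(size=n),               # attitude score toward malpractice suits
])
# Hypothetical outcome: 1 = rules for the plaintiff, 0 = rules for the defense.
p = 1 / (1 + np.exp(-(-0.5 + 0.8 * X[:, 2])))
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("logistic accuracy:", logit.score(X_te, y_te))
print("tree accuracy:    ", tree.score(X_te, y_te))
print("logistic coefficients:", logit.coef_.round(2))
```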
400. Correlation adjusted penalization in regression analysis. Tan, Qi Er, 25 September 2012.
This PhD thesis introduces two new types of correlation adjusted penalization methods to address multicollinearity in regression analysis. The main purpose is to achieve simultaneous shrinkage of parameter estimators and variable selection for multiple linear regression and logistic regression when the predictor variables are highly correlated. The motivation is that when multicollinearity is severe, the variances of the parameter estimators become very large; the new correlation adjusted penalization methods shrink the parameter estimators and their variances to alleviate this problem. An important recent approach to multicollinearity is to apply penalization methods for simultaneous shrinkage and variable selection, with ridge, bridge, LASSO, SCAD, and OSCAR being popular in the literature. Few papers have used correlation-based penalization methods, and the correlation-based methods in the literature fail when some correlations are exactly 1 or -1, that is, when at least two predictor variables are perfectly correlated. We introduce two new types of correlation adjusted penalization methods that work whether or not the predictor variables are perfectly correlated. These methods are intuitive and innovative. We investigate their important theoretical properties, including bias, mean squared error, data augmentation and asymptotic properties, and plan to apply them to real data sets in the near future.
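To make the general idea concrete (the diagonal penalty matrix below is a made-up illustration, not one of the correlation adjusted penalties proposed in the thesis), a generalized ridge estimator of the form beta_hat = (X'X + lambda * P)^(-1) X'y can use a penalty matrix P built from the predictor correlations, and remains well defined even when two predictors are perfectly correlated.

```python
# Generalized ridge with a correlation-driven penalty matrix; the choice of P is
# an illustrative assumption only.
import numpy as np

rng = np.random.default_rng(6)
n, lam = 200, 5.0

x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

X = (X - X.mean(0)) / X.std(0)             # standardize predictors
y = y - y.mean()

R = np.corrcoef(X, rowvar=False)
# Penalize a coefficient more heavily the more its predictor is correlated with the others.
P = np.diag(1.0 + np.abs(R - np.eye(3)).sum(axis=1))

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_pen = np.linalg.solve(X.T @ X + lam * P, X.T @ y)

print("OLS:      ", beta_ols.round(2))      # unstable under near-collinearity
print("penalized:", beta_pen.round(2))      # shrunken, far smaller variance
```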