About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
31

MULTI-STATE MODELS WITH MISSING COVARIATES

Lou, Wenjie 01 January 2016 (has links)
Multi-state models have been widely used to analyze longitudinal event-history data obtained in medical studies. The tools and methods developed recently in this area require completely observed datasets, yet in many applications measurements on certain components of the covariate vector are missing for some study subjects. In this dissertation, several likelihood-based methodologies are proposed to handle datasets with different types of missing covariates efficiently when applying multi-state models. First, a maximum observed-data likelihood method is proposed for data with a univariate missing pattern in which the missing covariate is categorical. The observed-data likelihood function is constructed from a model for the joint distribution of the longitudinal event-history response and the discrete covariate with missing values. Second, a maximum simulated likelihood method is proposed to handle a missing continuous covariate, with the observed-data likelihood function approximated by Monte Carlo simulation. Last, an EM algorithm is used to handle multiple missing covariates when estimating the parameters of a multi-state model; it can efficiently accommodate multiple missing discrete covariates under a general missing pattern. All the proposed methods are justified by simulation studies and by applications to datasets from the SMART project, a consortium of 11 high-quality longitudinal studies of aging and cognition.
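The second approach above, maximum simulated likelihood, can be sketched in a few lines. The logistic response model and normal covariate model below are illustrative assumptions for a single subject with one missing continuous covariate, not the dissertation's actual multi-state specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulated_loglik(y, theta, mu_x, sigma_x, M=1000):
    """Approximate one subject's observed-data log-likelihood when the
    continuous covariate x is missing, via Monte Carlo integration:
    L(theta) = E_x[ f(y | x, theta) ] ~ (1/M) * sum_m f(y | x_m, theta),
    with x_m drawn from the assumed covariate model N(mu_x, sigma_x^2)."""
    x_draws = rng.normal(mu_x, sigma_x, size=M)
    # Illustrative response model: logistic regression for a binary outcome y.
    p = 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * x_draws)))
    lik_draws = p if y == 1 else 1.0 - p
    return np.log(lik_draws.mean())
```

Maximizing the sum of such terms over `theta` (e.g. with a generic optimizer) yields the maximum simulated likelihood estimate.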
32

CONTINUOUS TIME MULTI-STATE MODELS FOR INTERVAL CENSORED DATA

Wan, Lijie 01 January 2016 (has links)
Continuous-time multi-state models are widely used in modeling longitudinal data on disease processes with multiple transient states, yet the analysis is complex when subjects are observed only periodically, resulting in interval-censored data. Most recent studies have modeled the true disease progression as a discrete-time stationary Markov chain, and only a few have addressed non-homogeneous multi-state models in the presence of interval-censored data. In this dissertation, several likelihood-based methodologies are proposed for interval-censored data in multi-state models. First, a continuous-time version of a homogeneous Markov multi-state model with backward transitions is proposed to handle the uneven follow-up assessments or skipped visits that produce interval censoring. Simulations were used to compare the performance of the proposed model with the traditional discrete-time stationary Markov chain under different observation schemes. We applied both methods to the well-known Nun Study, a longitudinal study of 672 participants aged ≥ 75 years at baseline, each followed with up to ten cognitive assessments. Second, we constructed a non-homogeneous Markov model for this type of panel data. The baseline intensity was assumed to be Weibull distributed to accommodate the non-homogeneity, and risk factors were incorporated into the transition intensities via proportional hazards. Simulation studies showed that the Weibull assumption does not affect the accuracy of the parameter estimates for the risk factors. We applied our model to data from the BRAiNS study, a longitudinal cohort of 531 subjects who were cognitively intact at baseline. Last, we presented a parametric method for fitting semi-Markov models with Weibull transition intensities to interval-censored cognitive data with death as a competing risk.
We relaxed the Markov assumption and took interval censoring into account by integrating out all possible unobserved transitions. The proposed model also allows for time-dependent covariates. We provide a goodness-of-fit assessment by means of prevalence counts. To illustrate the methods, we applied our model to the BRAiNS study.
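The Weibull baseline intensity with proportional-hazards covariate effects described above can be written down directly. The function below is a generic sketch of that standard form, not the dissertation's code; the symbol names are mine:

```python
import numpy as np

def weibull_intensity(t, shape, scale, beta, x):
    """Weibull baseline transition intensity with proportional-hazards
    covariate effects:
        q(t | x) = (shape/scale) * (t/scale)**(shape-1) * exp(beta' x).
    Setting shape == 1 recovers a time-homogeneous (constant) intensity,
    which is how the non-homogeneous model nests the homogeneous one."""
    baseline = (shape / scale) * (t / scale) ** (shape - 1.0)
    return baseline * np.exp(np.dot(beta, x))
```

With `shape = 1` the intensity is constant in `t`, so the homogeneous Markov model of the first part is the special case against which the Weibull fit can be compared.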
33

EMPIRICAL LIKELIHOOD AND DIFFERENTIABLE FUNCTIONALS

Shen, Zhiyuan 01 January 2016 (has links)
Empirical likelihood (EL) is a recently developed nonparametric method of statistical inference. Owen (1988, 1990) and many others have shown that the empirical likelihood ratio (ELR) method can be used to produce well-behaved confidence intervals and regions. Owen (1988) showed that −2 log ELR converges to a chi-square distribution with one degree of freedom under a linear statistical functional constraint in terms of distribution functions. However, generalizing Owen's result to the right-censored data setting is difficult, since no explicit maximization can be obtained under constraints expressed in terms of distribution functions. Pan and Zhou (2002) instead studied EL with right-censored data using a linear statistical functional constraint in terms of cumulative hazard functions. In this dissertation, we extend Owen's (1988) and Pan and Zhou's (2002) results to non-linear but Hadamard-differentiable statistical functional constraints. To this end, we study functionals that are differentiable with respect to hazard functions, and we generalize our results to two-sample problems. Stochastic process and martingale theories are applied to prove the theorems. Confidence intervals based on the EL method are compared with other available methods, and real data analysis and simulations illustrate the proposed theorems, with an application to Gini's absolute mean difference.
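For the uncensored one-sample mean that Owen's (1988) result covers, the −2 log ELR statistic has a classical Lagrange-multiplier construction that can be coded in a few lines. This is a minimal sketch of that textbook case, not the censored-data or Hadamard-differentiable extensions developed in the dissertation:

```python
import numpy as np

def neg2_log_elr_mean(x, mu0, tol=1e-12):
    """-2 log empirical likelihood ratio for H0: E[X] = mu0 (Owen 1988).
    The constrained maximization gives weights p_i = 1/(n*(1 + lam*d_i)),
    d_i = x_i - mu0, with lam solving g(lam) = sum_i d_i/(1 + lam*d_i) = 0;
    under H0 the statistic is asymptotically chi-square with 1 df."""
    d = np.asarray(x, dtype=float) - mu0
    if d.max() <= 0 or d.min() >= 0:
        return np.inf  # mu0 lies outside the convex hull of the data
    # g is strictly decreasing on the feasible interval, so bisect for its root.
    lo = -1.0 / d.max() + 1e-10
    hi = -1.0 / d.min() - 1e-10
    g = lambda lam: np.sum(d / (1.0 + lam * d))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * d))
```

The statistic is zero when `mu0` equals the sample mean (all weights reduce to 1/n) and grows as `mu0` moves toward the edge of the data's convex hull.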
34

PARAMETRIC ESTIMATION IN COMPETING RISKS AND MULTI-STATE MODELS

Lin, Yushun 01 January 2011 (has links)
Research on Alzheimer's disease typically involves a series of cognitive states. Multi-state models are often used to describe the evolution of the disease, and competing risks models are a sub-category of multi-state models with one starting state and several absorbing states. Analyses of competing risks data in medical papers frequently assume independent risks and evaluate covariate effects by fitting a separate proportional hazards regression model for each event. Jeong and Fine (2007) proposed a parametric proportional sub-distribution hazards (SH) model for cumulative incidence functions (CIFs) that requires no assumptions about the dependence among the risks. We modified their model to ensure that the sum of the underlying CIFs never exceeds one, by assuming a proportional SH model for dementia only in the Nun Study. To accommodate left-censored data, we computed the non-parametric MLE of the CIF via an Expectation-Maximization algorithm. Our proposed parametric model was applied to the Nun Study to investigate the effects of genetics and education on the occurrence of dementia. After including left-censored dementia subjects, the incidence rate of dementia becomes larger than that of death for ages below 90, education becomes a significant factor for the incidence of dementia, and the standard errors of the estimates are smaller. The multi-state Markov model is often used to analyze the evolution of cognitive states under the assumption of time-independent transition intensities. We consider both constant and duration-dependent transition intensities for the BRAiNS data, leading to a mixture of Markov and semi-Markov processes. The joint probability of remaining in the same state until a transition in a semi-Markov process is expressed as the product of the overall transition probability and a survival probability, which are modeled simultaneously.
This modeling leads to different interpretations in the BRAiNS study: family history, APOE4, and the sex-by-head-injury interaction are significant factors for the transition intensities in the traditional Markov model, whereas in our semi-Markov model these factors are significant in predicting the overall transition probabilities but none is significant for the duration time distribution.
35

STOCHASTIC DYNAMICS OF GENE TRANSCRIPTION

Xie, Yan 01 January 2011 (has links)
Gene transcription in individual living cells is inevitably a stochastic and dynamic process. Little is known about how cells and organisms learn to balance the fidelity of transcriptional control against the stochasticity of transcription dynamics. In an effort to elucidate the contribution of environmental signals to this intricate balance, a Three-State Model was recently proposed in which the transcription system transitions randomly among three different functional states. In this work, we employ this model to demonstrate how the stochastic dynamics of gene transcription can be characterized by the three transition parameters. We compute the probability distribution of a zero-transcript event and its conjugate, the distribution of the dwell times in the gene-on and gene-off periods, the transition frequency between system states, and the transcriptional bursting frequency. We also illustrate the mathematical results with experimental data on prokaryotic and eukaryotic transcription. The analysis reveals that no promoter is certain to turn on and transcribe within a finite time period, no matter how strong the applied induction signals or how abundant the available activators. Although stronger extrinsic signals can enhance the promoter activation rate, the promoter imposes an intrinsic ceiling that no signal can cross in finite time. Consequently, among a large population of isogenic cells, only a portion of the cells, not the whole population, can be induced by environmental signals to express a particular gene within a finite time period. We prove that the gene-on duration follows an exponential distribution, and that the gene-off intervals show a local maximum best described by two sequential exponential processes. The transition frequencies are determined by a system of stochastic differential equations or, equivalently, by an iterative scheme of integral operators.
We prove that for each positive integer n there is a unique time, called the peak instant, at which the nth transcript synthesis cycle since time zero is most likely to occur. These moments constitute a time series preserving the natural order of n.
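The dwell-time claims above — exponential gene-on durations, and gene-off intervals behaving like two sequential exponential phases (a hypoexponential, whose density has an interior maximum) — are easy to check by simulation. The rates below are arbitrary illustrative values, not parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_cycle(rate_on_off, rate1, rate2, n=100_000):
    """Simulate dwell times under the dissertation's distributional claims:
    gene-on durations are exponential with rate `rate_on_off`, while each
    gene-off interval is the sum of two sequential exponential phases with
    rates `rate1` and `rate2` (a hypoexponential distribution)."""
    on = rng.exponential(1.0 / rate_on_off, n)
    off = rng.exponential(1.0 / rate1, n) + rng.exponential(1.0 / rate2, n)
    return on, off
```

The simulated on-times have mean 1/rate_on_off, while the off-times have mean 1/rate1 + 1/rate2 and, unlike an exponential, a density that vanishes at zero and peaks at an interior point.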
36

Analysis of Binary Data via Spatial-Temporal Autologistic Regression Models

Wang, Zilong 01 January 2012 (has links)
Spatial-temporal autologistic models are useful for binary data measured repeatedly over time on a spatial lattice. They can account for the effects of potential covariates and for spatial-temporal statistical dependence among the data. However, the traditional parametrization of the spatial-temporal autologistic model makes model parameters difficult to interpret across varying levels of statistical dependence, because its non-negative autocovariates bias the realizations toward 1. To achieve interpretable parameters, a centered spatial-temporal autologistic regression model has been developed. Two efficient statistical inference approaches are proposed, an expectation-maximization pseudo-likelihood approach (EMPL) and a Monte Carlo expectation-maximization likelihood approach (MCEML), and Bayesian inference is also considered and studied. The performance and efficiency of these three inference approaches are examined across various sizes of sampling lattices and numbers of sampling time points, through both a simulation study and a real data example. In addition, we consider the imputation of missing values for spatial-temporal autologistic regression models. Most existing imputation methods are not suitable for spatial-temporal missing values, because they can disrupt the inherent structure of the data and lead to serious bias in inference or to computational efficiency problems. Two imputation methods are proposed, iteration-KNN imputation and maximum entropy imputation; both are relatively simple and yield reasonable results. In summary, the main contributions of this dissertation are the development of a spatial-temporal autologistic regression model with centered parameterization and the proposal of EMPL, MCEML, and Bayesian inference for estimating the model parameters.
The iteration-KNN and maximum entropy imputation methods presented for spatial-temporal missing data generate reliable imputed values within reasonable computation time.
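The centered parameterization mentioned above has a simple conditional form: the autocovariate sums deviations of neighbors from their independence expectations rather than raw neighbor values. The sketch below is a generic illustration of that idea (function and argument names are mine, not the dissertation's):

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def centered_conditional_prob(x_beta_i, eta, neighbor_y, neighbor_x_beta):
    """Conditional success probability under the centered parameterization:
        logit P(y_i = 1 | neighbors) = x_i'beta + eta * sum_j (y_j - mu_j),
    with mu_j = expit(x_j'beta) the expectation under independence.
    Centering the autocovariate around mu_j keeps beta interpretable as
    the large-scale covariate effect even when the dependence eta != 0."""
    mu = expit(np.asarray(neighbor_x_beta, dtype=float))
    auto = eta * np.sum(np.asarray(neighbor_y, dtype=float) - mu)
    return expit(x_beta_i + auto)
```

In the uncentered model the corresponding term is `eta * sum_j y_j`, which is non-negative and pushes realizations toward 1; the centered deviations remove that asymmetry.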
37

Analysis of Spatial Data

Zhang, Xiang 01 January 2013 (has links)
In many areas of the agricultural, biological, physical, and social sciences, spatial lattice data are becoming increasingly common. Moreover, many lattice datasets exhibit not only a visible spatial pattern but also a temporal pattern (see Zhu et al. 2005). An interesting problem is to develop a model that systematically relates the response variable to possible explanatory variables while accounting for space and time effects simultaneously. The spatial-temporal linear model and the corresponding likelihood-based statistical inference are important tools for the analysis of spatial-temporal lattice data. We propose a general asymptotic framework for spatial-temporal linear models and investigate the properties of maximum likelihood estimates under this framework. Mild regularity conditions on the spatial-temporal weight matrices are imposed in order to derive the asymptotic properties (consistency and asymptotic normality) of the maximum likelihood estimates, and a simulation study examines their finite-sample properties. For spatial data, aside from traditional likelihood-based methods, a substantial literature has discussed Bayesian approaches to estimating the correlation (auto-covariance function) among spatial data; in particular, Zheng et al. (2010) proposed a nonparametric Bayesian approach to estimating a spectral density. We also discuss a nonparametric Bayesian approach to analyzing spatial data: we propose a general procedure for constructing a multivariate Feller prior and establish its theoretical properties as a nonparametric prior. A blocked Gibbs sampling algorithm is also proposed for computation, since the posterior distribution is analytically manageable.
38

Developing An Alternative Way to Analyze NanoString Data

Shen, Shu 01 January 2016 (has links)
NanoString technology provides a new method for measuring gene expression. It is more sensitive than microarrays and can measure more genes than RT-PCR at similar sensitivity. The system produces and tabulates counts for each target gene; counts can be normalized using an Excel macro or nSolver before analysis, and both approaches rely on normalization prior to statistical analysis to identify differentially expressed genes. Alternatively, we propose to model gene expression as a function of the positive-control and reference-gene measurements. Simulations and examples are used to compare this model with the NanoString normalization methods. The results show that our model is more stable, more efficient, and better able to control the false positive proportion. In addition, we derive asymptotic properties of a normalized test of control versus treatment.
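The proposed alternative to normalize-then-test can be sketched as a regression of each gene's log counts on the control measurements. The design below (intercept plus one positive-control and one reference-gene summary, fit by ordinary least squares) is an illustrative simplification, not the dissertation's exact model:

```python
import numpy as np

def fit_control_model(log_counts, log_pos_controls, log_ref_genes):
    """Model a gene's log expression as a linear function of positive-control
    and reference-gene measurements, instead of normalizing the counts first.
    Returns the OLS coefficients (intercept, positive-control slope,
    reference-gene slope). Variable names are illustrative."""
    X = np.column_stack([np.ones_like(log_pos_controls),
                         log_pos_controls, log_ref_genes])
    coef, *_ = np.linalg.lstsq(X, log_counts, rcond=None)
    return coef
```

Treatment effects can then be tested as additional terms in the same regression, so the control adjustment and the comparison of interest share one model.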
39

INFERENCE USING BHATTACHARYYA DISTANCE TO MODEL INTERACTION EFFECTS WHEN THE NUMBER OF PREDICTORS FAR EXCEEDS THE SAMPLE SIZE

Janse, Sarah A. 01 January 2017 (has links)
In recent years, statistical analyses, algorithms, and modeling of big data have been constrained by computational complexity. Further, the added complexity of relationships among response and explanatory variables, such as higher-order interaction effects, makes identifying predictors with standard statistical techniques difficult. These difficulties are only exacerbated by the small sample sizes of some studies. Recent analyses have targeted the identification of interaction effects in big data, but the development of methods to identify higher-order interaction effects has been limited by computational concerns. One recently studied method is the Feasible Solutions Algorithm (FSA), a fast, flexible method that aims to find a set of statistically optimal models via a stochastic search. One current limitation of FSA is that the user must choose the number of times to run the algorithm; here, statistical guidance is provided for this number of iterations by deriving a lower bound on the probability of obtaining the statistically optimal model within a given number of FSA iterations. Moreover, logistic regression is severely limited when two predictors can perfectly separate the two outcomes. With small sample sizes this occurs quite often by chance, especially when there are many predictors. The Bhattacharyya distance is proposed as an alternative method to address this limitation. However, little is known about the theoretical properties or the distribution of this distance, so its properties and distribution are derived here, and a hypothesis test and confidence interval are developed and tested on both simulated and real data.
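For two Gaussian classes, the Bhattacharyya distance mentioned above has a standard closed form, shown below for the univariate case. This is the textbook definition, offered as context; the dissertation's test statistic and its distribution theory are beyond this sketch:

```python
import math

def bhattacharyya_normal(mu1, var1, mu2, var2):
    """Bhattacharyya distance between N(mu1, var1) and N(mu2, var2):
        D_B = (mu1 - mu2)^2 / (4*(var1 + var2))
              + 0.5 * ln( (var1 + var2) / (2*sqrt(var1*var2)) ).
    D_B = 0 iff the distributions coincide, and unlike logistic regression
    it remains well defined when the two classes are perfectly separated."""
    vs = var1 + var2
    return ((mu1 - mu2) ** 2 / (4.0 * vs)
            + 0.5 * math.log(vs / (2.0 * math.sqrt(var1 * var2))))
```

The first term grows with mean separation and the second with variance mismatch, which is what makes the distance usable as a separation measure even on perfectly separable samples.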
40

ACCOUNTING FOR MATCHING UNCERTAINTY IN PHOTOGRAPHIC IDENTIFICATION STUDIES OF WILD ANIMALS

Ellis, Amanda R. 01 January 2018 (has links)
I consider statistical modelling of data gathered by photographic identification in mark-recapture studies and propose a new method that incorporates the inherent uncertainty of photographic identification into the estimation of abundance, survival, and recruitment. A hierarchical model is proposed that takes as data the scores assigned to pairs of photographs by pattern recognition algorithms and allows for uncertainty in matching photographs based on these scores. The new models incorporate latent capture histories treated as unknown random variables informed by the data, in contrast to past models in which the capture histories were fixed. The methods properly account for uncertainty in the matching process and avoid the need for researchers to confirm matches visually, which can be a time-consuming and error-prone process. Through simulation, and through application to data from a photographic identification study of whale sharks, I show that the proposed method produces estimates similar to those obtained when the true matches between photographs are known. I then extend the method to incorporate auxiliary information that predetermines matches and non-matches between pairs of photographs, reducing computation time when fitting the model. Additionally, I borrow methods previously applied to record linkage problems in survey statistics to predetermine matches and non-matches based on scores deemed extreme. I fit the new models in the Bayesian paradigm via Markov chain Monte Carlo, using custom code that is available by request.
