1

Aspects of Composite Likelihood Estimation And Prediction

Xu, Ximing 08 January 2013 (has links)
A composite likelihood is usually constructed by multiplying a collection of lower-dimensional marginal or conditional densities. In recent years, composite likelihood methods have received increasing interest for modeling complex data arising from various application areas, where the full likelihood function is analytically unknown or computationally prohibitive due to the dependence structure, the dimension of the data, or the presence of nuisance parameters. In this thesis we investigate some theoretical properties of the maximum composite likelihood estimator (MCLE). In particular, we obtain the limit of the MCLE in a general setting, and set out a framework for understanding the notion of robustness in the context of composite likelihood inference. We also study improving the efficiency of a composite likelihood by incorporating additional component likelihoods, or by using component likelihoods of higher dimension. We show through illustrative examples that such strategies do not always work and may impair efficiency. We also show that the MCLE of the parameter of interest can be less efficient when the nuisance parameters are known than when they are unknown. In addition to the theoretical study of composite likelihood estimation, we explore the possibility of using composite likelihood to make predictive inference in computer experiments. The Gaussian process model is widely used to build statistical emulators for computer experiments. However, when the number of trials is large, both estimation and prediction based on a Gaussian process can be computationally intractable because of the size of the covariance matrix. To address this problem, we propose prediction methods based on different composite likelihood functions, which do not require evaluation of the large covariance matrix and hence alleviate the computational burden. Simulation studies show that the blockwise composite likelihood-based predictors perform well and are competitive with the optimal predictor based on the full likelihood.
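To make the computational idea behind blockwise prediction concrete, the sketch below fits an independent Gaussian process to each block of training data and combines the per-block predictive distributions by precision weighting, so the full n-by-n covariance matrix is never formed or factorized. This is a minimal sketch of one blockwise strategy, not the specific composite likelihood predictors developed in the thesis; the squared-exponential kernel, the contiguous block assignment, and the precision-weighted combination rule are assumptions made purely for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def blockwise_gp_predict(X, y, x_star, n_blocks=10, noise=1e-4):
    """Predict at x_star by fitting an independent GP to each block of the
    training data and combining block predictions by precision weighting.
    Only n_blocks small covariance matrices are ever factorized."""
    blocks = np.array_split(np.arange(len(X)), n_blocks)
    prec_sum, weighted_mean = 0.0, 0.0
    for idx in blocks:
        K = rbf_kernel(X[idx], X[idx]) + noise * np.eye(len(idx))
        k_star = rbf_kernel(X[idx], x_star[None, :])              # (m, 1)
        mu_b = float(k_star.T @ np.linalg.solve(K, y[idx]))       # block predictive mean
        var_b = float(rbf_kernel(x_star[None, :], x_star[None, :])
                      - k_star.T @ np.linalg.solve(K, k_star)) + noise
        prec_sum += 1.0 / var_b
        weighted_mean += mu_b / var_b
    return weighted_mean / prec_sum, 1.0 / prec_sum               # combined mean and variance

# Example usage on synthetic data:
# rng = np.random.default_rng(0)
# X = rng.uniform(0, 10, size=(2000, 1))
# y = np.sin(X[:, 0]) + 0.01 * rng.standard_normal(2000)
# mean, var = blockwise_gp_predict(X, y, x_star=np.array([5.0]))
```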
2

Longitudinal Data Analysis with Composite Likelihood Methods

Li, Haocheng January 2012 (has links)
Longitudinal data arise commonly in many fields, including public health studies and survey sampling. Valid inference methods for longitudinal data are of great importance in scientific research. In longitudinal studies, data collection is often designed to capture all information of interest on individuals at scheduled times. The analysis usually focuses on how the data change over time and how they are associated with certain risk factors or covariates. Various statistical models and methods have been developed over the past few decades. However, these methods can become invalid when the data possess additional features. First, incompleteness of data presents considerable complications to standard modeling and inference methods. Although one hopes that each individual completes all of the scheduled measurements, missing observations occur commonly in longitudinal studies. It has been documented that biased results can arise if such a feature is not properly accounted for in the analysis. There is a large body of methods in the literature on handling missingness arising from either response components or covariate variables, but relatively little attention has been directed to addressing missingness in both responses and covariates simultaneously. The sparsity of research on this topic may be attributed to the substantially increased complexity of modeling and the associated computational difficulties. In Chapters 2 and 3 of the thesis, I develop methods to handle incomplete longitudinal data using the pairwise likelihood formulation. The proposed methods can handle longitudinal data with missing observations in both response and covariate variables. A unified framework is invoked to accommodate various types of missing data patterns. The performance of the proposed methods is carefully assessed under a variety of circumstances; in particular, issues of efficiency and robustness are investigated. Longitudinal survey data from the National Population Health Study are analyzed with the proposed methods. The other difficulty in longitudinal data analysis is model selection. Incorporating a large number of irrelevant covariates into the model may result in computational, interpretational, and predictive difficulties, so selecting parsimonious models is typically desirable. The penalized likelihood method is commonly employed for this purpose. However, applying the penalized likelihood approach in longitudinal studies may involve high-dimensional integrals which are computationally expensive. We propose an alternative method using the composite likelihood formulation. Formulating a composite likelihood requires only a partial structure of the correlated data, such as marginal or pairwise distributions. This strategy offers modeling tractability and low computational cost in model selection. Therefore, in Chapter 4 of this thesis, I propose a penalized composite likelihood approach to handle the model selection issue. In practice, the model selection problem arises not only in choosing appropriate covariates for the regression predictor, but also in choosing the random-effects components. Furthermore, the specification of the random-effects distribution can be crucial to maintaining the validity of statistical inference. Thus, the selection of both covariates and random effects, as well as misspecification of random effects, is also discussed in Chapter 4.
Chapter 5 of this thesis addresses the joint features of missingness and model selection. I propose a specific composite likelihood method to handle this issue. A key advantage of the approach is that the inference procedure involves neither explicit assumptions about the missingness process nor estimation of nuisance parameters.
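For orientation, a pairwise likelihood of the kind referred to above can be written in a standard form, and the penalized composite likelihood objective shown alongside it is one common way of combining such a likelihood with a variable-selection penalty. Both displays are generic textbook forms stated under assumed notation (subjects i = 1, ..., n with measurements y_{i1}, ..., y_{im_i}, regression coefficients beta_j, and a penalty p_lambda such as the LASSO or SCAD); they are not claimed to be the exact formulations used in the thesis.

```latex
% Pairwise (composite) log-likelihood over subjects and measurement pairs
\ell_{\mathrm{pair}}(\theta) \;=\; \sum_{i=1}^{n} \sum_{j<k} \log f\!\left(y_{ij}, y_{ik}; \theta\right)

% Penalized composite likelihood objective for covariate selection
% (penalty p_lambda assumed, e.g. LASSO or SCAD)
\ell_{\mathrm{pair}}(\theta) \;-\; n \sum_{j=1}^{q} p_{\lambda}\!\left(\lvert \beta_{j} \rvert\right)
```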
3

Copula Models for Multi-type Life History Processes

Diao, Liqun January 2013 (has links)
This thesis considers statistical issues in the analysis of data from studies of chronic disease, which involve modeling dependencies between life history processes using copula functions. Many disease processes feature recurrent events arising from an underlying chronic condition; these are often modeled as point processes. Often, however, a random variable is also realized upon the occurrence of each event; this variable is called a mark of the point process, and together such processes are called marked point processes. A novel copula model for the marked point process is described here, which uses copula functions to govern the association between marks and event times. Specifically, a copula function is used to link each mark with the next event time following the realization of that mark, to reflect the pattern in the data wherein larger marks are often followed by longer times to the next event. The extent of organ damage in an individual can often be characterized by ordered states, and interest frequently lies in modeling the rates at which individuals progress through these states. Risk factors can be studied and the effect of therapeutic interventions assessed using relevant multistate models. When chronic diseases affect multiple organ systems, joint modeling of progression in several organ systems is also important. In contrast to common intensity-based or frailty-based approaches to modeling, this thesis considers a copula-based framework for modeling and analysis. Through decomposition of the density and use of conditional independence assumptions, an appealing joint model is obtained by assuming that the joint survival function of absorption transition times is governed by a multivariate copula function. Different approaches to estimation and inference are discussed and compared, including composite likelihood and two-stage estimation methods. Special attention is paid to the case of interval-censored data arising from intermittent assessment. Attention is also directed to the use of copula models in more general settings, with a focus on semiparametric two-stage estimation procedures. In this approach, nonparametric or semiparametric estimates of the marginal survivor functions are obtained in the first stage, and estimates of the association parameters are obtained in the second stage. Bivariate failure time models are considered for data under right-censoring and current status observation schemes, along with right-censored multistate models. A new expression for the asymptotic variance of the second-stage estimator of the association parameter, along with a way of estimating it in finite samples, is presented under these models and observation schemes.
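As a point of reference, the copula formulation described above links marginal survivor functions through a copula. In the bivariate case with survivor margins S_1 and S_2 and a copula C_theta, and under the two-stage strategy of plugging first-stage marginal estimates into a second-stage pseudo-likelihood for the association parameter, the standard (uncensored) forms are sketched below. The bivariate, fully observed case and the notation are assumptions for illustration; censoring and the multistate structure studied in the thesis modify the likelihood contributions.

```latex
% Joint survivor function induced by a copula C_theta with survivor margins S_1, S_2
S(t_1, t_2) \;=\; P(T_1 > t_1,\; T_2 > t_2) \;=\; C_{\theta}\!\left(S_1(t_1),\, S_2(t_2)\right)

% Two-stage estimation (uncensored sketch): estimate the margins in stage one,
% then maximize the pseudo-log-likelihood in theta with the margins plugged in
\hat{\theta} \;=\; \arg\max_{\theta} \sum_{i=1}^{n}
  \log c_{\theta}\!\left(\hat{S}_1(t_{1i}),\, \hat{S}_2(t_{2i})\right),
\qquad
c_{\theta}(u, v) \;=\; \frac{\partial^{2} C_{\theta}(u, v)}{\partial u\, \partial v}
```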
4

Essays in panel data and financial econometrics

Pakel, Cavit January 2012 (has links)
This thesis is concerned with volatility estimation using financial panels and with bias reduction in non-linear dynamic panels in the presence of dependence. Traditional GARCH-type volatility models require long time series for accurate estimation. This makes it impossible to analyse some interesting datasets which do not have a long enough history of observations. This study contributes to the literature by introducing the GARCH Panel model, which exploits both time-series and cross-section information in order to make up for this lack of time-series variation. It is shown that this approach leads to gains both in- and out-of-sample, but suffers from the well-known incidental parameter issue and therefore cannot deal with short data either. As a response, a bias-correction approach valid for a general class of models beyond GARCH is proposed. This extends the analytical bias-reduction literature to cross-section dependence and is a theoretical contribution to the panel data literature. In the final chapter, these two contributions are combined to develop a new approach to volatility estimation in short panels. Simulation analysis reveals that this approach is capable of removing a substantial portion of the bias even when only 150-200 observations are available, in stark contrast with standard methods, which require 1,000-1,500 observations for accurate estimation. This approach is used to model monthly hedge fund volatility, which is another novel contribution, as it has hitherto been impossible to analyse hedge fund volatility due to funds' typically short histories. The analysis reveals that hedge funds exhibit variation in their volatility characteristics both across and within investment strategies. Moreover, the sample distributions of fund volatilities are asymmetric, have large right tails, and react to major economic events such as the recent credit crunch.
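For concreteness, one plausible reading of a panel GARCH(1,1)-type specification, in which each fund or asset i has its own intercept while the dynamic parameters are pooled across the cross-section, is sketched below; it is this unit-specific intercept that creates the incidental parameter problem when the time dimension is short. The exact parameterization used in the thesis is not given in the abstract, so the display is an assumed illustration rather than the author's model.

```latex
% Panel GARCH(1,1)-type recursion (illustrative form, not necessarily the thesis's)
y_{it} \;=\; \sigma_{it}\,\varepsilon_{it}, \qquad \varepsilon_{it} \sim (0, 1),
\qquad
\sigma_{it}^{2} \;=\; \omega_{i} \;+\; \alpha\, y_{i,t-1}^{2} \;+\; \beta\, \sigma_{i,t-1}^{2}
% omega_i: unit-specific (incidental) parameter; (alpha, beta): pooled across units
```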
5

Aspects of Composite Likelihood Inference

Jin, Zi 07 March 2011 (has links)
A composite likelihood consists of a combination of valid likelihood objects; in particular, it is typically of interest to adopt lower-dimensional marginal likelihoods. Composite marginal likelihood appears to be an attractive alternative for modeling complex data, and has received increasing attention for handling high-dimensional data sets when the joint distribution is computationally difficult to evaluate, or intractable due to a complex dependence structure. We present some aspects of methodological development in composite likelihood inference. The resulting estimator enjoys desirable asymptotic properties such as consistency and asymptotic normality. Composite likelihood based test statistics and their asymptotic distributions are summarized. Higher-order asymptotic properties of the signed composite likelihood root statistic are explored. Moreover, we compare the accuracy and efficiency of composite likelihood estimation relative to estimation based on the ordinary likelihood. Analytical and simulation results are presented for different models, including multivariate normal distributions, time series models, and correlated binary data.
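The asymptotic results mentioned above are usually stated in terms of the Godambe (sandwich) information. In standard notation from the composite likelihood literature, with a composite log-likelihood built from weighted component log-likelihoods, the maximum composite likelihood estimator satisfies the displays below; these are generic forms, with the weights w_k and the components taken as given rather than drawn from the thesis.

```latex
% Composite log-likelihood from K component (marginal or conditional) log-likelihoods
\ell_{C}(\theta; y) \;=\; \sum_{k=1}^{K} w_{k}\, \log f_{k}(y; \theta)

% Asymptotic normality via the Godambe (sandwich) information
\sqrt{n}\,\bigl(\hat{\theta}_{CL} - \theta_{0}\bigr) \;\xrightarrow{d}\;
  N\!\bigl(0,\; G(\theta_{0})^{-1}\bigr),
\qquad
G(\theta) \;=\; H(\theta)\, J(\theta)^{-1} H(\theta)
% H: sensitivity matrix E[-\nabla^2 \ell_C];  J: variability matrix Var[\nabla \ell_C]
```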
6

Bayesian analysis for time series of count data

2014 July 1900 (has links)
Time series involving count data are present in a wide variety of applications. In many applications, the observed counts are small and dependent. Failure to take these facts into account can lead to misleading inferences and to the detection of spurious relationships. To tackle such issues, a Poisson parameter-driven model is assumed for the time series at hand. This model can account for the time dependence between observations by introducing an autoregressive latent process. In this thesis, we consider Bayesian approaches for estimating the Poisson parameter-driven model. The main challenge is that the likelihood function for the observed counts involves a high-dimensional integral after integrating out the latent variables. The main contributions of this thesis are threefold. First, I develop a new single-move (SM) Markov chain Monte Carlo (MCMC) method to sample the latent variables one by one. Second, I adapt the particle Gibbs sampler (PGS) method (Andrieu et al., 2010) to our model setting and compare its performance with the SM method. Third, I consider Bayesian composite likelihood methods and compare three different adjustment methods with the unadjusted method and the SM method. The comparisons provide a practical guide to which method to use. We conduct simulation studies to compare the latter two methods with the SM method. We conclude that the SM method outperforms the PGS method for small sample sizes, while the two perform almost the same for large sample sizes; however, the SM method is much faster than the PGS method. The adjusted Bayesian composite likelihood methods provide results closer to the SM method than the unadjusted one. The PGS method and the adjustment method selected from the simulation studies are also compared with the SM method via a real data example, with similar findings: the PGS method provides results very close to those of the SM method, and the adjusted composite likelihood methods again provide results closer to the SM method than the unadjusted one.
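For reference, a standard form of the Poisson parameter-driven model with an autoregressive latent process, of the kind described above, is sketched below; the covariate specification and notation are assumed for illustration, and the high-dimensional integral referred to in the abstract is the integral over the latent path.

```latex
% Poisson parameter-driven model with an AR(1) latent process (generic form)
y_{t} \mid \alpha_{t} \;\sim\; \mathrm{Poisson}(\mu_{t}), \qquad
\log \mu_{t} \;=\; x_{t}^{\top} \beta + \alpha_{t}, \qquad
\alpha_{t} \;=\; \rho\, \alpha_{t-1} + \epsilon_{t}, \quad \epsilon_{t} \sim N(0, \sigma^{2})

% Observed-data likelihood: a T-dimensional integral over the latent path
L(\beta, \rho, \sigma^{2})
  \;=\; \int \prod_{t=1}^{T} p\!\left(y_{t} \mid \alpha_{t}\right)\,
        p\!\left(\alpha_{1}, \ldots, \alpha_{T} \mid \rho, \sigma^{2}\right)
        \, d\alpha_{1} \cdots d\alpha_{T}
```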
