31. ESTIMATION IN PARTIALLY LINEAR MODELS WITH CORRELATED OBSERVATIONS AND CHANGE-POINT MODELS
Fan, Liangdong, 01 January 2018

Methods of estimating parametric and nonparametric components, as well as properties of the corresponding estimators, have been examined in partially linear models by Wahba [1987], Green et al. [1985], Engle et al. [1986], Speckman [1988], Hu et al. [2004], and Charnigo et al. [2015], among others. These models are appealing because of their flexibility and wide range of practical applications, including the electricity usage study of Engle et al. [1986] and the gum disease study of Speckman [1988]; a parametric component explains linear trends while a nonparametric component captures nonlinear relationships.
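
For concreteness, a generic model of this kind can be written as below. This is a standard formulation, with a subject-specific random intercept included to match the repeated-measures setting discussed in the following paragraphs; it is not necessarily the thesis's exact notation.

```latex
% Partially linear model with a subject-specific random intercept b_i
% (subject i, repeated measure j): x'beta is the parametric part and
% f is an unknown smooth function.
\[
  y_{ij} \;=\; \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + f(t_{ij}) + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0,\sigma_b^{2}), \quad \varepsilon_{ij} \sim N(0,\sigma^{2}).
\]
```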
The compound estimator (Charnigo et al. [2015]) has been used to estimate the nonparametric component of such a model with multiple covariates, in conjunction with linear mixed modeling for the parametric component. These authors showed, under a strict orthogonality condition, that parametric and nonparametric component estimators could achieve what appear to be (nearly) optimal rates, even in the presence of subject-specific random effects.
We continue with research on partially linear models with subject-specific random intercepts. Inspired by Speckman [1988], we propose estimators of both parametric and nonparametric components of a partially linear model, where consistency is achievable under an orthogonality condition. We also examine a scenario without orthogonality to find that bias could still exist asymptotically. The random intercepts accommodate analysis of individuals on whom repeated measures are taken. We illustrate our estimators in a biomedical case study and assess their finite-sample performance in simulation studies.
Jump points often arise in nonparametric regression models (Muller [1992], Loader [1996], and Gijbels et al. [1999]) and may lead to a poor fit if the underlying mean response is falsely assumed to be continuous. We study a specific type of change-point at which the underlying mean response is continuous on both the left and right sides of the change-point. We identify the convergence rate of the estimator proposed in Liu [2017] and illustrate the result in simulation studies.
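
One common way to formalize a change-point at which the mean response itself stays continuous is a kink model, in which only the slope jumps; the sketch below is illustrative and not necessarily the exact formulation analyzed in Liu [2017].

```latex
% A continuous change-point (kink) at tau: m is continuous there, but
% its slope jumps by delta. (u)_+ denotes max(u, 0); g is smooth.
\[
  m(x) \;=\; g(x) + \delta\,(x - \tau)_{+} .
\]
```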

32. A Flexible Zero-Inflated Poisson Regression Model
Roemmele, Eric S., 01 January 2019

A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can easily be handled by zero-inflated models, which are two-component mixtures of a point mass at zero and a discrete distribution for the counts. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are perhaps the most commonly used. However, the fully parametric ZIP regression model can sometimes be restrictive, especially with respect to the mixing proportions. Taking inspiration from the recent literature on semiparametric mixtures-of-regressions models for flexible mixture modeling, we propose a semiparametric ZIP regression model. We present an "EM-like" algorithm for estimation and a summary of asymptotic properties of the estimators. The proposed semiparametric models are then applied to data involving clandestine methamphetamine laboratories and Alzheimer's disease.
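
As a point of reference, the fully parametric ZIP model that this work relaxes has a simple likelihood. Below is a minimal sketch; `zip_logpmf` and its argument names are illustrative, not from the thesis.

```python
import numpy as np
from scipy.special import gammaln

def zip_logpmf(y, lam, pi):
    """Log-pmf of the zero-inflated Poisson: a point mass at zero
    mixed with Poisson(lam), with mixing proportion pi on the mass."""
    y = np.asarray(y)
    # P(Y = 0) = pi + (1 - pi) * exp(-lam)
    log_p0 = np.log(pi + (1.0 - pi) * np.exp(-lam))
    # P(Y = k) = (1 - pi) * exp(-lam) * lam**k / k!  for k >= 1
    log_pk = np.log(1.0 - pi) - lam + y * np.log(lam) - gammaln(y + 1.0)
    return np.where(y == 0, log_p0, log_pk)
```

In the regression setting, lam is typically exp(x'beta) and, in the fully parametric model, the mixing proportion pi follows a logistic specification; the semiparametric model instead lets the mixing proportion depend on covariates without that parametric form.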

33. Three essays on the labor market
Kharbanda, Varun, 01 May 2014

Using a three-essay approach, I focus on two issues related to the labor market: the effect of changes in regulatory costs on informal sector employment, and the role of endogeneity in the relationship between education and earnings.
In the first essay, I analyze the implications of regulatory costs for skill-based wage differences and informal sector employment. I use a two-sector matching model with exogenous skill types for workers, where firms have sector-specific costs and workers have sector-specific bargaining power. In general, multiple equilibria are possible in this model. I focus on the equilibrium that best resembles the situation in the developing countries of sub-Saharan Africa and southern Asia. My results show that government policies which reduce regulatory costs decrease unemployment, earnings inequality, and the fraction of skilled workers in the informal sector. Different types of regulatory costs affect the skill premium differently and non-monotonically.
In the second essay, I test the hypothesis of linearity in returns to education in the Mincer regression with endogenous schooling and earnings. I estimate the marginal rate of return to education using a polynomial model and a semiparametric partial linear model based on the standard Mincer regression. To perform the analysis, I use a control function approach for IV estimation with spousal and parental education as instruments. Results suggest that estimates not accounting for endogeneity understate returns at the tails of the education spectrum and overstate returns for education levels between middle-school and college.
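
For reference, the standard Mincer regression invoked in the second and third essays takes the textbook form below (our notation).

```latex
% Textbook Mincer earnings regression: log wage on schooling s_i and
% a quadratic in potential experience x_i.
\[
  \ln w_i \;=\; \beta_0 + \rho\, s_i + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i .
\]
% The semiparametric partial linear variant replaces rho * s_i with an
% unknown smooth g(s_i), so the marginal return to education is g'(s)
% rather than the constant rho.
```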
In the third essay, I empirically test the claim of Mookherjee and Ray (2010), based on a theoretical model of skill complexity, that "the return to human capital is endogenously nonconcave." I estimate the functional form of returns to education for India using a semiparametric partial linear model based on the standard Mincer regression. Marginal returns are estimated to test the nonconcavity of the functional form under both exogenous and endogenous schooling assumptions. My results show that the marginal rate of return declines during primary education and increases until high school, followed by stable returns for college and higher studies. However, the test of robustness of the functional form based on uniform confidence bands fails to reject the presence of nonconcavity in returns to education for India. This lends support to the claim of Mookherjee and Ray (2010).

34. Selection of smoothing parameters with application in causal inference
Häggström, Jenny, January 2011

This thesis is a contribution to the research area concerned with selection of smoothing parameters in the framework of nonparametric and semiparametric regression. Selection of smoothing parameters is one of the most important issues in this framework, and the choice can heavily influence subsequent results. A nonparametric or semiparametric approach is often desirable when large datasets are available, since it allows us to make fewer and weaker assumptions than a parametric approach requires.

In the first paper we consider smoothing parameter selection in nonparametric regression when the purpose is to accurately predict future or unobserved data. We study the use of accumulated prediction errors and make comparisons to leave-one-out cross-validation, which is widely used by practitioners.

In the second paper a general semiparametric additive model is considered, and the focus is on selection of smoothing parameters when optimal estimation of some specific parameter is of interest. We introduce a double smoothing estimator of a mean squared error and propose to select smoothing parameters by minimizing this estimator. Our approach is compared with existing methods.

The third paper is concerned with the selection of smoothing parameters optimal for estimating average treatment effects defined within the potential outcome framework. For this estimation problem we propose double smoothing methods similar to the method proposed in the second paper. Theoretical properties of the proposed methods are derived and comparisons with existing methods are made by simulations.

In the last paper we apply our results from the third paper by using a double smoothing method for selecting smoothing parameters when estimating average treatment effects on the treated. We estimate the effect on BMI of divorcing in middle age. Rich data on socioeconomic conditions, health, and lifestyle from Swedish longitudinal registers are used.
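
As a generic illustration of the leave-one-out cross-validation baseline that the first paper compares against, the sketch below selects a bandwidth for a Nadaraya-Watson kernel smoother. The code and its names are ours, not the thesis's.

```python
import numpy as np

def nw_fit(x_train, y_train, x_eval, h):
    """Nadaraya-Watson kernel regression with a Gaussian kernel."""
    # Pairwise kernel weights between evaluation and training points.
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def loocv_score(x, y, h):
    """Leave-one-out prediction error for bandwidth h."""
    n = len(x)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        errs[i] = y[i] - nw_fit(x[mask], y[mask], x[i:i+1], h)[0]
    return np.mean(errs ** 2)

def select_bandwidth(x, y, grid):
    """Pick the bandwidth minimizing the LOOCV score over a grid."""
    return min(grid, key=lambda h: loocv_score(x, y, h))
```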

35. Study and validation of data structures with missing values. Application to survival analysis
Serrat i Piè, Carles, 21 May 2001

In this work we have approached three different methodologies (nonparametric, parametric, and semiparametric) to deal with data patterns with missing values in a survival analysis context.
The first two approaches have been developed under the assumption that the investigator has enough information to assume that the non-response mechanism is MCAR or MAR. In this situation, we have adapted a bootstrap and bilinear multiple imputation scheme to draw the distribution of the parameters of interest. On the other hand, we have analyzed the drawbacks encountered in obtaining correct inferences, and we have proposed some strategies to take into account the information provided by other fully observed covariates.

However, in many situations it is impossible to assume the ignorability of the non-response probabilities. We therefore focus our interest on developing a method for survival analysis when we have a non-ignorable non-response pattern, using a semiparametric perspective. First, for right-censored samples with completely observed covariates, we propose the Grouped Kaplan-Meier (GKM) estimator as an alternative to the standard KM estimator when we are interested in the survival at a finite number of fixed times of interest. However, when the covariates are partially observed, neither the stratified GKM estimator nor the stratified KM estimator can be directly computed from the sample. Hence, we propose a class of estimating equations to obtain semiparametric estimates of these probabilities and then substitute these estimates into the stratified GKM estimator. We refer to this new estimation procedure as the Estimated Grouped Kaplan-Meier (EGKM) estimator. We prove that the GKM and EGKM estimators are square-root consistent and asymptotically normally distributed, and a consistent estimator for their limiting variances is derived. The advantage of the EGKM estimator is that it provides asymptotically unbiased estimates of the survival under a flexible selection model for the non-response probability pattern. We illustrate the method with a cohort of HIV-infected patients with tuberculosis. At the end of the application, a sensitivity analysis that includes all types of non-response patterns, from MCAR to non-ignorable, allows the investigator to draw conclusions after analyzing all the plausible scenarios and evaluating the impact of the non-ignorable assumptions in the non-response mechanism on the resulting inferences.

We close the semiparametric approach by exploring the behaviour of the EGKM estimator for finite samples through a simulation study. Simulations performed under scenarios taking into account different levels of censoring, non-response probability patterns, and sample sizes illustrate the good properties of the proposed estimator. For instance, the empirical coverage probabilities tend to the nominal ones when the non-response pattern used in the analysis is close to the true non-response pattern that generated the data. In particular, the estimator is especially efficient in the least informative scenarios considered (e.g., around 80% censoring and 50% missing data).
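
For orientation, the standard Kaplan-Meier estimator on which the GKM and EGKM procedures build can be sketched in a few lines when evaluated at a fixed set of times. This is our generic implementation, not the thesis's code, and it omits the estimating-equation step that handles non-ignorable non-response.

```python
import numpy as np

def kaplan_meier(times, events, eval_times):
    """Kaplan-Meier estimate of S(t) at a fixed set of evaluation times.
    events: 1 = observed failure, 0 = right-censored."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    fail_times = np.unique(times[events == 1])
    surv, km_s = 1.0, []
    for t in fail_times:
        n_risk = np.sum(times >= t)                # at risk just before t
        d = np.sum((times == t) & (events == 1))   # failures at t
        surv *= 1.0 - d / n_risk                   # product-limit step
        km_s.append(surv)
    km_s = np.asarray(km_s)
    if km_s.size == 0:
        return np.ones(np.shape(eval_times))
    # S is a right-continuous step function starting at 1
    idx = np.searchsorted(fail_times, np.asarray(eval_times, float),
                          side="right")
    return np.where(idx == 0, 1.0, km_s[np.maximum(idx - 1, 0)])
```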

36. Estimation and Goodness of Fit for Multivariate Survival Models Based on Copulas
Yilmaz, Yildiz Elif, 11 August 2009

We provide ways to test the fit of a parametric copula family for bivariate censored data with or without covariates. The proposed copula family is tested by embedding it in an expanded parametric family of copulas. When parameters in the proposed and the expanded copula models are estimated by maximum likelihood, a likelihood ratio test can be used. However, when they are estimated by two-stage pseudolikelihood estimation, the corresponding test is a pseudolikelihood ratio test. The two-stage procedures offer less computation, which is especially attractive when the marginal lifetime distributions are specified nonparametrically or semiparametrically. It is shown that the likelihood ratio test is consistent even when the expanded model is misspecified. Power comparisons of the likelihood ratio and the pseudolikelihood ratio tests with some other goodness-of-fit tests are performed both when the expanded family is correct and when it is misspecified. They indicate that model expansion provides a convenient, powerful, and robust approach.

We introduce a semiparametric maximum likelihood estimation method in which the copula parameter is estimated without assumptions on the marginal distributions. This method and the two-stage semiparametric estimation method suggested by Shih and Louis (1995) are generalized to regression models with Cox proportional hazards margins. The two-stage semiparametric estimator of the copula parameter is found to be about as good as the semiparametric maximum likelihood estimator. Semiparametric likelihood ratio and pseudolikelihood ratio tests are considered to provide goodness-of-fit tests for a copula model without making parametric assumptions for the marginal distributions. Both when the expanded family is correct and when it is misspecified, the semiparametric pseudolikelihood ratio test is almost as powerful as the parametric likelihood ratio and pseudolikelihood ratio tests while achieving robustness to the form of the marginal distributions. The methods are illustrated on applications in medicine and insurance.

Sequentially observed survival times are of interest in many studies, but there are difficulties in modeling and analyzing such data. First, when the duration of follow-up is limited and the times for a given individual are not independent, the problem of induced dependent censoring arises for the second and subsequent survival times. Non-identifiability of the marginal survival distributions for second and later times is another issue, since they are observable only if preceding survival times for an individual are uncensored. In addition, in some studies, a significant proportion of individuals may never have the first event. Fully parametric models can deal with these features, but lack of robustness is a concern, and methods of assessing fit are lacking. We introduce an approach to address these issues. We model the joint distribution of the successive survival times by using copula functions, and provide semiparametric estimation procedures in which copula parameters are estimated without parametric assumptions on the marginal distributions. The performance of semiparametric estimation methods is compared with some other estimation methods in simulation studies and shown to be good. The methodology is applied to a motivating example involving relapse and survival following colon cancer treatment.
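
To make the two-stage idea concrete in the simplest case: estimate the margins first (here nonparametrically, via rescaled ranks), then maximize the copula pseudolikelihood in the dependence parameter alone. The sketch below uses the Clayton copula and ignores censoring, so it only illustrates the structure of the procedure, not the thesis's censored-data or Cox-margin extensions.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata

def clayton_loglik(theta, u, v):
    """Log-likelihood of the Clayton copula density for theta > 0."""
    a = u ** (-theta) + v ** (-theta) - 1.0
    return np.sum(np.log1p(theta)
                  - (theta + 1.0) * (np.log(u) + np.log(v))
                  - (2.0 + 1.0 / theta) * np.log(a))

def two_stage_clayton(x, y):
    """Stage 1: nonparametric margins via rescaled ranks.
    Stage 2: maximize the copula pseudolikelihood in theta alone."""
    n = len(x)
    u = rankdata(x) / (n + 1.0)   # pseudo-observations in (0, 1)
    v = rankdata(y) / (n + 1.0)
    res = minimize_scalar(lambda t: -clayton_loglik(t, u, v),
                          bounds=(1e-3, 50.0), method="bounded")
    return res.x
```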

37. Moment Problems with Applications to Value-At-Risk and Portfolio Management
Tian, Ruilin, 07 May 2008

My dissertation provides new applications of moment theory and optimization to financial and insurance risk management. In the investment and managerial areas, one often needs to determine some measure of risk, especially the risk of extreme events. However, complete information about the underlying outcomes is usually unavailable; instead one has access to partial information such as the mean, variance, mode, or range.

In Chapters 2 and 3, we find semiparametric upper and lower bounds for the value-at-risk (VaR) under incomplete information, that is, moments of the underlying distribution. When a single variable is concerned, bounds on VaR are computed to obtain a 100% confidence interval. When the sample financial data have a global maximum, we show that a unimodality assumption tightens the optimal bounds. Next we analyze a function of two correlated random variables; specifically, we find bounds on the probability of two joint extreme events. When three or more variables are involved, the multivariate problem can sometimes be converted to a single-variable problem. In all cases, we use the physical measure rather than the commonly used equivalent pricing probability measure. In addition to solving these problems using the traditional approach based on the geometry of a moment problem, a more efficient method is proposed to solve a general class of moment bounds via semidefinite programming.

In the last part of the thesis, we apply optimization techniques to improve financial portfolio risk management. Instead of considering VaR, we work with a coherent risk measure, the conditional VaR (CVaR). As an extension of Krokhmal et al. (2002), we impose CVaR-related functions on the portfolio selection problem. The CVaR approach sets a β-level CVaR as the objective function and maximizes the worst case on the tail of the distribution. The CVaR-like constraints approach adds a set of CVaR-like constraints to the traditional Markowitz problem, reshaping the portfolio distribution. Both methods greatly increase the skewness of portfolios, although the CVaR approach may lose control of the variance. This capability of increasing skewness is very attractive to investors who prefer a higher probability of obtaining higher returns. We compare the CVaR-related approaches to some other popular portfolio optimization methods. Our numerical analysis provides empirical support for the superiority of the CVaR-like constraints approach in terms of portfolio efficiency.
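
The CVaR objective admits the linear-programming formulation of Rockafellar and Uryasev, on which the Krokhmal et al. (2002) approach builds. Below is a minimal sketch over scenario returns; the function and parameter names are ours, not the dissertation's.

```python
import numpy as np
from scipy.optimize import linprog

def min_cvar_portfolio(R, beta=0.95, target_return=None):
    """Minimize beta-level CVaR of portfolio losses over scenario
    returns R (N scenarios x m assets) via the Rockafellar-Uryasev LP.
    Decision vector: [w (m weights), alpha (VaR level), u (N excesses)]."""
    N, m = R.shape
    c = np.concatenate([np.zeros(m), [1.0],
                        np.full(N, 1.0 / ((1 - beta) * N))])
    # u_j >= -R_j.w - alpha  <=>  -R_j.w - alpha - u_j <= 0
    A_ub = np.hstack([-R, -np.ones((N, 1)), -np.eye(N)])
    b_ub = np.zeros(N)
    if target_return is not None:
        # mean(R).w >= target  <=>  -mean(R).w <= -target
        row = np.concatenate([-R.mean(axis=0), [0.0], np.zeros(N)])
        A_ub = np.vstack([A_ub, row])
        b_ub = np.append(b_ub, -target_return)
    A_eq = np.concatenate([np.ones(m), [0.0], np.zeros(N)])[None, :]
    bounds = [(0, None)] * m + [(None, None)] + [(0, None)] * N
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds)  # default HiGHS solver
    return res.x[:m]  # optimal long-only weights
```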

39. Bayesian Semiparametric Models for Heterogeneous Cross-platform Differential Gene Expression
Dhavala, Soma Sekhar, December 2010

We are concerned with testing for differential expression and consider three different aspects of such testing procedures. First, we develop an exact ANOVA-type model for discrete gene expression data produced by technologies such as Massively Parallel Signature Sequencing (MPSS), Serial Analysis of Gene Expression (SAGE), or other next-generation sequencing technologies. We adopt two Bayesian hierarchical models: one parametric and the other semiparametric, with a Dirichlet process prior that has the ability to borrow strength across related signatures, where a signature is a specific arrangement of the nucleotides. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate.

Next, we consider ways to combine expression data from different studies, possibly produced by different technologies resulting in mixed-type responses, such as microarrays and MPSS. Depending on the technology, the expression data can be continuous or discrete and can have different technology-dependent noise characteristics. Adding to the difficulty, genes can have an arbitrary correlation structure both within and across studies. Performing several hypothesis tests for differential expression could also lead to false discoveries. We propose to address all the above challenges using a hierarchical Dirichlet process with a spike-and-slab base prior on the random effects, while smoothing splines model the unknown link functions that map different technology-dependent manifestations to latent processes upon which inference is based.

Finally, we propose an algorithm for controlling different error measures in Bayesian multiple testing under generic loss functions, including the widely used uniform loss function. We do not make any specific assumptions about the underlying probability model but require that indicator variables for the individual hypotheses are available as a component of the inference. Given this information, we recast multiple hypothesis testing as a combinatorial optimization problem, in particular the 0-1 knapsack problem, which can be solved efficiently using a variety of algorithms, both approximate and exact in nature.
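
To make the knapsack reduction concrete: if the posterior supplies a probability that each hypothesis is non-null, one natural instance chooses rejections to maximize expected true positives subject to a budget on expected false discoveries. This is our construction under one particular choice of error measure, not necessarily the dissertation's; the names `post_prob`, `fd_budget`, and `grid` are illustrative.

```python
import numpy as np

def knapsack_reject(post_prob, fd_budget, grid=1000):
    """0-1 knapsack for multiple testing: item i has value p_i
    (posterior non-null probability) and weight 1 - p_i (expected
    false discovery); capacity is fd_budget. Weights are discretized
    onto `grid` steps for the dynamic program."""
    p = np.asarray(post_prob, float)
    w = np.round((1.0 - p) * grid).astype(int)   # integer weights
    cap = int(round(fd_budget * grid))           # integer capacity
    best = np.zeros(cap + 1)                     # best value per capacity
    take = np.zeros((len(p), cap + 1), dtype=bool)
    for i, (wi, vi) in enumerate(zip(w, p)):
        if wi == 0:                              # free item: always reject
            best += vi
            take[i, :] = True
            continue
        for c in range(cap, wi - 1, -1):         # classic 0-1 DP order
            if best[c - wi] + vi > best[c]:
                best[c] = best[c - wi] + vi
                take[i, c] = True
    chosen, c = [], cap                          # backtrack the choices
    for i in range(len(p) - 1, -1, -1):
        if take[i, c]:
            chosen.append(i)
            c -= w[i]
    return sorted(chosen)
```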

40. Enhancing Statistician Power: Flexible Covariate-Adjusted Semiparametric Inference for Randomized Studies with Multivariate Outcomes
Stephens, Alisa Jane, 21 June 2014

It is well known that incorporating auxiliary covariates in the analysis of randomized clinical trials (RCTs) can increase efficiency. Questions still remain regarding how to flexibly incorporate baseline covariates while maintaining valid inference. Recent methodological advances that use semiparametric theory to develop covariate-adjusted inference for RCTs have focused on independent outcomes. In biomedical research, however, cluster randomized trials and longitudinal studies, characterized by correlated responses, are commonly used. We develop methods that flexibly incorporate baseline covariates for efficiency improvement in randomized studies with correlated outcomes.

In Chapter 1, we show how augmented estimators may be used for cluster randomized trials, in which treatments are assigned to groups of individuals. We demonstrate the potential for imbalance correction and efficiency improvement through consideration of both cluster- and individual-level covariates. To improve small-sample estimation, we consider several variance adjustments. We evaluate this approach for continuous and binary outcomes through simulation and apply it to the Young Citizens study, a cluster randomized trial of a community behavioral intervention for HIV prevention in Tanzania.

Chapter 2 builds upon the previous chapter by deriving semiparametric locally efficient estimators of marginal mean treatment effects when outcomes are correlated. Estimating equations are determined by the efficient score under a mean model for marginal effects when data contain baseline covariates and exhibit correlation. Locally efficient estimators are implemented for longitudinal data with continuous outcomes and clustered data with binary outcomes. Methods are illustrated through application to AIDS Clinical Trial Group Study 398, a longitudinal randomized study that compared various protease inhibitors in HIV-positive subjects.

In Chapter 3, we empirically evaluate several covariate-adjusted tests of intervention effects when baseline covariates are selected adaptively and the number of randomized units is small. We demonstrate that randomization inference preserves type I error under model selection, while tests based on asymptotic theory break down. Additionally, we show that covariate adjustment typically increases power, except at extremely small sample sizes using liberal selection procedures. Properties of covariate-adjusted tests are explored for independent and multivariate outcomes. We revisit Young Citizens to provide further insight into the performance of various methods in small-sample settings.
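
As a minimal illustration of the randomization-inference idea in Chapter 3, one can compare a covariate-adjusted effect estimate against its re-randomization distribution. This generic sketch (our code) permutes individual assignments; a cluster randomized version would permute at the cluster level, and any adaptive covariate selection would be repeated inside the loop.

```python
import numpy as np

def adjusted_effect(y, t, X):
    """Covariate-adjusted treatment effect: OLS of y on (1, t, X)."""
    Z = np.column_stack([np.ones(len(y)), t, X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1]  # coefficient on the treatment indicator

def randomization_test(y, t, X, n_perm=2000, seed=None):
    """Randomization p-value for the adjusted effect. Re-randomizing t
    mimics the trial's actual assignment mechanism, which is why type I
    error is preserved even under adaptive covariate selection."""
    rng = np.random.default_rng(seed)
    obs = adjusted_effect(y, t, X)
    null = np.array([adjusted_effect(y, rng.permutation(t), X)
                     for _ in range(n_perm)])
    return np.mean(np.abs(null) >= np.abs(obs))
```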