About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Estimation and Goodness of Fit for Multivariate Survival Models Based on Copulas

Yilmaz, Yildiz Elif 11 August 2009 (has links)
We provide ways to test the fit of a parametric copula family for bivariate censored data with or without covariates. The proposed copula family is tested by embedding it in an expanded parametric family of copulas. When parameters in the proposed and the expanded copula models are estimated by maximum likelihood, a likelihood ratio test can be used; when they are estimated by two-stage pseudolikelihood estimation, the corresponding test is a pseudolikelihood ratio test. The two-stage procedures require less computation, which is especially attractive when the marginal lifetime distributions are specified nonparametrically or semiparametrically. It is shown that the likelihood ratio test is consistent even when the expanded model is misspecified. Power comparisons of the likelihood ratio and the pseudolikelihood ratio tests with some other goodness-of-fit tests are performed both when the expanded family is correct and when it is misspecified. They indicate that model expansion provides a convenient, powerful and robust approach.

We introduce a semiparametric maximum likelihood estimation method in which the copula parameter is estimated without assumptions on the marginal distributions. This method and the two-stage semiparametric estimation method suggested by Shih and Louis (1995) are generalized to regression models with Cox proportional hazards margins. The two-stage semiparametric estimator of the copula parameter is found to be about as good as the semiparametric maximum likelihood estimator. Semiparametric likelihood ratio and pseudolikelihood ratio tests are considered to provide goodness-of-fit tests for a copula model without making parametric assumptions for the marginal distributions. Both when the expanded family is correct and when it is misspecified, the semiparametric pseudolikelihood ratio test is almost as powerful as the parametric likelihood ratio and pseudolikelihood ratio tests while achieving robustness to the form of the marginal distributions. The methods are illustrated on applications in medicine and insurance.

Sequentially observed survival times are of interest in many studies, but there are difficulties in modeling and analyzing such data. First, when the duration of follow-up is limited and the times for a given individual are not independent, the problem of induced dependent censoring arises for the second and subsequent survival times. Non-identifiability of the marginal survival distributions for second and later times is another issue, since they are observable only if preceding survival times for an individual are uncensored. In addition, in some studies, a significant proportion of individuals may never have the first event. Fully parametric models can deal with these features, but lack of robustness is a concern, and methods of assessing fit are lacking. We introduce an approach to address these issues: we model the joint distribution of the successive survival times using copula functions, and provide semiparametric estimation procedures in which copula parameters are estimated without parametric assumptions on the marginal distributions. The performance of the semiparametric estimation methods is compared with some other estimation methods in simulation studies and shown to be good. The methodology is applied to a motivating example involving relapse and survival following colon cancer treatment.
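A rough sketch of the two-stage (pseudolikelihood) idea described above: estimate the margins first, then maximize the copula likelihood in the dependence parameter alone. The sketch below fits a Clayton copula to simulated bivariate lifetimes, using rescaled empirical distribution functions as nonparametric margins. Censoring, covariates, and the goodness-of-fit expansion are deliberately omitted, and the Clayton family, the simulation design, and the use of scipy's bounded optimizer are illustrative assumptions rather than the thesis's implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def clayton_logpdf(u, v, theta):
    """Log-density of the Clayton copula, theta > 0."""
    return (np.log1p(theta)
            - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(u**(-theta) + v**(-theta) - 1.0))

def pseudo_obs(x):
    """Stage 1: rescaled empirical distribution function (nonparametric margin)."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1.0
    return ranks / (n + 1.0)

def two_stage_clayton(t1, t2):
    """Stage 2: maximize the copula pseudolikelihood in theta only."""
    u, v = pseudo_obs(t1), pseudo_obs(t2)
    neg_loglik = lambda th: -np.sum(clayton_logpdf(u, v, th))
    return minimize_scalar(neg_loglik, bounds=(1e-3, 20.0), method="bounded").x

# Simulate Clayton-dependent lifetimes (theta = 2) via the conditional method.
theta_true = 2.0
u = rng.uniform(size=500)
w = rng.uniform(size=500)
v = ((w**(-theta_true / (1.0 + theta_true)) - 1.0) * u**(-theta_true) + 1.0)**(-1.0 / theta_true)
t1, t2 = -np.log(1.0 - u), -np.log(1.0 - v)   # exponential margins

print("two-stage estimate of theta:", round(two_stage_clayton(t1, t2), 2))
```

Replacing the empirical margins with parametric or Cox proportional hazards margins changes only stage one; the copula step remains a low-dimensional optimization, which is what makes the two-stage approach computationally attractive.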
2

Second-order least squares estimation in dynamic regression models

AbdelAziz Salamh, Mustafa 16 April 2014 (has links)
In this dissertation we propose two generalizations of the Second-Order Least Squares (SLS) approach for two popular dynamic econometric models. The first is a regression model with a time-varying nonlinear mean function and autoregressive conditionally heteroskedastic (ARCH) disturbances. The second is a linear dynamic panel data model. We use a semiparametric framework in both models, in which the SLS approach is based only on the first two conditional moments of the response variable given the explanatory variables; there is no need to specify the distribution of the error components in either model. For the ARCH model, under the assumption of a strong-mixing process with finite moments of some order, we establish the strong consistency and asymptotic normality of the SLS estimator. It is shown that the optimal SLS estimator, which makes use of the additional information inherent in the conditional skewness and kurtosis of the process, is superior to the commonly used quasi-MLE, and the efficiency gain is significant when the underlying distribution is asymmetric. Moreover, our large-scale simulation studies show that the optimal SLSE behaves better than the corresponding estimating-function estimator in finite-sample situations. The practical usefulness of the optimal SLSE is illustrated by an empirical example on U.K. inflation. For the linear dynamic panel data model, we show that the SLS estimator is consistent and asymptotically normal for large N and finite T under fairly general regularity conditions. Moreover, we show that the optimal SLS estimator reaches a semiparametric efficiency bound. A specification test is developed, for the first time, to be used whenever the SLS is applied to real data. Our Monte Carlo simulations show that the optimal SLS estimator performs satisfactorily in finite-sample situations compared to the first-differenced GMM and the random-effects pseudo-ML estimators. The results apply under stationary and nonstationary processes, with and without exogenous regressors, and the performance of the optimal SLS is robust in the near-unit-root case. Finally, the practical usefulness of the optimal SLSE is examined in an empirical study of U.S. airfares.
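To make the second-order least squares criterion concrete, here is a minimal sketch in a deliberately simple setting, a static nonlinear regression with homoskedastic errors rather than the ARCH or dynamic panel models of the dissertation. The estimator matches the first two conditional moments of the response using an identity weight matrix; the model, variable names, and optimizer choice are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated data: nonlinear mean, homoskedastic errors.
n = 400
x = rng.uniform(0.0, 2.0, size=n)
beta_true, sigma2_true = 1.2, 0.25
y = np.exp(beta_true * x) + rng.normal(scale=np.sqrt(sigma2_true), size=n)

def sls_criterion(params, x, y):
    """Unweighted second-order least squares: match the first two conditional moments,
    rho_i = (y_i - m_i, y_i**2 - m_i**2 - sigma2) with m_i = E[y_i | x_i]."""
    beta, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)          # keep the variance positive
    m = np.exp(beta * x)
    r1 = y - m
    r2 = y**2 - m**2 - sigma2
    return np.sum(r1**2 + r2**2)

res = minimize(sls_criterion, x0=np.array([1.0, np.log(0.5)]),
               args=(x, y), method="Nelder-Mead")
print(f"beta_hat = {res.x[0]:.3f}, sigma2_hat = {np.exp(res.x[1]):.3f}")
```

The optimal SLS estimator discussed above replaces the identity weight with a matrix built from the conditional third and fourth moments, which is where the efficiency gain over the quasi-MLE comes from.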
3

Essays in Macroeconomics and Finance:

Hu, Yushan January 2020 (has links)
Thesis advisor: Fabio Schiantarelli / Thesis advisor: Zhijie Xiao / This dissertation consists of three essays in macroeconomics and finance. The first and second chapters analyze the impact of financial shocks and of the anti-corruption campaign on Chinese firms through the bank lending channel. The third chapter provides a new method to predict cash flow from operations (CFO) via semi-parametric estimation and machine learning.

The first chapter explores the impact of the financial crisis and the sovereign debt crisis on Chinese firms through the bank lending channel and the firm borrowing channel. Using new data linking Chinese firms to their bank(s) and four different measures of exposure to international markets (international borrowing, importance of lending to foreign listed companies, share of trade settlement, and exchange/income), I find that banks with higher exposure to international markets cut lending more during the recent financial crisis. In addition, state-owned bank loans are more pro-cyclical than private bank loans. Moreover, banks with higher exposure to international markets cut lending more when there is a negative shock to OECD GDP growth. With regard to the firm borrowing channel, I find that firms with higher weighted aggregate exposure to international markets through their banks have lower net debt, cash, employment, and capital investment during the financial crisis, and have higher net debt and lower cash, employment, and capital investment when there is a negative shock to OECD GDP growth. The chapter also provides a theoretical model to explain the mechanism in a partially open economy like China.

The second chapter discusses the impact of the anti-corruption campaign on Chinese firms through the bank lending channel. Using confidential data linking Chinese firms to their bank(s) and a prefecture-level corruption index, I find that banks located in more corrupt prefectures extend significantly less credit before the anti-corruption investigation, and that this effect reverses direction after the investigation. Moreover, banks located in more corrupt prefectures tend to charge higher interest rates, set longer maturities, and require more collateral before the campaign; all of these effects reverse direction after the campaign. The chapter suggests that banks located in more corrupt prefectures have stronger monopoly power (or higher markups and lower efficiency). This monopoly effect is supported by the facts that bank concentration and bad loans are higher in the more corrupt areas, and that these patterns disappear after the campaign.

The third chapter considers methods for predicting cash flow from operations (CFO). Forecasting CFO is an essential topic in financial econometrics and empirical accounting: it affects a variety of economic decisions, including valuation methodologies employing discounted cash flows, distress prediction, risk assessment, the accuracy of credit-rating predictions, and the provision of value-relevant information to security markets. The existing literature on statistically based cash-flow prediction has pursued cross-sectional and time-series estimation procedures in a mutually exclusive fashion. Accumulated empirical evidence indicates that the beta value varies across firms of different sizes, and a cross-sectional regression cannot capture an idiosyncratic beta. A time-series-based predictive model, although it has the advantage of allowing for firm-specific variability in beta, requires a sufficiently long time series. In this chapter, we extend the literature on statistically based cash-flow prediction models by introducing an estimation procedure that, in essence, combines the favorable attributes of both cross-sectional estimation, via the use of "local" cross-sectional data for firms of similar size, and time-series estimation, via the capture of firm-specific variability in the beta parameters of the independent variables. The local learning approach assumes no a priori knowledge of the constancy of the beta coefficients and allows the information about the coefficients to be represented by only a subset of observations. This feature is particularly relevant in the CFO model, where the beta values are related only to cross-sectional information that is "local" to a firm's size. We provide empirical evidence that the prediction of cash flows from operations is enhanced by jointly adopting features specific to both cross-sectional and time-series modeling. / Thesis (PhD) — Boston College, 2020. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Economics.
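The "local" cross-sectional idea in the third chapter, letting each firm's beta be estimated mainly from firms of similar size, can be sketched as a kernel-weighted regression in firm size. The toy data, the size-only Gaussian kernel, and the single-regressor model below are illustrative assumptions; the chapter's actual procedure also exploits the time-series dimension and machine-learning components.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy cross-section: next-period CFO regressed on current earnings, with a beta
# that varies smoothly with firm size -- the feature a single pooled
# cross-sectional regression cannot capture.
n_firms = 300
size = rng.uniform(0.0, 1.0, size=n_firms)            # e.g. rescaled log market cap
beta_true = 0.4 + 0.5 * size                           # size-dependent beta
earnings = rng.normal(size=n_firms)
cfo_next = beta_true * earnings + 0.1 * rng.normal(size=n_firms)

def local_beta(target_size, size, x, y, bandwidth=0.15):
    """Weighted least squares slope using only firms of similar size (Gaussian kernel)."""
    w = np.exp(-0.5 * ((size - target_size) / bandwidth) ** 2)
    return np.sum(w * x * y) / np.sum(w * x * x)

for s in (0.1, 0.5, 0.9):
    print(f"size={s:.1f}: local beta = {local_beta(s, size, earnings, cfo_next):.2f}, "
          f"true = {0.4 + 0.5 * s:.2f}")
```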
4

Three Essays in Inference and Computational Problems in Econometrics

Todorov, Zvezdomir January 2020 (has links)
This dissertation is organized into three independent chapters. In Chapter 1, I consider the selection of weights for averaging a set of threshold models. The existing model averaging literature focuses primarily on averaging linear models; I consider threshold regression models. The theory developed in that chapter demonstrates that the proposed jackknife model averaging estimator achieves asymptotic optimality when the candidate models are all misspecified threshold models. A simulation study demonstrates that the jackknife model averaging estimator achieves the lowest mean squared error when contrasted against other model selection and model averaging methods. In Chapter 2, I propose a model averaging framework for the synthetic control method of Abadie and Gardeazabal (2003) and Abadie et al. (2010). The proposed estimator serves a twofold purpose. First, it reduces the bias in estimating the weights each member of the donor pool receives. Second, it accounts for model uncertainty in the program evaluation estimation. I study two variations of the model, one in which model weights are derived by solving a cross-validation quadratic program and another in which each candidate model receives equal weight. Next, I show how to apply the placebo study and the conformal inference procedure for both versions of my estimator. A simulation study reveals the superior performance of the proposed procedure. In Chapter 3, which is co-authored with my advisor Professor Youngki Shin, we provide an exact computation algorithm for the maximum rank correlation estimator using the mixed integer programming (MIP) approach. We construct a new constrained optimization problem by transforming all indicator functions into binary parameters to be estimated and show that the transformation is equivalent to the original problem. Using a modern MIP solver, we apply the proposed method to an empirical example and to Monte Carlo simulations. The results show that the proposed algorithm performs better than the existing alternatives. / Dissertation / Doctor of Philosophy (PhD)
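A minimal sketch of the jackknife model averaging step in Chapter 1: compute leave-one-out fitted values for each candidate model and choose averaging weights on the probability simplex that minimize the jackknife squared error. For simplicity the candidates here are nested linear regressions rather than threshold models, and a general-purpose SLSQP solver stands in for a dedicated quadratic-programming routine; both are assumptions for illustration, not the chapter's code.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Toy data and nested linear candidate models (stand-ins for candidate threshold models).
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([1.0, 0.5, 0.25]) + rng.normal(size=n)
candidates = [X[:, :k] for k in range(1, p + 1)]

def loo_predictions(Xk, y):
    """Leave-one-out (jackknife) fitted values of an OLS fit, via the hat matrix."""
    H = Xk @ np.linalg.solve(Xk.T @ Xk, Xk.T)
    fitted, h = H @ y, np.diag(H)
    return (fitted - h * y) / (1.0 - h)       # standard LOO identity for least squares

P = np.column_stack([loo_predictions(Xk, y) for Xk in candidates])

# Jackknife model averaging: minimize ||y - P w||^2 over the probability simplex.
m = P.shape[1]
res = minimize(lambda w: np.sum((y - P @ w) ** 2),
               x0=np.full(m, 1.0 / m),
               bounds=[(0.0, 1.0)] * m,
               constraints=({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},),
               method="SLSQP")
print("JMA weights:", np.round(res.x, 3))
```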
5

Sur l'estimation semi paramétrique robuste pour statistique fonctionnelle / On the semiparametric robust estimation in functional statistic

Attaoui, Said 10 December 2012 (has links)
In this thesis, we study some functional parameters when the data are generated from a single-index regression model. We consider two functional parameters. First, we assume that the explanatory variable takes values in a Hilbert space (infinite-dimensional) and we consider kernel estimation of the conditional density. We establish asymptotic properties of this estimator in both the independent and dependent cases. When the observations are independent and identically distributed (i.i.d.), we obtain the pointwise and uniform almost complete convergence, with rates, of the proposed estimator. As an application, we discuss the impact of this result on functional nonparametric prediction through estimation of the conditional mode. Dependence is modelled via quasi-association. In this setting we establish the almost complete convergence as well as the asymptotic normality of the suitably normalized kernel estimator of the conditional density, and we give the asymptotic variance explicitly. All of these asymptotic properties are obtained under standard conditions, and they highlight the concentration of the probability measure of the functional variable on small balls. Second, we assume that the explanatory variable takes values in a finite-dimensional space and we consider a fairly general prediction model, robust regression. From quasi-associated observations, we construct a kernel estimator of this functional parameter and establish its uniform almost complete convergence rate.
We emphasize that the two models studied in this thesis could be used to estimate the single index when it is unknown, using M-estimation or the pseudo-maximum likelihood method, which is a particular case of the former.
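A small sketch of the first estimator discussed above: a kernel estimator of the conditional density of a scalar response given a functional (curve-valued) covariate, with an L2 semi-metric measuring distances between curves and the conditional mode read off as a by-product. The toy single-index data, Gaussian kernels, and fixed bandwidths are illustrative assumptions; the quasi-association dependence structure and small-ball probability conditions of the thesis are not reflected here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy functional data: each covariate is a curve on a common grid; the scalar
# response depends on the curve only through a single index (its mean level).
grid = np.linspace(0.0, 1.0, 50)
n = 300
levels = rng.uniform(-1.0, 1.0, size=n)
X = levels[:, None] + 0.1 * rng.normal(size=(n, len(grid)))   # noisy flat curves
Y = np.sin(np.pi * levels) + 0.2 * rng.normal(size=n)

def l2_distance(x0, X):
    """Approximate L2 semi-metric between a new curve x0 and each sample curve."""
    return np.sqrt(np.mean((X - x0) ** 2, axis=1))

def cond_density(y, x0, X, Y, h=0.15, g=0.2):
    """Kernel estimator of f(y | X = x0): a functional kernel in x, a density kernel in y."""
    wx = np.exp(-0.5 * (l2_distance(x0, X) / h) ** 2)
    wy = np.exp(-0.5 * ((y - Y) / g) ** 2) / (g * np.sqrt(2.0 * np.pi))
    return np.sum(wx * wy) / np.sum(wx)

x_new = np.full_like(grid, 0.5)                  # a new (flat) curve at level 0.5
ys = np.linspace(-2.0, 2.0, 201)
dens = np.array([cond_density(y, x_new, X, Y) for y in ys])
print("estimated conditional mode:", round(ys[np.argmax(dens)], 2), "(true value 1.0)")
```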
6

Applications of nonparametric methods in economic and political science / Anwendungen nichtparametrischer Verfahren in den Wirtschafts- und Staatswissenschaften

Heidenreich, Nils-Bastian 11 April 2011 (has links)
No description available.
7

Statistical Methods for Life History Analysis Involving Latent Processes

Shen, Hua January 2014 (has links)
Incomplete data often arise in the study of life history processes. Examples include missing responses, missing covariates, and unobservable latent processes, in addition to right censoring. This thesis develops statistical models and methods to address these problems as they arise in oncology and chronic disease, investigating estimation and inference in parametric, weakly parametric and semiparametric settings.

Studies of chronic diseases routinely sample individuals subject to conditions on an event time of interest. In epidemiology, for example, prevalent cohort studies aiming to evaluate risk factors for survival following onset of dementia require subjects to have survived to the point of screening. In clinical trials designed to assess the effect of experimental cancer treatments on survival, patients are required to survive from the time of cancer diagnosis to recruitment. Such conditions yield samples featuring left-truncated event time distributions. Incomplete covariate data often arise in such settings, but standard methods do not deal with the fact that the covariate distribution is also affected by left truncation. We develop a likelihood and an estimation algorithm for dealing with incomplete covariate data in such settings. An expectation-maximization algorithm deals with the left truncation by using the covariate distribution conditional on the selection criterion. An extension to deal with sub-group analyses in clinical trials is described for the case in which the stratification variable is incompletely observed.

In studies of affective disorder, individuals are often observed to experience recurrent exacerbations of symptoms warranting hospitalization. Interest lies in modeling the occurrence of such exacerbations over time and identifying associated risk factors to better understand the disease process. In some patients, recurrent exacerbations are temporally clustered following disease onset, but cease to occur after a period of time. We develop a dynamic mover-stayer model in which a canonical binary variable associated with each event indicates whether the underlying disease has resolved. An individual whose disease process has not resolved will experience events following a standard point process model governed by a latent intensity. If and when the disease process resolves, the complete data intensity becomes zero and no further events will arise. An expectation-maximization algorithm is developed for parametric and semiparametric model fitting based on a discrete-time dynamic mover-stayer model and a latent intensity-based model of the underlying point process. The method is applied to a motivating dataset from a cohort of individuals with affective disorder experiencing recurrent hospitalization for their mental health disorder.

Interval-censored recurrent event data arise when the event of interest is not readily observed but the cumulative event count can be recorded at periodic assessment times. Extensions of the model-fitting techniques for the dynamic mover-stayer model are discussed to incorporate interval censoring. The likelihood and estimation algorithm are developed for piecewise-constant baseline rate functions and are shown to yield estimators with small empirical bias in simulation studies. Data on the cumulative number of damaged joints in patients with psoriatic arthritis are analysed to provide an illustrative application.
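To illustrate the structure of the discrete-time dynamic mover-stayer model described above, the sketch below simulates recurrent events in which, after every event, the underlying condition resolves with some probability and no further events can occur, and then maximizes the observed-data likelihood obtained by summing over the latent resolution time. The constant per-period event probability, the absence of covariates, and direct maximization rather than the EM algorithm of the thesis are simplifying assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)

def simulate(n, T, p, pi):
    """Discrete-time dynamic mover-stayer: while unresolved, an event occurs each
    period with probability p; after every event the condition resolves w.p. pi."""
    y = np.zeros((n, T), dtype=int)
    for i in range(n):
        active = True
        for t in range(T):
            if active and rng.random() < p:
                y[i, t] = 1
                if rng.random() < pi:
                    active = False
    return y

def sequence_loglik(seq, p, pi):
    """Observed-data log-likelihood of one event sequence, marginalizing over the
    latent 'resolved' indicator (the quantity an EM algorithm would target)."""
    a_active, a_resolved = 1.0, 0.0            # forward probabilities of the two states
    for yt in seq:
        if yt == 1:                            # only an unresolved process can have events
            a_active, a_resolved = a_active * p * (1 - pi), a_active * p * pi
        else:
            a_active, a_resolved = a_active * (1 - p), a_resolved
    return np.log(a_active + a_resolved)

def neg_loglik(params, data):
    p, pi = params
    return -sum(sequence_loglik(seq, p, pi) for seq in data)

data = simulate(n=500, T=12, p=0.3, pi=0.2)
res = minimize(neg_loglik, x0=[0.5, 0.5], args=(data,),
               bounds=[(0.01, 0.99), (0.01, 0.99)], method="L-BFGS-B")
print("estimated (p, pi):", np.round(res.x, 2))
```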
8

Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems

Kolar, Mladen 01 July 2013 (has links)
Extracting knowledge and providing insights into the complex mechanisms underlying noisy, high-dimensional data sets is of utmost importance in many scientific domains. Statistical modeling has become ubiquitous in the analysis of high-dimensional functional data in the search for a better understanding of cognitive mechanisms, in the exploration of large-scale gene regulatory networks in the hope of developing drugs for lethal diseases, and in the prediction of stock-market volatility in the hope of beating the market. Statistical analysis of these high-dimensional data sets is possible only if an estimation procedure exploits hidden structures underlying the data. This thesis develops flexible estimation procedures with provable theoretical guarantees for uncovering unknown hidden structures underlying the data-generating process. Of particular interest are procedures that can be used on high-dimensional data sets where the number of samples n is much smaller than the ambient dimension p. Learning in high dimensions is difficult due to the curse of dimensionality; however, special problem structure makes inference possible. Because of its importance for scientific discovery, we place emphasis on consistent structure recovery throughout the thesis. Particular focus is given to two important problems: semi-parametric estimation of networks and feature selection in multi-task learning.
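As one concrete instance of feature selection in multi-task learning with n much smaller than p, the sketch below uses a group-lasso penalty that zeroes out entire feature rows shared across tasks, fitted by proximal gradient descent. Group lasso is a standard device for this problem and is used here purely for illustration; it is not necessarily the estimator developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy multi-task data: K regression tasks sharing the same small set of relevant
# features (rows of B), with fewer samples than features.
n, p, K = 60, 100, 5
X = rng.normal(size=(n, p))
B_true = np.zeros((p, K))
B_true[:4] = rng.normal(size=(4, K))           # only the first 4 features matter
Y = X @ B_true + 0.5 * rng.normal(size=(n, K))

def group_lasso_multitask(X, Y, lam=25.0, iters=500):
    """Proximal gradient for min 0.5*||Y - XB||_F^2 + lam * sum_j ||B[j, :]||_2,
    which selects entire feature rows shared across tasks."""
    p, K = X.shape[1], Y.shape[1]
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    B = np.zeros((p, K))
    for _ in range(iters):
        Z = B - step * (X.T @ (X @ B - Y))     # gradient step on the smooth part
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        B = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12)) * Z
    return B                                    # row-wise soft-thresholding above

B_hat = group_lasso_multitask(X, Y)
print("selected features:", np.where(np.linalg.norm(B_hat, axis=1) > 1e-8)[0])
```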
