Global ETD Search

1	Semiparametric functional data analysis for longitudinal/clustered data: theory and application
Hu, Zonghui (12 April 2006)
Semiparametric models play important roles in the field of biological statistics. In this dissertation, two types of semiparametric models are studied. One is the partially linear model, where the parametric part is a linear function. We investigate the two common estimation methods for partially linear models when the data are correlated (longitudinal or clustered). The other is a semiparametric model in which a latent covariate is incorporated in a mixed effects model. We propose a semiparametric approach for estimating this model and apply it to a study of colon carcinogenesis. First, we study the profile-kernel and backfitting methods in partially linear models for clustered/longitudinal data. For independent data, despite the potential root-n inconsistency of the backfitting estimator noted by Rice (1986), the two estimators have the same asymptotic variance matrix, as shown by Opsomer and Ruppert (1999). In this work, we compare the two estimators theoretically for multivariate responses. We show that, for correlated data, backfitting often produces a larger asymptotic variance than the profile-kernel method; that is, in addition to its bias problem, the backfitting estimator does not have the same asymptotic efficiency as the profile-kernel estimator when the data are correlated. Consequently, the common practice of using the backfitting method to compute profile-kernel estimates is no longer advised. We illustrate this in detail by following Zeger and Diggle (1994) and Lin and Carroll (2001), with a working independence covariance structure for nonparametric estimation and a correlated covariance structure for parametric estimation. The numerical performance of the two estimators is investigated through a simulation study. Their application to an ophthalmology dataset is also described.
Next, we study a mixed effects model where the main response and covariate variables are linked through the positions where they are measured. For technical reasons, however, they are not measured at the same positions. We propose a semiparametric approach for this misaligned measurements problem and derive the asymptotic properties of the semiparametric estimators under reasonable conditions. An application of the semiparametric method to a colon carcinogenesis study is provided. We find that, compared with a corn oil supplemented diet, a fish oil supplemented diet tends to inhibit the increase of bcl2 (oncogene) gene expression in rats when the amount of DNA damage increases, and thus promotes apoptosis.
Keywords: partially linear model; backfitting; profile-kernel; semiparametric model
2	A Flexible Zero-Inflated Poisson Regression Model
Roemmele, Eric S. (01 January 2019)
A practical problem often encountered with observed count data is the presence of excess zeros. Zero-inflation in count data can be handled by zero-inflated models, which are two-component mixtures of a point mass at zero and a discrete distribution for the count data. In the presence of predictors, zero-inflated Poisson (ZIP) regression models are, perhaps, the most commonly used. However, the fully parametric ZIP regression model can sometimes be restrictive, especially with respect to the mixing proportions. Taking inspiration from some of the recent literature on semiparametric mixtures of regressions models for flexible mixture modeling, we propose a semiparametric ZIP regression model. We present an "EM-like" algorithm for estimation and a summary of asymptotic properties of the estimators. The proposed semiparametric models are then applied to a data set involving clandestine methamphetamine laboratories and Alzheimer's disease.
Keywords: Bootstrap; Count data; EM Algorithm; zero-inflation; semiparametric model; Statistical Models; Statistical Theory
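The two-component mixture described in the abstract above can be illustrated with a minimal sketch of the standard, fully parametric ZIP probability mass function. This is not the thesis's semiparametric estimator: the function names `zip_pmf` and `zip_mean`, and the use of a single constant mixing proportion `pi` and Poisson rate `lam` (rather than quantities that depend on predictors), are assumptions made only for illustration.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson (illustrative, fully parametric form).

    With probability pi the outcome is a structural zero (point mass at zero);
    with probability 1 - pi it is drawn from a Poisson distribution with rate lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from both components of the mixture.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(lam, pi):
    """Mean of the mixture: only the Poisson component contributes."""
    return (1.0 - pi) * lam


if __name__ == "__main__":
    lam, pi = 2.0, 0.3  # hypothetical values chosen for the example
    print("P(Y=0) under ZIP:    ", zip_pmf(0, lam, pi))
    print("P(Y=0) under Poisson:", math.exp(-lam))
    print("ZIP mean:            ", zip_mean(lam, pi))
```

Setting `pi = 0` recovers the ordinary Poisson distribution, which is one way to see that the excess zeros come entirely from the point-mass component.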
3	Efficient Semiparametric Estimators for Nonlinear Regressions and Models under Sample Selection Bias
Kim, Mi Jeong (2012 August 1900)
We study the consistency, robustness and efficiency of parameter estimation in different but related models via a semiparametric approach. First, we revisit the second-order least squares estimator proposed in Wang and Leblanc (2008) and show that the estimator attains semiparametric efficiency. We further extend the method to heteroscedastic error models and propose a semiparametric efficient estimator in this more general setting. Second, we study a class of semiparametric skewed distributions arising when the sample selection process causes sampling bias in the observations. We begin by assuming an anti-symmetric property for the skewing function. Taking into account the symmetric nature of the population distribution, we propose consistent estimators for the center of the symmetric population. These estimators are robust to model misspecification and reach the minimum possible estimation variance. Next, we extend the model to permit a more flexible skewing structure. Without assuming a particular form of the skewing function, we propose both consistent and efficient estimators for the center of the symmetric population using a semiparametric method. We also analyze the asymptotic properties and derive the corresponding inference procedures. Numerical results are provided to support the theoretical results and illustrate the finite sample performance of the proposed estimators.
Keywords: Efficiency; Non-representative Data; Robustness; Second-order least squares estimator; Selection Bias; Semiparametric Model
4	Spline-based sieve semiparametric generalized estimating equation for panel count data
Hua, Lei (01 May 2010)
In this thesis, we propose to analyze panel count data using a spline-based sieve generalized estimating equation method with a semiparametric proportional mean model $E(N(t) \mid Z) = \Lambda_0(t) \exp(\beta_0^T Z)$. The natural log of the baseline mean function, $\log \Lambda_0(t)$, is approximated by a monotone cubic B-spline function. The estimates of regression parameters and spline coefficients are the roots of the spline-based sieve generalized estimating equations (sieve GEE). The proposed method avoids assuming any parametric structure of the baseline mean function and the underlying counting process. Selection of an appropriate covariance matrix that represents the true correlation between the cumulative counts improves estimating efficiency. In addition to the parameters in the proportional mean function, the estimation that accounts for over-dispersion and autocorrelation involves an extra nuisance parameter $\sigma^2$, which can be estimated using the method of moments proposed by Zeger (1988). The parameters in the mean function are then estimated by solving the pseudo generalized estimating equation with $\sigma^2$ replaced by its estimate, $\sigma_n^2$. We show that the estimate of $(\beta_0, \Lambda_0)$ based on this two-stage approach is still consistent and can converge at the optimal convergence rate in the nonparametric/semiparametric regression setting. The asymptotic normality of the estimate of $\beta_0$ is also established. We further propose a spline-based projection variance estimating method and show its consistency. Simulation studies are conducted to investigate the finite sample performance of the sieve semiparametric GEE estimates, as well as different variance estimating methods with different sample sizes. The covariance matrix that accounts for overdispersion generally increases estimating efficiency when overdispersion is present in the data.
Finally, the proposed method with different covariance matrices is applied to real data from a bladder tumor clinical trial.
Keywords: Counting process; Generalized Estimating Equation; Monotone polynomial splines; Over-dispersion; Semiparametric model; Biostatistics
5	Estimation d'un modèle de mélange paramétrique et semiparamétrique par des phi-divergences / Estimation of parametric and semiparametric mixture models using phi-divergences
Al-Mohamad, Diaa (17 November 2016)
L’étude des modèles de mélanges est un champ très vaste en statistique. Nous présentons dans la première partie de la thèse les phi-divergences et les méthodes existantes qui construisent des estimateurs robustes basés sur des phi-divergences. Nous nous intéressons en particulier à la forme duale des phi-divergences et nous construisons un nouvel estimateur robuste basé sur cette formule. Nous étudions les propriétés asymptotiques de cet estimateur et faisons une comparaison numérique avec les méthodes existantes. Dans un second temps, nous introduisons un algorithme proximal dont l’objectif est de calculer itérativement des estimateurs basés sur des critères de divergences statistiques. La convergence de l’algorithme est étudiée et illustrée par différents exemples théoriques et sur des données simulées. Dans la deuxième partie de la thèse, nous construisons une nouvelle structure pour les modèles de mélanges à deux composantes dont l’une est inconnue. La nouvelle approche permet d’incorporer une information a priori linéaire de type moments ou L-moments. Nous étudions les propriétés asymptotiques des estimateurs proposés. Des simulations numériques sont présentées afin de montrer l’avantage de la nouvelle approche en comparaison avec les méthodes existantes qui ne considèrent pas d’information a priori à part une hypothèse de symétrie sur la composante inconnue.
/ The study of mixture models constitutes a large domain of research in statistics. In the first part of this work, we present phi-divergences and the existing methods which produce robust estimators. We are particularly interested in the so-called dual formula of phi-divergences. We build a new robust estimator based on this formula.
We study its asymptotic properties and give a numerical comparison with existing methods on simulated data. We also introduce a proximal-point algorithm whose aim is to calculate divergence-based estimators. We give some of the convergence properties of this algorithm and illustrate them on theoretical and simulated examples. In the second part of this thesis, we build a new structure for two-component mixture models where one component is unknown. The new approach makes it possible to incorporate prior linear information about the unknown component, such as moment-type and L-moment constraints. We study the asymptotic properties of the proposed estimators. Several experimental results on simulated data show the advantage of the novel approach and the gain from using the prior information, in comparison to existing methods which do not incorporate any prior information except for a symmetry assumption on the unknown component.
Keywords: Modèle de mélange; Phi-Divergence; Algorithme proximal; Dualité de Fenchel-Legendre; Modèle semiparamétrique; L-Moments; Mixture model; Proximal algorithm; Semiparametric model; 510
6	Statistical Methods for Multi-type Recurrent Event Data Based on Monte Carlo EM Algorithms and Copula Frailties
Bedair, Khaled Farag Emam (01 October 2014)
In this dissertation, we study processes which generate events repeatedly over the follow-up time of a given subject. Such processes are called recurrent event processes and the data they provide are referred to as recurrent event data. Examples include cancer recurrences, recurrent infections or disease episodes, hospital readmissions, the filing of warranty claims, and insurance claims for policy holders. In particular, we focus on multi-type recurrent event times, which usually arise when two or more different kinds of events may occur repeatedly over a period of observation. Our main objectives are to describe features of each marginal process simultaneously and to study the dependence among different types of events. We present applications to a real dataset collected from the Nutritional Prevention of Cancer Trial. The objective of the clinical trial was to evaluate the efficacy of selenium in preventing the recurrence of several types of skin cancer among 1312 residents of the Eastern United States. This dissertation comprises four chapters. Chapter 1 gives a brief background on the statistical techniques used to develop the proposed methodology. We cover some concepts and useful functions related to survival data analysis and present a short introduction to frailty distributions. The Monte Carlo expectation maximization (MCEM) algorithm and copula functions for multivariate variables are also presented in this chapter. Chapter 2 develops a multi-type recurrent events model with multivariate Gaussian random effects (frailties) for the intensity functions. In this chapter, we present nonparametric baseline intensity functions and a multivariate Gaussian distribution for the multivariate correlated random effects.
An MCEM algorithm with MCMC routines in the E-step is adopted for the partial likelihood to estimate model parameters. Equations for the variances of the estimates are derived, and the variances are computed by Louis' formula. Predictions of the individual random effects are obtained because in some applications the magnitude of the random effects is of interest for a better understanding and interpretation of the variability in the data. The performance of the proposed methodology is evaluated by simulation studies, and the developed model is applied to the skin cancer dataset. Chapter 3 presents copula-based semiparametric multivariate frailty models for multi-type recurrent event data with applications to the skin cancer data. In this chapter, we generalize the multivariate Gaussian assumption on the frailty terms and allow the frailty distributions to have more features than the symmetric, unimodal properties of the Gaussian density. More flexible approaches to modeling the correlated frailty, based on copula functions, are introduced. Copula functions provide considerable flexibility, especially in allowing one to take advantage of a variety of choices for the marginal distributions and correlation structures. Semiparametric intensity models for multi-type recurrent events based on a combination of the MCEM with MCMC sampling methods and copula functions are introduced. The combination of the MCEM approach and copula functions is flexible and is a generally applicable approach for obtaining inferences on the unknown parameters of high dimensional frailty models. Estimation procedures for fixed effects, nonparametric baseline intensity functions, copula parameters, and predictions for the subject-specific multivariate frailties and random effects are obtained. Louis' formula for the variance estimates is derived and calculated.
We investigate the impact of the specification of the frailty and random effect models on the inference of covariate effects, cumulative baseline intensity functions, prediction of random effects and frailties, and the estimation of the variance-covariance components. The performance of the proposed models is evaluated by simulation studies. Applications are illustrated through the dataset collected from the clinical trial of patients with skin cancer. Conclusions and some remarks for future work are presented in Chapter 4. / Ph. D.
Keywords: MCEM algorithm; cancer studies; multi-type recurrent events; multivariate frailty; semiparametric model; random effects; copula; survival analysis
7	以重複事件分析法分析信用評等 / Recurrent Event Analysis of Credit Rating
陳奕如, Chen, Yi Ru (Unknown Date)
This thesis examines extensions of the Cox proportional hazards model (1972) and the general class of semiparametric models (2004) as applied to upgrades and downgrades of S&P credit ratings. Both kinds of models can be used to model the relationship between covariates and recurrent event data on upgrades or downgrades. A benchmark credit-scoring model with a quintet of financial ratios, inspired by the Z-Score model, is employed. These financial ratios include measures of short-term liquidity, leverage, sales efficiency, historical profitability and productivity. The empirical results show that the financial ratios for historical profitability, leverage, and sales efficiency are significant factors in rating upgrades. For downgrades, the financial ratios for short-term liquidity, productivity, and leverage are significant factors in the extended Cox models, whereas only historical profitability is significant in the general class of semiparametric models. The empirical analysis of S&P credit ratings provides evidence that credit rating transitions are related to certain financial ratios under these new econometric methods.
Keywords: 信用評等; 重複事件分析法; Cox比例風險模型; credit rating; recurrent event analysis; Cox proportional hazard model; general class of semiparametric model; Z-Score model
8	Three Essays on Application of Semiparametric Regression: Partially Linear Mixed Effects Model and Index Model / Drei Aufsätze über Anwendung der Semiparametrischen Regression: Teilweise Lineares Gemischtes Modell und Index Modell
Ohinata, Ren (03 May 2012)
No description available.
Keywords: 310 Statistik; EGCG 080; EGCH 250; EGCP 200; LCB 011; LCB 020; Economics; Semiparametrisches Modell; Kernregression; Index Modell; Gemischtes Modell; Paneldaten; Kreuzvalidierung; Wild Bootstrap; Bandweitenwahl; Dimensionsreduktion; Hauptkomponenten; Wohlfahrtsindikator; Semiparametric model; Kernel regression; Index model; Mixed model; Panel data; Cross validation; Wild bootstrap; Bandwidth selection; Dimension reduction; Principal components; Welfare indicator; 31.73; 83.03
9	Some Advanced Model Selection Topics for Nonparametric/Semiparametric Models with High-Dimensional Data
Fang, Zaili (13 November 2012)
Model and variable selection have attracted considerable attention in areas of application where datasets usually contain thousands of variables. Variable selection is a critical step in reducing the dimension of high dimensional data by eliminating irrelevant variables. The general objective of variable selection is not only to obtain a set of cost-effective predictors but also to improve prediction and prediction variance. We have made several contributions to this issue through a range of advanced topics: providing a graphical view of Bayesian Variable Selection (BVS), recovering sparsity in multivariate nonparametric models, and proposing a testing procedure for evaluating nonlinear interaction effects in a semiparametric model. To address the first topic, we propose a new Bayesian variable selection approach via the graphical model and the Ising model, which we refer to as the "Bayesian Ising Graphical Model" (BIGM). Our BIGM has several advantages: it is easy to (1) employ the single-site updating and cluster updating algorithms, both of which are suitable for problems with small sample sizes and a larger number of variables, (2) extend this approach to nonparametric regression models, and (3) incorporate graphical prior information. In the second topic, we propose a Nonnegative Garrote on a Kernel machine (NGK) to recover sparsity of input variables in smoothing functions. We model the smoothing function by a least squares kernel machine and construct a nonnegative garrote on the kernel model as a function of the similarity matrix. An efficient coordinate descent/backfitting algorithm is developed. The third topic involves a specific genetic pathway dataset in which the pathways interact with the environmental variables. We propose a semiparametric method to model the pathway-environment interaction.
We then employ a restricted likelihood ratio test and a score test to evaluate the main pathway effect and the pathway-environment interaction. / Ph. D.
Keywords: Variable Selection; Smoothing Splines; Sparsistency; Semiparametric Model; Pathway Analysis; Additive Model; Cluster Algorithm; Gaussian Random Process; Global-Local Shrinkage; Graphical Model; Ising Model; Kernel Machine; KM Model; LASSO; Long Tail Prior; Mixture Normals; Model Selection; Multivariate Smoothing Function; Nonnegative Garrote; Nonparametric Model
10	Confidence bands in quantile regression and generalized dynamic semiparametric factor models
Song, Song (01 November 2010)
In vielen Anwendungen ist es notwendig, die stochastische Schwankungen der maximalen Abweichungen der nichtparametrischen Schätzer von Quantil zu wissen, zB um die verschiedene parametrische Modelle zu überprüfen. Einheitliche Konfidenzbänder sind daher für nichtparametrische Quantil Schätzungen der Regressionsfunktionen gebaut. Die erste Methode basiert auf der starken Approximation der empirischen Verfahren und Extremwert-Theorie. Die starke gleichmäßige Konsistenz liegt auch unter allgemeinen Bedingungen etabliert. Die zweite Methode beruht auf der Bootstrap Resampling-Verfahren. Es ist bewiesen, dass die Bootstrap-Approximation eine wesentliche Verbesserung ergibt. Der Fall von mehrdimensionalen und diskrete Regressorvariablen wird mit Hilfe einer partiellen linearen Modell behandelt. Das Verfahren wird mithilfe der Arbeitsmarktanalysebeispiel erklärt. Hoch-dimensionale Zeitreihen, die nichtstationäre und eventuell periodische Verhalten zeigen, sind häufig in vielen Bereichen der Wissenschaft, zB Makroökonomie, Meteorologie, Medizin und Financial Engineering, getroffen. Der typische Modelierungsansatz ist die Modellierung von hochdimensionalen Zeitreihen in Zeit Ausbreitung der niedrig dimensionalen Zeitreihen und hoch-dimensionale zeitinvarianten Funktionen über dynamische Faktorenanalyse zu teilen. Wir schlagen ein zweistufiges Schätzverfahren. Im ersten Schritt entfernen wir den Langzeittrend der Zeitreihen durch Einbeziehung Zeitbasis von der Gruppe Lasso-Technik und wählen den Raumbasis mithilfe der funktionalen Hauptkomponentenanalyse aus. Wir zeigen die Eigenschaften dieser Schätzer unter den abhängigen Szenario. Im zweiten Schritt erhalten wir den trendbereinigten niedrig-dimensionalen stochastischen Prozess (stationär).
/ In many applications it is necessary to know the stochastic fluctuation of the maximal deviations of the nonparametric quantile estimates, e.g., to check various parametric models. Uniform confidence bands are therefore constructed for nonparametric quantile estimates of regression functions. The first method is based on strong approximations of the empirical process and extreme value theory. The strong uniform consistency rate is also established under general conditions. The second method is based on the bootstrap resampling method. It is proved that the bootstrap approximation provides a substantial improvement. The case of multidimensional and discrete regressor variables is dealt with using a partial linear model. A labor market analysis is provided to illustrate the method. High dimensional time series which reveal nonstationary and possibly periodic behavior occur frequently in many fields of science, e.g., macroeconomics, meteorology, medicine and financial engineering. One common approach is to separate the modeling of high dimensional time series into the time propagation of low dimensional time series and high dimensional time-invariant functions via dynamic factor analysis. We propose a two-step estimation procedure. In the first step, we detrend the time series by incorporating a time basis selected by the group Lasso-type technique and choose the space basis based on smoothed functional principal component analysis. We show properties of this estimator under the dependent scenario. In the second step, we obtain the detrended low dimensional stochastic process (stationary).
Keywords: Bootstrap; Quantilsregression; Konsistenzrate; Selbstvertrauen Band; Check-Funktion; Kernel Smoothing; Nichtparametrische Fitting; Teilweise Lineares Modell; Semiparametrische Modell; Faktor-Modell; Saisonalität; Periodisch; Asymptotische Schlussfolgerung; Wetter; fMRI; Group Lasso; Implizite Volatilität Oberfläche; fMRI; Bootstrap; Quantile Regression; Consistency Rate; Confidence Band; Check Function; Kernel Smoothing; Nonparametric Fitting; Partial Linear Model; Factor model; Group Lasso; Seasonality; Periodic; Asymptotic inference; Weather; Semiparametric model; Implied volatility surface; 330 Wirtschaft; 17 Wirtschaft; QH 234; ddc:330
