21 |
Semiparametric Bayesian Approach using Weighted Dirichlet Process Mixture For Finance Statistical Models
Sun, Peng (07 March 2016)
The Dirichlet process mixture (DPM) has been widely used as a flexible prior in the nonparametric Bayesian literature, and the weighted Dirichlet process mixture (WDPM) can be viewed as an extension of DPM that relaxes the model's distributional assumptions. However, WDPM requires weight functions to be specified and can impose an extra computational burden. In this dissertation, we develop more efficient and flexible WDPM approaches under three research topics. The first is semiparametric cubic spline regression, where we adopt a nonparametric prior for the error terms in order to automatically handle heterogeneity of measurement errors or an unknown mixture distribution. The second provides an innovative way to construct the weight function and illustrates some desirable properties and the computational efficiency of this weight under a semiparametric stochastic volatility (SV) model. The third develops a WDPM approach for the Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) model (as an alternative to the SV model) and proposes a new model evaluation approach for GARCH whose results are easier to interpret than those of the canonical marginal likelihood approach.
In the first topic, the response variable is modeled as the sum of three parts. The first part is a linear function of the covariates that enter the model parametrically. The second part is an additive nonparametric model: the covariates whose relationships to the response variable are unclear are included in the model nonparametrically using Lancaster and Šalkauskas bases. The third part consists of the error terms, whose means and variances are assumed to follow nonparametric priors. We therefore call our model dual-semiparametric regression, because we use nonparametric ideas for both the mean part and the error terms. Instead of assuming that all of the error terms follow the same prior, as in DPM, our WDPM provides multiple candidate priors for each observation to select from with certain probabilities. Each such probability (or weight) is modeled from relevant predictive covariates using a Gaussian kernel. We propose several different WDPMs using different weights that depend on distance in the covariates. We provide efficient Markov chain Monte Carlo (MCMC) algorithms and compare our WDPMs to the parametric model and the DPM model in terms of Bayes factors, using simulation and an empirical study.
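As a rough illustration of covariate-dependent weights built from a Gaussian kernel, the sketch below normalizes kernel values of one scalar covariate over a set of anchor points, one per candidate prior. The function name, the `anchors` and `bandwidth` parameters, and the restriction to a scalar covariate are assumptions made for this example; the dissertation's actual weight functions are not given in the abstract and may differ.

```python
import math

def gaussian_kernel_weights(x, anchors, bandwidth=1.0):
    """Normalized Gaussian-kernel weights of a scalar covariate x.

    Each anchor is a hypothetical location tied to one candidate prior;
    anchors closer to x receive larger weight.
    """
    raw = [math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in anchors]
    total = sum(raw)
    if total == 0.0:
        # All kernel values underflowed: x is too far from every anchor.
        raise ValueError("x is too far from all anchors for this bandwidth")
    return [r / total for r in raw]

# Probabilities with which an observation at x = 0.2 selects each candidate.
weights = gaussian_kernel_weights(0.2, anchors=[0.0, 1.0, 2.0], bandwidth=0.5)
```

The weights sum to one, so they can be read as selection probabilities over the candidate priors. A smaller bandwidth concentrates the weight on the nearest anchor.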
In the second topic, we propose an innovative way to construct the weight function for WDPM and apply it to the SV model. The SV model is used for time series data in which the constant variance assumption is violated. One essential issue is to specify the distribution of the conditional return. We assume a WDPM prior for the conditional return and propose a new way to model the weights. Our approach has several advantages, including computational efficiency compared to the weight constructed using a Gaussian kernel. We list six properties of the proposed weight function and provide proofs of them. Because of the additional Metropolis-Hastings steps introduced by the WDPM prior, we find conditions that ensure the uniform geometric ergodicity of the transition kernel in our MCMC. Because asset price data contain zero values, our SV model is semiparametric: we employ a WDPM prior for non-zero values and a parametric prior for zero values.
In the third topic, we develop a WDPM approach for GARCH-type models and compare different types of weight functions, including the innovative method proposed in the second topic. The GARCH model can be viewed as an alternative to SV for analyzing daily stock price data where the constant variance assumption does not hold. While the response variable of our SV models is the transformed log return (based on the log-square transformation), GARCH directly models the log return itself. This means that, theoretically speaking, we are able to predict stock returns using GARCH models, whereas this is not feasible with the SV model, because SV models ignore the sign of log returns and provide predictive densities for the squared log return only. Motivated by this property, we propose a new model evaluation approach called back testing return (BTR), designed particularly for GARCH. The BTR approach produces model evaluation results that are easier to interpret than marginal likelihood, and it makes it straightforward to draw conclusions about model profitability. Since the BTR approach is only applicable to GARCH, we also illustrate how to properly calculate the marginal likelihood in order to compare GARCH and SV. Based on our MCMC algorithms and model evaluation approaches, we have conducted a large number of model fittings to compare models in both simulation and empirical studies. / Ph. D.
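To make the non-constant variance idea concrete, here is a minimal sketch of the textbook GARCH(1,1) conditional variance recursion, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. It is the standard parametric recursion only, not the WDPM-based model of the dissertation; the function name, the parameter values, and the choice to start from the unconditional variance are assumptions made for illustration.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances of a standard GARCH(1,1) model.

    Returns len(returns) + 1 values: sigma2[t] is the variance of return t
    given earlier returns, and the last value is the one-step-ahead forecast.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0, alpha + beta < 1")
    # Start the recursion at the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# A large return (here 3.0) pushes the next conditional variance upward.
path = garch11_variances([0.1, 3.0, -0.2], omega=0.1, alpha=0.1, beta=0.8)
```

The recursion depends on returns only through their squares, so positive and negative shocks of the same size have the same effect on the variance. The model for the return itself still keeps the sign, which is the property the abstract relies on.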
|
22 |
Semiparametric and Nonparametric Methods for Complex Data
Kim, Byung-Jun (26 June 2020)
Over the past few decades, a growing variety of complex data has emerged in research fields such as epidemiology, genomics, and analytical chemistry, driven by developments in science, technology, and study design. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period, by stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are used to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Because of this great diversity, we address three problems in analyzing such complex data in this dissertation. We provide several contributions to semiparametric and nonparametric methods for dealing with these problems: the first is a method for testing the significance of a functional association under the matched study design; the second is a method to simultaneously identify important variables and build a network in HCHD data; the third is a multi-class dynamic model for recognizing patterns in time-trend analysis.
For the first topic, we propose a semiparametric omnibus test for the significance of a functional association between clustered binary outcomes and covariates with measurement error, taking into account the effect modification of matching covariates. The test is flexible in that it does not require a specific alternative form of the hypothesis. The advantages of our omnibus test are demonstrated through simulation studies and 1-4 bidirectional matched data analyses from an epidemiology study.
For the second topic, we propose a joint semiparametric kernel machine network approach that connects variable selection and network estimation. Our approach is a unified and integrated method that can simultaneously identify important variables and build a network among them. We develop the approach under a semiparametric kernel machine regression framework, which allows for the possibility that each variable has a nonlinear effect and that the variables interact with each other in complicated ways. We demonstrate our approach using simulation studies and a real application to genetic pathway analysis.
Lastly, for the third topic, we propose a Bayesian focal-area detection method for a multi-class dynamic model under a Bayesian hierarchical framework. Two-step Bayesian sequential procedures are developed to estimate patterns and detect focal intervals, which can be used for gas chromatography. We demonstrate the performance of the proposed method using a simulation study and a real application to gas chromatography on the Fast Odor Chromatographic Sniffer (FOX) system. / Doctor of Philosophy
Over the past few decades, a growing variety of complex data has emerged in research fields such as epidemiology, genomics, and analytical chemistry, driven by developments in science, technology, and study design. For example, in epidemiology, the matched case-crossover study design is used to investigate the association between clustered binary disease outcomes and a covariate measured with error within a certain period, by stratifying on subjects' conditions. In genomics, highly correlated and high-dimensional (HCHD) data are used to identify important genes and their interaction effects on diseases. In analytical chemistry, multiple time series data are generated to recognize complex patterns among multiple classes. Because of this great diversity, we address three problems arising in the analysis of three types of data: (1) matched case-crossover data, (2) HCHD data, and (3) time series data. We contribute to the development of statistical methods for dealing with such complex data.
First, under the matched study design, we discuss a hypothesis testing approach for effectively determining the association between observed factors and the risk of the disease of interest. Because, in practice, we do not know the specific form of the association, it can be challenging to set a specific alternative hypothesis. To reflect reality, we also consider the possibility that some observations are measured with error. Taking these measurement errors into account, we develop a testing procedure under the matched case-crossover framework. This testing procedure is flexible enough to support inference under various hypothesis settings.
Second, we consider data in which the number of variables is very large compared to the sample size and the variables are correlated with each other. In this case, our goal is to identify the variables that are important for the outcome among a large number of variables and to build their network. For example, identifying a few genes, out of the whole genome, that are associated with diabetes can be used to develop biomarkers. With the approach proposed in the second project, we can identify differentially expressed and important genes and their network structure while taking the outcome into account.
Lastly, we consider the scenario in which patterns of interest change over time, with application to gas chromatography. We propose an efficient detection method to distinguish the patterns of multi-level subjects in time-trend analysis. We suggest that the proposed method can give valuable information for efficiently searching for distinguishable patterns, reducing the burden of examining all observations in the data.
|
23 |
Non- and semiparametric models for conditional probabilities in two-way contingency tables / Modèles non-paramétriques et semiparamétriques pour les probabilités conditionnelles dans les tables de contingence à deux entrées
Geenens, Gery (04 July 2008)
This thesis is mainly concerned with the estimation of conditional probabilities in two-way contingency tables, that is, probabilities of the type P(R=i,S=j|X=x), for (i,j) in {1, ..., r} × {1, ..., s}, where R and S are the two categorical variables forming the contingency table, with r and s levels respectively, and X is a vector of explanatory variables possibly associated with R, S, or both. Analyzing such a conditional distribution is often of interest, as it allows one to go further than the usual unconditional study of the behavior of the variables R and S. First, one can check for a possible effect of these covariates on the distribution of the individuals across the cells of the table. Second, one can carry out the usual analyses of contingency tables, such as independence tests, while taking this effect into account and, in some sense, removing it. This helps, for instance, to identify the external factors that could be responsible for a possible association between R and S. It also makes it possible to adjust for possible heterogeneity in the population of interest when analyzing the table.
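One generic nonparametric way to estimate probabilities of the type P(R=i,S=j|X=x) is a kernel-weighted cell frequency (a Nadaraya-Watson-type estimate). The sketch below is offered only as an illustration of the quantity being estimated and is not claimed to be the estimator studied in the thesis; the function name, the scalar covariate, the Gaussian kernel, and the `bandwidth` parameter are all assumptions.

```python
import math
from collections import defaultdict

def conditional_cell_probs(x0, data, bandwidth):
    """Kernel-weighted estimate of P(R=i, S=j | X=x0).

    data is an iterable of (x, r, s) triples with a scalar covariate x.
    Observations with x close to x0 count more toward their cell (r, s).
    """
    weighted = defaultdict(float)
    total = 0.0
    for x, r, s in data:
        w = math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2)
        weighted[(r, s)] += w
        total += w
    if total == 0.0:
        raise ValueError("no observation carries weight near x0")
    return {cell: w / total for cell, w in weighted.items()}

sample = [(0.0, 1, 1), (0.1, 1, 1), (1.0, 2, 2), (1.1, 2, 1)]
probs_near_zero = conditional_cell_probs(0.0, sample, bandwidth=0.2)
```

Cells that are never observed are simply absent from the returned dictionary and should be read as having estimated probability zero. As the bandwidth grows, the estimate approaches the ordinary unconditional cell frequencies.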
|
24 |
Contributions à la réduction de dimension
Kuentz, Vanessa (20 November 2009)
This thesis concentrates on dimension reduction approaches, which seek lower-dimensional subspaces while minimizing the loss of statistical information. First, we focus on multivariate analysis for categorical data. The rotation problem in Multiple Correspondence Analysis (MCA) is treated. We give the analytic expression of the optimal angle of planar rotation for the chosen criterion. If more than two principal components are to be retained, this planar solution is used in a practical algorithm applying successive pairwise planar rotations. Different algorithms for the clustering of categorical variables are also proposed to maximize a given partitioning criterion based on correlation ratios. A real data application highlights the benefits of using rotation in MCA and provides an empirical comparison of the proposed algorithms for categorical variable clustering. Then we study the semiparametric regression method SIR (Sliced Inverse Regression). We propose an extension based on the partitioning of the predictor space that can be used when the crucial linearity condition on the predictor is not satisfied. We also introduce bagging versions of SIR to improve the estimation of the basis of the dimension reduction subspace. Asymptotic properties of the estimators are obtained, and a simulation study shows the good numerical behaviour of the proposed methods. Finally, the applied multivariate data analyses and interdisciplinary collaborations carried out during the thesis are described.
|
25 |
Estimation non paramétrique pour les processus markoviens déterministes par morceaux / Nonparametric estimation for piecewise-deterministic Markov processes
Azaïs, Romain (01 July 2013)
Piecewise-deterministic Markov processes (PDMPs) were introduced by M.H.A. Davis as a general family of non-diffusion stochastic models, involving deterministic motion punctuated by random jumps at random times. In this thesis, we propose and analyze nonparametric estimation methods for the two features governing the randomness of such a process. More precisely, we present estimators of the conditional density of the inter-jumping times and of the transition kernel for a PDMP observed over a long time interval. We establish convergence results for both proposed estimators. In addition, numerical simulations illustrate our theoretical results. Furthermore, we propose an estimator for the jump rate of a nonhomogeneous renewal process and a numerical approximation method based on optimal quantization for a semiparametric regression model.
|
26 |
Three studies on semi-mixed effects models / Drei Studien über semi-Mixed Effects Modelle
Savaşcı, Duygu (28 September 2011)
No description available.
|
27 |
Dynamic semiparametric factor models
Borak, Szymon (11 July 2008)
High-dimensional regression problems that exhibit dynamic behavior occur frequently in many different fields of science. The dynamics of the whole complex system are typically analyzed through the time evolution of a small number of factors, which are loaded with time-invariant functions of explanatory variables. In this thesis we consider a dynamic semiparametric factor model, which assumes nonparametric loading functions. We start with a short discussion of related statistical techniques and present the properties of the model. Real data applications are then discussed, with particular focus on implied volatility dynamics and the resulting factor hedging of barrier options.
|