1

SMVCIR Dimensionality Test

Lindsey, Charles D. May 2010 (has links)
The original SMVCIR algorithm was developed by Simon J. Sheather, Joseph W. McKean, and Kimberly Crimin. The dissertation first presents a new version of this algorithm that uses the scaling standardization rather than the Mahalanobis standardization. This algorithm takes grouped multivariate data as input and outputs a new coordinate space that contrasts the groups in location, scale, and covariance. The central goal of this research is to develop a method to determine the dimension of this space with statistical confidence. A dimensionality test is developed that can be used to make this determination. The new SMVCIR algorithm is compared with two other inverse regression algorithms, SAVE and SIR, in the process of developing the dimensionality test and testing it. The dimensionality test is based on the singular values of the kernel of the spanning set of the vector space. The asymptotic distribution of the spanning set is found by using the central limit theorem, the delta method, and finally Slutsky's theorem with a permutation matrix. This yields a mean-adjusted asymptotic distribution of the spanning set. Theory by Eaton, Tyler, and others is then used to show an equivalence between the singular values of the mean-adjusted spanning set statistic and the singular values of the spanning set statistic. The test statistic is a sample-size-scaled sum of squared singular values of the spanning set. This statistic is asymptotically equivalent in distribution to a linear combination of independent \(\chi^2_1\) random variables. Simulations are performed to corroborate these theoretical findings. Additionally, based on work by Bentler and Xie, an approximation to the test statistic's reference distribution is proposed and tested. This is also corroborated with simulations. Examples demonstrate how SMVCIR is used and how the developed dimensionality tests are performed. Finally, further directions of research are suggested for SMVCIR and the dimensionality test. One of the more interesting directions is explored by briefly examining how SMVCIR can be used to identify potentially complex functions that link the predictors and a continuous response variable.
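A minimal sketch (in Python, with hypothetical names and interface) of the kind of statistic described above: the sample-size-scaled sum of squared singular values of an estimated spanning set beyond a hypothesized dimension d. The weights of the limiting chi-square mixture, and the Bentler–Xie-style moment-matched approximation to it, come from the asymptotic theory and are not reproduced here.

```python
import numpy as np

def smvcir_test_statistic(spanning_set_hat, n, d):
    """Sketch: n times the sum of squared singular values beyond the
    first d, for testing whether the SMVCIR space has dimension d.
    Under the null, the limit is a weighted sum of independent
    chi-square(1) variables, per the asymptotics sketched above."""
    s = np.linalg.svd(np.asarray(spanning_set_hat), compute_uv=False)
    return float(n * np.sum(s[d:] ** 2))
```

Large values argue for a dimension greater than d, so a working dimension can be chosen by testing d = 0, 1, 2, ... until the test first fails to reject.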
2

Race time prediction for Taiwan marathoner

Jiang, Cheng-Hong 19 July 2008 (has links)
Pete Riegel, a well-known sports expert, proposed a race time prediction formula in 1977. This article examines whether it is also suitable for Taiwan marathoners. We collected 204 valid records by questionnaire. Several variables that may affect race performance are included in this work, namely: sex, age, years of running, height, weight, number of marathons run, and the quantity and frequency of weekly training. We then use multiple regression and sliced inverse regression to improve the accuracy of the race time prediction. The best model found here predicts the finishing times of eighty percent of runners to within fifteen minutes, compared with only sixty-two percent for Riegel's (1977) original model.
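For context, Riegel's formula predicts a time \(T_2\) over distance \(D_2\) from a known performance \((T_1, D_1)\) as \(T_2 = T_1 (D_2/D_1)^{1.06}\), where 1.06 is Riegel's published fatigue exponent. A minimal sketch (the function name and units are illustrative):

```python
def riegel_predict(t1_minutes: float, d1_km: float, d2_km: float,
                   fatigue: float = 1.06) -> float:
    """Riegel (1977): race time scales as a power of distance."""
    return t1_minutes * (d2_km / d1_km) ** fatigue

# e.g. a marathon (42.195 km) predicted from a 95-minute half marathon
print(riegel_predict(95.0, 21.0975, 42.195))  # about 198 minutes
```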
3

Functional inverse regression and reproducing kernel Hilbert space

Ren, Haobo 30 October 2006 (has links)
The basic philosophy of Functional Data Analysis (FDA) is to think of the observed data functions as elements of a possibly infinite-dimensional function space. Most current research topics in FDA focus on advancing theoretical tools and extending existing multivariate techniques to accommodate the infinite-dimensional nature of the data. This dissertation reports contributions on both fronts, developing a unifying inverse regression theory for both the multivariate setting (Li 1991) and functional data from a Reproducing Kernel Hilbert Space (RKHS) perspective. We propose a functional multiple-index model which models a real response variable as a function of a few predictor variables called indices. These indices are random elements of the Hilbert space spanned by a second-order stochastic process, and they constitute the so-called Effective Dimension Reduction Space (EDRS). To conduct inference on the EDRS, we discovered a fundamental result which reveals the geometrical association between the EDRS and the RKHS of the process. Two inverse regression procedures, a “slicing” approach and a kernel approach, were introduced to estimate the counterpart of the EDRS in the RKHS. Further, the estimate of the EDRS was achieved via the transformation from the RKHS to the original Hilbert space. To construct an asymptotic theory, we introduced an isometric mapping from the empirical RKHS to the theoretical RKHS, which can be used to measure the distance between the estimator and the target. Some general computational issues of FDA are discussed, leading to smoothed versions of the functional inverse regression methods. Simulation studies were performed to evaluate the performance of the inference procedures, and applications to biological and chemometric data analysis are illustrated.
4

Dynamic Bayesian Approaches to the Statistical Calibration Problem

Rivers, Derick Lorenzo 01 January 2014 (has links)
The problem of statistical calibration of a measuring instrument can be framed in both a statistical context and an engineering context. In the first, the problem is dealt with by distinguishing between the "classical" approach and the "inverse" regression approach. Both of these models are static and are used to estimate "exact" measurements from measurements that are affected by error. In the engineering context, the variables of interest are indexed by the time at which each measurement is observed. The Bayesian time series method of Dynamic Linear Models (DLM) can be used to monitor the evolution of the measurements, thus introducing a dynamic approach to statistical calibration. The research presented here employs Bayesian methodology to perform statistical calibration. The DLM framework is used to capture parameters that may be changing or drifting over time. Dynamic approaches to the linear, nonlinear, and multivariate calibration problems are presented in this dissertation. Simulation studies are conducted in which the dynamic models are compared to some well-known "static" calibration approaches in the literature, from both the frequentist and Bayesian perspectives. Applications to microwave radiometry are given.
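To illustrate the DLM machinery the abstract invokes, here is a simplified local-level filter tracking a drifting calibration quantity; this is a generic textbook sketch with assumed noise variances, not one of the dissertation's models:

```python
import numpy as np

def local_level_filter(y, m0=0.0, c0=1e3, v=1.0, w=0.1):
    """Minimal local-level DLM: y_t = theta_t + N(0, v),
    theta_t = theta_{t-1} + N(0, w).  Returns filtered means."""
    m, c, means = m0, c0, []
    for yt in np.asarray(y, dtype=float):
        r = c + w                # prior variance after state evolution
        k = r / (r + v)          # Kalman gain
        m = m + k * (yt - m)     # filtered mean of theta_t
        c = (1.0 - k) * r        # filtered variance
        means.append(m)
    return np.array(means)
```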
5

Partition Models for Variable Selection and Interaction Detection

Jiang, Bo 27 September 2013 (has links)
Variable selection methods play important roles in modeling high-dimensional data and are key to data-driven scientific discoveries. In this thesis, we consider the problem of variable selection with interaction detection. Instead of building a predictive model of the response given combinations of predictors, we start by modeling the conditional distribution of the predictors given partitions based on the responses. We use this inverse modeling perspective as motivation to propose a stepwise procedure for effectively detecting interactions with few assumptions on parametric form. The proposed procedure is able to detect pairwise interactions among p predictors with a computational time of \(O(p)\) instead of \(O(p^2)\) under moderate conditions. We establish the consistency of the proposed procedure in variable selection under a diverging number of predictors and sample size. We demonstrate its excellent empirical performance in comparison with some existing methods through simulation studies as well as real data examples. Next, we combine the forward and inverse modeling perspectives under the Bayesian framework to detect pleiotropic and epistatic effects in expression quantitative trait loci (eQTL) studies. We augment the Bayesian partition model proposed by Zhang et al. (2010) to capture complex dependence structure among gene expression and genetic markers. In particular, we propose a sequential partition prior to model the asymmetric roles played by the response and the predictors, and we develop an efficient dynamic programming algorithm for sampling latent individual partitions. The augmented partition model significantly improves the power in detecting eQTLs compared to previous methods, in both simulations and real data examples pertaining to yeast. Finally, we study the application of Bayesian partition models in the unsupervised learning of transcription factor (TF) families based on protein binding microarray (PBM) data. The problem of TF subclass identification can be viewed as the clustering of TFs with variable selection on their binding DNA sequences. Our model provides simultaneous identification of TF families and their shared sequence preferences, as well as of DNA sequences bound preferentially by individual members of TF families. Our analysis may aid in deciphering cis-regulatory codes and the determinants of protein-DNA binding specificity. / Statistics
6

Dimension reduction via Sliced Inverse Regression: ideas and extensions

Chiancone, Alessandro 28 October 2016 (has links)
This thesis proposes three extensions of Sliced Inverse Regression (SIR), namely Collaborative SIR, Student SIR, and Knockoff SIR. One of the weak points of SIR is the impossibility of checking whether the Linearity Design Condition (LDC) holds. It is known that if X follows an elliptic distribution the condition holds true; in the case of a mixture of elliptic distributions there is no guarantee that the condition is satisfied globally, but it holds locally. Starting from this consideration, an extension is proposed. Given the predictor variable X, Collaborative SIR first performs a clustering; in each cluster, SIR is applied independently, and the results from the components are combined into the final solution. Our second contribution, Student SIR, comes from the need to robustify SIR: since SIR is based on the estimation of the covariance and contains a PCA step, it is sensitive to noise. To extend SIR, an approach based on an inverse formulation of SIR proposed by R. D. Cook is used. Finally, Knockoff SIR is an extension of SIR that performs variable selection and gives sparse solutions; it has its foundations in a recently published paper by R. F. Barber and E. J. Candès that focuses on the false discovery rate in the regression framework. The underlying idea is to construct copies of the original variables that have certain properties. It is shown that SIR is robust to these copies, and a strategy is proposed to use this result for variable selection and to generate sparse solutions.
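A minimal sketch of the cluster-then-SIR idea behind Collaborative SIR; the k-means clustering, the slicing scheme, and the absence of a combination step are simplifying assumptions, not the thesis's algorithm:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def sir_directions(X, y, n_slices=10, d=1):
    """Standard SIR: top generalized eigenvectors of the slice-mean
    matrix Cov(E[X|Y]) with respect to Cov(X)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False)
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        if len(idx) == 0:
            continue
        m = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    vals, vecs = eigh(M, sigma)      # generalized eigenproblem
    return vecs[:, ::-1][:, :d]      # top-d e.d.r. directions

def collaborative_sir(X, y, n_clusters=3, **sir_kwargs):
    """Cluster X, then apply SIR independently within each cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    return {k: sir_directions(X[labels == k], y[labels == k], **sir_kwargs)
            for k in np.unique(labels)}
```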
7

Bayesian Model Averaging Sufficient Dimension Reduction

Power, Michael Declan January 2020 (has links)
In sufficient dimension reduction (Li, 1991; Cook, 1998b), original predictors are replaced by their low-dimensional linear combinations while preserving all of the conditional information of the response given the predictors. Sliced inverse regression [SIR; Li, 1991] and principal Hessian directions [PHD; Li, 1992] are two popular sufficient dimension reduction methods, and both SIR and PHD estimators involve all of the original predictor variables. To deal with the cases when the linear combinations involve only a subset of the original predictors, we propose a Bayesian model averaging (Raftery et al., 1997) approach to achieve sparse sufficient dimension reduction. We extend both SIR and PHD under the Bayesian framework. The superior performance of the proposed methods is demonstrated through extensive numerical studies as well as a real data analysis. / Statistics
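For reference, a minimal sketch of the response-based PHD estimator extended here; this is a textbook version, not the proposed Bayesian model averaging procedure:

```python
import numpy as np
from scipy.linalg import eigh

def phd_directions(X, y, d=1):
    """Response-based PHD (Li, 1992): eigenvectors of the y-weighted
    second moment E[(Y-EY)(X-EX)(X-EX)^T] in the Cov(X) metric,
    ordered by absolute eigenvalue."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sigma = np.cov(Xc, rowvar=False)
    M = (Xc * yc[:, None]).T @ Xc / len(y)
    vals, vecs = eigh(M, sigma)
    order = np.argsort(np.abs(vals))[::-1]   # eigenvalues may be negative
    return vecs[:, order[:d]]
```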
8

Sufficient Dimension Reduction with Missing Data

XIA, QI January 2017 (has links)
Existing sufficient dimension reduction (SDR) methods typically consider cases with no missing data. This dissertation proposes methods that extend SDR to settings where the response can be missing. The first part of the dissertation focuses on the seminal sliced inverse regression (SIR) approach proposed by Li (1991). We show that missing responses generally affect the validity of the inverse regressions under the missing-at-random mechanism. We then propose a simple and effective adjustment with inverse probability weighting that guarantees the validity of SIR. Furthermore, a marginal coordinate test is introduced for this adjusted estimator. The proposed method shares the simplicity of SIR and requires only the linear conditional mean assumption. The second part of the dissertation proposes two new estimating equation procedures: the complete-case estimating equation approach and the inverse probability weighted estimating equation approach. The two approaches are applied to a family of dimension reduction methods which includes ordinary least squares, principal Hessian directions, and SIR. By solving the estimating equations, the two approaches are able to avoid the common assumptions in the SDR literature: the linear conditional mean assumption and the constant conditional variance assumption. For all the aforementioned methods, the asymptotic properties are established, and their superb finite-sample performance is demonstrated through extensive numerical studies as well as a real data analysis. In addition, existing estimators of the central mean space have uneven performance across different types of link functions. To address this limitation, a new hybrid SDR estimator is proposed that successfully recovers the central mean space for a wide range of link functions. Based on the new hybrid estimator, we further study the order determination procedure and the marginal coordinate test. The superior performance of the hybrid estimator over existing methods is demonstrated in simulation studies. Note that the proposed procedures dealing with the response missing at random can be simply adapted to this hybrid method. / Statistics
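A minimal sketch of the inverse-probability-weighting idea for SIR with a response missing at random; the logistic propensity model and the weight clipping are illustrative assumptions, not the dissertation's exact estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_sir_kernel(X, y, observed, n_slices=5):
    """Sketch: weight complete cases by 1 / P(observed | X) before
    forming SIR's slice-mean matrix.  `observed` is a 0/1 array and
    y may hold arbitrary values where observed == 0."""
    pi = LogisticRegression().fit(X, observed).predict_proba(X)[:, 1]
    w = observed / np.clip(pi, 1e-3, 1.0)         # IPW weights
    Xc = X - np.average(X, axis=0, weights=w)     # weighted centering
    M = np.zeros((X.shape[1], X.shape[1]))
    obs = np.flatnonzero(observed)
    for sl in np.array_split(obs[np.argsort(y[obs])], n_slices):
        m = np.average(Xc[sl], axis=0, weights=w[sl])
        M += (w[sl].sum() / w.sum()) * np.outer(m, m)
    return M   # eigen-decompose against Cov(X) as in complete-data SIR
```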
9

Stochastic modelling using large data sets : applications in ecology and genetics

Coudret, Raphaël 16 September 2013 (has links) (PDF)
There are two main parts in this thesis. The first concerns valvometry, here the study of the distance between the two parts of an oyster's shell over time. The health status of oysters can be characterized using valvometry in order to obtain insights about the quality of their environment. We consider that a renewal process with four states underlies the behaviour of the studied oysters. Such a hidden process can be retrieved from a valvometric signal by assuming that some probability density function linked with this signal is bimodal. We then compare several estimators which take this assumption into account, including kernel density estimators. In another chapter, we compare several regression approaches aimed at analysing transcriptomic data. To understand which explanatory variables have an effect on gene expression, we apply a multiple testing procedure to these data through the linear model FAMT. The SIR method may find nonlinear relations in such a context, but it is more commonly used when the response variable is univariate; a multivariate version of SIR was therefore developed. Procedures to measure gene expression can be expensive, so the sample size n of the corresponding datasets is often small. That is why we also studied SIR when n is less than the number of explanatory variables p.
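A minimal sketch of the kind of kernel density estimate on which such a bimodality assumption rests; this uses a generic Gaussian KDE with a naive mode count, not the estimators compared in the thesis:

```python
import numpy as np
from scipy.stats import gaussian_kde

def count_modes(signal, grid_size=512):
    """Fit a Gaussian KDE to a 1-D signal and count local maxima
    of the estimated density (a bimodal density should yield 2)."""
    kde = gaussian_kde(signal)
    grid = np.linspace(np.min(signal), np.max(signal), grid_size)
    dens = kde(grid)
    peaks = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return int(peaks.sum())
```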
10

Dimension reduction in regression

Portier, François 02 July 2013 (has links)
In this thesis, we study the problem of dimension reduction in the framework of the regression model Y = g(BX, e), where X is a vector of dimension p, Y belongs to R, the function g is unknown, and the noise e is independent of X. We are interested in estimating the matrix B, of size d×p where d is smaller than p (knowledge of which yields good convergence rates for the estimation of g). This problem is treated using two distinct approaches. The first, called inverse regression, requires the linearity condition on X. The second, called semiparametric, does not require such a condition, only that X has a smooth density. Within the inverse regression framework, we study two families of methods based respectively on E[X f(Y)] and E[XX^T f(Y)]. For each family, we obtain the conditions on f that permit an exhaustive estimation of B, and we compute the optimal function f by minimizing the asymptotic variance. Within the semiparametric approach, we propose a method for estimating the gradient of the regression function. Under classical semiparametric hypotheses, we show the asymptotic normality of our estimator and the exhaustiveness of the estimation of B. Whichever approach is considered, a fundamental question is raised: how should the dimension of B be chosen? To this end, we propose a method for estimating the rank of a matrix by bootstrap hypothesis testing.
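A minimal sketch of the first family of estimates, based on applying the inverse covariance of X to an estimate of E[X f(Y)]; the choice f = sign is an arbitrary illustration, not the variance-optimal f derived in the thesis:

```python
import numpy as np

def inverse_regression_direction(X, y, f=np.sign):
    """Estimate one direction in the span of B^T via
    Cov(X)^{-1} E[(X - EX) f(Y)], valid under the linearity
    condition on X for a user-chosen transformation f."""
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False)
    m = (Xc * f(y - y.mean())[:, None]).mean(axis=0)
    b = np.linalg.solve(sigma, m)
    return b / np.linalg.norm(b)
```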
