Global ETD Search

11	Model-Free Variable Selection For Two Groups of Variables Alothman, Ahmad January 2018 (has links) In this dissertation we introduce two variable selection procedures for multivariate responses. Our procedures are based on sufficient dimension reduction concepts and are model-free. In the first procedure we consider the dual marginal coordinate hypotheses, where the role of the predictor and the response is not important. Motivated by canonical correlation analysis (CCA), we propose a CCA-based test for the dual marginal coordinate hypotheses, and devise a joint backward selection algorithm for dual model-free variable selection. The second procedure is based on ordinary least squares (OLS). We derive and study the asymptotic properties of the OLS-based test under the normality assumption of the predictors as well as an asymmetry assumption. When these assumptions are violated, the asymptotic test with elliptical trimming and clustering is still valid with desirable numerical performances. A backward selection algorithm for the predictor is also provided for the OLS-based test. The performances of the proposed tests and the variable selection procedures are evaluated through synthetic examples and a real data analysis. / Statistics Statistics Canonical Correlation Analysis Marginal Coordinate Test Multivariate Response Ordinary Least Squares Sufficient Dimension Reduction Variable Selection
12	Dimension Reduction and Variable Selection Moradi Rekabdarkolaee, Hossein 01 January 2016 (has links) High-dimensional data are becoming increasingly available as data collection technology advances. Over the last decade, significant developments have been taking place in high-dimensional data analysis, driven primarily by a wide range of applications in many fields such as genomics, signal processing, and environmental studies. Statistical techniques such as dimension reduction and variable selection play important roles in high dimensional data analysis. Sufficient dimension reduction provides a way to find the reduced space of the original space without a parametric model. This method has been widely applied in many scientific fields such as genetics, brain imaging analysis, econometrics, environmental sciences, etc. in recent years. In this dissertation, we worked on three projects. The first one combines local modal regression and Minimum Average Variance Estimation (MAVE) to introduce a robust dimension reduction approach. In addition to being robust to outliers or heavy-tailed distribution, our proposed method has the same convergence rate as the original MAVE. Furthermore, we combine local modal base MAVE with a $L_1$ penalty to select informative covariates in a regression setting. This new approach can exhaustively estimate directions in the regression mean function and select informative covariates simultaneously, while being robust to the existence of possible outliers in the dependent variable. The second project develops sparse adaptive MAVE (saMAVE). SaMAVE has advantages over adaptive LASSO because it extends adaptive LASSO to multi-dimensional and nonlinear settings, without any model assumption, and has advantages over sparse inverse dimension reduction methods in that it does not require any particular probability distribution on \textbf{X}. In addition, saMAVE can exhaustively estimate the dimensions in the conditional mean function. The third project extends the envelope method to multivariate spatial data. The envelope technique is a new version of the classical multivariate linear model. The estimator from envelope asymptotically has less variation compare to the Maximum Likelihood Estimator (MLE). The current envelope methodology is for independent observations. While the assumption of independence is convenient, this does not address the additional complication associated with a spatial correlation. This work extends the idea of the envelope method to cases where independence is an unreasonable assumption, specifically multivariate data from spatially correlated process. This novel approach provides estimates for the parameters of interest with smaller variance compared to maximum likelihood estimator while still being able to capture the spatial structure in the data. Envelope Local Modal MAVE Robust Estimation Shrinkage Estimation Spatial Process Sufficient Dimension Reduction Variable Selection. Applied Statistics Multivariate Analysis Statistical Methodology Statistical Models Statistical Theory
13	Réduction de la dimension en régression / Dimension reduction in regression Portier, François 02 July 2013 (has links) Dans cette thèse, nous étudions le problème de réduction de la dimension dans le cadre du modèle de régression suivant Y=g(B X,e), où X est un vecteur de dimension p, Y appartient à R, la fonction g est inconnue et le bruit e est indépendant de X. Nous nous intéressons à l'estimation de la matrice B, de taille dxp où d est plus petit que p, (dont la connaissance permet d'obtenir de bonnes vitesses de convergence pour l'estimation de g). Ce problème est traité en utilisant deux approches distinctes. La première, appelée régression inverse nécessite la condition de linéarité sur X. La seconde, appelée semi-paramétrique ne requiert pas une telle condition mais seulement que X possède une densité lisse. Dans le cadre de la régression inverse, nous étudions deux familles de méthodes respectivement basées sur E[X f(Y)] et E[XX^T f(Y)]. Pour chacune de ces familles, nous obtenons les conditions sur f permettant une estimation exhaustive de B, aussi nous calculons la fonction f optimale par minimisation de la variance asymptotique. Dans le cadre de l'approche semi-paramétrique, nous proposons une méthode permettant l'estimation du gradient de la fonction de régression. Sous des hypothèses semi-paramétriques classiques, nous montrons la normalité asymptotique de notre estimateur et l'exhaustivité de l'estimation de B. Quel que soit l'approche considérée, une question fondamentale est soulevée : comment choisir la dimension de B ? Pour cela, nous proposons une méthode d'estimation du rang d'une matrice par test d'hypothèse bootstrap. / In this thesis, we study the problem of dimension reduction through the following regression model Y=g(BX,e), where X is a p dimensional vector, Y belongs to R, the function g is unknown and the noise e is independent of X. We are interested in the estimation of the matrix B, with dimension d times p where d is smaller than p (whose knowledge provides good convergence rates for the estimation of g). This problem is processed according to two different approaches. The first one, called the inverse regression, needs the linearity condition on X. The second one, called semiparametric, do not require such an assumption but only that X has a smooth density. In the context of inverse regression, we focus on two families of methods respectively based on E[X f(Y)] and E[XX^T f(Y)]. For both families, we provide conditions on f that allow an exhaustive estimation of B, and also we compute the better function f by minimizing the asymptotic variance. In the semiparametric context, we give a method for the estimation of the gradient of the regression function. Under some classical semiparametric assumptions, we show the root n consistency of our estimator, the exhaustivity of the estimation and the convergence in the processes space. Within each point, an important question is raised : how to choose the dimension of B ? For this we propose a method that estimates of the rank of a matrix by bootstrap hypothesis testing. Régression inverse Modèle à directions révélatrices Sufficient dimension reduction Inverse regression Multiple index model Average derivative estimator.
14	TRANSFORMS IN SUFFICIENT DIMENSION REDUCTION AND THEIR APPLICATIONS IN HIGH DIMENSIONAL DATA Weng, Jiaying 01 January 2019 (has links) The big data era poses great challenges as well as opportunities for researchers to develop efficient statistical approaches to analyze massive data. Sufficient dimension reduction is such an important tool in modern data analysis and has received extensive attention in both academia and industry. In this dissertation, we introduce inverse regression estimators using Fourier transforms, which is superior to the existing SDR methods in two folds, (1) it avoids the slicing of the response variable, (2) it can be readily extended to solve the high dimensional data problem. For the ultra-high dimensional problem, we investigate both eigenvalue decomposition and minimum discrepancy approaches to achieve optimal solutions and also develop a novel and efficient optimization algorithm to obtain the sparse estimates. We derive asymptotic properties of the proposed estimators and demonstrate its efficiency gains compared to the traditional estimators. The oracle properties of the sparse estimates are derived. Simulation studies and real data examples are used to illustrate the effectiveness of the proposed methods. Wavelet transform is another tool that effectively detects information from time-localization of high frequency. Parallel to our proposed Fourier transform methods, we also develop a wavelet transform version approach and derive the asymptotic properties of the resulting estimators. Central subspace Fourier transform Predictors hypothesis tests Sufficient dimension reduction Sufficient variable selection Wavelet transform Multivariate Analysis Statistical Methodology Statistical Models
15	A NEW INDEPENDENCE MEASURE AND ITS APPLICATIONS IN HIGH DIMENSIONAL DATA ANALYSIS Ke, Chenlu 01 January 2019 (has links) This dissertation has three consecutive topics. First, we propose a novel class of independence measures for testing independence between two random vectors based on the discrepancy between the conditional and the marginal characteristic functions. If one of the variables is categorical, our asymmetric index extends the typical ANOVA to a kernel ANOVA that can test a more general hypothesis of equal distributions among groups. The index is also applicable when both variables are continuous. Second, we develop a sufficient variable selection procedure based on the new measure in a large p small n setting. Our approach incorporates marginal information between each predictor and the response as well as joint information among predictors. As a result, our method is more capable of selecting all truly active variables than marginal selection methods. Furthermore, our procedure can handle both continuous and discrete responses with mixed-type predictors. We establish the sure screening property of the proposed approach under mild conditions. Third, we focus on a model-free sufficient dimension reduction approach using the new measure. Our method does not require strong assumptions on predictors and responses. An algorithm is developed to find dimension reduction directions using sequential quadratic programming. We illustrate the advantages of our new measure and its two applications in high dimensional data analysis by numerical studies across a variety of settings. High dimensional data analysis Independence Reproducing Kernel Hilbert Space Sufficient Dimension Reduction Sufficient Variable Selection Categorical Data Analysis Multivariate Analysis Statistics and Probability

Page generated in 0.2133 seconds