1. DEEP LEARNING FOR STATISTICAL DATA ANALYSIS: DIMENSION REDUCTION AND CAUSAL STRUCTURE INFERENCE. Liang, Siqi (11799653). 19 December 2021.
During the past decades, deep learning has proven to be an important tool for statistical data analysis. Motivated by the promise of deep learning in tackling the curse of dimensionality, this dissertation proposes three innovative methods that apply deep learning techniques to high-dimensional data analysis.

Firstly, we propose a nonlinear sufficient dimension reduction method, the split-and-merge deep neural network (SM-DNN), which employs the split-and-merge technique on deep neural networks to obtain a nonlinear sufficient dimension reduction of the input data and then learns a deep neural network on the dimension-reduced data. We show that the DNN-based dimension reduction is sufficient for data drawn from an exponential family, retaining all information about the response contained in the explanatory data. Our numerical experiments indicate that the SM-DNN method can lead to significant improvement in phenotype prediction for a variety of real data examples. In particular, with only rare variants, we achieved a remarkable prediction accuracy of over 74% on the Early-Onset Myocardial Infarction (EOMI) exome sequence data.

Secondly, we propose another nonlinear SDR method based on a new type of stochastic neural network, developed under a rigorous probabilistic framework, and show that it can be used for sufficient dimension reduction of high-dimensional data. The proposed stochastic neural network can be trained using an adaptive stochastic gradient Markov chain Monte Carlo algorithm. Through extensive experiments on real-world classification and regression problems, we show that the proposed method compares favorably with existing state-of-the-art sufficient dimension reduction methods and is computationally more efficient for large-scale data.

Finally, we propose a structure learning method for uncovering the causal structure hidden in high-dimensional data, which consists of two stages: we first conduct Bayesian sparse learning for variable screening to build a primary graph, and then perform conditional independence tests to refine the primary graph. Extensive numerical experiments and quantitative tests confirm the generality, effectiveness, and power of the proposed methods.
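The split-and-merge architecture lends itself to a compact illustration. Below is a minimal sketch of the idea in PyTorch, assuming the predictors are partitioned into groups (e.g., genomic regions), each reduced by its own small network before a shared head is trained on the merged low-dimensional features; the group sizes, widths, and reduced dimension are illustrative choices, not the dissertation's exact SM-DNN configuration.

```python
import torch
import torch.nn as nn

class SplitAndMergeNet(nn.Module):
    """Per-group 'split' reducers feeding a shared 'merge' prediction head."""
    def __init__(self, group_sizes, reduced_dim=2, hidden=32, out_dim=1):
        super().__init__()
        self.group_sizes = list(group_sizes)
        # One small network per predictor group, each emitting a few features.
        self.splits = nn.ModuleList(
            nn.Sequential(nn.Linear(g, hidden), nn.ReLU(),
                          nn.Linear(hidden, reduced_dim))
            for g in self.group_sizes
        )
        merged = reduced_dim * len(self.group_sizes)
        self.head = nn.Sequential(nn.Linear(merged, hidden), nn.ReLU(),
                                  nn.Linear(hidden, out_dim))

    def forward(self, x):
        chunks = torch.split(x, self.group_sizes, dim=1)   # split predictors
        reduced = torch.cat([f(c) for f, c in zip(self.splits, chunks)], dim=1)
        return self.head(reduced)                          # learn on reduced data

net = SplitAndMergeNet(group_sizes=[50, 50, 28])   # 128 predictors in 3 groups
y_hat = net(torch.randn(16, 128))                  # forward pass on a toy batch
```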
2. Application of Influence Function in Sufficient Dimension Reduction Models. Shrestha, Prabha. 28 September 2020.
No description available.
3. On Analysis of Sufficient Dimension Reduction Models. An, Panduan. 4 June 2019.
No description available.
4. Advances on Dimension Reduction for Univariate and Multivariate Time Series. Mahappu Kankanamge, Tharindu Priyan De Alwis. 1 August 2022.
Advances in modern technologies have led to an abundance of high-dimensional time series data in many fields, including finance, economics, health, engineering, and meteorology, among others. This causes the "curse of dimensionality" problem in both univariate and multivariate time series data. The main objective of time series analysis is to make inferences about the conditional distributions. There are some methods in the literature to estimate the conditional mean and conditional variance functions in time series, but most of them are inefficient, computationally intensive, or suffer from overparameterization. We propose dimension reduction techniques to address the curse of dimensionality in high-dimensional time series data.

For high-dimensional matrix-valued time series data, there are a limited number of methods in the literature that can preserve the matrix structure and reduce the number of parameters significantly (Samadi, 2014; Chen et al., 2021). However, those models cannot distinguish between relevant and irrelevant information and still suffer from overparameterization. We propose a novel dimension reduction technique for matrix-variate time series data, the "envelope matrix autoregressive model" (EMAR), which offers substantial dimension reduction and links the mean function and the covariance matrix of the model through the minimal reducing subspace of the covariance matrix. The proposed model can identify and remove irrelevant information and can achieve substantial efficiency gains by significantly reducing the total number of parameters. We derive the asymptotic properties of the proposed maximum likelihood estimators of the EMAR model. Extensive simulation studies and a real data analysis are conducted to corroborate our theoretical results and to illustrate the finite-sample performance of the proposed EMAR model.

For univariate time series, we propose sufficient dimension reduction (SDR) methods based on integral transformation approaches that preserve sufficient information about the response. In particular, we use the Fourier and convolution transformation methods (FM and CM) to perform sufficient dimension reduction in univariate time series and to estimate the time series central subspace (TS-CS), the time series central mean subspace (TS-CMS), and the time series central variance subspace (TS-CVS). Using the FM and CM procedures and under some distributional assumptions, we derive candidate matrices that can fully recover the TS-CS, TS-CMS, and TS-CVS, and we propose explicit estimates of these candidate matrices. The asymptotic properties of the proposed estimators are established under both normality and non-normality assumptions. Moreover, we develop data-driven methods to estimate the dimension of the time series central subspaces as well as the lag order. Our simulation results and real data analyses reveal that the proposed methods are not only significantly more efficient and accurate but also offer substantial computational savings compared to existing methods. We also develop an R package, "sdrt", implementing the FM and CM procedures for estimating sufficient dimension reduction subspaces in univariate time series.
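To make the time series central subspace concrete, the sketch below forms lag vectors x_t = (y_{t-1}, ..., y_{t-p}) as predictors and estimates a basis of the subspace. The estimator shown is classical sliced inverse regression (Li, 1991) applied to the lag vectors, an illustrative stand-in for the FM and CM candidate-matrix estimators; the lag order and slice count are assumed.

```python
import numpy as np

def sir_directions(x, y, n_slices=5, n_dirs=1):
    """Classical SIR: slice on y, average standardized x within slices."""
    n, p = x.shape
    xc = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    root_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    z = xc @ root_inv
    m = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        mu = z[idx].mean(axis=0)
        m += (len(idx) / n) * np.outer(mu, mu)            # weighted slice means
    _, vecs = np.linalg.eigh(m)
    return root_inv @ vecs[:, -n_dirs:]                   # basis, original scale

# Toy nonlinear series; the reduction should load mainly on lags 1 and 2.
rng = np.random.default_rng(0)
e = rng.standard_normal(600)
y = np.zeros(600)
for t in range(2, 600):
    y[t] = np.sin(y[t - 1]) + 0.3 * np.cos(y[t - 2]) + 0.5 * e[t]
p = 5
x = np.column_stack([y[p - k - 1:600 - k - 1] for k in range(p)])  # lags 1..p
beta = sir_directions(x, y[p:], n_slices=8)
```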
5. On Sufficient Dimension Reduction via Asymmetric Least Squares. Soale, Abdul-Nasah (ORCID 0000-0003-2093-7645). January 2021.
Accompanying the advances in computer technology is an increasing collection of high-dimensional data in many scientific and social studies. Sufficient dimension reduction (SDR) is a statistical method that enables us to reduce the dimension of predictors without loss of regression information. In this dissertation, we introduce principal asymmetric least squares (PALS) as a unified framework for linear and nonlinear sufficient dimension reduction. Classical methods such as sliced inverse regression (Li, 1991) and principal support vector machines (Li, Artemiou and Li, 2011) often do not perform well in the presence of heteroscedastic error, while our proposal addresses this limitation by synthesizing different expectile levels. Through extensive numerical studies, we demonstrate the superior performance of PALS in terms of both computation time and estimation accuracy. For the asymptotic analysis of PALS for linear sufficient dimension reduction, we develop new tools to compute the derivative of an expectation of a non-Lipschitz function.
PALS is not designed to handle a symmetric link function between the response and the predictors. As a remedy, we develop expectile-assisted inverse regression estimation (EA-IRE) as a unified framework for moment-based inverse regression. We propose to first estimate the expectiles through kernel expectile regression, and then carry out dimension reduction based on random projections of the regression expectiles. Several popular inverse regression methods in the literature, including sliced inverse regression, sliced average variance estimation, and directional regression, are extended under this general framework. The proposed expectile-assisted methods outperform existing moment-based dimension reduction methods in both numerical studies and an analysis of the Big Mac data. / Statistics
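The asymmetric least squares building block behind PALS and EA-IRE is expectile regression: squared residuals are weighted by tau above the fit and by 1 - tau below it. A minimal sketch via iteratively reweighted least squares follows; the expectile level, tolerance, and toy heteroscedastic model are illustrative, and this is the loss that PALS synthesizes across levels rather than the full PALS procedure.

```python
import numpy as np

def expectile_regression(x, y, tau=0.75, n_iter=100, tol=1e-8):
    """Fit the tau-th expectile of y given x by reweighted least squares."""
    n = len(y)
    xd = np.column_stack([np.ones(n), x])                 # add intercept
    beta = np.linalg.lstsq(xd, y, rcond=None)[0]          # OLS start
    for _ in range(n_iter):
        resid = y - xd @ beta
        w = np.where(resid >= 0, tau, 1 - tau)            # asymmetric weights
        wx = xd * w[:, None]
        beta_new = np.linalg.solve(xd.T @ wx, wx.T @ y)   # weighted LS step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Heteroscedastic toy model: upper and lower expectile slopes differ in x1.
rng = np.random.default_rng(1)
x = rng.standard_normal((500, 3))
y = x @ np.array([1.0, -2.0, 0.5]) \
    + (1 + 0.5 * np.abs(x[:, 0])) * rng.standard_normal(500)
b_hi = expectile_regression(x, y, tau=0.75)
b_lo = expectile_regression(x, y, tau=0.25)
```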
6. Analysis of Sparse Sufficient Dimension Reduction Models. Withanage, Yeshan. 16 September 2022.
No description available.
7. Bayesian Model Averaging Sufficient Dimension Reduction. Power, Michael Declan. January 2020.
In sufficient dimension reduction (Li, 1991; Cook, 1998b), the original predictors are replaced by their low-dimensional linear combinations while preserving all of the conditional information of the response given the predictors. Sliced inverse regression (SIR; Li, 1991) and principal Hessian directions (PHD; Li, 1992) are two popular sufficient dimension reduction methods, and both SIR and PHD estimators involve all of the original predictor variables. To deal with cases in which the linear combinations involve only a subset of the original predictors, we propose a Bayesian model averaging (Raftery et al., 1997) approach to achieve sparse sufficient dimension reduction. We extend both SIR and PHD under the Bayesian framework. The superior performance of the proposed methods is demonstrated through extensive numerical studies as well as a real data analysis. / Statistics
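For reference, the PHD ingredient that the Bayesian model averaging approach extends can be sketched in a few lines: the candidate matrix E[(y - Ey)(x - Ex)(x - Ex)^T] is estimated, and its generalized eigenvectors with respect to the predictor covariance give the directions. This is a sketch of standard response-based PHD (Li, 1992), not the sparse Bayesian extension; the symmetric toy link is an assumed example where PHD is known to work well.

```python
import numpy as np

def phd_directions(x, y, n_dirs=1):
    """Directions from the PHD candidate matrix E[(y-Ey)(x-Ex)(x-Ex)^T]."""
    n = len(y)
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    m = (xc * yc[:, None]).T @ xc / n                     # candidate matrix
    sigma = np.cov(xc, rowvar=False)
    evals, evecs = np.linalg.eig(np.linalg.solve(sigma, m))
    idx = np.argsort(-np.abs(evals))                      # largest |eigenvalue| first
    return np.real(evecs[:, idx[:n_dirs]])

# Symmetric link where inverse-mean methods fail but PHD succeeds.
rng = np.random.default_rng(3)
x = rng.standard_normal((400, 6))
y = (x[:, 0] + x[:, 1]) ** 2 + 0.2 * rng.standard_normal(400)
b = phd_directions(x, y)      # roughly proportional to (1, 1, 0, 0, 0, 0)
```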
8. Sufficient Dimension Reduction with Missing Data. Xia, Qi. January 2017.
Existing sufficient dimension reduction (SDR) methods typically consider cases with no missing data. This dissertation proposes methods that extend SDR to settings where the response can be missing. The first part focuses on the seminal sliced inverse regression (SIR) approach proposed by Li (1991). We show that missing responses generally affect the validity of the inverse regressions under the missing-at-random mechanism. We then propose a simple and effective adjustment with inverse probability weighting that guarantees the validity of SIR, and we introduce a marginal coordinate test for this adjusted estimator. The proposed method shares the simplicity of SIR and requires the linear conditional mean assumption. The second part proposes two new estimating equation procedures: the complete-case estimating equation approach and the inverse probability weighted estimating equation approach. The two approaches are applied to a family of dimension reduction methods that includes ordinary least squares, principal Hessian directions, and SIR. By solving the estimating equations, the two approaches avoid the common assumptions in the SDR literature, namely the linear conditional mean assumption and the constant conditional variance assumption. For all the aforementioned methods, the asymptotic properties are established, and their superb finite-sample performance is demonstrated through extensive numerical studies as well as a real data analysis. In addition, existing estimators of the central mean space have uneven performance across different types of link functions. To address this limitation, a new hybrid SDR estimator is proposed that successfully recovers the central mean space for a wide range of link functions. Based on the new hybrid estimator, we further study the order determination procedure and the marginal coordinate test. The superior performance of the hybrid estimator over existing methods is demonstrated in simulation studies. The proposed procedures for responses missing at random can be readily adapted to this hybrid method. / Statistics
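A sketch of the inverse-probability-weighting adjustment described above: under missing at random, the probability that the response is observed is modeled on the fully observed predictors, and the complete cases are reweighted by its inverse inside the SIR slice means. The logistic propensity model and the clipping threshold are assumed choices for illustration, not the dissertation's exact estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_sir(x, y, observed, n_slices=5, n_dirs=1):
    """SIR with complete cases reweighted by inverse observation probabilities.

    observed: boolean mask, True where the response y is seen.
    """
    # Under MAR, P(response observed | x) depends only on the observed x.
    pi_hat = LogisticRegression().fit(x, observed.astype(int)).predict_proba(x)[:, 1]
    w = 1.0 / np.clip(pi_hat[observed], 1e-3, None)       # complete-case weights
    xo, yo = x[observed], y[observed]
    xc = xo - np.average(xo, axis=0, weights=w)
    sigma = (xc * w[:, None]).T @ xc / w.sum()            # weighted covariance
    evals, evecs = np.linalg.eigh(sigma)
    root_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = xc @ root_inv
    m = np.zeros((x.shape[1], x.shape[1]))
    for idx in np.array_split(np.argsort(yo), n_slices):  # weighted slice means
        mu = np.average(z[idx], axis=0, weights=w[idx])
        m += (w[idx].sum() / w.sum()) * np.outer(mu, mu)
    _, vecs = np.linalg.eigh(m)
    return root_inv @ vecs[:, -n_dirs:]
```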
9. Sufficient Dimension Reduction in Complex Datasets. Yang, Chaozheng. January 2016.
This dissertation focuses on two problems in dimension reduction. The first uses a permutation approach to test predictor contributions. The permutation approach applies to marginal coordinate tests based on dimension reduction methods such as SIR, SAVE, and DR, and it no longer requires calculation of the method-specific weights that determine the asymptotic null distribution. The second combines a clustering method with robust regression (least absolute deviation) to estimate the dimension reduction subspace. Compared with ordinary least squares, the proposed method is more robust to outliers; it also replaces the global linearity assumption with the more flexible local linearity assumption through k-means clustering. / Statistics
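The permutation logic can be sketched generically: the marginal coordinate statistic for predictor j is recomputed after repeatedly permuting column j, and the p-value is the proportion of permuted statistics at least as extreme as the observed one, sidestepping the method-specific asymptotic null. The statistic below (the norm of the j-th row of an estimated SDR basis) is a hypothetical stand-in for the dissertation's method-specific statistics.

```python
import numpy as np

def permutation_pvalue(x, y, j, estimate_basis, n_perm=200, seed=0):
    """p-value for H0: predictor j contributes nothing to the reduction."""
    rng = np.random.default_rng(seed)
    stat = np.linalg.norm(estimate_basis(x, y)[j])        # observed contribution
    exceed = 0
    for _ in range(n_perm):
        xp = x.copy()
        xp[:, j] = rng.permutation(xp[:, j])              # break j's link to y
        exceed += np.linalg.norm(estimate_basis(xp, y)[j]) >= stat
    return (1 + exceed) / (1 + n_perm)

# Usage with any basis estimator returning a (p, d) array, e.g. a SIR routine:
# pval = permutation_pvalue(x, y, j=0, estimate_basis=sir_directions)
```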
10. INFORMATIONAL INDEX AND ITS APPLICATIONS IN HIGH DIMENSIONAL DATA. Yuan, Qingcong. 01 January 2017.
We introduce a new class of measures for testing independence between two random vectors, based on the expected difference of conditional and marginal characteristic functions. By choosing a particular weight function in the class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed; their properties, asymptotics, connections with existing measures, and applications are discussed. Implementation and Monte Carlo results are also presented.
We propose a two-stage sufficient variable selection method based on the new index to deal with large-p-small-n data. The method does not require model specification and focuses especially on categorical responses. Our approach improves on typical screening approaches that use only the marginal relation. Numerical studies are provided to demonstrate the advantages of the method.
We introduce a novel approach to sufficient dimension reduction problems using the new measure. The proposed method requires very mild conditions on the predictors, estimates the central subspace effectively, and is especially useful when the response is categorical. It retains the model-free advantage without estimating the link function. Under regularity conditions, root-n consistency and asymptotic normality are established. Simulation results show that the proposed method is very competitive and robust compared to existing dimension reduction methods.
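Empirical indices in this characteristic-function family are straightforward to compute from pairwise distances. As an illustrative relative (not the dissertation's exact index), the sketch below implements distance covariance (Székely et al., 2007), which arises from one particular weight function applied to the difference between joint and marginal characteristic functions.

```python
import numpy as np

def distance_covariance(x, y):
    """Sample distance covariance from double-centered distance matrices."""
    x = np.atleast_2d(x.T).T                              # coerce to (n, p)
    y = np.atleast_2d(y.T).T
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    # Double centering: remove row and column means, add back the grand mean.
    ac = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    bc = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return np.sqrt(max((ac * bc).mean(), 0.0))

rng = np.random.default_rng(2)
u = rng.standard_normal(300)
print(distance_covariance(u, u ** 2))                     # nonlinear dependence: large
print(distance_covariance(u, rng.standard_normal(300)))   # independence: near zero
```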