101

Dimension Reduction and Variable Selection

Moradi Rekabdarkolaee, Hossein 01 January 2016 (has links)
High-dimensional data are becoming increasingly available as data collection technology advances. Over the last decade, significant developments have taken place in high-dimensional data analysis, driven primarily by a wide range of applications in fields such as genomics, signal processing, and environmental studies. Statistical techniques such as dimension reduction and variable selection play important roles in high-dimensional data analysis. Sufficient dimension reduction provides a way to find a reduced subspace of the original predictor space without assuming a parametric model, and in recent years it has been widely applied in scientific fields such as genetics, brain imaging analysis, econometrics, and environmental science. In this dissertation, we worked on three projects. The first combines local modal regression and Minimum Average Variance Estimation (MAVE) to introduce a robust dimension reduction approach. In addition to being robust to outliers and heavy-tailed error distributions, our proposed method has the same convergence rate as the original MAVE. Furthermore, we combine local-modal-based MAVE with an $L_1$ penalty to select informative covariates in a regression setting. This new approach can exhaustively estimate directions in the regression mean function and select informative covariates simultaneously, while being robust to possible outliers in the dependent variable. The second project develops sparse adaptive MAVE (saMAVE). SaMAVE has advantages over adaptive LASSO because it extends adaptive LASSO to multi-dimensional and nonlinear settings without any model assumption, and it has advantages over sparse inverse dimension reduction methods in that it does not require any particular probability distribution on X. In addition, saMAVE can exhaustively estimate the dimensions in the conditional mean function. The third project extends the envelope method to multivariate spatial data. The envelope technique is a recent refinement of the classical multivariate linear model, and the envelope estimator asymptotically has less variation compared to the maximum likelihood estimator (MLE). The current envelope methodology assumes independent observations; while this assumption is convenient, it does not address the additional complications associated with spatial correlation. This work extends the envelope method to cases where independence is an unreasonable assumption, specifically multivariate data from spatially correlated processes. This novel approach provides estimates of the parameters of interest with smaller variance than the MLE while still capturing the spatial structure in the data.
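
Both MAVE-based projects refine the same estimating objective. As a point of reference, a standard statement of the MAVE criterion (notation assumed here, not copied from the dissertation; B is the d × p matrix of directions and K_h a kernel) is:

```latex
% MAVE objective (Xia et al., 2002): alternate between the local linear
% coefficients (a_j, b_j) and the d x p direction matrix B.
\min_{B:\, B B^{\top} = I_d} \;\min_{\{a_j,\, b_j\}}\;
  \sum_{j=1}^{n} \sum_{i=1}^{n}
  \bigl[ y_i - a_j - b_j^{\top} B (x_i - x_j) \bigr]^2 w_{ij},
\qquad
w_{ij} \propto K_h\!\bigl( B (x_i - x_j) \bigr).
```

The first project replaces the squared loss with a local modal (mode-based) loss for robustness; the sparse variants add an $L_1$-type penalty on the coefficients of B so that uninformative predictors are zeroed out.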
102

Filtered Historical Simulation Value at Risk for Options : A Dimension Reduction Approach to Model the Volatility Surface Shifts

Gunnarsson, Fredrik January 2019 (has links)
No description available.
103

Prédiction des séries temporelles larges / Prediction of large time series

Hmamouche, Youssef 13 December 2018 (has links)
Modern systems are expected to store and process massive time series. As the number of observed variables increases very rapidly, their prediction becomes more and more complicated, and using all of the variables poses problems for classical prediction models. Prediction models without external factors were among the first prediction models; to improve prediction accuracy, the use of multiple variables has become common, and models that take external factors into account, i.e., multivariate models, are increasingly used because they consider more information. As the amount of interrelated data grows, however, the application of multivariate models itself becomes questionable, because using all of the available information does not necessarily lead to the best predictions. The challenge in this situation is to find the most relevant factors, relative to a target variable, among all of the available data. In this thesis, we study this problem by presenting a detailed analysis of the approaches proposed in the literature. We address the problem of dimension reduction and prediction for massive data, and we also discuss these approaches in the context of Big Data. We then present a complete methodology for the prediction of large time series; the proposed approaches show promising and very competitive results compared with well-known algorithms and improve the accuracy of the predictions on the data used. Finally, we extend this methodology to very large data via distributed computing and parallelism, with an implementation of the proposed prediction process in the Hadoop/Spark environment.
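
As a generic illustration of the reduce-then-predict idea described above, the sketch below ranks candidate series by correlation with the target, keeps the top k, and fits a lagged least-squares model on the reduced panel. The correlation criterion and all names are illustrative assumptions, not the thesis's selection measure:

```python
import numpy as np

def select_and_forecast(series, target_idx, k=5, lag=3):
    """Keep the k series most correlated with the target, then fit a
    lagged linear model on the reduced panel (illustrative only)."""
    n_obs, n_vars = series.shape
    target = series[:, target_idx]
    # Score every candidate series by absolute correlation with the target.
    scores = np.abs([np.corrcoef(series[:, j], target)[0, 1]
                     for j in range(n_vars)])
    keep = np.argsort(scores)[::-1][:k]              # top-k most relevant
    panel = series[:, keep]
    # Lagged design: predict target[t] from the panel's previous `lag` rows.
    rows = [panel[t - lag:t].ravel() for t in range(lag, n_obs)]
    X = np.column_stack([np.ones(n_obs - lag), np.array(rows)])
    y = target[lag:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return keep, beta

rng = np.random.default_rng(0)
panel = rng.standard_normal((200, 50)).cumsum(axis=0)   # 50 random walks
keep, beta = select_and_forecast(panel, target_idx=0)
print(keep)   # indices of the retained series (always includes the target)
```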
104

Parcellisation du manteau cortical à partir du réseau de connectivité anatomique cartographié par imagerie de diffusion / Connectivity-based parcellation of the human cortex

Roca, Pauline 03 November 2011 (has links)
The parcellation of the human brain into functional areas is a complex but central problem for understanding how the brain works, and it could have important medical applications, for example in neurosurgery, where it would help identify the functional zones to be preserved. This objective goes hand in hand with the construction of the human brain connectome, which is simply the network of the brain's connections. Defining such a network requires defining its basic elements, that is, a subdivision of the brain into regions. There are many ways and criteria to identify these regions, and to date there is no universal parcellation of the cortex. In this thesis, we study the possibility of performing this parcellation from anatomical connectivity data obtained by diffusion magnetic resonance imaging, an acquisition technique that reconstructs cerebral fiber bundles non-invasively. We work in a surface-based framework, studying only the cortical surface and the underlying anatomical connections. In this context, we present a set of new tools for building, visualizing, and simulating the human brain connectome for a group of subjects, in a surface-based framework and from MRI-reconstructed anatomical connectivity data. From these tools, we present dimension reduction methods for the connectivity data, which we apply to parcellate the entire cortex of several subjects. We also propose a new way to decompose the connectivity data at the group level that takes inter-individual variability into account; this method is tested and compared with other methods on simulated and real data. The stakes of this work are numerous, both methodological (comparing different tractography algorithms, for example) and clinical (studying the link between altered connections and pathology).
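
As a generic illustration of the pipeline this thesis works within, connectivity-based parcellation can be sketched as dimension reduction of per-vertex connectivity profiles followed by clustering. Everything below (shapes, counts, and the PCA/k-means choices) is an illustrative assumption, not the thesis's group-level decomposition:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical data: each cortical vertex has a connectivity profile,
# i.e. a row counting streamlines reaching each of n_targets regions.
rng = np.random.default_rng(1)
n_vertices, n_targets = 5000, 300
profiles = rng.poisson(2.0, size=(n_vertices, n_targets)).astype(float)

# Normalize so parcels reflect connection patterns, not raw counts.
profiles /= profiles.sum(axis=1, keepdims=True)

# Reduce the 300-dimensional profiles before clustering.
reduced = PCA(n_components=20).fit_transform(profiles)

# Group vertices with similar reduced profiles into parcels.
labels = KMeans(n_clusters=50, n_init=10, random_state=0).fit_predict(reduced)
print(np.bincount(labels)[:10])   # sizes of the first few parcels
```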
105

Réduction de la dimension en régression / Dimension reduction in regression

Portier, François 02 July 2013 (has links)
In this thesis, we study the problem of dimension reduction through the regression model Y = g(BX, e), where X is a p-dimensional vector, Y is real-valued, the function g is unknown, and the noise e is independent of X. We are interested in the estimation of the d × p matrix B, with d smaller than p, whose knowledge provides good convergence rates for the estimation of g. This problem is treated using two distinct approaches. The first, called inverse regression, requires the linearity condition on X. The second, called semiparametric, does not require such a condition but only that X has a smooth density. Within the inverse regression framework, we study two families of methods based on E[X f(Y)] and E[XX^T f(Y)], respectively. For each family, we derive the conditions on f that allow an exhaustive estimation of B, and we compute the optimal function f by minimizing the asymptotic variance. Within the semiparametric framework, we propose a method for estimating the gradient of the regression function. Under classical semiparametric assumptions, we show the root-n consistency and asymptotic normality of our estimator and the exhaustiveness of the estimation of B. Whichever approach is considered, a fundamental question arises: how should the dimension of B be chosen? To this end, we propose a method for estimating the rank of a matrix by bootstrap hypothesis testing.
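
The simplest member of the E[X f(Y)] family is sliced inverse regression (SIR), where f is an indicator of slices of Y. A minimal sketch of textbook SIR (Li, 1991), not the thesis's optimal-f estimator:

```python
import numpy as np

def sir_directions(X, y, n_slices=10, d=1):
    """Estimate d directions of B via slice means of standardized X."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma = np.cov(Xc, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(sigma))
    Z = Xc @ L                                   # whitened: cov(Z) = I
    # Slice on the order statistics of y and average Z within each slice.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)     # weighted slice means
    # Leading eigenvectors of M span the (standardized) reduced subspace.
    vals, vecs = np.linalg.eigh(M)
    return L @ vecs[:, ::-1][:, :d]              # back to the X scale

rng = np.random.default_rng(2)
X = rng.standard_normal((2000, 5))
b = np.array([1., 2., 0., 0., 0.])
y = (X @ b) ** 3 + 0.1 * rng.standard_normal(2000)
print(sir_directions(X, y).ravel())   # proportional to b
```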
106

TRANSFORMS IN SUFFICIENT DIMENSION REDUCTION AND THEIR APPLICATIONS IN HIGH DIMENSIONAL DATA

Weng, Jiaying 01 January 2019 (has links)
The big data era poses great challenges, as well as opportunities, for researchers to develop efficient statistical approaches to analyzing massive data. Sufficient dimension reduction is one such important tool in modern data analysis and has received extensive attention in both academia and industry. In this dissertation, we introduce inverse regression estimators using Fourier transforms, which are superior to existing SDR methods in two respects: (1) they avoid slicing the response variable, and (2) they can be readily extended to high-dimensional data problems. For the ultra-high-dimensional problem, we investigate both eigenvalue decomposition and minimum discrepancy approaches to achieve optimal solutions, and we develop a novel and efficient optimization algorithm to obtain the sparse estimates. We derive asymptotic properties of the proposed estimators and demonstrate their efficiency gains compared to traditional estimators. The oracle properties of the sparse estimates are also derived. Simulation studies and real data examples are used to illustrate the effectiveness of the proposed methods. The wavelet transform is another tool that effectively captures information localized in time and frequency. Parallel to the proposed Fourier transform methods, we also develop a wavelet transform version of the approach and derive the asymptotic properties of the resulting estimators.
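
The slicing-free device can be sketched by replacing slice indicators with the Fourier pair (cos(tY), sin(tY)) in the inverse regression kernel. The sketch below illustrates the general idea (cf. Zhu and Zeng, 2006) rather than the dissertation's exact estimators; the frequency grid is an assumption:

```python
import numpy as np

def fourier_directions(X, y, ts=np.linspace(0.2, 2.0, 10), d=1):
    """Accumulate candidate vectors Sigma^{-1} E[(X - mu) f_t(Y)] over a
    grid of frequencies t; their principal directions estimate B."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma_inv = np.linalg.inv(np.cov(Xc, rowvar=False))
    M = np.zeros((p, p))
    for t in ts:
        for f in (np.cos(t * y), np.sin(t * y)):
            fc = f - f.mean()
            u = sigma_inv @ (Xc.T @ fc) / n   # lies in span(B) in theory
            M += np.outer(u, u)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :d]               # top-d eigenvectors

rng = np.random.default_rng(3)
X = rng.standard_normal((2000, 6))
b = np.array([1., -1., 0., 0., 0., 0.]) / np.sqrt(2)
y = X @ b + 0.1 * rng.standard_normal(2000)
print(fourier_directions(X, y).ravel())   # ~ b, no slicing required
```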
107

A NEW INDEPENDENCE MEASURE AND ITS APPLICATIONS IN HIGH DIMENSIONAL DATA ANALYSIS

Ke, Chenlu 01 January 2019 (has links)
This dissertation comprises three connected topics. First, we propose a novel class of independence measures for testing independence between two random vectors, based on the discrepancy between the conditional and the marginal characteristic functions. If one of the variables is categorical, our asymmetric index extends the classical ANOVA to a kernel ANOVA that can test a more general hypothesis of equal distributions among groups; the index is also applicable when both variables are continuous. Second, we develop a sufficient variable selection procedure based on the new measure in a large-p, small-n setting. Our approach incorporates marginal information between each predictor and the response as well as joint information among predictors; as a result, it is more capable of selecting all truly active variables than marginal selection methods. Furthermore, our procedure can handle both continuous and discrete responses with mixed-type predictors. We establish the sure screening property of the proposed approach under mild conditions. Third, we focus on a model-free sufficient dimension reduction approach using the new measure. Our method does not require strong assumptions on predictors or responses. An algorithm is developed to find the dimension reduction directions using sequential quadratic programming. We illustrate the advantages of the new measure and its two applications in high-dimensional data analysis through numerical studies across a variety of settings.
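
For comparison, the best-known characteristic-function-based independence measure is distance covariance (Székely et al., 2007). The sketch below implements that classical index for scalar variables as a point of reference only; it is not the dissertation's new measure:

```python
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance: (1/n^2) sum_{jk} A_jk B_jk,
    where A and B are double-centered pairwise distance matrices."""
    def centered_dists(v):
        D = np.abs(v[:, None] - v[None, :])          # pairwise distances
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = centered_dists(x), centered_dists(y)
    return (A * B).mean()

rng = np.random.default_rng(4)
x = rng.standard_normal(500)
# Detects the nonlinear dependence x -> x^2 (Pearson correlation ~ 0),
# while an independent pair scores near zero.
print(dcov2(x, x ** 2), dcov2(x, rng.standard_normal(500)))
```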
108

SIR、SAVE、SIR-II、pHd等四種維度縮減方法之比較探討 / A comparative study of four dimension reduction methods: SIR, SAVE, SIR-II, and pHd

方悟原, Fang, Wu-Yuan Unknown Date (has links)
The focus of this study is dimension reduction and an overview of four methods frequently cited in the literature: SIR, SAVE, SIR-II, and pHd. The definitions of dimension reduction proposed by Li (1991) (y = g(x, ε) = g1(βx, ε)) and by Cook (1994) (the conditional density satisfies f(y | x) = f(y | βx)) are briefly reviewed, along with issues concerning the minimum dimension reduction subspace (Cook (1994)). In addition, we propose a possible alternative definition (E(y | x) = E(y | βx), i.e., the conditional expectation of y is unchanged between the original and the reduced subspace), which seems more appropriate when pHd is concerned; we find that the subspace induced by this definition is contained in the subspace defined following Cook (1994). We then take a closer look at the basic ideas behind the four methods, supplementing explanations and proofs where necessary. Equivalent conditions that each method can use to locate the "right" directions are presented, and two models (y = bx + ε and y = |z| + ε) are used to demonstrate the methods and to test whether each one recovers the correct directions. To further understand the possible relationships among the four methods, the equivalent conditions are compared against one another. We learn that when the explanatory variable x is multivariate normal, none of the four methods will, in theory, preserve directions that are redundant, although directions that should be preserved are not always preserved. Overall, SAVE preserves at least as many of the "right" directions as any one of the other three methods used alone, and applying SIR together with SIR-II is exactly equivalent to using SAVE. We also find that the prerequisite "E(y | x) is twice differentiable" does not seem to be necessary when pHd is applied.
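
Of the four methods, SAVE illustrates the comparison well: by using second moments within slices, it preserves the direction in the symmetric model y = |z| + ε, which SIR's slice means miss. A minimal sketch of the textbook estimator:

```python
import numpy as np

def save_directions(X, y, n_slices=10, d=1):
    """SAVE: accumulate E[(I - cov(Z | slice))^2] on whitened predictors."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.linalg.inv(np.cov(Xc, rowvar=False)))
    Z = Xc @ L                                    # whitened: cov(Z) = I
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        R = np.eye(p) - np.cov(Z[idx], rowvar=False)
        M += (len(idx) / n) * R @ R               # weighted slice terms
    vals, vecs = np.linalg.eigh(M)
    return L @ vecs[:, ::-1][:, :d]               # back to the X scale

rng = np.random.default_rng(5)
X0 = rng.standard_normal((4000, 4))
y = np.abs(X0[:, 0]) + 0.1 * rng.standard_normal(4000)  # y = |z| + eps
print(save_directions(X0, y).ravel())   # ~ first coordinate direction
```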
109

Simultaneous control of coupled actuators using singular value decomposition and semi-nonnegative matrix factorization

Winck, Ryder Christian 08 November 2012 (has links)
This thesis considers the application of singular value decomposition (SVD) and semi-nonnegative matrix factorization (SNMF) within feedback control systems, called the SVD System and SNMF System, to control numerous subsystems with a reduced number of control inputs. The subsystems are coupled using a row-column structure to allow mn subsystems to be controlled using m+n inputs. Past techniques for controlling systems in this row-column structure have focused on scheduling procedures that offer limited performance. The SVD and SNMF Systems permit simultaneous control of every subsystem, which increases the convergence rate by an order of magnitude compared with previous methods. In addition to closed loop control, open loop procedures using the SVD and SNMF are compared with previous scheduling procedures, demonstrating significant performance improvements. This thesis presents theoretical results for the controllability of systems using the row-column structure and for the stability and performance of the SVD and SNMF Systems. Practical challenges to the implementation of the SVD and SNMF Systems are also examined. Numerous simulation examples are provided, in particular, a dynamic simulation of a pin array device, called Digital Clay, and two physical demonstrations are used to assess the feasibility of the SVD and SNMF Systems for specific applications.
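
The row-column coupling means cell (i, j) receives the product of row input i and column input j, so the command realizable in one step over an m × n array is a rank-one matrix, and the best such command in the least-squares sense comes from the leading SVD triple of the desired command matrix. A toy open-loop illustration of this idea (not the thesis's closed-loop SVD System):

```python
import numpy as np

rng = np.random.default_rng(6)
D = rng.random((8, 12))                 # desired commands for 96 cells
U, s, Vt = np.linalg.svd(D)
r = np.sqrt(s[0]) * U[:, 0]             # m row inputs
c = np.sqrt(s[0]) * Vt[0]               # n column inputs
approx = np.outer(r, c)                 # what the coupled array realizes
err = np.linalg.norm(D - approx) / np.linalg.norm(D)
print(f"relative error after one rank-1 command: {err:.3f}")
# Applying further rank-1 terms over successive steps drives the error
# down, which is why simultaneous SVD-based commands beat scheduling
# individual rows or columns one at a time.
```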
110

Theoretical Results and Applications Related to Dimension Reduction

Chen, Jie 01 November 2007 (has links)
To overcome the curse of dimensionality, dimension reduction is important and necessary for understanding the underlying phenomena in a variety of fields. Dimension reduction is the transformation of high-dimensional data into a meaningful representation in a low-dimensional space; it can be further classified into feature selection and feature extraction. This thesis is composed of four projects: the first two focus on feature selection, and the last two concentrate on feature extraction. The content of the thesis is as follows. The first project presents several efficient methods for the sparse representation of a multiple measurement vector (MMV); some theoretical properties of the algorithms are also discussed. The second project studies the NP-hardness of computing penalized likelihood estimators, including penalized least squares estimators, penalized least absolute deviation regression, and penalized support vector machines. The third project focuses on the application of manifold learning in the analysis and prediction of 24-hour electricity price curves. The last project proposes a new Hessian-regularized nonlinear time-series model for prediction in time series.
