Global ETD Search

11	SIR、SAVE、SIR-II、pHd等四種維度縮減方法之比較探討 / A comparative study of four dimension reduction methods: SIR, SAVE, SIR-II, and pHd 方悟原, Fang, Wu-Yuan Unknown Date This thesis takes dimension reduction as its subject and reviews its definition together with four methods frequently discussed in the literature: SIR, SAVE, SIR-II, and pHd. The definitions of dimension reduction proposed by Li (1991), y = g(x, ε) = g1(βx, ε), and by Cook (1994), f(y | x) = f(y | βx), are reviewed, and Cook's (1994) discussion of the minimum dimension reduction subspace is introduced. We also propose an alternative definition that seems more appropriate for pHd, E(y | x) = E(y | βx), i.e. the conditional expectation of y is the same before and after the reduction, and find that the subspace induced by this definition is contained in the subspace defined by Cook (1994). We then restate the theoretical framework of the four methods, supplementing explanations and proofs where necessary. Two models (y = bx + ε and y = |z| + ε) are used to test whether each method recovers the correct directions. Equivalent conditions for each of the four methods are derived and compared with one another to establish the relationships among the methods. We find that when x is multivariate normal, none of the four methods, in theory, retains a direction that can be reduced away; directions that should be retained, however, are not guaranteed to be retained. SAVE retains at least as many of the right directions as any one of the other three methods used alone, and using SIR together with SIR-II is exactly equivalent to using SAVE. We also find that the prerequisite "E(y | x) is twice differentiable" does not seem to be necessary when pHd is applied. dimension reduction subspace pHd principal Hessian directions SIR sliced inverse regression SAVE sliced average variance estimate SIR-II
12	Stochastic modelling using large data sets : applications in ecology and genetics / Modélisation stochastique de grands jeux de données : applications en écologie et en génétique Coudret, Raphaël 16 September 2013 This thesis has two main parts. The first concerns valvometry, which is here the study of the distance between the two parts of an oyster's shell over time. Valvometry is used to determine whether such animals are healthy, and possibly to draw conclusions about the quality of their environment. We consider that a renewal process with four states underlies the behaviour of the studied oysters. To recover this hidden process from the valvometric signal, we assume that a probability density function linked with this signal is bimodal. We then compare several estimators that take this assumption into account, including kernel density estimators. In the second part, we compare several regression approaches for analysing transcriptomic data. To understand which explanatory variables affect gene expression, we apply a multiple testing procedure to these data through the linear model FAMT. The SIR method can be considered for finding nonlinear relations in such a context. It is, however, mainly used when the response variable is univariate, so a multivariate version of SIR was developed. Because acquiring transcriptomic data can be expensive, the sample size n of the corresponding datasets is often small. That is why we also studied SIR when n is less than the number of explanatory variables p. Kernel density estimator Multiple testing Renewal process Sliced inverse regression Transcriptomics Valvometry
13	Application of Influence Function in Sufficient Dimension Reduction Models Shrestha, Prabha 28 September 2020 No description available. Mathematics Statistics Sufficient Dimension Reduction Influence function central subspace central matrix inverse regression methods regression analysis
14	Multi-Stage Experimental Planning and Analysis for Forward-Inverse Regression Applied to Genetic Network Modeling Taslim, Cenny 05 September 2008 No description available. Bioinformatics Biostatistics Engineering Operations Research Statistics Optimal Design of Experiments D-Optimality Inverse Regression Transcriptional Networks Statistical Simulation System Identification Steady-State Forward-Inverse Modeling Bayesian regression
15	Contributions à la réduction de dimension / Contributions to dimension reduction Kuentz, Vanessa 20 November 2009 This thesis is devoted to dimension reduction, a central topic in statistics that seeks low-dimensional subspaces while minimizing the loss of information contained in the data. First, we focus on multivariate analysis methods for categorical variables. We address the rotation problem in Multiple Correspondence Analysis (MCA) and give the analytic expression of the optimal planar rotation angle for the chosen rotation criterion. When more than two principal components are retained, this planar solution is used in an algorithm that applies successive planar rotations to pairs of factors. We also propose several algorithms for clustering categorical variables, which aim to optimize a partitioning criterion based on correlation ratios. A real data set illustrates the practical benefits of rotation in MCA and provides an empirical comparison of the proposed clustering algorithms. We then consider a semiparametric regression model, more precisely the SIR (Sliced Inverse Regression) method. We develop an approach based on partitioning the predictor space, which can be used when the fundamental linearity condition on the predictor is violated. A second adaptation, using the bootstrap, is proposed to improve the estimation of the basis of the dimension reduction subspace. Asymptotic results are given, and a study on simulated data shows that the proposed approaches perform better. Finally, the applications and interdisciplinary collaborations carried out during the thesis are described. Multivariate statistics Categorical data Rotation Variable clustering Semiparametric regression Linearity condition Bootstrap
