61 |
Discovering patterns in high-dimensional extremes
Chiapino, Maël, 28 June 2018
We present and study unsupervised learning methods for multivariate extreme phenomena in high dimension. When each marginal of a random vector is heavy-tailed, its behavior in extreme regions (i.e., far from the origin) can no longer be studied with the usual methods, which assume finite means and variances. Multivariate extreme value theory provides a suitable framework for this study; in particular, it gives a theoretical basis for dimension reduction through the angular measure. The thesis is divided into two main parts:
- Reduce the dimension by finding a simplified summary of the dependence structure in extreme regions. This step aims to recover the subgroups of features that are likely to exceed large thresholds simultaneously.
- Model the angular measure with a mixture density that follows a predefined dependence structure.
Together, these steps make it possible to develop new clustering methods for extreme points in high dimension, through the construction of a similarity matrix for the extreme points.
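The first part, finding groups of features that tend to be large together, can be illustrated with a simple counting sketch in the spirit of the DAMEX-type estimators from the extremes literature. It is not the thesis's algorithm: the rank transform to a Pareto-like scale, the sup-norm threshold `radius`, and the tolerance `eps` are all illustrative assumptions.

```python
from collections import Counter


def rank_to_pareto(column):
    """Map a sample to an approximately standard Pareto scale via its ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) is an empirical CDF value in (0, 1); 1 / (1 - F) is Pareto-like
        out[i] = 1.0 / (1.0 - rank / (n + 1))
    return out


def extreme_subsets(data, radius, eps):
    """Count which feature subsets are jointly large among extreme observations.

    data:   list of observations, each a list of d floats.
    radius: hypothetical sup-norm threshold defining 'extreme' on the Pareto scale.
    eps:    hypothetical tolerance; feature j counts as large if v_j > eps * max(v).
    """
    d = len(data[0])
    cols = [rank_to_pareto([row[j] for row in data]) for j in range(d)]
    counts = Counter()
    for i in range(len(data)):
        v = [cols[j][i] for j in range(d)]
        norm = max(v)
        if norm > radius:  # keep only observations far from the origin
            subset = tuple(j for j in range(d) if v[j] > eps * norm)
            counts[subset] += 1
    return counts


# Toy data: features 0 and 1 share a common driver, feature 2 is unrelated.
data = [[i, i + 0.5, (i * 37) % 100] for i in range(100)]
print(dict(extreme_subsets(data, radius=20, eps=0.5)))  # {(0, 1): 5, (2,): 5}
```

The counts show that, among the extreme points, features 0 and 1 exceed the threshold together while feature 2 does so on its own; subsets with large counts are the kind of dependence summary the first step aims to recover.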
|
62 |
Random projection for high-dimensional optimization
Vu, Khac Ky, 05 July 2016
In the digitization age, data has become cheap and easy to obtain. This results in many new optimization problems of extremely large size: for the same kind of problem, the numbers of variables and constraints are huge. Moreover, in many application settings, such as machine learning, approximate but robust solutions are preferred over accurate ones. It is a real challenge for traditional algorithms, which work well on average-size problems, to deal with these new circumstances.
Instead of developing algorithms that scale up to solve these problems directly, one natural idea is to transform them into small problems that are strongly related to the originals. Since the new problems are of manageable size, they can still be solved efficiently by classical methods, and their solutions provide insight into the original problems. In this thesis, we exploit this idea to solve some high-dimensional optimization problems. In particular, we apply a technique called random projection to embed the problem data into low-dimensional spaces, and approximately reformulate the problem so that it becomes very easy to solve but still captures the most important information. Therefore, by solving the projected problem, we obtain either an approximate solution or an approximate objective value for the original problem. We apply random projection to a number of important optimization problems, including linear and integer programming (Chapter 3), convex optimization with linear constraints (Chapter 4), membership and approximate nearest neighbor (Chapter 5), and trust-region subproblems (Chapter 6).
In Chapter 3, we study optimization problems in their feasibility forms, in particular the so-called restricted linear membership problem. This class contains many important problems, such as linear and integer feasibility. We propose to apply a random projection T to the linear constraints, and we seek conditions on T under which the two feasibility problems are equivalent with high probability.
In Chapter 4, we continue to study this problem in the case where the restricted set is convex. Under that assumption, we can define a tangent cone at some point with minimal squared error. We establish the relations between the original and projected problems based on the concept of Gaussian width, which is popular in compressed sensing. In particular, we prove that the two problems are equivalent with high probability as long as the random projection is sampled from a sub-Gaussian ensemble with a large enough dimension (depending on the Gaussian width).
In Chapter 5, we study the Euclidean membership problem: "Given a vector b and a closed set X in Euclidean space, decide whether b is in X or not." This is a generalization of the restricted linear membership problem considered previously. We employ a Gaussian random projection T to embed both b and X into a lower-dimensional space and study the corresponding projected version: "Decide whether Tb is in T(X) or not." When X is finite or countable, a straightforward argument shows that the two problems are equivalent almost surely, regardless of the projected dimension. When X may be uncountable, we prove that the original and projected problems are also equivalent if the projected dimension d is proportional to some intrinsic dimension of the set X; in particular, we use the doubling dimension to estimate the relation between the two problems.
In Chapter 6, we propose to apply random projections to the trust-region subproblem. We reduce the number of variables by using a random projection and prove that optimal solutions of the new problem are approximate solutions of the original. This result can be used in the trust-region framework to study black-box and derivative-free optimization.
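The finite case of Chapter 5 can be checked numerically: under a Gaussian random projection T, distinct points of a finite set almost surely stay distinct, so the answer to "is b in X?" is preserved even for a very small projected dimension d. The sketch below is a toy check, not the thesis's code; the dimensions, the scaling of T's entries to variance 1/d, and the floating-point tolerance `tol` (standing in for exact equality) are assumptions.

```python
import math
import random


def gaussian_projection(d, n, rng):
    """Return a d x n matrix T with i.i.d. N(0, 1/d) entries."""
    scale = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(d)]


def project(T, v):
    """Compute the matrix-vector product T v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


def is_member(b, X, tol=1e-9):
    """Decide whether b coincides (up to tol) with some point of the finite set X."""
    return any(math.dist(b, x) <= tol for x in X)


rng = random.Random(0)
n, d = 50, 5                                   # original and projected dimensions
X = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(20)]
T = gaussian_projection(d, n, rng)
TX = [project(T, x) for x in X]                # T(X) for the finite set X

b_in = X[3]                                    # a point of X
b_out = [rng.gauss(0.0, 1.0) for _ in range(n)]  # almost surely not in X
print(is_member(b_in, X), is_member(project(T, b_in), TX))    # True True
print(is_member(b_out, X), is_member(project(T, b_out), TX))  # False False
```

For an uncountable X this simple argument no longer applies, which is why the thesis ties the projected dimension to an intrinsic dimension of X.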
|
63 |
On Analysis of Sufficient Dimension Reduction Models
An, Panduan, 04 June 2019
No description available.
|
64 |
Advances on Dimension Reduction for Univariate and Multivariate Time Series
Mahappu Kankanamge, Tharindu Priyan De Alwis, 01 August 2022
Advances in modern technologies have led to an abundance of high-dimensional time series data in many fields, including finance, economics, health, engineering, and meteorology, among others. This causes the “curse of dimensionality” problem in both univariate and multivariate time series data. The main objective of time series analysis is to make inferences about the conditional distributions. There are some methods in the literature to estimate the conditional mean and conditional variance functions in time series. However, most of them are inefficient, computationally intensive, or suffer from overparameterization. We propose some dimension reduction techniques to address the curse of dimensionality in high-dimensional time series data.
For high-dimensional matrix-valued time series data, there are a limited number of methods in the literature that can preserve the matrix structure and reduce the number of parameters significantly (Samadi, 2014; Chen et al., 2021). However, those models cannot distinguish between relevant and irrelevant information and still suffer from overparameterization. We propose a novel dimension reduction technique for matrix-variate time series data called the "envelope matrix autoregressive model" (EMAR), which offers substantial dimension reduction and links the mean function and the covariance matrix of the model by using the minimal reducing subspace of the covariance matrix. The proposed model can identify and remove irrelevant information and can achieve substantial efficiency gains by significantly reducing the total number of parameters. We derive the asymptotic properties of the proposed maximum likelihood estimators of the EMAR model. Extensive simulation studies and a real data analysis are conducted to corroborate our theoretical results and to illustrate the finite sample performance of the proposed EMAR model.
For univariate time series, we propose sufficient dimension reduction (SDR) methods based on integral transformation approaches that preserve sufficient information about the response. In particular, we use the Fourier and convolution transformation methods (FM and CM) to perform sufficient dimension reduction in univariate time series and estimate the time series central subspace (TS-CS), the time series central mean subspace (TS-CMS), and the time series central variance subspace (TS-CVS). Using the FM and CM procedures and some distributional assumptions, we derive candidate matrices that can fully recover the TS-CS, TS-CMS, and TS-CVS, and propose explicit estimates of the candidate matrices. The asymptotic properties of the proposed estimators are established under both normality and non-normality assumptions. Moreover, we develop data-driven methods to estimate the dimension of the time series central subspaces as well as the lag order. Our simulation results and real data analyses show that the proposed methods are not only significantly more efficient and accurate but also offer substantial computational savings compared to existing methods in the literature. Finally, we develop an R package entitled “sdrt” that implements the FM and CM procedures for estimating sufficient dimension reduction subspaces in univariate time series.
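To make the parameter savings of matrix autoregressive models concrete: in the MAR(1) model of Chen et al. (2021), an m × n series evolves as X_t = A X_{t-1} Bᵀ + E_t, which is equivalent to a VAR(1) on vec(X_t) whose coefficient matrix is the Kronecker product B ⊗ A. The plain-Python sketch below checks that identity on a small example and compares coefficient counts. It covers only the basic MAR model; the envelope extension (EMAR) proposed in the thesis is not implemented, and the matrices are arbitrary test values.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def vec(P):
    """Stack the columns of P into a single list (column-major vectorization)."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]


def kron(P, Q):
    """Kronecker product P ⊗ Q."""
    return [[P[i][j] * Q[k][l] for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]


def mar1_mean(A, X_prev, B):
    """Conditional mean of the MAR(1) model: A X_{t-1} B^T."""
    return matmul(matmul(A, X_prev), transpose(B))


def coefficient_counts(m, n):
    """Autoregressive coefficients: unrestricted VAR(1) on vec(X) versus MAR(1)."""
    return (m * n) ** 2, m * m + n * n


A = [[1, 2], [0, 1]]                    # m x m, with m = 2
B = [[1, 0, 2], [3, 1, 0], [0, 1, 1]]   # n x n, with n = 3
X = [[1, 2, 3], [4, 5, 6]]              # one m x n observation
lhs = vec(mar1_mean(A, X, B))
rhs = [row[0] for row in matmul(kron(B, A), [[v] for v in vec(X)])]
print(lhs == rhs)                 # True
print(coefficient_counts(10, 5))  # (2500, 125)
```

Because A and B are only identified up to a scalar (cA and B/c give the same model), the effective MAR(1) count is m² + n² − 1; either way it grows far more slowly than (mn)².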
|
65 |
W2R: an ensemble Anomaly detection model inspired by language models for web application firewalls security
Wang, Zelong; AnilKumar, Athira, January 2023
Nowadays, web application attacks have increased tremendously due to the large number of users and applications. Industries are therefore paying more attention to web application firewalls (WAFs), which act as a shield between the application and the internet by filtering and monitoring HTTP traffic, and to improving their security. Most works focus either on traditional feature extraction or on deep methods that require no feature extraction. We noticed that combining an unsupervised language model with a classic dimension reduction method is less explored for this problem. Inspired by this gap, we propose a new unsupervised anomaly detection model that achieves better results than the existing state-of-the-art model for anomaly detection in WAF security. This paper explores WAF security with the following structure: 1) feature extraction from HTTP traffic packets using NLP (natural language processing) methods such as word2vec and BERT; 2) dimension reduction by PCA and an autoencoder; and 3) different types of anomaly detection techniques, including OCSVM, isolation forest, LOF, and combinations of these algorithms, to explore how these methods affect the results. We used the CSIC 2010 and ECML/PKDD 2007 datasets, on which the model achieves better results.
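Step 3 combines several anomaly detectors. One simple and common way to combine detectors is to rescale each detector's scores to [0, 1] and average them. The sketch below uses two deliberately simple stand-in detectors (distance to the k-th nearest neighbour and distance to the centroid) in place of OCSVM, isolation forest and LOF, and assumes the feature vectors have already been produced by steps 1 and 2. The abstract does not state the paper's actual combination rule, so this averaging scheme is an assumption.

```python
import math


def knn_score(points, k):
    """Anomaly score: distance from each point to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def centroid_score(points):
    """Anomaly score: distance from each point to the mean of all points."""
    dim = len(points[0])
    centre = [sum(p[c] for p in points) / len(points) for c in range(dim)]
    return [math.dist(p, centre) for p in points]


def min_max(scores):
    """Rescale scores to [0, 1] so that different detectors are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_scores(points, k=3):
    """Average of the min-max normalised scores of the stand-in detectors."""
    detectors = [knn_score(points, k), centroid_score(points)]
    normalised = [min_max(s) for s in detectors]
    return [sum(col) / len(col) for col in zip(*normalised)]


# Toy 2-D feature vectors: a tight cluster of normal requests plus one outlier.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [3.0, 3.0]]
scores = ensemble_scores(points)
print(max(range(len(points)), key=scores.__getitem__))  # 5
```

Normalising before averaging keeps a detector with a large score range from dominating the ensemble.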
|
66 |
Unsupervised Dimension Reduction Techniques for Lung Cancer Diagnosis Based on Radiomics
Kireta, Janet; Zahed, Mostafa, Dr., 25 April 2023
One of the most pressing global health concerns is the impact of cancer, which remains a leading cause of death worldwide. Timely detection and diagnosis are critical to maximizing the chances of successful treatment. Radiomics is an emerging approach to medical image analysis that refers to the high-throughput extraction of a large number of image features. Radiomics generally uses CT, PET, MRI, or ultrasound images as input data, extracts expressive features from massive image-based data, and then applies machine learning or statistical models for quantitative analysis and prediction of disease. Feature reduction is critical in radiomics because many of the quantitative features can be redundant and not necessarily important in the analysis. Because of the immense number of features obtained from radiological images, the main objective of our research is to apply machine learning techniques to reduce the number of dimensions, thereby making the data more manageable. Radiomics involves several steps: imaging, segmentation, feature extraction, and analysis. Extracted features can be categorized as descriptions of the tumor gray-level histogram, shape, texture, and the tumor location and surrounding tissue. For this research, a large-scale CT dataset for lung cancer diagnosis (Lung-PET-CT-Dx), collected by scholars from a medical university in Harbin, China, is used to illustrate the dimension reduction techniques, which are a main part of the radiomics process, using R, SAS, and Python. The reduction and analysis techniques proposed in our research include principal component analysis (PCA), clustering analysis (hierarchical clustering and k-means), and manifold-based algorithms (Isometric Feature Mapping, ISOMAP).
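As a rough illustration of the first technique listed (PCA), the plain-Python sketch below standardizes a feature table, forms the covariance matrix of the standardized features (that is, their correlation matrix), and extracts the leading principal component by power iteration. The toy rows merely stand in for radiomic features and are invented; the starting vector is an arbitrary choice, assumed not to be orthogonal to the leading eigenvector.

```python
import math


def standardize(rows):
    """Centre each feature and scale it to unit (population) variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(d)]  # 'or 1.0' guards against constant features
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Covariance (divisor n) of rows that are already centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(d)] for a in range(d)]


def first_component(C, iters=200):
    """Leading eigenvector of a symmetric PSD matrix by power iteration."""
    v = [1.0 / (i + 1) for i in range(len(C))]  # arbitrary non-symmetric start
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project_first(rows):
    """Return the first principal direction and each observation's score on it."""
    Z = standardize(rows)
    v = first_component(covariance(Z))
    return v, [sum(z * c for z, c in zip(row, v)) for row in Z]


# Toy table: features 0 and 1 are nearly redundant, feature 2 is weakly related.
rows = [[t, 2 * t + (0.1 if t % 2 else -0.1), (t * 7) % 5] for t in range(10)]
v, scores = project_first(rows)
print([round(c, 2) for c in v])
```

The two nearly redundant features load almost equally on the first component, which is how PCA collapses redundant radiomic features into fewer dimensions.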
|
67 |
Topics in Multivariate Time Series Analysis: Statistical Control, Dimension Reduction Visualization and Their Business Applications
Huang, Xuan, 01 May 2010
Most business processes are, by nature, multivariate and autocorrelated. High dimensionality is rooted in processes where more than one variable is considered simultaneously to provide a more comprehensive picture. Time series models are preferable to an independent and identically distributed (i.i.d.) model because they capture the fact that many processes have a memory of their past. Examples of multivariate autocorrelation can be found in processes in business fields such as operations management, finance, and marketing. The topic of statistical control is most relevant to quality management. While both multivariate i.i.d. processes and univariate autocorrelated processes have received much attention in the statistical process control (SPC) literature, little work has been done to address high dimensionality and autocorrelation simultaneously. In this dissertation, this gap is filled by extending the univariate special cause chart and common cause chart to multivariate situations. In addition, two-chart control schemes are extended to nonstationary processes. Further, a class of Markov chain models is proposed to provide accurate average run length (ARL) computation when the process is autocorrelated. The second part of this dissertation aims to devise a dimension reduction method for autocorrelated processes. High dimensionality often obscures the true underlying components of a process. In the traditional multivariate literature, principal components analysis (PCA) is the standard tool for dimension reduction. For autocorrelated processes, however, PCA fails to take the autocorrelation information into account, so it is doubtful that PCA is the best choice. A two-step dimension reduction procedure is devised for multivariate time series. Comparisons based on both simulated examples and case studies show that the two-step procedure is more efficient in retrieving the true underlying factors.
Visualization of multivariate time series assists our understanding of the process. In the last part of this dissertation a simple three-dimensional graph is proposed to assist visualizing the results of PCA. It is intended to complement existing graphical methods for multivariate time series data. The idea is to visualize multivariate data as a surface that in turn can be decomposed with PCA. The developed surface plots are intended for statistical process analysis but may also help visualize economics data and, in particular, co-integration.
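For autocorrelated data, a common way to build a univariate special cause chart is to fit a time series model and chart its one-step-ahead residuals, which are approximately independent when the model fits. The sketch below does this for an AR(1) model fitted by least squares, with ±3σ limits; the model order, the limit width, and the simulated data are assumptions for illustration, not the multivariate charts or Markov chain ARL computations developed in the dissertation.

```python
import math
import random


def fit_ar1(x):
    """Least-squares estimates of (c, phi) in x_t = c + phi * x_{t-1} + e_t."""
    prev, curr = x[:-1], x[1:]
    mp, mc = sum(prev) / len(prev), sum(curr) / len(curr)
    sxy = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    sxx = sum((p - mp) ** 2 for p in prev)
    phi = sxy / sxx
    return mc - phi * mp, phi


def special_cause_chart(x, width=3.0):
    """Return the indices t whose one-step residual lies outside +/- width * sigma."""
    c, phi = fit_ar1(x)
    resid = [x[t] - (c + phi * x[t - 1]) for t in range(1, len(x))]
    # Residuals have mean zero under OLS with an intercept; two parameters were fitted.
    sigma = math.sqrt(sum(r * r for r in resid) / (len(resid) - 2))
    return [t for t, r in zip(range(1, len(x)), resid) if abs(r) > width * sigma]


rng = random.Random(1)
x = [0.0]
for _ in range(299):
    x.append(0.7 * x[-1] + rng.gauss(0.0, 1.0))  # in-control AR(1) process
x[150] += 10.0                                    # injected special cause
print(special_cause_chart(x))
```

Charting residuals rather than raw observations avoids the excess false alarms that autocorrelation causes on a conventional Shewhart chart; a common cause chart would then track the fitted process level separately.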
|
68 |
Predictive Modeling of Spatio-Temporal Datasets in High Dimensions
Chen, Linchao, 27 May 2015
No description available.
|
69 |
Some Statistical Aspects of Association Studies in Genetics and Tests of the Hardy-Weinberg Equilibrium
He, Ran, 08 October 2007
No description available.
|
70 |
Two Essays on Single-index Models
Wu, Zhou, 24 September 2008
No description available.
|