Dimension Reduction and LASSO using Pointwise and Group Norms

Principal Components Analysis (PCA) is a statistical procedure commonly used to analyze high-dimensional data. It is often used for dimensionality reduction, which is accomplished by determining the orthogonal components that contribute most to the underlying variance of the data. While PCA is widely used for identifying patterns and capturing the variability of data in lower dimensions, it has some known limitations. In particular, PCA represents its results as linear combinations of the data attributes, so it is often seen as difficult to interpret, and, because of the underlying optimization problem being solved, it is not robust to outliers. In this thesis, we examine extensions to PCA that address these limitations. Specific techniques researched in this thesis include variations of Robust and Sparse PCA, as well as novel combinations of these two methods, which yield a structured low-rank approximation that is robust to outliers. Our work is inspired by the well-known machine learning method Least Absolute Shrinkage and Selection Operator (LASSO), as well as by pointwise and group matrix norms. Practical applications, including robust and non-linear methods for anomaly detection in Domain Name System (DNS) network data and interpretable feature selection for a website classification problem, are discussed along with implementation details and techniques for analyzing regularization parameters.
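
The difference between a pointwise and a group norm penalty can be illustrated with their proximal operators, the shrinkage steps that LASSO-style solvers apply repeatedly. The following is a minimal sketch, not the thesis's implementation: the function names `soft_threshold` and `group_soft_threshold` are hypothetical, and it assumes that the groups are the columns of the matrix (that is, the data attributes). The pointwise l1 norm shrinks each entry independently by a threshold lam. The group l2,1 norm, the sum of the columns' Euclidean norms, instead shrinks each column as a unit and sets entire columns to zero when their norm is at most lam. This column-level zeroing is what makes group penalties useful for keeping or discarding whole attributes.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_threshold(X: Matrix, lam: float) -> Matrix:
    """Proximal operator of lam * ||X||_1 (a pointwise norm).

    Each entry is shrunk toward zero by lam; entries with |x| <= lam become 0.
    """
    return [[math.copysign(max(abs(x) - lam, 0.0), x) for x in row] for row in X]


def group_soft_threshold(X: Matrix, lam: float) -> Matrix:
    """Proximal operator of lam * ||X||_{2,1}, ASSUMING columns are the groups.

    Each column's Euclidean norm is shrunk by lam; a column whose norm is
    <= lam is zeroed entirely, which drops that attribute as a whole.
    """
    if not X:
        return []
    n_rows, n_cols = len(X), len(X[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        norm = math.sqrt(sum(X[i][j] ** 2 for i in range(n_rows)))
        # Scale factor max(1 - lam / ||x_j||, 0); an all-zero column stays zero.
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        for i in range(n_rows):
            out[i][j] = scale * X[i][j]
    return out


if __name__ == "__main__":
    X = [[3.0, 0.5], [-4.0, 0.2]]
    print(soft_threshold(X, 1.0))        # entrywise shrinkage
    print(group_soft_threshold(X, 1.0))  # column-wise shrinkage
```

For X = [[3.0, 0.5], [-4.0, 0.2]] and lam = 1, `soft_threshold` returns [[2.0, 0.0], [-3.0, 0.0]], zeroing the small entries individually. `group_soft_threshold` scales the first column (norm 5) by 0.8, giving approximately [2.4, -3.2], and zeroes the second column, whose norm is about 0.54, as a whole.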

Identifier	oai:union.ndltd.org:wpi.edu/oai:digitalcommons.wpi.edu:etd-theses-2264
Date	11 December 2018
Creators	Jutras, Melanie A
Contributors	Randy C. Paffenroth (Advisor); Lane T. Harrison (Reader)
Publisher	Digital WPI
Source Sets	Worcester Polytechnic Institute
Detected Language	English
Type	text
Format	application/pdf
Source	Masters Theses (All Theses, All Years)