Data comes in many different shapes and sizes. In real-life applications, the data
under study commonly has features of varied types, including numerical, categorical,
and text. Most machine learning algorithms require numeric input, so data that is
not originally numerical must be transformed before it can be used as input to these
algorithms.
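
As a minimal sketch of the kind of transformation described above, categorical values can be turned into binary indicator columns (one-hot encoding). The helper name `one_hot_encode` and the toy records are assumptions made for illustration; they do not come from the dissertation.

```python
def one_hot_encode(rows):
    """One-hot encode records whose fields are categorical values.

    Hypothetical helper for illustration: every distinct value of every
    column becomes one binary indicator column.
    """
    n_cols = len(rows[0])
    # Sort the distinct values per column so the column layout is deterministic.
    categories = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    encoded = []
    for row in rows:
        vec = []
        for j, value in enumerate(row):
            # Emit 1 for the matching category of column j, 0 for the others.
            vec.extend(1 if value == c else 0 for c in categories[j])
        encoded.append(vec)
    return encoded, categories


# Toy data (made up): two categorical features, three samples.
rows = [("red", "small"), ("blue", "large"), ("red", "large")]
encoded, categories = one_hot_encode(rows)
```

Here each sample becomes a vector with exactly one 1 per original feature, which is the numeric form that downstream algorithms can consume.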
In addition to this transformation, it is common for the data we study to have many
features relative to the number of samples. It is often desirable to reduce the
number of features used to train a model in order to eliminate noise and reduce
training time. This problem of high dimensionality can be approached through
feature selection, feature extraction, or feature embedding. Feature selection seeks to
identify the most essential variables in a dataset, those that lead to a parsimonious,
high-performing model, while feature extraction and embedding apply a mathematical
transformation that maps the data into a new representation space. As a
byproduct of using a new representation, we are able to reduce the dimension greatly
without sacrificing performance. Oftentimes, embedded features even yield a gain in performance.
Though extraction and embedding methods may be powerful for isolated machine
learning problems, they do not always generalize well. Therefore, we are motivated
to illustrate a methodology that can be applied to any data type with little
pre-processing. The methods we develop can be applied in unsupervised, supervised,
incremental, and deep learning contexts. Using 28 benchmark datasets that span
different data types as examples, we construct a framework that can be applied to
general machine learning tasks.
The techniques we develop contribute to the field of dimension reduction and
feature embedding. Using this framework, we make additional contributions to
eigendecomposition by creating an objective matrix built from three main components.
The first is a class-partitioned row and feature product representation
of one-hot encoded data. The second is a weighted adjacency matrix derived
from class label relationships. Finally, by taking the inner product of these two
quantities, we are able to condition the one-hot encoded data generated from the
original data prior to eigenvector decomposition. The use of class partitioning and
adjacency enables subsequent projections of the data to be trained more effectively
when compared side by side with baseline algorithm performance. Along with this
improved performance, we can set the dimension of the projected data arbitrarily.
In addition, we show how these dense vectors may be used to
order the features of generic data for deep learning.
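
The general shape of the pipeline described above (one-hot encode, build a symmetric matrix, eigendecompose, project to a chosen dimension) can be sketched as follows. This is only a generic illustration: it swaps in the plain Gram matrix of the one-hot data for the dissertation's objective matrix, and does not implement the class partitioning or the weighted adjacency. The function name `eigen_embed`, the parameter `k`, and the toy matrix are assumptions.

```python
import numpy as np


def eigen_embed(X, k):
    """Project the rows of X onto the top-k eigenvectors of X^T X.

    Generic sketch: X^T X is a stand-in for the objective matrix and is
    NOT the class-partitioned, adjacency-weighted matrix of the source.
    """
    X = np.asarray(X, dtype=float)
    # Symmetric feature-by-feature matrix built from the one-hot data.
    M = X.T @ X
    # eigh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(M)
    # Indices of the k largest eigenvalues, largest first.
    top = np.argsort(eigvals)[::-1][:k]
    W = eigvecs[:, top]
    # Dense k-dimensional representation of every sample.
    return X @ W, eigvals[top]


# Toy one-hot matrix (made up): three samples, four indicator columns.
X = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 0]]
Z, vals = eigen_embed(X, k=2)
```

The choice of `k` is what makes the output dimension adjustable: any value from 1 up to the number of one-hot columns gives a dense representation of that width.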
In this dissertation, we examine a general approach to dimension reduction and
feature embedding that utilizes a class-partitioned row and feature representation, a
weighted approach to instance similarity, and an adjacency representation. This general
approach has application to unsupervised, supervised, online, and deep learning.
In our experiments on 28 benchmark datasets, we show significant performance gains
in clustering, classification, and training time.
Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection
Identifier | oai:union.ndltd.org:fau.edu/oai:fau.digital.flvc.org:fau_40719 |
Contributors | Golinko, Eric David (author), Zhu, Xingquan (Thesis advisor), Florida Atlantic University (Degree grantor), College of Engineering and Computer Science, Department of Computer and Electrical Engineering and Computer Science |
Publisher | Florida Atlantic University |
Source Sets | Florida Atlantic University |
Language | English |
Detected Language | English |
Type | Electronic Thesis or Dissertation, Text |
Format | 128 p., application/pdf |
Rights | Copyright © is held by the author, with permission granted to Florida Atlantic University to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder., http://rightsstatements.org/vocab/InC/1.0/ |