
Some Advances in Classifying and Modeling Complex Data

Classification and regression modeling are two of the most commonly used techniques in statistical data analysis. As scientific technology progresses rapidly, complex data often arise and require novel classification and regression modeling methods tailored to the data structure. In this dissertation, I mainly focus on developing approaches for analyzing data with complex structures.

Classification problems commonly occur in many areas, such as biomedicine, marketing, sociology, and image recognition. Among the various classification methods, linear classifiers have been widely used because, compared with non-linear classifiers, they are computationally efficient and easy to implement and interpret. In particular, linear discriminant analysis (LDA) is one of the most important methods in the family of linear classifiers.
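For reference, the classical two-class LDA rule can be sketched in a few lines. This is a minimal illustration assuming equal class priors and a pooled within-class covariance S: a point x is assigned to class 1 when w·x + b > 0, with w = S^{-1}(mu1 - mu0) and b placing the boundary at the midpoint of the class means. The function names (`fit_lda`, `predict`, `solve`) and the toy data are hypothetical and written with the standard library only.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Column means of a list of observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(x0, x1, mu0: Vector, mu1: Vector) -> Matrix:
    """Pooled within-class covariance with n0 + n1 - 2 degrees of freedom."""
    p = len(mu0)
    s = [[0.0] * p for _ in range(p)]
    for rows, mu in ((x0, mu0), (x1, mu1)):
        for r in rows:
            d = [r[j] - mu[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    dof = len(x0) + len(x1) - 2
    return [[v / dof for v in row] for row in s]


def solve(a: Matrix, y: Vector) -> Vector:
    """Solve a z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def fit_lda(x0, x1):
    """Return (w, b) for the rule 'class 1 if w.x + b > 0' (equal priors)."""
    mu0, mu1 = mean(x0), mean(x1)
    s = pooled_covariance(x0, x1, mu0, mu1)
    w = solve(s, [b1 - b0 for b0, b1 in zip(mu0, mu1)])
    mid = [(a + b) / 2 for a, b in zip(mu0, mu1)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b


def predict(w: Vector, b: float, x: Sequence[float]) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)


# Hypothetical toy data with p = 2 variables and n = 8 observations.
x0 = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.8], [0.5, 1.2]]
x1 = [[4.0, 5.0], [5.0, 4.5], [4.5, 5.5], [5.5, 4.0]]
w, b = fit_lda(x0, x1)
```

The direction w requires inverting S, which is the step that becomes problematic in high dimensions.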

As high-dimensional data, in which the number of variables p exceeds the number of observations n, occur more frequently, more advanced classification techniques are needed.
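The reason classical LDA breaks down when p > n can be checked directly: the pooled within-class scatter matrix is a sum of n outer products of centered observations and has rank at most n - 2, so this p x p matrix is singular and cannot be inverted. The short sketch below, with hypothetical helper names and toy data, computes that rank for p = 5 and n = 4.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def pooled_scatter(groups: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Pooled within-class scatter: sum over classes k and rows i of
    (x_i - mu_k)(x_i - mu_k)^T. Dividing by n - 2 gives the LDA covariance."""
    p = len(groups[0][0])
    s = [[0.0] * p for _ in range(p)]
    for rows in groups:
        mu = [sum(col) / len(rows) for col in zip(*rows)]
        for r in rows:
            d = [r[j] - mu[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
    return s


def matrix_rank(m: Matrix, tol: float = 1e-9) -> int:
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        piv = max(range(rank, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[piv] = m[piv], m[rank]
        for i in range(rank + 1, n_rows):
            f = m[i][c] / m[rank][c]
            for k in range(c, n_cols):
                m[i][k] -= f * m[rank][k]
        rank += 1
    return rank


# p = 5 variables but only n = 4 observations (2 per class).
class0 = [[1.0, 0.0, 2.0, 0.0, 1.0], [0.0, 1.0, 1.0, 3.0, 0.0]]
class1 = [[4.0, 2.0, 0.0, 1.0, 5.0], [3.0, 3.0, 1.0, 0.0, 4.0]]
S = pooled_scatter([class0, class1])
r = matrix_rank(S)  # at most n - 2 = 2, far below p = 5
```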

In Chapter 2, I propose a novel sparse LDA method that generalizes LDA through a regularized approach for the two-class classification problem.

The proposed method achieves high classification accuracy with attractive computational efficiency, which makes it suitable for high-dimensional data with p > n.
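To illustrate how regularization can make a linear discriminant both sparse and usable when p > n, here is a deliberately simple stand-in: diagonal LDA with soft-thresholded standardized mean differences. It is not the sparse LDA method of Chapter 2, whose exact formulation is not given here. It only estimates per-variable variances, so no p x p inverse is needed, and variables whose standardized mean difference falls below the threshold `lam` (a hypothetical tuning parameter) receive exactly zero weight. Function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def class_mean(rows: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(rows) for col in zip(*rows)]


def soft_threshold(z: float, lam: float) -> float:
    """sign(z) * max(|z| - lam, 0): shrinks z toward 0, exactly 0 if |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def fit_sparse_dlda(x0, x1, lam: float, eps: float = 1e-8) -> Tuple[Vector, float]:
    """Diagonal LDA with soft-thresholded standardized mean differences."""
    mu0, mu1 = class_mean(x0), class_mean(x1)
    dof = len(x0) + len(x1) - 2
    w = []
    for j in range(len(mu0)):
        ss = sum((r[j] - mu0[j]) ** 2 for r in x0) + sum((r[j] - mu1[j]) ** 2 for r in x1)
        s = math.sqrt(ss / dof + eps)        # pooled within-class std. dev.
        d = (mu1[j] - mu0[j]) / s            # standardized mean difference
        w.append(soft_threshold(d, lam) / s)
    # Boundary at the midpoint of the class means (equal priors assumed).
    b = -sum(wj * (m0 + m1) / 2 for wj, m0, m1 in zip(w, mu0, mu1))
    return w, b


def predict(w: Vector, b: float, x: Sequence[float]) -> int:
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)


# p = 6 variables, n = 4 observations; only variables 0 and 1 separate the classes.
x0 = [[0.0, 0.1, 1.0, 2.0, 0.5, 1.2], [0.2, -0.1, 1.4, 1.6, 0.3, 0.8]]
x1 = [[2.0, 2.1, 1.2, 1.8, 0.4, 1.0], [2.2, 1.9, 1.0, 2.2, 0.6, 1.0]]
w, b = fit_sparse_dlda(x0, x1, lam=2.0)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
```

Ignoring correlations between variables is the main simplification here; a method that regularizes the full discriminant direction can exploit those correlations while remaining sparse.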

In Chapter 3, I address classification when the data complexity lies in non-randomly missing responses in the training data set, which calls for an appropriately adapted classification method. Specifically, I consider the "reject inference problem" in fraud detection for online business. To prevent fraud, online businesses reject suspicious transactions, whose fraud status then remains unknown; this yields training data with selectively missing responses. A two-stage modeling approach using logistic regression is proposed to improve the efficiency and accuracy of fraud detection.
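As a rough illustration of how a two-stage logistic scheme can use rejected transactions, the sketch below implements fuzzy augmentation, one common reject-inference heuristic. It is not necessarily the two-stage model proposed in Chapter 3. Stage 1 fits a logistic regression on accepted transactions only; each rejected transaction then enters twice, as fraud with weight p_hat and as legitimate with weight 1 - p_hat; stage 2 refits on the augmented data. The gradient-descent fitter, the function names, and the one-feature toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int, float]  # (features incl. intercept, label, weight)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows: Sequence[Row], lr: float = 0.5, iters: int = 5000) -> List[float]:
    """Weighted logistic regression by batch gradient descent on the weighted mean log-loss."""
    p = len(rows[0][0])
    beta = [0.0] * p
    total_w = sum(w for _, _, w in rows)
    for _ in range(iters):
        grad = [0.0] * p
        for x, y, w in rows:
            r = sigmoid(sum(b * xi for b, xi in zip(beta, x))) - y
            for j in range(p):
                grad[j] += w * r * x[j]
        beta = [b - lr * g / total_w for b, g in zip(beta, grad)]
    return beta


def predict_proba(beta: Sequence[float], x: Sequence[float]) -> float:
    return sigmoid(sum(b * xi for b, xi in zip(beta, x)))


def two_stage_reject_inference(accepted, rejected):
    # Stage 1: fit on accepted transactions only (fraud label known).
    stage1 = fit_logistic([(x, y, 1.0) for x, y in accepted])
    # Fuzzy augmentation: split each rejected transaction by its stage-1 fraud probability.
    augmented = [(x, y, 1.0) for x, y in accepted]
    for x in rejected:
        p_hat = predict_proba(stage1, x)
        augmented.append((x, 1, p_hat))
        augmented.append((x, 0, 1.0 - p_hat))
    # Stage 2: refit on accepted plus augmented rejected transactions.
    stage2 = fit_logistic(augmented)
    return stage1, stage2, augmented


# Hypothetical features: [intercept, suspiciousness score].
accepted = [([1.0, 0.0], 0), ([1.0, 0.5], 0), ([1.0, 1.0], 1),
            ([1.0, 1.5], 0), ([1.0, 2.0], 1), ([1.0, 2.5], 1)]
rejected = [[1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
stage1, stage2, augmented = two_stage_reject_inference(accepted, rejected)
```

Fuzzy augmentation still extrapolates the stage-1 model into the rejected region, so it inherits any bias that model has there; it is shown only to make the idea of using unlabeled rejects concrete.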

Beyond classification, data from designed experiments in scientific areas often have complex structures. Many experiments involve multiple sources of variance. To increase the accuracy of statistical modeling, the model needs to accommodate more than one error term. In Chapter 4, I propose a variance component mixed model for data from a nanomaterial experiment that incorporates the between-group, within-group, and within-subject variance components into a single model. To adjust for possible systematic errors introduced during the experiment, adjustment terms can be added. Specifically, a group adaptive forward and backward selection (GFoBa) procedure is designed to select the significant adjustment terms. / Ph. D.
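One generic way to write a model with these three variance components, assuming a nested layout with groups i, subjects j within group i, and repeated measurements k within subject, is shown below. The indexing, the fixed-effect covariates x, and the adjustment covariates z with coefficients gamma are assumptions for illustration, not the exact specification used in Chapter 4.

```latex
\begin{aligned}
y_{ijk} &= \mathbf{x}_{ijk}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ijk}^{\top}\boldsymbol{\gamma}
          + a_i + b_{ij} + \varepsilon_{ijk},\\
a_i &\sim N(0,\sigma_a^2), \quad b_{ij} \sim N(0,\sigma_b^2), \quad
\varepsilon_{ijk} \sim N(0,\sigma_\varepsilon^2) \quad \text{(mutually independent)},\\
\operatorname{Var}(y_{ijk}) &= \sigma_a^2 + \sigma_b^2 + \sigma_\varepsilon^2,\\
\operatorname{Cov}(y_{ijk}, y_{ij'k'}) &= \sigma_a^2 \quad (j \neq j'),\\
\operatorname{Cov}(y_{ijk}, y_{ijk'}) &= \sigma_a^2 + \sigma_b^2 \quad (k \neq k').
\end{aligned}
```

Here a_i carries the between-group variation, b_ij the within-group (between-subject) variation, and epsilon_ijk the within-subject variation. In this sketch the adjustment vector gamma would be partitioned into groups of related terms, and a group selection procedure would decide which groups enter the model.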

A/B testing

fraud detection

linear classifier

misclassification error

net profit value

reject inference

sparse linear discriminant analysis

two-class classification

variance component mixed model

Identifier	oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/77958
Date	16 December 2015
Creators	Zhang, Angang
Contributors	Statistics, Deng, Xinwei, Kim, Inyoung, Smith, Eric P., Hong, Yili
Publisher	Virginia Tech
Source Sets	Virginia Tech Theses and Dissertation
Detected Language	English
Type	Dissertation
Format	ETD, application/pdf
Rights	In Copyright, http://rightsstatements.org/vocab/InC/1.0/
