System Complexity Reduction via Feature Selection

abstract: This dissertation transforms a set of system complexity reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models.

Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset can then be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve the classification interpretability without loss of accuracy.

An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear or Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem.

Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted. These features can be further reduced and summarized for even better interpretability.

Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method and is called OOBForest; the other, based on the new concept of a partial permutation test, is called pForest. Experimental results show that the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.

Dissertation/Thesis / Ph.D. Industrial Engineering 2011
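The abstract's remark that bias can occur when predictor attributes have different numbers of values can be illustrated with a small simulation. The sketch below is not OOBForest or pForest, and it is not taken from the dissertation. It assumes a simplified multiway split (one child per attribute value) and measures the Gini impurity decrease obtained from attributes that are pure noise. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def multiway_gini_gain(feature, labels):
    """Impurity decrease from a split with one child per feature value.

    Assumption: a multiway split is used here for simplicity; many tree
    implementations use binary splits instead.
    """
    groups = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


def mean_noise_gain(n_values, n_samples=200, n_trials=100, seed=0):
    """Average gain of a random attribute with n_values levels.

    Labels and attribute values are drawn independently, so any gain
    above zero is an artifact of the number of levels, not real signal.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        labels = [rng.randint(0, 1) for _ in range(n_samples)]
        feature = [rng.randrange(n_values) for _ in range(n_samples)]
        total += multiway_gini_gain(feature, labels)
    return total / n_trials


if __name__ == "__main__":
    for k in (2, 10, 50):
        print(k, round(mean_noise_gain(k), 4))
```

Under these assumptions the average gain grows with the number of levels even though none of the attributes carries information about the label, which is the kind of selection bias an importance measure has to correct for.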

http://hdl.handle.net/2286/R.I.9019

Industrial Engineering

Artificial Intelligence

Information Technology

associative classification

attribute importance

feature selection

random forest

time series classification

Identifier	oai:union.ndltd.org:asu.edu/item:9019
Date	January 2011
Contributors	Deng, Houtao (Author), Runger, George C (Advisor), Lohr, Sharon L (Committee member), Pan, Rong (Committee member), Zhang, Muhong (Committee member), Arizona State University (Publisher)
Source Sets	Arizona State University
Language	English
Detected Language	English
Type	Doctoral Dissertation
Format	118 pages
Rights	http://rightsstatements.org/vocab/InC/1.0/, All Rights Reserved
