Data Mining in Tree-Based Models and Large-Scale Contingency Tables

This thesis has two parts. The first concerns tree-based models; the second deals with multiple testing in large-scale contingency tables. Tree-based models have gained enormous popularity in statistical modeling and data mining. We propose a novel tree-pruning algorithm called the frontier-based tree-pruning algorithm (FBP). The new method has computational complexity of an order comparable to that of cost-complexity pruning (CCP), and it provides a full spectrum of information about tree pruning. A numerical study on real data sets reveals a surprise: in the complexity-penalization approach, most of the tree sizes are inadmissible. FBP also facilitates a more faithful implementation of cross validation, which is favored by simulations.
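One way to read the remark about inadmissible tree sizes is sketched below. It assumes that a size counts as admissible when some penalty value alpha >= 0 makes it the unique minimizer of error + alpha * size; that reading, the function name `admissible_sizes`, and the toy error values are illustrative assumptions, not taken from the thesis, and the code is not the FBP algorithm itself.

```python
import math

def admissible_sizes(errors):
    """Return the tree sizes selected by some penalty alpha >= 0.

    errors maps a tree size k (number of leaves) to the smallest error
    achievable by a pruned subtree of that size (hypothetical input).
    Size k is kept if an interval of alpha values of positive length
    makes errors[k] + alpha * k strictly smaller than for any other size.
    """
    kept = []
    for k, e_k in errors.items():
        lower = 0.0        # alpha must be large enough to beat larger trees
        upper = math.inf   # alpha must be small enough to beat smaller trees
        for j, e_j in errors.items():
            if j > k:
                lower = max(lower, (e_k - e_j) / (j - k))
            elif j < k:
                upper = min(upper, (e_j - e_k) / (k - j))
        if lower < upper:
            kept.append(k)
    return sorted(kept)

# Hypothetical error-versus-size curve for a small tree.
toy_errors = {1: 40, 2: 30, 3: 28, 4: 20, 5: 19, 6: 18, 7: 18}
print(admissible_sizes(toy_errors))
```

In this toy curve, size 3 is never chosen for any penalty value because sizes 2 and 4 always dominate it, which is the kind of gap the abstract alludes to under the assumed definition.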
One of the most common test procedures for two-way contingency tables is the test of independence between two categorizations. Current procedures such as the chi-square and likelihood ratio tests assess overall independence but provide limited information about the nature of the association in a contingency table. We propose an approach for testing the independence of categories in individual cells of contingency tables, based on a multiple testing framework. We then employ the proposed method to identify patterns of pairwise association between amino acids involved in beta-sheet bridges of proteins. We identify a number of amino acid pairs that exhibit either strong or weak association. These patterns provide useful information for algorithms that predict the secondary and tertiary structures of proteins.
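The general idea of cell-wise testing can be illustrated with a minimal sketch. It uses adjusted standardized residuals with a normal approximation and the Benjamini-Hochberg step-up rule as a generic stand-in; the abstract does not say which statistic or multiplicity correction the thesis uses, so the function name `flag_cells`, the `fdr` parameter, and the example tables are all assumptions made for illustration.

```python
import math

def flag_cells(table, fdr=0.05):
    """Test each cell of a two-way table for departure from independence.

    table is a list of rows of counts, with at least two non-empty rows
    and two non-empty columns. Returns (row, col, z, p) tuples for the
    cells rejected by the Benjamini-Hochberg procedure at level fdr.
    """
    n = sum(sum(r) for r in table)
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]

    results = []
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n
            # Variance of (observed - expected) under independence.
            var = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            z = (observed - expected) / math.sqrt(var)
            # Two-sided p-value from the standard normal distribution.
            p = math.erfc(abs(z) / math.sqrt(2))
            results.append((i, j, z, p))

    # Benjamini-Hochberg: reject the k smallest p-values, where k is the
    # largest rank whose p-value is at most fdr * rank / m.
    m = len(results)
    ranked = sorted(results, key=lambda t: t[3])
    k = 0
    for rank, (_, _, _, p) in enumerate(ranked, start=1):
        if p <= fdr * rank / m:
            k = rank
    return ranked[:k]

# Hypothetical 3x3 table with excess counts on the diagonal.
for i, j, z, p in flag_cells([[30, 10, 10], [10, 30, 10], [10, 10, 30]]):
    print(i, j, round(z, 2))
```

A positive z marks a cell that occurs more often than independence predicts and a negative z marks one that occurs less often, which corresponds to the stronger and weaker pairings described above.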

http://hdl.handle.net/1853/6825

Identifier	oai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/6825
Date	11 January 2005
Creators	Kim, Seoung Bum
Publisher	Georgia Institute of Technology
Source Sets	Georgia Tech Electronic Thesis and Dissertation Archive
Language	en_US
Detected Language	English
Type	Dissertation
Format	1137247 bytes, application/pdf
