Comparison of Data Mining and Statistical Techniques for Classification Model

The purpose of this study is to observe the performance of three statistical and data mining classification models, namely logistic regression, decision tree, and neural network models, for different sample sizes and sampling methods on three sets of data. It is a 3 by 2 by 3 by 8 study: each of the three statistical or data mining methods is used to build a model for each of eight sample sizes and two sampling methods on each of the three datasets. The effect of sample size on the overall performance of each model against two sets of test data is observed and compared.
For a given dataset, none of the three methods is found to outperform the others; their performances are comparable. This contrasts with many of the existing studies cited in the literature review chapter of this thesis. However, the absolute prediction accuracy varied across the three datasets, indicating that the data distribution and data characteristics play a role in the actual prediction accuracy, especially the ratio of the binary values of the dependent variable in the training dataset and the population. The models built for each combination of method, sample size, and sampling method were run on two sets of test data to check whether the prediction accuracy was replicated. In every case, the prediction accuracy was replicated across the test datasets.
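The size of the design described in the abstract can be checked with a short sketch. This is only an illustration of the arithmetic, not the thesis's own code: the abstract does not name the sampling methods, the sample sizes, the datasets, or the test sets, so every label below other than the three method names is a hypothetical placeholder.

```python
from itertools import product

# The three methods are named in the abstract.
METHODS = ["logistic_regression", "decision_tree", "neural_network"]

# Everything below is a placeholder: the abstract gives only the counts
# (2 sampling methods, 3 datasets, 8 sample sizes, 2 test sets).
SAMPLING_METHODS = ["sampling_method_1", "sampling_method_2"]
DATASETS = ["dataset_1", "dataset_2", "dataset_3"]
SAMPLE_SIZES = [f"sample_size_{i}" for i in range(1, 9)]
TEST_SETS = ["test_set_1", "test_set_2"]


def design_cells():
    """Return one tuple per model built in the 3 x 2 x 3 x 8 design."""
    return list(product(METHODS, SAMPLING_METHODS, DATASETS, SAMPLE_SIZES))


def evaluation_runs():
    """Return one tuple per (model, test set) pair, since each model
    is scored against both test sets."""
    return [cell + (test_set,) for cell in design_cells() for test_set in TEST_SETS]


if __name__ == "__main__":
    print(len(design_cells()))     # number of models built
    print(len(evaluation_runs()))  # number of accuracy measurements
```

Under these assumptions the design yields 3 × 2 × 3 × 8 = 144 models, and scoring each against two test sets gives 288 accuracy measurements to compare.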

Identifier: oai:union.ndltd.org:LSU/oai:etd.lsu.edu:etd-11012006-192748
Date: 03 November 2006
Creators: Lahiri, Rochana
Contributors: Young H. Chun, Edward F. Watson, Helmut S. Schneider
Publisher: LSU
Source Sets: Louisiana State University
Language: English
Detected Language: English
Type: text
Format: application/pdf
Source: http://etd.lsu.edu/docs/available/etd-11012006-192748/
Rights: unrestricted. I hereby certify that, if appropriate, I have obtained and attached herein a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to LSU or its agents the non-exclusive license to archive and make accessible, under the conditions specified below and in appropriate University policies, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
