Master's thesis / National Defense Medical Center / Graduate Institute of Public Health / Academic year 94 (ROC calendar) / ABSTRACT
Objective: This study compared the performance of prediction models built with three data mining methods: Artificial Neural Network, Logistic Regression, and Decision Tree. We also examined factors that affect these models, namely population, sample size, exclusion criteria, internal versus external validation, and single-year versus cumulative multi-year training sets. Furthermore, the study investigated the predictive ability of the variables recorded for breast cancer patients in the Taiwan and US cancer registry systems.
Methods and materials: The study samples were patients diagnosed with breast cancer in the US cancer registry database, Surveillance, Epidemiology, and End Results (SEER), during 1973-2001 and in the Taiwan Cancer Registry database (CRS) during 1979-2002. The SEER dataset contained 64,096 cases after excluding patients who died of causes other than breast cancer, and 66,875 cases after excluding patients who died of causes other than cancer. The corresponding CRS datasets contained 27,010 and 27,809 cases. Model performance was evaluated by accuracy, the area under the ROC curve (AUC), and specificity at a sensitivity fixed at 0.95.
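The three evaluation criteria named in the Methods can be made concrete with a small, dependency-free sketch. It assumes outcomes coded 0/1 (1 = the event being predicted, a hypothetical coding) and a model score where a higher value means "more likely positive"; the function names are illustrative and are not taken from the thesis. AUC is computed from its rank (Mann-Whitney) interpretation, and specificity at fixed sensitivity is the best specificity over all thresholds that still flag at least 95% of positives.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Share of cases whose predicted class equals the observed class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via its rank interpretation:
    P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(y_true: List[int], scores: List[float],
                               target: float = 0.95) -> float:
    """Best specificity over all thresholds whose sensitivity is >= target.
    A case is classified positive when its score >= threshold."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for thr in sorted(set(scores)):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < thr)
        if tp / n_pos >= target:
            best = max(best, tn / n_neg)
    return best


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
    y_hat = [1 if v >= 0.5 else 0 for v in s]  # hypothetical 0.5 cutoff
    print(accuracy(y, y_hat), roc_auc(y, s), specificity_at_sensitivity(y, s))
```

With these toy labels and scores, the AUC is 8/9 and the specificity at sensitivity >= 0.95 is 2/3: every positive must be flagged, which pushes the threshold down to 0.4 and lets one negative (score 0.5) through.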
Results: The results indicated the following.
1. The SEER models achieved accuracy 10.04% higher than the CRS models.
2. Models built on cases excluding deaths from causes other than breast cancer achieved slightly higher accuracy (by 1.17%) than models built on cases excluding deaths from causes other than cancer.
3. Models trained on single-year and on cumulative multi-year cases performed similarly on the external validation set.
4. In the SEER models, Decision Tree achieved the highest but least stable accuracy: about 6% higher than the other methods on the internal validation set, but about 3% lower than the other methods on the external validation set. In the CRS models, the three methods performed similarly.
5. On the external validation set, models averaged 2.93% lower accuracy than on the internal validation set: 1.42% lower for Artificial Neural Network, 1.01% lower for Logistic Regression, and 6.73% lower for Decision Tree.
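The abstract does not state exactly how the yearly training sets and the validation sets were drawn. The sketch below is therefore only an illustration under stated assumptions: cases are grouped by a hypothetical `year` of diagnosis field, internal validation is a random holdout from the training cases, and external validation uses cases from years never used for training. All function names are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: a dict with at least a diagnosis "year";
# predictors and the outcome label would sit alongside it.
Case = Dict[str, int]


def single_year_sets(cases: List[Case], train_years: Iterable[int]) -> Dict[int, List[Case]]:
    """One training set per diagnosis year, each holding only that year's cases."""
    return {y: [c for c in cases if c["year"] == y] for y in train_years}


def cumulative_sets(cases: List[Case], train_years: Iterable[int]) -> Dict[int, List[Case]]:
    """For each training year y, pool every case from the training years up to and including y."""
    allowed = set(train_years)
    return {
        y: [c for c in cases if c["year"] in allowed and c["year"] <= y]
        for y in sorted(allowed)
    }


def internal_split(cases: List[Case], holdout_frac: float,
                   seed: int = 0) -> Tuple[List[Case], List[Case]]:
    """Internal validation: a random holdout from the same pool used for training.
    Returns (training cases, holdout cases)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_hold = int(len(shuffled) * holdout_frac)
    return shuffled[n_hold:], shuffled[:n_hold]


def external_set(cases: List[Case], external_years: Iterable[int]) -> List[Case]:
    """External validation (one hypothetical form): cases from years excluded from training."""
    held_out = set(external_years)
    return [c for c in cases if c["year"] in held_out]
```

Under these assumptions, comparing a model trained on `single_year_sets(...)[y]` with one trained on `cumulative_sets(...)[y]`, both scored on the same `external_set(...)`, mirrors the single versus cumulative comparison, and the gap between holdout and external accuracy mirrors the internal versus external comparison.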
Conclusion: SEER models performed better than CRS models, and the Artificial Neural Network method performed similarly to the Logistic Regression method. Decision Tree models achieved the best accuracy and AUC of the three methods on the internal validation set, indicating that Decision Tree is a good tool for deriving predictive rules; however, their accuracy was markedly overestimated under internal relative to external validation, and they were highly sensitive to a reduction in the number of predictive factors.
Identifier | oai:union.ndltd.org:TW/094NDMC0058008 |
Date | January 2006 |
Creators | Lu, Yu Fen, 盧瑜芬 |
Contributors | Chu Chi Ming, 朱基銘 |
Source Sets | National Digital Library of Theses and Dissertations in Taiwan |
Language | zh-TW |
Detected Language | English |
Type | Degree thesis ; thesis |
Format | 210 |