預測模型的遺失值處理─選值順序的研究 / Handling Missing Values in Predictive Model - Research of the Order of Data Acquisition

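Business knowledge is advancing by leaps and bounds, and predictive models play an important role among the many forms of business intelligence. However, when extracting hidden, unknown, and potentially useful information from large volumes of data, we often run into data quality problems that make the analysis hard to begin. Missing values in particular are a common difficulty in the data pre-processing stage. How to handle missing values effectively when building a predictive model is therefore an important issue.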
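Much prior literature has addressed the handling of missing values, and research on Active Feature-Value Acquisition in particular has examined in depth the order in which missing entries in the training data are selected and filled. The idea of Active Feature-Value Acquisition is to choose, from training data with missing values, the appropriate missing entries to fill, so that the predictive model reaches the desired accuracy as efficiently as possible. This study continues that line of research. It gives priority to the nodes of the decision tree when ordering which missing values to acquire, and proposes a new ordering method for acquiring missing values in training data, I Sampling. We train and test the method on real data and compare it with methods proposed in earlier work, to learn whether different choices of fill-value detection and ordering affect a predictive model's classification accuracy, and to understand each method's strengths, weaknesses, and suitability in different scenarios.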
The new method proposed in this study, together with its validation results, can offer reference and suggestions for future managerial or academic work involving prediction: a suitable value-selection approach can be adopted according to the nature of the data, saving acquisition cost and improving the classification ability of the predictive model. / Business intelligence is developing rapidly, and predictive models play a key role in many business intelligence tasks. However, when extracting hidden, previously unknown information from large volumes of data, handling missing values is a critical problem, especially in the data pre-processing phase. It is therefore important to identify which methods best deal with missing data when building predictive models.
Several papers have studied strategies for dealing with missing values. Research on Active Feature-Value Acquisition (AFA) in particular addresses the order in which feature values should be acquired. The goal of AFA is to reduce the cost of achieving a desired model accuracy by identifying the instances for which obtaining complete information is most informative. Following the AFA concept, we present an approach, I Sampling, in which feature values are selected for acquisition based on the attribute at the top node of the current decision tree. We also compare our approach with other methods under different situations and missing-data patterns.
Experimental results demonstrate that, in some situations, our approach can induce accurate models using substantially fewer feature-value acquisitions than alternative policies. The proposed method can support future predictive work in academia and business: practitioners can choose the method that fits their needs and obtain informative data efficiently.
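
The selection rule described in the abstract (acquire values for the attribute at the top node of the current decision tree) can be illustrated with a minimal sketch. This is not the thesis's implementation of I Sampling. It assumes categorical attributes, marks missing values with `None`, and uses information gain over the known values as a stand-in for inducing a tree and reading off its root. The function names, the toy data, and the choice to take candidate instances in index order are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Information gain of `attr`, using only rows where its value is known."""
    known = [(r[attr], y) for r, y in zip(rows, labels) if r[attr] is not None]
    if not known:
        return 0.0
    groups = {}
    for value, y in known:
        groups.setdefault(value, []).append(y)
    remainder = sum(len(g) / len(known) * entropy(g) for g in groups.values())
    return entropy([y for _, y in known]) - remainder


def root_attribute(rows, labels, attrs):
    """Attribute a greedy decision tree would split on first (assumed proxy for the top node)."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


def next_acquisitions(rows, labels, attrs, budget):
    """Return the root attribute and up to `budget` row indices missing that attribute."""
    top = root_attribute(rows, labels, attrs)
    missing = [i for i, r in enumerate(rows) if r[top] is None]
    return top, missing[:budget]


# Hypothetical toy data: None marks a missing feature value.
rows = [
    {"outlook": "sunny", "windy": None},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": None, "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": None, "windy": "yes"},
]
labels = ["no", "no", "yes", "yes", "yes", "no"]

attribute, indices = next_acquisitions(rows, labels, ["outlook", "windy"], budget=1)
print(attribute, indices)
```

In an acquisition loop one would fill in the returned entries, recompute the root attribute on the updated data, and repeat until the budget is spent or the top attribute has no missing values left. What happens after that point (for example, moving to lower nodes) is not specified in the abstract and is left out of this sketch.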

Identifier: oai:union.ndltd.org:CHENGCHI/G0101355006
Creators: 黃秋芸, Huang, Chiu Yun
Publisher: 國立政治大學
Source Sets: National Chengchi University Libraries
Language: Chinese (中文)
Detected Language: English
Type: text
Rights: Copyright © nccu library on behalf of the copyright holders
