An Evaluation of the Effect of the Data Purifying Process in Data Mining (資料採礦中的資料純化過程之效果評估) 楊惠如 Unknown Date
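In recent years, financial holding companies have sprung up rapidly in Taiwan. A financial holding company comprises many finance-related subsidiaries, such as property insurance companies, banks, securities firms, and life insurance companies, so the databases that were originally kept separately at each subsidiary can all be integrated. When senior executives need to make decisions, they can mine the integrated database to obtain useful information. However powerful data mining may be, it first requires a good, complete database; if the data quality is too poor or the data content is unrelated to the research objective, data mining cannot succeed.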
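Sampling surveys and function mapping allow a database to be enriched: given a target database and an auxiliary database, function mapping can integrate them into one large database, and missing or rare values in that database can then be imputed to obtain a value-added database. This study calls the whole process "Data SPA (Data Systematic Purifying Analysis)", or "data purification". The study focuses on verifying the structure of the purified data, confirming that the data remain useful and correct after these steps. Three methods are used to examine the data: horizontal evaluation, vertical evaluation, and overall evaluation.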
After the three evaluations, the purified data turn out to perform poorly when examined from the perspective of each variable or each observation, yet the data as a whole perform quite well. Although the horizontal and vertical evaluations show that the purified data cannot fully match the original complete data, the purification process imputes values and adds fields, which increases the amount of information in the data. Data purification is therefore effective, because this added information is a great help to a database intended for data mining. / For the past few years, Taiwan has experienced tremendous growth in its financial industry, namely in banks, life and property insurers, brokerages, and securities firms. The need to store the data produced by this industry has therefore become an important, primary task. Originally, firms stored data in their own databases. With the progressive development of data management, the data can now be combined and stored in one large database that gives users easy access for data retrieval. However, if the quality of the data is questionable, the database will not provide much insight to its users.
To tackle the aforementioned problem, this research uses functional mapping to combine the target and auxiliary databases and then imputes the missing or rare data in the combined database. This whole process is called Data Systematic Purifying Analysis (Data SPA). The purpose of this research is to evaluate whether the structure of the data improves after the data have gone through systematic purifying analysis. In general, the resulting data should be of good quality and useful.
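The combine-then-impute flow described here can be illustrated with a minimal Python sketch. The abstract does not specify how the functional mapping or the imputation is carried out, so the sketch makes two loud assumptions: the mapping is reduced to a join on a shared key, and imputation is reduced to filling numeric gaps with the column mean. The function name `merge_and_impute`, the field names, and the toy records are all hypothetical.

```python
from statistics import mean


def merge_and_impute(target, auxiliary, key="id"):
    """Attach auxiliary fields to target records, then mean-impute numeric gaps.

    Assumption: a key-based join stands in for the functional mapping, and
    mean imputation stands in for the imputation method, neither of which
    is detailed in the abstract.
    """
    aux_by_key = {row[key]: row for row in auxiliary}
    merged = []
    for row in target:
        combined = dict(row)
        # Add auxiliary fields without overwriting fields the target already has.
        for field, value in aux_by_key.get(row[key], {}).items():
            if field != key:
                combined.setdefault(field, value)
        merged.append(combined)

    # Collect every field that appears in at least one merged record.
    fields = []
    for row in merged:
        for field in row:
            if field != key and field not in fields:
                fields.append(field)

    # Fill missing numeric values with the mean of the observed values.
    for field in fields:
        observed = [row[field] for row in merged
                    if isinstance(row.get(field), (int, float))]
        if not observed:
            continue  # nothing numeric to impute from
        fill = mean(observed)
        for row in merged:
            if row.get(field) is None:
                row[field] = fill
    return merged


# Hypothetical toy data: one missing income, one record absent from the auxiliary set.
target = [
    {"id": 1, "income": 50.0},
    {"id": 2, "income": None},
    {"id": 3, "income": 70.0},
]
auxiliary = [{"id": 1, "age": 30}, {"id": 2, "age": 40}]

purified = merge_and_impute(target, auxiliary)
```

In this toy case the merged records gain an `age` field from the auxiliary set, and the gaps in `income` and `age` are filled from the observed values, which mirrors the "fields added, values imputed" outcome the abstract describes.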
After the assessments of the data structure, the behavior of the data with respect to individual variables and observations is unsatisfactory. However, the data as a whole show an improvement. Data SPA augments the database, making it better suited to data mining techniques.
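The contrast between weak record-level agreement and good overall agreement can be sketched in Python as follows. The abstract does not define its evaluation measures, so the three statistics below (mean absolute error, exact-match rate, and the gap between overall means) are assumed stand-ins, and the numbers are constructed toy values rather than results from the thesis. The function name `evaluate` and the field `income` are hypothetical.

```python
from statistics import mean


def evaluate(purified, reference, field, tol=1e-9):
    """Compare one field of the purified data against a complete reference.

    Assumption: these three measures are illustrative stand-ins; the
    abstract does not say how its evaluations are computed.
    """
    abs_errors = [abs(p[field] - r[field]) for p, r in zip(purified, reference)]
    purified_mean = mean([p[field] for p in purified])
    reference_mean = mean([r[field] for r in reference])
    return {
        # Average size of the value-level discrepancy.
        "mean_abs_error": mean(abs_errors),
        # Share of records whose value matches the reference.
        "exact_match_rate": sum(e <= tol for e in abs_errors) / len(abs_errors),
        # Discrepancy at the level of the whole data set.
        "overall_mean_gap": abs(purified_mean - reference_mean),
    }


# Toy data (not from the thesis): two of four incomes were imputed as 60.0.
reference = [{"income": 50.0}, {"income": 55.0}, {"income": 70.0}, {"income": 65.0}]
purified = [{"income": 50.0}, {"income": 60.0}, {"income": 70.0}, {"income": 60.0}]

report = evaluate(purified, reference, "income")
```

With these toy values, half of the records differ from the reference while the overall mean is unchanged, which is one simple way a data set can look poor record by record yet sound as a whole.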