Global ETD Search

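1	利用函數映射進行資料庫增值於資料採礦中 [Adding value to databases through functional mapping in data mining] 林建言 Unknown Date (has links)
Population growth and modern living conditions force people to cope with massive volumes of continuously generated data. Fortunately, the computing and storage capacity of computers keeps improving, so the amount of data people can process has grown with it. Data mining techniques developed from attempts to analyze large data sets in order to solve real-life problems. Many case studies show that data mining does improve analysts' results, yet failures remain common. A closer look at the causes shows that the problem is not that data mining techniques are inapplicable, but that the data are of poor quality or carry too little information. A shortage of useful variables in a database can be remedied by collecting the data again, but this is expensive and slow. The purpose of this study is to use external data to raise the information content of a database. The empirical work uses the Industry, Commerce and Service Census database (工商業與服務業普查資料庫) and the Technological Innovation database (技術創新資料庫). Three factors are controlled: the number of variables linking the databases, the proportion of data used for model building, and the type of model. Functional mapping is then used to add value to the database. The results show that the contents of other data sets or databases can indeed be used to add fields and information to a database. The author hopes this offers data mining practitioners a labor-saving direction and a basis for further research.
Keywords: Data Mining, Functional Mapping, Database Value-added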
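The functional-mapping idea in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible mapping: a lookup function, fitted on an auxiliary table, that maps the values of the linking variables to the group mean of the field to be added. The thesis compares several model types and does not specify this one; all field names and data below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_mapping(auxiliary, link_keys, target_field):
    """Learn f: linking-variable values -> mean of target_field in the auxiliary rows."""
    groups = defaultdict(list)
    for row in auxiliary:
        key = tuple(row[k] for k in link_keys)
        groups[key].append(row[target_field])
    table = {key: mean(values) for key, values in groups.items()}
    # Fallback for link-key combinations never seen in the auxiliary data.
    fallback = mean(row[target_field] for row in auxiliary)

    def f(row):
        return table.get(tuple(row[k] for k in link_keys), fallback)

    return f


def add_field(target, f, new_field):
    """Return copies of the target rows with the mapped field appended."""
    return [{**row, new_field: f(row)} for row in target]


# Hypothetical toy data: the auxiliary table has the extra field, the target does not.
auxiliary = [
    {"industry": "A", "size": "small", "rd_spend": 10.0},
    {"industry": "A", "size": "small", "rd_spend": 14.0},
    {"industry": "B", "size": "large", "rd_spend": 50.0},
]
target = [
    {"firm": 1, "industry": "A", "size": "small"},
    {"firm": 2, "industry": "B", "size": "large"},
    {"firm": 3, "industry": "C", "size": "small"},
]

f = fit_mapping(auxiliary, ["industry", "size"], "rd_spend")
enriched = add_field(target, f, "rd_spend_est")
```

The number of entries in `link_keys` corresponds to the "number of linking variables" factor mentioned in the abstract; a regression or tree model could replace the lookup table without changing the surrounding code.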
2	應用資料採礦技術於資料庫加值中的插補方法比較 / Imputation of value-added database in data mining 黃雅芳 Unknown Date (has links)
Data are a vital source of information for organizations, especially in the current era of knowledge and technology. Handling the missing values in a meaningful, representative database correctly can be a promising breakthrough for an organization. Databases are not always complete, however, and results drawn from a database with missing values may be biased or misleading. This research therefore imputes missing values to add value to the database, building imputation models according to the type of the missing data. For continuous missing values, a regression model and a back-propagation neural network (BPNN) are applied. For categorical missing values, logistic regression, BPNN, and decision trees are applied. Simulation results show that the regression model gives the best imputation estimates for continuous missing values, while the C5.0 decision tree is the best choice for categorical missing values. For rare data in the database, BPNN gives the best estimates for continuous missing values, and the C5.0 decision tree is again the best choice for categorical missing values.
Keywords: data mining, value-added database, rare data, missing data, imputation
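The type-dependent imputation described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's procedure: continuous values are filled with a one-predictor least-squares regression, and categorical values with the majority class inside each group of a single predictor, which stands in for a decision tree with one split (it is not C5.0, and no neural network is included). Field names and data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def impute_continuous(rows, x_field, y_field):
    """Fill missing (None) y_field values with the regression prediction."""
    complete = [r for r in rows if r[y_field] is not None]
    a, b = fit_line([r[x_field] for r in complete], [r[y_field] for r in complete])
    return [r if r[y_field] is not None else {**r, y_field: a + b * r[x_field]}
            for r in rows]


def impute_categorical(rows, group_field, y_field):
    """Fill missing (None) y_field values with the majority class of the row's group."""
    complete = [r for r in rows if r[y_field] is not None]
    overall = Counter(r[y_field] for r in complete).most_common(1)[0][0]
    by_group = defaultdict(Counter)
    for r in complete:
        by_group[r[group_field]][r[y_field]] += 1

    def predict(r):
        counts = by_group.get(r[group_field])
        # Fall back to the overall majority class for unseen groups.
        return counts.most_common(1)[0][0] if counts else overall

    return [r if r[y_field] is not None else {**r, y_field: predict(r)}
            for r in rows]


# Hypothetical toy table: the last record is missing both fields.
rows = [
    {"employees": 10, "region": "north", "revenue": 20.0, "segment": "retail"},
    {"employees": 20, "region": "north", "revenue": 40.0, "segment": "retail"},
    {"employees": 30, "region": "south", "revenue": 60.0, "segment": "export"},
    {"employees": 40, "region": "south", "revenue": None, "segment": None},
]

filled = impute_categorical(
    impute_continuous(rows, "employees", "revenue"), "region", "segment"
)
```

Both functions return new rows and leave the input untouched, so imputed and original tables can be compared afterwards.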
3	資料採礦中的資料純化過程之效果評估 [Evaluating the effect of the data purification process in data mining] 楊惠如 Unknown Date (has links)
In recent years financial holding companies have sprung up across Taiwan, each grouping property insurers, banks, securities firms, life insurers, and other financial companies. Databases that were once kept separately by each subsidiary can now be integrated into one, and senior managers can mine it for useful information when making decisions. However powerful data mining may be, it needs a good, complete database: if the data are of poor quality or unrelated to the research goal, data mining cannot succeed. Sample surveys and functional mapping can add value to a database. Given a target database and an auxiliary database, functional mapping combines them into one large database, and the missing or rare values in the combined database are then imputed to produce the value-added database. This whole process is called Data Systematic Purifying Analysis (Data SPA), or "data purification". The purpose of this research is to verify the structure of the purified data, that is, to confirm that the data remain useful and correct after the process. Three methods are used to examine the data: horizontal evaluation, vertical evaluation, and overall evaluation.
The evaluations show that the purified data perform poorly when examined variable by variable or observation by observation, but that the data as a whole are of good quality. Although the horizontal and vertical evaluations show that the purified data cannot fully match the original complete data, purification imputes missing values and adds fields, which increases the information content of the data. Data purification therefore has a real effect, since the added information is a major benefit to a database that is to be mined.
Keywords: Data Systematic Purifying Analysis, Data Mining, Missing Data, Rare Data, Imputation, Functional Mapping, Database Value-Added
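The three evaluation views named in the abstract above can be sketched as a comparison between an original complete table and its purified counterpart. The abstract does not define the views, so this sketch assumes that the horizontal view scores each observation (row), the vertical view scores each variable (column), and the overall view scores all cells together. Mean absolute error is an assumed metric, and the data are hypothetical.

```python
def evaluate(original, purified):
    """Compare two equally shaped numeric tables given as lists of rows.

    Returns (per-row error, per-column error, overall error), each a mean
    absolute difference between the original and the purified values.
    """
    n_rows, n_cols = len(original), len(original[0])
    err = [[abs(o - p) for o, p in zip(orow, prow)]
           for orow, prow in zip(original, purified)]
    by_row = [sum(row) / n_cols for row in err]            # assumed "horizontal" view
    by_col = [sum(err[i][j] for i in range(n_rows)) / n_rows
              for j in range(n_cols)]                      # assumed "vertical" view
    overall = sum(map(sum, err)) / (n_rows * n_cols)       # assumed "overall" view
    return by_row, by_col, overall


# Hypothetical toy tables: two variables, three observations.
original = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
purified = [[1.0, 12.0], [2.0, 20.0], [4.0, 27.0]]

by_row, by_col, overall = evaluate(original, purified)
```

Individual rows or columns can score badly while the overall figure stays moderate, which is the kind of contrast the abstract reports between the per-variable and per-observation views and the data as a whole.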
