
決策樹形式知識整合之研究 The Research on Decision-Tree-Based Knowledge Integration

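With the arrival of the knowledge economy era, mastering knowledge helps organizations improve their competitiveness, so the creation, storage, application, and integration of knowledge have become widely discussed topics. This research addresses knowledge integration. Among the ways of representing knowledge, decision-tree-form knowledge has a tree structure that can be presented graphically; it is structurally simple and easy to understand. This research therefore studies knowledge integration for knowledge in decision-tree form.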
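This research first proposes MODT (Merging Optional Decision Tree), a method for merging decision trees. It adds an option link to the original decision tree structure to join two subtrees that share the same ancestor. Trees are merged pairwise: the nodes of the two decision trees are compared top-down, and a grafting technique combines the knowledge of the two trees. The Strong Pattern Rule concept is then used to improve the predictive ability of the merged tree.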
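Second, when MODT merges two decision trees with different root nodes, a cyclic link forms and destroys the original tree structure; the added option links also increase storage space and are hard to maintain. This research therefore proposes DTBMPA (Decision-Tree-Based Merging-Pruning Approach) to remedy these problems of MODT, and adds a pruning procedure to simplify the merged tree. The approach comprises three main procedures: decision tree merging, merged tree pruning, and decision tree validation. Two original trees are first combined into a merged tree by the merging procedure, the pruning procedure then produces a pruned tree, and finally the validation procedure evaluates the accuracy of the pruned tree. DTBMPA thus uses merging to expand the knowledge of the tree and pruning to obtain a more compact merged tree.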
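This research uses real credit data of credit card customers for validation. In the MODT experiments, the merged tree's accuracy was greater than or equal to that of both original trees in 79.5% of cases; a statistical test on the two accuracies found that the merged tree was significantly more accurate than the original trees. In the DTBMPA experiments, the merged tree was more accurate than a single original tree in 90% of cases, and the pruned tree's accuracy was greater than or equal to the merged tree's in 80% of cases. Statistical tests showed that the merged tree and the pruned tree were both significantly more accurate than a single tree. In addition, the pruned tree had on average about 15% fewer nodes than the merged tree. In sum, both MODT and DTBMPA make the merged tree more accurate than a single tree, and DTBMPA additionally yields a more compact merged tree.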
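In addition, as an application of decision-tree-form knowledge integration, this research proposes an architecture for a decision-tree-based knowledge discovery and prediction system. Its main purpose is to provide a Web-based knowledge discovery and prediction system that supports enterprises in knowledge management functions such as knowledge learning, storage, integration, circulation, and application. The system is intended to uncover important knowledge hidden within an enterprise and to apply the discovered knowledge to classification and prediction. It contains three main subsystems: a knowledge learning subsystem, a decision tree merging subsystem, and an online prediction subsystem. The decision tree merging subsystem applies the knowledge integration methods proposed in this research.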
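Future research could address the following topics:
1. Within the decision-tree-form knowledge integration architecture, study the decision-tree knowledge cleaning unit, that is, the functional design of the preprocessing stage, so that the merged tree combines decision-tree knowledge of a certain quality.
2. For combining multiple predicted values, fuzzy logic theory could be added to handle the gray areas in the decided outcome and thereby improve the merged tree's prediction accuracy.
3. Regarding the decision trees themselves, further study decision trees that select multiple attributes jointly when splitting, including how to merge them when categorical attributes differ in the number of branches or in their possible values and when numeric attributes use different split points.
4. Study how to merge trees when categorical attributes differ in the number of branches or in their possible values, and when numeric attributes use different split points.
5. For pruning the merged tree, consider methods that prune with an additional set of pruning examples, and compare the pruning effect and accuracy of different pruning methods.
6. Study the restructuring of decision trees after repeated merging and pruning, so that adjusting the tree structure improves operating efficiency in use, the merged tree can adjust its knowledge as the environment changes, and changes in the merged tree's structure can be observed.
7. In terms of practical application, cooperate with a company to build the decision-tree-based knowledge discovery and prediction system, design it to fit that company's industry characteristics and business needs, and deploy it in the enterprise's internal operations, in order to accumulate the enterprise's knowledge and support managers' decision making.
/
In the knowledge economy era, mastering knowledge can improve an organization's competitiveness. Knowledge creation, retention, application, and integration have therefore become widely discussed topics.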
Our research focuses on knowledge integration and related subjects. Decision trees are one of the most common methods of knowledge representation: they show the structure of knowledge as a tree-shaped graph. Because decision trees are simple and easily understood, we focus on decision-tree-based knowledge in connection with knowledge integration.
First, this research proposes a method called MODT (Merging Optional Decision Tree), which merges two knowledge trees at a time and adds an optional link to merge nodes that have the same ancestor. In MODT, we compare the corresponding nodes of the two trees by top-down traversal. When the nodes are the same, we recount the number of samples and recalculate the degree of purity. When they are not the same, we add the node of the second tree and its descendants to the first tree by the grafting technique. This yields a completely merged decision tree. The Strong Pattern Rule is then used to strengthen the predictive accuracy of the merged decision tree.
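The top-down comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `Node` layout, the `options` list standing in for option links, the use of the majority-class fraction as "purity", and the attribute names `income` and `age` are all hypothetical.

```python
from __future__ import annotations

import copy
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    attribute: str | None  # attribute tested at this node; None marks a leaf
    counts: Counter        # class label -> number of training samples
    children: dict[str, Node] = field(default_factory=dict)  # branch value -> subtree
    options: list[Node] = field(default_factory=list)        # grafted alternatives (assumed form of an option link)

    def purity(self) -> float:
        # Assumption: purity is the share of the majority class.
        total = sum(self.counts.values())
        return max(self.counts.values()) / total if total else 0.0


def merge(a: Node, b: Node) -> Node:
    """Merge tree b into tree a, comparing nodes top-down, and return a."""
    if a.attribute == b.attribute:
        # Same node: recount the samples (purity is derived from the counts).
        a.counts = a.counts + b.counts
        for value, b_child in b.children.items():
            if value in a.children:
                merge(a.children[value], b_child)
            else:
                # Branch only present in b: graft it into a.
                a.children[value] = copy.deepcopy(b_child)
    else:
        # Different nodes: graft b's node and its descendants as an option.
        a.options.append(copy.deepcopy(b))
    return a


def leaf(**counts: int) -> Node:
    return Node(None, Counter(counts))


# Toy trees with made-up counts, for illustration only.
t1 = Node("income", Counter(good=6, bad=4),
          {"high": leaf(good=5, bad=1), "low": leaf(good=1, bad=3)})
t2 = Node("income", Counter(good=7, bad=3),
          {"high": leaf(good=6),
           "low": Node("age", Counter(good=1, bad=3),
                       {"young": leaf(bad=3), "old": leaf(good=1)})})
merged = merge(t1, t2)
```

In this toy case the two `high` leaves match, so their counts are summed, while the `low` branch differs between the trees, so the second tree's `age` subtree is attached as an option under the first tree's `low` leaf.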
Second, when MODT merges two trees that have different roots, the merged tree has a cyclic link at the root, so it is no longer a tree structure. We therefore propose another approach, DTBMPA (Decision-Tree-Based Merging-Pruning Approach), to solve this problem. The approach has three steps. In the merging step, two primitive decision trees are merged into one merged tree to enlarge the knowledge of the primitive trees. In the pruning step, the merged tree is pruned into a pruned tree to cut off the biased branches of the merged tree. In the validating step, the performance of the pruned tree is validated.
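The pruning and validating steps might look roughly like the following Python sketch. The abstract does not state the pruning criterion or how DTBMPA merges trees without option links, so the merged tree is simply given here, and the rule used (collapse a subtree into a leaf when the subtree makes no fewer training errors than the leaf would) is an assumption chosen for illustration. All names and numbers are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    attribute: str | None  # None marks a leaf
    counts: Counter        # class label -> sample count
    children: dict[str, Node] = field(default_factory=dict)

    def label(self) -> str:
        return self.counts.most_common(1)[0][0]

    def errors_as_leaf(self) -> int:
        return sum(self.counts.values()) - max(self.counts.values())


def subtree_errors(node: Node) -> int:
    if not node.children:
        return node.errors_as_leaf()
    return sum(subtree_errors(c) for c in node.children.values())


def prune(node: Node) -> Node:
    """Bottom-up pruning with an assumed rule (not taken from the thesis)."""
    for child in node.children.values():
        prune(child)
    if node.children and subtree_errors(node) >= node.errors_as_leaf():
        node.attribute, node.children = None, {}  # collapse into a leaf
    return node


def size(node: Node) -> int:
    return 1 + sum(size(c) for c in node.children.values())


def predict(node: Node, example: dict[str, str]) -> str:
    while node.children:
        child = node.children.get(example.get(node.attribute))
        if child is None:  # unseen branch value: answer at this node
            break
        node = child
    return node.label()


def accuracy(tree: Node, examples: list[tuple[dict[str, str], str]]) -> float:
    return sum(predict(tree, x) == y for x, y in examples) / len(examples)


def leaf(**counts: int) -> Node:
    return Node(None, Counter(counts))


# A merged tree and a validation set with made-up values.
merged = Node("income", Counter(good=12, bad=8), {
    "high": Node("age", Counter(good=10, bad=2),
                 {"young": leaf(good=4, bad=1), "old": leaf(good=6, bad=1)}),
    "low": leaf(good=2, bad=6),
})
holdout = [({"income": "high", "age": "young"}, "good"),
           ({"income": "low", "age": "old"}, "bad"),
           ({"income": "high", "age": "old"}, "good"),
           ({"income": "low", "age": "young"}, "good")]

before_size, before_acc = size(merged), accuracy(merged, holdout)
pruned = prune(merged)  # pruning step (mutates the tree in place)
after_size, after_acc = size(pruned), accuracy(pruned, holdout)  # validating step
```

Here the `age` split under `high` is removed because both of its leaves predict the same class, so the tree shrinks while its accuracy on the toy validation set is unchanged.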
We took real credit card user data as our sample data. In the MODT experiments, the merged trees were at least as accurate as both primitive trees in 79.5% of cases. This result supports our proposition that the merged decision tree method can achieve a better outcome for knowledge integration and accumulation. In the DTBMPA simulation experiments, the merged tree was at least as accurate as the primitive trees in 90% of cases, and the pruned tree was at least as accurate as the merged tree in 80% of cases. We also found that the pruned tree had on average 15% fewer nodes than the merged tree. In sum, both MODT and DTBMPA can improve the accuracy of the merged tree, and DTBMPA can additionally produce a smaller merged tree.
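To make the reported percentages concrete, the following Python sketch shows one way such figures could be computed from per-run results. The function names are hypothetical, the input numbers are made-up placeholders rather than the thesis data, and averaging the per-run relative reductions (instead of, say, comparing average node counts) is an assumption.

```python
def share_at_least_both(runs):
    """Fraction of runs in which the merged tree is at least as accurate
    as both primitive trees. Each run is (acc_tree1, acc_tree2, acc_merged)."""
    wins = sum(merged >= max(first, second) for first, second, merged in runs)
    return wins / len(runs)


def mean_node_reduction(sizes):
    """Average relative drop in node count from merged tree to pruned tree.
    Each entry is (merged_nodes, pruned_nodes)."""
    return sum((merged - pruned) / merged for merged, pruned in sizes) / len(sizes)


# Placeholder inputs for illustration only; not results from the thesis.
runs = [(0.70, 0.72, 0.74), (0.68, 0.71, 0.71), (0.75, 0.69, 0.73), (0.66, 0.70, 0.72)]
sizes = [(40, 34), (50, 43), (20, 17), (100, 85)]

print(f"merged >= both originals in {share_at_least_both(runs):.1%} of runs")
print(f"average node reduction: {mean_node_reduction(sizes):.1%}")
```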
Finally, with respect to applying decision-tree-based knowledge integration, this research proposes an architecture for an on-line decision-tree-based knowledge discovery and prediction system. It can help businesses discover, store, integrate, and apply their knowledge to make decisions. It contains three components: a knowledge learning system, a decision-tree merging system, and an on-line predictive system. We use the DTBMPA method to design the decision-tree merging system. Future directions of research are as follows.
1. Discussing the decision-tree preprocessing process in our Decision-Tree-Based Knowledge Integration Architecture.
2. Using fuzzy theory to improve the accuracy of the merged tree when combining multiple predictions.
3. Discussing the merging of more complicated decision trees, such as model trees, linear decision trees, oblique decision trees, regression trees, or fuzzy trees.
4. Discussing the process of merging two trees that have different possible values of non-numeric attributes or different cut points of numeric attributes.
5. Comparing the performance of other pruning methods with ours.
6. Discussing the reconstruction of merged trees after many new trees have been merged, the adaptation of merged trees to a changing environment, and the evolution of merged trees produced at different times.
7. Implementing the on-line decision-tree-based knowledge discovery system in a real business environment.

Identifier: oai:union.ndltd.org:CHENGCHI/G0873565031
Creators: 馬芳資, Ma, Fang-tz
Publisher: 國立政治大學 (National Chengchi University)
Source Sets: National Chengchi University Libraries
Language: Chinese
Detected Language: English
Type: text
Rights: Copyright © nccu library on behalf of the copyright holders
