Return to search

以規則為基礎的分類演算法:應用粗糙集 / A Rule-Based classification algorithm: a rough set approach

在本論文中,我們提出了一個以規則為基礎的分類演算法,名為ROUSER(ROUgh SEt Rule),它利用粗糙集理論作為搜尋啟發的基礎,進而建立規則。我們使用一個已經被廣泛利用的工具實作ROUSER,也使用數個公開資料集對它進行實驗,並將它應用於真實世界的案例。
本論文的初衷可被追溯到一個真實世界的案例,而此案例的目標是從感應器所蒐集的資料中找出與機械故障之間的關聯。為了能支援機械故障的根本原因分析,我們設計並實作了一個以規則為基礎的分類演算法,它所產生的模型是由人類可理解的決策規則所組成,而故障的徵兆與原因則被決策規則所連結。此外,資料中存在著矛盾。舉例而言,不同時間點所蒐集的兩筆紀錄極為相似、甚至相同(除了時間戳記),但其中一筆紀錄與機械故障相關,另一筆則否。本案例的挑戰在於分析矛盾的資料。我們使用粗糙集理論克服這個難題,因為它可以處理不完美知識。
研究者們已經提出了各種不同的分類演算法,而實踐者們則已經將它們應用於各種領域,然而多數分類演算法的設計並不強調演算法所產生模型的可解釋性與可理解性。ROUSER的設計是專門從名目資料中萃取人類可理解的決策規則。而ROUSER與其它多數規則分類演算法不同的地方是利用粗糙集方法選取特徵。ROUSER也提供了數種方式來選擇合宜的屬性與值配對,作為規則的前項。此外,ROUSER的規則產生方法是基於separate-and-conquer策略,因此比其它基於粗糙集的分類演算法所廣泛採用的不可分辨矩陣方法還有效率。
我們進行延伸實驗來驗證ROUSER的能力。對於名目資料的實驗裡,ROUSER在半數的結果中的準確率可匹敵、甚至勝過其他以規則為基礎的分類演算法以及決策樹分類演算法。ROUSER也可以在一些離散化的資料集之中達到可匹敵甚至超越的準確率。我們也提供了內建的特徵萃取方法與其它方法的比較的實驗結果,以及數種用來決定規則前項的方法的實驗結果。 / In this thesis, we propose a rule-based classification algorithm named ROUSER (ROUgh SEt Rule), which uses the rough set theory as the basis of the search heuristics in the process of rule generation. We implement ROUSER using a well developed and widely used toolkit, evaluate it using several public data sets, and examine its applicability using a real-world case study.
The origin of the problem addressed in this thesis can be traced back to a real-world problem where the goal is to determine whether a data record collected from a sensor corresponds to a machine fault. In order to assist in the root cause analysis of the machine faults, we design and implement a rule-based classification algorithm that can generate models consisting of human understandable decision rules to connect symptoms to the cause. Moreover, there are contradictions in data. For example, two data records collected at different time points are similar, or the same (except their timestamps), while one is corresponding to a machine fault but not the other. The challenge is to analyze data with contradictions. We use the rough set theory to overcome the challenge, since it is able to process imperfect knowledge.
Researchers have proposed various classification algorithms and practitioners have applied them to various application domains, while most of the classification algorithms are designed without a focus on interpretability or understandability of the models built using the algorithms. ROUSER is specifically designed to extract human understandable decision rules from nominal data. What distinguishes ROUSER from most, if not all, other rule-based classification algorithms is that it utilizes a rough set approach to select features. ROUSER also provides several ways to decide an appropriate attribute-value pair for the antecedents of a rule. Moreover, the rule generation method of ROUSER is based on the separate-and-conquer strategy, and hence it is more efficient than the indiscernibility matrix method that is widely adopted in the classification algorithms based on the rough set theory.
We conduct extensive experiments to evaluate the capability of ROUSER. On about half of the nominal data sets considered in experiments, ROUSER can achieve comparable or better accuracy than do classification algorithms that are able to generate decision rules or trees. On some of the discretized data sets, ROUSER can achieve comparable or better accuracy. We also present the results of the experiments on the embedded feature selection method and several ways to decide an appropriate attribute-value pair for the antecedents of a rule.

Identiferoai:union.ndltd.org:CHENGCHI/G0099753005
Creators廖家奇, Liao, Chia Chi
Publisher國立政治大學
Source SetsNational Chengchi University Libraries
Language英文
Detected LanguageEnglish
Typetext
RightsCopyright © nccu library on behalf of the copyright holders

Page generated in 0.0029 seconds