因果分析由于可以刻画随机事件之间的关系而被关注,而图模型则是描述因果关系的重要工具。在图模型框架中,数据集中隐含的因果关系被表示为定义在这个数据集上的贝叶斯网络,通过贝叶斯网络学习就可以完成数据集上的因果关系挖掘。因此,贝叶斯网络学习在因果分析中具有非常重要的作用。在本文中,我们提出了一种二段式的贝叶斯网络学习算法。在第一阶段,此算法从数据中构建出马尔可夫随机场。在第二阶段,此算法根据学习到的条件随机场构造出贝叶斯网络。本文中提出的二段式贝叶斯网络学习算法具有比现有算法更高的准确率,而且这种二段式算法中的一些技术可以很容易的被应用于其他贝叶斯网络学习算法当中。此外,通过与其他的时间序列中的因果分析模型(例如向量自回归和结构向量自回归模型)做比较,我们可以看出二段式的贝叶斯网络学习算法可以被用于时间序列的因果分析。 通过在真实数据集上的实验,我们证明的二段式贝叶斯网络学习算法在实际问题中的可用性。 / 本文开始介绍了基于约束的贝叶斯网络学习框架,其中的代表作是SGS 算法。在基于约束的贝叶斯网络学习框架中,如何减小测试条件独立的搜索空间是提高算法性能的关键步骤。二段式贝叶斯网络学习算法的核心即是研究如何减小条件独立测试的搜索空间。为此,我们证明了通过马尔可夫随机场来确定贝叶斯网络的结构可以有效的减小条件独立测试的计算复杂性以及增加算法的稳定性。在本文中,偏相关系数被用来度量条件独立。这种方法可用于基于约束的贝叶斯网络学习算法。具体来说,本文证明了在给定数据集的生成模型为线性的条件下,偏相关系数可被用于度量条件独立。而且本文还证明了高斯模型是线性结构方程模型的一个特例。本文比较了二段式的贝叶斯网络学习算法与当前性能最佳的贝叶斯算法在一系列真实贝叶斯网络上的表现。 / 文章的最后一部分研究了二段式的贝叶斯网络学习算法在时间序列因果分析中的应用。在这部分工作中,我们首先证明了结构向量自回归模型模型在高斯过程中不能发现同时期的因果关系。失败的原因是结构向量自回归模型不能满足贝叶斯网络的忠实性条件。因此,本文的最后一部分提出了一种区别于现有工作的基于贝叶斯网络的向量自回归和结构向量自回归模型学习算法。并且通过实验证明的算法在实际问题中的可用性。 / Causal analysis has drawn a lot of attention because it provides with deep insight of relations between random events. Graphical model is a dominant tool to represent causal relations. Under graphical model framework, causal relations implied in a data set are captured by a Bayesian network defined on this data set and causal discovery is achieved by constructing a Bayesian network from the data set. Therefore, Bayesian network learning plays an important role in causal relation discovery. In this thesis, we develop a Two-Phase Bayesian network learning algorithm that learns Bayesian network from data. Phase one of the algorithm learns Markov random fields from data, and phase two constructs Bayesian networks based on Markov random fields obtained. We show that the Two-Phase algorithm provides state-of-the-art accuracy, and the techniques proposed in this work can be easily adopted by other Bayesian network learning algorithms. Furthermore, we present that Two-Phase algorithm can be used for time series analysis by evaluating it against a series of time series causal learning algorithms, including VAR and SVAR. Its practical applicability is also demonstrated through empirical evaluation on real world data set. / We start by presenting a constraint-based Bayesian network learning framework that is a generalization of SGS algorithm [86]. We show that the key step in making Bayesian networks to learn efficiently is restricting the search space of conditioning sets. This leads to the core of this thesis: Two-Phase Bayesian network learning algorithm. Here we show that by learning Bayesian networks fromMarkov random fields, we efficiently reduce the computational complexity and enhance the reliability of the algorithm. Besides the proposal of this Bayesian network learning algorithm, we use zero partial correlation as an indicator of conditional independence. We show that partial correlation can be applied to arbitrary distributions given that data are generated by linear models. In addition, we prove that Gaussian distribution is a special case of linear structure equation model. We then compare our Two-Phase algorithm to other state-of-the-art Bayesian network algorithms on several real world Bayesian networks that are used as benchmark by many related works. / Having built an efficient and accurate Bayesian network learning algorithm, we then apply the algorithm for causal relation discovering on time series. First we show that SVAR model is incapable of identifying contemporaneous causal orders for Gaussian process because it fails to discover the structures faithful to the underlying distributions. We also develop a framework to learn true SVAR and VAR using Bayesian network, which is distinct from existing works. Finally, we show its applicability to a real world problem. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Detailed summary in vernacular field only. / Wang, Zhenxing. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2012. / Includes bibliographical references (leaves 184-195). / Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. / Abstract also in Chinese. / Abstract --- p.i / Acknowledgement --- p.v / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Causal Relation and Directed Graphical Model --- p.1 / Chapter 1.2 --- A Brief History of Bayesian Network Learning --- p.3 / Chapter 1.3 --- Some Important Issues for Causal BayesianNetwork Learning --- p.5 / Chapter 1.3.1 --- Learning Bayesian network locally --- p.6 / Chapter 1.3.2 --- Conditional independence test --- p.7 / Chapter 1.3.3 --- Causation discovery for time series --- p.8 / Chapter 1.4 --- Road Map of the Thesis --- p.10 / Chapter 1.5 --- Summary of the Remaining Chapters --- p.12 / Chapter 2 --- Background Study --- p.14 / Chapter 2.1 --- Notations --- p.14 / Chapter 2.2 --- Formal Preliminaries --- p.15 / Chapter 2.3 --- Constraint-Based Bayesian Network Learning --- p.24 / Chapter 3 --- Two-Phase Bayesian Network Learning --- p.33 / Chapter 3.1 --- Two-Phase Bayesian Network Learning Algorithm --- p.35 / Chapter 3.1.1 --- Basic Two-Phase algorithm --- p.37 / Chapter 3.1.2 --- Two-Phase algorithm with Markov blanket information --- p.59 / Chapter 3.2 --- Correctness Proof and Complexity Analysis --- p.73 / Chapter 3.2.1 --- Correctness proof --- p.73 / Chapter 3.2.2 --- Complexity analysis --- p.81 / Chapter 3.3 --- Related Works --- p.83 / Chapter 3.3.1 --- Search-and-score algorithms --- p.84 / Chapter 3.3.2 --- Constraint-based algorithms --- p.85 / Chapter 3.3.3 --- Other algorithms --- p.86 / Chapter 4 --- Measuring Conditional Independence --- p.88 / Chapter 4.1 --- Formal Definition of Conditional Independence --- p.88 / Chapter 4.2 --- Measuring Conditional Independence --- p.96 / Chapter 4.2.1 --- Measuring independence with partial correlation --- p.96 / Chapter 4.2.2 --- Measuring independence with mutual information --- p.104 / Chapter 4.3 --- Non-Gaussian Distributions and Equivalent Class --- p.108 / Chapter 4.4 --- Heuristic CI Tests UnderMonotone Faithfulness Condition --- p.116 / Chapter 5 --- Empirical Results of Two-Phase Algorithms --- p.125 / Chapter 5.1 --- Experimental Setup --- p.126 / Chapter 5.2 --- Structure Error After Each Phase of Two-Phase Algorithms --- p.129 / Chapter 5.3 --- Maximal and Average Sizes of Conditioning Sets --- p.131 / Chapter 5.4 --- Comparison of the Number of CI Tests Required by Dependency Analysis Approaches --- p.133 / Chapter 5.5 --- Reason forWhich Number of CI Tests Required Grow with Sample Size --- p.135 / Chapter 5.6 --- Two-Phase Algorithms on Linear Gaussian Data --- p.136 / Chapter 5.7 --- Two-phase Algorithms on Linear Non-Gaussian Data --- p.139 / Chapter 5.8 --- Compare Two-phase Algorithms with Search-and-Score Algorithms and Lasso Regression --- p.142 / Chapter 6 --- Causal Mining in Time Series Data --- p.146 / Chapter 6.1 --- A Brief Review of Causation Discovery in Time Series --- p.146 / Chapter 6.2 --- Limitations of Constructing SVAR from VAR --- p.150 / Chapter 6.3 --- SVAR Being Incapability of Identifying Contemporaneous Causal Order for Gaussian Process --- p.152 / Chapter 6.4 --- Estimating the SVARs by Bayesian Network Learning Algorithm --- p.157 / Chapter 6.4.1 --- Represent SVARs by Bayesian networks --- p.158 / Chapter 6.4.2 --- Getting back SVARs and VARs fromBayesian networks --- p.159 / Chapter 6.5 --- Experimental Results --- p.162 / Chapter 6.5.1 --- Experiment on artificial data --- p.162 / Chapter 6.5.2 --- Application in finance --- p.172 / Chapter 6.6 --- Comparison with Related Works --- p.174 / Chapter 7 --- Concluding Remarks --- p.178 / Bibliography --- p.184
Identifer | oai:union.ndltd.org:cuhk.edu.hk/oai:cuhk-dr:cuhk_328161 |
Date | January 2012 |
Contributors | Wang, Zhenxing, Chinese University of Hong Kong Graduate School. Division of Computer Science and Engineering. |
Source Sets | The Chinese University of Hong Kong |
Language | English, Chinese |
Detected Language | English |
Type | Text, bibliography |
Format | electronic resource, electronic resource, remote, 1 online resource (xii, 195 leaves) : ill. (some col.) |
Rights | Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/) |
Page generated in 0.0025 seconds