Management of data imprecision has become increasingly important, especially as advances in technology enable applications to collect and store huge amounts of data from multiple sources. Data collected in such applications involve a large number of variables and various types of data imperfections. These data, when used in knowledge discovery applications, require the following: 1) computationally efficient algorithms that run quickly with limited resources, 2) an effective methodology for modeling data imperfections, and 3) procedures for enabling knowledge discovery and for quantifying and propagating partial or incomplete knowledge throughout the decision-making process. Bayesian Networks (BNs) provide a convenient framework for modeling these applications probabilistically, enabling a compact representation of the joint probability distribution over large numbers of variables. BNs also form the foundation for a number of computationally efficient inference algorithms. The underlying probabilistic approach, however, cannot adequately handle the wider range of data imperfections that may appear in many new applications (e.g., medical data). Dempster-Shafer (DS) theory, on the other hand, provides a strong framework for modeling a broader range of data imperfections. However, it must overcome the challenge of a potentially enormous computational burden. In this dissertation, we introduce the joint Dirichlet body of evidence (BoE), a particular mass assignment in the DS theoretic framework that reduces the computational complexity while enabling one to model many common types of data imperfections. We first use this Dirichlet BoE model to enhance the performance of the expectation-maximization (EM) algorithm used in learning BN parameters from data with missing values. To form a framework for reasoning with the Dirichlet BoE, the DS theoretic notions of conditionals, independence, and conditional independence are revisited.
These notions are then used to develop the DS-BN, a BN-like graphical model in the DS theoretic framework that enables a compact representation of the joint Dirichlet BoE. We also show how one may use the DS-BN in different types of reasoning tasks. A local message passing scheme is developed for efficient propagation of evidence in the DS-BN. We also extend the use of the joint Dirichlet BoE to Markov models and hidden Markov models to address the uncertainty arising from inadequate training data. Finally, we present the results of various experiments carried out on synthetically generated data sets as well as data sets from medical applications.
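To make the idea of a DS theoretic mass assignment concrete, the following is a minimal illustrative sketch, not the dissertation's own implementation. It assumes that a Dirichlet BoE places mass only on singleton hypotheses and on the whole frame of discernment; the abstract does not state this definition, so it is treated here as an assumption. The frame, the mass values, and the function names `belief` and `plausibility` are hypothetical.

```python
from fractions import Fraction


def belief(mass, a):
    """Bel(A): total mass of focal elements that are subsets of A."""
    return sum(m for b, m in mass.items() if b <= a)


def plausibility(mass, a):
    """Pl(A): total mass of focal elements that intersect A."""
    return sum(m for b, m in mass.items() if b & a)


# Hypothetical frame of discernment and mass assignment (illustration only).
# Assumption: focal elements are singletons plus the full frame, so the mass
# on the frame represents complete ignorance.
frame = frozenset({"flu", "cold", "allergy"})
mass = {
    frozenset({"flu"}): Fraction(1, 2),
    frozenset({"cold"}): Fraction(1, 5),
    frame: Fraction(3, 10),
}
assert sum(mass.values()) == 1  # masses must sum to one

query = frozenset({"flu"})
bel = belief(mass, query)        # only the singleton {"flu"} is a subset
pl = plausibility(mass, query)   # {"flu"} and the full frame both intersect
print(bel, pl)
```

Under this assumed structure, the gap between belief and plausibility for any singleton equals the mass on the full frame, and there are at most one more focal elements than hypotheses. That is one plausible reading of why such an assignment keeps computation manageable compared with a general assignment over all subsets.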
Identifier | oai:union.ndltd.org:UMIAMI/oai:scholarlyrepository.miami.edu:oa_dissertations-1111 |
Date | 09 June 2008 |
Creators | Hewawasam, Kottigoda. K. Rohitha G. |
Publisher | Scholarly Repository |
Source Sets | University of Miami |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Open Access Dissertations |