1

Model Based Learning and Reasoning from Partially Observed Data

Hewawasam, Kottigoda. K. Rohitha G. 09 June 2008 (has links)
Management of data imprecision has become increasingly important, especially as advances in technology enable applications to collect and store huge amounts of data from multiple sources. Data collected in such applications involve a large number of variables and various types of data imperfections. When used in knowledge discovery applications, these data require: 1) computationally efficient algorithms that work fast with limited resources, 2) an effective methodology for modeling data imperfections, and 3) procedures for enabling knowledge discovery and for quantifying and propagating partial or incomplete knowledge throughout the decision-making process. Bayesian Networks (BNs) provide a convenient framework for modeling these applications probabilistically, enabling a compact representation of the joint probability distribution over large numbers of variables. BNs also form the foundation for a number of computationally efficient inference algorithms. The underlying probabilistic approach, however, is not sufficiently capable of handling the wider range of data imperfections that may appear in many new applications (e.g., medical data). Dempster-Shafer (DS) theory, on the other hand, provides a strong framework for modeling a broader range of data imperfections, but it must overcome the challenge of a potentially enormous computational burden. In this dissertation, we introduce the joint Dirichlet BoE (body of evidence), a particular mass assignment in the DS-theoretic framework that reduces the computational complexity while enabling one to model many common types of data imperfections. We first use this Dirichlet BoE model to enhance the performance of the EM algorithm used to learn BN parameters from data with missing values. To form a framework for reasoning with the Dirichlet BoE, the DS-theoretic notions of conditionals, independence, and conditional independence are revisited. These notions are then used to develop the DS-BN, a BN-like graphical model in the DS-theoretic framework that enables a compact representation of the joint Dirichlet BoE, and we show how the DS-BN can be used in different types of reasoning tasks. A local message-passing scheme is developed for efficient propagation of evidence in the DS-BN. We also extend the joint Dirichlet BoE to Markov models and hidden Markov models to address the uncertainty arising from inadequate training data. Finally, we present the results of various experiments carried out on synthetically generated data sets as well as data sets from medical applications.
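To illustrate the evidence-fusion operation at the core of the DS-theoretic approach described above, here is a minimal sketch of Dempster's rule of combination. It is not taken from the dissertation; the frame of discernment, mass values, and function name are illustrative assumptions.

```python
def combine(m1, m2):
    """Dempster's rule of combination: fuse two basic belief assignments
    defined over the same frame of discernment (keys are frozensets)."""
    fused, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                fused[inter] = fused.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb        # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting bodies of evidence")
    return {s: v / (1.0 - conflict) for s, v in fused.items()}

# Two bodies of evidence over a tiny illustrative frame {healthy, sick}.
frame = frozenset({"healthy", "sick"})
m1 = {frozenset({"sick"}): 0.6, frame: 0.4}                        # partial ignorance
m2 = {frozenset({"sick"}): 0.3, frozenset({"healthy"}): 0.5, frame: 0.2}
print(combine(m1, m2))
# -> mass 0.6 on {sick}, about 0.286 on {healthy}, about 0.114 on the whole frame
```

Mass left on the whole frame expresses remaining ignorance, which is the kind of partial knowledge the joint Dirichlet BoE is designed to carry through inference.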
2

DS-ARM: An Association Rule Based Predictor that Can Learn from Imperfect Data

Sooriyaarachchi Wickramaratna, Kasun Jayamal 13 January 2010 (has links)
Over the past decades, many industries have invested heavily in computerizing their work environments with the intention of simplifying and expediting access to information and its processing. Real-world data typically contain various types of imperfections, uncertainties, and ambiguities that have complicated attempts at automated knowledge discovery. Indeed, it soon became obvious that adequate methods to deal with these problems were critically needed. Because simple methods, such as "interpolating" over or just ignoring data imperfections, were often found to lead to inferences of dubious practical value, the search for appropriate modifications of knowledge-induction techniques began. Sometimes rather non-standard approaches turned out to be necessary. For instance, the probabilistic approaches of earlier works are not sufficiently capable of handling the wider range of data imperfections that appear in many new applications (e.g., medical data). Dempster-Shafer theory provides a much stronger framework, and this is why it has been chosen as the fundamental paradigm exploited in this dissertation. The task of association rule mining is to detect frequently co-occurring groups of items in transactional databases. The majority of papers in this field concentrate on how to expedite the search. Less attention has been devoted to how to employ the identified frequent itemsets for prediction purposes; worse still, methods to tailor association-mining techniques so that they can handle data imperfections are virtually nonexistent. This dissertation proposes a technique referred to by the acronym DS-ARM (Dempster-Shafer based Association Rule Mining), in which the DS-theoretic framework is used to enhance a more traditional association-mining mechanism. Of particular interest here is a method that employs knowledge of the partial contents of a "shopping cart" to predict what else the customer is likely to add to it; this formalized problem has many applications in the analysis of medical databases. A recently proposed data structure, the itemset tree (IT-tree), is used to extract association rules in a computationally efficient manner, thus addressing the scalability problem that has disqualified more traditional techniques from real-world applications. The proposed algorithm is based on the Dempster-Shafer theory of evidence combination. Extensive experiments explore the algorithm's behavior: some use synthetically generated data, others rely on data obtained from a machine-learning repository, and yet others use a movie-ratings dataset or an HIV/AIDS patient dataset.
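As an illustration of the prediction task described above (not of the DS-ARM algorithm itself), the following sketch ranks candidate additions to a partial "shopping cart" by the confidence of simple association rules mined from past transactions. The transactions, threshold, and function name are invented for the example; the dissertation's IT-tree structure and Dempster-Shafer combination step are not reproduced here.

```python
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]

def predict_additions(cart, transactions, min_support=2):
    """Score items not yet in the cart by the confidence of the rule cart -> item."""
    containing = [t for t in transactions if cart <= t]   # transactions matching the partial cart
    if len(containing) < min_support:
        return {}
    counts = Counter(item for t in containing for item in t - cart)
    return {item: n / len(containing) for item, n in counts.items()}

print(predict_additions({"bread"}, transactions))
# {'milk': 0.75, 'butter': 0.75, 'jam': 0.25} -- milk and butter are the
# strongest candidates to be added next.
```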
3

Vers la construction d'un référentiel géographique ancien : un modèle de graphe agrégé pour intégrer, qualifier et analyser des réseaux géohistoriques / Towards the construction of a geohistorical reference database : an aggregated graph to integrate, qualify and analyze geohistorical networks

Costes, Benoît 04 November 2016 (has links)
Historians and archaeologists have made effective use of advances in GIS to address their own research questions; for the historian, a Geographic Information System is above all a tool for understanding social phenomena. Many geohistorical sources are now available to researchers, such as old maps and city directories, and the increasing availability of such data, particularly through collaborative projects, is a first step towards representing space and its social, administrative, and topographical changes over time. Cross-referencing these diverse and heterogeneous sources raises many questions about urban dynamics. Geohistorical data are, however, imperfect by nature: they are inaccurate, uncertain, or inexact according to the existing terminology, and before they can be exploited they must be spatialized and qualified. The objective of this thesis is to address this obstacle by producing historical reference data. Focusing on the Paris street network between the end of the 18th and the end of the 19th centuries, we propose a multi-representation model of aggregated data capable of modelling spatial networks at different times. By confronting homologous observations over time, and with tools such as pattern detection, the model makes it possible to criticize, qualify, and eventually correct the data and their sources without ground-truth data, relying only on the comparison of the data with each other through the aggregation process. We conclude by testing the role of the qualified and enriched data as a geohistorical reference: new geohistorical data of various kinds (social and spatial) are spatialized and integrated into the model, through new approaches to data matching and geocoding.
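As a loose illustration of the kind of data-matching step mentioned above, the following sketch pairs homologous street observations from two historical snapshots by name similarity and positional proximity. The field names, coordinates, thresholds, and similarity measure are assumptions made for the example and do not reproduce the thesis's actual matching or geocoding methods.

```python
from difflib import SequenceMatcher
from math import hypot

streets_1789 = [{"name": "rue St Honore", "x": 600.0, "y": 128.0}]
streets_1854 = [{"name": "rue Saint-Honoré", "x": 602.5, "y": 127.0},
                {"name": "rue de Rivoli", "x": 640.0, "y": 110.0}]

def match(old, new, max_dist=10.0, min_sim=0.6):
    """Greedily pair each old observation with its most plausible counterpart."""
    pairs = []
    for o in old:
        best, best_score = None, 0.0
        for n in new:
            dist = hypot(o["x"] - n["x"], o["y"] - n["y"])
            sim = SequenceMatcher(None, o["name"].lower(), n["name"].lower()).ratio()
            if dist <= max_dist and sim >= min_sim and sim > best_score:
                best, best_score = n, sim
        if best is not None:
            pairs.append((o["name"], best["name"], round(best_score, 2)))
    return pairs

print(match(streets_1789, streets_1854))
# [('rue St Honore', 'rue Saint-Honoré', ...)] -- matched observations can then be
# merged into a single node of the aggregated multi-representation graph.
```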
