11

Data Editing and Logic: The covering set method from the perspective of logic

Boskovitz, Agnes, abvi@webone.com.au January 2008 (has links)
Errors in collections of data can cause significant problems when those data are used. Therefore the owners of data find themselves spending much time on data cleaning. This thesis is a theoretical work about one part of the broad subject of data cleaning - to be called the covering set method. More specifically, the covering set method deals with data records that have been assessed by the use of edits, which are rules that the data records are supposed to obey. The problem solved by the covering set method is the error localisation problem, which is the problem of determining the erroneous fields within data records that fail the edits. In this thesis I analyse the covering set method from the perspective of propositional logic. I demonstrate that the covering set method has strong parallels with well-known parts of propositional logic. The first aspect of the covering set method that I analyse is the edit generation function, which is the main function used in the covering set method. I demonstrate that the edit generation function can be formalised as a logical deduction function in propositional logic. I also demonstrate that the best-known edit generation function, written here as FH (standing for Fellegi-Holt), is essentially the same as propositional resolution deduction. Since there are many automated implementations of propositional resolution, the equivalence of FH with propositional resolution gives some hope that the covering set method might be implementable with automated logic tools. However, before any implementation, the other main aspect of the covering set method must also be formalised in terms of logic. This other aspect, to be called covering set correctibility, is the property that must be obeyed by the edit generation function if the covering set method is to successfully solve the error localisation problem. 
In this thesis I demonstrate that covering set correctibility is a strengthening of the well-known logical properties of soundness and refutation completeness. What is more, the proofs of the covering set correctibility of FH and of the soundness / completeness of resolution deduction have strong parallels: while the proof of soundness / completeness depends on the reduction property for counter-examples, the proof of covering set correctibility depends on the related lifting property. In this thesis I also use the lifting property to prove the covering set correctibility of the function defined by the Field Code Forest Algorithm. In so doing, I prove that the Field Code Forest Algorithm, whose correctness has been questioned, is indeed correct. The results about edit generation functions and covering set correctibility apply to both categorical edits (edits about discrete data) and arithmetic edits (edits expressible as linear inequalities). Thus this thesis gives the beginnings of a theoretical logical framework for error localisation, which might give new insights to the problem. In addition, these insights may help in developing new error-localisation tools based on automated logic. What is more, the strong parallels between the covering set method and aspects of logic are of aesthetic appeal.
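The abstract's equivalence between FH edit generation and propositional resolution can be illustrated with a small sketch. This is my own illustrative encoding, not the thesis's formalisation: a clause is a frozenset of (variable, polarity) literals, and closing a clause set under resolution plays the role of generating all implied edits.

```python
def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of (var, polarity) literals)."""
    resolvents = []
    for lit in c1:
        comp = (lit[0], not lit[1])  # the complementary literal
        if comp in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return resolvents

def saturate(clauses):
    """Close a clause set under resolution: the logic-level analogue of
    generating every edit implied by a starting set of edits."""
    clauses = set(clauses)
    changed = True
    while changed:
        changed = False
        for c1 in list(clauses):
            for c2 in list(clauses):
                for r in resolve(c1, c2):
                    if r not in clauses:
                        clauses.add(r)
                        changed = True
    return clauses
```

For instance, the clauses {a, b} and {not-b, c} resolve on b to the implied clause {a, c}, just as two failed edits sharing a field can imply a third edit that omits that field.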
12

Partial persistent sequences and their applications to collaborative text document editing and processing

Wu, Qinyi 08 July 2011 (has links)
In a variety of text document editing and processing applications, it is necessary to keep track of the revision history of text documents by recording changes and the metadata of those changes (e.g., user names and modification timestamps). Recent Web 2.0 document editing and processing applications, such as real-time collaborative note taking and wikis, require fine-grained shared access to collaborative text documents as well as efficient retrieval of metadata associated with different parts of those documents. Current revision control techniques only support coarse-grained shared access and are inefficient at retrieving metadata of changes at sub-document granularity. In this dissertation, we design and implement partial persistent sequences (PPSs) to support real-time collaborations and manage metadata of changes at fine granularities for collaborative text document editing and processing applications. As a persistent data structure, PPSs have two important features. First, items in the data structure are never removed. We maintain the necessary timestamp information to keep track of both inserted and deleted items and use it to reconstruct the state of a document at any point in time. Second, PPSs create unique, persistent, and ordered identifiers for items of a document at fine granularities (e.g., a word or a sentence). As a result, we are able to support consistent and fine-grained shared access to collaborative text documents by detecting and resolving editing conflicts based on the revision history, and to efficiently index and retrieve metadata associated with different parts of collaborative text documents. We demonstrate the capabilities of PPSs through two important problems in collaborative text document editing and processing applications: data consistency control and fine-grained document provenance management.
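The two PPS features described above (tombstoned items with insert/delete timestamps, plus persistent ordered identifiers) can be sketched in a few lines. The class and method names below are my own illustrative choices, not the dissertation's implementation; identifiers are midpoint fractions so that existing ids never change on insertion.

```python
class PPS:
    """Minimal partial persistent sequence sketch: items are never removed,
    each carries insert/delete timestamps and a persistent fractional
    identifier, so any past state of the text can be reconstructed."""

    def __init__(self):
        self.items = []  # ordered by id; each: {"id", "char", "t_ins", "t_del"}
        self.clock = 0   # logical revision counter

    def _tick(self):
        self.clock += 1
        return self.clock

    def _visible(self, t=None):
        """Indexes of items visible at revision t (default: latest)."""
        t = self.clock if t is None else t
        return [i for i, it in enumerate(self.items)
                if it["t_ins"] <= t and (it["t_del"] is None or it["t_del"] > t)]

    def insert(self, pos, char):
        """Insert before the pos-th visible item; the new id is the midpoint
        of its neighbours' ids, so existing ids stay stable."""
        vis = self._visible()
        full = vis[pos] if pos < len(vis) else len(self.items)
        left = self.items[full - 1]["id"] if full > 0 else 0.0
        right = self.items[full]["id"] if full < len(self.items) else left + 2.0
        self.items.insert(full, {"id": (left + right) / 2.0, "char": char,
                                 "t_ins": self._tick(), "t_del": None})

    def delete(self, pos):
        """Mark the pos-th visible item deleted; the item itself remains."""
        self.items[self._visible()[pos]]["t_del"] = self._tick()

    def text(self, t=None):
        """Reconstruct the document as of revision t (default: latest)."""
        return "".join(self.items[i]["char"] for i in self._visible(t))
```

After `doc.insert(0, "H")`, `doc.insert(1, "i")`, and `doc.delete(0)`, the current text is `"i"` while `doc.text(2)` still reconstructs `"Hi"`, and the deleted item remains available for provenance queries.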
The first problem studies how to detect and resolve editing conflicts in collaborative text document editing systems. We approach this problem in two steps. In the first step, we use PPSs to capture data dependencies between different editing operations and define a consistency model more suitable for real-time collaborative editing systems. In the second step, we extend our work to the entire spectrum of collaborations and adapt transactional techniques to build a flexible framework for the development of various collaborative editing systems. The generality of this framework is demonstrated by its ability to specify three different types of collaboration, as exemplified by RCS, MediaWiki, and Google Docs, respectively. We precisely specify the programming interfaces of this framework and describe a prototype implementation over Oracle Berkeley DB High Availability, a replicated database management engine. The second problem, fine-grained document provenance management, studies how to efficiently index and retrieve fine-grained metadata for different parts of collaborative text documents. We use PPSs to design both disk-economic and computation-efficient techniques to index provenance data for millions of Wikipedia articles. Our approach is disk-economic because we save only a few full versions of a document and keep only the delta changes between those full versions. It is also computation-efficient because it avoids parsing the revision history of collaborative documents to retrieve fine-grained metadata. Compared to MediaWiki, the revision control system for Wikipedia, our system uses less than 10% of the disk space and achieves at least an order-of-magnitude speed-up in retrieving fine-grained metadata for documents with thousands of revisions.
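The snapshot-plus-delta storage scheme described above (a few full versions, with delta changes in between) can be sketched with standard-library diffing. The names `RevisionStore` and `SNAPSHOT_EVERY` are my own illustrative assumptions, not the dissertation's design:

```python
import difflib

SNAPSHOT_EVERY = 3  # illustrative snapshot interval; a real system would tune this

def make_delta(old, new):
    """Compact opcode delta turning `old` into `new`."""
    sm = difflib.SequenceMatcher(a=old, b=new)
    return [(tag, i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()]

def apply_delta(old, delta):
    """Replay a delta against the base text."""
    return "".join(old[i1:i2] if tag == "equal" else repl
                   for tag, i1, i2, repl in delta)

class RevisionStore:
    """Keep a full snapshot every SNAPSHOT_EVERY revisions and deltas in
    between, so any revision is rebuilt from the nearest earlier snapshot
    instead of replaying the whole history."""

    def __init__(self):
        self.snapshots = {}  # rev -> full text
        self.deltas = {}     # rev -> delta from rev - 1
        self.n = 0

    def add(self, text):
        rev = self.n
        if rev % SNAPSHOT_EVERY == 0:
            self.snapshots[rev] = text
        else:
            self.deltas[rev] = make_delta(self.get(rev - 1), text)
        self.n += 1
        return rev

    def get(self, rev):
        base = rev - rev % SNAPSHOT_EVERY
        text = self.snapshots[base]
        for r in range(base + 1, rev + 1):
            text = apply_delta(text, self.deltas[r])
        return text
```

Retrieving revision 2 here touches only the revision-0 snapshot and two deltas, which is the disk/computation trade-off the abstract describes.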
13

Improving Survey Methodology Through Matrix Sampling Design, Integrating Statistical Review Into Data Collection, and Synthetic Estimation Evaluation

Seiss, Mark Thomas 13 May 2014 (has links)
The research presented in this dissertation touches on all aspects of survey methodology, from questionnaire design to final estimation. We first approach the questionnaire development stage by proposing a method of developing matrix sampling designs, in which each respondent is administered a subset of questions chosen so that the administered questions are predictive of the omitted ones. The proposed methodology compares favorably to previous methods when applied to data collected from a household survey conducted in the Nampula province of Mozambique. We approach the data collection stage by proposing a structured procedure for implementing small-scale surveys in such a way that non-sampling error attributable to data collection is minimized. This procedure requires including the statistician in the data editing process during data collection. We implemented the structured procedure during the collection of household survey data in the city of Maputo, the capital of Mozambique. We found indications that the data resulting from the structured procedure are of higher quality than data collected without editing. Finally, we approach the estimation phase of sample surveys by proposing a model-based approach to estimating the mean squared error associated with synthetic (indirect) estimates. Previous methodology aggregates estimates for stability, while our proposed methodology allows area-specific estimates. We applied the proposed mean squared error estimation methodology, together with methods from the literature, to simulated data and to estimates from the 2010 Census Coverage Measurement (CCM). We found that our proposed mean squared error estimation methodology compares favorably to the previous methods while allowing for area-specific estimates. / Ph. D.
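The core of a matrix sampling design is predicting omitted items from administered ones. As a hedged sketch (the function names, the single-predictor regression, and the two-form design are my own simplifications, not the dissertation's method): fit a regression of each item on a core item using respondents who answered both, then predict the item for respondents to whom it was not administered.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x (simple regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return b, my - b * mx

def predict_omitted(core_scores, item_scores, core_score_new):
    """Predict a respondent's omitted item score from their core-item score,
    using respondents to whom the item was administered as training data."""
    b, a = fit_line(core_scores, item_scores)
    return a + b * core_score_new
```

In a full design, the administered subsets rotate across respondents so that every item is observed jointly with the core, which is what makes the omitted items predictable at all.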
14

Fatigue durability analysis of car bodies subjected to severitized mission profiles / Numerical durability analysis of body-in-white

Duraffourg, Simon 13 November 2015 (has links)
A body-in-white (biw) is a complex assembly of several parts, often made of different materials and joined mainly by spot welds, which generally account for more than 80% of the joints. At the design stage, several criteria must be verified numerically and confirmed experimentally on a prototype body, including its fatigue durability. In the current economic context, the drive to cut energy and other costs has led car manufacturers to optimize vehicle performance, in particular by substantially reducing the mass of the body-in-white. Structural-strength and fatigue problems have appeared as a result. To be validated, the prototype body must be strong enough to withstand the fatigue tests. The bench validation tests carried out upstream on a prototype are very costly for the manufacturer, especially when the fatigue tests on the body do not confirm the crack-initiation zones identified by numerical simulation. This thesis is restricted to that last point. It covers all the analyses to be carried out in order to study the fatigue durability of car bodies subjected to severitized mission profiles. The main objective is to develop a numerical-simulation analysis process that guarantees a good level of predictivity for the fatigue durability of car bodies, that is, the ability to correlate correctly with the test results associated with the severitized mission profiles classically used in body validation plans.
This thesis has led to:
- an analysis of the mechanical behaviour of the body and of the excitation forces applied to it during the validation test,
- a new fatigue data editing method for reducing a load signal for durability calculations,
- a new finite-element model of spot-welded joints,
- improved fatigue-life prediction models for spot welds.
These studies have improved the predictivity of the body fatigue calculations, making it possible to:
- identify the majority of the genuinely critical zones on the body,
- reliably assess the relative criticality of each of these zones,
- pertinently estimate the fatigue life associated with each of these zones.
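The fatigue data editing idea mentioned above, simplifying a load signal before durability calculations, can be sketched crudely. This is my own illustrative simplification, not the thesis's method: keep only the turning points of the load history, then drop reversals whose range falls below a gate (a fraction of the largest range), since low-amplitude cycles contribute little fatigue damage under an S-N law.

```python
def turning_points(signal):
    """Keep only the local peaks and valleys of a sampled load signal."""
    tp = [signal[0]]
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # slope changes sign: a reversal
            tp.append(cur)
    tp.append(signal[-1])
    return tp

def range_filter(points, gate=0.2):
    """Drop reversals smaller than gate * (overall max range). The gate
    value here is an arbitrary illustration, not a recommended setting."""
    threshold = gate * (max(points) - min(points))
    out = [points[0]]
    for p in points[1:]:
        if abs(p - out[-1]) >= threshold:
            out.append(p)
    return out
```

A production edit would instead be damage-based (e.g. rainflow counting plus a damage threshold), but the effect is the same: a shorter signal whose large, damaging reversals are preserved.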
