About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Building a framework for improving data quality in engineering asset management

Lin, Chih Shien January 2008 (has links)
Asset managers recognise that high-quality engineering data is the key enabler in gaining control of engineering assets. Although they consider accurate, timely and relevant data critical to the quality of their asset management (AM) decisions, evidence of large variations in data quality (DQ) associated with AM abounds. The question therefore arises as to which factors influence DQ in engineering AM. Accordingly, the main goal of this research is to investigate DQ issues associated with AM and to develop an AM-specific framework of the factors affecting DQ. The framework aims to give AM organisations structured guidance to understand, identify and mitigate their DQ problems in a systematic way, and to help them create an information orientation that achieves greater AM performance.
12

Accuracy optimisation and error detection in automatically generated elevation models derived using digital photogrammetry

Gooch, Michael J. January 1999 (has links)
Users of current Digital Photogrammetric Systems (DPS) can now rapidly generate dense Digital Elevation Models (DEMs) with a minimal amount of training. This procedure is controlled through a set of strategy parameters embedded in the software. Previous research into the effect of these parameters on the resulting DEMs produced mixed results, with some researchers finding that significant changes to the DEM can be made through manipulation of the parameters whilst others suggested that they have little effect. This thesis builds upon this early work to develop two systems that provide assistance for novice users. The first technique optimises the parameters with respect to DEM accuracy: it takes the form of an expert system that compares the output DEM with a knowledge base to prescribe an improved set of parameters. The results suggest that the system works and can produce improvements in the accuracy of a DEM. It was found that in certain circumstances changes to the parameters can have a significant effect on the resulting DEM, but this change does not occur across the entire DEM. The second aspect of the thesis details the development of a completely new approach that automatically detects low-accuracy areas of the DEM and presents this information graphically. This is an important development since, as documented in the current literature, few quality control procedures are offered to users. The user can use this information to assist in the manual checking and editing of the final DEM, thus speeding up the workflow and improving the accuracy of the output. The results of tests (using the ERDAS Imagine OrthoMAX software) on a wide variety of imagery are presented and show that the technique reliably detects areas of a DEM with high errors. More significantly, the technique has also been tested on two other DPSs (Zeiss Phodis TS and VirtuoZo) and it was found that it worked well for the Zeiss system but could not be applied to the VirtuoZo software. This demonstrates that the research is not limited to the users of one software package and is of interest to the wider photogrammetric community.
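The automated error-detection idea described in this abstract lends itself to a brief illustration. The sketch below is not Gooch's actual technique; it is a minimal, assumed stand-in that flags DEM cells whose elevation deviates strongly from their local neighbourhood, which is one simple way to highlight low-accuracy areas for manual checking. The window size and threshold are assumed parameters.

```python
import numpy as np

def flag_suspect_cells(dem, window=5, z_thresh=3.0):
    """Flag DEM cells whose deviation from the local mean is anomalously large.

    Illustrative surrogate for automatic low-accuracy detection, not the
    algorithm developed in the thesis. `window` and `z_thresh` are assumed,
    tunable parameters.
    """
    pad = window // 2
    padded = np.pad(dem, pad, mode="edge")
    rows, cols = dem.shape
    local_mean = np.zeros_like(dem, dtype=float)
    local_std = np.zeros_like(dem, dtype=float)
    for i in range(rows):
        for j in range(cols):
            patch = padded[i:i + window, j:j + window]
            local_mean[i, j] = patch.mean()
            local_std[i, j] = patch.std()
    residual = np.abs(dem - local_mean)
    # A cell is suspect where its residual is large relative to local variability.
    return residual > z_thresh * (local_std + 1e-9)

# Example: a smooth ramp with one spike that should be flagged.
dem = np.fromfunction(lambda i, j: 0.1 * i + 0.05 * j, (50, 50))
dem[25, 25] += 10.0
mask = flag_suspect_cells(dem)
print("suspect cells:", int(mask.sum()))
```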
14

Schema quality analysis in a data integration system

BATISTA, Maria da Conceição Moraes 31 January 2008 (has links)
Information Quality (IQ) has become a critical concern in organizations and in information systems research. Poor-quality information can have a negative impact on an organization's effectiveness. The growing use of data warehouses, and the direct access of managers and users to information obtained from multiple sources, have increased the need for quality in corporate information. The notion of IQ in information systems has emerged in recent years and attracts ever greater interest, yet there is still no common agreement on a definition of IQ, only a consensus that it is a concept of "fitness for use": information is considered fit for use from the perspective of a user's requirements and needs, i.e., information quality depends on its usefulness. Integrated access to information spread across multiple heterogeneous, distributed and autonomous data sources is an important problem in many application domains. Typically there are several ways to answer a global query over data held in different sources and in different combinations, but obtaining all possible answers is very costly. While much research has addressed query processing and cost-based plan selection, little is known about incorporating IQ aspects into the global schemas of data integration systems. In this work we propose an IQ analysis for a data integration system, more specifically of the quality of the system's schemas. Our main goal is to improve the quality of query execution. Our proposal rests on the hypothesis that one way to optimize query processing is to build schemas with high IQ scores. The focus of this work is therefore the development of IQ analysis mechanisms for data integration schemas, especially the global schema. We first compile a list of IQ criteria and relate them to the elements of a data integration system. We then focus on the integrated schema and formally specify schema quality criteria: minimality, schema completeness and type consistency. We also specify an adjustment algorithm to improve minimality, and algorithms to measure type consistency in schemas. Our experiments show that the execution time of a query in a data integration system can decrease when the query is submitted to a schema with high minimality and type consistency scores. / Conselho Nacional de Desenvolvimento Científico e Tecnológico
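As a rough illustration of the kind of schema-quality scoring the abstract describes, the sketch below computes two toy scores, minimality and type consistency, over a hypothetical integrated schema. The scoring rules and the example mappings are assumptions made here for illustration; they are not the formal definitions given in the dissertation.

```python
from collections import Counter

def minimality(global_to_concept):
    """Share of global-schema attributes that are not redundant.

    `global_to_concept` maps each global attribute to the source concept it
    represents; two global attributes covering the same concept count as
    redundancy. Assumed scoring rule, for illustration only.
    """
    counts = Counter(global_to_concept.values())
    redundant = sum(c - 1 for c in counts.values())
    return 1 - redundant / len(global_to_concept)

def type_consistency(global_types, source_types):
    """Share of global attributes whose type matches every mapped source type."""
    consistent = sum(
        1 for attr, gtype in global_types.items()
        if all(stype == gtype for stype in source_types.get(attr, []))
    )
    return consistent / len(global_types)

# Hypothetical integrated schema over two sources.
g2c = {"name": "customer_name", "cust_name": "customer_name", "city": "city"}
gtypes = {"name": "str", "cust_name": "str", "city": "str"}
stypes = {"name": ["str", "str"], "cust_name": ["str"], "city": ["str", "int"]}
print(f"minimality: {minimality(g2c):.2f}")             # one redundant attribute
print(f"type consistency: {type_consistency(gtypes, stypes):.2f}")  # 'city' conflicts
```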
15

Data Quality Metrics

Sýkorová, Veronika January 2008 (has links)
The aim of the thesis is to demonstrate that Data Quality, a relatively subjective and therefore hard-to-measure property, can in fact be measured. To this end, various aspects of measuring the quality of data are analyzed and a Complex Data Quality Monitoring System is introduced, providing a concept for measuring and monitoring the overall Data Quality in an organization. The system is built on a hierarchy of metrics decomposed into detailed metrics, dimensions enabling multidimensional analyses of the metrics, and the processes measured by the metrics. The first part of the thesis (Chapters 2 and 3) deals with Data Quality itself: it provides various definitions of Data Quality, argues for its importance in a company, and presents some of the most common tools and solutions aimed at managing Data Quality in an organization. The second part (Chapters 4 and 5) builds on this and turns to measuring Data Quality using metrics: it defines Data Quality Metrics and their purpose, places them in the multidimensional context (dimensions, hierarchies) and gives five possible decompositions of Data Quality metrics into detailed metrics. The third part (Chapter 6) presents the proposed Complex Data Quality Monitoring System, including a description of the dimensions and processes related to Data Quality Management and, most importantly, a detailed definition of the bottom-level metrics used to calculate the overall Data Quality.
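The roll-up of a metrics hierarchy into an overall Data Quality figure can be sketched very simply. The hierarchy, weights and leaf scores below are assumed, illustrative values, not the metrics defined in the thesis; the point is only to show how detailed metrics, weighted at each level, aggregate into one number.

```python
def rollup(node):
    """Recursively aggregate a metrics hierarchy into a single [0, 1] score.

    A node is either a leaf score (float) or a list of (weight, child) pairs.
    Weights at each level are assumed to sum to 1. Illustrative only.
    """
    if isinstance(node, (int, float)):
        return float(node)
    return sum(weight * rollup(child) for weight, child in node)

# Hypothetical hierarchy: overall DQ <- completeness, accuracy, timeliness.
overall_dq = [
    (0.4, [(0.5, 0.92), (0.5, 0.88)]),  # completeness of two key tables
    (0.4, 0.75),                        # accuracy (share of validated records)
    (0.2, 0.60),                        # timeliness (share refreshed on schedule)
]
print(f"overall data quality: {rollup(overall_dq):.2f}")
```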
16

Datová kvalita a nástroje pro její řízení / Data Quality And Tools For Its Management

Tezzelová, Jana January 2009 (has links)
This diploma thesis deals with data quality, with an emphasis on its management and on the tools developed to address data quality issues. The goal of the work is to summarize knowledge about data quality problems, including their evaluation and management, a description of the key problems found in data, and possible solutions. Further aims are an analysis of the market for software tools that support and manage data quality and, above all, a comparison of the functionality and capabilities of several of those tools. The work is split into two consecutive parts. The first, theoretical part introduces the problems of data quality and especially data quality management, including identification of the main steps for managing it successfully. The second, practical part focuses on the data quality tools market: its characteristics, segmentation, evolution, current state and expected trends. An important section of this part is a hands-on comparison of features and an evaluation of working with several data quality tools. The thesis is intended to be useful to anyone interested in data quality problems, especially their management and the supporting technology. Thanks to its focus on the data quality tools market and the tool comparison, it can also serve as a guide for companies currently choosing a suitable tool for introducing data quality. Given this focus, readers are expected to have at least a basic orientation in Business Intelligence.
17

Relational Data Curation by Deduplication, Anonymization, and Diversification

Huang, Yu January 2020 (has links)
Enterprises acquire large amounts of data from a variety of sources with the goal of extracting valuable insights and enabling informed analysis. Unfortunately, organizations continue to be hindered by poor data quality as they wrangle with their data to extract value, since most real datasets are rarely error-free. Poor data quality is a pervasive problem that spans all industries, causing unreliable data analysis and costing billions of dollars. The volume of datasets, the pace of data acquisition, and the heterogeneity of data sources pose challenges to achieving high-quality data. These challenges are further exacerbated by data privacy and data diversity requirements. In this thesis, we study and propose solutions to address data duplication, managing the trade-off between data cleaning and data privacy, and computing diverse data instances. In the first part of this thesis, we address the data duplication problem. We propose a duplicate detection framework which combines word embeddings with constraints among attributes to improve the accuracy of deduplication, along with a set of constraint-based statistical features to capture the semantic relationships among attributes. We show that our techniques achieve comparable accuracy on real datasets. In the second part of this thesis, we study the problem of data privacy and data cleaning, and we present a Privacy-Aware data Cleaning-As-a-Service (PACAS) framework to protect privacy during the cleaning process. Our evaluation shows that PACAS safeguards semantically related sensitive values and provides lower repair errors compared to existing privacy-aware cleaning techniques. In the third part of this thesis, we study the problem of finding a diverse anonymized data instance, where diversity is measured via a set of diversity constraints, and propose an algorithm that seeks a k-anonymous relation, with value suppression, satisfying the given diversity constraints. We conduct extensive experiments using real and synthetic data, showing the effectiveness of our techniques and their improvement over existing baselines. / Thesis / Doctor of Philosophy (PhD)
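A very loose sketch of the first idea, combining embedding similarity with constraint-derived features when scoring candidate duplicate pairs, is given below. The embedding table is a toy stand-in (a real system would use pre-trained word embeddings), the constraint is a single assumed rule, and the weights are arbitrary; this is not the framework developed in the thesis, only an illustration of its shape.

```python
import math

# Toy stand-in for pre-trained word embeddings (assumed values, not real vectors).
EMB = {"acme": [0.90, 0.10], "acme inc": [0.88, 0.12]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def pair_score(rec1, rec2, w_embed=0.7, w_constraint=0.3):
    """Score a candidate duplicate pair.

    Combines embedding similarity of the name attribute with one assumed
    constraint-based feature: records in the same city should share a postal
    prefix. Weights are arbitrary illustrative choices.
    """
    sim = cosine(EMB[rec1["name"]], EMB[rec2["name"]])
    constraint_ok = (rec1["city"] != rec2["city"]) or (
        rec1["postal"][:3] == rec2["postal"][:3]
    )
    return w_embed * sim + w_constraint * (1.0 if constraint_ok else 0.0)

a = {"name": "acme", "city": "toronto", "postal": "M5V 2T6"}
b = {"name": "acme inc", "city": "toronto", "postal": "M5V 1J2"}
print(f"duplicate score: {pair_score(a, b):.2f}")  # close to 1 -> likely duplicates
```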
18

An Analysis of Data Quality Defects in Podcasting Systems

Mis, Thomas A. 06 December 2012 (has links)
No description available.
19

An Integrated Approach to Improve Data Quality

Al-janabi, Samir 06 1900 (has links)
A huge quantity of data is created and saved every day in databases from different types of data sources, including financial data, web log data, sensor data, and human input. Information technology enables organizations to collect and store large amounts of data in databases, and organizations worldwide use data to support their activities through various applications. Issues in data quality such as duplicate records, inaccurate data, violations of integrity constraints, and outdated data are common in databases. Thus, data in databases are often unclean. Such data quality issues might cost billions of dollars annually and might have severe consequences for critical tasks such as analysis, decision making, and planning. Data cleaning processes are required to detect and correct errors in the unclean data. Despite the fact that there are multiple quality issues, current data cleaning techniques generally deal with only one or two aspects of quality. The techniques assume either the availability of master data, or training data, or the involvement of users in data cleaning. For instance, users might manually place confidence scores that represent the correctness of data values, or they may be consulted about the repairs. In addition, the techniques may depend on high-quality master data or pre-labeled training data to fix errors. However, relying on human effort to correct errors is expensive, and master data or training data are not always available. These factors make it challenging to discover which values have issues, thereby making it difficult to fix the data (e.g., merging several duplicate records into a single representative record). To address these problems, we propose algorithms that integrate multiple data quality issues into the cleaning process. In this thesis, we apply this approach in the context of multiple data quality issues where errors in data are introduced from multiple causes. The issues include duplicate records, violations of integrity constraints, inaccurate data, and outdated data. We fix these issues holistically, without a need for manual human interaction, master data, or training data. We propose an algorithm to tackle the problem of data cleaning, concentrating on duplicate records, violations of integrity constraints, and inaccurate data. We utilize the density information embedded in data to eliminate duplicates, packing together tuples that are close to each other. Density information enables us to reduce manual user interaction in the deduplication process, as well as the dependency on master data or training data. To resolve inconsistency in duplicate records, we present a weight model that automatically assigns confidence scores based on the density of the data. We consider the inconsistent data in terms of violations with respect to a set of functional dependencies (FDs), and present a cost model for data repair based on the weight model. To resolve inaccurate data in duplicate records, we measure the relatedness of the words of the attributes in the duplicate records using hierarchical clustering. To integrate the handling of outdated and inaccurate data in duplicate elimination, we propose a data cleaning algorithm that introduces techniques based on corroboration, i.e. taking into consideration the trustworthiness of the attribute values. The algorithm integrates data deduplication with data currency and accuracy.
We utilize the density information embedded inside the tuples in order to guide the cleaning process to fix multiple data quality issues. By using density information in corroboration, we reduce relying on manual user interaction, and the dependency on master data or training data. / Thesis / Doctor of Philosophy (PhD)
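To make the merging step concrete, the sketch below resolves a cluster of duplicate records into one representative record by weighting each candidate value by its support within the cluster. Frequency is used here as a crude, assumed proxy for the density-based confidence scores described in the abstract; it is not the thesis's actual weight model.

```python
from collections import Counter

def resolve_cluster(cluster):
    """Merge a cluster of duplicate records into one representative record.

    Each non-empty attribute value is weighted by its frequency in the
    cluster (a crude stand-in for density-based support) and the heaviest
    value wins. Illustrative only.
    """
    merged = {}
    attributes = {attr for rec in cluster for attr in rec}
    for attr in attributes:
        weights = Counter(rec[attr] for rec in cluster if rec.get(attr))
        if weights:
            merged[attr] = weights.most_common(1)[0][0]
    return merged

cluster = [
    {"name": "J. Smith", "city": "Hamilton", "phone": "905-555-0101"},
    {"name": "John Smith", "city": "Hamilton", "phone": ""},
    {"name": "John Smith", "city": "Hamiltn", "phone": "905-555-0101"},
]
print(resolve_cluster(cluster))
# e.g. {'name': 'John Smith', 'city': 'Hamilton', 'phone': '905-555-0101'}
```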
20

Improving The Accuracy of 3D Geologic Subsurface Models

MacCormack, Kelsey 06 1900 (has links)
This study investigates ways to improve the accuracy of 3D geologic models by assessing the impact of data quality, grid complexity, data quantity and distribution, interpolation algorithm and program selection on model accuracy. The first component of this research examines the impact of variable quality data on 3D model outputs and presents a new methodology to optimize the impact of high quality data, while minimizing the impact of low quality data on the model results. This 'Quality Weighted' modelling approach greatly improves model accuracy when compared with un-weighted models. The second component of the research assesses the variability and influence of data quantity, data distribution, algorithm selection, and program selection on the accuracy of 3D geologic models. A series of synthetic grids representing environments of varying complexity were created from which data subsets were extracted using specially developed MATLAB scripts. The modelled data were compared back to the actual synthetic values and statistical tests were conducted to quantify the impact of each variable on the accuracy of the model predictions. The results indicate that grid complexity is the predominant control on model accuracy, more data do not necessarily produce more accurate models, and data distribution is particularly important when relatively simple environments are modelled. A major finding of this study is that in some situations, the software program selected for modelling can have a greater influence on model accuracy than the algorithm used for interpolation. When modelling spatial data there is always a high level of uncertainty, especially in subsurface environments where the unit(s) of interest are defined by data only available in select locations. The research presented in this thesis can be used to guide the selection of modelling parameters used in 3D subsurface investigations and will allow the more effective and efficient creation of accurate 3D models. / Thesis / Doctor of Philosophy (PhD)
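The 'Quality Weighted' idea, letting trusted data points pull the interpolated surface harder than doubtful ones, can be illustrated with a quality-scaled inverse-distance interpolator. The scheme, the quality scores and the borehole picks below are assumptions made for this sketch; they are not the methodology or data used in the thesis.

```python
import math

def qw_idw(x, y, points, power=2.0):
    """Quality-weighted inverse-distance estimate of elevation at (x, y).

    Each point is (px, py, elevation, quality) with quality in (0, 1];
    quality scales the usual inverse-distance weight, so low-quality
    observations pull the surface less. Assumed, illustrative scheme.
    """
    num = den = 0.0
    for px, py, z, q in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return z  # exact hit on a data point
        w = q / d ** power
        num += w * z
        den += w
    return num / den

# Hypothetical surface picks: (x, y, elevation in metres, quality score).
boreholes = [
    (0.0, 0.0, 120.0, 1.0),   # surveyed borehole, high quality
    (10.0, 0.0, 118.0, 0.9),
    (5.0, 8.0, 135.0, 0.2),   # legacy water-well log, low quality
]
print(f"elevation at (5, 4): {qw_idw(5.0, 4.0, boreholes):.1f} m")
```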
