1

Quality data extraction methodology based on the labeling of coffee leaves with nutritional deficiencies

Jungbluth, Adolfo; Yeng, Jon Li 04 1900
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by its publisher. / Detecting nutritional deficiencies in coffee leaves is a task usually performed manually by field experts known as agronomists, who rely on observation of leaf characteristics and on their own experience. Visual fatigue and human error in this empirical approach cause leaves to be labeled incorrectly, degrading the quality of the data obtained. In this context, several crowdsourcing approaches can be applied to improve the quality of the extracted data; separately, these approaches propose voting systems, association-rule filters, and evolutive learning. In this paper, we extend the use of association-rule filters and the evolutive approach by combining them in a methodology that improves data quality while guiding users through the main stages of data-extraction tasks. Our methodology also includes a reward component to engage users and keep them motivated during crowdsourcing tasks. Applying the proposed methodology in a case study on Peruvian coffee leaves yielded a dataset with 93.33% accuracy over 30 instances, collected by 8 experts and evaluated by 2 agronomic engineers with a background in coffee leaves. This accuracy was higher than that obtained under the same conditions by the evolutive feedback strategy alone (86.67%) and by the empirical approach (70%). / Peer-reviewed
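The voting component mentioned in the abstract can be sketched in a few lines. This is a minimal illustration of crowdsourced label aggregation and accuracy scoring, not the authors' actual pipeline; the deficiency class names and the tie-handling policy are assumptions.

```python
from collections import Counter

def majority_label(votes):
    """Aggregate crowdsourced votes for one leaf into a single label.

    Ties return None so the instance can be routed back to an expert
    reviewer instead of being labeled arbitrarily (an assumed policy).
    """
    counts = Counter(votes).most_common(2)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: defer to expert review
    return counts[0][0]

def dataset_accuracy(aggregated, gold):
    """Fraction of aggregated labels matching the expert gold labels."""
    agreed = sum(1 for a, g in zip(aggregated, gold) if a == g)
    return agreed / len(gold)

# Hypothetical deficiency labels assigned by 3 crowd workers per leaf
votes_per_leaf = [
    ["nitrogen", "nitrogen", "potassium"],
    ["boron", "boron", "boron"],
    ["iron", "potassium", "iron"],
]
labels = [majority_label(v) for v in votes_per_leaf]
print(labels)  # ['nitrogen', 'boron', 'iron']
```

In the paper's setting, the 93.33% figure corresponds to `dataset_accuracy` computed against the two agronomic engineers' evaluations.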
2

SemDQ: A Semantic Framework for Data Quality Assessment

Zhu, Lingkai January 2014
Objective: Access to, and reliance upon, high quality data is an enabling cornerstone of modern health delivery systems. Sadly, health systems are often awash with poor quality data, which contributes to adverse outcomes and can compromise the search for new knowledge. Traditional approaches to purging poor data from health information systems often require manual, laborious and time-consuming procedures at the collection, sanitizing and processing stages of the information life cycle, with results that often remain sub-optimal. A promising solution may lie with semantic technologies - a family of computational standards and algorithms capable of expressing and deriving the meaning of data elements. Semantic approaches purport to offer the ability to represent clinical knowledge in ways that can support complex searching and reasoning tasks. It is argued that this ability offers exciting promise as a novel approach to assessing and improving data quality. This study examines the effectiveness of semantic web technologies as a mechanism by which high quality data can be collected and assessed in health settings. To make this assessment, key study objectives include determining the ability to construct a valid semantic data model that sufficiently expresses the complexity present in the data, as well as developing a comprehensive set of validation rules that can be applied semantically to test the effectiveness of the proposed semantic framework. Methods: The Semantic Framework for Data Quality Assessment (SemDQ) was designed. A core component of the framework is an ontology representing data elements and their relationships in a given domain. In this study, the ontology was developed using openEHR standards, with extensions to capture data elements used for patient care and research purposes in a large organ transplant program. Data quality dimensions were defined and corresponding criteria for assessing data quality were developed for each dimension.
These criteria were then applied, using semantic technology, to an anonymized research dataset containing medical data on transplant patients. Results were validated by clinical researchers. Another test was performed on a simulated dataset with the same attributes as the research dataset to confirm the computational accuracy and effectiveness of the framework. Results: A prototype of SemDQ was successfully implemented, consisting of an ontological model integrating the openEHR reference model, a vocabulary of transplant variables and a set of data quality dimensions. Thirteen criteria in three data quality dimensions were transformed into computational constructs using semantic web standards. Reasoning and logic inconsistency checking were first performed on the simulated dataset, which contains carefully constructed test cases to ensure the correctness and completeness of logical computation. The same quality checking algorithms were then applied to an established research database. Data quality defects were successfully identified in the dataset, even though it had been manually cleansed and validated periodically. Among the 103,505 data entries, application of two criteria did not return any errors, while eleven of the criteria detected erroneous or missing data, with error rates ranging from 0.05% to 79.9%. Multiple review sessions were held with clinical researchers to verify the results, and the SemDQ framework was refined to reflect the intricate clinical knowledge. Data corrections were implemented in the source dataset as well as in the clinical system used in the transplant program, resulting in improved quality of data for both clinical and research purposes. Implications: This study demonstrates the feasibility and benefits of using semantic technologies in data quality assessment processes. SemDQ is based on semantic web standards, which allows easy reuse of rules and leverages generic reasoning engines for computation purposes.
This mechanism avoids the shortcomings of proprietary rule engines, which often make rulesets and knowledge developed for one dataset difficult to reuse on other datasets, even within a similar clinical domain. SemDQ can implement rules that have been shown to be better at detecting complex cross-reference logic inconsistencies. In addition, the framework allows easy extension of the knowledge base to incorporate more data types and validation criteria. It has the potential to be incorporated into current workflows in clinical care settings to reduce data errors during data capture.
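The core idea above - expressing each quality criterion as a computational construct and reporting a per-criterion error rate - can be sketched without the semantic-web machinery. The thesis implements criteria as OWL/SPARQL constructs over an openEHR-based ontology; the plain-Python analogue below treats each criterion as a predicate over a record. The transplant field names and thresholds are illustrative assumptions, not taken from the thesis.

```python
def error_rates(records, criteria):
    """Return {criterion name: fraction of records failing it}."""
    n = len(records)
    return {name: sum(0 if check(r) else 1 for r in records) / n
            for name, check in criteria.items()}

# Hypothetical transplant-record fields and quality criteria.
records = [
    {"donor_age": 45, "transplant_date": "2010-04-01", "creatinine": 1.1},
    {"donor_age": -3, "transplant_date": "2011-02-15", "creatinine": None},
    {"donor_age": 60, "transplant_date": None, "creatinine": 0.9},
]
criteria = {
    # plausibility check (completeness + accuracy dimensions)
    "donor_age_plausible":
        lambda r: r["donor_age"] is not None and 0 <= r["donor_age"] <= 120,
    # presence checks (completeness dimension)
    "transplant_date_present": lambda r: r["transplant_date"] is not None,
    "creatinine_present": lambda r: r["creatinine"] is not None,
}
print(error_rates(records, criteria))
```

A semantic implementation gains over this sketch exactly where the abstract says: the criteria become reusable across datasets and can express cross-reference inconsistencies (e.g. a transplant date preceding a donor's birth date) via generic reasoners rather than hand-written loops.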
3

Towards a Data Quality Framework for Heterogeneous Data

Micic, Natasha; Neagu, Daniel; Campean, Felician; Habib Zadeh, Esmaeil 22 April 2017
Every industry produces significant data output as a product of its working processes, and with the recent advent of big data mining and integrated data warehousing there is a clear case for a robust methodology for assessing data quality to ensure sustainable and consistent processing. In this paper, a review of Data Quality (DQ) across multiple domains is conducted in order to propose connections between their methodologies. This critical review suggests that, during DQ assessment, heterogeneous data sets are rarely treated as distinct types of data requiring alternate assessment frameworks. We discuss the need for such a directed DQ framework and the opportunities foreseen in this research area, and propose to address the problem through degrees of heterogeneity.
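One way to read "degrees of heterogeneity" operationally is that each data type gets its own, type-appropriate quality assessor rather than a single uniform check. The sketch below illustrates that routing idea; the two completeness measures and the type names are illustrative assumptions, not the paper's framework.

```python
def completeness_tabular(rows):
    """Share of non-missing cells in a list-of-dicts table."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is not None for v in cells) / len(cells)

def completeness_timeseries(points):
    """Share of expected ticks actually present (assumes integer ticks)."""
    ticks = {t for t, _ in points}
    expected = range(min(ticks), max(ticks) + 1)
    return len(ticks) / len(expected)

# Dispatcher: heterogeneous data sets are routed to the assessor
# appropriate for their type instead of one uniform assessment.
ASSESSORS = {
    "tabular": completeness_tabular,
    "timeseries": completeness_timeseries,
}

def assess(dataset_type, data):
    return ASSESSORS[dataset_type](data)

print(assess("tabular", [{"a": 1, "b": None}, {"a": 2, "b": 3}]))  # 0.75
print(assess("timeseries", [(0, 1.0), (1, 1.1), (3, 1.3)]))        # 0.75
```

Note that the same nominal score (0.75) means different things for the two types - a missing cell versus a missing time step - which is precisely why a uniform DQ assessment over heterogeneous data can mislead.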
4

An assessment of data quality in routine health information systems in Oyo State, Nigeria

Adejumo, Adedapo January 2017
Magister Public Health - MPH / Ensuring that routine health information systems provide good quality information for informed decision making and planning remains a major priority in several countries and health systems. Failure to use health information, or reliance on poor quality data, leads to inadequate assessment and evaluation of health care and to weak, poorly functioning health systems. Like those of many developing countries, the Nigerian health system has challenges with the building blocks of a health system, including a weak health information system. Although the quality of data in the Nigerian routine health information system has been deemed poor in some reports and studies, there is little research-based evidence of the current state of data quality in the country, or of the factors that may influence data quality in routine health information systems. This study explored the quality of routine health information generated from health facilities in Oyo State, Nigeria, providing an account of its current state. The study was a cross-sectional descriptive study, retrospectively examining paper-based and electronic data records in the National Health Management Information System in Nigeria. A mixed-methods approach was used: quantitative methods to assess the quality of data within the health information system, and qualitative methods to identify factors influencing the quality of health information at the health facilities in the district. Assessment of information quality was done using a structured evaluation tool examining the completeness, accuracy and consistency of routine health statistics generated at these health facilities. A multistage sampling method was used in the quantitative component of the research.
For the qualitative component of the research, purposive sampling was used to select respondents from each health facility to describe the factors influencing data quality. The study found incomplete and inaccurate data both in facility paper summaries and in the electronic databases storing aggregated facility data.
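The consistency check implied above - comparing totals in facility paper summaries against the electronic database - can be sketched as a simple reconciliation. The facility names, indicators, and zero tolerance below are illustrative assumptions, not the study's instrument.

```python
def discrepancies(paper, electronic, tolerance=0):
    """Return (facility, indicator) keys whose paper and electronic
    totals disagree beyond the tolerance, or are missing electronically."""
    flagged = []
    for key, paper_value in paper.items():
        db_value = electronic.get(key)
        if db_value is None or abs(paper_value - db_value) > tolerance:
            flagged.append(key)
    return flagged

# Hypothetical monthly indicator totals keyed by (facility, indicator).
paper = {
    ("Clinic A", "ANC visits"): 120,
    ("Clinic A", "Deliveries"): 30,
    ("Clinic B", "ANC visits"): 95,
}
electronic = {
    ("Clinic A", "ANC visits"): 120,
    ("Clinic A", "Deliveries"): 27,  # disagrees with the paper summary
}

print(discrepancies(paper, electronic))
# [('Clinic A', 'Deliveries'), ('Clinic B', 'ANC visits')]
```

The two flagged keys correspond to the two failure modes the study reports: inaccurate values (paper and electronic totals disagree) and incomplete data (a paper total never entered electronically).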
5

Towards model governance in predictive toxicology

Palczewska, Anna Maria; Fu, X.; Trundle, Paul R.; Yang, Longzhi; Neagu, Daniel; Ridley, Mick J.; Travis, Kim January 2013
Efficient management of toxicity information as an enterprise asset is increasingly important for the chemical, pharmaceutical, cosmetics and food industries. Many organisations focus on better information organisation and reuse in an attempt to reduce the costs of testing and manufacturing in the product development phase. Toxicity information is extracted not only from toxicity data but also from predictive models. Accurate and appropriately shared models can bring a number of benefits if we are able to make effective use of existing expertise. Although existing models may provide high-impact insights into the relationships between chemical attributes and specific toxicological effects, they can also be a source of incorrect decisions. Thus, there is a need for a framework for efficient model management. To address this gap, this paper introduces a concept of model governance based upon data governance principles. We extend data governance processes by adding procedures that allow the evaluation of model use and governance for enterprise purposes. The core aspect of model governance is model representation. We propose six rules that form the basis of a model representation schema, called Minimum Information About a QSAR Model Representation (MIAQMR). As a proof of concept of our model governance framework, we develop a web application called Model and Data Farm (MADFARM), in which models are described by the MIAQMR-ML markup language.
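A minimum-information model record in the spirit of MIAQMR can be sketched as a typed record plus a completeness gate. The six fields and the validation rule below are illustrative assumptions - the published schema's actual six rules and the MIAQMR-ML markup are not reproduced here.

```python
from dataclasses import dataclass, fields

@dataclass
class ModelRecord:
    """Illustrative minimum-information record for a predictive toxicology
    model; field choices are assumptions, not the MIAQMR specification."""
    identifier: str     # unique model id within the enterprise
    endpoint: str       # toxicological effect the model predicts
    algorithm: str      # learning method used to build the model
    training_data: str  # reference to the dataset the model was fit on
    applicability: str  # stated applicability domain
    performance: str    # reported validation statistics

def is_complete(record: ModelRecord) -> bool:
    """A record is governable only if every required field is non-empty."""
    return all(getattr(record, f.name).strip() for f in fields(record))

m = ModelRecord("qsar-001", "acute oral toxicity", "random forest",
                "dataset ref DS-42", "organic molecules < 900 Da",
                "R^2 = 0.81 (5-fold CV)")
print(is_complete(m))  # True
```

The point of such a gate is the governance argument made above: a model whose provenance, applicability domain, or validation statistics are missing cannot be safely reused, however accurate it may be.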
6

Data Quality Assessment for Closed-Loop System Identification and Forecasting with Application to Soft Sensors

Shardt, Yuri Unknown Date
No description available.
