11

Using DevOps principles to continuously monitor RDF data quality

Meissner, Roy, Junghanns, Kurt 01 August 2017 (has links)
One approach to continuously achieve a certain data quality level is to use an integration pipeline that continuously checks and monitors the quality of a data set according to defined metrics. This approach is inspired by Continuous Integration pipelines, which were introduced in software development and DevOps to perform continuous source-code checks. By investigating possible tools and discussing the specific requirements of RDF data sets, an integration pipeline is derived that combines current approaches from software development and the Semantic Web and reuses existing tools. As these tools were not built explicitly for CI usage, we evaluate their usability and propose possible workarounds and improvements. Furthermore, a real-world usage scenario is discussed, outlining the benefits of using such a pipeline.
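The kind of check such a pipeline runs can be sketched in a few lines. The following is a minimal illustration using the Python rdflib library, not one of the tools evaluated in the thesis; the metric (language-tag coverage of string literals), the threshold, and the file handling are assumptions made for the example.

```python
# Minimal sketch of a CI-style RDF quality gate (illustrative only).
import sys
from rdflib import Graph, Literal

def check_quality(path: str, min_tagged_ratio: float = 0.9) -> bool:
    g = Graph()
    g.parse(path, format="turtle")  # a parse error already fails the build

    # Example metric: share of plain string literals carrying a language tag.
    strings = [o for _, _, o in g
               if isinstance(o, Literal) and o.datatype is None]
    tagged = [s for s in strings if s.language]
    ratio = len(tagged) / len(strings) if strings else 1.0
    print(f"{len(g)} triples, language-tag coverage {ratio:.0%}")
    return ratio >= min_tagged_ratio

if __name__ == "__main__":
    # A non-zero exit code fails the pipeline, as in source-code CI.
    sys.exit(0 if check_quality(sys.argv[1]) else 1)
```

Run from a CI job on every change to the data set, the exit code turns the metric into a pass/fail gate.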
12

Design and implementation of a workflow for quality improvement of the metadata of scientific publications

Wolff, Stefan 07 November 2023 (has links)
In this paper, a detailed workflow for analyzing and improving the quality of metadata of scientific publications is presented and tested. The workflow was developed based on approaches from the literature. Frequently occurring types of errors from the literature were compiled, mapped to the data-quality dimensions most relevant for publication data – completeness, correctness, and consistency – and made measurable. Based on the identified data errors, a process for improving data quality was developed. This process includes parsing hidden data, correcting incorrectly formatted attribute values, enriching with external data, carrying out deduplication, and filtering erroneous records. The effectiveness of the workflow was confirmed in an exemplary application to publication data from Open Researcher and Contributor ID (ORCID), with 56% of the identified data errors corrected. The workflow will be applied to publication data from other source systems in the future to further improve its performance.
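Two of the steps named above, correcting mis-formatted attribute values and deduplication, might look as follows in outline; the DOI field and the record layout are invented for the illustration and are not the thesis's actual code.

```python
# Illustrative sketch: normalize a mis-formatted attribute (DOI), then
# deduplicate on it. Field names and record layout are invented.
import re

def normalize_doi(raw):
    """Trim, lowercase, and strip URL prefixes so equal DOIs compare equal."""
    if not raw:
        return None
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", raw.strip().lower())
    return doi if doi.startswith("10.") else None

def deduplicate(records):
    """Keep the first record per DOI; records without a DOI pass through."""
    seen, unique = set(), []
    for rec in records:
        doi = normalize_doi(rec.get("doi"))
        if doi is None:
            unique.append(rec)           # cannot be keyed, keep for review
        elif doi not in seen:
            seen.add(doi)
            unique.append({**rec, "doi": doi})
        # else: duplicate record, filtered out
    return unique

records = [
    {"title": "A", "doi": "https://doi.org/10.1000/XYZ"},
    {"title": "A (duplicate)", "doi": "10.1000/xyz"},
    {"title": "B", "doi": None},
]
print(deduplicate(records))   # the duplicate of "A" is dropped
```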
13

A Clinical Decision Support System for the Identification of Potential Hospital Readmission Patients

Unknown Date (has links)
Recent federal legislation has incentivized hospitals to focus on quality of patient care. A primary metric of care quality is patient readmissions. Many methods exist to statistically identify the patients most likely to require hospital readmission. Correct identification of high-risk patients allows hospitals to use limited resources intelligently in mitigating hospital readmissions, yet these methods have seen little practical adoption in the clinical setting. This research identifies the open research questions that have impeded widespread adoption of predictive hospital readmission systems. Current systems often rely on structured data extracted from health records systems; this data can be expensive and time-consuming to extract. Unstructured clinical notes are agnostic to the underlying records system and would decouple the predictive analytics system from it, but additional concerns in clinical natural language processing must be addressed before such a system can be implemented. Current systems often perform poorly on standard statistical measures. The misclassification cost of patient readmissions has yet to be addressed, and a gap remains between current readmission system evaluation metrics and those most appropriate in the clinical setting. Additionally, data availability for localized model creation has yet to be addressed by the research community: large research hospitals may have sufficient data to build models, but many others do not, and simply combining data from many hospitals often yields a model that performs worse than one built from a single hospital's data. Current systems often produce a binary readmission classification, yet patients are often readmitted for reasons differing from those of the index admission, and there is little research into predicting the primary cause of readmission. Furthermore, only simplistic methods have been applied to discovering clinical terms that co-occur with the primary diagnosis. This research addresses these concerns to increase adoption of predictive hospital readmission systems. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2017. / FAU Electronic Theses and Dissertations Collection
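The mismatch between standard metrics and clinically appropriate ones can be made concrete with a cost-sensitive evaluation. The sketch below uses invented cost weights and labels; it illustrates the general idea rather than any model from the dissertation.

```python
# Cost-sensitive evaluation sketch: a missed readmission (false negative)
# is weighted far more heavily than an unnecessary intervention (false
# positive). Costs and labels are invented for illustration.
def expected_cost(y_true, y_pred, c_fn=10.0, c_fp=1.0):
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return (c_fn * fn + c_fp * fp) / len(y_true)

y_true  = [1, 0, 0, 1, 0, 0, 0, 1]
model_a = [1, 0, 0, 0, 0, 0, 0, 1]   # fewer errors overall, misses one case
model_b = [1, 1, 0, 1, 0, 1, 0, 1]   # noisier, but catches every readmission
print(expected_cost(y_true, model_a))   # 1.25 -- worse under clinical costs
print(expected_cost(y_true, model_b))   # 0.25 -- better despite more errors
```

Under such a cost structure the model that looks worse on raw accuracy can be the clinically preferable one.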
14

Řízení kvality dat v malých a středních firmách / Data quality management in small and medium enterprises

Zelený, Pavel January 2010 (has links)
This diploma thesis deals with data quality management. Although many tools and methodologies support data quality management, even on the Czech market, they are all aimed at large companies; small and medium-sized companies cannot afford them because of the high cost. The first goal of this thesis is to summarize the principles of these methodologies and, on that basis, to propose a simpler methodology for small and medium-sized companies. In the second part of the thesis, the methodology is adapted to and applied in a specific company. The first step is to choose the data area of interest in the company. Because buying a software tool to clean the data was not an option, relatively simple rules are defined and serve as the basis for cleaning scripts written in SQL. The scripts are used for automatic data cleaning. A follow-up analysis determines which data should be cleaned manually. The next step describes recommendations for removing duplicates from the database, using functionality of the company's production system. The last step of the methodology is to create a control mechanism that maintains the required data quality in the future. The thesis concludes with a survey of four data sources, all from companies using the same production system, to present an overview of their data quality and to support decisions about data cleaning in those companies as well.
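The rule-to-script approach can be sketched as follows; the table, the rules, and the validity check are hypothetical stand-ins for the company-specific rules the thesis defines (run here against SQLite so the sketch is self-contained).

```python
# Minimal sketch of rule-based SQL cleaning as described above.
# Table and rules are hypothetical, not the thesis's production system.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER, email TEXT, phone TEXT)")
con.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "  A@EXAMPLE.COM ", "777 123 456"), (2, "bad-address", None)],
)

# Rule 1 (automatic): normalize emails -- trim whitespace, lowercase.
con.execute("UPDATE customer SET email = LOWER(TRIM(email))")

# Rule 2 (flag for manual cleaning): records failing a simple validity check.
invalid = con.execute(
    "SELECT id, email FROM customer WHERE email NOT LIKE '%_@_%._%'"
).fetchall()
print("clean manually:", invalid)   # [(2, 'bad-address')]
```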
15

A contribution towards real-time forecasting of algal blooms in drinking water reservoirs by means of artificial neural networks and evolutionary algorithms.

Welk, Amber Lee January 2008 (has links)
Historical water-quality databases from two South Australian drinking water reservoirs were used in conjunction with various computational modelling methods for the ordination, clustering and forecasting of complex ecological data. Techniques used throughout the study were: Kohonen artificial neural networks (KANN) for data categorisation and the discovery of patterns and relationships; recurrent supervised artificial neural networks (RANN) for knowledge discovery and forecasting of algal dynamics; and hybrid evolutionary algorithms (HEA) for rule-set discovery and optimisation for forecasting algal dynamics. These methods were combined to provide an integrated approach to the analysis of algal populations, including interactions within the algal community and with other water-quality factors, resulting in improved understanding and forecasting of algal dynamics. The project initially focussed on KANN for the pattern analysis and classification of the historical data to reveal links between the physical, chemical and biological components of the reservoirs. This offered some understanding of the system and of the relationships being considered for the construction of the forecasting models. Specific investigations examined past conditions and the impacts of different management regimes, and discovered sets of conditions corresponding to specific algal functional groups. RANN were then used to build models for forecasting both Chl-a and the main nuisance species, Anabaena, up to 7 days in advance. This method also provided sensitivity analyses demonstrating the relationship between input and output variables by plotting the reaction of the output to variations in the inputs. Initially one year from the data set was selected for testing a model, as per the split-sample technique. To test the models further, several years were later selected for testing to ensure the models remained useful under changed conditions and that test results were not misleading regarding the models' true capabilities. RANN were first used to create reservoir-specific, ad-hoc models. Later, the models were trained with the merged data sets of both reservoirs to create one model applicable to either reservoir. Another forecasting method, HEA, was trialled and compared to RANN: HEA proved equal or superior to RANN in predictive power, also allowed sensitivity analysis, and provided an explicit, portable rule set. The HEA rule sets were initially tested on selected years of data; to demonstrate the models' potential fully, a process for k-fold cross-validation was developed to test the rule set on all years of data. To extend the applicability of the HEA rule set further, the idea of rule-based agents for specific lake ecosystem categories was examined. The generality of a rule-based agent means that, after successful validation on several lakes from one category, the agent could be applied to other water bodies within that category that had not been involved in the training process. The ultimate test of the rule-based agent for the warm, monomictic and eutrophic lake ecosystem category was to apply it in a real-time monitoring and forecasting situation. The agent was fed with online, real-time data from a reservoir that belonged to the same ecosystem category but was not used in the training process. These preliminary experiments showed promising results.
It can be concluded that the concept of rule-based agents will facilitate real-time forecasting of algal blooms in drinking water reservoirs, provided on-line monitoring of relevant variables has been implemented. Contributions of this research include: (1) insight into the capabilities of three kinds of computational modelling techniques applied to complex water-quality data; (2) novel applications of KANN, including the division of data into separate management periods for comparison of management efficiency; (3) qualitative and quantitative elucidation of relationships between water-quality parameters; (4) research towards the development of a forecasting tool for algal abundance 7 days in advance that could be generic for a particular lake ecosystem category and implemented in real time; and (5) a thorough testing method for such models (k-fold cross-validation). / http://proxy.library.adelaide.edu.au/login?url= http://library.adelaide.edu.au/cgi-bin/Pwebrecon.cgi?BBID=1331584 / Thesis (Ph.D.) -- University of Adelaide, School of Earth and Environmental Sciences, 2008
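The year-wise k-fold testing described above amounts to leave-one-year-out cross-validation. The sketch below illustrates the scheme with a toy stand-in model (predicting Chl-a as the training mean, scored by RMSE); it is not the thesis's ANN or HEA code, and the data are invented.

```python
# Leave-one-year-out cross-validation: each year of the water-quality
# record serves exactly once as the held-out test set.
from statistics import mean

def leave_one_year_out(data_by_year, train_fn, score_fn):
    scores = {}
    for test_year, test_set in data_by_year.items():
        train_set = [row for year, rows in data_by_year.items()
                     if year != test_year for row in rows]
        model = train_fn(train_set)
        scores[test_year] = score_fn(model, test_set)
    return scores

# Toy stand-ins: predict Chl-a as the training mean, score by RMSE.
train_fn = lambda rows: mean(r["chl_a"] for r in rows)
score_fn = lambda m, rows: mean((r["chl_a"] - m) ** 2 for r in rows) ** 0.5

data_by_year = {
    2004: [{"chl_a": 3.1}, {"chl_a": 4.0}],
    2005: [{"chl_a": 9.5}, {"chl_a": 8.2}],
    2006: [{"chl_a": 5.0}, {"chl_a": 4.4}],
}
print(leave_one_year_out(data_by_year, train_fn, score_fn))
```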
16

Datová kvalita a nástroje pro její řízení / Data Quality And Tools For Its Management

Tezzelová, Jana January 2009 (has links)
This diploma thesis deals with data quality, with emphasis on issues of management and on the tools developed to solve data quality problems. The goal of this work is to summarize knowledge about data quality, including its evaluation and management, a description of the key problems found in data, and possible solutions. A further aim is to analyse the market for software tools that support and manage data quality, and in particular to compare the functionality and capabilities of several of those tools. The work is split into two consecutive parts. The first, theoretical part introduces the problems of data quality and, above all, data quality management, including the identification of the main steps of successful management. The second, practical part focuses on the market for data quality tools: its characteristics, segmentation, evolution, current state and expected trends. An important section of this part is a practical comparison of the features of several data quality tools and an evaluation of working with them. This work aims to be useful to anyone interested in data quality, especially its management and supporting technology. Thanks to its focus on the data quality tools market and on tool comparison, it can also serve as a guide for companies currently choosing a suitable tool for introducing data quality. Given this focus, readers are expected to have at least a basic orientation in Business Intelligence.
17

Řešení Business Intelligence / Business Intelligence Solutions

Dzimko, Miroslav January 2017 (has links)
This diploma thesis presents an evaluation of the current state of a company's system, identifying critical areas and areas suitable for improvement. Based on theoretical knowledge and the results of the analysis, a commercial Business Intelligence solution is proposed to enhance the quality and efficiency of the company's decision-support system, together with the introduction of an advanced quality-culture system. The thesis reveals critical points in the corporate environment and opens up space for designing improvements to the system.
18

Datová kvalita, integrita a konsolidace dat v BI / Data quality, data integrity and consolidation of data in BI

Dražil, Michal January 2008 (has links)
This thesis deals with the areas of enterprise data quality, data integrity and data consolidation from the perspective of Business Intelligence (BI), which is currently experiencing significant growth. The aim of this thesis is to provide a comprehensive view of data quality in terms of BI, to analyze problems in the area of data quality control, and to propose options for addressing them. Moreover, the thesis aims to analyze and assess the features of specialized software tools for data quality. Last but not least, it aims to identify the critical success factors in the field of data quality in CRM and BI projects. The thesis is divided into two parts. The first (theoretical) part deals with data quality, data integrity and consolidation of data in relation to BI, identifying the key issues related to these areas. The second (practical) part first surveys the features of software tools for data quality, offering a fundamental summary and a breakdown of the tools. This part also provides a basic comparison of a few selected software products specializing in corporate data quality assurance. The practical part then describes how data quality was addressed within a specific BI/CRM project conducted by Clever Decision Ltd. This thesis is intended primarily for BI and data quality experts, as well as others interested in these disciplines. Its main contribution is a comprehensive view not only of data quality itself but also of the issues directly related to corporate data quality assurance. The thesis can serve as guidance for one of the first implementation phases of BI projects, which deals with data integration, data consolidation and solving problems in the area of data quality.
19

Ensemble Stream Model for Data-Cleaning in Sensor Networks

Iyer, Vasanth 16 October 2013 (has links)
Ensemble stream modeling and data cleaning are sensor information processing systems with different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events so as to eliminate uncorrelated noise and choose the most likely model without overfitting, thus obtaining higher model confidence. Higher-quality streams can be realized by combining many short streams into an ensemble of the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for events such as bush or natural forest fires, we take the burnt area (BA*), the sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing, for two reasons: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from the sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false-alarm rate of fire events. The multi-target data-cleaning trees use the information purity of the target leaf nodes to learn higher-order features. A sensitive variance measure such as the F-test is applied at each node's split to select the best attribute. The ensemble stream model approach proved to improve when complicated features were used with a simpler tree classifier. The ensemble framework for data cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensor data led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
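The F-measure used above to rate fire-event detection is the harmonic mean of precision and recall; a small worked example, with invented event labels:

```python
# Worked example of the F-measure (here F1, the harmonic mean of
# precision and recall). Labels are invented for illustration.
def f_measure(y_true, y_pred, beta=1.0):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

y_true = [1, 1, 0, 0, 1, 0]   # actual fire events
y_pred = [1, 0, 0, 1, 1, 0]   # detector output: one miss, one false alarm
print(round(f_measure(y_true, y_pred), 3))   # 0.667
```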
20

Sistema de comunicação de internação hospitalar: avaliação da qualidade das informações / Hospital admission communication system: assessing the quality of the information

Benevides, Plauto Ricardo de Sá e January 2009 (has links)
This study explores the Hospital Admission Communication System (Comunicação de Internação Hospitalar, CIH), proposing criteria for assessing data quality in order to flag its limiting aspects and to contribute to improving the data quality of this important information source. The work also aims to encourage the use of the CIH, highlighting its potential for use in epidemiology and in health management in Brazil. It is an ecological study of a national-level database covering 2007 and 2008. The methodology adopted for assessing data quality was based on the experiences of the Canadian Institute for Health Information and the Inter-Agency Network for Health Information (Ripsa), adapting their concepts and recommendations to the needs inherent to the CIH. The study showed that, in the period analyzed, the CIH database is fragile in its data collection but has good completeness and shows consistency of information across the historical series.
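The completeness dimension assessed in the study can be measured as the share of non-missing values per field; the sketch below uses invented field names, not the actual CIH schema.

```python
# Sketch of a completeness metric for hospital admission records.
# Field names are invented stand-ins for the CIH schema.
def completeness(records, fields):
    """Share of non-missing values per field across all records."""
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in fields
    }

records = [
    {"diagnosis": "J18.9", "admission_date": "2007-03-02", "municipality": ""},
    {"diagnosis": "I21.0", "admission_date": "2008-11-15", "municipality": "Rio"},
]
print(completeness(records, ["diagnosis", "admission_date", "municipality"]))
# {'diagnosis': 1.0, 'admission_date': 1.0, 'municipality': 0.5}
```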
