261

Managing and Consuming Completeness Information for RDF Data Sources

Darari, Fariz 20 June 2017
The ever-increasing amount of Semantic Web data gives rise to the question: How complete is the data? Though data on the Semantic Web is generally incomplete, many parts of it are indeed complete, such as the children of Barack Obama and the crew of Apollo 11. This thesis studies how to manage and consume completeness information about Semantic Web data. In particular, we first discuss how completeness information can guarantee the completeness of query answering. Next, we propose optimization techniques for completeness reasoning and conduct experimental evaluations to show the feasibility of our approaches. We also provide a technique to check the soundness of queries with negation via reduction to query completeness checking. We further enrich completeness information with timestamps, enabling query answers to be checked for the time up to which they are complete. We then introduce two demonstrators, CORNER and COOL-WD, to show how our completeness framework can be realized. Finally, we investigate an automated method to generate completeness statements from text on the Web via relation cardinality extraction.
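The central mechanism here — explicit completeness statements that let a query engine certify its answers as complete over otherwise incomplete data — can be illustrated with a toy sketch. The following Python fragment is a simplified illustration, not the thesis's actual framework: the triples, the (subject, predicate) statement format, and the coverage check are all invented for the example.

```python
# Toy model of completeness statements over an incomplete RDF-style graph.
# The triples and the (subject, predicate) statement format are invented
# simplifications of the framework described in the thesis.

available = {  # the triples we actually have
    ("BarackObama", "hasChild", "Malia"),
    ("BarackObama", "hasChild", "Sasha"),
    ("Apollo11", "crew", "Armstrong"),
}

# Completeness statements: these (subject, predicate) slices hold ALL facts.
complete_slices = {("BarackObama", "hasChild")}

def query(subject, predicate):
    """Return all objects for (subject, predicate) plus a flag telling
    whether a completeness statement guarantees the answer is complete."""
    answers = {o for (s, p, o) in available if s == subject and p == predicate}
    guaranteed = (subject, predicate) in complete_slices
    return answers, guaranteed

print(query("BarackObama", "hasChild"))  # ({'Malia', 'Sasha'}, True)
print(query("Apollo11", "crew"))         # ({'Armstrong'}, False): crew may be missing
```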
262

Datenqualität in Sensordatenströmen / Data Quality in Sensor Data Streams

Klein, Anja 19 June 2009
The steady advance of intelligent sensor systems enables the automation and improvement of complex process and business decisions in a wide range of application scenarios. Sensors can be used, for example, to determine optimal maintenance dates or to control production lines. A fundamental problem here is sensor data quality, which is limited by environmental influences and sensor failures. The goal of this thesis is to develop a data quality model that provides applications and data consumers with quality information for a comprehensive assessment of uncertain sensor data. In addition to data structures for efficient data quality management in data streams and databases, a comprehensive data quality algebra for computing the quality of data processing results is presented. Furthermore, methods for data quality improvement are developed that are specifically tailored to the requirements of sensor data processing. The thesis is rounded out by approaches for user-friendly data quality querying and visualization.
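To make the notion of a data quality algebra concrete, the sketch below propagates a quality annotation through a single aggregation step. It is a minimal illustration under the assumption of one scalar quality score per reading and a simple averaging rule; the model and algebra developed in the thesis are considerably richer.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    value: float    # measured sensor value
    quality: float  # quality score in [0, 1], e.g. derived from sensor health

def averaged(readings):
    """Aggregate a window of readings and propagate quality: here the
    result's quality is the mean of the input qualities -- one possible
    algebra rule among many."""
    n = len(readings)
    value = sum(r.value for r in readings) / n
    quality = sum(r.quality for r in readings) / n
    return Reading(value, quality)

# A window with one low-quality outlier drags down the result's quality.
window = [Reading(20.1, 0.9), Reading(20.4, 0.8), Reading(35.0, 0.2)]
print(averaged(window))  # Reading(value=25.16..., quality=0.63...)
```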
263

Data Quality Evaluation and Improvement for Machine Learning

Chen, Haihua 05 1900
This research focuses on data-centric AI, with a specific concentration on data quality evaluation and improvement for machine learning. We present a practical framework for data quality evaluation and improvement, using the legal domain as a case study, and build a corpus for legal argument mining. We first created an initial corpus of 4,937 manually labeled instances. We define five data quality evaluation dimensions: comprehensiveness, correctness, variety, class imbalance, and duplication, and conduct a quantitative evaluation on these dimensions for the legal dataset and for two existing datasets in the medical domain for medical concept normalization. The first group of experiments showed that class imbalance and insufficient training data are the two major data quality issues that negatively impacted the quality of the system built on the legal corpus. The second group of experiments showed that overlap between the test and training datasets, which we define as "duplication," is the major data quality issue for the two medical corpora. We explore several widely used machine learning methods for data quality improvement. Compared to pseudo-labeling, co-training, and expectation-maximization (EM), generative adversarial networks (GANs) are more effective for automated data augmentation, especially when a small portion of labeled data and a large amount of unlabeled data are available. The data validation process, the performance improvement strategy, and the machine learning framework for data evaluation and improvement discussed in this dissertation can be used by machine learning researchers and practitioners to build high-performance machine learning systems. All materials, including the data, code, and results, will be released at: https://github.com/haihua0913/dissertation-dqei.
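The "duplication" dimension defined above — overlap between test and training data — is simple to quantify. The snippet below is a minimal sketch assuming exact text matching; real corpora would likely require normalization or near-duplicate detection, and the dissertation's actual measurement may differ.

```python
def duplication_rate(train_texts, test_texts):
    """Fraction of test instances that also occur in the training set
    (exact-match; a simplification of the dissertation's analysis)."""
    train_set = set(train_texts)
    overlap = sum(1 for t in test_texts if t in train_set)
    return overlap / len(test_texts)

train = ["fever and cough", "knee pain", "headache"]
test = ["knee pain", "blurred vision"]
print(duplication_rate(train, test))  # 0.5
```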
264

The Use of Big Data in Process Management: A Literature Study and Survey Investigation

Ephraim, Ekow Esson, Sehic, Sanel January 2021
In recent years there has been increasing interest in understanding how organizations can use big data in their process management to create value and improve their processes. New challenges for process management have arisen from increasing competition and from the complexity of the large data sets produced by technological advancement. Such data sets are described by scholars as big data: data so complex that traditional data analysis software is not sufficient to manage or analyze them. Because handling such volumes of data is complex, practical examples of organizations that have incorporated big data into their process management are scarce. To fill this gap and contribute to advancements in the field, this thesis explores how big data can contribute to improved process management. The aim was to investigate how, why, and to what extent big data is used in process management, and to outline the purposes and challenges of using it; this was accomplished through a literature review and a survey, respectively. From the extensive literature review, an analysis matrix of how big data is used in process management is provided through the intersections between big data and process management dimensions. The matrix shows that most instances of big data use in process management fall within process analysis & improvement and process control & agility; in other words, organizations used big data in specific process management activities rather than holistically. Furthermore, the limited findings from the survey indicate that the main challenge of big data use in Swedish organizations is the complexity of handling data, and the main purpose is making statistically better decisions.
265

Amélioration de la qualité des données : correction sémantique des anomalies inter-colonnes / Improved data quality : correction of semantic inter-column anomalies

Zaidi, Houda 01 February 2017
Data quality represents a major challenge because the cost of anomalies can be very high, especially for large databases in enterprises that need to exchange information between systems and integrate large amounts of data. Decision making based on erroneous data adversely affects an organization's activities, and as the quantity of data keeps growing, so does the risk of anomalies. The automatic correction of these anomalies is therefore a topic of growing importance in both industry and academia. This thesis addresses the problem of improving data quality in large data sets. Our approach helps the user better understand the schemas of the data being handled and define the corrective actions to perform on them. We consider anomalies within a single column as well as anomalies between columns related to functional dependencies, and we propose several means of remedying these defects, paying particular attention to the performance of the resulting treatments. In particular, we aim to improve data quality by processing null values and the semantic dependencies between columns.
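A minimal illustration of the inter-column anomalies in question: if a functional dependency such as zip_code → city is expected to hold, groups of rows that map the same zip code to different cities are candidate semantic errors. The column names, data, and detection rule below are invented for illustration and are not the thesis's algorithm.

```python
from collections import defaultdict

rows = [
    {"zip_code": "75001", "city": "Paris"},
    {"zip_code": "75001", "city": "Paris"},
    {"zip_code": "75001", "city": "Lyon"},   # violates zip_code -> city
    {"zip_code": "69001", "city": "Lyon"},
]

def fd_violations(rows, lhs, rhs):
    """Group rows by the left-hand side of the dependency and flag
    groups whose right-hand-side values disagree."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[lhs]].add(row[rhs])
    return {key: vals for key, vals in groups.items() if len(vals) > 1}

print(fd_violations(rows, "zip_code", "city"))  # {'75001': {'Paris', 'Lyon'}}
```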
266

Cancer reporting: timeliness analysis and process reengineering

Jabour, Abdulrahman M. 09 November 2015
Indiana University-Purdue University Indianapolis (IUPUI) / Introduction: Cancer registries collect tumor-related data to monitor incidence rates and support population-based research. A common concern with using population-based registry data for research is reporting timeliness. Data timeliness has been recognized as an important data characteristic by both the Centers for Disease Control and Prevention (CDC) and the Institute of Medicine (IOM), yet few recent studies in the United States (U.S.) have systematically measured it. The goal of this research is to evaluate the quality of cancer data and examine methods by which the reporting process can be improved. The study aims are: (1) evaluate the timeliness of cancer cases at the Indiana State Department of Health (ISDH) Cancer Registry, (2) identify the perceived barriers and facilitators to timely reporting, and (3) reengineer the current reporting process to improve turnaround time. Method: For Aim 1, using the ISDH dataset from 2000 to 2009, we evaluated reporting timeliness and the subtasks within the process cycle. For Aim 2, certified cancer registrars reporting to ISDH were invited to a semi-structured interview; the interviews were recorded and qualitatively analyzed. For Aim 3, we designed a reengineered workflow to reduce reporting time and tested it using simulation. Result: The results show variation in the mean reporting time, which ranged from 426 days in 2003 to 252 days in 2009. The barriers identified were categorized into six themes; the most common barrier was accessing medical records at external facilities. We also found that cases reside for a few months in the local hospital database while waiting for treatment data to become available. The recommended workflow focused on leveraging a health information exchange for data access and adding a notification system to inform registrars when new treatments are available.
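The timeliness measured in Aim 1 reduces to the elapsed time between a reportable event and its arrival at the registry, averaged over cases. A sketch with invented dates and field names:

```python
from datetime import date

# Hypothetical cases: (diagnosis date, date the report reached the registry).
cases = [
    (date(2009, 1, 5), date(2009, 9, 1)),
    (date(2009, 2, 10), date(2009, 10, 30)),
    (date(2009, 3, 20), date(2009, 11, 15)),
]

delays = [(reported - diagnosed).days for diagnosed, reported in cases]
print(sum(delays) / len(delays))  # mean reporting time in days
```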
267

Examining Opioid-related Overdose Events in Dayton, OH using Police, Emergency Medical Services and Coroner’s Data

Pan, Yuhan 06 October 2020
No description available.
268

THE PERCEIVED AND REAL VALUE OF HEALTH INFORMATION EXCHANGE IN PUBLIC HEALTH SURVEILLANCE

Dixon, Brian Edward 22 August 2011
Indiana University-Purdue University Indianapolis (IUPUI) / Public health agencies protect the health and safety of populations. A key function of public health agencies is surveillance: the ongoing, systematic collection, analysis, interpretation, and dissemination of data about health-related events. Recent public health events, such as the H1N1 outbreak, have triggered increased funding for, and attention towards, improving and sustaining public health agencies’ capacity for surveillance activities. For example, provisions in the final U.S. Centers for Medicare and Medicaid Services (CMS) “meaningful use” criteria ask that physicians and hospitals report surveillance data to public health agencies using electronic laboratory reporting (ELR) and syndromic surveillance functionalities within electronic health record (EHR) systems. Health information exchange (HIE), the organized exchange of clinical and financial health data among a network of trusted entities, may be a path towards achieving meaningful use and enhancing the nation’s public health surveillance infrastructure. Yet evidence on the value of HIE, especially in the context of public health surveillance, is sparse. This research explores the value of HIE to the process of public health surveillance. Specifically, the study describes the real and perceived completeness and usefulness of HIE in public health surveillance activities. To explore the real value of HIE, the study examined ELR data from two states, comparing raw, unedited data sent from hospitals and laboratories with data enhanced by an HIE. To explore the perceived value of HIE, the study examined public health, infection control, and HIE professionals’ perceptions of public health surveillance data and information flows, comparing traditional flows to HIE-enabled ones. Together these methods, along with the existing literature, triangulate the value that HIE does and can provide to public health surveillance processes. The study further describes remaining gaps that future research and development projects should explore. The data collected in the study show that public health surveillance activities vary dramatically, encompassing a wide range of paper and electronic methods for receiving and analyzing population health trends. Few public health agencies currently use HIE-enabled processes for surveillance, relying instead on direct reporting of information from hospitals, physicians, and laboratories. HIE is generally perceived favorably among public health and infection control professionals, many of whom feel that HIE can improve surveillance methods and population health; however, human and financial resource constraints prevent additional public health agencies from participating in burgeoning HIE initiatives. For those agencies that do participate, real value is being added: HIEs are improving the completeness and semantic interoperability of ELR messages sent from clinical information systems. New investments, policies, and approaches will be necessary to increase public health utilization of HIEs while improving HIEs’ capacity to deliver greater value to public health surveillance processes.
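The “real value” analysis — comparing raw ELR messages with HIE-enhanced ones — amounts to measuring per-field completeness before and after enhancement. The message structure and field names below are invented for illustration; the study's actual ELR fields and metrics may differ.

```python
def field_completeness(messages, fields):
    """Per-field fraction of messages carrying a non-empty value."""
    return {
        f: sum(1 for m in messages if m.get(f)) / len(messages)
        for f in fields
    }

fields = ["patient_name", "test_result", "provider_phone"]
raw = [  # as sent by hospitals and laboratories
    {"patient_name": "A", "test_result": "positive", "provider_phone": ""},
    {"patient_name": "B", "test_result": "", "provider_phone": ""},
]
enhanced = [  # the same messages after HIE enrichment
    {"patient_name": "A", "test_result": "positive", "provider_phone": "555-0100"},
    {"patient_name": "B", "test_result": "negative", "provider_phone": "555-0101"},
]
print(field_completeness(raw, fields))       # provider_phone: 0.0, test_result: 0.5
print(field_completeness(enhanced, fields))  # all fields: 1.0
```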
269

Evaluating Data Quality in a Data Warehouse Environment / Utvärdering av datakvalitet i ett datalager

Redgert, Rebecca January 2017
The amount of data accumulated by organizations has grown significantly during the last couple of years, increasing the importance of data quality. Ensuring data quality for large amounts of data is a complicated task, but crucial to subsequent analysis. This study investigates how to maintain and improve data quality in a data warehouse. A case study of the errors in a data warehouse was conducted at the Swedish company Kaplan and resulted in guiding principles on how to improve data quality. The investigation was done by manually comparing data from the source systems to the data integrated in the data warehouse, and by applying a quality framework based on semiotic theory to identify errors. The three main guiding principles given are (1) to implement a standardized format for the source data, (2) to implement a check prior to integration where the source data are reviewed and corrected if necessary, and (3) to create and implement specific database integrity rules. Further work is encouraged on establishing guidance for applying the framework's manual data comparison approach and on quality assurance of source data.
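The manual source-to-warehouse comparison described above can be partly mechanized: match records on a key and report attributes that disagree. A minimal sketch with invented tables; the mismatch shown ('NO' vs 'NOR') is the kind of encoding inconsistency that guiding principle (1) targets.

```python
source = {1: {"name": "Acme", "country": "SE"},
          2: {"name": "Beta", "country": "NO"}}
warehouse = {1: {"name": "Acme", "country": "SE"},
             2: {"name": "Beta", "country": "NOR"}}  # ISO-3 vs ISO-2 code

def compare(source, warehouse):
    """Yield (key, field, source value, warehouse value) for each mismatch."""
    for key, src_row in source.items():
        wh_row = warehouse.get(key, {})
        for field, src_val in src_row.items():
            if wh_row.get(field) != src_val:
                yield key, field, src_val, wh_row.get(field)

for mismatch in compare(source, warehouse):
    print(mismatch)  # (2, 'country', 'NO', 'NOR')
```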
270

Mining Vehicle Classifications from Archived Loop Detector Data

Huang, Bo January 2014
No description available.
