  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Data Driven Framework for Prognostics

January 2010 (has links)
abstract: Prognostics and health management (PHM) is a method that permits the reliability of a system to be evaluated under its actual application conditions. This work involved developing a robust system to determine the advent of failure. Using the data from the PHM experiment, a model was developed to estimate prognostic features and build a condition-based system from the measured prognostics. To enable prognostics, a framework was developed to extract the load parameters required for damage assessment from irregular time-load data. As part of the methodology, a database engine was built to maintain and monitor the experimental data. The framework significantly reduces the time-load data without compromising the features that are essential for damage estimation. A failure-precursor-based approach was used for remaining-life prognostics. The developed system has a throughput of 4 MB/sec, with 90% of operations completing within a latency of 100 msec. This work hence provides a survey of prognostic frameworks, a prognostics framework architecture, and a design approach together with a robust system implementation. / Dissertation/Thesis / M.S. Computer Science 2010
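The abstract does not describe the reduction step in detail; as a rough illustration of the general idea (not the thesis's implementation), an irregular time-load series can be reduced to its turning points, which are the features typically needed for later damage estimation (e.g., rainflow cycle counting). All names below are hypothetical.

```python
# Hypothetical sketch: reduce an irregular time-load series to its turning
# points (local peaks and valleys), the information typically needed for
# downstream damage estimation. Illustration only, not the thesis's code.

def reduce_time_load(samples):
    """samples: list of (timestamp, load) tuples, irregularly spaced in time."""
    if len(samples) < 3:
        return list(samples)
    reduced = [samples[0]]                      # always keep the first point
    for prev, curr, nxt in zip(samples, samples[1:], samples[2:]):
        before = curr[1] - prev[1]
        after = nxt[1] - curr[1]
        if before * after < 0:                  # slope sign change -> peak or valley
            reduced.append(curr)
    reduced.append(samples[-1])                 # always keep the last point
    return reduced

if __name__ == "__main__":
    series = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0), (1.5, 1.0), (2.0, 0.5), (2.5, 2.0)]
    print(reduce_time_load(series))  # [(0.0, 1.0), (1.0, 3.0), (2.0, 0.5), (2.5, 2.0)]
```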
22

Master data management maturity model for the microfinance sector in Peru

Vásquez Zúñiga, Daniel, Kukurelo Cruz, Romina, Raymundo Ibañez, Carlos, Dominguez, Francisco, Moguerza, Javier January 2018 (has links)
The full text of this work is not available in the Repositorio Académico UPC due to restrictions imposed by the publisher. / The microfinance sector plays a strategic role, since it facilitates the integration of all social classes into sustained economic growth. At the same time, the exponential growth of data resulting from the transactions and operations these companies carry out on a daily basis has become imminent. Appropriate management of this data is therefore necessary; otherwise, the lack of valuable, high-quality information for decision-making and process improvement results in a competitive disadvantage. Master Data Management (MDM) offers a new approach to data management, reducing the gap between the business perspective and the technology perspective. In this regard, it is important that the organization has the ability to implement a data management model for Master Data Management. This paper proposes a Master Data Management maturity model for the microfinance sector, which frames a series of formal requirements and criteria that provide an objective diagnosis, with the aim of improving processes until entities reach the desired maturity levels. The model was implemented based on information from Peruvian microfinance organizations. Finally, after validation of the proposed model, it was shown to serve as a means of identifying the maturity level and supporting the success of Master Data Management initiatives. / Peer reviewed
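As a loose illustration of how such a maturity model can be operationalized (the paper's actual domains, criteria, and scoring rules are not reproduced here), criteria scores can be rolled up into per-domain levels and an overall maturity level:

```python
# Illustrative sketch only: hypothetical domains and criteria scores (1-5).
# Each domain is scored from its criteria, and the overall maturity is the
# lowest domain score, a common staged-model convention.

DOMAIN_CRITERIA = {
    "data_governance":   [3, 4, 3],
    "data_quality":      [2, 3, 2],
    "data_architecture": [4, 4, 5],
}

def domain_level(scores):
    """Round the average criteria score down to a whole maturity level."""
    return int(sum(scores) / len(scores))

def assess(criteria_by_domain):
    levels = {d: domain_level(s) for d, s in criteria_by_domain.items()}
    overall = min(levels.values())  # the weakest domain caps overall maturity
    return levels, overall

if __name__ == "__main__":
    levels, overall = assess(DOMAIN_CRITERIA)
    print(levels)                                   # per-domain maturity levels
    print("overall maturity level:", overall)       # 2 for the sample scores
```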
23

Data Quality Metrics

Sýkorová, Veronika January 2008 (has links)
The aim of the thesis is to prove the measurability of Data Quality, which is a relatively subjective measure and thus difficult to measure. In doing this, various aspects of measuring the quality of data are analyzed, and a Complex Data Quality Monitoring System is introduced with the aim of providing a concept for measuring/monitoring the overall Data Quality in an organization. The system is built on a metrics hierarchy decomposed into particular detailed metrics, dimensions enabling multidimensional analyses of the metrics, and processes being measured by the metrics. The first part of the thesis (Chapter 2 and Chapter 3) is focused on Data Quality itself, i.e. it provides various definitions of Data Quality, gives reasoning for the importance of Data Quality in a company, and presents some of the most common tools and solutions that target managing Data Quality in an organization. The second part of the thesis (Chapter 4 and Chapter 5) builds on the previous part and leads into measuring Data Quality using metrics, i.e. it contains the definition and purpose of Data Quality Metrics, places them into the multidimensional context (dimensions, hierarchies) and states five possible decompositions of Data Quality metrics into detail. The third part of the thesis (Chapter 6) contains the proposed Complex Data Quality Monitoring System, including a description of Data Quality Management related dimensions and processes and, most importantly, a detailed definition of the bottom-level metrics used for calculation of the overall Data Quality.
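A minimal sketch of the rollup idea described here, assuming hypothetical dimensions, weights, and bottom-level metrics rather than the thesis's actual definitions:

```python
# Minimal sketch (not the thesis's metric definitions): bottom-level data
# quality metrics normalized to [0, 1] are rolled up through a weighted
# hierarchy into dimension scores and one overall Data Quality score.

METRIC_HIERARCHY = {                       # hypothetical dimensions and weights
    "completeness": {"weight": 0.40,
                     "metrics": {"non_null_ratio": 0.98, "mandatory_fields_filled": 0.91}},
    "accuracy":     {"weight": 0.35,
                     "metrics": {"values_within_domain": 0.95, "reference_match_rate": 0.88}},
    "timeliness":   {"weight": 0.25,
                     "metrics": {"records_updated_on_time": 0.80}},
}

def dimension_score(metrics):
    """Unweighted mean of the bottom-level metrics of one dimension."""
    return sum(metrics.values()) / len(metrics)

def overall_data_quality(hierarchy):
    return sum(node["weight"] * dimension_score(node["metrics"])
               for node in hierarchy.values())

if __name__ == "__main__":
    for name, node in METRIC_HIERARCHY.items():
        print(name, round(dimension_score(node["metrics"]), 3))
    print("overall DQ:", round(overall_data_quality(METRIC_HIERARCHY), 3))
```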
24

Towards a Database System for Large-scale Analytics on Strings

Sahli, Majed 23 July 2015 (has links)
Recent technological advances are causing an explosion in the production of sequential data. Biological sequences, web logs and time series are represented as strings. Currently, strings are stored, managed and queried in an ad-hoc fashion because they lack a standardized data model and query language. String queries are computationally demanding, especially when strings are long and numerous. Existing approaches cannot handle the growing number of strings produced by environmental, healthcare, bioinformatic, and space applications. There is a trade-off between performing analytics efficiently and scaling to thousands of cores to finish in reasonable times. In this thesis, we introduce a data model that unifies the input and output representations of core string operations. We define a declarative query language for strings where operators can be pipelined to form complex queries. A rich set of core string operators is described to support string analytics. We then demonstrate a database system for string analytics based on our model and query language. In particular, we propose the use of a novel data structure augmented by efficient parallel computation to strike a balance between preprocessing overheads and query execution times. Next, we delve into repeated motifs extraction as a core string operation for large-scale string analytics. Motifs are frequent patterns used, for example, to identify biological functionality, periodic trends, or malicious activities. Statistical approaches are fast but inexact while combinatorial methods are sound but slow. We introduce ACME, a combinatorial repeated motifs extractor. We study the spatial and temporal locality of motif extraction and devise a cache-aware search space traversal technique. ACME is the only method that scales to gigabyte-long strings, handles large alphabets, and supports interesting motif types with minimal overhead. While ACME is cache-efficient, it is limited by being serial. We devise a lightweight parallel space traversal technique, called FAST, that enables ACME to scale to thousands of cores. High degree of concurrency is achieved by partitioning the search space horizontally and balancing the workload among cores with minimal communication overhead. Consequently, complex queries are solved in minutes instead of days. ACME is a versatile system that runs on workstations, clusters, and supercomputers. It is the first to utilize a supercomputer and scale to 16 thousand CPUs. Merely using more cores does not guarantee efficiency, because of the related overheads. To this end, we introduce an automatic tuning mechanism that suggests the appropriate number of cores to meet user constraints in terms of runtime while minimizing the financial cost of cloud resources. Particularly, we study workload frequency distributions then build a model that finds the best problem decomposition and estimates serial and parallel runtimes. Finally, we generalize our automatic tuning method as a general method, called APlug. APlug can be used in other applications and we integrate it with systems for molecular docking and multiple sequence alignment.
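As a hedged illustration of the automatic tuning idea (not APlug's actual model), one can pick the smallest core count whose estimated parallel runtime, computed from per-task cost estimates with a greedy longest-processing-time assignment, satisfies the user's deadline:

```python
# Illustrative sketch (not APlug itself): given estimated per-task costs from
# a decomposed search space, choose the smallest number of cores whose
# estimated makespan meets a user deadline, thereby minimizing rented cores.
import heapq

def estimated_makespan(task_costs, cores):
    """Greedy longest-processing-time estimate of runtime on `cores` workers."""
    loads = [0.0] * cores
    heapq.heapify(loads)
    for cost in sorted(task_costs, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + cost)
    return max(loads)

def suggest_cores(task_costs, deadline_sec, max_cores=16000):
    for cores in range(1, max_cores + 1):
        if estimated_makespan(task_costs, cores) <= deadline_sec:
            return cores
    return max_cores                      # deadline not reachable; use the cap

if __name__ == "__main__":
    costs = [12.0, 7.5, 7.5, 3.0, 3.0, 2.0, 1.0, 1.0]   # hypothetical task estimates
    print(suggest_cores(costs, deadline_sec=15.0))       # -> 3 cores for this workload
```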
25

Engineering a Software Environment for Research Data Management of Microscopy Image Data in a Core Facility

Kunis, Susanne 30 May 2022 (has links)
This thesis deals with concepts and solutions for the everyday management of image data from microscopy in scientific work. The focus of the requirements formulated so far has been on published data, which represent only a small subset of the data generated in the scientific process. Increasingly, everyday research data are moving into the focus of the principles for research data management that were formulated early on (the FAIR principles). The adequate management of these mostly multimodal data is a real challenge in terms of their heterogeneity and scope. There is a lack of standardised, established workflows, and the software solutions available so far do not adequately reflect the special requirements of this area. However, the success of any data management process depends heavily on its degree of integration into the daily work routine: data management must, as far as possible, fit seamlessly into it. Microscopy data in the scientific process are embedded in a process chain that consists of preparatory laboratory work and the analytical evaluation of the microscopy data. In terms of volume, the image data often form the largest part of the data generated within this entire research process. In this thesis, we focus on concepts and techniques related to the handling and description of these image data and address the necessary basics. The aim is to improve the embedding of the existing data management solution for image data (OMERO) into everyday scientific work. For this purpose, two independent software extensions for OMERO were implemented within the framework of this thesis: OpenLink and MDEmic. OpenLink simplifies access to the data stored in the integrated repository in order to feed them into established workflows for further evaluation, and it enables not only internal but also external exchange of data without weakening the advantages of the data repository. The focus of the second implemented software solution, MDEmic, is on capturing relevant metadata for microscopy. The extended metadata collection aims at linking the multimodal data by means of an unambiguous description and the corresponding semantic background. The configurability of MDEmic is designed to address the currently very dynamic development of the underlying concepts and formats. The main goal of MDEmic is to minimise the workload and to automate processes. This provides the scientist with a simple tool for the complex and extensive task of metadata acquisition for microscopy data. With the help of the software, semantic and syntactic standardisation can take place without the scientist having to deal with the technical concepts. The generated metadata descriptions are automatically integrated into the image repository and, at the same time, can be exported by the scientists into the formats needed when publishing the data.
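A hypothetical sketch of the metadata-capture idea, assuming a small controlled vocabulary and a JSON sidecar file; this is not MDEmic's code and makes no claim about the OMERO API:

```python
# Hypothetical illustration: acquisition metadata are collected as key-value
# pairs against a small controlled vocabulary and written to a JSON sidecar
# that a repository importer could pick up later.
import json

CONTROLLED_KEYS = {"microscope", "objective_magnification",
                   "channel", "exposure_ms", "pixel_size_um"}

def capture_metadata(image_file, **fields):
    unknown = set(fields) - CONTROLLED_KEYS
    if unknown:
        raise ValueError(f"keys outside the controlled vocabulary: {unknown}")
    sidecar = image_file + ".metadata.json"
    with open(sidecar, "w", encoding="utf-8") as fh:
        json.dump({"image": image_file, "metadata": fields}, fh, indent=2)
    return sidecar

if __name__ == "__main__":
    print(capture_metadata("sample_001.tif",
                           microscope="confocal-1",
                           objective_magnification=63,
                           channel="GFP",
                           exposure_ms=120,
                           pixel_size_um=0.1))
```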
26

Automating Laboratory Operations by Integrating Laboratory Information Management Systems (LIMS) with Analytical Instruments and Scientific Data Management System (SDMS)

Zhu, Jianyong 06 1900 (has links)
Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Master of Science in the School of Informatics, Indiana University, June 2005 / The large volume of data generated by commercial and research laboratories, along with requirements mandated by regulatory agencies, has forced companies to use laboratory information management systems (LIMS) to improve efficiency in tracking and managing samples and in precisely reporting test results. However, most general-purpose LIMS do not provide an interface to automatically collect data from analytical instruments and store them in a database. A scientific data management system (SDMS) provides a "Print-to-Database" technology, which facilitates the entry of reports generated by instruments directly into the SDMS database as Windows enhanced metafiles, thus minimizing data entry errors. Unfortunately, an SDMS does not allow further analysis to be performed. Many LIMS vendors provide plug-ins for single instruments, but none of them provides a general-purpose interface to extract the data from an SDMS and store them in a LIMS. In this project, a general-purpose middle layer named LabTechie was designed, built and tested for seamless integration between instruments, SDMS and LIMS. The project was conducted at American Institute of Technology (AIT) Laboratories, an analytical laboratory that specializes in trace chemical measurement of biological fluids. Data are generated by 20 analytical instruments, including gas chromatography/mass spectrometry (GC/MS), high-performance liquid chromatography (HPLC), and liquid chromatography/mass spectrometry (LC/MS), and are currently stored in NuGenesis SDMS (Waters, Milford, MA). This approach can be easily expanded to include additional instruments.
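A minimal sketch of such a middle layer, assuming a CSV report format and a SQLite-backed LIMS table; LabTechie's actual interfaces are not reproduced here:

```python
# Hypothetical sketch of the middle-layer idea (not LabTechie itself): parse a
# result report exported by an instrument or SDMS (assumed here to be CSV with
# sample_id, analyte, value columns) and load it into a LIMS results table.
import csv
import sqlite3

def load_report(report_csv, lims_db):
    conn = sqlite3.connect(lims_db)
    conn.execute("""CREATE TABLE IF NOT EXISTS results (
                        sample_id TEXT, analyte TEXT, value REAL,
                        source_file TEXT)""")
    with open(report_csv, newline="", encoding="utf-8") as fh:
        rows = [(r["sample_id"], r["analyte"], float(r["value"]), report_csv)
                for r in csv.DictReader(fh)]
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    conn.close()
    return len(rows)

if __name__ == "__main__":
    # Assumes gcms_run42.csv was exported from an instrument report.
    print(load_report("gcms_run42.csv", "lims.db"), "rows loaded")
```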
27

An Integrated Approach to Improve Data Quality

Al-janabi, Samir 06 1900 (has links)
Thesis / A huge quantity of data is created and saved everyday in databases from different types of data sources, including financial data, web log data, sensor data, and human input. Information technology enables organizations to collect and store large amounts of data in databases. Different organizations worldwide use data to support their activities through various applications. Issues in data quality such as duplicate records, inaccurate data, violations of integrity constraints, and outdated data are common in databases. Thus, data in databases are often unclean. Such issues in data quality might cost billions of dollars annually and might have severe consequences on critical tasks such as analysis, decision making, and planning. Data cleaning processes are required to detect and correct errors in the unclean data. Despite the fact that there are multiple quality issues, current data cleaning techniques generally deal with only one or two aspects of quality. The techniques assume either the availability of master data, or training data, or the involvement of users in data cleaning. For instance, users might manually place confidence scores that represent the correctness of the values of data or they may be consulted about the repairs. In addition, the techniques may depend on high-quality master data or pre-labeled training data to fix errors. However, relying on human effort to correct errors is expensive, and master data or training data are not always available. These factors make it challenging to discover which values have issues, thereby making it difficult to fix the data (e.g., merging several duplicate records into a single representative record). To address these problems in data cleaning, we propose algorithms that integrate multiple data quality issues in the cleaning. In this thesis, we apply this approach in the context of multiple data quality issues where errors in data are introduced from multiple causes. The issues include duplicate records, violations of integrity constraints, inaccurate data, and outdated data. We fix these issues holistically, without a need for human manual interaction, master data, or training data. We propose an algorithm to tackle the problem of data cleaning. We concentrate on issues in data quality including duplicate records, violations of integrity constraints, and inaccurate data. We utilize the embedded density information in data to eliminate duplicates based on data density, where tuples that are close to each other are packed together. Density information enables us to reduce manual user interaction in the deduplication process, and the dependency on master data or training data. To resolve inconsistency in duplicate records, we present a weight model to automatically assign confidence scores that are based on the density of data. We consider the inconsistent data in terms of violations with respect to a set of functional dependencies (FDs). We present a cost model for data repair that is based on the weight model. To resolve inaccurate data in duplicate records, we measure the relatedness of the words of the attributes in the duplicate records based on hierarchical clustering. In the context of integrating the fix of outdated data and inaccurate data in duplicate elimination, we propose an algorithm for data cleaning by introducing techniques based on corroboration, i.e. taking into consideration the trustworthiness of the attribute values. The algorithm integrates data deduplication with data currency and accuracy. 
We utilize the density information embedded inside the tuples to guide the cleaning process in fixing multiple data quality issues. By using density information in corroboration, we reduce the reliance on manual user interaction and the dependency on master data or training data. / Thesis / Doctor of Philosophy (PhD)
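As a rough sketch of the density/support idea (not the thesis's algorithm), near-duplicate records can be grouped by similarity and merged by picking, for each attribute, the value with the highest support within the group, which also yields a confidence score:

```python
# Illustrative sketch: group near-duplicate tuples, score each attribute value
# by how densely it is supported within its group, and build one
# representative record from the highest-confidence values.
from collections import Counter
from difflib import SequenceMatcher

def similar(a, b, threshold=0.85):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def group_duplicates(records, key):
    groups = []
    for rec in records:
        for grp in groups:
            if similar(rec[key], grp[0][key]):
                grp.append(rec)
                break
        else:
            groups.append([rec])
    return groups

def merge(group):
    merged = {}
    for attr in group[0]:
        value, support = Counter(r[attr] for r in group).most_common(1)[0]
        merged[attr] = value
        merged[attr + "_confidence"] = round(support / len(group), 2)
    return merged

if __name__ == "__main__":
    people = [{"name": "Jon Smith",  "city": "Hamilton"},
              {"name": "John Smith", "city": "Hamilton"},
              {"name": "John Smith", "city": "Toronto"},
              {"name": "Mary Jones", "city": "Ottawa"}]
    for grp in group_duplicates(people, key="name"):
        print(merge(grp))
```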
28

Metadata-Driven Management of Scientific Data

Kumar, Aman 08 September 2009 (has links)
No description available.
29

Development of a measurement-based approach for monitoring the changes in an evolving quality management system

Caroli, Vivek 04 May 2010 (has links)
The concept of quality management is operationalized in an organization through a Quality Management System (QMS) - a complex, coordinated set of activities and behaviors aimed at improving the quality of an organization's processes, goods, and services. Like all systems, a QMS must be planned, monitored, improved, and maintained over time to function at its best. For this, measurement is key. The standard of quality management performance developed by Triantis et al. (1991b) is the quality management system definition used in this thesis. The thesis subsequently makes three contributions. First, it provides a methodology for defining generic measures of QMS performance and evolution, and implements this methodology in creating more than 200 prototype measures for 10 out of the 37 component "modules" of a QMS. Second, a methodology is presented for developing a tool to collect the very data called for by the measures. This methodology is implemented, and a prototype questionnaire is developed to collect measurement data for the Vendor/Contractor Relations (VCR) module of a QMS. Third, given the vast amount of data collected with the various questionnaires that needs to be manipulated in order to manage the QMS, it is important to be able to use automation. Therefore, it becomes necessary to logically organize the data. The entity-relationship (E/R) modeling technique is one approach that can be used to achieve this objective. This E/R approach is used to logically organize the data generated by the questionnaire for the VCR module. In so doing, one can assess the potential viability of this data modeling approach and begin laying the foundation for a database that will support the measurement requirements of a QMS. / Master of Science
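A hypothetical sketch of how such an E/R model for the VCR questionnaire data could be realized as relational tables; entity and attribute names are illustrative, not taken from the thesis:

```python
# Hypothetical relational realization of an E/R model for questionnaire data
# feeding QMS measures (illustrative schema only).
import sqlite3

DDL = """
CREATE TABLE vendor (
    vendor_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE questionnaire (
    questionnaire_id INTEGER PRIMARY KEY,
    qms_module       TEXT NOT NULL,          -- e.g. 'Vendor/Contractor Relations'
    administered_on  DATE
);
CREATE TABLE question (
    question_id      INTEGER PRIMARY KEY,
    questionnaire_id INTEGER REFERENCES questionnaire(questionnaire_id),
    measure_name     TEXT NOT NULL,          -- the QMS measure the question feeds
    text             TEXT NOT NULL
);
CREATE TABLE response (                      -- relationship: vendor answers question
    vendor_id   INTEGER REFERENCES vendor(vendor_id),
    question_id INTEGER REFERENCES question(question_id),
    value       REAL,
    PRIMARY KEY (vendor_id, question_id)
);
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    print([row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
```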
30

Utilização de sistemas PDM em ambientes de engenharia simultânea: o caso de uma implantação em uma montadora de veículos pesados. / Use of PDM systems in concurrent engineering environments: a case study of an implementation in a multinational heavy vehicles industry.

Omokawa, Rogerio 21 May 1999 (has links)
Concurrent engineering and product data management (PDM) systems, although a precious aid in allowing companies to face the new survival conditions of the current market, are not very well known. Besides that, few scientific works are available on the implementation of this kind of system and its use for data management in a concurrent engineering environment. The objectives of this work are: to identify, according to the bibliography, the data management needs of a concurrent engineering environment; to compare those needs with the ones found in a real implementation case; to determine which PDM (Product Data Management) system functionalities supply the identified needs; and to characterize a real PDM system implementation project in a concurrent engineering environment.
