161

Information Integration Using a Linked Data Approach

Munir, Jawad January 2015 (has links)
Developing an enterprise product, in our case either an embedded system or a software application, is a complex task that requires model-based approaches and multiple diverse software tools to be linked together in a tool-chain supporting the development process. Each individual tool in the tool-chain maintains only an incomplete picture of the development process, so data integration between these tools is necessary to obtain a unified, consistent view of the whole process. Information integration between the tools is a challenging task because of their heterogeneity. Linked data is a promising approach for tool and data integration, in which tools are integrated at the data level in a tool-chain. Linked data is an architectural style for integration and requires additional definitions and specifications to capture relationships between data; in our case, tool data are described and shared using OSLC specifications. While such an approach has been widely researched for tool integration, none of that work covers using such a distributed approach for lifecycle data integration, management and search. In this thesis work, we investigated the use of a linked data approach for lifecycle data integration. The outcome is a prototype tool-chain architecture for lifecycle data integration that can support data-intensive queries requiring information from various data sources in the tool-chain. The report takes Scania's data integration needs as a case study and presents various insights gained during the prototype implementation, as well as the key benefits of using a linked data approach for data integration in an enterprise environment. Based on encouraging test results for our prototype, the architecture presented in this report can be seen as a probable solution to lifecycle data integration for the OSLC tool-chain.
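To make the linked data idea concrete, the following is a minimal sketch (not the thesis prototype) of federating two OSLC-style RDF sources with rdflib and answering a cross-tool query; the resource URIs, property names and Turtle snippets are invented for illustration.

```python
from rdflib import Graph

# Hypothetical lifecycle data exposed by two tools as RDF (Turtle).
REQ_TTL = """
@prefix ex: <http://example.org/ns#> .
ex:req-42 a ex:Requirement ; ex:title "Brake latency < 50 ms" .
"""
TEST_TTL = """
@prefix ex: <http://example.org/ns#> .
ex:test-7 a ex:TestCase ; ex:validates ex:req-42 ; ex:verdict "passed" .
"""

g = Graph()                      # one merged graph over both sources
g.parse(data=REQ_TTL, format="turtle")
g.parse(data=TEST_TTL, format="turtle")

# A cross-tool, data-intensive query: which requirements are covered by a passing test?
results = g.query("""
    PREFIX ex: <http://example.org/ns#>
    SELECT ?req ?title ?test WHERE {
        ?req a ex:Requirement ; ex:title ?title .
        ?test ex:validates ?req ; ex:verdict "passed" .
    }
""")
for req, title, test in results:
    print(f"{req} ({title}) is validated by {test}")
```

In a full tool-chain the two graphs would be fetched from the tools' OSLC endpoints rather than embedded as strings, but the query pattern stays the same.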
162

Computational Inference of Genome-Wide Protein-DNA Interactions Using High-Throughput Genomic Data

Zhong, Jianling January 2015 (has links)
<p>Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape. </p><p>We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations. </p><p>We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites. </p><p>Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets. 
</p><p>This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.</p> / Dissertation
163

Open(Geo-)Data - ein Katalysator für die Digitalisierung in der Landwirtschaft? / Open (geo-)data - a catalyst for digitalisation in agriculture?

Nölle, Olaf 15 November 2016 (has links) (PDF)
Integrating, analysing and visualising (geo-)data - unlocking knowledge and feeding it into decision-making processes - that is what Disy has stood for for almost 20 years!
164

Integrace legacy databází do soudobých informačních systémů / Integration of legacy databases to current information systems

Navrátil, Jan January 2016 (has links)
The goal of this thesis is to design and implement a framework to support access to legacy systems. Legacy systems are databases that use incompatible and obsolete technologies and cannot easily be abandoned. The framework abstracts application logic from the database platform and enables full or incremental migration to a new, modern platform in the future. It also considers the option of encapsulating an existing legacy application so that it can be included in the new system as a black box. A system based on the proposed framework has been successfully deployed in a company, where it facilitated the migration to a new information system with an entirely different database platform. This experience demonstrates the viability of the framework design.
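The abstract does not specify the framework's interfaces, but the general idea of decoupling application logic from the database platform can be sketched with a repository/adapter pattern; the class and method names below are hypothetical.

```python
from abc import ABC, abstractmethod

class CustomerStore(ABC):
    """Storage interface the application codes against, independent of the database platform."""

    @abstractmethod
    def find_by_id(self, customer_id: int) -> dict: ...

class LegacyDbStore(CustomerStore):
    """Adapter wrapping the obsolete database (here simulated by an in-memory dict)."""

    def __init__(self, legacy_rows: dict):
        self._rows = legacy_rows

    def find_by_id(self, customer_id: int) -> dict:
        return self._rows[customer_id]

class ModernDbStore(CustomerStore):
    """Adapter for the new platform; rows can be migrated incrementally from the legacy store."""

    def __init__(self, fallback: CustomerStore):
        self._rows: dict[int, dict] = {}
        self._fallback = fallback          # unmigrated records are still served by the legacy adapter

    def find_by_id(self, customer_id: int) -> dict:
        if customer_id in self._rows:
            return self._rows[customer_id]
        row = self._fallback.find_by_id(customer_id)   # lazy, incremental migration
        self._rows[customer_id] = row
        return row

# Application logic depends only on CustomerStore, so swapping platforms needs no code changes.
store: CustomerStore = ModernDbStore(fallback=LegacyDbStore({1: {"name": "Acme"}}))
print(store.find_by_id(1)["name"])
```

The black-box encapsulation mentioned in the abstract would correspond to a further adapter that delegates to the legacy application itself rather than to its database.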
165

Modelo navegacional dinâmico, para implementação da integração inter-estrutural de dados. / Dynamic navigational model for implementation of the data inter-structural integration.

Gomes Neto, José 04 November 2016 (has links)
Over the last decade, substantial changes have been observed in the kinds of data being processed when compared to the conventional definition of structured data. In this context, computational systems that mostly access conventional, centralized databases storing structured data increasingly also need to access and process large amounts of distributed, unstructured data. Factors such as the versatility to host unstructured data, and the coexistence, integration and diffusion of complex data at ever higher speeds, can in some situations make conventional data models troublesome to use. This thesis therefore proposes and formalizes a post-relational data model based on complex graphs, also known as complex networks. Using this graph model, it defines a way to implement inter-structural data integration, that is, the integration of traditional structured data with the more recently adopted unstructured data, such as multimedia. The integration covers all transactions found in a database: querying, inserting, updating and deleting data. The resulting approach is called the Dynamic Navigational Model (MND). It represents different data structures and, above all, allows these structures to coexist in an integrated way, giving the resulting information greater completeness and coverage. MND thus combines the benefits of complex networks with the handling of unstructured data, providing the relational data handling already available in other databases together with integration and better use of resources for the applications that need it.
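The thesis's formal model is not reproduced here, but the core idea of letting structured records and unstructured (e.g. multimedia) objects coexist in one navigable graph can be sketched with networkx; node types and attribute names are invented for illustration.

```python
import networkx as nx

g = nx.Graph()

# Structured data: a relational-style customer record, stored as node attributes.
g.add_node("customer:42", kind="record", name="Acme Ltd", country="BR")

# Unstructured data: a multimedia object, referenced by URI rather than by columns.
g.add_node("video:7", kind="multimedia", uri="s3://bucket/site-visit.mp4", duration_s=132)

# The integrating edge: navigation over the graph replaces a relational join.
g.add_edge("customer:42", "video:7", relation="documented_by")

# An "inter-structural" query: all multimedia reachable from a structured record.
for neighbor in g.neighbors("customer:42"):
    if g.nodes[neighbor]["kind"] == "multimedia":
        print(neighbor, "->", g.nodes[neighbor]["uri"])
```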
166

Towards Integrating Crowdsourced and Official Traffic Data : A study on the integration of data from Waze in traffic management in Stockholm, Sweden

Eriksson, Isak January 2019 (has links)
Modern traffic management systems often rely on static technologies, such as sensors and CCTV cameras, to gather data about the current traffic situation. Recent reports have shown that this method can result in a lack of coverage in Stockholm, Sweden. In addressing this issue, an alternative strategy to installing more sensors and CCTV cameras could be to utilize crowdsourced traffic data from other sources, such as Waze. In order to examine the usage and potential utility of crowdsourced data in traffic management, the Swedish Transport Administration's center in Stockholm, Trafik Stockholm, developed a web application that visualizes traffic data from both official sources and Waze. While the application was successful in doing so, it revealed the problem of integrating the traffic data from these two sources, as a significant portion of the data was redundant and the reliability was occasionally questionable. This study aims to determine how issues regarding redundancy and reliability can be resolved in the integration of crowdsourced and official traffic data. Conducted using a design science research strategy, the study investigates these issues by designing and developing an artifact that implements integration methods to match alerts from the data sources based on temporal and spatial proximity constraints. The artifact was evaluated through test sessions in which real-time traffic data from all over Sweden was processed, and through acceptance testing with the stakeholders of the application. Analysis of the results from the evaluations shows that the artifact is effective in reducing the redundancy in the crowdsourced data and that it can provide a more solid ground for reliability assessment. Furthermore, the artifact met its expectations and requirements, demonstrating a proof-of-concept and a proof-of-acceptance. Based on these results, the study concludes that by analyzing temporal and spatial factors in crowdsourced data, redundancy issues in the integration of crowdsourced and official traffic data can be resolved to a large extent. Furthermore, it is concluded that reliability issues in the same context can be resolved to a high degree by managing redundancy factors in combination with general traffic management factors. While the study is focused on traffic management, the issues of redundancy and reliability are not restricted to crowdsourced data in this context specifically. Thus, the results of the study are potentially of interest to researchers investigating other areas of application for crowdsourcing as well.
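The matching rule itself is not detailed in the abstract; the following is a minimal sketch of the general idea of pairing crowdsourced and official alerts under temporal and spatial proximity constraints, with thresholds chosen arbitrarily for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

@dataclass
class Alert:
    source: str        # "waze" or "official"
    kind: str          # e.g. "ACCIDENT"
    lat: float
    lon: float
    time: datetime

def haversine_m(a: Alert, b: Alert) -> float:
    """Great-circle distance between two alerts in metres."""
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))

def matches(crowd: Alert, official: Alert,
            max_distance_m: float = 500.0, max_gap_s: float = 900.0) -> bool:
    """Treat two alerts as the same incident if they have the same kind and are close in space and time."""
    same_kind = crowd.kind == official.kind
    close_in_space = haversine_m(crowd, official) <= max_distance_m
    close_in_time = abs((crowd.time - official.time).total_seconds()) <= max_gap_s
    return same_kind and close_in_space and close_in_time

w = Alert("waze", "ACCIDENT", 59.3293, 18.0686, datetime(2019, 5, 2, 8, 14))
o = Alert("official", "ACCIDENT", 59.3310, 18.0650, datetime(2019, 5, 2, 8, 20))
print(matches(w, o))   # True: the crowdsourced report is redundant, so only one incident is shown
```

An official alert corroborated by one or more matching crowdsourced reports can also be treated as more reliable, which is one way the redundancy analysis supports reliability assessment.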
167

Análise da utilização da manufatura virtual no processo de desenvolvimento de produtos / Analysis of virtual manufacturing utilization in products development process

Souza, Mariella Consoni Florenzano 17 June 2005 (has links)
Virtual manufacturing is an emerging approach that enterprises can adopt to improve their product development processes, introducing new products to the market more quickly and at an appropriate cost. The fundamental idea is to create an integrated, synthetic environment composed of software tools and systems, such as virtual reality and simulation, to support those processes. The purpose of this work is to analyze the use of virtual manufacturing in the product development process in terms of existing limitations that it can overcome, its proposal, its benefits, and the challenges encountered in applying it. To carry out the analysis, a model was developed to guide the application of virtual manufacturing in the product development process; it considers the product development activities that can be supported by virtual manufacturing software systems, the types of systems and their functionalities, and alternative neutral formats for enabling data interoperability. The research was conducted through case studies, which provided information for the analysis of virtual manufacturing use and for the development of the proposed model.
168

Integration of Heterogeneous Databases: Discovery of Meta-Information and Maintenance of Schema-Restructuring Views

Koeller, Andreas 15 April 2002 (has links)
In today's networked world, information is widely distributed across many independent databases in heterogeneous formats. Integrating such information is a difficult task and has been addressed by several projects. However, previous integration solutions, such as the EVE-Project, have several shortcomings. Database contents and structure change frequently, and users often have incomplete information about the data content and structure of the databases they use. When information from several such insufficiently described sources is to be extracted and integrated, two problems have to be solved: How can we discover the structure and contents of, and interrelationships among, unknown databases, and how can we provide durable integration views over several such databases? In this dissertation, we have developed solutions for those key problems in information integration. The first part of the dissertation addresses the fact that knowledge about the interrelationships between databases is essential for any attempt at solving the information integration problem. We present an algorithm called FIND2, based on the clique-finding problem in graphs and k-uniform hypergraphs, to discover redundancy relationships between two relations. Furthermore, the algorithm is enhanced by heuristics that significantly reduce the search space when necessary. Extensive experimental studies on the algorithm, both with and without heuristics, illustrate its effectiveness on a variety of real-world data sets. The second part of the dissertation addresses the durable view problem and presents the first algorithm for incremental view maintenance in schema-restructuring views. Such views are essential for the integration of heterogeneous databases. They are typically defined in schema-restructuring query languages like SchemaSQL, which can transform schema into data and vice versa, making traditional view maintenance based on differential queries impossible. Based on an existing algebra for SchemaSQL, we present an update propagation algorithm that propagates updates along the query algebra tree and prove its correctness. We also propose optimizations on our algorithm and present experimental results showing its benefits over view recomputation.
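FIND2 itself (clique search over graphs and hypergraphs of candidate dependencies, plus pruning heuristics) is beyond a short example, but the building block it reasons about can be illustrated: checking whether one relation's projection is included in another's. The sketch below naively enumerates small inclusion dependencies over in-memory tables with invented data; the exponential blow-up it ignores is exactly why heuristics matter.

```python
from itertools import combinations

# Two "unknown" relations, given only as rows (attribute name -> value).
suppliers = [
    {"sid": 1, "city": "Linz"},
    {"sid": 2, "city": "Graz"},
]
shipments = [
    {"supplier": 1, "part": "bolt", "origin": "Linz"},
    {"supplier": 2, "part": "nut", "origin": "Graz"},
]

def inclusion_holds(r, r_attrs, s, s_attrs):
    """True if the projection of r onto r_attrs is a subset of s projected onto s_attrs."""
    left = {tuple(row[a] for a in r_attrs) for row in r}
    right = {tuple(row[a] for a in s_attrs) for row in s}
    return left <= right

# Naive enumeration of one- and two-attribute inclusion dependencies between the relations.
sup_attrs, ship_attrs = ["sid", "city"], ["supplier", "part", "origin"]
for k in (1, 2):
    for left in combinations(sup_attrs, k):
        for right in combinations(ship_attrs, k):
            if inclusion_holds(suppliers, left, shipments, right):
                print(f"suppliers{left} is included in shipments{right}")
```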
169

Targeted feedback collection for data source selection with uncertainty

Cortés Ríos, Julio César January 2018 (has links)
The aim of this dissertation is to contribute to research on pay-as-you-go data integration by proposing an approach for targeted feedback collection (TFC), which seeks to improve the cost-effectiveness of feedback collection, especially when there is uncertainty associated with characteristics of the integration artefacts. In particular, this dissertation focuses on the data source selection task in data integration. It is shown how the impact of uncertainty about the evaluation of the characteristics of the candidate data sources, also known as data criteria, can be reduced in a cost-effective manner, thereby improving the solutions to the data source selection problem. This dissertation shows how alternative approaches, such as active learning and simple heuristics, have drawbacks that shed light on the pursuit of better solutions to the problem. It then describes the resulting TFC strategy and reports on its evaluation against alternative techniques. The evaluation scenarios vary from synthetic data sources with a single criterion and reliable feedback to real data sources with multiple criteria and unreliable feedback (such as can be obtained through crowdsourcing). The results confirm that the proposed TFC approach is cost-effective and leads to improved solutions for data source selection by seeking feedback that reduces uncertainty about the data criteria of the candidate data sources.
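The dissertation's actual TFC strategy is not spelled out in the abstract; as a hedged sketch of the underlying idea, the example below targets feedback at the candidate data source whose criterion estimate is most uncertain, modelling each criterion with a Beta posterior updated from accumulated feedback. All names and numbers are invented.

```python
import random

# Per-source feedback counts for one data criterion (e.g. "values are correct"):
# [positive feedback, negative feedback] collected so far.
feedback = {"source_A": [8, 2], "source_B": [1, 1], "source_C": [3, 7]}

def beta_variance(pos, neg):
    """Variance of a Beta(pos+1, neg+1) posterior: how uncertain the criterion estimate is."""
    a, b = pos + 1, neg + 1
    return (a * b) / ((a + b) ** 2 * (a + b + 1))

def next_feedback_target(fb):
    """Targeted collection: ask about the source we currently know least about."""
    return max(fb, key=lambda s: beta_variance(*fb[s]))

for _ in range(5):
    target = next_feedback_target(feedback)
    answer = random.random() < 0.6            # stand-in for a (possibly unreliable) crowd worker
    feedback[target][0 if answer else 1] += 1
    print("asked about", target, "->", "positive" if answer else "negative")
```

With several criteria per source, the same idea extends by weighting each criterion's uncertainty by how much it influences the source selection decision.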
170

Avaliação experimental de uma técnica de padronização de escores de similaridade / Experimental evaluation of a similarity score standardization technique

Nunes, Marcos Freitas January 2009 (has links)
With the growth of the Web, the volume of information has increased considerably in recent years and, consequently, access to remote databases has become much easier, allowing physically distant data to be integrated. Instances of the same real-world object originating from distinct databases usually differ in how their values are represented, that is, the same real-world information can be represented in different ways. In this context, research on approximate matching using similarity functions has emerged, and with it the difficulty of interpreting the functions' results and selecting suitable thresholds. When matching aggregates (records), there is also the problem of combining similarity scores, since distinct functions have different distributions. To overcome this problem, earlier work developed a score standardization technique that replaces the score computed by the similarity function with an adjusted score (obtained through training), which is more intuitive for the user and can be combined in the record matching process. This technique was developed by a PhD student in the UFRGS database research group and is referred to here as MeaningScore (DORNELES et al., 2007). The present work studies this technique and carries out a detailed experimental evaluation of it. The evaluation performed here shows that the MeaningScore approach is valid and returns better results: in record matching, where distinct similarity scores must be combined, using the standardized score instead of the original score returned by the similarity function produces results of higher quality.
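The exact MeaningScore mapping is not given in the abstract; the sketch below illustrates the general idea of score standardization: using labeled training pairs to replace a raw similarity score with an adjusted score (here, the empirical precision of pairs scoring at least that high), so that scores from different functions become comparable. The data and the choice of similarity function are illustrative only.

```python
from difflib import SequenceMatcher

def raw_similarity(a: str, b: str) -> float:
    """Some similarity function whose raw scores are hard to interpret directly."""
    return SequenceMatcher(None, a, b).ratio()

# Training pairs labeled as matches (True) or non-matches (False).
training = [
    ("Av. Paulista 1000", "Avenida Paulista, 1000", True),
    ("Jose Gomes Neto", "José Gomes Neto", True),
    ("Rua XV de Novembro", "Av. Brasil 42", False),
    ("UFRGS", "USP", False),
]

scored = [(raw_similarity(a, b), is_match) for a, b, is_match in training]

def adjusted_score(raw: float) -> float:
    """Standardized score: fraction of training pairs scoring >= raw that are true matches."""
    above = [is_match for score, is_match in scored if score >= raw]
    return sum(above) / len(above) if above else 1.0

s = raw_similarity("Av. Paulista 1000", "Avenida Paulista 1000")
print(round(s, 2), "->", round(adjusted_score(s), 2))   # adjusted score reads as a "chance of match"

# Adjusted scores from different similarity functions land on the same [0, 1] scale,
# so they can be combined (e.g. averaged across fields) when matching whole records.
```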
