1. Schema Integration: How to Integrate Static and Dynamic Database Schemata. Bellström, Peter. January 2010.
Schema integration is the task of integrating several local schemata into one global database schema. It is a complex, error-prone and time-consuming task. Difficulties arise in recognizing and resolving similarities and differences between two schemata, and in integrating static and dynamic schemata. In this thesis, three research topics are addressed: Maintaining Vocabulary in Schema Integration, Integration of Static Schemata, and Integration of Static and Dynamic Schemata, all applying the notation of the Enterprise Modeling approach. In Maintaining Vocabulary in Schema Integration, an analysis of what semantic loss is and why it occurs in schema integration is conducted. Semantic loss should be avoided because both concepts and dependencies might be lost. The thesis argues that concepts and dependencies should be retained in the schemata as long as possible. This facilitates user involvement, since the users' vocabulary is retained even after similarities and differences between two schemata have been resolved. In Integration of Static Schemata, two methods are developed that facilitate recognition and resolution of similarities and differences between two conceptual database schemata. By applying the first method, problems between two schemata can be recognized that would otherwise pass unnoticed; by applying the second method, problems can be resolved without semantic loss, because concepts and dependencies are retained in the schemata. In Integration of Static and Dynamic Schemata, a method for integrating static and dynamic schemata is developed. The method focuses on pre- and post-conditions and on how to map these to states and state changes in the database. By applying the method, states that are important for the database can be designed and integrated into the conceptual database schema, and active database rules can be designed and integrated into it as well.
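The mapping from pre- and post-conditions to states, state changes and active rules can be pictured with a small event-condition-action (ECA) sketch; none of the code below comes from the thesis, and the order/status example is purely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A database "state" is represented here simply as a dictionary of attribute values.
State = Dict[str, object]

@dataclass
class ECARule:
    """Event-condition-action rule derived from a pre-/post-condition pair (illustrative only)."""
    event: str                          # triggering event, e.g. "SHIP order"
    condition: Callable[[State], bool]  # pre-condition on the current state
    action: Callable[[State], State]    # state change realising the post-condition

def apply_rule(rule: ECARule, event: str, state: State) -> State:
    """Fire the rule if its event occurred and its pre-condition holds; otherwise keep the state."""
    if event == rule.event and rule.condition(state):
        return rule.action(state)
    return state

# Hypothetical rule: an order may only move to the "shipped" state once it has been paid.
ship_rule = ECARule(
    event="SHIP order",
    condition=lambda s: s.get("status") == "paid",  # pre-condition
    action=lambda s: {**s, "status": "shipped"},    # post-condition as a state change
)

print(apply_rule(ship_rule, "SHIP order", {"status": "paid"}))     # state changes to shipped
print(apply_rule(ship_rule, "SHIP order", {"status": "created"}))  # pre-condition fails, no change
```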
2. A context-based name resolution approach for semantic schema integration. Belian, Rosalie Barreto. 31 January 2008.
One of the goals of the Semantic Web is to provide a wide variety of services from different domains on the Web. Most of these services are collaborative, with tasks based on decision-making processes. These decisions, in turn, are better grounded if they take into account as much information as possible related to the tasks being executed. This scenario therefore encourages the development of techniques and tools oriented toward information integration, seeking solutions to the heterogeneity of data sources.

The mediation-based architecture used in the development of information integration systems aims to isolate the user from the distributed data sources by means of an intermediate software layer called the mediator. In an information integration system, the mediator uses a global schema to execute user queries, which are reformulated into sub-queries according to the local schemas of the data sources. In this case, a schema integration process generates the global schema (the mediation schema) as the result of integrating the individual schemas of the data sources.

The main problem in schema integration is the heterogeneity of the local data sources, which makes semantic resolution essential. Purely structural and syntactic methods for schema integration are of little use unless the real meaning of the schema elements has first been identified. A schema integration process produces an integrated global schema and a set of inter-schema mappings, and it usually comprises a few basic steps: pre-integration, schema comparison, schema mapping and unification, and generation of the mediation schema.

In schema integration, name resolution is the process of determining which real-world entity a given schema element refers to, taking into account the available semantic information. The semantic information needed for name resolution is generally obtained from generic vocabularies and/or vocabularies specific to a given knowledge domain. Element names may have different meanings depending on the semantic context to which they are related. Thus, using contextual information in addition to domain information can lead to a more precise interpretation of the elements, allowing their meaning to be adjusted according to a given context.

This work proposes a context-based name resolution approach for schema integration. One of its strengths is the modeling and use of the contextual information required for name resolution in different steps of the schema integration process. The contextual information is modeled using an ontology, which favors the use of inference mechanisms and the sharing and reuse of information. In addition, this work proposes a simple and extensible schema integration process, designed so that its development could concentrate mainly on the requirements related to name resolution. This process was developed for a mediation-based information integration system that adopts the GAV approach and XML as the common model for data interchange and for integrating data sources on the Web.
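To make the idea of context-dependent name resolution concrete, here is a toy sketch; it is not the thesis's ontology-based implementation, and the vocabulary, context labels and concept names are invented for illustration.

```python
from typing import Optional

# Toy vocabulary: (element name, semantic context) -> intended real-world concept.
# Contexts, element names and concept labels are all assumptions for this example.
VOCABULARY = {
    ("title", "publishing"): "BookTitle",
    ("title", "hr"): "JobTitle",
    ("issue", "publishing"): "JournalIssue",
    ("issue", "support"): "ProblemReport",
}

def resolve_name(element: str, context: str, default: Optional[str] = None) -> Optional[str]:
    """Return the real-world concept a schema element name denotes in a given semantic context."""
    return VOCABULARY.get((element.lower(), context), default)

# The same element name resolves to different concepts under different contexts.
print(resolve_name("title", "publishing"))  # BookTitle
print(resolve_name("title", "hr"))          # JobTitle
```

In the thesis the contextual information is modeled with an ontology rather than a lookup table, which is what enables inference, sharing and reuse.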
3. Duomenų loginių struktūrų išskyrimas funkcinių reikalavimų specifikacijos pagrindu / Data logical structure segregation on the ground of a functional requirements specification. Jučiūtė, Laura. 25 May 2006.
This master's thesis shows the place of data modelling in the information systems' life cycle and the importance of data model quality for effective IS exploitation. Referring to the results of the literature analysis, the reasons why the process of data modelling must be automated are introduced, and current automation solutions are described. As the main purpose of this work, an original data modelling method is described, and a programmable prototype is introduced that automates one step of that method: schema integration.
4. A survey of approaches to automatic schema matching. Rahm, Erhard; Bernstein, Philip A. 19 October 2018.
Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification, we review some previous match implementations, thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
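As a concrete illustration of one cell of this taxonomy, the sketch below is a minimal element-level, language-based matcher that pairs schema elements by normalized name similarity; it was written for this listing, not taken from the survey, and the schemas and threshold are made up.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Element-level, language-based similarity of two element names (0..1)."""
    normalize = lambda s: s.lower().replace("_", "").replace("-", "")
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def match_elements(schema1, schema2, threshold=0.7):
    """Return candidate correspondences (element1, element2, score) above a threshold."""
    return [
        (e1, e2, round(name_similarity(e1, e2), 2))
        for e1 in schema1
        for e2 in schema2
        if name_similarity(e1, e2) >= threshold
    ]

# Hypothetical element names from two schemas to be matched.
customers = ["CustomerName", "Cust_Address", "phone_no"]
clients   = ["ClientName", "customer_address", "PhoneNumber"]
print(match_elements(customers, clients))
```

A real matcher would typically combine several such techniques (synonyms, data types, structural context) along the other dimensions of the taxonomy.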
5. Information Integration in a Grid Environment: Applications in the Bioinformatics Domain. Radwan, Ahmed M. 16 December 2010.
Grid computing emerged as a framework for supporting complex operations over large datasets; it enables the harnessing of large numbers of processors working in parallel to solve computing problems that typically spread across various domains. We focus on the problems of data management in a grid/cloud environment. The broader context of designing a service-oriented architecture (SOA) for information integration is studied, identifying the main components for realizing this architecture. The BioFederator is a web-services-based data federation architecture for bioinformatics applications. Based on collaborations with bioinformatics researchers, several domain-specific data federation challenges and needs are identified. The BioFederator addresses such challenges and provides an architecture that incorporates a series of utility services; these address issues like automatic workflow composition, domain semantics, and the distributed nature of the data. The design also incorporates a series of data-oriented services that facilitate the actual integration of data. Schema integration is a core problem in the BioFederator context. Previous methods for schema integration rely on the exploration, implicit or explicit, of the multiple design choices that are possible for the integrated schema. Such exploration relies heavily on user interaction; thus, it is time-consuming and labor-intensive. Furthermore, previous methods have ignored the additional information that typically results from the schema matching process, that is, the weights and in some cases the directions that are associated with the correspondences. We propose a more automatic approach to schema integration that is based on the use of directed and weighted correspondences between the concepts that appear in the source schemas. A key component of our approach is a ranking mechanism for the automatic generation of the best candidate schemas. The algorithm gives more weight to schemas that combine the concepts with higher similarity or coverage. Thus, the algorithm makes certain decisions that otherwise would likely be taken by a human expert. We show that the algorithm runs in polynomial time and moreover has good performance in practice. The proposed methods and algorithms are compared to state-of-the-art approaches. The BioFederator design, services, and usage scenarios are discussed. We demonstrate how our architecture can be leveraged in real-world bioinformatics applications. We performed a whole human genome annotation for nucleosome exclusion regions; the resulting annotations were studied and correlated with tissue specificity, gene density and other important gene regulation features. We also study data processing models in grid environments. MapReduce is one popular parallel programming model that is proven to scale. However, using low-level MapReduce for general data processing tasks poses the problem of developing, maintaining and reusing custom low-level user code. Several frameworks have emerged to address this problem; these frameworks share a top-down approach, where a high-level language is used to describe the problem semantics, and the framework takes care of translating this problem description into MapReduce constructs. We highlight several issues in the existing approaches and alternatively propose a novel refined MapReduce model that addresses the maintainability and reusability issues, without sacrificing the low-level controllability offered by directly writing MapReduce code.

We present MapReduce-LEGOS (MR-LEGOS), an explicit model for composing MapReduce constructs from simpler components, namely "Maplets", "Reducelets" and, optionally, "Combinelets". Maplets and Reducelets are standard MapReduce constructs that can be composed to define aggregated constructs describing the problem semantics. This composition can be viewed as defining a micro-workflow inside the MapReduce job. Using the proposed model, complex problem semantics can be defined in the encompassing micro-workflow provided by MR-LEGOS while keeping the building blocks simple. We discuss its design details, main features and usage scenarios. Through experimental evaluation, we show that the proposed design is highly scalable and has good performance in practice.
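A rough sketch of the composition idea behind MR-LEGOS, using a simplified in-memory stand-in for MapReduce: only the Maplet/Reducelet terminology comes from the abstract, while the signatures and the word/character-count example are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

KV = Tuple[str, int]
Maplet = Callable[[str], Iterable[KV]]      # a small, reusable map step
Reducelet = Callable[[str, List[int]], KV]  # a small, reusable reduce step

def compose_maplets(*maplets: Maplet) -> Maplet:
    """Aggregate several maplets into a single mapper (a micro-workflow of map steps)."""
    def mapper(record: str) -> Iterable[KV]:
        for maplet in maplets:
            yield from maplet(record)
    return mapper

def run_job(records: Iterable[str], mapper: Maplet, reducelet: Reducelet) -> List[KV]:
    """Minimal in-memory stand-in for running a MapReduce job."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducelet(key, values) for key, values in groups.items()]

# Two simple maplets and one reducelet, composed into a single job.
word_count = lambda line: ((word, 1) for word in line.split())
char_count = lambda line: [("_chars", len(line))]
sum_values = lambda key, values: (key, sum(values))

print(run_job(["to be or not to be"], compose_maplets(word_count, char_count), sum_values))
```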
6. MIDB: um modelo de integração de dados biológicos / MIDB: a biological data integration model. Perlin, Caroline Beatriz. 29 February 2012.
In bioinformatics, there is a huge volume of data related to biomolecules and to nucleotide and amino acid sequences, residing almost in its totality in several Biological Data Bases (BDBs). For a specific sequence there are several informational classifications: genomic data, evolutionary data, structural data, and others. Some BDBs store just one or a few of these classifications. These BDBs are hosted on different sites and servers, under different database management systems and data models; moreover, their instances and schemas may exhibit semantic heterogeneity. In this scenario, the objective of this project is to propose a biological data integration model that adopts new schema integration and instance integration techniques. The proposed model has a special schema integration mechanism and another mechanism that performs instance integration (with the support of a dictionary), allowing conflict resolution in attribute values; a clustering algorithm is used to group similar entities, and a domain specialist participates in managing the resulting clusters. The model was validated through a case study focusing on schema and instance integration of nucleotide sequence data from organisms of the genus Actinomyces, captured from four different data sources. As a result, about 97.91% of the attributes were correctly categorized in the schema integration, and the instance integration identified that about 50% of the generated clusters need support from a specialist, avoiding errors in entity resolution. In addition, some contributions are presented, such as the attribute categorization, the clustering algorithm, the proposed distance functions and the MIDB model itself.
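The clustering step of instance integration can be pictured with the toy sketch below; it is not the MIDB clustering algorithm, and the string distance, threshold and sample organism names are placeholders.

```python
from difflib import SequenceMatcher

def distance(a: str, b: str) -> float:
    """Placeholder string distance; MIDB defines its own domain-specific distance functions."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def cluster(records, threshold=0.3):
    """Greedy single-pass clustering: add each record to the first close-enough cluster."""
    clusters = []
    for record in records:
        for group in clusters:
            if distance(record, group[0]) <= threshold:  # compare with the cluster representative
                group.append(record)
                break
        else:
            clusters.append([record])
    return clusters

# Hypothetical near-duplicate entries that instance integration would have to reconcile.
names = ["Actinomyces naeslundii", "actinomyces naeslundi",
         "Actinomyces viscosus", "A. viscosus"]
print(cluster(names))
```

Borderline clusters, such as the abbreviated name above, are the kind of cases the model hands over to the domain specialist.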
7. Modelagem da eficiência de coleta em ciclones utilizando a fluidodinâmica computacional / Modeling the cyclone collection efficiency using computational fluid dynamics. Ramirez, Maribel Valverde. 30 September 2013.
Cyclones are used to separate suspended particles from gas streams. The analysis of the flow inside a cyclone is complex due to the large number of parameters and operating variables that influence the dynamics of the system. Owing to its versatility and the robustness of its results, computational fluid dynamics is an important alternative frequently used to study the flow dynamics in cyclones. In the literature, several works use computational fluid dynamics to study the pressure drop of cyclones; works on collection efficiency are fewer. For cyclones, computational fluid dynamics can simulate the collection efficiency of particles with diameters greater than 5 µm quite accurately, but for smaller particles the simulated results still diverge from experimental values. In this work, a Stairmand cyclone was simulated numerically with injected particles of 1 to 5 µm in diameter. The results were verified against experimental data available in the literature, taken from Zhao, Shen and Kang (2004) and Zhao (2005). The simulation mesh was analyzed before and after the simulations. The turbulence models used to simulate the flow in the cyclone were the Reynolds Stress Model (RSM) and Large Eddy Simulation (LES). The dispersed phase was simulated considering one-way and two-way coupling. The equation of motion of the particles was integrated using implicit, analytic, trapezoidal and Runge-Kutta integration schemes. The results showed that the methodology was adequate to reproduce the behavior of the flow in the cyclone: the errors obtained in the pressure drop were on average below 5%, and the collection efficiency was reproduced with good accuracy for diameters of 3, 4 and 5 µm.
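As a hedged illustration of what one of the cited integration schemes involves (not the solver configuration used in the thesis), the sketch below applies the trapezoidal rule to a particle velocity relaxing toward the local gas velocity under simple Stokes drag; all numerical values are arbitrary.

```python
def trapezoidal_step(v_p, u_gas, tau, dt):
    """One trapezoidal step of the drag equation dv/dt = (u_gas - v) / tau.

    v_p   : particle velocity [m/s]
    u_gas : local gas velocity seen by the particle [m/s]
    tau   : particle relaxation time [s]
    dt    : time step [s]
    """
    # Trapezoidal rule: v_new = v + dt/2 * ((u - v)/tau + (u - v_new)/tau),
    # solved analytically for v_new since the right-hand side is linear in v.
    return (v_p * (1 - dt / (2 * tau)) + (dt / tau) * u_gas) / (1 + dt / (2 * tau))

# Arbitrary values, of the order of a few-micron particle relaxing toward the gas velocity.
v, u, tau, dt = 0.0, 10.0, 2.5e-5, 1.0e-5
for _ in range(10):
    v = trapezoidal_step(v, u, tau, dt)
print(round(v, 3))  # the particle velocity approaches the gas velocity after a few tau
```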
8. Integracija šema modula baze podataka informacionog sistema / Integration of Information System Database Module Schemas. Luković, Ivan. 18 January 1996.
Parallel and independent work of a number of designers on different information system modules (i.e. subsystems), identified by the initial functional decomposition of the real system, necessarily leads to mutually inconsistent database (db) module schemas. The thesis considers the problems of automatically detecting the collisions that can appear during the simultaneous design of different db module schemas, and of integrating db module schemas into a unique information system db schema.

All possible types of db module schema collisions have been identified. A necessary and sufficient condition for strong and intensional db module schema compatibility has been formulated and proved, which made it possible to formalize the checking of strong and intensional compatibility of db module schemas and to construct the appropriate algorithms. The process of integrating compatible db module schemas into a unique (strongly covering) db schema is formalized as well. A methodology for applying the algorithms for compatibility checking and unique db schema integration is also presented.
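A simplified sketch of collision detection and integration between two module schemas is shown below; the thesis defines strong and intensional compatibility formally, whereas this example only checks that attributes shared by two modules are typed consistently, and the sample schemas are hypothetical.

```python
def find_collisions(schema_a, schema_b):
    """Report attributes typed differently by two module schemas.

    This is a deliberately simplified notion of collision; the thesis works with
    formal definitions of strong and intensional compatibility instead.
    """
    collisions = []
    for relation in schema_a.keys() & schema_b.keys():
        for attr in schema_a[relation].keys() & schema_b[relation].keys():
            if schema_a[relation][attr] != schema_b[relation][attr]:
                collisions.append((relation, attr,
                                   schema_a[relation][attr], schema_b[relation][attr]))
    return collisions

def integrate(schema_a, schema_b):
    """Merge two module schemas into one db schema, provided no collisions remain."""
    if find_collisions(schema_a, schema_b):
        raise ValueError("module schemas are not compatible")
    merged = {rel: dict(attrs) for rel, attrs in schema_a.items()}
    for rel, attrs in schema_b.items():
        merged.setdefault(rel, {}).update(attrs)
    return merged

# Hypothetical module schemas: relation name -> {attribute: type}.
sales   = {"Customer": {"id": "int", "name": "varchar"}}
billing = {"Customer": {"id": "int", "credit": "decimal"}, "Invoice": {"no": "int"}}
print(find_collisions(sales, billing))  # no conflicting attribute types here
print(integrate(sales, billing))
```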
9. Exploitation dynamique des données de production pour améliorer les méthodes DFM dans l'industrie Microélectronique / Towards production data mining to improve DFM methods in Microelectronics industry. Shahzad, Muhammad Kashif. 05 October 2012.
The DFM (design for manufacturing) methods are used during technology alignment and adoption processes in the semiconductor industry (SI) for manufacturability and yield assessments. These methods worked well up to the 250nm technology node for transforming systematic variations into rules and/or models based on single-source data analyses, but beyond this node they have turned into ineffective R&D efforts. The reason is our inability to capture newly emerging spatial variations, which has led to an exponential increase in technology lead times and costs; hence, this thesis focuses on identifying and removing the causes of DFM ineffectiveness. The fabless, foundry and traditional integrated device manufacturer (IDM) business models are first analyzed for coherence against a recent shift in business objectives from time-to-market (T2M) and time-to-volume (T2V) towards ramp-up rate. Increasing technology lead times and costs are identified as a major challenge in achieving quick ramp-up rates; hence, an extended IDM (e-IDM) business model is proposed to support quick ramp-up rates, based on removing the DFM ineffectiveness and smoothly integrating the improved methods. We found (i) single-source analyses and (ii) the inability to exploit huge manufacturing data volumes to be the core limiting factors (failure modes) behind DFM ineffectiveness during technology alignment and adoption efforts within an IDM. The causes of single-source root-cause analysis are identified as (i) varying metrology reference frames and (ii) test structure orientations that require wafer rotation prior to measurement, resulting in varying metrology coordinates (die/site-level mismatches).

A generic coordinate mapping and alignment model (MAM) is proposed to remove these die/site-level mismatches. To accurately capture the emerging spatial variations, a spatial positioning model (SPM) is proposed to perform multi-source parametric correlation based on the shortest distance between the respective test structures used to measure the parameters. The (i) unstructured model evolution, (ii) ontology issues and (iii) missing links among production databases are found to be the causes of our inability to exploit huge manufacturing data volumes. The ROMMII (referential ontology meta model for information integration) framework is then proposed to remove these issues and to enable dynamic and efficient multi-source root-cause analyses. An interdisciplinary failure mode and effects analysis (i-FMEA) methodology is also proposed to find cyclic failure modes and causes across business functions, which require generic solutions rather than operational fixes. The proposed e-IDM, MAM, SPM and ROMMII framework results in accurate analysis and modeling of the emerging spatial variations based on dynamic exploitation of huge manufacturing data volumes.
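The coordinate mapping behind MAM can be pictured with a toy sketch that rotates die coordinates measured on a rotated wafer back into a common reference frame; the frame conventions and numbers are assumptions for illustration, not the MAM definition.

```python
import math

def align_die_coordinates(x, y, rotation_deg):
    """Map coordinates measured after a wafer rotation back to the common reference frame.

    rotation_deg is the rotation applied before measurement (typically 0, 90, 180 or 270);
    the sign convention chosen here is an assumption made for this example.
    """
    theta = math.radians(-rotation_deg)  # undo the rotation
    x_ref = x * math.cos(theta) - y * math.sin(theta)
    y_ref = x * math.sin(theta) + y * math.cos(theta)
    return round(x_ref, 6), round(y_ref, 6)

# A site measured at (3, 5) on a wafer rotated by 90 degrees maps back to (5, -3)
# in the reference frame, so its parameters can be correlated with other sources.
print(align_die_coordinates(3, 5, 90))
```

Once all measurements share one frame, the shortest-distance correlation described for the SPM reduces to comparing aligned coordinates.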