Global ETD Search

51	Ontology-based approach for standard formats integration in reservoir modeling / Abordagem baseada em ontologias para integração de formatos padrões em modelagem de reservatórios Werlang, Ricardo January 2015
A integração de dados oriundos de fontes autônomas e heterogêneas ainda é um grande problema para diversas aplicações. Na indústria de petróleo e gás, uma grande quantidade de dados é gerada diariamente a partir de múltiplas fontes, tais como dados sísmicos, dados de poços, dados de perfuração, dados de transporte e dados de marketing. No entanto, estes dados são adquiridos através da aplicação de diferentes técnicas e representados em diferentes formatos e padrões. Assim, estes dados existem de formas estruturadas em banco de dados e de formas semi-estruturadas em planilhas e documentos, tais como relatórios e coleções multimídia. Para lidar com a heterogeneidade dos formatos de dados, a informação precisa ser padronizada e integrada em todos os sistemas, disciplinas e fronteiras organizacionais. Como resultado, este processo de integração permitirá uma melhor tomada de decisão dentro de colaborações, uma vez que dados de alta qualidade poderão ser acessados em tempo hábil. A indústria do petróleo depende do uso eficiente desses dados para a construção de modelos computacionais, a fim de simplificar a realidade geológica e para ajudar a compreendê-la. Tal modelo, que contém objetos geológicos analisados por diferentes profissionais (geólogos, geofísicos e engenheiros), não representa a realidade propriamente dita, mas a conceitualização do especialista. Como resultado, os objetos geológicos modelados assumem representações semânticas distintas e complementares no apoio à tomada de decisões. Para manter os significados pretendidos originalmente, ontologias estão sendo usadas para explicitar a semântica dos modelos e para integrar os dados e arquivos gerados nas etapas da cadeia de exploração.
A principal reivindicação deste trabalho é que a interoperabilidade entre modelos da terra construídos e manipulados por diferentes profissionais e sistemas pode ser alcançada evidenciando o significado dos objetos geológicos representados nos modelos. Nós mostramos que ontologias de domínio desenvolvidas com o apoio de conceitos teóricos de ontologias de fundamentação demonstraram ser uma ferramenta adequada para esclarecer a semântica dos conceitos geológicos. Nós exemplificamos essa capacidade através da análise dos formatos de comunicação padrões mais utilizados na cadeia de modelagem (LAS, WITSML e RESQML), em busca de entidades semanticamente relacionadas com os conceitos geológicos descritos em ontologias de Geociências. Mostramos como as noções de identidade, rigidez, essencialidade e unidade, aplicadas a conceitos ontológicos, conduzem o modelador a definir mais precisamente os objetos geológicos no modelo. Ao tornar explícitas as propriedades de identidade dos objetos modelados, o modelador pode superar as ambiguidades da terminologia geológica. Ao fazer isso, explicitamos os objetos e propriedades relevantes que podem ser mapeados a partir de um modelo para outro, mesmo quando eles estão representados em diferentes nomes e formatos. / Integrating data from autonomous and heterogeneous sources remains a significant problem for many applications. In the oil and gas industry, a large amount of data is generated every day from multiple sources, such as seismic data, well data, drilling data, transportation data, and marketing data. However, these data are acquired through different techniques and represented in different standards and formats. They therefore exist in structured form in databases and in semi-structured form in spreadsheets and in documents such as reports and multimedia collections.
To deal with this large amount of information and with the heterogeneity of the data formats, the information needs to be standardized and integrated across systems, disciplines and organizational boundaries. This integration will enable better decision making within collaborations, since high-quality data will be accessible in a timely manner. The petroleum industry depends on the efficient use of these data to build computer models that simplify the geological reality and help to understand it. Such a model, which contains geological objects analyzed by different professionals (geologists, geophysicists and engineers), does not represent reality itself but the expert’s conceptualization. As a result, the modeled geological objects assume distinct and complementary semantic representations in supporting decision making. To preserve the originally intended meanings, ontologies are used to make the semantics of the models explicit and to integrate the data and files generated in the various stages of the exploration chain. The major claim of this work is that interoperability among earth models built and manipulated by different professionals and systems can be achieved by making apparent the meaning of the geological objects represented in the models. We show that domain ontologies developed with the support of the theoretical concepts of foundational ontologies are an adequate tool for clarifying the semantics of geological concepts. We exemplify this capability by analyzing the standard communication formats most used in the modeling chain (LAS, WITSML, and RESQML), searching for entities semantically related to the geological concepts described in ontologies for the Geosciences. We show how the notions of identity, rigidity, essentiality and unity, applied to ontological concepts, lead the modeler to define the geological objects in the model more precisely.
By making explicit the identity properties of the modeled objects, the modeler who applies data standards can overcome the ambiguities of geological terminology. In doing so, we clarify which objects and properties are relevant and can be mapped from one model to another, even when they are represented with different names and formats.
Ontologias Geociências Informática médica Geological data integration Communication standard formats Conceptual modeling Ontology Foundational ontology Geological objects mapping
52	Modelagem e representação semântica de dados governamentais abertos da Previdência Social brasileira Pereira, Durval Vieira 14 February 2017
Objetiva propor um modelo conceitual dos dados sobre acidentes do trabalho para publicação dos dados governamentais mantidos pela Previdência Social. Busca na literatura modelos conceituais ou vocabulários sobre acidentes do trabalho, analisa o Vocabulário Controlado do Governo Eletrônico (VCGE), o modelo de publicações de dados sobre acidentes do trabalho publicado pela Dataprev e o tesauro e a taxonomia da Organização Internacional do Trabalho (OIT). Identifica a ausência de um modelo conceitual dos dados da Previdência Social para publicação em formato aberto e utiliza as tecnologias de Web Semântica, de forma a torná-las compartilháveis, acessíveis e reutilizáveis. Seleciona e analisa definições de acidente do trabalho e identifica conceitos e relacionamentos. Classifica os conceitos encontrados de acordo com as ontologias UFO-B e DUL. Utiliza o modelo Entidade-Relacionamento para auxiliar na elaboração de um modelo que consiga representar o domínio sobre acidente do trabalho. Constata a necessidade da elaboração de um vocabulário específico para descrever os conceitos sobre acidentes do trabalho como forma de enriquecer a representação dos dados analisados.
Representa uma amostra dos dados em RDF, utilizando o modelo conceitual e o vocabulário proposto. Conclui que a elaboração do modelo conceitual e a descrição em RDF pareceram adequadas para organizar e fornecer um nível mínimo de semântica aos dados sobre acidente do trabalho da Previdência Social brasileira. / This project proposes a conceptual model of occupational accident data for publishing the government data held by Social Security. It searches the literature for conceptual models or vocabularies about occupational accidents and analyzes the E-Government Controlled Vocabulary (VCGE, from its Portuguese name), the occupational accident data publication model published by Dataprev, and the thesaurus and taxonomy of the International Labour Organization (ILO). It identifies the absence of a conceptual model of Social Security data for publication in an open format and uses Semantic Web technologies to make these data shareable, accessible and reusable. It selects and analyzes definitions of occupational accident and identifies concepts and relationships. It classifies the concepts found according to the UFO-B and DUL ontologies. It uses the Entity-Relationship model to help develop a model that can represent the occupational accident domain. It notes the need for a specific vocabulary to describe occupational accident concepts as a way to enrich the representation of the data analyzed. It represents a sample of the data in RDF using the proposed conceptual model and vocabulary. It concludes that the conceptual model and the RDF description seemed adequate to organize the occupational accident data of Brazilian Social Security and to give them a minimum level of semantics.
Modelagem conceitual Dados governamentais abertos Web semântica Vocabulário controlado Previdência social Acidente de trabalho Conceptual modeling Open Government Data
53	Extrakce informací z webu založená na ontologiích / Ontology Based Information Extraction from the Web Buba, Vojtěch January 2017
The main aim of this thesis is the extraction of information from the web based on conceptual modeling with ontologies. Its main goal is the implementation of a tool that processes an input ontology and enables additional editing through a graphical user interface. Readers are introduced to languages for writing ontologies, such as RDF, RDFS and OWL. Two extraction methods that use ontologies to describe the extracted information are also explained. The final solution is designed to meet all the needs of the extraction task defined by Ing. Radek Burget, Ph.D. The output of the tool is the definition of an extraction task compatible with the FITLayout solution, which is being developed at FIT BUT.
54	Ontology and Law: Bioprospecting in Antarctica Prasad, Rakesh January 2022
Could it be that even though no international treaty or regulation regulates bioprospecting in Antarctica, some features of the techno-science of bioprospecting already lie embedded in the deep texts of the potentially most relevant treaties and regulations? If so, international law already comprehends the phenomenon to that extent, making for sustainable governance and thereby sustainable development. To find out, an ontology of bioprospecting was first synthesized through activity-theory-based conceptual system modeling (CSM). Treating bioprospecting as an activity of search for and research of naturally occurring biota, a set of Conceptual Graphs and associated Tables was drawn up as its ontology-synthesis. Features of this conceptualization were then searched for through an ontological analysis of the deep texts of twenty-five selected legal instruments, using ontological legal research (OLR). The search did unearth several features dispersed and intriguingly embedded in several of the treaties and regulations, quite richly in some of the more recent ones. The cross-application of CSM followed by the hybridized OLR is a methodological innovation, and the empirical results generated by each are resources for further research. The language of international law is revealed as possessing a better-than-expected techno-scientific literacy of bioprospecting.
Sustainable Development Conceptual Modeling Ontological Legal Research Knowledge Artifact Activity Theory Deep Text Earth and Related Environmental Sciences Geovetenskap och miljövetenskap
55	A Framework for Extraction Plans and Heuristics in an Ontology-Based Data-Extraction System Wessman, Alan E. 26 January 2005
Extraction of information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that ontologies may be used to achieve a high degree of accuracy in data extraction while maintaining resiliency in the face of document changes. Ontologies do not, however, diminish the complexity of a data-extraction system. As research in the field progresses, the need for a modular data-extraction system that decouples the various functional processes involved continues to grow. In this thesis we propose a framework for such a system. The nature of the framework allows new algorithms and ideas to be incorporated into a data-extraction system without requiring wholesale rewrites of a large part of the system’s source code. It also allows researchers to focus their attention on the parts of the system relevant to their research without having to worry about introducing incompatibilities with the remaining components. We demonstrate the value of the framework by providing an implementation of it, and we show that our implementation is capable of achieving accuracy in its extraction results comparable to that achieved by the legacy BYU-Ontos data-extraction system. We also suggest alternate ways in which the framework may be extended and implemented, and we supply documentation on the framework for future use by data-extraction researchers.
data extraction ontology framework extraction plan inference conceptual modeling data frame information extraction OSMX OSM Ontos OntosEngine OntologyEditor Computer Sciences
56	Designing Conventional, Spatial, and Temporal Data Warehouses: Concepts and Methodological Framework Malinowski Gajda, Elzbieta 02 October 2006
Decision support systems are interactive, computer-based information systems that provide data and analysis tools to better assist managers at different levels of an organization in the decision-making process. Data warehouses (DWs) have been developed and deployed as an integral part of decision support systems. A data warehouse is a database that stores the high volume of historical data required for analytical purposes. This data is extracted from operational databases, transformed into a coherent whole, and loaded into a DW during the extraction-transformation-loading (ETL) process. DW data can be dynamically manipulated using on-line analytical processing (OLAP) systems. DW and OLAP systems rely on a multidimensional model that includes measures, dimensions, and hierarchies. Measures are usually numeric additive values used for the quantitative evaluation of different aspects of an organization. Dimensions provide different analysis perspectives, while hierarchies allow measures to be analyzed at different levels of detail. Nevertheless, designers as well as users currently find it difficult to specify the multidimensional elements required for analysis. One reason is the lack of conceptual models for DW and OLAP system design, which would allow data requirements to be expressed at an abstract level without considering implementation details. Another problem is that many kinds of complex hierarchies arising in real-world situations are not addressed by current DW and OLAP systems. To help designers build conceptual models for decision-support systems and to help users better understand the data to be analyzed, in this thesis we propose the MultiDimER model, a conceptual model for representing multidimensional data for DW and OLAP applications.
Our model is mainly based on existing ER constructs, for example entity types, attributes, and relationship types with their usual semantics, which make it possible to represent the common concepts of dimensions, hierarchies, and measures. It also includes a conceptual classification of the different kinds of hierarchies existing in real-world situations and proposes graphical notations for them. In addition, users of DW and OLAP systems now also demand the inclusion of spatial data. The advantage of using spatial data in the analysis process is widely recognized, since visualizing it reveals patterns that are difficult to discover otherwise. However, although DWs typically include a spatial or location dimension, this dimension is usually represented in an alphanumeric format. Furthermore, there is still no systematic study that analyzes the inclusion and management of hierarchies and measures represented using spatial data. With the aim of satisfying the growing requirements of decision-making users, we extend the MultiDimER model to allow spatial data to be included in the different elements composing the multidimensional model. The novelty of our contribution lies in the fact that a multidimensional model is seldom used for representing spatial data. To succeed with our proposal, we applied the research achievements in the field of spatial databases to the specific features of a multidimensional model. The spatial extension of a multidimensional model raises several issues that we address in this thesis, such as the influence that different topological relationships between the spatial objects forming a hierarchy have on the procedures required for measure aggregation, the aggregation of spatial measures, and the inclusion of spatial measures without the presence of spatial dimensions, among others.
Moreover, one of the important characteristics of multidimensional models is the presence of a time dimension for keeping track of changes in measures. However, this dimension cannot be used to model changes in other dimensions. Usual multidimensional models are therefore not symmetric in the way they represent changes for measures and dimensions. Further, there is still no analysis indicating which concepts already developed for temporal support in conventional databases can be usefully applied to the different elements composing a multidimensional model. In order to handle temporal changes to all elements of a multidimensional model in a similar manner, we introduce a temporal extension of the MultiDimER model. This extension is based on research in the area of temporal databases, which have been used successfully for modeling time-varying information for several decades. We propose the inclusion of different temporal types, such as valid time and transaction time, which are obtained from source systems, in addition to the DW loading time generated in DWs. We use this temporal support for a conceptual representation of time-varying dimensions, hierarchies, and measures. We also address the specific constraints that should be imposed on time-varying hierarchies and the problem of handling multiple time granularities between source systems and DWs. Furthermore, the design of DWs is not an easy task. It requires considering all phases, from requirements specification to final implementation, including the ETL process. It should also take into account that the inclusion of different data items in a DW depends on both users' needs and data availability in source systems. However, designers currently must rely on their experience because there is no methodological framework that considers the above-mentioned aspects.
In order to assist developers during the DW design process, we propose a methodology for the design of conventional, spatial, and temporal DWs. We cover the different phases: requirements specification and conceptual, logical, and physical modeling. We include three different methods for requirements specification, depending on whether users, operational data sources, or both are the driving force in requirements gathering. We show how each method leads to the creation of a conceptual multidimensional model. We also present logical and physical design phases that refer to DW structures and the ETL process. To ensure the correctness of the proposed conceptual models, i.e., with conventional data, with spatial data, and with time-varying data, we formally define them, providing their syntax and semantics. To assess the usability of our conceptual model, including the representation of different kinds of hierarchies as well as spatial and temporal support, we present real-world examples. To show that the proposed conceptual solutions can be implemented, we include their logical representations using relational and object-relational databases.
temporal data warehouses spatial data warehouses OLAP hierarchies multidimensional model conceptual modeling data warehouses methodology for data warehouse design spatial OLAP
57	Vers une approche linguistico-cognitive de la polysémie : Représentation de la signification et construction du sens / Towards a cognitive linguistic approach of polysemy : Meaning representation and sense construction Mazaleyrat, Hélène 10 December 2010
Tout d’abord perçue comme un phénomène marginal, presque un accident en langue, on considère aujourd’hui que la polysémie fait partie intégrante des systèmes linguistiques. De nombreuses théories se sont intéressées au phénomène des unités à sens multiples et reliés. La première partie de notre travail en dresse un panorama non exhaustif mais révélateur, montrant comment et pourquoi la polysémie s’est peu à peu imposée comme un phénomène incontournable qui doit nécessairement être au cœur de tout modèle de la signification. Aussi, à partir de la distinction établie par G. Kleiber (1999), nous considérons deux grands courants selon le rapport établi entre signification, référence et polysémie. Le premier décrit la polysémie en termes de sens premier référentiel dont sont dérivés des sens secondaires (courant objectiviste). Le second l’analyse en termes de potentiel sémantique aréférentiel à partir duquel est obtenu l’ensemble des sens du polysème par spécialisation ou enrichissement contextuel(le) (courant constructiviste). Notre réflexion porte ensuite sur la représentation de la signification des polysèmes – principalement des noms – en grammaire cognitive (R.W. Langacker). Nous postulons que toute expression est associée, dans l’appareil cognitif des locuteurs-auditeurs, à une structure conceptuelle d’informations représentant sa signification. Nous proposons une modélisation en réseau structuré autour de valeurs sémantiques plus ou moins schématiques et de sens élaborés. Ainsi, c’est la valeur la plus schématique qui permet de faire le lien en langue entre ses élaborations que sont les sens observables en discours. Sur la base des travaux de D. 
Tuggy (1993), nous déclinons les représentations de la signification des mots à sens multiples le long d’un continuum homonymie-polysémie-multifacialité-indétermination, selon les degrés d’enracinement, de saillance, et les possibilités d’accessibilité et d’activation des différents composants (valeur schématique et élaborations sémantiques). Et, nous mettons ainsi en avant certaines des régularités organisatrices propres aux représentations sémanticoconceptuelles des polysèmes nominaux, ainsi qu’une typologie des sens polysémiques. Nous abordons enfin la construction du sens en grammaire cognitive, notamment l’influence du contexte dans l’interprétation d’expressions complexes comportant un polysème. Ainsi, nous considérons qu’il s’agit d’un processus non modulaire, compositionnel et dynamique. L’analyse de syntagmes nominaux du type Adj-N et N-Adj révèle en outre certaines régularités dans l’activation des sens polysémiques des unités linguistiques mises en jeu, liées au cotexte (place et fonction de l’adjectif par rapport au substantif recteur) et au contexte extralinguistique / For a long time, polysemy was considered a marginal or accidental phenomenon in language. Today, it is recognized as an integral part of linguistic systems. The first part of our thesis gives an overview of semantic theories dealing with polysemy. Although not exhaustive, it shows how and why the phenomenon has become an issue of the utmost significance in linguistics. Starting from the distinction established by G. Kleiber (1999), we consider two major trends according to the way they conceive the link between meaning, reference and polysemy. On the one hand, polysemy is described in terms of one basic referential sense from which secondary senses derive (objectivism). On the other hand, polysemy is analyzed as an areferential semantic potential from which senses emerge through contextual mechanisms (constructivism).
Regarding the representation of the meaning of polysemes, we postulate that linguistic units are associated, in the minds of speaker-hearers, with a structure of conceptual information, so that a conceptual model of it can be elaborated. In the framework of Cognitive Grammar (R.W. Langacker), the structure is a network made up of semantic values, which are more or less schematic, and of elaborated senses stemming from them. The most schematic meaning provides the conceptual link, at the level of the language system, between its instantiations. Some elaborations are the senses that can be constructed in discourse. On the basis of D. Tuggy’s work (1993), we propose to organize the conceptual models of words with multiple meanings along a homonymy-polysemy-multifaciality-vagueness continuum, according to various parameters: entrenchment, cognitive salience, and the accessibility and activation possibilities of the network components (schematic or elaborated values). We can thus highlight some organizational regularities specific to the semantic representation of polysemes, as well as a typology of polysemous senses. The third and last part of our thesis is dedicated to sense construction. In Cognitive Grammar, it is a non-modular, compositional and dynamic process. Focusing especially on the impact of context on the interpretation of complex expressions containing a polyseme, the analysis of Adj-N and N-Adj noun phrases brings to the fore some regularities governing the activation of polysemous senses. These regularities are linked to the linguistic context (position and function of the adjective relative to the noun it qualifies) and to the extra-linguistic context.
Polysémie Représentation de la signification Modélisation conceptuelle Construction du sens Contexte Noms et adjectifs Grammaire cognitive Polysemy Meaning representation Conceptual modeling Sense construction Context Nouns and adjectives Cognitive Grammar
58	Análise arquitetural, ontológica e proposta de modelo de referência para a Recomendação ITU-T G.805 Barcelos, Pedro Paulo Favato 07 April 2011
A recomendação ITU-T G.805 (ITU-T, 2000) é uma importante recomendação para redes de transporte, pois descreve uma arquitetura funcional genérica independente de tecnologias para este domínio e é usada como base para outras recomendações que descrevem a arquitetura funcional de redes, a gerência, a avaliação de desempenho e a especificação funcional de equipamentos. Apesar de fornecer uma ferramenta ágil para a descrição da arquitetura, a apresentação dos conceitos é feita de forma textual, gerando confusão por conta de definições recursivas e exemplos não claros, que muitas vezes até mesmo se contradizem. Esses aspectos da recomendação a tornam de difícil entendimento, podendo confundir o leitor. É importante que, devido à sua fundamental relevância, essa recomendação seja livre desses problemas. Para tal, é proposta nesta dissertação a utilização de técnicas de modelagem conceitual baseadas em ontologias para a geração de um modelo de referência para a área de redes de transporte, a partir da Recomendação ITU-T G.805. Além dos principais conceitos da recomendação são também apresentadas as vantagens da criação de um modelo de referência em ontologias e as principais tecnologias utilizadas para este objetivo. São realizadas uma análise arquitetural e uma reestruturação dos componentes definidos pela recomendação e uma avaliação ontológica da mesma, verificando casos de incompletudes, ambiguidades e outras deficiências ontológicas e apontando soluções.
Por fim, é apresentado o modelo de referência em ontologia desenvolvido para a Recomendação ITU-T G.805, incluindo o modelo conceitual e suas regras de derivação e de restrição / The ITU-T Recommendation G.805 (ITU-T, 2000) is an important recommendation for transport networks. It describes a generic, technology-independent functional architecture for this domain and is used as the basis for other recommendations that describe the functional architecture of networks, management, performance evaluation and the functional specification of equipment. Despite providing a flexible tool for describing the architecture, the recommendation presents its concepts textually, leading to confusion because of recursive definitions and unclear examples that often even contradict one another. These aspects make the recommendation difficult to understand and may confuse the reader. Given its fundamental relevance, it is important that the recommendation be free of these problems. To that end, this work proposes the use of ontology-based conceptual modeling techniques to generate a reference model for the transport network domain, based on the ITU-T Recommendation G.805. In addition to the recommendation's main concepts, the advantages of creating an ontology-based reference model and the main technologies used for this purpose are also presented. An architectural analysis and a restructuring of the components defined by the recommendation are performed, together with an ontological evaluation of the recommendation that checks for cases of incompleteness, ambiguity and other ontological deficiencies and points out solutions. Finally, the ontology-based reference model developed for the ITU-T Recommendation G.805 is presented, including the conceptual model and its derivation and constraint rules.
Ontologia Recomendação ITU-T G.805 Modelagem conceitual Ontology ITU-T Recommendation G.805 Conceptual modeling
59	Diretrizes metodológicas e validação estatística de dados para a construção de data warehouses / Methodological guidelines and statistical data validation for the construction of data warehouses Takecian, Pedro Losco 14 August 2014
Os sistemas de integração de dados que usam a arquitetura de data warehouse (DW) têm se tornado cada vez maiores e mais difíceis de gerenciar devido à crescente heterogeneidade das fontes de dados envolvidas. Apesar dos avanços tecnológicos e científicos, os projetos de DW ainda são muito lentos na geração de resultados pragmáticos. Este trabalho busca responder à seguinte questão: como pode ser reduzida a complexidade do desenvolvimento de sistemas de DW que integram dados provenientes de sistemas transacionais heterogêneos? Para isso, apresenta duas contribuições: 1) A criação de diretrizes metodológicas baseadas em ciclos de modelagem conceitual e análise de dados para guiar a construção de um sistema modular de integração de dados. Essas diretrizes foram fundamentais para reduzir a complexidade do desenvolvimento do projeto internacional Retrovirus Epidemiology Donor Study-II (REDS-II), se mostrando adequadas para serem aplicadas em sistemas reais. 2) O desenvolvimento de um método de validação de lotes de dados candidatos a serem incorporados a um sistema integrador, que toma decisões baseado no perfil estatístico desses lotes, e de um projeto de sistema que viabiliza o uso desse método no contexto de sistemas de DW. / Data integration systems that use data warehouse (DW) architecture are becoming bigger and more difficult to manage due to the growing heterogeneity of data sources. Despite the significant advances in research and technologies, many integration projects are still too slow to generate pragmatic results. This work addresses the following question: how can the complexity of DW development for integration of heterogeneous transactional information systems be reduced?
For this purpose, we present two contributions: 1) The establishment of methodological guidelines based on cycles of conceptual modeling and data analysis to drive construction of a modular data integration system. These guidelines were fundamental for reducing the development complexity of the international project Retrovirus Epidemiology Donor Study-II (REDS-II), proving suited to be applied in real systems. 2) The development of a validation method of data batches that are candidates to be incorporated into an integration system, which makes decisions based on the statistical profile of these batches, and a project of a system that enables the use of this method in DW systems context. análise de dados aprendizado de máquina arquitetura modular conceptual modeling data analysis data validation data warehouse data warehouse machine learning modelagem conceitual modular architecture validação de dados
60	Un modèle de données pour bibliothèques numériques / A data model for digital libraries Yang, Jitao 30 May 2012 Digital libraries are complex information systems that store digital resources (e.g., text, images, sound, audio) as well as knowledge about digital or non-digital resources; this knowledge is referred to as metadata. We propose a data model for digital libraries that supports resource identification, the use of metadata, and the reuse of stored resources, together with a query language that supports the discovery of resources. The model is inspired by the architecture of the Web, which forms a solid, universally accepted basis for the notions and services expected from a digital library. We formalize our model as a first-order theory so that the basic concepts of digital libraries can be expressed without being constrained by technical considerations. The axioms of the theory give the formal semantics of the notions of the model and, at the same time, define the knowledge that is implicit in a digital library. The theory is then translated into a Datalog program that, given a digital library, efficiently completes it with the knowledge implicit in it. The goal of our research is to contribute to the information management technology of digital libraries. In this way, we demonstrate the theoretical feasibility of our digital library model by showing that it can be implemented efficiently. Moreover, we demonstrate the model's practical feasibility by providing a full translation of the model into RDF and of the query language into SPARQL. We provide a sound and complete calculus for reasoning on the RDF graphs resulting from the translation. Based on this calculus, we prove the correctness of both translations, showing that the translation functions preserve the semantics of the digital library and of its query language.
Digital libraries, Conceptual modeling, First-order logic, Datalog, Web architecture, RDF, SPARQL
