81
Genômica translacional: integrando dados clínicos e biomoleculares / Translational genomics: integrating clinical and biomolecular data
Miyoshi, Newton Shydeo Brandão (06 February 2013)
The use of scientific knowledge to promote human health is the main goal of translational science. To make this possible, it is necessary to develop computational methods capable of dealing with the large volume and heterogeneity of the information generated on the road between bench and clinical practice. One computational barrier to be overcome is the management and integration of clinical, socio-demographic, and biological data. In this effort, ontologies play a crucial role, being a powerful artifact for knowledge representation. Tools for managing and storing clinical data in translational science usually fall short, either because they cannot represent biological data or because they offer no integration with bioinformatics tools. In the field of genomics there are many biological database models (such as AceDB and Ensembl), which serve as the basis for computational tools for genomic analysis in an organism-independent way. Chado is an ontology-oriented biological database model that has gained popularity due to its robustness and flexibility as a generic platform for biomolecular data. However, neither Chado nor the other biological database models are prepared to represent the clinical information of patients. This master's project proposes the implementation and practical validation of a data integration framework that supports translational research by integrating biomolecular data from different omics technologies with clinical and socio-demographic data of patients. The instantiation of this framework resulted in a tool called IPTrans (Integrative Platform for Translational Research), which uses Chado as its genomic data model and an ontology as its reference. Chado was extended to represent clinical information through a new Clinical Module, which uses an entity-attribute-value (EAV) data structure. A pipeline was developed to migrate data from heterogeneous information sources into the integrated database. The framework was validated with clinical data from a teaching hospital and a biomolecular database for research on patients with head and neck cancer, together with data from microarray experiments performed for these patients. The main requirements targeted for the framework were flexibility, robustness, and generality. The validation showed that the proposed system satisfies these requirements, providing the integration needed for data analysis and comparison.
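For illustration, here is a minimal sketch of the entity-attribute-value layout that a clinical module like the one described can use, written in Python with SQLite. All table, column, and value names are assumptions for the example, not the actual IPTrans/Chado schema.

```python
import sqlite3

# Hypothetical EAV layout for clinical records, in the spirit of the
# Clinical Module described above. Names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        external_id TEXT  -- de-identified hospital record number
    );
    CREATE TABLE clinical_attribute (
        attribute_id INTEGER PRIMARY KEY,
        ontology_term TEXT  -- e.g. a term from a clinical ontology
    );
    -- One row per (patient, attribute) observation: the EAV triple.
    CREATE TABLE clinical_value (
        patient_id INTEGER REFERENCES patient(patient_id),
        attribute_id INTEGER REFERENCES clinical_attribute(attribute_id),
        value TEXT
    );
""")

# New clinical variables need no schema change: just a new attribute row.
conn.execute("INSERT INTO patient VALUES (1, 'HC-0042')")
conn.execute("INSERT INTO clinical_attribute VALUES (1, 'tumor_site')")
conn.execute("INSERT INTO clinical_value VALUES (1, 1, 'larynx')")

for row in conn.execute("""
    SELECT p.external_id, a.ontology_term, v.value
    FROM clinical_value v
    JOIN patient p ON p.patient_id = v.patient_id
    JOIN clinical_attribute a ON a.attribute_id = v.attribute_id
"""):
    print(row)  # ('HC-0042', 'tumor_site', 'larynx')
```

The EAV shape is what gives such a module its flexibility: adding a new clinical variable is a data change (a new attribute row), not a schema migration.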
82
The assembly of island floras from a macroecological perspective
König, Christian (25 October 2018)
No description available.
83
Understanding disease and disease relationships using transcriptomic data
Oerton, Erin (January 2019)
As the volume of transcriptomic data continues to increase, so too does its potential to deepen our understanding of disease; for example, by revealing gene expression patterns shared between diseases. However, key questions remain around the strength of the transcriptomic signal of disease and the identification of meaningful commonalities between datasets, which are addressed in this thesis as follows. The first chapter, Concordance of Microarray Studies of Parkinson's Disease, examines the agreement between differential expression signatures across 33 studies of Parkinson's disease. Comparison of these studies, which cover a range of microarray platforms, tissues, and disease models, reveals a characteristic pattern of differential expression in the most highly affected tissues in human patients. Using correlation and clustering analyses to measure how representative different study designs are of human disease, the work described acts as a guideline for the comparison of microarray studies in the following chapters. In the next chapter, Using Dysregulated Signalling Paths to Understand Disease, gene expression changes are linked on the human signalling network, enabling identification of network regions dysregulated in disease. Applying this method across a large dataset of 141 common and rare diseases identifies dysregulated processes shared between diverse conditions, which relate to known disease- and drug-sharing relationships. The final chapter, Understanding and Predicting Disease Relationships Through Similarity Fusion, explores the integration of gene expression with other data types (in this case, ontological, phenotypic, literature co-occurrence, genetic, and drug data) to understand relationships between diseases. A similarity fusion approach is proposed to overcome the differences in data type properties between each space, resulting in the identification of novel disease relationships spanning multiple bioinformatic levels. The similarity of disease relationships between each data type is considered, revealing that relationships in differential expression space are distinct from those in other molecular and clinical spaces. In summary, the work described in this thesis sets out a framework for the comparative analysis of transcriptomic data in disease, including the integration of biological networks and other bioinformatic data types, in order to further our knowledge of diseases and the relationships between them.
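As a rough illustration of the similarity-fusion idea, here is a Python sketch that rank-normalizes per-data-type disease similarity matrices before averaging them, so that spaces with different value ranges contribute comparably. The normalization and averaging scheme is an assumption for the example, not the thesis's exact fusion method.

```python
import numpy as np

# Fuse disease-by-disease similarity matrices from different data types
# (expression, ontology, literature, ...) into one matrix.
def fuse_similarities(matrices: list[np.ndarray]) -> np.ndarray:
    fused = np.zeros_like(matrices[0], dtype=float)
    for m in matrices:
        # Rank-normalise each matrix so data types on different scales
        # contribute comparably before averaging.
        ranks = m.argsort(axis=None).argsort(axis=None).reshape(m.shape)
        fused += ranks / ranks.max()
    return fused / len(matrices)

expr = np.random.rand(5, 5)        # differential-expression similarity
pheno = np.random.rand(5, 5) * 10  # phenotypic similarity, different scale
print(fuse_similarities([expr, pheno]).round(2))
```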
84
Une approche sémantique pour l'exploitation de données environnementales : application aux données d'un observatoire / A semantic-based approach to exploit environmental data: application to an observatory's data
Tran, Ba Huy (23 November 2017)
The need to collect long-term observations for research on environmental issues led the CNRS to establish the "Zones Ateliers". For several years, different teams of researchers have thus been collecting many spatio-temporal databases. To facilitate transversal analyses across different observations, it is desirable to cross-reference information from these data sources. Nevertheless, these sources are built independently of one another, which raises data heterogeneity problems in the analysis. This thesis therefore studies the potential of ontologies as objects of modeling, inference, and interoperability. The aim is to provide domain experts with a suitable method for exploiting heterogeneous data. Since they are applied in the environmental domain, the ontologies must take the spatio-temporal characteristics of these data into account. Given the need to model spatial and temporal concepts and operators, we reuse existing ontologies of time and space. A spatio-temporal data integration approach with a mechanism for reasoning over the relations between these data is then introduced. Finally, data mining methods are adapted to spatio-temporal RDF data to discover new knowledge from the knowledge base. The approach was applied within the Geminat prototype, which aims to help understand farming practices and their relationships with biodiversity in the "zone atelier Plaine et Val de Sèvre". From data integration to knowledge analysis, it provides the elements needed to exploit heterogeneous spatio-temporal data and to extract new knowledge from them.
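As a rough illustration of representing observations with temporal extents in RDF and inferring a simple interval relation over them, here is a sketch using Python's rdflib. The namespace, vocabulary, and inference rule are assumptions for the example, not the thesis's ontology or the Geminat implementation.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

# Hypothetical vocabulary for observations with start/end dates.
EX = Namespace("http://example.org/obs#")
g = Graph()

def add_observation(obs, start, end):
    g.add((obs, RDF.type, EX.Observation))
    g.add((obs, EX.start, Literal(start, datatype=XSD.date)))
    g.add((obs, EX.end, Literal(end, datatype=XSD.date)))

add_observation(EX.sowing1, "2017-03-01", "2017-03-05")
add_observation(EX.birdCount1, "2017-05-10", "2017-05-10")

# Infer an Allen-style "before" relation: one observation ends before
# another starts. Materialize the list first, then add inferred triples.
obs = list(g.subjects(RDF.type, EX.Observation))
for a in obs:
    for b in obs:
        if a != b and g.value(a, EX.end).toPython() < g.value(b, EX.start).toPython():
            g.add((a, EX.before, b))

print(list(g.subject_objects(EX.before)))  # one pair: sowing1 before birdCount1
```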
85
Evangelist Marketing of the CloverETL Software
Štýs, Miroslav (January 2011)
This diploma thesis proposes a new marketing strategy for the ETL tool CloverETL. The theoretical part comprises chapters two and three. Chapter two covers the term ETL, which, as a separate component of the Business Intelligence architecture, is given little space in the literature. Chapter three introduces evangelist marketing and explains its origins and best practices. The practical part introduces the company Javlin, a.s. and its CloverETL software product, assesses the current marketing strategy, and then proposes a new strategy built on the pillars of evangelist marketing. Finally, the benefits of the new approach are discussed in light of statistics and data, mostly Google Analytics outputs.
86
Towards developing a goal-driven data integration framework for counter-terrorism analytics
Liu, Dapeng (01 January 2019)
Terrorist attacks can cause massive casualties and severe property damage, resulting in terrorism crises surging across the world; accordingly, counter-terrorism analytics that take advantage of big data have been attracting increasing attention. The knowledge and clues essential for analyzing terrorist activities are often spread across heterogeneous data sources, which calls for an effective data integration solution. In this study, employing the goal definition template in the Goal-Question-Metric approach, we design and implement an automated goal-driven data integration framework for counter-terrorism analytics. The proposed design elicits and ontologizes an input user goal of counter-terrorism analytics; recognizes goal-relevant datasets; and addresses semantic heterogeneity in the recognized datasets. Our proposed design, following the design science methodology, presents a theoretical framing for on-demand data integration designs that can accommodate diverse and dynamic user goals of counter-terrorism analytics and output integrated data tailored to these goals.
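For concreteness, here is a sketch of what a goal captured with the Goal-Question-Metric goal definition template might look like as a data structure, with a naive keyword-based notion of dataset relevance. Field names, the example goal, and the matching heuristic are illustrative assumptions, not the framework's actual design.

```python
from dataclasses import dataclass

# The classic GQM goal facets: analyze <object> for the purpose of <purpose>
# with respect to <quality focus> from the viewpoint of <viewpoint> in the
# context of <context>.
@dataclass
class AnalyticsGoal:
    object_of_study: str   # what is analyzed
    purpose: str           # why it is analyzed
    quality_focus: str     # with respect to what
    viewpoint: str         # from whose point of view
    context: str           # in which environment

    def keywords(self) -> set[str]:
        """Terms a matcher could use to recognize goal-relevant datasets."""
        text = " ".join(
            [self.object_of_study, self.purpose, self.quality_focus, self.context]
        )
        return {w.lower() for w in text.split()}

goal = AnalyticsGoal(
    object_of_study="terrorist incident records",
    purpose="identify recruitment patterns",
    quality_focus="temporal and geographic trends",
    viewpoint="intelligence analyst",
    context="open-source incident databases",
)
print(goal.keywords() & {"geographic", "budget"})  # {'geographic'}
```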
87
Classificação taxonômica de sequências obtidas com meta-ômicas por meio de integração de dados / Taxonomic classification of sequences obtained with meta-omics by data integration
Lima, Felipe Prata (20 August 2019)
Microbial communities play important roles in processes that occur in diverse environments, such as soils, oceans, and the human gastrointestinal tract. It is therefore of interest to understand the structure and functioning of these communities. The structure of these communities, in terms of component organisms, can be determined using next-generation sequencing together with meta-omics techniques, followed by taxonomic analysis of the resulting sequences with taxonomic classification programs. While several such programs are available, they make mistakes, such as identifying only part of the organisms present in a sample and identifying organisms that are not present (false positives, FPs). Some approaches have been proposed to improve the taxonomic classifications produced by these programs by reducing FPs, but they address only one type of meta-omics, metagenomics. In this work, we propose a new approach that integrates different meta-omics: shotgun metagenomics, 16S amplicon metagenomics, and metatranscriptomics. We explored classification results on simulated and mock datasets to extract variables, and developed classification models that discriminate between correct and incorrect predictions of bacterial species. We compared the performance obtained with individual meta-omics against that obtained through integration, observing the balance between precision and sensitivity. According to the measures computed on our datasets, our approach improved classification, reducing FPs and increasing the F1 measure compared with non-integrative approaches, including those using classifier combination methods. To facilitate its use, we developed Gunga, a tool packaged for R that implements the approach, with functions for integrating taxonomic classification data from different meta-omics and for flagging incorrect predictions.
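The following Python sketch illustrates the kind of supervised model described: each row is one predicted species, features combine support from the different meta-omics, and the label (learned from simulated and mock communities) says whether the prediction is correct. Feature names and values are made up for the example; the actual Gunga tool is an R package.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row is one predicted species; features combine evidence from the
# different meta-omics (shotgun, 16S amplicon, metatranscriptomics).
rng = np.random.default_rng(0)
X_train = rng.random((200, 3))  # e.g. [shotgun abundance, 16S support, mRNA support]
# Ground truth from simulated/mock communities: 1 = species truly present.
y_train = (X_train.sum(axis=1) + rng.normal(0, 0.3, 200) > 1.5).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Species supported by several meta-omics are kept; weakly supported
# predictions are likely false positives and get filtered out.
candidates = np.array([[0.9, 0.8, 0.7], [0.1, 0.0, 0.2]])
print(model.predict(candidates))  # e.g. [1 0]
```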
88
Supporting Scientific Collaboration through Workflows and Provenance
Ellqvist, Tommy (January 2010)
Science is changing. Computers, fast communication, and new technologies have created new ways of conducting research. For instance, researchers from different disciplines are processing and analyzing scientific data that is increasing at an exponential rate. This kind of research requires that scientists have access to tools that can handle huge amounts of data, enable access to vast computational resources, and support the collaboration of large teams of scientists. This thesis focuses on tools that help support scientific collaboration.

Workflows and provenance are two concepts that have proven useful in supporting scientific collaboration. Workflows provide a formal specification of scientific experiments, and provenance offers a model for documenting data and process dependencies. Together, they enable the creation of tools that can support collaboration through the whole scientific life-cycle, from specification of experiments to validation of results. However, existing models for workflows and provenance are often specific to particular tasks and tools. This makes it hard to analyze the history of data that has been generated over several application areas by different tools. Moreover, workflow design is a time-consuming process and often requires extensive knowledge of the tools involved and collaboration with researchers with different expertise. This thesis addresses these problems.

Our first contribution is a study of the differences between two approaches to interoperability between provenance models: direct data conversion, and mediation. We perform a case study where we integrate three different provenance models using the mediation approach, and show its advantages compared to data conversion. Our second contribution serves to support workflow design by allowing multiple users to design workflows concurrently. Current workflow tools lack the ability for users to work simultaneously on the same workflow. We propose a method that uses the provenance of workflow evolution to enable real-time collaborative design of workflows. Our third contribution considers supporting workflow design by reusing existing workflows. Workflow collections for reuse are available, but more efficient methods for generating summaries of search results are still needed. We explore new summarization strategies that consider the workflow structure.
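As an illustration of the provenance idea (documenting data and process dependencies), here is a minimal Python sketch of a dependency record with a lineage query over it. The model and names are assumptions for the example, not any specific provenance standard or the thesis's mediation approach.

```python
from collections import defaultdict

# A simple "process used inputs, artifact produced by process" record.
used = defaultdict(list)  # process -> input artifacts
produced_by = {}          # artifact -> producing process

def record(process: str, inputs: list[str], output: str) -> None:
    used[process].extend(inputs)
    produced_by[output] = process

record("align", ["reads.fq", "genome.fa"], "aligned.bam")
record("call_variants", ["aligned.bam"], "variants.vcf")

def lineage(artifact: str) -> set[str]:
    """All upstream artifacts an output transitively depends on."""
    deps: set[str] = set()
    proc = produced_by.get(artifact)
    if proc:
        for src in used[proc]:
            deps |= {src} | lineage(src)
    return deps

print(lineage("variants.vcf"))  # {'aligned.bam', 'reads.fq', 'genome.fa'}
```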
89
Privacy-Preserving Data Integration in Public Health Surveillance
Hu, Jun (16 May 2011)
With the widespread use of the Internet, data is often shared between organizations in B2B health care networks. Integrating data across all sources in a health care network would be useful to public health surveillance and would provide a complete view of how the overall network is performing. Because there is no standardized common data model across organizations, matching identities between different locations in order to link and aggregate records is difficult. Moreover, privacy legislation controls the use of personal information, and health care data is very sensitive in nature, so protecting data privacy and preventing leaks of personal health information is more important than ever. Throughout the process of integrating data sets from different organizations, consent (explicit or implicit) and/or permission to use must be in place, data sets must be de-identified, and identity must be protected. Furthermore, one must ensure that combining data sets from different data sources into a single consolidated data set does not create data that can potentially be re-identified, even when only summary data records are created.
In this thesis, we propose new privacy preserving data integration protocols for public health surveillance, identify a set of privacy preserving data integration patterns, and propose a supporting framework that combines a methodology and architecture with which to implement these protocols in practice. Our work is validated with two real world case studies that were developed in partnership with two different public health surveillance organizations.
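A minimal sketch of one common building block for privacy-preserving linkage: matching on keyed hashes of normalized identifiers, so that records can be linked without exchanging raw identities. This only illustrates the basic idea; the names and shared-key setup are assumptions, and real protocols such as those proposed in the thesis involve considerably more (consent, de-identification, protection against re-identification of summary data).

```python
import hmac
import hashlib

# Shared secret distributed out of band, never alongside the data.
KEY = b"shared-secret-key"

def pseudonym(name: str, dob: str) -> str:
    # Normalize before hashing so trivial formatting differences still match.
    normalized = f"{name.strip().lower()}|{dob}"
    return hmac.new(KEY, normalized.encode(), hashlib.sha256).hexdigest()

site_a = {pseudonym("Jane Doe", "1980-02-01"): {"diagnosis": "influenza"}}
site_b = {pseudonym("jane doe ", "1980-02-01"): {"vaccinated": True}}

# Records link on equal pseudonyms; neither site sees the other's raw names.
linked = {k: {**site_a[k], **site_b[k]} for k in site_a.keys() & site_b.keys()}
print(linked)
```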
90
Best effort query answering for mediators with union views
Papri, Rowshon Jahan (07 1900)
Consider an SQL query that involves joins of several relations, optionally followed by selections and/or projections. It can be represented by a conjunctive datalog query Q without negation or arithmetic subgoals. We consider the problem of answering such a query Q using a mediator M. For each relation R that corresponds to a subgoal in Q, M contains several sources; each source for R provides some of the tuples in R. The capabilities of each source are described in terms of templates. It might not be possible to get all the tuples in the result, Result(Q), using M, due to restrictions imposed by the templates. We consider best-effort query answering: find as many tuples in Result(Q) as possible. We present an algorithm to determine if Q can be so answered using M. / Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Electrical Engineering and Computer Science.
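A small sketch of the setting in Python: sources with binding-pattern templates ('f' for an attribute the source returns freely, 'b' for one that must be bound in the request), and a best-effort evaluation that retrieves only the reachable part of the result. The query, sources, and template notation are illustrative assumptions, not the thesis's algorithm.

```python
# Q(x, z) :- R(x, y), S(y, z)  -- a join of two relations on y.
r_source = {"template": "ff", "tuples": [("a", 1), ("b", 2)]}
# S can only be probed with its first attribute bound: template 'bf'.
s_source = {"template": "bf", "tuples": [(1, "p"), (2, "q"), (3, "r")]}

def probe_s(y):
    # Simulates calling a restricted source: y must be supplied.
    return [t for t in s_source["tuples"] if t[0] == y]

# Best effort: enumerate R (its template allows it), then feed each y into S.
# Tuples of S unreachable through some R.y, here (3, 'r'), are simply missed.
result = [(x, z) for (x, y) in r_source["tuples"] for (_, z) in probe_s(y)]
print(result)  # [('a', 'p'), ('b', 'q')]
```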