About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
301

Integration and analysis of phenotypic data from functional screens

Paszkowski-Rogacz, Maciej 10 January 2011
Motivation: Although various high-throughput technologies provide a wealth of valuable information, each gives insight into a different aspect of cellular activity and each has its own limitations. A complete and systematic understanding of the cellular machinery can therefore be achieved only by a combined analysis of results coming from different approaches; however, methods and tools for the integration and analysis of heterogeneous biological data still have to be developed. Results: This work presents a systemic analysis of basic cellular processes, i.e. cell viability and the cell cycle, as well as embryonic stem cell pluripotency and differentiation. These phenomena were studied using several high-throughput technologies, whose combined results were analysed with existing and novel clustering and hit-selection algorithms. The thesis also introduces two novel data management and data analysis tools. The first, called DSViewer, is a database application designed for integrating and querying results coming from various genome-wide experiments. The second, named PhenoFam, is an application that performs gene set enrichment analysis by employing structural and functional information on families of protein domains as annotation terms. Both programs are accessible through a web interface. Conclusions: The investigations presented in this work provide the research community with a novel and markedly improved repertoire of computational tools and methods that facilitate the systematic analysis of the information accumulated from high-throughput studies and its translation into novel biological insights.
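The abstract does not spell out PhenoFam's enrichment statistic; as a rough illustration of what gene set enrichment over domain-family annotations involves, the sketch below scores the overlap between a hit list and an annotated gene set with a hypergeometric over-representation test. The use of scipy and all gene names are assumptions for the example, not the thesis implementation.

    # Minimal sketch of gene set enrichment via a hypergeometric test.
    # Illustrative only: PhenoFam's actual statistics and inputs may differ.
    from scipy.stats import hypergeom

    def enrichment_p_value(hits, gene_set, universe):
        """P-value that the hit list overlaps the gene set at least as much as observed."""
        hits, gene_set, universe = set(hits), set(gene_set), set(universe)
        N = len(universe)                    # all genes in the screen
        K = len(gene_set & universe)         # genes annotated with the domain family
        n = len(hits & universe)             # genes selected as hits
        k = len(hits & gene_set & universe)  # hits that carry the annotation
        # Survival function gives P(X >= k) for X ~ Hypergeom(N, K, n).
        return hypergeom.sf(k - 1, N, K, n)

    universe = [f"gene{i}" for i in range(1000)]
    domain_family = universe[:50]             # hypothetical domain-family annotation
    hits = universe[:20] + universe[500:520]  # hypothetical screen hits
    print(enrichment_p_value(hits, domain_family, universe))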
302

Geostatistical three-dimensional modeling of the subsurface unconsolidated materials in the Göttingen area / The transitional-probability Markov chain versus traditional indicator methods for modeling the geotechnical categories in a test site.

Ranjineh Khojasteh, Enayatollah 27 June 2013
The aim of this work was to build a three-dimensional subsurface model of the Göttingen area based on a geotechnical classification of the unconsolidated sediments. The materials investigated range from loose sediments to solid rock, but are referred to in this work as soil, soil classes or soil categories. The study evaluates different ways of capturing heterogeneous subsurface conditions by means of geostatistical methods and simulations. Such models are a fundamental tool in geotechnical engineering, mining, oil exploration and hydrogeology, among other fields. Detailed modelling of the required continuous parameters, such as porosity, permeability or hydraulic conductivity of the subsurface, presupposes an exact determination of the boundaries between facies and soil categories. The focus of this work is the three-dimensional modelling of unconsolidated deposits and their classification based on geostatistically derived parameters. Conventional pixel-based methods and transition-probability-based Markov chain models were used. After a general statistical evaluation of the parameters, the presence or absence of a soil category along the boreholes is described by indicator variables: the indicator of a category at a sample point is one if the category is present and zero if it is absent, and intermediate states can also be defined, for example a value of 0.5 if two categories are present but their exact proportions are unknown. To improve the stationarity of the indicator variables, the initial coordinates are transformed into a new system proportional to the top and bottom of the corresponding model layer. In the new coordinate space, the indicator variograms of each category are computed for several spatial directions (semi-variograms are, for brevity, also referred to as variograms in this work). Indicator kriging is then used to compute the probability of each category at a model node, and based on these probabilities the most probable category is assigned to the node. The indicator variogram models and indicator kriging parameters were validated and optimized, and the effect of reducing the number of model nodes on the precision of the model was investigated. To resolve small-scale variations of the categories, the developed methods were applied and compared; sequential indicator simulation (SISIM) and the transition probability Markov chain (TP/MC) approach were used as simulation methods. The studies show that the TP/MC method generally delivers good results, particularly in comparison with SISIM. Alternative methods for similar problems are evaluated for comparison and their inefficiency is shown. An improvement of the TP/MC method is also described and supported with results, and further modifications are suggested. Based on the results, the method is recommended for similar problems; simulation selection, tests and scoring schemes are proposed and further research directions are outlined.
A computer-assisted implementation of the procedure covering all simulation steps could be developed in the future to increase efficiency. The results of this study and follow-up investigations could be relevant to a wide range of problems in mining, the petroleum industry, geotechnical engineering and hydrogeology.
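As a rough illustration of the indicator coding described above (not the thesis workflow itself, which uses variogram-based indicator kriging and simulation), the sketch below codes borehole observations as indicators and assigns the most probable category at a grid node, using inverse-distance weights as an assumed stand-in for the kriging weights.

    # Sketch of indicator coding plus a most-probable-category assignment.
    # Assumed simplification: inverse-distance weights replace kriging weights.
    import numpy as np

    def indicator(observed_categories, category):
        """1.0 if the category is present, 0.0 if absent, 0.5 if shared with another one."""
        if category not in observed_categories:
            return 0.0
        return 1.0 if len(observed_categories) == 1 else 0.5

    def most_probable_category(node, samples, categories, power=2.0):
        """Estimate per-category probabilities at a grid node and return the best one."""
        node = np.asarray(node, dtype=float)
        probs = {}
        for cat in categories:
            num = den = 0.0
            for xyz, observed in samples:
                d = np.linalg.norm(node - np.asarray(xyz, dtype=float)) + 1e-9
                w = d ** -power
                num += w * indicator(observed, cat)
                den += w
            probs[cat] = num / den
        return max(probs, key=probs.get), probs

    # Hypothetical borehole observations: position -> set of observed categories.
    samples = [((0, 0, 0), {"clay"}), ((10, 0, 0), {"sand"}), ((5, 5, 0), {"clay", "silt"})]
    print(most_probable_category((4, 1, 0), samples, ["clay", "sand", "silt"]))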
303

Estimação do erro em redes de sensores sem fios. / Error estimation in wireless sensor networks.

Feitosa Neto, José Alencar 16 June 2008
Wireless Sensor Networks (WSNs) are presented in the context of information acquisition, and we propose a generic model based on the processes of signal sampling and reconstruction. We then define a measure of performance using the error made when reconstructing the signal. The analytical assessment of this measure in a variety of scenarios is unfeasible, so we propose and implement a Monte Carlo experiment for estimating the contribution of six factors to the performance of a WSN, namely: (i) the spatial distribution of sensors, (ii) the granularity of the phenomenon being monitored, (iii) the way in which sensors sample the phenomenon (constant characteristic functions defined on Voronoi cells or on circles), (iv) the communication between sensors (either among neighboring Voronoi cells or among sensors within a range), (v) the clustering and aggregation algorithms (LEACH and SKATER), and (vi) the reconstruction techniques (by Voronoi cells and by kriging). We conclude that all these factors have a significant influence on the performance of a WSN, and we are able to quantitatively assess this influence.
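The following sketch illustrates the flavour of such a Monte Carlo experiment: it repeatedly deploys sensors at random, reconstructs a field piecewise-constantly over Voronoi cells (nearest sensor) and averages the reconstruction RMSE. The field, grid size and replicate count are illustrative assumptions, not the dissertation's setup.

    # Monte Carlo estimate of the reconstruction error of a randomly deployed WSN,
    # using a nearest-sensor (Voronoi, piecewise-constant) reconstruction.
    import numpy as np

    rng = np.random.default_rng(0)

    def field(x, y):
        """Hypothetical phenomenon being monitored (smooth 2-D signal on the unit square)."""
        return np.sin(3 * x) * np.cos(2 * y)

    def reconstruction_rmse(n_sensors=30, grid=64):
        sensors = rng.random((n_sensors, 2))
        readings = field(sensors[:, 0], sensors[:, 1])
        xs, ys = np.meshgrid(np.linspace(0, 1, grid), np.linspace(0, 1, grid))
        pts = np.column_stack([xs.ravel(), ys.ravel()])
        # Nearest sensor = piecewise-constant reconstruction over Voronoi cells.
        nearest = np.argmin(((pts[:, None, :] - sensors[None, :, :]) ** 2).sum(-1), axis=1)
        reconstructed = readings[nearest]
        truth = field(pts[:, 0], pts[:, 1])
        return np.sqrt(np.mean((reconstructed - truth) ** 2))

    errors = [reconstruction_rmse() for _ in range(100)]  # Monte Carlo replicates
    print(np.mean(errors), np.std(errors))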
304

MIDB : um modelo de integração de dados biológicos / MIDB: a model for biological data integration

Perlin, Caroline Beatriz 29 February 2012
In bioinformatics there is a huge volume of data related to biomolecules and to nucleotide and amino acid sequences, which reside almost in their entirety in several Biological Data Bases (BDBs). For a given sequence there are several informational classifications: genomic data, evolutionary data, structural data, and others. Some BDBs store only one or a few of these classifications. These BDBs are hosted on different sites and servers, under different database management systems and data models; in addition, their schemas and instances may exhibit semantic heterogeneity. In this scenario, the objective of this project is to propose a biological data integration model that adopts new schema-integration and instance-integration techniques. The proposed model has a special mechanism for schema integration and another mechanism that performs instance integration (with the support of a dictionary), allowing conflicts in attribute values to be resolved; a clustering algorithm is used to group similar entities, and a domain specialist participates in managing these clusters. The model was validated through a case study focusing on schema and instance integration of nucleotide sequence data from organisms of the genus Actinomyces, captured from four different data sources. As a result, about 97.91% of the attributes were correctly categorized during schema integration, and instance integration was able to identify that about 50% of the clusters created need support from a specialist, avoiding errors in entity resolution. Further contributions include the attribute categorization procedure, the clustering algorithm, the proposed distance functions and the MIDB model itself.
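A minimal sketch of the instance-integration idea, assuming a generic string-similarity distance and a greedy grouping threshold; MIDB's actual distance functions and clustering algorithm are not reproduced here.

    # Similarity-based grouping of candidate duplicate records (illustrative only).
    from difflib import SequenceMatcher

    def distance(a, b):
        """1 - string similarity over the concatenated attribute values."""
        return 1.0 - SequenceMatcher(None, " ".join(a.values()), " ".join(b.values())).ratio()

    def cluster(records, threshold=0.35):
        """Greedy clustering: a record joins the first cluster whose seed is close enough."""
        clusters = []
        for rec in records:
            for c in clusters:
                if distance(rec, c[0]) <= threshold:
                    c.append(rec)
                    break
            else:
                clusters.append([rec])
        return clusters

    records = [
        {"organism": "Actinomyces naeslundii", "gene": "fimA"},
        {"organism": "A. naeslundii", "gene": "fimA"},
        {"organism": "Actinomyces viscosus", "gene": "fimP"},
    ]
    for c in cluster(records):
        # Clusters mixing conflicting attribute values would be routed to the domain specialist.
        print(c)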
305

Active XML Data Warehouses for Intelligent, On-line Decision Support / Entrepôts de données XML actifs pour la décision intelligente en ligne

Salem, Rashed 23 March 2012
A decision support system (DSS) is an information system that supports decision makers involved in complex decision-making processes. Modern DSSs need to exploit data that are not only numerical or symbolic, but also heterogeneously structured (e.g., text and multimedia data) and coming from various sources (e.g., the Web). We term such data complex data. Data warehouses are usually used as the basis of such DSSs: they help integrate data from a variety of sources to support decision-making. However, the advent of complex data imposes another vision of data warehousing, including data integration, data storage and data analysis. Moreover, today's requirements impose integrating complex data in near real-time rather than with traditional snapshot and batch ETL (Extraction, Transformation and Loading). Real-time and near real-time processing requires a more active ETL process: data integration tasks must react in an intelligent, i.e., active and autonomous, way to changes encountered in the data integration environment, especially in data sources. In this dissertation, we propose novel solutions for complex data integration in near real-time, actively and autonomously. We provide a generic metadata-based, service-oriented and event-driven approach for integrating complex data. To address data complexity issues, our approach stores heterogeneous data in a unified format using a metadata-based approach and XML. We also tackle data distribution and interoperability using a service-oriented approach. Moreover, to address near real-time requirements, our approach not only stores integrated data in a unified repository, but also provides functions to integrate data on the fly. We also apply a service-oriented approach to track relevant data changes in near real-time. Furthermore, the idea of integrating complex data actively and autonomously revolves around mining the logged events of the data integration environment. To this end, we propose an incremental XML-based algorithm for mining association rules from logged events; active rules are then defined upon the mined data to reactivate integration tasks. To validate our approach for managing complex data integration, we developed a high-level software framework, namely AX-InCoDa (Active XML-based framework for Integrating Complex Data). AX-InCoDa is implemented as a Web application using open-source tools. It exploits Web standards (e.g., XML and Web services) and Active XML to handle complexity issues and near real-time requirements. Besides warehousing logged events into an event repository to be mined for self-managing purposes, AX-InCoDa is enriched with active rules. AX-InCoDa's feasibility is illustrated by a healthcare case study, and the performance of our incremental event-mining algorithm is demonstrated experimentally.
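As a hedged illustration of mining association rules from logged integration events, the sketch below computes support and confidence over item pairs only; it is not the incremental XML-based algorithm of the thesis, and the event log and thresholds are invented for the example.

    # Pairwise association rule mining over a toy event log (support/confidence).
    from itertools import combinations
    from collections import Counter

    events = [  # each entry = the set of event types observed in one integration run
        {"source_schema_changed", "load_failed"},
        {"source_schema_changed", "load_failed"},
        {"source_schema_changed", "load_ok"},
        {"new_source_registered", "load_ok"},
    ]

    def mine_pair_rules(transactions, min_support=0.4, min_confidence=0.6):
        n = len(transactions)
        item_counts = Counter(i for t in transactions for i in t)
        pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))
        rules = []
        for (a, b), c in pair_counts.items():
            if c / n < min_support:
                continue
            for lhs, rhs in ((a, b), (b, a)):
                conf = c / item_counts[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, rhs, c / n, conf))
        return rules

    for lhs, rhs, sup, conf in mine_pair_rules(events):
        # An active rule built on such a pattern could re-trigger the affected ETL task.
        print(f"IF {lhs} THEN {rhs}  (support={sup:.2f}, confidence={conf:.2f})")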
306

Identificação única de pacientes em fontes de dados distribuídas e heterogêneas / Unique patient identification across distributed and heterogeneous data sources

Soares, Vinícius de Freitas 25 August 2009
Over the course of their lives, patients are treated by several healthcare institutions and undergo a series of procedures. The amount of information stored about each patient keeps growing, both in volume and in diversity, and the same patient often carries different identifiers in different systems, which generates high costs through duplicated procedures and contributes to imprecise diagnoses and treatments. This work therefore applies Record Linkage techniques and the generation of an MPI (Master Patient Index), combined with the specifications of the PIX (Patient Identifier Cross-Referencing) integration profile, to establish a unique identification of patients across health information systems with heterogeneous and distributed data sources. Using these concepts and technologies, a design was specified and a prototype of an IHE (Integrating the Healthcare Enterprise)/PIX solution was developed. Experiments were carried out in three scenarios with real data.
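A minimal sketch of the record-linkage scoring that underlies an MPI, with illustrative field weights and thresholds rather than the dissertation's parameters; the patient records are invented.

    # Weighted fuzzy field comparison between two patient records.
    from difflib import SequenceMatcher

    WEIGHTS = {"name": 4.0, "birth_date": 3.0, "mother_name": 2.0, "city": 1.0}

    def similarity(a, b):
        return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()

    def linkage_score(rec_a, rec_b):
        """Weighted sum of per-field similarities between two patient records."""
        return sum(w * similarity(rec_a.get(f, ""), rec_b.get(f, "")) for f, w in WEIGHTS.items())

    def classify(score, upper=8.5, lower=6.0):
        return "match" if score >= upper else "review" if score >= lower else "non-match"

    a = {"name": "Maria da Silva", "birth_date": "1980-02-11", "mother_name": "Ana Silva", "city": "Vitória"}
    b = {"name": "Maria Silva", "birth_date": "1980-02-11", "mother_name": "Ana da Silva", "city": "Vitoria"}
    s = linkage_score(a, b)
    print(s, classify(s))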
307

Seleção de características a partir da integração de dados por meio de análise de variação de número de cópias (CNV) para associação genótipo-fenótipo de doenças complexas / Feature selection from data integration through copy number variation (CNV) analysis for genotype-phenotype association in complex diseases

Meneguin, Christian Reis January 2018
Advisor: Prof. Dr. David Corrêa Martins Júnior. Master's dissertation, Universidade Federal do ABC, Graduate Program in Computer Science, Santo André, 2018. / Research in systems biology is characterized by interdisciplinarity and by a broad view of the interactions occurring within biological organisms, heredity, and the influence of environmental factors. In this scenario, a complex network of interactions is formed whose components are of different types, such as copy number variations (CNVs), genes, and others. The complex diseases that arise in this context are usually consequences of intracellular and intercellular disturbances in tissues and organs and develop in a multifactorial way, i.e., their cause and progression result from a combination of genetic and environmental factors. In recent years, a very large volume of biological data generated by high-throughput sequencing techniques has been produced, requiring research on the integrated analysis of these data. CNVs, i.e., variations between individuals in the number of repetitions of DNA subsequences, are useful because they are related to other types of data such as genes and gene expression data (abundances of mRNAs transcribed by genes in different contexts). Owing to the heterogeneous nature and the sheer amount of data, integrative analysis is a computational challenge for which several approaches have been proposed. In this dissertation, a method was proposed that integrates data (CNVs, gene expression data, haploinsufficiency, imprinting, among others) through a process that identifies stretches of CNVs shared between samples of different individuals, whether case or control samples, enriched with information obtained from the integrations performed. In this respect, the method differs from methods that integrate data by analysing the overlap of the biological data but do not generate new data describing the CNV intervals shared between samples. The proposed method was evaluated in a case study on Autism Spectrum Disorder (ASD). Besides being considered a complex disease, ASD has peculiarities that hinder its study when compared to other complex diseases such as cancer. Two experiments were carried out involving CNV data from individuals with ASD (cases) and individuals without the disorder (controls), and a further experiment used ASD CNV samples together with CNV samples related to other neurodevelopmental diseases. The experiments involved the integration of the proposed data types. The method identified stretches of CNVs present only in case samples and not in controls, as well as stretches present in ASD samples and absent from the samples of other neurodevelopmental diseases, and vice versa. The results also reflected the tendency of males to be more affected by ASD than females. It was further possible to identify associated genes together with information such as their biotype and whether they appear in haploinsufficiency, imprinting or expression data grouped by region and developmental period.
Finally, enrichment analyses of the gene lists derived from the resulting CNVs point to several pathways related to ASD, such as TRIF-dependent toll-like receptor signalling, gamma-aminobutyric acid (GABA) signalling, synaptic transmission and neurotransmitter secretion, insulin receptor signalling, olfactory sensory perception, and calcium-independent cell-cell adhesion.
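The core interval operation behind identifying CNV stretches present in all case samples but in no control can be sketched as follows; the coordinates and calls are hypothetical, and the actual method additionally integrates gene, expression, haploinsufficiency and imprinting annotations.

    # Interval arithmetic for case-only CNV stretches (illustrative coordinates in kb).
    def intersect(a, b):
        """Intersection of two sorted, disjoint interval lists [(start, end), ...]."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
            if lo < hi:
                out.append((lo, hi))
            if a[i][1] < b[j][1]:
                i += 1
            else:
                j += 1
        return out

    def subtract(a, b):
        """Parts of intervals in a that are not covered by intervals in b (b sorted)."""
        out = []
        for start, end in a:
            cur = start
            for bs, be in b:
                if be <= cur or bs >= end:
                    continue
                if bs > cur:
                    out.append((cur, bs))
                cur = max(cur, be)
            if cur < end:
                out.append((cur, end))
        return out

    # Hypothetical CNV calls on one chromosome for two case samples and one control.
    case_samples = [[(100, 250), (400, 500)], [(150, 300), (420, 480)]]
    control = [(430, 460)]

    common = case_samples[0]
    for sample in case_samples[1:]:
        common = intersect(common, sample)
    print(subtract(common, control))  # stretches present in all cases and in no control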
308

Gerenciamento e Integração das Bases de Dados de Sistemas de Detecção de Intrusões / Management and integration of intrusion detection system databases

SILVA, Emanoel Costa Claudino 19 December 2006
Digital security has become an important factor for institutions in diverse domains. Intrusion Detection Systems (IDSs) have emerged as a solution for the detection and correction of intrusions in a proactive way. Several IDS models have thus appeared which, by identifying, reporting and responding to such incidents, reduce the probability of compromise of networked computer systems. In the face of this diversity of solutions, there is a lack of proposals for standardizing the information used by these systems, as well as of mechanisms for interoperability and information exchange between the solutions in use. This dissertation proposes a model, an architecture and an implementation of an Information Manager for IDSs, using Multi-Agent Systems and Web Services technologies. The objective of the Information Manager is to keep, in a secure and up-to-date manner, the information needed for the functions inherent to an IDS. A standard storage format for these data is also proposed, introducing into the environment requirements such as unified storage, transparent access, uniform data generation and ease of interoperability.
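As an illustration only, a unified alert record of the kind such an Information Manager might store could look like the sketch below; the fields are assumptions for the example, not the dissertation's proposed format.

    # Hypothetical unified alert record that heterogeneous IDS sensors could map onto.
    from dataclasses import dataclass, asdict
    from datetime import datetime, timezone
    import json

    @dataclass
    class UnifiedAlert:
        sensor_id: str     # which IDS produced the alert
        sensor_type: str   # e.g. network-based or host-based
        timestamp: str     # ISO-8601, normalized to UTC for cross-sensor correlation
        source_ip: str
        target_ip: str
        signature: str     # attack class / rule that fired
        severity: int      # normalized 1 (low) .. 5 (critical)

    alert = UnifiedAlert(
        sensor_id="snort-dmz-01",
        sensor_type="network",
        timestamp=datetime.now(timezone.utc).isoformat(),
        source_ip="203.0.113.7",
        target_ip="192.0.2.10",
        signature="ICMP flood",
        severity=3,
    )
    print(json.dumps(asdict(alert), indent=2))  # stored or exchanged, e.g., via a Web Service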
309

Intelligent information processing in building monitoring systems and applications

Skön, J.-P. (Jukka-Pekka) 10 November 2015
Global warming has set in motion a trend for cutting energy costs to reduce the carbon footprint. Reducing energy consumption, cutting greenhouse gas emissions and eliminating energy wastage are among the main goals of the European Union (EU). The buildings sector is the largest user of energy and emitter of CO2 in the EU, estimated at approximately 40% of total consumption. According to the Intergovernmental Panel on Climate Change, 30% of the energy used in buildings could be saved with net economic benefits by 2030. At the same time, indoor air quality is increasingly recognized as a distinct health hazard. Because of these two factors, energy efficiency and healthy housing have become active topics in international research. The main aims of this thesis were to study and develop a wireless building monitoring and control system that produces valuable information and services for end-users by means of computational methods. The technology developed in this thesis relies heavily on building automation systems (BAS) and on parts of the concept termed the "Internet of Things" (IoT). The data refining process used is called knowledge discovery from data (KDD) and comprises methods for data acquisition, pre-processing, modeling, visualization, interpretation of the results, and sharing the new information with end-users. In this thesis, four examples of data analysis and knowledge deployment are presented. The results of the case studies show that innovative use of computational methods provides a good basis for researching and developing new information services. In addition, the data mining methods used, such as regression and clustering complemented by efficient data pre-processing, have great potential for processing large amounts of multivariate data effectively. The innovative and effective use of digital information is a key element in the creation of new information services. The service business in the building sector is significant, but plenty of new possibilities await capable and advanced companies and organizations. In addition, end-users, such as building maintenance personnel and residents, should be taken into account at an early stage of the data refining process. Furthermore, further advantages can be gained through bold co-operation between companies and organizations, by utilizing computational methods for data processing to produce valuable information, and by using the latest technologies in the research and development of new innovations.
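A compact sketch of the pre-processing and clustering steps of the KDD process, run on hypothetical indoor-air measurements; scikit-learn is assumed for illustration and is not necessarily the toolchain used in the thesis.

    # Pre-process (scale) and cluster synthetic indoor-air measurements.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)
    # Columns: CO2 (ppm), temperature (deg C), relative humidity (%).
    measurements = np.vstack([
        rng.normal([450, 21, 35], [40, 0.5, 3], size=(200, 3)),    # unoccupied periods
        rng.normal([1100, 23, 45], [120, 0.7, 4], size=(200, 3)),  # occupied periods
    ])

    scaled = StandardScaler().fit_transform(measurements)                          # pre-processing
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)   # modeling
    for k in range(2):
        print(f"cluster {k}: mean CO2 = {measurements[labels == k, 0].mean():.0f} ppm")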
310

Distributed knowledge sharing and production through collaborative e-Science platforms / Partage et production de connaissances distribuées dans des plateformes scientifiques collaboratives

Gaignard, Alban 15 March 2013
This thesis addresses the issues of coherent distributed knowledge production and sharing in the life-science area. In spite of the continuously increasing computing and storage capabilities of computing infrastructures, the management of massive scientific data through centralized approaches has become inappropriate, for several reasons: (i) they do not guarantee the autonomy of data providers, which are constrained, for ethical or legal reasons, to keep control over the data they host, and (ii) they do not scale and adapt to the massive scientific data produced through e-Science platforms. In the context of the NeuroLOG and VIP life-science collaborative platforms, we address, on the one hand, the distribution and heterogeneity issues underlying the sharing of possibly sensitive resources and, on the other hand, automated knowledge production during the usage of these e-Science platforms, to ease the exploitation of the massively produced scientific data. We rely on an ontological approach for knowledge modeling and propose, based on Semantic Web technologies, (i) to extend these platforms with efficient, static and dynamic, transparent federated semantic querying strategies, and (ii) to extend their data processing environment, using both provenance information captured at run time and domain-specific inference rules, to automate the semantic annotation of "in silico" experiment results. The results of this thesis have been evaluated on the Grid'5000 distributed and controlled infrastructure. They contribute to addressing three of the main challenges faced by computational science platforms through (i) a model for secured collaborations and a distributed access-control strategy allowing for the setup of multi-centric studies while still considering competitive activities, (ii) semantic experiment summaries, meaningful from the end-user perspective, aimed at easing navigation through the massive scientific data resulting from large-scale experimental campaigns, and (iii) efficient distributed querying and reasoning strategies, relying on Semantic Web standards, aimed at sharing the capitalized knowledge and opening it up to the Web of Linked Data. Keywords: scientific workflows and dataflows, semantic Web services, provenance, Linked Data, Semantic Web, knowledge base federation, distributed data integration, e-Science, e-Health.
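A minimal sketch of a federated semantic query of the kind described, where a SERVICE clause delegates part of the graph pattern to a second provider. The endpoint URLs and the choice of the PROV vocabulary are hypothetical, and SPARQLWrapper is an assumed client rather than the platforms' actual query engine.

    # Federated SPARQL query: one endpoint queried directly, a second via SERVICE.
    from SPARQLWrapper import SPARQLWrapper, JSON

    QUERY = """
    PREFIX prov: <http://www.w3.org/ns/prov#>
    SELECT ?result ?dataset WHERE {
      ?result prov:wasGeneratedBy ?activity .
      SERVICE <http://site-b.example.org/sparql> {   # second data provider, queried in place
        ?activity prov:used ?dataset .
      }
    }
    LIMIT 10
    """

    endpoint = SPARQLWrapper("http://site-a.example.org/sparql")  # first data provider
    endpoint.setQuery(QUERY)
    endpoint.setReturnFormat(JSON)
    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["result"]["value"], row["dataset"]["value"])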
