151

Data integration and visualization for systems biology data

Cheng, Hui 29 December 2010 (has links)
Systems biology aims to understand cellular behavior in terms of the spatiotemporal interactions among cellular components, such as genes, proteins and metabolites. Comprehensive visualization tools for exploring multivariate data are needed to gain insight into the physiological processes reflected in these molecular profiles. Data fusion methods are required to study high-throughput transcriptomics, metabolomics and proteomics data in combination before systems biology can live up to its potential. In this work I explored mathematical and statistical methods and visualization tools to resolve the prominent issues in systems biology data fusion and to gain insight into these comprehensive data. In order to choose and apply multivariate methods, it is important to know the distribution of the experimental data. Chi-square Q-Q plots and violin plots were applied to all M. truncatula and V. vinifera data, showing that most distributions are right-skewed (Chapter 2). The biplot display provides an effective tool for reducing the dimensionality of systems biology data and displaying the molecules and time points jointly on the same plot. A biplot of the M. truncatula data revealed the overall system behavior, including unidentified compounds of interest and the dynamics of the highly responsive molecules (Chapter 3). The phase spectrum computed from the fast Fourier transform of the time course data was found to play a more important role than the amplitude in signal reconstruction. Phase spectrum analyses of in silico data created with two artificial biochemical networks, the Claytor model and the AB2 model, showed that the phase spectrum is indeed an effective tool for systems biology data fusion despite the data heterogeneity (Chapter 4). The difference between data integration and data fusion is further discussed. Biplot analysis of scaled data was applied to integrate transcriptome, metabolome and proteome data from the V. vinifera project. The phase spectrum combined with k-means clustering was used in integrative analyses of the transcriptome and metabolome of the M. truncatula yeast elicitation data, and of the transcriptome, metabolome and proteome of the V. vinifera salinity stress data. The phase spectrum analysis was compared with the biplot display as an effective tool in data fusion (Chapter 5). The results suggest that the phase spectrum may perform better than the biplot. This work was funded by the National Science Foundation Plant Genome Program, grant DBI-0109732, and by the Virginia Bioinformatics Institute. / Ph. D.
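As a minimal sketch of the phase-spectrum fusion idea described above (Chapters 4 and 5), assuming uniformly sampled time courses and using random data in place of the actual M. truncatula profiles, the phase of each molecule's FFT can be clustered with k-means:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hypothetical input: rows are molecules (genes/metabolites/proteins),
# columns are time points of a uniformly sampled time course.
rng = np.random.default_rng(0)
time_courses = rng.standard_normal((50, 12))

# Phase spectrum of each time course via the FFT; the thesis argues the
# phase carries more of the temporal signature than the amplitude.
spectra = np.fft.rfft(time_courses, axis=1)
phases = np.angle(spectra)          # shape: (molecules, frequencies)

# Because phase is unit-free, profiles from heterogeneous platforms
# (transcripts, metabolites, proteins) can be pooled before clustering.
centroids, labels = kmeans2(phases, k=5, seed=0, minit="++")
for cluster in range(5):
    print(f"cluster {cluster}: {np.sum(labels == cluster)} molecules")
```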
152

Data Integration Methodologies and Services for Evaluation and Forecasting of Epidemics

Deodhar, Suruchi 31 May 2016 (has links)
Most epidemiological systems described in the literature are built for evaluation and analysis of specific diseases, such as influenza-like illness. The modeling environments that support these systems are implemented for specific diseases and epidemiological models, and hence are not reusable or extendable. This thesis focuses on the design and development of an integrated analytical environment with flexible data integration methodologies and multi-level web services for evaluation and forecasting of various epidemics in different regions of the world. The environment supports analysis of epidemics based on any combination of disease, surveillance source, epidemiological model, geographic region and demographic factor. The environment also supports evaluation and forecasting of epidemics when various policy-level and behavioral interventions that may inhibit the spread of an epidemic are applied. First, we describe data integration methodologies and schema design for flexible experiment design, storage and query retrieval mechanisms related to large-scale epidemic data. We describe novel techniques for data transformation, optimization, pre-computation and automation that enable the flexibility, extendibility and efficiency required in different categories of query processing. Second, we describe the design and engineering of adaptable middleware platforms based on service-oriented paradigms for interactive workflow, communication, and decoupled integration. This supports large-scale multi-user applications with provision for online analysis of interventions as well as analytical processing of forecast computations. Using a service-oriented architecture, we have provided a platform-as-a-service representation for evaluation and forecasting of epidemics. We demonstrate the applicability of our integrated environment through the development of two applications, DISIMS and EpiCaster. DISIMS is an interactive web-based system for evaluating the effects of dynamic intervention strategies on epidemic propagation. EpiCaster is a situation assessment and forecasting tool for projecting the state of evolving epidemics such as flu and Ebola in different regions of the world. We discuss how our platform uses existing technologies to solve a novel problem in epidemiology, and provides a unique solution on which different applications can be built for analyzing epidemic containment strategies. / Ph. D.
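The thesis's schema and services are not shown in this abstract; as a hedged illustration of the kind of flexible, combination-based querying the environment supports, a toy surveillance table can be filtered on any mix of dimensions (the column names here are assumptions):

```python
import pandas as pd

# Hypothetical surveillance table; the thesis's schema is not given, so the
# column names here are illustrative assumptions.
surveillance = pd.DataFrame({
    "disease": ["flu", "flu", "ebola"],
    "region": ["US-VA", "US-MD", "GN"],
    "source": ["ILINet", "ILINet", "WHO"],
    "week": ["2015-40", "2015-40", "2015-40"],
    "case_count": [120, 95, 14],
})

def query_epidemic(df, **filters):
    """Return rows matching any combination of disease/region/source/week."""
    for column, value in filters.items():
        df = df[df[column] == value]
    return df

print(query_epidemic(surveillance, disease="flu", source="ILINet"))
```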
153

Hide-Metadata Based Data Integration Environment for Hydrological Datasets

Ravindran, Nimmy 30 December 2004 (has links)
Efficient data integration is one of the most challenging problems in data management, interoperation and analysis. Heterogeneous Earth science data are collected at various geographical locations for scientific studies and operational use. The intrinsic problem of archiving, distributing and searching such huge scientific datasets is compounded by the heterogeneity of data and queries, limiting scientific analysis and the generation/validation of hydrologic forecast models. The data models of hydrologic research communities such as the National Weather Service (NWS), the National Oceanic and Atmospheric Administration (NOAA), and the US Geological Survey (USGS) are diverse and complex, and deriving any useful hydrological model from data integrated across all these sources is often a time-consuming process. One current trend of data harvesting in the scientific community is toward distributed digital library initiatives. However, these approaches may not be adequate for data sources or entities that do not want to "upload" their data into a "data pool." In view of this, we present an effective architecture to address the issues of data integration in such a diverse environment for hydrological studies. The heterogeneities in these datasets are addressed based on the autonomy of each data source in terms of design, communication, association and execution, using a hierarchical integration model. A metadata model is also developed for defining the data as well as the data sources, thus providing a uniform view of the data for different kinds of users. An implementation of the model as a web-based system that integrates widely varied hydrology datasets from various data sources is also being developed. / Master of Science
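As a small, hypothetical sketch of the metadata-model idea, in which each autonomous source is described once and queried through a uniform view (the field names and Python structure are illustrative, not the thesis's actual model):

```python
from dataclasses import dataclass, field

# Hypothetical metadata record: each source (NWS, NOAA, USGS, ...) is
# described once, and queries are phrased against this uniform view rather
# than against each native schema.
@dataclass
class SourceMetadata:
    name: str                      # e.g. "USGS stream gauges"
    variables: dict = field(default_factory=dict)  # canonical name -> native field
    autonomy: str = "remote"       # source keeps its data; nothing is pooled

usgs = SourceMetadata(
    name="USGS",
    variables={"streamflow_m3s": "discharge_cfs", "stage_m": "gage_height_ft"},
)

def native_field(meta: SourceMetadata, canonical: str) -> str:
    """Map a canonical variable name onto a source's native field name."""
    return meta.variables[canonical]

print(native_field(usgs, "streamflow_m3s"))  # -> "discharge_cfs"
```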
154

Cognitively-inspired Architecture for Wireless Sensor Networks: A Model Driven Approach for Data Integration in a Traffic Monitoring System

Phalak, Kashmira 08 September 2006 (has links)
We describe CoSMo, a Cognitively Inspired Service and Model Architecture for situational awareness and monitoring of vehicular traffic in urban transportation systems using a network of wireless sensors. The system architecture combines (i) a cognitively inspired internal representation for analyzing and answering queries concerning the observed system and (ii) a service-oriented architecture that facilitates interaction among the individual modules of the internal representation, the observed system and the user. The cognitively inspired model architecture allows effective deductive as well as inductive reasoning by combining simulation-based dynamic models for planning with traditional relational databases for knowledge and data representation. The service-oriented design of interaction, in turn, allows one to build flexible, extensible and scalable systems that can be deployed in practical settings. To illustrate our concepts and the novel features of our architecture, we have recently completed a prototype implementation of CoSMo. The prototype illustrates the advantages of our approach over other traditional approaches for designing scalable software for situational awareness in large complex systems. The basic architecture and its prototype implementation are generic and can be applied to monitoring other complex systems. This thesis describes the design of the cognitively-inspired model architecture and its corresponding prototype. Two important contributions include the following: • The cognitively-inspired architecture: In contrast to earlier work on model-driven architecture, CoSMo contains a number of cognitively inspired features, including perception, memory and learning. Apart from illustrating interesting trade-offs between computational cost (e.g. access time, memory) and the correctness available to a user, it also allows user-specified deductive and inductive queries. • Distributed data integration and fusion: In keeping with the cognitively-inspired model-driven approach, the system allows for efficient data fusion from heterogeneous sensors, simulation-based dynamic models and databases that are continually updated with real-world and simulated data. It is capable of supporting a rich class of queries. / Master of Science
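As a hedged toy example of the data fusion the second contribution describes, combining live sensor readings with a simulation-based estimate for the same road link (the weighting rule is an assumption, not CoSMo's actual fusion logic):

```python
# Hypothetical fusion step: combine live sensor readings with a
# simulation-based estimate for the same road link.
def fuse_speed(sensor_readings, simulated_speed, sensor_trust=0.7):
    """Weighted fusion of observed and simulated link speeds (km/h)."""
    if not sensor_readings:               # no live data: fall back to the model
        return simulated_speed
    observed = sum(sensor_readings) / len(sensor_readings)
    return sensor_trust * observed + (1 - sensor_trust) * simulated_speed

print(fuse_speed([52.0, 48.5, 50.1], simulated_speed=44.0))
```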
155

Performance evaluation for process refinement stage of SWA system

Shurrab, O., Awan, Irfan U. January 2015 (has links)
No / Abstract: Analyst teams periodically design, update and verify the situational awareness (SWA) system. Initially, at the design stage, the risk assessment model has little information about the dynamic environment; hence, any missing information can directly impact the situational assessment capabilities. With this in mind, researchers have relied on various performance metrics to verify how well they are doing in assessing different situations. Before measuring the ranking capabilities of the SWA system, however, the underlying performance metric should be examined against its intended purpose. In this paper, we conduct quality-based evaluations of the proposed performance metric, the Ranking Capability Score. The results obtained show that the metric scales well over a number of scenarios. Indeed, from a data fusion perspective, the underlying metric adequately satisfies different SWA system needs and configurations.
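The Ranking Capability Score itself is not defined in this abstract; as a generic stand-in for evaluating the ranking capability of an SWA system, one could compare the system's ranking of situations against a ground-truth ordering with Kendall's tau:

```python
from scipy.stats import kendalltau

# Hypothetical scenario: ground-truth severity ranking of five situations
# versus the ranking produced by the SWA system under evaluation.
ground_truth = [1, 2, 3, 4, 5]
system_rank = [1, 3, 2, 4, 5]

tau, p_value = kendalltau(ground_truth, system_rank)
print(f"rank agreement tau={tau:.2f} (p={p_value:.3f})")
```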
156

A Study of Machine Learning Approaches for Integrated Biomedical Data Analysis

Chang, Yi Tan 29 June 2018 (has links)
This thesis consists of two projects in which various machine learning approaches and statistical analyses for integrated biomedical data analysis were explored, developed and tested. Integrating different biomedical data sources allows us to get a better understanding of the human body as a whole. With a more complete view of the data, we not only obtain a more complete view of the molecular basis of phenotype, but can also identify abnormalities in diseases that are not found when using only one type of biomedical data. The objective of the first project is to find biological pathways related to Duchenne Muscular Dystrophy (DMD) and Lamin A/C (LMNA) using the integration of multi-omics data. We proposed a novel method that allows us to integrate proteins, mRNAs and miRNAs to find disease-related pathways. The goal of the second project is to develop a personalized recommendation system that recommends cancer treatments to patients. Compared to the traditional way of using only users' ratings to impute missing values, we proposed a method that incorporates users' profiles to help enhance the accuracy of the prediction. / Master of Science / There are two major problems in the biomedical field. Previously, researchers used only one data type for analysis; however, one measurement does not fully capture the processes at work and can lead to inaccurate results with low sensitivity and specificity. Moreover, there are too many missing values in biomedical data. This leaves many questions unanswered and can lead us to draw wrong conclusions from the data. To overcome these problems, we would like to integrate multiple data types, which not only better captures the complex biological processes but also leads to a more comprehensive characterization. Utilizing the correlation among various data structures also helps us impute missing values in biomedical datasets. In my two research projects, we are interested in integrating multiple biological data types to identify disease-specific pathways and to predict unknown treatment responses for cancer patients. In this thesis, we propose a novel approach for pathway identification using the integration of multi-omics data, and we develop a recommendation system that combines different types of patients' medical information to predict missing treatment responses. The findings of my studies show that it is possible to find pathways related to muscular dystrophies using the integration of multi-omics data. Moreover, we demonstrate that incorporating a patient's genetic profile can improve prediction accuracy compared to using the treatment response matrix alone for imputation.
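As a minimal sketch of the second project's idea, imputing a missing treatment response using patient profiles rather than ratings alone (the similarity rule and data are illustrative assumptions, not the thesis's method):

```python
import numpy as np

# Hypothetical toy data: rows = patients, columns = treatments; np.nan marks
# an unobserved response. `profiles` holds per-patient genetic features.
rng = np.random.default_rng(1)
responses = rng.uniform(0, 1, (6, 4))
responses[2, 3] = np.nan
profiles = rng.standard_normal((6, 8))

def impute(responses, profiles, patient, treatment):
    """Profile-weighted average of other patients' responses to a treatment."""
    sims = profiles @ profiles[patient]            # similarity via dot product
    observed = ~np.isnan(responses[:, treatment])
    observed[patient] = False
    weights = np.clip(sims[observed], 0, None)     # keep only similar patients
    values = responses[observed, treatment]
    if weights.sum() == 0:                         # fall back to a plain mean
        return float(values.mean())
    return float(weights @ values / weights.sum())

print(impute(responses, profiles, patient=2, treatment=3))
```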
157

Multi-omics Data Integration for Identifying Disease Specific Biological Pathways

Lu, Yingzhou 05 June 2018 (has links)
Pathway analysis is an important task for gaining novel insights into the molecular architecture of many complex diseases. With the advancement of new sequencing technologies, a large amount of quantitative gene expression data has been continuously acquired. Newly emerging omics data sets such as proteomics have facilitated the investigation of disease-relevant pathways. Although much work has previously been done to explore single-omics data, little work has been reported using multi-omics data integration, mainly due to methodological and technological limitations. While single-omics data can provide useful information about the underlying biological processes, multi-omics data integration would be much more comprehensive about the cause-effect processes responsible for diseases and their subtypes. This project investigates the combination of miRNAseq, proteomics, and RNAseq data on seven types of muscular dystrophies and a control group. These unique multi-omics data sets provide us with the opportunity to identify disease-specific and most relevant biological pathways. We first perform the t-test and the OVEPUG test separately to define the differentially expressed genes in the protein and mRNA data sets. In multi-omics data sets, miRNAs also play a significant role in muscle development by regulating their target genes in the mRNA dataset. To exploit the relationship between miRNA and gene expression, we consult the commonly used target library TargetScan to collect all paired miRNA-mRNA and miRNA-protein co-expression pairs. Next, by conducting statistical analyses such as Pearson's correlation coefficient or the t-test, we measure the biologically expected correlation of each gene with its upstream miRNAs and identify the miRNA-mRNA and miRNA-protein pairs showing negative correlation. Furthermore, we identify and assess the most relevant disease-specific pathways by inputting the differentially expressed genes and negatively correlated genes into the gene-set libraries respectively, and further characterize these prioritized marker subsets using IPA (Ingenuity Pathway Analysis) or KEGG. We then use Fisher's method to combine the p-values derived from the separate gene sets into a joint significance test assessing common pathway relevance. In conclusion, we find all negatively correlated miRNA-mRNA and miRNA-protein pairs and identify several pathophysiological pathways related to muscular dystrophies by gene set enrichment analysis. This novel multi-omics data integration study and the subsequent pathway identification will shed new light on the pathophysiological processes in muscular dystrophies, improve our understanding of the molecular pathophysiology of muscle disorders, and support the prevention and treatment of disease. / Master of Science / Identification of biological pathways plays a central role in understanding both human health and disease. A biological pathway is a series of information-processing steps, realized via interactions among molecules in a cell, that partially determines the phenotype of the cell. Identifying disease-specific pathways will guide focused studies on complex diseases and thus potentially improve the prevention and treatment of diseases. To identify disease-specific pathways, it is crucial to develop computational methods and statistical tests that can integrate multi-omics (multiple omes, such as the genome, proteome, etc.) data.
Compared to single-omics data, multi-omics data help us gain a more comprehensive understanding of the molecular architecture of disease processes. In this thesis, we propose a novel data analytics pipeline for multi-omics data integration. We test and apply our method on real proteomics data sets on muscular dystrophy subtypes, and identify several biologically plausible pathways related to muscular dystrophies.
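As a small sketch of two steps the abstract describes, screening a TargetScan-predicted pair for negative Pearson correlation and combining gene-set p-values with Fisher's method (the data are synthetic; the actual pipeline is in the thesis):

```python
import numpy as np
from scipy.stats import pearsonr, combine_pvalues

# Hypothetical expression profiles across samples for one TargetScan-predicted
# pair: an miRNA and its target mRNA (a miRNA-protein pair works the same way).
rng = np.random.default_rng(2)
mirna = rng.standard_normal(20)
mrna = -0.6 * mirna + 0.5 * rng.standard_normal(20)   # built-in negative trend

r, p = pearsonr(mirna, mrna)
if r < 0:
    print(f"negatively correlated pair: r={r:.2f}, p={p:.3g}")

# Fisher's method, as in the thesis, combines per-gene-set p-values
# into one joint test of common pathway relevance.
stat, p_joint = combine_pvalues([0.01, 0.04, 0.20], method="fisher")
print(f"combined pathway p-value: {p_joint:.3g}")
```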
158

Multi-Platform Molecular Data Integration and Disease Outcome Analysis

Youssef, Ibrahim Mohamed 06 December 2016 (has links)
One of the most common measures of clinical outcome is survival time. Accurately linking cancer molecular profiling with survival outcome advances the clinical management of cancer. However, existing survival analysis relies intensively on statistical evidence from a single level of data, without paying much attention to the integration of interacting multi-level data and the underlying biology. Advances in genomic techniques provide unprecedented power for characterizing cancer tissue in a more complete manner than before, opening the opportunity to design biologically informed and integrative approaches for survival analysis. Many cancer tissues have been profiled for gene expression levels and genomic variants (such as copy number alterations, sequence mutations, DNA methylation, and histone modification). However, it is not clear how to integrate gene expression and genetic variants to achieve a better prediction and understanding of cancer survival. To address this challenge, we propose two approaches for data integration that both biologically and statistically boost the feature selection process for proper detection of the true predictive players of survival. The first approach is data-driven yet biologically informed. Consistent with the biological hierarchy from DNA to RNA, we prioritize each survival-relevant feature with two separate scores, predictive and mechanistic. With mRNA expression levels in focus, predictive features are those mRNAs whose variation in expression level is associated with the survival outcome, and mechanistic features are those mRNAs whose variation in expression level is associated with genomic variants (copy number alterations (CNAs) in this study). Further, we propose simultaneously integrating information from both the predictive model and the mechanistic model through our new approach GEMPS (Gene Expression as a Mediator for Predicting Survival). Applied to two cancer types (ovarian and glioblastoma multiforme), our method achieved better prediction power than peer methods. Gene set enrichment analysis confirms that the genes utilized for the final survival analysis are biologically important and relevant. The second approach is a generic mathematical framework to biologically regularize the Cox proportional hazards model that is widely used in survival analysis. We propose a penalty function that both links the mechanistic model to the clinical model and reflects the biological downstream regulatory effect of the genomic variants on the mRNA expression levels of the target genes. Fast and efficient optimization principles like coordinate descent and majorization-minimization are adopted in the inference of the coefficients of the Cox model predictors. Through this model, we extend the regulator-target gene relationship to a new one: the regulator-target-outcome relationship of a disease. Assessed via a simulation study and analysis of two real cancer data sets, the proposed method showed better performance in terms of selecting the true predictors and achieving better survival prediction. The proposed method gives insightful and meaningful interpretability to the selected model due to the biological linking of the mechanistic model and the clinical model. Other important forms of clinical outcome are monitoring angiogenesis (the formation of new blood vessels necessary for a tumor to nourish itself and sustain its existence) and assessing therapeutic response.
This can be done through dynamic imaging, in which a series of images at different time instances are acquired for a specific tumor site after injection of a contrast agent. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a noninvasive tool for examining tumor vasculature patterns based on accumulation and washout of the contrast agent. DCE-MRI gives an indication of tumor vasculature permeability, which in turn indicates tumor angiogenic activity. Observing this activity over time can reflect the tumor's drug responsiveness and the efficacy of the treatment plan. However, due to the limited resolution of imaging scanners, a partial-volume effect (PVE) problem occurs: signals from two or more tissues combine to produce a single image concentration value within a pixel, resulting in inaccurate estimation of the pharmacokinetic parameter values. A multi-tissue compartmental modeling (CM) technique supported by convex analysis of mixtures (CAM) is used to mitigate the PVE by clustering pixels and constructing a simplex whose vertices are of a single compartment type. CAM uses the identified pure-volume pixels to estimate the kinetics of the tissues under investigation. We propose an enhanced version of CAM-CM to identify pure-volume pixels more accurately. This includes the consideration of the neighborhood effect on each pixel and the use of a barycentric coordinate system to identify more pure-volume pixels and to test those identified by CAM-CM. Tested on simulated DCE-MRI data, the enhanced CAM-CM achieved better performance in terms of accuracy and reproducibility. / Ph. D. / A disease outcome can refer to an event, state, condition, or behavior reflecting some aspect of a patient's health status. An event can express survival, while a behavior can assess drug efficacy and treatment responsiveness. To gain a deeper and hence better understanding of diseases, symptom inspection has shifted from the physical symptoms appearing externally on the human body to internal symptoms that require invasive and noninvasive techniques to detect and quantify. These internal symptoms can be further divided into phenotypic and genotypic symptoms. Examples of phenotypes include the shape, structure, and volume of a specific human body organ or tissue. Examples of genotypes include the dosage of genetic information and the activity of genes, where genes are responsible for determining the function of the cells constituting tissues. Linking disease phenotypes and genotypes to disease outcomes is of great importance for widening our understanding of disease mechanisms and progression. In this dissertation, we propose novel computational techniques to integrate data generated from different platforms, where each data type addresses one aspect of the disease's internal symptoms, to provide a wider picture and deeper understanding of a disease. We use imaging and genomic data, with applications in ovarian, glioblastoma multiforme, and breast cancers, to test the proposed techniques. These techniques aim to provide outcomes that are statistically significant, as current peer methods do, alongside biological insights, which current peer methods lack.
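As a hedged sketch of the first approach's two-score idea, a gene's mechanistic score can be read off the CNA-mRNA correlation and its predictive score off a univariate Cox fit (the data are synthetic, and the lifelines library stands in for the thesis's actual GEMPS implementation):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from scipy.stats import pearsonr

# Hypothetical data for one gene: CNA dosage, mRNA expression, and survival.
rng = np.random.default_rng(3)
n = 80
cna = rng.normal(size=n)
mrna = 0.8 * cna + rng.normal(scale=0.5, size=n)       # CNA drives expression
df = pd.DataFrame({
    "mrna": mrna,
    "time": rng.exponential(scale=np.exp(-0.5 * mrna)),  # expression shifts risk
    "event": rng.integers(0, 2, size=n),
})

# Mechanistic score: is the mRNA's variation explained by the genomic variant?
r, p_mech = pearsonr(cna, mrna)

# Predictive score: is the mRNA's variation associated with survival?
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
p_pred = cph.summary.loc["mrna", "p"]

print(f"mechanistic p={p_mech:.3g}, predictive p={p_pred:.3g}")
```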
159

Abordagem para integração automática de dados estruturados e não estruturados em um contexto Big Data / Approach for automatic integration of structured and unstructured data in a Big Data context

Keylla Ramos Saes 22 November 2018 (has links)
The increase in data available for use has sparked interest in generating knowledge by integrating such data. However, the integration task requires knowledge of the data and of the data models used to represent them; that is, performing data integration requires the participation of computing experts, which limits the scalability of this kind of task. In the Big Data context, this limitation is reinforced by the presence of a wide variety of sources and heterogeneous data representation models, such as relational models with structured data and non-relational models with unstructured data; this variety of representations adds complexity to the data integration process. Handling this scenario requires integration tools that reduce or even eliminate the need for human intervention. As a contribution, this work offers the possibility of integrating diverse data representation models and heterogeneous data sources through an approach that allows the use of varied techniques, such as algorithms for comparing the structural similarity of the data and artificial intelligence algorithms, which, through the generation of integrating metadata, enables the integration of heterogeneous data. This flexibility, which makes it possible to deal with the growing variety of data, is provided by the modularization of the proposed architecture, which enables data integration in a Big Data context automatically, without the need for human intervention.
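As a hypothetical sketch of the structural-similarity comparison step, matching fields from a relational source against a document store to seed the integrating metadata (plain string similarity is an assumption here, since the abstract does not specify the algorithms):

```python
from difflib import SequenceMatcher

# Hypothetical field lists from a relational source and a document store.
relational_fields = ["customer_id", "full_name", "birth_date"]
document_fields = ["customerId", "name", "dateOfBirth", "tags"]

def best_match(field, candidates):
    """Pick the candidate field most similar to `field`."""
    return max(candidates,
               key=lambda c: SequenceMatcher(None, field.lower(), c.lower()).ratio())

# The resulting pairs would feed the "integrating metadata" that drives
# automatic integration without human intervention.
mapping = {f: best_match(f, document_fields) for f in relational_fields}
print(mapping)   # e.g. {'customer_id': 'customerId', 'full_name': 'name', ...}
```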
