61 |
High-dimensional statistical data integration
Qu, Zhe, January 2019
Modern biomedical studies often collect multiple types of high-dimensional data on a common set of objects. A representative model for the integrative analysis of multiple data types is to decompose each data matrix into a low-rank common-source matrix generated by latent factors shared across all data types, a low-rank distinctive-source matrix corresponding to each data type, and an additive noise matrix. We propose a novel decomposition method, called decomposition-based generalized canonical correlation analysis, which appropriately defines those matrices by imposing a desirable orthogonality constraint on the distinctive latent factors so that the common latent factors are sufficiently captured. To further delineate the common and distinctive patterns between two data types, we propose another new decomposition method, called common and distinctive pattern analysis. This method takes into account the common and distinctive information between the coefficient matrices of the common latent factors. We develop consistent estimation approaches for both proposed decompositions under high-dimensional settings, and demonstrate their finite-sample performance via extensive simulations. We illustrate the superiority of the proposed methods over the state of the art with real-world data examples obtained from The Cancer Genome Atlas and the Human Connectome Project.
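As an informal illustration of the decomposition model sketched in this abstract (the ranks, dimensions, and variable names below are assumptions chosen for the example, and the code simulates the model rather than implementing the authors' estimation procedure), each data matrix X_k is a shared low-rank term plus a data-type-specific low-rank term plus noise, with the distinctive factors kept orthogonal to the common ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 100, 500, 300      # samples and variables in two data types (illustrative)
r0, r1, r2 = 2, 3, 3           # common rank and distinctive ranks (illustrative)

# Common latent factors shared by all data types.
F = rng.normal(size=(n, r0))

def distinctive_factors(k):
    """Draw distinctive factors and project out the column space of F,
    mimicking the orthogonality constraint between distinctive and common factors."""
    G = rng.normal(size=(n, k))
    P = F @ np.linalg.pinv(F.T @ F) @ F.T    # projector onto col(F)
    return G - P @ G

G1, G2 = distinctive_factors(r1), distinctive_factors(r2)

# Loadings and noise: X_k = F W_k' + G_k V_k' + E_k
W1, W2 = rng.normal(size=(p1, r0)), rng.normal(size=(p2, r0))
V1, V2 = rng.normal(size=(p1, r1)), rng.normal(size=(p2, r2))
X1 = F @ W1.T + G1 @ V1.T + 0.5 * rng.normal(size=(n, p1))
X2 = F @ W2.T + G2 @ V2.T + 0.5 * rng.normal(size=(n, p2))

print(np.abs(F.T @ G1).max())  # close to 0: distinctive factors orthogonal to common ones
```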
|
62 |
Décrypter la réponse thérapeutique des tumeurs en intégrant des données moléculaires, pharmacologiques et cliniques à l’aide de méthodes statistiques et informatiques / Deciphering Tumor Therapeutic Response by Integrating Molecular, Pharmacological and Clinical Data Using Statistical and Computational Methods
Carene, Dimitri, 19 December 2019
Cancer is the most frequent cause of death in the world, with 8.2 million deaths per year. Large-scale genomic studies have shown that each tumor is characterized by a unique genomic profile. This has led to the development of precision medicine, which aims at tailoring treatment to the patient-specific genomic alterations of the tumor. In hormone-receptor positive/human epidermal growth factor receptor 2-negative (HR+/HER2-) early breast cancer, clinicopathologic characteristics are not sufficient to fully explain the risk of distant relapse, despite their well-established prognostic value. The main objective of this thesis project was to use statistical and computational methods to assess to what extent genomic alterations, in addition to classic prognostic clinicopathologic parameters, are involved in distant breast cancer relapse.
This project used clinical and genomic data (i.e., copy numbers and driver gene mutations) from the PACS04 and METABRIC studies. In the first part of my thesis project, I first evaluated the prognostic value of the copy numbers of predefined genes (FGFR1, Fibroblast Growth Factor Receptor 1; CCND1, Cyclin D1; ZNF217, Zinc Finger Protein 217; ERBB2 or HER2, Human Epidermal Growth Factor Receptor 2), as well as a panel of driver gene mutations. Results from the PACS04 trial showed that FGFR1 amplification increases the risk of distant relapse, whereas MAP3K1 mutations decrease the risk of relapse. Second, a genomic score based on FGFR1 and MAP3K1 identified three levels of risk of distant relapse: low risk (patients with a MAP3K1 mutation), moderate risk (patients without FGFR1 copy number aberration and without MAP3K1 mutation), and high risk (patients with FGFR1 amplification and without MAP3K1 mutation). Finally, this genomic score was validated in METABRIC, a publicly available database. In the second part of my thesis project, new prognostic genomic biomarkers of survival were identified using LASSO-type penalized methods that take into account the block structure of the data.
Keywords: copy number aberrations (CNA), mutations, breast cancer (BC), biomarkers, variable selection methods, dimension reduction, Cox regression
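As a minimal sketch of the three-level genomic score described above (the function and argument names are illustrative, and giving MAP3K1 mutation precedence when both alterations are present is an assumption not spelled out in the abstract):

```python
def genomic_risk(map3k1_mutated: bool, fgfr1_amplified: bool) -> str:
    """Assign one of three risk levels for distant relapse
    from MAP3K1 mutation and FGFR1 amplification status."""
    if map3k1_mutated:
        return "low"       # MAP3K1 mutation: lower risk of distant relapse
    if fgfr1_amplified:
        return "high"      # FGFR1 amplification without MAP3K1 mutation: higher risk
    return "moderate"      # neither alteration

print(genomic_risk(map3k1_mutated=False, fgfr1_amplified=True))   # high
print(genomic_risk(map3k1_mutated=True, fgfr1_amplified=False))   # low
```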
|
63 |
Integrating phenotype-genotype data for prioritization of candidate symptom genes
Xing, L., Zhou, X., Peng, Yonghong, Zhang, R., Hu, J., Yu, J., Liu, B., January 2013
Symptoms and signs (symptoms in brief) are the essential clinical manifestations for traditional Chinese medicine (TCM) diagnosis and treatment. To gain insight into the molecular mechanisms of symptoms, this paper presents a network-based data mining method that integrates multiple phenotype-genotype data sources and predicts a prioritized ranking of candidate genes for each symptom. The results of this pilot study suggest some insights into the molecular mechanisms of symptoms.
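The abstract does not detail the algorithm; as a hedged, generic illustration of network-based gene prioritization (not necessarily the authors' method), candidate genes for a symptom can be scored by aggregating association weights across several phenotype-genotype sources and ranking the result. All gene names, weights, and sources below are hypothetical:

```python
from collections import defaultdict

# Hypothetical gene-phenotype association weights from two data sources.
source_a = {("fever", "IL6"): 0.9, ("fever", "TNF"): 0.7, ("fever", "BRCA1"): 0.1}
source_b = {("fever", "IL6"): 0.8, ("fever", "IL1B"): 0.6}

def prioritize(symptom, sources, source_weights):
    """Rank candidate genes for a symptom by a weighted sum of the
    association evidence found in each source."""
    scores = defaultdict(float)
    for src, w in zip(sources, source_weights):
        for (phenotype, gene), strength in src.items():
            if phenotype == symptom:
                scores[gene] += w * strength
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(prioritize("fever", [source_a, source_b], source_weights=[0.5, 0.5]))
# IL6 ranks first, followed by TNF, IL1B, and BRCA1
```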
|
64 |
The Properties of Property Alignment on the Semantic Web
Cheatham, Michelle Andreen, 25 August 2014
No description available.
|
65 |
Data Integration in Reporting Systems using Enterprise Service Bus
Koppal, Ketaki, January 2009
No description available.
|
66 |
Contributions to Sparse Statistical Methods for Data Integration
Bonner, Ashley, January 2018
Background: Scientists are measuring multiple sources of massive, complex, and diverse data in the hope of better understanding the principles underpinning complex phenomena. Sophisticated statistical and computational methods that reduce data complexity, harness variability, and integrate multiple sources of information are required. The ‘sparse’ class of multivariate statistical methods is emerging as a promising solution to these data-driven challenges, but it still lacks application, testing, and development.
Methods: In this thesis, efforts are three-fold. Sparse principal component analysis (sparse PCA) and sparse canonical correlation analysis (sparse CCA) are applied to a large toxicogenomic database to uncover candidate genes associated with drug toxicity. Extensive simulations are conducted to test and compare the performance of many sparse CCA methods, determining which methods are most accurate under a variety of realistic, large-data scenarios. Finally, the performance of the non-parametric bootstrap is examined, determining its ability to generate inferential measures for sparse CCA.
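As a hedged sketch of the bootstrap idea examined here (scikit-learn's ordinary CCA stands in for a sparse CCA method, and all data and settings are simulated assumptions), subjects are resampled with replacement, the estimator is refit on each resample, and the spread of the first canonical correlation serves as an inferential measure:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n, p, q = 200, 20, 15
Z = rng.normal(size=(n, 2))                          # shared latent signal
X = Z @ rng.normal(size=(2, p)) + rng.normal(size=(n, p))
Y = Z @ rng.normal(size=(2, q)) + rng.normal(size=(n, q))

def first_canonical_corr(X, Y):
    """Fit one canonical pair and return the correlation of the resulting scores."""
    u, v = CCA(n_components=1).fit(X, Y).transform(X, Y)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

boot = []
for _ in range(200):                                 # resample subjects with replacement
    idx = rng.integers(0, n, size=n)
    boot.append(first_canonical_corr(X[idx], Y[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"estimate={first_canonical_corr(X, Y):.2f}, 95% bootstrap CI=({lo:.2f}, {hi:.2f})")
```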
Results: Through these applications, several groups of candidate genes are obtained that point researchers towards promising genetic profiles of drug toxicity. The simulations expose one sparse CCA method that outperforms the rest in the majority of data scenarios, while suggesting the use of a combination of complementary sparse CCA methods under specific data conditions. The bootstrap simulations conclude that the bootstrap is a suitable means of inference for the canonical correlation coefficient in sparse CCA, but only when the sample size approaches the number of variables. It is also shown that aggregating sparse CCA results from many bootstrap samples can improve the accuracy of detecting truly cross-correlated features.
Conclusions: Sparse multivariate methods can flexibly handle challenging integrative analysis tasks. Work in this thesis has demonstrated their much-needed utility in the field of toxicogenomics and strengthened our knowledge about how they perform within a complex, massive-data framework, while promoting the use of bootstrapped inferential measures. / Thesis / Doctor of Philosophy (PhD) / Due to rapid advances in technology, many areas of scientific research are measuring multiple sources of massive, complex, and diverse data in the hope of better understanding the principles underpinning puzzling phenomena. Now, more than ever, advancement and discovery rely upon sophisticated and robust statistical and computational methods that reduce data complexity, harness variability, and integrate multiple sources of information. In this thesis, I test and validate the ‘sparse’ class of multivariate statistical methods, which is becoming a promising, fresh solution to these data-driven challenges. Using publicly available data from genetic toxicology as motivation, I demonstrate the utility of these methods, find where they work best, and explore the possibility of improving their scientific interpretability. The work in this thesis contributes to both the biostatistics and genomics literature by meshing rigorous statistical methodology with real-world data applications.
|
67 |
Integration strategies for toxicity data from an empirical perspective
Yang, L., Neagu, Daniel, January 2014
Recent developments in information technology, especially state-of-the-art “big data” solutions, enable the extraction, gathering, and processing of large amounts of toxicity information from multiple sources. Facilitated by this technological advance, a framework named integrated testing strategies (ITS) has been proposed in the predictive toxicology domain, in an effort to jointly and intelligently use multiple heterogeneous toxicity data records (through data fusion, grouping, interpolation/extrapolation, etc.) for toxicity assessment. This will ultimately contribute to accelerating the development cycle of chemical products, reducing animal use, and decreasing development costs. Most current work on ITS is based on a group of consensus processes, termed weight of evidence (WoE), which quantitatively integrate all relevant data instances for the same endpoint into a single decision supported by data quality. Several WoE implementations for the particular case of toxicity data fusion have been presented in the literature, and they are collectively studied in this paper. Noting that these uncertainty-handling methodologies are usually not developed from conventional probability theory, due to the unavailability of big datasets, this paper first investigates the mathematical foundations of these approaches. The investigated data integration models are then applied to a representative case in the predictive toxicology domain, and the experimental results are compared and analysed.
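As a generic, illustrative sketch of weight-of-evidence style fusion (not any specific WoE implementation studied in this paper), several toxicity records for the same endpoint can be combined into one estimate weighted by data-quality scores; the values and scores below are made up:

```python
def weight_of_evidence(records):
    """Fuse several toxicity values for the same chemical and endpoint into a
    single quality-weighted average (a deliberately simple consensus rule)."""
    total = sum(r["quality"] for r in records)
    return sum(r["value"] * r["quality"] for r in records) / total

# Hypothetical LC50 values (mg/L) for one chemical from three sources,
# each with a data-quality score in (0, 1].
records = [
    {"value": 4.2, "quality": 0.9},   # GLP study, high reliability
    {"value": 5.0, "quality": 0.6},   # literature value, moderate reliability
    {"value": 3.1, "quality": 0.3},   # read-across estimate, low reliability
]
print(round(weight_of_evidence(records), 2))   # 4.28
```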
|
68 |
Master Data Integration hub - řešení pro konsolidaci referenčních dat v podniku / Master Data Integration Hub - solution for company-wide consolidation of reference data
Bartoš, Jan, January 2011
In current information systems the requirement to integrate disparate applications into a cohesive package is strongly accented. While well-established technologies facilitating functional and communicational integration (ESB, message brokers, web services) already exist, tools and methodologies for continuous integration of disparate data sources at the enterprise-wide level are still in development. Master Data Management (MDM) is a major approach in the area of data integration and reference data management in particular. It encompasses reference data integration, data quality management, reference data consolidation, metadata management, master data ownership, the principle of accountability for master data, and the processes related to reference data management. The thesis is focused on the technological aspects of an MDM implementation realized via the introduction of a centralized repository for master data -- a Master Data Integration Hub (MDI Hub). The MDI Hub is an application that enables the integration and consolidation of reference data stored in disparate systems and applications based on predefined workflows. It also handles the propagation of master data back to the source systems and provides services such as dictionary management and data quality monitoring. The objective of the thesis is to cover the design and implementation aspects of the MDI Hub, which forms the application part of MDM. The introduction discusses the motivation for reference data consolidation and presents the techniques used in MDI Hub solution development. The main part of the thesis proposes the design of an MDI Hub reference architecture and suggests the activities performed in the process of an MDI Hub implementation. The thesis is based on information gained from specialized publications, on knowledge gathered while delivering projects with the companies Adastra and Ataccama, and on co-workers' know-how and experience. The most important contribution of the thesis is a comprehensive view of MDI Hub design and the proposal of an MDI Hub reference architecture. The MDI Hub reference architecture can serve as a basis for a particular MDI Hub implementation.
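A minimal, hypothetical sketch of one MDI Hub task, consolidating records for the same master entity delivered by several source systems into a single golden record using a simple source-priority survivorship rule (field names, sources, and priorities are illustrative, not taken from the thesis):

```python
# Records describing the same customer, delivered by two source systems (illustrative).
records = [
    {"source": "CRM",     "priority": 1, "name": "Jan Novak", "email": None,                    "city": "Praha"},
    {"source": "Billing", "priority": 2, "name": "J. Novak",  "email": "jan.novak@example.com", "city": None},
]

def consolidate(records):
    """Build a golden record: for each attribute, take the first non-null value
    from the highest-priority source (lower number means higher priority)."""
    golden = {}
    for rec in sorted(records, key=lambda r: r["priority"]):
        for field, value in rec.items():
            if field in ("source", "priority"):
                continue
            if golden.get(field) is None and value is not None:
                golden[field] = value
    return golden

print(consolidate(records))
# {'name': 'Jan Novak', 'city': 'Praha', 'email': 'jan.novak@example.com'}
```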
|
69 |
Data integration in large enterprises / Datová integrace ve velkých podnicích
Nagyová, Barbora, January 2015
Data Integration is currently an important and complex topic for many companies, because a good, working Data Integration solution can bring multiple advantages over competitors. Data Integration is usually executed in the form of a project, which can easily end in failure. In order to decrease the risks and negative impact of a failed Data Integration project, good project management, Data Integration knowledge, and the right technology need to be in place. This thesis provides a framework for setting up a good Data Integration solution. The framework is developed based on current theory, the Data Integration tools currently available, and the opinions of experts who have worked in the field for at least seven years and have proven their skills on a successful Data Integration project. The thesis does not guarantee the development of the right Data Integration solution, but it does provide guidance on how to approach a Data Integration project in a large enterprise. The thesis is structured into seven chapters. The first chapter gives an overview of the thesis, including its scope, goals, assumptions, and expected value. The second chapter describes Data Management and basic Data Integration theory in order to distinguish the two topics and explain the relationship between them. The third chapter focuses purely on the Data Integration theory that everyone who participates in a Data Integration project should know. The fourth chapter analyses the features of the Data Integration solutions currently available on the market and provides an overview of the most common and necessary functionalities. Chapter five is the practical part of the thesis, where the Data Integration framework is designed based on the findings from the previous chapters and interviews with experts in the field. Chapter six then applies the framework to a real, working (anonymized) Data Integration solution, highlights the gaps between the framework and the solution, and provides guidance on how to deal with those gaps. Chapter seven provides a summary, a personal opinion, and an outlook.
|
70 |
Master Data Management, Integrace zákaznických dat a hodnota pro business / Master Data Management, Customer Data Integration and value for business
Rais, Filip, January 2009
This thesis is focused on Master Data Management (MDM) and Customer Data Integration (CDI) and their main domains. It also refers to the various theoretical directions that can be found in this area of expertise. It summarizes the main aspects and domains and presents different perspectives on the referenced principles. It is an exhaustive background study of the Master Data Management area with an emphasis on practical use, with references to the author's experience and opinions. A secondary focus is the business value of Master Data Management initiatives. The thesis presents a thought concept for initiating an MDM project. The reason for such a concept is the current trend in which companies struggle to determine the actual benefits of MDM initiatives. There is overall accord on the necessity of such initiatives, but the struggle lies in determining their actual measurable impact on a company's revenue or profit. Since an MDM initiative is more of an enabling function than a direct revenue function, the benefit is less straightforward and therefore harder to determine. This work describes different layers and the mapping of business requirements through those layers for a transparent linkage between enabling functions and revenue-generating ones. Emphasis is given to financial benefit calculation, measurability, and the responsibility of business and IT departments. To underline certain conclusions, the thesis also presents real-world interviews with possible stakeholders of an MDM initiative within a company. These representatives were selected as key drivers of such an initiative. The interviews map their recognition of MDM and related terms and also focus on their reasons for and expectations of MDM. The representatives were also selected to represent business and IT departments equally, which produces an interesting clash of views and expectations.
|