Global ETD Search

1	Beyond monogenic diseases: a first collection and analysis of digenic diseases
Gazzo, Andrea, 25 June 2018

In the next-generation sequencing era, many bioinformatics tools have been developed to assist scientists in studying the molecular basis of genetic diseases, often with the aim of identifying pathogenic variants. As a consequence, more than one hundred new disease-gene associations have been discovered in the last decades. Nevertheless, the genetic basis of many genetic diseases remains unresolved. It has been shown that many diseases considered monogenic, but with an imperfect genotype-phenotype correlation or incomplete penetrance, are in fact caused or modulated by more than one mutated gene, meaning that they are oligogenic. Current bioinformatics methods for identifying pathogenic variants are trained and fine-tuned to identify a single variant responsible for a disease. This monogenic-oriented approach cannot be used to explore how combinations of variants in different genes contribute to the complexity and genetic heterogeneity of rare diseases. Digenic diseases are the simplest form of oligogenic disease and can therefore provide a conceptual bridge between monogenic diseases and the poorly understood polygenic diseases.

The ambition of this thesis is to collect and analyse digenic data, introducing this topic to the bioinformatics field, where digenic diseases are still an unexplored branch. The work is divided into two steps: the first is the creation of a central repository containing detailed information on digenic diseases; the second is an analysis of their peculiarities, using machine learning methods to study subclasses of digenic effects.

In the first step we developed DIDA (DIgenic diseases DAtabase), a novel database that provides, for the first time, a curated collection of genes and associated variants involved in digenic diseases. Detailed information related to the digenic mechanism has been manually mined from the medical literature. All instances in DIDA were also assigned to one of two subclasses of digenic effects: true digenic (both genes are required for developing the disease) and composite (one gene is sufficient to produce the disease phenotype, while the second one alters it or significantly changes the age of onset).

In the second step, we hypothesized that the digenic effect may be related to biological properties characterizing digenic combinations. Using machine learning methods, we show that a set of variant, gene and higher-level features can differentiate between the true digenic and composite classes with high accuracy. Moreover, we show that a digenic effect decision profile, extracted from the predictive model, explains why an instance is assigned to either of the two classes.

Together, our results show that digenic disease data generate novel insights, providing a glimpse into the oligogenic realm.

/ Doctorate in Sciences / info:eu-repo/semantics/nonPublished
Medicine, human pathology; biomedical sciences; digenic; oligogenic; bioinformatics; genetics; machine learning
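The two subclasses of digenic effects described in this abstract can be illustrated with a minimal sketch. The record layout, field names and the `effect_class` helper below are hypothetical and chosen only for illustration; they are not DIDA's actual schema or the thesis's annotation procedure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class DigenicEffect(Enum):
    """The two subclasses of digenic effects named in the abstract."""
    TRUE_DIGENIC = "true digenic"
    COMPOSITE = "composite"


@dataclass(frozen=True)
class DigenicCombination:
    # Hypothetical record layout (an assumption, not DIDA's schema).
    gene_a: str
    gene_b: str
    variants_a: Tuple[str, ...]
    variants_b: Tuple[str, ...]
    # True if one gene alone is sufficient to produce the phenotype
    # and the second gene only alters it or shifts the age of onset.
    single_gene_sufficient: bool


def effect_class(combo: DigenicCombination) -> DigenicEffect:
    """Apply the abstract's definitions to one curated combination."""
    if combo.single_gene_sufficient:
        return DigenicEffect.COMPOSITE
    # Otherwise both genes are required for the disease to develop.
    return DigenicEffect.TRUE_DIGENIC


# Placeholder gene and variant names, not real data.
example = DigenicCombination("GENE_A", "GENE_B", ("var1",), ("var2",), False)
print(effect_class(example).value)
```

In the thesis the class labels come from manual literature curation; the boolean flag here simply stands in for that curated judgement.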
2	Towards multivariant pathogenicity predictions: Using machine-learning to directly predict and explore disease-causing oligogenic variant combinations
Papadimitriou, Sofia, 15 September 2020

The emergence of statistical and predictive methods able to analyse genomic data has revolutionised the field of medical genetics, allowing the identification of disease-causing gene variants (i.e. mutations) for several human genetic diseases. Although these approaches have greatly improved our understanding of Mendelian «one gene – one phenotype» genetic models, studying diseases that follow more intricate models involving causative variants in several genes (i.e. oligogenic diseases) remains a challenge, either because of the lack of sufficient methodologies and disease-specific cohorts to study or because of the phenotypic complexity associated with such diseases. This situation makes it difficult not only to understand the genetic mechanisms of the disease, but also to offer proper counseling and support to the patient. Until recently, no specialized predictive methods existed to directly predict causative variant combinations for oligogenic diseases. However, with the advent of data on disease-causing variant combinations in gene pairs (i.e. bilocus variant combinations), collected in the Digenic Diseases Database (DIDA), we hypothesized that the transition from single-variant to variant combination pathogenicity predictors is now possible.

To investigate this hypothesis, we organised our research along two main routes. First, we developed an interpretable variant combination pathogenicity predictor for gene pairs, called VarCoPP. To this end, we trained multiple Random Forest algorithms on pathogenic bilocus variant combinations from DIDA against neutral data from the 1000 Genomes Project and investigated the contribution of the incorporated variant, gene and gene pair features to the prediction outcome. In the second part, we explored the usefulness of different gene pair burden scores based on this novel predictive method for discovering oligogenic signatures in neurodevelopmental diseases, which involve a spectrum of monogenic to polygenic cases. We performed a preliminary analysis on the Deciphering Developmental Disorders (DDD) project, which contains exome data of 4195 families, and assessed the capability of our scores in supporting already diagnosed monogenic cases, discovering significant pairs compared to control cases, and linking patients in communities based on the genetic burden they share, using the Leiden community detection algorithm.

The performance of VarCoPP shows that it is possible to predict disease-causing bilocus variant combinations with good accuracy, both during cross-validation and when testing on new cases. We also show its relevance for disease-related gene panels and enhance its clinical applicability by defining confidence zones that guarantee with 95% or 99% probability that a prediction is indeed a true positive, guiding clinical researchers towards the most relevant results. This method and additional biological annotations are incorporated in an online platform called ORVAL, which allows the prediction and exploration of candidate disease-causing oligogenic variant combinations with predicted gene networks, based on patient variant data. Our preliminary analysis on the DDD cohort shows that, although all bilocus burden scores have advantages, disadvantages and certain types of biases, taking the maximum pathogenicity score present inside a gene pair seems to provide, at the moment, the most unbiased results. We also show that our predictive methods enable us to detect patient communities inside DDD, based exclusively on the pathogenic bilocus burden shared between patients, with more than half of these communities containing enriched phenotypic and molecular pathway information. Our predictive method is also able to bring to the surface genes that are not officially known to be involved in disease but are nevertheless biologically relevant, as well as a few examples of potential oligogenicity inside the network, paving the way for further exploration of oligogenic signatures for neurodevelopmental diseases.

/ Doctorate in Sciences / info:eu-repo/semantics/nonPublished
General computer science; clinical genetics; medical informatics; bioinformatics; machine-learning; oligogenic diseases; neurodevelopmental diseases; community detection
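The gene pair burden score that this abstract reports as least biased, the maximum pathogenicity score found inside a gene pair, can be sketched as follows. This is a minimal illustration under stated assumptions: the input format (one tuple per predicted variant combination), the function name `max_pair_burden` and the placeholder gene names and scores are all hypothetical and do not reproduce the VarCoPP or ORVAL interfaces.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Assumed input: (gene_a, gene_b, pathogenicity_score) for each bilocus
# variant combination predicted for one patient.
Combination = Tuple[str, str, float]


def max_pair_burden(
    combinations: Iterable[Combination],
) -> Dict[FrozenSet[str], float]:
    """Keep, for every gene pair, the highest score among its combinations."""
    burden: Dict[FrozenSet[str], float] = {}
    for gene_a, gene_b, score in combinations:
        # A frozenset makes the pair order-independent: (A, B) == (B, A).
        pair = frozenset((gene_a, gene_b))
        if pair not in burden or score > burden[pair]:
            burden[pair] = score
    return burden


# Placeholder genes and scores, not real predictions.
patient = [
    ("GENE_1", "GENE_2", 0.4),
    ("GENE_2", "GENE_1", 0.9),
    ("GENE_1", "GENE_3", 0.2),
]
for pair, score in max_pair_burden(patient).items():
    print(sorted(pair), score)
```

Per-patient dictionaries of this kind could then be compared to find gene pairs whose burden is shared between patients, which is the type of input a community detection step would need.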
