21 |
Uma abordagem de integração de dados de redes PPI e expressão gênica para priorizar genes relacionados a doenças complexas / An integrative approach combining PPI networks and gene expression to prioritize genes related to complex diseases. Sérgio Nery Simões, 30 June 2015
Complex diseases are polygenic and multifactorial, which makes the search for genes related to them challenging. High-throughput technologies for genome sequencing and gene expression measurement (transcriptomics) have emerged, and knowledge of protein-protein interactions has grown. Together, these have allowed complex diseases to be investigated systematically. In particular, following the Network Medicine paradigm, protein-protein interaction (PPI) networks have been used to prioritize genes related to complex diseases according to their topological features. However, PPI networks are affected by ascertainment (literature) bias: the most-studied proteins tend to have more connections, which degrades the quality of the results. Additionally, methods that use only PPI networks can provide only static, non-specific results, since the topologies of these networks are not specific to a given disease. In this work, we developed a methodology to prioritize genes and biological pathways related to a given complex disease. It integrates PPI network, transcriptomic, and genomic data, with the aim of increasing the replicability of different studies and discovering new disease-associated genes.
The methodology integrates PPI networks with gene expression data and then applies the Network Medicine hypotheses to the resulting network. This connects seed genes (disease-related genes obtained from association studies) through shortest paths whose genes show the highest co-expression. Gene expression data from two conditions (control and disease) are used separately to obtain two networks, in which each node (gene) is scored according to topological and co-expression features. Based on these scores, we developed two ranking scores. One prioritizes genes with the largest change in score between the two conditions, and the other favors genes with the largest sum of the two scores. Applied to three schizophrenia gene expression studies, the method successfully recovered genes differentially co-expressed between the two conditions while avoiding ascertainment bias. Furthermore, the method substantially improved the replication of results across the three studies, which conventional methods had not replicated satisfactorily.
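A minimal sketch of the two-condition idea described in this abstract follows, under stated assumptions. PPI edges are weighted by 1 - |Pearson r| of the two genes' expression profiles, so highly co-expressed paths are "shorter". A gene's per-condition score is simply the number of seed-to-seed shortest paths it lies on; this is a hypothetical stand-in for the thesis's topological and co-expression rating, whose exact form is not given here. The two rankings then use the absolute difference and the sum of the per-condition scores. All function names and the toy genes S1, S2, A, and B are hypothetical.

```python
import heapq
import math
from collections import defaultdict
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_graph(ppi_edges, expr):
    """Weight each PPI edge by 1 - |r| so strongly co-expressed pairs are 'close'."""
    graph = defaultdict(dict)
    for u, v in ppi_edges:
        if u in expr and v in expr:
            graph[u][v] = graph[v][u] = 1.0 - abs(pearson(expr[u], expr[v]))
    return graph


def shortest_path(graph, source, target):
    """Dijkstra from source to target; returns the node list, or [] if unreachable."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def node_scores(ppi_edges, expr, seeds):
    """Hypothetical score: number of seed-to-seed shortest paths through each gene."""
    graph = coexpression_graph(ppi_edges, expr)
    scores = defaultdict(int)
    for s, t in combinations(sorted(seeds), 2):
        for gene in shortest_path(graph, s, t):
            scores[gene] += 1
    return scores


def rank_genes(control, disease):
    """Two rankings: by |disease - control| and by disease + control."""
    genes = set(control) | set(disease)
    change = {g: abs(disease.get(g, 0) - control.get(g, 0)) for g in genes}
    total = {g: disease.get(g, 0) + control.get(g, 0) for g in genes}
    by_change = sorted(genes, key=lambda g: (-change[g], g))
    by_total = sorted(genes, key=lambda g: (-total[g], g))
    return by_change, by_total


if __name__ == "__main__":
    ppi = [("S1", "A"), ("A", "S2"), ("S1", "B"), ("B", "S2")]
    control = {"S1": [1, 2, 3, 4], "S2": [1, 2, 3, 4], "A": [1, 2, 3, 4], "B": [1, 3, 2, 4]}
    disease = {"S1": [1, 2, 3, 4], "S2": [1, 2, 3, 4], "A": [1, 3, 2, 4], "B": [1, 2, 3, 4]}
    seeds = {"S1", "S2"}
    print(rank_genes(node_scores(ppi, control, seeds), node_scores(ppi, disease, seeds)))
```

In this toy data, gene A connects the seeds only in the control network, and gene B only in the disease network. Both therefore top the change-based ranking. The seeds lie on the path in both conditions, so they top the sum-based ranking.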
|
22 |
Genetic Signatures of the Retina in Health and Disease. Mustafi, Debarshi, 23 August 2013
No description available.
|
23 |
Topics in Computational and Statistical Genomics: Exploring Alternatives to the Wald Test and Identifying Deleterious Mutations in Human Diseases. GNONA, KOMLA MESSAN, 30 August 2022
No description available.
|
24 |
Stratégies de recherches de phénomènes d’interactions dans les maladies multifactorielles / Research strategies for finding genetic interaction phenomena in multifactorial diseases. Greliche, Nicolas, 18 February 2013
Recently, genome-wide association studies (GWAS) have led to the discovery of numerous genetic polymorphisms involved in susceptibility to complex human diseases. However, these polymorphisms explain only a small part of the heritability of these diseases. New kinds of investigation are therefore needed to account for the so-called "missing heritability". The purpose of this PhD project was to investigate how research strategies based on statistical and biological considerations could help determine whether part of this missing heritability lies in interactions between genetic polymorphisms. First, we applied different statistical methodologies to search for interactions between polymorphisms that could influence the risk of venous thrombosis (VT). Even though this study was based on two large GWAS datasets, we did not identify any pairwise interaction that survived correction for multiple testing. This work suggests that strong interactions between common SNPs are unlikely to contribute much to the risk of VT. Second, we adopted a hypothesis-driven approach based on biological arguments and searched for interactions between microRNA-related polymorphisms that could alter gene expression.
We used two large GWAS datasets in which genome-wide monocyte expression was also available. In them, we robustly demonstrated, in two independent populations, two pairwise interactions involving miRNA polymorphisms that affect monocyte expression. (1) The expression of HLA-DPB1 was modulated by a polymorphism in its 3'UTR region interacting with a polymorphism in the hsa-mir-219-1 microRNA sequence. (2) Similarly, the expression of H1F0 was influenced by a polymorphism in its 3'UTR region interacting with a polymorphism in the microRNA hsa-mir-659. Altogether, this project supports a role for gene-gene interactions in the interindividual variability of biological processes. It also shows that identifying such interactions remains difficult and requires large samples and new research strategies and methodologies.
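The multiple-testing barrier in an exhaustive pairwise scan can be made concrete with Bonferroni arithmetic. An exhaustive scan of n SNPs performs n(n-1)/2 tests. A hypothesis-driven design that only pairs, for example, microRNA SNPs with 3'UTR SNPs performs far fewer tests, which relaxes the per-test threshold considerably. The SNP counts below are hypothetical and chosen only to show orders of magnitude. Bonferroni is used here as the simplest family-wise correction, not as the thesis's actual procedure.

```python
def n_pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP pairs in an exhaustive pairwise scan."""
    return n_snps * (n_snps - 1) // 2


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    exhaustive = n_pairwise_tests(500_000)
    targeted = 200 * 5_000  # hypothetical: miRNA SNPs x 3'UTR SNPs
    print(exhaustive, bonferroni_threshold(0.05, exhaustive))
    print(targeted, bonferroni_threshold(0.05, targeted))
```

With these hypothetical sizes, the exhaustive scan requires a per-test p-value below roughly 4e-13. The targeted design requires only about 5e-8.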
|
25 |
Seleção de características a partir da integração de dados por meio de análise de variação de número de cópias (CNV) para associação genótipo-fenótipo de doenças complexas / Feature selection through data integration via copy number variation (CNV) analysis for genotype-phenotype association in complex diseases. Meneguin, Christian Reis, January 2018
Advisor: Prof. Dr. David Corrêa Martins Júnior / Master's dissertation, Universidade Federal do ABC, Graduate Program in Computer Science, Santo André, 2018. / Research in systems biology is characterized by interdisciplinarity and by a broad view of the interactions within biological organisms, heredity, and the influence of environmental factors. In this scenario, a complex network of interactions is formed by components of different types, such as copy number variations (CNVs) and genes. Complex diseases arising in this context usually result from intracellular and intercellular disturbances in tissues and organs. They develop in a multifactorial way, i.e., their cause and progression result from multiple genetic and environmental factors. In recent years, high-throughput sequencing techniques have produced a very large volume of biological data, calling for research on the integrated analysis of these data. CNVs are variations between individuals in the number of copies of DNA subsequences. They are useful because they relate to other types of data, such as genes and gene expression data (the abundances of mRNAs transcribed from genes in different contexts). Because the data are heterogeneous and very large, integrative analysis is a computational challenge, and several approaches have been proposed to address it.
In this dissertation, we propose a method that integrates data (CNVs, gene expression, haploinsufficiency, and imprinting, among others). It identifies CNV segments shared among samples from different individuals, whether case or control samples, and annotates these segments with the information obtained from the integration. This sets the method apart from approaches that integrate data by analyzing the overlap of biological data but do not generate new data containing the CNV intervals shared among samples. The method was evaluated on a case study of autism spectrum disorder (ASD). Besides being considered a complex disease, ASD has particularities that make it harder to study than other complex diseases such as cancer. Two experiments used CNV data from individuals with ASD (cases) and individuals without the disorder (controls). A third experiment used ASD CNV samples and CNV samples related to other neurodevelopmental disorders. All experiments involved the integration of the proposed data types. The method identified CNV segments present only in case samples and absent from controls. It also identified CNV segments present in ASD samples and absent from samples of other neurodevelopmental disorders, and vice versa. The results also reflected the tendency of males to be more affected by ASD than females. For the resulting CNV segments, it was also possible to identify the associated genes and related information. This included each gene's biotype and whether it appears in haploinsufficiency, imprinting, or expression data grouped by region and period. Finally, enrichment analyses of the gene lists from the resulting CNVs point to several ASD-related pathways. These include TRIF-dependent toll-like receptor signaling, gamma-aminobutyric acid (GABA) signaling, synaptic transmission and neurotransmitter secretion, insulin reception, olfactory sensory perception, and calcium-independent cell-cell adhesion.
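A minimal sketch of finding CNV segments shared by case samples and absent from controls is shown below. It assumes that all intervals lie on one chromosome, that coordinates are half-open [start, end), and that CNV type (gain vs. loss) is ignored. The function `common_segments`, its `min_cases` parameter, and the sample names are hypothetical; this is not the dissertation's implementation. Splitting the coordinate axis at every CNV breakpoint yields elementary segments that each CNV either fully covers or misses, so carriers can be counted per segment.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def _carriers(samples: Dict[str, List[Interval]], start: int, end: int) -> int:
    """Number of samples with at least one CNV covering [start, end)."""
    return sum(any(s <= start and end <= e for s, e in cnvs)
               for cnvs in samples.values())


def common_segments(cases: Dict[str, List[Interval]],
                    controls: Dict[str, List[Interval]],
                    min_cases: int = 2) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_cases) for segments carried by at least
    `min_cases` case samples and by no control sample."""
    breakpoints = sorted({p for group in (cases, controls)
                          for cnvs in group.values()
                          for s, e in cnvs
                          for p in (s, e)})
    segments: List[Tuple[int, int, int]] = []
    for start, end in zip(breakpoints, breakpoints[1:]):
        n_cases = _carriers(cases, start, end)
        if n_cases < min_cases or _carriers(controls, start, end) > 0:
            continue
        # Merge with the previous segment if contiguous and equally supported.
        if segments and segments[-1][1] == start and segments[-1][2] == n_cases:
            segments[-1] = (segments[-1][0], end, n_cases)
        else:
            segments.append((start, end, n_cases))
    return segments


if __name__ == "__main__":
    cases = {"case1": [(100, 500)], "case2": [(200, 600)], "case3": [(250, 300)]}
    controls = {"ctrl1": [(450, 700)]}
    print(common_segments(cases, controls))
```

In the toy example, the region 450-500 is shared by two cases but discarded because a control also carries it. A real pipeline would also track chromosome and CNV type and attach the gene, haploinsufficiency, imprinting, and expression annotations to each segment.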
|
26 |
Computational Methods to Characterize the Etiology of Complex Diseases at Multiple Levels. Elmansy, Dalia F., 29 May 2020
No description available.
|
27 |
Prioritizing Causative Genomic Variants by Integrating Molecular and Functional Annotations from Multiple Biomedical Ontologies. Althagafi, Azza Th., 20 July 2023
Whole-exome and whole-genome sequencing are widely used to diagnose individual patients. However, despite their success, these approaches leave many patients undiagnosed. One reason may be that more disease genes and variants remain to be discovered. Another may be that some disease phenotypes are novel and arise from a combination of variants in multiple known disease-related genes. The recent rapid growth of available genomic, biomedical, and phenotypic data enables computational analyses that reduce the search space for disease-causing genes or variants and facilitate the prediction of causal variants. Artificial intelligence, data mining, machine learning, and deep learning have therefore become essential tools for identifying biological associations. These include protein-protein interactions, gene-disease associations, and variant-disease associations. Predicting these associations is a critical step in diagnosing patients with rare or complex diseases.
In recent years, computational methods have emerged to improve gene-disease prioritization by incorporating phenotype information. These methods evaluate a patient's phenotype against a database of gene-phenotype associations to identify the closest match. However, inadequate knowledge of phenotypes linked with specific genes in humans and model organisms limits the effectiveness of the prediction. Information about gene product functions and anatomical locations of gene expression is accessible for many genes and can be associated with phenotypes through ontologies and machine-learning models. Incorporating this information can enhance gene-disease prioritization methods and more accurately identify potential disease-causing genes.
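The phenotype-matching step described in this paragraph can be illustrated with a minimal sketch that uses a toy ontology and hypothetical gene names. First, the patient's terms and each gene's annotations are closed under the is-a hierarchy. This lets a gene annotated only with a general term still partially match a patient who has a more specific one. Genes are then ranked by the Jaccard similarity of the closed term sets. Jaccard is a deliberately simple stand-in; phenotype-matching tools commonly use information-content-based semantic similarity instead.

```python
from typing import Dict, List, Set, Tuple


def with_ancestors(terms: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Close a set of ontology terms under the is-a relation."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, []))
    return closed


def rank_by_phenotype(patient: Set[str],
                      annotations: Dict[str, Set[str]],
                      parents: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Rank genes by Jaccard similarity of ancestor-closed phenotype sets."""
    p = with_ancestors(patient, parents)
    scored = []
    for gene, terms in annotations.items():
        g = with_ancestors(terms, parents)
        union = p | g
        scored.append((gene, len(p & g) / len(union) if union else 0.0))
    return sorted(scored, key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    # Toy ontology and hypothetical genes, for illustration only.
    parents = {
        "phenotype": [],
        "abnormal_heart": ["phenotype"],
        "septal_defect": ["abnormal_heart"],
        "abnormal_limb": ["phenotype"],
        "short_fingers": ["abnormal_limb"],
    }
    annotations = {
        "GENE_X": {"abnormal_heart"},
        "GENE_Y": {"septal_defect", "short_fingers"},
        "GENE_Z": {"short_fingers"},
    }
    print(rank_by_phenotype({"septal_defect", "short_fingers"}, annotations, parents))
```

GENE_X is annotated only with the general heart term. It still receives a non-zero score for a patient with a septal defect, which shows how an ontology can partly compensate for incomplete gene-phenotype annotations.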
This dissertation aims to address key limitations in gene-disease prediction and variant prioritization by developing computational methods that systematically relate human phenotypes that arise as a consequence of the loss or change of gene function to gene functions and anatomical and cellular locations of activity. To achieve this objective, this work focuses on crucial problems in the causative variant prioritization pipeline and presents novel computational methods that significantly improve prediction performance by leveraging large background knowledge data and integrating multiple techniques.
To this end, this dissertation presents novel approaches that use graph-based machine learning to leverage biomedical ontologies and linked biological data as background knowledge graphs. The methods apply representation learning to knowledge graphs and introduce generic models for computational problems in gene-disease association and variant prioritization. I demonstrate that my approach can compensate for incomplete information in public databases and integrate efficiently with other biomedical data for similar prediction tasks. Moreover, my methods outperform other relevant approaches that rely on manually crafted features and laborious pre-processing. I systematically evaluate my methods and illustrate their potential applications for data analytics in biomedicine. Finally, I demonstrate how my prediction tools can be used in the clinic to assist geneticists in decision-making. In summary, this dissertation contributes to more effective methods for predicting disease-causing variants and to the advancement of precision medicine.
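As a minimal sketch of how learned knowledge-graph embeddings can be turned into a gene ranking, the code below scores candidate (gene, associated_with, disease) triples with the TransE scoring function. TransE is one widely used embedding model and is chosen here only for illustration; the dissertation's own models may differ. The two-dimensional vectors are hypothetical placeholders for embeddings that would normally be learned from ontology-annotated data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def transe_score(head: Sequence[float], relation: Sequence[float],
                 tail: Sequence[float]) -> float:
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_candidate_genes(genes: Dict[str, Sequence[float]],
                         relation: Sequence[float],
                         disease: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank genes by how plausible (gene, relation, disease) is under TransE."""
    scored = [(g, transe_score(v, relation, disease)) for g, v in genes.items()]
    return sorted(scored, key=lambda x: (-x[1], x[0]))


if __name__ == "__main__":
    # Hypothetical embeddings, for illustration only.
    associated_with = [1.0, 0.0]
    disease = [1.0, 1.0]
    genes = {"G1": [0.0, 1.0], "G2": [0.5, 0.0], "G3": [2.0, 2.0]}
    print(rank_candidate_genes(genes, associated_with, disease))
```

Ranking candidates by a learned scoring function, rather than by manually crafted features, is the general pattern the paragraph describes. The choice of embedding model and relation types is where real methods differ.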
|