Some methods improvement and extension to exposure-response estimation with family data
Wang, Ruiqi 05 February 2025
2023 / Ignoring correlation among observations can lead to inaccurate inference, making it essential to develop methods that analyze correlated data correctly. However, existing methods are limited and cannot answer specific questions when the data are correlated or when certain model assumptions are unmet. To address these issues, this thesis extends and improves three existing methods to handle correlated observations, specifically those arising from family data. The objective is to provide analytical tools that can handle the complexities of familial correlations, making these methods more effective and applicable in real-world scenarios. We first extend Bayesian factor analysis to account for family structure. Our method can estimate the covariance matrix and select batches of correlated predictors simultaneously while accounting for family structure in the model. We demonstrate its effectiveness through simulation studies and real data analysis. Our method outperforms existing methods in covariance matrix estimation, regression coefficient estimation, and variable selection, particularly in high-dimensional situations. We also apply our method to a real dataset and show that it successfully deciphers the true association among a group of correlated metabolites. We then propose a new method, referred to as BKMR-MHMC, to improve the computational efficiency and capacity of Bayesian kernel machine regression (BKMR) in estimating non-linear exposure-response functions and performing variable selection. We also modify hierarchical variable selection using mixed Hamiltonian Monte Carlo (M-HMC) to handle highly correlated predictors. By introducing a random effect, BKMR-MHMC can accommodate complex correlation structures such as family structures.
We show through simulation and real data analyses that the proposed BKMR-MHMC method outperforms the original BKMR method and its sped-up version in convergence speed and accuracy for high-dimensional data, in its ability to incorporate highly correlated predictors, and in modeling complex correlation structures. Finally, we extend the generalized additive mixed model to handle family data with repeated measurements. We extend the gamm4 R package to incorporate family data using re-parameterization and transformation. We evaluate our proposed approach on intraclass correlation coefficient (ICC) estimation and prediction accuracy through simulations under various scenarios, as well as on a real dataset from the Framingham Heart Study. Our proposed approach can accurately estimate the ICC, particularly for dense family structures, and its prediction accuracy surpasses that of the original gamm4 method, particularly for a large number of subjects. / 2027-02-04T00:00:00Z
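For readers unfamiliar with the ICC mentioned in this abstract, the textbook definition under a simple random-intercept model can be written as below. This is only a sketch under an assumed model: the symbols (y_{ij}, b_i, \sigma_b^2, \sigma_\varepsilon^2) are illustrative and do not come from the thesis, whose family-structured random effect may use a different covariance (for example, one based on kinship).

```latex
% Assumed random-intercept model: observation j in cluster (family) i
y_{ij} = \mu + b_i + \varepsilon_{ij}, \qquad
b_i \sim \mathcal{N}(0, \sigma_b^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

% Intraclass correlation: share of total variance due to the cluster effect,
% equal to the correlation between two observations from the same cluster
\mathrm{ICC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_\varepsilon^2}
```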
Detection of Copy Number Variation (CNV) and its characterization in Brazilian population / Detecção de Copy Number Variation (CNV) e sua caracterização na população brasileira
Ciconelle, Ana Cláudia Martins 06 February 2018
Genome-wide association studies (GWAS) are a key tool for associating genetic markers, genes, and genomic regions with complex phenotypes and diseases. They make it possible to understand the underlying regulatory network in more detail, to map genes, and thereby to develop new techniques for diagnosis and treatment. Currently, the main genetic marker used in GWAS is the SNP (single nucleotide polymorphism), a variation that affects only one base of the DNA and is the most common type of variation both between individuals and within the genome. Although multiple techniques are available for GWAS, many diseases and complex traits still have part of their heritability unexplained. To contribute to these studies, reference genetic databases such as HapMap and 1000 Genomes have been created, which contain common genetic variants from worldwide populations (European, Asian, and African). In recent years, two approaches adopted to address the missing heritability have been to use different types of genetic variants and to include rare and population-specific variants. Copy number variation (CNV) is a structural variant whose use in GWAS has been increasing in recent years. It is characterized by the deletion or duplication of a region of DNA, whose length can range from a few base pairs to a whole chromosome, as in Down syndrome. In collaboration with the Heart Institute (InCor-FMUSP), this work uses data from the Baependi Heart Study (Corações de Baependi) to establish a methodology for characterizing CNVs in the Brazilian population from SNP array data and associating them with height. The project uses genetic and phenotype data from 1,120 related individuals (structured in families). For CNV calling, resources from the PennCNV software are used, and the methodologies for preprocessing, normalization, identification, and related analyses are reviewed. The characterization of the CNVs includes information on location, size, frequency in the population, and patterns of inheritance in trios. The association between CNVs and height is assessed using linear mixed models that incorporate information on family structure. The results indicate that the Brazilian population has regions with copy number variation that are not reported in the literature. General characteristics of the CNVs, such as length and frequency in samples, are similar to those reported in the literature. In addition, it was observed that the transmission of CNVs may not follow Mendelian laws, since the frequency of trios in which one parent has a deletion/duplication and the offspring is normal is higher than the frequency of trios in which one parent and the offspring both carry the deletion/duplication. This work also identified a region on chromosome 9 that may be associated with height: carriers of a duplication in this region have an expected height approximately 3 cm lower.
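The trio comparison described in this abstract can be illustrated with a small calculation. The sketch below is not the author's analysis. It assumes an autosomal CNV carried in heterozygous form by exactly one parent, no de novo events, and error-free calls, so that each offspring inherits the variant with probability 1/2. The function name and the counts are hypothetical, chosen only for illustration. In practice, calling errors could also produce such an imbalance.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials with success rate p.
    """
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are counted despite rounding.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts (NOT taken from the thesis): trios with one carrier parent.
transmitted = 30       # offspring also carries the deletion/duplication
not_transmitted = 70   # offspring has the normal copy number

p_value = binomial_two_sided_p(transmitted, transmitted + not_transmitted)
print(f"two-sided p-value under 1/2 transmission: {p_value:.2e}")
```

A small p-value here only says the observed split is unlikely under the stated assumptions; it does not distinguish true non-Mendelian transmission from detection artifacts.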
Modélisation de la susceptibilité génétique non observée d’un individu à partir de son histoire familiale de cancer : application aux études d'identification pangénomiques et à l'estimation du risque de cancer dans le syndrome de Lynch / Modeling the unobserved genetic susceptibility of an individual from his family history of cancer : applications to genome-wide identification studies and to the cancer risk estimation in Lynch syndrome
Drouet, Youenn 09 October 2012
Lynch syndrome is responsible for about 5% of colorectal cancer (CRC) cases. It corresponds to the transmission of a mutation, a rare genetic variant, that confers a high risk of CRC. Such a mutation is identified, however, in only one family out of two. In families without an identified mutation, called negative families, the risk of CRC is poorly known, and individualized risk estimates in particular are lacking. This thesis has two main objectives: (1) to explore strategies that could reduce the required sample sizes of studies aimed at identifying new susceptibility genes, and (2) to define a theoretical framework for estimating individualized CRC risk in negative families, using the individual's personal and family history of CRC. Our work is based on the theory of Mendelian models and the simulation of family data, from which it is possible to study the power of identification studies and to assess and compare in silico the predictive ability of risk estimation methods. The results provide new knowledge for designing future studies, and the methodological framework we propose allows a more precise estimate of individual risk, which might lead to more individualized cancer surveillance.
A Decathlon in Multidimensional Modeling: Open Issues and Some Solutions
Hümmer, W., Lehner, W., Bauer, A., Schlesinger, L. 12 January 2023
The concept of multidimensional modeling has proven extremely successful in the area of Online Analytical Processing (OLAP), one of many applications running on top of a data warehouse installation. Although many different modeling techniques expressed in extended multidimensional data models have been proposed in the recent past, we feel that many hot issues are not properly reflected. In this paper we address ten common problems, ranging from defects within dimensional structures, through multidimensional structures, to new analytical requirements and more.
Aprendizado de estruturas de dependência entre fenótipos da síndrome metabólica em estudos genômicos / Structure learning of the metabolic syndrome phenotypes network in family genomic studies
Wilk, Lilian Skilnik 26 June 2017
Introduction: The number of studies related to Metabolic Syndrome (MetS) has been increasing in recent years, often motivated by the rise in cases of overweight/obesity and Type II diabetes, which lead to the development of cardiovascular disease and, consequently, acute myocardial infarction and stroke, among other unfavorable outcomes. MetS is a multifactorial disease comprising five characteristics; for an individual to be diagnosed with MetS, having at least three of them is sufficient. These characteristics are: visceral (truncal) obesity, characterized by increased waist circumference; elevated fasting blood glucose; elevated triglycerides; reduced HDL cholesterol; and elevated blood pressure. Aims: To establish the network of associations between the phenotypes that make up MetS through dependency structure learning, to decompose the network into genetic and environmental correlation components, and to evaluate the effect of adjusting for covariates and for genetic variants exclusively related to each phenotype in the network. Materials and Methods: The study sample comprises 79 families, with 1,666 individuals, from Baependi, a city in a rural area of the Brazilian state of Minas Gerais. Structure learning will use graph theory and structural equation models involving the polygenic linear mixed model to determine the dependency relations between the MetS phenotypes.
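The "at least three of five" diagnostic rule stated in this abstract can be sketched as a simple count. This is an illustration only: the class and field names are hypothetical, and the clinical cut-offs that decide whether each component is present are not given in the abstract, so the sketch takes the five components as already-determined yes/no flags.

```python
from dataclasses import astuple, dataclass


@dataclass
class MetSComponents:
    """Presence/absence of the five MetS characteristics (hypothetical names)."""
    visceral_obesity: bool
    high_fasting_glucose: bool
    high_triglycerides: bool
    low_hdl_cholesterol: bool
    high_blood_pressure: bool


def meets_mets_criterion(components: MetSComponents, minimum: int = 3) -> bool:
    """Return True when at least `minimum` of the five components are present."""
    # Booleans sum as 0/1, so this counts the components that are present.
    return sum(astuple(components)) >= minimum


# Example: three components present, which satisfies the rule.
example = MetSComponents(True, True, False, True, False)
print(meets_mets_criterion(example))
```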
