21 |
Enhancing discovery of genetic variants for posttraumatic stress disorder through integration of quantitative phenotypes and trauma exposure information
Maihofer, Adam X., Choi, Karmel W., Coleman, Jonathan R.I., Daskalakis, Nikolaos P., Denckla, Christy A., Ketema, Elizabeth, Morey, Rajendra A., Polimanti, Renato, Ratanatharathorn, Andrew, Torres, Katy, Wingo, Aliza P., Zai, Clement C., Aiello, Allison E., Almli, Lynn M., Amstadter, Ananda B., Andersen, Soren B., Andreassen, Ole A., Arbisi, Paul A., Ashley-Koch, Allison E., Austin, S. Bryn, Avdibegović, Esmina, Borglum, Anders D., Babić, Dragan, Bækvad-Hansen, Marie, Baker, Dewleen G., Beckham, Jean C., Bierut, Laura J., Bisson, Jonathan I., Boks, Marco P., Bolger, Elizabeth A., Bradley, Bekh, Brashear, Meghan, Breen, Gerome, Bryant, Richard A., Bustamante, Angela C., Bybjerg-Grauholm, Jonas, Calabrese, Joseph R., Caldas-de-Almeida, José M., Chen, Chia Yen, Dale, Anders M., Dalvie, Shareefa, Deckert, Jürgen, Delahanty, Douglas L., Dennis, Michelle F., Disner, Seth G., Domschke, Katharina, Duncan, Laramie E., Džubur Kulenović, Alma, Erbes, Christopher R., Evans, Alexandra, Farrer, Lindsay A., Feeny, Norah C., Flory, Janine D., Forbes, David, Franz, Carol E., Galea, Sandro, Garrett, Melanie E., Gautam, Aarti, Gelaye, Bizu, Gelernter, Joel, Geuze, Elbert, Gillespie, Charles F., Goçi, Aferdita, Gordon, Scott D., Guffanti, Guia, Hammamieh, Rasha, Hauser, Michael A., Heath, Andrew C., Hemmings, Sian M.J., Hougaard, David Michael, Jakovljević, Miro, Jett, Marti, Johnson, Eric Otto, Jones, Ian, Jovanovic, Tanja, Qin, Xue Jun, Karstoft, Karen Inge, Kaufman, Milissa L., Kessler, Ronald C., Khan, Alaptagin, Kimbrel, Nathan A., King, Anthony P., Koen, Nastassja, Kranzler, Henry R., Kremen, William S., Lawford, Bruce R., Lebois, Lauren A.M., Lewis, Catrin, Liberzon, Israel, Linnstaedt, Sarah D., Logue, Mark W., Lori, Adriana, Lugonja, Božo, Luykx, Jurjen J., Lyons, Michael J., Maples-Keller, Jessica L., Marmar, Charles, Martin, Nicholas G., Maurer, Douglas, Mavissakalian, Matig R. 01 April 2022
Background: Posttraumatic stress disorder (PTSD) is heritable and a potential consequence of exposure to traumatic stress. Evidence suggests that a quantitative approach to PTSD phenotype measurement and incorporation of lifetime trauma exposure (LTE) information could enhance the discovery power of PTSD genome-wide association studies (GWASs). Methods: A GWAS on PTSD symptoms was performed in 51 cohorts followed by a fixed-effects meta-analysis (N = 182,199 European ancestry participants). A GWAS of LTE burden was performed in the UK Biobank cohort (N = 132,988). Genetic correlations were evaluated with linkage disequilibrium score regression. Multivariate analysis was performed using Multi-Trait Analysis of GWAS. Functional mapping and annotation of leading loci was performed with FUMA. Replication was evaluated using the Million Veteran Program GWAS of PTSD total symptoms. Results: GWASs of PTSD symptoms and LTE burden identified 5 and 6 independent genome-wide significant loci, respectively. There was a 72% genetic correlation between PTSD and LTE. PTSD and LTE showed largely similar patterns of genetic correlation with other traits, albeit with some distinctions. Adjusting PTSD for LTE reduced PTSD heritability by 31%. Multivariate analysis of PTSD and LTE increased the effective sample size of the PTSD GWAS by 20% and identified 4 additional loci. Four of these 9 PTSD loci were independently replicated in the Million Veteran Program. Conclusions: Using a quantitative trait measure of PTSD, we identified novel risk loci that had not been identified in prior case-control analyses. PTSD and LTE have a high genetic overlap that can be leveraged to increase discovery power through multivariate methods. © 2021 Society of Biological Psychiatry / National Institutes of Health / Peer reviewed
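The fixed-effects meta-analysis named in the Methods can be illustrated with a minimal sketch. It assumes the standard inverse-variance weighting of per-cohort effect estimates for a single variant; the abstract does not say which software or weighting scheme was used, and the function name and inputs below are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects estimate for one variant.

    betas: per-cohort effect estimates (hypothetical inputs)
    ses:   per-cohort standard errors, all > 0
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    print(fixed_effects_meta([0.10, 0.20], [0.10, 0.10]))
```

In a real GWAS meta-analysis this calculation is repeated for every variant, and a p-value below 5e-8 is the conventional genome-wide significance threshold.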
|
22 |
MMP20 and ARMS2/HTRA1 are Associated with Neovascular Lesion Size in Age-Related Macular Degeneration
Akagi, Yumiko 25 January 2016
Kyoto University / 0048 / New-system course doctorate / Doctor of Medical Science / Kō No. 19404 / Medical Doctorate No. 4055 / 新制||医||1012 (University Library) / 32429 / Kyoto University Graduate School of Medicine, Division of Medicine / (Chief examiner) Prof. 野田 亮, Prof. 瀬原 淳子, Prof. 藤渕 航 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Medical Science / Kyoto University / DFAM
|
23 |
Quantitative genetics from genome assemblies to neural network aided omics-based prediction of complex traits
Freudenthal, Jan Alexander January 2020
Quantitative genetics is the study of continuously distributed traits and their genetic components. Recent developments in DNA sequencing technologies and computational systems allow researchers to conduct large-scale in silico studies. However, going from raw DNA reads to genomic prediction of quantitative traits with the help of neural networks is a long and error-prone process. In the course of this thesis, many steps involved in this process will be assessed in depth. Chapter 2 will feature a study that compares the landscape of chloroplast genome assembly tools. Chapter 3 will present GWAS-Flow, a software package for genome-wide association studies whose use of modern tools allows it to outperform current state-of-the-art software packages. Chapter 4 will give an in-depth introduction to machine learning and the nature of quantitative traits, combine the two in genomic prediction with artificial neural networks, and compare the results to those of algorithms based on linear mixed models. Finally, in Chapter 5 the results from the previous chapters are summarized and used to elucidate the complex nature of studies concerning quantitative genetics. / Quantitative genetics deals with continuously distributed traits and their genetic components. Recent years have seen wide-ranging developments in computing and genomics, particularly DNA sequencing, which allow researchers to conduct large-scale in silico studies. However, getting from raw sequence data to genomic prediction with the help of neural networks is a complex process. The present studies examine many of the steps involved in this process. Chapter 2 compares a wide range of tools for assembling chloroplast genomes. Chapter 3 presents newly developed software for genome-wide association mapping that outperforms existing programs. Chapter 4 introduces machine learning and the genetic components of quantitative traits and brings them together in the context of genomic prediction. Finally, Chapter 5 discusses the preceding results in the overall context of quantitative genetics.
|
24 |
Mechanisms Linking CARS2 to Coronary Artery Disease
Dang, Anh-Thu 14 December 2023
Coronary artery disease (CAD) is the leading cause of death worldwide. Genome-wide association studies (GWAS) have identified more than 200 loci associated with CAD. Here, we investigated the functional effects of a locus tagged by rs61969072 (T/G), with the common allele (T) associated with protection from CAD.
Expression quantitative trait loci (eQTL) analysis demonstrated a strong association between rs61969072 and CARS2 gene expression, which increased with the T allele, in various human tissues. CARS2 encodes the mitochondrial cysteinyl-tRNA synthetase, an enzyme that attaches cysteine to its cognate tRNA. We hypothesized that CARS2 is a candidate causal gene and that CARS2 confers a protective effect against CAD.
We characterized CARS2 expression in macrophages and demonstrated decreased expression in pro-inflammatory M1 macrophages. Gene expression profiling following CARS2 siRNA knockdown revealed increased levels of several pro-inflammatory cytokines. Functional enrichment analysis identified the anti-inflammatory IL-10 signaling pathway, and western blotting showed that CARS2 attenuated IL-10 pathway activation through STAT3 phosphorylation. We also demonstrated that macrophage CARS2 knockdown in a macrophage/smooth muscle cell (SMC) co-culture model elicited gene expression changes indicative of a less contractile, pro-inflammatory, SMC phenotype.
We then performed an in-depth analysis of differentially expressed genes following CARS2 knockdown. Several inflammatory pathways and functions were affected, particularly Protein Kinase R (PKR), implicated in Interferon Induction and Antiviral Response. Downstream of PKR is the NF-κB signaling pathway; CARS2 knockdown led to increased NF-κB protein expression but not activation, as measured by a luciferase reporter assay.
Finally, we investigated potential mitochondrial mechanisms that could lead to inflammation. Reduced CARS2 levels were found to decrease mitochondrial membrane potential. However, there was a decrease in reactive oxygen species (ROS) levels and no changes in mitochondrial DNA release, metabolism, or mitochondrial bioenergetics. While ROS are often considered harmful due to their role in oxidative damage and inflammation, studies have shown that under certain contexts, ROS can have protective effects. Further studies are required to understand the mechanisms underlying the anti-inflammatory effects of CARS2.
Overall, my findings highlight a novel anti-inflammatory role of CARS2 in human macrophages, consistent with the CAD protective effect of a common GWAS-identified variant.
|
25 |
HERITABILITY AND SEX-EFFECT ANALYSES OF NEURODEGENERATIVE DISEASE
Keller, Margaux Finn January 2014
This work analyzes the genetic basis of three neurodegenerative diseases using several thousands of individuals of European descent to determine a range of phenotypic heritability outside of what has been identified by prior methods. By measuring additive genetic variance genome-wide, measures of its contribution to the phenotypic variance of these diseases were substantially increased, in some instances by a factor of 10 or more. Additionally, regional-mapping methods identified segments of the genome exhibiting significantly high heritability estimates associated with one of the neurodegenerative diseases, Amyotrophic lateral sclerosis. This resulted in the detection of novel candidate regions and provided conclusive evidence for the polygenic architecture of this disease. Lastly, novel risk variants associated with Parkinson's disease were identified on the X chromosome, a previously ignored genomic region. Overall, the employment of new analytic methods produced robust and novel results, adding substantial information to the neurodegenerative disease literature and connecting the anthropological perspective with growing informatics-based methods. / Anthropology
|
26 |
Development of tools to study the association of transposons to agronomic traits
Yan, Haidong 21 May 2020
Transposable elements (Transposons; TEs) constitute the majority of DNA in genomes and are a major source of genetic polymorphisms. TEs act as potential regulators of gene expression and lead to phenotypic plasticity in plants and animals. In crops, several TEs have been found to influence alleles associated with important agronomic traits, such as apical dominance in maize and seed number in rice. Given the many functions of TEs in genomes, crops may harbor more TE-mediated genetic regulation than expected. However, tools that accurately annotate TEs and clarify their associations with agronomic traits are still lacking, which largely limits applications of TEs in crop breeding. Here we 1) evaluate the performance of popular tools and strategies for identifying TEs in genomes, 2) develop a tool, 'DeepTE', to annotate TEs based on deep learning models, and 3) develop a tool, 'TE-marker', to identify potential TE-regulated alleles associated with agronomic traits. First, we propose a series of recommendations and a guideline for developing a comprehensive library to precisely identify TEs in genomes. Second, 'DeepTE' classifies TEs into 15-24 superfamilies according to sequences from plants, metazoans, and fungi. For unknown sequences, this tool can distinguish non-TEs from TEs in plant species. Finally, the 'TE-marker' tool builds a TE-based marker system that is able to cluster rice populations similarly to a classical SNP marker approach. This system can also detect association peaks that are equivalent to the ones produced by SNP markers. 'TE-marker' is a novel approach, complementary to classical SNP markers, that assists in revealing population structures and in identifying alleles associated with agronomic traits. / Doctor of Philosophy / Transposable elements (Transposons; TEs) are DNA fragments that can jump and integrate into new positions in the genome. TEs potentially act as regulators of gene expression and alter traits of plants and animals. In crops, several TEs have been found to influence the functions of genes that control important agronomic traits, such as branching in maize and seed number in rice. However, tools that identify these associations in crops are still lacking, which largely limits applications of TEs in crop breeding. Here we evaluated the performance of popular tools and strategies that identify TEs, and provided a series of recommendations for applying these tools efficiently to TE identification. Based on structural and sequence differences, TEs are classified into multiple families. We developed a 'DeepTE' tool to precisely cluster TEs into different families using a deep learning method. Finally, a 'TE-marker' tool was developed to build TE-based genetic markers to identify nearby alleles associated with agronomic traits. Overall, this work could promote the use of TEs as markers for improving crop quality and yield.
|
27 |
Towards constructing disease relationship networks using genome-wide association studies
Huang, Wenhui 19 January 2010
Background: Genome-wide association studies (GWAS) have proved to be a powerful approach for identifying the genetic basis of various human diseases. Here we take advantage of existing GWAS data and attempt to build a framework for understanding the complex relationships among diseases. Specifically, we examined 49 diseases from all available GWAS with a cascade approach, using network analysis to study the effect of single nucleotide polymorphisms (SNPs) on the similarity between different diseases. Proteins within the perturbation subnetwork are considered to be connection points between the disease similarity networks.
Results: Shared disease subnetwork proteins provide a consistent, accurate, and sensitive measure of genetic similarity between diseases. The clustering results show evidence of phenome similarity.
Conclusion: Our results demonstrate the usefulness of genetic profiles for evaluating disease similarity and constructing disease relationship networks. / Master of Science
|
28 |
Molecular Mechanisms Associated With Mastitis In Dairy Cattle
Chen, Zikai Kevin 01 June 2024
Mastitis, the inflammation of the mammary gland caused by both gram-positive and gram-negative bacteria, costs the global dairy industry over $20 billion annually in losses due to changes in milk yield and quality. The goals of this research were to 1) identify genetic variants, genes, and biological pathways that are associated with mastitis, and 2) identify genetic variants, genes, and biological pathways that are associated with somatic cell count. In Chapter 3, we describe genome-wide association studies (GWAS) on 1,858 cows from two lactations with both medium-density (n=87,493) and high-density (n=624,300) single nucleotide polymorphisms (SNPs) using a single-SNP mixed linear model. A total of 289 SNPs across all models reached genome-wide significance at a 5% false discovery rate (FDR) and were mapped to 100 different nearby genes, which have been identified in previous studies and are quantitative trait loci for somatic cell count. Significant SNPs and genes were mapped to the bovine genome to reveal corresponding genes, allowing gene enrichment analysis to identify significant biological pathways associated with mastitis. In Chapter 4, we describe genetic associations affecting lactation-average somatic cell count, using 624,300 SNPs in 364 Holstein and Jersey cows in their first lactation. A total of 205 SNPs reached genome-wide significance at 5% FDR and were mapped to 68 unique genes.
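A 5% FDR cutoff of the kind reported above is commonly obtained with the Benjamini-Hochberg step-up procedure. The abstract does not state which FDR procedure was applied, so the following is only a sketch under that assumption, with a hypothetical function name and toy p-values.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the test is declared
    significant at false discovery rate `alpha` (Benjamini-Hochberg)."""
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is rejected.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


if __name__ == "__main__":
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

Because the procedure is step-up, a SNP whose own p-value exceeds its rank threshold can still be declared significant when a higher-ranked SNP passes.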
|
29 |
Role of endogenous and exogenous factors in chronic disease development and progression
Leone, Dominick Anthony 06 August 2024
Genetic and molecular factors associated with chronic lung and kidney disease susceptibility, leading causes of domestic and global mortality, remain largely unexplained. Until recently, much of the research has focused on nucleotide differences in the human genome. However, the scope of gene studies has expanded beyond nucleotide variability, such as single nucleotide polymorphisms (SNPs). Current research utilizes multiple “omics” that consider not just inherited susceptibility factors, but also the macromolecules coded by the genome and gene-environment interactions. We conducted three studies to answer scientific questions, gain valuable insight into chronic lung and kidney disease processes, and provide meaningful guidance in prioritizing future research. This dissertation is particularly relevant to research in under-represented populations and when sample sizes are limited due to the nature of the disease or difficulty in obtaining biological specimens.
We began with a common lung disease that affects millions annually, a potential risk factor for which little is known, and for which study is complicated because the diseased lung tissue is difficult to obtain for cases and more so for controls. In this study, we used easier-to-obtain nasal brushings and transcriptomics to measure counts of non-coding genetic elements. We then shifted to a kidney disease that has been best described in South America and South Asia, where genetic components of the disease might be very different from other more typical forms of kidney disease in North American and European populations. We used genetic variants explaining urinary metabolites to help prioritize future genetic research for the kidney disease, and we explored variants that explain blood concentration of urate to better understand metabolic processes in the at-risk population.
In our first study, we examined the associations of Human Endogenous Retrovirus (HERV) transcription with lung function (FEV1/FVC) and Chronic Obstructive Pulmonary Disease (COPD) among current and former smokers. We sought evidence that greater transcription of HERVs was associated with lower lung function and higher risk of COPD. HERVs associated with FEV1/FVC were located within or near COPD genes, and HERV transcription was correlated with gene-regulation involved in FEV1 decline. RNA-seq data from two Detection of Early lung Cancer Among Military Personnel (DECAMP) cohorts were used to identify transcribed HERVs and estimate locus-specific HERV counts. A weight-of-evidence framework was then used to prioritize HERVs and was compared to a p-value based approach. We found 55 HERV loci with evidence that transcription was associated with higher or lower FEV1/FVC. Consistent with the prevailing theory that transcription and expression of HERVs is suppressed in adults, we observed HERV loci with lower mean counts and increased transcription was associated with worse lung function. However, we found some HERV loci with higher mean counts, and increased transcription was associated with better lung function. Most of the lung function-associated HERVs were intergenic and none were near COPD genes. Higher counts of HUERSP1 (15q21) were correlated with higher expression of down-regulated FEV1 decline genes (rho= 0.82; 95%CI: 0.77, 0.85; p= 1.1 × 10^-65). We also found, albeit with limited evidence, that overall higher transcriptional levels of some HERVs were associated with differential COPD risk. Contrary to our expectation, we observed lower odds of COPD among those with the highest transcription across all loci of HERLEQUIN (beta= 0.45; 95%CI: 0.21, 0.96), HERV-P71A (beta= 0.44; 95%CI: 0.21, 0.91) and HML2 (beta= 0.45; 95%CI: 0.22, 0.93) compared to those with the lowest transcriptional levels.
Use of both weight-of-evidence and p-values to prioritize HERVs was better than exclusive use of either framework. We were unable to rule out the possibility that, for some HERVs, transcription might exhibit Goldilocks-like effects on pulmonary function and COPD risk.
For the second study, we explored the role of genetic variants in metabolism as measured by urinary metabolite concentrations. Our sample population was Mesoamericans residing in Nicaragua who were at risk for chronic kidney disease of non-traditional origins (CKDnt). High serum urate, also indicative of altered metabolism, is more common than expected in this genetically distinct population. We suspected a potential genetic component that might explain both CKDnt risk and differences in urinary metabolites. However, our limited sample size for genome-wide association studies (GWAS) motivated the design of three approaches to test our hypotheses: (1) for our main hypothesis, we began by investigating associations between nucleotide variability of all variants and concentration of all urine metabolites, and we identified statistically significant associations after accounting for multiple testing; (2) we tested a more limited hypothesis that analyzed all variants but selected metabolites that differed in concentration from similar European workers; (3) for our most specific hypothesis we analyzed associations between a selection of variants that affect protein structure and all metabolites. We included 313 non-cases of CKDnt from our case-control genetics study and with available metabolite data in our analyses. Metabolites were identified, and their concentrations measured, using nuclear magnetic resonance. The effect of variants on metabolite concentration and CKDnt case odds ratios were estimated using generalized linear mixed model association tests. We accounted for population structure, using the top ten genetically determined principal components, and cryptic relatedness using a genetic relatedness matrix. We found that Mesoamericans in our study population carried genetic variants that predispose them to altered energy metabolism and renal transport of solutes, but the allele frequencies did not differ significantly between cases and non-cases of CKDnt. 
A non-synonymous AGXT2 variant (rs37370) significantly explained 3-Aminoisobutyric acid (beta= -0.68; 95%CI: -0.73, -0.47, p= 9.55 × 10^-20) and Dihydrothymine (beta= -1.05; 95%CI: -1.23, -0.88; p= 1.83 × 10^-31). Our other approaches identified variants in several other genes including TTC23L, ACADS, HAO2, TCIRG1, NDUFA10, and SLC26A10. While we failed to find a shared genetic link between metabolism and CKDnt, we observed genetic variability in biologically plausible genes such as AGXT2 and HAO2. We concluded that Mesoamericans at risk of CKDnt, due to heat and dehydration, may also carry gene variants that predispose them to higher body heat production.
Our third study was a GWAS to further understand genetic factors related to serum uric acid (sUA) metabolism among Nicaraguans at risk of CKDnt. Prior longitudinal studies have found that sugarcane workers experience post-shift elevation in sUA, and abnormally high concentration (hyperuricemia) is a known risk factor for CKD. After onset of CKD, higher sUA is associated with a worse prognosis. Also, previous sUA GWAS have identified a set of genetic variants explaining urate concentration across ethnic populations and found SNPs where the effect on concentration depends on renal function. However, it was unclear if known sUA associated variants explain concentration among populations at risk of CKDnt. We also suspected that genetic variability, unique to Mesoamericans, might explain serum concentration more than known sUA associated SNPs. Therefore, we examined SNP variability and associations with sUA among Nicaraguans. We chose to separately analyze our cases and non-cases of CKDnt because sUA in these groups had different possible biological etiologies. Our aims were to: (1) determine if previously published variants, with established association to sUA among global populations, explained concentration among those diagnosed with CKDnt and those at-risk of the disease; (2) discover novel sUA SNPs among our sample of Mesoamericans; (3) boost our discovery of variants by exploiting kidney function as an effect modifier; (4) identify SNPs where association depends on renal function; and (5) identify SNPs explaining sUA among both CKDnt cases and non-cases. We drew two analytical samples from the Mesoamerican Nephropathy Case-Control Genetic Study: case-only (n= 609) and control-only (non-cases= 385). Renal function was measured as serum creatinine (sCr). 
In our case-only GWAS, we used two models: a generalized linear mixed model (GLMM) that included the top 10 principal components (PCs) and use of sUA-lowering medication (allopurinol), and a second model that also included an additional gene*sCr interaction term. Only among cases, we tested for joint effects of variant and the interaction with renal function. SNPs that explained sUA in the case-only GWAS or joint test were examined for modification by renal function using the gene*sCr coefficient. The control-only GWAS was based on a GLMM that only included the top 10 PCs. In this pilot study, we observed a high prevalence of hyperuricemia (CKDnt cases= 78.8% and non-cases= 23.6%). Despite our limited power, we successfully replicated prior findings for the ABCG2 variant, rs74904971, which was among the top three SNPs explaining sUA in global populations. Among CKDnt cases, rs74904971 significantly explained sUA concentration (beta = 1.1; 95%CI: 0.8, 1.4; p= 9.2 × 10^-12), but not among controls (beta = 0.41; 95%CI: 0.16, 0.66; p= 1.2 × 10^-3). We found no evidence that renal function modified the genetic effects of rs74904971 (betaG*sCr = 0.44, pG*sCr = 0.17). Most other established urate-SNPs were not significant even at the 0.05 significance level. We found five novel SNPs associated with sUA in our control-only and case-only GWAS, but these variants all had low minor allele frequency (MAF < 5%). The best SNP from the control-only GWAS, a GALNTL6 intron variant (rs17057585), was significantly associated with a 1.9 mg/dL increase in sUA (95%CI: 1.3, 2.6, p= 3.6 × 10^-9). In our case-only GWAS, we found participants with more alternative alleles of rs61156970, located 89kbp downstream from MCTP2, had suggestively lower sUA (beta= -1.9; 95%CI: -2.6, -1.2; p= 1.8 × 10^-7). The joint test allowed us to identify an additional 20 variants, of which 11 SNPs had MAF ≥ 10%, and we found 16 SNPs with significantly greater effects among cases with worse renal function compared to cases with better functioning kidneys.
We used The Human Protein Atlas, an online database, to assess the biological plausibility of the most significant SNPs from the joint test. We found limited support for the top intronic SNPs from the joint test: rs9499393 (within ERCC3) and rs89572695 (within TMEM9B) have MAF < 10% and are highly expressed genes that are non-specific to renal cells. The effect of indel rs11404676 (within MTHFD1L), which was significantly greater among CKDnt cases with worse renal function than cases with better function (betaG*sCr= 1.4, pG*sCr = 1.4 × 10^-6), is more likely to be biologically relevant to purine biosynthesis and urate concentration. We have demonstrated that we can identify known sUA associated SNPs with large effects, and that other Mesoamerican-specific variants better explain serum urate than known ABCG2 SNPs. Renal function among non-cases should be addressed in future GWAS from this population and caution is advised when extrapolating multi-ethnic GWAS results to Mesoamericans.
|
30 |
Variable selection for generalized linear mixed models and non-Gaussian Genome-wide associated study data
Xu, Shuangshuang 11 June 2024
Genome-wide association studies (GWAS) aim to identify single nucleotide polymorphisms (SNPs) associated with phenotypes. The number of SNPs ranges from hundreds of thousands to millions. If p is the number of SNPs and n is the sample size, this is a p>>n variable selection problem. To handle this p>>n problem, the common method for GWAS is single marker analysis (SMA). However, since SNPs are highly correlated, SMA identifies true causal SNPs with a high false discovery rate. In addition, SMA does not consider interactions between SNPs. In this dissertation, we propose novel Bayesian variable selection methods, BG2 and IBG3, for non-Gaussian GWAS data. To address the ultra-high dimensionality and the high correlation among SNPs, BG2 and IBG3 have two steps: a screening step and a fine-mapping step. In the screening step, BG2 and IBG3, like SMA, fit one SNP per model and screen to obtain a subset of the most associated SNPs. In the fine-mapping step, BG2 and IBG3 consider all possible combinations of the screened candidate SNPs to find the best model. The fine-mapping step helps to reduce false positives. In addition, IBG3 iterates these two steps to detect more SNPs with small effect sizes. In simulation studies, we compare our methods with SMA methods and fine-mapping methods. We also compare our methods under different priors for variables, including the nonlocal prior, unit information prior, Zellner-g prior, and Zellner-Siow prior. Our methods are applied to substance use disorder (alcohol consumption and cocaine dependence), human health (breast cancer), and plant science (the number of root-like structures). / Doctor of Philosophy / Genome-wide association studies (GWAS) aim to identify genomic variants for a targeted phenotype, such as a disease or trait. The genomic variants of interest are single nucleotide polymorphisms (SNPs). A SNP is a substitution mutation in the DNA sequence. GWAS address the question of which SNPs are associated with the phenotype. However, the number of possible SNPs ranges from hundreds of thousands to millions. The common method for GWAS is called single marker analysis (SMA). SMA considers the association of only one SNP with the phenotype at a time. In this way, SMA avoids the problem that comes from a large number of SNPs and a small sample size. However, SMA does not consider the interaction between SNPs. In addition, SNPs that are close to each other in the DNA sequence may be highly correlated, causing SMA to have a high false discovery rate. To solve these problems, this dissertation proposes two variable selection methods (BG2 and IBG3) for non-Gaussian GWAS data. Compared with SMA methods, BG2 and IBG3 detect true causal SNPs with a low false discovery rate. In addition, IBG3 can detect SNPs with small effect sizes. Our methods are applied to substance use disorder (alcohol consumption and cocaine dependence), human health (breast cancer), and plant science (the number of root-like structures).
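The screening idea described above (one SNP per model, then keep the most associated subset) can be sketched as follows. This is not BG2 or IBG3: as a simplification it assumes a Gaussian phenotype, ordinary least squares per SNP, and a normal approximation for the p-value, whereas the dissertation targets non-Gaussian data with Bayesian models. All function names and the toy data are hypothetical.

```python
import math


def single_marker_scan(genotypes, phenotype):
    """Fit one simple linear regression per SNP (single marker analysis).

    genotypes: list of SNP columns, each a list of allele counts (0/1/2)
    phenotype: list of trait values, same length as each column (n > 2)
    Returns a list of (snp_index, beta, p_value) tuples.
    """
    n = len(phenotype)
    y_mean = sum(phenotype) / n
    results = []
    for j, g in enumerate(genotypes):
        g_mean = sum(g) / n
        sxx = sum((x - g_mean) ** 2 for x in g)
        if sxx == 0.0:
            # Monomorphic SNP: no variation, so no evidence of association.
            results.append((j, 0.0, 1.0))
            continue
        sxy = sum((x - g_mean) * (y - y_mean) for x, y in zip(g, phenotype))
        beta = sxy / sxx
        rss = sum((y - y_mean - beta * (x - g_mean)) ** 2
                  for x, y in zip(g, phenotype))
        se = math.sqrt(rss / (n - 2) / sxx)
        # Normal approximation to the two-sided test of beta = 0.
        p = 0.0 if se == 0.0 else math.erfc(abs(beta / se) / math.sqrt(2.0))
        results.append((j, beta, p))
    return results


def screen_top_k(results, k):
    """Screening step: keep the indices of the k smallest p-values."""
    ranked = sorted(results, key=lambda r: r[2])
    return [j for j, _, _ in ranked[:k]]


if __name__ == "__main__":
    genos = [[0, 1, 2, 0, 1, 2], [0, 0, 1, 1, 2, 2]]
    pheno = [0.1, 1.0, 2.1, -0.1, 0.9, 1.9]
    scan = single_marker_scan(genos, pheno)
    print(screen_top_k(scan, 1))
```

A fine-mapping step would then search over combinations of the screened SNPs in a joint model, which is where correlated markers that pass the per-SNP scan can be separated.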
|