Genetic and molecular factors associated with susceptibility to chronic lung and kidney diseases, which are leading causes of domestic and global mortality, remain largely unexplained. Until recently, much of the research has focused on nucleotide differences in the human genome. However, the scope of gene studies has expanded beyond nucleotide variability, such as single nucleotide polymorphisms (SNPs). Current research utilizes multiple “omics” that consider not just inherited susceptibility factors, but also the macromolecules coded by the genome and gene-environment interactions. We conducted three studies to answer scientific questions, gain valuable insight into chronic lung and kidney disease processes, and provide meaningful guidance in prioritizing future research. This dissertation is particularly relevant to research in under-represented populations and when sample sizes are limited due to the nature of the disease or difficulty in obtaining biological specimens.
We began with a common lung disease that affects millions annually, examining a potential risk factor about which little is known and whose study is complicated because diseased lung tissue is difficult to obtain from cases and more so from controls. In this study, we used easier-to-obtain nasal brushings and transcriptomics to measure counts of non-coding genetic elements. We then shifted to a kidney disease that has been best described in South America and South Asia, where genetic components of the disease might be very different from other more typical forms of kidney disease in North American and European populations. We used genetic variants explaining urinary metabolites to help prioritize future genetic research for the kidney disease, and we explored variants that explain blood concentration of urate to better understand metabolic processes in the at-risk population.
In our first study, we examined the associations of Human Endogenous Retrovirus (HERV) transcription with lung function (FEV1/FVC) and Chronic Obstructive Pulmonary Disease (COPD) among current and former smokers. We sought evidence that greater transcription of HERVs was associated with lower lung function and higher risk of COPD, that HERVs associated with FEV1/FVC were located within or near COPD genes, and that HERV transcription was correlated with regulation of genes involved in FEV1 decline. RNA-seq data from two Detection of Early lung Cancer Among Military Personnel (DECAMP) cohorts were used to identify transcribed HERVs and estimate locus-specific HERV counts. A weight-of-evidence framework was then used to prioritize HERVs and was compared to a p-value based approach. We found 55 HERV loci with evidence that transcription was associated with higher or lower FEV1/FVC. Consistent with the prevailing theory that transcription and expression of HERVs is suppressed in adults, we observed HERV loci with lower mean counts at which increased transcription was associated with worse lung function. However, we also found some HERV loci with higher mean counts at which increased transcription was associated with better lung function. Most of the lung function-associated HERVs were intergenic and none were near COPD genes. Higher counts of HUERSP1 (15q21) were correlated with higher expression of down-regulated FEV1 decline genes (rho= 0.82; 95%CI: 0.77, 0.85; p= 1.1 x 10^-65). We also found, albeit with limited evidence, that overall higher transcriptional levels of some HERVs were associated with differential COPD risk. Contrary to our expectation, we observed lower odds of COPD among those with the highest transcription across all loci of HERLEQUIN (beta= 0.45; 95%CI: 0.21, 0.96), HERV-P71A (beta= 0.44; 95%CI: 0.21, 0.91) and HML2 (beta= 0.45; 95%CI: 0.22, 0.93) compared to those with the lowest transcriptional levels.
Use of both weight-of-evidence and p-values to prioritize HERVs was better than exclusive use of either framework. We were unable to rule out the possibility that, for some HERVs, transcription might exhibit Goldilocks-like effects on pulmonary function and COPD risk.
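The correlation reported for HUERSP1 is given as rho, and this abstract does not name the estimator. As a minimal sketch, assuming a Spearman rank correlation between locus-level HERV counts and a gene expression score, the calculation could look like the following. The function names and toy values are hypothetical and are not taken from the DECAMP data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values: HERV locus counts and a gene expression score.
herv_counts = [3, 10, 7, 25, 1]
gene_score = [0.2, 0.9, 0.5, 1.4, 0.1]
print(spearman_rho(herv_counts, gene_score))
```

Ranking before correlating makes the estimate insensitive to the skew that is typical of count data, which is one reason a rank-based measure might be preferred here.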
For the second study, we explored the role of genetic variants in metabolism as measured by urinary metabolite concentrations. Our sample population was Mesoamericans residing in Nicaragua who were at risk for chronic kidney disease of non-traditional origins (CKDnt). High serum urate, also indicative of altered metabolism, is more common than expected in this genetically distinct population. We suspected a potential genetic component that might explain both CKDnt risk and differences in urinary metabolites. However, our limited sample size for genome-wide association studies (GWAS) motivated the design of three approaches to test our hypotheses: (1) for our main hypothesis, we began by investigating associations between nucleotide variability of all variants and concentration of all urine metabolites, and we identified statistically significant associations after accounting for multiple testing; (2) we tested a more limited hypothesis that analyzed all variants but selected metabolites that differed in concentration from similar European workers; (3) for our most specific hypothesis, we analyzed associations between a selection of variants that affect protein structure and all metabolites. Our analyses included 313 non-cases of CKDnt from our case-control genetics study who had available metabolite data. Metabolites were identified, and their concentrations measured, using nuclear magnetic resonance. The effects of variants on metabolite concentration and the odds ratios for CKDnt case status were estimated using generalized linear mixed model association tests. We accounted for population structure using the top ten genetically determined principal components, and for cryptic relatedness using a genetic relatedness matrix. We found that Mesoamericans in our study population carried genetic variants that predispose them to altered energy metabolism and renal transport of solutes, but the allele frequencies did not differ significantly between cases and non-cases of CKDnt.
A non-synonymous AGXT2 variant (rs37370) significantly explained 3-Aminoisobutyric acid (beta= -0.68; 95%CI: -0.73, -0.47; p= 9.55 x 10^-20) and Dihydrothymine (beta= -1.05; 95%CI: -1.23, -0.88; p= 1.83 x 10^-31). Our other approaches identified variants in several other genes including TTC23L, ACADS, HAO2, TCIRG1, NDUFA10, and SLC26A10. While we failed to find a shared genetic link between metabolism and CKDnt, we observed genetic variability in biologically plausible genes such as AGXT2 and HAO2. We concluded that Mesoamericans at risk of CKDnt, due to heat and dehydration, may also carry gene variants that predispose them to higher body heat production.
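This abstract does not describe how the genetic relatedness matrix used to account for cryptic relatedness was constructed. One commonly used estimator averages the products of standardized genotypes over variants; the sketch below assumes that estimator. The function name and the toy genotypes are hypothetical.

```python
def genetic_relatedness_matrix(genotypes):
    """Estimate pairwise relatedness from alternate-allele counts.

    genotypes[i][j] is the count (0, 1, or 2) for individual i at variant j.
    Each variant is centered by 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; entries are averaged over polymorphic variants.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[0.0] * m for _ in range(n)]
    used = 0
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2.0 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic variant: carries no relatedness information
        sd = (2.0 * p * (1.0 - p)) ** 0.5
        for i in range(n):
            standardized[i][j] = (genotypes[i][j] - 2.0 * p) / sd
        used += 1
    if used == 0:
        raise ValueError("no polymorphic variants")
    return [
        [
            sum(a * b for a, b in zip(standardized[i], standardized[k])) / used
            for k in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy data: individuals 0 and 1 have identical genotypes.
toy_genotypes = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 1, 0],
]
grm = genetic_relatedness_matrix(toy_genotypes)
for row in grm:
    print([round(value, 3) for value in row])
```

In a mixed model, a matrix of this kind would typically define the covariance of a random effect, while the principal components enter as fixed covariates.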
Our third study was a GWAS to further understand genetic factors related to serum uric acid (sUA) metabolism among Nicaraguans at risk of CKDnt. Prior longitudinal studies have found that sugarcane workers experience post-shift elevation in sUA, and abnormally high concentration (hyperuricemia) is a known risk factor for CKD. After onset of CKD, higher sUA is associated with a worse prognosis. Also, previous sUA GWAS have identified a set of genetic variants explaining urate concentration across ethnic populations and found SNPs where the effect on concentration depends on renal function. However, it was unclear if known sUA-associated variants explain concentration among populations at risk of CKDnt. We also suspected that genetic variability, unique to Mesoamericans, might explain serum concentration more than known sUA-associated SNPs. Therefore, we examined SNP variability and associations with sUA among Nicaraguans. We chose to separately analyze our cases and non-cases of CKDnt because sUA in these groups had different possible biological etiologies. Our aims were to: (1) determine if previously published variants, with established association to sUA among global populations, explained concentration among those diagnosed with CKDnt and those at risk of the disease; (2) discover novel sUA SNPs among our sample of Mesoamericans; (3) boost our discovery of variants by exploiting kidney function as an effect modifier; (4) identify SNPs where association depends on renal function; and (5) identify SNPs explaining sUA among both CKDnt cases and non-cases. We drew two analytical samples from the Mesoamerican Nephropathy Case-Control Genetic Study: case-only (n= 609) and control-only (non-cases= 385). Renal function was measured as serum creatinine (sCr).
In our case-only GWAS, we used two models: a generalized linear mixed model (GLMM) that included the top 10 principal components (PCs) and use of sUA-lowering medication (allopurinol), and a second GLMM that also included an additional gene*sCr interaction term. Among cases only, we tested for the joint effect of each variant and its interaction with renal function. SNPs that explained sUA in the case-only GWAS or the joint test were examined for modification by renal function using the gene*sCr coefficient. The control-only GWAS was based on a GLMM that included only the top 10 PCs. In this pilot study, we observed a high prevalence of hyperuricemia (CKDnt cases= 78.8% and non-cases= 23.6%). Despite our limited power, we successfully replicated prior findings for the ABCG2 variant, rs74904971, which was among the top three SNPs explaining sUA in global populations. rs74904971 significantly explained sUA concentration among CKDnt cases (beta = 1.1; 95%CI: 0.8, 1.4; p= 9.2 x 10^-12), but not among controls (beta = 0.41; 95%CI: 0.16, 0.66; p= 1.2 x 10^-3). We found no evidence that renal function modified the genetic effects of rs74904971 (betaG*sCr = 0.44, pG*sCr = 0.17). Most other established urate SNPs were not significant even at the 0.05 significance level. We found five novel SNPs associated with sUA in our control-only and case-only GWAS, but these variants all had low minor allele frequency (MAF < 5%). The best SNP from the control-only GWAS, a GALNTL6 intron variant (rs17057585), was significantly associated with a 1.9 mg/dL increase in sUA (95%CI: 1.3, 2.6; p= 3.6 x 10^-9). In our case-only GWAS, we found that participants with more alternative alleles of rs61156970, located 89 kbp downstream from MCTP2, had suggestively lower sUA (beta= -1.9; 95%CI: -2.6, -1.2; p= 1.8 x 10^-7). The joint test allowed us to identify an additional 20 variants, of which 11 SNPs had MAF ≥ 10%, and we found 16 SNPs with significantly greater effects among cases with worse renal function compared to cases with better functioning kidneys.
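The joint test of a variant and its interaction with renal function can be illustrated with a two-degree-of-freedom Wald test. The sketch below is a deliberate simplification and not the study's model: it assumes ordinary least squares on simulated data and omits the principal components, the medication covariate, and the relatedness random effect of the GLMM. All names and simulated effect sizes are hypothetical.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def joint_test(genotype, scr, sua):
    """Fit sUA ~ 1 + G + sCr + G*sCr by OLS; test G and G*sCr jointly (2 df)."""
    rows = [[1.0, g, s, g * s] for g, s in zip(genotype, scr)]
    n, k = len(rows), 4
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, sua)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, sua)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    # Covariance block for the G (index 1) and G*sCr (index 3) coefficients.
    a = sigma2 * xtx_inv[1][1]
    b = sigma2 * xtx_inv[1][3]
    d = sigma2 * xtx_inv[3][3]
    det = a * d - b * b
    b1, b3 = beta[1], beta[3]
    # Wald statistic: coefficients' quadratic form in the inverse 2x2 block.
    wald = (d * b1 * b1 - 2.0 * b * b1 * b3 + a * b3 * b3) / det
    # The chi-square survival function with 2 df has the closed form exp(-x/2).
    p_joint = math.exp(-wald / 2.0)
    return beta, wald, p_joint


# Simulated data only; the effect sizes are arbitrary and not study estimates.
random.seed(0)
genotype = [random.choice([0, 1, 2]) for _ in range(200)]
scr = [random.uniform(0.6, 3.0) for _ in range(200)]
sua = [
    6.0 + 1.0 * g + 0.5 * s + 0.4 * g * s + random.gauss(0.0, 1.0)
    for g, s in zip(genotype, scr)
]
beta, wald, p_joint = joint_test(genotype, scr, sua)
print("beta_G =", beta[1], "beta_GxsCr =", beta[3], "joint p =", p_joint)
```

A joint test of this kind can detect variants whose effect appears mainly at particular levels of renal function, which a main-effect-only test might miss. The interaction coefficient alone then indicates whether the effect is modified by sCr.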
We used The Human Protein Atlas, an online database, to assess the biological plausibility of the most significant SNPs from the joint test. We found limited support for the top intronic SNPs from the joint test: rs9499393 (within ERCC3) and rs89572695 (within TMEM9B) have MAF < 10% and lie within highly expressed genes that are not specific to renal cells. The effect of indel rs11404676 (within MTHFD1L), which was significantly greater among CKDnt cases with worse renal function than cases with better function (betaG*sCr= 1.4, pG*sCr = 1.4 x 10^-6), is more likely to be biologically relevant to purine biosynthesis and urate concentration. We have demonstrated that we can identify known sUA-associated SNPs with large effects, and that other Mesoamerican-specific variants better explain serum urate than known ABCG2 SNPs. Renal function among non-cases should be addressed in future GWAS from this population, and caution is advised when extrapolating multi-ethnic GWAS results to Mesoamericans.
Identifier | oai:union.ndltd.org:bu.edu/oai:open.bu.edu:2144/49139 |
Date | 06 August 2024 |
Creators | Leone, Dominick Anthony |
Contributors | Brooks, Daniel R. |
Source Sets | Boston University |
Language | en_US |
Detected Language | English |
Type | Thesis/Dissertation |
Rights | Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/ |