Starmer, Joshua Mr.
02 November 2006
Molecular biologists have been observing interactions between messenger RNA (mRNA) molecules and other non-coding RNA molecules for quite some time. Here I revisit some of the classical hybridizations between the 16S ribosomal RNA (rRNA) and mRNA during initiation, and investigate the interactions between small interfering RNA (siRNA) molecules and mRNA. In reviewing rRNA-mRNA interactions, I observed that the majority of both bacterial and eukaryotic genes can bind at the start codon. This novel result led to a method for improving genome annotation as well as a new theory of translation initiation. The examination of siRNA-mRNA interactions led to new criteria for predicting an siRNA's efficacy.
30 November 2004
Disease gene mapping is one of the main focuses of genetic epidemiology and statistical genetics. This dissertation explores some methods and algorithms in this area, especially in pedigrees. The first chapter gives an introduction to human genetics and disease gene mapping. Existing linkage and association methods are introduced and compared. Probabilities of genotypic data from multiple linked marker loci on related individuals are used as likelihoods of gene locations for gene-mapping, or as likelihoods of other parameters of interest in human genetics. With the recent development in genetics and molecular biology techniques, large-scale marker data has become available, which requires highly efficient likelihood calculations especially for complex pedigrees. Algorithms for likelihood calculations for pedigree data are reviewed in chapter 2. Besides exact likelihood calculation methods and MCMC, a Sequential Importance Sampling (SIS) approach has been proposed to enable calculations for large pedigrees with large numbers of markers. However, when the system gets large, the variance of the importance sampling weights increases while both efficiency and accuracy of the method decrease. We propose an optimization algorithm for calculating the likelihood of general pedigrees in Chapter 3. We incorporate a resampling strategy into SIS to reduce the variance inflation problem. A successful linkage analysis may identify a linkage region of interest containing hundreds of genes at a magnitude of perhaps ten to thirty centiMorgans. A follow-up association (or so-called linkage disequilibrium) analysis can provide much finer gene-mapping but is subject to greater multiple testing problems. In Chapter 4, we present a method for determining whether an association result is responsible for a non-parametric linkage result for binary traits in general pedigrees. 
The correlation between family frequency of a variant of interest and family LOD score is used as a measure of whether the association between a given variant at a marker and the disease status can help to explain a significant linkage result seen in the collection of families in the region around the marker.
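To illustrate the weight-variance problem and the resampling idea described in the abstract above, here is a minimal sketch. It is not the dissertation's algorithm: the function names, the effective-sample-size criterion, the 0.5 threshold, and the use of multinomial resampling are all assumptions chosen for illustration.

```python
import random


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2).

    Equals N when all weights are equal and approaches 1 when a single
    weight dominates, so it tracks the variance of the importance weights.
    """
    total = sum(weights)
    squares = sum(w * w for w in weights)
    return (total * total) / squares if squares > 0 else 0.0


def resample_if_needed(particles, weights, threshold=0.5, rng=None):
    """Resample (multinomially) when ESS drops below threshold * N.

    `threshold` is a hypothetical tuning parameter. After resampling, every
    weight is set to the mean of the old weights, so the average weight
    (the running likelihood estimate in SIS) is unchanged.
    """
    rng = rng or random.Random(0)
    n = len(particles)
    total = sum(weights)
    if total <= 0 or effective_sample_size(weights) >= threshold * n:
        # Weights are balanced enough (or degenerate): leave them alone.
        return list(particles), list(weights)
    chosen = rng.choices(particles, weights=weights, k=n)
    return chosen, [total / n] * n
```

In a sequential sampler this check would typically run after each locus is added, so that partial configurations with negligible weight are dropped before further computation is spent on them.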
05 November 2007
Genetic association studies aim to detect association between one or more genetic polymorphisms and complex traits, which might be some quantitative characteristic or a qualitative attribute of disease. In Chapter 1, we introduce the development of methods for association mapping in the past decades and present the rationale behind our X-linked method development. Family-based association methods have been well developed for autosomes, but unique features of X-linked markers have received little attention. In Chapter 2, we propose a likelihood approach (X-LRT) to estimate genetic risks and test association using a case-parents design. The method uses nuclear families with a single affected proband, and allows additional siblings and missing parental genotypes. We also extend X-LRT from a single-marker test to a multiple-marker haplotype analysis. Our X-LRT offers great flexibility for testing different penetrance relationships within and between sexes. In addition, estimation of relative risks provides a measure of the magnitude of X-linked genetic effects on complex disorders. In Chapters 3 and 4, we fill the methodological gaps by developing two approaches (X-QTL and X-HQTL) to test association between X-linked marker alleles/haplotypes and quantitative traits in a nuclear family design. We adopt the orthogonal decomposition, which provides consistent estimates of the additive genetic values of marker alleles/haplotypes. Joint estimation of the linkage variance component in the association model reduces type I errors to nominal expectations. Dosage compensation models provide a simple relationship of X-linked additive effects between sexes. In Chapters 2, 3, and 4, our simulation results demonstrate the validity and substantially higher power of our approaches compared with other existing programs. We also apply our methods to MAOA and MAOB candidate-gene studies of family data with Parkinson disease. 
In Chapter 5, we discuss some issues relevant to the design and execution of our X-linked family-based association studies.
STATISTICAL METHODS FOR FAMILY-BASED ASSOCIATION STUDIES FOR COMPLEX HUMAN DISEASES: SINGLE-LOCUS AND HAPLOTYPE METHODS
Chung, Ren-Hua
15 December 2006
Disease-gene fine-mapping is an important task in human genetics. Linkage and association analyses are the two main approaches for exploring disease susceptibility genes. In Chapter 1, we introduce the development of methods for disease-gene mapping in the past decades and present the rationale behind our new method development. Family-based association analyses have provided powerful tools for disease-gene mapping. The Association in the Presence of Linkage test (APL), a family-based association method, can use nuclear families with multiple affected siblings and infer missing parental genotypes properly in the linkage region. In Chapter 2, we generalized and extended APL so that it can be applied to general nuclear family structures using a bootstrap variance estimator. Unlike the original APL that can handle at most two affected siblings, the new APL can handle up to three affected siblings. We also extended APL from a single-marker test to a multiple-marker haplotype analysis. According to our simulations, the new APL has a correct type I error rate and more power than other family-based association methods such as PDT, FBAT/HBAT, and PDTPHASE in nuclear families with missing parents. The robustness of APL when there are rare alleles or haplotypes and when there is population substructure such that the allele frequencies in the population deviated from the Hardy-Weinberg Equilibrium (HWE) assumption was also examined in Chapter 2. Genes on the X chromosome play a role in many common diseases. Linkage analyses have identified regions on the X chromosome with high linkage peaks for several diseases. Currently there are few family-based association methods available for X-chromosome markers. In order to fill in this gap, we proposed a novel family-based association method, X-APL, in Chapter 3. X-APL is a modification of APL and shares some important properties with APL. 
X-APL can also perform haplotype analyses; it is the only family-based association test we are aware of that tests haplotypes of X-chromosome markers. Our simulation results showed that X-APL has a correct type I error rate and more power than other family-based association methods for the X chromosome, such as XS-TDT, XPDT and XMCPDT, for single-marker analysis in nuclear families. The robustness of X-APL when there are deviations of genotype frequencies from HWE was also examined in Chapter 3. Linkage and family-based association analyses are often applied simultaneously to the same data in order to maximize use of family data sets. However, it is not intuitively clear under what conditions association and linkage tests performed in the same data set may be correlated. In Chapter 4, we used computer simulations and theoretical statements to estimate the correlation between linkage statistics (affected sib pair maximum LOD scores) and family-based association statistics (PDT and APL) under various hypotheses. Different types of pedigrees were studied: nuclear families with affected sib pairs, extended pedigrees and incomplete pedigrees. Both simulation and theoretical results showed that when there is either no linkage or no association, the linkage and association statistics are not correlated. When there is both linkage and association in the data, the two tests have a positive correlation.
Conners, Shannon Burns
01 December 2005
Carbohydrate utilization and production pathways identified in Thermotoga species likely contribute to their ubiquity in hydrothermal environments. Many carbohydrate-active enzymes from Thermotoga maritima have been characterized biochemically; however, sugar uptake systems and regulatory mechanisms that control them have not been well defined. Transcriptional data from cDNA microarrays were examined using mixed effects statistical models to predict candidate sugar substrates for ABC (ATP-binding cassette) transporters in T. maritima. Genes encoding proteins previously annotated as oligopeptide/dipeptide ABC transporters responded transcriptionally to various carbohydrates. This finding was consistent with protein sequence comparisons that revealed closer relationships to archaeal sugar transporters than to bacterial peptide transporters. In many cases, glycosyl hydrolases, co-localized with these transporters, also responded to the same sugars. Putative transcriptional repressors of the LacI, XylR, and DeoR families were likely involved in regulating genomic units for beta-1,4-glucan, beta-1,3-glucan, beta-1,4-mannan, ribose, and rhamnose metabolism and transport. Carbohydrate utilization pathways in T. maritima may be related to ecological interactions within cell communities. Exopolysaccharide-based biofilms composed primarily of beta-linked glucose, with small amounts of mannose and ribose, formed under certain conditions in both pure T. maritima cultures and mixed cultures of T. maritima and M. jannaschii. Further examination of transcriptional differences between biofilm-bound sessile cells and planktonic cells revealed differential expression of beta-glucan-specific degradation enzymes, even though maltose, an alpha-1,4 linked glucose disaccharide, was used as a growth substrate. 
Higher transcript levels of genes encoding iron and sulfur compound transport, iron-sulfur cluster chaperones, and iron-sulfur cluster proteins suggest altered redox environments in biofilm cells. Further direct comparisons between cellobiose and maltose-grown cells suggested that transcription of cellobiose utilization genes is highly sensitive to the presence of cellobiose, or a cellobiose-maltose mixture. Increased transcript levels of genes related to polysulfide reductases in cellobiose-grown cells and biofilm cells suggested that T. maritima cells in pure culture biofilms escaped hydrogen inhibition by preferentially reducing sulfur compounds, while cells in mixed culture biofilms formed close associations with hydrogen-utilizing methanogens. In addition to probing issues related to the microbial physiology and ecology of T. maritima, this work illustrates the strategic use of DNA microarray-based transcriptional analysis for functional genomics studies.
Towards a Toxico-Chemogenomic Future: The Transformation of Public Gene Expression Data and Consideration for its Use
Williams-DeVane, ClarLynda Raynell
16 December 2008
The term "toxico-chemogenomics" is used to convey extension of toxicogenomics to more broadly survey gene expression changes across chemical space. Moving towards an improved, publicly available toxico-chemogenomics capability requires not only common data standards and protocols across public resources, but also broad data coverage within the chemical, genomics and toxicological information domains, and transparent and functional linkages of Internet data resources. The first goal of this project was to assess the current extent of standardization, interoperability, and chemical indexing of public genomics resources with respect to toxico-chemogenomics utility. Focusing on the largest of these public data resources -- Gene Expression Omnibus (GEO) and ArrayExpress -- the second goal was to chemically index the full experimental content of these repositories to assess the current coverage of chemical exposure-related microarray experiments in relation to chemical space and toxicology, and to make these data accessible in relation to other publicly available, chemically-indexed toxicological information. Current standards for chemical annotation within ArrayExpress and GEO are presently inadequate to this task, such that development of new methodologies to mine the author-submitted content was required. A series of automated Perl programs were utilized along with extensive manual review to transform the raw experiment/study descriptions and text files into a standardized chemically-indexed inventory of microarray experiments in both resources. These files and top-level experiment annotations allowed for identification of all current chemical-associated experimental content as well as the subset of chemical exposure-related (or "Treatment") content deemed most relevant to toxicogenomics in the GEO Series and ArrayExpress Repository experiment inventories. 
With chemical exposure experiments suitably indexed by chemical structure, it is possible for the first time to assess the breadth of chemical study space represented in these databases, as well as the overlapping chemical content, and to begin to assess the sufficiency of data for making chemical similarity inferences. Chemical indexing of public genomics databases is also the first step towards integrating chemical, toxicological and genomics data into predictive toxicology by providing linkages across public resources. The main products of this effort include the following: (1) published, downloadable and structure-searchable DSSTox Structure-Index (Locator) files for both the GEO Series (GEOGDS) and ArrayExpress Repository (ARYEXP), containing standard chemical fields for the unique chemical "Treatment" subset, accompanied by URLs to AccessionID experiment pages in GEO and ArrayExpress; (2) published, downloadable DSSTox Aux data files for GEOGDS and ARYEXP providing a chemical-experiment pair index to all chemical-associated content in each resource and containing 14 standard genomics fields (e.g., Experiment_Title, Experiment_Description, Experiment_ArrayType, Species, Number_Samples, etc.) and source-specific fields extracted from each resource (e.g., MIAME_Protocol, MIAMI_Factors, etc. for ArrayExpress); and (3) incorporation of the "Treatment" chemical-experiment pair index with URLs linked directly to AccessionID pages for GEO and ArrayExpress into the National Center for Biotechnology Information (NCBI) PubChem resource. The secondary product of this effort is a methodology discussion about the proper use of public microarray data with a demonstrative analysis of how one might use the newly identified public microarray data.
The success of genome-wide association studies (GWAS) has been limited by missing heritability and lack of biological relevance of identified variants. We sought to address these issues by characterizing interaction among genotypes and environment using case-control samples enrolled at Duke University Medical Center. First, we studied the impact of age on coronary artery disease (CAD). Gene-by-age (GxAGE) interactions were tested at genome-wide scale, along with genes' marginal effects in age-stratified groups. Based on the interaction model, age plays the role as a modifier of the age-CAD relationship. SNPs associated with CAD in both young and old demonstrate consistency in effect sizes and directions. In spite of these SNPs, vastly different CAD associated genes were discovered across age and race groups, suggesting age-dependent mechanisms of CAD onset. Second, we explored gene-by-gene interaction (GxG) using a statistical model and compared results to biological evidence. Specifically, we investigated GATA2 as a candidate gene transcription factor, and modeled the interaction with genome-wide SNPs. The genetic effects at interacting loci were modified by GATA2 genotype. Without taking GATA2 variants into account, no marginal main effects were detected. Open access ChIP-seq data was available for comparison with the statistical model, and to relate GWAS findings with biological mechanisms. The agreement between the statistical and biological models was very limited.
Thesis
Of the two most common forms of genetic variation in the human genome, Single Nucleotide Polymorphisms (SNPs) and Variable Number Tandem Repeat Polymorphisms (VNTRs), SNPs are much more easily and inexpensively assayed in a high-throughput manner. For this reason, we seek to explore methods that can allow us to use the more readily available SNP genotype information to infer VNTR genotypes in nearby genomic regions. We focus in particular on imputing a VNTR polymorphism, 5-HTTLPR, in the promoter region of the serotonin transporter gene in a small sample of individuals from an ongoing neuroimaging genetics study, a portion of whom have both manual 5-HTTLPR genotypes and genome wide SNP data. We investigate four imputation methods: Tagger, Vertex Discriminant Analysis (VDA), IMPUTE2, and BEAGLE. We achieve an accuracy of 93% with VDA in our subsample of Caucasians with manual 5-HTTLPR genotypes. Further, we find that for the entire Caucasian subsample without manual genotypes, a majority of the imputation methods tested make the same 5-HTTLPR genotype call.
Thesis
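The two summary measures mentioned in the abstract above, accuracy against manual genotypes and agreement among imputation methods, can be sketched as follows. This is an illustrative sketch only: the function names, the sample IDs, and the genotype labels are hypothetical, and none of the four imputation tools named above is invoked.

```python
from collections import Counter


def concordance(imputed, manual):
    """Fraction of samples whose imputed genotype matches the manual call.

    Both arguments map sample ID -> genotype label; only samples present
    in both are compared.
    """
    shared = [s for s in manual if s in imputed]
    if not shared:
        raise ValueError("no samples in common")
    return sum(imputed[s] == manual[s] for s in shared) / len(shared)


def majority_call(calls):
    """Return the genotype made by a strict majority of methods, else None."""
    genotype, count = Counter(calls).most_common(1)[0]
    return genotype if count > len(calls) / 2 else None


# Hypothetical example data, not taken from the study.
manual = {"s1": "L/L", "s2": "S/S"}
imputed = {"s1": "L/L", "s2": "S/L", "s3": "S/S"}
accuracy = concordance(imputed, manual)                  # compares s1 and s2 only
consensus = majority_call(["L/L", "L/L", "S/L", "L/L"])  # one call per method
```

Requiring a strict majority means an even split between methods yields no consensus call, which seems the safer default when no manual genotype is available to break the tie.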
Thesis (Ph.D.) -- University of Texas at Arlington, 2008.
Ballinger, Tracy J.
29 October 2015
In the last century cancer has become increasingly prevalent and is the second largest killer in the United States, estimated to afflict 1 in 4 people during their life. Despite our long history with cancer and our herculean efforts to thwart the disease, in many cases we still do not understand the underlying causes or have successful treatments. In my graduate work, I’ve developed two approaches to the study of cancer genomics and applied them to the whole genome sequencing data of cancer patients from The Cancer Genome Atlas (TCGA). In collaboration with Dr. Ewing, I built a pipeline to detect retrotransposon insertions from paired-end high-throughput sequencing data and found somatic retrotransposon insertions in a fifth of cancer patients.
My second novel contribution to the study of cancer genomics is the development of the CN-AVG pipeline, a method for reconstructing the evolutionary history of a single tumor by predicting the order of structural mutations such as deletions, duplications, and inversions. The CN-AVG theory was developed by Drs. Haussler, Zerbino, and Paten and samples potential evolutionary histories for a tumor using Markov Chain Monte Carlo sampling. I contributed to the development of this method by testing its accuracy and limitations on simulated evolutionary histories. I found that the ability to reconstruct a history decays exponentially with increased breakpoint reuse, but that we can estimate how accurately we reconstruct a mutation event using the likelihood scores of the events. I further designed novel techniques for the application of CN-AVG to whole genome sequencing data from actual patients and applied these techniques to search for evolutionary patterns in glioblastoma multiforme using sequencing data from TCGA. My results show patterns of two-hit deletions, as we would expect, and amplifications occurring over several mutational events. I also find that the CN-AVG method frequently makes use of whole chromosome copy number changes followed by localized deletions, a bias that could be mitigated through modifying the cost function for an evolutionary history.