  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Characterization of Gene-by-Age Interaction and Gene-by-Gene Interaction In Coronary Artery Disease

Zhao, Yi January 2012
The success of genome-wide association studies (GWAS) has been limited by missing heritability and by the lack of biological relevance of identified variants. We sought to address these issues by characterizing interactions among genotypes and environment using case-control samples enrolled at Duke University Medical Center. First, we studied the impact of age on coronary artery disease (CAD). Gene-by-age (GxAGE) interactions were tested at genome-wide scale, along with genes' marginal effects in age-stratified groups. Based on the interaction model, age acts as a modifier of the gene-CAD relationship. SNPs associated with CAD in both young and old subjects show consistent effect sizes and directions. Apart from these SNPs, vastly different CAD-associated genes were discovered across age and race groups, suggesting age-dependent mechanisms of CAD onset. Second, we explored gene-by-gene (GxG) interaction using a statistical model and compared the results to biological evidence. Specifically, we investigated GATA2, a transcription factor, as a candidate gene and modeled its interaction with genome-wide SNPs. The genetic effects at interacting loci were modified by GATA2 genotype. Without taking GATA2 variants into account, no marginal main effects were detected. Open-access ChIP-seq data were available for comparison with the statistical model and to relate GWAS findings to biological mechanisms. The agreement between the statistical and biological models was very limited. / Thesis
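The abstract does not give the form of its interaction model. As a hedged illustration only, a common way to write a gene-by-age interaction for a case-control outcome is a logistic regression with a product term. The symbols below (G for genotype coded as allele count, A for age, and the beta coefficients) are assumptions introduced here, not notation from the thesis.

```latex
% Illustrative GxAGE interaction model (assumed form, not taken from the thesis)
\operatorname{logit} P(\mathrm{CAD}=1 \mid G, A)
  = \beta_0 + \beta_G\, G + \beta_A\, A + \beta_{GA}\,(G \times A)

% Under this form, the log-odds effect of one additional allele at age A is
\frac{\partial}{\partial G}\,\operatorname{logit} P = \beta_G + \beta_{GA}\, A
```

In such a model, a nonzero product-term coefficient means the genotype effect changes with age, which is one way to read the statement that age modifies the genetic effect.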
2

Imputation of Microsatellite Markers With Tag SNPs

Knodt, Annchen January 2012
Of the two most common forms of genetic variation in the human genome, Single Nucleotide Polymorphisms (SNPs) and Variable Number Tandem Repeat Polymorphisms (VNTRs), SNPs are much more easily and inexpensively assayed in a high-throughput manner. For this reason, we seek to explore methods that can allow us to use the more readily available SNP genotype information to infer VNTR genotypes in nearby genomic regions. We focus in particular on imputing a VNTR polymorphism, 5-HTTLPR, in the promoter region of the serotonin transporter gene in a small sample of individuals from an ongoing neuroimaging genetics study, a portion of whom have both manual 5-HTTLPR genotypes and genome-wide SNP data. We investigate four imputation methods: Tagger, Vertex Discriminant Analysis (VDA), IMPUTE2, and BEAGLE. We achieve an accuracy of 93% with VDA in our subsample of Caucasians with manual 5-HTTLPR genotypes. Further, we find that for the entire Caucasian subsample without manual genotypes, a majority of the imputation methods tested make the same 5-HTTLPR genotype call. / Thesis
3

Integrative Genomic Modeling of Complex Traits using Pathway Analysis

Bennett, Brian D. January 2012
Understanding the root molecular causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. The overall goal of this work is to develop an integrative framework to better understand the genetic and molecular causes of complex traits, including complex diseases. In this work, I present a computational framework that I developed to integrate gene expression and other genomic data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genomic variation. This framework combines analysis at the multi-gene biological pathway level with multi-task learning to build predictive models that also uncover pathways potentially relevant to the complex trait of interest. To validate this framework, I first performed a simulation study to test its predictive ability and to measure how well it uncovered pathways that contain genes that are both differentially expressed and genetically associated with a complex trait. The predictive performance of the multi-task model was found to be comparable to that of other similar methods. Also, multi-task learning, along with other methods that jointly considered pathway scores from both data sets, was better able to identify pathways with both genetic and expression differences related to the phenotype. I applied this framework to gene expression and genotype data from estrogen receptor (ER) positive and ER negative breast cancer samples. The top 15 predictive pathways from the multi-task model were all related to estrogen, steroids, cell signaling, or the cell cycle.
The results from both the simulation studies and the breast cancer analysis suggest that this multi-task framework is useful for identifying biologically relevant pathways associated with a phenotype across multiple data types while retaining predictive performance comparable to that of similar methods. / Dissertation
4

Evolutionary Factors Shaping Haplotype and Nucleotide Diversity in Humans and Malaria

McGee, Kate 08 February 2008
Cheaper and more rapid DNA sequencing has led to the accumulation of large amounts of genetic data and has fueled the development of new methods to analyze this data. Using population genetics theory and computational methods, we can explore the evolutionary forces that shape genetic variation within and among populations of humans and malaria parasites. Demographic events such as population size change influence current patterns of genetic variation. Accounting for the demographic history of a population is critical in the interpretation of population genetic analyses, particularly in detecting regions under selection and in making inferences about linkage disequilibrium. Characterizing how recombination rates evolve is critical for the efficient design of association studies and, in turn, the understanding of the genetics behind complex phenotypes. In malaria parasites, recombination is a key element in the creation of a wide array of antigens, which help invade host cells. We examine patterns of genetic variation in humans and malaria and explore how demographic history and recombination rates affect these patterns.
5

Analysis of Multilocus Linkage Disequilibrium Structure in the Human Genome

Kim, Yunjung 30 March 2008
The International HapMap Project and high-throughput genotyping technology have generated genome-wide data on millions of markers that can be used in genetic studies. Each marker can be analyzed separately, but analyzing multiple markers simultaneously through haplotypes has generated great interest recently. Understanding the haplotype structure in the human genome may provide important information on human evolutionary history and on the identification of genetic variants responsible for human complex diseases. Since the alleles at closely linked markers on a single chromosome are often statistically dependent (i.e., in linkage disequilibrium (LD)), one crucial aspect of haplotype analysis is to characterize LD patterns in different regions and different populations. To assess the extent of correlation of genetic variation at multiple markers in a given region and population, pairwise LD measures have been commonly used. However, pairwise LD measures alone may be suboptimal for capturing the variability of background levels of disequilibrium, since multilocus LD measures can provide information about simultaneous allele associations among multiple loci that pairwise LD measures miss. In addition, in order to fully characterize the haplotype structure and LD pattern at multiple markers, it is necessary to consider higher-order disequilibria and estimate their values.
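To make the idea of a pairwise LD measure concrete, here is a minimal sketch. It assumes the two textbook measures D' and r² for a pair of biallelic loci; this copy of the abstract does not name the measures it refers to, so that choice, the function name `pairwise_ld`, and its parameters are all illustrative assumptions rather than the author's method.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    (Hypothetical helper for illustration only.)
    """
    # Raw disequilibrium coefficient: observed minus expected haplotype frequency
    d = p_ab - p_a * p_b

    # Largest magnitude D could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the alleles at the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared
```

Both measures summarize only two loci at a time, which is why the abstract argues that multilocus and higher-order disequilibria are needed to describe associations among three or more markers.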
6

Identifying Transcription Factor Targets and Studying Human Complex Disease Genes

Wang, Tianyuan 13 April 2009
Transcription factors (TFs) have been characterized as mediators of human complex disease processes. The target genes of TFs also may be associated with disease. Identification of potential TF targets could further our understanding of gene-gene interactions underlying complex disease. We focused on two TFs, USF1 and ZNF217, because of their biological importance, especially their known genetic association with coronary artery disease (CAD), and the availability of chromatin immunoprecipitation microarray (ChIP-chip) results. First, we used USF1 ChIP-chip data as a training dataset to develop and evaluate several kernel logistic regression prediction models. Our most accurate predictor significantly outperformed standard PWM-based prediction methods. This novel prediction method enables a more accurate and efficient genome-scale identification of USF1 binding and associated target genes. Second, the results from independent linkage and gene expression studies suggest that ZNF217 also may be a candidate gene for CAD. We further investigated the role of ZNF217 in CAD in three independent CAD samples with different phenotypes. Our association studies of ZNF217 identified three SNPs with consistent association with CAD in the three samples. Aorta expression profiling indicated that the proportion of the aorta with raised lesions was also positively correlated with ZNF217 expression. The combined evidence suggests that ZNF217 is a novel susceptibility gene for CAD. Finally, we applied our previously developed TF binding site (TFBS) prediction method to ZNF217. The performance of the prediction models for ZNF217 and USF1 is very similar. We demonstrated that our TFBS prediction method can be extended to other TFs.
In summary, the results of this dissertation research are (1) evaluation of two TFs, USF1 and ZNF217, as susceptibility factors for CAD; (2) development of a generalized method for TFBS prediction; and (3) prediction of TFBSs and target genes of two TFs, and identification of SNPs within TFBSs. This research allows for the development of study designs to assess TF-based interactions in genetic susceptibility to human complex disease.
7

Site-to-site rate variation in protein coding genes

Mannino, Frank Vincent 28 April 2006
The ability to realistically model gene evolution improved dramatically with the rejection of the assumption that rates are constant across sites. Rate heterogeneity models allow for better estimates of parameters and site-specific inferences such as the detection of positive selection. Recently developed models of codon evolution allow for both synonymous and nonsynonymous rates to vary independently according to discretized gamma distributions. I applied this model to mitochondrial genomes and concluded that synonymous rate variation is present in many genes, and is of appreciable magnitude relative to the amount of nonsynonymous heterogeneity. I then extended this model to allow for the two rates to vary according to a dependent bivariate distribution, permitting tests for the significance of correlation of rates within a gene. I present here the algorithm to discretize this bivariate distribution and the application of the model to many real data sets. Significant correlation between synonymous and nonsynonymous rates exists in roughly half of the data sets that I examined, and the correlation is typically positive. These data sets range over a wide group of taxa and genes, implying that the trend of correlation is general. Finally, I performed a thorough investigation of the statistical properties of using discretized gamma distributions to model rate variation, looking at the bias and variance in parameter estimates. These discretized distributions are common in modeling heterogeneity, but have weaknesses that must be well understood before making inferences.
8

Spontaneous Mutation Discovery via High-Throughput Sequencing of Pedigrees

Keebler, Jonathan Edward Myers 20 April 2010
Recent technological advances have made high-throughput DNA sequencing a routine laboratory experiment. This progression in technology has been made possible by the parallel production of millions of short fragments of sequence. The responsibility of garnering biological information from these DNA fragments has shifted from the wet-lab to the bioinformatician. As sequencing technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes, a task that is not necessarily trivial using high-throughput sequencing reads. A violation of Mendelian inheritance laws observed amid the resequenced genomes of family members can indicate the presence of a de novo mutation. A method for locating de novo mutations by probabilistically inferring genotypes across a pedigree using high-throughput sequencing is presented and applied to two resequenced nuclear families: one as a collaborative effort within The 1,000 Genomes Project, and the second in an attempt to discover candidate driver and passenger mutations within the genome of an Acute Lymphoblastic Leukemia. The mutation findings within these projects are presented, and the approach is examined in detail, highlighting areas where method improvements may be made. Considering the challenges experienced in these studies within the larger context of the nascent field of Personal Genomics, an honest assessment is presented of developments that must be made before the application of whole-genome sequencing on the scale of an individual human can unequivocally be used to predict, diagnose, or treat human disease.
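The Mendelian-violation idea can be sketched for a single site in a parent-child trio. This is a deliberate simplification: it assumes hard genotype calls at a diploid autosomal site, whereas the dissertation describes probabilistic genotype inference across a pedigree. The function name and the genotype representation are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Check whether a child's genotype can arise from the parents' genotypes.

    Each genotype is a pair of alleles, e.g. ("A", "G"), at one autosomal site.
    Returns False when the child carries an allele combination that neither
    pairing of one paternal and one maternal allele can produce, which is the
    pattern a de novo mutation (or a genotyping error) would create.
    (Hypothetical helper; assumes hard genotype calls.)
    """
    # Every genotype obtainable from one paternal plus one maternal allele
    possible = {tuple(sorted((f, m))) for f, m in product(father, mother)}
    return tuple(sorted(child)) in possible
```

For example, a heterozygous ("A", "G") child of two ("A", "A") parents fails the check. In real sequencing data such a failure is only a candidate, since sequencing and calling errors produce the same pattern, which is part of the motivation for a probabilistic treatment.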
9

Plant Molecular Evolution

Strain, Errol Alan 07 August 2006
The current dissertation looks at the molecular evolution of protein-coding genes in the flowering plant Arabidopsis thaliana and within two RNA viruses, human immunodeficiency virus (HIV) and Astroviridae. We analyzed members of the receptor-like kinase (RLK) gene family in Arabidopsis thaliana for positive selection. Likelihood analysis found evidence for positive selection in 12 of the 52 RLK family sequence groups. These 12 groups represent 97 of the 403 sequences analyzed. The majority of genes in groups subject to positive selection have not been functionally characterized, but sites under selection are predominantly located in the extracellular region. In HIV we use Akaike Information Criterion (AIC) based model averaging for models of nucleotide evolution to examine estimates of genetic distance and the transition/transversion (ts/tv) ratio. AIC-weighted estimates of distance and ts/tv were shown to be robust relative to model assumptions. AIC-weighted estimates of the ts/tv ratio in simulated HIV sequences generally had less variance than similar estimates made by selecting the single best-scoring AIC model. Astroviruses are a leading cause of viral gastroenteritis in infants worldwide, yet little is known about the mechanisms of astrovirus-induced diarrhea or the virally encoded components responsible for disease. We report the genomic sequence of nine novel TAstV-2 isolates. Nucleotide and amino acid identities for the isolates were generally > 90%. Phylogenies constructed using genomic RNA and the individual open reading frames (ORFs) provide evidence for recombination and indicate differences in substitution rates between non-structural and structural genes. Analysis of the viral capsid genes using codon models of evolution indicates site-specific positive selection in both turkey and human astroviruses.
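AIC-based model averaging can be illustrated with the standard Akaike-weight formula, in which each model's weight is proportional to exp(-Δ/2) and Δ is its AIC minus the smallest AIC. The sketch below assumes that standard formula. The function names are hypothetical, the abstract does not say exactly how its weights were computed, and any inputs supplied to these functions are the caller's own rather than values from the dissertation.

```python
import math


def akaike_weights(aic_scores):
    """Convert a list of AIC scores into normalized Akaike weights."""
    best = min(aic_scores)
    # Relative likelihood of each model compared with the best-scoring one
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_scores]
    total = sum(rel)
    return [r / total for r in rel]


def model_average(estimates, aic_scores):
    """Weighted average of per-model estimates (e.g. a ts/tv ratio)."""
    weights = akaike_weights(aic_scores)
    return sum(w * e for w, e in zip(weights, estimates))
```

Averaging across models in this way, instead of reporting the estimate from the single best-scoring model, is the contrast the abstract draws when it compares the variance of the two approaches.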
10

Methods Evaluation and Application in Complex Human Genetic Disease

Lou, Xuemei 04 August 2008
One of the most important tasks in human genetics is to search for disease susceptibility genes. Linkage and association analyses are the two major approaches for disease-gene mapping. Chapter 1 reviewed the development of disease-gene mapping methods over the past decades. Gene mapping of complex human diseases often results in the identification of multiple potential risk variants within a gene and/or multiple genes within a linkage peak. Thus a question of interest is whether the linkage result can be explained in part or in full by a candidate SNP that shows evidence of association, which can provide guidance for the next, time-consuming step of positional cloning of susceptibility genes. Two methods, GIST and LAMP, which assess whether the SNP can partially or fully account for the linkage signal in the region identified by a linkage scan, are evaluated on Genetic Analysis Workshop 15 (GAW15) simulated rheumatoid arthritis (RA) data and discussed in Chapter 2. The simulation results showed that GIST is simple and works slightly better than the LAMP-LE test when there is little linkage evidence, that the LAMP linkage test has limited power when there is not much linkage evidence, and that the LAMP association test is the best not only when the linkage evidence is extremely high but also when there is some LD between the candidate SNP and the trait locus. The fact that complex traits are often determined by multiple genetic and environmental factors with small-to-moderate effects makes it important to investigate the behavior of current association methods under models with multiple risk variants. In Chapter 3, we compared APL, FBAT, LAMP, APL-Haplotype, FBAT-LC, and the APL-OSA conditional test in five multiple-risk-variant models.
The simulation results showed that the power of single-marker association tests is closely correlated with the amount of LD between marker and disease loci, and that these tests maintain good power to detect multiple risk variants in a small region with a moderate degree of LD for fully genotyped families. Global tests, such as FBAT-LC, are sensitive to the presence of at least one susceptibility variant but are not helpful for selecting the most promising SNPs for further study. We found that if multiple haplotypes are associated with different disease loci, haplotype test results can be misleading, while the APL-OSA conditional test has the greatest power to properly dissect the clustered associated markers for all models, with an acceptable type I error rate ranging from 0.033 to 0.056. We applied the APL-OSA conditional test to GENECARD samples and obtained reasonable results. One linkage region of particular interest on chromosome 3 was identified by two independent genome linkage scans for coronary artery disease (CAD). Multiple disease susceptibility genes have been reported in this region, and there is also linkage evidence that this region may harbor a gene or genes determining HDL-C levels. Within this region, a search for HDL-C QTL and analyses of the relationships among genetic variants, HDL-C level, and CAD risk are discussed in Chapter 4. We performed CAD association and HDL-C QTL analyses on two independent datasets. We identified SNP rs2979307 in the OSBPL11 gene, which survives a Bonferroni correction. We observed different HDL-C trends with HDL-C-associated SNPs. Even with the evident heterogeneity present in our CAD population, we detected several association signals for HDL-C with SNPs in the KALRN, MYLK, CDGAP, and PAK2 genes in both CAD datasets; all of these genes belong to a Rho pathway.
