
Integrative Genomic Modeling of Complex Traits using Pathway Analysis

Bennett, Brian D. January 2012 (has links)
<p>Understanding the root molecular causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. The overall goal of this work is to develop an integrative framework to better understand the genetic and molecular causes of complex traits, including complex diseases. In this work, I present a computational framework that I developed to integrate gene expression and other genomic data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genomic variation. This framework combines analysis at the level of multi-gene biological pathways with multi-task learning to build predictive models that also uncover pathways potentially relevant to the complex trait of interest. To validate this framework, I first performed a simulation study to test its predictive ability and to measure how well it uncovered pathways containing genes that are both differentially expressed and genetically associated with a complex trait. The predictive performance of the multi-task model was comparable to that of similar methods. Multi-task learning, along with other methods that jointly considered pathway scores from both data sets, was also better able to identify pathways with both genetic and expression differences related to the phenotype. I applied this framework to gene expression and genotype data from estrogen receptor (ER) positive and ER negative breast cancer samples. The top 15 predictive pathways from the multi-task model were all related to estrogen, steroids, cell signaling, or the cell cycle.
The results from both the simulation studies and the breast cancer analysis suggest that this multi-task framework is useful for identifying biologically relevant pathways associated with a phenotype across multiple data types while retaining predictive performance comparable to that of similar methods.</p> / Dissertation
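The joint pathway-level modeling described above can be illustrated with a toy multi-task objective. Everything below (the simulated pathway scores, the two-task coupling penalty, and all parameter values) is invented for illustration; it is a minimal sketch of the general idea, not the dissertation's actual framework:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pathway scores: n samples x p pathways, for two data types
# (hypothetical stand-ins for expression- and genotype-derived scores).
n, p = 200, 10
w_true = np.zeros(p)
w_true[:3] = 1.5                       # three "relevant" pathways
X_expr = rng.normal(size=(n, p))
X_geno = rng.normal(size=(n, p))
logits = (X_expr + X_geno) @ w_true
y = (logits + rng.normal(scale=0.5, size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Multi-task logistic regression: each data type is a "task" predicting
# the same phenotype; an L2 coupling penalty ties the two weight vectors
# together, so pathways supported by BOTH data types get large consensus
# effect sizes.
W = np.zeros((2, p))                   # row 0: expression task, row 1: genotype
lam, lr = 0.5, 0.05
Xs = [X_expr, X_geno]
losses = []
for step in range(500):
    loss = 0.0
    grads = np.zeros_like(W)
    for t, X in enumerate(Xs):
        pr = sigmoid(X @ W[t])
        loss += -np.mean(y * np.log(pr + 1e-9) + (1 - y) * np.log(1 - pr + 1e-9))
        grads[t] = X.T @ (pr - y) / n
    diff = W[0] - W[1]                 # coupling penalty: ||W0 - W1||^2
    loss += lam * np.sum(diff ** 2)
    grads[0] += 2 * lam * diff
    grads[1] -= 2 * lam * diff
    W -= lr * grads
    losses.append(loss)

shared = W.mean(axis=0)                # consensus pathway effect sizes
top = np.argsort(-np.abs(shared))[:3]
print(sorted(top))                     # ideally recovers pathways 0, 1, 2
```

The coupling penalty is what makes this "multi-task" rather than two independent models: pathways with a signal in only one data type get pulled toward zero, while pathways with concordant signals survive.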

Analysis of genomic rearrangements in cancer from high throughput sequencing data

Ballinger, Tracy J. 29 October 2015 (has links)
<p> In the last century cancer has become increasingly prevalent and is the second largest killer in the United States, estimated to afflict 1 in 4 people during their life. Despite our long history with cancer and our herculean efforts to thwart the disease, in many cases we still do not understand the underlying causes or have successful treatments. In my graduate work, I&rsquo;ve developed two approaches to the study of cancer genomics and applied them to the whole genome sequencing data of cancer patients from The Cancer Genome Atlas (TCGA). In collaboration with Dr. Ewing, I built a pipeline to detect retrotransposon insertions from paired-end high-throughput sequencing data and found somatic retrotransposon insertions in a fifth of cancer patients. </p><p> My second novel contribution to the study of cancer genomics is the development of the CN-AVG pipeline, a method for reconstructing the evolutionary history of a single tumor by predicting the order of structural mutations such as deletions, duplications, and inversions. The CN-AVG theory was developed by Drs. Haussler, Zerbino, and Paten and samples potential evolutionary histories for a tumor using Markov Chain Monte Carlo sampling. I contributed to the development of this method by testing its accuracy and limitations on simulated evolutionary histories. I found that the ability to reconstruct a history decays exponentially with increased breakpoint reuse, but that we can estimate how accurately we reconstruct a mutation event using the likelihood scores of the events. I further designed novel techniques for the application of CN-AVG to whole genome sequencing data from actual patients and applied these techniques to search for evolutionary patterns in glioblastoma multiforme using sequencing data from TCGA. My results show patterns of two-hit deletions, as we would expect, and amplifications occurring over several mutational events. 
I also find that the CN-AVG method frequently makes use of whole-chromosome copy number changes followed by localized deletions, a bias that could be mitigated by modifying the cost function for an evolutionary history. </p>

Structural and bioinformatic analysis of ethylmalonyl-CoA decarboxylase

Roberts, Rick Lee 20 October 2015 (has links)
<p> Many enzymes of the major metabolic pathways are categorized into superfamilies which share common folds. Current models postulate that these superfamilies are the result of gene duplications coupled with mutations that result in the acquisition of new functions. Some of these new functions are advantageous and selected for, while others may simply be tolerated. The latter can result in metabolites being produced at low rates that are of no known use to the cell and can become toxic when accumulated. Concurrent with the evolution of this tolerable or potentially detrimental metabolism, organisms are selected to evolve a means of correcting or &ldquo;proofreading&rdquo; these non-canonical metabolites to counterbalance their detrimental effects. Metabolite proofreading is a process of intermediary metabolism, analogous to DNA proofreading, that acts on these abnormal metabolites to prevent their accumulation and toxic effects. </p><p> Here we structurally characterize ethylmalonyl-CoA decarboxylase (EMCD), a member of the family of enoyl-CoA hydratases within the crotonase superfamily of proteins, encoded by the ECHDC1 (enoyl-CoA hydratase domain containing 1) gene. EMCD has been shown to have a metabolite-proofreading function, acting on the metabolic byproduct ethylmalonyl-CoA to prevent its accumulation, which could result in oxidative damage. We use the complementary methods of in situ crystallography, small-angle X-ray scattering, and single-crystal X-ray crystallography to structurally characterize EMCD, followed by homology analysis, to propose a mechanism of action. This represents the first structure of a crotonase superfamily member thought to function as a metabolite-proofreading enzyme.</p>

Computational Identification of B Cell Clones in High-Throughput Immunoglobulin Sequencing Data

Gupta, Namita 08 September 2017 (has links)
<p> Humoral immunity is driven by the expansion, somatic hypermutation, and selection of B cell clones. Each clone is the progeny of a single B cell responding to antigen, with diversified Ig receptors. The advent of next-generation sequencing technologies enables deep profiling of the Ig repertoire. This large-scale characterization provides a window into the micro-evolutionary dynamics of the adaptive immune response and has a variety of applications in basic science and clinical studies. Clonal relationships are not directly measured, but must be computationally inferred from these sequencing data. In this dissertation, we use a combination of human experimental and simulated data to characterize the performance of hierarchical clustering-based methods for partitioning sequences into clones. Our results suggest that hierarchical clustering using single linkage with nucleotide Hamming distance identifies clones with high confidence and provides a fully automated method for clonal grouping. The performance estimates we develop provide important context for interpreting clonal analysis of repertoire sequencing data and allow for rigorous testing of other clonal grouping algorithms. We present the clonal grouping tool, as well as other tools for advanced analyses of large-scale Ig repertoire sequencing data, through a suite of utilities, Change-O. All Change-O tools utilize a common data format, which enables the seamless integration of multiple analyses into a single workflow. We then apply the Change-O suite, in concert with the nucleotide coding sequences for West Nile virus (WNV)-specific antibodies derived from single cells, to identify expanded WNV-specific clones in the repertoires of recently infected subjects through quantitative Ig repertoire sequencing analysis.
The method proposed in this dissertation to computationally identify B cell clones in Ig repertoire sequencing data with high confidence is made available through the Change-O suite and can be applied to provide insight into the dynamics of the adaptive immune response.</p>
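The single-linkage clonal grouping idea can be sketched in a few lines with SciPy. The junction sequences and the fixed 0.15 threshold below are invented for illustration; in practice Change-O first groups sequences by matching V/J annotations and length, and chooses the threshold from the distance-to-nearest-neighbor distribution rather than hard-coding it:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy junction sequences of equal length (a precondition for
# position-wise Hamming distance).
seqs = [
    "TGTGCGAGAGATAGCTGG",
    "TGTGCGAGAGATAGCTGG",   # identical to seq 0
    "TGTGCGAGAGTTAGCTGG",   # one mismatch from seq 0
    "TGTGCGAAACCCAGGTGG",   # far from the others
]

# Encode nucleotides as integers so the distance is a vector operation.
enc = np.array([["ACGT".index(c) for c in s] for s in seqs])

# pdist's "hamming" metric returns the *fraction* of mismatched positions.
dists = pdist(enc, metric="hamming")

# Single-linkage hierarchical clustering, cut at a normalized distance
# threshold to define clones.
Z = linkage(dists, method="single")
clones = fcluster(Z, t=0.15, criterion="distance")
print(clones)   # seqs 0-2 fall in one clone; seq 3 stands alone
```

Single linkage is a natural choice here because a clone is a lineage: two distant members can still be connected through intermediate mutated sequences, which is exactly the chaining behavior single linkage allows.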

Combining Protein Interactions and Functionality Classification in NS3 to Determine Specific Antiviral Targets in Dengue

Alomair, Lamya 15 September 2017 (has links)
<p> Dengue virus (DENV) is a serious worldwide health concern, putting about 2.5 billion people in more than 100 countries at risk. Dengue, a deadly viral disease and a member of the family Flaviviridae, is transmitted to humans via mosquitoes. Unfortunately, no vaccines or antivirals can prevent this infection, which is why researchers are diligently working to find cures. The DENV genome codes for multiple nonstructural proteins, one of which is the NS3 enzyme, which participates in different steps of the viral life cycle, including viral replication, viral RNA genome synthesis, and host immune mechanisms. Recent studies suggest a role for fatty acid biogenesis during DENV infection, including posttranslational protein modification. Phosphorylation is among the protein post-translational modifications and plays essential roles in protein folding, interactions, signal transduction, survival, and apoptosis. </p><p> In silico study provides a powerful approach to gain a better understanding of biological systems at the gene level. NS3 has the potential to be phosphorylated by any of the &sim;500 human kinases. We predicted potential kinases that might phosphorylate NS3 and calculated ranking scores using neural-network and other machine-learning-based web server programs. These scores enabled us to identify the top kinases likely to phosphorylate DENV NS3. We hypothesize that preventing the phosphorylation of NS3 may interrupt viral replication and serve as an antiviral strategy. Using multiple sequence alignment bioinformatics tools, we verified the results for the highly conserved residues and the residues around active sites whose phosphorylation may have a potential effect on viral replication, and further confirmed them with multiple bioinformatics tools.
Moreover, we included the Zika virus in our research and analysis, given that Zika belongs to the same Flavivirus genus as dengue and is therefore likely to share many similarities with it, and that Zika is available for <i>in vitro</i> testing. </p><p> Our studies propose that host-mediated phosphorylation of NS3 would affect its capability to interact with NS5, and that knocking out one of the interacting proteins may inhibit viral replication. These results open new doors for further investigation, and future work is expected to help identify the key inhibition mechanisms.</p>

Cancer Bioinformatics for Biomarker Discovery

Webber, James Trubek 16 November 2017 (has links)
<p> Cancer is a complex and multifaceted disease, and a vast amount of time and effort has been spent on characterizing its behaviors, identifying its weaknesses, and discovering effective treatments. Two major obstacles stand in the way of progress toward effective precision treatment for the majority of patients.</p><p> First, cancer's extraordinary heterogeneity&mdash;both between and even within patients&mdash;means that most patients present with a disease slightly different from every previously recorded case. New methods are necessary to analyze the growing body of patient data so that we can classify each new patient with as much accuracy and precision as possible. In chapter 2 I present a method that integrates data from multiple genomics platforms to identify axes of variation across breast cancer patients, and to connect these gene modules to potential therapeutic options. In this work we find modules describing variation in the tumor microenvironment and activation of different cellular processes. We also illustrate the challenges and pitfalls of translating between model systems and patients, as many gene modules are poorly conserved when moving between datasets.</p><p> A second problem is that cancer cells are constantly evolving, and many treatments inevitably lead to resistance as new mutations arise or compensatory systems are activated. To overcome this we must find rational combinations that will prevent resistant adaptation before it can start. Starting in chapter 3 I present a series of projects in which we used a high-throughput proteomics approach to characterize the activity of a large proportion of protein kinases, ending with the discovery of a promising drug combination for the treatment of breast cancer in chapter 8.</p>

A Novel Approach to the Comparative Genomic Analysis of Canine and Human Cancers

January 2018 (has links)
abstract: Study of canine cancer’s molecular underpinnings holds great potential for informing veterinary and human oncology. Sporadic canine cancers are highly abundant (~4 million diagnoses/year in the United States), and the dog’s unique genomic architecture, shaped by selective inbreeding, together with the high similarity between dog and human genomes, confers power for improving understanding of cancer genes. However, characterization of canine cancer genome landscapes has been limited, hindered by a lack of canine-specific tools and resources. To enable robust and reproducible comparative genomic analysis of canine cancers, I have developed a workflow for somatic and germline variant calling in canine cancer genomic data. I first adapted a human cancer genomics pipeline to create a semi-automated canine pipeline used to map genomic landscapes of canine melanoma, lung adenocarcinoma, osteosarcoma, and lymphoma. This pipeline also forms the backbone of my novel comparative genomics workflow. Practical impediments to comparative genomic analysis of dog and human include challenges in identifying similarities in mutation type and function across species; for example, canine genes could have evolved functions different from those of their human orthologs. Hence, I undertook a systematic statistical evaluation of dog and human cancer genes and assessed functional similarities and differences between orthologs to improve understanding of the roles of these genes in cancer across species. I tested this pipeline on canine and human Diffuse Large B-Cell Lymphoma (DLBCL), given that canine DLBCL is the most comprehensively genomically characterized canine cancer. Logistic regression with genes bearing somatic coding mutations in each cancer was used to determine whether conservation metrics (sequence identity, network placement, etc.) could explain co-mutation of genes in both species.
Using this model, I identified 25 co-mutated and evolutionarily similar genes that may be compelling cross-species cancer genes. For example, PCLO was identified as a co-mutated conserved gene, PCLO having been previously identified as recurrently mutated in human DLBCL but with an unclear role in oncogenesis. Further investigation of these genes might shed new light on the biology of lymphoma in dogs and humans, and this approach may more broadly serve to prioritize new genes for comparative cancer biology studies. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2018
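A logistic regression of the kind described above, relating conservation metrics to cross-species co-mutation, can be sketched on simulated data. The feature names, effect sizes, and gradient-descent fit below are all illustrative assumptions, not the dissertation's actual model or data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-gene predictors: dog-human sequence identity and a
# network-placement score. The binary outcome is whether the gene is
# somatically mutated in the DLBCL cohorts of BOTH species.
n = 300
identity = rng.uniform(0.5, 1.0, n)      # percent sequence identity
centrality = rng.uniform(0.0, 1.0, n)    # interaction-network centrality
true_logit = 6 * (identity - 0.75) + 2 * (centrality - 0.5)
co_mutated = (rng.uniform(size=n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Plain logistic regression fit by gradient descent (no external deps).
X = np.column_stack([np.ones(n), identity, centrality])
beta = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta -= 0.1 * X.T @ (p - co_mutated) / n

# A positive coefficient on identity would support the hypothesis that
# conserved genes are more likely to be co-mutated across species.
print(np.round(beta, 2))
```

In the real analysis the coefficients and their significance, rather than a point estimate on simulated data, would be the evidence for or against conservation explaining co-mutation.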

Studying Low Complexity Structures in Bioinformatics Data Analysis of Biological and Biomedical Data

Causey, Jason L. 02 June 2018 (has links)
<p> Biological, biomedical, and radiological data tend to be large, complex, and noisy. Gene expression studies contain expression levels for thousands of genes and hundreds or thousands of patients. Chest Computed Tomography images used for diagnosing lung cancer consist of hundreds of 2-D image &ldquo;slices&rdquo;, each containing hundreds of thousands of pixels. Beneath the size and apparent complexity of many of these data are simple and sparse structures. These low-complexity structures can be leveraged into new approaches to biological, biomedical, and radiological data analyses. Two examples are presented here. First, we present SparRec (Sparse Recovery), a new framework for imputation of GWAS data based on a matrix completion (MC) model that takes advantage of the low rank and small number of co-clusters of GWAS matrices. SparRec is flexible enough to impute meta-analyses with multiple cohorts genotyped on different sets of SNPs, even without a reference panel. Compared with Mendel-Impute, another MC method, our low-rank-based method achieves similar accuracy and efficiency even with up to 90% missing data; our co-clustering-based method has advantages in running time. MC methods are shown to have advantages over statistics-based methods, including Beagle and fastPhase. Second, we demonstrate NoduleX, a method for predicting lung nodule malignancy from chest Computed Tomography (CT) data, based on deep convolutional neural networks. For training and validation, we analyze >1000 lung nodules in images from the LIDC/IDRI cohort and compare our results with classifications provided by four experienced thoracic radiologists who participated in the LIDC project. NoduleX achieves high accuracy for nodule malignancy classification, with an AUC of up to 0.99, commensurate with the radiologists&rsquo; analysis.
Whether they are leveraged directly or extracted using mathematical optimization and machine learning techniques, low-complexity structures provide researchers with powerful tools for taming complex data. </p>
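The low-rank matrix-completion idea behind SparRec can be illustrated with a generic SoftImpute-style loop on simulated data. This is a textbook sketch under invented dimensions and threshold, not the SparRec implementation, and real genotype matrices are discrete rather than Gaussian:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated low-rank matrix (samples x SNPs) with 40% of entries hidden,
# standing in for a genotype-dosage matrix with missing calls.
n, m, r = 60, 80, 3
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))   # true rank-3 matrix
mask = rng.uniform(size=(n, m)) < 0.6                   # True = observed

X = np.where(mask, M, 0.0)            # initialize missing entries at 0
for _ in range(100):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - 1.0, 0.0)      # soft-threshold the singular values
    L = U @ np.diag(s) @ Vt           # current low-rank estimate
    X = np.where(mask, M, L)          # keep observed entries, refill the rest

# Relative error on the held-out (missing) entries only.
err = np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask])
print(round(err, 3))                  # small: the hidden entries are recovered
```

The key point is that when the underlying matrix is (approximately) low rank, a modest fraction of observed entries determines the rest, which is why MC methods can tolerate very high missingness.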

Identification and mixture deconvolution of ancient and forensic DNA using population genomic data

Vohr, Samuel H. 14 January 2017 (has links)
<p> Forensic scientists routinely use DNA for identification and to match samples with individuals. Although standard approaches are effective on a wide variety of samples in various conditions, issues such as low-template DNA samples and mixtures of DNA from multiple individuals pose significant challenges. Extreme examples of these challenges can be found in the field of ancient DNA, where DNA recovered from ancient remains is highly fragmented and marked by patterns of DNA-damage. Additionally, ancient libraries are often characterized by low endogenous DNA content and contaminating DNA from outside sources. As a result, standard forensics approaches, such as amplification of short-tandem repeats, are not effective on ancient samples. Alternatively, ancient DNA is routinely directly sequenced using high-throughput sequencing to survey the molecules that are present within a library. However, the resulting sequences are not easily compared for the purposes of identification, as each data set represents a random and, in some cases, non-overlapping, sample of the genome.</p><p> In this dissertation, I present two approaches for interpreting shotgun sequences that address two common issues in forensic and ancient DNA: extremely low nuclear genome coverage and mixtures of sequences from multiple individuals. First, I present an approach to test for a common source individual between extremely low-coverage sequence data sets that makes use of the vast number of single-nucleotide polymorphisms (SNPs) discovered by surveys of human genetic diversity. As almost no observed SNP positions will be common to both samples, our method uses patterns of linkage disequilibrium as modeled by a panel of haplotypes to determine whether observations made across samples are consistent with originating from a single individual. I demonstrate the power of this approach using coalescent simulations, downsampled high-throughput sequencing data and published ancient DNA data. 
Second, I present an approach for interpreting mixtures of mitochondrial DNA sequences from multiple individuals. Mixed DNA samples are common in forensic investigations, either from the direct nature of a case (e.g., a sample containing DNA from both a victim and a perpetrator) or from outside contamination. I describe an expectation-maximization approach for detecting the mitochondrial haplogroups contributing to a mixture and partitioning fragments by haplogroup to reconstruct the underlying haplotypes. I demonstrate the approach&rsquo;s feasibility, accuracy, and sensitivity on both <i>in silico</i> and <i>in vitro</i> sequence mixtures. Finally, I present the results of applying our mixture interpretation approach to ancient contact DNA recovered from &sim;700-year-old moccasin and cordage samples.</p>
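The expectation-maximization idea can be sketched on a toy two-haplotype example. The haplotypes, reads, and error rate below are invented; the actual method works over mitochondrial haplogroup phylogenies and real sequencing fragments:

```python
import numpy as np

# Two candidate haplotypes over 6 variant sites (0/1 alleles, invented).
haps = np.array([[0, 0, 0, 1, 1, 1],
                 [1, 1, 1, 0, 0, 0]])
eps = 0.02                                   # per-base error rate

# Each read observes a subset of sites: (site indices, alleles seen).
# Reads 0-2 match haplotype 0; reads 3-4 match haplotype 1.
reads = [([0, 1], [0, 0]), ([3, 4], [1, 1]), ([2, 5], [0, 1]),
         ([0, 1], [1, 1]), ([4, 5], [0, 0])]

def lik(read, h):
    """Likelihood of a read given haplotype h, with symmetric errors."""
    sites, alleles = read
    match = haps[h, sites] == np.array(alleles)
    return np.prod(np.where(match, 1 - eps, eps))

pi = np.array([0.5, 0.5])                    # mixture proportions
for _ in range(50):
    # E-step: posterior haplotype assignment (responsibility) per read.
    resp = np.array([[pi[h] * lik(r, h) for h in range(2)] for r in reads])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixture proportions from the soft assignments.
    pi = resp.mean(axis=0)

print(np.round(pi, 2))   # ~[0.6, 0.4]: 3 of 5 reads support haplotype 0
```

The same responsibilities that yield the mixture proportions also partition fragments by contributor, which is what allows the underlying haplotypes to be reconstructed.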

Investigating the Genetic Basis of Gene Expression Using EQTL Techniques

Quitadamo, Andrew 28 November 2018 (has links)
<p> With advances in genome sequencing technology, datasets with large sample sizes can be generated relatively quickly and cheaply, especially compared to a decade ago. We can use these data to analyze the associations between genetic variants and gene expression, and how those associations in turn relate to specific phenotypes. We explore the impact of structural variants (SVs) on gene expression and microRNA expression in healthy individuals. This dissertation is an application of expression quantitative trait loci (eQTL) analysis techniques to several of these datasets, as well as a description of an eQTL analysis pipeline software package.</p>
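A single association test of the kind an eQTL pipeline automates can be sketched as a regression of expression on genotype dosage. The data below are simulated with an assumed effect size; a real analysis repeats this for every variant-gene pair and adds covariates and multiple-testing correction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Simulated eQTL test: regress one gene's expression on the genotype
# (0/1/2 copies of the alternate allele) at one candidate variant.
n = 500
genotype = rng.integers(0, 3, size=n).astype(float)
expression = 0.4 * genotype + rng.normal(size=n)   # assumed true effect = 0.4

res = stats.linregress(genotype, expression)
print(round(res.slope, 2), res.pvalue < 1e-6)
# An eQTL scan runs this per variant-gene pair, then controls the false
# discovery rate across the millions of tests.
```

For SV-eQTLs the genotype column is the SV genotype rather than a SNP dosage, but the per-pair test has the same shape.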
