281 |
The viral genomics revolution: Big data approaches to basic viral research, surveillance, and vaccine development
Schobel, Seth Adam Micah 19 February 2016 (has links)
<p> Since the decoding of the first RNA virus in 1976, the field of viral genomics has exploded, first through the use of Sanger sequencing technologies and later with the use of next-generation sequencing approaches. With the development of these sequencing technologies, viral genomics has entered an era of big data. New challenges for analyzing these data are now apparent. Here, we describe novel methods to extend the current capabilities of viral comparative genomics. Through the use of antigenic distancing techniques, we have examined the relationship between the antigenic phenotype and the genetic content of influenza virus to establish a more systematic approach to viral surveillance and vaccine selection. Distancing of Antigenicity by Sequence-based Hierarchical Clustering (DASH) was developed and used to perform a retrospective analysis of 22 influenza seasons. Our methods produced vaccine candidates identical to or with a high concordance of antigenic similarity with those selected by the WHO. In a second effort, we have developed VirComp and OrionPlot: two independent yet related tools. These tools first generate gene-based genome constellations, or genotypes, of viral genomes, and second create visualizations of the resultant genome constellations. VirComp utilizes sequence-clustering techniques to infer genome constellations and prepares genome constellation data matrices for visualization with OrionPlot. OrionPlot is a Java application for tailoring genome constellation figures for publication. OrionPlot allows for color selection of gene cluster assignments, customized box sizes to enable the visualization of gene comparisons based on sequence length, and label coloring. We have provided five analyses designed as vignettes to illustrate the utility of our tools for performing viral comparative genomic analyses. Study three focused on the analysis of respiratory syncytial virus (RSV) genomes circulating during the 2012-2013 RSV season.
We discovered a correlation between a recent tandem duplication within the G gene of RSV-A and a decrease in severity of infection. Our data suggest that this duplication is associated with a higher infection rate in female infants than is generally observed. Through these studies, we have extended the state of the art of genotype analysis and phenotype/genotype studies, and established correlations between clinical metadata and RSV sequence data.</p>
|
282 |
Numerical and Computational Solutions for Biochemical Kinetics, Druggability, and Simulation
Votapka, Lane William 31 March 2016 (has links)
<p> Computational tools provide the automation and power that enable detailed modeling and analysis of many biomolecular phenomena of interest. Open source programs and automated tools empower researchers and provide opportunities for improvement to existing software. In the past few years, I have developed several open-source scientific software packages for the purposes of automating difficult or menial tasks pertaining to computational biophysics. These software packages involve the analysis of molecular dynamics simulations, Brownian dynamics simulations, electrostatics, pocket volume measurement, solvent fragment mapping, binding site characterization, milestoning theory, and allosteric network communications. In addition to allowing my research group and me to approach biomedical challenges that would otherwise be intractable, I hope and intend that these tools will be useful to the computational and theoretical biophysics research community.</p>
|
283 |
Development of Steady-State and Dynamic Flux Models for Broad-Scope Microbial Metabolism Analysis
He, Lian 07 May 2016 (has links)
<p> Flux analysis techniques, including flux balance analysis (FBA) and 13C-metabolic flux analysis (MFA), can characterize carbon and energy flows through a cell’s metabolic network. By employing both 13C-labeling experiments and nonlinear programming, 13C-MFA provides a rigorous way of examining cell flux distributions in the central metabolism. FBA, on the other hand, gives a holistic view of optimal fluxomes on the genome scale. In this dissertation, flux analysis techniques were developed to investigate microbial metabolism. First, an open-source and programming-free platform for 13C-MFA (WUFlux) with a user-friendly interface in MATLAB was developed, which allowed both mass isotopomer distribution (MID) analysis and metabolic flux calculations. Several bacterial templates with diverse substrate utilizations were included in this platform to facilitate 13C-MFA model construction. The corrected MID data and flux profiles resulting from our platform have been validated against other available 13C-MFA software. Second, 13C-MFA was applied to investigate the variations of bacterial metabolism in response to genetic manipulations or changing growth conditions. Specifically, we investigated the central metabolic responses to overproduction of fatty acids in Escherichia coli and the carbon flow distributions of Synechocystis sp. PCC 6803 under both photomixotrophic and photoheterotrophic conditions. By employing the software of isotopomer network compartmental analysis, we performed isotopically non-stationary MFA on Synechococcus elongatus UTEX 2973. The 13C-based analysis was also conducted for other non-model species, such as Chloroflexus aurantiacus. The resulting flux distributions detail how cells manage the trade-off between carbon and energy metabolism to survive under stressed conditions, support high biofuel production, or organize the metabolic routes for sustaining biomass growth.
Third, conventional FBA is suitable only for steady-state conditions. To describe the environmental heterogeneity in bioreactors and temporal changes of cell metabolism, we integrated genome-scale FBA with growth kinetics (time-dependent information) and cell hydrodynamic movements (space-dependent information). A case study was subsequently carried out for wild-type and engineered cyanobacteria, in which a heterogeneous light distribution in photobioreactors was considered in the model. The resulting integrated genome-scale model can offer insights into both intracellular and extracellular domains and facilitate the analysis of bacterial performance in large-scale fermentation systems. Both steady-state and dynamic flux analysis models can offer insights into metabolic responses to environmental fluctuations and genetic modifications. They are also useful tools for devising rational strategies for constructing microbial cell factories for industrial applications. </p>
|
284 |
Bacterial and phage interactions influencing Vibrio parahaemolyticus ecology
Marcinkiewicz, Ashley 09 August 2016 (has links)
<p> <i>Vibrio parahaemolyticus,</i> a human pathogenic bacterium, is a naturally occurring member of the microbiome of the Eastern oyster. As the nature of this symbiosis is unknown, the oyster presents the opportunity to investigate how microbial communities interact with a host as part of the ecology of an emergent pathogen of importance. To define how members of the oyster bacterial microbiome correlate with <i>V. parahaemolyticus,</i> I performed marker-based metagenetic sequencing analyses to identify and quantify the bacterial community in individual oysters after quantifying <i>V. parahaemolyticus</i> abundance by culture. I concluded that despite shared environmental exposures, individual oysters from the same collection site varied both in microbiome community and <i>V. parahaemolyticus</i> abundance, and there may be an interaction between <i>V. parahaemolyticus</i> and <i>Bacillus</i> species. In addition, to elucidate the ecological origins of pathogenic New England ST36 populations, I performed whole genome sequencing and phylogenetic analyses. I concluded that ST36 strains formed distinct subpopulations that correlated both with geographic region and with unique phage content that can be used as a biomarker for more refined strain traceback. Furthermore, these subpopulations indicated there may have been multiple invasions of this non-native pathogen into the Atlantic coast.</p>
|
285 |
Identification of drug sensitive gene motifs using "epigenetic profiles" derived from bioinformatics databases
Nelson, Jonathan M. 14 June 2016 (has links)
<p> The use of epigenetic modifying drugs such as DNA methyltransferase inhibitors (DNMTi) and histone deacetylase inhibitors (HDACi) is becoming more common in the treatment of cancer. Currently, there is a profound interest in determining predictive biomarkers for patient response and the efficacy of known and novel drugs. There are likely distinct “epigenetic profiles” defined by the location and abundance of DNA methylation patterns and histone modifications. Here we propose to investigate the response of a selected subset of genes to particular DNMTi and HDACi treatments in two human cancer cell lines, colorectal carcinoma HCT-116 and liver adenocarcinoma HepG2. In this study we identified unique epigenetic profiles, based on microarray and bioinformatics-derived epigenetic data, that are predictive of the response to epigenetic drug treatment. Microarray studies were used to identify re-activated genes common to two different cancer cell types treated with epigenetic drugs. Bioinformatics data were compiled on these genes and correlated against re-expression to construct the genes’ “epigenetic profile”. We then verified the response of the select group of genes in HCT-116 and HepG2 upon treatment at varying concentrations of epigenetic drugs and illustrated selective reactivation of the target gene. Additionally, two novel genes were introduced and one was selectively activated over the other. </p><p> Further research would prove invaluable for the medical and drug development communities, as a more extensive model would be of use in determining patient response to drug treatment based on their individual epigenetic profile and could lead to more successful novel drug design.</p>
|
286 |
Tipping the Balance: Factors That Influence the ER Signaling Network in Breast Cancer
Jasper, Jeff January 2014 (has links)
<p>The estrogen receptor (ER) is a master transcriptional regulator of the breast, where it plays key roles in the development and maintenance of normal breast epithelium but is also critical to the growth of luminal breast cancers. ER is also a well-defined molecular therapeutic target, and anti-estrogens, such as tamoxifen, are used clinically to inhibit the mitogenic activity of ER and delay disease progression. However, despite the initial benefits of tamoxifen therapy, nearly one third of luminal breast cancer tumors eventually become resistant, limiting the therapeutic utility of the drug. Mechanisms of resistance can be attributed to circumvention of ER and reliance on alternative growth pathways, or to upregulation of pathways that converge with ER to allow reactivation. Understanding the molecular determinants of resistance is a critical endeavor that demands attention in order to shape new drug developments and extend the therapeutic efficacy of anti-estrogens.</p><p> A major challenge in elucidating mechanisms of resistance is understanding the complexities of the ER signaling program with respect to receptor occupancy and the coordinated relationship with chromatin architecture and collaborating transcription factors. This work therefore integrates the relationship between accessible chromatin, as measured by DNase-Seq, with ER occupancy and ER-mediated transcription in an in vivo derived tamoxifen resistant cell line (TamR) and a comparator group of two closely related tamoxifen sensitive cell lines. Cumulatively, these data demonstrate an enhanced role for FOXA1 in tamoxifen resistance. Specifically, FOXA1 occupancy is greatly enriched at differential DNase hypersensitive loci in TamR cells, and FOXA1 target genes are dramatically upregulated. Furthermore, expression of these target genes can be restored to MCF7 levels with siRNA directed against FOXA1. 
The TamR cells also have increased ER occupancy at FOXA1 overlapping sites, where ER is engaged to chromatin in a ligand-independent manner, resulting in enhanced activation of nearby target genes that can be repressed with the pure anti-estrogen, ICI. However, the increased role of FOXA1 is not due to an increase in total protein levels; instead, it is manifested through increased activity. </p><p>Other clinical associations of resistance have been elucidated for which there is little to no mechanistic evidence currently available. HOXB13 has been shown to associate with tamoxifen therapy failure from differential microarray expression profiling of patients who relapsed compared to those who remained disease-free at the five-year follow-up. Our studies reveal that HOXB13 downregulates GATA3 levels, which in turn leads to loss of ER function and parallel activation of inflammatory pathways. </p><p>The present study also makes use of publicly available clinical datasets to generate an integrative database of 4885 patients from 25 independent studies. Furthermore, analytical methods and functions were developed to allow efficient use and application of the data. Access to the breast cancer meta-set and functions is made available to end users via a web interface, GeneAnalytics. Together, the breast cancer meta-set and associated access through the GeneAnalytics web site provide novel opportunities for researchers to integrate functional studies with tumor derived expression data to further our understanding of cancer related processes.</p><p>Collectively, our findings demonstrate that the ER signaling program is modified as tumors progress to resistance by an increased role of FOXA1 to facilitate ER binding and reprogramming, and by HOXB13 to suppress the actions of ER and promote inflammatory pathways. 
These mechanisms highlight distinct methods of resistance and provide a rationale for new therapeutic approaches to extend the utility of current anti-estrogens.</p> / Dissertation
|
287 |
LARVA - An Integrative Framework for Large-scale Analysis of Recurrent Variants in Noncoding Annotations - And Other Tools for Cancer Genome Analysis
Lochovsky, Lucas Sze-wan Fong 16 February 2016 (has links)
<p> Initial approaches to cancer treatment have involved classifying cancer by the site in which it is first formed, and treating it with drugs and other therapies that have very broad targeting. These therapies are often prone to damaging healthy cells in the process, which may lead to additional health complications. With the advent of high-throughput sequencing, and the development of computational tools and software to process the subsequent deluge of sequencing data, much progress has been made on functionally annotating the human genome. Many genomes have been cost-effectively sequenced, providing insight into genetic variation between various human populations. The methods used to study population variation may also be used to study the basis of genetic disease, including cancer. It has now been demonstrated that there are many molecular subtypes of cancer, where each subtype is differentiated based on which important cellular molecule or DNA sequence has been disrupted. Hence, understanding the genetic basis of cancer is paramount to the development of new, personalized molecular therapies to treat cancer.</p><p> Noncoding variants are known to be associated with disease, but they are not as commonly investigated as coding variants since assessing the functional impact of a mutation is difficult. For rare mutations, background mutation models have been set up for burden tests to discover highly mutated regions, which might be potential drivers of cancer. This has been developed for coding regions, leading to the successful use of burden tests to find highly mutated genes. However, this is challenging for noncoding regions because of mutation rate heterogeneity and potential correlations across regions, which give rise to huge overdispersion in the mutation count data. If not corrected, such overdispersions may suggest artefactual mutational hotspots. We address these issues with the development of a new computational framework called LARVA. 
LARVA intersects whole genome single nucleotide variant (SNV) calls with a comprehensive set of noncoding regulatory elements, and models these elements' mutation counts with a beta-binomial distribution to handle the overdispersion in a principled fashion. Furthermore, in estimating this distribution and determining the local mutation rate, LARVA incorporates regional genomic features like replication timing.</p><p> The LARVA framework can be extended in certain ways to facilitate the analysis of its results. By storing information on highly mutated annotations in a relational database, it is possible to quickly extract the most interesting results for further analysis. Furthermore, results from multiple LARVA runs can be combined for a meta-analysis that could involve, for example, finding highly mutated pathways in cancer and other types of genetic disease. Since LARVA's computation consists of many independent units of work, it can benefit from various forms of parallel computation. These forms of computation include distributed computing with a large number of commodity processors, as well as more esoteric types of parallelization, such as general purpose graphics processing unit (GPU) computation.</p><p> We make LARVA available as a free software tool at larva.gersteinlab.org. We demonstrate the effectiveness of LARVA by showing how it identifies well-known noncoding drivers, such as the TERT promoter, on 760 cancer whole genomes. Furthermore, we show it is able to highlight several novel noncoding regulators that could be potential new noncoding drivers. We also make all of the highly mutated annotations available online.</p><p> We also describe the Aggregation and Correlation Toolbox (ACT), a collection of software tools that facilitates the analysis of genomic signal tracks. The aggregation component takes a signal track and a series of genome regions, and creates an aggregate profile of the signal over the given regions. 
This enables the discovery of consistent signal patterns over related sets of annotations, implying potential connections between the signal and the regions. The correlation component of ACT takes two or more signal tracks and computes all pairwise track correlations. Correlation analyses are useful for finding similarities between various experiments, such as the binding sites of transcription factors as determined by ChIP-seq. The final component of ACT is a saturation tool designed to determine the number of experiments necessary to cover genomic features to saturation. This type of analysis can be illustrated with a ChIP-seq experiment where the inclusion of additional cell lines will reveal more binding sites for a transcription factor of interest: with each new cell line, a smaller fraction of the sites will be newly discovered, and a larger fraction will overlap discovered sites from previously used cell lines. The objective of ACT's saturation tool is to find the point of diminishing returns in the discovery of new sites, which may result in more efficiently planned experiments.</p>
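<p> The beta-binomial modelling of mutation counts described for LARVA above can be illustrated with a minimal sketch. This is not LARVA's code: the function names, the parameter values (alpha, beta), and the observed count are all hypothetical, and the sketch only shows how an upper-tail probability under a beta-binomial background might be computed for one annotation.</p>

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def betabinom_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(X = k) for X ~ BetaBinomial(n, a, b)."""
    return exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def upper_tail(k: int, n: int, a: float, b: float) -> float:
    """P(X >= k): chance of at least k mutated samples under the background model."""
    return min(1.0, sum(betabinom_pmf(i, n, a, b) for i in range(k, n + 1)))


# Hypothetical inputs: 760 samples, mean background rate
# alpha / (alpha + beta) = 0.01, with small alpha + beta giving overdispersion.
n_samples = 760
alpha, beta = 0.5, 49.5
p_value = upper_tail(30, n_samples, alpha, beta)  # 30 mutated samples observed
```

<p> Compared with a plain binomial at the same mean rate, the beta-binomial places more mass in the tails, so an annotation must be mutated in more samples before it looks unusual. That is the sense in which modelling overdispersion guards against artefactual hotspots.</p>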
|
288 |
The hetnet awakens: understanding complex diseases through data integration and open science
Himmelstein, Daniel S. 07 July 2016 (has links)
<p> Human disease is complex. However, the explosion of biomedical data is providing new opportunities to improve our understanding. My dissertation focused on how to harness the biodata revolution. Broadly, I addressed three questions: how to integrate data, how to extract insights from data, and how to make science more open. </p><p> To integrate data, we pioneered the hetnet—a network with multiple node and relationship types. After several preludes, we released Hetionet v1.0, which contains 2,250,197 relationships of 24 types. Hetionet encodes the collective knowledge produced by millions of studies over the last half century. </p><p> To extract insights from data, we developed a machine learning approach for hetnets. In order to predict the probability that an unknown relationship exists, our algorithm identifies influential network patterns. We used the approach to prioritize disease—gene associations and drug repurposing opportunities. By evaluating our predictions on withheld knowledge, we demonstrated the systematic success of our method. </p><p> After encountering friction that interfered with data integration and rapid communication, I began looking at how to make science more open. The quest led me to explore realtime open notebook science and expose publishing delays at journals as well as the problematic licensing of publicly-funded research data.</p>
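<p> As a rough illustration of what a network with multiple node and relationship types looks like in code, the sketch below stores typed edges and counts walks that follow a fixed sequence of relationship types. It is an assumption-laden toy, not Hetionet or the dissertation's algorithm: the class name, the relationship names ("binds", "associates"), and the example nodes are all hypothetical.</p>

```python
from collections import defaultdict


class Hetnet:
    """Toy directed network whose nodes and edges both carry a type."""

    def __init__(self):
        # (source node, relationship type) -> set of target nodes
        self.adjacency = defaultdict(set)

    def add_edge(self, source, relationship, target):
        """Nodes are (type, name) tuples, e.g. ("Gene", "g1")."""
        self.adjacency[(source, relationship)].add(target)

    def count_walks(self, start, relationships, end):
        """Count walks from start to end that follow the relationship types in order."""
        frontier = {start: 1}
        for relationship in relationships:
            next_frontier = defaultdict(int)
            for node, count in frontier.items():
                for neighbour in self.adjacency.get((node, relationship), ()):
                    next_frontier[neighbour] += count
            frontier = next_frontier
        return frontier.get(end, 0)


# Hypothetical example data.
net = Hetnet()
drug = ("Compound", "drugA")
net.add_edge(drug, "binds", ("Gene", "g1"))
net.add_edge(drug, "binds", ("Gene", "g2"))
net.add_edge(("Gene", "g1"), "associates", ("Disease", "diseaseX"))
net.add_edge(("Gene", "g2"), "associates", ("Disease", "diseaseX"))
net.add_edge(("Gene", "g2"), "associates", ("Disease", "diseaseY"))

walks_to_x = net.count_walks(drug, ["binds", "associates"], ("Disease", "diseaseX"))
walks_to_y = net.count_walks(drug, ["binds", "associates"], ("Disease", "diseaseY"))
```

<p> Counts like these, computed for many relationship-type sequences, are one plausible kind of feature a model could use to score whether an unobserved compound-disease relationship exists.</p>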
|
289 |
Structure comparison in bioinformatics
Peng, Zeshan., 彭澤山. January 2006 (has links)
published_or_final_version / abstract / Computer Science / Doctoral / Doctor of Philosophy
|
290 |
Efficient solutions for bioinformatics applications using GPUs
Liu, Chi-man, 廖志敏 January 2015 (has links)
Over the past few years, DNA sequencing technology has been advancing at such a fast pace that computer hardware and software can hardly meet the ever-increasing demand for sequence analysis. A natural approach to boost analysis efficiency is parallelization, which divides the problem into smaller ones that are to be solved simultaneously on multiple execution units. Common architectures such as multi-core CPUs and clusters can increase the throughput to some extent, but the hardware setup and maintenance costs are prohibitive. Fortunately, the newly emerged general-purpose GPU programming paradigm gives us a low-cost alternative for parallelization.
This thesis presents GPU-accelerated algorithms for several problems in bioinformatics, along with implementations to demonstrate their power in handling enormous [...] totally different limitations and optimization techniques than the CPU.
The first tool presented is SOAP3-dp, which is a DNA short-read aligner highly optimized for speed. Prior to SOAP3-dp, the fastest short-read aligner was its predecessor SOAP2, which was capable of aligning 1 million 100-bp reads in 5 minutes. SOAP3-dp beats this record by aligning the same volume in only 10 seconds. The key to unlocking this unprecedented speed is the revamped BWT engine underlying SOAP3-dp. All data structures and associated operations have been tailor-made for the GPU to achieve optimized performance. Experiments show that SOAP3-dp not only excels in speed, but also outperforms other aligners in both alignment sensitivity and accuracy.
The next tools are for constructing data structures, namely the Burrows-Wheeler transform (BWT) and de Bruijn graphs (DBGs), to facilitate genome assembly of short reads, especially large metagenomics data. The BWT index for a set of short reads has recently found its use in string-graph assemblers [44], as it provides a succinct way of representing huge string graphs which would otherwise exceed the main memory limit. Constructing the BWT index for a million reads is by itself not an easy task, let alone optimizing the construction for the GPU. Another class of assemblers, the DBG-based assemblers, also faces the same problem. This thesis presents construction algorithms for both the BWT and DBGs in a succinct form. In our experiments, we constructed the succinct DBG for a metagenomics data set with over 200 gigabases in 3 hours, and the resulting DBG only consumed 31.2 GB of memory. We also constructed the BWT index for 10 million 100-bp reads in 40 minutes using 4 quad-core machines.
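For readers unfamiliar with the Burrows-Wheeler transform mentioned above, the following is a minimal textbook sketch for a single string. It is not the thesis's construction algorithm: it sorts all rotations naively, runs on the CPU, and would not scale to read sets of the size discussed here. The function names and the "$" sentinel are illustrative choices.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via naive sorting of all rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
restored = inverse_bwt(transformed)
```

The transform tends to group identical characters into runs, which is what makes compressed, searchable indexes over reads possible; practical tools avoid materializing the rotations and use suffix-array-style construction instead.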
Lastly, we introduce a SNP detection tool, iSNPcall, which detects SNPs from a set of reads. Given a set of user-supplied annotated SNPs, iSNPcall focuses only on alignments covering these SNPs, which greatly accelerates the detection of SNPs at the prescribed loci. The annotated SNPs also help us distinguish sequencing errors from authentic SNP alleles easily. This is in contrast to the traditional de novo method, which aligns reads onto the reference genome and then filters inauthentic mismatches according to some probabilities. After comparison on several applications, iSNPcall was found to give a higher accuracy than the de novo method, especially for samples with low coverage. / published_or_final_version / Computer Science / Doctoral / Doctor of Philosophy
|