The scope and application of high-throughput techniques have expanded from studying a single genome, transcriptome, or proteome to understanding complex environments at greater resolution, aided by novel computational frameworks. Comprehensive structural annotation, i.e., a description of all functional elements in the genome, is required to measure genome responses accurately with high-throughput methods. Annotating genome sequences with high-throughput data from RNA-seq and proteomics experiments complements computational methods for identifying functional elements: it can help validate existing in silico annotation, correct annotation errors, and potentially identify novel functional elements. Recent re-annotation studies have revealed shortcomings of automated methods and the need to validate existing annotations with experimental data. This dissertation describes the re-annotation of Mannheimia haemolytica, Pasteurella multocida, and Histophilus somni, bacterial pathogens associated with bovine respiratory disease. Experimental re-annotation of these bacterial genomes using RNA-seq and proteomics enabled validation of the existing annotation and the discovery of novel functional elements that can be used in future functional genomics studies. We also addressed the need for an automated, broadly applicable bioinformatics workflow for bacterial genome re-annotation by developing an open-source Perl pipeline that accepts RNA-seq and proteomics data as input.

Simultaneous profiling of host and pathogen gene expression using metatranscriptomic approaches is necessary to improve our understanding of infectious diseases. Traditional methods for analyzing RNA-seq data do not account for cross-mapping, in which reads from a metatranscriptomic sample map to more than one of the genomes present.
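The cross-mapping problem can be made concrete with a toy sketch. It assumes exact substring matching on both strands in place of a real read aligner, and the `host`/`pathogen` sequences and reads are invented for illustration. The helpers `revcomp`, `hits`, and `classify_reads` are hypothetical and are not part of the dissertation's Perl pipeline. A read counts as cross-mapped when it occurs in more than one reference.

```python
# Toy illustration of cross-mapping in a mixed (host + pathogen) sample.
# Assumption: exact substring matching on both strands stands in for a real
# read aligner; all sequences below are invented for illustration only.

def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def hits(read: str, genomes: dict[str, str]) -> set[str]:
    """Names of all genomes in which the read or its reverse complement occurs."""
    rc = revcomp(read)
    return {name for name, seq in genomes.items() if read in seq or rc in seq}


def classify_reads(reads: list[str], genomes: dict[str, str]) -> dict[str, int]:
    """Count reads that map uniquely, cross-map to >1 genome, or do not map."""
    counts = {"unique": 0, "cross_mapped": 0, "unmapped": 0}
    for read in reads:
        n = len(hits(read, genomes))
        if n == 0:
            counts["unmapped"] += 1
        elif n == 1:
            counts["unique"] += 1
        else:
            counts["cross_mapped"] += 1
    return counts


if __name__ == "__main__":
    genomes = {"host": "ACGTACGTTTGACCA", "pathogen": "GGACGTACGAAACCC"}
    reads = ["ACGTACG", "TTTGACC", "GAAACCC", "CCCCCCC"]
    print(classify_reads(reads, genomes))
    # {'unique': 2, 'cross_mapped': 1, 'unmapped': 1}
```

Here `ACGTACG` occurs in both references, so its expression signal cannot be attributed to either organism without further evidence.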
Analysis of sequence conservation between species can help define a cross-mapping metric that separates signal from noise. We generated artificial RNA-seq data and evaluated the impact of read length and sequence conservation on cross-mapping. Comparative genomics was used to identify a core genome and a pan-genome for quantifying gene expression. Our results show that cross-mapping between genomes is directly related to the evolutionary distance between them and that longer RNA-seq reads tend to reduce cross-mapping.
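The simulation described above can be approximated by a minimal sketch under strong simplifying assumptions: a random "genome", a second genome derived from it by independent base substitutions at rate `divergence` (a crude stand-in for evolutionary distance), error-free forward-strand reads, and exact matching instead of alignment. Under this model a read of length L from the first genome matches the second exactly only if none of its L bases was substituted, which happens with probability of roughly (1 - d)^L, so the cross-mapping rate falls as either read length or divergence grows. The `core_and_pan` helper shows the set logic behind a core genome (gene families shared by all genomes) and a pan-genome (families present in any genome), assuming gene families have already been clustered. All names here are hypothetical; this is not the dissertation's method.

```python
import random

BASES = "ACGT"


def random_genome(length: int, rng: random.Random) -> str:
    """Uniform random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))


def mutate(genome: str, divergence: float, rng: random.Random) -> str:
    """Copy `genome`, replacing each base by a different base with probability `divergence`."""
    out = []
    for base in genome:
        if rng.random() < divergence:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def sample_reads(genome: str, read_len: int, n_reads: int, rng: random.Random) -> list[str]:
    """Error-free reads drawn uniformly from the forward strand of `genome`."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def cross_mapping_rate(source: str, other: str, read_len: int, n_reads: int,
                       rng: random.Random) -> float:
    """Fraction of reads simulated from `source` that occur exactly in `other`."""
    reads = sample_reads(source, read_len, n_reads, rng)
    return sum(read in other for read in reads) / n_reads


def core_and_pan(gene_families: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Core = families in every genome; pan = families in any genome (input must be non-empty)."""
    sets = list(gene_families.values())
    return set.intersection(*sets), set.union(*sets)


if __name__ == "__main__":
    rng = random.Random(42)
    genome_a = random_genome(20_000, rng)
    for divergence in (0.01, 0.05):
        genome_b = mutate(genome_a, divergence, rng)
        for read_len in (25, 50, 100):
            rate = cross_mapping_rate(genome_a, genome_b, read_len, 500, rng)
            print(f"divergence={divergence:.2f} read_len={read_len:3d} cross-mapped={rate:.3f}")
```

Real aligners tolerate mismatches, so they would report more cross-mapping than this exact-match model; the sketch only illustrates the direction of the trends with read length and divergence.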
Identifier | oai:union.ndltd.org:MSSTATE/oai:scholarsjunction.msstate.edu:td-2137 |
Date | 07 May 2016 |
Creators | Reddy, Joseph S |
Publisher | Scholars Junction |
Source Sets | Mississippi State University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Theses and Dissertations |