In the past few years we have witnessed an explosion in the amount of viral genomic data available. GenBank alone holds over 80,000 complete or near-complete viral genomes, and the numbers are rising fast. For example, since the submission of the first SARS genome in May 2003, over 140 more have been published. With this genomic data at hand we hope finally to improve our understanding of viruses. Several papers have been dedicated to genome annotation and selection on viral genomes, focusing in particular on the evolutionary behaviour of overlapping reading frames. Such overlaps are common in viruses: because of the three-periodicity of the genetic code, up to three genes may be encoded simultaneously in one direction. The constraints placed on a nucleotide in such a multiple coding region naturally affect its mutational behaviour, and as a result the pattern of evolution is more complex. Additionally, because viruses evolve rapidly, we observe changes in gene structure between viruses of the same family. Finally, as a result of this high divergence, alignments between two genomes tend to be unreliable, which further complicates comparative analysis.

Our goal is to present methods that deal with these complications. We first introduce an ab initio pairwise comparative annotation method, which accounts not only for the presence of overlapping reading frames in genomes, but also for differences in gene structure between the two compared sequences. Secondly, we develop a hidden Markov model for the annotation of selection strengths across a viral genome, accommodating inter- as well as intragenic differences in selection. Thirdly, we investigate the effect of using a fixed alignment on the inference of selection by incorporating statistical alignment into our selection analysis. All three methods presented here improve on their respective equivalents in the field.
We investigate the nature of selection in overlapping regions in several studies, in particular on the genomes of Hepatitis B and HIV2. We provide a full annotation of selection strengths at the nucleotide level for both viral sequences, highlighting fast evolving regions such as the gp120 protein. We also analyse the mutational behaviour of overlapping regions in both genomes and find that in Hepatitis B selection seems to be of equal strength for single and double coding regions. In HIV2, however, single coding regions appear to be under twice as stringent selection as double coding regions, with a tendency for a fast evolving region to overlap a slow evolving one. Each chapter of our work relates to one of our publications. We introduce each method in turn, together with its academic context and its results. In chapter 5 we then discuss, for each method, its achievements, its shortcomings, and possible future extensions and improvements.
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:491424 |
Date | January 2008 |
Creators | de Groot, Saskia Elizabeth |
Contributors | Hein, Jotun |
Publisher | University of Oxford |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://ora.ox.ac.uk/objects/uuid:9bc1f480-5556-4f44-8700-8c230a5dbda9 |