In this thesis, we show how to construct an approximate Markov model for DNA. The model is built by encoding the DNA nucleotides into finite-length symbolic sequences, referred to as words, and creating a 2D symbolic space for the DNA in which each plotted point represents a word. From this construction, we are able to specify words for the DNA, their lengths, and how they can be organised into groups according to symbolic similarity. The model also allows the construction of a network of the DNA, in which each node represents a group of words and each edge connecting two nodes carries a measure of the likelihood that words in one group are mapped to another, strongly correlated group of words after a shift of one position in the nucleotide sequence. The model is then applied to reduce the complexity of the DNA by considering only the most relevant groups of words, which carry most of the information of the DNA. We show that in E. coli, 2/5 of the information is lost by neglecting only 3 groups of words. The model is also applied to construct measures of similarity between genes and of the predictability of genes in different organisms. We then study the long-term behaviour of groups of words in our Markov model by analysing their recurrence properties. For some groups of words, the statistics of returns were estimated theoretically from statistical properties of our model. The groups of words that contribute more to the random nature of the DNA provide a simple way to estimate analytically the statistics of returns of words belonging to these groups. As an application of the recurrence analysis, we show that the coding regions of the DNA contribute more to its random character.
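To make the idea of word transitions after one shift concrete, the following is a minimal Python sketch and not the method of the thesis. It assumes overlapping words of a fixed, hypothetical length `word_length`, estimates transition probabilities between individual words rather than between the groups of words defined in the 2D symbolic space, and measures return times as the distances between successive occurrences of a word. The function names `word_transition_probabilities` and `return_times` are illustrative only.

```python
from collections import Counter, defaultdict


def word_transition_probabilities(sequence, word_length=3):
    """Estimate P(next word | current word) for overlapping words
    obtained by shifting one nucleotide along the sequence.

    Assumption: fixed word length and individual words as states;
    the thesis works with groups of words instead.
    """
    sequence = sequence.upper()
    # Overlapping words: the word starting at i+1 follows the word at i.
    words = [
        sequence[i:i + word_length]
        for i in range(len(sequence) - word_length + 1)
    ]
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    # Normalise each row of counts into conditional probabilities.
    return {
        current: {
            following: n / sum(followers.values())
            for following, n in followers.items()
        }
        for current, followers in counts.items()
    }


def return_times(sequence, word):
    """Distances (in nucleotides) between successive occurrences of a word."""
    sequence = sequence.upper()
    word = word.upper()
    positions = [
        i
        for i in range(len(sequence) - len(word) + 1)
        if sequence[i:i + len(word)] == word
    ]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


if __name__ == "__main__":
    toy_sequence = "ACGACGACG"  # toy input, not real genomic data
    print(word_transition_probabilities(toy_sequence, word_length=2))
    print(return_times(toy_sequence, "AC"))
```

In a sketch like this, a word whose return times are close to geometrically distributed behaves more like a random process, which is the kind of property the recurrence analysis in the abstract refers to.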
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:668978 |
Date | January 2015 |
Creators | Srivastava, Shambhavi |
Publisher | University of Aberdeen |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=227605 |