I present a method that groups cis-regulatory modules by shared sequence motifs. The goal of this approach is to find clusters of modules that may share a function, using only sequence similarity. The proposed similarity measure is based on scoring sequences by their likelihood under a variable-order Markov model. I also introduce an extension of the variable-order Markov model that could perform this task better. Results. I show that my method can recover subsets of sequences sharing a pattern in a set of generated sequences. I also found that the proposed approach successfully finds groups of modules that share a type of transcription factor binding site.
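To make the idea of likelihood-based similarity concrete, here is a minimal Python sketch. It is not the thesis's implementation: it approximates a variable-order Markov model with a simple back-off scheme (use the longest context seen at least `min_count` times), and every name and parameter in it (`train_counts`, `log_likelihood`, `similarity`, `max_order`, `min_count`, `pseudo`) is a hypothetical choice made for illustration.

```python
import math

ALPHABET = "ACGT"  # assumption: plain DNA sequences, no ambiguity codes


def train_counts(seq, max_order=3):
    """Count how often each base follows every context of length 0..max_order."""
    counts = {}
    for i, base in enumerate(seq):
        for k in range(min(i, max_order) + 1):
            ctx = seq[i - k:i]  # the k bases preceding position i
            by_base = counts.setdefault(ctx, {})
            by_base[base] = by_base.get(base, 0) + 1
    return counts


def context_total(counts, ctx):
    """Number of times a context was observed during training."""
    return sum(counts.get(ctx, {}).values())


def log_likelihood(seq, counts, max_order=3, min_count=2, pseudo=1.0):
    """Average per-base log-likelihood of seq under the trained counts."""
    total = 0.0
    for i, base in enumerate(seq):
        ctx = seq[max(0, i - max_order):i]
        # Back off to shorter contexts until one has enough support.
        while ctx and context_total(counts, ctx) < min_count:
            ctx = ctx[1:]
        seen = counts.get(ctx, {})
        n = sum(seen.values())
        # Additive smoothing keeps unseen bases from getting zero probability.
        prob = (seen.get(base, 0) + pseudo) / (n + pseudo * len(ALPHABET))
        total += math.log(prob)
    return total / len(seq)


def similarity(a, b, max_order=3):
    """Symmetric score: each sequence is scored under a model of the other."""
    return 0.5 * (
        log_likelihood(b, train_counts(a, max_order), max_order)
        + log_likelihood(a, train_counts(b, max_order), max_order)
    )
```

Scores are average log-likelihoods, so they are always negative, and values closer to zero indicate more similar sequences. A pairwise matrix of such scores could then be passed to any standard clustering procedure. Averaging the two directions is only one way to make the score symmetric.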
Identifier | oai:union.ndltd.org:LACETR/oai:collectionscanada.gc.ca:QMM.112632 |
Date | January 2007 |
Creators | Handfield, Louis-François. |
Publisher | McGill University |
Source Sets | Library and Archives Canada ETDs Repository / Centre d'archives des thèses électroniques de Bibliothèque et Archives Canada |
Language | English |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Format | application/pdf |
Coverage | Master of Science (School of Computer Science.) |
Rights | All items in eScholarship@McGill are protected by copyright with all rights reserved unless otherwise indicated. |
Relation | alephsysno: 002712105, proquestno: AAIMR51278, Theses scanned by UMI/ProQuest. |