<p>A tremendous amount of genomic sequence data of relatively high quality has become publicly available due to the human genome sequencing projects that were completed a few years ago. Despite considerable efforts, we do not yet know everything that is to know about the various parts of the genome, what all the regions code for, and how their gene products contribute in the myriad of biological processes that are performed within the cells. New high-performance methods are needed to extract knowledge from this vast amount of information.</p><p>Furthermore, the traditional view that DNA codes for RNA that codes for protein, which is known as the central dogma of molecular biology, seems to be only part of the story. The discovery of many non-proteincoding gene families with housekeeping and regulatory functions brings an entirely new perspective to molecular biology. Also, sequence analysis of the new gene families require new methods, as there are significant differences between protein-coding and non-protein-coding genes.</p><p>This work describes a new search processor that can search for complex patterns in sequence data for which no efficient lookup-index is known. When several chips are mounted on search cards that are fitted into PCs in a small cluster configuration, the system’s performance is orders of magnitude higher than that of comparable solutions for selected applications. The applications treated in this work fall into two main categories, namely pattern screening and data mining, and both take advantage of the search capacity of the cluster to achieve adequate performance. Specifically, the thesis describes an interactive system for exploration of all types of genomic sequence data. Moreover, a genetic programming-based data mining system finds classifiers that consist of potentially complex patterns that are characteristic for groups of sequences. The screening and mining capacity has been used to develop an algorithm for identification of new non-protein-coding genes in bacteria; a system for rational design of effective and specific short interfering RNA for sequence-specific silencing of protein-coding genes; and an improved algorithmic step for identification of new regulatory targets for the microRNA family of non-protein-coding genes.</p> / Paper V, VI, and VII are reprinted with kind permision of Elsevier, sciencedirect.com
Identifer | oai:union.ndltd.org:UPSALLA/oai:DiVA.org:ntnu-713 |
Date | January 2005 |
Creators | Snøve Jr., Ola |
Publisher | Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Fakultet for informasjonsteknologi, matematikk og elektroteknikk |
Source Sets | DiVA Archive at Upsalla University |
Language | English |
Detected Language | English |
Type | Doctoral thesis, comprehensive summary, text |
Relation | Doktoravhandlinger ved NTNU, 1503-8181 ; 2005:196 |
Page generated in 0.0023 seconds