Spelling suggestions: "subject:"genetik"" "subject:"genetika""
1 |
Knowledge discovery from cDNA microarrays and a priori knowledgeMidelfart, Herman January 2003 (has links)
Microarray technology has recently attracted a lot of attention. This technology can measure the behavior (i.e., RNA abundance) of thousands of genes simultaneously, while previous methods have only allowed measurements of single genes. By enabling studies on a genome-wide scale, microarray technology is currently revolutionizing biological research and creating a wide range of research opportunities. However, the technology generates a vast amount of data that cannot be handled manually. Computational analysis is thus a prerequisite for the success of this technology, and research and development of computational tools for microarray analysis are of great importance. This thesis develops supervised learning methods based on Rough Set Theory (RST) for analyzing microarray data together with prior knowledge. Two kinds of microarray studies are considered. The first is cancer studies where supervised learning may be used for predicting tumor subtypes and clinical parameters. We introduce a general RST approach for classification of tumor samples analyzed by microarrays. This includes a feature selection method for selecting genes that discriminate significantly between a set of classes. RST classifiers are then learned from the selected genes. The approach is applied to a data set of gastric tumors. Classifiers for six clinical parameters are developed and demonstrate that these parameters can be predicted from the expression profile of gastric tumors. Moreover, the performance of the feature selection method as well as several learning and discretization methods implemented in ROSETTA are examined and compared to the performance of linear and quadratic discrimination analysis. The classifiers are also biologically validated. One of the best classifiers is selected for each clinical parameter, and the connection between the genes used in these classifiers and the parameters are compared to the established knowledge in the biomedical literature. Many of these genes have no previously known connection to gastric cancer and provide interesting targets for further biological research. The second kind of study is prediction of gene function from expression profiles measured with microarrays. A serious problem in this case is that functional classes, which are assigned to genes, are typically organized in an ontology where the classes may be related to each other. One example is the Gene Ontology where the classes form a Directed Acyclic Graph (DAG). Standard learning methods such as RST assume, however, that the classes are unrelated, and cannot deal with this problem directly. This thesis gives a solution by introducing an extended RST framework and two novel algorithms for learning in a DAG. The DAG also constitutes a problem when a classifier is to be evaluated since standard performance measures such as accuracy or AUC do not recognize the structure of the DAG. Therefore, several new performance measures are introduced. The algorithms are first tested on a data set that was created from human fibroblast cells by the means of microarrays. They are then applied on artificial data in order to obtain a better understanding of their behavior, and their weaknesses and strengths are identified.
|
2 |
Hardware-accelerated analysis of non-protein-coding RNAsSnøve Jr., Ola January 2005 (has links)
<p>A tremendous amount of genomic sequence data of relatively high quality has become publicly available due to the human genome sequencing projects that were completed a few years ago. Despite considerable efforts, we do not yet know everything that is to know about the various parts of the genome, what all the regions code for, and how their gene products contribute in the myriad of biological processes that are performed within the cells. New high-performance methods are needed to extract knowledge from this vast amount of information.</p><p>Furthermore, the traditional view that DNA codes for RNA that codes for protein, which is known as the central dogma of molecular biology, seems to be only part of the story. The discovery of many non-proteincoding gene families with housekeeping and regulatory functions brings an entirely new perspective to molecular biology. Also, sequence analysis of the new gene families require new methods, as there are significant differences between protein-coding and non-protein-coding genes.</p><p>This work describes a new search processor that can search for complex patterns in sequence data for which no efficient lookup-index is known. When several chips are mounted on search cards that are fitted into PCs in a small cluster configuration, the system’s performance is orders of magnitude higher than that of comparable solutions for selected applications. The applications treated in this work fall into two main categories, namely pattern screening and data mining, and both take advantage of the search capacity of the cluster to achieve adequate performance. Specifically, the thesis describes an interactive system for exploration of all types of genomic sequence data. Moreover, a genetic programming-based data mining system finds classifiers that consist of potentially complex patterns that are characteristic for groups of sequences. The screening and mining capacity has been used to develop an algorithm for identification of new non-protein-coding genes in bacteria; a system for rational design of effective and specific short interfering RNA for sequence-specific silencing of protein-coding genes; and an improved algorithmic step for identification of new regulatory targets for the microRNA family of non-protein-coding genes.</p> / Paper V, VI, and VII are reprinted with kind permision of Elsevier, sciencedirect.com
|
3 |
Hardware-accelerated analysis of non-protein-coding RNAsSnøve Jr., Ola January 2005 (has links)
A tremendous amount of genomic sequence data of relatively high quality has become publicly available due to the human genome sequencing projects that were completed a few years ago. Despite considerable efforts, we do not yet know everything that is to know about the various parts of the genome, what all the regions code for, and how their gene products contribute in the myriad of biological processes that are performed within the cells. New high-performance methods are needed to extract knowledge from this vast amount of information. Furthermore, the traditional view that DNA codes for RNA that codes for protein, which is known as the central dogma of molecular biology, seems to be only part of the story. The discovery of many non-proteincoding gene families with housekeeping and regulatory functions brings an entirely new perspective to molecular biology. Also, sequence analysis of the new gene families require new methods, as there are significant differences between protein-coding and non-protein-coding genes. This work describes a new search processor that can search for complex patterns in sequence data for which no efficient lookup-index is known. When several chips are mounted on search cards that are fitted into PCs in a small cluster configuration, the system’s performance is orders of magnitude higher than that of comparable solutions for selected applications. The applications treated in this work fall into two main categories, namely pattern screening and data mining, and both take advantage of the search capacity of the cluster to achieve adequate performance. Specifically, the thesis describes an interactive system for exploration of all types of genomic sequence data. Moreover, a genetic programming-based data mining system finds classifiers that consist of potentially complex patterns that are characteristic for groups of sequences. The screening and mining capacity has been used to develop an algorithm for identification of new non-protein-coding genes in bacteria; a system for rational design of effective and specific short interfering RNA for sequence-specific silencing of protein-coding genes; and an improved algorithmic step for identification of new regulatory targets for the microRNA family of non-protein-coding genes. / Paper V, VI, and VII are reprinted with kind permision of Elsevier, sciencedirect.com
|
Page generated in 0.0583 seconds