  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Knowledge discovery from cDNA microarrays and a priori knowledge

Midelfart, Herman January 2003
Microarray technology has recently attracted a lot of attention. This technology can measure the behavior (i.e., RNA abundance) of thousands of genes simultaneously, while previous methods have only allowed measurements of single genes. By enabling studies on a genome-wide scale, microarray technology is currently revolutionizing biological research and creating a wide range of research opportunities. However, the technology generates a vast amount of data that cannot be handled manually. Computational analysis is thus a prerequisite for the success of this technology, and research and development of computational tools for microarray analysis are of great importance.
This thesis develops supervised learning methods based on Rough Set Theory (RST) for analyzing microarray data together with prior knowledge. Two kinds of microarray studies are considered. The first is cancer studies, where supervised learning may be used for predicting tumor subtypes and clinical parameters. We introduce a general RST approach for classification of tumor samples analyzed by microarrays. This includes a feature selection method for selecting genes that discriminate significantly between a set of classes. RST classifiers are then learned from the selected genes. The approach is applied to a data set of gastric tumors. Classifiers for six clinical parameters are developed and demonstrate that these parameters can be predicted from the expression profile of gastric tumors. Moreover, the performance of the feature selection method and of several learning and discretization methods implemented in ROSETTA is examined and compared with that of linear and quadratic discriminant analysis. The classifiers are also biologically validated. One of the best classifiers is selected for each clinical parameter, and the connection between the genes used in these classifiers and the parameters is compared with the established knowledge in the biomedical literature. Many of these genes have no previously known connection to gastric cancer and provide interesting targets for further biological research.
The second kind of study is prediction of gene function from expression profiles measured with microarrays. A serious problem in this case is that functional classes, which are assigned to genes, are typically organized in an ontology where the classes may be related to each other. One example is the Gene Ontology, where the classes form a Directed Acyclic Graph (DAG). Standard learning methods such as RST assume, however, that the classes are unrelated, and cannot deal with this problem directly. This thesis gives a solution by introducing an extended RST framework and two novel algorithms for learning in a DAG. The DAG also constitutes a problem when a classifier is to be evaluated, since standard performance measures such as accuracy or AUC do not recognize the structure of the DAG. Therefore, several new performance measures are introduced. The algorithms are first tested on a data set that was created from human fibroblast cells by means of microarrays. They are then applied to artificial data in order to obtain a better understanding of their behavior, and their weaknesses and strengths are identified.
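To illustrate why flat measures such as accuracy do not recognize the structure of a DAG, the sketch below computes a hierarchical precision and recall by comparing the ancestor closures of the predicted and true classes. This is one common way to make a measure DAG-aware and is not claimed to be one of the measures introduced in the thesis. The toy DAG, its class names, and the function names are hypothetical.

```python
def ancestors(dag, node):
    """Return `node` together with all of its ancestors.

    `dag` maps each class to a list of its parents (assumed layout).
    """
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, ()))
    return seen


def closure(dag, classes):
    """Union of the ancestor sets of every class in `classes`."""
    result = set()
    for cls in classes:
        result |= ancestors(dag, cls)
    return result


def hierarchical_precision_recall(dag, predicted, true):
    """Precision and recall computed on ancestor closures instead of exact labels."""
    pred = closure(dag, predicted)
    gold = closure(dag, true)
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy DAG (child -> parents); "c" has two parents, so this is not a tree.
toy_dag = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a", "b"],
    "d": ["a"],
}

# Predicting "d" when the true class is "c" counts as a plain error under flat
# accuracy. Here the shared ancestors {"a", "root"} earn partial credit:
# precision is 2/3 and recall is 2/4.
precision, recall = hierarchical_precision_recall(toy_dag, {"d"}, {"c"})
print(precision, recall)
```

A prediction that lands near the true class in the DAG thus scores higher than one in an unrelated branch, which a flat measure cannot express.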
2

Searching and Classifying non-textual information

Arentz, Will Archer January 2004
This dissertation contains a set of contributions that deal with search or classification of non-textual information. Each contribution can be considered a solution to a specific problem, in an attempt to map out a common ground. The problems cover a wide range of research fields, including search in music, classification of digitally sampled music, visualization and navigation in search results, and classification of images and Internet sites.
On classification of digitally sampled music, a method for extracting the rhythmic tempo was presented. The method proved to work on a large variety of music types with a constant audible rhythm. Furthermore, these rhythmic properties proved useful in classifying songs into music groups or genres.
On search in music, a technique is presented that is based on rhythm and pitch correlation between the notes in a query theme and the notes in a set of songs. The scheme is based on a dynamic programming algorithm which attempts to minimize the error between a query theme and a song. This operation includes finding the best alignment, taking into account skipped notes and additional notes, use of different keys, tempo variations, and variances in pitch and time information.
On image classification, a system for classifying whole Internet sites based on their image content was proposed. The system was composed of two parts: an image classifier and a site classifier. The image classifier was based on skin detection, object segmentation, and shape, texture and color feature extraction, with a training scheme that used genetic algorithms. The image classification method was able to classify images with an accuracy of 90%. When classifying multi-image Internet sites, this accuracy increased drastically under the assumption that a site contains only one type of image. This assumption can be defended in most cases.
On search result visualization and navigation, a system was developed that combines a state-of-the-art search engine with a graphical front end to improve the user experience associated with search in unstructured data. Both structured and unstructured data can be indexed in a modern search engine with the help of entity extraction. Combining this with a multidimensional, heatmap-based visualization with navigation capabilities was shown to improve the data value and search experience of current search systems.
Paper III reprinted with kind permission of Elsevier Publishing, sciencedirect.com.
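The dynamic programming idea behind the music search can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of the dissertation: melodies are assumed to be lists of MIDI pitch numbers, key differences are handled by comparing pitch intervals rather than absolute pitches, skipped and additional notes are approximated by a fixed gap cost, and tempo and timing information are left out entirely. The function names and the `gap_cost` parameter are hypothetical.

```python
def intervals(pitches):
    """Pitch differences between consecutive notes; unchanged by transposition."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def best_alignment_error(query, song, gap_cost=1.0):
    """Minimum alignment error of `query` against any segment of `song`.

    Both arguments are lists of MIDI pitch numbers (assumed representation).
    """
    q = intervals(query)
    s = intervals(song)
    # prev[j]: best error for the query intervals consumed so far, with the
    # alignment ending at song interval j. The all-zero first row lets the
    # query start anywhere in the song at no cost.
    prev = [0.0] * (len(s) + 1)
    for i in range(1, len(q) + 1):
        curr = [prev[0] + gap_cost]
        for j in range(1, len(s) + 1):
            match = prev[j - 1] + abs(q[i - 1] - s[j - 1])  # align two intervals
            extra_in_query = prev[j] + gap_cost             # query interval unmatched
            extra_in_song = curr[j - 1] + gap_cost          # song interval unmatched
            curr.append(min(match, extra_in_query, extra_in_song))
        prev = curr
    # The query may also end anywhere in the song.
    return min(prev)


# Hypothetical example: the query is the opening of the song, transposed up
# by seven semitones, so the interval sequences match exactly.
song = [60, 62, 64, 65, 67, 65, 64, 62]
query = [67, 69, 71, 72]
print(best_alignment_error(query, song))  # 0.0
```

Ranking a collection then amounts to computing this error for every song and sorting in ascending order. Because a skipped note merges two intervals into one, the fixed gap cost is only a rough stand-in for the skipped-note and additional-note handling described in the abstract.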
