• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 1
  • Tagged with
  • 3
  • 3
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Deep learning for promoter recognition: a robust testing methodology

Perez Martell, Raul Ivan 29 April 2020 (has links)
Understanding DNA sequences has been an ongoing endeavour within bioinfor- matics research. Recognizing the functionality of DNA sequences is a non-trivial and complex task that can bring insights into understanding DNA. In this thesis, we study deep learning models for recognizing gene regulating regions of DNA, more specifi- cally promoters. We first consider DNA modelling as a language by training natural language processing models to recognize promoters. Afterwards, we delve into current models from the literature to learn how they achieve their results. Previous works have focused on limited curated datasets to both train and evaluate their models using cross-validation, obtaining high-performing results across a variety of metrics. We implement and compare three models from the literature against each other, us- ing their datasets interchangeably throughout the comparison tests. This highlights shortcomings within the training and testing datasets for these models, prompting us to create a robust promoter recognition testing dataset and developing a testing methodology, that creates a wide variety of testing datasets for promoter recognition. We then, test the models from the literature with the newly created datasets and highlight considerations to take in choosing a training dataset. To help others avoid such issues in the future, we open-source our findings and testing methodology. / Graduate
2

Human Promoter Recognition Based on Principal Component Analysis

Li, Xiaomeng January 2008 (has links)
Master of Engineering / This thesis presents an innovative human promoter recognition model HPR-PCA. Principal component analysis (PCA) is applied on context feature selection DNA sequences and the prediction network is built with the artificial neural network (ANN). A thorough literature review of all the relevant topics in the promoter prediction field is also provided. As the main technique of HPR-PCA, the application of PCA on feature selection is firstly developed. In order to find informative and discriminative features for effective classification, PCA is applied on the different n-mer promoter and exon combined frequency matrices, and principal components (PCs) of each matrix are generated to construct the new feature space. ANN built classifiers are used to test the discriminability of each feature space. Finally, the 3 and 5-mer feature matrix is selected as the context feature in this model. Two proposed schemes of HPR-PCA model are discussed and the implementations of sub-modules in each scheme are introduced. The context features selected by PCA are III used to build three promoter and non-promoter classifiers. CpG-island modules are embedded into models in different ways. In the comparison, Scheme I obtains better prediction results on two test sets so it is adopted as the model for HPR-PCA for further evaluation. Three existing promoter prediction systems are used to compare to HPR-PCA on three test sets including the chromosome 22 sequence. The performance of HPR-PCA is outstanding compared to the other four systems.
3

Human Promoter Recognition Based on Principal Component Analysis

Li, Xiaomeng January 2008 (has links)
Master of Engineering / This thesis presents an innovative human promoter recognition model HPR-PCA. Principal component analysis (PCA) is applied on context feature selection DNA sequences and the prediction network is built with the artificial neural network (ANN). A thorough literature review of all the relevant topics in the promoter prediction field is also provided. As the main technique of HPR-PCA, the application of PCA on feature selection is firstly developed. In order to find informative and discriminative features for effective classification, PCA is applied on the different n-mer promoter and exon combined frequency matrices, and principal components (PCs) of each matrix are generated to construct the new feature space. ANN built classifiers are used to test the discriminability of each feature space. Finally, the 3 and 5-mer feature matrix is selected as the context feature in this model. Two proposed schemes of HPR-PCA model are discussed and the implementations of sub-modules in each scheme are introduced. The context features selected by PCA are III used to build three promoter and non-promoter classifiers. CpG-island modules are embedded into models in different ways. In the comparison, Scheme I obtains better prediction results on two test sets so it is adopted as the model for HPR-PCA for further evaluation. Three existing promoter prediction systems are used to compare to HPR-PCA on three test sets including the chromosome 22 sequence. The performance of HPR-PCA is outstanding compared to the other four systems.

Page generated in 0.1035 seconds