MicroRNAs (miRNAs) are post-transcriptional regulators of gene expression and play an essential role in phenotype development. Understanding the mechanisms that regulate miRNAs gives insight into gene expression and gene regulation. The transcription start site (TSS) is key to studying gene expression. However, the TSS of a miRNA can lie thousands of nucleotides away from the precursor miRNA, which makes it hard to detect with conventional RNA-Seq experiments. Previous methods tried to take advantage of sequencing data using sequence features or integrated epigenetic markers, but their predictions were either not condition-specific or low-resolution. Furthermore, the availability of large amounts of single-cell RNA-Seq (scRNA-Seq) data provides remarkable opportunities for studying gene regulatory mechanisms at single-cell resolution. Incorporating gene regulatory mechanisms can assist with cell type identification and state discovery from scRNA-Seq data. In this dissertation, we studied computational modeling of gene transcription initiation and expression, including two novel approaches to identify TSSs under various types of conditions and one case study at the single-cell level. Firstly, we studied how TSSs can be identified from Cap Analysis Gene Expression (CAGE) experimental data using deep neural networks. Using a control model to study the DeepBind binding score features, we found that the protein binding motif model can improve overall prediction performance. Furthermore, on data from unseen cell lines, our model performed better than existing tools. Secondly, to better predict the TSSs of miRNAs in a condition-specific manner, we built D-miRT, a two-stream convolutional neural network based on integrated low-resolution epigenetic features and high-resolution sequence features. D-miRT outperformed all baseline models and demonstrated high accuracy for miRNA TSS prediction tasks.
D-miRT also showed superior performance compared with the most recent approaches to cell-specific miRNA TSS identification on cell lines that were unseen during model training. Thirdly, to study gene transcription initiation and regulation from a single-cell perspective, we developed INSISTC, an unsupervised machine learning-based approach that incorporates network structure information for single-cell type classification. We showed that, in contrast to other clustering algorithms, INSISTC with the SC3 algorithm provides cluster number estimation. Future studies on gene expression and regulation will benefit from INSISTC's adaptability with regard to the kinds of biological networks that can be used.
Identifier | oai:union.ndltd.org:ucf.edu/oai:stars.library.ucf.edu:etd2020-2455 |
Date | 01 January 2022 |
Creators | Zheng, Hansi |
Publisher | STARS |
Source Sets | University of Central Florida |
Language | English |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | Electronic Theses and Dissertations, 2020- |