
A Comprehensive Survey and Deep Learning-Based Prediction on G-quadruplex Formation and Biological Functions

Indiana University-Purdue University Indianapolis (IUPUI)

G-quadruplexes (G4s) are guanine-rich, four-stranded DNA/RNA structures that have been
found throughout the human genome. G4s have been reported to affect chromatin structure
and to participate in important biological processes at the transcriptional and
epigenetic levels. However, the underlying molecular mechanisms and the genomic
locations of G4s remain elusive because of their structural complexity.
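Because G4s are guanine-rich, candidate G4 sites are commonly located by scanning DNA for a sequence pattern. The Python sketch below assumes the widely used pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides; whether this matches the exact motif definition used in this dissertation is an assumption, and `find_g4_motifs` is a hypothetical helper, not part of the author's tool. C-rich matches are reported on the "-" strand because they indicate a G-rich motif on the complementary strand.

```python
import re

# Assumed putative-G4 pattern: four runs of >=3 G separated by 1-7 nt loops.
# Exactly four runs are required here for simplicity.
G4_PLUS = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
# A C-rich match on the input strand means a G-rich motif on the opposite strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGT]{1,7}){3}C{3,}")


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples for putative G4 motifs.

    Coordinates are 0-based and end-exclusive on the input strand.
    "+" means the G-rich motif is on the input strand, "-" on its complement.
    Matches are non-overlapping within each strand (re.finditer semantics).
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


print(find_g4_motifs("ttGGGAGGGAGGGAGGGtt"))  # → [(2, 17, '+')]
```

Real genome-wide scans usually allow more than four G-runs and handle overlapping candidates; the fixed four-run pattern keeps the example easy to follow.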
Taking advantage of advances in high-throughput sequencing technologies and machine
learning approaches, we conducted this comprehensive investigation of G4 structures. It
comprises three parts: the discovery of a novel marker for functional human hematopoietic
stem cells, which first drew our interest to G4 structures; an exploration of the
associations between G4s and genomic factors through the integration of multi-omics data;
and the development of a deep-learning-based G4 prediction tool that uses the G4 motif.
First, we identified ADGRG1 as a novel marker for functional human hematopoietic stem
cells and characterized its regulation through transcriptional activity. Our interest in
G4s arose during these transcription-related investigations.
Next, we analyzed the genome-wide distribution of G4s and uncovered associations between
G4s and other epigenetic and transcriptional mechanisms that coordinate gene transcription.
We found that G4 groups of different confidence levels correlated differently with
epigenetic regulatory elements, and that G4 structures could correlate with gene
expression in two opposite ways depending on their genomic locations and on the strand on
which they form. Several transcription factors were found to be over-represented in
association with G4 occurrence. We also found distinct consensus sequences enriched in the
G4 feet, with a high GC content in the feet of high-confidence G4s and a high AT content in
the feet of solely predicted G4s.
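The GC/AT comparison of G4 feet can be illustrated with a short sketch. Here the "feet" are assumed to be the `flank` nucleotides immediately upstream and downstream of each motif match (the default of 10 nt is a hypothetical value chosen for illustration), and the same assumed G3+N1-7 pattern is used to locate plus-strand motifs. The function names are hypothetical and do not describe the dissertation's actual pipeline.

```python
import re

# Assumed putative-G4 pattern: four runs of >=3 G separated by 1-7 nt loops.
G4_PLUS = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def base_fraction(seq, bases):
    """Fraction of positions in seq whose base is in `bases` (0.0 if empty)."""
    if not seq:
        return 0.0
    return sum(1 for b in seq if b in bases) / len(seq)


def g4_feet_composition(seq, flank=10):
    """For each plus-strand G4 motif, report GC and AT fractions of its feet.

    Feet are assumed (hypothetically) to be the `flank` nt on each side of the
    motif, truncated at the sequence ends; both feet are pooled together.
    """
    seq = seq.upper()
    results = []
    for m in G4_PLUS.finditer(seq):
        left = seq[max(0, m.start() - flank):m.start()]
        right = seq[m.end():m.end() + flank]
        feet = left + right
        results.append({
            "start": m.start(),
            "end": m.end(),
            "gc": base_fraction(feet, "GC"),
            "at": base_fraction(feet, "AT"),
        })
    return results


print(g4_feet_composition("ATATGGGAGGGAGGGAGGGCCGA", flank=4))
```

Comparing the distributions of `gc` and `at` between a high-confidence G4 set and a motif-only set would be one way to look for the kind of contrast described above.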
Finally, we developed a novel deep-learning-based tool for predicting DNA G4s that carry
G4 motifs. Building on the classical G4 motif, we applied a bidirectional LSTM model with
an attention mechanism, which captures sequential information, and the model performed well
in genome-wide prediction of DNA G4s that match the classical G4 pattern.
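The abstract does not specify which attention variant sits on top of the bidirectional LSTM, so the following is only a sketch of one common choice, additive attention pooling, written in plain Python so it runs without a deep-learning framework. `hidden` stands in for the per-position BiLSTM outputs (forward and backward states concatenated); `W` and `v` would normally be learned parameters and are set by hand here. All names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention_pool(hidden, W, v):
    """Pool per-position hidden states into one vector with additive attention.

    hidden: list of T vectors of length d (e.g. BiLSTM outputs per nucleotide).
    W:      k x d projection matrix, given as k rows of length d.
    v:      scoring vector of length k.
    Returns (context_vector, attention_weights).
    """
    scores = []
    for h in hidden:
        # u_t = tanh(W h_t); score e_t = v . u_t
        u = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(vi * ui for vi, ui in zip(v, u)))
    alpha = softmax(scores)  # weights over positions, summing to 1
    d = len(hidden[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(a * h[j] for a, h in zip(alpha, hidden)) for j in range(d)]
    return context, alpha


hidden = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
context, alpha = additive_attention_pool(hidden, [[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0])
print(alpha, context)
```

Compared with simple averaging, the weights let positions that score highly under `v` dominate the pooled vector; in a typical setup this vector would then feed a classification layer that outputs a G4 probability.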
Our comprehensive work investigated the functions and prediction of G4s, providing a
better understanding of G4s at the multi-omics level and of how deep learning can capture
their sequence information computationally.

2023-04-03

Identifier: oai:union.ndltd.org:IUPUI/oai:scholarworks.iupui.edu:1805/30368
Date: 09 1900
Creators: Fang, Shuyi
Contributors: Wan, Jun; Liu, Yunlong; Yan, Jingwen; Zhang, Jie
Source Sets: Indiana University-Purdue University Indianapolis
Language: en_US
Detected Language: English
Type: Dissertation
