Despite advances in sequencing technologies, around 98% of the genome
is usually disregarded because methods for interpreting it are lacking. Here, I compare
different sequence-based deep-learning approaches for predicting the functionality
of the non-coding genome. Using the largest non-coding variant database, I
measured how each model's predictions changed when pathogenic versus benign variants were introduced.
I then benchmarked the models' performance across different genomic regions and phenotypes
and built a logistic regression model for cell- and phenotype-specific track
selection. The models outperformed state-of-the-art evolutionary- and variant-based
methods. Finally, I compared different target-gene annotation databases
using ontology-based Resnik semantic similarity. I combined the previous steps
into a variant-to-phenotype or phenotype-to-variant workflow and applied it to rare
variants.
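
As an illustration of the Resnik semantic similarity mentioned above, the following is a minimal sketch and not the thesis's implementation. It assumes the standard definition, in which the similarity of two ontology terms is the information content of their most informative common ancestor. The toy ontology (`PARENTS`), the annotation counts (`DIRECT_COUNTS`), and all function names are hypothetical and chosen only for this example.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": set(),
    "neuro": {"root"},
    "cardio": {"root"},
    "seizure": {"neuro"},
    "ataxia": {"neuro"},
    "arrhythmia": {"cardio"},
}

# Hypothetical number of annotations made directly to each term.
DIRECT_COUNTS = {
    "root": 0, "neuro": 2, "cardio": 1,
    "seizure": 4, "ataxia": 2, "arrhythmia": 1,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts annotations to t or its descendants."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += count
    n_total = totals["root"]
    return {t: math.log(n_total / c) for t, c in totals.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(t, 0.0) for t in common)


IC = information_content()
print(resnik("seizure", "ataxia", IC))      # shared ancestor "neuro"
print(resnik("seizure", "arrhythmia", IC))  # only the root is shared
```

In this toy setup, terms that share only the root score zero, while terms under a more specific common ancestor score higher. A comparison of gene annotation databases would need an aggregation over sets of terms on top of this pairwise measure; that step is not shown here.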
Identifier | oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/692300 |
Date | 23 May 2023 |
Creators | Al Ali, Hatoon |
Contributors | Hoehndorf, Robert, Biological and Environmental Science and Engineering (BESE) Division, Orlando, Valerio, Tegner, Jesper, Orlando, Valerio |
Source Sets | King Abdullah University of Science and Technology |
Language | English |
Detected Language | English |
Type | Thesis |
Rights | 2024-06-01, At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2024-06-01. |
Relation | N/A |