Topic modeling of biological data has become a prominent research area in recent years. However, analysing the vast number of research papers and establishing a consensus in biomedicine is a near-impossible task for any single researcher, owing to the complexity and quantity of the published material. This thesis pursues two objectives that can help researchers in this domain, using data related to five major DNA repair pathways. The first objective is to propose an unsupervised approach for examining hidden structures and analysing research trends in temporal biomedical text data. The second objective is to find DNA repair markers involved in immune defense and to retrieve potential PPIs, GIs, and disease-gene associations reported in the literature. We used latent Dirichlet allocation (LDA) to discover hidden themes and semantically coherent topics in the text. We clustered the documents based on the LDA topic models to analyse research trends, and used the Mann-Kendall test to determine the trend of each topic. We combined text mining methods with a classical co-occurrence-based statistical approach and association rule mining to discover potential PPIs, GIs, and disease-gene associations in the text. The results for PPIs and GIs were then evaluated against an external biological database of PPIs.
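To make the trend-analysis step concrete, the following is a minimal sketch of the Mann-Kendall test applied to a yearly topic-prevalence series. It is not the thesis's implementation: the function name `mann_kendall`, the example series, and the 0.05 significance level are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test (hypothetical helper, not from the thesis).

    Returns (S, Z, p): the S statistic, the continuity-corrected
    normal score Z, and the two-sided p-value.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S counts later-minus-earlier sign agreements over all pairs i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the no-trend hypothesis, corrected for tied values.
    tie_sizes = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0

    # Continuity correction: shrink |S| by 1 before standardizing.
    if var_s == 0 or s == 0:
        z = 0.0
    elif s > 0:
        z = (s - 1) / math.sqrt(var_s)
    else:
        z = (s + 1) / math.sqrt(var_s)

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Made-up yearly prevalence of one topic (illustrative values only).
prevalence = [0.10, 0.11, 0.13, 0.14, 0.16, 0.18, 0.21, 0.23, 0.26, 0.30]
s, z, p = mann_kendall(prevalence)
trend = "increasing" if p < 0.05 and z > 0 else \
        "decreasing" if p < 0.05 and z < 0 else "no significant trend"
print(trend)
```

Because the test is rank-based, it makes no assumption about the shape of the trend, which suits topic proportions that rarely change linearly over time.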
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:liu-176615 |
Date | January 2021 |
Creators | Jabeen, Rakhshanda |
Publisher | Linköpings universitet, Statistik och maskininlärning, rakhshanda.jbn@gmail.com |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |