More is Better than One: The Effect of Ensembling on Deep Learning Performance in Biochemical Prediction Problems

This thesis presents two papers addressing important biochemical prediction challenges. The first paper focuses on accurate protein distance prediction and introduces updates to the ProSPr network. We evaluate its performance in the Critical Assessment of techniques for Protein Structure Prediction (CASP14) competition and investigate how its accuracy depends on sequence length and multiple sequence alignment depth. The ProSPr network, an ensemble of three convolutional neural networks (CNNs), outperforms the individual networks. The second paper addresses accurate ligand ranking in virtual screening for drug discovery. We propose MILCDock, a machine learning consensus docking tool that leverages predictions from five traditional molecular docking tools. MILCDock, an ensemble of eight neural networks, outperforms single-network approaches and other consensus docking methods on the DUD-E dataset. However, we find that LIT-PCBA targets remain challenging for all methods tested. Furthermore, we explore the effectiveness of training machine learning tools on the biased DUD-E dataset and show the importance of mitigating its biases during training. Collectively, this work demonstrates the power of ensembling in deep learning-based biochemical prediction problems, where combining multiple models improves performance. Our findings contribute to the development of robust protein distance prediction tools and more accurate virtual screening methods for drug discovery.

Identifier: oai:union.ndltd.org:BGMYU2/oai:scholarsarchive.byu.edu:etd-11132
Date: 07 August 2023
Creators: Stern, Jacob A.
Publisher: BYU ScholarsArchive
Source Sets: Brigham Young University
Detected Language: English
Type: text
Format: application/pdf
Source: Theses and Dissertations
Rights: https://lib.byu.edu/about/copyright/
