Return to search

BENCHMARKING SMALL-DATASET STRUCTURE-ACTIVITY-RELATIONSHIP MODELS FOR PREDICTION OF WNT SIGNALING INHIBITION

Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools to expedite drug discovery processes and therapeutics development. Given the cost in acquiring large-sized training datasets, it is useful to examine if QSAR analysis can reasonably predict drug activity with only a small-sized dataset (size < 100) and benchmark these small-dataset QSAR models in application-specific studies. To this end, here we present a systematic benchmarking study on small-dataset QSAR models built for prediction of effective Wnt signaling inhibitors, which are essential to therapeutics development in prevalent human diseases (e.g., cancer). Specifically, we examined a total of 72 two-dimensional (2D) QSAR models based on 4 best-performing algorithms, 6 commonly used molecular fingerprints, and 3 typical fingerprint lengths. We trained these models using a training dataset (56 compounds), benchmarked their performance on 4 figures-of-merit (FOMs), and examined their prediction accuracy using an external validation dataset (14 compounds). Our data show that the model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of drug compounds; 2) algorithms are selected to reduce the number of false predictions due to class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs. These results may provide general guidelines in developing high-performance small-dataset QSAR models for drug activity prediction.

Identiferoai:union.ndltd.org:UMASS/oai:scholarworks.umass.edu:masters_theses_2-2153
Date20 October 2021
CreatorsKokabi, Mahtab
PublisherScholarWorks@UMass Amherst
Source SetsUniversity of Massachusetts, Amherst
Detected LanguageEnglish
Typetext
Formatapplication/pdf
SourceMasters Theses

Page generated in 0.0022 seconds