Here we present a similarity-based pairing method for generating compound pairs to train a Siamese Neural Network. Compared with conventional exhaustive pairing, which yields N²/2 pairs (N being the size of the training set), this method yields N-1 pairs, significantly reducing the training time. Using a multilayer perceptron with the ECFP4 fingerprint, it consistently shows better prediction performance on three physicochemical property datasets. We further incorporate the pre-trained Chemformer into the Siamese Neural Network to extract task-specific chemical features from the input SMILES strings. Using n-shot learning, we propose a means of measuring the prediction uncertainty. Our results demonstrate that higher accuracy is indeed associated with lower prediction uncertainty. In addition, we discuss the implications of the similarity principle in machine learning.
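To make the difference in pair counts concrete, the following is a minimal sketch, not the thesis's actual procedure. The abstract does not state the pairing rule, so the rule used here (pair each compound with its most similar earlier compound by Tanimoto similarity) is an assumption chosen only because it produces N-1 pairs. The function names and the toy fingerprints are hypothetical, and the fingerprints are plain sets of bit indices rather than real ECFP4 output. Exhaustive unordered pairing gives N(N-1)/2 pairs, which approaches N²/2 for large N.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def exhaustive_pairs(n: int) -> list:
    """All unordered pairs of n compounds: n * (n - 1) / 2 of them."""
    return list(combinations(range(n), 2))


def similarity_pairs(fps: list) -> list:
    """ASSUMED rule: pair compound i (i >= 1) with its most similar
    earlier compound. This yields exactly len(fps) - 1 pairs."""
    pairs = []
    for i in range(1, len(fps)):
        # Ties are broken in favour of the lowest index.
        j = max(range(i), key=lambda k: tanimoto(fps[i], fps[k]))
        pairs.append((j, i))
    return pairs


# Toy fingerprints: hypothetical bit indices, not real ECFP4 bits.
fps = [
    frozenset({1, 2, 3}),
    frozenset({2, 3, 4}),
    frozenset({7, 8}),
    frozenset({7, 8, 9}),
    frozenset({1, 2}),
]

print(len(exhaustive_pairs(len(fps))))  # number of exhaustive pairs
print(len(similarity_pairs(fps)))       # number of similarity-based pairs
```

With five toy compounds, the exhaustive scheme produces 10 pairs and the sketched similarity-based scheme produces 4. The gap widens quadratically as the training set grows, which is the source of the reduced training time described above.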
Identifier | oai:union.ndltd.org:UPSALLA1/oai:DiVA.org:uu-480104 |
Date | January 2022 |
Creators | Zhang, Yumeng |
Publisher | Uppsala universitet, Institutionen för farmaceutisk biovetenskap, Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I) |
Source Sets | DiVA Archive at Uppsala University |
Language | English |
Detected Language | English |
Type | Student thesis, info:eu-repo/semantics/bachelorThesis, text |
Format | application/pdf |
Rights | info:eu-repo/semantics/openAccess |