
Predicting Reaction Yield in C–N Cross-coupling Using Machine Learning

Reaction performance, such as yield, is crucial in catalytic organic chemistry, yet predicting reaction yield remains challenging. In this thesis, machine learning is used to predict the yield of C–N cross-coupling reactions. The reaction data come from high-throughput experiments with four variables: reactants, Pd catalysts, additives, and bases. Each reaction record has a corresponding yield. The data are taken from the literature and have been uploaded. In total, 7910 data points are used for machine learning.
The method consists of four main steps. First, load the CSV data and import the required modules. Second, encode the data with molecular fingerprints or one-hot encoding, normalizing the data if needed. Third, split the data set into training and test sets with a size ratio of 7/3 or 8/2. Fourth, train six machine learning models on the data and evaluate their performance. Finally, compare the predicted yields for the test set.
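The first three steps can be sketched with the Python standard library alone. This is a minimal illustration, not the thesis code: the column names, category labels, and yield values in `TOY_CSV` are made-up placeholders, the helper names are hypothetical, and only the one-hot option (not molecular fingerprints) is shown.

```python
import csv
import io
import random

# Hypothetical toy rows: column names and values are placeholders for
# illustration only and are NOT taken from the thesis data set.
TOY_CSV = """reactant,catalyst,additive,base,yield
aryl_A,cat_1,add_1,base_1,10.0
aryl_B,cat_2,add_1,base_2,55.5
aryl_A,cat_2,add_2,base_1,32.0
aryl_C,cat_1,add_2,base_2,71.2
aryl_B,cat_1,add_1,base_1,5.4
aryl_C,cat_2,add_2,base_2,88.1
aryl_A,cat_1,add_2,base_2,40.3
aryl_B,cat_2,add_2,base_1,63.9
aryl_C,cat_1,add_1,base_1,20.7
aryl_A,cat_2,add_1,base_2,47.6
"""

FEATURES = ["reactant", "catalyst", "additive", "base"]


def load_rows(text):
    """Step 1: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def build_vocab(rows, columns):
    """Assign one vector position to every (column, category) pair seen."""
    pairs = sorted({(col, row[col]) for row in rows for col in columns})
    return {pair: index for index, pair in enumerate(pairs)}


def one_hot(row, columns, vocab):
    """Step 2: encode one reaction as a 0/1 vector, one 1 per variable."""
    vector = [0.0] * len(vocab)
    for col in columns:
        vector[vocab[(col, row[col])]] = 1.0
    return vector


def train_test_split(X, y, test_fraction=0.3, seed=0):
    """Step 3: shuffle indices reproducibly and hold out a test fraction."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


rows = load_rows(TOY_CSV)
vocab = build_vocab(rows, FEATURES)
X = [one_hot(row, FEATURES, vocab) for row in rows]
y = [float(row["yield"]) for row in rows]

# A 7/3 split; pass test_fraction=0.2 for the 8/2 variant.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.3)
```

The encoded training set would then be passed to each regression model in step four. One-hot vectors need no normalization because every entry is already 0 or 1; continuous descriptors would typically be scaled using statistics computed on the training set only.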
Prediction accuracy (RMSE and R-squared) and running time are considered in the evaluation. Comparing the RMSE and R-squared values of the different models shows which one performs better and fits the data better. The results may also point to improved reaction performance, or to high-performance catalysts and their characteristics.
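The two accuracy metrics can be written out directly from their standard definitions. The sketch below is illustrative only: the function names, the model names, and the yield numbers are hypothetical and do not come from the thesis. A lower RMSE and an R-squared closer to 1 indicate a better fit.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Return (name, rmse, r2) tuples sorted from lowest to highest RMSE."""
    scored = [
        (name, rmse(y_true, y_pred), r_squared(y_true, y_pred))
        for name, y_pred in predictions.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Made-up test-set yields and predictions from two hypothetical models.
y_test = [10.0, 20.0, 30.0, 40.0]
predictions = {
    "model_a": [12.0, 18.0, 33.0, 39.0],
    "model_b": [20.0, 20.0, 20.0, 20.0],
}

ranking = rank_models(y_test, predictions)
for name, error, r2 in ranking:
    print(f"{name}: RMSE={error:.3f}, R2={r2:.3f}")
```

R-squared can be negative, as it is for `model_b` here, when a model predicts worse than simply returning the mean of the observed yields.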

Identifier: oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/686679
Date: 29 November 2022
Creators: Nie, Jianan
Contributors: Gao, Xin, Physical Science and Engineering (PSE) Division, Cavallo, Luigi, Han, Yu
Source Sets: King Abdullah University of Science and Technology
Language: English
Detected Language: English
Type: Thesis
Rights: 2023-12-29, At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2023-12-29.
