Catalytic reaction performance, such as yield, is crucial in organic chemistry, yet predicting reaction yields remains challenging. In this thesis, machine learning is used to predict reaction yields for a C–N cross-coupling reaction. The data come from high-throughput experiments that vary four components: reactants, Pd catalysts, additives, and bases. Each combination of these components has a corresponding measured yield. The data were taken from the published literature and have been uploaded. In total, 7910 data points are used for machine learning.
The method consists of four main steps. First, the CSV data are loaded and the required modules are imported. Second, the data are encoded with molecular fingerprints or one-hot encoding and normalized where necessary. Third, the dataset is split into training and test sets in a 7:3 or 8:2 ratio. Fourth, six machine learning models are trained on the data and their performance is evaluated. Finally, the yields the models predict for the test set are compared.
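The encoding and splitting steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis code: it shows only the one-hot option (the fingerprint option is omitted), and the function names `build_vocab`, `one_hot`, and `split_dataset` as well as the component labels (`R1`, `Pd-A`, and so on) are hypothetical placeholders. Because one-hot features are already 0 or 1, no normalization is applied here.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple

# One reaction = (reactant, Pd catalyst, additive, base); labels are hypothetical placeholders.
Reaction = Tuple[str, str, str, str]


def build_vocab(reactions: Sequence[Reaction]) -> List[Dict[str, int]]:
    """For each of the four components, map every distinct value to a column index."""
    vocabs = []
    for i in range(4):
        values = sorted({r[i] for r in reactions})
        vocabs.append({value: j for j, value in enumerate(values)})
    return vocabs


def one_hot(reaction: Reaction, vocabs: List[Dict[str, int]]) -> List[int]:
    """Concatenate one one-hot block per component: reactant | catalyst | additive | base."""
    vector: List[int] = []
    for value, vocab in zip(reaction, vocabs):
        block = [0] * len(vocab)
        block[vocab[value]] = 1
        vector.extend(block)
    return vector


def split_dataset(items: Sequence, test_fraction: float = 0.3, seed: int = 0):
    """Shuffle a copy of the data and split it; 0.3 gives a 7:3 split, 0.2 gives 8:2."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


if __name__ == "__main__":
    # Hypothetical full-factorial screen: 3 reactants x 2 catalysts x 2 additives x 2 bases.
    reactions = list(itertools.product(
        ["R1", "R2", "R3"], ["Pd-A", "Pd-B"], ["Add-1", "Add-2"], ["Base-1", "Base-2"]
    ))
    vocabs = build_vocab(reactions)
    X = [one_hot(r, vocabs) for r in reactions]
    train, test = split_dataset(reactions, test_fraction=0.3)
    print(len(reactions), len(X[0]), len(train), len(test))  # 24 9 17 7
```

Here the 24 toy reactions become 9-column vectors (3 + 2 + 2 + 2 one-hot columns) and are split 17/7. Fixing the shuffle seed keeps the split reproducible, so every model can be scored on the same test set.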
Prediction accuracy, measured by the root-mean-square error (RMSE) and the coefficient of determination (R²), and running time are used for evaluation. Comparing the RMSE and R² values of the different models shows which model performs better and fits the data more closely. The analysis may also point toward improved reaction performance, for example by identifying high-performance catalysts and their characteristics.
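The evaluation criteria can be written out explicitly, with RMSE = sqrt((1/n) Σ (yᵢ − ŷᵢ)²) and R² = 1 − SS_res / SS_tot. The sketch below assumes each model is wrapped as a function `model(train_X, train_y, test_X)` that returns predicted yields; this interface, the `evaluate` helper, and the `mean_baseline` reference model are hypothetical and only illustrate how RMSE, R², and running time could be collected for each model.

```python
import math
import time
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: fit on the training data, return predictions for test_X.
FitPredict = Callable[[Sequence, Sequence[float], Sequence], List[float]]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error; lower is better."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot; 1.0 is a perfect fit."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate(model: FitPredict, train_X, train_y, test_X, test_y) -> Dict[str, float]:
    """Time one fit-and-predict run and score the test-set predictions."""
    start = time.perf_counter()
    preds = model(train_X, train_y, test_X)
    seconds = time.perf_counter() - start
    return {"rmse": rmse(test_y, preds), "r2": r_squared(test_y, preds), "seconds": seconds}


def mean_baseline(train_X, train_y, test_X) -> List[float]:
    """Hypothetical reference model: predict the mean training yield for every test reaction."""
    mean = sum(train_y) / len(train_y)
    return [mean] * len(test_X)


if __name__ == "__main__":
    # Arbitrary toy numbers, used only to show the call pattern (not thesis data).
    scores = evaluate(mean_baseline, [[0], [1], [2]], [10.0, 20.0, 30.0], [[3], [4]], [10.0, 30.0])
    print(round(scores["rmse"], 3), round(scores["r2"], 3))  # 10.0 0.0
```

Calling `evaluate` once per model on the same split and ranking the results by RMSE (or R²), with `seconds` reported alongside, yields the accuracy and running-time comparison. A trivial baseline such as `mean_baseline` scores an R² near 0, which gives a floor that any useful model should beat.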
Identifier | oai:union.ndltd.org:kaust.edu.sa/oai:repository.kaust.edu.sa:10754/686679 |
Date | 29 November 2022 |
Creators | Nie, Jianan |
Contributors | Gao, Xin, Physical Science and Engineering (PSE) Division, Cavallo, Luigi, Han, Yu |
Source Sets | King Abdullah University of Science and Technology |
Language | English |
Detected Language | English |
Type | Thesis |
Rights | 2023-12-29, At the time of archiving, the student author of this thesis opted to temporarily restrict access to it. The full text of this thesis will become available to the public after the expiration of the embargo on 2023-12-29. |