1

Towards algorithmic use of chemical data

Jacob, Philipp-Maximilian January 2018
The growth of chemical knowledge available via online databases opens opportunities for new types of chemical research. In particular, by converting the data into a network, graph-theoretical approaches can be used to study chemical reactions. In this thesis, several research questions from data science and graph theory are reformulated for chemistry-specific data. Firstly, the structure of chemical reaction data was studied using graph theory. It was found that the network of reactions obtained from the Reaxys data was scale-free, that on average any two species were separated by six reactions, and that there was evidence for a hierarchy of nodes, most clearly in that the hubs, which attract a large share of connections, also facilitate a large proportion of routes across the network. The hierarchy was also evident in the clustering and degree correlations of nodes. Next, it was investigated whether Reaxys could be mined to construct a network of reactions that could then be used to plan and evaluate synthesis routes in two case studies. A number of heuristics were developed to find synthesis routes using the network while taking chemical structures into account. These routes were fed into a multi-criteria decision-making framework that scored them on environmental sustainability considerations. The approach was successful in discovering and scoring synthesis route candidates. It was found that Reaxys lacked process data in many instances. To address this, a proposal for an extension of the RInChI reaction data format was developed. The final question addressed was whether the network could be used to predict future reactions by using Stochastic Block Models. Block model-based link prediction performed well, achieving a classification accuracy of close to 95% during time-split validation on historic data when differentiating future reaction discoveries from random data. A set of transformation suggestions was then evaluated, and a framework for analysing these results was presented. Overall, the thesis furthered the understanding of the network’s topology and presented a framework for mining Reaxys to plan synthesis routes and to target R&D efforts in a specific area to discover new reactions.
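
The network view described in this abstract can be illustrated with a minimal sketch. It is not the thesis's method or data: the toy reactions, the undirected reactant-to-product linking rule, and all function names below are assumptions made for illustration only. It shows how a reaction list might be turned into a species graph, and how node degree (to spot hubs) and the average separation between species could be computed.

```python
from collections import deque

# Hypothetical toy reactions (reactants -> products); not taken from Reaxys.
REACTIONS = [
    (("A", "B"), ("C",)),
    (("C",), ("D", "E")),
    (("E", "F"), ("G",)),
]


def build_species_graph(reactions):
    """Link every reactant to every product of the same reaction.

    Assumption: edges are undirected, which simplifies the real,
    directed reaction network.
    """
    graph = {}
    for reactants, products in reactions:
        for r in reactants:
            for p in products:
                graph.setdefault(r, set()).add(p)
                graph.setdefault(p, set()).add(r)
    return graph


def shortest_path_lengths(graph, source):
    """Breadth-first search: reaction steps from source to each reachable species."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_separation(graph):
    """Mean shortest-path length over all ordered pairs of distinct, connected species."""
    total = 0
    pairs = 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def degree_counts(graph):
    """Number of neighbours per species; high-degree species act as hubs."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


graph = build_species_graph(REACTIONS)
print(degree_counts(graph))
print(average_separation(graph))
```

On a real dataset, the degree counts would be inspected for a heavy-tailed distribution and the average separation compared against the six-reaction figure quoted in the abstract; this toy graph is far too small to show either property.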
2

Pre-training Molecular Transformers Through Reaction Prediction / Förträning av molekylär transformer genom reaktionsprediktion

Broberg, Johan January 2022
Molecular property prediction has the potential to improve many processes in the molecular chemistry industry. One important application is drug development, where molecular property prediction can reduce both the cost and the time of finding new drugs. The current trend is to use graph neural networks or transformers, which tend to need moderate and large amounts of data respectively to perform well. Because molecular property data is scarce, it is of great interest to find an effective method for transferring learning from other, more data-abundant problems. In this thesis I present an approach to pre-train transformer encoders on reaction prediction in order to improve performance on downstream molecular property prediction tasks. I have built a model based on the full transformer architecture but modified it for the purpose of pre-training the encoder. Model performance, and specifically the effect of pre-training, is tested by predicting lipophilicity, HIV inhibition and hERG channel blocking using both pre-trained models and models without any pre-training. The results show a tendency towards improved performance on all molecular property prediction tasks with the suggested pre-training, but this tendency is not statistically significant. The main obstacle to a conclusive evaluation is the limited number of simulations that could be run under computational constraints.
