Exploration of Chemical Space: Formal, chemical and historical aspects

Starting from the observation that substances and reactions are the central entities of chemistry, I have structured chemical knowledge into a formal space called a directed hypergraph, which arises when substances are connected by their reactions. I refer to this hypernet as chemical space. In this thesis, I explore this space at different levels of description: its evolution over time, its curvature, and categorical models of its compositionality.
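As a minimal sketch of the hypernet idea, assume each reaction is reduced to the set of its educts and the set of its products, with stoichiometric coefficients dropped. A reaction then becomes a directed hyperedge from one set of substances to another. The same data can also be read as a bipartite directed graph in which every reaction is a node wired from its educts and to its products. The names `Reaction`, `substances`, and `to_bipartite`, as well as the toy reactions, are hypothetical illustrations and are not taken from the thesis or from Reaxys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A directed hyperedge: a set of educts (tail) and a set of products (head).

    Stoichiometric coefficients are deliberately dropped in this sketch.
    """
    educts: frozenset
    products: frozenset


def substances(reactions):
    """Return the vertex set: every substance that occurs in some reaction."""
    found = set()
    for r in reactions:
        found |= r.educts | r.products
    return found


def to_bipartite(reactions):
    """Return the arcs of the equivalent bipartite directed graph.

    Reaction k becomes a node "r<k>" with arcs educt -> r<k> -> product.
    """
    arcs = []
    for k, r in enumerate(reactions):
        node = f"r{k}"
        arcs.extend((s, node) for s in sorted(r.educts))
        arcs.extend((node, s) for s in sorted(r.products))
    return arcs


# Hypothetical toy network; coefficients are ignored, so the
# equations are intentionally not balanced.
network = [
    Reaction(frozenset({"H2", "O2"}), frozenset({"H2O"})),
    Reaction(frozenset({"CH4", "O2"}), frozenset({"CO2", "H2O"})),
]

print(sorted(substances(network)))  # ['CH4', 'CO2', 'H2', 'H2O', 'O2']
print(to_bipartite(network))
```

Using frozensets makes each reaction hashable, so duplicate records of the same reaction collapse naturally if they are stored in a set.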
The vast majority of the chemical literature investigates particular aspects of individual substances or reactions, and these investigations, spanning the last 200 years, are systematically recorded in comprehensive databases such as Reaxys. While complexity science has made important advances in physics, biology, economics, and many other fields, it has somewhat neglected chemistry. In this work, I propose to take a global view of chemistry and to combine complexity science tools, modern data analysis techniques, and geometric and compositional theories to explore chemical space. This approach provides a novel view of chemistry, its history, and its current status.
We argue that a large directed hypergraph, that is, a model of directed relations between sets, underlies chemical space, and that a systematic study of this structure is a major challenge for chemistry. Using the Reaxys database as a proxy for chemical space, we search for large-scale changes in a directed hypergraph model of chemical knowledge and present a data-driven approach to navigate its history and evolution. These investigations focus on the mechanisms by which this space has been expanding: the role of synthesis and extraction in the production of new substances, patterns in the selection of starting materials, and the frequency with which reactions reach new regions of chemical space. We detect large-scale patterns that emerged over the last two centuries of chemical history, in particular in the growth of chemical knowledge, the use of reagents, and the synthesis of products; these patterns reveal both conservatism and sharp transitions in the exploration of the space. Furthermore, since the chemical similarity of substances arises from their affinity patterns in chemical reactions, we quantify the impact of changes in the diversity of the space on the formulation of the system of chemical elements.
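The growth measurements described above can be illustrated on a toy record list. This is a minimal sketch under our own assumptions, not the thesis's pipeline: each record is a hypothetical `(year, educts, products)` tuple, a substance counts as new in the year it first appears in any reaction, and a reaction counts as reaching a new region of the space when at least one of its products has not been seen before. The helper names and the placeholder substances are hypothetical.

```python
from collections import Counter


def new_substances_per_year(records):
    """Count, for each year, the substances that appear for the first time.

    records: iterable of hypothetical (year, educts, products) tuples.
    Records are processed chronologically; ties keep their input order.
    Years without any new substance are omitted from the result.
    """
    seen = set()
    counts = Counter()
    for year, educts, products in sorted(records, key=lambda rec: rec[0]):
        for s in set(educts) | set(products):
            if s not in seen:
                seen.add(s)
                counts[year] += 1
    return dict(sorted(counts.items()))


def novel_reaction_fraction(records):
    """Fraction of each year's reactions with at least one never-seen product."""
    seen = set()
    total, novel = Counter(), Counter()
    for year, educts, products in sorted(records, key=lambda rec: rec[0]):
        total[year] += 1
        if any(p not in seen for p in products):
            novel[year] += 1
        seen |= set(educts) | set(products)
    return {year: novel[year] / total[year] for year in sorted(total)}


# Hypothetical toy records with placeholder substance names.
records = [
    (1900, {"A"}, {"B"}),
    (1900, {"A", "B"}, {"C"}),
    (1901, {"B"}, {"C"}),
    (1901, {"C"}, {"D"}),
]

print(new_substances_per_year(records))  # {1900: 3, 1901: 1}
print(novel_reaction_fraction(records))  # {1900: 1.0, 1901: 0.5}
```

On real data the ordering of reactions within a single year is usually unknown, so a more careful analysis would treat same-year records as simultaneous rather than rely on input order.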
In addition, we develop formal tools to probe the local geometry of the resulting directed hypergraph and introduce the Forman-Ricci curvature for directed and undirected hypergraphs. We characterize this notion of curvature by applying it to social and chemical networks with higher-order interactions, and then use it to investigate the structure and dynamics of chemical space.
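To make the curvature idea concrete, here is a minimal sketch assuming an unweighted form for a directed hyperedge e with tail set T and head set H: F(e) = |T| + |H| minus the sum of in-degrees of the tail vertices minus the sum of out-degrees of the head vertices. For an ordinary directed edge from i to j this reduces to 2 - d_in(i) - d_out(j), the usual unweighted Forman curvature of a directed edge. The thesis may use a weighted or differently normalized definition, so this formula, the helper names `degrees` and `forman_curvature`, and the toy hypergraph are assumptions for illustration only. Intuitively, the more reactions that produce the educts of e or consume its products, the more negative its curvature.

```python
from collections import Counter


def degrees(hyperedges):
    """In- and out-degree of every vertex.

    A hyperedge (tail, head) leaves each tail vertex and enters each head vertex.
    """
    d_in, d_out = Counter(), Counter()
    for tail, head in hyperedges:
        for v in tail:
            d_out[v] += 1
        for v in head:
            d_in[v] += 1
    return d_in, d_out


def forman_curvature(edge, hyperedges):
    """Assumed unweighted Forman-Ricci curvature of a directed hyperedge:
    F(e) = |T| + |H| - sum of in-degrees over T - sum of out-degrees over H.
    """
    d_in, d_out = degrees(hyperedges)
    tail, head = edge
    return (len(tail) + len(head)
            - sum(d_in[v] for v in tail)
            - sum(d_out[v] for v in head))


# Hypothetical toy hypergraph: A + B -> C, C -> D, D -> A.
edges = [
    (frozenset({"A", "B"}), frozenset({"C"})),
    (frozenset({"C"}), frozenset({"D"})),
    (frozenset({"D"}), frozenset({"A"})),
]

print([forman_curvature(e, edges) for e in edges])  # [1, 0, 0]
```

Recomputing the degrees inside `forman_curvature` keeps the sketch short; for a large network one would compute the two counters once and reuse them for every hyperedge.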
The network model of chemistry is strongly motivated by the observation that the compositional nature of chemical reactions must be captured in order to build a model of chemical reasoning. We take a step towards categorical chemistry, that is, a formalization of all the flavors of compositionality in chemistry, by constructing a categorical model of directed hypergraphs. We lift the structure of a lineale (a poset version of a symmetric monoidal closed category) to a category of Petri nets, whose wiring is a bipartite directed graph equivalent to a directed hypergraph. The resulting construction, based on the Dialectica categories introduced by Valeria de Paiva, is a symmetric monoidal closed category with finite products and coproducts. It provides a formal way of composing smaller networks into larger ones such that the algebraic properties of the components are preserved in the resulting network. Several sets of labels often used in empirical data modeling can be given the structure of a lineale, including stoichiometric coefficients in chemical reaction networks, reaction rates, inhibitor arcs, Boolean interactions, unknown or incomplete data, and probabilities. Therefore, a wide range of empirical data types for chemical substances and reactions can be included in our model.
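The lineale laws can be checked mechanically on a finite carrier: a poset with a commutative, associative tensor that has a unit and is residuated, meaning tensor(a, b) <= c holds exactly when a <= lolli(b, c). The sketch below brute-forces these laws with a hypothetical helper `is_lineale`; it assumes the supplied order is already a partial order and does not verify that. The Boolean example matches the Boolean interactions mentioned above. The Łukasiewicz structure on the chain {0, 1/4, 1/2, 3/4, 1} is our own illustrative choice of a [0, 1]-valued lineale and is not claimed to be the structure the thesis uses for probabilities.

```python
from fractions import Fraction
from itertools import product


def is_lineale(elements, leq, tensor, unit, lolli):
    """Brute-force check of the lineale laws on a finite carrier.

    Assumes leq is already a partial order. Monotonicity of the tensor
    follows from residuation, so it is not checked separately.
    """
    if unit not in elements:
        return False
    for a, b in product(elements, repeat=2):
        if tensor(a, b) not in elements or lolli(a, b) not in elements:
            return False  # both operations must stay inside the carrier
        if tensor(a, b) != tensor(b, a):
            return False  # commutativity
    for a in elements:
        if tensor(a, unit) != a:
            return False  # unit law
    for a, b, c in product(elements, repeat=3):
        if tensor(a, tensor(b, c)) != tensor(tensor(a, b), c):
            return False  # associativity
        if leq(tensor(a, b), c) != leq(a, lolli(b, c)):
            return False  # residuation (the "closed" part)
    return True


# Boolean interactions: {0, 1} with AND as tensor and implication as residual.
boolean = ([0, 1], lambda x, y: x <= y, min, 1,
           lambda b, c: 1 if b <= c else 0)

# Illustrative assumption: Lukasiewicz structure on a finite chain in [0, 1].
chain = [Fraction(k, 4) for k in range(5)]
lukasiewicz = (chain, lambda x, y: x <= y,
               lambda a, b: max(Fraction(0), a + b - 1), Fraction(1),
               lambda b, c: min(Fraction(1), 1 - b + c))

print(is_lineale(*boolean), is_lineale(*lukasiewicz))  # True True
```

Exact `Fraction` arithmetic is used so that the equality tests for associativity and closure are not disturbed by floating-point rounding.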

Identifier: oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:82716
Date: 20 December 2022
Creators: Leal, Wilmer
Contributors: Universität Leipzig
Source Sets: Hochschulschriftenserver (HSSS) der SLUB Dresden
Language: English
Detected Language: English
Type: info:eu-repo/semantics/updatedVersion, doc-type:doctoralThesis, info:eu-repo/semantics/doctoralThesis, doc-type:Text
Rights: info:eu-repo/semantics/openAccess