Global ETD Search

1	Accelerating process development of complex chemical reactions. Amar, Yehia. January 2019.
Process development of new complex reactions in the pharmaceutical and fine chemicals industries is challenging and expensive. The field is beginning to see a bridging between fundamental first-principles investigations and the use of data-driven statistical methods, such as machine learning. Nonetheless, process development and optimisation in these industries are still driven mostly by trial and error and by experience. Approaches that move beyond these are limited to the well-developed optimisation of continuous variables and often do not yield physical insights. This thesis describes several new methods developed to address research questions related to this challenge. First, we investigated whether physical knowledge could aid the statistics-guided self-optimisation of a C-H activation reaction in which the optimisation variables were continuous. We then considered the algorithmic treatment of the more challenging discrete variables, focussing on solvents. We parametrised a library of 459 solvents with physically meaningful molecular descriptors. Our case study was a homogeneous Rh-catalysed asymmetric hydrogenation producing a chiral γ-lactam, with conversion and diastereoselectivity as objectives. We adapted a state-of-the-art multi-objective machine learning algorithm based on Gaussian processes to take the descriptors as inputs and to build a surrogate model for each objective. The aim of the algorithm was to determine a set of Pareto solutions within a minimal experimental budget whilst simultaneously addressing model uncertainty. We found that descriptors are a valuable tool for Design of Experiments and can produce predictive and interpretable surrogate models.
Subsequently, a physical investigation of this reaction led to the discovery of an efficient catalyst-ligand system, which we studied by operando NMR and for which we identified a parametrised kinetic model. Turning then to ligands for asymmetric hydrogenation, we calculated versatile empirical descriptors, based on the similarity of atomic environments, for 102 chiral ligands in order to predict diastereoselectivity. Whilst the model fit was good, the model failed to accurately predict the performance of an unseen ligand family because of analogue bias. Physical knowledge then guided the selection of symmetrised physico-chemical descriptors, which produced more accurate predictive models for diastereoselectivity, including for an unseen ligand family. The contribution of this thesis is the development of novel and effective workflows and methodologies for process development. These open the door for process chemists to save time and resources, freeing them from routine work to focus instead on creatively designing new chemistry for future real-world applications.
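The Pareto set targeted by the multi-objective algorithm can be illustrated with a short non-dominated filter. This is a minimal sketch, not the thesis's Gaussian-process algorithm: it assumes both objectives (conversion and diastereoselectivity) are maximised, and the experimental results in the example are hypothetical values.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (conversion, diastereoselectivity), both to be maximised


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the observations that no other observation dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical experimental results, one tuple per solvent tested
results = [(0.90, 0.60), (0.70, 0.85), (0.65, 0.80), (0.95, 0.40), (0.50, 0.50)]
print(pareto_front(results))  # → [(0.9, 0.6), (0.7, 0.85), (0.95, 0.4)]
```

In a Bayesian optimisation loop the surrogate models decide which solvent to test next; this filter only extracts the trade-off set from results that have already been measured.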
2	Property-enriched fragment descriptors for adaptive QSAR / Descripteurs fragmentaux enrichis par propriété pour QSAR adaptatif. Ruggiu, Fiorella. 22 September 2014.
ISIDA property-enriched fragment descriptors were introduced as a general framework for numerically encoding molecular structures in chemoinformatics as counts of specific subgraphs whose atom vertices are coloured by a local property, notably pH-dependent pharmacophore types, force-field atom types, partial charges, logP increments and properties extracted from QSAR models. The descriptors give the user a wide choice of the level of resolution at which chemical information is extracted, so that they can be adapted to the problem being modelled. They were tested successfully on neighbourhood-behaviour and QSAR modelling challenges, with very promising results.
They showed excellent results in similarity-based virtual screening for analogue protease inhibitors and yielded highly predictive models of the octanol-water partition coefficient, chromatographic hydrophobicity index, hERG channel inhibition, acid dissociation constant, hydrogen-bond acceptor strength and GPCR binding affinity.
Keywords: QSAR, Molecular descriptors, Chemoinformatics, 543
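The idea of counting subgraphs whose atoms are coloured by a local property can be sketched for the simplest fragment type, linear atom sequences. This is an illustrative simplification, not the ISIDA implementation: the graph representation, the `coloured_sequence_counts` function and the colour labels (`Acc`, `Hyd`, `Don`) are hypothetical, and real fragment descriptors may use other fragment types and colourings.

```python
from collections import Counter
from typing import Dict, List


def coloured_sequence_counts(colours: List[str], adjacency: Dict[int, List[int]],
                             min_atoms: int = 2, max_atoms: int = 3) -> Counter:
    """Count simple atom paths of min_atoms..max_atoms atoms, labelled by atom colours."""
    counts: Counter = Counter()

    def extend(path: List[int]) -> None:
        if len(path) >= min_atoms:
            labels = tuple(colours[i] for i in path)
            # A path read forwards or backwards is the same fragment: use a canonical order
            key = min(labels, labels[::-1])
            counts["-".join(key)] += 1
        if len(path) == max_atoms:
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:
                extend(path + [neighbour])

    for start in range(len(colours)):
        extend([start])
    # Each path is enumerated once from each of its two ends, so halve the totals
    return Counter({key: n // 2 for key, n in counts.items()})


# Hypothetical four-atom chain coloured acceptor-hydrophobic-hydrophobic-donor
colours = ["Acc", "Hyd", "Hyd", "Don"]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(coloured_sequence_counts(colours, adjacency))
```

Changing the colouring (for example, from pharmacophore types to partial-charge bins) changes the fragment vocabulary without changing the counting code, which is the kind of adaptability the abstract describes.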
3	Automated Tools for Accelerating Development of Combustion Modelling. Yalamanchi, Kiran K. 09 1900.
The ever-increasing focus of policy-makers on environmental issues is pushing the combustion community to make combustion cleaner by optimizing combustion equipment to reduce emissions, improve efficiency and meet increasing energy demand. A major part of this involves advancing the modelling capabilities for these complex combustion systems, which combine computational fluid dynamics with detailed chemical kinetic models. A chemical kinetic model comprises a series of elementary reactions with corresponding kinetic rate parameters, together with species thermodynamic and transport data. The predictive capability of these models depends on how accurately the individual reaction rates and the thermodynamic and transport parameters are known. Only a minor fraction of the rate constants and thermodynamic properties in widely used kinetic mechanisms are experimentally derived or theoretically calculated; the remainder are approximated using rate rules (for rate constants) and group additivity methods (for thermodynamic properties). Recent works have highlighted the need for error checking when models are prepared using these approximations, but a useful community tool to perform such analysis is missing. In the initial part of this work, we developed a simple online tool to screen chemical kinetic mechanisms for bimolecular reactions exceeding collision limits. However, unphysically fast time scales can remain an issue even if all bimolecular reactions are within collision limits. Therefore, we also present a procedure to screen for ultra-fast reaction time scales using computational singular perturbation (CSP). Screening kinetic models is a necessary condition, but not a sufficient one.
Therefore, new approaches for simulating complex chemically reacting systems need to be explored. This work focuses on developing new methods for estimating thermodynamic data efficiently and accurately, thereby increasing compliance with the aforementioned screening. Machine learning (ML) has increasingly become a tool of choice for regression, replacing traditional function fitting. Group additivity assumes simple functions, derives their constants from existing data and uses these functions to estimate unknown values. ML algorithms do the same without fixing a specific function, thereby letting the algorithm learn the nonlinearity from the training data itself. As new data become available, ML models can learn from them and improve over time, whereas this need not happen with traditional methods. In the first part of the study, standard enthalpy data were collected from literature sources and ML models were built on these databases. Two different models were built and studied, one for a straight-chain species dataset and one for a cyclic species dataset. Molecular descriptors were used as inputs because the datasets collected from the literature are too small for sparse representations. As expected, these ML models showed a clear improvement over the group additivity method, and the improvement was more significant for cyclic species. Motivated by this benefit over group additivity, we went a step further. A homogeneous and accurate dataset is necessary for building an ML model that can generate thermodynamic data for kinetic models. With this in mind, an accurate thermodynamic database was built from ab initio calculations. The species in the dataset were taken from a detailed and well-established mechanism so as to cover all the species in a typical kinetic mechanism.
The calculations were performed at a high level of accuracy compared with other similar datasets in the literature. In the later part of this work, the dataset developed from ab initio calculations is used to develop ML models. Unlike the ML models built from the literature datasets, this database contains all the thermodynamic data required by kinetic models, viz. standard enthalpy, standard entropy and heat capacities at 300 K and higher temperatures. To numerically mimic real gasoline reactivity, surrogates are proposed to facilitate advanced engine design and to predict emissions by chemical kinetic modelling. However, chemical kinetic models cannot always accurately predict unregulated emissions, e.g. aldehydes, ketones and unsaturated hydrocarbons, which are important air pollutants. Therefore, we propose using machine-learning algorithms directly to achieve better predictions, circumventing the kinetic models. The combustion chemistry of fuels comprising 10 neat fuels, 6 primary reference fuels (PRFs) and 6 FGX surrogates was tested in a jet-stirred reactor. All experimental data were collected in the same setup to maintain uniformity and consistency. Measured species profiles of methane, ethylene, propylene, hydrogen, carbon monoxide and carbon dioxide were used to develop the machine-learning models. The models consider both chemical effects and physical conditions: chemical effects are described by functional groups, viz. primary, secondary, tertiary and quaternary carbons in the molecular structures, and physical conditions by temperature. Both machine-learning models used in this study showed good prediction accuracy. By expanding the experimental database, machine-learning models can be applied in future work to many other hydrocarbons for direct prediction.
Keywords: Thermodynamics, Enthalpy, Heat capacities, Chemical Kinetics, Machine learning, Molecular descriptors
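The collision-limit screen mentioned in this abstract can be sketched as a comparison between each bimolecular rate constant and a hard-sphere collision rate. This is a minimal sketch under stated assumptions, not the thesis's online tool: it assumes modified Arrhenius expressions k = A·T^n·exp(-Ea/RT) in cm³ mol⁻¹ s⁻¹, a single hypothetical collision diameter per reaction, and hypothetical reaction data; the dictionary keys and function names are illustrative.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
N_A = 6.02214076e23      # Avogadro constant, 1/mol
R = 8.314462618          # gas constant, J/(mol K)
AMU = 1.66053906660e-27  # atomic mass unit, kg


def collision_limit(mass_a_amu: float, mass_b_amu: float,
                    diameter_m: float, temperature_k: float) -> float:
    """Hard-sphere collision rate constant, cm^3 mol^-1 s^-1."""
    mu = mass_a_amu * mass_b_amu / (mass_a_amu + mass_b_amu) * AMU      # reduced mass, kg
    mean_speed = math.sqrt(8.0 * K_B * temperature_k / (math.pi * mu))  # mean relative speed, m/s
    cross_section = math.pi * diameter_m ** 2                           # m^2
    return N_A * cross_section * mean_speed * 1e6                       # m^3 -> cm^3


def rate_constant(a: float, n: float, ea_j_mol: float, temperature_k: float) -> float:
    """Modified Arrhenius expression; result has the same units as A."""
    return a * temperature_k ** n * math.exp(-ea_j_mol / (R * temperature_k))


def flag_reactions(reactions, temperatures):
    """Return (name, T) at the first temperature where a reaction exceeds its collision limit."""
    flagged = []
    for rxn in reactions:
        for t in temperatures:
            k = rate_constant(rxn["A"], rxn["n"], rxn["Ea"], t)
            if k > collision_limit(rxn["m1"], rxn["m2"], rxn["d"], t):
                flagged.append((rxn["name"], t))
                break
    return flagged


# Hypothetical reactions between species of 1 and 16 amu, 3.5 angstrom collision diameter
reactions = [
    {"name": "suspicious", "A": 1e17, "n": 0.0, "Ea": 0.0, "m1": 1.0, "m2": 16.0, "d": 3.5e-10},
    {"name": "plausible", "A": 1e13, "n": 0.0, "Ea": 0.0, "m1": 1.0, "m2": 16.0, "d": 3.5e-10},
]
print(flag_reactions(reactions, [300.0, 1000.0, 2000.0]))  # → [('suspicious', 300.0)]
```

A practical screening tool would also need to handle reverse rates and pressure-dependent expressions; this sketch checks forward modified Arrhenius rates only.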
4	PREDICTING ENERGETIC MATERIAL PROPERTIES AND INVESTIGATING THE EFFECT OF PORE MORPHOLOGY ON SHOCK SENSITIVITY VIA MACHINE LEARNING. Alex Donald Casey (9167681). 28 July 2020.
An improved understanding of energy localization ("hot spots") is needed to improve the safety and performance of explosives. In this work I establish a variety of experimental and computational methods to aid in the investigation of hot spots. In particular, the focus is on the implicit relationship between hot spots and energetic material sensitivity. To begin, I propose a technique to visualize and quantify the properties of a dynamic hot spot within an energetic composite subjected to ultrasonic mechanical excitation. The composite consists of an optically transparent binder and a countable number of HMX crystals. The evolving temperature field is measured by observing the luminescence from embedded phosphor particles and then applying the intensity ratio method. The spatial temperature precision is less than 2% of the measured absolute temperature in the temperature regime of interest (23-220 °C). The temperature field is mapped within an HMX-binder composite under periodic mechanical excitation.
Following this experimental effort, I examine the statistics behind the most prevalent and widely used sensitivity test (at least within the energetic materials community) and suggest adaptations that generalize the approach to bimodal latent distributions. Bimodal latent distributions may occur when manufacturing processes are inconsistent or when competing initiation mechanisms are present.
Moving to simulation work, I investigate how the internal void structure of a solid explosive influences initiation behavior, specifically the criticality of isolated hot spots, in response to a shock insult. In the last decade, there has been a significant modeling and simulation effort to investigate the thermodynamic response of shock-induced pore collapse in energetic materials. However, most of these studies largely ignore pore geometry and assume simplistic shapes, typically spheres. In this work, the influence of pore geometry on the sensitivity of shocked HMX is explored. A collection of pore geometries is retrieved from scanning electron micrographs of pressed HMX samples. The shock-induced collapse of these geometries is simulated using CTH, and the response is reduced to a binary "critical" / "sub-critical" result. The simulation results are used to assign to each pore geometry the minimum threshold velocity required to produce a critical response. The pore geometries are then encoded as numerical representations, and a functional mapping from pore shape to threshold velocity is developed using supervised machine-learned models. The resulting models demonstrate good predictive capability, and their relative performance is explored. The established models are exposed via a web application to further investigate which shape features most heavily influence sensitivity.
Finally, I develop a convolutional neural network capable of directly parsing the 3D electronic structure of a molecule, described by spatial point data for charge density and electrostatic potential and represented as a 4D tensor. This method effectively bypasses the need to construct complex representations, or descriptors, of a molecule. This is beneficial because the accuracy of a machine-learned model depends on the input representation. Ideally, input descriptors encode the essential physics and chemistry that influence the target property. Thousands of molecular descriptors have been proposed, and proper feature selection requires considerable domain expertise or exhaustive and careful statistical down-selection. In contrast, deep learning networks are capable of learning rich data representations. This provides a compelling motivation to use deep learning networks to learn molecular structure-property relations from "raw" data. The convolutional neural network model is jointly trained on over 20,000 molecules that are potentially energetic materials (explosives) to predict dipole moment, total electronic energy, Chapman-Jouguet (C-J) detonation velocity, C-J pressure, C-J temperature, crystal density, HOMO-LUMO gap and solid-phase heat of formation. To my knowledge, this is the first use of the complete 3D electronic structure for machine learning of molecular properties.
Keywords: Mechanical Engineering, energetic materials, machine learning, sensitivity, molecular descriptors, pore geometry, convolutional neural networks
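One way to formalise a bimodal latent distribution for a go/no-go sensitivity test is to treat each sample's initiation threshold as drawn from a two-component normal mixture, so that the probability of a "go" at a given stimulus level is a weighted sum of two normal CDFs. The sketch below assumes this mixture form and shows the corresponding log-likelihood that a fit would maximise; it is an illustration, not the thesis's adapted method, and the parameter names and stimulus values are hypothetical.

```python
import math
from typing import Iterable, Tuple


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def go_probability(stimulus: float, weight: float,
                   mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """P(go) when the latent threshold follows weight*N(mu1, sigma1) + (1 - weight)*N(mu2, sigma2)."""
    return (weight * normal_cdf(stimulus, mu1, sigma1)
            + (1.0 - weight) * normal_cdf(stimulus, mu2, sigma2))


def log_likelihood(tests: Iterable[Tuple[float, bool]], weight: float,
                   mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Log-likelihood of (stimulus, went) outcomes under the mixture model."""
    total = 0.0
    for stimulus, went in tests:
        p = go_probability(stimulus, weight, mu1, sigma1, mu2, sigma2)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
        total += math.log(p) if went else math.log(1.0 - p)
    return total


# Hypothetical population: 60% of samples have thresholds near 30, 40% near 80 (arbitrary units)
print(go_probability(55.0, 0.6, 30.0, 5.0, 80.0, 5.0))
```

Between the two modes the go-probability sits on a plateau near the mixture weight (about 0.6 here), a shape a single normal latent distribution cannot reproduce; setting `weight=1.0` recovers the single-normal case.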
5	Prediction of Fluid Dielectric Constants. Liu, Jiangping. 07 July 2011.
The dielectric constant, or relative static permittivity, of a material represents the capacitance of the material relative to a vacuum and is important in many industrial applications. However, accurate experimental values are often unavailable, and current prediction methods lack accuracy and are often unreliable. A new QSPR (quantitative structure-property relationship) correlation for the dielectric constant of pure organic chemicals is developed and tested. The average absolute percent error is expected to be less than 3% for hydrocarbons and non-polar compounds and less than 18% for polar compounds, with dielectric constants ranging from 1.0 to 50.0. A local composition model for mixture dielectric constants is developed based on the Non-Random Two-Liquid (NRTL) model commonly used to correlate activity coefficients in vapor-liquid equilibrium data regression. The model is predictive in that no mixture dielectric constant data are used and it has no adjustable parameters. Predictions for 16 binary and six ternary systems at various compositions and temperatures compare favorably with extant correlations that require experimental data to fit an adjustable parameter in the mixing rule, and are significantly better than values predicted by Oster's equation, which also has no adjustable parameters. In addition, molecular dynamics (MD) simulations provide an alternative to analytic relations. Results suggest that MD simulations require very accurate force field models, particularly with respect to the charge distribution within the molecules, to yield accurate pure-component dielectric constants; with the development of more accurate pure-component force fields, however, simulations of mixtures with any number of components appear likely to be possible.
Using MD simulations, the impact of different parts of the force field on the calculated dielectric constant was examined. The results suggest that rotational polarization arising from the permanent dipole moments makes the dominant contribution to the dielectric constant. Changes in the dipole moment due to angle bending and bond stretching (distortion polarization) have less impact on the dielectric constant than rotational polarization due to permanent dipole alignment, with angle bending being more significant than bond stretching.
Keywords: dielectric constant, dipole moment, QSPR, molecular descriptors, NRTL, molecular dynamics simulations, polarization, Chemical Engineering
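The accuracy figures quoted for the QSPR correlation are average absolute percent errors (AAPE). A minimal sketch of that metric follows; the predicted and experimental values in the example are hypothetical, and reporting it separately for polar and non-polar compounds simply means calling it on each subset.

```python
from typing import Sequence


def aape(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Average absolute percent error of predictions relative to experimental values."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("predicted and experimental must be non-empty and equal in length")
    errors = [abs(p - e) / abs(e) for p, e in zip(predicted, experimental)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical dielectric constants: predictions vs experiment for three compounds
print(aape([2.0, 20.0, 30.0], [2.0, 25.0, 30.0]))
```

Because each error is divided by the experimental value, the same absolute deviation weighs far more for a low-permittivity hydrocarbon than for a highly polar compound.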
6	Bacterial attachment to polymeric materials correlates with molecular flexibility and hydrophilicity. Sanni, O., Chang, Chien-Yi, Anderson, D.G., Langer, R., Davies, M.C., Williams, P.M., Williams, P., Alexander, M.R., Hook, A.L. 09 December 2014.
A new class of material resistant to bacterial attachment, formed from polyacrylates with hydrocarbon pendant groups, has been discovered. In this study, the relationship between the nature of the hydrocarbon moiety and resistance to bacteria is explored by comparing cyclic, aromatic and linear chemical groups. A correlation is shown between bacterial attachment and a parameter derived from the partition coefficient and the number of rotatable bonds of the materials' pendant groups. This correlation holds for 86% of the hydrocarbon pendant moieties surveyed, quantitatively supporting the previous qualitative observation that bacteria are repelled from poly(meth)acrylates containing a hydrophilic ester group when the pendant group is both rigid and hydrophobic. This insight will help inform and predict the further development of polymers resistant to bacterial attachment. Funding: Wellcome Trust (grant number 085245) and EMRP (IND56).
Keywords: Low-fouling, Molecular descriptors, Polymer microarrays, Pseudomonas aeruginosa, Ion mass spectrometry
7	Multi-cavity molecular descriptor interconnections: Enhanced protocol for prediction of serum albumin drug binding. Akawa, O.B., Okunlola, F.O., Alahmdi, M.I., Abo-Dya, N.E., Sidhom, P.A., Ibrahim, M.A.A., Shibl, M.F., Khan, Shahzeb, Soliman, M.E.S. 03 November 2023.
The role of human serum albumin (HSA) in the transport of molecules means that it is involved in determining drug distribution and metabolism. Because the optimization of ADME properties is closely tied to HSA binding, understanding this binding is imperative to the drug discovery process. Various in silico predictive tools currently complement the drug discovery process; however, predicting possible ligand-binding sites on HSA has posed several challenges. Herein, we present an in-depth case for predicting HSA-ligand binding sites using multi-cavity molecular descriptors, exploiting all experimentally available crystallized HSA-bound drugs. Unlike previously proposed models in the literature, our approach establishes an in-depth correlation between the physicochemical properties of available crystallized HSA-bound drugs and the characteristics of different HSA binding sites in order to precisely predict the binding sites of investigational molecules. Molecular descriptors such as the number of hydrogen bond donors (nHD), number of heteroatoms (nHet), topological polar surface area (TPSA), molecular weight (MW) and distribution coefficient (LogD) were correlated against HSA binding site characteristics, including hydrophobicity, hydrophilicity, enclosure, exposure, contact, site volume and donor/acceptor ratio. The descriptors nHD, TPSA, LogD, nHet and MW were found to be the most informative, providing baseline information for predicting the serum albumin binding site. We believe that these associations may form the bedrock for establishing a solid correlation between physicochemical properties and albumin binding site architecture.
The information presented in this report could be critical for rational drug design as well as for drug delivery, bioavailability and pharmacokinetics.
Keywords: Human serum albumin, Physicochemical properties, Drug binding, HSA prediction models, Molecular descriptors
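Correlating drug descriptors against binding-site characteristics can be sketched as a Pearson correlation table in which each drug contributes its descriptor values and the characteristics of the site it occupies. The abstract does not state which statistic was used, so Pearson's r here is an assumption, and the descriptor and site values in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(descriptors: Dict[str, List[float]],
                      site_properties: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """r for every (descriptor, site property) pair; list position i refers to the same drug."""
    return {(d, s): pearson_r(dv, sv)
            for d, dv in descriptors.items()
            for s, sv in site_properties.items()}


# Hypothetical values for four HSA-bound drugs
descriptors = {"TPSA": [40.0, 60.0, 90.0, 120.0], "nHD": [0.0, 1.0, 1.0, 3.0]}
site_properties = {"hydrophilicity": [0.2, 0.35, 0.5, 0.8]}
for (d, s), r in correlation_table(descriptors, site_properties).items():
    print(f"{d} vs {s}: r = {r:.2f}")
```

With only a few crystallized complexes per site, such coefficients are noisy, so a table like this is a screening aid rather than a predictive model.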
8	A Hierarchical Graph for Nucleotide Binding Domain 2. Kakraba, Samuel. 01 May 2015.
One of the most prevalent inherited diseases is cystic fibrosis. This disease is caused by mutations in a membrane protein, the cystic fibrosis transmembrane conductance regulator (CFTR). CFTR is known to function as a chloride channel that regulates the viscosity of the mucus lining the ducts of a number of organs. Most of the prevalent CFTR mutations are located in the first of its two nucleotide binding domains, nucleotide binding domain 1 (NBD1). However, some mutations in nucleotide binding domain 2 (NBD2) can also cause cystic fibrosis. In this work, a hierarchical graph is built for NBD2. Using this model, we examine the consequences of single point mutations on NBD2. We compare the wild-type structure with eight of the most prevalent mutations and observe how NBD2 is affected by each of them.
Keywords: Cystic fibrosis, Mutation, Graph-theoretic Models, NBD2, Molecular Descriptors, Nested Graph, Bioinformatics, Discrete Mathematics and Combinatorics, Epidemiology, Other Applied Mathematics
9	Emprego de redes neurais e de descritores moleculares em quimiotaxonomia da família Asteraceae / Use of Neural Networks and Molecular Descriptors in Chemotaxonomy of the Asteraceae Family. Scotti, Marcus Tullius. 18 July 2008.
This work describes the development of a new chemoinformatics tool, SISTEMATX, that enabled a chemotaxonomic analysis of the Asteraceae family using new molecular parameters, as well as a quantitative structure-activity relationship (QSAR) study of compounds produced by this plant group. The Asteraceae, one of the largest families of angiosperms, is chemically characterized by the production of sesquiterpene lactones (SLs). A total of 1111 SLs, extracted from 658 species, 161 genera, 63 subtribes and 15 tribes of the Asteraceae, were registered as two-dimensional structures in SISTEMATX and associated with their botanical source. From this encoding, the system derived the degree of oxidation and the three-dimensional structure of each SL. These data, linked to their botanical origin, were exported to a text file, from which several types of molecular descriptors were generated. These molecular parameters were correlated with the average degree of oxidation per tribe and were selected by multiple linear regression using a genetic algorithm. Equations with 0.725 ≤ r² ≤ 0.981 and 0.647 ≤ Q²cv ≤ 0.725 were obtained with only one descriptor, making it possible to identify some structural characteristics related to the degree of oxidation. No relationship was found between the degree of oxidation of the SLs and the evolution of the tribes of the Asteraceae.
The molecular descriptors were also used as input data to separate the botanical occurrences with self-organizing maps (unsupervised Kohonen networks). The maps generated with each descriptor block separated the Asteraceae tribes with total hit rates between 66.7% and 83.6%. These results show clear similarities among the Heliantheae, Helenieae and Eupatorieae tribes and between the Anthemideae and Inuleae tribes. These observations agree with the systematic classifications proposed by Bremer, which use mainly morphological and also molecular data. The same approach was used to separate the branches of the Heliantheae tribe according to Stuessy's classification, which is based on the chromosome numbers of the subtribes. The resulting self-organizing maps separated branches A and C into two distinct regions, with high total hit rates ranging from 81.79% to 92.48%. Both studies demonstrate that molecular descriptors can be used as a tool for classifying taxa at low hierarchical levels, such as tribes and subtribes. It was also shown that the chemical markers partially corroborate the classifications based on morphological and molecular data. Descriptors obtained from fragments or from the two-dimensional representation of the SL structures were sufficient to obtain significant results; descriptors based on three-dimensional representations did not improve the results. An additional study related the chemical structure, represented by the same molecular descriptors, to the cytotoxic activity of 37 SLs against tumor cells derived from human nasopharyngeal carcinoma (KB). An equation with significant statistical indices (r² = 0.826, Q²cv = 0.743) was obtained.
The five descriptors selected in the most statistically significant equation give a global description of the steric properties and electronic characteristics of each molecule, which helped identify structural fragments important for cytotoxic activity. The model showed that guaianolide- and pseudoguaianolide-type carbon skeletons occur in the SLs with the highest cytotoxic activity.
Keywords: Asteraceae, Botany (Classification), Chemotaxonomy, Kohonen, Self-Organizing Maps, Molecular Descriptors, Natural products, Neural Networks
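The r² and Q²cv statistics quoted for the one-descriptor equations can be illustrated for a simple linear regression. This sketch assumes Q²cv is the leave-one-out cross-validated coefficient, 1 - PRESS/SS_total, which is a common definition; the abstract does not specify the cross-validation scheme, the genetic-algorithm descriptor selection is not reproduced, and the data are hypothetical.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor values must not all be equal")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def r2_and_q2_loo(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (r^2, leave-one-out Q^2) for a one-descriptor linear model."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    a, b = fit_line(x, y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict it
        a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a_i + b_i * x[i])) ** 2
    return 1.0 - ss_residual / ss_total, 1.0 - press / ss_total


# Hypothetical descriptor values and mean oxidation degrees for five tribes
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
oxidation = [1.1, 1.9, 3.2, 3.8, 5.3]
r2, q2 = r2_and_q2_loo(descriptor, oxidation)
print(f"r2 = {r2:.3f}, Q2(LOO) = {q2:.3f}")
```

For ordinary least squares each leave-one-out residual is at least as large as the corresponding fitted residual, so Q² never exceeds r²; a large gap between them is a warning sign of overfitting.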
10	Emprego de redes neurais e de descritores moleculares em quimiotaxonomia da família Asteraceae / Use of Neural Networks and Molecular Descriptors in Chemotaxonomy of the Asteraceae Family Marcus Tullius Scotti 18 July 2008 (has links) This work describes the development of SISTEMATX, a new chemoinformatic tool that enabled a chemotaxonomic analysis of the Asteraceae family using new molecular parameters, as well as a quantitative structure-activity relationship (QSAR) study of compounds produced by this plant group. The Asteraceae, one of the largest angiosperm families, is chemically characterized by the production of sesquiterpene lactones (SLs). A total of 1111 SLs, isolated from 658 species, 161 genera, 63 subtribes and 15 tribes of the Asteraceae, were represented and registered in two dimensions in SISTEMATX and linked to their botanical sources. From this encoding, the system derived the degree of oxidation and the three-dimensional structure of each SL. These data, together with the botanical information, were exported to a text file from which several types of molecular descriptors were generated. The descriptors were correlated with the mean degree of oxidation per tribe and were selected by multiple linear regression using a genetic algorithm. Equations with statistical coefficients in the ranges 0.725 ≤ r² ≤ 0.981 and 0.647 ≤ Q²cv ≤ 0.725 were obtained with only one descriptor, which allowed the identification of some structural features related to the degree of oxidation. No relationship was found between the degree of oxidation of the SLs and the evolution of the tribes of the Asteraceae.
The molecular descriptors were also used as input data to separate the botanical occurrences with self-organizing maps (unsupervised Kohonen networks). The maps generated with each descriptor block separated the Asteraceae tribes with total hit rates between 66.7% and 83.6%. These results reveal similarities among the Heliantheae, Helenieae and Eupatorieae tribes, and between the Anthemideae and Inuleae tribes, in agreement with the systematic classification proposed by Bremer, which relies mainly on morphological data and also on molecular data. The same approach was used to separate the branches of the Heliantheae tribe according to Stuessy's classification, which is based on the chromosome numbers of the subtribes. The resulting self-organizing maps separated branches A and C into two distinct regions, with high total hit rates ranging from 81.79% to 92.48%. Both studies show that molecular descriptors can serve as a tool for classifying taxa at low hierarchical levels such as tribes and subtribes, and that the chemical markers partially corroborate the classifications based on morphological and molecular data. Descriptors derived from fragments or from the two-dimensional representation of the SL structures were sufficient to obtain significant results; descriptors based on the three-dimensional representation of the structures did not improve them.
In a parallel study, the chemical structures, represented by the same molecular descriptors, were related to the cytotoxic activity of 37 SLs against KB cells, a tumour cell line derived from human nasopharyngeal carcinoma. An equation with significant statistical indices (r² = 0.826, Q²cv = 0.743) was obtained. The five descriptors in this most statistically significant equation provide a global description of the steric and electronic characteristics of each molecule, which helped identify structural fragments important for the cytotoxic activity. The model shows that the guaianolide and pseudoguaianolide carbon skeletons occur in the SLs with the highest cytotoxic activity. Asteraceae, Botany (Classification), Chemotaxonomy, Kohonen, Molecular Descriptors, Natural Products, Neural Networks, Self-Organizing Maps
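The self-organizing-map separation of tribes and its "total hit" rates can be illustrated with a minimal pure-Python Kohonen map. This is a sketch under stated assumptions, not the SISTEMATX implementation: the rectangular grid, Gaussian neighbourhood, linearly decaying learning rate, and the definition of the total hit rate (share of compounds whose winning neuron carries their own tribe label by majority vote) are all assumptions, and every function name is hypothetical.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]
Weights = Dict[Node, List[float]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights: Weights, x: Sequence[float]) -> Node:
    """Grid node whose weight vector is closest (Euclidean) to descriptor vector x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data: List[List[float]], rows: int, cols: int,
              epochs: int = 50, lr0: float = 0.5, seed: int = 0) -> Weights:
    """Online Kohonen training with a Gaussian neighbourhood on a rectangular grid."""
    rng = random.Random(seed)
    # Initialise every neuron with a copy of a randomly chosen sample.
    weights: Weights = {(r, c): list(rng.choice(data))
                        for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])   # pull neuron towards the sample
            step += 1
    return weights


def total_hit_rate(weights: Weights, data: List[List[float]], labels: List[str]) -> float:
    """Fraction of samples whose winning neuron's majority label equals their own label."""
    votes: Dict[Node, Counter] = defaultdict(Counter)
    for x, label in zip(data, labels):
        votes[best_matching_unit(weights, x)][label] += 1
    correct = sum(counter.most_common(1)[0][1] for counter in votes.values())
    return correct / len(data)
```

Under this definition the rate rewards neurons that collect compounds from a single tribe, so it depends on the grid size: larger maps tend to give purer neurons and therefore higher rates, which is worth keeping in mind when comparing hit rates across descriptor blocks.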
