21 |
The Estimation of Selected Physicochemical Properties of Organic Compounds
Al-Antary, Doaa Tawfiq (January 2018)
Thermodynamic relationships are used to predict several physicochemical properties of organic compounds. As described in chapter one, the UPPER model (Unified Physicochemical Property Estimation Relationships) has been used to predict nine essential physicochemical properties of pure compounds. It was developed almost 25 years ago and has been validated by the Yalkowsky group for almost 2000 aliphatic, aromatic, and polyhalogenated hydrocarbons. UPPER is based on a group of additive and nonadditive descriptors along with a series of well-accepted thermodynamic relationships. In this model, the two-dimensional chemical structure is the only input needed.
Chapter (1) extends the applicability of UPPER to hydrogen bonding and non-hydrogen bonding aromatic compounds with several functional groups, such as alcohol, aldehyde, ketone, carboxylic acid, carbonate, carbamate, amine, amide, nitrile, aceto, and nitro compounds. The total data set includes almost 3000 compounds. Aside from the enthalpies and entropies of melting and boiling, no training set is used for the calculation of the properties. The results show that UPPER enables a reasonable estimation of all the considered properties.
Chapter (2) uses a modification of the van't Hoff equation to predict the solubility of organic compounds in dry octanol. The equation expresses a linear relationship between the logarithm of a solute's solubility in octanol and its melting temperature. More than 620 experimentally measured octanol solubilities, collected from the literature, are used to validate the equation without any regression or fitting. The average absolute error of the prediction is 0.66 log units.
Chapter (3) compares a statistics-based model for the prediction of aqueous solubility with the existing general solubility equation (GSE).
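For orientation, the general solubility equation mentioned above is commonly written as log Sw = 0.5 - 0.01 (MP - 25) - log Kow, where Sw is the molar aqueous solubility, MP is the melting point in °C (taken as 25 for liquids), and Kow is the octanol-water partition coefficient. The sketch below is a minimal illustration under that assumed form; the function and argument names are hypothetical and are not taken from the thesis.

```python
def gse_log_solubility(melting_point_c: float, log_kow: float) -> float:
    """Estimate log10 molar aqueous solubility with the general solubility equation.

    Assumed form: log Sw = 0.5 - 0.01 * (MP - 25) - log Kow.
    Compounds melting at or below 25 °C are treated as liquids, so the
    melting-point term is clamped to zero for them.
    """
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_kow
```

The only inputs are a melting point and a partition coefficient, which is why the equation needs no fitted training set.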
|
22 |
Robust Machine Learning QSPR Models for Recognizing High Performing MOFs for Pre-Combustion Carbon Capture and Using Molecular Simulation to Study Adsorption of Water and Gases in Novel MOFs
Dureckova, Hana (January 2018)
Metal organic frameworks (MOFs) are a class of nanoporous materials formed by the self-assembly of inorganic and organic structural building units (SBUs). MOFs show great promise for many applications owing to their record-breaking internal surface areas and tunable pore chemistry. This thesis focuses on gas separation applications of MOFs in the context of carbon capture and storage (CCS) technologies, which are expected to play a key role in mitigating anthropogenic CO2 emissions in the near future. In the first part of the thesis, robust machine learning quantitative structure-property relationship (QSPR) models are developed to predict CO2 working capacity and CO2/H2 selectivity for pre-combustion carbon capture using the most topologically diverse database of hypothetical MOF structures constructed to date (358,400 MOFs, 1166 network topologies). The support vector regression (SVR) models are developed on a training set of 35,840 MOFs (10% of the database) and validated on the remaining 322,560 MOFs. The most accurate models for CO2 working capacities (R² = 0.944) and CO2/H2 selectivities (R² = 0.876) are built from a combination of six geometric descriptors and three novel y-range normalized atomic-property-weighted radial distribution function (AP-RDF) descriptors. When MOFs are ranked by a normalized adsorbent performance score, the top-1000 lists obtained from grand canonical Monte Carlo (GCMC) calculations and from SVR predictions share 309 MOFs. This work shows that SVR models can account for the topological diversity exhibited by MOFs.
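The validation protocol described above (train on 10% of the database, then score predictions on the remainder with R²) can be sketched as follows. This is an illustrative outline only: the helper names are hypothetical, the random split is an assumption (the abstract does not say how the training set was chosen), and no SVR model is fitted here.

```python
import random

def split_indices(n_total: int, train_fraction: float, seed: int = 0):
    """Randomly partition range(n_total) into training and validation index lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # assumed random selection
    n_train = round(n_total * train_fraction)
    return indices[:n_train], indices[n_train:]

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Database size and training fraction as quoted in the abstract.
train_idx, valid_idx = split_indices(358_400, 0.10)
print(len(train_idx), len(valid_idx))
```

With the figures quoted in the abstract, the split reproduces the stated set sizes of 35,840 training and 322,560 validation structures.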
In the second project of this thesis, computational simulations are performed on a MOF, CALF-20, to examine the chemical and physical properties linked to its exceptional water resistance. We predict the atomic positions in the crystal structure of the bulk phase of CALF-20, for which only a powder X-ray diffraction pattern is available, from a single-crystal X-ray diffraction pattern of a metastable phase of CALF-20. Using the predicted CALF-20 structure, we simulate adsorption isotherms of CO2 and N2 under dry and humid conditions, which are in excellent agreement with experiment. Snapshots of CALF-20 taken during water sorption simulations reveal that water molecules in a given pore adsorb and desorb together because of hydrogen bonding. Binding sites and binding energies of CO2 and water in CALF-20 show that the preferential CO2 uptake at low relative humidities is driven by the stronger binding energy of CO2 in the MOF, whereas the sharp increase in water uptake at higher relative humidities is driven by strong intermolecular interactions between water molecules.
In the third project of this thesis, we use computational simulations to investigate the effects of residual solvent on the CH4 and N2 adsorption properties of Ni-BPM. Single-crystal X-ray diffraction data show that there are two sets of positions (Set 1 and Set 2) that can be occupied by the 10 residual DMSO molecules in the Ni-BPM framework. GCMC simulations of CH4 and N2 uptake in Ni-BPM reveal that CH4 uptake is in closest agreement with experiment when the 10 DMSO molecules are distributed between the two sets of positions in equal ratio (Mixed Set). Severe under-prediction and over-prediction of CH4 uptake are observed when the DMSO molecules are placed in the Set 1 and Set 2 positions, respectively. Binding site analysis shows that the CH4 binding sites within the Ni-BPM framework overlap with the Set 1 DMSO positions but not with the Set 2 DMSO positions, which explains the deviations in CH4 uptake observed for these cases. Binding energy calculations reveal that CH4 molecules are most stabilized when the DMSO molecules occupy the Mixed Set of positions.
|
23 |
Computer-Aided Molecular Design (CAMD) Using Signature Molecular Descriptors To Identify New Corrosion Inhibitors for Steel Reinforced Concrete
Mohamed, Ahmed (02 August 2023)
No description available.
|
24 |
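Modelado predictivo de sistemas complejos para informática molecular : desarrollo de métodos de selección y aprendizaje de características en presencia de incertidumbre / Predictive modeling of complex systems for molecular informatics: development of feature selection and feature learning methods in the presence of uncertainty
Cravero, Fiorella (13 March 2020)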
There is currently a growing need to guide the in silico discovery of new industrial polymers through supervised Machine Learning approaches that identify structure-property correlations from the information contained in materials databases, where each material is characterized by Molecular Descriptors (MDs). These correlations are known as Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models. They can be used to predict properties of interest before the chemical synthesis stage, thereby helping to accelerate the design of new materials and to reduce their development costs.
QSAR/QSPR modeling is already widely used in Molecular Informatics for Computer-Aided Drug Design. However, polymeric materials are significantly more complex than small molecules such as drugs, since they are collections of macromolecules comprising thousands of chains, each formed by linking together a very large number of Structural Repetitive Units (SRUs). These chains have different molecular weights (or chain lengths) and, in turn, appear with different frequencies within each material. This phenomenon, known as polydispersity, is the main reason why many computational approaches developed for rational drug design are neither directly applicable nor sufficiently effective in Polymer Informatics.
The main objective of this thesis is to contribute solutions to various issues of computational representation and algorithm design that arise during the QSPR modeling of properties of high molecular weight polydisperse polymers, with special emphasis on the molecular descriptor (feature) selection problem. Because chains of different lengths occur with varying frequencies, the structural description of a polymeric material contains uncertainty, in contrast with the typical structural characterization of a small molecule. Nevertheless, owing to the complexity of modeling this uncertainty, most QSAR/QSPR studies to date have used simple, univalued molecular models; that is, they calculate the molecular descriptors for a single weight instance among all the possible chains that make up a material. In particular, almost all of these studies use descriptors calculated on a single SRU, without taking polydispersity into account. In this context, this thesis investigates different Feature Selection and Feature Learning alternatives for QSPR modeling with uncertainty, exploring the effectiveness of more realistic computational representations of polymeric materials.
First, a hybrid methodology that employs both Feature Selection and Feature Learning algorithms is presented in order to evaluate the maximum predictive capability achievable with the traditional univalued representation based on a single SRU (the URE model, after the Spanish acronym). Second, new univalued representations based on average molecular weights are proposed, called the Mn and Mw molecular models, whose ability to infer QSPR models is compared with that of the URE molecular model.
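The Mn and Mw models are named after the number-average and weight-average molecular weights. As a reminder of what those two averages are, the sketch below computes them from a chain-length distribution using the standard definitions Mn = Σ N_i M_i / Σ N_i and Mw = Σ N_i M_i² / Σ N_i M_i. The function names and the toy distribution are hypothetical illustrations, not data or code from the thesis.

```python
from typing import Mapping

def number_average_mw(chain_counts: Mapping[float, float]) -> float:
    """Mn = sum(N_i * M_i) / sum(N_i), for N_i chains of molecular weight M_i."""
    total_chains = sum(chain_counts.values())
    return sum(n * m for m, n in chain_counts.items()) / total_chains

def weight_average_mw(chain_counts: Mapping[float, float]) -> float:
    """Mw = sum(N_i * M_i**2) / sum(N_i * M_i)."""
    total_mass = sum(n * m for m, n in chain_counts.items())
    return sum(n * m * m for m, n in chain_counts.items()) / total_mass

# Toy distribution (made-up values): molecular weight -> number of chains.
toy_material = {10_000.0: 1.0, 20_000.0: 1.0}
mn = number_average_mw(toy_material)
mw = weight_average_mw(toy_material)
print(mn, mw, mw / mn)  # the ratio Mw/Mn is the dispersity
```

For a monodisperse sample both averages coincide; the gap between them grows as the chain-length distribution broadens, which is the information a single-SRU representation cannot carry.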
The next alternative proposed is a trivalued computational representation, which integrates the univalued URE, Mn, and Mw molecular models into a single database and thereby partially captures the polydispersity inherent to polymers. This representation improves the generalizability of the QSPR models obtained during supervised learning compared with those inferred from univalued representations. However, it still does not take into account the frequencies with which the different chain lengths appear within a material.
Finally, this thesis proposes a multivalued computational representation based on the actual polydispersity profile of a material, in which each descriptor is characterized by a discrete probability distribution. In this setting, the Feature Selection techniques used for univalued representations are no longer applicable, and algorithms that can operate on this new molecular model are needed. To meet this need, the design and implementation of an algorithm for selecting multivalued features are presented. This new method, Feature Selection for Random Variables with Discrete Distribution (FS4RVDD), achieves promising performance in all the experimental scenarios tested in this research.
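To make the idea of a multivalued descriptor concrete, the sketch below treats one descriptor as a discrete random variable: each chain length contributes a descriptor value, weighted by how often that chain length occurs. It only summarizes such a distribution by its mean and variance. It is not the FS4RVDD algorithm, whose details are not given in the abstract, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

def normalize_frequencies(frequencies: Sequence[float]) -> List[float]:
    """Turn raw chain frequencies into probabilities that sum to 1."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must have a positive sum")
    return [f / total for f in frequencies]

def distribution_summary(values: Sequence[float],
                         frequencies: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, variance) of a descriptor given as a discrete distribution."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    probs = normalize_frequencies(frequencies)
    mean = sum(p * v for p, v in zip(probs, values))
    variance = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, variance
```

A univalued representation keeps a single number per descriptor and so discards the spread that the variance captures here; a method operating on the full distribution can use it.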
|
25 |
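Predicción de propiedades de sustancias y materiales de interés en la industria química a través del desarrollo de métodos computacionales / Prediction of properties of substances and materials of interest in the chemical industry through the development of computational methods
Palomba, Damián (17 March 2014)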
The goal of this thesis is to develop predictive computational methods for specific properties of compounds of interest in the chemical industry, particularly in the pharmaceutical and polymeric materials industries. The working methodology is built on the Quantitative Structure/Property Relationship (QSPR) technique, which quantitatively relates different parameters of a chemical entity (for example, a small molecule or a polymer) to a well-defined property of that entity. The work is conceived as an interdisciplinary study, so that the QSPR technique is enriched with knowledge of the tests used to measure the target properties and, above all, with the physicochemical aspects involved.
The methodology was applied first to the prediction of properties of drugs and organic compounds in general and then to properties of polymeric materials. For drugs and organic compounds, the properties explored were physicochemical properties related to ADMET (absorption, distribution, metabolism, excretion, and toxicity) behavior: Human Intestinal Absorption (HIA) and Blood-Brain Barrier (BBB) penetration, both essential for the development of new drugs. Volatile organic compounds (VOCs), which are gases emitted from certain solids or liquids, were also studied. Their blood-to-liver partition coefficients (log Pliver) were predicted; these can be used in risk assessment and decision making in public health policy. In the field of polymeric materials, several properties were explored. One is a thermal property, the glass transition temperature (Tg), which is related to the mechanical performance and processability of the material; the others are mechanical properties derived from the uniaxial tensile test: elongation at break, strength at break, and tensile (Young's) modulus. These mechanical properties provide information on the ductility, strength, and stiffness of a polymeric material, respectively, and together with others define its structural application profile.
The thesis is broadly organized into two main blocks, according to the material to which the prediction is applied: drugs and volatile organic compounds (compounds of pharmaceutical and public health interest) on the one hand, and polymeric materials (materials of interest in the chemical industry) on the other. This structure reflects the significant molecular differences between the compounds from which the property to be predicted (the target property) is obtained, and these differences in turn give rise to the different approaches taken for each prediction.
The original contribution in the area of drugs and volatile organic compounds was the development of new predictive models for the aforementioned properties, using a semi-automatic approach (an automatic variable selection method combined with a manual selection guided by expert knowledge) that can also be applied to model other properties and other compounds. In addition, bringing physicochemical knowledge into the modeling phase led to more acceptable models, since they are easier to interpret and tend to generalize better to design (virtual) compounds, i.e., compounds not yet synthesized.
In the field of polymeric materials, the novel contributions were different models for predicting the thermal property and the mechanical properties mentioned above. A synthetic molecular prototype, consisting of a trimeric structure, was developed to represent the polymers. New descriptors for polymeric materials were proposed through an original treatment of polymer chains that distinguishes the fragments belonging to the main chain from those belonging to the side chain. A prediction model for Tg was obtained, enriched with the physicochemical knowledge underlying the phenomenon studied, and a detailed structural explanation of the model descriptors and their relation to the property was presented. The molecular prototype (trimer) was then validated against more complex structures (31 repeating units). For the mechanical properties, a working dataset for synthetic polymers, compiled and curated from available sources, was presented. Two kinds of descriptors were proposed: new polymer chain descriptors and experimental parameters. Finally, the usefulness of incorporating experimental information from the tensile test together with structural strategies was demonstrated, yielding more intelligent and interpretable tools for the design of new materials with a specific application profile.
|
26 |
Modeling and visualization of complex chemical data using local descriptors / La modélisation et la visualisation de données chimiques complexes en utilisant les descripteurs locaux
Glavatskikh, Marta (09 July 2018)
This work describes original approaches for predictive chemoinformatics modeling of molecular interactions and reactions as a function of both the structures of the interacting partners and the chemical environment (experimental conditions). Chemical structures were encoded by local ISIDA MA-based or CGR-based descriptors, which specifically target the active centers and their closest environment. The local descriptors were combined with parameters describing the experimental conditions, thereby encoding a particular chemical object. The methodology was successfully applied to QSPR modeling of thermodynamic and kinetic parameters of intermolecular interactions (halogen and hydrogen bonds), tautomeric equilibria, and chemical reactions (cycloaddition and SN1). The GTM method was applied for the first time to the modeling and visualization of mixed chemical data, and it successfully separates data clusters according to both chemical structures and experimental conditions.
|
27 |
Modélisation QSPR de solvants d’intérêt technologique : les liquides ioniques et les électrolytes pour batteries Li-ion / QSPR modelling of technologically interesting solvents: the ionic liquids and the electrolytes for Li-ion batteries
Delouis, Grace (26 September 2017)
This thesis is dedicated to the modelling of ionic liquids and of electrolytes for Li-ion batteries. We developed SVR models to predict nine properties of interest for these solvents. The models built for the ionic liquids allowed us to detect several problems and are freely available on the laboratory's website: infochim.u-strasbg.fr/webserv/VSEngine.html. The models built for the electrolytes were used to model candidates tested experimentally by our collaborators. Because the amount of data available for these solvents is limited, we also tested the transductive approach by means of Transductive Ridge Regression (TRR). We set up a protocol for optimizing the method's parameters and applied TRR to the solvents studied. The results obtained with TRR are slightly better than those of Ridge Regression, but the improvement remains modest if accidental deterioration of the model is to be avoided.
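As background for the Ridge Regression baseline mentioned above, the sketch below shows the ordinary (inductive) ridge estimate in its simplest form: one feature, no intercept. It is not the transductive variant studied in the thesis, and the function name and the `penalty` parameter name are hypothetical.

```python
from typing import Sequence

def ridge_slope(x: Sequence[float], y: Sequence[float], penalty: float) -> float:
    """Closed-form ridge solution for one feature and no intercept.

    Minimizes sum((y_i - w * x_i)**2) + penalty * w**2, which gives
    w = sum(x_i * y_i) / (sum(x_i**2) + penalty).
    """
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + penalty)
```

With a zero penalty this reduces to ordinary least squares; increasing the penalty shrinks the coefficient toward zero, which is the regularization that makes ridge models usable on small data sets such as those described here.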
|
28 |
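Modelos de predição do coeficiente de sorção no solo de pesticidas não iônicos: diferentes algoritmos de logP e uma abordagem alternativa de logS. / Prediction models for the soil sorption coefficient of nonionic pesticides: different logP algorithms and an alternative logS approach
Reis, Ralpho Rinaldo dos (17 May 2013)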
Collecting data on the effects of pesticides on the environment and its ecosystems is slow and costly. Therefore, significant research effort has focused on developing mathematical models to predict physical, chemical, or biological properties of environmental interest. The soil sorption coefficient normalized to organic carbon content (Koc) is a key physicochemical parameter used in environmental risk assessments of substances released into the environment. Accordingly, several logKoc prediction models that use the hydrophobic parameter (logP) or the logarithm of water solubility (logS) as descriptors have been reported in recent decades. Because reliable experimental values of logP or logS are often lacking, algorithms are used to calculate these properties. Although several such algorithms are readily available, scientific studies do not describe the procedure used to choose the algorithm employed in quantitative structure-property relationship (QSPR) studies. Furthermore, the strong correlation between logP and logS prevents their use in the same equation obtained by multiple linear regression. Since the sorption of a chemical compound in soil is related both to its water solubility and to its water/organic matter partitioning, models able to combine these two properties are expected to give more realistic results. This doctoral dissertation consists of two scientific papers. In the first, a study was carried out to examine how the choice of logP algorithm influences logKoc modeling. Models relating logKoc to logP were built using different freeware algorithms. All models were assessed on their statistical quality and predictive power. The results clearly showed that an arbitrary choice of algorithm may not yield the best prediction model. On the other hand, a good choice can yield simple models with statistical quality and predictive power comparable to those of more complex models. The second paper proposes an alternative approach to logKoc modeling, using a simple solubility descriptor, referred to here as the logarithm of solubility corrected by octanol/water partition (logSP). Models were built with this descriptor and with the conventional logP and logS descriptors, either alone or combined with other explanatory variables that are easy to interpret physicochemically. The models obtained were validated and compared with previously published models. The results showed that replacing the conventional descriptors with logSP led to simple models whose statistical quality and predictive power exceed those of more complex models found in the literature. / A coleta de dados relativos aos danos causados pelos pesticidas sobre o meio ambiente e
seus ecossistemas é lenta e onerosa. Desta maneira, grandes incentivos têm sido
destinados às pesquisas que visam à construção de modelos matemáticos para predição de
propriedades físicas, químicas ou biológicas de interesse ambiental. O coeficiente de sorção
no solo normalizado para o conteúdo de carbono orgânico (Koc) é um importante parâmetro
físico-químico utilizado nas avaliações de riscos ambientais das substâncias lançadas no
meio ambiente. Assim, vários modelos para predição de logKoc, utilizando o parâmetro
hidrofóbico (logP) ou o logaritmo da solubilidade em água (logS) como descritores, têm sido
publicados nas últimas décadas. Muitas vezes, em virtude da ausência de valores
experimentais confiáveis de logP ou logS, são usados algoritmos para o cálculo dessas
propriedades. Apesar da disponibilidade e facilidade de acesso a diversos algoritmos para
tal finalidade, os artigos científicos não descrevem o procedimento adotado para escolha do
algoritmo usado nos estudos QSPR. Além disto, a forte correlação entre logP e logS impede
que sejam usados em uma mesma equação obtida por regressão linear múltipla. Como o
processo de sorção de um composto químico no solo está relacionado tanto com sua
solubilidade em água como com sua partição água/matéria orgânica, espera-se que
modelos que sejam capazes de combinar essas duas informações possam gerar resultados
mais realistas. Este trabalho de tese é constituído de dois artigos. No primeiro artigo, foi feito
um estudo para verificar a influência da escolha do algoritmo de logP na modelagem de
logKoc. Foram construídos modelos que relacionam logKoc com logP a partir de diferentes
algoritmos livres disponíveis. Todos os modelos foram avaliados quanto às suas qualidades
estatísticas e poder de predição. Os resultados obtidos mostraram claramente que uma
escolha arbitrária deste algoritmo pode não levar ao melhor modelo de predição. Por outro
lado, uma boa escolha pode conduzir à obtenção de modelos simples com qualidades
estatísticas e poder de predição comparáveis a de modelos mais complexos. No segundo
artigo, o objetivo foi a proposição de uma abordagem alternativa para a modelagem de
logKoc, utilizando um descritor simples de solubilidade, aqui designado como logaritmo da
solubilidade corrigida pela partição octanol/água (logSP). Assim, foram construídos modelos
com tal descritor e também com os descritores convencionais logP e logS, isolados ou
associados com outras variáveis explicativas de fácil interpretação físico-química. Os
modelos obtidos foram validados e comparados com outros modelos publicados
anteriormente. Os resultados mostraram que o uso do descritor logSp em substituição aos
descritores convencionais conduziu à obtenção de modelos simples com qualidades
estatísticas e poder de predição superiores a de outros modelos mais complexos
encontrados na literatura.
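The collinearity argument in this abstract can be illustrated with a small sketch. It is not the author's procedure: the numbers are made-up illustrative values rather than data from the dissertation, and the helper names (`pearson_r`, `vif_two_descriptors`, `fit_line`) are hypothetical. It shows why two strongly correlated descriptors are problematic in one multiple linear regression (a large variance inflation factor) and how a single-descriptor model logKoc = a·logP + b can be fitted by ordinary least squares.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_descriptors(x1, x2):
    """Variance inflation factor in a two-descriptor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Made-up illustrative values, NOT measurements from the dissertation.
log_p = [1.0, 2.0, 3.0, 4.0, 5.0]
log_s = [-1.1, -1.9, -3.2, -3.8, -5.0]
log_koc = [1.5, 2.0, 2.5, 3.0, 3.5]

r = pearson_r(log_p, log_s)              # strongly negative for these values
vif = vif_two_descriptors(log_p, log_s)  # far above the common rule-of-thumb of 10
slope, intercept = fit_line(log_p, log_koc)

print(f"r(logP, logS) = {r:.3f}, VIF = {vif:.1f}")
print(f"logKoc = {slope:.2f} * logP + {intercept:.2f}")
```

A VIF well above 10 is a common rule of thumb for harmful collinearity, which is one way to see why a single combined descriptor may be preferable to entering logP and logS together.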
|
30 |
Développement de modèles QSPR pour la prédiction et la compréhension des propriétés amphiphiles des tensioactifs dérivés de sucre / Development of QSPR models for the prediction and better understanding of amphiphilic properties of sugar-based surfactants
Gaudin, Théophile, 30 November 2016
Sugar-based surfactants are the main family of bio-based surfactants and are good candidates to replace petroleum-based surfactants, since they originate from renewable resources and can perform as well as, or even better than, petroleum-based ones in various applications, such as formulation (detergents, cosmetics), enhanced oil or mineral recovery, etc. Several amphiphilic properties characterize surfactant performance in such applications, such as the critical micelle concentration, the surface tension at the critical micelle concentration, the efficiency, and the Krafft point. Predicting these properties would help identify surfactants with the desired properties more quickly. QSPR models are tools for predicting such properties, but no reliable QSPR model dedicated to these properties was identified for bio-based surfactants, and in particular for sugar-based surfactants. Such QSPR models were developed in this thesis. A reliable database is required to develop any QSPR model. For sugar-based surfactants, no existing database was identified for the targeted properties. This led to the construction of the first database of amphiphilic properties of sugar-based surfactants, whose wider exploitation is in progress. Analysis of this database highlighted various empirical relationships between the structure of these molecules and their amphiphilic properties, and made it possible to isolate the most reliable datasets, with the most homogeneous protocol possible, for the development of the QSPR models.
After a robust strategy was established for calculating the molecular descriptors used in the QSPR models, relying notably on conformational analysis of sugar-based surfactants and on descriptors calculated separately for the polar heads and the alkyl chains, different QSPR models were developed and validated, and their applicability domains defined, for the critical micelle concentration, the surface tension at the critical micelle concentration, the efficiency, and the Krafft point. For the first three properties, good quantitative models were obtained. While quantum chemical descriptors brought a significant gain in predictive power for the surface tension at the critical micelle concentration, and a slight gain for the critical micelle concentration, no gain was observed for the efficiency. For these three properties, simple models based on constitutional descriptors of the hydrophilic and hydrophobic parts of the molecule (such as atom counts) were also obtained. For the Krafft point, two qualitative decision trees were proposed that classify a molecule as soluble or insoluble in water at room temperature. Quantum chemical descriptors improved predictive power here as well, although a fairly reliable model based only on constitutional descriptors of the polar heads and alkyl chains was also obtained. Finally, we showed how these QSPR models can be used to predict the properties of new surfactants before synthesis in a screening context, or the missing properties of existing surfactants, and for the in silico design of new surfactants by combining different polar heads with different alkyl chains.
|