  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Estimation de l'écotoxicité de substances chimiques par des méthodes à noyaux / Estimation of the ecotoxicity of chemical substances by kernel methods

Villain, Jonathan 24 June 2016
In chemistry, and more particularly in chemoinformatics, QSAR (Quantitative Structure-Activity Relationship) models are increasingly studied. They provide in silico estimates of the properties of chemical compounds, including ecotoxicological properties. These models are theoretically valid only for a given class of compounds (their validity domain) and are sensitive to outliers. This PhD thesis focuses on building robust global models (covering as many compounds as possible) to predict the ecotoxicity of chemical compounds towards the alga P. subcapitata, and on determining a validity domain from which to infer how well a model can predict the toxicity of a given compound. These robust statistical models are based on a quantile approach applied to linear regression and to Support Vector Machine regression.
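The abstract gives no equations, so the following is only an illustrative sketch of the quantile idea: quantile regression replaces the squared error with the pinball (check) loss, which is what limits the pull of outliers. The example assumes a single hypothetical descriptor `x` and toy data containing one gross outlier, fits a linear median regression (tau = 0.5) exactly, and compares it with least squares. It is not the thesis's code, and the SVM-based quantile variant is not shown.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check (pinball) loss of quantile regression: tau*r if r >= 0, else (tau - 1)*r."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def fit_quantile_line(xs, ys, tau=0.5):
    """Exact linear quantile regression y ~ a + b*x for small toy data sets.

    When the x values are not all equal, an optimal solution of the
    quantile-regression linear program passes through at least two
    observations (one per parameter), so trying every line through a pair of
    points with distinct x values and keeping the cheapest one is exact.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line: not a valid y ~ a + b*x model
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(pinball_loss(y - (intercept + slope * x), tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def fit_least_squares_line(xs, ys):
    """Ordinary least squares, shown only for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical descriptor/toxicity pairs following y = 1 + 2x, plus one gross outlier.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 9, 11, 100]
print(fit_quantile_line(xs, ys, tau=0.5))  # -> (1.0, 2.0): median fit ignores the outlier
print(fit_least_squares_line(xs, ys))      # slope is pulled far above 2 by the outlier
```

Enumerating pairs costs O(n^3) (n^2 candidate lines, n residuals each), so it only suits toy data; realistic data sets would call for a linear-programming solver.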
2

Chemoinformatics for green chemistry

Liu, Tao January 2010
This thesis focuses on the development of quantitative structure-property relationship (QSPR) models for physicochemical properties, e.g. vapor pressure and partition coefficients. Such models can be used to estimate the environmental distribution and transformation of pollutants or to characterize solvent properties. Here, chemoinformatics was used as an efficient modeling tool to help produce safe chemicals based on green chemistry principles. Experimental data are available for only a limited number of chemicals; however, theoretical molecular descriptors can be used to model all organic compounds. In this thesis, we developed and validated global and local QSPR models for the vapor pressure of liquid and subcooled liquid organic compounds, in which perfluorinated compounds (PFCs) appeared as outliers because of their molecular properties. After the model was updated, the vapor pressure of PFCs, for which no reliable experimental data are available, was successfully predicted. We also used the n-octanol/water partition coefficient (Kow) and water solubility (Sw) to investigate the similarities and differences between linear solvation energy relationship (LSER) models and partial least squares projection to latent structures (PLS) models. Further, we developed QSPR models for predicting the melting and boiling points of PFCs using multiple linear regression (MLR), PLS and associative neural network (ASNN) approaches, and investigated the applicability domain for PFCs. Experimental, semi-empirical and theoretical quantitative structure-retention relationship (QSRR) models were used to accurately predict retention factors (logk) in reversed-phase liquid chromatography (RPLC). These models are useful for characterizing solvents, for determining the behavior and interactions of molecular structures, and for developing chromatographic methods.
In both the QSPR and QSRR models built with PLS, the first and second components captured the main information, which is related to van der Waals forces and polar interactions, and these results agree with those from LSER. The results showed that models of physicochemical properties and of retention factors (logk) in chromatographic systems can be successfully developed with PLS. PLS models were able to predict the physicochemical properties of organic compounds directly from theoretical descriptors, without prior synthesis, measurement or sampling. Furthermore, PLS can cope with collinearity in data sets, making it a rapid, inexpensive and highly efficient approach.
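To illustrate why PLS copes with collinear descriptors, here is a minimal NIPALS-style PLS1 sketch in plain Python. The two descriptors and the data are hypothetical, chosen so that the second descriptor is exactly twice the first, which makes ordinary multiple linear regression ill-posed; the software and settings used in the thesis are not known from the abstract.

```python
import math


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS on mean-centred data (single response)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # residual X block
    f = [v - y_mean for v in y]                                 # residual y
    components = []
    for _ in range(n_components):
        # Weight vector: descriptor-space direction of maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]       # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict one sample by projecting and deflating it component by component."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Hypothetical, perfectly collinear descriptors (d2 = 2 * d1): X^T X is singular,
# so MLR has no unique solution, yet one PLS component fits y = 3 * d1 + 1 exactly.
X = [[d, 2 * d] for d in (1, 2, 3, 4, 5)]
y = [3 * d + 1 for d in (1, 2, 3, 4, 5)]
model = pls1_fit(X, y, n_components=1)
print(round(pls1_predict(model, [10, 20]), 6))  # -> 31.0
```

The component extracts the single direction shared by both descriptors instead of trying to separate their individual effects, which is the property the abstract relies on when it says PLS overcomes collinearity.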
3

The development of bioinformatic and chemoinformatic approaches for structure-activity modelling and discovery of antimicrobial peptides

Fjell, Christopher David 05 1900
The emergence of pathogens resistant to available drug therapies is a pressing global health problem. Antimicrobial peptides (AMPs) may form new therapeutics to counter these pathogens. AMPs are key components of the mammalian innate immune system and are responsible for both direct killing and immunomodulatory effects in host defense against pathogenic organisms. This thesis describes computational methods for identifying novel natural and synthetic AMPs. A bioinformatic resource, AMPer, was constructed for the classification and discovery of gene-coded AMPs, consisting of a database of clustered known AMPs and a set of hidden Markov models (HMMs). One set of 146 clusters was based on the mature peptide sequence, and one set of 40 clusters was based on the propeptide sequence. The bovine genome was analyzed using the AMPer resources: 27 of the 34 known bovine AMPs were identified with high confidence, and up to 69 AMPs were predicted to be novel peptides. One novel cathelicidin AMP was experimentally verified as up-regulated in response to infection in bovine intestinal tissue. A chemoinformatic analysis was performed to model the antibacterial activity of short synthetic peptides. Using high-throughput screening data for the activities of over 1400 peptides of diverse sequence, quantitative structure-activity relationship (QSAR) models were created using artificial neural networks and physical characteristics of the peptides, including three-dimensional atomic structure. The models were used to predict the activity of a set of approximately 100,000 peptide sequence variants, and the ranked predictions proved very accurate: when 200 peptides were synthesized and screened across four levels of expected activity, 94% of the top 50 peptides expected to have the highest level of activity were found to be highly active.
Several promising candidates were synthesized at high quality and tested against several multi-antibiotic-resistant pathogens, including clinical strains of Pseudomonas aeruginosa, Staphylococcus aureus, Enterococcus faecalis and Escherichia coli. These peptides were found to be highly active against these pathogens as determined by minimal inhibitory concentration; this serves as independent confirmation of the effectiveness of high-throughput screening and in silico analysis for identifying peptide antibiotic drug leads. / Medicine, Faculty of / Medicine, Department of / Experimental Medicine, Division of / Graduate
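The screening outcome reported above is a hit rate among the top-ranked predictions (94% of 50 peptides, i.e. 47). A minimal sketch of that kind of evaluation follows; the peptide names, scores and function names are hypothetical, and the enrichment factor is a standard extra metric that the abstract does not mention.

```python
def hit_rate_at_k(scores, confirmed_active, k):
    """Fraction of the k highest-scoring candidates that were confirmed active."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # ties keep insertion order
    top = ranked[:k]
    return sum(1 for name in top if name in confirmed_active) / len(top)


def enrichment_factor(scores, confirmed_active, k):
    """Hit rate in the top k divided by the hit rate of the whole screened set."""
    base_rate = sum(1 for name in scores if name in confirmed_active) / len(scores)
    return hit_rate_at_k(scores, confirmed_active, k) / base_rate


# Hypothetical predicted activities and screening outcome.
scores = {"pep1": 0.9, "pep2": 0.8, "pep3": 0.7, "pep4": 0.2}
active = {"pep1", "pep3"}
print(hit_rate_at_k(scores, active, 2))      # -> 0.5
print(enrichment_factor(scores, active, 1))  # -> 2.0

# The abstract's figure: 94% of the top 50 corresponds to 47 peptides.
print(round(0.94 * 50))  # -> 47
```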
6

Property-enriched fragment descriptors for adaptive QSAR / Descripteurs fragmentaux enrichis par propriété pour QSAR adaptatif

Ruggiu, Fiorella 22 September 2014
ISIDA property-enriched fragment descriptors were introduced as a general framework for numerically encoding molecular structures in chemoinformatics as counts of specific subgraphs in which atom vertices are coloured by a local property, notably pH-dependent pharmacophore types, force-field identifiers, partial charges, logP increments, or properties extracted from a QSAR model. The descriptors give the user a wide choice of the resolution at which chemical information is captured, so that they can be adapted to the problem being modelled. They were successfully tested in neighbourhood-behaviour and QSAR modelling challenges, with very promising results. They performed very well in similarity-based virtual screening for analogous protease inhibitors, and yielded highly predictive models of the octanol-water partition coefficient, chromatographic hydrophobicity index, hERG channel inhibition, acid dissociation constant, hydrogen-bond acceptor strength and GPCR binding affinity.
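A rough sketch of the "coloured subgraph counts" idea, restricted to linear atom sequences of up to three atoms, shows how the same molecular graph yields different descriptor keys depending on which property colours the atoms. The colour labels `Hp` and `HDA` are hypothetical, and ISIDA's actual fragment types, colouring schemes and naming are not reproduced here.

```python
from collections import Counter


def coloured_path_counts(adjacency, colours, max_atoms=3):
    """Count linear atom sequences (simple paths of 1..max_atoms atoms) keyed by atom colours."""
    paths = set()

    def extend(path):
        forward, backward = tuple(path), tuple(reversed(path))
        paths.add(min(forward, backward))  # a path and its reverse are the same fragment
        if len(path) < max_atoms:
            for neighbour in adjacency[path[-1]]:
                if neighbour not in path:
                    extend(path + [neighbour])

    for atom in adjacency:
        extend([atom])

    counts = Counter()
    for path in paths:
        labels = [colours[a] for a in path]
        # Canonical key: the lexicographically smaller reading direction.
        counts[min("-".join(labels), "-".join(reversed(labels)))] += 1
    return counts


# Ethanol heavy atoms: C0-C1-O2.
ethanol = {0: [1], 1: [0, 2], 2: [1]}
by_element = {0: "C", 1: "C", 2: "O"}
by_role = {0: "Hp", 1: "Hp", 2: "HDA"}  # hypothetical pharmacophore-style colouring
print(coloured_path_counts(ethanol, by_element))
print(coloured_path_counts(ethanol, by_role))
```

Swapping the `colours` mapping is the only change needed to move from element-based to property-based descriptors, which mirrors the adaptability the abstract describes.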
7

Computational strategies to identify, prioritize and design potential antimalarial agents from natural products

Egieyeh, Samuel Ayodele January 2015
Philosophiae Doctor - PhD / Introduction: There is an urgent need to develop novel antimalarial drugs in view of the mounting disease burden and the emerging resistance of malaria parasites to the drugs currently in use. A large number of natural products, especially those used in ethnomedicine for malaria, have shown varying in-vitro antiplasmodial activities. Facilitating antimalarial drug development from this wealth of natural products is an imperative and laudable mission. However, limited resources, high costs, low prospects of success and the high cost of failure during preclinical and clinical studies may militate against pursuit of this mission. Chemoinformatics techniques can simulate and predict essential molecular properties required to characterize compounds, thus eliminating the cost of equipment and reagents for essential preclinical studies, especially on compounds that may fail during drug development. Therefore, applying chemoinformatics techniques to natural products with in-vitro antiplasmodial activities may facilitate the identification and prioritization of those with potential for novel mechanisms of action, desirable pharmacokinetics and a high likelihood of development into antimalarial drugs. In addition, unique structural features mined from these natural products may serve as templates for designing new potential antimalarial compounds. Method: Four chemoinformatics techniques were applied to a collection of selected natural products with in-vitro antiplasmodial activity (NAA) and currently registered antimalarial drugs (CRAD): molecular property profiling, molecular scaffold analysis, machine learning and design of a virtual compound library. Molecular property profiling included computation of key molecular descriptors and physicochemical properties, molecular similarity analysis, estimation of drug-likeness, in-silico pharmacokinetic profiling and exploration of the structure-activity landscape.
Analysis of variance was used to assess statistically significant differences in these parameters between NAA and CRAD. Next, molecular scaffold exploration and diversity analyses were performed on three datasets (NAA, CRAD and malaria data from Medicines for Malaria Venture (MMV)) using scaffold counts and cumulative scaffold frequency plots. Scaffolds from NAA were compared with those from CRAD and MMV, and a Scaffold Tree was generated for all the datasets. Third, machine learning approaches were used to build four regression and four classifier models from the bioactivity data of NAA using molecular descriptors and molecular fingerprints. The models were built and refined by leave-one-out cross-validation and evaluated on an independent test dataset. The applicability domain (AD), which defines the limits within which the models' predictions are reliable, was estimated from the training dataset and validated with the test dataset. Chemical features possibly associated with the reported antimalarial activities of the compounds were also extracted. Lastly, virtual compound libraries were generated from the unique molecular scaffolds identified in NAA. The virtual compounds were characterized by evaluating selected molecular descriptors, toxicity profiles, structural diversity relative to CRAD, and predicted antiplasmodial activity. Results: For the molecular property profiling, a total of 1040 natural products were selected and 13 molecular descriptors were analyzed. Significant differences between NAA and CRAD were observed for at least 11 of the molecular descriptors. Molecular similarity and chemical space analysis identified NAA that were structurally diverse from CRAD. Over 50% of NAA were identified as having desirable drug-like properties. However, nearly 70% of NAA were identified as potentially "promiscuous" compounds.
Structure-activity landscape analysis highlighted compound pairs that formed "activity cliffs". Overall, prioritization strategies for the natural products with in-vitro antiplasmodial activities were proposed. The scaffold exploration and analysis showed that CRAD exhibited the greatest scaffold diversity, followed by NAA and then MMV. Scaffolds unique to NAA, not contained in any compound of the CRAD dataset, were identified. The Scaffold Tree showed the preponderance of ring systems in NAA and identified virtual scaffolds, which may be potential bioactive compounds or may help elucidate possible synthetic routes to NAA. In the machine learning study, the regression and classifier models most suitable for NAA were the M5P model tree (correlation coefficient = 0.84) and Sequential Minimal Optimization (accuracy = 73.46%), respectively. The test dataset fell within the applicability domain (AD) defined by the training dataset. The amine group was observed to be essential for antimalarial activity in both the NAA and MMV datasets, but hydroxyl and carbonyl groups may also be relevant in the NAA dataset. Characterization of the virtual compound library showed significant differences (p < 0.05) from currently registered antimalarial drugs in some molecular descriptors (molecular weight, log partition coefficient, hydrogen bond donors and acceptors, polar surface area, shape index, chiral centres and synthetic feasibility). Tumorigenic and mutagenic substructures were absent from a large proportion (> 90%) of the virtual compound library. The virtual compound libraries showed sufficient structural diversity, and the majority were structurally diverse from currently registered antimalarial drugs. Finally, up to 70% of the virtual compounds were predicted to be active antiplasmodial agents.
Conclusions: Molecular property profiling of natural products with in-vitro antiplasmodial activities (NAA) and currently registered antimalarial drugs (CRAD) produced a wealth of information that may guide decisions and facilitate antimalarial drug development from natural products, and led to a prioritized list of natural products with in-vitro antiplasmodial activities. Molecular scaffold analysis identified unique scaffolds and virtual scaffolds from NAA that possess desirable drug-like properties, making them ideal starting points for antimalarial drug design. The machine learning study built, evaluated and identified sufficiently accurate regression and classifier models, which were used for virtual screening of natural compound libraries to mine possible antimalarial compounds without the expense of bioactivity assays. Finally, a good proportion of the generated virtual compounds were structurally diverse from currently registered antimalarial drugs and are potentially active antiplasmodial agents. Filtering and optimization may lead to a collection of virtual compounds with unique chemotypes that could be synthesized and added to screening decks against Plasmodium.
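The abstract does not say how the applicability domain was defined. One common choice, assumed here, is a distance-based AD: a query compound lies inside the domain if its distance to the nearest training compound in descriptor space does not exceed the mean plus `z` standard deviations of the training set's own nearest-neighbour distances. The `z = 0.5` default and the toy two-descriptor data are assumptions for illustration only.

```python
import math
from statistics import mean, pstdev


def nearest_distance(x, pool):
    """Euclidean distance from descriptor vector x to its nearest neighbour in pool."""
    return min(math.dist(x, p) for p in pool)


def fit_domain_threshold(train, z=0.5):
    """Threshold = mean + z * std of each training compound's nearest-neighbour distance."""
    nn = [nearest_distance(x, train[:i] + train[i + 1:]) for i, x in enumerate(train)]
    return mean(nn) + z * pstdev(nn)


def in_domain(x, train, threshold):
    """True if x is at least as close to the training set as the threshold allows."""
    return nearest_distance(x, train) <= threshold


# Hypothetical training compounds described by two scaled descriptors.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = fit_domain_threshold(training)
print(threshold)                                   # -> 1.0
print(in_domain((0.5, 0.5), training, threshold))  # -> True
print(in_domain((5.0, 5.0), training, threshold))  # -> False
```

Checking a test set against such a threshold is one way to make a statement like "the test dataset fell within the AD" concrete; leverage- or range-based definitions are equally common alternatives.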
8

Cheminformatics for genome-scale metabolic reconstructions

May, John W. January 2015
Genome-scale metabolic reconstructions are an important resource in the study of metabolism. They provide both a system-level and a component-level view of the biochemical transformations of metabolites. As more reconstructions are created, integrating and reasoning about their contents remains a challenge. This thesis focuses on the development of computational methods that allow on-demand comparison and alignment of metabolic reconstructions. A novel method is introduced that uses chemical structure representations to identify equivalent metabolites between reconstructions. A graph-theoretic representation allows metabolites that do not match exactly to be identified and reasoned about. A key advantage is that the method uses the contents of reconstructions directly and does not rely on creating or using a common reference. To annotate reconstructions with chemical structure representations, an interactive desktop application is introduced. The application assists in the creation and curation of metabolic information using manual, semi-automated and automated methods. Chemical structure representations can be retrieved, drawn or generated to allow precise metabolite annotation. Processing chemical information requires efficient, optimised algorithms. Several areas are addressed, and implementations have been contributed to the Chemistry Development Kit. Rings are a fundamental property of chemical structures; therefore, multiple ring definitions and fast algorithms for them are explored. Conversion and standardisation between structure representations present a challenge, and efficient algorithms to determine aromaticity, assign a Kekulé form and generate tautomers are detailed. Many enzymes are selective for, and specific to, stereochemistry. Methods for the identification, depiction, comparison and description of stereochemistry are described.
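Assigning a Kekulé form can be framed as a matching problem: each atom that needs a double bond must receive exactly one, from a neighbour that also needs one. The naive backtracking sketch below, on hypothetical ring graphs, only illustrates that framing; it is not the Chemistry Development Kit's algorithm, which the abstract describes only as efficient, and backtracking can become exponential on large fused systems.

```python
def kekulize(adjacency, needs_double):
    """Assign double bonds so that each atom in needs_double gets exactly one.

    Returns a set of bonds (frozensets of two atom indices), or None if no
    Kekulé form exists for the given atoms.
    """
    partner = {}
    atoms = sorted(needs_double)

    def solve(idx):
        while idx < len(atoms) and atoms[idx] in partner:
            idx += 1  # already paired with an earlier atom
        if idx == len(atoms):
            return True
        a = atoms[idx]
        for b in adjacency[a]:
            if b in needs_double and b not in partner:
                partner[a], partner[b] = b, a
                if solve(idx + 1):
                    return True
                del partner[a], partner[b]  # undo and try the next neighbour
        return False

    if not solve(0):
        return None
    return {frozenset((a, b)) for a, b in partner.items()}


def ring(n):
    """Adjacency lists of an n-membered ring."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


# Benzene: all six ring atoms need a double bond.
print(sorted(tuple(sorted(b)) for b in kekulize(ring(6), set(range(6)))))
# -> [(0, 5), (1, 2), (3, 4)]
# Pyrrole-like ring: atom 4 (the NH) takes no double bond.
print(kekulize(ring(5), {0, 1, 2, 3}))
```

An odd number of atoms needing double bonds, as in `kekulize(ring(5), set(range(5)))`, has no perfect matching and returns `None`.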
9

Molecular similarity and xenobiotic metabolism

Adams, Samuel E. January 2010
MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism, has been developed. The algorithm is based on a statistical analysis of the occurrences of atom-centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and has been shown to be of comparable accuracy to current best-in-class tools while making much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound's metabolism in a highly responsive, interactive manner. MetaPrint2D assigns a confidence score to each prediction it generates, based on the availability of relevant data and the degree to which the compound is modelled by the algorithm. In the course of evaluating MetaPrint2D, a novel metric for assessing the performance of site-of-metabolism predictions was introduced. It overcomes the bias from molecule size and from the number of sites of metabolism that is inherent in the most commonly reported metrics for evaluating site-of-metabolism predictions. This data-mining approach has been augmented with a set of reaction type definitions to produce MetaPrint2D-React, which predicts the types of transformation a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. The results suggest that the method's ability to predict metabolic transformations depends strongly on how relevant the training data are to the query compounds. MetaPrint2D has been released as an open-source software library, and both MetaPrint2D and MetaPrint2D-React are available to chemists through the Unilever Centre for Molecular Science Informatics website.
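As a deliberately simplified illustration of mining atom-centred environments (not MetaPrint2D's actual fingerprints or scoring), the sketch below labels each atom with its element plus its neighbours' labels out to a given radius, counts how often each environment occurs in training substrates and how often it is a site of metabolism, and scores new atoms by that ratio. The training molecules and their metabolism sites are hypothetical toy data, and `None` marks environments never seen in training, loosely echoing the idea that confidence depends on the availability of relevant data.

```python
from collections import Counter


def environment_labels(adjacency, elements, radius):
    """Atom-centred circular labels: own label plus sorted neighbour labels, iterated."""
    labels = dict(elements)
    for _ in range(radius):
        labels = {
            atom: labels[atom] + "(" + ",".join(sorted(labels[n] for n in adjacency[atom])) + ")"
            for atom in adjacency
        }
    return labels


def train_occurrences(molecules, radius=1):
    """Count each environment overall and at known sites of metabolism."""
    seen, reacted = Counter(), Counter()
    for adjacency, elements, sites in molecules:
        for atom, env in environment_labels(adjacency, elements, radius).items():
            seen[env] += 1
            if atom in sites:
                reacted[env] += 1
    return seen, reacted


def score_atoms(model, adjacency, elements, radius=1):
    """Fraction of training occurrences that were metabolised; None if never seen."""
    seen, reacted = model
    envs = environment_labels(adjacency, elements, radius)
    return {atom: (reacted[env] / seen[env] if seen[env] else None) for atom, env in envs.items()}


# Hypothetical training data: (adjacency, elements, sites of metabolism).
ethanol = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"}, {1})
methanol = ({0: [1], 1: [0]}, {0: "C", 1: "O"}, {0})
model = train_occurrences([ethanol, methanol])

propanol = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(score_atoms(model, propanol, {0: "C", 1: "C", 2: "C", 3: "O"}))
# -> {0: 0.0, 1: None, 2: 1.0, 3: 0.0}
```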
10

Emprego de ferramentas de quimioinformática no estudo do perfil metabólico de plantas e na desreplicação de matrizes vegetais / Application of chemoinformatic tools in the study of plant metabolic profiles and dereplication

Oliveira, Tiago Branquinho 10 September 2015
After the emergence of the computing era, with its special applications in chemistry, information on substances from natural sources can be stored in databases. This creates an opportunity to use natural product databases and chemoinformatic tools such as quantitative structure-retention relationship (QSRR) studies to speed up the identification of substances in metabolomic studies.
This thesis proposes three QSRR studies as well as the construction of a database (AsterDB) of chemical structures from the Asteraceae family and related information (e.g. botanical and taxonomic occurrences, biological activity, analytical information) to assist the dereplication of substances in plant extracts. The first study was carried out with 39 sesquiterpene lactones (STLs) analysed using two solvent systems (MeOH-H2O 55:45 and MeCN-H2O 35:65), three groups of structural descriptors (2D-descr, 3D-1conf and 3D-weigh), two training:test splits (26:13 and 29:10), four descriptor-selection algorithms (best first, linear forward selection (LFS), greedy stepwise and genetic algorithm (GA)), three model sizes (four, five and six descriptors) and two modelling methods (partial least squares (PLS) and artificial neural networks (ANN)). The second study used 50 compounds from different chemical classes to assess differences between compounds analysed individually and in mixtures on three different instruments with two chromatographic methods. The third used 2,635 chemical structures with an external test set common to all models (25%, n = 656), three methods for splitting the training and test sets (partitioning based on the response and on 2D and 3D predictors), three model sizes selected by GA and two modelling methods (multiple linear regression (MLR) and feed-forward neural networks with Bayesian regularization (BRNN)). AsterDB was designed to be populated gradually and currently contains about 2,000 chemical structures. The first QSRR study produced good models able to estimate the logarithm of the retention factor (logk) of STLs, with P2 > 0.81 for the MeCN-H2O system. The second study showed no statistically significant difference between substances analysed individually and in mixtures (p-value > 0.95), and the correlation between the two chromatographic methods and instruments was reproducible (R > 0.95).
These analyses showed that QSRR models developed for one chromatographic method and instrument could be transferred to another instrument by using compounds analysed in common. The third study produced models with good predictive ability (P2 > 0.81), covering a broad region of chemical space with statistical rigour. In conclusion, this information can serve as a pilot platform for data analysis to assist the dereplication of plant extracts in metabolomic studies.
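The QSRR response in these studies is logk, the logarithm of the chromatographic retention factor. A small sketch of that quantity and of a predictive squared correlation for an external test set follows. The abstract does not define P2, so `1 - PRESS / SS_total` computed with the test-set mean is an assumption (some authors use the training-set mean instead); base-10 logarithms and the retention times are likewise assumed for illustration.

```python
import math


def retention_factor(t_r, t_0):
    """Retention factor k = (tR - t0) / t0; tR and t0 in the same time unit."""
    return (t_r - t_0) / t_0


def log_k(t_r, t_0):
    """Base-10 logarithm of the retention factor (assumed base)."""
    return math.log10(retention_factor(t_r, t_0))


def predictive_r2(observed, predicted):
    """Assumed P2 definition: 1 - PRESS / SS_total, using the test-set mean."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - press / ss_total


# Hypothetical: a compound eluting at 6.0 min with a dead time of 1.0 min.
print(retention_factor(6.0, 1.0))                        # -> 5.0
print(round(log_k(6.0, 1.0), 4))                         # -> 0.699
print(predictive_r2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # -> 0.5
```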
