41

Enhancement of aeroelastic rotor airload prediction methods

Abras, Jennifer N. 02 April 2009 (has links)
The accurate prediction of rotor airloads is a topic of current interest in the rotorcraft community. The complex nature of this loading makes the problem especially difficult. Issues that must be considered include transonic effects on the advancing blade, dynamic stall on the retreating blade, and interactions of the wake vortices with the blades, fuselage, and other components. Numerous codes exist to perform these predictions, both aerodynamic and structural, but until recently each code refined either the structural or the aerodynamic aspect of the analysis without serious consideration of the other, using only simplified modules to represent the missing physics. More recent research has concentrated on combining high-fidelity CFD and CSD computations so that the most accurate codes available are used for both the structural and the aerodynamic aspects. The objective of this research is to evaluate and extend a range of prediction methods, comparing both accuracy and computational expense. The range spans several methods: the highest-accuracy method shown is a delta-loads coupling between an unstructured CFD code and a comprehensive code, while the lowest-accuracy but most efficient is a coupling of a free-wake model with a comprehensive code using simplified 2D aerodynamics. From there, methods to improve the efficiency and accuracy of the CFD code will be considered through the implementation of steady-state grid adaptation, a time-accurate low-Mach-number preconditioning method, and the use of fully articulated rigid blade motion. The exact formulation of the 2D aerodynamic model used in the CSD code will be evaluated, as will efficiency improvements to the free-wake code. The advantages of the free-wake code will be tested against a dynamic inflow model. A comparison of all of these methods will show the advantages and consequences of each combination, including the physics that each method can and cannot capture, through examination of how closely each method matches flight-test data.
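The delta-airloads coupling mentioned above is, in outline, an iterative exchange between the comprehensive (CSD) code and the CFD code. The sketch below is a hypothetical, heavily simplified illustration of that loop; the solver functions are placeholder stubs, not the actual codes used in the thesis.

```python
import numpy as np

# Placeholder stand-ins for the real solvers: a comprehensive rotorcraft code
# (structural dynamics + simplified 2D aerodynamics) and an unstructured CFD code.
def csd_trim_and_motions(delta_loads):
    """Hypothetical: trim the rotor with internal aero plus the delta correction, return blade motions."""
    return np.linspace(0.0, 1.0, 36) + 0.01 * delta_loads

def lifting_line_loads(motions):
    """Hypothetical: low-fidelity 2D airloads the comprehensive code uses internally."""
    return 0.9 * np.ones_like(motions)

def cfd_loads(motions):
    """Hypothetical: high-fidelity CFD airloads for the prescribed blade motions."""
    return np.ones_like(motions)

delta = np.zeros(36)                       # delta airloads correction, zero on the first pass
for it in range(20):                       # loose (per-revolution) coupling iterations
    motions = csd_trim_and_motions(delta)  # CSD trim using internal aero plus delta
    f_cfd = cfd_loads(motions)             # CFD airloads for those motions
    f_2d = lifting_line_loads(motions)     # loads the CSD code computed internally
    new_delta = f_cfd - f_2d               # correction applied on the next iteration
    if np.max(np.abs(new_delta - delta)) < 1e-6:
        break
    delta = new_delta
print(f"coupling iterations: {it + 1}")
```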
42

Novas metodologias de simulação do tipo Monte-Carlo via séries de Neumann aplicadas a problemas de flexão de placas / New Monte Carlo simulation methodologies via Neumann series applied to plate bending problems

Kist, Milton 04 July 2016 (has links)
Engineering is a field rich and vast in problems. Even when only the branch of structural engineering is considered, the number and variability of problems remain very large. In recent years, the increase in computational capacity has enabled the development of more complex and robust (stochastic) methods for solving structural problems that take uncertainty into account. Uncertainty may arise from randomness in material properties, support conditions, and loading. Many stochastic methods are based on Monte Carlo simulation; however, the direct Monte Carlo method has a high computational cost. Aiming at the development of new methodologies for solving structural problems, this thesis presents three new methodologies applied to stochastic plate bending problems, which constitute its scientific contribution. These methodologies, named Monte Carlo-Neumann with bound adjustment, Monte Carlo-Neumann mixed 1, and Monte Carlo-Neumann mixed 2, combine the Neumann series with the Monte Carlo method. To assess their efficiency in terms of accuracy and computational time, the methodologies were applied to stochastic bending problems of Kirchhoff plates on Winkler and Pasternak foundations, with randomness considered in the plate stiffness and in the stiffness coefficients of the supporting foundation.
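As an illustration only (not the exact formulations proposed in the thesis), the basic idea of combining a Neumann series with Monte Carlo simulation for a stochastic linear system K(omega) u = f can be sketched as follows: factor the mean stiffness K0 once, and approximate each sample's solution by a truncated Neumann expansion instead of refactoring the perturbed system. The matrices below are small synthetic stand-ins for a discretized plate problem.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50
A = rng.normal(size=(n, n))
K0 = A @ A.T + n * np.eye(n)          # synthetic mean stiffness (SPD stand-in for a plate model)
f = rng.normal(size=n)                # load vector
K0_inv = np.linalg.inv(K0)            # "factor" K0 once (a real code would use an LU/Cholesky factorization)

def neumann_solution(dK, n_terms=6):
    """Approximate (K0 + dK)^-1 f by the truncated series sum_k (-K0^-1 dK)^k K0^-1 f."""
    base = K0_inv @ f
    term = base.copy()
    u = base.copy()
    for _ in range(1, n_terms):
        term = -K0_inv @ (dK @ term)
        u += term
    return u

# Monte Carlo loop: random fluctuation of the stiffness (here a single random factor)
n_samples = 2000
max_disp = np.empty(n_samples)
for s in range(n_samples):
    alpha = rng.uniform(-0.2, 0.2)    # random relative stiffness fluctuation
    u = neumann_solution(alpha * K0)  # dK = alpha * K0 keeps the series convergent (|alpha| < 1)
    max_disp[s] = np.abs(u).max()

print(f"mean of max displacement: {max_disp.mean():.4e}")
print(f"std  of max displacement: {max_disp.std():.4e}")
```

The point of the expansion is that each sample reuses the single factorization of K0, which is where the saving over direct Monte Carlo (one factorization per sample) comes from.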
43

miRQuest: um middleware para investigação de miRNAs / miRQuest: a middleware for miRNA research

Ambrosio, Rosana Ressa Aguiar 30 September 2015 (has links)
Non-coding RNAs (ncRNAs) are transcribed but untranslated RNAs with essential roles in the regulation of cellular biological processes. Among the various classes of ncRNAs, miRNAs currently attract the greatest research interest from the scientific community. miRNAs are small RNAs of about 22 nucleotides (nt) that act as post-transcriptional inhibitors/silencers. First described at the end of the last century in Caenorhabditis elegans, miRNAs are now recognized as key regulators of gene expression in plants, animals, and many other eukaryotic organisms. Given their importance, identifying this class of ncRNA makes it possible to discover potential new microRNAs, as well as their regulatory roles, which may be linked to many biological processes. Bioinformatics, through in silico analysis of microRNAs (for example, via pattern-recognition approaches), has contributed greatly to the identification and annotation of this class of ncRNA. This has enabled the development of new computational techniques, methods, and approaches capable of contributing more efficiently to the analysis and interpretation of the large volume of biological data that has been generated at an ever higher rate, especially in recent years. Although a wide variety of computational approaches for microRNA identification have been described, most of them have some kind of limitation (e.g. outdated, no longer available). This work therefore presents miRQuest, an integrated system built with a layered development pattern, using middleware technology, on a web platform for miRNA research. It has two main functions: (i) the integration of different miRNA prediction tools for miRNA identification in a user-friendly environment; and (ii) benchmarking of these prediction tools. miRQuest does not introduce a new computational model for miRNA prediction; rather, it provides a new methodology that allows different miRNA identification techniques to be run simultaneously.
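As a rough illustration of the middleware idea (not miRQuest's actual implementation, and not the real tools' command-line interfaces), a minimal integration layer can register several prediction back-ends behind a common interface, run them all on the same input, and tabulate where they agree. The adapter functions below are hypothetical stubs.

```python
from typing import Callable, Dict, List, Set

# Each adapter takes a sequence (or a path to a FASTA file) and returns the set of
# candidate miRNA identifiers it predicts. These adapters are hypothetical stubs;
# real ones would wrap existing prediction tools (web services or local executables).
def tool_a(sequence: str) -> Set[str]:
    return {"cand-1", "cand-2"}

def tool_b(sequence: str) -> Set[str]:
    return {"cand-2", "cand-3"}

class MirnaMiddleware:
    """Minimal dispatcher: run every registered predictor and compare their outputs."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[[str], Set[str]]] = {}

    def register(self, name: str, adapter: Callable[[str], Set[str]]) -> None:
        self.tools[name] = adapter

    def run_all(self, sequence: str) -> Dict[str, Set[str]]:
        return {name: adapter(sequence) for name, adapter in self.tools.items()}

    def consensus(self, results: Dict[str, Set[str]], min_tools: int = 2) -> List[str]:
        counts: Dict[str, int] = {}
        for predictions in results.values():
            for candidate in predictions:
                counts[candidate] = counts.get(candidate, 0) + 1
        return sorted(c for c, n in counts.items() if n >= min_tools)

mw = MirnaMiddleware()
mw.register("tool_a", tool_a)
mw.register("tool_b", tool_b)
results = mw.run_all("AUGCUAGC")
print(results)
print("agreed by >= 2 tools:", mw.consensus(results))
```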
45

Non-asymptotic bounds for prediction problems and density estimation.

Minsker, Stanislav 05 July 2012 (has links)
This dissertation investigates learning scenarios in which a high-dimensional parameter has to be estimated from a sample of fixed size, often smaller than the dimension of the problem. The first part answers some open questions for the binary classification problem in the framework of active learning. Given a random couple (X,Y) with unknown distribution P, the goal of binary classification is to predict a label Y based on the observation X. A prediction rule is constructed from a sequence of observations sampled from P. The concept of active learning can be informally characterized as follows: on every iteration, the algorithm is allowed to request a label Y for any instance X that it considers to be the most informative. The contribution of this work consists of two parts: first, we provide minimax lower bounds for the performance of active learning methods. Second, we propose an active learning algorithm that attains nearly optimal rates over a broad class of underlying distributions and is adaptive with respect to the unknown parameters of the problem. The second part of this thesis is related to sparse recovery in the framework of dictionary learning. Let (X,Y) be a random couple with unknown distribution P. Given a collection of functions H, the goal of dictionary learning is to construct a prediction rule for Y given by a linear combination of the elements of H. The problem is sparse if there exists a good prediction rule that depends on a small number of functions from H. We propose an estimator of the unknown optimal prediction rule based on a penalized empirical risk minimization algorithm, and we show that the proposed estimator is able to take advantage of the possible sparse structure of the problem by providing probabilistic bounds for its performance.
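For illustration, a typical penalized empirical risk minimization program over a finite dictionary H = {h_1, ..., h_N} with an l1 penalty (a Lasso-type estimator shown only as an example; the thesis's exact penalty and assumptions may differ) reads:

```latex
\hat{\lambda} \in \operatorname*{arg\,min}_{\lambda \in \mathbb{R}^{N}}
\frac{1}{n}\sum_{i=1}^{n}\Bigl(Y_i-\sum_{j=1}^{N}\lambda_j h_j(X_i)\Bigr)^{2}
+\varepsilon\,\lVert\lambda\rVert_{1},
\qquad
\hat{f}=\sum_{j=1}^{N}\hat{\lambda}_j h_j ,
```

where the l1 penalty drives most coefficients to zero, so the learned rule depends on only a few dictionary elements when the problem is sparse.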
46

Theoretical Results and Applications Related to Dimension Reduction

Chen, Jie 01 November 2007 (has links)
To overcome the curse of dimensionality, dimension reduction is important and necessary for understanding the underlying phenomena in a variety of fields. Dimension reduction is the transformation of high-dimensional data into a meaningful representation in a low-dimensional space. It can be further classified into feature selection and feature extraction. This thesis is composed of four projects: the first two focus on feature selection, and the last two concentrate on feature extraction. The content of the thesis is as follows. The first project presents several efficient methods for the sparse representation of a multiple measurement vector (MMV); some theoretical properties of the algorithms are also discussed. The second project addresses the NP-hardness of penalized likelihood estimators, including penalized least squares estimators, penalized least absolute deviation regression, and penalized support vector machines. The third project focuses on the application of manifold learning to the analysis and prediction of 24-hour electricity price curves. The last project proposes a new Hessian-regularized nonlinear model for time-series prediction.
47

Optimization of nuclear, radiological, biological, and chemical terrorism incidence models through the use of simulated annealing Monte Carlo and iterative methods

Coyle, Jesse Aaron 18 January 2012 (has links)
A random search optimization method based on an analogous process to the slow cooling of metals is explored and used to find the optimum solution for a number of regression models that analyze nuclear, radiological, biological, and chemical (CBRN) terrorism targets. A non-parametric simulation based on historical data is also explored. Simulated series of 30 years and a 30-year extrapolation of historical data are provided. The inclusion of independent variables used in the regression analysis is based on existing work in the reviewed literature. CBRN terrorism data is collected from both the Monterey Institute's Weapons of Mass Destruction Terrorism Database and the START Global Terrorism Database. Building models similar to those found in the literature and running them against CBRN terrorism incidence data determines whether conventional terrorism indicator variables are also significant predictors of CBRN terrorism targets. The negative binomial model was determined to be the best regression model available for the data analysis. Two general types of models are developed: an economic development model and a political risk model. From the economic development model we find national GDP, GDP per capita, trade openness, and democracy to be significant indicators of CBRN terrorism targets. From the political risk model we find that corrupt, stable, and democratic regimes are more likely to experience a CBRN event. We do not find language/religious fractionalization to be a significant predictive variable. Similarly, we do not find ethnic tensions, involvement in external conflict, or a military government to have significant predictive value.
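As a hedged illustration of the kind of count-regression fit described above (not the thesis's data or code; the variables here are synthetic stand-ins for the indicators named in the abstract), a negative binomial model can be estimated with statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in data: one row per country-year with a count of CBRN incidents
# and hypothetical versions of the indicator variables named in the abstract.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "gdp": rng.normal(0, 1, n),
    "gdp_per_capita": rng.normal(0, 1, n),
    "trade_openness": rng.normal(0, 1, n),
    "democracy": rng.integers(0, 2, n),
})
rate = np.exp(-2.0 + 0.5 * df["gdp"] + 0.3 * df["democracy"])
df["incident_count"] = rng.poisson(rate)          # over-simplified count outcome

X = sm.add_constant(df[["gdp", "gdp_per_capita", "trade_openness", "democracy"]])
y = df["incident_count"]

# Negative binomial GLM; the dispersion parameter alpha is fixed here, whereas a
# full analysis would estimate it (e.g. with statsmodels' discrete NegativeBinomial model).
model = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0)).fit()
print(model.summary())
```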
48

Um estudo exploratório sobre o uso de diferentes algoritmos de classificação, de seleção de métricas, e de agrupamento na construção de modelos de predição cruzada de defeitos entre projetos / An exploratory study on the use of different classification, metric selection, and clustering algorithms to build cross-project defect prediction models

Satin, Ricardo Francisco de Pierre 18 August 2015 (has links)
Predicting defects in software projects is a complex task, especially for projects in early stages of development, which often provide little data for building prediction models. Cross-project defect prediction is indicated in such situations because it allows data from similar projects to be reused. This work proposes an exploratory study on the use of different classification, attribute selection, and clustering algorithms to build cross-project defect prediction models. The model was built using a performance measure, obtained by applying classification algorithms, as a way of finding and grouping similar projects. To this end, the combined application of 8 classification algorithms, 6 attribute selection approaches, and one clustering algorithm was studied on a dataset of 1283 projects, resulting in the construction of 61584 different prediction models. The classification and attribute selection algorithms had their performance evaluated through different statistical tests, which showed that Naive Bayes was the best-performing classifier among the 8 algorithms, and that the best-performing attribute selection pair was the CFS attribute evaluator combined with the Genetic Search method, compared with the other 6 pairs. Regarding the clustering algorithm, the proposal appears promising: the results show evidence that predictions using clustering were better than predictions made without any similarity grouping, while also reducing training and testing cost during the prediction process. A simplified sketch of this grouping-then-predicting idea is given below.
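The sketch below is a simplified analogue in scikit-learn, not the study's actual pipeline (which combined 8 classifiers and 6 attribute-selection pairs such as CFS with Genetic Search); the projects and defect labels are synthetic stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 60 "projects", each with a profile of code metrics per module
# and a defect label per module (a real study would use mined repository data).
def make_project(shift):
    X = rng.normal(loc=shift, scale=1.0, size=(80, 5))      # 80 modules, 5 metrics
    y = (X[:, 0] + X[:, 1] + rng.normal(size=80) > 2 * shift).astype(int)
    return X, y

projects = [make_project(shift) for shift in rng.uniform(0.0, 2.0, size=60)]

# Step 1: cluster projects by their mean metric profile (a crude notion of similarity).
profiles = np.array([X.mean(axis=0) for X, _ in projects])
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(profiles)

# Step 2: cross-project prediction: for one held-out target project, train a
# Naive Bayes model only on the other projects in the same cluster.
target = 0
same_cluster = [i for i in range(len(projects)) if clusters[i] == clusters[target] and i != target]
if not same_cluster:                                          # fallback if the cluster is a singleton
    same_cluster = [i for i in range(len(projects)) if i != target]
X_train = np.vstack([projects[i][0] for i in same_cluster])
y_train = np.concatenate([projects[i][1] for i in same_cluster])
X_test, y_test = projects[target]

clf = GaussianNB().fit(X_train, y_train)
print("F1 on held-out project:", round(f1_score(y_test, clf.predict(X_test)), 3))
```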
49

Modèles prudents en apprentissage statistique supervisé / Cautious models in supervised machine learning

Yang, Gen 22 March 2016 (has links)
In some areas of supervised machine learning (e.g. medical diagnostics, computer vision), predictive models are evaluated not only on their accuracy but also on their ability to obtain a more reliable representation of the data and of the knowledge they induce, in order to support cautious decision making. This is the problem studied in this thesis. Specifically, we examined two existing approaches from the machine learning literature for making models and predictions more cautious and more reliable: the framework of imprecise probabilities and that of cost-sensitive learning. Both areas aim to make learned models and inferences more reliable and cautious, yet few existing works have attempted to bridge them, due to both theoretical and practical problems. Our contributions consist in clarifying and resolving these problems. Theoretically, few existing studies have addressed how to quantify the different classification errors when set-valued predictions are produced and when the costs of mistakes are not equal (in terms of consequences). Our first contribution is therefore to establish general properties and guidelines for quantifying misclassification costs for set-valued predictions. These properties led us to derive a general formula, the generalized discounted cost (GDC), which makes it possible to compare classifiers whatever the form of their predictions (singleton or set-valued), taking into account a risk-aversion parameter. Practically, most classifiers based on imprecise probabilities cannot integrate generic misclassification costs efficiently, because the computational complexity increases by an order of magnitude (or more) when non-unitary costs are used. This problem led to our second contribution: the implementation of a classifier that can handle the probability intervals produced by imprecise probabilities together with generic error costs, with the same order of complexity as when standard probabilities and unitary costs are used. This is achieved with a binary decomposition technique, nested dichotomies. The properties and prerequisites of this technique were studied in detail; in particular, nested dichotomies are applicable to any imprecise probabilistic model and reduce the indeterminacy level of the imprecise model without loss of predictive power. Various experiments were conducted throughout the thesis to support our contributions. We characterized the behaviour of the GDC using ordinal data sets. These experiments highlighted the differences between a model that produces indeterminate predictions from standard probabilities and a model based on imprecise probabilities. The latter is generally more competent because it distinguishes two sources of indeterminacy (ambiguity and lack of information), although the joint use of both types of models is also of particular interest, as it can help the decision maker improve the data or the classifiers. In addition, experiments on a wide variety of data sets showed that using nested dichotomies significantly improves the predictive power of an imprecise model with generic costs.
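The nested-dichotomies decomposition itself can be sketched as follows. This is only an illustration with ordinary precise probabilistic classifiers and a fixed balanced split of the class set; the thesis combines the technique with imprecise probabilities and generic costs.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

class NestedDichotomies:
    """Multi-class probabilities from a binary tree of binary classifiers."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.tree_ = self._build(np.asarray(X), np.asarray(y), list(self.classes_))
        return self

    def _build(self, X, y, classes):
        if len(classes) == 1:
            return {"leaf": classes[0]}
        left, right = classes[: len(classes) // 2], classes[len(classes) // 2:]
        target = np.isin(y, right).astype(int)          # 1 means "right" branch
        clf = LogisticRegression(max_iter=1000).fit(X, target)
        return {
            "clf": clf,
            "left": self._build(X[target == 0], y[target == 0], left),
            "right": self._build(X[target == 1], y[target == 1], right),
        }

    def predict_proba(self, X):
        X = np.asarray(X)
        proba = np.zeros((len(X), len(self.classes_)))
        self._accumulate(self.tree_, X, np.ones(len(X)), proba)
        return proba

    def _accumulate(self, node, X, path_prob, proba):
        if "leaf" in node:
            proba[:, list(self.classes_).index(node["leaf"])] = path_prob
            return
        p_right = node["clf"].predict_proba(X)[:, 1]     # probability of the right branch
        self._accumulate(node["left"], X, path_prob * (1.0 - p_right), proba)
        self._accumulate(node["right"], X, path_prob * p_right, proba)

X, y = load_iris(return_X_y=True)
nd = NestedDichotomies().fit(X, y)
print(nd.predict_proba(X[:3]).round(3))                  # class probabilities from the dichotomy tree
```

Each class probability is the product of branch probabilities along the path from the root to that class's leaf, which is what makes the decomposition compatible with interval-valued branch probabilities in the imprecise setting.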
50

Développement de modèles QSPR pour la prédiction et la compréhension des propriétés amphiphiles des tensioactifs dérivés de sucre / Development of QSPR models for the prediction and better understanding of amphiphilic properties of sugar-based surfactants

Gaudin, Théophile 30 November 2016 (has links)
Sugar-based surfactants are the main family of bio-based surfactants and are good candidates to substitute petroleum-based surfactants, since they originate from renewable resources and can perform as well as, or even better than, them in various applications, such as formulation (detergents, cosmetics), enhanced oil or mineral recovery, etc. Different amphiphilic properties characterize surfactant performance in such applications, such as the critical micelle concentration, the surface tension at the critical micelle concentration, the efficiency, and the Krafft point. Predicting these properties would help identify surfactants with the desired properties more quickly. QSPR models are tools for predicting such properties, but no reliable QSPR model dedicated to these properties was identified for bio-based surfactants, and in particular for sugar-based surfactants. During this thesis, such QSPR models were developed. A reliable database is required to develop any QSPR model, and no existing database was identified for the targeted properties of sugar-based surfactants. This motivated the construction of the first database of amphiphilic properties of sugar-based surfactants. The analysis of this database highlighted various empirical relationships between the chemical structure of these molecules and their amphiphilic properties, and made it possible to isolate the most reliable datasets, obtained under the most homogeneous possible protocols, for the development of the QSPR models. After establishing a robust strategy to compute the molecular descriptors that constitute the QSPR models, relying notably on conformational analyses of the sugar-based surfactants and on descriptors of the polar heads and alkyl chains, different QSPR models were developed and validated, and their applicability domains defined, for the critical micelle concentration, the surface tension at the critical micelle concentration, the efficiency, and the Krafft point. For the first three properties, good quantitative models were obtained. While quantum chemical descriptors brought a significant gain in predictive power for the surface tension at the critical micelle concentration, and a slight gain for the critical micelle concentration, no gain was observed for the efficiency. For these three properties, simple models based on constitutional descriptors of the hydrophilic and hydrophobic parts of the molecule (such as atom counts) were also obtained. For the Krafft point, two qualitative decision trees, classifying a molecule as water-soluble or insoluble at room temperature, were proposed. Quantum chemical descriptors also increased the predictive power of these decision trees, even though a fairly reliable model based only on constitutional descriptors of the hydrophilic and hydrophobic parts was also obtained. Finally, we showed how these QSPR models can be used to predict the properties of new molecules before any synthesis in a screening context, or the missing properties of existing molecules, and for the in silico design of new molecules by combining different polar heads with different alkyl chains.
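As a purely illustrative sketch of a QSPR fit of the simple "constitutional descriptor" kind mentioned above (synthetic data; not the thesis's datasets, descriptors, or models), one can regress log10(CMC) on a couple of atom-count descriptors and check the fit by cross-validation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Synthetic stand-in descriptors: alkyl chain length and number of head-group oxygens
# for a set of hypothetical sugar-based surfactants.
n = 120
chain_len = rng.integers(6, 19, size=n)          # carbons in the alkyl chain
head_oxygens = rng.integers(4, 12, size=n)       # oxygens in the sugar head group
X = np.column_stack([chain_len, head_oxygens]).astype(float)

# Synthetic target: log10(CMC) falls with chain length (a well-known trend) and
# rises slightly with head-group size, plus noise.
log_cmc = 1.5 - 0.45 * chain_len + 0.08 * head_oxygens + rng.normal(0, 0.25, size=n)

model = LinearRegression().fit(X, log_cmc)
r2 = cross_val_score(model, X, log_cmc, cv=5, scoring="r2").mean()
print("coefficients:", model.coef_.round(3), "intercept:", round(model.intercept_, 3))
print("cross-validated R^2:", round(r2, 3))
```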
