Global ETD Search

1	Contributions to evaluation of machine learning models. Applicability domain of classification models Rado, Omesaad A.M. January 2019 Artificial intelligence (AI) and machine learning (ML) present application opportunities and challenges that can be framed as learning problems. The performance of machine learning models depends on the algorithms and the data. Moreover, learning algorithms create a model of reality by learning from and testing with data, and their performance shows the degree to which the assumed model agrees with reality. ML algorithms have been used successfully in numerous classification problems. With the growing popularity of ML models for many purposes in different domains, more formal validation of such predictive models is now required. There are many studies related to model evaluation, robustness, reliability, and the quality of the data and the data-driven models. However, those studies do not yet consider the concept of the applicability domain (AD). The issue is that the AD is often not well defined, or not defined at all, in many fields. This work investigates the robustness of ML classification models from the applicability domain perspective. A standard definition of the applicability domain is the space in which the model provides results with a specified reliability. The main aim of this study is to investigate the connection between the applicability domain approach and classification model performance. We examine the usefulness of assessing the AD of a classification model with respect to the reliability, reuse, and robustness of classifiers. 
The work comprises three approaches: first, assessing the applicability domain of the classification model; second, investigating the robustness of the classification model based on the applicability domain approach; third, selecting an optimal model using Pareto optimality. The experiments in this work are illustrated with different machine learning algorithms for binary and multi-class classification on healthcare datasets from public benchmark data repositories. In the first approach, the decision tree (DT) algorithm is used in the classification stage, and a feature selection method is applied to choose the features for classification. The obtained classifiers are used in the third approach for selection of models using Pareto optimality. The second approach is implemented in three steps: building the classification model, generating synthetic data, and evaluating the obtained results. The results obtained from the study provide an understanding of how the proposed approach can help to define the model’s robustness and the applicability domain in order to provide reliable outputs. These approaches open opportunities for classification data and model management. The proposed algorithms are evaluated through a set of experiments on the classification accuracy of instances that fall in the domain of the model. For the first approach, considering all the features, the highest accuracy obtained is 0.98, with a threshold average of 0.34, for the Breast cancer dataset. After applying the recursive feature elimination (RFE) method, the accuracy is 0.96 with a threshold average of 0.27. For the robustness of the classification model based on the applicability domain approach, the minimum accuracy is 0.62 for the Indian Liver Patient data at r=0.10, and the maximum accuracy is 0.99 for the Thyroid dataset at r=0.10. 
For the selection of an optimal model using Pareto optimality, the optimally selected classifier gives an accuracy of 0.94 with a threshold average of 0.35. This research investigates critical aspects of the applicability domain as related to the robustness of ML classification algorithms. However, the performance of machine learning techniques depends on how reliable the model’s predictions are. In the literature, the robustness of an ML model can be defined as the ability of the model to produce a testing error close to the training error. Moreover, these properties can describe the stability of the model’s performance when it is tested on new datasets. In conclusion, this thesis introduced the concept of the applicability domain for classifiers and tested the use of this concept in case studies on health-related public benchmark datasets. / Ministry of Higher Education in Libya Machine learning Classification algorithms Binary classification Accuracy Model evaluation Model reliability Applicability domain Model robustness Model coverage Healthcare data
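As a rough illustration of the Pareto-optimality step mentioned in the abstract above, the sketch below keeps only the classifiers that no other candidate beats on both of two scores. The abstract does not say which objectives were traded off or in which direction, so treating both scores as "higher is better" is an assumption, and the function name `pareto_front` and the candidate values are made up for illustration.

```python
from typing import List, Tuple

# (name, accuracy, second_score); both are maximized here, which is an assumption.
Candidate = Tuple[str, float, float]


def pareto_front(candidates: List[Candidate]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, acc, score in candidates:
        # A candidate is dominated if another one is at least as good on both
        # objectives and strictly better on at least one of them.
        dominated = any(
            a >= acc and s >= score and (a > acc or s > score)
            for _, a, s in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Illustrative, made-up values (not results from the thesis).
models = [("A", 0.97, 0.30), ("B", 0.95, 0.25), ("C", 0.93, 0.36), ("D", 0.90, 0.28)]
front = pareto_front(models)
```

With these made-up values, "B" and "D" are dominated by "A", while "A" and "C" each win on one objective and so both stay on the front; choosing a single model from the front needs an extra preference that this sketch does not encode.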
2	Estimation of the acute toxicity and prediction of the metabolism site for organic molecules using GALAS methodology / Organinių medžiagų ūmaus toksiškumo ir metabolizmo vietos molekulėje prognozavimas taikant GALAS metodą Sazonovas, Andrius 27 May 2010 The dissertation presents GALAS models for the estimation of acute toxicity towards two rodent species following different administration routes, as well as for the prediction of CYP3A4 and CYP2D6 regioselectivity in the main metabolic reactions mediated by these enzymes (13 individual models in total). All these models allow quantitative evaluation of the model Applicability Domain (AD) via the estimated prediction Reliability Indices (RI). That is, the obtained models conform to one of the main requirements for QSAR model acceptance as an alternative research method by the EU regulatory institutions. An evident correlation between prediction reliability and accuracy allowed each model result to be classified into one of several qualitative classes according to its RI value. One possible way of utilizing such information, discussed in this study, is compound prioritization before experimental testing, potentially reducing the number of necessary measurements. As demonstrated, the AD of the obtained GALAS models can be easily expanded to cover specific compound classes of interest to the researcher using ‘in-house’ databases of experimental data. This feature significantly improves the possibilities for the practical application of these models, based on public data, in industry, especially since the described improvements in predictions following the addition of similar compounds were instant and required no rebuilding of the baseline models. 
/ Disertacijoje pristatomi GALAS metodika paremti ūmaus toksiškumo dviems graužikų rūšims bei visai eilei skirtingų medžiagos patekimo į organizmą būdų ir CYP3A4 bei CYP2D6 fermentų regioselektyvumo pagrindinėse jų katalizuojamose metabolizmo reakcijose prognozavimo modeliai (iš viso 13 individualių modelių). Visi minimi modeliai kokybiškai išsiskiria iš anksčiau publikuotų savo analogų dėl kiekybinio jų pritaikomumo srities įvertinimo galimybės, kurią suteikia apskaičiuojamos prognozės patikimumo indekso (RI) reikšmės. Tokia savybė yra vienas pagrindinių reikalavimų vertinant bet kokio modelio galimybes tapti ES oficialiai pripažintu alternatyviu tyrimo metodu. Aiški prognozių kokybės priklausomybė nuo jų patikimumo išraiškos taip pat suteikia galimybę modelio rezultatus suskaidyti į kokybines klases pagal apskaičiuotąsias RI reikšmes. Vienas iš tokios informacijos panaudojimo būdų siūlomų disertacijoje yra junginių prioritetizavimas prieš bet kokius eksperimentinius matavimus ir netgi pastarųjų skaičiaus potencialus sumažinimas. Disertacijoje taip pat išnagrinėta galimybė greitai bei efektyviai apmokyti gautuosius GALAS modelius naujais eksperimentiniais duomenimis, išplečiant jų pritaikomumo sritį. Ši esminė savybė radikaliai padidina nagrinėjamųjų modelių, paremtų viešai prieinamų duomenų rinkiniais, realaus praktinio panaudojimo farmacijos pramonėje galimybes. Chemistry GALAS modeling method Acute toxicity CYP450 regioselectivity Model applicability domain Reliability index GALAS modeliavimo metodas Ūmus toksiškumas CYP450 regioselektyvumas Modelio pritaikomumo sritis Prognozės patikimumo indeksas
3	Permanganate Reaction Kinetics and Mechanisms and Machine Learning Application in Oxidative Water Treatment Zhong, Shifa 21 June 2021 No description available. Environmental Engineering Environmental Science Bisulfite and permanganate Trivalent manganese catalyst Trivalent manganese-ligand complexes water treatment QSAR OH radical molecular fingerprint molecular image CNN DNN
4	Cartography of chemical space / Cartographie de l'espace chimique Gaspar, Héléna Alexandra 29 September 2015 Cette thèse est consacrée à la cartographie de l’espace chimique ; son but est d’établir les bases d’un outil donnant une vision d’ensemble d’un jeu de données, comprenant prédiction d’activité, visualisation, et comparaison de grandes librairies. Dans cet ouvrage, nous introduisons des modèles prédictifs QSAR (relations quantitatives structure à activité) avec de nouvelles définitions de domaines d’applicabilité, basés sur la méthode GTM (generative topographic mapping), introduite par C. Bishop et al. Une partie de cette thèse concerne l’étude de grandes librairies de composés chimiques grâce à la méthode GTM incrémentale. Nous introduisons également une nouvelle méthode « Stargate GTM », ou S-GTM, permettant de passer de l’espace des descripteurs chimiques à celui des activités et vice versa, appliquée à la prédiction de profils d’activité ou aux QSAR inverses. / This thesis is dedicated to the cartography of chemical space; our goal is to establish the foundations of a tool offering a complete overview of a chemical dataset, including visualization, activity prediction, and comparison of very large datasets. In this work, we introduce new QSAR models (quantitative structure-activity relationship) based on the GTM method (generative topographic mapping), introduced by C. Bishop et al. A part of this thesis is dedicated to the visualization and analysis of large chemical libraries using the incremental version of GTM. We also introduce a new method coined “Stargate GTM” or S-GTM, which allows us to travel from the space of chemical descriptors to activity space and vice versa; this approach was applied to activity profile prediction and inverse QSAR. 
Visualisation Espace chimique QSAR Inverse QSAR Domaine d’applicabilité Stargate GTM Données massives Apprentissage automatique Visualization Chemical space QSAR Inverse QSAR Applicability domain Stargate GTM Big data Machine learning 540.12
5	Improved in silico methods for target deconvolution in phenotypic screens Mervin, Lewis January 2018 Target-based screening projects for bioactive (orphan) compounds have been shown in many cases to be insufficiently predictive for in vivo efficacy, leading to attrition in clinical trials. Partly for this reason, phenotypic screening has undergone a renaissance in both academia and the pharmaceutical industry. One key shortcoming of this paradigm shift is that the protein targets modulated need to be elucidated subsequently, which is often a costly and time-consuming procedure. In this work, we have explored both improved methods and real-world case studies of how computational methods can help in target elucidation of phenotypic screens. One shortcoming of previous methods has been their limited ability to assess the applicability domain of the models, that is, when the assumptions made by a model are fulfilled and which input chemicals are reliably appropriate for the models. Hence, a major focus of this work was to explore methods for calibration of machine learning algorithms using Platt Scaling, Isotonic Regression Scaling and Venn-Abers Predictors, since the probabilities from well calibrated classifiers can be interpreted at a confidence level and predictions specified at an acceptable error rate. Additionally, many current protocols only offer probabilities for affinity, thus another key area for development was to expand the target prediction models with functional prediction (activation or inhibition). This extra level of annotation is important since the activation or inhibition of a target may positively or negatively impact the phenotypic response in a biological system. Furthermore, many existing methods do not utilize the wealth of bioactivity information held for orthologue species. 
We therefore also focused on an in-depth analysis of orthologue bioactivity data and its relevance and applicability towards expanding compound and target bioactivity space for predictive studies. The realized protocol was trained with 13,918,879 compound-target pairs and comprises 1,651 targets, which has been made available for public use at GitHub. Consequently, the methodology was applied to aid with the target deconvolution of AstraZeneca phenotypic readouts, in particular for the rationalization of cytotoxicity and cytostaticity in the High-Throughput Screening (HTS) collection. Results from this work highlighted which targets are frequently linked to the cytotoxicity and cytostaticity of chemical structures, and provided insight into which compounds to select or remove from the collection for future screening projects. Overall, this project has furthered the field of in silico target deconvolution, by improving the performance and applicability of current protocols and by rationalizing cytotoxicity, which has been shown to influence attrition in clinical trials.
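To make the calibration idea in the abstract above more concrete, here is a minimal sketch of isotonic-regression calibration using the pool-adjacent-violators algorithm: raw classifier scores are mapped to probabilities that never decrease as the score increases. This is a generic textbook version, not the thesis's implementation; the function name `isotonic_calibrate` and the toy scores and labels are assumptions for illustration, and tied scores get no special handling.

```python
from typing import List, Sequence, Tuple


def isotonic_calibrate(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (score, calibrated probability) pairs sorted by score."""
    pairs = sorted(zip(scores, labels))
    blocks: List[List[float]] = []  # each block holds [sum of labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1.0])
        # Pool adjacent blocks while the earlier block's mean exceeds the later one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Every point in a block receives that block's mean label as its probability.
    probs: List[float] = []
    for total, count in blocks:
        probs.extend([total / count] * int(count))
    return [(score, p) for (score, _), p in zip(pairs, probs)]


# Toy data (made up): raw scores and true binary labels.
calibrated = isotonic_calibrate(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 1, 1, 1]
)
probs = [p for _, p in calibrated]
```

In this toy case the out-of-order labels at scores 0.2 and 0.3 are pooled into a single block with probability 0.5, which is how the method enforces monotonicity. Platt Scaling would instead fit a sigmoid to the scores, and Venn-Abers predictors build on isotonic fits of this kind to produce probability intervals.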
