1.
Contributions to evaluation of machine learning models. Applicability domain of classification models
Rado, Omesaad A.M., January 2019
Artificial intelligence (AI) and machine learning (ML) present application opportunities and
challenges that can be framed as learning problems. The performance of machine learning models
depends on both the algorithms and the data. Learning algorithms build a model of reality by being
trained and tested on data, and their performance reflects the degree to which their assumed model
agrees with reality. ML algorithms have been used successfully in numerous classification
problems. With the growing popularity of ML models for many purposes in different domains,
such predictive models now need to be validated more formally. Traditionally, there have been
many studies of model evaluation, robustness, reliability, and the quality of both the data and the
data-driven models. However, those studies have not yet considered the concept of the applicability
domain (AD). The problem is that in many fields the AD is often not well defined, or not defined at all. This
work investigates the robustness of ML classification models from the applicability domain
perspective. A standard definition of the applicability domain is the region of the input space in which
the model provides results with a given reliability.
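As an illustration of this definition only (not the author's method), one common way to make an AD concrete is distance-based: a query point lies inside the AD if its mean distance to its k nearest training points does not exceed a threshold derived from the training set itself. The sketch below assumes Euclidean distance on numeric features; `k`, the 95th-percentile cut-off, and all function names are hypothetical choices.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(x, reference, k, exclude_self=False):
    """Mean distance from x to its k nearest points in `reference`."""
    dists = sorted(euclidean(x, r) for r in reference)
    if exclude_self:
        dists = dists[1:]  # x is itself in `reference`; drop its zero self-distance
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def fit_ad_threshold(train, k=3, quantile=0.95):
    """Hypothetical AD threshold: a high percentile of the training points' own k-NN distances."""
    scores = sorted(mean_knn_distance(x, train, k, exclude_self=True) for x in train)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def in_domain(x, train, threshold, k=3):
    """A query point is inside the AD if its mean k-NN distance is within the threshold."""
    return mean_knn_distance(x, train, k) <= threshold
```

Predictions for points outside the domain can then be flagged as unreliable, and accuracy can be reported separately for in-domain instances.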
The main aim of this study is to investigate the connection between the applicability domain approach
and classification model performance. We examine the usefulness of assessing the AD of
classification models in terms of the reliability, reuse, and robustness of classifiers. The work comprises
three approaches: first, assessing the applicability domain of the classification model; second,
investigating the robustness of the classification model based on the applicability domain approach; and
third, selecting an optimal model using Pareto optimality. The experiments in this work consider different
machine learning algorithms for binary and multi-class classification of healthcare datasets from public
benchmark data repositories. In the first approach, the decision tree (DT) algorithm is used to classify
the data, and a feature selection method is applied to choose the features used for classification. The
resulting classifiers are then used in the third approach to select models using Pareto optimality. The
second approach is implemented in three steps: building the classification model, generating synthetic
data, and evaluating the obtained results.
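The three steps of the second approach can be sketched as follows, under stated assumptions: the classifier is a one-level decision tree (a decision stump, standing in for a full DT), and synthetic data are generated by adding uniform noise in [-r, r] to every feature of each test point. This noise model is only one plausible interpretation of the parameter r reported with the robustness results; the function names are hypothetical.

```python
import random


def fit_stump(X, y):
    """Step 1: fit a one-level decision tree (decision stump) by exhaustive threshold search."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct feature values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            left_label = max(set(left), key=left.count)
            right_label = max(set(right), key=right.count)
            errors = sum(l != left_label for l in left) + sum(r != right_label for r in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    _, j, t, left_label, right_label = best
    return lambda row: left_label if row[j] <= t else right_label


def perturb(X, y, r, copies, rng):
    """Step 2: synthetic samples with uniform noise in [-r, r] added to every feature."""
    X_syn, y_syn = [], []
    for row, label in zip(X, y):
        for _ in range(copies):
            X_syn.append([v + rng.uniform(-r, r) for v in row])
            y_syn.append(label)
    return X_syn, y_syn


def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def robustness_curve(predict, X, y, radii, copies=20, seed=0):
    """Step 3: accuracy on synthetic data for each perturbation radius r."""
    rng = random.Random(seed)
    return {r: accuracy(predict, *perturb(X, y, r, copies, rng)) for r in radii}
```

Accuracy that stays flat as r grows suggests a robust model; a steep drop suggests that predictions depend on small changes in the inputs.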
The results of the study show how the proposed approaches can help to define a model's robustness
and applicability domain in order to provide reliable outputs. These approaches open opportunities for
managing classification data and models. The proposed algorithms are evaluated through a set of
experiments on the classification accuracy of instances that fall within the domain of the model. For
the first approach, using all the features, the highest accuracy obtained is 0.98, with an average
threshold of 0.34, for the Breast Cancer dataset. After applying the recursive feature elimination (RFE)
method, the accuracy is 0.96 with an average threshold of 0.27. For the robustness of the classification
model based on the applicability domain approach, the minimum accuracy is 0.62, for the Indian Liver
Patient dataset at r = 0.10, and the maximum accuracy is 0.99, for the Thyroid dataset at r = 0.10. For
the selection of an optimal model using Pareto optimality, the selected classifier achieves an accuracy
of 0.94 with an average threshold of 0.35.
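Pareto-optimal selection keeps every candidate that no other candidate beats on all objectives at once. A minimal sketch follows, assuming each classifier is scored on objectives that are all to be maximized; the pairing of accuracy with AD coverage and the candidate scores are hypothetical, and an objective that should be minimized can be negated before use.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and strictly better somewhere.

    All objectives are assumed to be maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other_name, other in candidates.items() if other_name != name)
    ]
```

For example, with `{"A": (0.94, 0.80), "B": (0.96, 0.60), "C": (0.90, 0.70)}` as hypothetical (accuracy, coverage) pairs, C is dominated by A, so the front is A and B; choosing between them still requires a preference between the objectives.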
This research investigates critical aspects of the applicability domain as they relate to the robustness
of classification ML algorithms. The performance of machine learning techniques depends on how
reliable the model's predictions are. In the literature, the robustness of an ML model can be defined as
the ability of the model to achieve a testing error close to its training error. Robustness can also describe
the stability of the model's performance when it is tested on new datasets. In conclusion, this thesis
introduces the concept of the applicability domain for classifiers and tests it in case studies on public
health-related benchmark datasets. / Ministry of Higher Education in Libya

2.
Estimation of the acute toxicity and prediction of the metabolism site for organic molecules using GALAS methodology / Organinių medžiagų ūmaus toksiškumo ir metabolizmo vietos molekulėje prognozavimas taikant GALAS metodą
Sazonovas, Andrius, 27 May 2010
The dissertation presents GALAS models for estimating acute toxicity in two rodent species following different administration routes, and for predicting CYP3A4 and CYP2D6 regioselectivity in the main metabolic reactions mediated by these enzymes (13 individual models in total). Unlike previously published analogues, all of these models allow the Applicability Domain (AD) to be evaluated quantitatively through estimated prediction Reliability Indices (RI). The models therefore meet one of the main requirements for a QSAR model to be accepted by EU regulatory institutions as an alternative research method. The clear correlation between prediction reliability and accuracy made it possible to assign each model result to one of several qualitative classes according to its RI value. One use of this information, discussed in this study, is to prioritize compounds before experimental testing, potentially reducing the number of measurements required. As demonstrated, the AD of the GALAS models can easily be expanded to cover specific compound classes of interest to the researcher by quickly and efficiently training the models on new experimental data from "in-house" databases. This feature significantly improves the prospects for practical application of these models, which are based on public data, in the pharmaceutical industry, especially because the improvement in predictions after adding similar compounds was immediate and required no rebuilding of the baseline models.
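The idea of grouping predictions by Reliability Index and then prioritizing compounds can be sketched as below. The RI cut-offs and class names are hypothetical placeholders, not the values used by GALAS, and RI is assumed to lie in [0, 1] with higher values meaning more reliable predictions.

```python
# Hypothetical RI cut-offs, checked from the highest down; not the published GALAS values.
RI_CLASSES = ((0.75, "high"), (0.5, "moderate"), (0.3, "borderline"))


def reliability_class(ri, classes=RI_CLASSES):
    """Map a Reliability Index in [0, 1] to a qualitative class."""
    for cutoff, label in classes:
        if ri >= cutoff:
            return label
    return "not reliable"


def prioritize(predictions):
    """Order (compound_id, predicted_value, ri) tuples from most to least reliable."""
    return sorted(predictions, key=lambda p: p[2], reverse=True)


def group_by_class(predictions):
    """Collect compound ids per reliability class."""
    groups = {}
    for compound_id, _value, ri in predictions:
        groups.setdefault(reliability_class(ri), []).append(compound_id)
    return groups
```

Compounds in the low-reliability classes are the natural candidates for experimental measurement, while high-RI predictions may be accepted without testing.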

3.
Permanganate Reaction Kinetics and Mechanisms and Machine Learning Application in Oxidative Water Treatment
Zhong, Shifa, 21 June 2021
No description available.

4.
Cartography of chemical space / Cartographie de l’espace chimique
Gaspar, Héléna Alexandra, 29 September 2015
This thesis is dedicated to the cartography of chemical space; our goal is to establish the foundations of a tool offering a complete overview of a chemical dataset, including visualization, activity prediction, and comparison of very large datasets. In this work, we introduce new QSAR (quantitative structure-activity relationship) models, with new definitions of applicability domains, based on the GTM (generative topographic mapping) method introduced by C. Bishop et al. Part of this thesis is dedicated to the visualization and analysis of large chemical libraries using the incremental version of GTM. We also introduce a new method, "Stargate GTM" (S-GTM), which allows us to travel from the space of chemical descriptors to activity space and vice versa; this approach was applied to activity profile prediction and inverse QSAR.
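GTM is a probabilistic model: the data density is an equally weighted mixture of isotropic Gaussians centred on the data-space images of the latent grid nodes, with shared inverse variance beta. One simple way to define an applicability domain on a GTM is therefore through the likelihood of a compound under the fitted map. The sketch below assumes the GTM has already been trained (fitting is not shown), so `centres` and `beta` are hypothetical inputs, and the 5th-percentile cut-off is an illustrative choice rather than the thesis's definition.

```python
import math


def gtm_log_likelihood(t, centres, beta):
    """Log-density of point t under an equally weighted mixture of isotropic Gaussians.

    `centres` are the data-space images y_k of the GTM latent nodes and `beta` is the
    inverse noise variance; both are assumed to come from an already-trained GTM."""
    d = len(t)
    k = len(centres)
    log_norm = 0.5 * d * math.log(beta / (2 * math.pi))
    exponents = [-0.5 * beta * sum((a - b) ** 2 for a, b in zip(t, y)) for y in centres]
    m = max(exponents)  # log-sum-exp trick for numerical stability
    return log_norm - math.log(k) + m + math.log(sum(math.exp(e - m) for e in exponents))


def likelihood_threshold(train, centres, beta, quantile=0.05):
    """Hypothetical AD cut-off: a low percentile of the training-set log-likelihoods."""
    scores = sorted(gtm_log_likelihood(t, centres, beta) for t in train)
    return scores[int(quantile * (len(scores) - 1))]


def in_gtm_domain(t, centres, beta, threshold):
    """A compound is inside the AD if the map explains it at least as well as the threshold."""
    return gtm_log_likelihood(t, centres, beta) >= threshold
```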

5.
Improved in silico methods for target deconvolution in phenotypic screens
Mervin, Lewis, January 2018
Target-based screening projects for bioactive (orphan) compounds have been shown in many cases to be insufficiently predictive of in vivo efficacy, leading to attrition in clinical trials. Partly for this reason, phenotypic screening has undergone a renaissance in both academia and the pharmaceutical industry. One key shortcoming of this paradigm shift is that the modulated protein targets must subsequently be elucidated, which is often a costly and time-consuming procedure. In this work, we have explored both improved methods and real-world case studies of how computational methods can help in target elucidation for phenotypic screens. One limitation of previous methods has been their limited ability to assess the applicability domain of the models, that is, when the assumptions made by a model are fulfilled and for which input chemicals the model is reliably applicable. Hence, a major focus of this work was to explore methods for calibrating machine learning algorithms using Platt Scaling, Isotonic Regression Scaling and Venn-Abers Predictors, since the probabilities from well-calibrated classifiers can be interpreted as confidence levels, and predictions can be made at a specified acceptable error rate. Additionally, many current protocols offer only probabilities of binding affinity, so another key area for development was to extend the target prediction models with functional prediction (activation or inhibition). This extra level of annotation is important because the activation or inhibition of a target may positively or negatively affect the phenotypic response in a biological system. Furthermore, many existing methods do not use the wealth of bioactivity information held for orthologue species. We therefore also carried out an in-depth analysis of orthologue bioactivity data and its relevance and applicability to expanding compound and target bioactivity space for predictive studies.
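Of the three calibration methods named above, isotonic regression is the simplest to show from scratch. The sketch below fits it with the pool-adjacent-violators algorithm, mapping raw classifier scores to calibrated probabilities of the positive class. It predicts with a step function (scikit-learn's `IsotonicRegression` interpolates linearly instead) and is a generic illustration rather than the thesis's calibration protocol.

```python
def isotonic_calibrator(scores, labels):
    """Fit isotonic regression with pool-adjacent-violators; return a score -> probability map."""
    blocks = []  # each block: [sum_of_labels, count, lowest_score, highest_score]
    for s, y in sorted(zip(scores, labels)):
        if blocks and blocks[-1][3] == s:
            # Tied scores must share one calibrated value, so pool them directly
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s, s])
        # Merge backwards while block means violate the non-decreasing constraint
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, _, high = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][3] = high

    def calibrate(score):
        """Step-function prediction: mean label of the last block starting at or below score."""
        prob = blocks[0][0] / blocks[0][1]
        for total, count, low, _ in blocks:
            if score >= low:
                prob = total / count
            else:
                break
        return prob

    return calibrate
```

Platt scaling would instead fit a sigmoid to the scores, and Venn-Abers predictors build on isotonic regression by fitting it twice, with the test object provisionally labelled 0 and then 1, to output a pair of probabilities; neither is shown here.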
The resulting protocol, trained on 13,918,879 compound-target pairs and covering 1,651 targets, has been made publicly available on GitHub. The methodology was then applied to aid the target deconvolution of AstraZeneca phenotypic readouts, in particular to rationalize cytotoxicity and cytostaticity in the High-Throughput Screening (HTS) collection. Results from this work highlighted which targets are frequently linked to the cytotoxicity and cytostaticity of chemical structures, and provided insight into which compounds to select or remove from the collection for future screening projects. Overall, this project has advanced the field of in silico target deconvolution by improving the performance and applicability of current protocols and by rationalizing cytotoxicity, which has been shown to influence attrition in clinical trials.