Global ETD Search

1	Fuzzy kNNModel Applied to Predictive Toxicology Data Mining Guo, G., Neagu, Daniel January 2005 A robust method, fuzzy kNNModel, for toxicity prediction of chemical compounds is proposed. The method is based on a supervised clustering method, called kNNModel, which employs fuzzy partitioning instead of crisp partitioning to group clusters. The merits of fuzzy kNNModel are two-fold: (1) it overcomes the problems of choosing, for each data set, the parameter ε (the allowed error rate in a cluster) and the parameter N (the minimal number of instances covered by a cluster); (2) it better captures the characteristics of boundary data by assigning them different degrees of membership, between 0 and 1, to different clusters. The experimental results of fuzzy kNNModel on thirteen public data sets from the UCI machine learning repository and seven toxicity data sets from real-world applications are compared with the results of fuzzy c-means clustering, k-means clustering, kNN, fuzzy kNN, and kNNModel in terms of classification performance. This application shows that fuzzy kNNModel is a promising method for the toxicity prediction of chemical compounds. Fuzzy kNNModel Classification Predictive toxicology
2	Contributions to ensembles of models for predictive toxicology applications : on the representation, comparison and combination of models in ensembles Makhtar, Mokhairi January 2012 The increasing variety of data mining tools offers a large palette of types and representation formats for predictive models. Managing the models then becomes a big challenge, as do reusing the models and keeping model and data repositories consistent. Sustainable access and quality assessment of these models become limited to researchers. The Data and Model Governance (DMG) approach makes it easier to process and support complex solutions. In this thesis, contributions are proposed towards ensembles of models with a focus on model representation, comparison and usage. Predictive Toxicology was chosen as an application field to demonstrate the proposed approach to representing predictive models linked to data for DMG. Further analysis methods, such as predictive model comparison and predictive model combination for reusing models from a collection of models, were studied. Thus in this thesis, an original structure for the pool of models, called Predictive Toxicology Markup Language (PTML), was proposed to represent predictive toxicology models. PTML offers a representation scheme for predictive toxicology data and models generated by data mining tools. In this research, the proposed representation makes it possible to compare models and select the relevant models based on different performance measures using proposed similarity measuring techniques. The relevant models were selected using a proposed cost function which is a composite of performance measures such as Accuracy (Acc), False Negative Rate (FNR) and False Positive Rate (FPR). The cost function ensures that only quality models are selected as the candidate models for an ensemble. 
The proposed algorithm for optimisation and combination of Acc, FNR and FPR of ensemble models, using the double fault measure as the diversity measure, improves Acc by between 0.01 and 0.30 for all toxicology data sets compared to other ensemble methods such as Bagging, Stacking, Bayes and Boosting. The highest improvements in Acc were for the data sets Bee (0.30), Oral Quail (0.13) and Daphnia (0.10). A small improvement (of about 0.01) in Acc was achieved for Dietary Quail and Trout. Combining all three performance measures also reduced the distance between FNR and FPR for the Bee, Daphnia, Oral Quail and Trout data sets by about 0.17 to 0.28. For the Dietary Quail data set the improvement was only about 0.01, but this data set is well known as a difficult learning exercise. For the five UCI data sets tested, similar results were achieved, with Acc improvements between 0.10 and 0.11 and further narrowing of the gaps between FNR and FPR. In conclusion, the results show that by combining performance measures (Acc, FNR and FPR), as proposed within this thesis, the Acc increased and the distance between FNR and FPR decreased. 006.3
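The quantities named in this abstract can be illustrated with a minimal Python sketch. The definitions of Acc, FNR, FPR and the double fault measure below are the standard ones; the `cost` function, its equal default weights and all function names are assumptions made for illustration only, since the abstract does not give the thesis's actual cost function.

```python
def rates(y_true, y_pred):
    """Return (Acc, FNR, FPR) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # missed positives
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false alarms
    return acc, fnr, fpr


def double_fault(y_true, pred_a, pred_b):
    """Fraction of instances that BOTH classifiers misclassify (lower = more diverse)."""
    both_wrong = sum(
        1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t
    )
    return both_wrong / len(y_true)


def cost(acc, fnr, fpr, weights=(1.0, 1.0, 1.0)):
    """Hypothetical composite cost (lower is better); NOT the thesis's formula."""
    w_acc, w_fnr, w_fpr = weights
    return w_acc * (1.0 - acc) + w_fnr * fnr + w_fpr * fpr
```

Under these assumptions, candidate models with a low `cost` would be shortlisted, and pairs with a low `double_fault` value would be preferred when assembling an ensemble.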
3	Contributions to Ensembles of Models for Predictive Toxicology Applications. On the Representation, Comparison and Combination of Models in Ensembles. Makhtar, Mokhairi January 2012 The increasing variety of data mining tools offers a large palette of types and representation formats for predictive models. Managing the models then becomes a big challenge, as do reusing the models and keeping model and data repositories consistent. Sustainable access and quality assessment of these models become limited to researchers. The Data and Model Governance (DMG) approach makes it easier to process and support complex solutions. In this thesis, contributions are proposed towards ensembles of models with a focus on model representation, comparison and usage. Predictive Toxicology was chosen as an application field to demonstrate the proposed approach to representing predictive models linked to data for DMG. Further analysis methods, such as predictive model comparison and predictive model combination for reusing models from a collection of models, were studied. Thus in this thesis, an original structure for the pool of models, called Predictive Toxicology Markup Language (PTML), was proposed to represent predictive toxicology models. PTML offers a representation scheme for predictive toxicology data and models generated by data mining tools. In this research, the proposed representation makes it possible to compare models and select the relevant models based on different performance measures using proposed similarity measuring techniques. The relevant models were selected using a proposed cost function which is a composite of performance measures such as Accuracy (Acc), False Negative Rate (FNR) and False Positive Rate (FPR). The cost function ensures that only quality models are selected as the candidate models for an ensemble. 
The proposed algorithm for optimisation and combination of Acc, FNR and FPR of ensemble models, using the double fault measure as the diversity measure, improves Acc by between 0.01 and 0.30 for all toxicology data sets compared to other ensemble methods such as Bagging, Stacking, Bayes and Boosting. The highest improvements in Acc were for the data sets Bee (0.30), Oral Quail (0.13) and Daphnia (0.10). A small improvement (of about 0.01) in Acc was achieved for Dietary Quail and Trout. Combining all three performance measures also reduced the distance between FNR and FPR for the Bee, Daphnia, Oral Quail and Trout data sets by about 0.17 to 0.28. For the Dietary Quail data set the improvement was only about 0.01, but this data set is well known as a difficult learning exercise. For the five UCI data sets tested, similar results were achieved, with Acc improvements between 0.10 and 0.11 and further narrowing of the gaps between FNR and FPR. In conclusion, the results show that by combining performance measures (Acc, FNR and FPR), as proposed within this thesis, the Acc increased and the distance between FNR and FPR decreased. Predictive toxicology Model representation Model comparison Ensembles of models Classifiers
4	A knowledge based approach of toxicity prediction for drug formulation : modelling drug vehicle relationships using soft computing techniques Mistry, Pritesh January 2015 This multidisciplinary thesis is concerned with the prediction of drug formulations for the reduction of drug toxicity. Both scientific and computational approaches are utilised to make original contributions to the field of predictive toxicology. The first part of this thesis provides a detailed scientific discussion on all aspects of drug formulation and toxicity. Discussions are focused around the principal mechanisms of drug toxicity and how drug toxicity is studied and reported in the literature. Furthermore, a review of the current technologies available for formulating drugs for toxicity reduction is provided. Examples of studies reported in the literature that have used these technologies to reduce drug toxicity are also reported. The thesis also provides an overview of the computational approaches currently employed in the field of in silico predictive toxicology. This overview focuses on the machine learning approaches used to build predictive QSAR classification models, with examples from the literature provided. Two methodologies have been developed as part of the main work of this thesis. The first focuses on the use of directed bipartite graphs and Venn diagrams for the visualisation and extraction of drug-vehicle relationships from large un-curated datasets which show changes in the patterns of toxicity. These relationships can be rapidly extracted and visualised using the methodology proposed in chapter 4. The second methodology proposed involves mining large datasets for the extraction of drug-vehicle toxicity data. 
The methodology uses an area-under-the-curve principle to make pairwise comparisons of vehicles, which are classified according to the toxicity protection they offer, from which predictive classification models based on random forests and decision trees are built. The results of this methodology are reported in chapter 6.
5	Outils et concepts de biologie systémique pour la modélisation prédictive de la toxicité / Systems biology tools and concepts for predictive modelling of toxicity Hamon, Jérémy 21 November 2013 The current need to understand the precise consequences that the administration of a molecule will have on an organism and its organs is a major issue for pharmaceutical research and for the study of xenobiotic toxicity. It is not difficult to realize, when the effects are observable, that there is a relationship between the administered dose of a xenobiotic and its effects. The difficulty in predicting such effects, whether beneficial or harmful, comes mainly from the fact that a large number of complex mechanisms are involved, from the entry of the molecule into the body to its excretion. To understand and quantify this relationship, and to be able to make predictions, it is necessary to know the main biological mechanisms involved and to propose mathematical models describing them. Pharmacokinetics, pharmacodynamics and systems biology are the scientific fields most appropriate to meet this need. The first examines the fate of xenobiotics in the body and the second, the evolution of their effects. Systems biology is a relatively new approach which combines different levels of information (experimental data, chemical and biological knowledge, assumptions, etc.) with mathematical models to understand how complex biological systems work. The work presented in this thesis shows that the use of systems biology is not easy and still lacks maturity. Beyond the diversity of knowledge it draws on, the amount of data and parameters to manage is considerable. For a model taking into account only one signaling pathway, like the one presented here, several months were needed for its calibration. This length of time is largely due to the computation time required for parameter estimation, and to the time required for collecting and processing very diverse data (PK, omics, physiological, cellular, etc.). It is very important that the data collection protocol be defined jointly by all the teams that will subsequently use the data. Pharmacokinetics Systems biology Predictive toxicology Mathematical modelling
6	Multi-label classification on locally-linear data: Application to chemical toxicity prediction Yap, Xiu Huan 16 August 2021 No description available. Computer Science Toxicology Predictive Toxicology Multi-label Classification Locally-linear data Locality-sensitive deep learner attention
7	A Knowledge Based Approach of Toxicity Prediction for Drug Formulation. Modelling Drug Vehicle Relationships Using Soft Computing Techniques Mistry, Pritesh January 2015 This multidisciplinary thesis is concerned with the prediction of drug formulations for the reduction of drug toxicity. Both scientific and computational approaches are utilised to make original contributions to the field of predictive toxicology. The first part of this thesis provides a detailed scientific discussion on all aspects of drug formulation and toxicity. Discussions are focused around the principal mechanisms of drug toxicity and how drug toxicity is studied and reported in the literature. Furthermore, a review of the current technologies available for formulating drugs for toxicity reduction is provided. Examples of studies reported in the literature that have used these technologies to reduce drug toxicity are also reported. The thesis also provides an overview of the computational approaches currently employed in the field of in silico predictive toxicology. This overview focuses on the machine learning approaches used to build predictive QSAR classification models, with examples from the literature provided. Two methodologies have been developed as part of the main work of this thesis. The first focuses on the use of directed bipartite graphs and Venn diagrams for the visualisation and extraction of drug-vehicle relationships from large un-curated datasets which show changes in the patterns of toxicity. These relationships can be rapidly extracted and visualised using the methodology proposed in chapter 4. The second methodology proposed involves mining large datasets for the extraction of drug-vehicle toxicity data. 
The methodology uses an area-under-the-curve principle to make pairwise comparisons of vehicles, which are classified according to the toxicity protection they offer, from which predictive classification models based on random forests and decision trees are built. The results of this methodology are reported in chapter 6.
8	Interpretation, identification and reuse of models : theory and algorithms with applications in predictive toxicology Palczewska, Anna Maria January 2014 This thesis is concerned with developing methodologies that enable existing models to be effectively reused. Results of this thesis are presented in the framework of Quantitative Structure-Activity Relationship (QSAR) models, but their application is much more general. QSAR models relate chemical structures to their biological, chemical or environmental activity. There are many applications that offer an environment to build and store predictive models. Unfortunately, they do not provide advanced functionalities that allow for efficient model selection and for interpretation of model predictions for new data. This thesis aims to address these issues and proposes methodologies for dealing with three research problems: model governance (management), model identification (selection), and interpretation of model predictions. The combination of these methodologies can be employed to build more efficient systems for model reuse in QSAR modelling and other areas. The first part of this study investigates toxicity data and model formats and reviews some of the existing toxicity systems in the context of model development and reuse. Based on the findings of this review and the principles of data governance, a novel concept of model governance is defined. Model governance comprises model representation and model governance processes. These processes are designed and presented in the context of model management. As an application, minimum information requirements and an XML representation for QSAR models are proposed. Once a collection of validated, accepted and well annotated models is available within a model governance framework, they can be applied to new data. It may happen that there is more than one model available for the same endpoint. Which one should be chosen? 
The second part of this thesis proposes a theoretical framework and algorithms that enable automated identification of the most reliable model for new data from the collection of existing models. The main idea is based on partitioning the search space into groups and assigning a single model to each group. The construction of this partitioning is difficult because it is a bi-criteria problem. The main contribution in this part is the application of Pareto points to the search space partition. The proposed methodology is applied to three endpoints in chemoinformatics and predictive toxicology. After having identified a model for the new data, we would like to know how the model obtained its prediction and how trustworthy it is. An interpretation of model predictions is straightforward for linear models thanks to the availability of model parameters and their statistical significance. For nonlinear models this information can be hidden inside the model structure. This thesis proposes an approach for the interpretation of a random forest classification model. This approach allows for the determination of the influence (called feature contribution) of each variable on the model prediction for an individual data instance. Three methods are proposed in this part for the analysis of feature contributions. Such analysis might lead to the discovery of new patterns that represent a standard behaviour of the model and allow additional assessment of the model reliability for new data. The application of these methods to two standard benchmark datasets from the UCI machine learning repository shows the great potential of this methodology. The algorithm for calculating feature contributions has been implemented and is available as an R package called rfFC. 615.9
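As an aside on the bi-criteria problem mentioned in this abstract, the general notion of Pareto points can be sketched in a few lines of Python. This is a generic illustration, not the thesis's partitioning algorithm: the abstract does not name the two criteria, so the sketch simply assumes each candidate is scored by two costs that are both to be minimised, and the function name `pareto_points` is hypothetical.

```python
def pareto_points(candidates):
    """Return the non-dominated (cost_a, cost_b) pairs, with both costs minimised.

    A candidate is dominated if some other candidate is no worse on both
    criteria and strictly better on at least one of them.
    """
    front = []
    for i, (a, b) in enumerate(candidates):
        dominated = any(
            (a2 <= a and b2 <= b) and (a2 < a or b2 < b)
            for j, (a2, b2) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((a, b))
    return front
```

Only the candidates returned by such a filter represent genuine trade-offs between the two criteria; every other candidate can be discarded without loss.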
9	Integration of data quality, kinetics and mechanistic modelling into toxicological assessment of cosmetic ingredients Steinmetz, Fabian January 2016 In our modern society we are exposed to many natural and synthetic chemicals. The assessment of chemicals with regard to human safety is difficult but nevertheless of high importance. Besides clinical studies, which are restricted to potential pharmaceuticals only, most toxicity data relevant for regulatory decision-making are based on in vivo data. Due to the ban on animal testing of cosmetic ingredients in the European Union, alternative approaches, such as in vitro and in silico tests, have become more prevalent. In this thesis existing non-testing approaches (i.e. studies without additional experiments), e.g. QSAR models, have been extended, and new non-testing approaches, e.g. in vitro data supported structural alert systems, have been created. The thesis centres on determining data quality, improving modelling performance and supporting Adverse Outcome Pathways (AOPs) with definitions of structural alerts and physico-chemical properties. Furthermore, there was a clear focus on the transparency of models, i.e. approaches using algorithmic feature selection, machine learning etc. have been avoided. Furthermore, structural alert systems have been written in an understandable and transparent manner. Besides the methodological aspects of this work, cosmetically relevant examples of models have been chosen, e.g. skin penetration and hepatic steatosis. Interpretations of models, as well as the possibility of adjustments and extensions, have been discussed thoroughly. As models usually do not depict reality flawlessly, consensus approaches of various non-testing approaches and in vitro tests should be used to support decision-making in the regulatory context. For example, within read-across it is feasible to use supporting information from QSAR models, docking, in vitro tests etc. 
By applying a variety of models, results should lead to conclusions that are more usable and acceptable within toxicology. Within this thesis (and associated publications), novel methodologies on how to assess and employ statistical data quality and how to screen for potential liver toxicants have been described. Furthermore, computational tools, such as models for skin permeability and dermal absorption, have been created. 615.9
10	Towards model governance in predictive toxicology Palczewska, Anna Maria, Fu, X., Trundle, Paul R., Yang, Longzhi, Neagu, Daniel, Ridley, Mick J., Travis, Kim January 2013 Efficient management of toxicity information as an enterprise asset is increasingly important for the chemical, pharmaceutical, cosmetics and food industries. Many organisations focus on better information organisation and reuse, in an attempt to reduce the costs of testing and manufacturing in the product development phase. Toxicity information is extracted not only from toxicity data but also from predictive models. Accurate and appropriately shared models can bring a number of benefits if we are able to make effective use of existing expertise. Although usage of existing models may provide high-impact insights into the relationships between chemical attributes and specific toxicological effects, they can also be a source of risk for incorrect decisions. Thus, there is a need to provide a framework for efficient model management. To address this gap, this paper introduces a concept of model governance, which is based upon data governance principles. We extend the data governance processes by adding procedures that allow the evaluation of model use and governance for enterprise purposes. The core aspect of model governance is model representation. We propose six rules that form the basis of a model representation schema, called Minimum Information About a QSAR Model Representation (MIAQMR). As a proof-of-concept of our model governance framework we develop a web application called Model and Data Farm (MADFARM), in which models are described by the MIAQMR-ML markup language. (C) 2013 Elsevier Ltd. All rights reserved. Model governance ; Data governance ; Predictive toxicology ; Information representation ; Knowledge management ; Quality assessment ; Data quality assessment ; Management : QSPR
