Quantitative Structure-Activity Relationship (QSAR) has been applied extensively in predicting toxicity of Disinfection By-Products (DBPs) in drinking water. Among many toxicological properties, acute and chronic toxicities of DBPs have been widely used in health risk assessment of DBPs. These toxicities are correlated with molecular properties, which are usually correlated with molecular descriptors. The primary goals of this thesis are: 1) to investigate the effects of molecular descriptors (e.g., chlorine number) on molecular properties such as energy of the lowest unoccupied molecular orbital (ELUMO) via QSAR modelling and analysis; 2) to validate the models by using internal and external cross-validation techniques; 3) to quantify the model uncertainties through Taylor and Monte Carlo Simulation. One of the very important ways to predict molecular properties such as ELUMO is using QSAR analysis. In this study, number of chlorine (NCl) and number of carbon (NC) as well as energy of the highest occupied molecular orbital (EHOMO) are used as molecular descriptors. There are typically three approaches used in QSAR model development: 1) Linear or Multi-linear Regression (MLR); 2) Partial Least Squares (PLS); and 3) Principle Component Regression (PCR). In QSAR analysis, a very critical step is model validation after QSAR models are established and before applying them to toxicity prediction. The DBPs to be studied include five chemical classes: chlorinated alkanes, alkenes, and aromatics. In addition, validated QSARs are developed to describe the toxicity of selected groups (i.e., chloro-alkane and aromatic compounds with a nitro- or cyano group) of DBP chemicals to three types of organisms (e.g., Fish, T. pyriformis, and P.pyosphoreum) based on experimental toxicity data from the literature. The results show that: 1) QSAR models to predict molecular property built by MLR, PLS or PCR can be used either to select valid data points or to eliminate outliers; 2) The Leave-One-Out Cross-Validation procedure by itself is not enough to give a reliable representation of the predictive ability of the QSAR models, however, Leave-Many-Out/K-fold cross-validation and external validation can be applied together to achieve more reliable results; 3) ELUMO are shown to correlate highly with the NCl for several classes of DBPs; and 4) According to uncertainty analysis using Taylor method, the uncertainty of QSAR models is contributed mostly from NCl for all DBP classes.
Identifer | oai:union.ndltd.org:fiu.edu/oai:digitalcommons.fiu.edu:etd-1231 |
Date | 27 May 2009 |
Creators | Wang, Fang |
Publisher | FIU Digital Commons |
Source Sets | Florida International University |
Detected Language | English |
Type | text |
Format | application/pdf |
Source | FIU Electronic Theses and Dissertations |
Page generated in 0.0016 seconds