61

The prediction of mutagenicity and pKa for pharmaceutically relevant compounds using 'quantum chemical topology' descriptors

Harding, Alexander January 2011 (has links)
Quantum Chemical Topology (QCT) descriptors, calculated from ab initio wave functions, have been utilised to model pKa and mutagenicity for data sets of pharmaceutically relevant compounds. The pKa of a compound is a pivotal property in both the life sciences and chemistry, since the propensity of a compound to donate or accept a proton is fundamental to understanding chemical and biological processes. The prediction of mutagenicity, specifically as determined by the Ames test, helps medicinal chemists select compounds that avoid this potential pitfall in drug design. Carbocyclic and heterocyclic aromatic amines were chosen because this compound class is synthetically very useful but also prone to positive outcomes in the battery of genotoxicity assays.

The importance of pKa and genotoxic characteristics cannot be overestimated in drug design, where the multivariate optimisation of properties that influence the Absorption-Distribution-Metabolism-Excretion-Toxicity (ADMET) profile now features very early in the drug discovery process.

Models were constructed using carboxylic acids in conjunction with the Quantum Topological Molecular Similarity (QTMS) method. The models produced Root Mean Square Error of Prediction (RMSEP) values of less than 0.5 pKa units and compared favourably with other pKa prediction methods. The ortho-substituted benzoic acids had the largest RMSEP, which was significantly improved by splitting the compounds into high-correlation subsets. For these subsets, single-term equations containing one ab initio bond length were able to predict pKa accurately. The pKa prediction equations were extended to phenols and anilines.

Quantitative Structure Activity Relationship (QSAR) models of acceptable quality were built from literature data to predict the mutagenic potency (LogMP) of carbo- and heterocyclic aromatic amines using QTMS. However, these models failed to predict Ames test values for compounds screened at GSK.
Contradictory internal and external data for several compounds motivated us to determine the fidelity of the Ames test for this compound class. The systematic investigation involved recrystallisation to purify the compounds, analytical methods to measure their purity and, finally, comparative Ames testing. Unexpectedly, the Ames test results were very reproducible when 14 representative repurified molecules were tested as the free base and the hydrochloride salt in two different solvents (water and DMSO). This work formed the basis for the analysis of Ames data at GSK and a systematic Ames testing programme for aromatic amines. So far, an unprecedentedly large list of 400 compounds has been made available to guide medicinal chemists. We constructed a model for the subset of 100 meta-/para-substituted anilines that could predict 70% of the Ames classifications. The experimental values of several of the model outliers appeared questionable on closer inspection, and three of these have been retested so far. The retests led to the reclassification of two of them and thereby to an improved model accuracy of 78%. This demonstrates the power of the iterative process of model building, critical analysis of experimental data, retesting outliers and rebuilding the model.
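The single-term pKa models and the RMSEP criterion described above can be sketched in a few lines. This is a hedged illustration rather than the thesis's actual model: the bond lengths and pKa values below are invented, and the QTMS descriptor set is stood in for by a single hypothetical C-O bond length.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for a single-term model y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def rmsep(y_true, y_pred):
    """Root Mean Square Error of Prediction over a held-out set."""
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical descriptor: one ab initio C-O bond length (angstrom) per acid,
# paired with an invented experimental pKa (not data from the thesis)
train_x = [1.340, 1.345, 1.350, 1.355, 1.360]
train_y = [3.8, 4.0, 4.2, 4.4, 4.6]
a, b = fit_linear(train_x, train_y)

# Held-out compounds: predict, then score with RMSEP
test_x = [1.342, 1.358]
test_y = [3.9, 4.5]
pred = [a + b * x for x in test_x]
error = rmsep(test_y, pred)
```

A model of this shape would meet the bar reported in the abstract when `error` stays below 0.5 pKa units on unseen compounds.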
62

Estimativa das funções de recuperação de reservas minerais usando copulas / Estimation of recovers function of mineral reserves using copulas

Carmo, Frederico Augusto Rosa do 24 August 2006 (has links)
Orientador: Armando Zaupa Remacre / Tese (doutorado) - Universidade Estadual de Campinas, Instituto de Geociencias

Abstract: The aim of this thesis was to develop the copula methodology applied to the problem of conditional reserve estimation, correcting errors in the tonnage and ore quantity of a mining project through an approach different from conditional stochastic simulation. A theoretical summary underpinning the study of copulas is presented, beginning with important definitions and concepts from statistics and probability. After a discussion of correlation measures, the concept of copulas is introduced, from its definition and basic properties to the study of the types of copulas essential to this thesis. The theoretical framework developed for the calculation of recoverable resources is then discussed. The concepts of tonnage and grade curves are introduced, since they are the basis of the parametrisation of mineral reserves. It is shown how copulas can be used at one of the main points of mining geostatistics, particularly with respect to estimation errors. The concept of cross-validation is presented first, together with the definitions of the illusory, optimal and ideal reserves. The ideal reserve is then defined using the concept of copulas, and kriging, sequential Gaussian simulation and the copula approach are compared, showing the consequences of over-estimation and under-estimation in open-pit projects and mine sequencing. / Doutorado / Administração e Politica de Recursos Minerais / Doutor em Ciências
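The dependence structure that copulas capture, which is central to the comparison with kriging and sequential Gaussian simulation above, can be illustrated with a minimal Gaussian-copula sampler. This is a sketch under stated assumptions: the correlation value and sample size are arbitrary, and a real grade-tonnage model would fit the copula to data rather than fix rho.

```python
import math
import random

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_copula_sample(rho, n, seed=0):
    """Draw n pairs (u, v) on [0, 1]^2 whose dependence is a Gaussian copula."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlated second normal, then both are pushed through the CDF
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((phi(z1), phi(z2)))
    return pairs

pairs = gaussian_copula_sample(rho=0.8, n=5000)

# Empirical check: uniform margins, strong positive dependence
n = len(pairs)
mu = sum(u for u, _ in pairs) / n
mv = sum(v for _, v in pairs) / n
cov = sum((u - mu) * (v - mv) for u, v in pairs) / n
su = (sum((u - mu) ** 2 for u, _ in pairs) / n) ** 0.5
sv = (sum((v - mv) ** 2 for _, v in pairs) / n) ** 0.5
dependence = cov / (su * sv)
```

Each uniform margin could then be mapped through an inverse grade or tonnage distribution; separating marginal behaviour from dependence in this way is precisely what a copula model buys.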
63

Meta-learning / Meta-learning

Hovorka, Martin January 2008 (has links)
The goal of this work is to become acquainted with and study meta-learning methods, implement an algorithm, and compare it with other machine learning methods.
64

Klasifikační metody analýzy vrstvy nervových vláken na sítnici / A Classification Methods for Retinal Nerve Fibre Layer Analysis

Zapletal, Petr January 2010 (has links)
This thesis deals with classification of the retinal nerve fibre layer. Texture features from six texture analysis methods are used for classification. Each method calculates a feature vector from the input images, and this feature vector characterises every cluster (class). Classification is realised by three supervised learning algorithms and one unsupervised learning algorithm. The first tested algorithm is Ho-Kashyap. The next is the Bayes classifier NDDF (Normal Density Discriminant Function). The third is the k-Nearest Neighbour algorithm (k-NN), and the last tested classifier is the K-means algorithm, which belongs to clustering. For completeness, three methods for selecting training patterns for the supervised learning algorithms are implemented, based on Repeated Random Subsampling Cross Validation, K-Fold Cross Validation and Leave-One-Out Cross Validation. All algorithms are quantitatively compared in terms of classification error.
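Two of the ingredients named above, the k-NN classifier and K-Fold Cross Validation, can be sketched together. The "texture-feature" values below are invented one-dimensional stand-ins for the multi-method feature vectors used in the thesis.

```python
import random

def knn_predict(train, x, k=3):
    """Classify x by majority vote among the k nearest training points
    (one-dimensional features, absolute-distance metric)."""
    neigh = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [lab for _, lab in neigh]
    return max(set(labels), key=labels.count)

def k_fold_error(data, k_folds=5, k=3, seed=0):
    """Mean classification error over k folds (K-Fold Cross Validation)."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k_folds] for i in range(k_folds)]
    errs = []
    for i in range(k_folds):
        test = folds[i]
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        wrong = sum(1 for x, lab in test if knn_predict(train, x, k) != lab)
        errs.append(wrong / len(test))
    return sum(errs) / k_folds

# Two well-separated hypothetical clusters of feature values
data = [(v, 0) for v in [0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.22, 0.28]] + \
       [(v, 1) for v in [0.9, 0.8, 0.85, 0.7, 0.75, 0.88, 0.78, 0.72]]
err = k_fold_error(data)
```

With clearly separated classes the cross-validated error is zero; Leave-One-Out is the special case where `k_folds` equals the number of samples.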
65

Assessing the Absolute and Relative Performance of IRTrees Using Cross-Validation and the RORME Index

DiTrapani, John B. 03 September 2019 (has links)
No description available.
66

Regression and time estimation in the manufacturing industry

Bjernulf, Walter January 2023 (has links)
In this thesis an analysis is performed on operation times for differently sized products in a manufacturing company. The thesis introduces and summarises most of the theory needed to perform regression, and covers a worked example in which three different regression models are learned, evaluated and analysed. Conformal prediction, currently a hot topic in machine learning, is also introduced and used in the worked example.
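The split-conformal construction mentioned above can be sketched in a few lines. This is a generic illustration, not the thesis's implementation: the calibration residuals and the point prediction are hypothetical.

```python
import math

def split_conformal_halfwidth(cal_residuals, alpha=0.1):
    """Half-width q so that [yhat - q, yhat + q] targets (1 - alpha) coverage."""
    scores = sorted(abs(r) for r in cal_residuals)
    n = len(scores)
    # Conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score
    k = math.ceil((n + 1) * (1 - alpha))
    return scores[min(k, n) - 1]

# Hypothetical calibration residuals (minutes) from a fitted operation-time model
residuals = [0.4, -1.2, 0.8, 2.1, -0.3, 1.5, -0.9, 0.6, -1.8, 1.1]
q = split_conformal_halfwidth(residuals, alpha=0.2)

yhat = 30.0  # hypothetical point prediction for a new product's operation time
interval = (yhat - q, yhat + q)
```

The appeal of the method is that the coverage guarantee holds for any underlying regression model, provided the calibration set is exchangeable with new data.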
67

Integration of Genome Scale Data for Identifying New Biomarkers in Colon Cancer: Integrated Analysis of Transcriptomics and Epigenomics Data from High Throughput Technologies in Order to Identifying New Biomarkers Genes for Personalised Targeted Therapies for Patients Suffering from Colon Cancer

Hassan, Aamir Ul January 2017 (has links)
Colorectal cancer is the third most common cancer and a leading cause of cancer deaths in Western industrialised countries. Despite recent advances in the screening, diagnosis, and treatment of colorectal cancer, an estimated 608,000 people die every year from colon cancer. Our current knowledge of colorectal carcinogenesis indicates a multifactorial and multi-step process that involves various genetic alterations and several biological pathways. The identification of molecular markers with early diagnostic and precise clinical outcome in colon cancer is a challenging task because of tumour heterogeneity. This Ph.D. thesis presents the molecular and cellular mechanisms leading to colorectal cancer. A systematic review of the literature is conducted on microarray gene expression profiling, gene ontology enrichment analysis, microRNA, systems biology and various bioinformatics tools. This study aimed to stratify colon tumours into molecularly distinct subtypes, identify novel diagnostic targets and predict reliable prognostic signatures for clinical practice using microarray expression datasets. We performed an integrated analysis of gene expression data based on genetic, epigenetic and extensive clinical information using unsupervised learning, correlation and functional network analysis. As a result, we identified 267-gene and 124-gene signatures that can distinguish normal, primary and metastatic tissues, and that are also involved in important regulatory functions such as immune response, lipid metabolism and peroxisome proliferator-activated receptor (PPAR) signalling pathways. For the first time, we also identify miRNAs that can differentiate primary colon tumours from metastatic ones, and a prognostic signature of grade and stage levels, which can be a major contributor to complex transcriptional phenotypes in a colon tumour.
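The idea of ranking genes to form a discriminating signature, as in the 267-gene and 124-gene signatures above, can be sketched with a toy two-group separation score. The gene names, expression values and the score itself are illustrative stand-ins; the thesis used unsupervised learning, correlation and functional network analysis on real microarray data.

```python
from statistics import mean, stdev

def separation_score(a, b):
    """Simple two-group score: difference of means over pooled spread."""
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b) + 1e-9)

def top_signature(expr, labels, top_n=2):
    """Rank genes by how well they separate the two tissue groups."""
    scores = {}
    for gene, vals in expr.items():
        a = [v for v, l in zip(vals, labels) if l == "normal"]
        b = [v for v, l in zip(vals, labels) if l == "tumour"]
        scores[gene] = separation_score(a, b)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical expression matrix: gene -> values over six samples
expr = {
    "PPARG": [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],   # strongly differential
    "GAPDH": [2.0, 2.1, 1.9, 2.0, 2.2, 2.1],   # housekeeping, flat
    "MYC":   [1.5, 1.4, 1.6, 2.6, 2.4, 2.5],   # moderately differential
}
labels = ["normal"] * 3 + ["tumour"] * 3
sig = top_signature(expr, labels, top_n=2)
```

A real analysis would apply multiple-testing correction and validate the resulting signature on an independent cohort before any clinical claim.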
68

Sequential Adaptive Designs In Computer Experiments For Response Surface Model Fit

LAM, CHEN QUIN 29 July 2008 (has links)
No description available.
69

Three Essays in Inference and Computational Problems in Econometrics

Todorov, Zvezdomir January 2020 (has links)
This dissertation is organized into three independent chapters. In Chapter 1, I consider the selection of weights for averaging a set of threshold models. Whereas the existing model averaging literature primarily focuses on averaging linear models, I consider threshold regression models. The theory developed in that chapter demonstrates that the proposed jackknife model averaging estimator achieves asymptotic optimality when the candidate models are all misspecified threshold models. A simulation study demonstrates that the jackknife model averaging estimator achieves the lowest mean squared error when contrasted with other model selection and model averaging methods. In Chapter 2, I propose a model averaging framework for the synthetic control method of Abadie and Gardeazabal (2003) and Abadie et al. (2010). The proposed estimator serves a twofold purpose. First, it reduces the bias in estimating the weights each member of the donor pool receives. Second, it accounts for model uncertainty in the program evaluation estimation. I study two variations of the model, one where model weights are derived by solving a cross-validation quadratic program and another where each candidate model receives equal weight. Next, I show how to apply the placebo study and the conformal inference procedure to both versions of the estimator. With a simulation study, I reveal the superior performance of the proposed procedure. In Chapter 3, which is co-authored with my advisor Professor Youngki Shin, we provide an exact computation algorithm for the maximum rank correlation estimator using the mixed integer programming (MIP) approach. We construct a new constrained optimization problem by transforming all indicator functions into binary parameters to be estimated and show that the transformation is equivalent to the original problem. Using a modern MIP solver, we apply the proposed method to an empirical example and Monte Carlo simulations.
The results show that the proposed algorithm performs better than the existing alternatives. / Dissertation / Doctor of Philosophy (PhD)
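The jackknife (leave-one-out) model averaging idea from Chapter 1 can be sketched for two candidate models. This is a simplified stand-in: the chapter averages threshold regression models and derives the weights from an optimality theory, whereas the sketch below grid-searches a single weight between an intercept-only and a linear model on invented data.

```python
def loo_predictions(xs, ys, fit):
    """Leave-one-out (jackknife) predictions for a given fitting routine."""
    preds = []
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        preds.append(fit(tx, ty)(xs[i]))
    return preds

def fit_mean(tx, ty):
    """Candidate model 1: intercept only."""
    m = sum(ty) / len(ty)
    return lambda x: m

def fit_linear(tx, ty):
    """Candidate model 2: simple linear regression."""
    n = len(tx)
    mx, my = sum(tx) / n, sum(ty) / n
    b = sum((x - mx) * (y - my) for x, y in zip(tx, ty)) / \
        sum((x - mx) ** 2 for x in tx)
    a = my - b * mx
    return lambda x: a + b * x

def jackknife_weight(xs, ys, grid=101):
    """Pick the averaging weight w (on the mean model) that minimises the
    jackknife criterion, by grid search instead of a quadratic program."""
    p1 = loo_predictions(xs, ys, fit_mean)
    p2 = loo_predictions(xs, ys, fit_linear)
    best_w, best_err = 0.0, float("inf")
    for k in range(grid):
        w = k / (grid - 1)
        err = sum((y - (w * a + (1 - w) * b)) ** 2
                  for y, a, b in zip(ys, p1, p2))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# Invented, clearly linear data: averaging should put little weight on the mean model
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
w = jackknife_weight(xs, ys)
```

The grid search stands in for the cross-validation quadratic program mentioned in the abstract; with more candidate models the weight vector lives on a simplex and a QP solver becomes the natural tool.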
70

Improving computational predictions of Cis-regulatory binding sites in genomic data

Rezwan, Faisal Ibne January 2011 (has links)
Cis-regulatory elements are the short regions of DNA to which specific regulatory proteins bind; these interactions subsequently influence the level of transcription of associated genes by inhibiting or enhancing the transcription process. It is known that much of the genetic change underlying morphological evolution takes place in these regions, rather than in the coding regions of genes. Identifying these sites in a genome is a non-trivial problem. Experimental (wet-lab) methods for finding binding sites exist, but all have some limitations regarding their applicability, accuracy, availability or cost. On the other hand, computational methods for predicting the position of binding sites are less expensive and faster. Unfortunately, however, these algorithms perform rather poorly, some missing most binding sites and others over-predicting their presence. The aim of this thesis is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms and other sources of complementary biological evidence. Previous related work involved the use of machine learning algorithms for integrating predictions of TFBSs, with particular emphasis on the Support Vector Machine (SVM). This thesis has built upon, extended and considerably improved this earlier work. Data from two organisms were used. Firstly, the relatively simple genome of yeast was used; in yeast, the binding sites are fairly well characterised and are normally located near the genes that they regulate. The techniques used on the yeast genome were also tested on the more complex genome of the mouse. The regulatory mechanisms of the mouse, a eukaryotic species, are known to be considerably more complex, and it was therefore interesting to investigate the techniques described here on such an organism.
The initial results were, however, not particularly encouraging: although a small improvement on the base algorithms could be obtained, the predictions were still of low quality. This was the case for both the yeast and mouse genomes. However, when the negatively labelled vectors in the training set were changed, a substantial improvement in performance was observed. The first change was to choose regions in the mouse genome distal from any gene (over 4000 base pairs away) as regions not containing binding sites. This produced a major improvement in performance. The second change was simply to use randomised training vectors, which contained no meaningful biological information, as the negative class. This gave some improvement on the yeast genome, but had a very substantial benefit for the mouse data, considerably improving on the aforementioned distal negative training data. In fact the resulting classifier was finding over 80% of the binding sites in the test set, and moreover 80% of the predictions were correct. The final experiment used an updated version of the yeast dataset, using more state-of-the-art algorithms and more recent TFBS annotation data. Here it was found that using randomised or distal negative examples once again gave very good results, comparable to those obtained on the mouse genome. Another source of negative data was tried for this yeast dataset, namely vectors taken from intronic regions. Interestingly, this gave the best results.
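The effect of the negative-class construction discussed above can be illustrated with a toy linear classifier. This is a sketch, not the thesis's pipeline: a perceptron stands in for the SVM, the two-dimensional "binding-site" feature vectors are invented, and the randomised negatives are drawn from an arbitrary range.

```python
import random

def train_perceptron(data, epochs=50, lr=0.1):
    """Tiny linear classifier standing in for the SVM used in the thesis."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:  # mistake-driven update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

rng = random.Random(1)
# Invented "binding site" feature vectors, clustered away from the origin
pos = [([0.8 + rng.uniform(-0.1, 0.1), 0.9 + rng.uniform(-0.1, 0.1)], 1)
       for _ in range(20)]
# Negative class built from randomised vectors carrying no biological signal,
# mirroring the thesis's best-performing choice of negative training data
neg = [([rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.5)], -1) for _ in range(20)]

w, b = train_perceptron(pos + neg)
recall = sum(1 for x, y in pos if predict(w, b, x) == 1) / len(pos)
```

The design point the thesis makes survives even in this toy: what you choose as the negative class shapes the decision boundary as much as the positive examples do.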
