21

Using Helix-coil Models to Study Protein Unfolded States

Hughes, Roy Gene January 2016 (has links)
Research on polypeptide unfolded states has received much more attention in the last decade or so than in the past. Unfolded states are thought to be implicated in various misfolding diseases and likely play crucial roles in protein folding equilibria and folding rates. Structural characterization of unfolded states has proven to be much more difficult than the now well-established practice of determining the structures of folded proteins, largely because many core assumptions underlying folded-structure determination methods are invalid for unfolded states. This has led to a dearth of knowledge concerning the nature of unfolded-state conformational distributions. While many aspects of unfolded-state structure are not well known, there does exist a significant body of work, stretching back half a century, focused on the structural characterization of marginally stable polypeptide systems. This body of work represents an extensive collection of experimental data and biophysical models describing helix-coil equilibria in polypeptide systems. Much of the work on unfolded states in the last decade has not been devoted specifically to improving our understanding of helix-coil equilibria, which is arguably the best characterized of the various conformational equilibria that likely contribute to unfolded-state conformational distributions. This thesis provides a deeper investigation of helix-coil equilibria using modern statistical data analysis and biophysical modeling techniques. The studies contained within seek to provide deeper insights and new perspectives on what we presumably know very well about protein unfolded states.

Chapter 1 gives an overview of recent and historical work on protein unfolded states. The study of helix-coil equilibria is placed in the context of the general field of unfolded-state research, and the basics of helix-coil models are introduced.

Chapter 2 introduces the newest incarnation of a sophisticated helix-coil model. State-of-the-art statistical techniques are employed to estimate the energies of the physical interactions that influence helix-coil equilibria. A new Bayesian model selection approach is used to test several long-standing hypotheses concerning the physical nature of the helix-coil transition. Some assumptions made in previous models are shown to be invalid, and the new model exhibits greatly improved predictive performance relative to its predecessor.

Chapter 3 introduces a new statistical model for interpreting amide exchange measurements. Because amide exchange can serve as a probe of residue-specific properties of helix-coil ensembles, the new model provides a novel and robust method for using such measurements to characterize helix-coil ensembles experimentally and to test the position-specific predictions of helix-coil models. The statistical model is shown to perform considerably better than the most commonly used method for interpreting amide exchange data, and the estimates obtained from amide exchange measurements on an example helical peptide show remarkable consistency with the predictions of the helix-coil model.

Chapter 4 studies helix-coil ensembles through the enumeration of helix-coil configurations. Aside from providing new insights into helix-coil ensembles, this chapter also introduces a method by which helix-coil models can be extended to calculate new types of observables. Future work on this approach could allow helix-coil models to move into application domains previously reserved for the other types of unfolded-state models introduced in Chapter 1. / Dissertation
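For readers unfamiliar with helix-coil models, the classical Zimm-Bragg formulation computes a peptide's partition function with a 2x2 transfer matrix. The sketch below is a generic textbook illustration, not the thesis's own model; `s` is the usual helix-propagation weight and `sigma` the nucleation penalty.

```python
import numpy as np

def zimm_bragg_partition(n, s, sigma):
    """Partition function of the classical Zimm-Bragg helix-coil model for a
    chain of n residues. Weights: s for a helical residue extending a helix,
    sigma*s for nucleating a new helix, 1 for a coil residue."""
    M = np.array([[s,         1.0],    # previous residue helical
                  [sigma * s, 1.0]])   # previous residue coil
    v = np.array([sigma * s, 1.0])     # first residue: nucleate or stay coil
    for _ in range(n - 1):
        v = v @ M
    return v.sum()

def mean_helicity(n, s, sigma, ds=1e-6):
    """Fractional helix content from (s/n) * d ln Z / ds, numerically."""
    z0 = zimm_bragg_partition(n, s, sigma)
    z1 = zimm_bragg_partition(n, s + ds, sigma)
    return s * (np.log(z1) - np.log(z0)) / ds / n
```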
22

Selecting Spatial Scale of Area-Level Covariates in Regression Models

Grant, Lauren 01 January 2016 (has links)
Studies have found that the level of association between an area-level covariate and an outcome can vary depending on the spatial scale (SS) at which the covariate is measured. However, the covariates used in regression models are customarily all modeled at the same spatial unit. In this dissertation, we developed four SS model selection algorithms that select the best spatial scale for each area-level covariate. The SS forward stepwise, SS incremental forward stagewise, SS least angle regression (LARS), and SS lasso algorithms allow different area-level covariates to be selected at different spatial scales, while constraining each covariate to enter at no more than one spatial scale. We applied our methods to two real applications with area-level covariates available at multiple scales, modeling variation in 1) nitrate concentrations in private wells in Iowa and 2) body mass index z-scores of pediatric patients of the Virginia Commonwealth University Medical Center. In both applications, our SS algorithms selected covariates at different spatial scales, yielding better goodness of fit than traditional models in which all area-level covariates were modeled at the same scale. Simulation studies examining the performance of the SS algorithms found that they generally outperformed the conventional modeling approaches. These findings underscore the importance of considering spatial scale when performing model selection.
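As a concrete illustration of the idea behind the SS algorithms, the sketch below implements a forward stepwise search over (covariate, scale) pairs. The data layout, the stopping rule, and the use of AIC as the criterion are illustrative assumptions, not the dissertation's exact algorithms.

```python
import numpy as np

def ss_forward_stepwise(y, X_by_scale, max_steps=10):
    """Illustrative spatial-scale (SS) forward stepwise selection.

    X_by_scale maps covariate name -> {scale name -> 1-D numpy array}, i.e.
    the same covariate measured at several spatial scales. Each covariate
    may enter the model at no more than one scale."""
    n = len(y)
    selected = {}                      # covariate -> chosen scale
    design = [np.ones(n)]              # intercept column
    def aic(cols):
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return n * np.log(resid @ resid / n) + 2 * X.shape[1]
    current = aic(design)
    for _ in range(max_steps):
        best = None
        for cov, scales in X_by_scale.items():
            if cov in selected:        # at most one scale per covariate
                continue
            for scale, x in scales.items():
                score = aic(design + [x])
                if best is None or score < best[0]:
                    best = (score, cov, scale, x)
        if best is None or best[0] >= current:   # no AIC improvement: stop
            break
        current, cov, scale, x = best
        selected[cov] = scale
        design.append(x)
    return selected
```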
23

Aplicação do algorítmo genético no mapeamento de genes epistáticos em cruzamentos controlados / Application of genetic algorithm in the genes epistatic map in controlled crossings

Oliveira, Paulo Tadeu Meira e Silva de 22 August 2008 (has links)
Genetic mapping comprises experimental and statistical procedures that seek to detect genes associated with the etiology and regulation of diseases, and to estimate the corresponding genetic effects and genomic locations. For experimental designs involving controlled crosses of animals or plants, different regression-model formulations can be adopted to identify QTLs (quantitative trait loci), including their main effects and possible interaction (epistatic) effects. The difficulty in this mapping setting is the comparison of models that are not necessarily nested and that involve a high-dimensional search space. In this work, we describe a general method to improve the computational efficiency of simultaneously mapping multiple QTLs and their interaction effects. The literature has relied on exhaustive or conditional search methods; we propose instead a genetic algorithm to search the multilocus space for epistatic loci distributed across the genome, an approach whose advantage grows with larger genomes and denser molecular-marker maps. Simulation studies show that the genetic-algorithm search is generally more efficient than conditional search and comparable in efficiency to exhaustive search. In formalizing the genetic algorithm, we study the behavior of parameters such as recombination probability, mutation probability, sample size, number of generations, number of solutions, and genome size, under different objective functions: BIC (Bayesian information criterion), AIC (Akaike information criterion), and SSE, the residual sum of squares of a fitted model. The proposed methodology is also applied to a genotypic and phenotypic data set from rats in an F2 design; briefly, the study examined blood pressure variation before and after a salt-loading experiment in an intercross (F2) progeny.
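A minimal sketch of the kind of genetic-algorithm search described here, minimizing BIC over subsets of loci with pairwise products as a crude epistasis term. The population size, crossover and mutation rates, and bitstring encoding are illustrative choices, not the thesis's settings.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def bic_ols(y, X):
    """BIC of an ordinary least-squares fit under Gaussian errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(sse / n) + k * np.log(n)

def fitness(mask, y, G):
    """Objective for one locus subset: main effects of the selected loci
    plus all pairwise products (epistatic interactions)."""
    idx = np.flatnonzero(mask)
    cols = [np.ones(len(y))] + [G[:, j] for j in idx]
    cols += [G[:, a] * G[:, b] for a, b in combinations(idx, 2)]
    return bic_ols(y, np.column_stack(cols))

def ga_qtl_search(y, G, pop=40, gens=50, p_cross=0.8, p_mut=0.01):
    """Genetic-algorithm search over locus subsets, minimizing BIC."""
    n_loci = G.shape[1]
    P = rng.random((pop, n_loci)) < 0.05          # sparse initial population
    for _ in range(gens):
        f = np.array([fitness(m, y, G) for m in P])
        parents = P[np.argsort(f)][: pop // 2]    # keep the fitter half
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = a.copy()
            if rng.random() < p_cross:            # one-point crossover
                cut = rng.integers(1, n_loci)
                child[cut:] = b[cut:]
            child ^= rng.random(n_loci) < p_mut   # mutation: flip bits
            children.append(child)
        P = np.vstack([parents] + children)
    f = np.array([fitness(m, y, G) for m in P])
    return P[np.argmin(f)]                        # best locus mask found
```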
24

Parcimonie dans les modèles Markoviens et application à l'analyse des séquences biologiques / Parsimonious Markov models and application to biological sequence analysis

Bourguignon, Pierre Yves Vincent 15 December 2008 (has links)
Markov chains, as a universal model for finite-memory, discrete-valued processes, are ubiquitous in applied statistics, with applications ranging from text compression to the analysis of biological sequences. Their practical use with finite samples, however, systematically requires a compromise between the memory length of the model, which conditions the complexity of the interactions it can capture, and the amount of information carried by the data, whose limitation negatively impacts the quality of estimation. Context trees, an extension of the Markov chain model class, give the modeller finer granularity in this model selection process by allowing the memory length to vary across contexts. Several popular methods are based on this model class, in fields such as text indexation and text compression (Context Tree Maximization, CTM, and Context Tree Weighting, CTW). We propose an extension of the context tree model class, parsimonious context trees, which additionally allow the fusion of sibling nodes in the context tree. These fusions radically increase the granularity of model selection, enabling better compromises between model complexity and estimation quality, at the price of a much larger set of competing models. Thanks to a Bayesian approach very similar to the one employed in CTM and CTW, we designed a model selection method that exactly optimizes the Bayesian model selection criterion while benefiting from a dynamic programming scheme. The resulting algorithm attains the lower bound on the complexity of the optimization problem and remains practically tractable for alphabets of up to 10 symbols. The last part of the thesis demonstrates the performance achieved by this procedure on diverse datasets.
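As background, the Bayesian dynamic programme that parsimonious context trees extend can be sketched for ordinary context trees: the CTM recursion scores each context with a Krichevsky-Trofimov marginal and chooses, at every node, between pruning and splitting under the usual 1/2-1/2 tree prior. This is a generic CTM sketch; the sibling-node fusion that defines the thesis's model class is not shown.

```python
from math import lgamma, log
from collections import Counter

ALPHABET = "acgt"

def kt_log_marginal(counts, k=len(ALPHABET)):
    """Log Krichevsky-Trofimov (Dirichlet-1/2) marginal of symbol counts."""
    n = sum(counts.values())
    out = lgamma(k / 2) - lgamma(n + k / 2)
    for c in counts.values():
        out += lgamma(c + 0.5) - lgamma(0.5)
    return out

def context_counts(seq, depth):
    """counts[context][next symbol], for every context up to a fixed depth."""
    counts = {}
    for i in range(depth, len(seq)):
        for d in range(depth + 1):
            counts.setdefault(seq[i - d:i], Counter())[seq[i]] += 1
    return counts

def ctm_log_score(ctx, counts, depth):
    """CTM recursion: at each node, keep the better of pruning (score the
    context as a leaf) and splitting into |ALPHABET| child contexts."""
    leaf = kt_log_marginal(counts.get(ctx, Counter()))
    if len(ctx) == depth:
        return leaf
    split = sum(ctm_log_score(a + ctx, counts, depth) for a in ALPHABET)
    return log(0.5) + max(leaf, split)

# e.g. best context-tree log score of a toy DNA string, memory up to 3:
score = ctm_log_score("", context_counts("acgtacgtacgggt", 3), 3)
```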
25

Modelling and simulation of dynamic contrast-enhanced MRI of abdominal tumours

Banerji, Anita January 2012 (has links)
Dynamic contrast-enhanced (DCE) time series analysis techniques are hard to fully validate quantitatively as ground truth microvascular parameters are difficult to obtain from patient data. This thesis presents a software application for generating synthetic image data from known ground truth tracer kinetic model parameters. As an object oriented design has been employed to maximise flexibility and extensibility, the application can be extended to include different vascular input functions, tracer kinetic models and imaging modalities. Data sets can be generated for different anatomical and motion descriptions as well as different ground truth parameters. The application has been used to generate a synthetic DCE-MRI time series of a liver tumour with non-linear motion of the abdominal organs due to breathing. The utility of the synthetic data has been demonstrated in several applications: in the development of an Akaike model selection technique for assessing the spatially varying characteristics of liver tumours; the robustness of model fitting and model selection to noise, partial volume effects and breathing motion in liver tumours; and the benefit of using model-driven registration to compensate for breathing motion. When applied to synthetic data with appropriate noise levels, the Akaike model selection technique can distinguish between the single-input extended Kety model for tumour and the dual-input Materne model for liver, and is robust to motion. A significant difference between median Akaike probability value in tumour and liver regions is also seen in 5/6 acquired data sets, with the extended Kety model selected for tumour. Knowledge of the ground truth distribution for the synthetic data was used to demonstrate that, whilst median Ktrans does not change significantly due to breathing motion, model-driven registration restored the structure of the Ktrans histogram and so could be beneficial to tumour heterogeneity assessments.
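For least-squares fits, the Akaike model selection used here reduces to computing Akaike weights ("Akaike probabilities") from each candidate model's SSE. The sketch below shows that computation; the SSEs and parameter counts given for the extended Kety and dual-input Materne models are hypothetical values, not figures from the thesis.

```python
import numpy as np

def akaike_probabilities(sse, n_params, n_points):
    """Akaike weights for candidate models fitted by least squares to a
    single voxel time series of n_points samples."""
    aic = np.array([n_points * np.log(s / n_points) + 2 * k
                    for s, k in zip(sse, n_params)])
    rel = np.exp(-0.5 * (aic - aic.min()))   # relative likelihoods
    return rel / rel.sum()

# Hypothetical comparison: extended Kety vs dual-input Materne model.
print(akaike_probabilities(sse=[1.8, 2.4], n_params=[4, 5], n_points=60))
```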
27

Criteria for generalized linear model selection based on Kullback's symmetric divergence

Acion, Cristina Laura 01 December 2011 (has links)
Model selection criteria frequently arise from constructing estimators of discrepancy measures used to assess the disparity between the data generating model and a fitted approximating model. The widely known Akaike information criterion (AIC) results from utilizing Kullback's directed divergence (KDD) as the targeted discrepancy. Under appropriate conditions, AIC serves as an asymptotically unbiased estimator of KDD. The directed divergence is an asymmetric measure of separation between two statistical models, meaning that an alternate directed divergence may be obtained by reversing the roles of the two models in the definition of the measure. The sum of the two directed divergences is Kullback's symmetric divergence (KSD). A comparison of the two directed divergences indicates an important distinction between the measures. When used to evaluate fitted approximating models that are improperly specified, the directed divergence which serves as the basis for AIC is more sensitive towards detecting overfitted models, whereas its counterpart is more sensitive towards detecting underfitted models. Since KSD combines the information in both measures, it functions as a gauge of model disparity which is arguably more balanced than either of its individual components. With this motivation, we propose three estimators of KSD for use as model selection criteria in the setting of generalized linear models: KICo, KICu, and QKIC. These statistics function as asymptotically unbiased estimators of KSD under different assumptions and frameworks. As with AIC, KICo and KICu are both justified for large-sample maximum likelihood settings; however, asymptotic unbiasedness holds under more general assumptions for KICo and KICu than for AIC. KICo serves as an asymptotically unbiased estimator of KSD in settings where the distribution of the response is misspecified. The asymptotic unbiasedness of KICu holds when the candidate model set includes underfitted models. QKIC is a modification of KICo. In the development of QKIC, the likelihood is replaced by the quasi-likelihood. QKIC can be used as a model selection tool when generalized estimating equations, a quasi-likelihood-based method, are used for parameter estimation. We examine the performance of KICo, KICu, and QKIC relative to other relevant criteria in simulation experiments. We also apply QKIC in a model selection problem for a randomized clinical trial investigating the effect of antidepressants on the temporal course of disability after stroke.
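For reference, the standard definitions underlying these criteria, with g the generating model and f_theta the fitted approximating model, are:

```latex
% Kullback's directed divergence (KDD) and its reversed-role counterpart:
d(g, f_\theta) = \mathrm{E}_g\bigl[\log g(Y)\bigr] - \mathrm{E}_g\bigl[\log f_\theta(Y)\bigr],
\qquad
d(f_\theta, g) = \mathrm{E}_{f_\theta}\bigl[\log f_\theta(Y)\bigr] - \mathrm{E}_{f_\theta}\bigl[\log g(Y)\bigr].

% Kullback's symmetric divergence (KSD) is their sum:
J(g, f_\theta) = d(g, f_\theta) + d(f_\theta, g).

% AIC, for a model with k parameters and maximized likelihood L(\hat\theta),
% serves as an asymptotically unbiased estimator of (a variant of) the KDD:
\mathrm{AIC} = -2\log L(\hat\theta) + 2k.
```

KICo, KICu, and QKIC play the analogous estimating role for J, which combines the overfitting sensitivity of one directed divergence with the underfitting sensitivity of the other.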
28

Best-subset model selection based on multitudinal assessments of likelihood improvements

Carter, Knute Derek 01 December 2013 (has links)
Given a set of potential explanatory variables, one model selection approach is to select the best model, according to some criterion, from among the collection of models defined by all possible subsets of the explanatory variables. A popular procedure that has been used in this setting is to select the model that results in the smallest value of the Akaike information criterion (AIC). One drawback in using the AIC is that it can lead to the frequent selection of overspecified models. This can be problematic if the researcher wishes to assert, with some level of certainty, the necessity of any given variable that has been selected. This thesis develops a model selection procedure that allows the researcher to nominate, a priori, the probability at which overspecified models will be selected from among all possible subsets. The procedure seeks to determine if the inclusion of each candidate variable results in a sufficiently improved fitting term, and hence is referred to as the SIFT procedure. In order to determine whether there is sufficient evidence to retain a candidate variable or not, a set of threshold values are computed. Two procedures are proposed: a naive method based on a set of restrictive assumptions; and an empirical permutation-based method. Graphical tools have also been developed to be used in conjunction with the SIFT procedure. The graphical representation of the SIFT procedure clarifies the process being undertaken. Using these tools can also assist researchers in developing a deeper understanding of the data they are analyzing. The naive and empirical SIFT methods are investigated by way of simulation under a range of conditions within the standard linear model framework. The performance of the SIFT methodology is compared with model selection by minimum AIC; minimum Bayesian Information Criterion (BIC); and backward elimination based on p-values. The SIFT procedure is found to behave as designed—asymptotically selecting those variables that characterize the underlying data generating mechanism, while limiting the selection of false or spurious variables to the desired level. The SIFT methodology offers researchers a promising new approach to model selection, whereby they are now able to control the probability of selecting an overspecified model to a level that best suits their needs.
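The following sketch conveys the flavor of a threshold-gated forward procedure of this kind: a candidate enters only when its improvement in -2 log likelihood exceeds a fixed threshold. The chi-square(1) quantile used here is a deliberately naive stand-in for exposition; the thesis's SIFT thresholds (naive and permutation-based) are derived differently.

```python
import numpy as np
from scipy import stats

def sift_like_forward(y, X, alpha=0.05):
    """Illustration only, not the thesis's SIFT procedure: retain a candidate
    variable only when its reduction in -2 log likelihood (Gaussian errors)
    exceeds a chi-square(1) quantile chosen from the nominated level alpha."""
    n, p = X.shape
    thresh = stats.chi2.ppf(1 - alpha, df=1)
    def neg2ll(cols):
        M = np.column_stack([np.ones(n)] + cols)
        beta, *_ = np.linalg.lstsq(M, y, rcond=None)
        sse = float(np.sum((y - M @ beta) ** 2))
        return n * np.log(sse / n)
    chosen, current = [], neg2ll([])
    while len(chosen) < p:
        cand = {j: neg2ll([X[:, k] for k in chosen] + [X[:, j]])
                for j in range(p) if j not in chosen}
        j = min(cand, key=cand.get)            # best remaining candidate
        if current - cand[j] <= thresh:        # improvement not sufficient
            break
        chosen.append(j)
        current = cand[j]
    return chosen
```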
29

Model Selection for Solving Kinematics Problems

Goh, Choon P. 01 September 1990 (has links)
There has been much interest in the area of model-based reasoning within the Artificial Intelligence community, particularly in its application to diagnosis and troubleshooting. The core issue in this thesis, simply put, is: model-based reasoning is fine, but whence the model? Where do the models come from? How do we know we have the right models? What does the right model mean anyway? Our work has three major components. The first component deals with how we determine whether a piece of information is relevant to solving a problem. We have three ways of determining relevance: derivational, situational, and order-of-magnitude reasoning. The second component deals with defining and building models for solving problems. We identify these models, determine what we need to know about them, and, importantly, determine when they are appropriate. Currently, the system has a collection of four basic models and two hybrid models. This collection of models has been successfully tested on a set of fifteen simple kinematics problems. The third major component of our work deals with how the models are selected.
30

Secondary Analysis of Case-Control Studies in Genomic Contexts

Wei, Jiawei August 2010 (has links)
This dissertation consists of five independent projects. In each project, a novel statistical method was developed to address a practical problem encountered in genomic contexts. For example, we considered testing for constant nonparametric effects in a general semiparametric regression model in genetic epidemiology; analyzed the relationship between covariates in the secondary analysis of case-control data; performed model selection in joint modeling of paired functional data; and assessed the prediction ability of genes in gene expression data generated by the CodeLink System from GE. In the first project, in Chapter II, we considered the problem of testing for constant nonparametric effects in a general semiparametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. We derived a generalized likelihood ratio test for this hypothesis, showed how to implement it, and gave evidence that it can improve statistical power compared to standard partially linear models. The second project, in Chapter III, addressed score testing for the independence of X and Y in the secondary analysis of case-control data. Semiparametric efficient approaches can be used to construct semiparametric score tests, but they suffer from a lack of robustness to the assumed model for Y given X. We showed how to adjust the semiparametric score test so that its level/Type I error is correct even if the assumed model for Y given X is wrong, making the test robust. The third project, in Chapter IV, took up the estimation of a regression function when Y given X follows a homoscedastic regression model. We showed how to estimate the regression parameters in a rare-disease case even if the assumed model for Y given X is incorrect, so that the estimates are model-robust. In the fourth project, in Chapter V, we developed novel AIC- and BIC-type methods for estimating the smoothing parameters in a joint model of paired, hierarchical, sparse functional data, and showed in our numerical work that they are many times faster than 10-fold cross-validation while giving results remarkably close to the cross-validated estimates. In the fifth project, in Chapter VI, we introduced a practical permutation test that uses cross-validated genetic predictors to determine whether the list of genes in question has "good" prediction ability. It avoids overfitting by using cross-validation to derive the genetic predictor, and it determines whether the count of genes giving "good" prediction could have been obtained by chance. This test was then used to explore gene expression in colonic tissue and in exfoliated colonocytes in the fecal stream, to discover similarities between the two.
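A hedged sketch of a permutation test in the spirit of the fifth project: each gene's predictor is cross-validated, the genes with "good" prediction are counted, and the count is compared with its permutation null. The cutoff, the per-gene linear predictor, the fold count, and the function names are illustrative assumptions, not the dissertation's specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def cv_r2(x, y, folds=10):
    """Cross-validated squared correlation between one gene's expression x
    and the outcome y, using simple linear regression fitted per fold."""
    idx = rng.permutation(len(y))
    pred = np.empty(len(y))
    for fold in np.array_split(idx, folds):
        train = np.setdiff1d(idx, fold)
        slope, intercept = np.polyfit(x[train], y[train], 1)
        pred[fold] = intercept + slope * x[fold]
    return np.corrcoef(pred, y)[0, 1] ** 2

def gene_count_permutation_test(E, y, cutoff=0.1, n_perm=200):
    """Count genes whose cross-validated prediction exceeds the 'good'
    cutoff, then compare that count with its permutation null; re-deriving
    the predictor inside each permutation is what avoids overfitting.
    E: genes x samples expression matrix."""
    observed = sum(cv_r2(g, y) > cutoff for g in E)
    null = np.array([sum(cv_r2(g, rng.permutation(y)) > cutoff for g in E)
                     for _ in range(n_perm)])
    return observed, float(np.mean(null >= observed))   # count and p-value
```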
