  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world.
21

Parcimonie dans les modèles Markoviens et application à l'analyse des séquences biologiques / Parsimonious Markov models and application to biological sequence analysis

Bourguignon, Pierre Yves Vincent 15 December 2008
Markov chains, as a universal model for finite-memory, discrete-valued processes, are omnipresent in applied statistics; their applications range from text compression to the analysis of biological sequences. Their practical use with finite samples, however, systematically requires a compromise between the memory length of the model, which conditions the complexity of the interactions the model may capture, and the amount of information carried by the data, whose limitation negatively impacts the quality of estimation. Context trees, as an extension of the class of Markov chains, give the modeller finer granularity in this model selection process by allowing the memory length to vary across contexts. Several popular methods are based on this model class, in fields such as text indexation and text compression (Context Tree Maximization, CTM, and Context Tree Weighting, CTW). We propose an extension of the class of context trees, parsimonious context trees, which additionally allow the fusion of sibling nodes in the context tree. They give the modeller yet finer granularity for the model selection task, at the cost of a much larger set of competing models. Thanks to a Bayesian approach very similar to the one employed in CTM and CTW, we designed a model selection method that exactly optimizes the Bayesian selection criterion while benefiting from dynamic programming. The resulting algorithm attains the lower bound of the complexity of the optimization problem and runs in reasonable space and time on alphabets of up to about 10 symbols. The last part of the thesis demonstrates the performance achieved by this procedure on diverse datasets.
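The Bayesian scoring that underlies CTM/CTW-style selection can be sketched in a few lines: each context is scored by the marginal likelihood of the symbols that follow it under a Dirichlet prior (the Krichevsky-Trofimov estimator). The snippet below is a generic illustration of that scoring step only, not the thesis's parsimonious extension; the sequence and alphabet are invented for the example.

```python
from math import lgamma

def kt_log_marginal(counts, alpha=0.5):
    """Log marginal likelihood of symbol counts under a Dirichlet(alpha)
    prior: the Krichevsky-Trofimov estimator used to score each context
    node in CTM/CTW-style Bayesian model selection."""
    k = len(counts)
    n = sum(counts)
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(c + alpha) - lgamma(alpha) for c in counts))

def context_counts(seq, context, alphabet):
    """Count which symbol follows each occurrence of `context` in `seq`."""
    counts = {a: 0 for a in alphabet}
    m = len(context)
    for i in range(m, len(seq)):
        if seq[i - m:i] == context:
            counts[seq[i]] += 1
    return counts

seq = "ACGTACGTAACGT"          # toy sequence, not thesis data
counts = context_counts(seq, "CG", "ACGT")
score = kt_log_marginal(list(counts.values()))
```

With `alpha = 0.5` and a 4-letter alphabet, a single observed symbol receives probability 1/4, as expected of the KT estimator before any counts accumulate.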
22

Modelling and simulation of dynamic contrast-enhanced MRI of abdominal tumours

Banerji, Anita January 2012
Dynamic contrast-enhanced (DCE) time series analysis techniques are hard to fully validate quantitatively, as ground truth microvascular parameters are difficult to obtain from patient data. This thesis presents a software application for generating synthetic image data from known ground truth tracer kinetic model parameters. As an object-oriented design has been employed to maximise flexibility and extensibility, the application can be extended to include different vascular input functions, tracer kinetic models and imaging modalities. Data sets can be generated for different anatomical and motion descriptions as well as different ground truth parameters. The application has been used to generate a synthetic DCE-MRI time series of a liver tumour with non-linear motion of the abdominal organs due to breathing. The utility of the synthetic data has been demonstrated in several applications: in developing an Akaike model selection technique for assessing the spatially varying characteristics of liver tumours; in assessing the robustness of model fitting and model selection to noise, partial volume effects and breathing motion in liver tumours; and in demonstrating the benefit of using model-driven registration to compensate for breathing motion. When applied to synthetic data with appropriate noise levels, the Akaike model selection technique can distinguish between the single-input extended Kety model for tumour and the dual-input Materne model for liver, and is robust to motion. A significant difference between median Akaike probability values in tumour and liver regions is also seen in 5/6 acquired data sets, with the extended Kety model selected for tumour. Knowledge of the ground truth distribution for the synthetic data was used to demonstrate that, whilst median Ktrans does not change significantly due to breathing motion, model-driven registration restored the structure of the Ktrans histogram and so could be beneficial to tumour heterogeneity assessments.
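The per-region comparison via "Akaike probability" can be illustrated with standard Akaike weights, which convert AIC values for competing models into relative probabilities; the AIC values in the usage note are made-up placeholders, not values from the thesis.

```python
import numpy as np

def akaike_weights(aics):
    """Akaike weights: relative probability of each candidate model
    given its AIC value (smaller AIC -> larger weight)."""
    aics = np.asarray(aics, dtype=float)
    d = aics - aics.min()        # AIC differences relative to the best model
    w = np.exp(-0.5 * d)
    return w / w.sum()
```

For example, `akaike_weights([100.0, 102.0])` gives roughly `[0.73, 0.27]`: the lower-AIC model carries most of the weight.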
23

Aplicação do algoritmo genético no mapeamento de genes epistáticos em cruzamentos controlados / Application of the genetic algorithm to mapping epistatic genes in controlled crosses

Paulo Tadeu Meira e Silva de Oliveira 22 August 2008
Genetic mapping comprises experimental and statistical procedures that aim to detect genes associated with the etiology and regulation of diseases, and to estimate the corresponding genetic effects and genomic locations. In experimental designs involving controlled crosses of animals or plants, different formulations of regression models can be adopted to identify QTLs (quantitative trait loci), including their main effects and possible interaction (epistasis) effects. The difficulty in this kind of mapping is the comparison of models that are not necessarily nested and involve a high-dimensional multilocus search space. In this work, we describe a general method to improve the computational efficiency of simultaneously mapping multiple QTLs and their interaction effects. The literature has relied on exhaustive or conditional search; we propose the genetic algorithm to search the multilocus space, looking for epistatic loci distributed across the genome, an approach whose advantage grows with larger genomes and denser maps of molecular markers. Simulation studies show that the search based on the genetic algorithm is, in general, more efficient than conditional search, and comparably efficient to exhaustive search. In specifying the genetic algorithm we examined the behaviour of parameters such as recombination probability, mutation probability, sample size, number of generations, number of solutions and genome size, under different objective functions: BIC (Bayesian information criterion), AIC (Akaike information criterion) and SSE, the residual sum of squares of a fitted model. The proposed methodology is also applied to a genotypic and phenotypic data set from rats in an F2 design, examining blood pressure variation before and after a salt loading experiment.
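As a rough sketch of this search strategy, a genetic algorithm can evolve binary chromosomes over marker subsets with BIC as the objective. The encoding, operators, and parameter values below are illustrative choices, not the thesis's implementation, and only main effects are searched here, whereas the thesis also considers epistatic interaction terms.

```python
import random

import numpy as np

def bic(y, X):
    """BIC of an ordinary least-squares fit under a Gaussian likelihood."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(sse / n) + k * np.log(n)

def ga_select(y, markers, n_sol=30, n_gen=40, p_mut=0.05, seed=0):
    """Toy genetic algorithm over marker subsets: binary chromosomes,
    uniform crossover, bit-flip mutation, truncation selection on BIC."""
    rng = random.Random(seed)
    n, m = markers.shape

    def fitness(chrom):
        cols = [j for j in range(m) if chrom[j]]
        X = np.column_stack([np.ones(n)] + [markers[:, j] for j in cols])
        return bic(y, X)

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n_sol)]
    for _ in range(n_gen):
        pop.sort(key=fitness)                  # lower BIC is better
        elite = pop[: n_sol // 2]              # truncation selection
        children = []
        while len(elite) + len(children) < n_sol:
            a, b = rng.sample(elite, 2)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
            children.append([1 - g if rng.random() < p_mut else g
                             for g in child])
        pop = elite + children
    return min(pop, key=fitness)
```

On simulated data where the trait depends on two of the markers, the selected subset fits far better by BIC than the intercept-only model.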
24

Criteria for generalized linear model selection based on Kullback's symmetric divergence

Acion, Cristina Laura 01 December 2011
Model selection criteria frequently arise from constructing estimators of discrepancy measures used to assess the disparity between the data generating model and a fitted approximating model. The widely known Akaike information criterion (AIC) results from utilizing Kullback's directed divergence (KDD) as the targeted discrepancy. Under appropriate conditions, AIC serves as an asymptotically unbiased estimator of KDD. The directed divergence is an asymmetric measure of separation between two statistical models, meaning that an alternate directed divergence may be obtained by reversing the roles of the two models in the definition of the measure. The sum of the two directed divergences is Kullback's symmetric divergence (KSD). A comparison of the two directed divergences indicates an important distinction between the measures. When used to evaluate fitted approximating models that are improperly specified, the directed divergence which serves as the basis for AIC is more sensitive towards detecting overfitted models, whereas its counterpart is more sensitive towards detecting underfitted models. Since KSD combines the information in both measures, it functions as a gauge of model disparity which is arguably more balanced than either of its individual components. With this motivation, we propose three estimators of KSD for use as model selection criteria in the setting of generalized linear models: KICo, KICu, and QKIC. These statistics function as asymptotically unbiased estimators of KSD under different assumptions and frameworks. As with AIC, KICo and KICu are both justified for large-sample maximum likelihood settings; however, asymptotic unbiasedness holds under more general assumptions for KICo and KICu than for AIC. KICo serves as an asymptotically unbiased estimator of KSD in settings where the distribution of the response is misspecified. The asymptotic unbiasedness of KICu holds when the candidate model set includes underfitted models. 
QKIC is a modification of KICo. In the development of QKIC, the likelihood is replaced by the quasi-likelihood. QKIC can be used as a model selection tool when generalized estimating equations, a quasi-likelihood-based method, are used for parameter estimation. We examine the performance of KICo, KICu, and QKIC relative to other relevant criteria in simulation experiments. We also apply QKIC in a model selection problem for a randomized clinical trial investigating the effect of antidepressants on the temporal course of disability after stroke.
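To make the contrast concrete for Gaussian linear models: both families of criteria share the maximized log-likelihood term and differ in the penalty. The `3k` penalty below is a commonly cited large-sample form for KSD-based criteria and is only a stand-in for the refined penalties that define KICo, KICu, and QKIC in the thesis.

```python
import numpy as np

def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of an OLS fit."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(np.sum((y - X @ beta) ** 2)) / n
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def aic(y, X):
    """AIC: goodness-of-fit term plus 2k penalty."""
    k = X.shape[1] + 1          # regression coefficients + error variance
    return -2 * gaussian_loglik(y, X) + 2 * k

def kic(y, X):
    """KIC-style criterion: same fit term, heavier 3k penalty
    (illustrative form; see the thesis for the exact KICo/KICu penalties)."""
    k = X.shape[1] + 1
    return -2 * gaussian_loglik(y, X) + 3 * k
```

For a fixed model, the KIC-style penalty exceeds the AIC penalty by exactly `k`, which is the source of its stronger guard against overfitting.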
25

Best-subset model selection based on multitudinal assessments of likelihood improvements

Carter, Knute Derek 01 December 2013
Given a set of potential explanatory variables, one model selection approach is to select the best model, according to some criterion, from among the collection of models defined by all possible subsets of the explanatory variables. A popular procedure that has been used in this setting is to select the model that results in the smallest value of the Akaike information criterion (AIC). One drawback in using the AIC is that it can lead to the frequent selection of overspecified models. This can be problematic if the researcher wishes to assert, with some level of certainty, the necessity of any given variable that has been selected. This thesis develops a model selection procedure that allows the researcher to nominate, a priori, the probability at which overspecified models will be selected from among all possible subsets. The procedure seeks to determine if the inclusion of each candidate variable results in a sufficiently improved fitting term, and hence is referred to as the SIFT procedure. In order to determine whether there is sufficient evidence to retain a candidate variable or not, a set of threshold values are computed. Two procedures are proposed: a naive method based on a set of restrictive assumptions; and an empirical permutation-based method. Graphical tools have also been developed to be used in conjunction with the SIFT procedure. The graphical representation of the SIFT procedure clarifies the process being undertaken. Using these tools can also assist researchers in developing a deeper understanding of the data they are analyzing. The naive and empirical SIFT methods are investigated by way of simulation under a range of conditions within the standard linear model framework. The performance of the SIFT methodology is compared with model selection by minimum AIC; minimum Bayesian Information Criterion (BIC); and backward elimination based on p-values. 
The SIFT procedure is found to behave as designed: asymptotically it selects those variables that characterize the underlying data-generating mechanism, while limiting the selection of false or spurious variables to the desired level. The SIFT methodology offers researchers a promising new approach to model selection, whereby they are now able to control the probability of selecting an overspecified model to a level that best suits their needs.
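The flavour of an empirical, permutation-based retention rule can be sketched as follows: compare the fit improvement from the real candidate variable against improvements from signal-free permuted copies of it. This is a simplified stand-in; SIFT's threshold values are derived differently, as described in the thesis.

```python
import numpy as np

def sse(y, X):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def permutation_threshold(y, X_base, x_cand, n_perm=200, level=0.05, seed=0):
    """Retain a candidate variable only if its SSE improvement exceeds the
    (1 - level) quantile of improvements from permuted (signal-free)
    copies of the same variable."""
    rng = np.random.default_rng(seed)
    base = sse(y, X_base)
    real_gain = base - sse(y, np.column_stack([X_base, x_cand]))
    null_gains = [base - sse(y, np.column_stack([X_base,
                                                 rng.permutation(x_cand)]))
                  for _ in range(n_perm)]
    threshold = float(np.quantile(null_gains, 1 - level))
    return real_gain, threshold, real_gain > threshold
```

A genuinely predictive variable produces a fit improvement far above the permutation threshold, so it is retained; a pure-noise candidate is rejected at roughly the nominated rate.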
26

Model Selection for Solving Kinematics Problems

Goh, Choon P. 01 September 1990
There has been much interest in the area of model-based reasoning within the Artificial Intelligence community, particularly in its application to diagnosis and troubleshooting. The core issue in this thesis, simply put, is this: model-based reasoning is fine, but whence the model? Where do the models come from? How do we know we have the right models? What does the right model mean anyway? Our work has three major components. The first deals with how we determine whether a piece of information is relevant to solving a problem; we have three ways of determining relevance: derivational, situational, and an order-of-magnitude reasoning process. The second deals with defining and building models for solving problems: we identify these models, determine what we need to know about them, and, importantly, determine when they are appropriate. Currently, the system has a collection of four basic models and two hybrid models, which has been successfully tested on a set of fifteen simple kinematics problems. The third major component deals with how the models are selected.
27

Secondary Analysis of Case-Control Studies in Genomic Contexts

Wei, Jiawei August 2010
This dissertation consists of five independent projects. In each project, a novel statistical method was developed to address a practical problem encountered in genomic contexts. For example, we considered testing for constant nonparametric effects in a general semiparametric regression model in genetic epidemiology; analyzed the relationship between covariates in the secondary analysis of case-control data; performed model selection in joint modeling of paired functional data; and assessed the prediction ability of genes in gene expression data generated by the CodeLink System from GE. In the first project in Chapter II we considered the problem of testing for constant nonparametric effects in a general semiparametric regression model when there is the potential for interaction between the parametrically and nonparametrically modeled variables. We derived a generalized likelihood ratio test for this hypothesis, showed how to implement it, and gave evidence that it can improve statistical power when compared to standard partially linear models. The second project in Chapter III addressed the issue of score testing for the independence of X and Y in the secondary analysis of case-control data. Semiparametric efficient approaches can be used to construct semiparametric score tests, but they suffer from a lack of robustness to the assumed model for Y given X. We showed how to adjust the semiparametric score test to make its level/Type I error correct even if the assumed model for Y given X is incorrect, and thus the test is robust. The third project in Chapter IV took up the issue of estimation of a regression function when Y given X follows a homoscedastic regression model. We showed how to estimate the regression parameters in a rare disease case even if the assumed model for Y given X is incorrect, and thus the estimates are model-robust.
In the fourth project in Chapter V we developed novel AIC- and BIC-type methods for estimating the smoothing parameters in a joint model of paired, hierarchical sparse functional data, and showed in our numerical work that they are many times faster than 10-fold cross-validation while giving results that are remarkably close to the cross-validated estimates. In the fifth project in Chapter VI we introduced a practical permutation test that uses cross-validated genetic predictors to determine if the list of genes in question has “good” prediction ability. It avoids overfitting by using cross-validation to derive the genetic predictor and determines if the count of genes that give “good” prediction could have been obtained by chance. This test was then used to explore gene expression of colonic tissue and exfoliated colonocytes in the fecal stream to discover similarities between the two.
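The fifth project's idea, scoring a predictor by cross-validated accuracy and calibrating by permuting labels, can be sketched with a simple nearest-centroid classifier standing in for the genetic predictor; the classifier choice is an illustrative assumption, not the method used in the dissertation.

```python
import numpy as np

def cv_accuracy(X, y, n_folds=5):
    """Fold-held-out accuracy of a nearest-centroid classifier;
    cross-validation keeps the predictor from being scored on the
    samples used to build it."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    correct = 0
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        cents = {c: X[train][y[train] == c].mean(axis=0)
                 for c in np.unique(y[train])}
        for i in fold:
            pred = min(cents, key=lambda c: np.linalg.norm(X[i] - cents[c]))
            correct += pred == y[i]
    return correct / n

def permutation_pvalue(X, y, n_perm=200, seed=0):
    """P-value for 'these features predict the class labels': compare the
    observed cross-validated accuracy to accuracies under shuffled labels."""
    rng = np.random.default_rng(seed)
    obs = cv_accuracy(X, y)
    null = [cv_accuracy(X, rng.permutation(y)) for _ in range(n_perm)]
    return obs, (1 + sum(a >= obs for a in null)) / (1 + n_perm)
```

On well-separated two-class data the observed accuracy is near 1 while permuted labels hover near chance, so the permutation p-value is small.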
28

Predicting the migration of CO₂ plume in saline aquifers using probabilistic history matching approaches

Bhowmik, Sayantan 20 August 2012
During the operation of a geological carbon storage project, verifying that the CO₂ plume remains within the permitted zone is of particular interest both to regulators and to operators. However, the cost of many monitoring technologies, such as time-lapse seismic, limits their application. For adequate predictions of plume migration, proper representation of heterogeneous permeability fields is imperative. Previous work has shown that injection data (pressures, rates) from wells might provide a means of characterizing complex permeability fields in saline aquifers. Thus, given that injection data are readily available and inexpensive, they might provide an inexpensive alternative for monitoring; combined with a flow model like the one developed in this work, these data could even be used for predicting plume migration. These predictions of plume migration pathways can then be compared to field observations like time-lapse seismic or satellite measurements of surface-deformation, to ensure the containment of the injected CO₂ within the storage area. In this work, two novel methods for creating heterogeneous permeability fields constrained by injection data are demonstrated. The first method is an implementation of a probabilistic history matching algorithm to create models of the aquifer for predicting the movement of the CO₂ plume. The geologic property of interest, for example hydraulic conductivity, is updated conditioned to geological information and injection pressures. The resultant aquifer model which is geologically consistent can be used to reliably predict the movement of the CO₂ plume in the subsurface. The second method is a model selection algorithm that refines an initial suite of subsurface models representing the prior uncertainty to create a posterior set of subsurface models that reflect injection performance consistent with that observed. Such posterior models can be used to represent uncertainty in the future migration of the CO₂ plume. 
The applicability of both methods is demonstrated using a field data set from central Algeria.
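The second method's ensemble-refinement step can be caricatured in a few lines: rank candidate subsurface models by their misfit to the observed injection pressures and retain the best-fitting fraction as the posterior set. The misfit measure and keep-fraction below are illustrative choices, not those of the thesis.

```python
import numpy as np

def select_models(prior_preds, observed, noise_std, keep_frac=0.2):
    """Refine a prior ensemble: score each candidate model's predicted
    injection response against observations (normalized squared misfit)
    and keep the best fraction as the posterior set."""
    misfits = [float(np.mean(((p - observed) / noise_std) ** 2))
               for p in prior_preds]
    order = np.argsort(misfits)
    n_keep = max(1, int(keep_frac * len(prior_preds)))
    return order[:n_keep], np.asarray(misfits)
```

The retained posterior models, rather than a single "best" model, can then be run forward to represent the remaining uncertainty in plume migration.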
29

Mixed-effect modeling of codon usage

Feng, Shujuan 22 February 2011
Logistic mixed-effects models are used to determine whether optimal codons are associated with three specific properties of the expressed protein: solvent accessibility, aggregation propensity, and evolutionary conservation. Both the random components and the fixed structures of the models are decided by following certain selection procedures, and further models are developed by considering different factor combinations under the same procedure. The results show that evolutionary conservation is the most important factor in predicting optimal codon usage for most amino acids; aggregation propensity is also an important factor, and solvent accessibility is the least important for most amino acids. The results of this analysis are consistent with the previous literature, provide a more straightforward way to address the research question, and offer more insight into the underlying relationships.
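A minimal sketch of the fixed-effect part of such a model: logistic regression linking protein-level predictors to the odds that an optimal codon is used. The thesis's models additionally include random components, which this gradient-descent sketch omits, and any data used with it here would be simulated, not the codon data analysed in the thesis.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain fixed-effects logistic regression fit by gradient ascent on
    the log-likelihood (intercept included automatically)."""
    n, k = X.shape
    Xb = np.column_stack([np.ones(n), X])   # prepend intercept column
    w = np.zeros(k + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted probabilities
        w += lr * Xb.T @ (y - p) / n        # average log-likelihood gradient
    return w
```

Fitting to data simulated with a known positive effect for one predictor and a negative effect for another recovers coefficients of the correct sign and roughly the correct magnitude.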
30

Selection of Simplified Models and Parameter Estimation Using Limited Data

Wu, Shaohua 23 December 2009
Due to difficulties associated with formulating complex models and obtaining reliable estimates of unknown model parameters, modellers often use simplified models (SMs) that are structurally imperfect and that contain a smaller number of parameters. The objectives of this research are: 1) to develop practical and easy-to-use strategies to help modellers select the best SM from a set of candidate models, and 2) to assist modellers in deciding which parameters in complex models should be estimated, and which should be fixed at initial values. The aim is to select models and parameters so that the best possible predictions can be obtained using the available data and the modeller’s engineering and scientific knowledge. This research summarizes the extensive qualitative and quantitative results in the statistics literature regarding the use of SMs. Mean-squared error (MSE) is used to judge the quality of model predictions obtained from different candidate models, and a confidence-interval approach is developed to assess the uncertainties associated with whether a SM or the corresponding extended model will give better predictions. Nine commonly-applied model-selection criteria (MSC) are reviewed and analyzed for their propensities of preferring SMs. It is shown that there exist preferential orderings for many MSC that are independent of model structure and the particular data set. A new MSE-based MSC is developed using univariate linear statistical models. The effectiveness of this criterion for selecting dynamic nonlinear multivariate models is demonstrated both theoretically and empirically. The proposed criterion is then applied for determining the optimal number of parameters to estimate in complex models, based on ranked parameter lists obtained from estimability analysis. This approach makes use of the modeller’s prior knowledge about precision of initial parameter values and is less computationally expensive than comparable methods in the literature. 
/ Thesis (Ph.D., Chemical Engineering), Queen's University, 2009.
