Global ETD Search

1	Estimation and Goodness of Fit for Multivariate Survival Models Based on Copulas
Yilmaz, Yildiz Elif. 11 August 2009.
We provide ways to test the fit of a parametric copula family for bivariate censored data with or without covariates. The proposed copula family is tested by embedding it in an expanded parametric family of copulas. When parameters in the proposed and the expanded copula models are estimated by maximum likelihood, a likelihood ratio test can be used. However, when they are estimated by two-stage pseudolikelihood estimation, the corresponding test is a pseudolikelihood ratio test. The two-stage procedures require less computation, which is especially attractive when the marginal lifetime distributions are specified nonparametrically or semiparametrically. It is shown that the likelihood ratio test is consistent even when the expanded model is misspecified. Power comparisons of the likelihood ratio and the pseudolikelihood ratio tests with some other goodness-of-fit tests are performed both when the expanded family is correct and when it is misspecified. They indicate that model expansion provides a convenient, powerful and robust approach.
We introduce a semiparametric maximum likelihood estimation method in which the copula parameter is estimated without assumptions on the marginal distributions. This method and the two-stage semiparametric estimation method suggested by Shih and Louis (1995) are generalized to regression models with Cox proportional hazards margins. The two-stage semiparametric estimator of the copula parameter is found to be about as good as the semiparametric maximum likelihood estimator. Semiparametric likelihood ratio and pseudolikelihood ratio tests are considered to provide goodness-of-fit tests for a copula model without making parametric assumptions for the marginal distributions. Both when the expanded family is correct and when it is misspecified, the semiparametric pseudolikelihood ratio test is almost as powerful as the parametric likelihood ratio and pseudolikelihood ratio tests while achieving robustness to the form of the marginal distributions. The methods are illustrated on applications in medicine and insurance.
Sequentially observed survival times are of interest in many studies, but there are difficulties in modeling and analyzing such data. First, when the duration of follow-up is limited and the times for a given individual are not independent, the problem of induced dependent censoring arises for the second and subsequent survival times. Non-identifiability of the marginal survival distributions for second and later times is another issue, since they are observable only if preceding survival times for an individual are uncensored. In addition, in some studies, a significant proportion of individuals may never have the first event. Fully parametric models can deal with these features, but lack of robustness is a concern, and methods of assessing fit are lacking. We introduce an approach to address these issues. We model the joint distribution of the successive survival times by using copula functions, and provide semiparametric estimation procedures in which copula parameters are estimated without parametric assumptions on the marginal distributions. The performance of semiparametric estimation methods is compared with some other estimation methods in simulation studies and shown to be good. The methodology is applied to a motivating example involving relapse and survival following colon cancer treatment.
copula, semiparametric estimation, likelihood ratio test, pseudolikelihood ratio test, multivariate survival data, Statistics
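As a rough illustration of the two-stage idea described in the abstract above, the following Python sketch first replaces the margins by rank-based pseudo-observations and then maximizes a copula log-likelihood over the dependence parameter. It is a simplified sketch under stated assumptions, not the procedure of the thesis: it assumes a Clayton copula, complete (uncensored) data and no covariates, and all function names (clayton_sample, pseudo_obs, clayton_loglik, golden_max, two_stage_clayton) are hypothetical.

```python
import math
import random


def clayton_sample(n, theta, rng):
    """Draw n pairs (u, v) from a Clayton copula by conditional inversion."""
    eps = 1e-12  # keep the uniform draws strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        u = rng.uniform(eps, 1.0 - eps)
        w = rng.uniform(eps, 1.0 - eps)
        # Solve C(v | u) = w for v.
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def pseudo_obs(values):
    """Stage 1: nonparametric margins via scaled ranks r / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = rank / (n + 1.0)
    return scaled


def clayton_loglik(theta, us, vs):
    """Log of the Clayton copula density summed over pseudo-observations."""
    total = 0.0
    for u, v in zip(us, vs):
        s = u ** (-theta) + v ** (-theta) - 1.0
        total += (math.log1p(theta)
                  - (theta + 1.0) * (math.log(u) + math.log(v))
                  - (2.0 + 1.0 / theta) * math.log(s))
    return total


def golden_max(f, lo, hi, tol=1e-4):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def two_stage_clayton(xs, ys, lo=0.05, hi=20.0):
    """Stage 2: maximize the pseudolikelihood in theta given rank-based margins."""
    us, vs = pseudo_obs(xs), pseudo_obs(ys)
    return golden_max(lambda t: clayton_loglik(t, us, vs), lo, hi)


if __name__ == "__main__":
    rng = random.Random(1)
    pairs = clayton_sample(2000, 2.0, rng)
    # Exponential margins with different rates; the ranks do not depend on them.
    xs = [-math.log(1.0 - u) for u, _ in pairs]
    ys = [-math.log(1.0 - v) / 3.0 for _, v in pairs]
    print(round(two_stage_clayton(xs, ys), 2))
```

Because the first stage uses only ranks, the estimate is unchanged by any increasing transformation of either margin, which is the sense in which the copula parameter is estimated without assumptions on the marginal distributions. Handling censoring, as the thesis does, would require replacing the ranks with a survival-function estimator and adjusting the likelihood contributions.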
3	2x2列聯表模型下MLE與MPLE之比較 / The comparison between MLE and MPLE under two-by two contingency table models
郭名斬. Unknown date.
Arnold and Strauss (1991) study the case in which three of the four cells of a 2x2 contingency table share the same cell probability θ. They compare the maximum likelihood estimate (MLE) and the maximum pseudolikelihood estimate (MPLE) of θ and find that the two are not the same. In this thesis, we relax the assumption of equal probabilities so that the three cell probabilities need only be proportional to θ, and we prove that, in general, the MLE and the MPLE of θ still differ. We also give special conditions under which the MLE and the MPLE coincide, such as when the observed counts in the three cells are proportional to the cell probabilities or when the cell counts take certain specific values. Computer simulations show that the MLE is more accurate than the MPLE and that the difference between the two is largest when θ is near the midpoint of its parameter space.
contingency table, maximum likelihood estimate, maximum pseudolikelihood estimate
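To make the MLE/MPLE comparison in the abstract above concrete, the following Python sketch computes both estimates for one 2x2 table. It is only an illustration under assumptions that are not taken from the thesis: the cell probabilities are taken to be p00 = p01 = p10 = θ and p11 = 1 - 3θ with 0 < θ < 1/3, the pseudolikelihood is taken to be the product of the two conditional probabilities P(X | Y) and P(Y | X) over all observations, and the function names and the example counts are hypothetical.

```python
import math


def mle_theta(n00, n01, n10, n11):
    """Closed-form MLE of theta for the multinomial with cells (θ, θ, θ, 1-3θ)."""
    n = n00 + n01 + n10 + n11
    return (n00 + n01 + n10) / (3.0 * n)


def log_pseudolik(theta, n00, n01, n10, n11):
    """Log pseudolikelihood: sum over observations of log P(x|y) + log P(y|x)."""
    half = 0.5                                        # P(X=x | Y=0) = P(Y=y | X=0)
    p0_given_1 = theta / (1.0 - 2.0 * theta)          # P(X=0 | Y=1) = P(Y=0 | X=1)
    p1_given_1 = (1.0 - 3.0 * theta) / (1.0 - 2.0 * theta)  # P(X=1 | Y=1)
    return (n00 * 2.0 * math.log(half)
            + (n01 + n10) * (math.log(half) + math.log(p0_given_1))
            + n11 * 2.0 * math.log(p1_given_1))


def mple_theta(n00, n01, n10, n11, steps=200000):
    """MPLE of theta by a dense grid search over the open interval (0, 1/3)."""
    best_theta, best_value = None, -math.inf
    for k in range(1, steps):
        theta = k / (3.0 * steps)
        value = log_pseudolik(theta, n00, n01, n10, n11)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta


if __name__ == "__main__":
    counts = (30, 20, 25, 25)  # hypothetical counts (n00, n01, n10, n11)
    print("MLE :", mle_theta(*counts))
    print("MPLE:", round(mple_theta(*counts), 4))
```

Under this assumed parametrization the count n00 contributes only a constant to the pseudolikelihood, so the MPLE ignores it while the MLE does not, which is one way to see why the two estimates generally disagree.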
4	Statistical modeling of protein sequences beyond structural prediction : high dimensional inference with correlated data / Modélisation statistique des séquences de protéines au-delà de la prédiction structurelle : inférence en haute dimension avec des données corrélées
Coucke, Alice. 10 October 2016.
Over the last decades, genomic databases have grown exponentially in size thanks to the constant progress of modern DNA sequencing. A large variety of statistical tools have been developed, at the interface between bioinformatics, machine learning, and statistical physics, to extract information from these ever-increasing datasets. In the specific context of protein sequence data, several approaches have recently been introduced by statistical physicists, such as direct-coupling analysis, a global statistical inference method based on the maximum-entropy principle, which has proven to be extremely effective in predicting the three-dimensional structure of proteins from purely statistical considerations.
In this dissertation, we review the relevant inference methods and, encouraged by their success, discuss their extension to other challenging fields, such as sequence folding prediction and homology detection. Contrary to residue-residue contact prediction, which relies on intrinsically topological information about the network of interactions, these fields require global energetic considerations and therefore a more quantitative and detailed model. Through an extensive study on both artificial and biological data, we provide a better interpretation of the central inferred parameters, up to now poorly understood, especially in the limited sampling regime. Finally, we present a new and more precise procedure for the inference of generative models, which leads to further improvements on real, finitely sampled data.
Inference, Statistical learning, Regularization, Maximum entropy, Protein coevolution, Maximum likelihood, Mean field, Pseudolikelihood, Cluster expansion, 530.13
5	Inferência em modelos de regressão com erros de medição sob enfoque estrutural para observações replicadas / Inference in regression models with measurement errors under a structural approach for replicated observations
Tomaya, Lorena Yanet Cáceres. 10 March 2014.
Financiadora de Estudos e Projetos
Regression analysis is one of the usual procedures for studying the relationship between variables. The usual regression model fits data under the assumption that the explanatory variables are measured without error. However, in many situations the explanatory variables are observed with measurement errors. In these cases, measurement error models are recommended. We study a structural measurement error model for replicated observations. The parameters of the proposed models are estimated by the maximum likelihood and maximum pseudolikelihood methods. The behavior of the estimators of some parameters is assessed in a simulation study with different numbers of replicates. Moreover, we propose the likelihood ratio test, Wald test, score test, gradient test, Neyman's C test and pseudolikelihood ratio test in order to test hypotheses of interest related to the parameters. The proposed test statistics are assessed through a simulation study. Finally, the model is fitted to a real data set comprising measurements of concentrations of chemical elements in samples of Egyptian pottery. The computational implementation was developed in the R language.
Inference (Statistics), Measurement error models, Heteroscedastic errors, Maximum pseudolikelihood, Structural model, Covariance matrix, Maximum likelihood
