Modelos mistos aditivos semiparamétricos de contornos elípticos / Elliptical contoured semiparametric additive mixed models.

Germán Mauricio Ibacache Pulgar 14 August 2009 (has links)
In this work we extend the semiparametric mixed models proposed by Zhang et al. (1998) to a more general class of models, known as semiparametric additive mixed models with elliptical errors, in order to allow error distributions with heavier or lighter tails than the normal one. Penalized likelihood equations are applied to derive the maximum likelihood estimates, which appear to be robust against outlying observations in the sense of the Mahalanobis distance. In order to study the sensitivity of the penalized estimates under some usual perturbation schemes in the model or data, the local influence curvatures are derived and some diagnostic graphics are proposed. Motivating examples, previously analyzed under normal errors, are reanalyzed under appropriate elliptical errors, and the local influence approach is used to compare the sensitivity of the model estimates.
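As a concrete illustration of the robustness mentioned above — heavy-tailed elliptical errors downweight observations with large Mahalanobis distances — here is a minimal Python sketch of the classical IRLS estimator under Student-t errors. It is not the thesis's penalized semiparametric estimator; the weight formula is the standard EM weight for t errors, and all names and values are illustrative.

```python
import numpy as np

def t_irls(X, y, nu=4.0, n_iter=50, tol=1e-8):
    """Iteratively reweighted least squares for linear regression with
    Student-t errors: observations with large squared scaled residuals
    (Mahalanobis-type distances) receive weights that shrink toward zero."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS start
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        d = (y - X @ beta) ** 2 / sigma2             # squared scaled residuals
        w = (nu + 1.0) / (nu + d)                    # EM weights for t errors
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        sigma2 = np.sum(w * (y - X @ beta_new) ** 2) / n
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, sigma2, w

# Outliers receive small weights w, so they barely influence the fit.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=100)
beta, sigma2, w = t_irls(X, y)
print(beta, w.min())
```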

Análise de diagnóstico em modelos semiparamétricos normais / Diagnostic analysis in semiparametric normal models

Gleyce Rocha Noda 18 April 2013 (has links)
In this master's dissertation we present diagnostic methods for semiparametric models under normal errors, especially semiparametric models with one nonparametric explanatory variable, also known as partial linear models. We use cubic splines for the nonparametric fit, and penalized likelihood functions are applied to obtain the maximum likelihood estimators with their respective approximate standard errors. The properties of the hat matrix are also derived for this kind of model, with the aim of using it as a tool for diagnostic analysis. Normal probability plots with simulated envelopes were also adapted to assess model suitability. Finally, two illustrative examples are presented in which the fits are compared with the usual normal linear models, in the context of both the simple normal additive model and the partial linear model.
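The hat matrix mentioned in this abstract has a closed form for penalized spline fits, H = B(BᵀB + λΩ)⁻¹Bᵀ, whose diagonal gives the leverages used in diagnostic analysis. A minimal Python sketch, assuming a B-spline basis with a second-order difference penalty (a common stand-in for the cubic smoothing-spline penalty; not the dissertation's exact setup):

```python
import numpy as np
from scipy.interpolate import BSpline

def penalized_spline_hat(x, interior_knots, lam, degree=3):
    """Hat (smoother) matrix H = B (B'B + lam*Omega)^{-1} B' for a
    penalized B-spline fit; second differences of the coefficients
    approximate the integrated squared second derivative penalty."""
    # clamped knot vector covering the data range
    t = np.concatenate([[interior_knots[0]] * degree, interior_knots,
                        [interior_knots[-1]] * degree])
    B = BSpline.design_matrix(x, t, degree).toarray()
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)     # second-difference matrix
    Omega = D.T @ D                                   # penalty matrix
    H = B @ np.linalg.solve(B.T @ B + lam * Omega, B.T)
    return H

x = np.linspace(0, 1, 60)
H = penalized_spline_hat(x, interior_knots=np.linspace(0, 1, 12), lam=1.0)
leverages = np.diag(H)               # inputs to leverage-based diagnostics
edf = np.trace(H)                    # effective degrees of freedom
print(edf, leverages.max())
```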

Splines multidimensionnelles pénalisées pour modéliser le taux de survenue d’un événement : application au taux de mortalité en excès et à la survie nette en épidémiologie des maladies chroniques / Multidimensional penalized splines for hazard modelling : application to excess mortality hazard and net survival in chronic disease epidemiology

Fauvernier, Mathieu 24 September 2019 (has links)
Time-to-event analysis is a very important field in statistics. When the event under study is death, the analysis focuses on the probability of survival of the subjects as well as on their mortality hazard, that is, on the "force of mortality" that applies at any given moment. Patients with a chronic disease usually have an excess mortality compared to a population that does not have the disease. Studying the excess mortality hazard associated with a disease and investigating the impact of prognostic factors on this hazard are important public health issues in epidemiology.

From a statistical point of view, modelling the (excess) mortality hazard involves taking into account potentially non-linear and time-dependent effects of prognostic factors as well as their interactions. Regression splines (i.e., parametric and flexible piecewise polynomials) are ideal for dealing with such complexity. They make it easy to build non-linear effects and, regarding interactions between continuous variables, to form a multidimensional spline from two or more marginal one-dimensional splines. However, the flexibility of regression splines presents a risk of overfitting. To avoid this risk, penalized regression splines have been proposed as part of generalized additive models. Their principle is to associate each spline with one or more penalty terms controlled by smoothing parameters, which represent the desired degrees of penalization. In practice, these parameters are unknown and have to be estimated just like the regression parameters. This thesis describes the development of a method to model the (excess) hazard using multidimensional penalized regression splines. Restricted cubic splines were used as one-dimensional splines or as marginal bases to form multidimensional splines by tensor products. The optimization process relies on two nested Newton-Raphson algorithms: the smoothing parameters are estimated by optimizing a cross-validation criterion or the marginal likelihood of the smoothing parameters with an outer Newton-Raphson algorithm, and, at fixed smoothing parameters, the regression parameters are estimated by maximizing the penalized likelihood with an inner Newton-Raphson algorithm. The good properties of this approach in terms of statistical performance and numerical stability were then demonstrated through simulation, and the method was implemented in the R package survPen. Finally, the method was applied to real data to investigate two epidemiological issues: the impact of social deprivation on the excess mortality of cervical cancer patients, and the impact of current age on the excess mortality of multiple sclerosis patients.
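The nested estimation scheme described above — an outer update for the smoothing parameters, an inner update for the regression parameters — can be illustrated on a Gaussian toy model. The sketch below uses a grid search over a GCV criterion in place of the outer Newton-Raphson and a one-step penalized least-squares solve as the inner step; it is a simplified Python illustration, not the R package survPen.

```python
import numpy as np

def fit_fixed_lambda(B, y, Omega, lam):
    """Inner step: penalized least squares at a fixed smoothing parameter
    (for a Gaussian response the inner Newton-Raphson converges in one step)."""
    A = B.T @ B + lam * Omega
    beta = np.linalg.solve(A, B.T @ y)
    H = B @ np.linalg.solve(A, B.T)          # smoother ("hat") matrix
    return beta, H

def gcv_select(B, y, Omega, lambdas):
    """Outer step: pick the smoothing parameter minimizing the GCV score
    (a simple grid search stands in for the outer Newton-Raphson)."""
    n = len(y)
    best = None
    for lam in lambdas:
        beta, H = fit_fixed_lambda(B, y, Omega, lam)
        edf = np.trace(H)                    # effective degrees of freedom
        rss = np.sum((y - B @ beta) ** 2)
        gcv = n * rss / (n - edf) ** 2
        if best is None or gcv < best[0]:
            best = (gcv, lam, beta)
    return best

# Truncated-power cubic spline basis with an unpenalized linear part
x = np.linspace(0, 1, 80)
knots = np.linspace(0.1, 0.9, 8)
B = np.column_stack([np.ones_like(x), x] +
                    [np.clip(x - k, 0, None) ** 3 for k in knots])
Omega = np.eye(B.shape[1]); Omega[0, 0] = Omega[1, 1] = 0.0
rng = np.random.default_rng(1)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)
gcv, lam, beta = gcv_select(B, y, Omega, np.logspace(-8, 2, 40))
print(lam, gcv)
```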

Dating Divergence Times in Phylogenies

Anderson, Cajsa Lisa January 2007 (has links)
This thesis concerns different aspects of dating divergence times in phylogenetic trees, using molecular data and multiple fossil age constraints.

Datings of phylogenetically basal eudicots, monocots and modern birds (Neoaves) are presented. Large phylograms and multiple fossil constraints were used in all these studies. Eudicots and monocots are suggested to be part of a rapid divergence of angiosperms in the Early Cretaceous, with most families present at the Cretaceous/Tertiary boundary. Stem lineages of Neoaves were present in the Late Cretaceous, but the main divergence of extant families took place around the Cretaceous/Tertiary boundary.

A novel method and computer software for dating large phylogenetic trees, PATHd8, is presented. PATHd8 is a nonparametric smoothing method that smoothes one pair of sister groups at a time, by taking the mean of the added branch lengths from a terminal taxon to a node. Because of the local smoothing, the algorithm is simple, hence providing stable and very fast analyses, allowing for thousands of taxa and an arbitrary number of age constraints.

The importance of fossil constraints and their placement are discussed, and concluded to be the most important factor for obtaining reasonable age estimates.

Different dating methods are compared, and it is concluded that differences in age estimates are obtained from penalized likelihood, PATHd8, and the Bayesian autocorrelation method implemented in the multidivtime program. In the Bayesian method, prior assumptions about evolutionary rate at the root, rate variance and the level of rate smoothing between internal edges, are suggested to influence the results.
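PATHd8's central computation — dating a node from the mean of the branch-length paths down to its descendant tips, scaled against a fossil calibration — can be sketched in a few lines. This is a simplified mean-path-length calculation for illustration, not the actual PATHd8 program (which smooths sister pairs locally and handles arbitrary numbers of constraints).

```python
def mean_path_length(node):
    """Return (mpl, n_paths): the mean branch-length distance from this
    node down to its descendant tips, and the number of tip paths averaged.
    A node is a (label, children) pair with children = [(child, brlen), ...]."""
    label, children = node
    if not children:
        return 0.0, 1
    total, count = 0.0, 0
    for child, brlen in children:
        mpl, n = mean_path_length(child)
        total += (mpl + brlen) * n
        count += n
    return total / count, count

# Toy tree: ((A:1.0, B:3.0):2.0, C:4.0)
tree = ("root", [
    (("AB", [(("A", []), 1.0), (("B", []), 3.0)]), 2.0),
    (("C", []), 4.0),
])
mpl_root, _ = mean_path_length(tree)
mpl_ab, _ = mean_path_length(tree[1][0][0])
# If a fossil fixes the root at 100 Myr, node ages scale proportionally.
print(mpl_root, 100 * mpl_ab / mpl_root)
```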

Theoretical Results and Applications Related to Dimension Reduction

Chen, Jie 01 November 2007 (has links)
To overcome the curse of dimensionality, dimension reduction is important and necessary for understanding the underlying phenomena in a variety of fields. Dimension reduction is the transformation of high-dimensional data into a meaningful representation in a low-dimensional space, and it can be further classified into feature selection and feature extraction. This thesis is composed of four projects: the first two focus on feature selection, and the last two concentrate on feature extraction. The first project presents several efficient methods for the sparse representation of a multiple measurement vector (MMV); some theoretical properties of the algorithms are also discussed. The second project establishes the NP-hardness of computing penalized likelihood estimators, including penalized least squares estimators, penalized least absolute deviation regression, and penalized support vector machines. The third project focuses on the application of manifold learning to the analysis and prediction of 24-hour electricity price curves. The last project proposes a new Hessian-regularized nonlinear model for time-series prediction.

High-dimensional classification and attribute-based forecasting

Lo, Shin-Lian 27 August 2010 (has links)
This thesis consists of two parts. The first part focuses on high-dimensional classification problems in microarray experiments; the second deals with forecasting problems involving a large number of categories in the predictors. Classification problems in microarray experiments refer to discriminating subjects with different biological phenotypes or known tumor subtypes, as well as to predicting clinical outcomes or prognostic stages. One important characteristic of microarray data is that the number of genes is much larger than the sample size. The penalized logistic regression method is known for performing variable selection and classification simultaneously; however, its performance declines as the number of variables increases. With this concern, in the first study we propose a new classification approach that applies penalized logistic regression iteratively with a controlled size of gene subsets, in order to maintain variable selection consistency and classification accuracy. The second study is motivated by a modern microarray experiment that includes two layers of replicates. This new experimental setting renders most existing classification methods, including penalized logistic regression, inappropriate for direct application because the assumption of independent observations is violated. To solve this problem, we propose a new classification method that incorporates random effects into penalized logistic regression, so that the heterogeneity among experimental subjects and the correlations from repeated measurements can be taken into account. An efficient hybrid algorithm is introduced to tackle the computational challenges in estimation and integration. Applications to a breast cancer study show that the proposed classification method obtains smaller models with higher prediction accuracy than the method based on the assumption of independent observations. The second part of this thesis develops a new forecasting approach for large-scale datasets associated with a large number of predictor categories and with structured predictors. The new approach goes beyond conventional tree-based methods by incorporating a general linear model and hierarchical splits to make trees more comprehensive, efficient, and interpretable. Through an empirical study in the air cargo industry and a simulation study covering several different settings, the new approach produces higher forecasting accuracy and higher computational efficiency than existing tree-based methods.
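For readers unfamiliar with the baseline method: penalized (L1) logistic regression performs variable selection and classification simultaneously by shrinking most coefficients exactly to zero. A minimal sketch on synthetic p ≫ n data using scikit-learn — this illustrates only the standard method, not the iterative subset scheme or the random-effects extension proposed in the thesis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic microarray-like data: 100 subjects, 2000 genes,
# only 10 of which carry signal.
rng = np.random.default_rng(0)
n, p, k = 100, 2000, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:k] = 2.0
y = (X @ beta + rng.normal(size=n) > 0).astype(int)

# The L1 penalty selects variables and classifies at once;
# C is the inverse penalty strength.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
selected = np.flatnonzero(clf.coef_[0])   # genes with nonzero coefficients
print(len(selected), cross_val_score(clf, X, y, cv=5).mean())
```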

Variable Selection and Function Estimation Using Penalized Methods

Xu, Ganggang December 2011 (has links)
Penalized methods are becoming more and more popular in statistical research. This dissertation covers two major applications of penalized methods: variable selection and nonparametric function estimation. Infinite-variance autoregressive models are important for modeling heavy-tailed time series. We use a penalty method to conduct model selection for autoregressive models with innovations in the domain of attraction of a stable law with index α ∈ (0, 2). We show that by combining the least absolute deviation loss function and the adaptive lasso penalty, we can consistently identify the true model; at the same time, the resulting coefficient estimator converges at a rate of n^(−1/α). The proposed approach gives a unified variable selection procedure for both finite- and infinite-variance autoregressive models. While automatic smoothing parameter selection for nonparametric function estimation has been extensively researched for independent data, it is much less so for clustered and longitudinal data. Although leave-subject-out cross-validation (CV) has been widely used, its theoretical properties are unknown and its minimization is computationally expensive, especially when there are multiple smoothing parameters. By focusing on penalized modeling methods, we show that leave-subject-out CV is optimal in that its minimization is asymptotically equivalent to the minimization of the true loss function. We develop an efficient Newton-type algorithm to compute the smoothing parameters that minimize the CV criterion. Furthermore, we derive a simplification of the leave-subject-out CV that leads to a more efficient algorithm for selecting the smoothing parameters; we show that the simplified CV criterion is asymptotically equivalent to the unsimplified one and thus enjoys the same optimality property. This CV criterion also provides a completely data-driven approach to selecting the working covariance structure when using generalized estimating equations in longitudinal data analysis. Our results are applicable to additive models, linear varying-coefficient models, and nonlinear models with data from exponential families.
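The LAD loss combined with an adaptive lasso penalty can be sketched with scikit-learn's median regression, which minimizes absolute deviations with an L1 penalty; scaling each lag column by the magnitude of an initial estimate converts the plain L1 penalty into the adaptive one. A rough Python sketch under these assumptions — illustrative only, not the dissertation's exact estimator or its theory:

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

def adaptive_lad_lasso_ar(x, max_lag=5, lam=0.1):
    """Sketch of LAD + adaptive-lasso order selection for an AR model.
    Median regression (quantile 0.5) gives the LAD loss; rescaling lag
    columns by |initial coefficient| makes the L1 penalty adaptive."""
    n = len(x)
    # lagged design matrix: column j holds x_{t-j-1}
    X = np.column_stack([x[max_lag - j - 1:n - j - 1] for j in range(max_lag)])
    y = x[max_lag:]
    # unpenalized LAD fit supplies the adaptive weights
    init = QuantileRegressor(quantile=0.5, alpha=0.0, solver="highs").fit(X, y)
    w = np.abs(init.coef_) + 1e-8
    # adaptive LAD-lasso via column scaling
    fit = QuantileRegressor(quantile=0.5, alpha=lam, solver="highs").fit(X * w, y)
    return fit.coef_ * w

rng = np.random.default_rng(2)
x = np.zeros(500)
for t in range(2, 500):                      # true model: AR(2), heavy tails
    x[t] = 0.6 * x[t-1] - 0.3 * x[t-2] + rng.standard_t(df=1.5)
coefs = adaptive_lad_lasso_ar(x, max_lag=5)
print(np.round(coefs, 3))                    # lags 3..5 should shrink to ~0
```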

Sélection de modèles statistiques par méthodes de vraisemblance pénalisée pour l'étude de données complexes / Statistical Model Selection by penalized likelihood method for the study of complex data

Ollier, Edouard 12 December 2017 (has links)
This thesis is mainly devoted to the development of penalized maximum likelihood methods for the study of complex data. The first work deals with the selection of generalized linear models in the framework of stratified data, characterized by the measurement of observations as well as covariates within different groups (or strata). The purpose of the analysis is then to determine which covariates influence the observations globally (whatever the stratum), but also to evaluate the heterogeneity of this effect across the strata. Secondly, we are interested in the selection of the nonlinear mixed-effects models used in the analysis of longitudinal data, such as those encountered in population pharmacokinetics. In a first work, we describe an SAEM-type algorithm in which the penalty is taken into account in the M step by solving a penalized regression problem at each iteration. In a second work, inspired by proximal gradient algorithms, we simplify the M step of the penalized SAEM algorithm previously described by performing only one proximal gradient step at each iteration. This algorithm, called the Stochastic Approximation Proximal Gradient algorithm (SAPG), corresponds to a proximal gradient algorithm in which the gradient of the likelihood is approximated by a stochastic approximation technique. Finally, we present two statistical modelling studies carried out during this thesis.
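The building block of SAPG — one proximal gradient step per iteration, with the likelihood gradient replaced by a noisy estimate — can be sketched on a toy lasso-penalized least-squares problem. In the actual algorithm the stochastic gradient comes from stochastic approximation inside an SAEM scheme for nonlinear mixed-effects models; here a mini-batch gradient plays that role, and all names and step-size choices are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the lasso penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def stochastic_proximal_gradient(grad_estimate, theta0, lam, step=0.1, n_iter=2000):
    """Each iteration: one gradient step using a noisy gradient estimate,
    then one proximal step on the penalty (cf. SAPG)."""
    theta = theta0.copy()
    for k in range(1, n_iter + 1):
        g = grad_estimate(theta)
        gamma = step / k ** 0.6          # decreasing step sizes
        theta = soft_threshold(theta - gamma * g, gamma * lam)
    return theta

# Toy example: penalized least squares with a mini-batch gradient.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))
beta = np.zeros(10); beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.normal(size=200)

def grad_estimate(theta, batch=32):
    idx = rng.integers(0, 200, batch)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ theta - yb) / batch

print(np.round(stochastic_proximal_gradient(grad_estimate, np.zeros(10), lam=0.1), 2))
```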

Velká data - extrakce klíčových informací pomocí metod matematické statistiky a strojového učení / Big data - extraction of key information combining methods of mathematical statistics and machine learning

Masák, Tomáš January 2017 (has links)
This thesis is concerned with data analysis, especially with principal component analysis and its sparse modification (SPCA), which is NP-hard to solve. The SPCA problem can be recast into the regression framework, in which sparsity is usually induced with the ℓ1-penalty. In the thesis, we propose to use an iteratively reweighted ℓ2-penalty instead of the aforementioned ℓ1 approach. We compare the resulting algorithm with several well-known approaches to SPCA using both a simulation study and an interesting practical example in which we analyze voting records of the Parliament of the Czech Republic. We show experimentally that the proposed algorithm outperforms the other considered algorithms. We also prove convergence of both the proposed algorithm and the original regression-based approach to PCA.
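The iteratively reweighted ℓ2 idea — making a quadratic penalty mimic a sparsity-inducing one by deriving each weight from the previous estimate — can be sketched on the regression formulation of the first principal component. This is a minimal illustration of the reweighting principle, not the thesis's exact algorithm; all names and constants are illustrative.

```python
import numpy as np

def sparse_pc1_irl2(X, lam=1.0, eps=1e-6, n_iter=100):
    """First sparse loading via the regression view of PCA with an
    iteratively reweighted ridge penalty: weights w_j = 1/(b_j^2 + eps)
    make the quadratic penalty behave like a sparsity-inducing one."""
    # target: scores of the ordinary first principal component
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    z = X @ Vt[0]
    b = Vt[0].copy()
    for _ in range(n_iter):
        w = 1.0 / (b ** 2 + eps)                 # reweighting step
        A = X.T @ X + lam * np.diag(w)
        b_new = np.linalg.solve(A, X.T @ z)      # weighted ridge step
        if np.allclose(b, b_new, atol=1e-10):
            b = b_new
            break
        b = b_new
    b[np.abs(b) < 1e-6] = 0.0                    # prune numerically dead entries
    return b / np.linalg.norm(b)

# Toy data with a sparse leading direction
rng = np.random.default_rng(4)
v = np.zeros(20); v[:4] = 0.5
X = rng.normal(size=(300, 1)) @ v[None, :] * 3 + rng.normal(size=(300, 20)) * 0.5
print(np.round(sparse_pc1_irl2(X, lam=5.0), 2))
```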
