91

Seleção de modelos lineares mistos utilizando critérios de informação / Mixed linear model selection using information criterion

Tatiana Kazue Yamanouchi 18 August 2017 (has links)
The mixed model is commonly used for repeated-measures data because of its flexibility to incorporate into the model the correlation between observations measured on the same individual and the heterogeneity of variances of observations made over time. The model comprises fixed effects, random effects, and a random error, so selecting a mixed model often requires choosing the best components so that the model represents the data well. Information criteria are widely used tools for model selection, but few studies indicate how they perform in selecting the fixed effects, the random effects, and the covariance structure that makes up the random error. This work therefore presents a simulation study evaluating the performance of the AIC, BIC, and KIC information criteria in selecting the components of the mixed model, measured by the true positive (TP) rate. In general, the information criteria performed well, that is, they achieved high TP rates when the sample size was larger. In selecting the fixed effects and the covariance structure, BIC outperformed AIC and KIC in almost all situations. In selecting the random effects, no criterion performed well, except under the compound symmetry covariance structure, where BIC performed best.
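The comparison described above can be sketched in a few lines. The following is a minimal illustration, not the author's simulation code: it selects between an intercept-only model and the true fixed-effect structure with independent errors (rather than a full mixed model), using the standard AIC and BIC penalties and, as an assumption, a KIC-style penalty of 3 per parameter, and reports the TP rate over replications.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_ll(X, y):
    """Profile Gaussian log-likelihood of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / len(y)
    return -0.5 * len(y) * (np.log(2 * np.pi * s2) + 1)

n, reps = 100, 200
hits = {"AIC": 0, "BIC": 0, "KIC": 0}
for _ in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + 0.5 * x + rng.normal(size=n)       # the true model includes x
    X_null = np.ones((n, 1))                     # candidate 1: intercept only
    X_true = np.column_stack([np.ones(n), x])    # candidate 2: the true model
    for name, pen in [("AIC", 2.0), ("BIC", np.log(n)), ("KIC", 3.0)]:
        scores = [-2 * fit_ll(X_null, y) + pen * 2,   # k = intercept + sigma^2
                  -2 * fit_ll(X_true, y) + pen * 3]   # k = 2 coefs + sigma^2
        if np.argmin(scores) == 1:               # the true model was selected
            hits[name] += 1

tp_rate = {name: h / reps for name, h in hits.items()}
print(tp_rate)
```

With this sample size and effect size, all three criteria should recover the true fixed-effect structure in nearly every replication, which mirrors the abstract's finding that TP rates are high when the sample is large.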
92

Seleção de modelos cópula-GARCH: uma abordagem bayesiana / Copula-GARCH model selection: a Bayesian approach

João Luiz Rossi 04 June 2012 (has links)
The aim of this work was to study models for bivariate time series in which the dependence structure is determined by copula functions. The advantage of this approach is that copulas provide a complete description of the dependence structure. For inference, a Bayesian approach was adopted using Markov chain Monte Carlo (MCMC) methods. First, a simulation study was performed to verify how the following factors affect the model selection rates given by the EAIC, EBIC, and DIC criteria: the length of the series and variations in the copula functions, the marginal distributions, the copula parameter values, and the estimation methods. The models with static and time-varying dependence structures were then applied to real data.
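As a sketch of the likelihood at the heart of such models (not the dissertation's code): the bivariate Gaussian copula density evaluated on uniform margins, with the dependence parameter recovered by a grid search standing in for MCMC.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def gaussian_copula_loglik(u, v, rho):
    """Log-density of the bivariate Gaussian copula at points (u, v)."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    det = 1.0 - rho ** 2
    return -0.5 * np.log(det) - (rho ** 2 * (x ** 2 + y ** 2)
                                 - 2 * rho * x * y) / (2 * det)

# Simulate from a Gaussian copula with rho = 0.6, then recover rho by
# maximizing the log-likelihood over a grid (a stand-in for MCMC here).
rho_true = 0.6
z = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=2000)
u, v = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])

grid = np.linspace(-0.95, 0.95, 191)
ll = [gaussian_copula_loglik(u, v, r).sum() for r in grid]
rho_hat = grid[int(np.argmax(ll))]
print(rho_hat)
```

In a full copula-GARCH model the margins would be GARCH filters rather than known normals, but the copula term of the likelihood has exactly this shape.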
93

Modelos de regressão sobre dados composicionais / Regression models for compositional data

André Pierro de Camargo 09 December 2011 (has links)
Compositional data consist of vectors whose components represent proportions of some whole, that is, vectors with positive entries summing to 1. The problem of estimating the portions $y_1, y_2, \dots, y_D$ corresponding to the sectors $SE_1, SE_2, \dots, SE_D$ of some whole $Q$ arises frequently in many domains of knowledge. The percentages $y_1, y_2, \dots, y_D$ of voting intentions for candidates $Ca_1, Ca_2, \dots, Ca_D$ in governmental elections, or the market shares of competing industries, are typical examples. Naturally, it is of great interest to analyze how such proportions vary with contextual changes such as geographic location or time; in any competitive environment, information about this behavior is of great help in shaping competitors' strategies. In this work we present and discuss some approaches proposed in the literature for regression on compositional data, as well as some model selection methods based on Bayesian inference.
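A common approach in this literature is to map the simplex to Euclidean space with a log-ratio transform and regress there. A minimal sketch using the additive log-ratio (alr) transform and ordinary least squares (the specific models discussed in the dissertation may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

def alr(p):
    """Additive log-ratio transform; the last component is the reference."""
    return np.log(p[:, :-1] / p[:, -1:])

def alr_inv(z):
    """Map alr coordinates back onto the simplex."""
    e = np.exp(np.column_stack([z, np.zeros(len(z))]))
    return e / e.sum(axis=1, keepdims=True)

# Simulated shares of D = 3 sectors driven by a single covariate t
n = 50
t = np.linspace(0.0, 1.0, n)
z_true = np.column_stack([0.5 + 1.0 * t, -0.2 + 0.3 * t])
p = alr_inv(z_true + 0.05 * rng.normal(size=z_true.shape))

# Regress each alr coordinate on t by OLS, then predict the shares
X = np.column_stack([np.ones(n), t])
B, *_ = np.linalg.lstsq(X, alr(p), rcond=None)
p_hat = alr_inv(X @ B)
print(p_hat.sum(axis=1))   # every predicted vector stays on the simplex
```

The point of the transform is visible in the last line: predictions are guaranteed to be positive and to sum to 1, which a direct linear regression on the proportions would not ensure.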
94

Hyper-parameter optimization for manifold regularization learning / Otimização de hiperparâmetros para aprendizado do computador por regularização em variedades

Becker, Cassiano Otávio, 1977- 08 December 2013 (has links)
Advisor: Paulo Augusto Valente Ferreira. Master's dissertation, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. / Abstract: This dissertation investigates the problem of hyper-parameter optimization for regularization-based learning models. A review of learning algorithms is provided covering different losses and learning tasks, including Support Vector Machines, Regularized Least Squares, and their extension to semi-supervised learning models, more specifically Manifold Regularization. A gradient-based optimization approach is proposed, using an efficient calculation of the leave-one-out cross-validation procedure. Applications to different datasets and numerical examples are presented in order to evaluate the generalization capability of the generated models.
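The efficient leave-one-out computation referred to in the abstract rests on a classical identity for linear smoothers: for (regularized) least squares, the leave-one-out residual is e_i / (1 - h_ii), so the whole LOO error comes from a single fit. A sketch for ridge regression (an assumption here; the dissertation treats Manifold Regularization, whose smoother matrix differs):

```python
import numpy as np

rng = np.random.default_rng(2)

def loocv_ridge(X, y, lam):
    """Closed-form leave-one-out error for ridge regression, using the
    identity e_i^loo = e_i / (1 - h_ii): one fit, no per-point refitting."""
    n, d = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

n, d = 80, 5
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.0, 0.0, 0.5]) + rng.normal(size=n)

# Sanity check against brute-force leave-one-out at one lambda
lam = 1.0
brute = []
for i in range(n):
    mask = np.arange(n) != i
    w = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(d),
                        X[mask].T @ y[mask])
    brute.append((y[i] - X[i] @ w) ** 2)
shortcut = loocv_ridge(X, y, lam)
print(shortcut, np.mean(brute))   # the two estimates coincide
```

Because the shortcut makes the LOO error a smooth function of the hyper-parameter, it can be differentiated with respect to lambda, which is what makes gradient-based hyper-parameter optimization practical.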
95

Bayesovský výběr proměnných / Bayesian variable selection

Jančařík, Joel January 2017 (has links)
Variable selection is a common problem in statistical analysis. Solving it via Bayesian statistics became popular in the 1990s. We review classical methods for Bayesian variable selection and set up a common framework for them. Indicator model selection methods and adaptive shrinkage methods for the normal linear model are covered. The main contribution of this work is the incorporation of Bayesian theory and Markov chain Monte Carlo (MCMC) theory; all derivations needed for the MCMC algorithms are provided. The methods are then applied to simulated and real data.
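One simple route to Bayesian variable selection, related to the indicator methods reviewed, is to enumerate candidate models and approximate posterior model probabilities through BIC, since exp(-BIC/2) is proportional to an approximate marginal likelihood. A toy sketch, not from the thesis (full indicator methods sample the inclusion indicators by MCMC instead of enumerating):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

n, d = 120, 4
X = rng.normal(size=(n, d))
beta = np.array([2.0, 0.0, -1.5, 0.0])     # predictors 0 and 2 are active
y = X @ beta + rng.normal(size=n)

def bic(X_sub, y):
    """BIC of a Gaussian linear model on the given predictor subset."""
    if X_sub.shape[1] == 0:
        rss = y @ y
    else:
        b, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
        r = y - X_sub @ b
        rss = r @ r
    k = X_sub.shape[1] + 1                 # coefficients + error variance
    return n * np.log(rss / n) + k * np.log(n)

# Enumerate all 2^d subsets; exp(-BIC/2) approximates each model's
# marginal likelihood, so the normalized weights approximate posterior
# model probabilities under a uniform model prior.
models = [s for r in range(d + 1) for s in combinations(range(d), r)]
scores = np.array([bic(X[:, list(s)], y) for s in models])
w = np.exp(-(scores - scores.min()) / 2)
post = w / w.sum()
best = models[int(np.argmax(post))]
print(best)
```

Enumeration is only feasible for small d; the MCMC machinery derived in the thesis exists precisely to explore the 2^d model space when enumeration is out of reach.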
96

Segmentation de processus avec un bruit autorégressif / Segmenting processes with an autoregressive noise

Chakar, Souhil 22 September 2015 (has links)
We propose to study the methodology of segmenting processes with autoregressive noise in both its theoretical and practical aspects. "Segmentation" here means inferring multiple change-points corresponding to abrupt shifts in the mean of the time series. The autoregression parameters are treated as nuisance parameters, taken into account in the inference only insofar as doing so improves the segmentation. From a theoretical point of view, the aim is to preserve a number of asymptotic properties of the estimators of the change-points and of the segment-specific parameters. From a practical point of view, the algorithmic constraints of computing the optimal segmentation must be taken into account. To meet both requirements, the proposed method is based on robust estimation techniques that allow a preliminary estimation of the autoregression parameters and then a decorrelation of the process, bringing the problem close to segmentation with independent observations. This method permits the use of efficient algorithms, rests on asymptotic results that we prove, and yields suitable, well-founded criteria for selecting the number of change-points. A simulation study illustrates the method.
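The decorrelation idea can be sketched as follows: differencing removes a piecewise-constant mean, and for AR(1) noise the lag-1 correlation of the differences equals -(1 - phi)/2, which yields a mean-shift-robust estimate of phi; the decorrelated series can then be segmented almost as if the observations were independent. A toy single-change-point version (the thesis handles multiple change-points with efficient algorithms):

```python
import numpy as np

rng = np.random.default_rng(4)

# AR(1) noise with one mean shift of size 3 at t = 100
n, phi, tau = 200, 0.5, 100
e = np.zeros(n)
for t in range(1, n):
    e[t] = phi * e[t - 1] + rng.normal()
y = np.where(np.arange(n) < tau, 0.0, 3.0) + e

# Step 1: estimate phi from the differenced series. Differencing removes
# the piecewise-constant mean, and for AR(1) noise the lag-1 correlation
# of the differences is -(1 - phi) / 2, hence phi = 1 + 2 * rho_d(1).
d = np.diff(y)
rho1 = np.corrcoef(d[1:], d[:-1])[0, 1]
phi_hat = 1.0 + 2.0 * rho1

# Step 2: decorrelate, then locate the break with a CUSUM statistic
z = y[1:] - phi_hat * y[:-1]
c = np.cumsum(z - z.mean())
tau_hat = int(np.argmax(np.abs(c))) + 1
print(phi_hat, tau_hat)
```

Both estimates should land near the true values (phi = 0.5, tau = 100) on a realization of this size; the asymptotic results in the thesis make that statement precise.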
97

Model selection in time series machine learning applications

Ferreira, E. (Eija) 01 September 2015 (has links)
Abstract: Model selection is a necessary step in any practical modeling task. Since the true model behind a real-world process cannot be known, the goal of model selection is to find the best approximation among a set of candidate models. In this thesis, we discuss model selection in the context of time series machine learning applications. We cover four steps of the commonly followed machine learning process: data preparation, algorithm choice, feature selection, and validation. We consider how the characteristics and the amount of available data should guide the choice of algorithms, and how the data set at hand should be divided between model training, selection, and validation to optimize the generalizability and future performance of the model. We also consider the special restrictions and requirements that must be taken into account when applying regular machine learning algorithms to time series data. In particular, we aim to bring out the problems of model over-fitting and over-selection that can occur through careless or uninformed application of model selection methods. We present our results in three time series machine learning application areas: resistance spot welding, exercise energy expenditure estimation, and cognitive load modeling. Based on our findings in these studies, we draw general guidelines on what to consider when starting to solve a new machine learning problem, from the point of view of data characteristics, amount of data, computational resources, and the possible time series nature of the problem. We also discuss how the practical aspects and requirements set by the environment where the final model will be deployed affect the choice of algorithms.
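One concrete instance of the data-division advice above: with time series data, validation blocks must come after the data used for fitting, never be shuffled into it. A minimal expanding-window split (an illustration, not code from the thesis):

```python
import numpy as np

def expanding_window_splits(n, n_folds=4, min_train=0.4):
    """Expanding-window evaluation: each fold trains on all data up to a
    cut point and validates on the next block, preserving time order."""
    start = int(n * min_train)
    cuts = np.linspace(start, n, n_folds + 1, dtype=int)
    return [(np.arange(0, cuts[k]), np.arange(cuts[k], cuts[k + 1]))
            for k in range(n_folds)]

splits = expanding_window_splits(100, n_folds=4)
for tr, va in splits:
    assert tr.max() < va.min()    # never validate on the past
print([(len(tr), len(va)) for tr, va in splits])
# [(40, 15), (55, 15), (70, 15), (85, 15)]
```

Ordinary shuffled K-fold applied to such data would leak future observations into the training folds, which is exactly the kind of over-optimistic validation the thesis warns against.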
98

Mélanges de GLMs et nombre de composantes : application au risque de rachat en Assurance Vie / GLM mixtures and number of components : an application to the surrender risk in life insurance

Milhaud, Xavier 06 July 2012 (has links)
Insurers have long been concerned about surrenders, especially in the savings business, where huge sums are at stake. The emergence of the European directive Solvency II, which promotes the development of internal risk models (including a complete unit dedicated to surrender risk management), strengthens the need to study and understand this risk in depth. In this thesis we investigate the segmentation and modeling of surrenders in order to better identify and take into account the main risk factors affecting policyholders' decisions. Several complex aspects must be dealt with specifically to predict surrenders, in particular the heterogeneity of behaviours and their correlations, as well as the context faced by the insured. Combining these, we develop a methodology that yields very encouraging results on the business lines considered and can be adapted to other products with little effort. Within this modeling, model selection emerges as a central point: we establish the strong convergence properties of a new estimator, as well as the consistency of a new selection criterion, in the framework of mixtures of generalized linear models.
99

Využití bootstrapu a křížové validace v odhadu predikční chyby regresních modelů / Utilizing Bootstrap and Cross-validation for prediction error estimation in regression models

Lepša, Ondřej January 2014 (has links)
Finding a model that predicts well is one of the main goals of regression analysis. However, to evaluate a model's predictive ability, it is common practice to use criteria that either do not serve this purpose or are insufficiently reliable. As an alternative, there are relatively new methods that use repeated simulations to estimate an appropriate loss function, the prediction error; cross-validation and the bootstrap belong to this category. This thesis describes how to use these methods to select the regression model that best predicts new values of the response variable.
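Both estimators can be sketched briefly for ordinary least squares: K-fold cross-validation averages errors on held-out folds, while the leave-one-out bootstrap averages errors on observations absent from each resample (an illustration only, not the thesis's code):

```python
import numpy as np

rng = np.random.default_rng(5)

n = 100
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)     # noise variance 1 = true error
X = np.column_stack([np.ones(n), x])

def ols_pred(Xtr, ytr, Xte):
    b, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return Xte @ b

def cv_error(X, y, k=10):
    """K-fold cross-validation estimate of squared prediction error."""
    folds = np.array_split(rng.permutation(n), k)
    errs = [np.mean((y[f] - ols_pred(np.delete(X, f, 0),
                                     np.delete(y, f), X[f])) ** 2)
            for f in folds]
    return float(np.mean(errs))

def boot_error(X, y, B=100):
    """Leave-one-out bootstrap: average error on observations that were
    left out of each bootstrap resample."""
    errs = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)
        errs.append(np.mean((y[oob] - ols_pred(X[idx], y[idx], X[oob])) ** 2))
    return float(np.mean(errs))

cv_est, boot_est = cv_error(X, y), boot_error(X, y)
print(cv_est, boot_est)   # both should sit near the noise variance, 1.0
```

In contrast to in-sample criteria such as R-squared, both estimates target error on unseen data; on this simulated example they should hover around the irreducible noise variance of 1.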
100

Analysis Methods for No-Confounding Screening Designs

January 2020 (has links)
Abstract: Nonregular designs are a preferable alternative to regular resolution IV designs because they avoid fully confounding two-factor interactions; as a result, nonregular designs can estimate and identify a few active two-factor interactions. However, due to the sometimes complex alias structure of nonregular designs, standard screening strategies can fail to identify all active effects. In this research, two-level nonregular screening designs with orthogonal main effects are discussed. By utilizing knowledge of the alias structure, a design-based model selection process for analyzing nonregular designs is proposed. The Aliased Informed Model Selection (AIMS) strategy is a design-specific approach that is compared to three generic model selection methods: stepwise regression, the least absolute shrinkage and selection operator (LASSO), and the Dantzig selector. The AIMS approach substantially increases the power to detect active main effects and two-factor interactions relative to these generic methodologies. This research identifies design-specific model spaces: sets of models that have strong heredity, are all estimable, and exhibit no model confounding. These spaces are then used in the AIMS method along with design-specific aliasing rules for model selection decisions. Model spaces and alias rules are identified for three designs: 16-run no-confounding 6-, 7-, and 8-factor designs. The designs are demonstrated with several examples as well as simulations showing the superiority of AIMS in model selection. A final piece of the research provides a method for augmenting no-confounding designs based on model spaces and maximum average D-efficiency. Several augmented designs are provided for different situations, and a final simulation with the augmented designs shows strong results for augmenting with four additional runs if time and resources permit. / Doctoral Dissertation, Industrial Engineering, 2020
