Global ETD Search

101	Hyper-parameter optimization for manifold regularization learning = Otimização de hiperparâmetros para aprendizado do computador por regularização em variedades Becker, Cassiano Otávio, 1977- 08 December 2013 Advisor: Paulo Augusto Valente Ferreira / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: This dissertation investigates the problem of hyper-parameter optimization for regularization-based learning models. A review of these algorithms is provided, covering different loss functions and learning tasks, including Support Vector Machines, Regularized Least Squares, and their extension to semi-supervised learning models, more specifically Manifold Regularization. A gradient-based optimization approach is proposed, using an efficient calculation of the leave-one-out cross-validation function. To evaluate the generalization quality of the resulting models, the method is applied to several datasets and numerical examples / Master's / Automation / Master of Electrical Engineering Machine learning Semi-supervised learning Mathematical optimization Model selection
102	Bayesovský výběr proměnných / Bayesian variable selection Jančařík, Joel January 2017 Variable selection is a common problem in statistical analysis. Solving it with Bayesian statistics became popular in the 1990s. We review classical methods for Bayesian variable selection and set a common framework for them. Indicator model selection methods and adaptive shrinkage methods for the normal linear model are covered. The main contribution of this work is the combination of Bayesian theory with Markov chain Monte Carlo (MCMC) theory. All derivations needed for the MCMC algorithms are provided. The methods are then applied to simulated and real data.
103	Segmentation de processus avec un bruit autorégressif / Segmenting processes with an autoregressive noise Chakar, Souhil 22 September 2015 We propose to study the methodology of segmenting processes with autoregressive noise, in both its theoretical and practical aspects. Here “segmentation” means inferring multiple change-points corresponding to abrupt shifts in the mean of the time series. We treat the autoregression parameters as nuisance parameters, whose estimation is considered only insofar as it improves the segmentation. From a theoretical point of view, we aim to preserve some asymptotic properties of the estimators of the change-points and of the segment-specific parameters. From a practical point of view, we have to take into account the algorithmic constraints involved in finding the optimal segmentation. To meet both requirements, we propose a method based on robust estimation techniques, which allows a preliminary estimation of the autoregression parameters and then the decorrelation of the process. The aim is to bring the problem closer to segmentation in the case of independent observations. This method allows us to use efficient algorithms. It is based on asymptotic results that we proved. It allows us to propose adapted and well-founded criteria for selecting the number of changes. A simulation study illustrates the method. Segmentation Autoregressive model Robust statistics Model selection
104	Model selection in time series machine learning applications Ferreira, E. (Eija) 01 September 2015 Abstract Model selection is a necessary step in any practical modeling task. Since the true model behind a real-world process cannot be known, the goal of model selection is to find the best approximation among a set of candidate models. In this thesis, we discuss model selection in the context of time series machine learning applications. We cover four steps of the commonly followed machine learning process: data preparation, algorithm choice, feature selection and validation. We consider how the characteristics and the amount of available data should guide the selection of algorithms, and how the data set at hand should be divided for model training, selection and validation to optimize the generalizability and future performance of the model. We also consider the special restrictions and requirements that need to be taken into account when applying regular machine learning algorithms to time series data. We especially aim to highlight problems relating to model over-fitting and over-selection that can arise from careless or uninformed application of model selection methods. We present our results in three time series machine learning application areas: resistance spot welding, exercise energy expenditure estimation and cognitive load modeling. Based on our findings in these studies, we draw general guidelines on which points to consider when starting to solve a new machine learning problem, from the point of view of data characteristics, amount of data, computational resources and the possible time series nature of the problem. We also discuss how the practical aspects and requirements set by the environment where the final model will be implemented affect the choice of algorithms. machine learning model selection real-world applications time series data
105	Mélanges de GLMs et nombre de composantes : application au risque de rachat en Assurance Vie / GLM mixtures and number of components: an application to the surrender risk in life insurance Milhaud, Xavier 06 July 2012 Insurers have long been concerned about surrenders, especially in life insurance savings contracts, where huge sums are at stake. The emergence of the European directive Solvency II, which promotes the development of internal risk models (in which an entire module is dedicated to surrender risk management), strengthens the need to study and understand this risk in depth. In this thesis we investigate the segmentation and modeling of surrenders in order to better understand and take into account the main risk factors affecting policyholders’ decisions. We find that several complex aspects must be dealt with specifically to predict surrenders, in particular the heterogeneity of behaviours and their correlations, as well as the environment faced by the insured. Combining them, we develop a methodology that gives very encouraging results on the business lines studied and that can be adapted to other product lines with little effort. However, the model selection step suffers from a lack of parsimony: we propose another criterion based on a new estimator, and prove the strong convergence of the estimator and the consistency of the criterion in the framework of mixtures of generalized linear models Surrender behaviour Finite mixtures Classification GLM Model selection 519.8
106	Využití bootstrapu a křížové validace v odhadu predikční chyby regresních modelů / Utilizing Bootstrap and Cross-validation for prediction error estimation in regression models Lepša, Ondřej January 2014 Finding a model that predicts well is one of the main goals of regression analysis. However, a model's predictive ability is commonly evaluated with criteria that either do not serve this purpose or are insufficiently reliable. As an alternative, there are relatively new methods that use repeated simulations to estimate an appropriate loss function, the prediction error. Cross-validation and the bootstrap belong to this category. This thesis describes how to use these methods to select the regression model that best predicts new values of the response variable.
107	Analysis Methods for No-Confounding Screening Designs January 2020 abstract: Nonregular designs are a preferable alternative to regular resolution four designs because they avoid confounding two-factor interactions. As a result, nonregular designs can estimate and identify a few active two-factor interactions. However, due to the sometimes complex alias structure of nonregular designs, standard screening strategies can fail to identify all active effects. In this research, two-level nonregular screening designs with orthogonal main effects are discussed. By utilizing knowledge of the alias structure, a design-based model selection process for analyzing nonregular designs is proposed. The Aliased Informed Model Selection (AIMS) strategy is a design-specific approach that is compared to three generic model selection methods: stepwise regression, the least absolute shrinkage and selection operator (LASSO), and the Dantzig selector. The AIMS approach substantially increases the power to detect active main effects and two-factor interactions compared with these generic methods. This research identifies design-specific model spaces: sets of models that have strong heredity, are all estimable, and exhibit no model confounding. These spaces are then used in the AIMS method along with design-specific aliasing rules for model selection decisions. Model spaces and alias rules are identified for three designs: the 16-run no-confounding 6-, 7-, and 8-factor designs. The designs are demonstrated with several examples as well as simulations to show the superiority of AIMS in model selection. A final piece of the research provides a method for augmenting no-confounding designs based on model spaces and maximum average D-efficiency. Several augmented designs are provided for different situations. A final simulation with the augmented designs shows strong results for augmenting with four additional runs if time and resources permit. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2020 Industrial engineering Statistics Alias Patterns Design of Experiments Model Selection Screening Experiments Two-Level Designs
108	Experimental Investigation and Statistical Analysis of Entrainment Rates of Particles in Suspended Load / 浮流粒子の連行率の実験的研究および統計的分析 Yao, Qifeng 24 September 2019 Kyoto University / 0048 / New-system course doctorate / Doctor of Science / 甲第22032号 / 理博第4536号 / 新制||理||1651 (University Library) / Division of Earth and Planetary Sciences, Graduate School of Science, Kyoto University / (Chief examiner) Associate Professor 成瀬元, Professor 生形貴男, Associate Professor 堤昭人 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / DFAM suspended load turbidity current entrainment rate flume experiment model selection 400
109	Improved estimation of hunting harvest using covariates at the hunting management precinct level Jonsson, Paula January 2021 In Sweden, reporting is voluntary for most commonly felled game, and the number of voluntary reports can vary between hunting teams, hunting management precincts (HMPs), and counties. In 2020, an improved harvest estimation model was developed, which reduced the sensitivity to low reporting. However, the model still had limitations: in some cases it produced large credible intervals. Because that model does not take into account land cover among HMPs, the impact of climate, wildlife accidents, or geographical distribution, these were added as covariates, creating the covariate model. This study aimed to compare the 2020 model (the null model) with the covariate model to see whether covariates would reduce the large credible intervals. Two hypothesis tests were performed: an evaluation of predictive performance using leave-one-out cross-validation and an evaluation of the 95 % credible interval. Predictive performance was evaluated by examining the difference in expected log-pointwise predictive density (ELPD) and its standard error (SE) for each species and model. The results show that the covariate model ranked highest for all ten species, and for six of the ten species the ELPD difference was two to four, which supports the covariate model being a better predictor for datasets other than this one. At least one covariate had an apparent effect on harvest estimates for nine out of ten species. Finally, the covariate model reduced the large uncertainties, an improvement over the null model, indicating that harvest estimates can be improved by taking covariates into account. Bayesian statistics covariates land cover credible interval model selection Ecology
110	STOCHASTIC MODEL GENERATION AND SELECTION FOR DEVICE EMULATING STRUCTURAL MATERIAL NONLINEARITY Sunny Ambalal Sharma (10668816) 07 May 2021 Structural identification is a useful tool for detecting damage and damage evolution in a structure. The initiation of damage in a structure and its subsequent growth are mainly associated with nonlinear behaviors. While the linear dynamics of a structure are easy to simulate, nonlinear structural dynamics are more complex and amplitude dependent, and they require more sophisticated simulation tools and identification methods than linear systems. Additionally, nonlinear models generally have many more parameters, and the responses may not be sensitive to all of them for all inputs. To develop model selection methods, an experiment is conducted that uses an existing device with repeatable behavior and an expected model from the literature. In this case, an MR damper is selected as the experimental device. The objective of this research is to develop and demonstrate a method to select the most appropriate model from a set of identified stochastic models of a nonlinear device. The method is developed using a numerical example of a common nonlinear system, and is then implemented on an experimental structural system with unknown nonlinear properties. Bayesian methods are used because they provide a distinct advantage over many other existing methods: they quantify confidence in the answers given the observed data and the initial uncertainty. These methods generate a description of the parameters of the system given a set of observations. First, the selected model of the MR damper is simulated and used to demonstrate the results on a numerical example. Second, the model selection process is demonstrated on an experimental structure based on experimental data. This study explores the use of the Bayesian approach for nonlinear structural identification and identifies a number of lessons for others aiming to employ Bayesian inference. Structural Engineering Bayesian inference method Unscented Kalman Filter nonlinear structural identification model selection model generation
