181

Sélection de modèle : une approche décisionnelle

Boisbunon, Aurélie 14 January 2013
This thesis revolves around the problem of model selection, studied in the context of linear regression. The objective is to determine the best predictive model from measured data, that is, the model achieving the best trade-off between fit to the data and model complexity. The main contribution is the derivation of model evaluation criteria based on decision-theoretic techniques, more precisely loss estimation. These criteria rest on a distributional assumption broader than the classical Gaussian assumption with independence between observations: the family of spherically symmetric distributions. This family allows us both to dispense with the independence assumption and to gain robustness, since our criteria do not depend on the specific form of the distribution. We also propose a method for comparing the derived criteria through a mean-squared-error (MSE) type measure, which makes it possible to determine whether one model evaluation criterion is better than another. The second contribution addresses the construction of the models being compared. The model collections considered are those produced by sparse regularisation methods of the Lasso type. In particular, we focus on the Minimax Concave Penalty (MCP), which retains the selection behaviour of the Lasso while correcting its estimation bias. This penalty, however, corresponds to a non-differentiable and non-convex problem. Generalising the usual subdifferential tools by means of Clarke differentials allowed us to derive the optimality conditions and to develop a regularisation-path algorithm for the MCP. Finally, we compare our proposals with those in the literature through a numerical study in which we check the quality of the selection. The results show, in particular, that our criteria achieve performance comparable to those in the literature, and that the criteria most commonly used in practice (cross-validation) are not always among the best performers.
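For readers unfamiliar with the MCP mentioned above, the following sketch evaluates the penalty and the corresponding one-dimensional (firm) thresholding rule for assumed example values of the regularisation parameter lam and the concavity parameter gamma. It is only an illustration of the penalty itself, not code from the thesis, and it does not implement the regularisation-path algorithm or the loss-estimation criteria.

```python
import numpy as np

def mcp_penalty(t, lam=1.0, gamma=3.0):
    """Minimax Concave Penalty; lam and gamma are illustrative values."""
    t = np.abs(np.asarray(t, dtype=float))
    inner = t <= gamma * lam                       # region where the penalty still grows
    return np.where(inner, lam * t - t**2 / (2.0 * gamma), 0.5 * gamma * lam**2)

def mcp_threshold(z, lam=1.0, gamma=3.0):
    """Solution of the one-dimensional MCP problem (firm thresholding), gamma > 1."""
    z = np.asarray(z, dtype=float)
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)   # Lasso (soft) thresholding
    return np.where(np.abs(z) <= gamma * lam, soft / (1.0 - 1.0 / gamma), z)

print(mcp_penalty([0.5, 2.0, 5.0]))    # flattens out beyond gamma*lam, unlike the Lasso
print(mcp_threshold([0.5, 2.0, 5.0]))  # large coefficients are left unshrunk
```

Unlike the soft thresholding of the Lasso, the firm thresholding rule leaves coefficients beyond gamma*lam untouched, which is the bias correction the abstract refers to.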
182

Exploring the Boundaries of Gene Regulatory Network Inference

Tjärnberg, Andreas January 2015
To understand how the components of a complex system like the biological cell interact and regulate each other, we need to collect data on how the components respond to system perturbations. Such data can then be used to solve the inverse problem of inferring a network that describes how the pieces influence each other. The work in this thesis deals with modelling the cell regulatory system, often represented as a network, with tools and concepts derived from systems biology. The first investigation focuses on network sparsity and algorithmic biases introduced by penalised network inference procedures. Many contemporary network inference methods rely on a sparsity parameter, such as the L1 penalty term used in the LASSO. However, a poor choice of the sparsity parameter can give highly incorrect network estimates. To avoid such poor choices, we devised a method to optimise the sparsity parameter so that the accuracy of the inferred network is maximised. We showed that it is effective on in silico data sets with a reasonable level of informativeness and demonstrated that accurate prediction of network sparsity is key to elucidating the correct network parameters. The second investigation focuses on how knowledge from association networks can be transferred to regulatory network inference procedures. It is common that the quality of expression data is inadequate for reliable gene regulatory network inference. We therefore constructed an algorithm to incorporate prior knowledge and demonstrated that it increases the accuracy of network inference when the quality of the data is low. The third investigation aimed to understand the influence of system and data properties on network inference accuracy. L1 regularisation methods commonly produce poor network estimates when the data used for inference are ill-conditioned, even when the signal-to-noise ratio is so high that all links in the network can be proven to exist at the given significance level. In this study we elucidated some general principles for the conditions under which we expect strongly degraded accuracy. Moreover, this allowed us to estimate the expected accuracy from the properties of simulated data, which we used to predict the performance of inference algorithms on biological data. Finally, we built the software package GeneSPIDER to solve problems encountered during the previous investigations. The package supports highly controllable network and data generation as well as data analysis and exploration in the context of network inference. At the time of the doctoral defense, Paper 4 was an unpublished manuscript.
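The sparsity-parameter selection idea from the first investigation can be caricatured with a small in silico experiment: simulate responses from a known sparse network, infer the network row by row with an L1 penalty over a grid of penalty values, and keep the value that best reproduces the known links. The sketch below does exactly that under a deliberately simplified linear response model with scikit-learn's Lasso; the data model, grid, and sizes are assumptions for illustration and are not those of the thesis or of GeneSPIDER.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Toy in silico setting: a sparse "true" network A maps perturbations P to
# responses Y through a deliberately simplified linear model Y = A @ P + noise.
n_genes, n_expts = 10, 30
A_true = np.where(rng.random((n_genes, n_genes)) < 0.15,
                  rng.normal(size=(n_genes, n_genes)), 0.0)
P = rng.normal(size=(n_genes, n_expts))
Y = A_true @ P + 0.05 * rng.normal(size=(n_genes, n_expts))

def infer_network(alpha):
    """Row-wise L1-penalised regression of each gene's response on the perturbations."""
    A_hat = np.zeros_like(A_true)
    for i in range(n_genes):
        fit = Lasso(alpha=alpha, fit_intercept=False, max_iter=10_000).fit(P.T, Y[i])
        A_hat[i] = fit.coef_
    return A_hat

def link_accuracy(A_hat):
    """Fraction of gene pairs whose link presence/absence matches the true network."""
    return np.mean((A_hat != 0) == (A_true != 0))

alphas = np.logspace(-3, 0, 15)
best_alpha = max(alphas, key=lambda a: link_accuracy(infer_network(a)))
print(f"sparsity parameter maximising link accuracy: {best_alpha:.4f}")
```

In real applications the true network is of course unknown, which is precisely why a principled way to choose the sparsity parameter, as developed in the thesis, is needed.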
183

Variable Selection and Function Estimation Using Penalized Methods

Xu, Ganggang December 2011
Penalized methods are becoming more and more popular in statistical research. This dissertation covers two major applications of penalized methods: variable selection and nonparametric function estimation. The following two paragraphs give brief introductions to each topic. Infinite-variance autoregressive models are important for modeling heavy-tailed time series. We use a penalty method to conduct model selection for autoregressive models with innovations in the domain of attraction of a stable law indexed by α ∈ (0, 2). We show that by combining the least absolute deviation loss function and the adaptive lasso penalty, we can consistently identify the true model. At the same time, the resulting coefficient estimator converges at a rate of n^(−1/α). The proposed approach gives a unified variable selection procedure for both finite- and infinite-variance autoregressive models. While automatic smoothing parameter selection for nonparametric function estimation has been extensively researched for independent data, it is much less developed for clustered and longitudinal data. Although leave-subject-out cross-validation (CV) has been widely used, its theoretical properties are unknown and its minimization is computationally expensive, especially when there are multiple smoothing parameters. Focusing on penalized modeling methods, we show that leave-subject-out CV is optimal in the sense that its minimization is asymptotically equivalent to the minimization of the true loss function. We develop an efficient Newton-type algorithm to compute the smoothing parameters that minimize the CV criterion. Furthermore, we derive a simplification of leave-subject-out CV, which leads to a more efficient algorithm for selecting the smoothing parameters. We show that the simplified CV criterion is asymptotically equivalent to the unsimplified one and thus enjoys the same optimality property. This CV criterion also provides a completely data-driven approach to selecting the working covariance structure using generalized estimating equations in longitudinal data analysis. Our results are applicable to additive, linear varying-coefficient, and nonlinear models with data from exponential families.
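The combination of least absolute deviation loss and an adaptive lasso penalty described in the first paragraph can be sketched with scikit-learn's QuantileRegressor, which minimises the median (LAD) loss with an L1 penalty on the coefficients; adaptive weights are obtained by rescaling the lagged regressors with an initial LAD fit. The simulated AR(2) series, the penalty levels, and the reweighting constant below are illustrative assumptions, not values from the dissertation.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(1)

# Simulate an AR(2) series with heavy-tailed (Student-t, infinite-variance) innovations.
n = 500
true_phi = np.array([0.6, -0.3])
e = rng.standard_t(df=1.5, size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = true_phi[0] * y[t - 1] + true_phi[1] * y[t - 2] + e[t]

p_max = 6                                    # candidate autoregressive order
X = np.column_stack([y[p_max - k - 1:n - k - 1] for k in range(p_max)])
target = y[p_max:]

# Step 1: initial LAD fit (median regression with a negligible penalty).
init = QuantileRegressor(quantile=0.5, alpha=1e-6, solver="highs").fit(X, target)

# Step 2: adaptive-lasso reweighting -- rescale each lag by |initial coefficient|
# so weakly supported lags are penalised more heavily, then refit.
w = np.abs(init.coef_) + 1e-8
ada = QuantileRegressor(quantile=0.5, alpha=0.02, solver="highs").fit(X * w, target)
coef = ada.coef_ * w                         # back to the original scale

print("selected lags:", np.flatnonzero(np.abs(coef) > 1e-6) + 1)
```

The penalty level alpha may need tuning in practice; the dissertation studies how to perform this selection consistently in the infinite-variance setting.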
184

Sélection de variables pour la classification non supervisée en grande dimension

Meynet, Caroline 09 November 2012
There are statistical modelling situations in which the classical problem of unsupervised classification (clustering, with no prior information on the nature or number of clusters to form) is compounded by the problem of identifying the variables that are genuinely relevant for determining the clustering. This issue has become all the more important as so-called high-dimensional data, with far more variables than observations, have proliferated in recent years: gene expression data, curve clustering, and so on. We propose a variable selection procedure for clustering adapted to high-dimensional problems. We take a Gaussian mixture model approach, which allows us to recast variable selection and the choice of the number of clusters as a single global model selection problem. We exploit the variable selection properties of l1 regularisation to build efficiently, from the data, a collection of models that remains of reasonable size even in high dimension. We depart from classical l1-regularised variable selection procedures with respect to parameter estimation: in each model, instead of the Lasso estimator, we compute the maximum likelihood estimator. We then select one of these maximum likelihood estimators with a non-asymptotic penalised criterion based on the slope heuristic introduced by Birgé and Massart. From a theoretical point of view, we establish a model selection theorem for maximum likelihood density estimation over a random collection of models, and apply it in our setting to obtain a minimal penalty shape for our penalised criterion. From a practical point of view, simulations are carried out to validate the procedure, in particular in the context of curve clustering. The key idea of the procedure is to use l1 regularisation only to build a restricted collection of models, and not also to estimate the model parameters; this estimation step is carried out by maximum likelihood. This hybrid procedure was inspired by a theoretical study, presented in a first part, in which we establish l1 oracle inequalities for the Lasso in the Gaussian regression and mixture of Gaussian regressions frameworks, which differ from the traditional l0 oracle inequalities by their complete absence of assumptions.
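The hybrid strategy described above, using l1 regularisation only to build a restricted collection of models, estimating each model by maximum likelihood, and then selecting with a penalised criterion, can be conveyed structurally with the rough sketch below. It substitutes a crude variance ranking for the l1-based construction of variable subsets and BIC for the slope-heuristic penalty, and it models irrelevant variables with a single Gaussian shared across clusters; all sizes and settings are illustrative assumptions rather than the procedure of the thesis.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Toy data: only the first 3 of 30 variables carry any cluster structure.
n, p, relevant = 300, 30, 3
z = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, :relevant] += 3.0 * z[:, None]

# Stand-in screening step: rank variables by spread (the thesis instead derives
# nested variable subsets from an l1-regularised mixture criterion).
order = np.argsort(-X.var(axis=0))

def full_density_bic(k, m):
    """BIC of a density on all p variables: a k-component diagonal Gaussian mixture
    on the m screened variables, one shared Gaussian per remaining variable."""
    S, C = order[:m], order[m:]
    gm = GaussianMixture(n_components=k, covariance_type="diag",
                         random_state=0).fit(X[:, S])
    loglik = gm.score(X[:, S]) * n
    for j in C:
        loglik += norm.logpdf(X[:, j], X[:, j].mean(), X[:, j].std()).sum()
    n_params = (k - 1) + 2 * k * m + 2 * (p - m)
    return -2 * loglik + n_params * np.log(n)

candidates = [(full_density_bic(k, m), k, m)
              for k in (1, 2, 3) for m in (1, 2, 3, 5, 10, p)]
_, k_best, m_best = min(candidates)
print(f"selected {k_best} clusters using the {m_best} top-ranked variables")
```

Treating the number of clusters and the retained variables as one joint model choice, scored on the full p-dimensional density, is what makes the comparison across subsets coherent.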
185

Regularisation and variable selection using penalized likelihood.

El anbari, Mohammed 14 December 2011
We are interested in variable selection in linear regression models. This research is motivated by recent developments in microarrays, proteomics, and brain imaging, among other areas. We study this problem from both frequentist and Bayesian viewpoints. In a frequentist framework, we propose methods to deal with variable selection when the number of variables is much larger than the sample size, possibly in the presence of additional structure in the predictor variables, such as high correlations or an ordering between successive variables. The performance of the proposed methods is investigated theoretically; we prove that, under regularity conditions, the proposed estimators possess good statistical properties, such as sparsity oracle inequalities, variable selection consistency, and asymptotic normality. In a Bayesian framework, we propose a global noninformative approach to Bayesian variable selection. In this thesis, we pay special attention to two calibration-free hierarchical Zellner g-priors. The first is based on the Jeffreys prior, which is not location invariant; the second avoids this problem by considering only models with at least one variable. The practical performance of the proposed methods is illustrated through numerical experiments on simulated and real-world datasets, with a comparison between the Bayesian and frequentist approaches under a weakly informative constraint when the number of variables is almost equal to the number of observations.
186

Primal dual pursuit: a homotopy based algorithm for the Dantzig selector

Asif, Muhammad Salman 10 July 2008
Consider the system model y = Ax + e, where x is an n-dimensional sparse signal, y is the measurement vector in a much lower dimension m, A is the measurement matrix, and e is the measurement error. The Dantzig selector estimates x by solving the optimization problem: minimize ||x||₁ subject to ||A'(Ax − y)||∞ ≤ ε (DS). This is a convex program that can be recast as a linear program and solved using any modern optimization method, e.g., interior-point methods. We propose a fast and efficient scheme for solving the Dantzig selector (DS), which we call "primal-dual pursuit". This algorithm can be thought of as a primal-dual homotopy approach to solving (DS). It computes the solution to (DS) for a range of successively relaxed problems, starting with a large artificial ε and moving towards the desired value. The algorithm iteratively updates the primal and dual supports as ε is reduced to the desired value, which yields the final solution. The homotopy path traced by the solution of (DS) as ε varies is piecewise linear. At certain critical values of ε along this path, either new elements enter the support of the signal or existing elements leave it. We derive the optimality and feasibility conditions used to update the solution at these critical points. We also present a detailed analysis of primal-dual pursuit for sparse signals in the noiseless case. We show that if the signal is S-sparse, then we can find all its S elements in exactly S steps using about S² log n random measurements, with very high probability.
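For reference, (DS) can indeed be recast as a linear program, which is the generic route the abstract contrasts with primal-dual pursuit. The sketch below takes that generic route on a small synthetic instance using scipy's HiGHS-based linprog; it is not an implementation of the homotopy algorithm, and the problem sizes and the choice of ε are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

# Small synthetic instance: S-sparse x, m << n noisy measurements.
n, m, s = 60, 25, 4
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, s, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], s) * (1.0 + rng.random(s))
y = A @ x_true + 0.01 * rng.normal(size=m)
eps = 0.05                                   # illustrative constraint level

# LP reformulation of (DS): variables z = [x, u], minimise sum(u)
# subject to -u <= x <= u and -eps <= A'(Ax - y) <= eps.
AtA, Aty = A.T @ A, A.T @ y
I, Z = np.eye(n), np.zeros((n, n))
A_ub = np.vstack([
    np.hstack([I, -I]),        #  x - u <= 0
    np.hstack([-I, -I]),       # -x - u <= 0
    np.hstack([AtA, Z]),       #  A'Ax <= eps + A'y
    np.hstack([-AtA, Z]),      # -A'Ax <= eps - A'y
])
b_ub = np.concatenate([np.zeros(2 * n), eps + Aty, eps - Aty])
c = np.concatenate([np.zeros(n), np.ones(n)])
bounds = [(None, None)] * n + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x_hat = res.x[:n]
print("true support:     ", np.sort(support))
print("recovered support:", np.flatnonzero(np.abs(x_hat) > 1e-4))
```

Solving one such LP from scratch for every value of ε is exactly the cost that the homotopy approach of the thesis avoids by tracking the piecewise-linear solution path.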
187

[en] CONTRIBUTIONS TO THE ECONOMETRICS OF COUNTERFACTUAL ANALYSIS / [pt] CONTRIBUIÇÕES PARA A ECONOMETRIA DE ANÁLISE CONTRAFACTUAL

RICARDO PEREIRA MASINI 10 July 2017
This thesis is composed of three chapters concerning the econometrics of counterfactual analysis. In the first, we consider a new, flexible, and easy-to-implement methodology to estimate the causal effects of an intervention on a single treated unit when no control group is readily available, which we call the Artificial Counterfactual (ArCo). We propose a two-step approach: in the first stage a counterfactual is estimated from a large-dimensional set of variables from a pool of untreated units using shrinkage methods such as the Least Absolute Shrinkage and Selection Operator (LASSO); in the second stage, we estimate the average intervention effect on a vector of variables with an estimator that is consistent and asymptotically normal. Moreover, our results are valid uniformly over a wide class of probability laws. As an empirical illustration of the proposed methodology, we evaluate the effects on inflation of an anti-tax-evasion program. In the second chapter, we investigate the consequences of applying counterfactual analysis when the data are formed by integrated processes of order one. We find that without a cointegration relation (the spurious case) the intervention estimator diverges, resulting in rejection of the hypothesis of no intervention effect regardless of whether such an effect exists, whereas when at least one cointegration relation exists we obtain a square-root-T-consistent estimator of the intervention effect, albeit with a non-standard limiting distribution. As a final recommendation, we suggest working in first differences to avoid spurious results.
Finally, in the last chapter we extend the ArCo methodology to the estimation of conditional quantile counterfactuals. We derive an asymptotically normal test statistic for the quantile intervention effect, as well as a distributional test. The procedure is then applied in an empirical exercise investigating the effects on stock returns after a change in corporate governance regime.
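The two-stage ArCo idea from the first chapter can be conveyed with a stylised simulation: regress the treated unit on the untreated peers over the pre-intervention window with the LASSO, project that fit onto the post-intervention window, and average the gap between the observed and counterfactual series. The sketch below (using scikit-learn's LassoCV, with made-up panel dimensions and effect size) illustrates only this point estimate and omits the inference and uniform-validity results developed in the thesis.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(4)

# Toy panel: one treated unit y, many untreated peers X, intervention at time T0.
T, T0, n_peers, effect = 200, 120, 40, 1.5
common = rng.normal(size=(T, 5))                       # shared factors
X = common @ rng.normal(size=(5, n_peers)) + 0.3 * rng.normal(size=(T, n_peers))
beta = np.zeros(n_peers)
beta[:3] = [0.5, -0.4, 0.8]
y = X @ beta + 0.2 * rng.normal(size=T)
y[T0:] += effect                                       # the intervention

# Stage 1: estimate the counterfactual relation on pre-intervention data with the LASSO.
model = LassoCV(cv=5).fit(X[:T0], y[:T0])

# Stage 2: project onto the post-intervention window and average the gap.
y_cf = model.predict(X[T0:])
delta_hat = np.mean(y[T0:] - y_cf)
print(f"estimated average intervention effect: {delta_hat:.3f} (simulated truth {effect})")
```

The shrinkage step in stage one is what makes the approach usable when the pool of untreated units is large relative to the pre-intervention sample.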
188

Comparação de métodos de estimação para problemas com colinearidade e/ou alta dimensionalidade (p > n)

Casagrande, Marcelo Henrique 29 April 2016 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / This work presents a comparative study of the predictive power of four regression methods suited to situations in which the data in the design matrix exhibit severe multicollinearity and/or high dimensionality, i.e. the number of covariates exceeds the number of observations. The methods considered are principal component regression, partial least squares regression, ridge regression, and the LASSO. The work includes simulations in which the predictive power of each technique is evaluated over different scenarios defined by the number of covariates, the sample size, and the number and magnitude of significant coefficients (effects), highlighting the main differences between the methods and providing a guide that helps the user choose a method based on whatever prior knowledge may be available. An application to real (non-simulated) data is also presented.
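The four methods compared in this dissertation are all available in scikit-learn, so the flavour of such a comparison can be sketched in a few lines. The simulated p > n design, the numbers of components, and the penalty levels below are illustrative assumptions rather than the scenarios studied in the dissertation.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)

# p > n design with strongly correlated covariates and a handful of true effects.
n, p = 50, 200
latent = rng.normal(size=(n, 10))
X = latent @ rng.normal(size=(10, p)) + 0.5 * rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, 1.0, -2.0]
y = X @ beta + rng.normal(size=n)

models = {
    "PCR":   make_pipeline(PCA(n_components=10), LinearRegression()),
    "PLS":   PLSRegression(n_components=10),
    "ridge": Ridge(alpha=10.0),
    "LASSO": Lasso(alpha=0.5, max_iter=50_000),
}
for name, model in models.items():
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name:6s} cross-validated MSE: {mse:8.2f}")
```

In a real study each tuning parameter (number of components, penalty strength) would itself be selected by cross-validation before the methods are compared.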
189

Lamentationes Jeremiae Prophetae de Orlando di Lasso = a aplicação da quinta categoria analitica de Joachim Burmeister / Orlando di Lasso's Lamentationes Jeremiae Prophetae: the application of the fifth analytical category of Joachim Burmeister

Ambiel, Aurea Helena de Jesus 15 August 2018
Advisor: Helena Jank / Doctoral thesis, Universidade Estadual de Campinas, Instituto de Artes / In Musica Poetica (1606), Joachim Burmeister offers the first proposal for formal analysis in music. His analytical methodology has a rhetorical-musical orientation and comprises five categories: "(1) investigation of the mode; (2) investigation of the melodic genus; (3) investigation of the type of polyphony; (4) consideration of the quality; (5) sectioning of the piece into affections or periods" (BURMEISTER, [1606] 1993, p. 201). This dissertation analyses Lamentationes Jeremiae Prophetae by Orlando di Lasso according to Burmeister's fifth category, whose application to Lasso's work can reveal the value of the analytical method. The author's initial intention was, above all, pedagogical: he wanted to teach young musicians how to compose through the study, observation, and imitation of the inventive devices employed by the great masters (emulation). The application of this fifth category also places the music analyst on a par with the performer: it is first necessary to identify, in the several parts of the musical discourse, the affects suggested by the composer. Through this analysis it is possible to recreate the affective atmosphere of the work, providing the performer with an important tool for an interpretation of depth and stylistic acuity. / Doctorate in Music
190

Comparison of different models for forecasting of Czech electricity market

Kunc, Vladimír January 2017
There is a demand for decision-support tools that can model electricity markets and allow forecasting of the hourly electricity price. Many different approaches, such as artificial neural networks or support vector regression, are used in the literature. This thesis provides a comparison of several different estimators in a single setting, using available data from the Czech electricity market. The resulting comparison of over 5000 different estimators led to a selection of the best-performing models. The role of historical weather data (temperature, dew point, and humidity) is also assessed within the comparison; it was found that while the inclusion of weather data might lead to overfitting, it is beneficial under the right circumstances. The best-performing approach was Lasso regression estimated using a modified LARS algorithm.
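As a pointer to what a Lasso-via-LARS forecaster looks like in practice, the sketch below fits scikit-learn's LassoLarsCV to a synthetic hourly price series with lagged-price and hour-of-day features. The data, the feature set, and the hold-out scheme are illustrative assumptions, and LassoLarsCV is simply a readily available Lasso/LARS implementation, not the modified LARS of the thesis.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

rng = np.random.default_rng(6)

# Synthetic hourly price series with a daily cycle, standing in for day-ahead prices.
hours = np.arange(24 * 120)
price = 40 + 10 * np.sin(2 * np.pi * hours / 24) + 3 * rng.normal(size=hours.size)

# Features: prices 24 h and 168 h earlier, plus hour-of-day dummies
# (one level dropped to avoid collinearity with the intercept).
lag24, lag168 = price[168 - 24:-24], price[:-168]
hod = np.eye(24)[hours[168:] % 24][:, 1:]
X = np.column_stack([lag24, lag168, hod])
y = price[168:]

split = len(y) - 24 * 7                       # hold out the final week
model = LassoLarsCV(cv=5).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(f"held-out mean absolute error: {np.mean(np.abs(pred - y[split:])):.2f}")
```

Real day-ahead price models would add calendar effects, load forecasts, and the weather variables discussed in the thesis, with the L1 penalty deciding which of them to keep.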
