Global ETD Search

1	Sélection de variables et régression sur les quantiles / Variables selection and quantile regression Sidi Zakari, Ibrahim 10 July 2013 This work is a contribution to statistical model selection, and more specifically to variable selection in penalized linear quantile regression when the dimension is high. It focuses on two aspects of the selection procedure: the stability of selection and the handling of variables that exhibit a grouping effect. As a first contribution, we propose a transition from penalized least squares to quantile regression (QR). A bootstrap approach based on the selection frequency of each variable is proposed for building linear models (LM). In most cases, the QR approach yields more significant coefficients. A second contribution adapts some algorithms of the "Random" LASSO (Least Absolute Shrinkage and Selection Operator) family to the QR setting and proposes selection stability methods. Examples from food security illustrate the results. For penalized QR in high dimension, the grouping effect property is established under weaker conditions, together with the oracle properties. Two examples, on real and simulated data, illustrate the regularization paths of the proposed algorithms. The last contribution deals with variable selection for generalized linear models (GLM) via the nonconcave penalized likelihood. We propose an algorithm to maximize the penalized likelihood for a broad class of non-convex penalty functions. The convergence of the algorithm and the oracle property of the estimator obtained after one iteration are established. Simulations and an application to real data are also presented. Quantiles 519.536
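As a rough illustration of the loss that quantile regression minimises in place of squared error, here is a minimal sketch of the check (pinball) loss in the simplest case, where the model is a single constant. It is not taken from the thesis: the function names and the toy data are hypothetical, and the penalized, high-dimensional setting would add a penalty term such as an L1 norm on the coefficients.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual y - prediction at level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by one constant."""
    return sum(check_loss(v - candidate, tau) for v in values)


def constant_quantile_fit(values, tau):
    """Return the observation that minimises the total check loss.

    Over constants, a minimiser is always attained at one of the
    observations, so scanning the data is enough for this toy case.
    """
    return min(sorted(values), key=lambda c: total_check_loss(values, c, tau))


# Hypothetical toy data.
data = [2.0, 7.0, 1.0, 9.0, 4.0]
median_fit = constant_quantile_fit(data, 0.5)  # minimiser at the sample median
upper_fit = constant_quantile_fit(data, 0.9)   # minimiser at the largest value
```

With tau = 0.5 the loss is half the absolute error, so the fit lands on the sample median; raising tau penalises under-prediction more heavily and pushes the fit toward the upper tail.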
2	Three Essays on Extremal Quantiles Zhang, Yichong January 2016 An extremal quantile index is a quantile index that drifts to zero (or one) as the sample size increases. The three chapters of my dissertation consist of three applications of this concept to three distinct econometric problems. In Chapter 2, I use the concept of extremal quantile index to derive new asymptotic properties and an inference method for quantile treatment effect estimators when the quantile index of interest is close to zero. In Chapter 3, I rely on the concept of extremal quantile index to achieve identification at infinity in sample selection models and propose a new inference method. Last, in Chapter 4, I use the concept of extremal quantile index to define an asymptotic trimming scheme which can be used to control the convergence rate of the estimator of the intercept of binary response models. / Dissertation Economics Extremal Quantiles Treatment
3	Estimation de l'écotoxicité de substances chimiques par des méthodes à noyaux / Estimation of ecotoxicity of chemicals by nucleus methods Villain, Jonathan 24 June 2016 In chemistry, and more particularly in chemoinformatics, QSAR (Quantitative Structure Activity Relationship) models are increasingly studied. They provide an in silico estimate of the properties of chemical compounds, including ecotoxicological properties. These models are theoretically valid only for one class of compounds (the validity domain) and are sensitive to outliers. This PhD thesis focuses on building robust global models (covering as many compounds as possible) to predict the ecotoxicity of chemical compounds on the alga P. subcapitata, and on determining a validity domain in order to assess how well a model can predict the toxicity of a given molecule. These robust statistical models are based on a quantile approach in linear regression and in Support Vector Machine regression. Chemoinformatics Quantile 519.5
4	A simulation comparison of parametric and nonparametric estimators of quantiles from right censored data Serasinghe, Shyamalee Kumary January 1900 Master of Science / Department of Statistics / Paul I. Nelson / Quantiles are useful in describing distributions of component lifetimes. Data consisting of the lifetimes of sample units, used to estimate quantiles, are often censored. Right censoring, the setting investigated here, occurs, for example, when some test units are still functioning when the experiment is terminated. This study investigated and compared the performance of parametric and nonparametric estimators of quantiles from right censored data generated from Weibull and Lognormal distributions, models which are commonly used in analyzing lifetime data. Parametric quantile estimators based on these assumed models were compared via simulation to each other and to quantile estimators obtained from the nonparametric Kaplan-Meier estimator of the survival function. Various combinations of quantiles, censoring proportion, sample size, and distributions were considered. Our simulations show that the larger the sample size and the lower the censoring rate, the better the performance of the estimates of the 5th percentile of Weibull data. The lognormal data are very sensitive to the censoring rate, and we observed that for higher censoring rates the incorrect parametric estimates perform the best. If you do not know the underlying distribution of the data, it is risky to use parametric estimates of quantiles close to one. A limitation of the nonparametric estimator of large quantiles is its instability when the censoring rate is high and the largest observations are censored. Key Words: Quantiles, Right Censoring, Kaplan-Meier estimator Estimating Quantiles Right Censored Data Generating Right Censored Data Kaplan-Meier Estimator Parametric Estimators of Quantiles Nonparametric Estimators of Quantiles Statistics (0463)
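The nonparametric estimator mentioned in this abstract can be sketched as follows: compute the Kaplan-Meier survival curve, then take the smallest event time at which the curve drops to 1 - p or below. This is a minimal sketch, not the thesis code; the function names and the toy data are hypothetical, and the quantile convention (first time with S(t) <= 1 - p) is one common choice among several.

```python
def kaplan_meier(times, observed):
    """Return [(t, S(t))] at each distinct event time for right-censored data.

    times    -- follow-up time of each unit
    observed -- True if the failure was seen, False if the unit was censored
    """
    event_times = sorted({t for t, d in zip(times, observed) if d})
    survival = 1.0
    curve = []
    for t in event_times:
        # Units still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Failures observed exactly at time t.
        deaths = sum(1 for u, d in zip(times, observed) if d and u == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def km_quantile(times, observed, p):
    """Smallest event time t with S(t) <= 1 - p, or None if never reached."""
    for t, s in kaplan_meier(times, observed):
        if s <= 1.0 - p + 1e-12:  # small tolerance for floating-point error
            return t
    return None


# Hypothetical toy data: the two largest observations are censored.
times = [2, 3, 4, 5, 7, 8]
observed = [True, False, True, True, False, False]
lower_quartile = km_quantile(times, observed, 0.25)
median = km_quantile(times, observed, 0.5)
high_quantile = km_quantile(times, observed, 0.9)
```

In this toy sample the survival curve never falls below about 0.42 because the largest observations are censored, so the 0.9 quantile is not estimable. That is the instability of large quantiles under heavy censoring that the abstract describes.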
5	Measuring Financial Contagion Based on CAViaR Method: An Application on Europe / Měření finanční nákazy pomocí CAViaR metody: Aplikace na Evropu Tomanová, Petra January 2016 The aim of this thesis is to measure changes in dependencies among returns on equity indices for European countries in tranquil periods versus crisis periods, and to investigate their asymmetries in the lower and upper tails of their distributions. The approach is based on the conditional probability that a random variable is lower than a given quantile while other random variables are also lower than their corresponding quantiles. Time-varying conditional quantiles are modeled by the Conditional Autoregressive Value at Risk via Regression Quantiles (CAViaR) method. In addition to the univariate conditional autoregressive models, the vector autoregressive extension is considered. In the second step, the conditional probability is estimated through OLS regression. Moreover, a model which allows the distribution of returns in one country to lead or lag the distribution of returns in another country is defined and applied to European equity returns. Finally, a model measuring dependencies among more than two return series is derived and the related dimensionality problems are discussed. The results document a significant increase in European equity return comovements in bear markets during the crises of the 1990s and 2000s. Explicitly controlling for high-volatility days does not appear to affect the main findings. For comparison purposes, the results for Latin American countries are reported as well.
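To make the idea of a time-varying conditional quantile concrete, here is a minimal sketch of one commonly cited CAViaR specification, the symmetric absolute value form, in which the quantile follows q_t = beta0 + beta1 * q_{t-1} + beta2 * |y_{t-1}|. The abstract does not say which specification the thesis uses, so this choice is an assumption. The coefficients, starting value and return series below are hypothetical and not estimated; in practice the coefficients are fitted by minimising the quantile regression check loss.

```python
def caviar_sav_path(returns, beta0, beta1, beta2, q_start):
    """Quantile path from the symmetric-absolute-value recursion.

    q_t = beta0 + beta1 * q_{t-1} + beta2 * |y_{t-1}|
    The first quantile is the supplied starting value q_start.
    """
    path = [q_start]
    for previous_return in returns[:-1]:
        path.append(beta0 + beta1 * path[-1] + beta2 * abs(previous_return))
    return path


def hit_rate(returns, quantile_path):
    """Share of periods in which the return falls below its quantile."""
    hits = sum(1 for y, q in zip(returns, quantile_path) if y < q)
    return hits / len(returns)


# Hypothetical returns and illustrative (not estimated) coefficients for a
# lower-tail quantile; a negative beta2 widens the tail after large moves.
returns = [0.5, -2.0, 1.0, -0.5]
path = caviar_sav_path(returns, beta0=-0.1, beta1=0.5, beta2=-0.5, q_start=-1.0)
rate = hit_rate(returns, path)
```

For a well-specified model at level tau, the hit rate over a long sample should be close to tau. The joint behaviour of such hits across countries is the kind of quantity the abstract's second-step regression targets.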
6	Estimateurs de calage pour les quantiles [Calibration estimators for quantiles] Harms, Torsten January 2004 Thesis digitized by the Direction des bibliothèques of the Université de Montréal. Calibration estimator Quantiles Ratio estimator Difference estimator
7	Embarrassingly Parallel Statistics and its Applications: Divide & Recombine Methods for Parallel Computation of Quantiles and Construction of K-D Trees for Big-Data Aritra Chakravorty (5929565) 16 January 2019 In Divide & Recombine (D&R), data are divided into subsets, analytic methods are applied to each subset independently, with no communication between processes; then the subset outputs for each method are recombined. For big data, this provides almost all of the analytic tasking needed when data are analyzed. It also provides high computational performance because typically most of the computation is embarrassingly parallel, the simplest parallel computation.
Another kind of tasking must address computational performance and numeric accuracy: the computing of functions of all of the data, or “statistics”. For data big and small, it is often important to compute such statistics for all of the data, which can be summaries of the data, such as sample quantiles of continuous variables, or can process the data into a form that helps analysis, such as dividing the data into representative subsets. Development of computational methods to compute these statistics can be challenging.
D&R can be a very effective framework for computing statistics. To support this, we introduce the concept of embarrassingly parallel (EP) statistics, both weak and strong. The concept of EP statistics is not entirely new, but has had little development. The existing methodology is mainly sums of sums. For example, this is done when computing the necessary statistics for least squares, where sums of products and cross products are carried out on subsets then summed across subsets. Our treatment of EP statistics has taken the concept much further. The outcome is the ability to use EP statistics in conjunction with a Fourier series to approximate an optimization criterion. The series terms, which are strongly EP statistics, are summed across subsets, and the result is optimized. These are EP-F computational methods.
We have so far developed two EP-F computational methods for two widely used statistic computations. EP-F-Quantile is for quantiles of big data, and EP-F-KDtree is for KD-trees. Speed and accuracy of EPF-Quantile are compared with that of the well-known binning method, which also can be formulated in terms of EP statistics. EPF-KDtree is the first parallel KD-tree computational method of which we are aware. EP and EPF computational methods have potentially many other applications to computing statistics. Statistics Divide and Recombine Map-Reduce Parallel algorithms. KD-tree quantiles
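The binning method that this abstract uses as a baseline can be sketched as a "sums of sums" computation: each subset counts its values on a grid shared by all subsets, the counts are added across subsets, and the quantile is read off the combined histogram. This is a minimal sketch under stated assumptions, not the dissertation's EP-F method: the function names are hypothetical, the data range [lower, upper] is assumed to be known in advance and to contain every value, and values are assumed to be spread uniformly within a bin.

```python
def subset_bin_counts(subset, lower, upper, n_bins):
    """Histogram counts for one subset on a grid shared by all subsets."""
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for x in subset:
        # Values equal to `upper` are placed in the last bin.
        index = min(int((x - lower) / width), n_bins - 1)
        counts[index] += 1
    return counts


def recombine_counts(per_subset_counts):
    """Add the per-subset counts bin by bin (the recombine step)."""
    return [sum(column) for column in zip(*per_subset_counts)]


def binned_quantile(counts, lower, upper, p):
    """Approximate p-quantile by linear interpolation inside the target bin."""
    width = (upper - lower) / len(counts)
    target = p * sum(counts)
    cumulative = 0
    for index, count in enumerate(counts):
        if count > 0 and cumulative + count >= target:
            fraction = (target - cumulative) / count
            return lower + (index + fraction) * width
        cumulative += count
    return upper


# Hypothetical data already divided into three subsets.
subsets = [[0.5, 1.5, 2.5], [3.5, 4.5, 5.5, 6.5], [7.5, 8.5, 9.5]]
per_subset = [subset_bin_counts(s, 0.0, 10.0, 10) for s in subsets]
combined = recombine_counts(per_subset)
approx_median = binned_quantile(combined, 0.0, 10.0, 0.5)
```

Only the small count vectors travel between processes, so the per-subset step needs no communication. The accuracy of the result is limited by the bin width.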
8	Empirical Likelihood Confidence Intervals for the Difference of Two Quantiles with Right Censoring Yau, Crystal Cho Ying 21 November 2008 In this thesis, we study two independent samples under right censoring. Using a smoothed empirical likelihood method, we investigate the difference of quantiles between the two samples and construct pointwise confidence intervals for it. An empirical log-likelihood ratio is proposed and its asymptotic limit is shown to be a chi-squared distribution. In the simulation studies, we compare the empirical likelihood method and the normal approximation method in terms of coverage accuracy and average length of the confidence intervals. We conclude that the empirical likelihood method performs better. Finally, real clinical trial data are used for illustration. Numerical examples to illustrate the efficacy of the method are presented. Censored data Difference of quantiles Smoothed empirical likelihood Confidence interval Mathematics
9	Contribution à l'estimation non paramétrique des quantiles géométriques et à l'analyse des données fonctionnelles [Contribution to the nonparametric estimation of geometric quantiles and to functional data analysis] Chaouch, Mohamed 05 December 2008 This thesis is devoted to the nonparametric estimation of geometric quantiles, conditional or not, and to functional data analysis. We first study geometric quantiles. We show, through several simulations, that a transformation-retransformation step is necessary to estimate the geometric quantile when the distribution departs from sphericity. A study on real data confirmed that the data are better modeled with geometric quantiles than with marginal quantiles, in particular when the components of the random vector are correlated. We then study the estimation of geometric quantiles when the observations come from a survey sampling design. We propose an unbiased estimator of the geometric quantile and, using linearization techniques based on estimating equations, determine the asymptotic variance of the estimator. We then show that the Horvitz-Thompson-type estimator of the variance converges in probability. Next, we consider the estimation of conditional geometric quantiles when the observations are dependent. We prove that the estimator of the conditional geometric quantile converges uniformly on every compact set. The second part of the thesis is devoted to the various parameters characterizing functional PCA when the observations are drawn according to a sampling design. Linearization techniques based on the influence function provide variance estimators in the asymptotic framework. Under certain assumptions, we prove that these estimators converge in probability. [MATH] Mathematics Geometric quantiles conditional geometric quantiles functional PCA survey sampling influence function linearization \alpha-mixing Transformation-Retransformation
10	Estimation non paramétrique adaptative dans la théorie des valeurs extrêmes : application en environnement / Nonparametric adaptive estimation in the extreme value theory : application in ecology Pham, Quang Khoai 09 January 2015 The objective of this PhD thesis is to develop statistical methods based on extreme value theory to estimate the probabilities of rare events and conditional extreme quantiles. We consider independent random variables $X_{t_1},\dots,X_{t_n}$ associated with a sequence of times $0 \le t_1 < \dots < t_n \le T_{\max}$, where $X_{t_i}$ has distribution function $F_{t_i}$ and $F_t$ is the conditional distribution of $X$ given $T = t \in [0,T_{\max}]$. For each $t \in [0,T_{\max}]$, we propose a nonparametric adaptive estimator for extreme quantiles of $F_t$. The idea of our approach is to fit the tail of the distribution function $F_t$, above a threshold $\tau$, with a Pareto distribution of parameter $\theta_{t,\tau}$. The parameter $\theta_{t,\tau}$ is estimated using a nonparametric kernel estimator of bandwidth $h$ based on the observations larger than $\tau$. Under some regularity assumptions, we prove that the proposed adaptive estimator of $\theta_{t,\tau}$ is consistent and we determine its rate of convergence. We propose a sequential testing procedure for the choice of the threshold $\tau$, and we determine the bandwidth $h$ by two methods: cross-validation and an adaptive procedure. We also propose a method to choose the threshold $\tau$ and the bandwidth $h$ simultaneously. Finally, we study the proposed procedures on simulated data and on real data, with the aim of supporting the monitoring of aquatic systems. Nonparametric estimation Probabilities of rare events Conditional extreme quantiles Ecology 519.2
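The tail-fitting idea in this abstract can be written out as follows. This is a sketch under an assumed parametrisation, since the abstract gives no formulas: the kernel $K$, the weights $W_{t,h}$, the kernel-weighted empirical distribution function $\hat F_{t,h}$ and the symbol $\hat q_p(t)$ are notation introduced here for illustration, and $\theta_{t,\tau}$ is taken to be the Pareto tail index.

```latex
% Assumed Pareto tail above the threshold \tau, with \theta_{t,\tau} as tail index:
1 - F_t(x) \approx \bigl(1 - F_t(\tau)\bigr)\left(\frac{x}{\tau}\right)^{-1/\theta_{t,\tau}},
\qquad x \ge \tau .

% Kernel-weighted maximum-likelihood (Hill-type) estimate from the exceedances of \tau:
\hat\theta_{t,h,\tau}
  = \frac{\sum_{i=1}^{n} W_{t,h}(t_i)\,\mathbf{1}\{X_{t_i} > \tau\}\,\log\bigl(X_{t_i}/\tau\bigr)}
         {\sum_{i=1}^{n} W_{t,h}(t_i)\,\mathbf{1}\{X_{t_i} > \tau\}},
\qquad W_{t,h}(t_i) = K\!\left(\frac{t_i - t}{h}\right).

% Extreme quantile of order p, obtained by inverting the fitted tail:
\hat q_p(t) = \tau \left(\frac{1 - \hat F_{t,h}(\tau)}{1 - p}\right)^{\hat\theta_{t,h,\tau}},
\qquad p > \hat F_{t,h}(\tau).
```

The second line is the weighted mean of the log-exceedances, which is the maximum-likelihood estimate of the tail index when the weights are all equal. The third line follows from setting the fitted tail probability equal to $1 - p$ and solving for $x$.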

Search results