1. Some Properties of Empirical Risk Minimization over Donsker Classes
Caponnetto, Andrea; Rakhlin, Alexander, 17 May 2005
We study properties of algorithms which minimize (or almost minimize) empirical error over a Donsker class of functions. We show that the L2-diameter of the set of almost-minimizers is converging to zero in probability. Therefore, as the number of samples grows, it is becoming unlikely that adding a point (or a number of points) to the training set will result in a large jump (in L2 distance) to a new hypothesis. We also show that under some conditions the expected errors of the almost-minimizers are becoming close with a rate faster than n^{-1/2}.
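To make the shrinking-diameter statement concrete, here is a small illustrative simulation (our own sketch, not code from the paper; the function class, noise level, and tolerance eps_n = 1/n are arbitrary choices). It uses the simple Donsker class {f_theta(x) = theta*x : |theta| <= 1} with squared loss and reports the L2(P)-diameter of the set of parameters whose empirical risk is within eps_n of the minimum; the printed diameters shrink as n grows, illustrating the phenomenon described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def diameter_of_almost_minimizers(n, eps, grid=np.linspace(-1, 1, 401)):
    """L2(P)-diameter of {theta : empirical risk within eps of its minimum}."""
    x = rng.uniform(-1, 1, n)
    y = 0.5 * x + rng.normal(scale=0.3, size=n)            # invented data, true theta = 0.5
    risks = np.array([np.mean((y - t * x) ** 2) for t in grid])
    almost = grid[risks <= risks.min() + eps]               # eps-almost-minimizers
    # For f_theta(x) = theta * x with X ~ U(-1, 1): ||f_a - f_b||_{L2(P)} = |a - b| / sqrt(3)
    return (almost.max() - almost.min()) / np.sqrt(3)

for n in [100, 1_000, 10_000, 100_000]:
    print(n, round(diameter_of_almost_minimizers(n, eps=1.0 / n), 4))
```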

2. Extremal Covariance Matrices
Cissokho, Youssouph, January 2018
The tail dependence coefficient (TDC) is a natural tool to describe extremal dependence. Estimation of the tail dependence coefficient can be performed via empirical process theory. In the case of extremal independence, the limit degenerates and hence one cannot construct a test for extremal independence. In order to deal with this issue, we consider an analog of the covariance matrix, namely the extremogram matrix, whose entries depend only on extremal observations. We show that under the null hypothesis of extremal independence and for finite dimension d ≥ 2, the largest eigenvalue of the sample extremogram matrix converges to the maximum of d independent normal random variables. This allows us to conduct a hypothesis test for extremal independence by means of the asymptotic distribution of the largest eigenvalue. Simulation studies are performed to further illustrate this approach.
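As a rough, hedged illustration of the objects involved (this is not the thesis's construction: the precise definition of the extremogram matrix and the centering and scaling behind the eigenvalue limit are developed in the thesis itself), one can build a matrix whose entries depend only on threshold exceedances and inspect its largest eigenvalue. The correlation matrix of exceedance indicators used below is simply one plausible stand-in, and all sample sizes and thresholds are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, q = 5000, 4, 0.95                  # sample size, dimension, threshold quantile (all invented)

X = rng.standard_normal((n, d))          # Gaussian coordinates: a null case of extremal independence
u = np.quantile(X, q, axis=0)            # per-coordinate high thresholds
E = (X > u).astype(float)                # exceedance indicators; only extreme observations matter

# One plausible "extremal covariance" analog: the correlation matrix of the
# exceedance indicators (its entries depend only on joint tail behaviour).
M = np.corrcoef(E, rowvar=False)
lam_max = np.linalg.eigvalsh(M).max()
print("largest eigenvalue of the sample matrix:", round(lam_max, 3))
```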

3. EMPIRICAL PROCESSES FOR ESTIMATED PROJECTIONS OF MULTIVARIATE NORMAL VECTORS WITH APPLICATIONS TO E.D.F. AND CORRELATION TYPE GOODNESS OF FIT TESTS
Saunders, Christopher Paul, 01 January 2006
Goodness-of-fit and correlation tests are considered for dependent univariate data that arise when multivariate data are projected to the real line with a data-suggested linear transformation. Specifically, tests for multivariate normality are investigated. Let {Y_i} be a sequence of independent k-variate normal random vectors, and let d_0 be a fixed linear transform from R^k to R. For a sequence of linear transforms d_n(Y_1, ..., Y_n) converging almost surely to d_0, the weak convergence of the empirical process of the standardized projections from d_n to a tight Gaussian process is established. This tight Gaussian process is identical to the one that arises in the univariate case where the mean and standard deviation are estimated by the sample mean and sample standard deviation (Wood, 1975). The tight Gaussian process determines the limiting null distribution of E.D.F. goodness-of-fit statistics applied to the process of the projections. A class of tests for multivariate normality, based on the Shapiro-Wilk statistic and related correlation statistics applied to the dependent univariate data that arise with a data-suggested linear transformation, is also considered. The asymptotic properties of these statistics are established. In both cases, the statistics based on random linear transformations are shown to be asymptotically equivalent to the statistics using the fixed linear transformation. The statistics based on the fixed linear transformation have the same critical points as the corresponding tests of univariate normality; this allows an easy implementation of these tests for multivariate normality. Of particular interest are two classes of transforms that have been previously considered for testing multivariate normality and are special cases of the projections considered here. The first transformation, originally considered by Wood (1981), is based on a symmetric decomposition of the inverse sample covariance matrix. The asymptotic properties of these transformed empirical processes were fully developed using classical results. The second class of transforms consists of the principal components that arise in principal component analysis. Peterson and Stromberg (1998) suggested using these transforms with the univariate Shapiro-Wilk statistic. Using these suggested projections, the limiting distributions of the E.D.F. goodness-of-fit and correlation statistics are developed.
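A minimal sketch of the projection idea in practice (our own toy code, not the author's): project k-variate data onto a data-suggested direction, here the leading principal component, standardize, and apply univariate statistics such as Shapiro-Wilk or an E.D.F. statistic to the projections. The dimensions, covariance, and sample size are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 300, 3
X = rng.multivariate_normal(mean=np.zeros(k), cov=np.eye(k) + 0.3, size=n)

# Data-suggested linear transform: first principal component of the centred sample.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[0]                               # dependent univariate projections

# Standardize and apply univariate statistics of the kind studied in the thesis;
# per the abstract, projection-based versions share the critical points of the
# corresponding univariate normality tests.
z = (proj - proj.mean()) / proj.std(ddof=1)
print("Shapiro-Wilk:", stats.shapiro(z))
print("Kolmogorov-Smirnov vs N(0,1):", stats.kstest(z, "norm"))
```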

4. EMPIRICAL PROCESSES AND ROC CURVES WITH AN APPLICATION TO LINEAR COMBINATIONS OF DIAGNOSTIC TESTS
Chirila, Costel, 01 January 2008
The Receiver Operating Characteristic (ROC) curve is the plot of Sensitivity vs. 1 - Specificity of a quantitative diagnostic test over a wide range of cut-off points c. The empirical ROC curve is probably the most widely used nonparametric estimator of the ROC curve. The asymptotic properties of this estimator were first developed by Hsieh and Turnbull (1996) based on strong approximations for quantile processes. Jensen et al. (2000) provided a general method to obtain regional confidence bands for the empirical ROC curve, based on its asymptotic distribution.
Since most biomarkers do not have high enough sensitivity and specificity to qualify as a good diagnostic test, a combination of biomarkers may result in a better diagnostic test than any one of them taken alone. Su and Liu (1993) proved that, if the panel of biomarkers is multivariate normally distributed for both the diseased and non-diseased populations, then the linear combination using Fisher's linear discriminant coefficients maximizes the area under the ROC curve of the newly formed diagnostic test, called the generalized ROC curve. In this dissertation, we derive the asymptotic properties of the generalized empirical ROC curve, the nonparametric estimator of the generalized ROC curve, using empirical process theory as in van der Vaart (1998). The pivotal result used in finding the asymptotic behavior of the proposed nonparametric estimator is the result on random functions that incorporate estimators, as developed by van der Vaart (1998). Using this powerful lemma, we decompose an equivalent process into the sum of two other processes, usually called the Brownian bridge and the drift term, via Donsker classes of functions. Using a uniform convergence rate result given by Pollard (1984), we derive the limiting process of the drift term. Due to the independence of the random samples, the asymptotic distribution of the generalized empirical ROC process is the sum of the asymptotic distributions of the decomposed processes. For completeness, we first re-derive the asymptotic properties of the empirical ROC curve in the univariate case, using the same technique described before. The methodology is used to combine biomarkers in order to discriminate lung cancer patients from normal subjects.
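The following sketch is illustrative only (the data, covariance structure, and sample sizes are invented, and the dissertation's asymptotic machinery is not reproduced). It forms the Su-Liu linear combination, which coincides with Fisher's discriminant direction when the two covariance matrices are equal, and evaluates the empirical ROC curve and AUC of the combined marker.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n_d, n_h = 3, 200, 250                       # markers, diseased, non-diseased sample sizes

mu_d, mu_h = np.array([1.0, 0.8, 0.5]), np.zeros(p)
cov = 0.5 * np.eye(p) + 0.5                     # common covariance for this toy example
Xd = rng.multivariate_normal(mu_d, cov, n_d)    # diseased biomarker panel
Xh = rng.multivariate_normal(mu_h, cov, n_h)    # non-diseased biomarker panel

# Su-Liu combination: a proportional to (S_d + S_h)^{-1} (mean_d - mean_h);
# with equal covariances this is Fisher's discriminant direction up to scale.
Sd, Sh = np.cov(Xd, rowvar=False), np.cov(Xh, rowvar=False)
a = np.linalg.solve(Sd + Sh, Xd.mean(axis=0) - Xh.mean(axis=0))
sd, sh = Xd @ a, Xh @ a                         # combined scores

# Empirical (generalized) ROC curve points and its AUC via the Mann-Whitney statistic.
cuts = np.sort(np.concatenate([sd, sh]))[::-1]
tpr = [(sd > c).mean() for c in cuts]           # sensitivity
fpr = [(sh > c).mean() for c in cuts]           # 1 - specificity
auc = (sd[:, None] > sh[None, :]).mean() + 0.5 * (sd[:, None] == sh[None, :]).mean()
print("a few ROC points (fpr, tpr):", list(zip(fpr, tpr))[::150])
print("empirical AUC of the combined marker:", round(auc, 3))
```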

5. The nonparametric least-squares method for estimating monotone functions with interval-censored observations
Cheng, Gang, 01 May 2012
Monotone functions, such as growth functions and cumulative distribution functions, are frequently of interest in the statistical literature. In this dissertation, we propose a nonparametric least-squares method for estimating monotone functions induced from stochastic processes in which the starting time of the process is subject to interval censoring. We apply this method to estimate the mean function of tumor growth, using data from either animal experiments or tumor screening programs, in order to investigate tumor progression. In this type of application, the tumor onset time is observed only within an interval. The proposed method can also be used to estimate the cumulative distribution function of the elapsed time between two related events in human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) studies, such as the HIV transmission time between two partners and the AIDS incubation time from HIV infection to AIDS onset. In these applications, both the initial event and the subsequent event are only known to occur within some intervals. Such data are called doubly interval-censored data. The common property of these stochastic processes is that the starting time of the process is subject to interval censoring.
A unified two-step nonparametric estimation procedure is proposed for these problems. In the first step, the nonparametric maximum likelihood estimate (NPMLE) of the cumulative distribution function of the starting time of the stochastic process is obtained within the framework of interval-censored data. In the second step, a specially designed least-squares objective function is constructed with the above NPMLE plugged in, and the nonparametric least-squares estimate (NPLSE) of the mean function of tumor growth, or of the cumulative distribution function of the elapsed time, is obtained by minimizing this objective function. Modern empirical process theory is applied to prove the consistency of the proposed NPLSE. Extensive simulation studies provide numerical evidence for the validity of the NPLSE. The proposed estimation method is applied to two real scientific applications. In the first application, the California Partners' Study, we estimate the distribution function of the HIV transmission time between two partners. In the second application, the NPLSEs of the mean functions of tumor growth are estimated for tumors with different stages at diagnosis, based on data from a cancer surveillance program, the SEER program. An ad hoc nonparametric statistic is designed to test the difference between two monotone functions in this context. In this dissertation, we also propose a numerical algorithm, the projected Newton-Raphson algorithm, to compute the non- and semi-parametric estimates for M-estimation problems subject to linear equality or inequality constraints. By combining the Newton-Raphson algorithm and the dual method for strictly convex quadratic programming, the projected Newton-Raphson algorithm attains the desired convergence rate. Compared to the well-known iterative convex minorant algorithm, the projected Newton-Raphson algorithm converges much more quickly when computing the non- and semi-parametric maximum likelihood estimates for panel count data.
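The monotonicity constraint in the least-squares step can be illustrated with a toy example: writing a nondecreasing function as a cumulative sum of nonnegative increments turns monotone least squares into a nonnegative least-squares problem. The sketch below is a deliberate simplification of the thesis's estimator; it ignores the interval censoring, the NPMLE plug-in, and the projected Newton-Raphson algorithm, and the data-generating choices are arbitrary.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(4)
m = 50
t = np.linspace(0, 5, m)
y = np.log1p(t) + rng.normal(scale=0.15, size=m)     # noisy observations of a monotone "growth" curve

# Parameterise a nondecreasing, nonnegative function by nonnegative increments:
# f(t_j) = sum_{l <= j} b_l with b_l >= 0, i.e. f = A @ b with A lower-triangular of ones.
A = np.tril(np.ones((m, m)))
b, _ = nnls(A, y)                                    # nonnegative least squares
f_hat = A @ b                                        # monotone least-squares fit

print("fit is nondecreasing:", bool(np.all(np.diff(f_hat) >= -1e-12)))
print("first/last fitted values:", round(f_hat[0], 3), round(f_hat[-1], 3))
```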

6. Asymptotic methods for tests of homogeneity for finite mixture models
Stewart, Michael Ian, January 2002
We present limit theory for tests of homogeneity for finite mixture models. More specifically, we derive the asymptotic distribution of certain random quantities used for testing that a mixture of two distributions is in fact just a single distribution. Our methods apply to cases where the mixture component distributions come from one of a wide class of one-parameter exponential families, both continuous and discrete. We consider two random quantities, one related to testing simple hypotheses, the other composite hypotheses. For simple hypotheses we consider the maximum of the standardised score process, which is itself a test statistic. For composite hypotheses we consider the maximum of the efficient score process, which is not itself a statistic (it depends on the unknown true distribution) but is asymptotically equivalent to certain common test statistics in a well-defined sense. We show that both quantities can be approximated by the maximum of a certain Gaussian process depending on the sample size and the true distribution of the observations, which, when suitably normalised, has a limiting distribution of the Gumbel extreme value type. Although the limit theory is not practically useful for computing approximate p-values, we use Monte-Carlo simulations to show that another method suggested by the theory, which uses a Studentised version of the maximum-score statistic and simulates a Gaussian process to compute approximate p-values, is remarkably accurate and uses a fraction of the computing resources that a straight Monte-Carlo approximation would.
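The p-value recipe in the final sentence can be mimicked generically: simulate the maximum of the approximating Gaussian process on a grid of parameter values and compare it with the observed Studentised maximum-score statistic. The sketch below is schematic only; the covariance function of the relevant process depends on the exponential family and is derived in the thesis, so the Ornstein-Uhlenbeck kernel, the grid, and the observed value 2.8 are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)

def gp_max_pvalue(observed_max, grid, cov_fun, n_sim=20000):
    """Monte Carlo estimate of P(max_t Z(t) >= observed_max) for a centred Gaussian
    process Z with covariance cov_fun, evaluated on a finite grid of parameter values."""
    C = np.array([[cov_fun(s, t) for t in grid] for s in grid])
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(grid)))   # small jitter for numerical stability
    Z = L @ rng.standard_normal((len(grid), n_sim))          # each column is one simulated path
    return float((Z.max(axis=0) >= observed_max).mean())

# Placeholder covariance (NOT the one derived in the thesis): an Ornstein-Uhlenbeck kernel.
grid = np.linspace(0.1, 2.0, 60)
pval = gp_max_pvalue(observed_max=2.8, grid=grid, cov_fun=lambda s, t: np.exp(-abs(s - t)))
print("approximate p-value:", pval)
```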

7. Essays in robust estimation and inference in semi- and nonparametric econometrics
Guyonvarch, Yannick, 28 November 2019
In the introductory chapter, we compare views on estimation and inference in the econometric and statistical learning disciplines. In the second chapter, our interest lies in a generic class of nonparametric instrumental variable models. We extend the estimation procedure in Otsu (2011) by adding a regularisation term to it. We prove the consistency of our estimator under Lebesgue's L2 norm. In the third chapter, we show that when observations are jointly exchangeable rather than independent and identically distributed (i.i.d.), a modified version of the empirical process converges weakly towards a Gaussian process under the same conditions as in the i.i.d. case. We obtain a similar result for a modified version of the bootstrapped empirical process. We apply our results to establish the asymptotic normality of several nonlinear estimators and the validity of bootstrap-based inference. Finally, we revisit the empirical work of Santos Silva and Tenreyro (2006). In the fourth chapter, we address the issue of conducting inference on ratios of expectations. We find that when the denominator tends to zero slowly enough as the number of observations n increases, bootstrap-based inference is asymptotically valid. Secondly, we complement an impossibility result of Dufour (1997) by showing that whenever n is finite it is possible to construct confidence intervals which are not pathological, under some conditions on the denominator. In the fifth chapter, we present a Stata command which implements the estimators proposed in de Chaisemartin and d'Haultfoeuille (2018) to measure several types of treatment effects widely studied in practice.
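For the ratio-of-expectations problem in the fourth chapter, here is a minimal sketch of nonparametric-bootstrap percentile inference for theta = E[Y]/E[X] (our own illustration with simulated data; the chapter's contribution concerns when such inference remains valid as the denominator drifts towards zero, which this toy example does not probe).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
X = rng.exponential(scale=1.0, size=n) + 0.2       # denominator variable, mean bounded away from zero here
Y = 2.0 * X + rng.normal(size=n)                   # numerator variable; true ratio E[Y]/E[X] = 2

theta_hat = Y.mean() / X.mean()

B = 2000
idx = rng.integers(0, n, size=(B, n))              # nonparametric bootstrap resamples
boot = Y[idx].mean(axis=1) / X[idx].mean(axis=1)   # bootstrap replicates of the ratio
lo, hi = np.percentile(boot, [2.5, 97.5])          # percentile confidence interval

print(f"estimate {theta_hat:.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```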

8. Contribution to the weak convergence of empirical copula process: contribution to the stochastic claims reserving in general insurance
Sloma, Przemyslaw, 30 September 2014
The aim of this thesis is twofold. First, we concentrate on the study of the weak convergence of weighted empirical copula processes. We provide sufficient conditions for this convergence to hold towards a limiting Gaussian process. Our results are obtained in the framework of convergence in the Banach space $L^{p}$ ($1\leq p <\infty$). Statistical applications to goodness-of-fit (GOF) tests for copulas are given to illustrate these results. We pay special attention to GOF tests based on Cramér-von Mises type statistics. Second, we discuss the problem of stochastic claims reserving in general non-life insurance. Stochastic models are needed in order to assess the variability of the claims reserve. The starting point of this thesis is an observed inconsistency between the approaches used in practice and those suggested in the literature. To fill this gap, we present a general tool for measuring the uncertainty of reserves over the ultimate (Chapter 3) and one-year (Chapter 4) time horizons, based on the Chain-Ladder method.
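The deterministic core of the Chain-Ladder method, on which both reserving chapters build, fits in a few lines. The cumulative run-off triangle below is invented for illustration, and the stochastic uncertainty measures developed in the thesis are not reproduced.

```python
import numpy as np

# Cumulative run-off triangle (rows: accident years, columns: development years); NaN = not yet observed.
C = np.array([
    [1000., 1800., 2100., 2200.],
    [1100., 1950., 2300., np.nan],
    [1200., 2150., np.nan, np.nan],
    [1300., np.nan, np.nan, np.nan],
])
n = C.shape[0]

# Chain-Ladder development factors f_j, estimated from accident years observed at both j and j+1.
f = []
for j in range(n - 1):
    rows = ~np.isnan(C[:, j + 1])
    f.append(C[rows, j + 1].sum() / C[rows, j].sum())

# Complete the lower triangle by successive multiplication with the development factors.
full = C.copy()
for i in range(n):
    for j in range(n - 1):
        if np.isnan(full[i, j + 1]):
            full[i, j + 1] = full[i, j] * f[j]

latest = np.array([C[i, n - 1 - i] for i in range(n)])   # latest observed diagonal
reserve = full[:, -1].sum() - latest.sum()                # ultimate claims minus claims paid to date
print("development factors:", [round(x, 3) for x in f])
print("estimated total reserve:", round(reserve, 1))
```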

9. Least squares estimation for binary decision trees
Albrecht, Nadine, 14 December 2020
In this thesis, a binary decision tree is used as an approximation of a nonparametric regression curve. The best-fitting decision tree is estimated from data via the least squares method, and it is investigated how, and under which conditions, the estimator converges. These asymptotic results are then used to construct asymptotic convergence regions.
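A quick empirical illustration of the setting (our own sketch using scikit-learn's least-squares tree fitting, not the estimator analysed in the thesis; the regression function, noise level, and tree depth are arbitrary): as n grows, the L2 distance between the fitted depth-3 tree and the true regression curve settles near the approximation error of the best tree of that depth rather than at zero.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)

def l2_error_to_truth(n, depth=3):
    """Monte Carlo L2 distance between a least-squares regression tree and the true curve."""
    x = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(scale=0.3, size=n)
    tree = DecisionTreeRegressor(max_depth=depth).fit(x, y)    # binary tree, squared-error splits
    grid = np.linspace(0, 1, 2000).reshape(-1, 1)
    return np.sqrt(np.mean((tree.predict(grid) - np.sin(2 * np.pi * grid[:, 0])) ** 2))

for n in [200, 2_000, 20_000]:
    print(n, round(l2_error_to_truth(n), 3))
```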