41 |
Statistical methods for the detection of non-technical losses: a case study for the Nelson Mandela Bay Municipality
Pazi, Sisa January 2017
Electricity is one of the most stolen commodities in the world. Electricity theft can be defined as the criminal act of stealing electrical power. Several types of electricity theft exist, including illegal connections and bypassing and tampering with energy meters. The negative financial impacts of electricity theft, due to lost revenue, are far-reaching and affect both developing and developed countries. Here in South Africa, Eskom loses over R2 billion annually due to electricity theft. Data mining and nonparametric statistical methods have been used to detect fraudulent usage of electricity by assessing abnormalities and abrupt changes in kilowatt hour (kWh) consumption patterns. Identifying effective measures to detect fraudulent electricity usage is an active area of research in the electrical domain. In this study, Support Vector Machines (SVM), Naïve Bayes (NB) and k-Nearest Neighbour (KNN) algorithms were used to design and propose an electricity fraud detection model. Using the Nelson Mandela Bay Municipality as a case study, three classifiers were built with SVM, NB and KNN algorithms. The performance of these classifiers was evaluated and compared.
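As a rough illustration of one of the three classifier families named above, the following is a minimal k-nearest-neighbour sketch. It is not the study's model: the function name `knn_predict`, the four-month feature vectors, and the labels are all hypothetical, and the kWh values are made up for illustration only.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training profile.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Majority vote over the labels of the k closest profiles.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Made-up monthly kWh profiles (illustrative only, not data from the study).
train_X = [
    [410, 395, 420, 405],  # steady usage
    [380, 390, 370, 385],  # steady usage
    [400, 410, 120, 110],  # abrupt drop
    [450, 430, 90, 100],   # abrupt drop
]
train_y = ["normal", "normal", "suspect", "suspect"]

# A new profile with an abrupt drop in the last two months.
prediction = knn_predict(train_X, train_y, [420, 415, 130, 105], k=3)
```

In practice the features would be derived from real consumption histories and scaled before distances are computed; the raw values here only keep the sketch short.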
|
42 |
On the Construction of Minimax Optimal Nonparametric Tests with Kernel Embedding Methods
Li, Tong January 2021
Kernel embedding methods have witnessed a great deal of practical success in the area of nonparametric hypothesis testing in recent years. But ever since they were first proposed, researchers in this area have faced an unavoidable question: which kernel should be selected? The performance of the associated nonparametric tests can vary dramatically with different kernels. While kernel selection is usually ad hoc, we ask whether there exists a principled way of selecting the kernel so as to ensure that the associated nonparametric tests perform well. As consistency results against fixed alternatives do not tell the full story about the power of the associated tests, we study their statistical performance within the minimax framework. First, focusing on the case of goodness-of-fit tests, our analyses show that a vanilla version of the kernel embedding based test could be suboptimal, and suggest a simple remedy by moderating the kernel. We prove that the moderated approach provides optimal tests for a wide range of deviations from the null and can also be made adaptive over a large collection of interpolation spaces. Then, we study the asymptotic properties of goodness-of-fit, homogeneity and independence tests using Gaussian kernels, arguably the most popular and successful among such tests. Our results provide theoretical justifications for this common practice by showing that tests using a Gaussian kernel with an appropriately chosen scaling parameter are minimax optimal against smooth alternatives in all three settings. Our analysis also pinpoints the importance of choosing a diverging scaling parameter when using Gaussian kernels and suggests a data-driven choice of the scaling parameter that yields tests optimal, up to an iterated logarithmic factor, over a wide range of smooth alternatives. Numerical experiments are presented to further demonstrate the practical merits of our methodology.
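To make the role of the kernel and its scaling concrete, here is a minimal sketch of a Gaussian-kernel two-sample (homogeneity) statistic, the biased estimate of the squared maximum mean discrepancy. It is a generic textbook construction, not the moderated or adaptive tests of the thesis. The names `gaussian_kernel`, `mmd_squared` and `bandwidth` are assumptions, as is the parameterization exp(-(x - y)^2 / (2 * bandwidth^2)); the thesis may scale the kernel differently.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel exp(-(x - y)^2 / (2 * bandwidth^2)) for scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared maximum mean discrepancy."""
    def mean_kernel(a, b):
        # Average kernel value over all pairs (u, v) with u in a and v in b.
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Identical samples give zero; well-separated samples give a value near 2.
same = mmd_squared([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
apart = mmd_squared([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
```

The statistic changes with `bandwidth`, which is the sensitivity to kernel choice that the abstract describes. Turning it into a test would additionally require a rejection threshold, for example from a permutation procedure, which is omitted here.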
|
43 |
Essays in Econometrics
Feng, Junlong January 2020
My dissertation explores two broad areas in econometrics and statistics. The first area is nonparametric identification and estimation with endogeneity using instrumental variables. The second area is related to low-rank matrix recovery and high-dimensional panel data models. The following three chapters study different topics in these areas.
Chapter 1 considers identification and estimation of triangular models with a discrete endogenous variable and an instrumental variable (IV) taking on fewer values. Using standard approaches, the small support set of the IV leads to under-identification due to the failure of the order condition. This chapter develops the first approach to restore identification for both separable and nonseparable models in this case by supplementing the IV with covariates, allowed to enter the model in an arbitrary way. For the separable model, I show that it satisfies a system of linear equations, yielding a simple identification condition and a closed-form estimator. For the nonseparable model, I develop a new identification argument by exploiting its continuity and monotonicity, leading to weak sufficient conditions for global identification. Building on it, I propose a uniformly consistent and asymptotically normal sieve estimator. I apply my approach to the return to education with a binary IV. Though the model is under-identified by the IV alone, I obtain results consistent with the empirical literature using my method. I also illustrate the applicability of the approach via an application of preschool program selection where the supplementation procedure fails.
Chapter 2, written with Jushan Bai, studies low-rank matrix recovery with a non-sparse error matrix. Sparsity or approximate sparsity is often imposed on the error matrix for low-rank matrix recovery in the statistics and machine learning literature. In econometrics, on the other hand, it is more common to impose a location normalization for the stochastic errors. This chapter sheds light on the deep connection between the median zero assumption and the sparsity-type assumptions by showing that the principal component pursuit method, a popular approach for low-rank matrix recovery by Candès et al. (2011), consistently estimates the low-rank component under a median zero assumption. The proof relies on a new theoretical argument showing that the median-zero error matrix can be decomposed into a matrix with a sufficient number of zeros and a non-sparse matrix with a small norm that controls the estimation error bound. As no restriction is imposed on the moments of the errors, the results apply to cases when the errors have heavy or fat tails.
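For reference, principal component pursuit in the standard formulation of Candès et al. (2011) decomposes an observed matrix into a low-rank part and a residual part by solving a convex program. The symbols Y, L, S and λ below are notation chosen here, not taken from the chapter.

```latex
\min_{L,\,S} \; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad L + S = Y ,
```

where Y is the observed n_1 × n_2 matrix, \|L\|_{*} is the nuclear norm (sum of singular values) of the low-rank component, \|S\|_{1} is the entrywise \ell_1 norm of the error component, and Candès et al. suggest \lambda = 1/\sqrt{\max(n_1, n_2)}.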
In Chapter 3, I consider nuclear norm penalized quantile regression for large N and large T panel data models with interactive fixed effects. As the interactive fixed effects form a low-rank matrix, inspired by the median-zero interpretation, the estimator in this chapter extends the one studied in Chapter 2 by incorporating a conditional quantile restriction given covariates. The estimator solves a global convex minimization problem, not requiring pre-estimation of the (number of the) fixed effects. Uniform rates are obtained for both the slope coefficients and the low-rank common component of the interactive fixed effects. The rate of the latter is nearly optimal. To derive the rates, I show new results that establish uniform bounds of norms of certain random matrices of jump processes. The performance of the estimator is illustrated by Monte Carlo simulations.
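A nuclear norm penalized quantile regression of this kind can plausibly be written as the following convex program. This is a sketch under assumptions: the normalization by NT, the symbols, and the absence of further constraints are guesses from the description above, not the chapter's stated estimator.

```latex
\min_{\beta,\,L} \;
\frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T}
\rho_{\tau}\!\left( Y_{it} - X_{it}^{\top}\beta - L_{it} \right)
+ \lambda \|L\|_{*} ,
\qquad
\rho_{\tau}(u) = u \left( \tau - \mathbf{1}\{u < 0\} \right) ,
```

where \beta collects the slope coefficients, L is the N × T low-rank common component formed by the interactive fixed effects, \rho_{\tau} is the check function at quantile level \tau, and \|L\|_{*} is the nuclear norm. Both terms are convex in (\beta, L), which is consistent with the global convex minimization mentioned above.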
|
44 |
Nonparametric statistical methods in financial market research.
Corrado, Charles J. January 1988
This dissertation presents an exploration of the use of nonparametric statistical methods based on ranks for use in financial market research. Applications to event study methodology and the estimation of security systematic risk are analyzed using a simulation methodology with actual daily security return data. The results indicate that procedures based on ranks are more efficient than normal theory procedures currently in common use.
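One common way to build a rank-based event study statistic is sketched below: each security's abnormal returns are replaced by their ranks within its own time series, and the average rank deviation on the event day is scaled by the time-series standard deviation of that average. This is an illustrative construction, not necessarily the dissertation's exact procedure; the function names and the toy return series are hypothetical.

```python
import math

def ranks(values):
    """Ranks 1..n of `values` (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def rank_event_statistic(abnormal_returns, event_index):
    """Rank statistic for the event day.

    abnormal_returns: one equal-length series of abnormal returns per security.
    event_index: position of the event day within each series.
    """
    n = len(abnormal_returns)
    t_len = len(abnormal_returns[0])
    expected = (t_len + 1) / 2.0  # mean rank under the null
    ranked = [ranks(series) for series in abnormal_returns]
    # Cross-sectional mean rank deviation for each day.
    mean_dev = [sum(r[t] - expected for r in ranked) / n for t in range(t_len)]
    # Time-series standard deviation of the daily mean deviations.
    scale = math.sqrt(sum(d * d for d in mean_dev) / t_len)
    return mean_dev[event_index] / scale

# Toy abnormal returns for two securities; the event is on day index 3.
toy = [
    [0.01, -0.02, 0.00, 0.05, -0.01],
    [-0.01, 0.02, 0.00, 0.06, 0.01],
]
stat = rank_event_statistic(toy, event_index=3)
```

Because only ranks enter the statistic, a single extreme return cannot dominate it, which is the usual motivation for rank procedures when daily returns are non-normal.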
|
45 |
Parametric and non-parametric inference for Geometric Process
Ho, Pak-kei., 何柏基. January 2005
published_or_final_version / abstract / Statistics and Actuarial Science / Master / Master of Philosophy
|
46 |
On the computation and power of goodness-of-fit tests
Wang, Jingbo, 王靜波 January 2005
published_or_final_version / abstract / Computer Science / Master / Master of Philosophy
|
47 |
A study of nonparametric inference problems using Monte Carlo methods
Ho, Hoi-sheung., 何凱嫦. January 2005
published_or_final_version / abstract / Statistics and Actuarial Science / Doctoral / Doctor of Philosophy
|
48 |
Applications of nonparametric statistics to multicomponent solids mixing
Too, Jui-Rze January 2010
Photocopy of typescript. / Digitized by Kansas Correctional Industries
|
49 |
Three essays on nonparametric and semiparametric regression models
Yao, Feng 23 April 2004
Graduation date: 2004
|
50 |
Extensions of the proportional hazards loglikelihood for censored survival data
Derryberry, DeWayne R. 22 September 1998
The semi-parametric approach to the analysis of proportional hazards survival data
is relatively new, having been initiated in 1972 by Sir David Cox, who restricted its use
to hypothesis tests and confidence intervals for fixed effects in a regression setting.
Practitioners have begun to diversify applications of this model, constructing
residuals, modeling the baseline hazard, estimating median failure time, and analyzing
experiments with random effects and repeated measures. The main purpose of this
thesis is to show that working with an incompletely specified loglikelihood is more
fruitful than working with Cox's original partial loglikelihood in these applications.
In Chapter 2, we show that the deviance residuals arising naturally from the partial
loglikelihood have difficulties detecting outliers. We demonstrate that a smoothed, nonparametric
baseline hazard partially solves this problem. In Chapter 3, we derive new
deviance residuals that are useful for identifying the shape of the baseline hazard. When
these new residuals are plotted in temporal order, patterns in the residuals mirror
patterns in the baseline hazard. In Chapter 4, we demonstrate how to analyze survival
data having a split-plot design structure. Using a BLUP estimation algorithm, we
produce hypothesis tests for fixed effects, and estimation procedures for the fixed
effects and random effects. / Graduation date: 1999
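For context, Cox's partial loglikelihood for the proportional hazards model, in its standard form without tied failure times, is shown below. The notation is chosen here and may differ from the thesis.

```latex
\ell_{p}(\beta) \;=\; \sum_{i:\,\delta_i = 1}
\left[ x_i^{\top}\beta \;-\; \log \sum_{j \in R(t_i)} \exp\!\left( x_j^{\top}\beta \right) \right] ,
```

where \delta_i = 1 marks an observed (uncensored) failure, x_i is the covariate vector of subject i, and R(t_i) is the risk set of subjects still under observation just before time t_i. The baseline hazard cancels out of this expression, which is why applications that need the baseline hazard itself, such as those listed above, require something beyond the partial loglikelihood.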
|