31

Bayesian kernel density estimation

Rademeyer, Estian January 2017 (has links)
This dissertation investigates the performance of two-class classification on credit scoring data sets with low default ratios. The standard two-class parametric Gaussian and naive Bayes (NB) classifiers, as well as the non-parametric Parzen classifiers, are extended, using Bayes' rule, to include either a class imbalance or a Bernoulli prior. This is done with the aim of addressing the low default probability problem. Furthermore, the performance of Parzen classification with Silverman and Minimum Leave-one-out Entropy (MLE) Gaussian kernel bandwidth estimation is also investigated. It is shown that the non-parametric Parzen classifiers yield superior classification power. However, it would be desirable for these non-parametric classifiers to possess predictive power such as that exhibited by the odds ratio found in logistic regression (LR). The dissertation therefore dedicates a section to, amongst other things, studying the paper entitled "Model-Free Objective Bayesian Prediction" (Bernardo 1999). Since this approach to Bayesian kernel density estimation is only developed for the univariate and the uncorrelated multivariate case, the section develops a theoretical multivariate approach to Bayesian kernel density estimation. This approach is theoretically capable of handling both correlated and uncorrelated features in data. This is done through the assumption of a multivariate Gaussian kernel function and the use of an inverse Wishart prior. / Dissertation (MSc)--University of Pretoria, 2017. / The financial assistance of the National Research Foundation (NRF) towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NRF. / Statistics / MSc / Unrestricted
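
A minimal sketch of the core idea described in this abstract (not the dissertation's code): a two-class Parzen classifier with Gaussian kernels and Silverman's rule-of-thumb bandwidth, combined with a class prior through Bayes' rule. The data, variable names and prior value below are hypothetical.

```r
# Two-class Parzen/kernel classifier with a class (default-probability) prior.
parzen_density <- function(x, sample, h) {
  # Parzen estimate f(x) = (1/n) * sum_i K_h(x - x_i) with a Gaussian kernel
  sapply(x, function(x0) mean(dnorm(x0, mean = sample, sd = h)))
}

parzen_classify <- function(x, good, bad, prior_bad = 0.05) {
  # Silverman's rule-of-thumb bandwidth for each class
  h_good <- bw.nrd0(good)
  h_bad  <- bw.nrd0(bad)
  # Bayes' rule: posterior for each class is proportional to likelihood * prior
  p_bad  <- parzen_density(x, bad,  h_bad)  * prior_bad
  p_good <- parzen_density(x, good, h_good) * (1 - prior_bad)
  ifelse(p_bad > p_good, "default", "non-default")
}

# Simulated scores: defaulters tend to have lower scores
set.seed(1)
good <- rnorm(950, mean = 600, sd = 50)
bad  <- rnorm(50,  mean = 520, sd = 60)
parzen_classify(c(480, 640), good, bad, prior_bad = 0.05)
```
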
32

Development of a robbery prediction model for the City of Tshwane Metropolitan Municipality

Kemp, Nicolas James January 2020 (has links)
Crime is not spread evenly over space or time. This suggests that offenders favour certain areas and/or certain times. People base their daily activities on this notion and make decisions to avoid certain areas or to be more alert in some places than in others. Even when making choices of where to stay, shop, and go to school, people take into account how safe they feel in those places. Crime in relation to space and time has been studied over several centuries; however, the era of the computer has brought new insight to this field. Indeed, computing technology, in particular geographic information systems (GIS) and crime mapping software, has increased the interest in explaining criminal activities. It is the ability to combine the type, time and spatial occurrences of crime events that makes the use of these computing technologies attractive to crime analysts. The current study predicts robbery crime events in the City of Tshwane Metropolitan Municipality. By combining GIS and statistical models, a proposed method was developed to predict future robbery hotspots. More specifically, a robbery probability model was developed for the City of Tshwane Metropolitan Municipality based on robbery events that occurred during 2006, and the model is evaluated using actual robbery events that occurred in 2007. This novel model was based on the social disorganisation, routine activity, crime pattern and temporal constraint crime theories. The efficacy of the model was tested by comparing it to a traditional hotspot model. The robbery prediction model was developed using both built and social environmental features. Features in the built environment were divided into two main groups: facilities and commuter nodes. The facilities used in the current study included cadastre parks, clothing stores, convenience stores, education facilities, fast food outlets, filling stations, office parks and blocks, general stores, restaurants, shopping centres and supermarkets. The key commuter nodes consisted of highway nodes, main road nodes and railway stations. The social environment was built using demographics obtained from the 2001 census data. The selection of these features that may impact the occurrence of robbery was guided by spatial crime theories housed within the school of environmental criminology. Theories in this discipline argue that neighbourhoods experiencing social disorganisation are more prone to crime, while different facilities act as crime attractors or generators. Some theories also include a time element, suggesting that criminals are constrained by time and have little time to explore areas far from commuting nodes. The current study combines these theories using GIS and statistics. A programmatic approach in R was used to create kernel density estimations (hotspots), select relevant features, compute regression models with the use of the caret and mlr packages and predict crime hotspots. R was further used for the majority of spatial queries and analyses. The outcome consisted of various hotspot raster layers predicting future robbery occurrences. The accuracy of the model was tested using 2007 robbery events. Therefore, the current study not only provides a novel statistical predictive model but also showcases R’s spatial capabilities. The current study found strong supporting evidence for the routine activity and crime pattern theories in that robberies tended to cluster around facilities within the City of Tshwane, South Africa.
The findings also show a strong spatial association between robberies and neighbourhoods that experience high social disorganisation. Support was also found for the time constraint theory in that a large portion of robberies occurs in the immediate vicinity of highway nodes, main road nodes and railway stations. When tested against the traditional hotspot model, the robbery probability model was found to be slightly less effective in predicting future events. However, the current study showcases the effectiveness of the robbery probability model, which can be improved upon and used in future studies to determine the effect that future urban development will have on crime. / Dissertation (MSc)--University of Pretoria, 2020. / Geography, Geoinformatics and Meteorology / MSc / Unrestricted
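
A rough sketch of the traditional kernel-density hotspot baseline mentioned in this abstract (not the study's probability model or data): a 2-D KDE surface is built from one year's point events, the densest cells are flagged as predicted hotspots, and the hit rate is checked against the next year's events. Coordinates and thresholds below are simulated and hypothetical.

```r
# Kernel-density "hotspot" surface from point events, evaluated against later events.
library(MASS)

set.seed(42)
events_2006 <- data.frame(x = rnorm(500, 0, 2), y = rnorm(500, 0, 2))  # hypothetical event coordinates

# 2-D Gaussian kernel density estimate on a 100 x 100 grid
kde <- kde2d(events_2006$x, events_2006$y, n = 100)

# Flag the densest 10% of grid cells as the predicted hotspot area
threshold <- quantile(kde$z, 0.90)
hotspot   <- kde$z >= threshold

# Hit rate: share of (simulated) 2007 events falling inside the predicted hotspots
events_2007 <- data.frame(x = rnorm(200, 0, 2), y = rnorm(200, 0, 2))
ix <- findInterval(events_2007$x, kde$x)
iy <- findInterval(events_2007$y, kde$y)
inside <- ix >= 1 & ix <= length(kde$x) & iy >= 1 & iy <= length(kde$y)
mean(hotspot[cbind(ix[inside], iy[inside])])
```
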
33

An R Package for Fitting Dirichlet Process Mixtures of Multivariate Gaussian Distributions

Zhu, Hongxu 28 August 2019 (has links)
No description available.
34

Logspline Density Estimation with an Application to the Study of Survival Data of Lung Cancer Patients.

Chen, Yong 18 August 2004 (has links) (PDF)
A logspline method of estimating an unknown density function f from sample data is studied. Our approach is to use maximum likelihood estimation to estimate the unknown density function from a space of linear splines with a finite number of fixed uniform knots. At the end of this thesis, the method is applied to a real survival data set of lung cancer patients.
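
A minimal sketch of the logspline idea summarised above (not the thesis's implementation): log f(x) is modelled as a linear spline with fixed, uniformly spaced knots, the coefficients are chosen by maximum likelihood, and the density is normalised numerically. All function and variable names are hypothetical.

```r
# Logspline density estimation with a linear-spline log-density and fixed uniform knots.
logspline_fit <- function(x, n_knots = 5, grid_n = 512) {
  knots <- seq(min(x), max(x), length.out = n_knots)
  # Linear spline basis: 1, t, (t - k_j)_+ for the interior knots
  basis <- function(t) cbind(1, t, sapply(knots[-c(1, n_knots)],
                                          function(k) pmax(t - k, 0)))
  grid  <- seq(min(x) - sd(x), max(x) + sd(x), length.out = grid_n)
  Bx    <- basis(x)
  Bg    <- basis(grid)
  dgrid <- diff(grid)[1]

  negloglik <- function(beta) {
    log_norm <- log(sum(exp(Bg %*% beta)) * dgrid)   # log of the normalising constant
    -(sum(Bx %*% beta) - length(x) * log_norm)       # negative log-likelihood
  }
  beta <- optim(rep(0, ncol(Bx)), negloglik, control = list(maxit = 2000))$par
  dens <- exp(Bg %*% beta); dens <- dens / (sum(dens) * dgrid)
  list(x = grid, y = as.vector(dens))
}

# Example on simulated "survival times"
set.seed(7)
fit <- logspline_fit(rexp(300, rate = 0.1))
plot(fit$x, fit$y, type = "l", xlab = "time", ylab = "density")
```
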
35

A Novel Data-based Stochastic Distribution Control for Non-Gaussian Stochastic Systems

Zhang, Qichun, Wang, H. 06 April 2021 (has links)
This note presents a novel data-based approach to the non-Gaussian stochastic distribution control problem. As motivation, the drawbacks of existing methods are summarised, for example the need to train neural network weights for an unknown stochastic distribution. To overcome these disadvantages, a new transformation of the dynamic probability density function is obtained by kernel density estimation using interpolation. Based upon this transformation, a representative model is developed and the stochastic distribution control problem is transformed into an optimisation problem. Data-based direct optimisation and identification-based indirect optimisation are then proposed. In addition, the convergence of the presented algorithms is analysed and the effectiveness of these algorithms is evaluated by numerical examples. In summary, the contributions of this note are as follows: 1) a new data-based probability density function transformation is given; 2) the optimisation algorithms are given based on the presented model; and 3) a new research framework is demonstrated as the potential extensions to the existing st
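
A small sketch of the ingredient described above, kernel density estimation combined with interpolation to obtain a callable probability density function; this is only an illustration of that building block, not the note's actual transformation or control law, and the sampled data are simulated.

```r
# KDE + interpolation: a callable PDF estimate that can be evaluated anywhere.
set.seed(3)
output_sample <- c(rgamma(400, shape = 2, rate = 1), rnorm(100, 6, 0.5))  # non-Gaussian system output

kde <- density(output_sample, kernel = "gaussian")         # kernel density estimate on a grid
pdf_hat <- approxfun(kde$x, kde$y, yleft = 0, yright = 0)  # linear interpolation between grid points

pdf_hat(c(1, 3, 6))                                # estimated density at arbitrary query points
integrate(pdf_hat, min(kde$x), max(kde$x))$value   # should be close to 1
```
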
36

Maximum-likelihood kernel density estimation in high-dimensional feature spaces / C.M. van der Walt

Van der Walt, Christiaan Maarten January 2014 (has links)
With the advent of the internet and advances in computing power, the collection of very large high-dimensional datasets has become feasible; understanding and modelling high-dimensional data has thus become a crucial activity, especially in the field of pattern recognition. Since non-parametric density estimators are data-driven and do not require or impose a pre-defined probability density function on data, they are very powerful tools for probabilistic data modelling and analysis. Conventional non-parametric density estimation methods, however, originated from the field of statistics and were not originally intended to perform density estimation in high-dimensional feature spaces, as is often encountered in real-world pattern recognition tasks. Therefore we address the fundamental problem of non-parametric density estimation in high-dimensional feature spaces in this study. Recent advances in maximum-likelihood (ML) kernel density estimation have shown that kernel density estimators hold much promise for estimating non-parametric probability density functions in high-dimensional feature spaces. We therefore derive two new iterative kernel bandwidth estimators from the maximum-likelihood (ML) leave-one-out objective function and also introduce a new non-iterative kernel bandwidth estimator (based on the theoretical bounds of the ML bandwidths) for the purpose of bandwidth initialisation. We name the iterative kernel bandwidth estimators the minimum leave-one-out entropy (MLE) and global MLE estimators, and name the non-iterative kernel bandwidth estimator the MLE rule-of-thumb estimator. We compare the performance of the MLE rule-of-thumb estimator and conventional kernel density estimators on artificial data with data properties that are varied in a controlled fashion and on a number of representative real-world pattern recognition tasks, to gain a better understanding of the behaviour of these estimators in high-dimensional spaces and to determine whether these estimators are suitable for initialising the bandwidths of iterative ML bandwidth estimators in high dimensions. We find that there are several regularities in the relative performance of conventional kernel density estimators across different tasks and dimensionalities and that the Silverman rule-of-thumb bandwidth estimator performs reliably across most tasks and dimensionalities of the pattern recognition datasets considered, even in high-dimensional feature spaces. Based on this empirical evidence and the intuitive theoretical motivation that the Silverman estimator optimises the asymptotic mean integrated squared error (assuming a Gaussian reference distribution), we select this estimator to initialise the bandwidths of the iterative ML kernel bandwidth estimators compared in our simulation studies. We then perform a comparative simulation study of the newly introduced iterative MLE estimators and other state-of-the-art iterative ML estimators on a number of artificial and real-world high-dimensional pattern recognition tasks. We illustrate with artificial data (guided by theoretical motivations) under what conditions certain estimators should be preferred and we empirically confirm on real-world data that no estimator performs optimally on all tasks and that the optimal estimator depends on the properties of the underlying density function being estimated.
We also observe an interesting case of the bias-variance trade-off where ML estimators with fewer parameters than the MLE estimator perform exceptionally well on a wide variety of tasks; however, for the cases where these estimators do not perform well, the MLE estimator generally performs well. The newly introduced MLE kernel bandwidth estimators prove to be a useful contribution to the field of pattern recognition, since they perform optimally on a number of real-world pattern recognition tasks investigated and provide researchers and practitioners with two alternative estimators to employ for the task of kernel density estimation. / PhD (Information Technology), North-West University, Vaal Triangle Campus, 2014
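
A brief sketch of the two ingredients contrasted in this abstract (not the thesis's own estimators): a single Gaussian kernel bandwidth chosen by maximising the leave-one-out log-likelihood, compared with Silverman's rule of thumb as a starting point. The data and the search interval are hypothetical.

```r
# Leave-one-out maximum-likelihood bandwidth versus Silverman's rule of thumb.
loo_loglik <- function(h, x) {
  n <- length(x)
  sum(sapply(seq_len(n), function(i) {
    # density at x[i] estimated from all other points (leave-one-out)
    log(mean(dnorm(x[i], mean = x[-i], sd = h)))
  }))
}

set.seed(11)
x <- c(rnorm(150, 0, 1), rnorm(50, 4, 0.5))   # simulated bimodal data

h_silverman <- bw.nrd0(x)                     # Silverman rule-of-thumb bandwidth
h_ml <- optimize(loo_loglik, interval = c(0.01, 3), x = x,
                 maximum = TRUE)$maximum      # ML leave-one-out bandwidth

c(silverman = h_silverman, ml_loo = h_ml)
```
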
38

Multiple imputation in the presence of a detection limit, with applications : an empirical approach / Shawn Carl Liebenberg

Liebenberg, Shawn Carl January 2014 (has links)
Scientists often encounter unobserved or missing measurements that are typically reported as less than a fixed detection limit. This especially occurs in the environmental sciences when detection of low exposures is not possible due to limitations of the measuring instrument, and the resulting data are often referred to as type I and II left-censored data. Observations lying below this detection limit are therefore often ignored, or 'guessed', because they cannot be measured accurately. However, reliable estimates of the population parameters are nevertheless required to perform statistical analysis. The problem of dealing with values below a detection limit becomes increasingly complex when a large number of observations are present below this limit. Researchers thus have an interest in developing statistically robust estimation procedures for dealing with left- or right-censored data sets (Singh and Nocerino 2002). This study focuses on several main components regarding the problems mentioned above. The imputation of censored data below a fixed detection limit is studied, particularly using the maximum likelihood procedure of Cohen (1959), and several variants thereof, in combination with four new variations of the multiple imputation concept found in the literature. Furthermore, the focus also falls strongly on estimating the density of the resulting imputed, 'complete' data set by applying various kernel density estimators. It should be noted that bandwidth selection issues are not of importance in this study and will be left for further research. In this study, however, the maximum likelihood estimation method of Cohen (1959) will be compared with several variant methods, to establish which of these maximum likelihood estimation procedures for censored data estimates the population parameters of three chosen lognormal distributions most reliably in terms of well-known discrepancy measures. These methods will be implemented in combination with four new multiple imputation procedures, respectively, to assess which of these non-parametric methods is most effective at imputing the 12 censored values below the detection limit, with regard to the global discrepancy measures mentioned above. Several variations of the Parzen-Rosenblatt kernel density estimate will be fitted to the complete filled-in data sets, obtained from the previous methods, to establish which is the preferred data-driven method to estimate these densities. The primary focus of the current study will therefore be the performance of the four chosen multiple imputation methods, as well as the recommendation of methods and procedural combinations to deal with data in the presence of a detection limit. An extensive Monte Carlo simulation study was performed to compare the various methods and procedural combinations. Conclusions and recommendations regarding the best of these methods and combinations are made based on the study's results. / MSc (Statistics), North-West University, Potchefstroom Campus, 2014
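
An illustrative sketch of one variant of censored maximum likelihood for lognormal data (not necessarily Cohen's 1959 formulation or the study's imputation procedures): observed values contribute their density, values below the detection limit contribute P(X < DL), and one imputation is then drawn from the fitted left tail. Data, names and the censoring level are simulated assumptions.

```r
# ML estimation for left-censored lognormal data, followed by a single imputation.
set.seed(2014)
true_x <- rlnorm(200, meanlog = 1, sdlog = 0.8)
DL     <- quantile(true_x, 0.25)       # detection limit censoring roughly 25% of values
obs    <- true_x[true_x >= DL]
n_cens <- sum(true_x < DL)

negloglik <- function(par) {
  mu <- par[1]; sigma <- exp(par[2])   # log-parameterised to keep sigma > 0
  # observed values: log-density; censored values: log P(X < DL)
  -(sum(dlnorm(obs, mu, sigma, log = TRUE)) +
      n_cens * plnorm(DL, mu, sigma, log.p = TRUE))
}
fit <- optim(c(0, 0), negloglik)
mu_hat <- fit$par[1]; sigma_hat <- exp(fit$par[2])

# One imputation: sample censored values from the fitted distribution truncated to (0, DL)
u <- runif(n_cens, 0, plnorm(DL, mu_hat, sigma_hat))
imputed   <- qlnorm(u, mu_hat, sigma_hat)
completed <- c(obs, imputed)           # "complete" data set for subsequent kernel density estimation
```
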
40

Statistical Regular Pavings and their Applications

Teng, Gloria Ai Hui January 2013 (has links)
We propose using statistical regular pavings (SRPs) as an efficient and adaptive statistical data structure for processing massive, multi-dimensional data. A regular paving (RP) is an ordered binary tree that recursively bisects a box in $\mathbb{R}^{d}$ along the first widest side. An SRP is extended from an RP by allowing mutable caches of recursively computable statistics of the data. In this study we use SRPs for two major applications: estimating histogram densities and summarising large spatio-temporal datasets. The SRP histograms produced are $L_1$-consistent density estimators driven by a randomised priority queue that adaptively grows the SRP tree, and formalised as a Markov chain over the space of SRPs. A way to select an estimate is to run a Markov chain over the space of SRP trees, also initialised by the randomised priority queue, but here the SRP tree either shrinks or grows adaptively through pruning or splitting operations. The stationary distribution of the Markov chain is then the posterior distribution over the space of all possible histograms. We then take advantage of the recursive nature of SRPs to make computationally efficient arithmetic averages, and take the average of the states sampled from the stationary distribution to obtain the posterior mean histogram estimate. We also show that SRPs are capable of summarising large datasets by working with a dataset containing high frequency aircraft position information. Recursively computable statistics can be stored for variable-sized regions of airspace. The regions themselves can be created automatically to reflect the varying density of aircraft observations, dedicating more computational resources and providing more detailed information in areas with more air traffic. In particular, SRPs are able to very quickly aggregate or separate data with different characteristics so that data describing individual aircraft or collected using different technologies (reflecting different levels of precision) can be stored separately and yet also very quickly combined using standard arithmetic operations.
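
A toy sketch of the regular-paving idea summarised above (not the thesis's implementation): each node stores its box and a cached point count, and a box is recursively bisected along its first widest side until it holds at most a chosen number of points. Function names, the stopping rule and the data are hypothetical.

```r
# A tiny regular paving: recursive bisection along the first widest side with cached counts.
rp_build <- function(points, box, max_points = 25) {
  # points: matrix with one row per point; box: 2 x d matrix of (lower, upper) bounds
  node <- list(box = box, count = nrow(points), left = NULL, right = NULL)
  if (nrow(points) <= max_points) return(node)

  widths <- box[2, ] - box[1, ]
  j   <- which.max(widths)             # first widest side
  mid <- (box[1, j] + box[2, j]) / 2   # bisect that side at its midpoint

  lbox <- box; lbox[2, j] <- mid
  rbox <- box; rbox[1, j] <- mid
  in_left <- points[, j] < mid

  node$left  <- rp_build(points[in_left, , drop = FALSE],  lbox, max_points)
  node$right <- rp_build(points[!in_left, , drop = FALSE], rbox, max_points)
  node
}

# Example: 2-D points on the unit square
set.seed(5)
pts  <- matrix(runif(2000), ncol = 2)
tree <- rp_build(pts, box = rbind(c(0, 0), c(1, 1)), max_points = 100)
c(root_count = tree$count, left_count = tree$left$count, right_count = tree$right$count)
```
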
