Return to search

Statistical Regular Pavings and their Applications

We propose using statistical regular pavings (SRPs) as an efficient and adaptive statistical data structure for processing massive, multi-dimensional data. A regular paving (RP) is an ordered binary tree that recursively bisects a box in $\Rz^{d}$ along the first widest side. An SRP is extended from an RP by allowing mutable caches of recursively computable statistics of the data. In this study we use SRPs for two major applications: estimating histogram densities and summarising large spatio-temporal datasets.

The SRP histograms produced are $L_1$-consistent density estimators driven by a randomised priority queue that adaptively grows the SRP tree, and formalised as a Markov chain over the space of SRPs. A way to select an estimate is to run a Markov chain over the space of SRP trees, also initialised by the randomised priority queue, but here the SRP tree either shrinks or grows adaptively through pruning or splitting operations. The stationary distribution of the Markov chain is then the posterior distribution over the space of all possible histograms. We then take advantage of the recursive nature of SRPs to make computationally efficient arithmetic averages, and take the average of the states sampled from the stationary distribution to obtain the posterior mean histogram estimate.

We also show that SRPs are capable of summarizing large datasets by working with a dataset containing high frequency aircraft position information. Recursively computable statistics can be stored for variable-sized regions of airspace. The regions themselves can be created automatically to reflect the varying density of aircraft observations, dedicating more computational resources and providing more detailed information in areas with more air traffic. In particular, SRPs are able to very quickly aggregate or separate data with different characteristics so that data describing individual aircraft or collected using different technologies (reflecting different levels of precision) can be stored separately and yet also very quickly combined using standard arithmetic operations.

Identiferoai:union.ndltd.org:canterbury.ac.nz/oai:ir.canterbury.ac.nz:10092/7604
Date January 2013
CreatorsTeng, Gloria Ai Hui
PublisherUniversity of Canterbury. Department of Mathematics and Statistics
Source SetsUniversity of Canterbury
LanguageEnglish
Detected LanguageEnglish
TypeElectronic thesis or dissertation, Text
RightsCopyright Gloria Ai Hui Teng, http://library.canterbury.ac.nz/thesis/etheses_copyright.shtml
RelationNZCU

Page generated in 0.004 seconds