Return to search

Multi-tree Monte Carlo methods for fast, scalable machine learning

As modern applications of machine learning and data mining are forced to deal with ever more massive quantities of data, practitioners quickly run into difficulty with the scalability of even the most basic and fundamental methods. We propose to provide scalability through a marriage between classical, empirical-style Monte Carlo approximation and deterministic multi-tree techniques. This union entails a critical compromise: losing determinism in order to gain speed. In the face of large-scale data, such a compromise is arguably often not only the right but the only choice. We refer to this new approximation methodology as Multi-Tree Monte Carlo. In particular, we have developed the following fast approximation methods:
1. Fast training for kernel conditional density estimation, showing speedups as high as 10⁵ on up to 1 million points.

2. Fast training for general kernel estimators (kernel density estimation, kernel regression, etc.), showing speedups as high as 10⁶ on tens of millions of points.

3. Fast singular value decomposition, showing speedups as high as 10⁵ on matrices containing billions of entries.
The level of acceleration we have shown represents improvement over the prior state of the art by several orders of magnitude. Such improvement entails a qualitative shift, a commoditization, that opens doors to new applications and methods that were previously invisible, outside the realm of practicality. Further, we show how these particular approximation methods can be unified in a Multi-Tree Monte Carlo meta-algorithm which lends itself as scaffolding to the further development of new fast approximation methods. Thus, our contribution includes not just the particular algorithms we have derived but also the Multi-Tree Monte Carlo methodological framework, which we hope will lead to many more fast algorithms that can provide the kind of scalability we have shown here to other important methods from machine learning and related fields.

Identiferoai:union.ndltd.org:GATECH/oai:smartech.gatech.edu:1853/33865
Date09 January 2009
CreatorsHolmes, Michael P.
PublisherGeorgia Institute of Technology
Source SetsGeorgia Tech Electronic Thesis and Dissertation Archive
Detected LanguageEnglish
TypeDissertation

Page generated in 0.0023 seconds