• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • No language data
  • Tagged with
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Multi-tree Monte Carlo methods for fast, scalable machine learning

Holmes, Michael P. 09 January 2009 (has links)
As modern applications of machine learning and data mining are forced to deal with ever more massive quantities of data, practitioners quickly run into difficulty with the scalability of even the most basic and fundamental methods. We propose to provide scalability through a marriage between classical, empirical-style Monte Carlo approximation and deterministic multi-tree techniques. This union entails a critical compromise: losing determinism in order to gain speed. In the face of large-scale data, such a compromise is arguably often not only the right but the only choice. We refer to this new approximation methodology as Multi-Tree Monte Carlo. In particular, we have developed the following fast approximation methods: 1. Fast training for kernel conditional density estimation, showing speedups as high as 10⁵ on up to 1 million points. 2. Fast training for general kernel estimators (kernel density estimation, kernel regression, etc.), showing speedups as high as 10⁶ on tens of millions of points. 3. Fast singular value decomposition, showing speedups as high as 10⁵ on matrices containing billions of entries. The level of acceleration we have shown represents improvement over the prior state of the art by several orders of magnitude. Such improvement entails a qualitative shift, a commoditization, that opens doors to new applications and methods that were previously invisible, outside the realm of practicality. Further, we show how these particular approximation methods can be unified in a Multi-Tree Monte Carlo meta-algorithm which lends itself as scaffolding to the further development of new fast approximation methods. Thus, our contribution includes not just the particular algorithms we have derived but also the Multi-Tree Monte Carlo methodological framework, which we hope will lead to many more fast algorithms that can provide the kind of scalability we have shown here to other important methods from machine learning and related fields.

Page generated in 0.1422 seconds